1. Introduction
The number of Internet of Things (IoT) nodes has increased significantly. To reduce reliance on batteries, most IoT nodes depend on energy harvesting circuits for power, which imposes stringent power constraints on the overall node design. Because they operate at high frequencies and must remain continuously active, the receivers responsible for communication with base stations have become a major power-consumption bottleneck.
In recent years, many receiver designs targeting low-power IoT applications have been reported. RF direct envelope detection eliminates all clock generation and intermediate frequency (IF) amplification modules [1,2,3,4,5,6], achieving power consumption in the micro-watt range. However, its limited sensitivity and poor interference immunity restrict its use to local area networks. To achieve higher sensitivity, the LNA-first architecture has become a popular choice [7,8,9,10,11,12], but signal amplification at the RF front end inevitably leads to a reduction in receiver efficiency. To lower power consumption while improving sensitivity and interference immunity, the mixer-first architecture has been widely adopted in the design of low-power receivers [13,14,15,16,17,18,19,20,21,22,23,24,25]. Nevertheless, the clock generation circuit still consumes considerable current [26,27,28,29].
In the design of transceivers for IoT applications, OOK modulation is often used to simplify circuit design and reduce power consumption. Non-coherent demodulation relaxes the requirements for local oscillator (LO) phase accuracy and stability, enabling low-power implementation with a free-running oscillator. Refs. [30,31] adopt an uncertain-IF structure, which eliminates the phase-locked loop (PLL) and significantly reduces the power consumption of the clock generation circuit. However, due to the absence of a feedback loop, free-running oscillators typically exhibit a significant frequency offset, leading to a wide IF frequency range. This makes it difficult to align the receiver's channel center frequency with the desired signal. To ensure that the useful signal falls within the receiver's passband, the baseband circuits must cover the entire potential IF frequency range. This increases the in-band noise power and adjacent-channel interference, which degrades receiver sensitivity. The issue is particularly pronounced in narrowband communication.
Traditional frequency estimation algorithms commonly used to address the frequency offset of free-running oscillators include FFT-based [32], PLL-based [33], and zero-crossing detection [34] methods. FFT-based algorithms transform the received signal into the frequency domain and estimate the frequency offset from the location of the spectral peak; however, their accuracy is easily degraded by high-power interferers. PLL-based algorithms continuously track the phase difference between the input and the local oscillator and can estimate the frequency once locked, but they involve high design complexity and substantial hardware overhead. Zero-crossing algorithms estimate the frequency by detecting the zero-crossing intervals of the signal; while simple, their estimation accuracy is poor under low-SNR conditions.
To achieve both low power consumption and high sensitivity, this paper presents a reconfigurable-channel receiver operating in the 900 MHz band, designed for OOK modulation at a data rate of 100 kbps. The receiver employs a free-running oscillator to generate the LO signal, with the operating mode switched in the digital baseband (DBB). In multi-channel mode, the proposed preamble-based frequency estimation (PBFE) algorithm estimates the IF frequency by capturing a specific preamble sequence corresponding to the desired channel, using multiple parallel sub-channels implemented in the DBB that collectively cover the expected IF frequency range. The identical sub-channel structure offers good scalability and the potential for hardware-efficiency optimization compared with traditional frequency estimation methods. Each sub-channel performs frequency estimation through hierarchical processing and correlation peak detection, which allows for high frequency estimation accuracy even in the presence of strong noise and interference. In single-channel mode, a single sub-channel centered at the estimated IF frequency is selected for data reception, using a finite impulse response (FIR) filter whose passband matches the useful signal bandwidth. This helps to suppress out-of-band noise and interference, thus improving receiver sensitivity. Measurement results show that the PBFE algorithm achieves an effective IF signal-to-noise ratio (SNR) threshold of 2 dB, with a frequency estimation error below 22 kHz. In single-channel mode, with a residual frequency offset of 30 kHz, an 8-point energy accumulation decoding scheme achieves a bit error rate (BER) of 10⁻³ at an IF SNR of 5.2 dB. Compared with the case without frequency estimation, where a 50 kHz IF frequency offset exists, the required SNR improves by 4.1 dB.
Ref. [35] proposes data-startable baseband logic for wake-up radios (WURs), notably featuring a clock data recovery circuit based on a gated oscillator. The gated oscillator outputs clock edges that sample the input signal, and the edge transitions of the input data are captured to eliminate the accumulated phase misalignment caused by the difference between the data rate and the gated oscillator frequency. This technique allows data reception while keeping the baseband logic simple and its power consumption extremely low. Coupled with the analog front-end circuit, the receiver uses a direct RF envelope detection structure that eliminates the need for an RF local oscillator and intermediate frequency amplification, bringing the overall wake-up receiver power consumption into the nanowatt range, suitable for ultra-low-power IoT applications. However, due to the absence of down-conversion and intermediate frequency filtering, the input bandwidth of the RF envelope detector is much larger than the useful signal bandwidth, which limits the sensitivity of this WUR.
To improve receiver sensitivity while maintaining energy efficiency, the receiver proposed in this work adopts a mixer-first structure and utilizes a FIR filter bank to suppress out-of-band noise and interference signals. Furthermore, to reduce RF local oscillator power consumption, a free-running oscillator and the PBFE algorithm are employed to mitigate the impact of oscillator frequency offset on sensitivity. Compared with the design presented in [35], the receiver proposed in this work achieves higher sensitivity at the cost of increased power consumption and circuit complexity, making it more suitable for scenarios where improved signal reception is crucial.
The rest of the paper is organized as follows: Section 2 describes the proposed receiver architecture and the PBFE algorithm. Section 3 presents the design details of the digital baseband circuit. Section 4 presents the receiver measurement results. Section 5 discusses the design trade-offs and possible optimizations. Section 6 concludes the paper.
3. Digital Baseband Circuit Design
3.1. Quadrature Envelope Detector
Figure 7 shows the block diagram of the QED structure, which consists of a quadrature digital LO, multipliers, a FIR filter bank, and an adder. To accommodate FPGA implementation, an 8-bit ADC is employed to digitize the analog IF signal. The digitized IF signal is first mixed with the quadrature digital LO signal, followed by FIR filtering to suppress unwanted frequency components. The resulting difference-frequency components are then self-mixed, and the two orthogonal paths are summed to obtain the demodulated 16-bit digital baseband signal. To minimize power consumption and FPGA resource utilization in the subsequent processing stages, bit-width truncation is applied throughout the signal chain without significant degradation in accuracy. The output bit width of each module is annotated in Figure 7.
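As a behavioral reference, the following Python sketch (illustrative only; it uses floating-point arithmetic and a generic low-pass filter in place of the fixed-point CIC/HB/FIR cascade described below) captures the QED signal flow: quadrature mixing, low-pass filtering, self-mixing, and summation of the two paths.

```python
import numpy as np
from scipy.signal import firwin, lfilter

def qed_envelope(x_if, f_lo, fs=25.6e6, num_taps=63, cutoff=200e3):
    """Behavioral sketch of the quadrature envelope detector (QED).

    x_if : digitized IF samples (treated here as floats rather than 8-bit words)
    f_lo : digital LO frequency selected for the sub-channel (Hz)
    """
    n = np.arange(len(x_if))
    lo_i = np.cos(2 * np.pi * f_lo * n / fs)     # quadrature digital LO, I phase
    lo_q = np.sin(2 * np.pi * f_lo * n / fs)     # quadrature digital LO, Q phase
    h = firwin(num_taps, cutoff, fs=fs)          # stand-in for the CIC + HB + FIR cascade
    i_bb = lfilter(h, 1.0, x_if * lo_i)          # keep the difference-frequency component (I)
    q_bb = lfilter(h, 1.0, x_if * lo_q)          # keep the difference-frequency component (Q)
    return i_bb ** 2 + q_bb ** 2                 # self-mix and sum the two orthogonal paths

# Example: a 100 kbps OOK burst at a 1.03 MHz IF demodulated with the 1.0 MHz sub-channel LO.
fs, f_if = 25.6e6, 1.03e6
bits = np.repeat(np.array([1, 0, 1, 1, 0, 1, 0, 0]), int(fs / 100e3))
x = bits * np.cos(2 * np.pi * f_if * np.arange(len(bits)) / fs)
env = qed_envelope(x, f_lo=1.0e6)
```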
The digital LO is generated by a direct digital synthesizer (DDS), which comprises a phase accumulator and a phase-sine lookup table (LUT). The phase accumulator consists of an adder and a register. On each rising clock edge, a fixed phase increment determined by the frequency control word (FCW) of each sub-channel is added to the accumulated phase. The accumulated phase value is stored in the register and serves as the N-bit address input to the phase-sine LUT, which outputs the corresponding M-bit amplitude value. The output frequency f_out of the DDS is expressed as:

$$f_{\mathrm{out}} = \frac{\mathrm{FCW}}{2^{N}}\, f_{\mathrm{clk}} \qquad (4)$$

From Equation (4), it can be observed that the output frequency resolution of the DDS is determined by the master clock frequency f_clk and the phase address bit width N. Due to the finite word length of both the phase and the amplitude in the DDS, nonlinear quantization effects occur, generating higher-order harmonics that fold a large amount of out-of-band noise back into the signal band and degrade the in-band SNR. This nonlinearity becomes more pronounced as the bit width decreases. Therefore, in this design, the phase address bit width N is set to 10 and the signal amplitude bit width M is set to 8. Furthermore, to overcome the limitation on frequency estimation accuracy imposed by the sub-channel center frequency spacing, and to ensure that the subsequent FIR filter bank can effectively suppress the higher-order harmonic components of the digital LO signal, the DDS output frequency must be much lower than the master clock frequency, typically below one-tenth of f_clk. In this design, both the ADC sampling clock and the DDS clock frequency f_clk are set to 25.6 MHz, which is 256 times the baseband data rate. Consequently, the DDS frequency resolution is 25 kHz, a value that evenly divides every element of the sub-channel center frequency vector [500 kHz:100 kHz:1.5 MHz]. This configuration eliminates the frequency error of the DDS-generated digital LO and fully exploits the discrete frequency estimation characteristics of the PBFE algorithm.
Considering the folding symmetry of the sine wave, the phase-sine LUT only needs to store the amplitude values corresponding to the first quarter of the period in the DDS implementation. The remaining three quarters can be obtained through phase address mapping and amplitude sign inversion. Therefore, the actual required LUT storage is:

$$B_{\mathrm{LUT}} = 2^{N-2} \times M \ \text{bits}$$
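As an illustration of this mechanism, the following Python sketch (illustrative only; the FPGA implementation is in RTL, and the variable names here are hypothetical) models the phase accumulator and the quarter-wave LUT with address mirroring and sign inversion. With f_clk = 25.6 MHz and N = 10, an FCW of 40 yields the nominal 1 MHz IF LO.

```python
import numpy as np

N, M = 10, 8                    # phase address bits, amplitude bits (as in the text)
F_CLK = 25.6e6                  # master/DDS clock frequency

# Quarter-wave LUT: first quarter of a sine, 2**(N-2) entries of M-bit amplitude.
QUARTER = np.round((2 ** (M - 1) - 1) *
                   np.sin(2 * np.pi * np.arange(2 ** (N - 2)) / 2 ** N)).astype(int)

def dds(fcw, num_samples):
    """Phase accumulator + quarter-wave LUT; returns signed M-bit sine samples."""
    out = np.empty(num_samples, dtype=int)
    phase = 0
    for i in range(num_samples):
        quadrant = phase >> (N - 2)              # top 2 phase bits select the quadrant
        addr = phase & (2 ** (N - 2) - 1)        # lower N-2 bits address the quarter-wave LUT
        if quadrant in (1, 3):                   # 2nd/4th quadrant: mirror the address
            addr = 2 ** (N - 2) - 1 - addr
        amp = QUARTER[addr]
        out[i] = -amp if quadrant >= 2 else amp  # 3rd/4th quadrant: invert the sign
        phase = (phase + fcw) % 2 ** N           # accumulate the frequency control word
    return out

fcw = int(1.0e6 * 2 ** N / F_CLK)                # FCW = f_out * 2^N / f_clk = 40 -> 1 MHz LO
lo = dds(fcw, 1024)
```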
In the QED, the FIR filter bank must preserve the difference-frequency components generated by mixing the useful signal with the digital LO, while effectively suppressing both the sum-frequency components and the difference-frequency signal produced by adjacent-channel interference. Considering the narrowband characteristics of the useful signal and the potential adjacent-channel interference, the FIR filter bank is required to exhibit a steep transition band. However, at the high mixer output clock frequency of 25.6 MHz, achieving such a sharp transition necessitates a very high filter order, which would significantly increase hardware resource consumption. To address this issue, a multi-stage downsampling strategy is adopted. This approach reduces the clock frequency of the subsequent stages, thereby lowering dynamic power consumption and simplifying the implementation of the following filters.
Figure 8 shows the block diagram of the FIR filter bank structure used in this design. The filter bank adopts a three-stage cascaded configuration consisting of a cascaded integrator-comb (CIC) filter, a half-band (HB) filter, and a FIR filter. The main clock (25.6 MHz) is first divided by 16 to generate a 1.6 MHz clock, and then further divided by 2 to obtain an 800 kHz clock. The CIC filter performs 16× downsampling in the 25.6 MHz clock domain, the HB filter performs 2× downsampling in the 1.6 MHz clock domain, and the final FIR filter operates in the 800 kHz clock domain. To ensure reliable data transfer between stages, the output registers of the first two downsampling stages are synchronized to the clock domain of the subsequent stage.
Figure 9 shows the block diagram of the three-stage cascaded CIC filter used in this design. In this configuration, the three integrators operate in the 25.6 MHz clock domain, while the three differentiators operate in the 1.6 MHz clock domain. A 16× downsampling is performed between the integrators and the differentiators.
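A behavioral Python model of this arrangement (a sketch only; unbounded Python integers stand in for the wrapping fixed-width accumulators used in hardware) shows the integrate, decimate, then comb ordering.

```python
import numpy as np

def cic_decimate(x, stages=3, decimation=16):
    """Three-stage CIC decimator: cascaded integrators at the high rate,
    16x downsampling, then cascaded combs (differentiators) at the low rate."""
    y = np.asarray(x, dtype=np.int64)
    for _ in range(stages):                # integrators in the 25.6 MHz clock domain
        y = np.cumsum(y)
    y = y[::decimation]                    # 16x downsampling between integrators and combs
    for _ in range(stages):                # differentiators in the 1.6 MHz clock domain
        y = np.diff(y, prepend=0)
    return y

# Example: decimating a 25.6 MS/s mixer output down to 1.6 MS/s.
x = np.random.randint(-128, 128, size=25600)
y = cic_decimate(x)                        # 1600 output samples
```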
Figure 10 shows the frequency response of the CIC filter used in this design. With three cascaded stages, the first sidelobe in the stopband is suppressed to −39.4 dB.
Figure 11 shows the block diagram of the 10th-order polyphase HB filter used in this design. The leftmost register operates in the 1.6 MHz clock domain, while the remaining circuits operate in the 800 kHz clock domain. Since downsampling is performed before filtering, the polyphase implementation saves four registers compared with the transposed structure. Owing to the half-band property, only the center tap and the coefficients at odd offsets from it are nonzero, so the transfer function of this 10th-order polyphase HB filter has the form:

$$H_{\mathrm{HB}}(z) = h_0 + h_2 z^{-2} + h_4 z^{-4} + h_5 z^{-5} + h_6 z^{-6} + h_8 z^{-8} + h_{10} z^{-10}, \quad h_0 = h_{10},\ h_2 = h_8,\ h_4 = h_6$$
The HB filter is designed in MATLAB R2020b using the filterDesigner tool with a 10th-order specification and a stopband attenuation of 40 dB. The resulting coefficients are quantized to 8 bits, with the most significant bit (MSB) representing the sign.
Figure 12 shows the unquantized time-domain impulse response of the 10th-order polyphase HB filter, along with the frequency response before and after coefficient quantization. Due to the finite word length effect of the 8-bit quantization, the stopband attenuation at a normalized frequency of 0.85 degrades to 35.7 dB, a 4.3 dB reduction compared with the floating-point coefficients. However, since the clock frequency at this stage remains much higher than the baseband bandwidth of the useful signal, this degradation has a negligible impact on the in-band SNR.
Figure 13 shows the 10th-order FIR filter used in this design, which adopts a transpose structure to minimize the critical path delay.
The FIR low-pass filter is designed in MATLAB using the equiripple method with a passband normalized frequency of 0.25, a stopband normalized frequency of 0.5, a passband ripple of 1 dB, and a stopband attenuation of 40 dB. The filter coefficients are quantized to 8 bits, with the MSB serving as the sign bit.
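For reference, an equivalent design flow in Python (a sketch using scipy.signal.remez rather than MATLAB's filterDesigner; the band edges follow the normalized-to-Nyquist specification quoted above) produces the prototype coefficients and their 8-bit signed quantization, and checks the stopband behavior before and after quantization.

```python
import numpy as np
from scipy.signal import remez, freqz

FS = 800e3                                   # third-stage FIR clock / sampling rate
# Passband edge 0.25 and stopband edge 0.5 normalized to Nyquist (400 kHz),
# i.e. a 100 kHz passband edge and a 200 kHz stopband edge.
h = remez(11, [0, 0.25 * FS / 2, 0.5 * FS / 2, FS / 2], [1, 0], fs=FS)

# Quantize to 8-bit signed coefficients (1 sign bit + 7 fractional bits).
scale = 2 ** 7
h_q = np.clip(np.round(h * scale), -128, 127) / scale

# Compare the stopband before and after quantization.
w, H = freqz(h, worN=2048, fs=FS)
_, Hq = freqz(h_q, worN=2048, fs=FS)
stop = w >= 200e3
print("min stopband attenuation (float): %.1f dB" % (-20 * np.log10(np.abs(H[stop]).max())))
print("min stopband attenuation (8-bit): %.1f dB" % (-20 * np.log10(np.abs(Hq[stop]).max())))
```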
Figure 14 shows the unquantized time-domain impulse response of the 10th-order FIR filter, as well as the frequency response before and after coefficient quantization.
The primary source of timing latency in the DBB is the group delay of the cascaded FIR filter bank. The first-stage 16× down-sampling third-order CIC filter operates at a 25.6 MHz clock frequency and introduces a group delay of 16 clock cycles (0.625 µs). The second-stage 10th-order HB filter performs 2× down-sampling at a 1.6 MHz clock frequency, resulting in a group delay of 5 clock cycles (3.125 µs). The third-stage 10th-order FIR filter operates at 800 kHz, contributing an additional group delay of 5 clock cycles (6.25 µs). The total timing delay contributed to the PBFE algorithm is therefore 256 cycles of the 25.6 MHz system clock, i.e., 10 µs, which is equivalent to one bit period.
The CIC filter has a −3 dB passband frequency of approximately 420 kHz and a −39.4 dB stopband frequency around 1.3 MHz. The quantized HB filter has a −3 dB passband frequency of about 360 kHz and a −35.7 dB stopband frequency around 540 kHz. The quantized FIR filter has a −3 dB passband frequency of around 125 kHz and a −40 dB stopband frequency of approximately 200 kHz. Due to the down-sampling in the first two filter stages, the overall FIR filter bank exhibits a sharp transition band. The frequency response of the entire FIR filter bank is mainly determined by the third-stage FIR filter, which has a −3 dB passband that covers the 100 kHz baseband bandwidth of the useful signal.
3.2. DC Offset Detector
Figure 15 shows the block diagram of the DOD, which includes a moving average filter (MAF), an integrator, comparators, and a register. The target detection signal for the DOD is the first part of the preamble, specifically the fixed '10101010' sequence. The 16-bit baseband signal output from the QED is passed through the MAF to obtain the current signal average. This average is compared with the amplitude detection threshold. When the average exceeds the threshold, the signal is considered valid and the first comparator outputs a logic '1'. The output of the first comparator is then integrated. When the output of the integrator reaches the detection count threshold, the signal is considered stable and the second comparator asserts the correlation enable signal, which transitions to a high level. This enable signal simultaneously serves as the clock for the 16-bit register: on its rising edge, the current MAF output is stored and passed to the subsequent circuit as the DC offset component. To reduce the power consumption and resource usage of the subsequent circuit modules, all operations in this process are bit-width truncated, ensuring that accuracy is maintained without significant degradation. The output bit widths of each module are labeled in Figure 15. The DOD operates in the 800 kHz clock domain, with one bit period corresponding to 8 sample points.
The MAF is implemented as a 32-tap FIR filter with all coefficients set to 1. Its frequency response exhibits the characteristics of a sinc function, with the main lobe width being 1/16th of the Nyquist frequency, which corresponds to 25 kHz. The alternating pattern of ‘1’ and ‘0’ bits in the fixed ‘10101010’ sequence effectively reflects the DC component of the signal. During reception of this sequence, the MAF averages the signal over a 4-bit period time window, yielding the DC component of the demodulated signal while suppressing random noise through multi-point averaging. Although a longer time window for sliding averaging would provide a more accurate DC component, it would increase system delay and the implementation complexity of the MAF. The choice of a 32-tap filter length is the result of balancing these trade-offs. Since the input signal is 16 bits, summing 32 data points requires a 5-bit sign extension, resulting in an internal register width of 21 bits for the MAF. The top 16 bits of the final register are then selected as the output.
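To make the averaging step concrete, the following sketch (hypothetical values; 8 samples per bit at the 800 kS/s QED output rate, as stated above) shows how a 32-tap moving average over the '10101010' preamble segment settles to the DC component of the demodulated signal.

```python
import numpy as np

SAMPLES_PER_BIT = 8                            # 800 kS/s QED output, 100 kbps data
preamble_bits = [1, 0] * 8                     # fixed '10101010...' segment
env = np.repeat(preamble_bits, SAMPLES_PER_BIT).astype(float)   # idealized QED envelope
env += 0.05 * np.random.randn(env.size)        # additive noise

# 32-tap moving average (all coefficients = 1), normalized here for readability;
# in hardware the sum is kept as a 21-bit word and the top 16 bits form the output.
window = 32                                    # 4 bit periods
maf = np.convolve(env, np.ones(window) / window, mode="valid")

dc_estimate = maf[-1]                          # latched into the register when the DOD fires
print("estimated DC component:", dc_estimate)  # ~0.5 for an ideal alternating envelope
```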
The amplitude detection threshold determines whether the input signal is valid. When the signal amplitude is below this threshold, it is assumed that only noise has been received, and the detection process continues. The threshold is derived from the receiver parameters: the voltage amplitude corresponding to the receiver sensitivity, the receiver link gain, and the ADC full-scale range, together with a scaling factor used to adjust the threshold, which is set to 0.1 in this design. A factor of 1/2 accounts for the alternating high and low levels in the fixed sequence, and an additional scaling factor ensures that the threshold matches the output word length of the MAF.
3.3. Correlation Value Generator
Figure 16 shows the block diagram of the CVG, which includes an adder, a matching filter, a comparator, and a maximum value iteration module. The target detection signal for the CVG is the second part of the preamble, specifically the predefined 31-bit PRBS sequence. After the DOD asserts its enable signal, the output of the QED is fed into the CVG. The DC offset component estimated by the DOD is subtracted from the input signal, yielding a demodulated signal containing only the AC information. This signal is then passed through a matching filter, which uses the PRBS sequence as a reference, to generate the real-time correlation value. Finally, the maximum correlation value is obtained through a delay and maximum value selection operation.
The CVG operates in the 800 kHz clock domain, where one bit period corresponds to 8 sampling points. Consequently, the matching filter has a length of 248 taps, with the filter coefficients arranged in reverse order relative to the PRBS sequence. In the PRBS sequence, a bit '1' corresponds to a filter coefficient of +1, while a bit '0' corresponds to −1, represented as 2-bit binary numbers. The matching filter performs only addition operations. To prevent precision loss when summing 248 signed 16-bit numbers, 8-bit sign extension is applied. Therefore, both the internal registers and the output signal of the filter are set to a width of 24 bits.
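The correlation step can be illustrated with a short Python sketch (illustrative; a random ±1 sequence stands in for the receiver's predefined 31-bit PRBS). The reference sequence is expanded to 8 samples per bit, time-reversed as filter coefficients, and the running correlation peaks when the preamble is aligned.

```python
import numpy as np

SAMPLES_PER_BIT = 8
rng = np.random.default_rng(0)
prbs = rng.integers(0, 2, 31)                       # stand-in for the predefined 31-bit PRBS
ref = np.repeat(2 * prbs - 1, SAMPLES_PER_BIT)      # bit '1' -> +1, bit '0' -> -1, 248 taps

# Received baseband after DC removal: the PRBS segment embedded in noise.
signal = np.concatenate([np.zeros(100), np.repeat(prbs, SAMPLES_PER_BIT), np.zeros(100)])
signal = signal - signal.mean() + 0.2 * rng.standard_normal(signal.size)

# Matching filter = convolution with the time-reversed reference (adds/subtracts only).
corr = np.convolve(signal, ref[::-1], mode="valid")

r_max = corr.max()                                  # maximum correlation value
peak_index = int(corr.argmax())                     # alignment of the preamble
print(r_max, peak_index)
```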
When high-power OOK-modulated interference is present, the demodulated output of the QED may still trigger the DOD to assert a valid-signal flag, causing the CVG to produce real-time correlation values; this could result in incorrect frequency estimation and erroneous enabling of the single-channel data reception mode. To mitigate this issue, the design exploits the fact that the interference and the desired signal use different PRBS sequences. As a result, the maximum correlation value obtained from an interferer is significantly smaller than that from the desired signal. In this design, a comparator is employed to determine whether a real-time correlation value is valid. The comparator threshold is proportional to the number of taps G in the matching filter, scaled by a coefficient that is set to 0.2 in this design. The matching filter output is considered valid only when the real-time correlation value exceeds this threshold, at which point the comparator outputs a high-level valid flag. When the logical OR of the valid flags from all sub-channels equals 1, it indicates that at least one sub-channel has produced a valid correlation value. Consequently, the CVG asserts a high-level signal that enables the frequency estimator.
3.4. Frequency Estimator
Figure 17 shows the block diagram of the FE, which includes a set of multiplexers, a quadratic interpolation module, and two register arrays: one storing the maximum correlation values of all sub-channels generated by the matching filters, and the other storing the corresponding valid correlation flags. In this design, the assumed initial frequency offset range corresponds to 11 sub-channels, so the sizes of the two arrays are 24 × 11 and 1 × 11, respectively. Each multiplexer, controlled by the corresponding valid flag, outputs either the maximum correlation value of its sub-channel or zero. The 11 outputs are then compared to identify the largest value, whose address n represents the integer part of the frequency estimate. The values stored at addresses n − 1, n, and n + 1 are used for quadratic interpolation to obtain the fractional part a, improving the frequency estimation accuracy.
According to the design of the quadrature digital LO module, the frequency estimation accuracy must be at least as fine as the 25 kHz DDS frequency resolution. Since the interval between adjacent sub-channel center frequencies is 100 kHz, the resolution of the fractional part a must be smaller than 0.25. In this design, the fractional part a is represented with 4 bits: the MSB is the sign bit, and the remaining 3 bits represent the magnitude. Therefore, the resolution of a is 0.125, and the representable range is from −0.875 to +0.875.
Quadratic interpolation is an effective method for data fitting. Assuming that the maximum correlation values at address n and its two neighboring addresses follow a parabolic function, the offset a relative to the vertex of the parabola can be expressed as:

$$a = \frac{R_{n-1} - R_{n+1}}{2\,(R_{n-1} - 2R_{n} + R_{n+1})}$$

where R_{n−1}, R_n, and R_{n+1} represent the maximum correlation values at addresses n − 1, n, and n + 1, respectively. The estimated frequency f_est can then be expressed as:

$$f_{\mathrm{est}} = f_{\mathrm{low}} + (n + a) \times 100\ \mathrm{kHz}$$
where f_low represents the lower bound of the IF frequency range, which is 500 kHz in this design, and n is the zero-based sub-channel index. During the single-channel data reception stage, the digital LO frequency control word is obtained by rounding the estimated frequency f_est to the DDS frequency resolution. Once the frequency estimation is completed, the FE outputs a high-level enable signal, allowing the receiver to enter the single-channel data reception mode.
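A behavioral sketch of the FE in Python (hypothetical correlation values; array indices are assumed to be zero-based, so sub-channel 0 corresponds to the 500 kHz lower bound) combines the gating, peak search, quadratic interpolation, and FCW rounding described above.

```python
import numpy as np

F_LOW = 500e3          # lower bound of the IF frequency range
F_STEP = 100e3         # sub-channel center frequency spacing
F_RES = 25e3           # DDS frequency resolution
NUM_CH = 11

def estimate_frequency(r_max, valid):
    """r_max: per-sub-channel maximum correlation values (length 11)
    valid: per-sub-channel valid-correlation flags (length 11)"""
    gated = np.where(valid, r_max, 0.0)          # multiplexers: pass R_max or zero
    n = int(np.argmax(gated))                    # integer part of the estimate
    # Three-point quadratic (parabolic) interpolation for the fractional part.
    if 0 < n < NUM_CH - 1:
        rm, r0, rp = gated[n - 1], gated[n], gated[n + 1]
        a = (rm - rp) / (2.0 * (rm - 2.0 * r0 + rp))
    else:
        a = 0.0
    a = float(np.clip(np.round(a * 8) / 8, -0.875, 0.875))  # 4-bit signed fraction, step 0.125
    f_est = F_LOW + (n + a) * F_STEP
    f_lo = round(f_est / F_RES) * F_RES          # LO frequency after rounding to 25 kHz
    return f_est, f_lo

# Example: a correlation peak lying between sub-channels 5 (1.0 MHz) and 6 (1.1 MHz).
r = np.array([0, 0, 0, 0, 0.2, 0.9, 0.7, 0.1, 0, 0, 0])
v = r > 0.15
print(estimate_frequency(r, v))
```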
3.5. Finite State Machine
Figure 18 shows the state transition diagram of the FSM, which controls the transition of the receiver from the multi-channel frequency estimation mode to the single-channel data reception mode. In the diagram, each arrow is labeled with the transition condition x and the corresponding output y for the next state; if the condition is not met, the FSM remains in the current state.
The FSM consists of five states. The initial state serves as the starting point for receiver operation; when the asynchronous reset signal is low, all internal registers are cleared. On the rising edge of the receiver enable signal, the FSM transitions to the DC offset detection state and asserts the corresponding mode selection control word. When the DOD asserts its enable signal, the FSM moves to the correlation value computation state and asserts the next control word. Correspondingly, when the CVG and FE assert their enable signals, the FSM transitions to the multi-channel frequency estimation state and the single-channel data reception state, asserting the respective control words. On the falling edge of the receiver enable signal, the FSM returns to the initial state, clears its outputs, and waits for the next receiver enable signal.
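A minimal software analogue of this control flow (hypothetical state and signal names; the actual control-word encoding is defined by the RTL) captures the enable-chained progression from reset to data reception.

```python
from enum import Enum, auto

class State(Enum):
    INIT = auto()           # reset / idle
    DC_DETECT = auto()      # DC offset detection (DOD active)
    CORRELATE = auto()      # correlation value computation (CVG active)
    FREQ_EST = auto()       # multi-channel frequency estimation (FE active)
    DATA_RX = auto()        # single-channel data reception

def next_state(state, rx_en, dod_done, cvg_done, fe_done):
    """Advance only when the enable signal of the previous stage is asserted;
    a de-asserted rx_en returns the FSM to INIT from any state."""
    if not rx_en:
        return State.INIT
    if state is State.INIT:
        return State.DC_DETECT
    if state is State.DC_DETECT and dod_done:
        return State.CORRELATE
    if state is State.CORRELATE and cvg_done:
        return State.FREQ_EST
    if state is State.FREQ_EST and fe_done:
        return State.DATA_RX
    return state            # condition not met: hold the current state

# Example walk-through of one reception cycle.
s = State.INIT
for flags in [(1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (1, 1, 1, 1), (0, 0, 0, 0)]:
    s = next_state(s, *flags)
    print(s)
```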
4. Measurement Results
Figure 19 shows the system block diagram of the front-end chip of the receiver [36], which mainly consists of the radio frequency front end (RFFE), the ABB, and the LO. The RF input signal is first down-converted by a passive mixer based on a Gm-C unit, then amplified by a TIA before entering the ABB. In this work, the ideal IF frequency of the receiver is set to 1 MHz. In joint testing with the DBB, the center frequency of the equivalent BPF is adjusted to 1 MHz by fixing the transconductance value and tuning the capacitor values.
The analog baseband circuit comprises a first-stage programmable gain amplifier (PGA1), a BPF, and a second-stage programmable gain amplifier (PGA2). The BPF adopts a dual-second-order Gm-C architecture, with a passband center frequency at the ideal IF of 1 MHz and a −3 dB bandwidth covering the possible IF range from 500 kHz to 1.5 MHz. Each of the two PGAs provides an 18 dB gain tuning range to enhance the dynamic range of the receiver. The single-channel IF signal output from the ABB is sampled and quantized by an ADC before being processed by the DBB.
The LO circuit consists of an oscillator, a frequency divider, and a MUX. The oscillator is a two-stage differential ring oscillator operating in an open-loop configuration, with its frequency tunable by adjusting the injected bias current. The frequency divider converts the differential input clock into a four-phase output clock to control the mixer switches. The clock source can be selected between the internal oscillator output and an external differential clock via the MUX enable signal.
With a supply voltage of 1.2 V, the total simulated power consumption of the RF/analog front-end circuit is 687 µW.
To facilitate correlated testing and verification with the digital baseband, a printed circuit board (PCB) was fabricated. The chip was bonded onto the PCB using chip-on-board (COB) packaging. The PCB employs a daughter–mother board design, with partitioned layouts for different types of grounds within the chip. The daughter board mainly hosts the chip pads and high-frequency signal interfaces, while the mother board contains various power and bias circuits. To minimize noise coupling, the pads corresponding to different ground types on the chip are connected to their respective PCB pads during bonding. These separate grounds are then interconnected at the far end of the PCB through 0-ohm resistors.
The digital baseband of the receiver was implemented on a Xilinx Artix-7 series FPGA, specifically the XC7A35T device, which features a standard package with 484 pins and a speed grade of −2. The FPGA offers 20,800 lookup tables (LUTs), 41,600 flip-flops (FF), 90 DSP slices, and 250 available I/O pins.
Due to the limited resources available on the FPGA, only a single channel of the DBB was implemented on the FPGA board during the actual test. The hardware resource utilization of the implemented sub-channel is summarized in Table 1. Among all modules, the CVG consumed the largest number of LUTs and registers, primarily due to the inclusion of a matching filter, whose order is proportional to the length of the PRBS sequence. The DSP consumption was entirely attributed to the multipliers in the QED module, which are utilized in the HB and FIR filters for signed multi-bit multiplications. For the FPGA implementation, the power consumption of a single sub-channel is 204 mW, simulated at a 200 MHz crystal oscillator clock frequency.
Figure 20 shows the measurement setup of the receiver. For power supply, an Agilent E3631A triple-output DC power supply provides a 5 V input to the PCB test board. Several ADI LT3042 low-dropout linear regulators (LDOs) are integrated on the PCB to generate the necessary supply voltages, bias currents, and bias voltages for the different on-chip functional modules, ensuring stable operation of the chip under various test conditions.
For signal input, an Agilent E4438C vector signal generator (VSG) generates a 900 MHz RF signal that is fed into the chip. A KEYSIGHT 33600A arbitrary waveform generator (AWG) produces the original baseband bit sequence, which is split into two paths using a power splitter. One path is used as the modulation source for the 900 MHz VSG, while the other path is connected to a ROHDE & SCHWARZ RTO2044 16-bit oscilloscope for waveform observation and data acquisition.
Due to parasitic effects introduced by surrounding dummy metals, the on-chip ring oscillator is unable to reach the nominal 1.8 GHz frequency. To address this issue, an Agilent E8267D VSG is employed to generate an external 1.8 GHz LO signal, which is then converted into a differential clock through a balun and fed into the chip.
The differential IF output signals (Vop and Von) from the I-path are first amplified and converted into single-ended form by an ADI AD8421 PGA. The amplified signal is then sampled and quantized by an ADI AD9238 ADC. The lower 8 bits of the ADC output are sent to the FPGA for frequency estimation and data demodulation. The ADC operates at a sampling clock frequency of 25.6 MHz, which is generated by the FPGA. The outputs of the PGA and the FPGA are connected to the oscilloscope and are used for IF signal SNR measurement and DBB verification, respectively.
A PC communicates with the FPGA through a JTAG interface. The Vivado design suite, along with the built-in integrated logic analyzer (ILA) IP core in the FPGA, is used for DBB configuration and debugging during the test process.
A 50 kbps pseudorandom sequence was Manchester-encoded to generate encoded data with a symbol rate of 100 kS/s, which served as the original data used for testing. The encoded data produced by the AWG were modulated onto the RF input signal, with the RF carrier set to 900 MHz plus the desired IF offset, while the external LO frequency was fixed at 1.8 GHz. The OOK-modulated IF signal, amplified by the PGA, was captured by an oscilloscope and processed using a 51,200-point FFT. The oscilloscope sampling rate was 25.6 MS/s, identical to that of the ADC. Within the Nyquist bandwidth, the main-lobe region (the IF frequency ± 100 kHz) was defined as the signal band, whereas the out-of-band region was considered noise. Based on this definition, the SNR of the IF signal was calculated. By varying the RF input power, OOK-modulated IF signals with different SNR levels were generated for testing.
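The SNR definition used in the measurements can be reproduced with a short post-processing sketch (illustrative; the variable names and the synthetic test record are hypothetical), which sums the FFT power inside the IF ± 100 kHz main lobe as signal and treats the rest of the Nyquist band as noise.

```python
import numpy as np

FS = 25.6e6
N_FFT = 51200
F_IF = 1.03e6

def if_snr(x, f_if=F_IF, half_bw=100e3, fs=FS):
    """Estimate the IF SNR from a captured record: main-lobe power vs. out-of-band power."""
    spec = np.fft.rfft(x[:N_FFT] * np.hanning(N_FFT))
    freqs = np.fft.rfftfreq(N_FFT, d=1 / fs)
    power = np.abs(spec) ** 2
    in_band = (freqs >= f_if - half_bw) & (freqs <= f_if + half_bw)
    return 10 * np.log10(power[in_band].sum() / power[~in_band].sum())

# Synthetic check: an OOK-modulated tone at 1.03 MHz plus white noise.
t = np.arange(N_FFT) / FS
bits = np.repeat(np.random.randint(0, 2, N_FFT // 256), 256)
x = bits * np.cos(2 * np.pi * F_IF * t) + 0.05 * np.random.randn(N_FFT)
print("IF SNR = %.1f dB" % if_snr(x))
```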
Figure 21 shows the power spectral density (PSD) of the IF signal at an SNR of 10 dB and an IF frequency of 1.03 MHz. As can be observed from the figure, the useful signal occupies the frequency range from 0.93 MHz to 1.13 MHz. Due to the filtering effect of the RFFE and ABB circuits, the noise spectrum exhibits a band-pass characteristic over the entire Nyquist bandwidth.
The preamble sequence was OOK-modulated onto the RF input signal, while the external LO frequency was maintained at 1.8 GHz. The differential IF signal from the I-path of the ABB was amplified and converted to a single-ended signal by the PGA, followed by sampling and quantization by the ADC. The digitized data were then sent to the FPGA for testing. The maximum correlation values of each sub-channel were captured using the ILA tool in Vivado. The center frequency vector of the DBB sub-channels was configured as [500 kHz:100 kHz:1.5 MHz]. Due to limited FPGA resources, only one sub-channel was implemented at a time. With the analog IF signal kept constant, the correlation value tests were performed by varying the center frequency of the sub-channel implemented on the FPGA, repeated 11 times in total. Finally, the correlation outputs from all sub-channels were post-processed in MATLAB using three-point quadratic interpolation to obtain the estimated IF frequency. By varying the RF input frequency, the frequency estimation errors were evaluated for different IF frequencies.
Figure 22 illustrates the frequency estimation errors at different IF frequencies under an SNR of 10 dB. To comprehensively cover various practical IF scenarios, the tested IF frequency vector was set as [510 kHz:110 kHz:1.5 MHz], corresponding to frequency offsets from adjacent sub-channel center frequencies in the range of [10 kHz:10 kHz:50 kHz]. It can be observed that within the IF frequency range of 500 kHz to 1.5 MHz, the frequency estimation errors of the PBFE algorithm remain within ±25 kHz, corresponding to a relative frequency estimation accuracy of ±27 ppm. The estimation errors become relatively larger for IF frequencies at 730 kHz and 1.17 MHz. Both cases correspond to an actual IF frequency offset of 30 kHz from the nearest sub-channel center frequency, which can be regarded as the worst-case condition for frequency estimation performance.
To evaluate the minimum SNR required for the PBFE algorithm to perform reliable frequency estimation under the worst-case scenario, the IF frequency was fixed with a 30 kHz offset relative to the nearest sub-channel center frequency. By varying the input RF signal power, the frequency estimation errors were measured under different SNRs, as shown in Figure 23. As illustrated in the figure, when the SNR exceeds 2 dB, the frequency estimation errors remain relatively stable and are below 22 kHz, corresponding to a relative frequency estimation accuracy of 24 ppm. In this range, frequency estimation is effective, with the remaining error primarily originating from the interpolation of the fractional part. When the SNR is between 0.7 dB and 2 dB, the frequency estimation error increases to 30 kHz, indicating that while the integer part of the estimated frequency is correct, the fractional part is zero, which signifies a failure in interpolation. For SNR below 0.7 dB, the maximum correlation values of all sub-channels are zero, rendering frequency estimation impossible. These results indicate that the PBFE algorithm requires a minimum IF SNR of 2 dB to achieve effective frequency estimation, with errors remaining below 22 kHz.
To assess the performance enhancement achieved by the PBFE algorithm, BER tests were conducted. The external LO frequency was set to 1.8 GHz, with the DBB sub-channel center frequency fixed at 1 MHz. By varying the RF input frequency, IF signals of different frequencies were generated. A 50 kbps pseudorandom sequence was Manchester-encoded to produce a symbol rate of 100 kS/s, serving as the raw test data. This encoded data was modulated onto the RF input signal using the AWG. Both the AWG output and the FPGA-decoded output were connected to an oscilloscope for measurement. To determine the SNR required to achieve a BER of 10⁻³, 10⁵ bits were collected per acquisition for BER calculation.
In the FPGA, an 8-point energy accumulation decoding scheme was implemented. The OOK-demodulated signal output from the single-channel QED had a sampling rate of 800 kS/s and was split into two paths. In the first path, the signal passed through an 8-tap matching filter with all coefficients set to 1 for energy accumulation, followed by a delay line and 8× downsampling. In the second path, the signal was processed by a 64-tap moving average filter to extract the DC component. The outputs of the two paths were then quantized by a comparator operating at 100 kHz to generate the final decoded data. To achieve optimal sampling for bit decision, the phase of the 100 kHz comparator clock was aligned by adjusting the delay in the correlation output. By varying the RF input power and averaging over multiple measurements, the BER under different SNR conditions was obtained.
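The decoding path can be summarized by a small Python sketch (behavioral only; the delay used to center the comparator clock and the exact gain alignment between the two paths are simplified): an 8-tap accumulator integrates each bit period, a 64-tap moving average tracks the decision threshold, and a comparison at the 100 kHz bit rate yields the decoded bits.

```python
import numpy as np

SAMPLES_PER_BIT = 8                        # QED output at 800 kS/s, 100 kbps data

def decode_ook(env):
    """8-point energy accumulation decoding of the demodulated OOK envelope."""
    energy = np.convolve(env, np.ones(SAMPLES_PER_BIT), mode="same")               # per-bit accumulation
    threshold = np.convolve(env, np.ones(64) / 64, mode="same") * SAMPLES_PER_BIT  # DC (threshold) path
    # Sample once per bit (8x downsampling) and compare the two paths.
    idx = np.arange(SAMPLES_PER_BIT // 2, len(env), SAMPLES_PER_BIT)               # mid-bit decision instants
    return (energy[idx] > threshold[idx]).astype(int)

# Example: noisy envelope of a random bit stream.
bits = np.random.randint(0, 2, 200)
env = np.repeat(bits, SAMPLES_PER_BIT) + 0.2 * np.random.randn(200 * SAMPLES_PER_BIT)
decoded = decode_ook(env)
print("bit errors:", int(np.sum(decoded != bits)))
```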
As shown in Figure 23, the residual frequency offset under effective PBFE operation is less than 22 kHz. To provide a safety margin, the BER tests in this section assume a residual frequency offset of 30 kHz after frequency estimation.
Figure 24 shows the measured BER under different IF frequency offsets and SNRs. Specifically, the case with a 30 kHz frequency offset corresponds to the residual offset after applying the proposed PBFE algorithm, while the case with a 50 kHz frequency offset represents the maximum error when the IF frequency range of 0.5 MHz to 1.5 MHz is simply divided into 11 sub-channels with 100 kHz spacing, without frequency estimation. For a 30 kHz frequency offset, an SNR of 5.2 dB is required to achieve a BER of 10⁻³, which is above the effective SNR threshold of the PBFE algorithm. For a 50 kHz frequency offset, the required SNR increases to 9.3 dB. These results indicate that the PBFE algorithm improves the required IF signal SNR for achieving a BER of 10⁻³ by 4.1 dB.
System-level simulation shows that the 5.2 dB SNR corresponding to a 10⁻³ BER translates to an input sensitivity of −92 dBm.
Table 2 shows the comparison with published receiver designs. For a fair comparison, the power consumption of the ADC and DBB is excluded from the calculations, as it is in the other works.
5. Discussions
In the proposed DBB design, each sub-channel uses hierarchical processing. The next stage of processing only begins when the enable signal from the previous stage is active, which helps to reduce overall system power consumption by ensuring that unnecessary components are not always active. This approach minimizes idle power consumption, particularly in multi-channel scenarios.
In the multi-channel implementation, resource consumption is directly proportional to the number of parallel sub-channels. As the number of sub-channels increases, resource consumption grows, which can limit the scalability of the DBB. However, the same sub-channel structure has the potential for hardware efficiency optimization. To reduce resource consumption, the following strategies can be used:
- (1) Advanced Initial Calibration Method: Implementing more advanced initial calibration techniques could reduce the initial frequency offset range of the free-running oscillator, thereby decreasing the number of sub-channels.
- (2) Time-multiplexed Resource-shared Architecture: In sparse communication scenarios for IoT, a retransmission mechanism can be introduced in the transmitter, and the receiver's multi-channel parallel operation can be transformed into a time-multiplexed operation for single-channel processing. This approach, by replacing full parallel processing with time-multiplexing, can significantly reduce LUT and DSP consumption while maintaining real-time performance, especially given the relatively low data rates in ambient IoT applications.
- (3) Sub-channel Structure Optimization: The polyphase HB filters and transposed FIR filters have been adopted to minimize resource consumption in the current design. Additionally, when the interference signal has a large frequency offset, the cascaded down-sampling filter can be implemented using only the CIC filter, which greatly reduces the number of signed multipliers.
- (4) Resource-friendly Multipliers: For the unavoidable multi-bit signed multiplications, such as those in the mixer between the IF digital signal and the digital local oscillator, as well as the self-mixing of the frequency-difference signal from the FIR filter bank output, a more hardware-efficient multiplier architecture, such as Booth multipliers, can be used. Booth multipliers reduce the required addition operations by encoding the input signal, thus improving hardware efficiency.
Although the current work is based on an FPGA for rapid prototyping and validation, future work will focus on implementing the design in an ASIC. An ASIC implementation could provide significant improvements in power consumption and area efficiency, especially with advanced fabrication technologies. Future hardware validation on ASIC will fully assess the actual performance and resource utilization.
It is worth noting that quantization in the FPGA implementation may lead to potential mismatches between simulation results and hardware results. In this work, the most prominent quantization effects arise from the finite word length of the FIR filter coefficients and from the bit-width truncation applied at each operation. With the DDS resolution and the interpolation-based estimation method used in this design, the frequency estimation accuracy is not significantly affected by quantization. However, if the goal is to achieve higher frequency estimation accuracy, larger bit widths can be used to ensure that quantization does not limit the frequency estimation performance, albeit at the cost of increased resource consumption.
The limitations and potential modifications required for the proposed receiver under real-world interference scenarios warrant further discussion. To mitigate adjacent channel interference, each communication channel is assigned a unique PRBS, determined by different LFSR structures and initial register states. This ensures that the cross-correlation coefficient between PRBS sequences of adjacent channels is minimized. To further improve the receiver’s ability to suppress adjacent channel interference, longer PRBS sequences can be used. However, this approach increases frequency estimation timing delay, which may reduce communication efficiency.
OOK modulation relies on amplitude variations to convey information, and accurate amplitude decision thresholds are essential for OOK signal decoding. In slow fading conditions, the DOD module within the DBB can adaptively track the DC value of the OOK demodulated signal. This DC value is then used as the amplitude decision threshold for subsequent decoding, ensuring reliable performance. In fast fading conditions, multipath fading can significantly impact the received signal’s amplitude within a single bit time, leading to inaccurate decoding. To address this, several strategies can be employed, such as spatial diversity, time diversity, and frequency diversity. However, these methods may increase the complexity of the transceiver implementation and node cost.