1. Introduction
The rapid growth of artificial intelligence (AI)-driven systems, robotics, and autonomous devices is fueling an unprecedented demand for ubiquitous, low-power wireless connectivity. From smart homes and wearable health monitors to industrial automation and swarm robotics, the Internet of Things (IoT) forms the backbone of these next-generation technologies [1]. Central to this ecosystem is the need for energy-efficient radios that can reliably transmit and receive data across diverse environments while consuming ultra-low power. IoT radios are increasingly expected to operate under strict power constraints, often relying on harvested or battery-limited energy sources, while still delivering the high sensitivity needed to detect weak signals and remaining interoperable with a broad range of wireless protocols [2,3].
As billions of connected devices compete for limited spectrum, such as the 915 MHz and 2.4/5 GHz ISM bands, the wireless landscape has become increasingly congested. This spectral crowding substantially increases the likelihood of interference from overlapping transmissions, legacy systems, or co-located networks. Consequently, ensuring robust and energy-efficient communication in such dense environments has emerged as a critical challenge for future IoT systems [4]. Narrowband receiver architectures improve selectivity but remain vulnerable to in-band interference that spectrally overlaps with the desired signal [5,6,7]. Spread-spectrum techniques, such as code-division and chirp spread spectrum (CSS), enhance robustness by distributing energy over wide bandwidths, thereby enabling interference tolerance and scalable multi-user access [8,9,10]. Among these techniques, the LoRa protocol, which encodes data symbols using linearly swept frequency chirps, offers strong interference resilience and high energy efficiency for low-power wide-area networks [9,10]. However, conventional CSS-based receivers, including LoRa receivers, rely on FFT-based synchronization with long preambles, which increases both latency and power consumption.
The LoRa preamble structure is illustrated at the top of Figure 1. To compensate for center-frequency offset and symbol boundary offset (SBO), the LoRa protocol employs a multi-symbol preamble sequence that supports both coarse and fine synchronization stages, and these stages require multiple consecutive chirp symbols. Software-defined LoRa receivers perform synchronization in the spectral domain, using FFT engines and long symbol durations to estimate boundary offsets with fine resolution [9]. However, this approach introduces significant complexity and latency. Because the frequency resolution of an FFT is inversely proportional to the duration of the recorded signal, finer offset detection and compensation require longer observation windows, which in turn increases latency.
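The inverse relationship between resolution and record length follows from the FFT bin spacing, Δf = f_s/N = 1/T_obs. The Python sketch below reuses the 2 MS/s sampling rate and 100 µs symbol period of Section 2 purely as illustrative values (actual LoRa configurations differ), and the function name is hypothetical. Each doubling of the recorded duration halves the bin spacing, and hence the achievable offset resolution, at the cost of doubled latency.

```python
# Minimal sketch: FFT bin spacing equals fs / N = 1 / T_obs, so finer
# frequency resolution requires a proportionally longer observation window.


def fft_bin_spacing_hz(sample_rate_hz: float, observation_s: float) -> float:
    """Frequency spacing between adjacent FFT bins for a given record length."""
    n_samples = round(sample_rate_hz * observation_s)  # samples in the record
    return sample_rate_hz / n_samples


if __name__ == "__main__":
    fs = 2e6  # illustrative sampling rate (taken from Section 2)
    for symbols in (1, 2, 4, 8):
        t_obs = symbols * 100e-6  # observation time in 100 us symbols
        print(f"{symbols} symbol(s): {fft_bin_spacing_hz(fs, t_obs):.0f} Hz per bin")
```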
To reduce computational overhead, prior work introduced a dual-chirp architecture that uses self-mixing and a delayed CSS-modulated LO profile to eliminate the need for FFT processing, as illustrated in the middle portion of Figure 1 [4]. This design reduces system complexity and the active power consumed by the digital demodulator. However, it relies on a long preamble sequence (e.g., 24 symbols for <5% SBO) to perform maximum-likelihood symbol detection for accurate offset estimation and preamble boundary detection. With shorter preambles, large timing errors may occur; the design therefore trades synchronization resolution against latency, which hinders low-latency communication.
This work proposes a low-latency digital demodulator that enables precise symbol-boundary synchronization for CSS-modulated RF packets without requiring FFT processing or long preambles. As shown in the bottom portion of Figure 1, the proposed system achieves synchronization using only 2.5 to 17.5 preamble symbols (0.25–1.75 ms) while maintaining a frequency error below 2%, even under strong in-band interference. By reducing synchronization latency without sacrificing accuracy, robustness, or energy efficiency, the proposed structure is well suited for next-generation IoT deployments in spectrally congested environments.
This paper presents a comprehensive design and analysis of the proposed CSS-modulated RF packet synchronization method and its implementation. Section 2 introduces the concept of amplitude-domain CSS synchronization using oversampled sub-chirp windows, which eliminates the need for an FFT engine. Section 3 details the overall digital system architecture and the co-designed demodulator/LO architecture for CSS synchronization. Section 4 presents the simulation results, Section 5 concludes the paper, and Section 6 outlines future work.
2. Amplitude-Domain CSS Synchronization with Oversampled Sub-Chirp Windows
A receiver employing a CSS-modulated LO is highly tolerant of in-band interference and is therefore well suited for operation in spectrally congested environments. As shown in Figure 2, when a chirp-modulated LO is used, the receiver experiences minimal spectral overlap with interfering signals because their frequencies coincide only over a short time interval. Consequently, with a narrow detection bandwidth, in-band interference within a given symbol period appears only momentarily [4,11,12].
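The dwell-time argument above can be checked numerically. The Python sketch below assumes an idealized linear up-chirp LO spanning the 4 MHz chirp bandwidth over one 100 µs symbol, a single continuous-wave interferer, and an ideal ±100 kHz detection band; these simplifications and the function names are ours, not the paper's. For an interferer at the center of the sweep, the overlap lasts about 200 kHz / 4 MHz × 100 µs = 5 µs, i.e., 5% of the symbol.

```python
# Minimal sketch (assumed, idealized model): how long a fixed-frequency
# in-band interferer falls inside the detection band when the LO sweeps
# a linear chirp. Parameter values mirror those stated in Section 2.

SYMBOL_PERIOD_S = 100e-6      # chirp symbol period (100 us)
CHIRP_BW_HZ = 4e6             # LO sweep bandwidth (4 MHz)
DETECT_HALF_BW_HZ = 100e3     # baseband detection bandwidth is +/-100 kHz


def lo_frequency(t: float) -> float:
    """Instantaneous LO offset (Hz) of an up-chirp spanning -BW/2..+BW/2."""
    return -CHIRP_BW_HZ / 2 + CHIRP_BW_HZ * (t / SYMBOL_PERIOD_S)


def interference_dwell_time(f_int_hz: float, steps: int = 100_000) -> float:
    """Time (s) within one symbol during which the interferer, after mixing
    with the chirp LO, lands inside the +/-100 kHz detection band."""
    dt = SYMBOL_PERIOD_S / steps
    inside = 0
    for k in range(steps):
        t = (k + 0.5) * dt  # evaluate at the centre of each time step
        if abs(f_int_hz - lo_frequency(t)) <= DETECT_HALF_BW_HZ:
            inside += 1
    return inside * dt


if __name__ == "__main__":
    dwell = interference_dwell_time(f_int_hz=0.0)  # tone at the chirp centre
    print(f"dwell time: {dwell * 1e6:.2f} us "
          f"({100 * dwell / SYMBOL_PERIOD_S:.1f}% of the symbol)")
```

An interferer near the band edge (e.g., 1.95 MHz) overlaps for even less time, because the sweep ends before the LO passes through the full detection window.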
A receiver employing a CSS-modulated LO performs synchronization with an incoming RF CSS packet as shown in Figure 3. In synchronization mode, the LO generates a sequence of oversampled sub-chirp windows within each RF preamble symbol period. Each sub-chirp of a Lead or Lag symbol has a chirp ramp rate twice that of the incoming CSS-modulated RF packet. Because spectral overlap between a chirp-modulated LO and another signal occurs only momentarily during the sweep [4,12], such brief overlaps appear as instantaneous amplitude peaks in the time domain, owing to the frequency-modulated continuous-wave (FMCW) nature of chirps. Given a predefined chirp bandwidth and LO frequency profile, the oversampled sub-chirp windows therefore enable estimation of the instantaneous frequency of the incoming signal and, in turn, of the SBO of the desired CSS-modulated RF packet.
In this work, a CSS-modulated RF symbol period of 100 µs is selected, while the receiver operates at a sampling rate of 2 MS/s, resulting in 200 samples per symbol. The chirp bandwidth is set to 4 MHz, and the incoming RF CSS packet has a chirp ramp rate of 20 kHz/sample. For each Lead/Lag symbol, four sub-chirps are generated, each occupying 50 samples.
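The parameter choices above are mutually consistent, as the short Python check below shows. The first five quantities follow directly from the stated values. The last line is our own reading of the sub-chirp geometry, not a statement from the text: because the sub-chirp ramps at twice the RF rate, the LO-to-RF frequency difference sweeps 1 MHz per 50-sample window, which corresponds to 50 samples of SBO. With each SBO captured twice per symbol, 100 distinct indices then cover 50% of the symbol, matching the Lead/Lag ranges.

```python
# Sketch: derive the Section 2 parameters from the stated values.
SYMBOL_PERIOD_S = 100e-6       # CSS symbol period
SAMPLE_RATE_HZ = 2e6           # receiver sampling rate
CHIRP_BW_HZ = 4e6              # chirp bandwidth
SUB_CHIRPS_PER_SYMBOL = 4      # sub-chirps per Lead/Lag symbol

samples_per_symbol = round(SYMBOL_PERIOD_S * SAMPLE_RATE_HZ)          # 200
rf_ramp = CHIRP_BW_HZ / samples_per_symbol                             # 20 kHz/sample
samples_per_sub_chirp = samples_per_symbol // SUB_CHIRPS_PER_SYMBOL    # 50
sub_ramp_ideal = 2 * rf_ramp                                           # 40 kHz/sample
sub_chirp_sweep = sub_ramp_ideal * samples_per_sub_chirp               # 2 MHz

# Our reading (assumption): the LO-minus-RF frequency difference moves at
# (sub_ramp_ideal - rf_ramp) per sample, and one sample of SBO shifts the RF
# chirp by rf_ramp, so each 50-sample window spans 50 samples of SBO.
sbo_samples_per_window = (sub_ramp_ideal - rf_ramp) * samples_per_sub_chirp / rf_ramp

if __name__ == "__main__":
    print(samples_per_symbol, rf_ramp, samples_per_sub_chirp,
          sub_ramp_ideal, sub_chirp_sweep, sbo_samples_per_window)
```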
As illustrated in Figure 3, the Lead symbol employs a sub-chirp configuration designed to detect leading SBOs in the range from −50% to 0%, whereas the Lag symbol uses a different sub-chirp configuration to detect SBOs in the range from 0% to +50%. The SBO of the incoming RF CSS symbol therefore falls within the detection range of one of the two consecutive LO symbols. When the incoming RF CSS symbol aligns with the matching LO symbol, spectral overlap occurs momentarily twice within that symbol.
The quantized double-sideband baseband data, down-converted and captured using oversampled sub-chirp windows in synchronization mode, are first stored in a 14-bit-wide register and then transferred to a 17-bit-wide memory array consisting of 200 cells. Once data from two symbol periods have been sampled, stored, and indexed, each profile index corresponds to a quantized estimate of the SBO, as depicted in the bottom portion of Figure 3. In this combined profile, indices 0–99 represent leading SBOs, whereas indices 100–199 represent lagging SBOs.
A synchronization symbol consists of 200 discrete samples, each represented by a 6-bit I/Q pair. As shown at the bottom of Figure 3, a pair of 6-bit SAR ADCs with an effective number of bits (ENOB) of 5.7 bits quantizes the demodulated analog signal down-converted by the oversampled sub-chirp windows. The analog signal bandwidth is ±100 kHz.
To compute the amplitude profile, each 6-bit I/Q sample pair is converted into a 13-bit value representing the received signal energy, calculated as $E = I^2 + Q^2$, where $I$ and $Q$ are the in-phase and quadrature ADC outputs. Since the raw sample indices (0–99) and (100–199) within a single synchronization symbol correspond to the same SBO, these two groups of 100 samples are accumulated and stored in a memory array, resulting in 100 memory cells per symbol (Lead or Lag symbol). The index-sorted amplitude signal obtained from the Lead symbol is derived from the bottom-right portion of Figure 3 and is expressed as
Here, denotes the ADC sample index. represents the amplitude peak of the index-sorted signal, and corresponds to the received signal energy during a synchronization symbol period.
Likewise, the samples from the complementary synchronization symbol are processed and stored in a similar 100-cell memory structure. The index-sorted amplitude signal obtained from the Lag symbol is similarly expressed as
Upon completion of both synchronization patterns, the digital baseband performs maximum likelihood estimation (MLE) to determine the optimal SBO.
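To make the data path of this section concrete, the following Python sketch folds, accumulates, and concatenates the energy samples into a 200-point SBO profile. It assumes that raw sample n and raw sample n + 100 of a synchronization symbol share the same SBO, that energy is I² + Q², and that profile indices map linearly to SBO (index 0 → −50%, index 100 → 0%); all function names are hypothetical. The final argmax is only a stand-in for the MLE step, without the filtering and fractional correction described in Section 3.

```python
# Hypothetical sketch of the amplitude-domain data path (names are ours).
from typing import List, Sequence, Tuple

IQ = Tuple[int, int]
SAMPLES_PER_SYMBOL = 200   # 100 us at 2 MS/s
BINS_PER_SYMBOL = 100      # memory cells per Lead or Lag symbol


def energy(sample: IQ) -> int:
    """Received-signal energy of one I/Q sample, I^2 + Q^2."""
    i, q = sample
    return i * i + q * q


def fold_symbol(symbol: Sequence[IQ]) -> List[int]:
    """Add raw sample n to raw sample n + 100 (assumed to share one SBO)."""
    if len(symbol) != SAMPLES_PER_SYMBOL:
        raise ValueError("expected 200 samples per synchronization symbol")
    return [energy(symbol[n]) + energy(symbol[n + BINS_PER_SYMBOL])
            for n in range(BINS_PER_SYMBOL)]


def accumulate(symbols: Sequence[Sequence[IQ]]) -> List[int]:
    """Element-wise sum of folded energies over repeated symbols."""
    acc = [0] * BINS_PER_SYMBOL
    for symbol in symbols:
        for n, e in enumerate(fold_symbol(symbol)):
            acc[n] += e
    return acc


def sbo_profile(lead: Sequence[Sequence[IQ]],
                lag: Sequence[Sequence[IQ]]) -> List[int]:
    """Profile indices 0-99 come from Lead symbols, 100-199 from Lag symbols."""
    return accumulate(lead) + accumulate(lag)


def index_to_sbo_percent(idx: int) -> float:
    """Assumed linear map: 0 -> -50 %, 100 -> 0 %, 199 -> +49.5 %."""
    return (idx - BINS_PER_SYMBOL) * 100.0 / SAMPLES_PER_SYMBOL


if __name__ == "__main__":
    quiet: List[IQ] = [(1, 1)] * SAMPLES_PER_SYMBOL
    lag = list(quiet)
    lag[30] = lag[130] = (20, 10)          # synthetic overlap peaks (Lag symbol)
    profile = sbo_profile([quiet], [lag])
    peak = max(range(len(profile)), key=profile.__getitem__)  # stand-in for MLE
    print(peak, index_to_sbo_percent(peak))
```

In this synthetic example, the two overlap peaks in the Lag symbol fold into profile index 130, which the assumed mapping converts to an SBO of +15%.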
3. System Architecture
The proposed digital baseband demodulator is depicted in Figure 4. The digital baseband performs synchronization under the control of the main finite-state machine (FSM), which manages both the synchronization and data reception modes of the receiver. As described in Section 2, the CSS-modulated RF packet employs a symbol period of 100 µs and a chirp bandwidth of 4 MHz. The operation of the receiver and of the charge-pump-based fractional-N analog PLL used for CSS packet synchronization is dynamically controlled by the digital baseband controller.
The memory size was chosen to support multiple symbol repetitions, allowing up to eight preamble accumulation periods for consecutive Lead–Lag symbols within the constraints of the target CMOS die area. In this implementation, each sampled energy value is represented using 14 bits, while each memory cell is 17 bits wide to accommodate energy accumulation across multiple symbol repetitions.
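The stated widths (13-bit energy values, a 14-bit register, and 17-bit memory cells) are consistent with simple worst-case bit growth, as the Python sketch below checks. It rests on two assumptions of ours: each I/Q component can reach a magnitude of 63 (unsigned 6-bit codes; signed codes would leave more margin), and the 14-bit value is the sum of the two samples that share an SBO. Accumulating eight repetitions then adds log2(8) = 3 bits.

```python
# Sketch: worst-case bit growth from 6-bit I/Q samples to 17-bit memory cells.
MAX_COMPONENT = 63           # assumption: worst-case 6-bit magnitude
REPETITIONS = 8              # up to eight Lead-Lag accumulation periods


def bits_needed(value: int) -> int:
    """Unsigned bit width required to represent `value`."""
    return max(1, value.bit_length())


max_energy = 2 * MAX_COMPONENT ** 2          # I^2 + Q^2 for one sample
max_folded = 2 * max_energy                  # two samples with the same SBO
max_accumulated = REPETITIONS * max_folded   # eight repetitions per cell

if __name__ == "__main__":
    for name, value in [("energy", max_energy), ("folded", max_folded),
                        ("accumulated", max_accumulated)]:
        print(f"{name:12s} max={value:6d} -> {bits_needed(value)} bits")
```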
After the digital baseband demodulator has accumulated sampled energy over a predefined number of symbol repetitions (2–16 synchronization symbol cycles), it performs MLE by searching for the peak among the stored samples. To mitigate glitches and noise artifacts in the sampled data, a digital low-pass filter, implemented as a 5-point moving average, is applied to the energy data prior to differentiation and MLE. The index of the peak in the filtered energy profile is then selected as the most probable symbol boundary.
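The filtering and peak-search step can be sketched in Python as follows. The circular edge handling, the omission of the differentiation step, and the function names are our assumptions; the threshold test mirrors the predefined threshold described in the next paragraph. In the usage example, an isolated one-sample glitch is larger than the genuine peak in the raw data, but the 5-point average suppresses it.

```python
# Sketch of the filtering and peak-search step (assumptions noted inline).
from typing import List, Optional, Sequence


def moving_average_5(x: Sequence[float]) -> List[float]:
    """5-point moving average; wrap-around at the edges is an assumption
    (SBO indices 0 and 199 are adjacent modulo one symbol)."""
    n = len(x)
    return [sum(x[(i + k) % n] for k in range(-2, 3)) / 5.0 for i in range(n)]


def ml_peak_index(profile: Sequence[float], threshold: float) -> Optional[int]:
    """Index of the filtered energy peak, or None if it does not exceed the
    threshold. The differentiation step mentioned in the text is omitted."""
    filtered = moving_average_5(profile)
    peak = max(range(len(filtered)), key=filtered.__getitem__)
    return peak if filtered[peak] > threshold else None


if __name__ == "__main__":
    profile = [0.0] * 200
    profile[20] = 60.0                          # isolated glitch
    for offset, value in zip(range(-2, 3), (30, 40, 50, 40, 30)):
        profile[130 + offset] = float(value)    # genuine overlap peak
    print(ml_peak_index(profile, threshold=20.0))
```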
Because the sampled memory contents must be preserved during the peak search for SBO MLE, the digital baseband requires half of a symbol period for processing, during which data reception is suspended. Once the peak value exceeds a predefined threshold, the demodulator aligns the symbol boundary to the estimated SBO by controlling the frequency-control word (FCW) of the fractional-N PLL, providing the appropriate initial frequency setting to compensate for the SBO.
Following SBO detection and alignment, the demodulator verifies synchronization before storing any payload data: during the first symbol cycle in data reception mode, it accumulates the incoming data using the newly aligned symbol boundary. Once the accumulated power exceeds a predefined threshold, the demodulator begins receiving the payload data for the predefined packet length.
In an ideal scenario, assuming that the PLL exhibits zero settling time, the oversampled sub-chirp windows would be seamlessly concatenated, enabling direct and continuous offset estimation, as illustrated in Figure 3. In practice, however, the analog PLL exhibits a finite settling time during frequency transitions, as shown in Figure 5a. To accommodate this behavior, finite time intervals must be inserted between adjacent sub-chirp windows to allow the PLL to settle. This settling delay can lead to boundary-estimation mismatches, resulting in residual frequency errors and potential signal loss in the detection path.
To accurately detect the SBO of an incoming CSS-modulated RF packet with a fixed chirp bandwidth, the non-ideal sub-chirp window requires a steeper frequency ramp than the 40 kHz/sample of the ideal profile; in this work, a ramp rate of 50 kHz/sample is employed. The adjusted sub-chirp profile accounting for these non-idealities is shown in Figure 5a.
The combination of finite PLL settling intervals and the modified frequency ramp introduces a redundant region in which the same amplitude response is captured by adjacent sub-chirp windows. As a result, only the subset of samples that avoids both the settling interval and the redundant overlap is useful, as shown in Figure 5b. To mitigate this non-ideal behavior, the receiver applies a compensation factor: a fractional multiplier that adjusts the sample index to reflect the actual SBO, correcting the temporal distortion introduced by PLL settling and the modified frequency ramp rate.
In this implementation, the ideal sub-chirp profile exhibits a frequency ramp rate of 40 kHz/sample. However, given a PLL settling-time margin of 5 µs (10 samples at 2 MS/s) between adjacent sub-chirp windows, the required chirp ramp rate increases to 50 kHz/sample. As shown in Figure 5b, the number of unutilized samples within a single sub-chirp window can therefore be calculated as
Here, denotes the number of unutilized samples in a single sub-chirp window. and represent the chirp slopes of the CSS-modulated RF signal and the sub-chirp signal, respectively. denotes the number of samples allocated for PLL settling, and denotes the total number of samples within a single sub-chirp window. Based on (3) and the predefined system parameters, the fraction of unutilized samples within a single sub-chirp window is calculated to be 34%, corresponding to approximately 17 samples per sub-chirp window. In other words, the fraction of useful samples is 66%, corresponding to 33 valid samples per sub-chirp window.
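The reported figures can be cross-checked with the short Python sketch below. It is not the authors' Eq. (3); it rests on our own assumption that each sub-chirp window, whatever its ramp, must cover the same 1 MHz LO-to-RF frequency-difference span as an ideal 50-sample window at 40 kHz/sample. Under that assumption, the 50 kHz/sample ramp (needed to sweep 2 MHz in the 40 samples left after settling) covers the span in 1 MHz / 30 kHz ≈ 33.3 samples, leaving ≈ 16.7 unutilized samples (10 for settling and ≈ 6.7 redundant). Rounding gives the 17 unutilized (34%) and 33 useful (66%) samples reported above.

```python
# Consistency check under our own geometric reading (not the paper's Eq. (3)).
SAMPLES_PER_WINDOW = 50       # samples per sub-chirp window
SETTLE_SAMPLES = 10           # 5 us PLL settling margin at 2 MS/s
K_RF = 20e3                   # incoming chirp ramp (Hz/sample)
K_SUB_IDEAL = 40e3            # ideal sub-chirp ramp (Hz/sample)

# Steeper ramp needed to cover the same 2 MHz sweep in the shortened window.
required_ramp = K_SUB_IDEAL * SAMPLES_PER_WINDOW / (SAMPLES_PER_WINDOW - SETTLE_SAMPLES)

# Assumption: every window must cover the LO-to-RF frequency-difference span
# of an ideal window, (K_SUB_IDEAL - K_RF) * 50 samples = 1 MHz.
offset_span_hz = (K_SUB_IDEAL - K_RF) * SAMPLES_PER_WINDOW
useful = offset_span_hz / (required_ramp - K_RF)   # samples actually needed
unutilized = SAMPLES_PER_WINDOW - useful           # settling + redundant samples
redundant = unutilized - SETTLE_SAMPLES            # overlap with adjacent window

if __name__ == "__main__":
    print(f"required ramp {required_ramp / 1e3:.0f} kHz/sample; "
          f"useful {useful:.1f}; unutilized {unutilized:.1f} "
          f"({100 * unutilized / SAMPLES_PER_WINDOW:.1f}%); redundant {redundant:.1f}")
```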
The maximum likelihood index with fractional offset adjustment, denoted as
, is expressed as
Here,
is the integer quotient of
divided by
. The adjusted estimated SBO is then computed as
The entire SBO detection/estimation sequence, including the 5-point moving-average digital low-pass filter, MLE, and fractional offset adjustment, requires 80 sampling periods, corresponding to 40 µs. An additional 10 µs is allocated for PLL settling to the correct SBO. Finally, the digital demodulator synchronizes the RX LO in data reception mode using a frequency ramp identical to that of the incoming CSS-modulated RF packet. In this work, the ramp rate is 20 kHz/sample.
The synchronization pattern is generated by an analog fractional-N PLL, which must produce the sub-chirp frequency profiles with fast locking, particularly during the instantaneous frequency transitions between adjacent sub-chirp windows in synchronization mode, as shown in Figure 6a. As shown in Figure 6b, the digital demodulator dynamically adjusts both the VCO operating frequency and the charge-pump current at the beginning of the PLL settling phase. During the initial settling phase, the charge pump delivers a higher current, which increases the loop gain of the PLL. This elevated loop gain accelerates locking, allowing the PLL to stabilize within the short settling interval allocated between sub-chirp windows.
In addition, abrupt operating-frequency transitions occur in the VCO at the beginning of each sub-chirp window to support frequency hopping between sub-chirps. For this purpose, a thermometer-coded VCO capacitor bank pre-selects the operating frequency band, which significantly reduces the frequency acquisition time, even during rapid frequency hops. Once frequency lock is achieved, the digital demodulator controls the PLL to sweep the output frequency linearly at the predefined ramp rate, following the desired chirp profile, by updating the frequency-control word every 20 MHz clock period. The frequency modulation is driven by a second-order delta-sigma modulator, which introduces fine-grained dithering to accurately shape the target chirp slope [4].
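To illustrate how a second-order delta-sigma modulator dithers a fractional frequency-control word, the Python sketch below implements a generic MASH 1-1 structure (two cascaded first-order accumulators), one common second-order topology. The topology, the 16-bit accumulator width, and the function name are our assumptions; the paper does not specify them. The sketch also computes the per-reference-cycle frequency increment for the 20 kHz/sample data-mode ramp: at 2 MS/s there are 10 reference cycles of the 20 MHz clock per sample, so the programmed frequency steps by 2 kHz every reference cycle.

```python
# Illustrative MASH 1-1 (second-order) delta-sigma modulator; topology and
# accumulator width are assumptions, not the paper's implementation.
from typing import List

ACC_BITS = 16
MODULUS = 1 << ACC_BITS

F_REF_HZ = 20e6              # reference / FCW update clock
SAMPLE_RATE_HZ = 2e6         # ADC sampling rate
RAMP_HZ_PER_SAMPLE = 20e3    # data-mode chirp ramp rate
# Frequency increment applied on every 20 MHz reference cycle (2 kHz).
step_hz_per_ref_cycle = RAMP_HZ_PER_SAMPLE * SAMPLE_RATE_HZ / F_REF_HZ


def mash11(frac: float, n_cycles: int) -> List[int]:
    """Integer divider offsets whose running mean approaches `frac` (0 <= frac < 1)."""
    x = round(frac * MODULUS)            # quantized fractional input
    acc1 = acc2 = prev_c2 = 0
    out: List[int] = []
    for _ in range(n_cycles):
        acc1 += x
        c1, acc1 = acc1 >> ACC_BITS, acc1 & (MODULUS - 1)   # stage-1 carry and residue
        acc2 += acc1                                         # stage 2 integrates stage-1 residue
        c2, acc2 = acc2 >> ACC_BITS, acc2 & (MODULUS - 1)   # stage-2 carry and residue
        out.append(c1 + c2 - prev_c2)                        # y = c1 + (1 - z^-1) c2
        prev_c2 = c2
    return out


if __name__ == "__main__":
    seq = mash11(0.3, 100_000)
    print(f"mean {sum(seq) / len(seq):.5f}, levels {sorted(set(seq))}, "
          f"ramp step {step_hz_per_ref_cycle:.0f} Hz per reference cycle")
```

The output takes only the integer levels −1 to 2, while its long-term average tracks the fractional input; the quantization error is pushed to high frequencies, where the PLL loop filter attenuates it.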
The LC voltage-controlled oscillator (LC-VCO) exhibits a phase noise of −100 dBc/Hz at a 1 MHz offset. The 20 MHz reference clock is shared with the system clock of the LO modulation logic core. As shown in Figure 6c, this configuration enables robust and accurate chirp generation in both synchronization and data reception modes, with a PLL settling time of 5 µs.
4. Results
The simulation was conducted to evaluate the synchronization performance of the proposed digital demodulator under varying SBO conditions for incoming CSS-modulated RF packets. The synchronization performance was evaluated over more than independent synchronization trials. The simulations were performed across three process-voltage-temperature (PVT) corners, including typical-typical (TT), fast-fast (FF) at high supply voltage and low temperature, and slow-slow (SS) at low supply voltage and high temperature. In addition, minimum and maximum RC extraction corners, along with independent random noise seeds, were included to account for process and environmental variations.
The synchronization decision threshold was programmable, and successful synchronization was defined as an estimated symbol boundary offset (SBO) error within ±1% of the full symbol duration. Furthermore, the proposed synchronization architecture was evaluated under a carrier frequency offset (CFO) of up to 100 ppm, corresponding to approximately one sample displacement over fifty symbol periods under the adopted system parameters. This CFO condition reflects the frequency accuracy limitation of practical crystal reference oscillators. The selected fifty-symbol duration corresponds to the expected preamble and payload packet length.
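The CFO figure above follows from simple arithmetic: a 100 ppm error accumulated over 50 symbols of 200 samples each displaces the symbol boundary by 100 × 10⁻⁶ × 50 × 200 = 1 sample, compared with the ±2-sample margin implied by the ±1% success criterion. The Python sketch below treats the ppm error as a proportional drift of the sample timing, which is our simplifying assumption.

```python
# Sketch: timing drift from a 100 ppm reference error vs. the success margin.
PPM = 100                    # reference frequency error
SYMBOLS = 50                 # preamble + payload length considered
SAMPLES_PER_SYMBOL = 200     # 100 us at 2 MS/s
SUCCESS_FRACTION = 0.01      # |SBO error| <= 1% of a symbol counts as success

drift_samples = PPM * 1e-6 * SYMBOLS * SAMPLES_PER_SYMBOL    # accumulated drift
margin_samples = SUCCESS_FRACTION * SAMPLES_PER_SYMBOL       # allowed SBO error

if __name__ == "__main__":
    print(f"drift over {SYMBOLS} symbols: {drift_samples:.2f} samples; "
          f"success margin: +/-{margin_samples:.0f} samples")
```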
In this work, the syncword misdetection probability was evaluated to quantify synchronization accuracy. As shown in Figure 7a, the receiver achieved a syncword misdetection probability below
at required signal-to-noise ratios (SNRs) of 9 dB without repetition and 1 dB with eightfold repetition. In-band signal-to-interference ratio (SIR) tolerance was simulated at different frequency offsets within the chirp bandwidth using two types of interference signals: a 4 MHz wideband RF-modulated signal based on the IEEE 802.11ba WLAN Wake-Up Radio (WUR) standard [13] and a GFSK-modulated signal emulating a Bluetooth Low Energy (BLE)-like interference pattern. Both interferers were configured to generate non-deterministic symbol patterns. As plotted in Figure 7b, the worst-case in-band SIRs with WLAN WUR-modulated interference were −10.7 dB without repetition and −22.2 dB with 8× Lead–Lag synchronization repetition.
Within the chirp bandwidth, the BLE GFSK signal has a narrower effective bandwidth than the WLAN WUR signal. With the BLE GFSK-modulated interferer located at the center of the chirp bandwidth, the proposed digital demodulator tolerates less interference than with the WLAN WUR interferer: the worst-case in-band SIRs with BLE GFSK-modulated interference were −5.8 dB without repetition and −16.2 dB with 8× Lead–Lag synchronization repetition.
The digital demodulator consumed an average power of 877 µW, with the SAR ADCs, FCW modulator, and main digital baseband consuming 255 µW, 92 µW, and 530 µW, respectively. The charge-pump-based analog fractional-N PLL consumed 830 µW on average.
As shown in Figure 7c, the receiver successfully demodulates a CSS-modulated RF packet using the PLL control scheme shown in Figure 6b. After the synchronization interval, the receiver accurately detects the amplitude peak and compensates for the SBO between the incoming RF packet and the local reference, achieving an SBO error within 2%; as illustrated in the figure, the detected SBO closely matches that of the incoming CSS signal. The demodulator then verifies synchronization before transitioning to RX mode. Depending on the estimated SBO, which ranges from −50% to +50%, the synchronization interval varies from 250 µs to 350 µs without repetition and from 1650 µs to 1750 µs with 8× Lead–Lag synchronization repetition.
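The reported synchronization intervals can be reproduced with a simple latency model, sketched below in Python under our own assumptions: two symbols of accumulation per Lead–Lag pair, the 50 µs processing and re-settling interval of Section 3 (40 µs for SBO estimation plus 10 µs for PLL settling), and a boundary shift of 0–100 µs that varies linearly with the SBO over −50% to +50% (the direction of this mapping is assumed). The function name is hypothetical.

```python
# Hypothetical latency model reproducing the reported synchronization intervals.
SYMBOL_US = 100.0                 # CSS symbol period
PROCESSING_US = 40.0 + 10.0       # SBO estimation + PLL settling to the new boundary


def sync_interval_us(lead_lag_pairs: int, sbo_percent: float) -> float:
    """Synchronization interval (us) for a given repetition count and SBO."""
    if not -50.0 <= sbo_percent <= 50.0:
        raise ValueError("SBO must lie in [-50, 50] percent")
    accumulation = 2 * lead_lag_pairs * SYMBOL_US          # Lead + Lag symbols
    shift = (sbo_percent + 50.0) / 100.0 * SYMBOL_US       # assumed linear map
    return accumulation + PROCESSING_US + shift


if __name__ == "__main__":
    for pairs in (1, 8):
        lo = sync_interval_us(pairs, -50.0)
        hi = sync_interval_us(pairs, 50.0)
        print(f"{pairs} pair(s): {lo:.0f}-{hi:.0f} us "
              f"({lo / SYMBOL_US:.1f}-{hi / SYMBOL_US:.1f} symbols)")
```

With one Lead–Lag pair the model gives 250–350 µs, and with eight pairs 1650–1750 µs, i.e., the 2.5 to 17.5 symbol range quoted in the Introduction.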
Table 1 compares the proposed CSS synchronization scheme with LoRa [9] and a prior dual-chirp design [4]. The proposed demodulator achieves symbol alignment within 2% while maintaining competitive SNR and interference tolerance. LoRa requires the lowest SNR for synchronization; however, this advantage stems from its processing gain, which comes at the cost of higher latency. The digital demodulator, PLL modulator, and I/Q SAR ADCs are implemented in a 65 nm CMOS process and occupy a total active area of 0.195 mm², as shown in Figure 8.
5. Conclusions
This work presented a low-latency digital demodulator and synchronization architecture for CSS-modulated RF packets targeting energy-efficient IoT wireless systems operating in spectrally congested environments. The proposed approach performs amplitude-domain synchronization using oversampled sub-chirp windows and maximum likelihood estimation to estimate and compensate the SBO without FFT processing. A co-designed digital demodulator and fractional-N PLL architecture enables rapid sub-chirp generation and fast frequency settling, while compensation techniques mitigate PLL non-idealities and redundant sampling effects. As a result, the proposed receiver achieves accurate SBO estimation within 17.5 symbol periods (1.75 ms) while maintaining a synchronization error below 2% under >10 dB in-band interference. Implemented in 65 nm CMOS, the proposed digital demodulator, PLL modulation logic, and I/Q SAR ADCs occupy a compact active area of 0.195 mm².
These results demonstrate that accurate and interference-resilient CSS synchronization can be achieved with substantially lower latency and complexity than conventional approaches, making the proposed architecture well suited for dense IoT sensor networks, industrial automation, wake-up radios, wearable healthcare devices, and battery-powered or energy-harvesting wireless nodes operating in congested ISM-band environments. In particular, the proposed low-latency synchronization capability can support next-generation autonomous and distributed IoT systems, such as swarm robotics and smart infrastructure, where rapid and reliable packet detection is required under severe in-band interference.
6. Future Work
This work focuses on the design and simulation-based validation of a low-latency digital demodulator for CSS-modulated RF packet synchronization. Although the digital demodulator, PLL modulation logic, and I/Q SAR ADCs have been implemented in a 65 nm CMOS process, the reported synchronization performance, including SBO estimation accuracy, in-band SIR tolerance, PLL settling behavior, and power consumption, is currently based on circuit- and system-level simulations. Future work will therefore include silicon measurements of the fabricated prototype to experimentally verify the proposed synchronization scheme under practical RF channel conditions.
The planned measurements will evaluate power consumption, PLL settling time, synchronization latency, SBO estimation error, packet error rate, syncword misdetection probability, and interference tolerance across various SNR and in-band interference conditions. In addition, the demodulator will be extended into a complete CSS receiver system comprising an interference-resilient RF front-end, chirp-modulated LO generation, mixed-signal baseband processing, and the proposed digital synchronization demodulator. This system-level integration will enable end-to-end CSS packet reception and further demonstrate the suitability of the proposed approach for low-power, low-latency, and interference-resilient IoT applications.