A Gated Oscillator Clock and Data Recovery Circuit for Nanowatt Wake-Up and Data Receivers

This article presents a data-startable baseband logic featuring a gated oscillator clock and data recovery (GO-CDR) circuit for nanowatt wake-up and data receivers (WuRxs). At each data transition, the phase misalignment between the data coming from the analog front-end (AFE) and the clock is cleared by the GO-CDR circuit, thus allowing the reception of long data streams. Any free-running frequency mismatch between the GO and the bitrate does not limit the number of receivable bits, but only the maximum number of equal consecutive bits (N_m). To relax this limitation, the proposed system includes a frequency calibration circuit, which reduces the frequency mismatch to ±0.5%, thus enabling the WuRx to be used with different encoding techniques up to N_m = 100. A full WuRx prototype, including an always-on clockless AFE operating in subthreshold, was fabricated in STMicroelectronics 90 nm BCD technology. The WuRx is supplied with 0.6 V, and the power consumption, excluding the calibration circuit, is 12.8 nW during the rest state and 17 nW at a 1 kbps data rate. With a 1 kbps On-Off Keying (OOK) modulated input and −35 dBm of input RF power after the input matching network (IMN), a 10^−3 missed detection rate with a 0-bit error tolerance is measured, transmitting 63-bit packets with N_m ranging from 1 to 63. The total sensitivity, including the estimated IMN gain at 100 MHz and 433 MHz, is −59.8 dBm and −52.3 dBm, respectively. In comparison with an ideal CDR, the degradation of the sensitivity due to the GO-CDR is 1.25 dB. False alarm rate measurements lasting 24 h revealed zero overall false wake-ups.


Introduction
Energy efficiency is a fundamental metric for all battery-powered devices, such as wireless sensor and actuator network (WSAN) nodes, whose most power-hungry subsystem is usually the RF transceiver. A wake-up receiver (WuRx) is an always-on ultra-low-power receiver which constantly monitors the channel and wakes the node up upon reception of a communication request in order to overcome the trade-off between power consumption and node latency [1,2]. WuRxs can be classified depending on their application range. Short-range WuRxs are fully passive and achieve a communication distance limited to a few centimeters or meters. Medium-range WuRxs are used in applications requiring a range of, at most, 100 m, their power consumption typically being in the nanowatt range. Long-range WuRxs consume microwatts and can receive packets from kilometers away.
A typical WuRx architecture is composed of two subsystems: an analog front-end (AFE) and a baseband logic. The AFE turns the RF input's OOK-modulated signal into a digital baseband bitstream.
The remainder of this paper is organized as follows. Section 2 describes the proposed WuRx architecture with special emphasis on the baseband logic. Section 3 and Section 4 present the circuit design and the implementation choices, respectively. Section 5 shows the measurement results, and finally, Section 6 concludes the paper.

Wake-Up and Data Receiver Architecture
The proposed WuRx is shown in Figure 1. The always-on AFE was clockless (i.e., it did not need an oscillator), while the baseband logic required a clock to sample the incoming data. This allowed the WuRx to operate in two phases. During Phase 1, the baseband logic was off, whereas the AFE was active. Phase 2 started upon recognition of the first 0-to-1 transition of the message, occurring at the first transition of the AFE output signal. The baseband logic was then turned on, and the incoming bitstream was compared with the stored codeword. This approach allowed us to reduce the power consumption of the node if the specific application was characterized by long idle periods, since the baseband logic was off most of the time [16]. The AFE was composed of an external lumped-component input matching network (IMN) followed by an envelope detector (ED) and a comparator, both of which were integrated on the chip.
As indicated in Figure 1, the proposed data-startable baseband logic included [11] (1) a GO-CDR, (2) a control logic with addressing capabilities (CL) to generate the wake-up signal and control signals for GO-CDR and (3) a bias and calibration (BC) circuit for the GO-CDR.
As illustrated in Figure 2a, the purpose of the CDR circuit was to provide the CL with a clock whose positive edges correctly sampled a delayed version of the input data (DDin). As shown in Figure 2b, ideally, the sampling edges would be placed at the center of each bit time (T_b). The circuit was composed of three sections: a delay block (DB), an edge detector implemented through an Exclusive NOR (EXNOR) gate and the GO. The EXNOR gate was fed with the data signal, Din, and its delayed version, DDin, resulting in a pulse of the Gate signal at each Din transition. When Gate = 1, the GO was in free-running mode with a frequency f_ck = 1/T_ck, while with Gate = 0, it was blocked in a predefined state. When Gate switched from 0 to 1, the GO generated the positive edge of Clock, ideally after T_ck/2, thus clearing any phase error accumulated up to that time, even if the free-running clock frequency was not precisely matched to the data rate (T_ck ≠ T_b).
Therefore, the only constraint of this architecture is on the maximum number of equal consecutive bits (N_m) that can be correctly sampled. N_m can be calculated by imposing that no bit is sampled twice (which can occur if T_ck < T_b) or not sampled at all (which can occur if T_ck > T_b). Defining α = |T_ck − T_b|/T_b, a simplified analysis, carried out assuming a start-up time of zero for the oscillator and neglecting the Clock jitter, leads to the constraints of Equations (1) and (2), where T_ck is the clock period, T_b is the bit time and τ_d is the delay between the input data (Din) and its delayed version (DDin).
This results in N_m < (1 − α)/(2α) [11]. In case a Manchester code is employed, which contains a transition in each bit time (i.e., N_m = 2) at the cost of halving the data rate compared with the standard binary encoding, the equation leads to α < 0.2. Such a frequency error upper limit is easily achievable in integrated ultra-low-power oscillators. To avoid the use of a Manchester code with its associated limitations on the data rate, the proposed architecture included a bias and calibration circuit for the GO-CDR, which reduced α to negligible values and thus allowed the WuRx to process data containing long sequences of equal consecutive bits.
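The bound N_m < (1 − α)/(2α) can be evaluated numerically for the two cases discussed in the text. The following is a minimal sketch; the function names are illustrative and not part of the original work. Solving the inequality for α gives α < 1/(2N_m + 1), which is used to recover the Manchester limit.

```python
def max_equal_bits_bound(alpha):
    """Exclusive upper bound on N_m from N_m < (1 - alpha) / (2 * alpha)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return (1.0 - alpha) / (2.0 * alpha)


def max_alpha_for(n_m):
    """Exclusive upper bound on alpha for a run of n_m equal bits.

    Obtained by solving N_m < (1 - alpha) / (2 * alpha) for alpha.
    """
    if n_m < 1:
        raise ValueError("n_m must be at least 1")
    return 1.0 / (2.0 * n_m + 1.0)


# Manchester code (N_m = 2): frequency error limit quoted in the text.
print(max_alpha_for(2))
# Calibrated oscillator (alpha = 0.5%): bound on the run length.
print(max_equal_bits_bound(0.005))
```

With α = 0.5%, the bound evaluates to roughly 99.5, which is in line with the N_m of about 100 bits reported for the calibrated oscillator.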

Analog Front-End
The AFE of the implemented WuRx (Figure 1) was composed of an envelope detector (ED), which simultaneously extracted the envelope of the incoming OOK signal and amplified it at the baseband, a comparator with a variable threshold to digitize the extracted envelope and a reference current generator to provide bias currents for both the ED and the comparator. The circuit schematic of the ED, shown in Figure 3a, was an elaboration of that in [18], where subthreshold operation allowed envelope extraction by leveraging second-order nonlinearities. The self-biasing scheme for the gain transistor M5 allowed for setting a robust DC operating point. Correct operation required the resistance-capacitance (RC) time constant to be large enough to keep the gate voltage of M5, V_REF, almost equal to its quiescent value (corresponding to zero RF input signal) even during the reception of an entire packet. If this condition was met, M5 effectively operated as a common-gate amplifier. Correspondingly, the high and low values of V_OUT_AMP remained constant throughout the whole packet, as shown in the inset of Figure 3a, and the output voltage V_OUT_AMP was a low-pass filtered version of the RF input envelope. This guaranteed the correct operation of the comparator with a fixed threshold. More details can be found in [18]. Unlike the solution in [18], no cascode transistor was employed in the ED, thus allowing the use of a supply voltage lower than the nominal one, leading to a reduction in power consumption.
The comparator schematic is shown in Figure 3b. It received both the ED output voltage and the voltage at the gate of M5 which, as noted above, remained almost constant at its quiescent value for the entire packet reception. The body effect of the differential pair transistors (M3 = M5) was exploited to set the effective threshold of the comparator, V_THR, by adjusting the externally supplied bulk voltages V_BULK1 and V_BULK2. The inset of Figure 3a shows the relationships between V_OUT_AMP, V_REF and V_THR.
Both the ED and the comparator were biased with Proportional-to-Absolute-Temperature (PTAT) currents [19] generated by the circuit in Figure 3c. The effectiveness of using a PTAT current for a more constant ED gain within the −20 °C to 85 °C temperature range has been proven through simulations [18].

Gated Oscillator and Delay Block
The GO is shown in Figure 4. It was a ring oscillator composed of three stages, each of which consisted of a current-starved inverter (CSI) (M1-M4, M7-M10 and M13-M16), a capacitor (C1, C2 and C3) and two additional transistors (M5-M6, M11-M12 and M17-M18) driven by the Gate signal to reset the output of each CSI (O1, O2 and O3) to a predefined state at each pulse in the Gate signal.
The output of the GO was fed to an inverting stage to generate a squared clock signal. The oscillation frequency was 1/(2Nτ_p), where N = 3 was the number of stages and τ_p was the propagation delay of each stage, yielding τ_p = T_ck/6. Bias voltages vbias_p and vbias_n controlled the charging and discharging currents for capacitors C1, C2 and C3 and thus the value of τ_p. The delay block DB consisted of a stage equal to the ones used in the GO, biased by the same control voltages vbias_p and vbias_n (with the two additional transistors biased off), followed by an inverting stage to square its output signal (DDin). These choices ensured τ_d = τ_p (where τ_d is the delay between Din and DDin), which implied that the necessary condition τ_d < T_b/2 was always satisfied [11]. This condition prevented the clock's high phase from having a null duration. Furthermore, τ_d had to be larger than the reset time (τ_res) of the oscillator. The design constraints on the value of τ_d are therefore τ_res < τ_d < T_b/2.

Control Logic with Addressing Capabilities
The CL is shown in Figure 5. It was composed of four blocks: (1) a serial-input parallel-output (SIPO) register, (2) a correlator with a programmable codeword and threshold (adapted from [8]), (3) a programmable timeout counter and (4) a sequential unit (SU). The configuration parameters (i.e., the codeword, correlator threshold and timeout values) were assigned to the CL by programming the SIPO register. Figure 6 summarizes the behavior of the CL. The SU detected a 0-to-1 transition of Din and forced the WuRx into Phase 2. When the GO-CDR was activated, the generated clock was used by the CL to sample the incoming bitstream (DDin). In particular, the SU detected a start frame delimiter (SFD), which enabled the correlator to start the comparison between DDin and the codeword (en_corr switches from 0 to 1). The CL generated the wake-up signal only when the correlation result was higher than the threshold of the correlator. The CL also included a timeout counter, triggered by the clock provided by the GO-CDR, to push the system back to Phase 1 after a predefined time interval elapsed without detecting the correct codeword.
The assignment of the configuration parameters was crucial to optimizing the performance of the entire system. In particular, the correlator threshold, together with the codeword length, could be set as a function of the sensitivity of the WuRx in order to reduce the number of false wake-ups in noisy environments. Likewise, the timeout value could be tuned to reduce power consumption during Phase 2.
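The behavior of the CL described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the authors' RTL: the function name `wake_up` is hypothetical, the 3-bit preamble (100) and the 16-bit codeword are the ones reported later in the paper, the threshold is interpreted as the minimum number of matching bits (so that a 16/16 setting is reachable), and the timeout is modeled simply as a limit on the number of clock cycles observed after the first 0-to-1 transition.

```python
# 16-bit codeword used in the functional measurements of the paper.
CODEWORD = [int(b) for b in "1011101101010011"]


def wake_up(bitstream, codeword, threshold, timeout=63):
    """Behavioral sketch: return True if a wake-up would be generated.

    Assumptions (not taken from the RTL): the correlator fires when the
    number of matching bits is at least `threshold`, and the timeout
    simply truncates the observation window to `timeout` clock cycles.
    """
    bits = list(bitstream)
    if 1 not in bits:
        return False                      # still in Phase 1, no transition
    start = bits.index(1)                 # first 0-to-1 transition
    window = bits[start:start + timeout]  # cycles seen before timeout
    if window[1:3] != [0, 0]:
        return False                      # SFD (two zeros) not found
    payload = window[3:3 + len(codeword)]
    if len(payload) < len(codeword):
        return False                      # packet cut short by the timeout
    matches = sum(int(r == c) for r, c in zip(payload, codeword))
    return matches >= threshold


packet = [1, 0, 0] + CODEWORD
print(wake_up([0, 0] + packet, CODEWORD, threshold=16))
```

Lowering the threshold (for example to 15 out of 16) lets a packet with a single bit error still trigger the wake-up, at the price of a higher false alarm probability in noisy environments.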

Bias and Calibration Circuit
The block diagram of the bias and calibration (BC) circuit is shown in Figure 7. It was composed of a frequency detector (FD) adapted from [20], a successive approximation logic (SA Logic) and a digitally controlled current source (DCCS). The BC circuit was in charge of generating the bias voltages (vbias_p and vbias_n) for the GO-CDR so that the oscillation frequency of the GO was equal to the target data rate, even with process, voltage and temperature (PVT) variations. The FD detected the frequency difference between Clock and the external reference (Clock_ref), whose frequency was equal to the data rate, while the SA Logic used the output signals (UP-DN) of the FD to set the bits (bi_UP-bi_DN) of the DCCS. In particular, the DCCS generated the bias voltages vbias_p and vbias_n for the GO-CDR using binary weighted currents. For testing purposes, in the present implementation, these currents were generated from an external current source (Ibias). If the Clock frequency was too close to or too far from that of Clock_ref, the generation of UP-DN pulses could require many clock cycles or could not occur at all. To avoid this stall condition, the SA Logic included a counter, which forced the end of the calibration when its timeout value was reached (i.e., End_Calib switches from 0 to 1).
The calibration started by switching the Start-Calib signal from 0 to 1 to enable the SA Logic and force the CL to trigger the GO-CDR using the Enable signal. At the same time, Clock_ref was applied to the FD. The calibration ended when the least significant bit of the DCCS was set (i.e., End_Calib switches from 0 to 1) or, as mentioned above, when the counter reached the timeout value. The calibration cycle was managed by the node Microcontroller Unit (MCU), which had to generate the Clock_ref and start_calib signals for the BC circuit. The power consumption associated with Clock_ref was negligible, since the calibration procedure needed to be activated only in a few cases: (1) when the node was started up, (2) at predefined time steps and (3) when the temperature of the node was higher or lower than the predefined thresholds.
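The successive approximation search performed by the BC circuit can be sketched as a binary search over the DCCS code. The model below is a deliberately simplified illustration and not the implemented logic: it assumes that the GO frequency increases monotonically with the code, it collapses the UP-DN outputs of the FD into a single comparison and it ignores the timeout counter. The names `make_oscillator_model` and `sa_calibrate`, the linear frequency model and the 1150 Hz uncalibrated frequency are hypothetical; the five bits and the ±20% span are the values quoted in the implementation section.

```python
def make_oscillator_model(f_uncal_hz, n_bits=5, span=0.2):
    """Hypothetical linear model: the code tunes the GO over +/- span."""
    max_code = (1 << n_bits) - 1

    def freq(code):
        return f_uncal_hz * (1.0 - span + 2.0 * span * code / max_code)

    return freq


def sa_calibrate(measure_freq, f_ref_hz, n_bits=5):
    """Successive approximation: decide the code bits from MSB to LSB."""
    code = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)
        # Keep the bit only if the clock is still not faster than Clock_ref.
        if measure_freq(trial) <= f_ref_hz:
            code = trial
    return code


# Example: oscillator running 15% fast before calibration, 1 kHz reference.
go_freq = make_oscillator_model(f_uncal_hz=1150.0)
code = sa_calibrate(go_freq, f_ref_hz=1000.0)
residual = (go_freq(code) - 1000.0) / 1000.0
print(code, residual)
```

In this toy model the residual error after the last bit is decided is bounded by one tuning step, which illustrates why the number of weighting bits sets the achievable accuracy.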

Implementation Choices
The proposed WuRx was designed using STMicroelectronics 90 nm BCD technology, targeting a 1 kbps bitrate. The AFE, the GO-CDR and the CL were designed to have a 0.6 V supply voltage (vdd), whereas in the current prototype, the BC circuit was designed for operation with a standard 1.2 V supply.

Analog Front-End
The ED (Figure 3a) was biased with I_1 = 1 nA. The first pole was due to an integrated 75 MΩ resistor in series with the output resistance of M6 (roughly 75 MΩ) and an external 500 nF capacitance. The comparator (Figure 3b) was biased with I_2 = 1 nA as well. As mentioned in Section 3, the bulk voltages of the transistors belonging to the comparator input differential pair were supplied externally to adjust the effective threshold.

Gated Oscillator and Delay Block
With reference to Figure 4, the nominal value of the charging and discharging currents was 2 nA to generate a free-running 1 kHz clock frequency with capacitance C_1 = C_2 = C_3 = 1.1 pF.
The same values were used in the delay block, leading to a 163 µs delay (τ_d) for the rising edges of the input data (Din) and 146 µs for the falling ones. Since the reset time of the oscillator was 340 ns, the conditions on τ_d discussed in Section 3 were largely satisfied. The start-up time of the oscillator was τ_start-up = 7 µs, which implied that no preamble was needed for the oscillator to settle. The GO-CDR performance was also evaluated through transient noise simulations. The simulated clock rms jitter turned out to be lower than 1 µs, which was negligible compared with the clock period.
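The timing figures above can be cross-checked against the design relations of Section 3 (τ_p = T_ck/(2N) and τ_res < τ_d < T_b/2). The sketch below uses hypothetical helper names. The last estimate, τ ≈ C·ΔV/I, additionally assumes that each stage switches when its capacitor has swung by half of the 0.6 V supply; this threshold is an assumption made here for illustration and is not stated in the paper.

```python
def stage_delay(t_ck, n_stages=3):
    """Per-stage propagation delay of an N-stage ring: T_ck / (2 * N)."""
    return t_ck / (2 * n_stages)


def delay_constraint_ok(tau_d, tau_res, t_b):
    """Check tau_res < tau_d < T_b / 2."""
    return tau_res < tau_d < t_b / 2


def cv_over_i_delay(cap_f, delta_v, current_a):
    """First-order charging time of a capacitor by a constant current."""
    return cap_f * delta_v / current_a


T_CK = 1e-3      # 1 kHz free-running clock
T_B = 1e-3       # 1 kbps bit time
TAU_RES = 340e-9

print(stage_delay(T_CK))                          # ideal tau_p = T_ck / 6
print(delay_constraint_ok(163e-6, TAU_RES, T_B))  # rising-edge delay
print(delay_constraint_ok(146e-6, TAU_RES, T_B))  # falling-edge delay
# Assumed switching point at vdd / 2 = 0.3 V (illustrative assumption).
print(cv_over_i_delay(1.1e-12, 0.3, 2e-9))
```

Under the half-supply assumption, the C·ΔV/I estimate lands close to the simulated 163 µs delay and to the ideal T_ck/6 value, which suggests that the reported component values are mutually consistent.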

Control Logic with Addressing Capabilities
The CL (Figure 5) was designed and compiled starting from an RTL-HDL behavioral description, targeting a 1.2 V low-power standard cell library, which yielded a circuit with a complexity of 800 equivalent gates. In order to minimize its power consumption, as mentioned above, its supply voltage was set to 0.6 V. This required post-layout transistor-level simulations to verify the correct operation of the circuit. The maximum codeword length and the maximum correlator threshold were both set to 16 bits, while the maximum timeout value was set to 63 cycles. This resulted in a 26-bit SIPO register (16 bits for the codeword, 4 bits for the correlator threshold and 6 bits for the timeout value). From the design parameters reported above, it can be concluded that the maximum timeout value limited the maximum packet length to 63 bits. Furthermore, to minimize the preamble time, the CL was designed to detect a start frame delimiter consisting of two consecutive zeros after the first 0-to-1 transition, thus resulting in a 3-bit preamble (100).

Bias and Calibration Circuit
The bias and calibration circuit (Figure 7) was designed to compensate for PVT variations [11]. Assuming temperature variations from −25 °C to +125 °C and ±12.5% supply voltage variations, in the worst process corner case, the largest simulated clock frequency error with respect to its nominal value (1 kHz) was 15%. The FD and SA Logic were designed and compiled on a 1.2 V low-power standard cell library. The timeout counter in the SA Logic was designed with 8 bits, resulting in 255 clock edges before End_Calib was raised. The FD and SA Logic yielded a circuit with a complexity of 700 equivalent gates.
It was verified through transistor-level simulations that, with a 1 kHz Clock_ref, the FD operated correctly for a clock frequency between 700 Hz and 1450 Hz. Furthermore, it was demonstrated that the timeout value was long enough to enable the FD to detect frequency differences down to ±0.5%. The DCCS was then designed using five weighting bits to compensate for both clock frequency variations up to ±20% and calibration loop non-idealities. Simulations demonstrated that the bias and calibration circuit yielded a ±0.5% GO free-running frequency accuracy after calibration. This, according to theoretical Equations (1) and (2), resulted in a simulated N_m = 100 bits, which was only affected by the oscillator PVT variations.
During Phase 1, the simulated power consumption of the AFE was 8 nW, while that of the baseband logic was 4.8 nW, making the total simulated power consumption equal to 12.8 nW.
During Phase 2, the average consumption of the baseband logic was 9 nW, whereas the AFE still consumed 8 nW. Therefore, the total simulated power consumption during Phase 2 was 17 nW. Since the operating bitrate was 1 kbps, the energy per bit of the proposed system was 17 pJ/bit. The contribution to the overall power consumption of the BC circuit (Figure 7) was not included in this computation because, in the present implementation, Ibias was an external current source. In the final implementation, when Ibias is replaced with the PTAT current generated by the circuit of Figure 3c, the simulated additional power consumption of the BC with a 0.6 V supply would be 5.48 nW, including 0.8 nW consumed by the digitally controlled current source.
Figure 8a shows the chip photograph before the application of the protective resin. The AFE occupied 0.2 mm^2, whereas the baseband logic area was 0.126 mm^2. Most of the overall area was due to passives, in particular the resistors in the ED (75 MΩ) and in the PTAT current generator (13 MΩ and 113 MΩ). An additional area of 0.042 mm^2 was due to the BC.
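The simulated power figures reported in this section can be cross-checked with a few lines of arithmetic. The sketch below reproduces the Phase 1 and Phase 2 totals and the energy per bit. The `average_power_nw` helper and its `active_fraction` parameter are hypothetical additions that illustrate how the two-phase operation lowers the average consumption when the node is idle most of the time; no duty-cycle figure is given in the paper.

```python
P_AFE_NW = 8.0          # AFE, always on
P_BB_PHASE1_NW = 4.8    # baseband logic, rest state
P_BB_PHASE2_NW = 9.0    # baseband logic, receiving at 1 kbps
BITRATE_BPS = 1000.0


def total_power_nw(p_baseband_nw):
    """Total WuRx power: AFE plus baseband logic (BC circuit excluded)."""
    return P_AFE_NW + p_baseband_nw


def energy_per_bit_pj(power_nw, bitrate_bps):
    """Energy per received bit in pJ: power divided by bitrate."""
    return power_nw * 1e-9 / bitrate_bps * 1e12


def average_power_nw(active_fraction):
    """Average power for a hypothetical fraction of time spent in Phase 2."""
    p1 = total_power_nw(P_BB_PHASE1_NW)
    p2 = total_power_nw(P_BB_PHASE2_NW)
    return (1.0 - active_fraction) * p1 + active_fraction * p2


print(total_power_nw(P_BB_PHASE1_NW))                              # Phase 1
print(total_power_nw(P_BB_PHASE2_NW))                              # Phase 2
print(energy_per_bit_pj(total_power_nw(P_BB_PHASE2_NW), BITRATE_BPS))
print(average_power_nw(0.01))                                      # 1% active
```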

Measurement Results
The fabricated chip was mounted on a board using a chip-on-board wiring technique, as shown in Figure 8a. Figure 8b shows the measurement setup employed for the performance evaluation of the proposed wake-up and data receiver. It included an RF generator for the RF input signal and its OOK modulation. An STM32 Nucleo board (Main Nucleo in Figure 8b) was used for the generation of the bitstream, programming the SIPO register, processing the output bits generated by the WuRx and managing the calibration cycle. An additional STM32 Nucleo board, as described below, was used to characterize the impact of the gated-oscillator CDR on the WuRx sensitivity.
The input impedance at the SMA connector was characterized by means of a vector network analyzer (VNA) in the 10 MHz-1.5 GHz range (see Figure 9). The resonance clearly visible around 1.1 GHz was due to the wire inductance and the input capacitance (2.95 pF), which could be ascribed mainly to the pad, as verified by means of an extracted lumped element equivalent circuit. Indeed, in the present implementation, a standard analog pad was used, which needs to be replaced by a low-capacitance RF pad in the final implementation. Due to these limitations, the present prototype did not address the implementation of the input matching network (IMN). Consequently, all measurements shown hereafter were performed with a 50 Ω resistor soldered in parallel with the input of the ED and using a commercial coaxial impedance adapter (see Figure 8b), thus providing a unity gain IMN. Since the AFE response is independent of the RF carrier frequency, all measurements were performed using the 868 MHz European ISM band carrier frequency.
For the sake of completeness, IMNs for different carrier frequencies were designed using the extracted input impedance lumped element model to estimate the obtainable IMN voltage gain. The simulated IMNs were based on an L-shaped inductor-capacitor (LC) stage using inductances with quality factor Q = 80 [21]. The simulated IMN gains at 100 MHz, 433 MHz and 868 MHz were 24.8 dB, 17.3 dB and 8.3 dB, respectively. The simulated IMN gains needed to be added to the measured circuit sensitivity to obtain the projected WuRx total sensitivity.
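The projection of the total sensitivity can be sketched as follows, taking the −35 dBm measured with a unity gain IMN (the value quoted in the abstract for a 10^−3 missed detection rate) and the simulated IMN gains listed above. The function name is illustrative. Since the IMN provides passive voltage gain ahead of the ED, its gain in dB is subtracted from the measured input power to obtain the power required at the antenna.

```python
MEASURED_SENSITIVITY_DBM = -35.0   # measured after a unity gain IMN

# Simulated IMN gains (dB) per carrier frequency (MHz), from the text.
IMN_GAIN_DB = {100: 24.8, 433: 17.3, 868: 8.3}


def projected_sensitivity_dbm(measured_dbm, imn_gain_db):
    """Input power needed before the IMN to reach measured_dbm after it."""
    return measured_dbm - imn_gain_db


for freq_mhz, gain_db in sorted(IMN_GAIN_DB.items()):
    total = projected_sensitivity_dbm(MEASURED_SENSITIVITY_DBM, gain_db)
    print(f"{freq_mhz} MHz: {total:.1f} dBm")
```

The 100 MHz and 433 MHz results reproduce the total sensitivities reported in the abstract.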
Figure 9. Input admittance vs. frequency. Blue indicates the measured real part of the input admittance, orange is the simulated real part of the input admittance using the extracted model, yellow is the measured imaginary part of the input admittance and violet is the simulated imaginary part of the input admittance using the extracted model.
First, functional tests were performed to verify correct operation. Then, systematic measurements were carried out to characterize the missed detection rate (MDR) and the false alarm rate (FAR). Finally, the capability of the WuRx to receive long sequences of data was investigated, and the performance of the bias and calibration circuit was analyzed.
The functional tests revealed problems with the data-startable baseband logic, which operated correctly only for supply voltages between 0.3 V and 0.5 V (i.e., below the nominal 0.6 V). To investigate the origin of this unexpected problem, post-layout transistor-level simulations were carried out for different supply voltage values. The simulation results revealed ringing caused by interline capacitances between the O3 and Clock signals in Figure 4, which had been underestimated by the extractor. The problem could be suppressed by lowering the supply voltage. Therefore, all measurements shown hereafter were performed with a 0.4 V supply for the baseband logic.
Figure 10 shows sample measured waveforms in response to a packet composed of a 3-bit preamble (100) followed by a 16-bit string matching the stored codeword (1011101101010011). This measurement was performed with a −34 dBm RF input sequence at 1 kbps, with a 0.5% clock frequency error (|f_ck − f_b|/f_b) measured after calibration. The curves demonstrate that the ED output was the correct envelope of the modulated RF signal, the generated clock sampled DDin accurately and the baseband logic correctly generated the wake-up pulse.
MDR measurements were performed to evaluate the sensitivity of the WuRx. The MDR is the ratio between the number of missed wake-ups and the total number of sent packets. To evaluate it, the Nucleo was employed to generate 10,000 equal 19-bit packets (identical to the one reported in Figure 10) separated by 100 ms from each other and to count the number of wake-up pulses. To investigate the impact of the GO-CDR on the sensitivity of the WuRx, an additional Nucleo (see Figure 8b), synchronized and running in parallel with the main one, was employed to decode the AFE output (Din, see Figure 1) with an external precisely timed clock and to compare the received stream with the one transmitted by the main Nucleo. The difference between the MDRs computed by the two Nucleo boards was a measure of how far the proposed GO-CDR affected the WuRx sensitivity. The MDR results are reported in Figure 11. The measurements were performed by changing the power of the input RF signal and adjusting the AFE comparator threshold accordingly, with a 0.5% GO free-running frequency error measured after calibration. Measurements were repeated for correlator thresholds equal to 16/16, 15/16 and 14/16. The Nucleo dedicated to decoding the AFE output was programmed consistently.
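The MDR bookkeeping described above can be sketched in a few lines. This is only an illustration of the definition (missed wake-ups over sent packets), not the firmware run on the Nucleo; the function name and the example pulse count are hypothetical, and the sketch assumes that every counted wake-up pulse corresponds to a transmitted packet (no false wake-ups during the run).

```python
def missed_detection_rate(packets_sent: int, wakeups_counted: int) -> float:
    """MDR = missed wake-ups / sent packets (definition used in the text)."""
    if packets_sent <= 0:
        raise ValueError("packets_sent must be positive")
    if not 0 <= wakeups_counted <= packets_sent:
        raise ValueError("wakeups_counted must lie in [0, packets_sent]")
    missed = packets_sent - wakeups_counted
    return missed / packets_sent


# Hypothetical run: 10,000 packets sent, 9,990 wake-up pulses counted.
PACKETS_SENT = 10_000
mdr = missed_detection_rate(PACKETS_SENT, 9_990)
print(f"MDR = {mdr:.1e}")
```

With 10,000 packets per run, an MDR of 10^−3 corresponds to 10 missed wake-ups.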
The input power corresponding to MDR = 10^−3, when the received data was processed by the GO-CDR, was P_IN = −35.75 dBm for the 16/16 case and P_IN = −36 dBm for the 14/16 and 15/16 cases. The Nucleo that decoded Din with an external clock measured MDR = 10^−3 at P_IN = −36.25 dBm for all correlator thresholds. Therefore, the proposed GO-CDR circuit degraded the sensitivity of the WuRx at MDR = 10^−3 by at most 0.5 dB. The same measurement procedure was repeated with different codewords by varying the number of consecutive zeros and ones, the correlator threshold and the codeword length. MDR = 10^−3 was always found at P_IN = −35.75 dBm, which, as for the aforementioned measurements, corresponds to a 0.5 dB degradation due to the GO-CDR.
In the 16/16 case, the total sensitivity at MDR = 10^−3 referred to the input of the IMN (i.e., including the projected IMN voltage gain, as explained above) was −60.5 dBm, −53 dBm and −44 dBm at 100 MHz, 433 MHz and 868 MHz, respectively.
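The projected figures above follow from a simple dB budget: the simulated IMN gain is subtracted from the input power measured after the IMN. The short sketch below reproduces that arithmetic with the values quoted in the text. The function and variable names are illustrative, and the sketch assumes, as the text does, that the simulated IMN voltage gain in dB translates one-to-one into a sensitivity improvement in dB.

```python
# Simulated IMN voltage gains quoted in the text (carrier in MHz -> gain in dB).
IMN_GAIN_DB = {100: 24.8, 433: 17.3, 868: 8.3}

# Measured input power after the IMN at MDR = 1e-3 (16/16 correlator threshold).
P_IN_DBM = -35.75


def total_sensitivity_dbm(p_in_dbm: float, imn_gain_db: float) -> float:
    """Refer the measured sensitivity to the IMN input by removing its gain."""
    return p_in_dbm - imn_gain_db


for carrier_mhz, gain_db in IMN_GAIN_DB.items():
    sens = total_sensitivity_dbm(P_IN_DBM, gain_db)
    print(f"{carrier_mhz} MHz: {sens:.2f} dBm")
```

The results land within 0.05 dB of the rounded values reported in the text.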
To measure the false alarm rate (FAR), which is defined as the number of false wake-ups per hour due to the noise present in the receiver, the input of the coaxial impedance adapter was closed on a 50 Ω resistance. Typically, a FAR ≤ 1/h is considered acceptable [5]. The Nucleo was used for counting the number of false wake-ups. The correlator was programmed with the 14/16 threshold, the AFE comparator threshold V_THR was set to the value corresponding to P_IN = −35.75 dBm, and the clock frequency error measured after calibration was 0.5%. Measurements were performed for 24 h time windows, resulting in zero overall false wake-ups.
To evaluate the WuRx capability to receive long sequences of data, additional MDR measurements were performed by sending 3174 equal 63-bit packets (for a total of 199,962 transmitted bits) separated by 100 ms from each other. All the transmitted packets contained a sequence of 20 consecutive ones. As reported in Section 4, the 63-bit packet length was limited in the present prototype by the chosen maximum timeout value of the baseband logic (see Figure 5). To perform these measurements, the output stream of the baseband logic (DDin) was sampled by the main Nucleo, using the clock generated by the GO-CDR with a 0.5% frequency error after calibration (see Figure 8b). As for the previous MDR measurements on 16-bit codewords, an additional Nucleo was employed to decode the AFE output with an external clock and then to compare the received stream with the one transmitted by the first Nucleo. Measurements were repeated with thresholds on the received bits equal to 63/63 and 58/63. When the received sequence was processed by the GO-CDR, MDR = 10^−3 was found at P_IN = −35 dBm and P_IN = −35.5 dBm for the 63/63 and 58/63 cases, respectively. When the received sequence was decoded off-chip with the external MCU clock, MDR = 10^−3 was found at P_IN = −36.25 dBm in either threshold case. Therefore, the use of the on-chip clock degraded the WuRx sensitivity by up to 1.25 dB. The measured packet sensitivity differed from the 16-bit code sensitivity in Figure 11 by 0.75 dB, thus demonstrating the GO-CDR capability to also process long data streams. These results lead to the conclusion that the sensitivity is limited by the AFE.
Measurements were repeated in the 63/63 threshold case by varying the number of consecutive zeros and ones from 1 to 63 bits. In all cases, MDR = 10^−3 was found at P_IN = −35 dBm.
Finally, measurements were performed to test the bias and calibration circuit (Figure 7), supplying the GO with the nominal V_DD = 0.6 V. The main Nucleo was used to generate the reference clock (Clock_ref) for the frequency detector and to manage the control signals (start_calib, end_calib) (see Figure 8b). The current Ibias was set to obtain an initial frequency error between −20% and +20% relative to the nominal frequency (1 kHz), and a calibration cycle was performed for each value of Ibias. Figure 12 shows the measured mean frequency error post-calibration, evaluated over 2000 clock periods. The maximum frequency error after calibration was limited to 0.5%, which was consistent with the simulation results.
In these conditions, the GO-CDR was tested in terms of the maximum number of equal consecutive bits (N_m). To perform this measurement, 63-bit packets, characterized by a variable number of zeros and ones, were provided to the GO-CDR (i.e., excluding the AFE) by the Nucleo. The same Nucleo was used to sample DDin using the clock generated by the GO-CDR. The measurements revealed N_m = 63 bits, thus demonstrating that the GO-CDR was able to process packets even when they were made of all zeros or ones. With a 1% clock frequency error, N_m decreased to 50 bits. These results were consistent with both Equations (1) and (2) and the simulation results, which projected N_m ≈ 100 bits and 50 bits with α = 0.005 and 0.01, respectively (i.e., far above the maximum packet length of the WuRx). Furthermore, N_m was not affected by the noise in the GO: the clock rms jitter was found to be 3 µs, thus revealing that N_m was only affected by the free-running GO frequency error.
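The dependence of N_m on the frequency error can be illustrated with a simple first-order drift model. This is a sketch under stated assumptions, not a restatement of Equations (1) and (2): it assumes that the clock phase is reset at each data transition, that the first sampling edge falls at mid-bit, and that each subsequent bit adds a timing error of α·T_b, so sampling fails once the accumulated error reaches half a bit period. The function name and the closed form N_m ≈ 1/(2α) are assumptions of this model; they happen to reproduce the 100-bit and 50-bit projections quoted above.

```python
import math


def max_equal_bits(alpha: float) -> int:
    """Largest run of equal bits before the drift reaches T_b / 2.

    Simplified model (assumption): error after N bits = N * alpha * T_b,
    and correct sampling requires N * alpha * T_b <= T_b / 2.
    """
    if alpha <= 0:
        raise ValueError("alpha must be a positive relative frequency error")
    # A tiny guard avoids flooring 99.999... to 99 due to float rounding.
    return math.floor(1.0 / (2.0 * alpha) + 1e-9)


for alpha in (0.005, 0.01):
    print(f"alpha = {alpha:.3f} -> N_m ~ {max_equal_bits(alpha)} bits")

# Timing margin at 1 kbps compared with the measured 3 us rms jitter.
T_B_US = 1000.0      # bit period at 1 kbps, in microseconds
JITTER_RMS_US = 3.0  # value reported in the text
print(f"half-bit margin / rms jitter = {0.5 * T_B_US / JITTER_RMS_US:.0f}")
```

In this model, the 3 µs rms jitter is more than two orders of magnitude below the 500 µs half-bit margin, which is in line with the observation that N_m is set by the frequency error rather than by noise.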

Discussion and Conclusions
This paper presented a nanowatt WuRx which enabled nodes to receive long data streams in addition to a wake-up codeword. It included an always-on clockless AFE and a data-startable baseband logic based on a gated oscillator clock and data recovery (GO-CDR) circuit. GO-CDR ensured phase alignment between the received data and clock with nanowatt power consumption, thus avoiding the use of power-hungry PLLs or crystal oscillators. Any free-running frequency mismatch between the GO and bitrate did not limit the number of receivable bits, but rather only the maximum number of receivable equal consecutive bits (Nm). To overcome this limitation, the proposed system included a frequency calibration circuit.
The proposed architecture was fabricated in STMicroelectronics 90 nm BCD technology. The circuit was supplied with 0.6 V, and the overall power consumption, excluding the calibration circuit, was 12.8 nW during the rest state and 17 nW at a 1 kbps data rate. Measurements on the GO-CDR calibration circuit revealed that, starting from a ±20% initial error, the maximum free-running frequency error after calibration was ±0.5%. In these conditions, the GO-CDR correctly sampled packets even when they were made of all zeros or ones. In the same conditions, with a 1 kbps OOK-modulated input on a 100 MHz RF carrier, a 10^−3 missed detection rate (MDR) was measured with a −60.5 dBm sensitivity (including the projected input matching network gain), transmitting 16-bit codewords and tolerating 0 errors. The WuRx sensitivity was mainly limited by the AFE. A comparison with an experimental setup where sampling and correlation were performed by an external MCU with a precise clock showed that the GO-CDR degraded the WuRx sensitivity by 0.5 dB. Furthermore, measurements verified that the WuRx received 63-bit packets with MDR = 10^−3, even when they were made of all zeros or ones, with a 0-bit error tolerance and a −59.8 dBm sensitivity (including the projected input matching network gain). In this case, the GO-CDR degraded the sensitivity by 1.25 dB. Finally, the WuRx false alarm rate (FAR) was measured for 24 h time windows, resulting in zero overall false wake-ups.
Table 1 summarizes the system performance and compares it with other state-of-the-art WuRxs reported in the literature. Comparing the Figure-of-Merit (FoM), which is conventionally defined to take into account the sensitivity normalized to the bitrate and the power consumption, shows that our implementation provides performance similar to that of other state-of-the-art WuRxs.
However, it must be remarked that the sensitivity is determined essentially by the AFE, which is not the main focus of this paper. Therefore, we do not comment further on this point.

Notes to Table 1:
(1) Computed assuming 1% activity of reception [15].
(2) Input matching network (IMN) gains of this paper are estimated through simulations.
(3) Sensitivity defined through a 10^−3 missed detection rate (MDR). In this paper it was evaluated using 63-bit packets.
(4) Includes the IMN gains estimated through simulations.
(5) Sensitivity defined through a 0.02 missed detection rate (MDR).
(6) Normalized sensitivity = Sensitivity − 5 log(BW_BB), where BW_BB = bitrate (derived from [8]).
(7) FoM = Normalized sensitivity + 10 log(Power/1 mW).
(8) A half clock cycle phase-shifted RF transmission is sent after the initial transmission to protect against TX/WuRx asynchronization [20].
(9) Maximum packet length is only limited by the maximum timeout value.
Table 1 shows that the proposed WuRx provides state-of-the-art performance in terms of the maximum packet length, error tolerance and maximum number of equal consecutive bits. Oversampling techniques, such as those in [5] and [9], exhibit limitations on the maximum packet length (11 bits and 63 bits, respectively) but do not set a constraint on N_m. Notably, [9] reports the only other WuRx that achieves the same packet length as the wake-up and data receiver we propose (i.e., 63 bits). In [9], a 13-bit error tolerance was accepted, while in our implementation, the same packet length was achieved with 0 errors and was only limited by the timeout register size. Furthermore, in [9], the sensitivity was evaluated with MDR = 20 × 10^−3 and FAR < 1/h, while, as reported above, we characterized the performance of the proposed WuRx with more stringent constraints (i.e., MDR = 10^−3 and FAR = 0).
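The FoM definition given in notes (6) and (7) of Table 1 can be evaluated directly. The sketch below implements those two formulas as stated, assuming base-10 logarithms and the bitrate expressed in bit/s. The function names are illustrative, and the example inputs (−59.8 dBm, 1 kbps, 17 nW) are taken from the text purely to show the arithmetic; the power entry actually used in Table 1 may differ (note (1) mentions a 1% reception activity), so the printed value should not be read as the tabulated FoM.

```python
import math


def normalized_sensitivity_dbm(sensitivity_dbm: float, bitrate_bps: float) -> float:
    """Note (6): sensitivity normalized to the baseband bandwidth (= bitrate)."""
    return sensitivity_dbm - 5.0 * math.log10(bitrate_bps)


def figure_of_merit(sensitivity_dbm: float, bitrate_bps: float, power_w: float) -> float:
    """Note (7): normalized sensitivity plus power expressed relative to 1 mW."""
    norm = normalized_sensitivity_dbm(sensitivity_dbm, bitrate_bps)
    return norm + 10.0 * math.log10(power_w / 1e-3)


# Illustrative inputs only (assumption: active-state power is used as-is).
sens_dbm, bitrate, power = -59.8, 1_000.0, 17e-9
print(f"normalized sensitivity = {normalized_sensitivity_dbm(sens_dbm, bitrate):.1f} dBm")
print(f"FoM = {figure_of_merit(sens_dbm, bitrate, power):.1f}")
```

With this definition, a more negative FoM indicates better performance, since both a lower (better) sensitivity and a lower power consumption reduce it.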
In conclusion, we believe that the proposed scheme is well suited for ultra-low-power WuRxs with the capability to receive long streams.