A 4-bit 36 GS/s ADC with 18 GHz Analog Bandwidth in 40 nm CMOS Process

Abstract: This paper presents a 4-bit 36 GS/s analog-to-digital converter (ADC) employing eight time-interleaved (TI) flash sub-ADCs in a 40 nm complementary metal-oxide-semiconductor (CMOS) process. A wideband front-end matching circuit based on a peaking inductor is designed to increase the analog input bandwidth to 18 GHz. A novel offset calibration that achieves quick detection and accurate correction without affecting the speed of the comparator is proposed, guaranteeing the high-speed operation of the ADC. A clock distribution circuit based on CMOS and current mode logic (CML) is implemented in the proposed ADC, which not only maintains the speed and quality of the high-speed clock, but also reduces the overall power consumption. A timing mismatch calibration is integrated into the chip to achieve fast timing mismatch detection for any input signal that is bandlimited to the Nyquist frequency of the complete ADC system. The experimental results show that the differential nonlinearity (DNL) and integral nonlinearity (INL) are −0.28/+0.22 least significant bit (LSB) and −0.19/+0.16 LSB, respectively. The signal-to-noise-and-distortion ratio (SNDR) is above 22.5 dB and the spurious-free dynamic range (SFDR) is better than 35 dB at 1.2 GHz. An SFDR above 24.5 dB and an SNDR above 18.6 dB are achieved across the entire Nyquist band. With a die size of 2.96 mm × 1.8 mm, the ADC consumes 780 mW from 0.9/1.2/1.8 V power supplies.


Proposed ADC Architecture
An ultra-wideband, high-speed ADC requires three indispensable parts: an analog-to-digital conversion circuit, a high-frequency clock generation circuit, and a high-rate data output circuit. The analog-to-digital conversion circuit converts the signal while meeting the bandwidth and sampling rate requirements; the high-frequency clock generation circuit uses an internal phase-locked loop (PLL) to generate the high-frequency clock that controls the operation of the ADC; and the high-speed interface receives the high-speed parallel data from the ADC, converts it to serial data, and outputs it. Figure 1a illustrates the overall architecture of the proposed ADC. For this chip, the requirements on sampling rate and signal bandwidth are more demanding than the requirement on accuracy, so the flash structure is preferred. Because of the cut-off frequency of the CMOS process, the maximum sampling rate achievable by a single flash ADC is limited. To retain some margin, a single ADC channel is designed to work at a conversion rate of 4.5 GS/s, and 8 channels are time-interleaved to reach the overall conversion rate of 36 GS/s. The overall timing diagram of the ADC is shown in Figure 1b. CLK_m is the 18 GHz main clock signal inside the chip. After frequency division, 8 sampling clock signals with different phases and a 25% duty cycle are generated to drive the channels. One conversion period of a sub-channel is 222 ps, of which 55.5 ps is used by the sampling circuit to sample the output signal of the input buffer and 166.5 ps is used by the data conversion circuit to quantize the sampled signal. Only 2 channels are ever in the sampling phase at the same time, so the load on the input buffer stage is small and does not cause a large attenuation of the high-frequency input signal. The output load is also constant, which helps guarantee the linearity of the output signal.
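The timing budget quoted above follows directly from the interleaving factor and the duty cycle of the sampling clocks. The short sketch below reproduces that arithmetic; the variable names are illustrative and are not taken from the design.

```python
# Timing budget of one sub-ADC channel, using the figures quoted in the text.
F_S_TOTAL = 36e9      # overall conversion rate, samples per second
N_CHANNELS = 8        # time-interleaved flash sub-ADCs
DUTY_CYCLE = 0.25     # duty cycle of each sampling clock

f_channel = F_S_TOTAL / N_CHANNELS        # per-channel conversion rate
period_ps = 1e12 / f_channel              # one conversion period, in ps
sampling_ps = DUTY_CYCLE * period_ps      # track (sampling) window
conversion_ps = period_ps - sampling_ps   # time left for quantization

# Average number of channels tracking the input at any instant.
channels_sampling = N_CHANNELS * DUTY_CYCLE

print(f"per-channel rate  : {f_channel / 1e9:.1f} GS/s")
print(f"conversion period : {period_ps:.1f} ps")
print(f"sampling window   : {sampling_ps:.1f} ps")
print(f"quantization time : {conversion_ps:.1f} ps")
print(f"channels sampling : {channels_sampling:.0f}")
```

The unrounded values are about 222.2 ps, 55.6 ps and 166.7 ps, which the text rounds to 222 ps, 55.5 ps and 166.5 ps.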
For high-speed ADCs, a high-frequency clock is required as the reference clock signal that drives all parts of the ADC. Therefore, a high-frequency PLL is integrated inside the chip. Compared with feeding in an external high-speed clock directly, using an on-chip PLL helps ensure the quality of the high-frequency clock signal and thereby the conversion performance of the ADC [16,17].
A wideband front-end matching circuit based on a peaking inductor is designed to increase the analog input bandwidth. The input buffer also reduces distortion and improves the driving ability. Although it inevitably increases the noise and power of the ADC, a highly linear input buffer is essential to achieve the targeted linearity at GHz sampling rates.
The quantized digital codes of each channel are aligned and then demultiplexed to reduce the transmission frequency for the subsequent digital processing. Because of the high conversion rate of the ADC, the total output throughput is very large. In this design, high-speed serializers are used to output the converted data. A serializer combines multiple parallel low-speed data streams into one high-speed serial stream, which simplifies the data transmission structure and enables high-speed transmission. Due to process limitations, the speed of a single serializer is limited. To keep the data output stable and easy for the subsequent stage to receive, a 9 Gbps serializer module is used in the design. Sixteen serializer modules work in parallel, and the overall data output rate reaches 144 Gbps.
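The serializer count can be checked from the resolution and sampling rate. The sketch below is plain arithmetic on the numbers given in the text; it assumes the raw 4-bit codes are transmitted without any line-coding overhead, which the text does not state.

```python
# Output throughput and number of serializer lanes (integer arithmetic, bit/s).
BITS_PER_SAMPLE = 4
SAMPLE_RATE = 36_000_000_000        # 36 GS/s
SERIALIZER_RATE = 9_000_000_000     # 9 Gbps per serializer module

throughput = BITS_PER_SAMPLE * SAMPLE_RATE
# Ceiling division: smallest number of lanes that carries the full throughput.
lanes = -(-throughput // SERIALIZER_RATE)

print(f"throughput: {throughput / 1e9:.0f} Gbps, lanes needed: {lanes}")
```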
In addition, some circuits in the chip need to be calibrated, including the comparator offset and the timing mismatch. A serial peripheral interface (SPI) circuit is integrated inside the chip to configure the related on-chip registers.
The analog-to-digital conversion and calibration circuits are the core parts of the ADC. The block diagram of a single-channel ADC is shown in Figure 2. After passing through the front-end matching circuit and the input buffer, the input analog signal enters the track-and-hold (T/H) module and is sampled at 4.5 GS/s. The sampled signal is then sent to the comparator array, which outputs a 4-bit quantization code. To improve the overall performance of the ADC, calibrations of timing mismatch and comparator offset are implemented.

Wideband Front-End Matching Circuit
The front-end matching circuit is the first stage through which the high-speed ADC receives external input signals, and its performance directly determines the linearity and bandwidth of the ADC. As shown in Figure 3a, the common way to receive the signal from the source is through a 50 Ω matching resistor. In a practical chip, however, several non-ideal factors attenuate the high-frequency signal. When the chip is packaged, the chip pads and the package pins are connected electrically by bonding wires. A bonding wire is generally long, so it introduces a large equivalent inductance. In addition, the chip pad is a large-area metal plate with a large parasitic capacitance to ground in the layout. Taking these two non-ideal factors into account gives the equivalent circuit shown in Figure 3b, where L_wb is the equivalent inductance of the bonding wire between the chip pad and the package pin, and C_pad is the parasitic capacitance from the pad to ground. This circuit behaves as a low-pass filter: at high input frequencies, the parasitics attenuate the signal and reduce the analog input bandwidth.
To counter this attenuation and increase the analog bandwidth, a wideband front-end matching circuit is used in this design, as shown in Figure 4. An on-chip inductor L_in is placed in series before the load resistance R_L. With suitably chosen parameters, the resulting resonant circuit shows a high-pass characteristic that increases the bandwidth [18,19]. The load impedance seen from the signal source is given by Equation (1), and the voltage at point Q is given by Equation (2). From Equation (2), the voltage at point Q as a function of the input signal frequency can be obtained, as shown in Figure 5. The voltage-frequency curve has one resonance frequency point: the voltage at point Q rises with the input frequency, reaches its maximum at f_1, and then decays quickly. The frequency f_1 can therefore be set to the highest frequency of the input signal by adjusting the size of L_in. When a high-frequency signal is input, the amplitude of the signal received by the load then increases, which boosts the high-frequency signal and extends the analog input bandwidth of the circuit. Compared with more complex designs, this wideband front-end matching circuit expands the bandwidth effectively at a small cost.
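The peaking effect can be illustrated numerically with a minimal sketch. It assumes a lumped model that is not confirmed by the text: a 50 Ω source in series with the bond-wire inductance L_wb, the pad capacitance C_pad at node Q, and the termination branch L_in + R_L from node Q to ground. All component values and the function name `q_node_gain` are hypothetical and chosen only to make the trend visible.

```python
import math

def q_node_gain(freq_hz, l_in, r_s=50.0, r_l=50.0, l_wb=1e-9, c_pad=200e-15):
    """|V_Q / V_source| for the assumed lumped model (hypothetical values)."""
    w = 2 * math.pi * freq_hz
    # Termination branch: on-chip inductor in series with the load resistor.
    y_branch = 1 / complex(r_l, w * l_in)
    # Pad capacitance in parallel with the termination branch at node Q.
    y_q = y_branch + complex(0.0, w * c_pad)
    z_q = 1 / y_q
    # Source resistance and bond-wire inductance in series with node Q.
    z_series = complex(r_s, w * l_wb)
    return abs(z_q / (z_series + z_q))

L_IN = 0.5e-9  # assumed peaking inductance, in henries

for f_ghz in (0, 6, 12, 18):
    plain = q_node_gain(f_ghz * 1e9, 0.0)
    peaked = q_node_gain(f_ghz * 1e9, L_IN)
    print(f"{f_ghz:>2} GHz  without L_in: {plain:.3f}  with L_in: {peaked:.3f}")
```

With these assumed values the gain at 18 GHz is clearly higher with the series inductor than without it, while the low-frequency gain is unchanged. Real values of L_in would be tuned against the extracted parasitics.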

Novel Calibration of Comparator Offset
Each sub-ADC uses the flash structure to achieve a fast single-channel conversion rate. As a core module of the flash ADC, the comparator plays a key role in signal quantization: it quantizes the input analog signal to a digital code of 0 or 1. The speed and accuracy of the quantization directly determine the overall performance of the ADC. The offset of a high-speed comparator is easily affected by non-ideal factors such as layout asymmetry and process mismatch. The offset voltage of a comparator is equivalent to a shift in the conversion curve, which causes an error in the conversion result.
A novel offset calibration for the comparators in the flash ADC is proposed in this paper, as shown in Figure 6. In general, calibration consists of detection and correction. A statistics-based method is proposed for the offset detection. The comparator array contains fifteen comparators with different thresholds, and each of them needs to be calibrated. The proposed detection works as follows. First, offset detection is performed on the comparator whose threshold voltage is the middle value of the array. Ideally, if a statistically symmetrical signal such as a sine wave is input, this comparator should output approximately equal numbers of 0s and 1s. Based on this observation, the output distribution of the middle comparator is counted over a certain number of sampling points while a symmetrically distributed signal is input. If the proportion of 1s is greater/less than the proportion of 0s, the threshold of this comparator is adjusted in the low/high direction through the offset correction circuit. The convergent iteration, based on the least mean square (LMS) algorithm, is performed according to Equation (3), where D_cal[i + 1] is the (i + 1)th digital detection code of the offset calibration, D_cal[i] is the ith digital detection code, C_d is the result of comparing the numbers of 1s and 0s, and µ is the convergence step factor. If µ is a constant, the fixed step makes the convergence time proportional to the offset: the larger the comparator offset, the longer the convergence takes. Figure 7a shows the simulated calibration convergence with a fixed step. To address this, a dynamic step adjustment method is implemented in the proposed detection. The convergence step is no longer fixed but varies with the ratio of 0s and 1s. Figure 7b shows the simulated calibration convergence with the proposed dynamic step. Convergence with the proposed dynamic step is clearly faster than with a fixed step. After a number of loop iterations, the offset calibration value of the comparator converges to a suitable value. At that point, the comparator outputs approximately equal numbers of 0s and 1s, which indicates that the offset calibration of this comparator is complete.
For the offset detection of the other comparators, a comparator threshold reference voltage adjustment circuit is designed on chip. According to the distribution of the comparator thresholds, the threshold reference voltage of each of the other comparators is adjusted in turn to the threshold of the middle comparator, and offset detection is then performed with the method described above until all 15 comparators have been processed.
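The difference between a fixed and a dynamic convergence step can be sketched with a small behavioral model. This is an illustration under stated assumptions, not the on-chip implementation: the comparator is assumed to output 1 when the input exceeds its effective threshold, the update is assumed to take the form D_cal[i+1] = D_cal[i] + µ·C_d with C_d = ±1, and the dynamic step is assumed to be proportional to the measured imbalance between 1s and 0s. `LSB`, `gain` and `tol` are hypothetical parameters.

```python
import math

N_SAMPLES = 4096
LSB = 0.005  # hypothetical threshold shift per calibration code, in volts

# One period of a unit-amplitude sine: a statistically symmetrical input.
SAMPLES = [math.sin(2 * math.pi * k / N_SAMPLES) for k in range(N_SAMPLES)]

def imbalance(threshold):
    """(ones - zeros) / total for a comparator that outputs 1 above threshold."""
    ones = sum(1 for x in SAMPLES if x > threshold)
    return (2 * ones - N_SAMPLES) / N_SAMPLES

def calibrate(offset, dynamic, gain=200.0, tol=0.01, max_iter=500):
    """Return (final code, number of updates) of the sign-driven loop."""
    code = 0
    for iteration in range(max_iter):
        imb = imbalance(offset + code * LSB)
        if abs(imb) < tol:
            return code, iteration
        c_d = 1 if imb > 0 else -1   # result of comparing the 1 and 0 counts
        # Fixed step: always one code. Dynamic step: scaled by the imbalance.
        mu = max(1, round(gain * abs(imb))) if dynamic else 1
        code += mu * c_d             # assumed form of the update equation
    return code, max_iter

OFFSET = 0.3  # hypothetical comparator offset, in volts
fixed_code, fixed_iters = calibrate(OFFSET, dynamic=False)
dyn_code, dyn_iters = calibrate(OFFSET, dynamic=True)
print(f"fixed step  : code {fixed_code}, {fixed_iters} updates")
print(f"dynamic step: code {dyn_code}, {dyn_iters} updates")
```

Both loops settle near the same code, but the dynamic step needs fewer updates because the first corrections are large while the imbalance is large. On silicon the sign convention may be the opposite of the one assumed here.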
The common offset correction methods for dynamic comparators are as follows [20,21]. One way is to add adjustable capacitors at the outputs of the comparator. By adjusting the capacitance on the two sides, the discharge speeds of the two branches can be made equal, thereby eliminating the offset. However, this method increases the load of the circuit and reduces the conversion speed of the comparator. Another way is to bring out the substrate of the differential input MOSFETs and change their threshold voltage by adjusting the substrate voltage, which reduces the input offset. This method does not affect the normal operation of the circuit, but special deep-well devices are needed to isolate the substrate of the NMOSFETs in a CMOS process. A third method adds an auxiliary differential pair. By adjusting the gate voltages of this calibration differential pair, the offset of the circuit itself is cancelled. The disadvantage of this method is that it increases the noise of the comparator.
The schematic diagram of the two-stage dynamic comparator is shown in Figure 8. The first-stage pre-amplifier in front of the dynamic comparator isolates the reset signal from the input signal, thereby greatly reducing the noise fed back by the comparator to the input. The pre-amplifier also converts the input common-mode voltage to an appropriate range, which increases the regeneration speed of the second-stage dynamic comparator. The positive feedback in the second-stage comparator produces the comparison result quickly, ensuring the speed of the comparator.
To correct the comparator offset, we propose adjusting the bias current of the pre-amplifier. In the pre-amplifier, a current mirror structure is generally used to provide the gate bias of the PMOS transistor load, and the equivalent load of the PMOS transistor can be changed by adjusting the magnitude of the bias current, which calibrates the comparator offset. In Figure 8, the PMOS transistor load on the right side of the pre-amplifier is connected to the fixed bias current I_bias, and the load on the left side is connected to the adjustable bias current I_cal. Assuming that the input offset voltage of the comparator is V_offset, the voltage difference generated at the output of the pre-amplifier is $\Delta V_{out} = g_{M1,2} R_0 V_{offset}$ (Equation (4)), where R_0 is the output resistance of the pre-amplifier and g_M1,2 is the transconductance of transistors M1 and M2. M1 and M2 are thin-oxide devices that meet the speed requirement of the comparator. To cancel this output voltage difference, the calibration current I_cal needs to be adjusted according to Equation (5), in which K is the ratio of the current mirror. This offset correction method neither increases the load on the intermediate nodes of the circuit nor affects its normal timing. It does not affect the speed of the comparator, so it is suitable for high-speed circuits. The correction current I_cal is generated by a current-steering DAC, which is controlled by the offset detection output code. Ideally, the bit weights of the 7-bit input code of the current-steering DAC are 1, 2, 4, 8, 16, 32 and 64 (taking the minimum step of the current-steering DAC output as 1 LSB). In practice, factors such as process and layout cause the equivalent weights to drift. If the circuit corresponding to the 32-weighted input bit is affected by process mismatch and its equivalent weight drifts to 36, the output levels of 33/34/35 LSB disappear. Figure 9a shows the transfer curve of the non-ideal 7-bit DAC in this case. If the ideal offset calibration convergence value is 34 LSB, the DAC output will eventually jump between 32 LSB and 36 LSB, as shown in Figure 10a, which degrades the accuracy of the calibration. To achieve accurate offset correction, a mismatch-insensitive offset correction is proposed. A correction with 7-bit precision is taken as the example.
First of all, an 8-bit DAC with a redundant bit is designed based on the original 7-bit current-steering DAC in analog domain. The redundant bit weight is set as 8 LSB to cover mismatches less than 8 LSB. The ideal bit-weights of the 8-bit current-steering DAC are 1, 2, 4, 8, 8, 16, 32 and 64 now.
Second, a redundant encoder module is integrated to expand the 7-bit D_cal(n) into the 8-bit D_R,cal(n) in the digital domain. It works in four steps:
1. Divide the full scale of the 7-bit digital control code into 16 intervals of length 8.
2. Judge the slope of the interval code of D_cal(n).
3. Determine, according to the slope of D_cal(n), whether to use the two 8-weighted bits in place of the 16-weighted bit.
4. Obtain the 8-bit D_R,cal(n) with a redundant bit.
The conversion characteristic of the non-ideal 8-bit redundant current-steering DAC is shown in Figure 9b. With this algorithm, the redundant current-steering DAC output can be adjusted according to the slope of the offset detection output code. Figure 10b shows the simulated calibration convergence with the non-ideal 8-bit redundant current-steering DAC based on the proposed algorithm. The convergence eventually reaches 34 LSB when the ideal offset calibration convergence value is 34 LSB.
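Why the redundant bit removes the missing levels can be checked by enumerating the output levels each set of bit weights can produce. The sketch below only demonstrates coverage; it does not implement the slope-based encoder described in the four steps, and the helper name `reachable_levels` is illustrative.

```python
def reachable_levels(weights):
    """All output levels (in LSB) that a DAC with these bit weights can produce."""
    levels = {0}
    for w in weights:
        # Each bit can be off (keep the level) or on (add its weight).
        levels |= {level + w for level in levels}
    return levels

# 7-bit binary DAC whose 32-weighted bit has drifted to 36 (example from the text).
BINARY_MISMATCHED = [1, 2, 4, 8, 16, 36, 64]
# Same mismatch, with the redundant 8-weighted bit added.
REDUNDANT_MISMATCHED = [1, 2, 4, 8, 8, 16, 36, 64]

binary_levels = reachable_levels(BINARY_MISMATCHED)
redundant_levels = reachable_levels(REDUNDANT_MISMATCHED)

TARGET = 34  # ideal convergence value used in the text, in LSB
gaps = [v for v in range(max(redundant_levels) + 1) if v not in redundant_levels]

print("34 LSB reachable, binary DAC   :", TARGET in binary_levels)
print("34 LSB reachable, redundant DAC:", TARGET in redundant_levels)
print("gaps in redundant DAC range    :", gaps)
```

With the extra 8-weighted bit, levels such as 34 LSB can be built from the smaller weights (for example 16 + 8 + 8 + 2) without using the drifted bit, so the loop has a level to settle on.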
The proposed offset calibration algorithm achieves quick detection and accurate correction at low cost. It also does not affect the speed of the comparator, so it is suitable for high-speed circuits.

Multi-Phase High-Speed Clock Generation and Calibration
A time-interleaved ADC requires multi-phase clock signals to drive its sub-ADCs. Usually, only one main clock signal is available on the chip, and the clock generation circuit must derive the multi-phase signals from it. The sampling rate of the chip in this design is 36 GS/s, and the main clock signal generated by the on-chip PLL has a frequency of 18 GHz. According to the timing of the interleaved ADC in Figure 1b, each channel requires a 4.5 GHz clock signal. Therefore, the 18 GHz main clock must be divided into multi-phase 4.5 GHz clocks that drive the channels in sequence.
The CMOS clock divider circuit has a simple structure and no static power consumption, but it is slow and therefore suited to low-frequency clock processing; the CML clock divider circuit is fast, but its power consumption and area are large [22,23]. As a compromise between speed and power consumption, a multi-phase high-speed clock generation circuit based on both CMOS and CML is presented in this design. A CML clock divider is used in the first stage to divide the high-frequency clock signal and preserve the quality of the high-frequency output clock. This division halves the clock frequency. A CMOS clock divider is then used in the second stage to reduce the power consumption of the overall circuit. Figure 11a is a schematic diagram of the first-stage CML frequency-dividing circuit, which divides the differential master clock into 4 different-phase 9 GHz clock signals CLK_4<1:4>; Figure 11b is a schematic diagram of the second-stage frequency-dividing circuit, which divides the 4 different-phase 9 GHz clock signals into 8 different-phase 4.5 GHz clock signals CLK_8<1:8>. The voltage waveform of each node in the circuit is shown in Figure 11c. CLK_8<1:8> are 4.5 GHz clock signals with a 50% duty cycle, which are used as the clock drive signals for the comparator array of each channel. The sample-and-hold circuit of the ADC requires a 4.5 GHz clock signal with a 25% duty cycle. These signals can be generated by simple combinational logic between CLK_8<1:8> and CLK_4<1:4>. As shown in Figure 12, combining C4<2> with C8<1> and with C8<5> yields the 4.5 GHz clock signals C1 and C5 with a 25% duty cycle, which drive the track-and-hold circuits of the corresponding channels. Figure 12 shows the processing for only two phases; the clocks of the other phases are generated in the same way. The proposed clock divider circuit guarantees the working speed without causing large power consumption.
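The 25% duty-cycle clocks can be reproduced with a discrete-time sketch. It assumes the combinational logic is an AND gate and that each 4.5 GHz phase is paired with the 9 GHz phase whose rising edge is aligned with it; the exact gate, phase alignment and signal indexing of Figure 12 are not given in the text, so the zero-based indices and function names below are illustrative.

```python
TICKS = 16     # one 4.5 GHz period split into 16 ticks (about 13.9 ps each)
N_PHASES = 8

def clk8(phase, t):
    """4.5 GHz, 50% duty clock; successive phases are shifted by 2 ticks."""
    return ((t - 2 * phase) % TICKS) < TICKS // 2

def clk4(phase, t):
    """9 GHz, 50% duty clock; successive phases are shifted by 2 ticks."""
    return ((t - 2 * phase) % (TICKS // 2)) < TICKS // 4

def sample_clk(channel, t):
    """Sampling clock: AND of a 4.5 GHz phase and the aligned 9 GHz phase."""
    return clk8(channel, t) and clk4(channel % 4, t)

# Duty cycle of every sampling clock over one 4.5 GHz period.
duty = [sum(sample_clk(ch, t) for t in range(TICKS)) / TICKS
        for ch in range(N_PHASES)]
# Number of channels whose sampling clock is high at each tick.
overlap = [sum(sample_clk(ch, t) for ch in range(N_PHASES))
           for t in range(TICKS)]

print("duty cycles     :", duty)
print("channels sampling per tick:", overlap)
```

In this model channels that are four phases apart share the same 9 GHz phase, which matches the text's use of one C4 signal for both C1 and C5, and exactly two channels are in the sampling phase at every instant.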
The 8 different-phase clock signals generated by the above scheme are used to drive the eight time-interleaved sub-ADCs. Ideally, the phase difference between adjacent clock phases is 45°. However, imperfect layout symmetry makes the routing lengths of the multi-phase clock signals differ, which introduces mismatch in the delays of the multi-phase clocks. Process mismatch in chip production also causes timing mismatch. As a result, the output clocks of the frequency-dividing circuit reach the channels with different delays. The timing mismatch among the channels reduces the overall linearity of the time-interleaved ADC [24,25]. To solve this problem, a timing mismatch calibration module is integrated to detect and correct the timing mismatch [12]. High-speed finite impulse response (FIR) filters can correct the timing mismatch in the digital domain, but they are complex and consume a large amount of power [26,27]. A variable delay line (VDL) is a simple and effective way to correct the timing mismatch in the analog domain, but the timing mismatch detection is not easy to realize. A wideband timing mismatch detection (WTD) is proposed and implemented on chip, as shown in Figure 13.
In Equation (6), τ_m is the additional phase of channel m due to timing mismatch. Channel 1 is used as the reference, so τ_1 can be regarded as 0.
τ_m can be extracted from the timing mismatch error e_m, which is defined in Equation (7). It is difficult to prove the relationship between e_m and τ_m directly, but if the absolute value operation is approximated by the squaring function, some conclusions can be derived [12]. With this approximation, e_m simplifies to Equation (8), and substituting Equation (6) into Equation (8) gives Equation (9).
Because the timing mismatch is very small relative to T_s, Equation (9) can be approximated by Equation (10). The timing mismatch error of channel 8 is given by Equation (11). In these expressions, the derivative of the autocorrelation R_x(t) is given by Equation (12), where S_x(f) denotes the signal spectrum.
Since S_x(f) is an even function of f, Equation (13) follows directly. If the input signal x(t) is bandlimited to the Nyquist frequency, the derivative R′_x(T_s) can be expressed as Equation (14) based on the mean value theorem for integrals.
Because the term −4πζ sin(2πζT_s) is negative and the integral of S_x(f) is positive, it follows that R′_x(T_s) < 0. This indicates that 2τ_m − τ_(m+1) mod M − τ_(m−1) and e_m have opposite signs whenever there is a timing mismatch among the channels. e_m is therefore accumulated with a negative sign and the result is fed back to the timing mismatch correction until e_m reaches zero, which indicates the completion of detection.
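The sign argument can be sketched as follows. This is a reconstruction from the Wiener-Khinchin relation and the terms quoted in the text, assuming that the negative quantity is the derivative of the autocorrelation evaluated at T_s, that S_x(f) is a non-negative power spectrum, and that f_s = 1/T_s is the sampling rate of the complete ADC. It is not necessarily identical to the original Equations (12) to (14).

```latex
\begin{align}
R_x(t)    &= \int_{-\infty}^{\infty} S_x(f)\, e^{j 2\pi f t}\, \mathrm{d}f, \\
R_x'(t)   &= \int_{-\infty}^{\infty} j 2\pi f\, S_x(f)\, e^{j 2\pi f t}\, \mathrm{d}f
           = -4\pi \int_{0}^{\infty} f\, S_x(f) \sin(2\pi f t)\, \mathrm{d}f
             \quad (S_x \text{ even}), \\
R_x'(T_s) &= -4\pi \int_{0}^{f_s/2} f \sin(2\pi f T_s)\, S_x(f)\, \mathrm{d}f
           = -4\pi \zeta \sin(2\pi \zeta T_s) \int_{0}^{f_s/2} S_x(f)\, \mathrm{d}f,
             \qquad \zeta \in [0, f_s/2].
\end{align}
```

For 0 < ζ < f_s/2 the argument 2πζT_s lies between 0 and π, so the sine is positive and the derivative is negative, which is where the bandlimiting condition enters.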
Compared with the algorithm proposed in [12], the WTD detects the timing mismatch of all channels at the same time, which saves the time otherwise spent waiting for intermediate channels to finish calibration before other channels can be calibrated. The proposed algorithm therefore achieves fast timing mismatch detection as long as the input signal is bandlimited to the Nyquist frequency of the complete ADC system.
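The sign relation between the error and the timing skews can be checked with a small behavioral simulation. Because the definition of e_m is not reproduced in this text, the sketch assumes one that matches the stated sign relation: the mean squared difference between channel m and the next channel minus the mean squared difference between channel m and the previous channel. The skews in `TAU`, the input frequency `F_IN` and the function name are hypothetical.

```python
import math

def wtd_errors(tau, f_in, rounds=2000):
    """Timing-mismatch error of every channel from one batch of samples.

    tau  : per-channel timing skew, in units of the sampling period T_s
    f_in : input frequency in cycles per sample (below 0.5, i.e. Nyquist)
    """
    m_ch = len(tau)
    total = rounds * m_ch + 1
    # Sample n is taken by channel n mod M at time n + tau (T_s normalized to 1).
    x = [math.sin(2 * math.pi * f_in * (n + tau[n % m_ch])) for n in range(total)]
    errors = []
    for m in range(m_ch):
        idx = range(m + m_ch, rounds * m_ch, m_ch)   # samples taken by channel m
        nxt = sum((x[n + 1] - x[n]) ** 2 for n in idx) / len(idx)
        prv = sum((x[n] - x[n - 1]) ** 2 for n in idx) / len(idx)
        errors.append(nxt - prv)                     # assumed definition of e_m
    return errors

# Hypothetical skews as fractions of T_s; the first channel is the reference.
TAU = [0.0, 0.03, -0.02, 0.01, -0.03, 0.02, 0.0, -0.01]
F_IN = 0.137  # bandlimited, non-coherent test tone

errors = wtd_errors(TAU, F_IN)
M = len(TAU)
second_diff = [2 * TAU[m] - TAU[(m + 1) % M] - TAU[m - 1] for m in range(M)]

for m in range(M):
    print(f"channel {m + 1}: e_m = {errors[m]:+.4f}, "
          f"2*tau_m - tau_next - tau_prev = {second_diff[m]:+.3f}")
```

All eight errors are obtained from the same batch of samples, and each one has the opposite sign to the corresponding skew combination, so every channel can be nudged toward its neighbours at the same time.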

Measurement Results
The ADC occupies an area of 2.96 mm × 1.8 mm in a 40 nm CMOS process. Figure 14 shows its micrograph, including the eight sub-ADCs, the PLL, the digital module and the output serializers. To keep the power supply voltage clean and stable, a large number of decoupling capacitors fill the spaces between the modules. Figure 15 shows the static performance in terms of differential nonlinearity (DNL) and integral nonlinearity (INL). The measured DNL and INL before the TI mismatch calibration and offset calibration are −0.77/+0.96 LSB and −0.76/+0.76 LSB, respectively. After the calibration, they improve to −0.28/+0.22 LSB and −0.19/+0.16 LSB, as shown in Figure 16. The output spectra before and after calibration illustrate the effect of the calibration. Figure 17a shows the output spectrum of the ADC at a sampling rate of 36 GS/s with a 1.2 GHz sinusoidal input, before the inter-channel errors and the comparator offsets are calibrated. In addition to the high-order harmonic components of the signal, there are spurs caused by offset, gain and timing mismatch. The spurs generated by the offset mismatch appear at frequencies of k·fs/8, and the spurs produced by the gain and timing mismatch appear at k·fs/8 ± fin. Because the offsets of the sub-ADC comparator arrays are not yet calibrated, there are also many other spurs in the spectrum. Before calibration, the SNDR of the output signal is 13.05 dB and the SFDR is 20.83 dB. Figure 17b shows the output spectrum under the same conditions after the inter-channel mismatch and the comparator offsets are calibrated. The spurs generated by the inter-channel mismatch and some harmonics are greatly reduced. The SNDR of the output signal is 22.57 dB and the SFDR is 35.71 dB.
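The spur locations for the 1.2 GHz measurement can be listed from the two expressions above. The sketch works in integer MHz to avoid rounding and folds every spur into the first Nyquist zone; the function name and the choice of k = 1 to 7 are illustrative.

```python
def interleaving_spurs(f_s, f_in, n_channels=8):
    """Spur frequencies folded into the first Nyquist zone (same unit as f_s)."""
    def fold(f):
        f = f % f_s
        return f_s - f if f > f_s / 2 else f

    # Offset mismatch: k * fs / M.
    offset = sorted({fold(k * f_s // n_channels) for k in range(1, n_channels)})
    # Gain and timing mismatch: k * fs / M +/- fin.
    gain_timing = sorted({fold(k * f_s // n_channels + sign * f_in)
                          for k in range(1, n_channels) for sign in (-1, 1)})
    return offset, gain_timing

F_S_MHZ = 36_000   # 36 GS/s
F_IN_MHZ = 1_200   # 1.2 GHz input tone

offset_spurs, gain_timing_spurs = interleaving_spurs(F_S_MHZ, F_IN_MHZ)
print("offset spurs (MHz)     :", offset_spurs)
print("gain/timing spurs (MHz):", gain_timing_spurs)
```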
When the input signal frequency is 8.1 GHz, the output spectrum is as shown in Figure 18a; the SNDR is 21.24 dB and the SFDR is 29.12 dB. When the input frequency is 17.1 GHz, the output spectrum is as shown in Figure 18b; the measured SNDR is 18.6 dB and the SFDR is 24.50 dB. The spurs produced by gain and timing mismatch become the main spurious components: the higher the input frequency, the greater the impact of the timing mismatch on the output linearity. Figure 19 shows the calibrated ADC performance versus input frequency at 36 GS/s. Over the input frequency range of 0 to 18 GHz, the SNDR of the ADC is at least 18.6 dB and the SFDR is at least 24.5 dB. The ADC core consumes 780 mW from 0.9/1.2/1.8 V supplies, and a Walden figure of merit (FOM) of 1.9 pJ/step is achieved. Table 1 compares our results with previously published ADCs. Our work achieves a relatively good SFDR with the help of the proposed calibration. Compared with ADCs manufactured in SiGe processes, the proposed ADC manufactured in a CMOS process has lower power consumption and achieves a competitive FOM.
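The reported figure of merit can be cross-checked with the standard Walden definition, FOM = P / (2^ENOB · f_s) with ENOB = (SNDR − 1.76) / 6.02. The text does not say which SNDR value enters the calculation; the sketch assumes the 22.57 dB measured at 1.2 GHz.

```python
def walden_fom(power_w, sndr_db, f_s):
    """Walden figure of merit in joules per conversion step."""
    enob = (sndr_db - 1.76) / 6.02     # effective number of bits
    return power_w / (2 ** enob * f_s)

# 780 mW, SNDR at 1.2 GHz (assumed to be the value used), 36 GS/s.
fom = walden_fom(0.780, 22.57, 36e9)
print(f"Walden FOM: {fom * 1e12:.2f} pJ/step")
```

Under this assumption the result is just under 2 pJ/step, close to the reported 1.9 pJ/step. Using the 18.6 dB SNDR measured near Nyquist instead would give a noticeably larger value.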

Conclusions
A 4-bit 36 GS/s analog-to-digital converter (ADC) employing eight time-interleaved flash sub-ADCs with calibration is presented in this paper. A wideband front-end matching circuit based on a peaking inductor is designed to increase the analog input bandwidth to 18 GHz. A novel offset calibration that achieves fast detection and accurate correction without affecting the speed of the comparator is proposed, guaranteeing the high-speed operation of the ADC. To balance the speed, quality and power of the high-speed clock, a clock distribution circuit based on CMOS and CML is implemented in the proposed ADC. A timing mismatch calibration is also integrated into the chip, which achieves timing mismatch detection for input signals bandlimited to the Nyquist frequency. The measurement results show that the proposed ADC achieves an analog bandwidth of 18 GHz at a sampling rate of 36 GS/s. The DNL and INL are −0.28/+0.22 LSB and −0.19/+0.16 LSB, respectively. The SNDR is above 22.5 dB and the SFDR is better than 35 dB at 1.2 GHz. An SFDR above 24.5 dB and an SNDR above 18.6 dB are achieved across the entire Nyquist band. With a die size of 2.96 mm × 1.8 mm, the ADC consumes 780 mW from 0.9/1.2/1.8 V power supplies. The calculated Walden figure of merit reaches 1.9 pJ/step.

Author Contributions:
H.J. designed the circuits, analyzed the measurement data, and wrote the manuscript. X.G., X.Z., X.X., D.W., and L.Z. assisted with the circuit simulation and implementation. X.G., X.Z. and J.W. guided the design, reviewed the manuscript and gave suggestions for revision. X.L. gave valuable guidance and confirmed the final version of the manuscript. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.