A High Phase Detection Density and Low Space Complexity Mueller-Muller Phase Detector for DB PAM-4 Wireline Receiver

Abstract: A Mueller-Muller phase detector (MM PD) technique for duo-binary four-level pulse amplitude modulation (DB PAM-4) with low complexity and high phase-detection density is presented. The low complexity covers both the phase-detection logic and the space complexity of data processing. Waveform sifting simplifies 175 specific waveform changes into five fuzzy waveform change trends, reducing the complexity of subsequent phase detection. Because the data are sampled (quantized) before waveform sifting, the data bit width is reduced from 8 bits to 3 bits, which reduces the data dimensionality, greatly reduces the amount of subsequent auxiliary data, reduces the number of basic devices by 13.7%, and lowers the space complexity of data processing. The coherent coding of DB PAM-4 combined with waveform sifting raises the phase-detection density from 50% to 65%, a relative improvement of about 30% in both phase-detection density and phase-detection gain, and improves the jitter tolerance. Simulation of a clock and data recovery (CDR) model built in Cadence verifies the fast locking capability of the CDR.


Introduction
In high-speed serial interface design, multi-level modulation such as four-level pulse amplitude modulation (PAM-4) is now widely applied. In the peripheral component interconnect express (PCIe) protocol, PAM-4 has been adopted since PCIe 6.0. In optoelectronic interfaces, PAM-4 is the current mainstream electrical signaling and is steadily replacing non-return-to-zero (NRZ). With transmission bandwidth limited, high-order modulation has become an inevitable way to raise the transmission rate and the bandwidth utilization.
In high-speed signal transmission, compressing the signal bandwidth is an effective way to improve transmission efficiency and signal quality. Duo-binary (DB) modulation compresses the signal bandwidth at the same transmission rate [1]. DB PAM-4 applies DB modulation to PAM-4 to reduce the required transmission bandwidth. At the same working rate, the bandwidth required by DB PAM-4 is greatly reduced, but its 7-level characteristic compresses the signal eye height to half [2][3][4], which makes the subsequent decision more difficult.
The high-speed signal channel is severely lossy, and clock and data recovery (CDR) must extract the clock information contained in the data, generate a synchronous clock to resample the data, and complete data retiming and synchronous clock recovery. Phase extraction in a conventional CDR relies on the Bang-Bang (BB) phase-detection technique [5][6][7], which requires two samples per unit interval (UI) to obtain the edge information and the data information in turn, and therefore requires high sampler bandwidth. The Mueller-Muller (MM) phase-detection technique [8][9][10][11] needs only one sample per UI, which reduces both the sampler bandwidth requirement and the amount of data processing. The original structure of a sampler plus CDR for data retiming under PAM-4 modulation [11][12][13] is not applicable to the current modulation mode. The analog-to-digital converter (ADC) + digital signal processing (DSP) architecture is modular: the output clock of the phase interpolator (PI) drives the high-speed ADC for data sampling, and the DSP controls the clock phase of the PI [14][15][16]. This architecture is applicable to DB PAM-4. The ADC is integrated into the system as an intellectual property (IP) core for data sampling and quantization, and the DSP performs data processing and loop control. For DB PAM-4, a generalized low-complexity phase-detection method based on the multi-level MM phase-detection technique is proposed, and a CDR based on this method is built in the ADC + DSP architecture.

System Architecture Documentation
Compared with PAM-4, DB PAM-4 increases the number of signal levels from 4 to 7. For the same output swing, the eye height is halved, the eye width is reduced, the eye diagram is compressed, and the decision difficulty of the CDR increases. The recovered quarter-rate clock and 3-bit data of the CDR under DB PAM-4 modulation are shown in Figure 1. The recovered data is equivalent to the rightmost waveform. The CDR uses the ADC + DSP architecture shown in Figure 2, and its working principle is as follows. The input data is an equalized 112 Gbps DB PAM-4 signal, sampled by a 56 GS/s 8-bit ADC driven by the PI. The 64 × 8 bit data obtained by sampling are combined in the first in first out (FIFO) module with the 2 × 8 bit data stored from the previous group, giving 66 × 8 bit data with a single-line rate of 875 Mbps. The MM PD takes 3 UI of data as the minimum phase-detection unit, performs parallel detection on the 66 × 8 bit data, and obtains 64 sets of phase-detection information. The DSP processes this information into a 16-bit thermometer code that controls the PI to shift the clock phase so that the sampling clock falls near the optimal sampling point. The best DB PAM-4 signal is then obtained, and the phase-locking process of the CDR loop is completed. The ADC is an 8-bit analog-to-digital converter driven by a 4-phase sampling clock to obtain 64 channels of parallel data. The FIFO combines the data. The MM PD consists of 64 identical MM phase-detection modules. The DSP consists of the voter, digital filter, phase integrator, and bandwidth controller. The PI is controlled by the 16-bit thermometer code, works on a four-phase 14 GHz clock (equivalent to a 56 GHz clock), and generates the 4-phase sampling clock through vector synthesis.
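The framing step described above can be sketched as follows. This is a minimal illustration, not the authors' FIFO design: the function name `build_windows` and the placeholder sample values are hypothetical, while the frame size (64), the two carried-over samples, and the 56 GS/s rate are taken from the text.

```python
from typing import List, Tuple

SAMPLES_PER_FRAME = 64   # parallel ADC samples per DSP cycle (from the text)
HISTORY = 2              # samples carried over from the previous frame (from the text)
ADC_RATE_GSPS = 56.0     # ADC sample rate in GS/s (from the text)


def build_windows(prev_tail: List[int], frame: List[int]) -> List[Tuple[int, ...]]:
    """Prepend the last two samples of the previous frame, then slide a 3-UI window.

    Returns 64 windows (D[n-2], D[n-1], D[n]), one per new sample in the frame.
    """
    if len(prev_tail) != HISTORY or len(frame) != SAMPLES_PER_FRAME:
        raise ValueError("expected 2 history samples and a 64-sample frame")
    combined = prev_tail + frame  # 66 samples in total
    return [tuple(combined[i:i + 3]) for i in range(SAMPLES_PER_FRAME)]


# Placeholder data: two consecutive frames of hypothetical ADC codes.
frame0 = list(range(64))
frame1 = list(range(64, 128))
windows = build_windows(frame0[-HISTORY:], frame1)

# Per-lane rate after 1:64 deserialization, in MHz.
line_rate_mhz = ADC_RATE_GSPS * 1000 / SAMPLES_PER_FRAME

print(len(windows), windows[0], windows[-1], line_rate_mhz)
```

Carrying two samples across the frame boundary is what lets every one of the 64 new samples be the last element of a complete 3-UI window, so no phase-detection unit is lost between frames.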
The BBPD has a unique locking point with little phase redundancy, whereas the locking point of the MMPD is a phase region. The MMPD therefore avoids the inherent phase jitter of the BBPD at small phase errors. To improve the insufficient phase-detection density of the CDR at high loop speed, the MMPD is redesigned for DB PAM-4. The MMPD, designed with reference to the BBPD-based CDR loop for DB PAM-4 [16], is described in the following section.

MMPD Design for DB PAM4
DB PAM-4 is a seven-level signal, so a single decision at the voltage threshold "0" [14] cannot meet the needs of phase detection; at least six voltage-threshold decisions are required to separate all levels. The MM phase-detection algorithm performs phase detection at the data transition edges. The eye diagram of 3 UI of DB PAM-4 is shown in Figure 3. Compared with 3 UI of PAM-4, the number of data transition cases in DB PAM-4 increases from 64 to 175, whereas only 37 objects need to be discriminated in BBPD mode [16]. The complexity of the phase-detection process therefore increases significantly, mainly in space complexity. To solve this problem, waveform sifting is used to transform the phase-detection objects into five types of trend-based decisions, which significantly reduces the space complexity of the phase-detection logic. The reduction in space complexity lies in a significant reduction in the amount of auxiliary data and the number of basic devices in the process.
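The 64 and 175 case counts can be reproduced by brute-force enumeration. The sketch below assumes the usual duobinary relation y[n] = x[n] + x[n−1] on PAM-4 symbol indices 0 to 3, which is an assumption about the encoder rather than something stated explicitly here; the function names are illustrative.

```python
from itertools import product

PAM4_SYMBOLS = range(4)  # PAM-4 symbol indices 0..3


def pam4_waveforms():
    """All 3-UI level sequences of plain PAM-4."""
    return set(product(PAM4_SYMBOLS, repeat=3))


def db_pam4_waveforms():
    """Distinct 3-UI level sequences of y[n] = x[n] + x[n-1] (assumed DB encoding)."""
    triples = set()
    for x0, x1, x2, x3 in product(PAM4_SYMBOLS, repeat=4):
        triples.add((x0 + x1, x1 + x2, x2 + x3))
    return triples


n_pam4 = len(pam4_waveforms())
db_triples = db_pam4_waveforms()
n_db = len(db_triples)
db_levels = sorted({level for triple in db_triples for level in triple})

print(n_pam4, n_db, db_levels)
```

Not every combination of three 7-level values is reachable (7³ = 343), because adjacent DB PAM-4 levels share a PAM-4 symbol; the enumeration keeps only the reachable ones.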
The MMPD of DB PAM-4 consists of a Data Sampler, Waveform Selector, Error Sampler, and Phase Detector, as shown in Figure 4, with the core of waveform sifting, error judgment, and phase judgment.
The waveform sifting process of DB PAM-4 divides the waveforms into five categories according to the relative relationship between D_{n−2}, D_{n−1}, and D_n, as listed in Table 1: Up, Down, Keep-Jump, Jump-Keep, and No-Decision, as shown in Figure 5. The operation principle is as follows. The 8-bit data D_n[7:0] from the ADC is compared in the data sampler with the six decision thresholds in Figure 3 to obtain the 3-bit data d_n[2:0], as shown in Figure 4. The 3-bit data is then input to the waveform selector, which derives the work mode mode[2:0] from the relative relationship of three consecutive data according to Table 1: Up corresponds to 001, Down to 100, Keep-Jump to 110, Jump-Keep to 011, and No-Decision to 000. After waveform sifting, Up, Down, Keep-Jump, and Jump-Keep are used as phase-detection objects, and the number of phase-detection waveforms is reduced from 175 to 114.
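A minimal sketch of the data sampler and waveform selector follows. The mode codes are those given in the text; the six threshold codes are hypothetical placeholders, and the category rules (strictly rising, strictly falling, equal-then-change, change-then-equal) are an assumed reading of Table 1. Under that reading, and assuming y[n] = x[n] + x[n−1], the enumeration gives 114 phase-detection waveforms out of 175.

```python
from bisect import bisect_right
from collections import Counter
from itertools import product

# Hypothetical 8-bit threshold codes between the seven DB PAM-4 levels.
THRESHOLDS = [37, 73, 110, 146, 183, 219]

MODE_UP, MODE_DOWN = 0b001, 0b100
MODE_KEEP_JUMP, MODE_JUMP_KEEP = 0b110, 0b011
MODE_NO_DECISION = 0b000


def data_sampler(code: int) -> int:
    """Quantize an 8-bit ADC code to a 3-bit level index 0..6."""
    return bisect_right(THRESHOLDS, code)


def waveform_mode(d2: int, d1: int, d0: int) -> int:
    """Classify (d[n-2], d[n-1], d[n]); the rules are an assumed reading of Table 1."""
    if d2 < d1 < d0:
        return MODE_UP
    if d2 > d1 > d0:
        return MODE_DOWN
    if d2 == d1 and d1 != d0:
        return MODE_KEEP_JUMP
    if d2 != d1 and d1 == d0:
        return MODE_JUMP_KEEP
    return MODE_NO_DECISION  # peaks, valleys, and flat runs


# Enumerate the reachable 3-UI waveforms, assuming y[n] = x[n] + x[n-1].
triples = {(a + b, b + c, c + d) for a, b, c, d in product(range(4), repeat=4)}
counts = Counter(waveform_mode(*t) for t in triples)
n_detect = sum(v for mode, v in counts.items() if mode != MODE_NO_DECISION)

example_mode = waveform_mode(*(data_sampler(c) for c in (20, 90, 200)))
print(len(triples), n_detect, dict(counts), format(example_mode, "03b"))
```

Because the selector only compares 3-bit indices, its comparators are 3 bits wide regardless of where the six thresholds are placed.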
The waveform selector consists of a comparator and a selector. By adding the quantization step, the waveform selector input is reduced from 24 bits to 9 bits, the comparator input is reduced from 8 bits to 3 bits, and the circuit size is reduced by 5/8. Although the three data samplers add some area, they reduce the amount of auxiliary data during computation. After digital synthesis, the number of look-up table (LUT) and FIFO structures, which occupy the most data space, is reduced from 56 to 44, equivalent to a 21.4% reduction in data space. The PD provides phase-detection information on 50% of the data transitions in PAM-4, that is, a phase-detection density of 50%, the same as the typical density under NRZ data [8]; the phase-detection density of the PAM-4 MMPD in [11] is about 43.75%. The phase-detection density of DB PAM-4 is about 65% with the proposed MMPD and 25% with the BBPD [16]. As shown in Table 2, the DB PAM-4 MMPD improves the phase-detection density by about 30% relative to the classical 50% density and by about 160% relative to the BBPD. The logical expression of the Data Sampler logic gate circuit is:

Table 1 (columns): Relative Relation, Number, Probability.
According to the logic expression, the logic gate circuit of the Data Sampler is shown in Figure 6.
According to the logic expression, the logic gate circuit of the Comparator is shown in Figure 7.
According to the logic expression, the logic gate circuit of the Selector is shown in Figure 8.

The error sampler produces the error decision by comparing the data with the error boundaries. The error boundaries are +V_ref and −V_ref centered on the seven levels, 12 in total, as shown in Figure 9. The error boundary V_ref is set in the form of a digital signal Ref. Compared with directly searching the LUT with the 8-bit raw data, the number of LUT and FIFO structures after synthesis of the error sampler is reduced from 68 to 60, and the data space is reduced by 11.8%. The basic devices of the entire MMPD are reduced by 13.7%, which significantly reduces the auxiliary data and effectively reduces the space complexity of the phase-detection process. The Error Sampler circuit and its logic expression change with the error boundary, and the logic expression is not unique, so no dedicated design is given.

The Phase Detector circuit obtains the two phase-difference signals Y_E and Y_L from the preset phase-detection truth table, according to the output mode[2:0] of the waveform selector and the error signal output by the error sampler. The circuit is shown in Figure 10. The data selector circuit is specially designed to simplify the circuit structure, and its logic truth table is shown in Table 3. According to the logic expression, the logic gate circuit of the Data Selector is shown in Figure 11. Following the above principle of phase detection and referring to the MMPD phase-detection logic [8], the MMPD phase-detection logic of DB PAM-4 is obtained, as shown in Table 4.

Table 4 (columns): Mode, errdata_{n−2}, errdata_{n−1}, errup_{n−1}, errlow_{n−1}, errdata_n, Y_E, Y_L, Phase Info.

MMPD Phase-Detection Gain Analysis
The MMPD outputs the phase-detection result as levels: an early decision is output as a low level and recorded as "−1", and a late decision is output as a high level and recorded as "+1". For DB PAM-4 data, the relationship between the sampling phase of the phase-detector clock and the output voltage is shown in Figure 12. The phase-detection boundary is determined by the voltage thresholds +V_ref and −V_ref of the error sampler; converting these voltages into phase gives the decision phases −ϕ_ref and +ϕ_ref. Because of jitter, the output voltage varies with the phase difference between clock and data, so the output voltage can be replaced by its average value. Such jitter generally includes Gaussian, uniform, and sinusoidal jitter. When the jitter is Gaussian with mean ϕ_0 and standard deviation σ, the average output voltage is denoted by µ, the early probability is Pr(early|ϕ_0), and the late probability is Pr(late|ϕ_0). When the average phase difference is ϕ_0, the average output voltage µ is expressed as in (5) [16]. The early and late probabilities follow from the Gaussian distribution as (6) and (7), and substituting (6) and (7) into (5) gives the average output voltage. The average output curves for different values of σ at ϕ_ref = 0.09 UI are shown in Figure 13. When the phase is in the linear region, the expression can be simplified to obtain the gain of the MMPD in the linear region [8,16].
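The averaging argument can be illustrated numerically. The sketch below is an assumed model, not a transcription of the equations of this section: it takes a late decision whenever the instantaneous phase exceeds +ϕ_ref, an early decision whenever it falls below −ϕ_ref, and no output otherwise, with Gaussian phase of mean ϕ_0 and standard deviation σ. The function names are illustrative.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = Pr(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def mean_pd_output(phi0: float, sigma: float, phi_ref: float) -> float:
    """Average PD output: (+1) * Pr(late | phi0) + (-1) * Pr(early | phi0).

    Assumed model: late if phase > +phi_ref, early if phase < -phi_ref.
    All phases are in UI.
    """
    p_late = q_function((phi_ref - phi0) / sigma)
    p_early = q_function((phi_ref + phi0) / sigma)
    return p_late - p_early


PHI_REF = 0.09  # UI, value used in the text
for sigma in (0.05, 0.09, 0.15):  # sweep values are illustrative
    curve = [mean_pd_output(p / 100.0, sigma, PHI_REF) for p in range(-50, 51, 25)]
    print(sigma, [round(v, 4) for v in curve])
```

In this model a larger σ flattens the curve around ϕ_0 = 0, which is the qualitative behavior expected of a family of average output curves plotted for several jitter levels.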
Approximating with the first-order Taylor formula introduces some error when calculating the gain in the linear region of the MMPD. Because the range over which the first-order Taylor approximation is valid is limited, increasing the Taylor order improves the calculation accuracy and reduces the error, giving a more accurate approximation of the gain.
Based on the Taylor series, let $e^{-y^2} \approx 1 - y^2$ (first order) and $e^{-y^2} \approx 1 - y^2 + \frac{1}{2}y^4$ (second order). When the approximation error is less than 0.01, the two values are considered equal, and the corresponding solutions for y are approximately 0.381 and 0.636, respectively. The equivalent integral ranges corresponding to ϕ_0 are ϕ_ref ± 0.538σ and ϕ_ref ± 0.899σ. Within the same error range, the second-order Taylor expansion has a larger phase margin than the first-order expansion. The deviation between the calculated gain and the measured gain is mainly caused by the insufficient precision of the approximate substitution [8,16]. When σ = 0.09 UI and ϕ_ref = 0.09 UI, the average output voltage µ is as shown in Figure 14. Combining the numerical analysis and the plots, (7) is selected as the approximate expression.
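The two validity limits can be checked numerically. The sketch below finds, by bisection, the largest y for which each Taylor approximation of e^{−y²} stays within 0.01 of the exact value. The conversion to phase assumes y = (ϕ_0 − ϕ_ref)/(√2·σ), an inference from the ±0.538σ and ±0.899σ figures rather than a definition given in the text.

```python
import math


def taylor1(y: float) -> float:
    """First-order approximation of exp(-y^2)."""
    return 1.0 - y * y


def taylor2(y: float) -> float:
    """Second-order approximation of exp(-y^2)."""
    return 1.0 - y * y + 0.5 * y ** 4


def max_y_within(approx, tol: float = 0.01, hi: float = 1.0) -> float:
    """Largest y in [0, hi] with |exp(-y^2) - approx(y)| < tol.

    Bisection is valid because both approximation errors grow with y on [0, 1].
    """
    lo = 0.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if abs(math.exp(-mid * mid) - approx(mid)) < tol:
            lo = mid
        else:
            hi = mid
    return lo


y1 = max_y_within(taylor1)
y2 = max_y_within(taylor2)
# Phase half-widths in units of sigma, assuming y = (phi0 - phi_ref) / (sqrt(2) * sigma).
w1, w2 = math.sqrt(2.0) * y1, math.sqrt(2.0) * y2
print(round(y1, 3), round(y2, 3), round(w1, 3), round(w2, 3))
```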
The gain of the PD is the slope of the output versus the input phase difference. When the data source is an 11th-order pseudo-random binary sequence (PRBS11), the phase-detection density of DB PAM-4 is 21/32, which is about 30% higher than the 1/2 phase-detection density, so the phase-detection gain is also about 30% higher.
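The 21/32 figure can be reproduced by counting. The sketch below idealizes the PRBS data as independent, equiprobable PAM-4 symbols, assumes y[n] = x[n] + x[n−1], and treats a 3-UI window as a phase-detection event when it is strictly rising, strictly falling, equal-then-change, or change-then-equal. These are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product


def is_detection_event(a: int, b: int, c: int) -> bool:
    """True for Up, Down, Keep-Jump, or Jump-Keep windows (assumed category rules)."""
    up = a < b < c
    down = a > b > c
    keep_jump = a == b and b != c
    jump_keep = a != b and b == c
    return up or down or keep_jump or jump_keep


events = 0
total = 0
for x0, x1, x2, x3 in product(range(4), repeat=4):  # 256 equiprobable symbol runs
    window = (x0 + x1, x1 + x2, x2 + x3)            # assumed duobinary levels
    events += is_detection_event(*window)
    total += 1

density = Fraction(events, total)
relative_gain = density / Fraction(1, 2)  # versus the classical 1/2 density
print(density, float(density), float(relative_gain))
```

The count is weighted by symbol-sequence probability, so it differs slightly from the unweighted ratio of waveform shapes, 114/175.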
The MMPD gain is denoted K_MM. For Gaussian input jitter with σ = 0.09 UI and ϕ_ref = 0.09 UI, the measured gain at ϕ_0 = 0 is 3.5287 and the calculated gain is 3.6362; the two differ by about 3%, which supports the gain calculation. The gain expressions for uniform and sinusoidal input jitter follow [16].
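As a cross-check of the two gain figures, the sketch below evaluates an assumed closed form: the slope at ϕ_0 = 0 of a ±ϕ_ref threshold detector under Gaussian jitter, scaled by the 21/32 phase-detection density, once with the exact exponential and once with the second-order Taylor substitution. The formula is an assumption chosen because it reproduces the quoted numbers, not a transcription of the missing expression for K_MM.

```python
import math

TD = 21.0 / 32.0  # phase-detection density from the text


def gain_exact(sigma: float, phi_ref: float, td: float = TD) -> float:
    """Assumed gain at phi0 = 0: td * 2 / (sigma * sqrt(2*pi)) * exp(-y^2)."""
    y_sq = phi_ref ** 2 / (2.0 * sigma ** 2)
    return td * 2.0 / (sigma * math.sqrt(2.0 * math.pi)) * math.exp(-y_sq)


def gain_taylor2(sigma: float, phi_ref: float, td: float = TD) -> float:
    """Same expression with exp(-y^2) replaced by 1 - y^2 + y^4 / 2."""
    y_sq = phi_ref ** 2 / (2.0 * sigma ** 2)
    return td * 2.0 / (sigma * math.sqrt(2.0 * math.pi)) * (1.0 - y_sq + 0.5 * y_sq ** 2)


k_exact = gain_exact(0.09, 0.09)
k_taylor = gain_taylor2(0.09, 0.09)
print(round(k_exact, 4), round(k_taylor, 4), round(k_taylor / k_exact - 1.0, 4))
```

Under these assumptions the exact form matches the measured value and the Taylor form matches the calculated value, so the gap of roughly 3% is consistent with the truncation error of the second-order expansion at ϕ_ref = σ.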

High Phase Density Analysis in CDR Loop
The purpose of improving the MMPD is to improve CDR loop performance. This section investigates the effect of high phase-detection density on that performance. The CDR loop is simplified for the analysis, and the simplified model is shown in Figure 15.
TD is the phase-detection density determined by waveform sifting. K_PD is the PD gain determined by the input jitter distribution. K_P is the digital filter gain. K_DPC is determined by the PI resolution. From these parameters, the open-loop transfer function, the open-loop gain K, and the closed-loop transfer function of the loop are obtained. At a given jitter frequency, as the amplitude of the input phase ϕ_in increases, the bit error rate (BER) begins to rise when the phase difference ϕ_in − ϕ_out approaches 0.5 UI. To avoid a rising BER, this phase difference must therefore stay below 0.5 UI. Substituting (15) into (16) gives the jitter tolerance expression G_jt(z). Increasing the phase-detection density increases the open-loop gain, which improves the jitter tolerance of the loop.
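The effect of TD on jitter tolerance can be illustrated with a toy loop model. Everything structural here is an assumption: the loop is taken as a single accumulator with open-loop response K/(z − 1), where K = TD·K_PD·K_P·K_DPC, loop latency is ignored, and the tolerance is the input amplitude at which |ϕ_in − ϕ_out| reaches 0.5 UI. The numeric gains are hypothetical placeholders, not the design parameters of Table 5.

```python
import cmath
import math

UPDATE_RATE_HZ = 875e6  # DSP update rate, taken as the 875 MHz parallel data rate


def open_loop(z: complex, td: float, k_pd: float, k_p: float, k_dpc: float) -> complex:
    """Assumed first-order open-loop response: K / (z - 1)."""
    return td * k_pd * k_p * k_dpc / (z - 1.0)


def jitter_tolerance_ui(freq_hz: float, td: float, k_pd: float = 5.38,
                        k_p: float = 2.0 ** -4, k_dpc: float = 1.0 / 64.0) -> float:
    """Largest sinusoidal jitter amplitude (UI) keeping |phi_in - phi_out| < 0.5 UI.

    With phi_in - phi_out = phi_in / (1 + H_ol), the limit is 0.5 * |1 + H_ol|.
    The default gains are hypothetical values for illustration only.
    """
    z = cmath.exp(2j * math.pi * freq_hz / UPDATE_RATE_HZ)
    return 0.5 * abs(1.0 + open_loop(z, td, k_pd, k_p, k_dpc))


for f in (1e5, 1e6):
    classic = jitter_tolerance_ui(f, td=0.5)
    proposed = jitter_tolerance_ui(f, td=21.0 / 32.0)
    print(f, round(classic, 3), round(proposed, 3))
```

In this simplified model the benefit appears at in-band jitter frequencies, where the loop tracks the input; well above the loop bandwidth the tolerance approaches 0.5 UI regardless of TD.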
According to the simplified loop equation, the design parameters are given in Table 5.

Simulation Result
The proposed algorithm is verified by constructing a mixed-signal CDR model with the MMPD in Cadence. The digital part of the system, including the MMPD and the DSP algorithm, is designed in Verilog; the PI is an analog circuit designed in a 28 nm CMOS process; and the remaining components are modeled in Verilog-A. The system is simulated in AMS simulation mode. At 112 Gb/s, with an input phase error of 0.48 UI, the lock time of the CDR loop is verified, and the performance of the CDR digital algorithm is tested for both DB PAM-4 and PAM-4.
The simulation mainly verifies that the CDR loop completes the clock and data locking process with a 112 Gb/s DB PAM-4 input. The phase shift of the 14 GHz clock is 45 degrees, which is equivalent to moving 0.5 UI on the DB PAM-4 eye diagram, as shown in Figure 16. DB PAM-4 enters the locked state after 136.5 ns, as shown in Figure 17. PAM-4 enters the locked state after 184.6 ns, as shown in Figure 18. Compared with PAM-4, DB PAM-4 has a higher phase-detection density and a shorter locking time, which is consistent with the theoretical analysis.
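The equivalence between a 45 degree shift of the 14 GHz clock and 0.5 UI is a short unit conversion, sketched below. The symbol rate of 56 GBd follows from 112 Gb/s at 2 bits per symbol; the function name is illustrative.

```python
CLOCK_GHZ = 14.0        # PI clock frequency (from the text)
BIT_RATE_GBPS = 112.0   # line rate (from the text)
BITS_PER_SYMBOL = 2     # PAM-4 carries 2 bits per symbol


def phase_shift_in_ui(shift_deg: float, clock_ghz: float, baud_gbd: float) -> float:
    """Convert a clock phase shift in degrees to unit intervals of the data."""
    clock_period_ps = 1000.0 / clock_ghz
    ui_ps = 1000.0 / baud_gbd
    shift_ps = shift_deg / 360.0 * clock_period_ps
    return shift_ps / ui_ps


baud_gbd = BIT_RATE_GBPS / BITS_PER_SYMBOL
shift_ui = phase_shift_in_ui(45.0, CLOCK_GHZ, baud_gbd)
print(baud_gbd, round(shift_ui, 6))
```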

Conclusions
The clock and data recovery (CDR) design of the 112 Gb/s duo-binary four-level pulse amplitude modulation (DB PAM-4) receiver adopts the new Mueller-Muller phase detector (MMPD) combined with a digital signal processing (DSP) algorithm to complete the multi-level phase-detection task with a shorter locking time, high phase-detection density, and low space complexity of phase detection. An MMPD algorithm for DB PAM-4 is proposed, which reduces the space complexity of the all-digital PD for the 7-level signal. Basing the DB PAM-4 CDR algorithm on the PAM-4 CDR design shortens the design cycle, and the analog-to-digital converter (ADC) + DSP architecture is compatible with lower-order modulation methods such as PAM-4, which enables a multi-mode isomorphic design of the receiver.