A 112 Gb/s DAC-Based Duo-Binary PAM4 Transmitter in 28 nm CMOS

Abstract: To reduce the high bit error rate of serial transceivers under strong channel attenuation, a low-power 112 Gb/s SerDes transmitter was designed using duo-binary PAM4 (four-level pulse amplitude modulation) modulation. Duo-binary PAM4 modulation improves the low bandwidth utilization of a high-speed PAM4 signal. The high jitter caused by charge sharing and the limited bandwidth of the 4:1 high-speed MUX was reduced by using precharging auxiliary transistors. The system power consumption of the transmitter was reduced by using a 7-bit weighted voltage-driven digital-to-analog converter (DAC). The transmitter was designed in a 28 nm CMOS process with a 0.9 V supply. The simulation results showed that, with a channel attenuation of 20.9 dB, the transmitter could work at 112 Gb/s with an energy efficiency of 2.02 pJ/bit and a linearity of 96.7%.


Introduction
In recent years, the big data industry has been booming. The combination of big data with emerging information technologies such as virtual reality and autonomous driving, relying on the analysis and processing of massive data, has accelerated the development of industries such as medical treatment, transportation, and artificial intelligence, and has even changed the mode of economic development. The explosive growth of data volumes requires faster data processing and transmission. The traditional parallel interface improves the transmission speed by increasing the number of channels or the data rate of a single channel. However, as the communication speed increases, serious crosstalk and coupling occur between closely spaced channels. Furthermore, as the transmission distance and communication rate increase, synchronization between parallel channels becomes extremely difficult because of chip PVT (process, voltage, and temperature) variations and uncontrollable deviations in packaging, wiring, and the board.
The high-speed serial interface (SerDes) uses only a pair of differential lines for signal transmission, which enhances its immunity to interference and noise, reduces the number of I/O pins, and solves the synchronization problem, so the serial interface has replaced the parallel interface as the modern mainstream interface [1–6]. A SerDes transmitter converts multichannel low-speed parallel signals into a high-speed serial signal, which passes through the transmission medium (optical cable or copper wire); the receiver finally converts the high-speed serial signal back into low-speed parallel signals [5–7]. In high-speed data transmission, NRZ (non-return-to-zero) modulation suffers high attenuation in serial-link applications at 112 Gb/s and above. Therefore, PAM4 modulation has progressively been replacing NRZ modulation: at the same bit rate its Nyquist frequency is half that of NRZ, which better addresses the problem of data errors under a strong channel attenuation of 70 dB [5,8–13].
The University of California, Los Angeles, proposed a new type of transmitter operating at 50-64 Gb/s with an FFE delay unit built from LC elements. The LC-based FFE improves the bandwidth of the critical delay chain and driver, and a 4:1 MUX is used to save power [14]. The University of California, Berkeley, proposed an NRZ SerDes transceiver operating at up to 60 Gb/s, using adaptive equalization and offset cancellation techniques to cope with harsh channel conditions [15]. Texas A&M University proposed an ADC-based SerDes receiver operating at up to 10 Gb/s, which was creatively combined with a traditional DFE circuit to significantly reduce the power consumption of the SerDes circuit [16]. eSilicon proposed a PAM4 receiver with a rate of up to 56 Gb/s, which adopted an ADC+DSP architecture. The architecture used an ADC (analog-to-digital converter) to sample and quantize the data and then used CMOS logic to implement multi-tap equalization and forward error correction in a DSP (digital signal processor), alleviating the bandwidth degradation and linearity constraints of traditional architectures [17].
In the existing literature, transmitter rates are not high, PAM4/NRZ modulation is adopted most often, and studies of duo-binary PAM4 modulation concentrate on optical channels [14,15,18,19]. The transmitter designed in this paper targets the electrical channel at a rate of 112 Gb/s and adopts duo-binary PAM4 modulation, which has more concentrated energy and higher bandwidth utilization. The high-speed MUX uses precharging transistors to reduce the jitter of the transmitter, and the driver circuit uses a high-precision DAC architecture to improve the linearity.
In this context, a high-speed SerDes is studied in depth, and a 112 Gb/s SerDes transmitter is designed based on duo-binary PAM4 modulation. Under bandwidth-limited transmission, duo-binary PAM4 modulation keeps the bit error rate lower than PAM4 modulation and reduces the bandwidth requirement [10,11,18,20]. This paper describes the overall circuit architecture of the transmitter, analyzes its critical modules in detail, and presents the simulation results.

Basic Theory of SerDes
The high-speed SerDes interface, namely, the high-speed serializer/deserializer (HSS), uses time-division multiplexing (TDM) and a point-to-point communication mode. The SerDes interface and the transmission channel constitute the physical layer of the serial data transmission system, as shown in Figure 1. A SerDes transmitter serializes multichannel low-speed parallel data into high-speed serial data and drives the serial signal onto the transmission channel. The channel (usually copper wire or optical cable) connects the transmitter and the receiver and carries the signal. The receiver samples the signal received from the channel and restores the high-speed serial data to low-speed parallel data, completing the data transmission process. When testing a high-speed serial communication system, jitter and bit error rate deserve particular attention.
Jitter is the deviation of a signal's transition edge from its ideal or predetermined time. Nonideal circuits, nonideal channels, and noise are the causes of jitter. The total jitter of data can be divided into two categories, namely random jitter (RJ) and deterministic jitter (DJ). Random jitter is caused by unpredictable noise in the circuit. Deterministic jitter is a repetitive jitter whose behavior is predictable and is caused by deterministic factors such as intersymbol interference, channel reflection, and limited channel bandwidth; these impairments can be corrected by the circuit. Deterministic jitter can be divided into periodic jitter (PJ) and data-dependent jitter (DDJ) according to the generation mechanism. Periodic jitter is jitter that repeats periodically. Because any periodic waveform can be decomposed into multiple sinusoidal signals, periodic jitter is also called sinusoidal jitter. Periodic jitter is independent of the data pattern. It is generally caused by external noise (electromagnetic interference, spread-spectrum clock interference, etc.) coupled into the system. Data-dependent jitter is caused by the nonideal behavior of the channel. Because different data patterns contain different frequency components, and the channel attenuates and delays different frequency components differently, a nonlinear phase shift occurs. Therefore, the nonideal behavior of the channel produces a deterministic jitter that depends on the data. In addition, impedance discontinuities in the channel reflect the signal, and crosstalk from adjacent channels also causes data jitter.
The bit error ratio (BER) is the ratio of the number of erroneously received bits to the total number of transmitted bits within a certain period. It is an important indicator of the link reliability of transceivers. Because of noise, loss, and other nonideal factors in signal transmission, bit errors are inevitable, but as long as the error rate stays below a certain level, the data can be recovered by an error-correction algorithm. Different protocols require different bit error rates, but most systems require less than $10^{-12}$.
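To give a feeling for what a $10^{-12}$ requirement means in a test, the following minimal sketch estimates how many error-free bits must be observed to claim a BER below a target at a given confidence level. It uses the standard zero-error bound $N = -\ln(1 - CL)/\mathrm{BER}$; the 95% confidence level and the function name are illustrative assumptions and are not taken from the paper.

```python
import math

def bits_for_ber_confidence(ber_target: float, confidence: float = 0.95) -> float:
    """Error-free bits needed to claim BER < ber_target at the given confidence."""
    # With zero observed errors, P(no error in N bits) = (1 - BER)^N ~ exp(-N * BER),
    # so N = -ln(1 - confidence) / BER.
    return -math.log(1.0 - confidence) / ber_target

BIT_RATE = 112e9      # line rate quoted in the paper, in bit/s
BER_TARGET = 1e-12    # typical requirement quoted in the text

n_bits = bits_for_ber_confidence(BER_TARGET)
test_time_s = n_bits / BIT_RATE
print(f"{n_bits:.3e} bits, {test_time_s:.1f} s of error-free traffic")
```

At 112 Gb/s this amounts to a few tens of seconds of error-free traffic, which is why a bathtub curve is normally extrapolated rather than measured point by point at very low BER.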

Transmitter System Architecture
The transmitter designed in this paper is shown in Figure 2. It adopts a 4:1 MUX based on a quarter-rate (1/4 speed) architecture and a low-power source-series terminated driver (SST driver), and it consists of a data path and a clock path.
The data path includes a pseudorandom code generator, a precoding module, a duo-binary conversion module (DB), the quarter-rate 4:1 MUX, and the SST driver. In each serial channel, a pseudorandom binary data stream is generated by the built-in PRBS generator (parallel PRBS gen.). The 64 channels of 875 Mb/s parallel pseudorandom codes are precoded to eliminate the correlation between consecutive symbols, and the duo-binary signals are then generated by the DB conversion module. The 64 channels of 875 Mb/s parallel signals are converted into 4 channels of 14 Gb/s signals by the 64:4 parallel-to-serial conversion module. After that, narrow pulses are sampled using the rising and falling edges of clocks with phase differences of 90 degrees. The 4 channels of data are converted into a 56 Gb/s data stream, and the output is finally driven by the SST driver, generating the 112 Gb/s duo-binary PAM4 signal.
The clock path contains a CML-to-CMOS level converter (CML2CMOS), a frequency divider, and a duty cycle calibration circuit. The input 14 GHz clock signal is converted to a CMOS signal by the CML2CMOS. After the duty cycle is calibrated, the clock signal drives the 4:1 MUX. The frequency divider generates the divided clocks that drive the 64:4 MUX circuit.
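The rate bookkeeping of the data path can be checked with a few lines of arithmetic. The sketch below uses only the lane counts and rates stated above; the factor of two bits per output symbol (for example, two 56 Gb/s bit streams combined in the driver) and the implied divider ratio are assumptions introduced here to connect the 56 Gb/s stream to the 112 Gb/s line rate.

```python
# Rates stated in the text (integers avoid floating-point rounding).
PARALLEL_LANES = 64
LANE_RATE_BPS = 875_000_000
QUARTER_LANES = 4
CLOCK_HZ = 14_000_000_000

aggregate_bps = PARALLEL_LANES * LANE_RATE_BPS      # output of one 64:4:1 serializer
quarter_lane_bps = aggregate_bps // QUARTER_LANES   # each input of the 4:1 MUX

# Assumption: the driver merges two bits into every output symbol.
BITS_PER_SYMBOL = 2
line_rate_bps = aggregate_bps * BITS_PER_SYMBOL

symbol_period_ps = 1e12 / aggregate_bps   # symbol UI at the MUX output
bit_period_ps = 1e12 / line_rate_bps      # the 8.93 ps figure quoted as 1 UI

# Assumption: the 64:4 stage is clocked from the 14 GHz clock through a divider.
divider_ratio = CLOCK_HZ // LANE_RATE_BPS

print(aggregate_bps, quarter_lane_bps, line_rate_bps, divider_ratio)
print(round(symbol_period_ps, 2), round(bit_period_ps, 2))
```

Under these assumptions the symbol period at the MUX output is about 17.86 ps, which is in line with the roughly 17.8 ps eye width reported for the 4:1 MUX.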

Precoding Design
In the process of signal transmission, the two-level input signal is converted into the three-level DB signal by the DB conversion circuit. The DB conversion circuit realizes $y_n = x_n + x_{n-1}$, and the receiver decision $x_n = y_n - x_{n-1}$ is the inverse operation of the transmitting end. Therefore, the sampled decision value $x_n$ depends directly on the previous decision $x_{n-1}$. If the previous decision $x_{n-1}$ is wrong, the next decision will also be wrong, and the errors continue from symbol to symbol, resulting in error propagation. Therefore, to remove the correlation between consecutive symbols, a precoding circuit is used before the DB conversion circuit.
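The error-propagation argument can be illustrated with a small behavioral model. The sketch below encodes a bit pattern with $y_n = x_n + x_{n-1}$, corrupts a single received level, and compares the feedback decision $x_n = y_n - x_{n-1}$ with the textbook precoded scheme ($p_n = x_n \oplus p_{n-1}$ before the DB conversion, $x_n = y_n \bmod 2$ at the receiver). The XOR precoder is the standard duo-binary precoder and is assumed here; the function names and the test pattern are illustrative only.

```python
def duobinary(bits, prev=0):
    """Three-level duo-binary encoding: y_n = x_n + x_{n-1}."""
    levels = []
    for b in bits:
        levels.append(b + prev)
        prev = b
    return levels

def decode_feedback(levels, prev=0):
    """Receiver without precoding: x_n = y_n - x_{n-1}, clipped to {0, 1}."""
    bits = []
    for y in levels:
        x = min(1, max(0, y - prev))
        bits.append(x)
        prev = x          # a wrong decision is fed into the next one
    return bits

def precode(bits, state=0):
    """Standard XOR precoder: p_n = x_n XOR p_{n-1}."""
    out = []
    for b in bits:
        state ^= b
        out.append(state)
    return out

def decode_precoded(levels):
    """Memoryless receiver for precoded data: x_n = y_n mod 2."""
    return [y % 2 for y in levels]

def count_errors(a, b):
    return sum(1 for u, v in zip(a, b) if u != v)

data = [1, 0, 1, 0, 1, 0, 1, 0]

rx_plain = duobinary(data)
rx_plain[2] = 0                     # single corrupted level (was 1)
errors_feedback = count_errors(data, decode_feedback(rx_plain))

rx_pre = duobinary(precode(data))
rx_pre[2] = 0                       # same single corrupted level (was 1)
errors_precoded = count_errors(data, decode_precoded(rx_pre))

print(errors_feedback, errors_precoded)
```

For this pattern, the single corrupted level causes six wrong bits in the feedback receiver, because every later decision inherits the error, and only one wrong bit in the precoded case.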
The precoding module is usually placed after the MUX, as shown in Figure 3. Considering power, area, and PVT variations, it is not advisable to cascade active or passive devices to obtain an accurate delay in an open loop. Traditionally, a clock-driven flip-flop is used to guarantee the delay time, but it imposes strict phase requirements. Its timing diagram is shown in Figure 4; the data experience delays of $T_{XOR}$ and $T_{D-Q}$ through the XOR gate and the flip-flop, respectively [21]. For the precoder to work properly, the two delays must add up to exactly one bit period, that is, $T_{XOR} + T_{D-Q} = T_b$, where $T_{XOR}$ is the delay of the XOR gate, $T_{D-Q}$ is the delay of the flip-flop, and $T_b$ is an exact bit period.
The input clock has very little phase margin to generate the appropriate D-Q delay for the flip-flop, and this timing problem becomes more serious at high speed. When the data rate rises to 112 Gb/s or above, 1 UI is only 8.93 ps, which leaves little margin to absorb PVT variations. Therefore, when the data rate reaches 112 Gb/s, the data should be precoded before the MUX, as shown in Figure 5. The parallel precoding structure proposed in this paper, which generates the $D_{out0}$ and $D_{out1}$ signals, is shown in Figure 6, and the timing diagram of the proposed precoder is shown in Figure 7. The critical path of the circuit, from $D_x$ to $D_y$, is controlled by the 1/2 clock. Its timing constraint involves the clock period $T_{qclk}$, the exact bit period $T_b$, the time difference between the data and the clock $T_{d\_lat}$, the delay of the MUX $T_{d\_mux}$, and the setup time $T_{d\_setup}$.
Assuming that $D_k$ stabilizes within half a cycle, the full-cycle loop path starts and ends at $D_y$. Its timing constraint involves the delay of the flip-flop $T_{d\_dff}$, the delay of the MUX $T_{d\_mux}$, and the setup time $T_{d\_setup}$.
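Moving the precoder in front of the MUX means that several bits must be precoded per low-speed clock cycle. A minimal behavioral sketch of such a parallel precoder is given below. The width of four, the look-ahead XOR form, and the reuse of the last output of the previous cycle as the carried state are assumptions made for illustration (the caption of Figure 6 only indicates that $D_{out3}$ from the previous loop is fed back); the sketch is not the circuit of Figure 6.

```python
def precode_serial(bits, state=0):
    """Reference bit-serial precoder: p_n = x_n XOR p_{n-1}."""
    out = []
    for b in bits:
        state ^= b
        out.append(state)
    return out

def precode_parallel4(bits, state=0):
    """Hypothetical 4-wide precoder: four outputs per cycle from one carried bit."""
    if len(bits) % 4:
        raise ValueError("input length must be a multiple of 4")
    out = []
    for i in range(0, len(bits), 4):
        d0, d1, d2, d3 = bits[i:i + 4]
        # Each output depends only on the inputs of this cycle and on the
        # last output of the previous cycle, so all four can be computed at once.
        o0 = state ^ d0
        o1 = state ^ d0 ^ d1
        o2 = state ^ d0 ^ d1 ^ d2
        o3 = state ^ d0 ^ d1 ^ d2 ^ d3
        out.extend([o0, o1, o2, o3])
        state = o3            # carried into the next cycle
    return out

pattern = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
parallel_out = precode_parallel4(pattern)
serial_out = precode_serial(pattern)
print(parallel_out == serial_out)
```

The only cycle-to-cycle feedback in this form is the single carried bit, which is the kind of loop that the half-cycle and full-cycle timing constraints above have to cover.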

High Speed 4:1 MUX Circuit Design
The key block of the high-speed serial transmitter is the high-speed analog MUX, which must provide an adequate timing margin to guarantee correct timing of the whole circuit. The traditional half-rate (1/2 speed) MUX provides a timing margin of only 1 UI; at a data rate of 112 Gb/s, 1 UI is about 8.9 ps. To guarantee correct timing of the overall circuit, a 4:1 MUX based on a quarter-rate (1/4 speed) architecture is designed, which extends the timing margin from 1 UI to 3 UI [5,22–24].
The structure of the traditional pulse generating unit is shown in Figure 8a. Clock signals CK0 and CK90 control transistors M1, M4, and M3 to sample the signal. When the input data are at a high level and CK0 is at a low level, M2 is cut off, and the output terminal charges the parasitic capacitor $C_X$ at node X. The level at X is determined by the previous data. When CK0 is at a high level, M2 turns on, and the input data are sampled. When CK90 is at a low level, M3 is cut off, node Y floats until the high level of CK90 arrives, and its level is determined by the last input data and remains in the original state. When CK90 becomes high, the voltage at X is sampled, the voltage at Y becomes low, and node Y remains low until the low level of CK0 arrives. When CK0 becomes low, node Y draws current from the output terminal to charge the parasitic capacitor $C_Y$ at Y, which leads to a charge-sharing effect and increases the vertical (amplitude) noise. At the same time, the charge-sharing effect slows down the rising edge of the output signal, which appears on the eye diagram as increased horizontal (timing) jitter.
Inspired by [7], our design adds the precharging auxiliary transistors PM7/PM8 to the traditional structure to eliminate the charge-sharing effect caused by the parasitic capacitance. The circuit is shown in Figure 8b. When CK0 and CK90 are at a low level, PM7 and PM8 turn on, and the parasitic capacitances at X and Y are charged to a high level. When CK0 changes from a high level to a low level, no current needs to be drawn from the output terminal, which eliminates the charge-sharing effect. The red curve in Figure 9 shows the simulated output waveform after adding PM7/PM8. The simulation shows that the pulse unit designed in this paper significantly reduces the glitch and the slow rising edge caused by the charge-sharing effect. The red eye diagram in Figure 10 shows that the jitter induced by intersymbol interference (ISI) decreases from 286 fs to 154 fs and that the glitch of 103 mV is eliminated. The quality of the eye diagram is clearly improved.
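The selection principle of a quarter-rate 4:1 MUX can be sketched behaviorally: four 50% duty-cycle clock phases spaced by 90 degrees define four non-overlapping windows of 1 UI, and each window passes one lane to the output. The model below is idealized (no delays, no charge sharing), and the choice of gating each lane with two adjacent phases being high at the same time is an assumption made for illustration; the lane order it produces is a property of this model, not of the paper's circuit.

```python
def clock_high(phase_deg: int, ui: int) -> bool:
    """Ideal 50% duty clock with a 4 UI period; 90 degrees equals 1 UI of delay."""
    return (ui - phase_deg // 90) % 4 < 2

def active_lane(ui: int) -> int:
    """Lane whose two gating clocks (phases k*90 and (k+1)*90) are both high."""
    lanes = [k for k in range(4)
             if clock_high(90 * k, ui) and clock_high(90 * ((k + 1) % 4), ui)]
    assert len(lanes) == 1, "pulse windows must not overlap"
    return lanes[0]

def mux_4to1(lanes, n_ui):
    """Serialize four quarter-rate lanes; each lane bit is held for 4 UI."""
    return [lanes[active_lane(ui)][ui // 4] for ui in range(n_ui)]

lane_data = [[0, 1], [1, 1], [0, 0], [1, 0]]   # arbitrary test bits, two per lane
order = [active_lane(ui) for ui in range(8)]
serial = mux_4to1(lane_data, 8)
print(order, serial)
```

In this model every lane bit stays valid for 4 UI while it is selected for only 1 UI, which illustrates why a quarter-rate architecture relaxes the timing of the data arriving at the MUX.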

Driver Circuit Design
The driver has two basic structures: current mode logic (CML) and source-series terminated (SST). For the same output swing, the current required by the voltage-mode driver is only 1/4 of that of the current-mode driver [25]. The SST driver circuit has better linearity than the CML driver circuit for high-speed transmission, and it can increase the height and width of the eye, so this paper uses the SST driver circuit to drive the whole transmitter [5].
The transmitter in this design adopts a 7-bit SST driver circuit. If it is divided evenly, the SST driver circuit consists of 127 slices, and the output impedance of each slice is 6350 Ω. In this design, different weight ratios were used to reduce the impedance of each slice, thereby reducing the area occupied by the driver circuit, as shown in Figure 12. The 50 Ω impedance matching of the 7-bit SST driver circuit is realized by MOS switch resistances and linear resistors in series and parallel. In general, the linear resistor accounts for the main part of the impedance, because a linear resistor is much less sensitive to PVT variations than a MOS transistor used as a switch. However, when the resistance of the MOS transistor is too small, the charging and discharging speed is affected, which limits the output bandwidth. The relation between the resistance and the capacitance of the MOS transistor is shown in Figure 13. When the MOS transistor resistance increases from 12 to 25 Ω, the parasitic capacitance decreases by about half. However, when the resistance of the MOS transistor increases further, the reduction in capacitance is not significant. Therefore, it is very important to choose a suitable MOS transistor resistance and linear resistance $R_E$.
The impedance of the traditional SST driver circuit matches the 50 Ω line impedance, that is, $R_{MOS} + R_E = Z_0$. The small resistance of the MOS transistor leads to a large parasitic capacitance. Therefore, the driver circuit was designed with a linear parallel resistance $R_T$ and a linear series resistance $R_E$ at the output end, which allows the resistance of the MOS transistor to be increased and the parasitic capacitance to be reduced. Furthermore, the higher driver resistance reduces the static current consumed by the driver. However, adding the parallel resistance $R_T$ reduces the output swing, so a compromise must be made between power consumption and output swing. The relationship between the parallel resistance $R_T$ and the output swing is shown in Figure 14. According to references [26–28], an output swing far below 0.67 Vppd is impractical under multilevel modulation. Therefore, selecting $R_{MOS} + R_E = 75$ Ω and $R_T = 150$ Ω not only ensures the output swing of the driver but also reduces its dynamic power consumption. As the previous analysis shows, when the resistance of the MOS transistor exceeds 25 Ω, the reduction in parasitic capacitance is not significant. Finally, $R_{MOS} = 50$ Ω, $R_E = 25$ Ω, and $R_T = 150$ Ω were selected. The output resistance $R_{out}$ is set by the MOS transistor resistance $R_{MOS}$, the series resistance $R_E$, and the parallel resistance $R_T$.
The output voltage swing $V_{out}$ is determined by the parallel resistance $R_T$, the channel resistance $Z_0$, the MOS transistor resistance $R_{MOS}$, the series resistance $R_E$, and the supply voltage $V_{CC}$. The 7-bit SST driver can act as a DAC, so that the output voltage signal is linear with the input digital signal. The static error characteristics, mainly the differential nonlinearity (DNL) and the integral nonlinearity (INL), have a crucial influence on the static precision of the DAC [19]. The static parameters of the 7-bit SST DAC designed in this paper are shown in Figures 15 and 16
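The resistor values quoted above can be checked numerically. The sketch below assumes that $R_T$ acts in parallel with the series branch $R_{MOS} + R_E$ when looking into the driver output, which is the reading that reproduces a 50 Ω match with the selected values. The binary weighting of the seven slices is likewise an assumption, since the text only states that different weight ratios are used.

```python
def parallel(*resistances):
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)

Z0 = 50.0                        # channel impedance in ohms
N_BITS = 7
N_UNIT_SLICES = 2**N_BITS - 1    # 127 equal slices for a 7-bit driver

# Equal division: 127 identical slices in parallel must give 50 ohms.
unit_slice_ohm = N_UNIT_SLICES * Z0

# Assumed binary weighting: slice k has 2**k times the unit conductance.
weighted_slices_ohm = [unit_slice_ohm / 2**k for k in range(N_BITS)]
all_on_ohm = parallel(*weighted_slices_ohm)

# Values selected in the text.
R_MOS, R_E, R_T = 50.0, 25.0, 150.0
r_out = parallel(R_MOS + R_E, R_T)   # assumed topology: (R_MOS + R_E) || R_T

print(unit_slice_ohm, round(all_on_ohm, 3), round(r_out, 3))
```

With binary weighting, seven slices replace 127 unit slices, and the smallest slice impedance drops from 6350 Ω to about 99 Ω, which is one way to read the area argument in the text.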

Results
The transmitter was designed and simulated after layout using Cadence Virtuoso IC617 in a 28 nm CMOS process. In the whole design, 64 channels of parallel PRBS7 were used as input data. The 14 GHz clock signal generated by the PLL was distributed to each transmitter module through the CML2CMOS module and the frequency division circuit. The whole layout of the transmitter is shown in Figure 17. The overall module size was 269 µm × 162 µm, the power consumption of the transmitter was 226 mW, and the energy efficiency was 2.02 mW/Gb/s.
The 112 Gb/s duo-binary PAM4 eye diagram obtained from the post-layout simulation is shown in Figure 18. The six eye heights of the duo-binary PAM4 signal were 82.3 mV, 80.2 mV, 78.6 mV, 81.1 mV, 82.1 mV, and 83.4 mV, and the corresponding eye widths were 11.7 ps, 10.5 ps, 8.9 ps, 10.3 ps, 11.6 ps, and 11.4 ps. The minimum eye height was therefore 78.6 mV, the minimum eye width was 8.9 ps, and the level mismatch ratio was 96.7%. Figure 19 shows the bathtub curve. In a bit error test of the transmitter with an ideal receiver, the bit error rate was lower than $10^{-12}$ and the eye width was about 0.4 UI. The post-layout simulation results show that the transmitter designed in this paper performed well and met the design targets.
Table 1 summarizes the performance of this transmitter and compares it with other transmitters of similar rate and process [29–31]. Compared with designs in the same process, it had a higher eye height, and its performance was similar to that of designs in advanced 10 nm/7 nm processes.
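The headline figures can be cross-checked from the reported numbers. The sketch below recomputes the energy efficiency from power and data rate, and evaluates a simple eye-height uniformity ratio (number of eyes times the smallest eye height, divided by the sum of all eye heights). This ratio happens to reproduce the reported 96.7%, but whether the authors computed their level mismatch figure in this way is an assumption.

```python
POWER_W = 226e-3        # reported power consumption
DATA_RATE_BPS = 112e9   # reported data rate

# 1 mW/(Gb/s) equals 1 pJ/bit, so both units give the same number.
energy_pj_per_bit = POWER_W / DATA_RATE_BPS * 1e12

EYE_HEIGHTS_MV = [82.3, 80.2, 78.6, 81.1, 82.1, 83.4]
EYE_WIDTHS_PS = [11.7, 10.5, 8.9, 10.3, 11.6, 11.4]

min_height_mv = min(EYE_HEIGHTS_MV)
min_width_ps = min(EYE_WIDTHS_PS)

# Assumed definition: smallest eye relative to the average eye height.
uniformity = len(EYE_HEIGHTS_MV) * min_height_mv / sum(EYE_HEIGHTS_MV)

print(round(energy_pj_per_bit, 2), min_height_mv, min_width_ps,
      round(100 * uniformity, 1))
```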

Conclusions
To solve the problem of the high bit error rate of serial transmitters under strong channel attenuation, this paper designed a low-power 112 Gb/s SerDes transmitter adopting duo-binary PAM4 modulation. The transmitter adopted a precoding circuit to remove the correlation between consecutive symbols and reduce the bit error rate. The jitter caused by charge sharing in the 4:1 MUX was reduced by using precharging auxiliary transistors. To improve the linearity, a 7-bit weighted SST driver circuit was adopted. Meanwhile, a parallel resistance was added to the driver circuit, creating a new current path, to ensure the output swing of the driver and reduce its dynamic power consumption. The differential nonlinearity and integral nonlinearity of the DAC were 0.38 LSB and 0.68 LSB, respectively. The transmitter was designed in a 28 nm process, worked at 112 Gb/s, consumed 226 mW with an energy efficiency of 2.02 mW/Gb/s, reached a linearity of 96.7%, and achieved a bit error rate lower than $10^{-12}$.

Figure 6. Proposed precoding structure diagram. $D_{out3}$ calculated from the previous loop is

Figure 8. Pulse generation unit: (left) conventional structure; (right) structure designed in this paper.

Figure 9. Simulation waveforms of output nodes with and without PM7/PM8.

Figure 10. Simulation comparison of the output eye diagram of the 4:1 MUX with and without PM7/PM8.

Figure 11 shows the differential output eye diagram of the proposed 4:1 MUX. The result comes from a post-layout simulation in which the load of the next stage is set to 30 fF. As can be seen from the simulation result, the output eye width of the structure is about 17.8 ps, the four eyes are uniform, and the maximum jitter is 285 fs.

Figure 13. Relation between MOS resistance and capacitance.

Figure 14. Relation between shunt resistance and voltage swing.
The DNL is 0.38 LSB and the INL is 0.68 LSB.

Figure 18. The layout post-simulation of the transmitter.

Table 1. Performance summary and comparison.