An 18-Gb/s/pin Single-Ended PAM-4 Transmitter for Memory Interfaces with Adaptive Impedance Matching and Output Level Compensation
Electronics 2021, 10, 1768

Abstract: This paper presents a method for preventing output level distortion while matching the channel impedance in a single-ended PAM-4 transmitter for memory interfaces. ZQ codes for all four output signal levels were obtained through ZQ calibration and saved in a ZQ code table. A ZQ code generator then adaptively selected the appropriate codes depending on the data pattern and delivered them to the output driver; this can improve the level separation mismatch ratio (RLM) while matching the channel impedance. To validate the effectiveness of our approach, a prototype chip with an active area of 0.035 mm² was fabricated in a 65 nm CMOS process. It achieved an energy efficiency of 3.09 pJ/bit/pin at 18 Gb/s/pin, and its RLM was 0.971 while matching the channel impedance.


Introduction
The demand for high-bandwidth DRAMs is constantly increasing due to the data-centric trend [1-3]. DRAM bandwidth can be increased by raising the number of I/O pins or the clock frequency [4]. Because double data-rate (DDR), low-power double data-rate (LPDDR), and graphics double data-rate (GDDR) memories have a pin count limitation, a higher clock frequency is required to increase the data rate per pin. However, since power consumption and frequency-dependent channel loss increase at higher frequencies, it is challenging to increase data transfer speeds using non-return-to-zero (NRZ) signaling. Four-level pulse amplitude modulation (PAM-4) signaling is therefore one of the most promising methods to address this problem [5].
PAM-4 signaling, which transmits 2 bits per symbol, is more suitable for higher speeds because its clock frequency is only half of that required for NRZ signaling at the same data rate. However, a PAM-4 signal has only one third of the eye height of an NRZ signal, resulting in a signal-to-noise ratio (SNR) degradation of about 9.5 dB [6,7]. In addition, further SNR degradation may occur due to the non-linearity of the transmitter output [8]; this causes an imbalance between the PAM-4 signal levels. Since the overall performance depends on the smallest eye height of the transmitter, it is important to equalize the voltage difference between signal levels [8] while matching the channel impedance at each signal level.
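The two figures quoted above follow from simple arithmetic, sketched here in Python as an illustration rather than as part of the original work. The sketch assumes equal total voltage swing for NRZ and PAM-4 and equally spaced PAM-4 levels; the function names are hypothetical.

```python
import math


def eye_height_penalty_db(levels: int) -> float:
    """Eye-height penalty of PAM-N relative to NRZ at equal total swing.

    Assumes equally spaced levels, so each of the (levels - 1) eyes
    gets 1 / (levels - 1) of the NRZ eye height.
    """
    return 20 * math.log10(levels - 1)


def nyquist_frequency_ghz(data_rate_gbps: float, bits_per_symbol: int) -> float:
    """Nyquist frequency in GHz, i.e. half the symbol rate."""
    return data_rate_gbps / bits_per_symbol / 2


print(f"PAM-4 eye-height penalty: {eye_height_penalty_db(4):.2f} dB")
print(f"Nyquist at 18 Gb/s, NRZ:   {nyquist_frequency_ghz(18, 1)} GHz")
print(f"Nyquist at 18 Gb/s, PAM-4: {nyquist_frequency_ghz(18, 2)} GHz")
```

Under these assumptions the penalty evaluates to roughly 9.54 dB, in line with the 9.5 dB cited above, and the PAM-4 Nyquist frequency at 18 Gb/s is half that of NRZ.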
PAM-4 transmitters have adopted either current- or voltage-mode drivers, but those using current-mode drivers [6,9,10] consume a lot of power. Therefore, PAM-4 transmitters that use voltage-mode drivers while improving the PAM-4 signal linearity have recently been reported [5,11-13]. However, in previous designs [5,11,12], the signal level distortion cannot be alleviated while matching the channel impedance, or the output impedance is calibrated based on only one output level; thus, impedance mismatch can occur at the other signal levels. A PAM-4 transmitter with a three-point ZQ calibration [13] can match the impedance at the +3, +1, and −1 levels considering the VDS variation but cannot match the impedance at the −3 level with the channel impedance.
To alleviate the above issues, we present a single-ended voltage-mode PAM-4 transmitter with adaptive impedance matching and output level compensation. First, it finds and stores the ZQ codes for each PAM-4 output signal level while matching the channel impedance through a ZQ calibration. When the output driver transmits data, an adaptive ZQ code generator changes the driver codes in real time according to the output signal level; that is, level-dependent adaptive impedance matching and output level compensation are performed.
This paper is organized as follows: in Section 2, we introduce the design considerations for a single-ended voltage-mode PAM-4 transmitter; in Section 3, we describe the proposed PAM-4 transmitter design; in Section 4, the measurement results are presented; and in Section 5, we draw conclusions.

Design Considerations
Figure 1a shows the VDS variation of the single-ended voltage-mode PAM-4 driver. VDS,PU and VDS,PD of the pull-up and pull-down transistors vary as the output signal level changes from one level to another; this changes the transistors' on-resistance and distorts the output signal level. Figure 1b shows the output eye diagram considering the VDS variation. The output signal levels deviate from the ideal levels, deteriorating the level separation mismatch ratio (RLM), which is a function of the intervals between the four PAM-4 output levels. Since the performance of PAM-4 signaling is determined by the smallest eye, each signal level needs to be calibrated, taking the VDS fluctuation into account.
While achieving good RLM performance under VDS fluctuation, the channel impedance should also be matched because a reflection caused by an impedance mismatch worsens signal integrity. In memory interfaces, impedance matching can be performed through ZQ calibration. The conventional ZQ calibration is carried out based on one level [5]; thus, the impedance can be mismatched at the other three signal levels that have not been calibrated.
The three-point ZQ calibration [13] can match the impedance at the +3, +1, and −1 output levels considering the VDS variation, improving the linearity. When transmitting the −3 output level, however, the pull-down driver uses the driver codes obtained from the −1 output level. This can cause an impedance mismatch at the −3 output level, leading to signal reflection. Therefore, it is necessary to match the impedance with the channel for all four signal levels.

Proposed PAM-4 Transmitter
Figure 2 shows the proposed PAM-4 transmitter with four-level impedance matching. The four-phase clock signals (CLK0, CLK90, CLK180, and CLK270) are generated by an internal clock path composed of a clock buffer (CLK BUF), an IQ divider (DIV), and a single-to-differential converter (S-to-D). The 32 bit parallel data generated by a PRBS generator (Gen.) are transmitted to the output driver through a 32:8 serializer (SER), a data aligner, and a 4:1 serializer. The 32:8 serializer is composed of several 2:1 serializers. The most significant bit (MSB) and least significant bit (LSB) drivers are divided into 20 and 10 segments, respectively. In each MSB/LSB driver segment, a 2:1 multiplexer (MUX) selects the current or 1-UI delayed data to implement two-tap feed-forward equalization (FFE). The FFE strength is controlled by adjusting the number of segments operating with the current or 1-UI delayed data, and its coefficient can be adjusted up to 13.98 dB. After the ZQ code table is filled by the ZQ calibration, an adaptive ZQ code generator changes the ZQ codes depending on the data pattern and transmits them to the output driver.
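The text does not state how the FFE coefficient in dB maps to the number of driver segments. One possible reading, offered here only as an assumption, is the usual two-tap de-emphasis ratio: the transition swing is proportional to all segments, and the steady-state swing is proportional to the segments on the current data minus those on the 1-UI delayed data. The Python sketch below uses the 30 segments (20 MSB plus 10 LSB) from the text; the function name and the segment splits are hypothetical.

```python
import math

TOTAL_SEGMENTS = 30  # 20 MSB + 10 LSB driver segments, as stated in the text


def ffe_boost_db(delayed_segments: int, total: int = TOTAL_SEGMENTS) -> float:
    """Two-tap FFE boost in dB under the assumed de-emphasis model.

    delayed_segments: segments driven by the 1-UI delayed data (assumption:
    these oppose the main tap in steady state).
    """
    main_segments = total - delayed_segments
    steady_state = main_segments - delayed_segments
    if steady_state <= 0:
        raise ValueError("delayed tap must be weaker than the main tap")
    return 20 * math.log10(total / steady_state)


for delayed in (3, 12):
    print(f"{delayed:2d} delayed segments -> {ffe_boost_db(delayed):.2f} dB")
```

Under this assumed model, 3 delayed segments give about 1.94 dB and 12 give about 13.98 dB, which coincide with the FFE coefficients quoted in this paper; the match is suggestive but does not confirm that the prototype uses these exact splits.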
To select the appropriate ZQ codes in the adaptive ZQ code generator according to the output signal level, the ZQ calibration is first performed for all PAM-4 levels, reflecting the FFE strength, and the corresponding ZQ codes are then stored in the ZQ code table. In this prototype, the ZQ calibration is performed manually. Figure 3 shows an example of the manual ZQ calibration using a simplified driver circuit diagram when both the current and 1-UI delayed data are "11". First, we determine the target FFE strength, α, considering the channel loss; this example chooses α as 1. Since the on-resistance, Ron, of each transistor should be 1.5 kΩ to match the channel impedance (Z0) of 50 Ω, the output signal level is 50 mV. We then fix OUT to 50 mV in order to find the PU codes at this output level. With the FFE strength of 1, the pull-up resistance should be 500 Ω; thus, the flowing current should be 1.9 mA. After finding the PU codes that achieve this current value, we store these codes in the ZQ code table. Finally, we perform the same process to find the PD codes and store them in the ZQ code table. The ZQ calibration for the other PAM-4 signal levels follows, yielding the ZQ codes for all signal levels. Through this process, the ZQ code table is filled.
This calibration can be implemented on-chip and performed automatically during the training sequence when applied to the memory system.
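The numbers in the Figure 3 example can be cross-checked with a short calculation. The Python sketch below is illustrative only and rests on assumptions that are not stated in the text: a 1.0 V driver supply, a 50 Ω receiver termination to ground, and a split of 3 pull-up and 27 pull-down segments out of the 30 segments. These values were chosen because they reproduce the 50 Ω, 500 Ω, 50 mV, and 1.9 mA figures above.

```python
SEGMENTS = 30        # 20 MSB + 10 LSB segments, from the text
R_SEGMENT = 1500.0   # on-resistance per segment in ohms, from the text
Z0 = 50.0            # channel impedance in ohms, from the text
VDDQ = 1.0           # ASSUMPTION: supply voltage in volts
PU_SEGMENTS = 3      # ASSUMPTION: segments pulling up in this example


def parallel(*resistors: float) -> float:
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistors)


# All segments in parallel set the driver output impedance.
r_driver = R_SEGMENT / SEGMENTS

# Pull-up and pull-down branch resistances for the assumed split.
r_pu = R_SEGMENT / PU_SEGMENTS
r_pd = R_SEGMENT / (SEGMENTS - PU_SEGMENTS)

# ASSUMPTION: the receiver terminates the line with Z0 to ground.
r_to_ground = parallel(r_pd, Z0)

v_out = VDDQ * r_to_ground / (r_pu + r_to_ground)
i_pu = (VDDQ - v_out) / r_pu

print(f"driver impedance : {r_driver:.1f} ohm")
print(f"pull-up branch   : {r_pu:.1f} ohm")
print(f"output level     : {v_out * 1e3:.1f} mV")
print(f"pull-up current  : {i_pu * 1e3:.2f} mA")
```

An on-chip calibration loop would sweep the PU code until the measured current at the fixed output level reaches the target computed this way; that loop is not shown because the text does not describe its implementation.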

The ZQ code table has two 4 × 5 structures for the PU/PD codes, as shown in Figure 4. During the ZQ calibration, the ZQ code table is filled with 0 or 1 for all output signal levels. The adaptive ZQ code generator consists of ten 4:1 MUXs. This generator uses the MSB/LSB data to adaptively select codes from the ZQ code table and transmits the corresponding codes to the output driver as PU<4:0> and PD<4:0>. A mismatch in propagation delay between the transmitted codes and the data can degrade the overall performance of our structure. Therefore, a ZQ code generator replica is placed at each driver segment as a delay-matching component, as shown in Figure 2.
Figure 5a shows an output driver circuit diagram using a data and driver code encoder [5,13]. When the encoder is placed in front of the output driver, the data and the driver code are encoded, and the encoded data are then sent to the output driver; this keeps the output driver configuration simple. However, the propagation delay of the data path increases by the delay of the encoder, increasing the power-supply-induced jitter and deteriorating the output drift characteristics in memory interfaces [14]. The proposed output driver in Figure 5b alleviates these issues by removing the encoder. Each driver segment has a source-series terminated structure. All segments share a 71 Ω passive resistor to improve linearity, and the 5 bit PU/PD codes from the adaptive ZQ code generator control the driver's on-resistance.
Figure 5. (a) Output driver using a data and driver code encoder [5,13] and (b) the proposed output driver.
The operation of the conventional [5] and proposed PAM-4 output drivers is shown in Figure 6a. In the conventional driver for memory interfaces [5], after a ZQ calibration based on one signal level, the PU/PD driver codes are fixed during the burst operation; thus, VDS fluctuation can vary the transistors' on-resistance, leading to RLM degradation and impedance mismatch. Although the previous PAM-4 transmitter with three-point ZQ calibration [13] changes the driver codes according to the data pattern, this structure uses the driver codes obtained from the −1 output level when transmitting the −3 output level; this leads to signal reflection. To alleviate these issues, the PU/PD codes of our driver adaptively change in real time for each output signal level after the ZQ calibration is performed at all four signal levels. Figure 6b shows one example of PD code generation. When the data pattern changes to '11', '10', '01', and '00', PD<4:0> changes to '01010', '10010', '10011', and '10111', respectively (in the conventional structure [5], however, PD<4:0> is fixed to '01010' regardless of the data pattern). PU<4:0> changes in the same way.
Figure 6. (a) Operation for conventional [5] and proposed PAM-4 driver and (b) one example of the adaptive PD code generation in the proposed PAM-4 transmitter.
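The code selection in the Figure 6b example can be modeled behaviorally as a table lookup. The Python sketch below is a minimal illustration, not the circuit itself: it assumes that the first bit of the quoted data pattern is the MSB, and it models one 4:1 MUX per code bit (five for PD<4:0>; presumably the other five of the ten MUXs serve PU<4:0>). The PD values are those quoted in the text; the PU table is omitted because its values are not given. All names are hypothetical.

```python
# PD codes from the Figure 6b example, indexed by the 2-bit data pattern.
PD_CODE_TABLE = {
    0b11: 0b01010,
    0b10: 0b10010,
    0b01: 0b10011,
    0b00: 0b10111,
}
CODE_WIDTH = 5  # PD<4:0>


def mux4(select: int, inputs: list[int]) -> int:
    """Behavioral 4:1 multiplexer: pass the input chosen by `select`."""
    return inputs[select]


def select_code(msb: int, lsb: int, table: dict[int, int]) -> int:
    """Pick the calibrated code for the current symbol, one MUX per bit."""
    select = (msb << 1) | lsb  # ASSUMPTION: pattern 'ab' means MSB=a, LSB=b
    code = 0
    for bit in range(CODE_WIDTH):
        # One column of the 4 x 5 table feeds one 4:1 MUX.
        column = [(table[pattern] >> bit) & 1 for pattern in range(4)]
        code |= mux4(select, column) << bit
    return code


for pattern in (0b11, 0b10, 0b01, 0b00):
    code = select_code(pattern >> 1, pattern & 1, PD_CODE_TABLE)
    print(f"data {pattern:02b} -> PD<4:0> = {code:05b}")
```

A conventional driver would correspond to a table whose four rows all hold the same code, so the output of the lookup would not depend on the data.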

Measurement Results
A prototype of the proposed PAM-4 transmitter was fabricated in a 65 nm CMOS process. Figure 7 shows the measurement setup and a die micrograph with a magnified layout. The total active area of the transmitter is 0.035 mm². The differential clock signals CLKP and CLKN generated by the signal quality analyzer (Anritsu MP1800A) are transmitted to the printed circuit board (PCB), and the output signal passes through a channel with an insertion loss of −3.02 dB at 4.5 GHz before being measured on an oscilloscope (Tektronix MSO73304DX). The operation of the transmitter can be controlled externally by means of an inter-integrated circuit (I2C) interface.

Figure 8 shows the measured PAM-4 eye diagrams without the proposed method (Figure 8a) and with it (Figure 8b). When the proposed method is applied, the eye distortion is compensated; the voltage gaps between signal levels are 164 mV, 160 mV, and 155 mV, respectively, achieving an improved RLM of 0.971 (Figure 8b). With this method, the worst eye opening also increases to 75.9 mV. Although this method achieves better RLM performance, its single-ended signaling may make the PAM-4 eye diagram look somewhat less symmetrical than an eye diagram based on differential signaling [11,12].
Figure 9 shows the measured BER bathtub curves of the upper, middle, and lower eyes at 18 Gb/s/pin for the conventional and proposed PAM-4 transmitters, using the PRBS-7 data pattern and an FFE coefficient of 1.94 dB. When the proposed method is not applied, the minimum horizontal margin at a BER of 10⁻¹² is only 0.05 UI (Figure 9a). With the proposed PAM-4 transmitter, the minimum horizontal margin increases to 0.22 UI at a BER of 10⁻¹² (Figure 9b).
In Tables 1 and 2, the performance of our prototype is summarized and compared with those of other recent voltage-mode PAM-4 transmitters. Our transmitter can match the channel impedance at all four PAM-4 signal levels while achieving good RLM performance by adaptively adjusting the ZQ code according to the data pattern. The ZQ codes are transmitted to the output driver in parallel with the data. Thus, the encoder block in front of the output driver can be removed, decreasing the data path delay and improving the output drift characteristic. Although the differential PAM-4 transmitters [8,11] have better energy efficiency, the differential architecture with an on-chip voltage regulator cannot be adopted in memory interfaces. The previous single-ended PAM-4 transmitter [13] has better RLM and energy efficiency; however, the impedance values of its NMOS-only driver are vulnerable to VT variations, and circuits for the ZQ calibration are needed even in the main path. Even though the PAM-4 transmitter using the differential ternary R-2R DAC [15] also shows better RLM performance, its impedance calibration is not performed at all signal levels. Furthermore, the bootstrapping method of the R-2R switch is susceptible to PVT variations, which can vary the output driver's on-resistance.
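The reported RLM can be related to the three measured level gaps of Figure 8b with a short calculation. The paper does not spell out its RLM formula; the Python sketch below assumes one common definition, three times the smallest gap divided by the total swing, which happens to reproduce the reported value. The function name is hypothetical.

```python
def rlm_from_gaps(gaps_mv: list[float]) -> float:
    """Level separation mismatch ratio from the three PAM-4 level gaps.

    ASSUMPTION: RLM = 3 * (smallest gap) / (sum of the three gaps), so that
    perfectly uniform spacing yields exactly 1.0.
    """
    if len(gaps_mv) != 3:
        raise ValueError("PAM-4 has exactly three gaps between four levels")
    return 3 * min(gaps_mv) / sum(gaps_mv)


measured_gaps_mv = [164.0, 160.0, 155.0]  # from the Figure 8b description
print(f"RLM = {rlm_from_gaps(measured_gaps_mv):.3f}")
```

With the measured gaps, this evaluates to approximately 0.971, matching the value reported for the proposed transmitter.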

Conclusions
We have presented an 18 Gb/s/pin single-ended PAM-4 transmitter for memory interfaces with adaptive impedance matching and output level compensation. The driver codes for all PAM-4 signal levels are stored in a ZQ code table after a ZQ calibration. Using an adaptive ZQ code generator, the output driver adjusts the output signal level and impedance at all four signal levels to compensate for the VDS variation caused by the output level change. The transmitter achieves an RLM of 0.971 and an energy efficiency of 3.09 pJ/bit/pin at 18 Gb/s/pin.