Experimental Efficiency Evaluation of Stacked Transistor Half-Bridge Topologies in 14 nm CMOS Technology

Abstract: Different Half-Bridge (HB) converter topologies for an Integrated Voltage Regulator (IVR) intended for a microprocessor application were evaluated. The HB circuits were implemented with Stacked Transistors (HBSTs) in a cutting-edge 14 nm CMOS technology node in order to enable the integration on the microprocessor die. Compared to a conventional realization of the HBST, it was found that the Active Neutral-Point Clamped (ANPC) HBST topology with Independent Clamp Switches (ICSs) not only ensured balanced blocking voltages across the series-connected transistors, but also featured a more robust operation and achieved higher efficiencies at high output currents. The IVR achieved a maximum efficiency of 85.3% at an output current of 300 mA and a switching frequency of 50 MHz. At the maximum measured output current of 780 mA, the efficiency was 83.1%. The active part of the IVR (power switches, gate drivers, and level shifters) realized a high maximum current density of 24.7 A/mm².


Introduction
Modern high-performance microprocessors feature tens of processor cores [1], operate with clock frequencies up to 5 GHz [2], and, with this, achieve previously unmatched computational performance. However, this performance increase is accompanied by an increase of the required power, which can exceed 150 W per device [3]. In this context, the concept of Dynamic Voltage and Frequency Scaling (DVFS), in combination with a granular power delivery system that features independent Voltage Domains (VDs) for different cores or parts of cores, can achieve an effective reduction of the power demand [4]. By way of example, it was found in [5] that a video decoder test chip, which employs multiple different VDs with configurable supply voltages (two defined voltages: 1.6 V and 2.2 V), can achieve a reduction of the supplied power by 55% to 61%. In addition, it is desirable to reduce the high package currents of modern microprocessors. Reduced package current and adjustable supply voltages for the different VDs can be achieved with buck-type Integrated Voltage Regulators (IVRs) that are located in the package of the microprocessor or on the microprocessor die [6]. Table 1 summarizes the main specifications of the considered IVR.
Common topologies for IVRs are Switched Capacitor Converters (SCCs) and buck converters with output inductors. Numerous examples of SCCs are documented in the literature, e.g., a standalone SCC topology with regulated output voltage (using pulse

Table 1. Main specifications of the considered IVR.

Parameter                              Value
Input voltage, V in                    1.6 V
Range of output voltage, V out         0.8 V to 1.2 V
Maximum output current, I out,max      800 mA
Switching frequency range, f sw        50 MHz to 150 MHz

This paper presents an evaluation of an inductor-based buck-type IVR whose power stage comprises four converter phases (half-bridges) that are operated in parallel with interleaving. The power stage was implemented in the high-end 14 nm CMOS technology node of the microprocessor, in order to allow for on-die integration, and employed short-channel transistors, instead of long-channel devices, to take advantage of their low conduction and switching losses [16]. With this, a high switching frequency of up to 150 MHz can be achieved, which enables the fast transient response that is needed for microprocessor applications [17]. However, short-channel transistors are subject to a reduced breakdown voltage. For this reason, the power switches of each half-bridge were realized with stacked transistors (CMOS Half-Bridges with Stacked Transistors (HBSTs)) [18], depicted in Figure 1a, in order to reduce the voltage applied to each power transistor to half of the input voltage of the CMOS HBST. (Please note that all circuits depicted in Figure 1 refer to Phase 1 of the four-phase IVR and include level shifters, which are needed to provide the correct voltage levels for the gate drivers connected to TP 2 and TN 3 .) The CMOS HBST has been used in IVRs of commercial microprocessors built in the most recent tri-gate technology nodes, e.g., 22 nm [13] and 14 nm [19]. Although high efficiencies are achievable using the CMOS HBST, e.g., 88% in [14], the blocking voltages of the stacked transistors can be unequal, which leads to reduced reliability and increased losses [20].
To achieve equal blocking voltages, the CMOS HBST with Active Neutral Point Clamping (CMOS ANPC HBST), depicted in Figure 1b, was considered, which adds clamp switches to the CMOS HBST to actively clamp the potential between the stacked transistors to a middle potential [21]. The CMOS ANPC HBST uses common gate drivers for the main switches and the respective clamp switches. Therefore, the clamp switches TN 3 and TP 3 are both turned on if TP 2 and TN 2 are both turned off. In addition, TP 1 starts to conduct if its gate-to-drain voltage is less than −0.3 V and TN 1 starts to conduct for a gate-to-drain voltage higher than 0.3 V, which is found in Section 2, in the course of the investigation of the simulation results.
For this reason, the output of the power stage cannot be switched to high impedance. Accordingly, phase shedding, which is used to increase part-load efficiency in multi-phase converters [22], is not possible. The CMOS ANPC HBST with Independent Clamp Switches (ICSs) of Figure 1c eliminates this shortcoming by using separate gate drivers for the clamp switches [20]. Please note that Figure 1c defines all currents and voltages of the waveforms discussed in Section 2. Furthermore, the symbols ck GP,H,1 , ck GN,H,1 , ck GP,L,1 , and ck GN,L,1 in Figure 1c refer to digital gate signals that stem from the circuitry explained in Section 3.

Figure 1. Investigated half-bridge topologies (Phase 1 of the four-phase IVR): (a) CMOS HBST; (b) CMOS ANPC HBST; (c) CMOS ANPC HBST with ICS. Adapted from [6,20,23]. Copyright © 2017 by IEEE. Adapted with permission.
The three investigated topologies, HBST, ANPC HBST, and ANPC HBST with ICS, mainly differ with regard to their behavior during switching. Accordingly, the switching operations are inspected in a first step in Section 2 based on simulation results in order to gain a deeper understanding of the different topologies. However, the conducted simulations disregard different loss components, e.g., due to the Power Distribution Network (PDN) and the metal layers of the chip. In order to provide a robust comparison of the three topologies, experimental efficiency results are used to assess the topologies. In this regard, Section 3 summarizes the implementation details of the realized IVR. Section 4 presents the results of the experimental evaluation. Section 5 provides a final discussion. With this, the paper provides the two key contributions listed below.

1. First, the experimental validation of a CMOS ANPC HBST with ICS that is realized in 14 nm CMOS technology;
2. Second, a comparative evaluation of the conventional CMOS HBST, the conventional CMOS ANPC HBST, and the CMOS ANPC HBST with ICS based on measured efficiencies.

Investigation of the Switching Operations
This section investigates the switching operations of the three considered circuit topologies using simulated waveforms, i.e., the CMOS HBST in Section 2.1 and the CMOS ANPC HBST as well as the CMOS ANPC HBST with ICS in Section 2.2, and discusses the main findings in Section 2.3. With this, the section provides a knowledge basis for the discussions presented in the subsequent sections of this paper. It summarizes the findings of a previous conference publication [20]. All simulations were conducted with the circuit simulator that is part of the Virtuoso Custom IC Design Environment by Cadence (Version ICADV12.1), which, for the employed 14 nm CMOS technology, was the only simulation tool available to our research group in the scope of this work. By way of example, Converter Phase 1 of the four-phase IVR was selected for this purpose.

Figure 2 presents typical gate voltages that are used to generate an output voltage, v x,1 , with a defined duty cycle at the switching node of the buck converter. Figure 2a depicts the gate voltages v GP,H,1 and v GN,L,1 that are applied to the transistors TP 2 and TN 2 , respectively. These waveforms are valid for all three topologies shown in Figure 1. Figure 2b shows the additional gate voltages, v GN,H,1 and v GP,L,1 , for the clamping switches TN 3 and TP 3 of the CMOS ANPC HBST with ICS. The presented gate voltages are measured with respect to the minus terminal of the power stage. (Please note that TP 1 , TP 2 , and TP 3 are p-type MOSFETs; for this reason, negative gate-to-source voltages are needed to turn on TP 1 , TP 2 , and TP 3 . In addition, the gate voltages of TP 2 and TN 3 , v GP,H,1 and v GN,H,1 , feature an offset of V in /2, which is needed to keep the transistors' gate-to-source voltages within the allowable voltage range.)
In the case of the CMOS HBST, solely the switching states of TP 2 and TN 2 determine the switching states of TP 1 and TN 1 , respectively, e.g., if TP 2 is in the on-state and TN 2 in the off-state (with an assumed drain-to-source voltage of V in /2), the gate-to-source voltages of TP 1 and TN 1 are equal to −V in /2 and zero, respectively. Accordingly, TP 1 is in the on-state and TN 1 in the off-state. Figure 2a,b reveals the dead times between subsequent turn-off and turn-on events, to avoid short-time short circuits in the HBST, and Figure 2c illustrates the waveform of the voltage at the switching node, v x,1 , that results for the gate voltages of Figure 2a.

The simulated waveforms of the transistor currents and drain-source voltages during the time intervals where v x,1 changes from V in to zero and vice versa are presented in Figures 3-5 for the CMOS HBST, the CMOS ANPC HBST, and the CMOS ANPC HBST with ICS topologies, respectively. All simulations considered the settings of Table 1 and a constant output current of 250 mA (i.e., a negligible output current ripple was assumed) and disregarded the implications of the metal layers and the power distribution network of the IVR's Power Management IC (PMIC) on the waveforms.

Figure 3. Simulated waveforms of the CMOS HBST for the settings of Table 1 and i x,1 = 250 mA: (a-d) falling edge of v x,1 ; (e-h) rising edge of v x,1 . Subfigures (d,h) show the instantaneous powers in the power transistors of the HBST. Adapted from [20]. Copyright © 2017 by IEEE. Adapted with permission.

Figure 4. Simulated waveforms of the CMOS ANPC HBST for the settings of Table 1 and i x,1 = 250 mA: (a-e) falling edge of v x,1 ; (f-j) rising edge of v x,1 . Subfigures (e,j) show the instantaneous powers in the power transistors of the HBST. Adapted from [20]. Copyright © 2017 by IEEE. Adapted with permission.

Figure 5. Simulated waveforms of the CMOS ANPC HBST with ICS for the settings of Table 1 and i x,1 = 250 mA: (a-e) falling edge of v x,1 ; (f-j) rising edge of v x,1 . Subfigures (e,j) show the instantaneous powers in the power transistors of the HBST. Adapted from [20]. Copyright © 2017 by IEEE. Adapted with permission.
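The static gate-to-source voltages quoted above for the conventional HBST can be reproduced with a few lines of arithmetic. The following Python sketch is only an illustration: it assumes that the gates of TP 1 and TN 1 are tied to the mid-point potential V in /2, which is consistent with the values given in the text but is not stated explicitly there, and the function name is hypothetical.

```python
# Minimal sketch of the static gate-to-source voltages of TP1 and TN1 in the
# conventional CMOS HBST.
# Assumption (not stated explicitly in the text): the gates of TP1 and TN1
# are tied to the mid-point potential V_in / 2.

V_IN = 1.6  # input voltage in volts (Table 1)


def inner_gate_source_voltages(tp2_on: bool, v_in: float = V_IN) -> dict:
    """Return v_gs of TP1 and TN1 for the two static switching states."""
    v_mid = v_in / 2
    if tp2_on:
        # TP2 conducts: the source of TP1 sits at V_in.
        # TN2 blocks V_in / 2: the source of TN1 sits at V_in / 2.
        source_tp1, source_tn1 = v_in, v_mid
    else:
        # TN2 conducts: the source of TN1 sits at 0 V.
        # TP2 blocks V_in / 2: the source of TP1 sits at V_in / 2.
        source_tp1, source_tn1 = v_mid, 0.0
    return {
        "v_gs_TP1": v_mid - source_tp1,
        "v_gs_TN1": v_mid - source_tn1,
    }


if __name__ == "__main__":
    print("TP2 on, TN2 off:", inner_gate_source_voltages(tp2_on=True))
    print("TP2 off, TN2 on:", inner_gate_source_voltages(tp2_on=False))
```

For TP 2 in the on-state, the sketch yields −V in /2 for TP 1 (p-type, on) and zero for TN 1 (off), in agreement with the description above.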

Conventional HBST
The conventional CMOS HBST is considered in a first step. In the case of a falling edge of v x,1 , first, TP 2 is commanded to switch off at t = t 0 , as shown in Figure 3a (please note that the definitions of all currents and voltages used in Figure 3 are given in Figure 1c). The constant output current charges the output capacitance of TP 2 , and v sd,TP2 increases, which is depicted in Figure 3b. As a consequence, the gate-to-source capacitance of TP 1 is discharged, and subsequently, TP 1 is turned off. With increasing source-to-drain voltages of TP 2 and TP 1 , the drain-to-source voltages of TN 1 and TN 2 decrease, as shown in Figure 3c, in order to keep the sum v sd,TP2 + v sd,TP1 + v ds,TN1 + v ds,TN2 equal to the input voltage.
Accordingly, very small turn-on losses can be achieved during t 1 < t < t 2 , if the dead time is sufficiently large, such that v ds,TN1 and v ds,TN2 reach zero before TN 2 is commanded to turn on at t = t 1 . Figure 3d depicts the instantaneous losses of the four power transistors, i.e., the products of drain-source voltages and drain currents (not considering the gate driver losses). This result reveals that a large part of the stored energy can be recycled, leading to low total switching losses. However, after the switching operation has elapsed, for t > t 2 , the simulation computes unequal source-to-drain voltages for TP 1 and TP 2 , i.e., v sd,TP1 ≠ v sd,TP2 applies.
The rising edge of v x,1 is initiated by commanding TN 2 to turn off at t = t 3 , which is depicted in Figure 3e. During the dead time interval, t 3 < t < t 4 , TN 1 remains in the on-state, and the high-side transistors TP 1 and TP 2 remain off. As a consequence, TN 1 conducts the output current, i x,1 , which charges the output capacitance of TN 2 until v ds,TN2 reaches approximately −0.4 V; cf. Figure 3g. With this, the gate-to-drain voltage of TN 2 , v gd,TN2 , reaches approximately 0.4 V, which leads to a turn-on of TN 2 , i.e., TN 2 conducts the output current. As a result, increased conduction losses occur in TN 2 during the dead time. Furthermore, the negative value of v ds,TN2 leads to an overvoltage condition for TP 1 . (The current that charges the output capacitance of TP 1 , i d,TN1 in Figure 3g, finds a low impedance path (to V in /2) in the large input capacitance of TP 1 . For this reason, only negligible charging current remains for TP 2 , leading to a negligible overvoltage across TP 2 during the dead time interval.) At t = t 4 , TP 2 is commanded to turn on. The associated turn-on processes force v sd,TP2 and v sd,TP1 to decrease to zero, as shown in Figure 3f. During turn-on, large spikes are observed in the drain currents of TN 1 , TP 1 , and TP 2 . Figure 3h depicts the instantaneous losses in the transistors, which are particularly high for TP 1 and TP 2 . According to Figure 3g, TN 1 and TN 2 are subject to unequal drain-to-source voltages after the switching operation has elapsed, i.e., for t > t 5 .

ANPC HBST without and with ICS
Figure 4 depicts the waveforms of the same physical variables as Figure 3 and, in addition, also shows the drain-to-source voltages and the drain currents of the clamping switches, TN 3 and TP 3 in Figure 4d,i. Compared to the CMOS HBST, the clamping switches TN 3 and TP 3 are both turned on during the dead time and define the drain potentials of TP 2 and TN 2 during this time.
In case of a negative slope of v x,1 , the turn-on of TN 3 leads to increased switching losses during t 0 < t < t r and the turn-on of TN 2 to increased switching losses during t 1 < t < t 2 . In addition, an increased forward voltage drop across TN 1 , which conducts the output current during the dead time, of v ds,TN1 ≈ −0.3 V, is found during t r < t < t 1 , which leads to increased conduction losses during this time interval. (The constant output current charges the output capacitance of TN 1 until v ds,TN1 ≈ −0.3 V applies, which increases the gate-to-drain voltage of TN 1 to approximately 0.3 V, since TP 3 is turned on. As a consequence, TN 1 is forced to conduct the output current (via TP 3 ) during the dead time interval, t r < t < t 1 .) In the case of a positive slope of v x,1 , the turn-on losses are found to be less than for the HBST topology, because the active clamp switches enforce a reduction of v sd,TP1 during the dead time t 3 < t < t 4 , which also decreases the value of v sd,TP1 during the turn-on time interval, t 4 < t < t 5 , as shown in Figure 4g. Furthermore, equal blocking voltages are achieved after both switching operations, i.e., v sd,TP1 = v sd,TP2 for t > t 2 in Figure 4b and v ds,TN1 = v ds,TN2 for t > t 5 in Figure 4h.
The waveforms of the transistor voltages and currents simulated for the CMOS ANPC HBST with ICS are depicted in Figure 5. Compared to the CMOS ANPC HBST without ICS, the main switches and the clamp switches are commanded to turn off during every dead time. This can be seen in Figure 5a,f, which presents the transistors' gate signals for a falling and a rising edge of v x,1 , respectively. For this reason, the switching operations are the same as for the conventional HBST described in Section 2.1 during t 0 < t < t 1 and t 3 < t < t 4 . However, at the end of the dead time interval, either TN 3 (at t = t 1 in Figure 5a) or TP 3 (at t = t 4 in Figure 5f) is turned on in addition to the two main switches (TN 1 and TN 2 or TP 1 and TP 2 , respectively). With this, equal drain-to-source voltages of the main power switches are enforced as soon as the respective clamp switch is in the on-state, i.e., for t > t 2 in Figure 5b and for t > t 5 in Figure 5h.

Discussion
A comparison of the instantaneous switching losses during the rising edge of v x,1 of the ANPC HBST without and with ICS revealed lower losses of 46 pJ for the ANPC HBST without ICS, compared to 55 pJ for the ANPC HBST with ICS (for the instantaneous power waveforms depicted in Figures 4j and 5j). However, in the case of a falling edge of v x,1 , the ANPC HBST without ICS is subject to higher losses of 37 pJ (Figure 4e), compared to 5 pJ for the ANPC HBST with ICS (Figure 5e). In addition, the switching losses depend on the dead times, dt ↑ and dt ↓ . Accordingly, it is not directly possible to draw a meaningful conclusion. For this reason, the characteristics of the overall simulated efficiencies of the three investigated topologies are compared to each other in Figure 6. The efficiencies were determined for different dead times during the falling edge of v x,1 , dt ↓ , for the settings listed in Table 1, D = 50%, I x,1 = 250 mA, f sw = 150 MHz, and the minimum configurable dead time during the rising edge of v x,1 , dt ↑ , of 40 ps, which leads to minimum switching losses during the rising edge of v x,1 . As expected, the conventional ANPC HBST featured the best results for very small values of dt ↓ , due to the additional losses during the dead time, as shown in Figure 4. The efficiency characteristics of the conventional HBST and the ANPC HBST with ICS were approximately parallel, which was attributed to the similar processes that occur during the corresponding dead times. However, the conventional HBST generated higher losses in TP 1 and TP 2 than the ANPC HBST with ICS during the rising edge of v x,1 ; cf. Figures 3h and 5j. In summary, the ANPC HBST with ICS achieved the highest efficiency for dt ↓ = 200 ps overall; for dt ↓ < 200 ps, the efficiency decreased due to increasing turn-on losses (during t 1 < t < t 2 in Figure 5) and for dt ↓ > 200 ps due to increased conduction losses during the dead time.  
Figure 6. Simulated efficiencies of the three investigated topologies for the settings of Table 1, D = 0.5, I x,1 = 250 mA, f sw = 150 MHz, dt ↑ = 40 ps, and variable dead time dt ↓ . Adapted from [20]. Copyright © 2017 by IEEE. Adapted with permission.
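To put the per-edge loss energies quoted in this discussion into perspective, they can be summed over one switching period and multiplied by the switching frequency. The Python sketch below does this for the two ANPC variants. It assumes that the quoted energies also apply at f sw = 150 MHz and that no other loss components are included; since the energies depend on the dead times, the numbers only illustrate the dead-time settings of Figures 4 and 5. The function and variable names are hypothetical.

```python
# Minimal sketch: average switching power from the per-edge loss energies
# quoted in the text (rising and falling edge of v_x,1).
# Assumptions: the energies also apply at f_sw = 150 MHz, and one rising plus
# one falling edge occur per switching period.

F_SW_HZ = 150e6  # switching frequency used for Figure 6

EDGE_ENERGIES_PJ = {
    "ANPC HBST without ICS": {"rising": 46.0, "falling": 37.0},
    "ANPC HBST with ICS": {"rising": 55.0, "falling": 5.0},
}


def switching_power_mw(edge_energies_pj: dict, f_sw_hz: float = F_SW_HZ) -> float:
    """Sum both edge energies (in pJ) and convert to average power in mW."""
    energy_per_period_j = sum(edge_energies_pj.values()) * 1e-12
    return energy_per_period_j * f_sw_hz * 1e3


SWITCHING_POWER_MW = {
    name: switching_power_mw(energies)
    for name, energies in EDGE_ENERGIES_PJ.items()
}

if __name__ == "__main__":
    for name, power in SWITCHING_POWER_MW.items():
        print(f"{name}: {power:.2f} mW per converter phase")
```

Under these assumptions, the variant with ICS dissipates 60 pJ per period compared to 83 pJ without ICS, even though its rising-edge loss is higher.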

Realized IVR
The realized hardware demonstrator was a four-phase IVR, which comprised the Power Management IC (PMIC) bonded to a PCB. Figure 7a shows a picture of the hardware demonstrator. The PCB contains the PMIC, four discrete output inductors (one PFL1005-36NMR device, manufactured by Coilcraft, for each converter phase), and additional buffer capacitors. The PMIC contains four power stages, all gate drivers (each gate driver consists of a three-stage gate driver circuit, as explained in [6]), a high-frequency digital PWM, a configurable load resistor that emulates the power dissipation of a microprocessor, and limited internal buffer capacitances to stabilize the voltages at the input terminal (V in ) and the mid-point terminal (V m ), both with respect to the ground. Figure 7b depicts the chip layout of the power stage of one converter phase. The Front-End-Of-Line (FEOL) area of the HB is 0.0081 mm². The switching frequency, f sw , can be adjusted between 50 MHz and 150 MHz. A detailed explanation of the realized demonstrator was given in [6].

The circuits depicted in Figure 8b,c were separately implemented for each converter phase. The PWM unit receives the input clock signal, ck in , and generates a PWM signal, ck, that features 16 discrete duty cycles. The output of the PWM unit is connected to the configurable delay block, which realizes the phase-shifted gate signals ck {1,2,3,4} , in order to enable interleaving. Each dead time generation unit uses one output signal of the configurable delay block, e.g., ck 1 in the case of Converter Phase 1, to generate the gate signals depicted in Figure 8a and includes an output enable logic that is controlled by en 1 to allow for the deactivation of selected converter phases (phase shedding). Two four-bit values, which represent dt ↑ and dt ↓ , are used to configure the respective dead times.
Finally, the multiplexer circuit employs the two-bit configuration signal cfg_out to configure how the gate signals for the clamp switches are generated. With this, the three investigated topologies can be emulated: HBST (Position 1), ANPC HBST (Position 2), and ANPC HBST with ICS (Position 3).
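The phase shifts that the configurable delay block has to realize for interleaving can be estimated as follows. This Python sketch assumes the common choice of evenly spaced phase shifts, i.e., a delay of k·T/N for phase k of N phases; the actual delay settings of the demonstrator are not given in the text, and the function name is hypothetical.

```python
# Minimal sketch: time delays of the phase-shifted gate signals ck_1..ck_4
# for an interleaved four-phase converter.
# Assumption: evenly spaced phase shifts of T/N between neighboring phases;
# the delay values used in the demonstrator are not stated in the text.


def interleaving_delays_ns(f_sw_hz: float, n_phases: int = 4) -> list:
    """Return the delay of each phase in nanoseconds, relative to phase 1."""
    period_ns = 1e9 / f_sw_hz
    return [k * period_ns / n_phases for k in range(n_phases)]


if __name__ == "__main__":
    for f_sw in (50e6, 150e6):  # limits of the switching frequency range
        delays = interleaving_delays_ns(f_sw)
        formatted = ", ".join(f"{d:.3f} ns" for d in delays)
        print(f"f_sw = {f_sw / 1e6:.0f} MHz: {formatted}")
```

Under this assumption, the delay between neighboring phases ranges from 5 ns at 50 MHz down to approximately 1.67 ns at 150 MHz.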

Experimental Setup
The experimental setup used to measure the efficiency of the first converter phase of the four-phase IVR is shown in Figure 9. All currents and voltages were measured with a precision data acquisition unit (34970 by Keysight). The chip provides a Kelvin-pad at the input (V in,k in Figure 9) that allows for a measurement of the chip-internal input voltage, i.e., without the voltage drops on the input-side bond wire and PCB tracks. Accordingly, the voltage between V in,k and ground (gnd) estimates the voltage across the half-bridge more accurately than the voltage between V in,p and gnd. Reference [6] gave a detailed description of this measurement setup.

Output Voltage Characteristic
The measured ratios of output voltage to input voltage, M k (D, I out ), for the HBST and the ANPC HBST with ICS at f sw = 150 MHz are depicted in Figure 10.
In most operating regions, M k (D, I out ) is proportional to the duty cycle, D. However, at an output current of I out = 260 mA, M k of the HBST is non-linear for D ∈ [50%, 80%], which is similar to the result obtained in [24]. This is presumably due to an unstable chip-internal supply voltage of the HB and was solved in [24] by using a capacitive interposer that reduces the parasitic inductances between on-chip and off-chip buffer capacitances. The ANPC HBST with ICS is more robust in this regard, i.e., M k remains proportional to D, since the clamp switch TP 3 can take over a part of the excess current in the commutation loop during the rising edge of the switched voltage, v x,1 ; cf. Figure 3f (HBST) and Figure 5g (ANPC HBST with ICS; the peak of i d,TP2 decreases from 3.2 A to 2.5 A). This leads to a reduction of the current peaks in the on-chip input capacitor that is connected between V in1,2 and ground.
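For orientation, the ideal (lossless) buck converter in continuous conduction mode gives a conversion ratio equal to the duty cycle, so the specified output voltage range can be mapped to duty-cycle settings. The Python sketch below assumes that the 16 discrete duty cycles of the PWM unit are spaced as k/16; this spacing is an assumption, since the exact values are not given in the text, and the measured ratios M k are lower than the ideal ones because of losses.

```python
# Minimal sketch: ideal output voltages of a buck converter, V_out = D * V_in,
# for 16 discrete duty-cycle settings.
# Assumptions: lossless converter in continuous conduction mode and duty
# cycles D = k / 16 (the actual PWM step values are not stated in the text).

V_IN = 1.6                    # input voltage in volts (Table 1)
V_OUT_RANGE = (0.8, 1.2)      # specified output voltage range (Table 1)
N_STEPS = 16                  # number of discrete duty cycles of the PWM unit


def ideal_output_voltages(v_in: float = V_IN, n_steps: int = N_STEPS) -> dict:
    """Map each duty-cycle code k to the ideal output voltage (k / n_steps) * v_in."""
    return {k: k / n_steps * v_in for k in range(n_steps)}


def codes_in_range(v_min: float, v_max: float, tol: float = 1e-9) -> list:
    """Return the duty-cycle codes whose ideal output voltage lies in [v_min, v_max]."""
    voltages = ideal_output_voltages()
    return [k for k, v in voltages.items() if v_min - tol <= v <= v_max + tol]


if __name__ == "__main__":
    for k in codes_in_range(*V_OUT_RANGE):
        print(f"code {k:2d}: D = {k / N_STEPS:.4f}, ideal V_out = {k / N_STEPS * V_IN:.2f} V")
```

Under these assumptions, the specified range of 0.8 V to 1.2 V corresponds to duty cycles between 50% and 75%, with an ideal output voltage step of 0.1 V.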

Impact of Dead Time on Efficiency
The characteristics of the converter efficiencies with respect to the dead times dt ↓ and dt ↑ are shown in Figure 11a,b, respectively. The general characteristics depicted in Figure 11a are similar to the simulated characteristics shown in Figure 6; however, the absolute values of the measured efficiencies are substantially lower, due to the additional conduction losses in the PDN and the metal layers of the chip. These relatively high conduction losses are inherent to the employed 14 nm CMOS technology, since the PDN and the metal layers use very thin conductors [6]. With regard to a falling edge of v x,1 , maximum efficiency is obtained for dt ↓ = dt ↓,opt . For a rising edge, the minimum possible dead time leads to the highest efficiency, since any increase of dt ↑ increases the conduction losses during the dead time. The standard ANPC HBST (without ICS) achieves only a comparatively low maximum efficiency. For this reason, the efficiency evaluation given in the next subsection considers only the HBST and the ANPC HBST with ICS.

This result reveals the great efficiency improvements that are feasible with phase shedding if the IVR is operated at reduced output power levels. By way of example, at I out = 140 mA, the efficiency increases from 72% (four active converter phases) to 80% (two active phases) and to 83.4% if only a single converter phase is active. The maximum achieved efficiency is 85.3% for two active phases and I out = 300 mA; at the maximum measured output current of 780 mA, the efficiency is 83.1% (with all four phases active).
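The effect of phase shedding becomes clearer when the quoted efficiencies are converted into losses relative to the output power. The Python sketch below uses the definition of efficiency as output power divided by input power, so that the losses amount to (1/η − 1) times the output power. It assumes that the output power is the same in all three cases (same output voltage at I out = 140 mA), which is not stated explicitly in the text; the names are hypothetical.

```python
# Minimal sketch: losses relative to the output power for the phase-shedding
# example at I_out = 140 mA, using eta = P_out / P_in, i.e.,
# P_loss / P_out = 1 / eta - 1.
# Assumption: identical output power in all three cases.

EFFICIENCY_BY_ACTIVE_PHASES = {4: 0.72, 2: 0.80, 1: 0.834}


def relative_loss(efficiency: float) -> float:
    """Return P_loss / P_out for a given efficiency (0 < efficiency <= 1)."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the interval (0, 1]")
    return 1.0 / efficiency - 1.0


RELATIVE_LOSSES = {
    phases: relative_loss(eta)
    for phases, eta in EFFICIENCY_BY_ACTIVE_PHASES.items()
}

if __name__ == "__main__":
    reference = RELATIVE_LOSSES[4]
    for phases, loss in RELATIVE_LOSSES.items():
        reduction = 100.0 * (1.0 - loss / reference)
        print(f"{phases} active phase(s): P_loss/P_out = {loss:.3f} "
              f"({reduction:.0f}% lower than with four phases)")
```

Under this assumption, operating a single phase instead of four roughly halves the losses at this operating point.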

Discussion
The footprint area of the Power Management IC (PMIC) of the investigated IVR is comparatively small, which enables a very high current density of 24.7 A/mm². (In this paper, the current density of the PMIC was defined as the maximum output current of the IVR divided by the area of the enabled power switches, gate drivers, and level shifters.) In comparison, the realization presented in [14], which is also based on a 14 nm CMOS technology and operates the buck converter in discontinuous conduction mode, reveals a current density of approximately 10 A/mm² (estimated based on Figure 8.5.7 in [14]).
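The stated current density can be cross-checked against other figures given in this paper. The Python sketch below assumes that the relevant area is four times the FEOL area of one half-bridge (0.0081 mm² per phase) and that the maximum output current is the specified 800 mA of Table 1; both are assumptions about how the value was obtained, not statements from the text.

```python
# Minimal sketch: consistency check of the reported current density.
# Assumptions: the active area equals four times the per-phase FEOL area of
# the half-bridge, and the maximum output current is the specified 800 mA.

FEOL_AREA_PER_PHASE_MM2 = 0.0081  # FEOL area of one HB (Section "Realized IVR")
N_PHASES = 4
I_OUT_MAX_A = 0.8                 # maximum output current (Table 1)


def current_density_a_per_mm2(i_out_a: float, area_mm2: float) -> float:
    """Return the current density as output current divided by active area."""
    if area_mm2 <= 0.0:
        raise ValueError("area must be positive")
    return i_out_a / area_mm2


TOTAL_AREA_MM2 = N_PHASES * FEOL_AREA_PER_PHASE_MM2
CURRENT_DENSITY = current_density_a_per_mm2(I_OUT_MAX_A, TOTAL_AREA_MM2)

if __name__ == "__main__":
    print(f"active area: {TOTAL_AREA_MM2:.4f} mm^2")
    print(f"current density: {CURRENT_DENSITY:.1f} A/mm^2")
```

With these assumptions, the result rounds to the reported 24.7 A/mm², which suggests that the per-phase FEOL area and the specified maximum current were used for the calculation.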
The maximum achieved efficiency of 85.3% is in a similar range as the efficiencies achieved in [11,12,14] (84% for 1.5 V:1.15 V and f sw = 100 MHz in [11], 80% for 1.2 V:0.93 V and f sw = 90 MHz in [12], and 88% for 1.6 V:1.2 V and f sw = 70 MHz in [14]). The decreased efficiency values are due to substantial conduction losses in the thin conductors of the metal layers and the power distribution network of the 14 nm CMOS technology node [6]. Accordingly, IVRs realized in more mature CMOS technology nodes can achieve higher maximum efficiencies, e.g., in [25], a peak efficiency of 91.5% was reported for an IVR realized in a 40 nm technology (3.3 V:2.4 V, f sw = 100 MHz). Conversely, the interconnects in CMOS nodes with higher integration levels, e.g., 7 nm, will be even thinner and, thus, prone to generate even higher conduction losses. For this reason, future research is expected to increasingly address alternative realizations of the IVR's PMIC that are not directly integrated into the die of the microprocessor, e.g., 3D realizations as presented in [26].

Conclusions
This paper presented an experimental evaluation of chip-integrated HBST, ANPC HBST, and ANPC HBST with ICS topologies, using a high-end 14 nm CMOS technology. Compared to the conventional HBST, the ANPC HBST topologies guarantee the same voltages across the stacked transistors. The standard ANPC HBST is subject to increased switching losses and is less suitable for a multi-phase converter (no phase shedding possible). The ANPC HBST with ICS eliminates the shortcomings of the standard ANPC HBST. Compared to the HBST, it requires 14% more chip area; in return, it is more robust and achieves a higher efficiency at high output currents.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to internal policies of the industry research partner.

Conflicts of Interest:
The authors declare no conflict of interest.