Worldwide digitalization has led to an explosion of mobile data traffic within the recent fourth generation (4G) and long term evolution (LTE) telecommunication generations. To ensure that the enormous user experience demands of future digital systems and services are met, 5G/6G is expected to provide 1000× more capacity [1]. To achieve this enormous increase in data rates, the third generation partnership project (3GPP) new radio (NR) standard has allocated several wideband millimeter-wave (mmWave) frequency bands between 24–53 GHz for commercial 5G wireless telecommunication [2], and frequencies above 100 GHz are being discussed for 6G [3]. However, at mmWave frequencies, we must overcome not only the increasing path losses but also the decreased antenna size. In fact, it has been proposed that large phased arrays with RF beamformers are needed to provide enough antenna gain and spatial coverage [5]. This results in a dramatic change in the transceiver implementation and raises severe design challenges. First, the dimensions of an antenna array at mmWave frequencies scale with the wavelength (antenna elements are spaced by λ/2), and at 30 GHz, λ is ≈1 cm. The transmitter must therefore be small and highly integrated. Second, in large phased arrays, the required output power level for each antenna decreases as the number of antennas increases. Consequently, each antenna in a phased array is preceded by a small or medium power amplifier (PA), which is preferably integrated in the transceiver RFIC [9]. Third, wideband (up to 400 MHz) 256-QAM based OFDM signals are proposed; because of the high-order modulation and the high peak-to-average power ratio (PAPR), the specified linearity requirements are very strict, as can be seen in Table 1. As a result, the phase distortion of the transmitter needs to be small. On the other hand, the adjacent channel power ratio (ACPR) requirements for the wider FR2 bands, shown in Table 2, are more relaxed and allow more spectral distortion [17]. To achieve linearity, the PA is usually backed off by around 10 dB from its peak power. As efficiency is proportional to the output amplitude, the PA operates inefficiently most of the time. In traditional macro base stations (BS), this is solved by massive digital predistortion (DPD), which linearizes a single, large, efficiency-enhanced PA (conventional BS power consumption is highly dominated by the PA). With the introduction of mmWave phased arrays, the number of PAs is so large that linearizing each of them separately is simply not cost-effective; instead, either analog or averaged linearization approaches are needed [18].
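The two numerical arguments in this paragraph (the wavelength at 30 GHz and the efficiency penalty of a 10 dB back-off) can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original design flow; the function names are illustrative, and the efficiency model simply takes the statement that efficiency is proportional to output amplitude at face value.

```python
# Speed of light in vacuum (m/s).
C0 = 299_792_458.0


def wavelength_mm(freq_hz: float) -> float:
    """Free-space wavelength in millimetres."""
    return C0 / freq_hz * 1e3


def relative_efficiency(backoff_db: float) -> float:
    """Efficiency relative to its peak value at a given power back-off.

    Assumption: efficiency is proportional to output *amplitude*,
    so a power back-off of X dB scales efficiency by 10^(-X/20).
    """
    return 10 ** (-backoff_db / 20)


lam = wavelength_mm(30e9)
print(f"wavelength at 30 GHz : {lam:.2f} mm")
print(f"half-wavelength pitch: {lam / 2:.2f} mm")
print(f"relative efficiency at 10 dB back-off: {relative_efficiency(10):.2f}")
```

Under these assumptions the wavelength is roughly 10 mm, the half-wavelength element pitch is roughly 5 mm, and a PA backed off by 10 dB runs at about a third of its peak efficiency.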
This paper describes a highly linear and compact four-stack PA compatible with 3GPP/NR FR2 bands n258 and n257, implemented in GlobalFoundries 45 nm CMOS SOI technology. The achievable output power is limited by the nominal VDD, which is 1 V in 45 nm CMOS SOI. Fortunately, SOI technology permits transistor stacking, which allows higher operating voltages and thus higher output power. In addition, with compact input and output matching and a distributed transistor core, the total size and thus the parasitics of the PA are minimized, providing higher gain.
This paper is an extended version of [19]. The structure and the design flow of the proposed stacked PA are presented in detail, with an illustration of the actual layout, in Section 2. Section 3 describes the measurement setup in detail. Measured results are shown together with simulations and compared against the state of the art in Section 4. Conclusions are presented in Section 5.
2. Stacked Power Amplifiers
In CMOS SOI technology, the transistor body is not tied to a substrate but instead can be connected to a preferred node or left floating. The proposed stacked PA structure utilizes this feature. The devices are stacked; i.e., they are electrically floating on top of each other [20]. This enables a higher VDD and thus higher output power, as the devices in the stack do not exceed their breakdown voltages if the bias points are selected correctly. The schematic of the design is shown in Figure 1. The design is constructed by stacking four 40 nm floating-body devices. Based on the design manual, VDD can be increased from a nominal 1
and still maintain its reliability. Thus, by stacking four transistors, we can increase the VDD up to
taking into account the inductor and routing losses. As a result, the maximum output power that can be transferred to a 50 Ω load is above 20 dBm or >
V peak-to-peak signal.
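The relation between output power and voltage swing used above can be sketched as follows. This is only a sanity check under the assumption of a single sinusoidal tone into a purely resistive load; the function name is illustrative.

```python
import math


def dbm_to_vpp(p_dbm: float, r_load_ohm: float = 50.0) -> float:
    """Peak-to-peak voltage of a sinusoid delivering p_dbm into r_load_ohm."""
    p_watt = 10 ** (p_dbm / 10) / 1000.0        # dBm -> W
    v_peak = math.sqrt(2.0 * p_watt * r_load_ohm)  # P = V_peak^2 / (2 R)
    return 2.0 * v_peak


print(f"20 dBm into 50 ohm: {dbm_to_vpp(20.0):.2f} V peak-to-peak")
```

A swing of this size is several times the nominal supply of a single device, which is why the swing has to be shared across the stack.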
The gates of the transistors in the stacked PA are not RF grounded, but the inter-stage matching and voltage swing of
are controlled by dimensioning the gate capacitors
correctly. To avoid breakdown, the source-node waveforms of the transistors are kept synchronous and progressively increased. Note also that each stage in the stack adds delay and parasitics (a problem especially at mmWave frequencies), and thus it is impractical to increase the number of stacked devices excessively [21]. The simulated voltage swings at the drain nodes of the proposed design with an input power of 0 dBm are presented in Figure 2. It can be seen that the amplitude roughly doubles after each stage, although a small delay can also be observed after each stage. The gate capacitors are also used to minimize this delay and to provide a linear increase of the voltage swing along the stack. In addition, optimizing the device size and gate capacitors enables a direct match to a 50 Ω load. As a result, an additional output impedance matching network is not needed, which simplifies the design and minimizes the parasitics, losses and area.
The PA core (highlighted with a red dashed line in Figure 1) consists of four current-combined power cells. The schematic of one power cell is shown in Figure 3
. Each stage consists of three parallel devices. The total width of each device is 21
with a minimum length to maximize the speed. Thus, the total width of
(three devices in one cell and four cells in parallel) is 258
each. The transistor size is optimized in terms of power, linearity and efficiency. The layout of the PA core is illustrated in Figure 4
. Power cells are connected symmetrically, the input is connected from both sides of the power cells to minimize the gate resistance, and the drain nodes of
are combined as currents instead of power and thus connected directly to the output. As can be seen in Figure 3
and Figure 4, each gate capacitor is distributed into eight small metal-oxide-metal (MOM) capacitors, because small capacitors (on the order of 6 × 6 µm²) can be placed close to the transistor cells with fewer parasitics. The total value of the gate capacitors decreases higher in the stack.
= 330 fF is the largest capacitor, while
= 208 fF is the smallest capacitor.
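Why the gate capacitance decreases higher in the stack can be illustrated with the usual first-order model of a stacked device, in which the external gate capacitor and the gate-source capacitance form a capacitive divider. The sketch below is an assumption-laden illustration, not the authors' design procedure: it neglects gate-drain capacitance and all other parasitics, and the gate-source capacitance `C_GS_ASSUMED` is a hypothetical placeholder that does not come from this paper. Only the 330 fF and 208 fF values are taken from the text.

```python
def gate_swing_ratio(c_gs: float, c_gate: float) -> float:
    """v_gate / v_source for the divider formed by C_gs and the gate capacitor.

    First-order model: the gate sees C_gs towards the source and c_gate
    towards RF ground; gate-drain capacitance is neglected.
    """
    return c_gs / (c_gs + c_gate)


def source_impedance_scale(c_gs: float, c_gate: float) -> float:
    """Impedance seen at the source, in units of 1/gm: (1 + C_gs / c_gate)."""
    return 1.0 + c_gs / c_gate


# Hypothetical gate-source capacitance (NOT from the paper), in farads.
C_GS_ASSUMED = 250e-15

for c_gate in (330e-15, 208e-15):  # largest and smallest values from the text
    ratio = gate_swing_ratio(C_GS_ASSUMED, c_gate)
    scale = source_impedance_scale(C_GS_ASSUMED, c_gate)
    print(f"C_gate = {c_gate * 1e15:.0f} fF: "
          f"gate swing = {ratio:.2f} x source swing, "
          f"source impedance = {scale:.2f} / gm")
```

In this model a smaller gate capacitor lets the gate follow more of the source swing and raises the impedance seen at the source, which is consistent with the upper devices in the stack having to handle larger swings.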
Output matching (see Figure 1
) consists of a parallel DC feed inductor
and high Q (HQ) metal-insulator-metal (MIM) capacitor
. These (along with stack transistor sizing and gate capacitors) are optimized to the
and to provide good gain. The operation point of the PA is selected in moderate class AB by using external analog controls for
is biased using a digitally controlled, variable current (current DAC) source followed by a diode-connected transistor. The gate bias of
can be tuned with 3-bit control from 100 mV up to 700 mV.
= 450 mV is set as a nominal gate bias value for each transistor. In order to prevent breakdown,
are derived from
V VDD in an external control board and thus turned on at the same time. An additional precaution is taken with
, which prevent breakdown by setting a 100 mV voltage at
in case the VDD is turned on while the current DAC is set to 0.
The input matching is implemented using a high-density (HD) MIM capacitor
and two turn
, from which the signal is fed to the transistor core via the center tap of
. This provides compact and high Q matching around 26 GHz. As the input impedance of
is inherently capacitive, the input matching is designed so that it resonates out the capacitive load and maximizes the power transfer to the PA. Resistive parasitics of the
play a significant role in the resonator’s Q factor and thus, by feeding the signal out from the center tap of the
, the resistive parasitics are decreased significantly. The DC is blocked from input and output nodes by using HD MIM capacitors
of 1 pF, which are low loss capacitors at mmWave frequencies. In addition, a large 10 pF HD MIM capacitor is used in the bias feed to provide a sufficient RF ground. An ADS Momentum EM simulator is used to design the input and output matching circuit, while the PA core (see Figure 4
) is verified using parasitic extraction.
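The roles of the 1 pF DC-blocking capacitors and the 10 pF bias bypass capacitor can be made concrete by computing their reactance near the 26 GHz matching frequency mentioned above. This is a minimal sketch assuming ideal capacitors; series resistance, series inductance and self-resonance are ignored, and the function name is illustrative.

```python
import math


def capacitive_reactance_ohm(c_farad: float, freq_hz: float) -> float:
    """Magnitude of the reactance of an ideal capacitor, 1 / (2*pi*f*C)."""
    return 1.0 / (2.0 * math.pi * freq_hz * c_farad)


FREQ_HZ = 26e9  # matching frequency quoted in the text

for label, c in (("1 pF DC block", 1e-12), ("10 pF bias bypass", 10e-12)):
    x = capacitive_reactance_ohm(c, FREQ_HZ)
    print(f"{label}: |X_C| = {x:.2f} ohm at {FREQ_HZ / 1e9:.0f} GHz")
```

With these ideal values the DC block adds only a few ohms in series with a 50 Ω path, and the bypass capacitor is well below 1 Ω, which is what makes it a usable RF ground.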
3. Measurement Setup
The micrograph of the fabricated integrated stacked PA is presented in Figure 5. Including the input and output pads, the dimensions of the PA are 684
. By excluding the probe pads, the active area is only 239
. In Figure 5
, HD MIM capacitors are highlighted with red rectangles in the micrograph (
), and the HQ MIM capacitor is shown with a blue rectangle (
A Keysight PNA-X network analyzer was used to measure the proposed PA with Cascade Infinity I40 probes on a Cascade Microtech Model 11000 probe station. The measurement system is too large to fit into an environmental chamber, so temperature-dependency testing, for example, was not possible. In the single-tone power sweep, a measurement pre-amplifier (Caio Wireless CA263-141) was needed because the PNA-X alone could not provide enough power to drive the PA into compression. The power was calibrated at the end of the input cable. Then, the measurement was normalized using a Thru standard that was implemented on chip. The power calibration was repeated for every measured frequency point in the single-tone power sweep measurements. The actual input and output power at the reference planes (see Figure 5) were calculated by correcting for the measured losses of the probe and the Thru.
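The loss correction described above can be sketched as simple dB bookkeeping. The example below is an illustration only: all numerical values are hypothetical, and splitting the Thru loss equally between the input and output sides is an assumption, since the text does not state how the Thru loss is apportioned.

```python
def power_at_reference_planes(
    p_cal_dbm: float,      # calibrated power at the end of the input cable
    p_meas_dbm: float,     # power measured on the output side
    probe_loss_db: float,  # loss of one probe (assumed equal for both probes)
    thru_loss_db: float,   # total loss of the on-chip Thru
) -> tuple[float, float, float]:
    """Return (P_in, P_out, gain) at the PA reference planes, in dBm / dB.

    Assumption: half of the Thru loss belongs to each side of the PA.
    """
    side_loss_db = probe_loss_db + thru_loss_db / 2.0
    p_in = p_cal_dbm - side_loss_db    # input path attenuates the drive
    p_out = p_meas_dbm + side_loss_db  # output path attenuates the reading
    return p_in, p_out, p_out - p_in


# Hypothetical numbers, for illustration only.
p_in, p_out, gain = power_at_reference_planes(
    p_cal_dbm=0.0, p_meas_dbm=10.0, probe_loss_db=1.0, thru_loss_db=0.4
)
print(f"P_in = {p_in:.1f} dBm, P_out = {p_out:.1f} dBm, gain = {gain:.1f} dB")
```

Because the correction is applied at every frequency point, the loss values would in practice be frequency-dependent arrays rather than the single scalars used here.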
Measurements with a modulated signal were performed using a Keysight UXA N9040A signal analyzer. A Keysight M5502A arbitrary waveform generator was used to generate a 100 MHz wide 3GPP/NR FR2 OFDM signal with 16-QAM modulation, which was upconverted to mmWave frequencies with a Keysight E8267D signal generator. To minimize the EVM contribution of the test setup, we did not use a pre-amplifier, and therefore the signal generator limited the available input power.