Multi-Band Power Amplifier Module with Back-Off Efficiency Improvement using Ultra-Compact 3D Vertical Stack Multi-Chip Package for Cellular Handsets

A highly integrated multi-mode multi-band (MMMB) power amplifier module (PAM) using hybrid bulk complementary metal oxide semiconductor (CMOS), gallium arsenide (GaAs) heterojunction bipolar transistor (HBT), and silicon-on-insulator (SOI) technologies for low band (LB, 824–915 MHz) and high band (HB, 1710–1980 MHz) is proposed. The hybrid MMMB PAM integrates a bulk CMOS controller die, a GaAs HBT power amplifier (PA) die and an SOI switch die on a six-layer laminate. To obtain both high efficiency and high linearity over a wide range of input power levels, a parallel dual-chain PA strategy is adopted to provide different bias currents and gains for low-power mode (LPM) and high-power mode (HPM) operation. Additionally, a broadband two-section low-pass output matching network design based on the suppression of high-order harmonics is proposed for enhanced efficiency and linearity. To achieve further miniaturization, a three-dimensional (3D) die stack multi-chip module (MCM) packaging structure is implemented, in which the CMOS controller die is stacked vertically on the GaAs HBT PA die. The measurement results show that, over LB, the fabricated MMMB PAM achieves power gains of 26.1–27 dB and PAEs of 38–38.4% at an output power (Pout) of 28 dBm in the HPM, and power gains of 20.4–20.9 dB and PAEs of 12.4–13.8% at a Pout of 17 dBm in the LPM. For HB, power gains of 24.3–26.7 dB with PAEs of 38.2–39.9% at a Pout of 28 dBm and power gains of 15.9–17.5 dB with PAEs of 12.3–12.8% at a Pout of 17 dBm are realized in the HPM and LPM, respectively. The fabricated PAM, covering five frequency bands and operating in two power modes, occupies an area of only 5 × 3.5 mm². To the best of the authors' knowledge, this work is the first demonstration of an MMMB PAM adopting an ultra-compact 3D vertical stack MCM package with favorable RF performance.


Introduction
As more and more mobile communication standards (2G/3G/4G/5G) are incorporated in modern cellular handsets, single-band power amplifiers (PAs) are no longer suitable. Multi-mode multi-band (MMMB) power amplifier modules (PAMs) therefore play an increasingly vital role in cramming multiple bands into a single radio frequency (RF) front end to support the growing number of frequency bands for mobile device applications [1][2][3].
Generally, a complete MMMB PAM consists of one PA controller for diverse control levels, one or more PA blocks operating in multiple bands and one post-PA band-select switch to separate different bands. Bulk silicon complementary metal oxide semiconductor (CMOS) is widely regarded as a highly suitable integrated circuit technology for high integration and low cost [4]. Thus, for the PA controller design, bulk CMOS is a cost-effective solution due to its ability to incorporate a massive number of transistors on a single die. For the PA blocks, however, as the modulation complexity and data rates increase in cellular communication, the high linearity and high peak-to-average ratio requirements have limited the use of the bulk CMOS process due to its inherently low power density and conductive substrate. To realize acceptable RF performance, most of the currently available PAs for handsets employ gallium arsenide (GaAs) heterojunction bipolar transistor (HBT) technology, since it offers higher power density and better linearity than bulk CMOS devices at high frequencies [5][6][7]. Additionally, silicon-on-insulator (SOI) CMOS technology is more desirable than bulk CMOS for RF switch design thanks to its insulating substrate, which lowers substrate losses and parasitic capacitances, and, most importantly, its ability to integrate the analog circuitry [8,9]. Accordingly, in this article we have implemented a handset MMMB PAM in hybrid bulk CMOS/GaAs HBT/SOI technologies, where a bulk CMOS process is used for the PA controller, a GaAs HBT technology is used for the PA blocks and an SOI process is used for the post-PA band-select switch.
Commercial cellular wideband PAs are required to operate with high efficiency in the output power back-off region. However, conventional PAs are designed to operate with significant back-off for high linearity, which markedly decreases the power efficiency [10]. To accommodate a higher data rate and extend battery life, many strategies for efficiency improvement over a wide power range have been investigated for high-efficiency linear PAs, including envelope tracking (ET) [11,12], envelope elimination and restoration (EER) [13], Doherty [14,15] and adaptive load [16,17]/bias [18,19] techniques, but their complexity and cost are critical drawbacks. Moreover, various output matching topologies have been developed to meet the tough efficiency requirements at linear output power levels across a broadband frequency range. By squaring the output voltage waveform via an output network based on lumped-element resonators, Class-F/F⁻¹ [20,21] operation can be achieved to provide excellent efficiency. Additionally, assisted by a capacitive second-order harmonic impedance, Class-J/J⁻¹ [22,23] loading networks exhibit outstanding efficiency, but their complexity and bulk are major obstacles for high-integration PAMs. In addition, it has been demonstrated that controlling second- or higher-order harmonics helps to deliver high efficiency in the PA design [24][25][26]. In this work, to improve the power added efficiency (PAE) at power back-off while achieving multiband operation, a parallel dual-chain two-stage PA strategy for different power modes is realized and a broadband output matching network (OMN) design based on the suppression of high-order harmonics is proposed.
In addition to multiband operation with higher efficiency, multiband operation in a smaller size is now strongly expected of cellular PAMs. Currently, a smart mobile device can easily operate in more than 20 cellular bands and other non-cellular wireless services, which reveals the complexity inherent in multiple RF chips operating and coexisting within a small physical volume. Thus, the implementation of handset PAMs with small size has become crucial. Although the use of MMMB PAMs helps to reduce the number of PAs and save area, most PAMs reported or released using traditional planar multi-chip module (MCM) packaging structures still occupy large areas [27][28][29][30]. Consequently, to achieve further miniaturization, a novel three-dimensional (3D) die stack packaging structure instead of a planar structure is adopted in the proposed PAM, in which the bulk CMOS controller die is stacked vertically on the GaAs HBT PA die to reduce the size. To the best of the authors' knowledge, this work is the first demonstration of an MMMB PAM using a 3D vertical stack MCM package with superior RF performance for mobile handset applications.
Figure 1 shows the simplified block diagram of the proposed MMMB PAM, which integrates a GaAs HBT PA die, an SOI dual single-pole double-throw (SPDT)/single-pole triple-throw (SP3T) switch die, a bulk CMOS controller die and multiple passive components mounted on a multi-layer laminate substrate. The PAM adopts an improved architecture to support penta-band cellular modulations. There are two PA blocks inside the GaAs HBT die, for the LB (824–915 MHz) and HB (1710–1980 MHz). Both the LB and HB PA blocks use a two-stage solution composed of a driver stage and a power stage. In contrast to the LB PA block, the HB PA block has two switchable driver stages to separate the input signal.

Two-Stage Dual-Chain HBT PA Configuration
Typically, the efficiency of a PA decreases as the input power level decreases. At a small power level, the operating point of a PA moves further away from its saturation point, leading to severe PAE degradation [31,32]. In this work, to simultaneously achieve both high efficiency and high linearity over a wide power range, a parallel two-stage dual-chain strategy with common input, inter-stage and output matching networks, illustrated in Figure 2a, has been adopted to provide different bias currents and gains for low-power mode (LPM) and high-power mode (HPM) operation. In the HPM, both the main-chain and aided-chain amplifiers in the driver stage and power stage are activated with a high quiescent operating point for high gain, and a 1 dB compression point (P1dB) of 28 dBm can be achieved. In the LPM, by contrast, only the main-chain amplifiers are activated and the aided-chain amplifiers in both the driver and power stages are deactivated, so the DC bias current drops because the active emitter area is reduced, as depicted in Figure 2b. Under these conditions, the PA exhibits a lower power gain and a lower P1dB of 17 dBm, which improves the efficiency at low input power levels. More importantly, the proposed two-stage dual-chain HBT PA strategy uses only a single set of common input, inter-stage and output matching networks. Extra matching networks or switching circuits that might degrade the performance and increase the chip size are not required.
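As a rough illustration of what the two power modes mean in terms of supply power, the sketch below applies the standard definition PAE = (Pout − Pin)/Pdc to the lower-end LB figures reported in the abstract (28 dBm, 26.1 dB, 38% in the HPM; 17 dBm, 20.4 dB, 12.4% in the LPM). The function names are illustrative only, and the resulting DC power values are derived estimates, not measurements reported by the authors.

```python
def dbm_to_mw(p_dbm: float) -> float:
    """Convert a power level in dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


def dc_power_mw(pout_dbm: float, gain_db: float, pae: float) -> float:
    """DC power implied by PAE = (Pout - Pin) / Pdc, with Pin = Pout - gain."""
    pout_mw = dbm_to_mw(pout_dbm)
    pin_mw = dbm_to_mw(pout_dbm - gain_db)
    return (pout_mw - pin_mw) / pae


# Lower-end LB values quoted in the abstract.
pdc_hpm = dc_power_mw(pout_dbm=28.0, gain_db=26.1, pae=0.38)
pdc_lpm = dc_power_mw(pout_dbm=17.0, gain_db=20.4, pae=0.124)

print(f"HPM at 28 dBm: implied DC power = {pdc_hpm:.0f} mW")
print(f"LPM at 17 dBm: implied DC power = {pdc_lpm:.0f} mW")
```

Under these assumptions the LPM draws roughly a quarter of the HPM supply power while delivering 17 dBm, which is the purpose of switching off the aided chain at back-off.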

Broadband Output Matching Technique
The design of the common input, inter-stage and output impedance matching networks is of great importance because they strongly affect the frequency response and RF performance [33]. In particular, the OMN design is the most essential because it determines the bandwidth, output power level and efficiency [34,35]. This work presents a novel broadband two-section low-pass matching technique, shown in Figure 3, which embeds high-order harmonic manipulation to realize harmonic suppression and improve efficiency for the LB and HB PA blocks. In Figure 3a, the LB output matching circuit mainly contains nine elements. At the fundamental frequency (f0), L_LBo1, C_LBo1, L_LBo2 and C_LBo2 form a second-order LC low-pass matching network, which transforms the 50 Ω load to the optimum load impedance (R_opt ≈ 3.5 Ω) at f0. C_LBo3 is a DC blocking capacitor. C_LB2f0 and L_LB2f0 form the second-order harmonic frequency (2f0) trap, so that the output network presents a short-circuit load at 2f0 for realizing Class-F operation. The voltage waveform at the collector of the power-stage transistor then exhibits sharper edges than a sinusoid, which shortens the overlap time between the voltage across and the current flowing in the transistor and thereby reduces the power loss. The HB output matching circuit, shown in Figure 3b, is similar to that of the LB. Because higher harmonics can be exploited to further minimize the time during which the transistor simultaneously sustains a large voltage and carries a large current, an additional tank consisting of C_HBh and L_HBh is added in the HB output matching network. Its resonance is placed between two and three times the fundamental frequency to realize different termination impedances for different harmonics, thereby improving the efficiency while sustaining good linearity. Figure 4a,b show the optimum load impedance traces on Smith charts for both LB and HB.
L_LBo1, L_LB2f0, L_HBo1, L_HB2f0 and L_HBh in the output matching circuits are composed of bond-wire inductance and laminate inductance. L_LBo2 and L_HBo2 are laminate inductances, and L_LBo3, L_LBo4, L_HBo3 and L_HBo4 are bond-wire inductances. C_LB2f0, C_HB2f0 and C_HBh, which have moderate capacitances, are placed on the HBT die. Six surface-mount devices (SMDs), C_LBo1, C_LBo2, C_LBo3, C_HBo1, C_HBo2 and C_HBo3, are used on the laminate to reduce the HBT die size and obtain low loss.
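To make the two-section low-pass transformation from 50 Ω to R_opt ≈ 3.5 Ω concrete, the following is a minimal textbook sketch using ideal lossless elements and a geometric-mean intermediate resistance. It is not the authors' design procedure or their component values: the 870 MHz centre frequency, the 5 pF trap capacitance and all function names are assumptions chosen for illustration, and bond-wire and laminate parasitics are ignored.

```python
import math


def two_section_lowpass(r_low: float, r_high: float, f0: float):
    """Ideal two-section low-pass L-match (series L, shunt C, series L, shunt C).

    Assumes the intermediate resistance is the geometric mean of the two
    terminations, so both sections share the same loaded Q.
    Returns (L1, C1, L2, C2) in henries and farads, L1 nearest r_low.
    """
    w = 2.0 * math.pi * f0
    r_mid = math.sqrt(r_low * r_high)
    q = math.sqrt(r_high / r_mid - 1.0)
    l1 = q * r_low / w        # series inductor on the low-resistance side
    c1 = q / (w * r_mid)      # shunt capacitor at the intermediate node
    l2 = q * r_mid / w        # series inductor of the second section
    c2 = q / (w * r_high)     # shunt capacitor across the load
    return l1, c1, l2, c2


def input_impedance(l1, c1, l2, c2, r_load: float, f: float) -> complex:
    """Impedance seen looking into L1 with r_load connected across C2."""
    w = 2.0 * math.pi * f
    z = 1.0 / (1.0 / r_load + 1j * w * c2)
    z = 1j * w * l2 + z
    z = 1.0 / (1.0 / z + 1j * w * c1)
    return 1j * w * l1 + z


def trap_inductance(f_trap: float, c_trap: float) -> float:
    """Inductance that series-resonates with c_trap at f_trap (harmonic short)."""
    return 1.0 / ((2.0 * math.pi * f_trap) ** 2 * c_trap)


F0 = 870e6  # assumed LB centre frequency, not stated in the text
l1, c1, l2, c2 = two_section_lowpass(3.5, 50.0, F0)
z_in = input_impedance(l1, c1, l2, c2, 50.0, F0)
l_trap = trap_inductance(2.0 * F0, 5e-12)  # 5 pF is a hypothetical value

print(f"L1 = {l1 * 1e9:.2f} nH, C1 = {c1 * 1e12:.2f} pF")
print(f"L2 = {l2 * 1e9:.2f} nH, C2 = {c2 * 1e12:.2f} pF")
print(f"Zin at f0 = {z_in.real:.3f} {z_in.imag:+.3f}j ohm")
print(f"2f0 trap inductance for 5 pF = {l_trap * 1e9:.2f} nH")
```

Splitting the 50:3.5 transformation into two sections keeps the loaded Q of each section low, which is why a two-section network is usually preferred over a single L-section when bandwidth matters.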

Detailed Structure of Proposed MMMB PA
Figure 5 depicts the detailed schematic of the proposed two-stage dual-chain MMMB PAs for both LB and HB. In the LB PA block, the same emitter area of 504 µm² is used for both the main- and aided-chain amplifiers (Q_LBm1 and Q_LBa1) in the driver stage, and an identical emitter area of 3570 µm² is used for the dual-chain amplifiers in the power stage (Q_LBm2 and Q_LBa2). In the HB block, a two-stage strategy with a switchable driver stage is used for multiband operation. The inputs of the first and second driver stages are connected to HB_RFIN1 and HB_RFIN2, respectively, and the outputs of both are connected to the input of the power stage. The main- and aided-chain amplifiers in the first driver stage (Q_HB1m1 and Q_HB1a1) have the same emitter area of 336 µm², those in the second driver stage (Q_HB2m1 and Q_HB2a1) have the same emitter area of 378 µm², and the dual-chain amplifiers in the power stage (Q_HBm2 and Q_HBa2) have an identical emitter area of 1932 µm². In the HPM, a relatively high quiescent current of approximately 60–70 mA is drawn, as the dual-chain amplifiers in both the driver stage and power stage of the LB and HB PA blocks are all activated. In the LPM, only the main-chain amplifiers are enabled and the aided-chain amplifiers in both the driver and power stages of the LB and HB blocks are disabled, leading to a low quiescent current of approximately 30 mA.
In view of the trade-off between efficiency and linearity, the proposed two-stage dual-chain PAs in both LB and HB use mid-Class AB operation for the driver stages and deep-Class AB operation for the power stages. However, a second-stage PA operating with a low quiescent bias point and a high-efficiency output matching design exhibits gain expansion and a lagging phase shift, which introduces severe nonlinear distortion and reduces linearity. To compensate for this distortion, the first-stage PA is designed to exhibit gain compression and a leading phase shift by using appropriate matching networks and a higher quiescent bias current. As a result, the first and second stages complementarily cancel the nonlinear components of AM-AM and AM-PM, yielding relatively flat gain and phase characteristics and improved linearity.
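The emitter areas quoted for Figure 5 can be tallied per power mode to see how much active device area each mode uses. The sketch below only sums the per-chain areas given in the text (main and aided chains being equal); the dictionary layout and function name are illustrative, and the comparison with the quiescent currents is a rough consistency check rather than a device model.

```python
# Emitter area per chain in um^2, taken from the text; main and aided
# chains of each stage have identical areas.
AREA_UM2 = {
    "LB": {"driver": 504, "power": 3570},
    "HB, driver 1": {"driver": 336, "power": 1932},
    "HB, driver 2": {"driver": 378, "power": 1932},
}


def active_emitter_area(block: str, mode: str) -> int:
    """Total active emitter area: both chains in HPM, main chain only in LPM."""
    chains = 2 if mode == "HPM" else 1
    stage = AREA_UM2[block]
    return chains * (stage["driver"] + stage["power"])


for block in AREA_UM2:
    hpm = active_emitter_area(block, "HPM")
    lpm = active_emitter_area(block, "LPM")
    print(f"{block}: HPM {hpm} um^2, LPM {lpm} um^2")
```

Halving the active area in the LPM is broadly in line with the quiescent current falling from about 60–70 mA to about 30 mA.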

A detailed description of the broadband OMN design is given in Section 2.2.2. The inter-stage matching network (ISMN) is designed on chip to transform the input impedance of the second stage to the optimal load impedance of the first stage for sufficient gain and output power capability. It can be observed in Figure 5 that a single-section L-type matching structure including L_LBc1 and C_LBm1/C_LBm2 is used for the ISMN of the LB PA block, whereas the HB PA block adopts a two-section L-type matching structure including L_HB1c1/L_HB2c1, C_HBm3, L_HBm2 and C_HBm1/C_HBm2. L_LBc1, L_HB1c1 and L_HB2c1, which consist of bond-wire inductance and laminate inductance, are also used to feed the DC voltage to the first-stage amplifier. A conjugate match is chosen for the input matching network (IMN) to realize maximum gain and minimum reflection loss. The IMN for both LB and HB blocks uses a simple L-type matching structure with a series capacitor (C_LBi1/C_HB1i1/C_HB2i1) and a shunt inductor (L_LBi1/L_HB1i1/L_HB2i1). To improve the stability, a small series resistor of 5 Ω (R_LB1) for the LB PA block and 8 Ω (R_HB1 and R_HB2) for the HB PA block is added before the base of the first-stage transistor. No stabilizing resistor is used in the second stage because it would significantly degrade efficiency and linearity. Finally, linear bias circuits are applied to the overall circuit.

SOI Switch Design
Figure 6a illustrates the 0.18 µm SOI switch controller structure, where a voltage regulator consisting of a bandgap reference and a low dropout regulator (LDO) provides a stable positive voltage of +2.5 V, a charge pump generates a negative voltage of −2.5 V and level shifters convert the voltage levels from +2.5/0 V to +2.5/−2.5 V for the gate bias of the switch FETs in the on and off states, respectively. Figure 6b shows the detailed RF-core configuration of the SOI switch, which comprises an SPDT configuration for LB and an SP3T configuration for HB. The saturated output power of the PA is better than 28 dBm. In order not to limit the linearity of the PA, eight stacked transistors are employed in both the series and shunt arms to handle a power level of up to 31 dBm with a voltage standing wave ratio (VSWR) as high as 5:1. The device peripheries in the series and shunt arms are set to 3000 and 500 µm, respectively, which, with the given stack height, meet the insertion loss (IL) and isolation requirements. Switching is controlled by two logic voltages, VC1 and VC2, from the bulk CMOS controller. Depending on the VDD and logic voltage levels, the LBin signal is connected to one of two switched RF outputs (LB_RFOUT1 or LB_RFOUT2), while the HBin signal is simultaneously switched to one of three RF outputs (HB_RFOUT1, HB_RFOUT2 or HB_RFOUT3).
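The need for an eight-transistor stack can be motivated with a back-of-the-envelope estimate of the RF voltage an off-state arm may see. The sketch below is a simplified worst-case calculation, not taken from the paper: it assumes the 31 dBm figure is forward power in a 50 Ω system, that the mismatch phase places a voltage maximum at the switch, and that the voltage divides equally across the stacked FETs. Function names are illustrative.

```python
import math


def worst_case_peak_voltage(p_fwd_dbm: float, vswr: float, z0: float = 50.0) -> float:
    """Peak RF voltage at a standing-wave maximum for a given forward power."""
    p_fwd_w = 10.0 ** ((p_fwd_dbm - 30.0) / 10.0)
    v_fwd_peak = math.sqrt(2.0 * p_fwd_w * z0)   # peak of the forward wave
    gamma = (vswr - 1.0) / (vswr + 1.0)          # reflection coefficient magnitude
    return v_fwd_peak * (1.0 + gamma)


def voltage_per_fet(v_peak: float, n_stack: int) -> float:
    """Peak voltage per FET assuming ideal, equal division across the stack."""
    return v_peak / n_stack


v_total = worst_case_peak_voltage(p_fwd_dbm=31.0, vswr=5.0)
v_each = voltage_per_fet(v_total, n_stack=8)

print(f"Worst-case peak voltage: {v_total:.2f} V")
print(f"Per-FET peak voltage with 8 stacked devices: {v_each:.2f} V")
```

In practice the division across a stack is not perfectly equal because of parasitic capacitance to the substrate, so the real per-device stress would be somewhat higher for the FETs nearest the RF node.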

Bulk CMOS Controller Design
To keep the power gain of the power amplifier constant under different ambient temperatures, transmit powers and operating voltages, the output voltage of the PA controller is required to deviate by less than 50 mV at room temperature. Figure 7a shows the proposed controller structure based on a 0.18 µm bulk CMOS technology, which consists of logic circuitry, a bandgap reference source (BGR), multiple low dropout linear regulators (LDOs) and a buffer. The logic circuitry converts the logic control signal to the internal supply voltage and decodes it into the control signals for the subsequent BGR and, through a buffer, for the SOI switch. The BGR provides the voltage reference and current bias for the subsequent LDOs. The LDOs supply suitable regulated bias voltages for the HBT PA. Since the CMOS controller is required to provide bias voltages with a specific temperature coefficient for the HBT PA, its output voltages are designed to have a negative temperature coefficient to compensate for the negative temperature coefficient of the transistor base-emitter voltage. Figure 7b plots the temperature drift curve of Vref1 (one of the controller output voltages). The average voltage is calculated to be 2.89 V and the temperature coefficient is approximately −370 ppm/°C.
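The quoted temperature coefficient can be converted into an absolute slope to get a feel for the drift of Vref1. The sketch below is a simple linear model, assuming the 2.89 V average is taken as the value at a 25 °C reference temperature; that reference point and the function names are assumptions, since the text gives only the average voltage and the coefficient.

```python
def slope_mv_per_degc(v_nominal: float, tc_ppm_per_degc: float) -> float:
    """Convert a temperature coefficient in ppm/degC into mV/degC."""
    return v_nominal * tc_ppm_per_degc * 1e-6 * 1e3


def vref_at(temp_c: float, v_nominal: float = 2.89,
            tc_ppm_per_degc: float = -370.0, t_ref_c: float = 25.0) -> float:
    """Linear estimate of the reference voltage at temp_c (t_ref_c is assumed)."""
    return v_nominal * (1.0 + tc_ppm_per_degc * 1e-6 * (temp_c - t_ref_c))


slope = slope_mv_per_degc(2.89, -370.0)
print(f"Vref1 slope: {slope:.3f} mV/degC")
for t in (-20.0, 25.0, 85.0):
    print(f"Vref1 at {t:+.0f} degC: {vref_at(t):.3f} V")
```

A slope of about −1 mV/°C on the regulated output is of the same sign as the base-emitter voltage drift it is meant to compensate.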

Laminate Design
A MMMB PAM generally uses a MCM package structure where the laminate design is extremely essential as it contributes to the efficient interconnection of multiple dies and SMDs, thereby reducing assembling loss and achieving high performance. As shown in Figure 8a, a multiband PAM typically uses a conventional planer MCM packaging structure, where multi-functional dies are directly mounted on a laminate. To achieve smaller size and realize further miniaturization, a novel 3D die stack packaging structure instead of a planar structure is adopted in our proposed PAMs, as shown in Figure 8b, in which the PA controller die is stacked vertically on the PA-core die to realize size reduction. Figure 9a exhibits the laminate designed in this work. Note that the SOI die and the GaAs HBT die are mounted directly on the laminate, whereas the bulk CMOS die is 3Dstacked vertically on the GaAs HBT die to save physical volume. Golden bond wires are used to form electrical connections between the laminate and different dies. In particular, the transmission of regular voltage signals (Vref1, Vref2, …, Vref10) are directly realized via the bond wires between the CMOS die and the GaAs die. Similarly, the transmission of controller signals (VDD, VC1 and VC2) are directly implemented through the bond wires between the CMOS die and the SOI die. Additionally, as described in Figure 4 above, several matching transmission lines and capacitors (CLBo1, CLBo2, CHBo1 and CHBo2) of the LB and HB PA OMNs are designed on the laminate to achieve lower loss and better efficiency for broadband operation. As depicted in Figure 9b, the proposed laminate is based on a multilayered structure containing six copper metal layers (M1, M2, …, M6) with the same thickness of 23 um and five epoxy dielectric layers (D1, D2, …, D5) with the same relative

Laminate Design
An MMMB PAM generally uses an MCM package structure in which the laminate design is essential, because it provides efficient interconnection of multiple dies and SMDs, thereby reducing assembly loss and achieving high performance. As shown in Figure 8a, a multiband PAM typically uses a conventional planar MCM packaging structure, where multi-functional dies are directly mounted on a laminate. To achieve further miniaturization, a novel 3D die stack packaging structure is adopted instead of a planar structure in the proposed PAM, as shown in Figure 8b, in which the PA controller die is stacked vertically on the PA-core die. Figure 9a shows the laminate designed in this work.
Note that the SOI die and the GaAs HBT die are mounted directly on the laminate, whereas the bulk CMOS die is stacked vertically on the GaAs HBT die to save physical volume. Gold bond wires form the electrical connections between the laminate and the different dies. In particular, the regular voltage signals (Vref1, Vref2, …, Vref10) are transmitted directly via the bond wires between the CMOS die and the GaAs die. Similarly, the controller signals (VDD, VC1 and VC2) are transmitted directly through the bond wires between the CMOS die and the SOI die. Additionally, as described in Figure 4 above, several matching transmission lines and capacitors (CLBo1, CLBo2, CHBo1 and CHBo2) of the LB and HB PA OMNs are designed on the laminate to achieve lower loss and better efficiency for broadband operation. As depicted in Figure 9b, the proposed laminate is a multilayered structure containing six copper metal layers (M1, M2, …, M6), each 23 µm thick, and five epoxy dielectric layers (D1, D2, …, D5) with the same relative dielectric constant (εr) of 4.3. The D3 layer is 80 µm thick, while the remaining layers (D1, D2, D4 and D5) are each 30 µm thick. To capture the parasitic coupling effects and enhance the design accuracy, an electromagnetic (EM) simulation of the proposed laminate, including the output matching elements and all RF signal bonding wires, was carried out.
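As a quick check on the stack-up quoted from Figure 9b, the following minimal sketch tabulates the layers and sums their thicknesses. It assumes a strict alternation of metal and dielectric layers from M1 to M6 and ignores solder mask, plating and surface finish, none of which are specified in the text; the function names are illustrative only.

```python
# Sketch of the laminate stack-up described in the text (Figure 9b).
# Assumptions (not from the source): layers alternate M1, D1, M2, ..., D5, M6,
# and there is no solder mask, plating or surface finish.

METAL_THICKNESS_UM = 23.0  # all six copper layers
DIELECTRIC_THICKNESS_UM = {"D1": 30.0, "D2": 30.0, "D3": 80.0, "D4": 30.0, "D5": 30.0}
RELATIVE_PERMITTIVITY = 4.3  # same value for all five epoxy dielectric layers


def build_stack():
    """Return the ordered list of (layer name, thickness in um)."""
    layers = []
    for i in range(1, 7):
        layers.append((f"M{i}", METAL_THICKNESS_UM))
        if i <= 5:
            name = f"D{i}"
            layers.append((name, DIELECTRIC_THICKNESS_UM[name]))
    return layers


def total_thickness_um(layers):
    """Sum the thickness of every layer in the stack."""
    return sum(thickness for _, thickness in layers)


stack = build_stack()
print(total_thickness_um(stack))  # 338.0
```

Under these assumptions the copper and dielectric layers add up to 338 µm; the thickness of the finished laminate may differ.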
Figure 10 shows the complete microphotograph of the proposed MMMB PAM, which integrates the designed GaAs HBT PA die, SOI switch die and bulk CMOS controller die on the implemented laminate using an MCM package. To achieve further miniaturization, the CMOS controller die is stacked vertically on the HBT PA die. Thus, an overall size as small as 5 × 3.5 mm² is realized for the proposed PAM, where the die sizes of the HBT PA, SOI switch and CMOS controller are 1.45 × 1.05 mm², 1.3 × 0.7 mm² and 0.84 × 0.44 mm², respectively.
On-evaluation-board small-signal and large-signal measurements were carried out for the broadband PAM with 3.4 V power supplies (VCC1, VCC2 and Vbat) and 1.8 V control logic signals (V_EN, Vmode1, Vmode2 and Vmode3).
Figure 11a depicts the experimental setup for the small-signal S-parameters and stability factors (k and b), which uses a Keysight E5071C network analyzer and a RIGOL DP832A DC power supply, and Figure 11b illustrates the large-signal measurement setup with a Keysight N5182B vector signal generator, a Keysight N9020A signal analyzer and a RIGOL DP832A DC power supply.
Figure 12 shows the measured S-parameters of the proposed MMMB PA in both the HPM and LPM for the LB and HB. Figure 12a shows that small-signal power gains (S21) of 27.5–27.8 dB and 19.7–20.9 dB are obtained in the HPM and LPM, respectively, over the LB frequency range from 824 to 915 MHz. In both modes, input return losses (IRLs) better than −9 dB are achieved. As shown in Figure 12b, the HB PA delivers S21 of 28–30 dB and 16–19 dB in the HPM and LPM, respectively, with IRLs better than −9 dB over the 1710 to 1980 MHz frequency range.
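The stability factors mentioned with the small-signal setup can be computed from measured two-port S-parameters. The sketch below assumes that "k" and "b" refer to the Rollett stability factor K and the auxiliary measure B1, which the text does not state explicitly; the numerical S-parameters in the usage lines are hypothetical placeholders, not measured data from this work.

```python
# Sketch: Rollett stability factor K and auxiliary measure B1 for a two-port.
# Assumption: the paper's "k" and "b" correspond to K and B1.
# A two-port is unconditionally stable at a frequency when K > 1 and B1 > 0.


def stability_factors(s11, s12, s21, s22):
    """Return (K, B1) from complex S-parameters at a single frequency.

    Note: s12 * s21 must be non-zero, otherwise K is undefined.
    """
    delta = s11 * s22 - s12 * s21  # determinant of the S-matrix
    k = (1 - abs(s11) ** 2 - abs(s22) ** 2 + abs(delta) ** 2) / (2 * abs(s12 * s21))
    b1 = 1 + abs(s11) ** 2 - abs(s22) ** 2 - abs(delta) ** 2
    return k, b1


# Hypothetical values for illustration only (not from the measurements).
k, b1 = stability_factors(0.3 + 0j, 0.001 + 0j, 20 + 0j, 0.4 + 0j)
print(k, b1)
print("unconditionally stable:", k > 1 and b1 > 0)
```

In practice the function would be evaluated at every frequency point exported from the network analyzer, since stability has to hold across the whole measured band and not only inside the LB and HB.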
In the HPM, with the proposed broadband output matching technique, the differences in output power between the fundamental and the second-/third-order harmonic frequencies are as large as approximately 45/60 dB and 60/60 dB for the LB and HB, respectively.
The performance comparison between this work and previously reported results is listed in Table 2. The proposed hybrid CMOS/HBT/SOI PAM exhibits performance comparable to that of other state-of-the-art reported PAMs.
In particular, by using the 3D vertical stack MCM package strategy, the proposed PAM covering five frequency bands and operating at two power modes occupies only a 5 × 3.5 mm² area, which is the smallest size among the published works listed in Table 2. To the best of the authors' knowledge, this work is the first demonstration of an MMMB PAM using a 3D vertical stack MCM package to achieve an extremely small size with favorable RF performance.
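The package and die dimensions reported above allow a rough estimate of how much die footprint the vertical stack removes from the laminate. The sketch below is only illustrative arithmetic: it compares the summed die areas for a side-by-side placement with the footprint when the CMOS controller sits on the HBT PA die, and it ignores bond-wire pads, keep-out spacing, SMDs and the matching network, which occupy most of the 5 × 3.5 mm² area. The variable and function names are hypothetical.

```python
# Sketch: die footprint with and without stacking the CMOS die on the HBT die.
# Dimensions are taken from the text; everything else is a simplification.

PACKAGE_MM = (5.0, 3.5)
DIES_MM = {
    "HBT PA": (1.45, 1.05),
    "SOI switch": (1.3, 0.7),
    "CMOS controller": (0.84, 0.44),
}


def area_mm2(dims):
    """Area of a rectangle given as (width, height) in mm."""
    return dims[0] * dims[1]


def fits_on(top, bottom):
    """True if the top die fits within the bottom die without rotation."""
    return top[0] <= bottom[0] and top[1] <= bottom[1]


planar_footprint = sum(area_mm2(d) for d in DIES_MM.values())
stacked_footprint = planar_footprint - area_mm2(DIES_MM["CMOS controller"])
saved_fraction = 1 - stacked_footprint / planar_footprint

print("package area (mm^2):", area_mm2(PACKAGE_MM))
print("CMOS fits on HBT:", fits_on(DIES_MM["CMOS controller"], DIES_MM["HBT PA"]))
print("planar die footprint (mm^2):", round(planar_footprint, 4))
print("stacked die footprint (mm^2):", round(stacked_footprint, 4))
print("die footprint saved:", round(100 * saved_fraction, 1), "%")
```

With these numbers the stacked arrangement removes roughly 13% of the total die footprint. The actual laminate area saved is not derivable from the die sizes alone, since routing and bond-wire clearances are not given in the text.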

Conclusions
A fully integrated MMMB PAM with one bulk CMOS die, one GaAs HBT die and one SOI switch die on a six-layer laminate has been developed for mobile handset applications. On the one hand, by using a parallel dual-chain two-stage PA strategy with a broadband matching technique, the PAM can operate in two power modes for different power levels in both the LB and HB, thereby improving back-off efficiency with multiband operation. The implemented PAM delivers sufficient output power and exhibits PAEs of 38–39.9% in the HPM and 12.4–13.8% in the LPM. On the other hand, the ultra-compact 3D vertical stack MCM package structure, in which the CMOS die is stacked vertically on the HBT die, contributes significantly to size reduction. The fabricated PAM covering five frequency bands occupies only 5 × 3.5 mm². This demonstrates that the PAM, with its related design and packaging techniques, is capable of meeting the stringent requirements of modern mobile communication systems. The proposed wideband high-efficiency PAM can be a practical new solution for realizing MMMB cellular handsets with a smaller chip size.

Data Availability Statement:
The data presented in this paper are available on request from the first author.

Conflicts of Interest:
The authors declare no conflict of interest.