Multiphase interpolating digital power amplifiers for TX beamforming

This paper presents a 4-channel beamforming TX implemented in 65 nm CMOS. Each TX channel comprises a C-2C split-array multiphase switched-capacitor power amplifier (SAMP-SCPA). This is the first use of multiphase interpolation (MPI) for beam-steering. This technique is ideal for low-frequency beamforming and MIMO, as it does not require passive or LO-based phase shifters. The SCPA is ideal to use as the core element since it can perform frequency translation, data conversion and drive an output at high power and efficiency in a compact die area. A prototype 4-element beamforming TX, occupying 2 mm × 2.5 mm, can achieve a peak output power of 24.4 dBm with a peak system efficiency (SE) of 24%, while achieving <1° phase resolution and <1 dB gain error. When transmitting a 15 MHz, 64 QAM long-term evolution (LTE) signal, it outputs 18.4 dBm at 14% SE with a measured adjacent channel leakage ratio (ACLR) <−30 dBc and an error vector magnitude (EVM) of 3.27% rms at 1.75 GHz. A synthesized beam pattern based on measured results from a single die achieves <0.32° rms beam angle error and <0.15 dB rms beam amplitude error.


Introduction
In recent years, there has been tremendous growth in research focused on increasing data transmission capacity. Beamforming and MIMO techniques that leverage arrayed transceivers can increase communications capacity through SNR enhancement, spatial diversity or spatial multiplexing. The larger the array, the more the transmission capacity can be increased. In a transmitter (TX) beamformer, the amplitude and phase of the signal being transmitted on each TX in an array can be set accurately to steer a beam toward a user or to steer multiple beams to multiple users. TX beamformers are most flexible when individual data streams can be formed and combined in the digital baseband. Ideally, the combined spatial streams would then be connected to every antenna element in the array. However, this requires data conversion in each signal path, in addition to frequency translation and front-end amplification, as shown in Figure 1a. This leads to high power consumption per TX chain and precludes the use of large-scale antenna arrays.
Traditional analog beamforming can be grouped into three main methods: LO phase-shifting [1,2], IF phase-shifting [3] and RF phase-shifting [4,5] architectures. It is noted that TX and RX beamforming operations typically have reciprocal behavior for each of the cases, and in many cases can share/re-use hardware. Generally, RF phase-shifting architectures consume lower power, since the baseband and IF stages can be shared for all signal paths; hence, only the phase shifting and gain weighting need to be duplicated. However, nonlinearity and losses can have severe impacts on the fidelity and efficiency of the TX. Additionally, purely RF- and LO-based approaches can only be used for single-beam beamforming. To fully leverage MIMO approaches, IF-based beamforming must be used, but this requires a digital-to-analog converter (DAC), as well as frequency translation in every signal path, as shown in Figure 1a. Direct-digital, bits-to-RF (DDRF) transmitters are attractive for use in digital beamforming transmitters as they combine the functionality of a DAC and mixer [6] and can also embed the power amplifier, as in the switched-capacitor power amplifier (SCPA) [7]. Recently, DDRF techniques have begun to be investigated in digital beamforming applications [8][9][10], as shown in Figure 1b.
In this paper, a DDRF beamformer using a split-array multiphase SCPA (SAMP-SCPA) is proposed [11][12][13][14]. The proposed digital beamforming TX leverages the SAMP-SCPA as the core transmitter element in a four-element beamforming transmitter, though any digital power amplifier (DPA) could be adapted to use the MP approach [15][16][17][18][19][20]. As with other DDRF techniques, the SAMP-SCPA allows for simultaneous frequency translation, digital-to-analog conversion and front-end power amplification. Moreover, it allows for high-resolution complex beam weighting by leveraging precision control of the clocking edges and the ratioed capacitance of a switched-capacitor array. In fact, the mechanism for beam weighting is the same mechanism that is used for precision wideband data modulation, with only changes to the digital baseband required for beamforming operation.
This work expands upon our conference paper, providing new details, including a review of prior DDRF beamforming transmitters, an analysis of the phase and amplitude accuracy and new circuit and measurement details. In Section 2, a review of recent applications of highly digital beamforming transmitters/PAs is provided. Next, in Section 3, the theory and operation of multiphase interpolation beamforming are provided. This is followed by Section 4, where details of the circuit and system design are presented. Measurement results are presented in Section 5 and are followed by conclusions in Section 6.

Review of Highly Digital Beamforming Transmitters/PAs
In traditional analog beamforming transmitters, the beam weighting is performed in either the IF/baseband, LO or RF paths, using either passive phase shifters (e.g., reflective [21]), or active phase shifters (e.g., Cartesian combiner [4], ring oscillators [2], etc.). In purely analog beamformers, only one beam can be formed per array. Digital beamforming makes it possible to operate with a single beam or with multiple beams using digital processing, but it can be costly in hardware and power consumption as it requires a DAC in every transmitter path.
Recently, DDRFs have been proposed for use in beamforming transmitters, as they combine the functions of data conversion, frequency translation and front-end amplification [22]. Such configurations naturally enable digital beamforming, as individual beams can be combined digitally in the baseband DSP and directly input to the DDRF's digital decoders. To date, however, there have been limited reports of DDRFs being used in beamforming.
A pulse-based transmitter with digitally programmable pulse shape and delay elements was proposed by Wang, et al. for a UWB transmitter (Figure 2) [8]. In this transmitter, digitally programmable delay lines were used to linearly adjust the true time-delay between transmitter paths. Each of the paths is comprised of a cascade of digitally programmable delay elements. This implementation does not allow complex weighting between elements, as only the delay and pulse shape can be adjusted. Although using time delay rather than phase shifting allows for a wider frequency bandwidth, the technique suffers from reduced phase resolution at higher frequency due to degradation in the performance of the delay elements.
A Cartesian phase shifter [4] was used in combination with a current-mode polar DPA by Qian, et al. to realize a DDRF beamforming transmitter (Figure 3) [9]. In this implementation, the phase shift required for beamforming is performed using a digitally weighted Cartesian phase shifter, before driving the RF input of a current-mode DPA. Amplitude weighting could be accomplished separately in the AM path, with the caveat that higher resolution would be needed to allow for both modulation and weighting. This implementation achieves outstanding phase accuracy at moderately high output power, owing to the polar DPA acting as an embedded transmitter, but the current-mode DPA requires digital pre-distortion (DPD) to correct for nonlinearity in the DPA. Operation in the polar domain affords high energy efficiency but suffers from challenges due to systematic nonlinearity.
A bandpass ΔΣ modulator (DSM) combined with an N-path filter [10] was proposed by Zheng, et al. (Figure 4). The inherently linear 1.5b ΔΣ DAC combined with the N-path filter reduces the transmitter area, while not requiring significant DSP. The architecture is promising, particularly for dense arrays in deeply scaled CMOS. Chip area reduction is critical for dense arrays, but it is notable that the DSM requires significant interpolation for oversampling, and the digital baseband consumes a fairly high amount of power, considering that the implementation does not include a high-power output stage. The implementation achieved excellent beam accuracy, but the bandwidth (<1 MHz) and error vector magnitude (EVM) were relatively limited (>3.5% at 400 MHz, >6.6% at 1.2 GHz; both for single-carrier 64-QAM) and it would probably require significantly increased power for more complex modulation.
To overcome the challenges of the aforementioned DDRF beamforming architectures, we propose multiphase interpolation beamforming using the SAMP-SCPA [12] (Figure 5). The SAMP-SCPA can output any required amplitude and phase at the rate that its decoders can operate. Multiphase signaling does not have the same inherent bandwidth limitations as polar signaling, and the SCPA is more linear than current-mode DPAs due to the use of ratioed capacitors. The addition of beamforming to the operation requires only modification to the logic decoder, which can perform the vector combination of modulation and beam weight. Finally, the SCPA allows for an embedded output-stage/power amplifier that can operate linearly at low power [23] and at powers exceeding 1 W [16,24]. Multiphase interpolation beamforming using an SAMP-SCPA is discussed in detail in the next sections.

Multiphase Interpolation Beamforming
The block diagram of the proposed beamforming transmitter is shown in Figure 5. An SAMP-SCPA forms the core [14] of the multiphase interpolation beamforming transmitter. In the proposed transmitter, a multiphase (MP) clock is generated using a ring oscillator that is injection-locked to a global clock. Any conventional technique can generate the MP clock (e.g., multi-stage ring oscillator, MP DLL, polyphase RC filter, etc.). The MP clock is then passed to an MP logic decoder that selects the appropriate phases and encodes the appropriate amplitude of each phase in the digital domain, before the signal is reconstructed using the SAMP-SCPA as a power DAC.
The SCPA is a DPA where the original variants were all operated in the polar domain [7,25,26,27]. Digital polar operation requires a coordinate rotation digital computer (CORDIC) to convert a Cartesian signal (e.g., $I + jQ$) into a polar signal (e.g., $Ae^{j\varphi}$), which results in bandwidth expansion of both the amplitude ($A$) and phase ($\varphi$) components, due to the nonlinear conversion. Polar systems are problematic for wide bandwidth modulation due to the bandwidth truncation and timing misalignment that dominate the out-of-band noise and linearity [28]. Ideally, phase modulation is implemented with a phase-locked loop (PLL), but this is unsuitable for wider band modulation [29,30]. Quadrature modulation can be used to create wideband phase modulation [31], but if performed before the output stage it still requires precise time alignment with the amplitude signal, and if performed in the output stage it is subject to peak output power reduction [15,24,32].
The multiphase interpolator was proposed to overcome these challenges [13]. In multiphase modulation, the complex plane is subdivided into M-basis phases that can be weighted and summed to achieve an output at any arbitrary amplitude and phase. It should be noted that the multiphase modulator has recently shown superior linearity as a stand-alone phase modulator [33,34].

Single Multiphase Transmitter Operation
An example of multiphase vector addition is shown in Figure 6, where the complex plane is divided by M = 8 basis phases ($\varphi_0$ to $\varphi_7$, Figure 6a). The example depicts how the vector v with amplitude A and phase θ is generated. First, the two adjacent basis phases of the clock, $\varphi_0$ and $\varphi_1$, are selected (see Figure 6b); then they are individually weighted by the basis phase weights $n_1$ and $n_2$. The basis phase weights can first be found by mapping to the traditional Cartesian basis vectors, according to the following [13]:
$$n_1 = I - Q\cot\left(\frac{2\pi}{M}\right), \tag{1}$$
and
$$n_2 = \frac{Q}{\sin\left(2\pi/M\right)}. \tag{2}$$
A representative multiphase SCPA (MP-SCPA) is shown in Figure 6c. The weights $n_1$ and $n_2$ control how many of the capacitors are switched on $\varphi_0$ or $\varphi_1$, respectively, and the sum of $n_1$ and $n_2$ is bound by the following:
$$n_1 + n_2 \le N, \tag{3}$$
where k represents the total number of bits in the array (e.g., $2^k = N$). Furthermore, $n_1$ and $n_2$ can be found in terms of A and θ using the following substitution for I and Q:
$$I = A\cos\theta, \tag{4}$$
$$Q = A\sin\theta. \tag{5}$$
Substitution of (4) and (5) into (1) and (2) yields the following:
$$n_1 = A\left[\cos\theta - \sin\theta\cot\left(\frac{2\pi}{M}\right)\right], \tag{6}$$
and
$$n_2 = \frac{A\sin\theta}{\sin\left(2\pi/M\right)}. \tag{7}$$
It can be shown that the voltage amplitude $V_{out}$ of the output voltage across $R_{opt}$, for a given supply voltage $V_{DD}$ and set of codes and basis phases, is given by the following [13]:
$$V_{out} = \frac{2}{\pi}\,\frac{\sqrt{n_1^2 + n_2^2 + 2 n_1 n_2\cos\left(2\pi/M\right)}}{N}\,V_{DD}. \tag{8}$$
The output power $P_{out}$ is given by the following:
$$P_{out} = \frac{V_{out}^2}{2R_{opt}} = \frac{2}{\pi^2}\,\frac{n_1^2 + n_2^2 + 2 n_1 n_2\cos\left(2\pi/M\right)}{N^2}\,\frac{V_{DD}^2}{R_{opt}}. \tag{9}$$
The input power $P_{in}$ is the power required to switch the total input capacitance $C_{in}$:
$$P_{in} = C_{in} V_{DD}^2 f_0, \tag{10}$$
where $f_0$ is the output frequency. $C_{in}$ is given by the following:
$$C_{in} = \frac{n_1\left(N - n_1\right) + n_2\left(N - n_2\right)}{N}\,C, \tag{11}$$
where C is the value of a unit capacitor in the array. The total array capacitance can be selected to optimize matching (e.g., larger unit capacitors [35]) or efficiency at backoff, by controlling the network quality factor $Q_{NW}$ [13]. Split-array techniques allow a trade-off between the two [14].
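The vector decomposition described above can be checked numerically. The following is a minimal sketch, assuming ideal basis phases and unquantized weights; the function names `mp_weights` and `mp_vector`, and the normalization in which an amplitude of 1.0 corresponds to N unit capacitors, are illustrative choices and are not taken from [13].

```python
import cmath
import math


def mp_weights(amplitude, theta, M=8, N=512):
    """Decompose amplitude*exp(j*theta) onto the two adjacent basis phases.

    amplitude is normalized to full scale (1.0 corresponds to N unit
    capacitors, an assumption of this sketch) and theta is in radians.
    Returns (sector, n1, n2): sector is the index of the lower basis phase,
    and n1, n2 are the unquantized weights of phases sector and sector + 1.
    """
    step = 2.0 * math.pi / M
    theta = theta % (2.0 * math.pi)
    sector = int(theta // step) % M
    local = theta - sector * step            # angle measured from the lower phase
    a = amplitude * N
    n2 = a * math.sin(local) / math.sin(step)
    n1 = a * math.cos(local) - n2 * math.cos(step)
    if n1 + n2 > N + 1e-9:
        raise ValueError("requested vector exceeds the n1 + n2 <= N bound")
    return sector, n1, n2


def mp_vector(sector, n1, n2, M=8, N=512):
    """Sum the two weighted basis phases and normalize to full scale."""
    step = 2.0 * math.pi / M
    total = (n1 * cmath.exp(1j * sector * step)
             + n2 * cmath.exp(1j * (sector + 1) * step))
    return total / N


if __name__ == "__main__":
    sector, n1, n2 = mp_weights(0.5, math.radians(30.0))
    v = mp_vector(sector, n1, n2)
    print(sector, round(n1, 2), round(n2, 2), abs(v), math.degrees(cmath.phase(v)))
```

Note that the bound on the sum of the weights means that full-scale amplitude is only reachable on a basis phase; midway between two basis phases the largest reachable normalized amplitude is cos(π/M), which is why the sketch raises an error for vectors outside that region.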
The SCPA is a series resonant circuit where $Q_{NW}$ is given by the following:
$$Q_{NW} = \frac{2\pi f_0 L_{ser}}{R_{opt}}. \tag{12}$$
$L_{ser}$ (Figure 6c) is chosen to be resonant with the total array capacitance:
$$L_{ser} = \frac{1}{\left(2\pi f_0\right)^2 N C}. \tag{13}$$
It is noted that the series resonant inductor and output resistor can be replaced with any load and an impedance matching network, where the aim of the matching transformation is to present a net inductive reactance and real impedance designed for the desired power level, according to (9).
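With a known output power and input power, the drain efficiency η can be found according to the following:
$$\eta = \frac{P_{out}}{P_{out} + P_{in}}. \tag{14}$$
Substituting (9)-(12) into (14) yields the following:
$$\eta = \frac{4\left[n_1^2 + n_2^2 + 2 n_1 n_2\cos\left(2\pi/M\right)\right]}{4\left[n_1^2 + n_2^2 + 2 n_1 n_2\cos\left(2\pi/M\right)\right] + \dfrac{\pi}{Q_{NW}}\left[n_1\left(N - n_1\right) + n_2\left(N - n_2\right)\right]}. \tag{15}$$
The efficiency can be found at any output power level and phase angle by appropriately selecting $n_1$ and $n_2$ according to (1) and (2). It is noted that choosing a larger $Q_{NW}$ increases the efficiency level at output power backoff, at the expense of output bandwidth.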
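The dependence of efficiency on the codes and on the network quality factor can be explored with a short calculation. The sketch below is an idealized model, not the authors' simulation: it assumes lossless switches, the fundamental-only output power of an ideal square-wave drive, and a switched capacitance proportional to n1(N − n1) + n2(N − n2), which is the usual SCPA charge-sharing argument applied to each phase separately. The function name `mp_scpa_efficiency` is hypothetical.

```python
import math


def mp_scpa_efficiency(n1, n2, N, M, q_nw):
    """Ideal drain efficiency of a multiphase SCPA for weights n1 and n2.

    Assumed model (labeled as an assumption, see the lead-in text):
      output power  ~ n1^2 + n2^2 + 2*n1*n2*cos(2*pi/M)
      switching loss ~ [n1*(N - n1) + n2*(N - n2)] / q_nw
    """
    if n1 < 0 or n2 < 0 or n1 + n2 > N:
        raise ValueError("weights must satisfy 0 <= n1, n2 and n1 + n2 <= N")
    signal = n1 * n1 + n2 * n2 + 2.0 * n1 * n2 * math.cos(2.0 * math.pi / M)
    if signal == 0:
        return 0.0
    switched = n1 * (N - n1) + n2 * (N - n2)
    return 4.0 * signal / (4.0 * signal + math.pi * switched / q_nw)


if __name__ == "__main__":
    N, M = 512, 16
    for q_nw in (1.0, 3.0, 6.0):
        # Half-scale code on a single phase, i.e. 6 dB output power backoff.
        print(q_nw, mp_scpa_efficiency(N // 2, 0, N, M, q_nw))
```

Under these assumptions, the efficiency at a given backoff rises monotonically with the network quality factor, which is consistent with the trade-off against output bandwidth noted in the text.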
Operation using split arrays such as the SAMP-SCPA does not change the operation as presented above, but it does allow the resolution of the SCPA to be arbitrarily controlled while also controlling $Q_{NW}$. Additionally, it is noted that the multiphase technique originally proposed in [13] has been adapted for use only as a constant envelope phase modulator [33,34].

Amplitude and Phase Resolution in MP-SCPA
Both $n_1$ and $n_2$ are quantized values that are used to reconstruct arbitrary output amplitude and phase combinations. Because the number of available states decreases as the amplitude decreases, the phase resolution for small amplitudes also decreases. This is true for any digital multiphase transmitter, including the special case of quadrature digital transmitters (e.g., M = 4) [15,32]. Because the entire array can be switched fully by either $\varphi_0$ or $\varphi_1$, or by a combination of both, there are $2^k + 1$ possible states between the basis phases. To illustrate, constant amplitude arcs are plotted for several values of M and k in Figure 7. It is noted visually that increasing the MP-SCPA resolution k increases both the amplitude and phase resolution, as would be expected. Similarly, increasing the number of phases increases both the amplitude and phase resolution, particularly for low output amplitudes, as the density of states in each cone between two adjacent phases increases as the number of basis phases M increases. To quantify the impact of both k and M, simulations of an ideal MP-SCPA were run across the full output amplitude and phase range. The RMS phase error is plotted as a function of the normalized output amplitude for a k = 9b array, for an ideal MP-SCPA with M = {4, 8, 16}, in Figure 8a. We note that as the number of phases is increased, the same number of discrete amplitude/phase states covers a smaller area, meaning that the RMS error would be expected to be reduced. As expected, when doubling the number of phases, the same RMS phase error can be achieved for 3 dB less power. The RMS amplitude error is plotted as a function of the normalized output amplitude for a k = 9b array, for an ideal MP-SCPA with M = {4, 8, 16}, in Figure 8b. Increasing the number of phases does not have a significant impact on the RMS amplitude error at large amplitudes, but the increased density of states does have some impact at lower amplitude.
In MP-SCPAs, it was noted that as M is increased, the average power drop relative to a polar system is reduced at the expense of reduced time available for charge settling and hence reduced linearity [13]. It was noted that M = 16 resulted in a good trade-off between the power drop, efficiency and linearity. The RMS phase error is plotted versus normalized output amplitude for M = 16 and several different array resolutions in Figure 9a. For an array resolution of 10b, <1° RMS phase error can be achieved for ∼20 dB of output power range. The RMS amplitude error is plotted versus normalized output amplitude for M = 16 and several different array resolutions in Figure 9b. As expected, increasing the array resolution reduces the RMS amplitude error by ∼6 dB per bit of resolution.
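The trends described in this section can be reproduced qualitatively with a simple quantization experiment. The sketch below is not the simulation used for Figures 8 and 9; it assumes an ideal MP-SCPA whose decoder simply rounds each ideal weight to the nearest integer, and it sweeps the target phase across one sector between two adjacent basis phases. The function name `rms_phase_error_deg` and the sample count are illustrative assumptions.

```python
import cmath
import math


def rms_phase_error_deg(amplitude, M, k, samples=720):
    """RMS phase error (degrees) of a rounding decoder over one sector.

    amplitude is normalized to full scale (N = 2**k unit capacitors) and is
    limited to cos(pi/M) so that every target satisfies n1 + n2 <= N.
    """
    if not 0.0 < amplitude <= math.cos(math.pi / M):
        raise ValueError("amplitude must lie in (0, cos(pi/M)]")
    N = 2 ** k
    step = 2.0 * math.pi / M
    a = amplitude * N
    total = 0.0
    for i in range(samples):
        theta = (i + 0.5) * step / samples       # target phase inside the sector
        n1 = round(a * math.sin(step - theta) / math.sin(step))
        n2 = round(a * math.sin(theta) / math.sin(step))
        v = n1 + n2 * cmath.exp(1j * step)       # quantized output vector
        err = cmath.phase(v) - theta
        total += err * err
    return math.degrees(math.sqrt(total / samples))


if __name__ == "__main__":
    for M in (4, 8, 16):
        print("M =", M, [round(rms_phase_error_deg(a, M, 9), 3)
                         for a in (0.05, 0.1, 0.5)])
```

In this simplified model the phase error grows as the amplitude is reduced and shrinks as either the array resolution or the number of basis phases is increased, in line with the behavior described above; the absolute values depend on the rounding rule assumed here.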

Multiphase Beamforming Operation Example
The operation of an MP-SCPA used in MP interpolation beamforming is explained by the block diagram in Figure 10. First, a set of basis phases (e.g., $\varphi_0$ to $\varphi_{15}$) that span the unit circle is generated by a multiphase clock generator. Next, the desired instantaneous modulation phase (180°) and beam phase (30°) are input to phase selection logic that determines the total desired phase shift (210°) and selects from the set of basis phases the two phases adjacent to this desired output phase ($\varphi_A$ = 202.5° and $\varphi_B$ = 225°). The desired amplitude of the output is next provided as an input, as the product of the instantaneous modulation envelope and the desired beam weighting. From this, the decoder determines $n_1$ and $n_2$, the required weightings for $\varphi_A$ and $\varphi_B$, respectively. Finally, these weights are applied to an SAMP-SCPA to determine how many of the cells are switched on $\varphi_A$, how many are switched on $\varphi_B$ and how many are held at ground. In this way, the weighted basis phases are added on the SAMP-SCPA capacitor array to form a vector summation that contains the desired output amplitude and phase modulation, as well as the desired beam steering and weighting. Unlike [9], which only applies digital phase shifting using a quadrature digital phase shifter at the input of a polar DPA, the proposed design allows for the amplitude and phase weighting to occur at the SAMP-SCPA, and the weighting can be completely controlled by an individual multiphase logic decoder, which saves power and area and allows direct recombination at the output stage.
In choosing the designed resolution, it is noted that the phase and amplitude control for the beam steering are simply added as offsets using a digital encoder. The resolution required for wideband wireless communication to suppress out-of-band noise and achieve in-band high fidelity (e.g., low EVM and ACLR) is higher than that required to achieve high phase and amplitude resolution for beam steering [14]. Hence, array resolution is primarily dictated by the in-band and out-of-band signal requirements of the communication signal to be transmitted.
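The example in this section (a 180° modulation phase plus a 30° beam phase with M = 16) can be traced step by step in a few lines. The following is a minimal sketch of the decoding arithmetic only, assuming a 9b array and a decoder that rounds the ideal weights; the function name `beamform_code` and the amplitude values in the usage example are hypothetical.

```python
import cmath
import math


def beamform_code(mod_amp, mod_phase_deg, beam_amp, beam_phase_deg, M=16, N=512):
    """Combine modulation and beam weight, then map onto two basis phases.

    Amplitudes are normalized (0..1) and multiply; phases add. Returns
    (phi_a_deg, phi_b_deg, n1, n2, grounded), where grounded is the number
    of cells held at ground.
    """
    step = 360.0 / M
    total_phase = (mod_phase_deg + beam_phase_deg) % 360.0
    idx_a = int(total_phase // step) % M         # lower adjacent basis phase
    idx_b = (idx_a + 1) % M                      # upper adjacent basis phase
    local = math.radians(total_phase - idx_a * step)
    s = math.radians(step)
    a = mod_amp * beam_amp * N
    n1 = round(a * math.sin(s - local) / math.sin(s))
    n2 = round(a * math.sin(local) / math.sin(s))
    grounded = N - n1 - n2
    if grounded < 0:
        raise ValueError("requested amplitude exceeds the array full scale")
    return idx_a * step, idx_b * step, n1, n2, grounded


if __name__ == "__main__":
    phi_a, phi_b, n1, n2, gnd = beamform_code(0.8, 180.0, 0.5, 30.0)
    v = (n1 * cmath.exp(1j * math.radians(phi_a))
         + n2 * cmath.exp(1j * math.radians(phi_b)))
    print(phi_a, phi_b, n1, n2, gnd, math.degrees(cmath.phase(v)) % 360.0)
```

For the phases used in the text, the selection step returns 202.5° and 225°, and the reconstructed output phase falls close to the desired 210°; the residual error comes from rounding the weights.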

Circuit Design Details
The proposed four-element TX beamformer architecture is shown in Figure 5. It consists of four identical 16b TXs that each drive an off-chip ceramic balun and SMP connector jack for interfacing to an antenna. The resolution of the TX was chosen to be similar to the split-array MP-SCPA (SAMP-SCPA) previously presented in [14]. Whereas in that design the I/O was serialized, allowing the full array resolution to be tested, in this design the I/O remains parallelized. Hence, although the circuit is designed with a 16b array resolution, the measurement setup is limited to 22 I/O channels, meaning that only the upper 9b of the array could be utilized in the measurement setup. The circuit design details of all major blocks in the beamforming TX are now discussed.

MP Clock Generation
Each TX has a local eight-stage pseudo-differential ring oscillator, as shown in Figure 11, which is injection-locked to a global clock to create 16 evenly spaced basis phases. The delay cell used in the ring oscillator is the Maneatis delay cell [36]. The global clock is input to the chip via an LVDS clock RX and care is taken to route the clock with an equal delay to each ring oscillator, to provide a common time/phase basis. In cases where layout/routing mismatches create too much phase variation, calibration can be implemented through control of the digital input codes of the individual TX slice. All phases are input into a MUX tree that is controlled by the MP logic decoder.

MP Logic Decoder
The MP logic decoder shown in Figure 12 takes as its input all basis phases from the MP clock generator and digital input codes representing the desired output phase and amplitude. An additional control bit allows each transmitter to separate beam weighting and modulation. When the control bit is enabled, beam weighting and steering are performed in two steps. First, the clock selection logic, which uses 4 input bits, selects from the original input phases the two phases ($\varphi_A$ and $\varphi_B$) adjacent to the desired output phase; these are routed to the SAMP-SCPA. At this stage, a coarse phase shift has been achieved, given that the output phase must be between the two input phases selected. Next, using a 16b amplitude and 16b phase code, the decoder finds the weights $n_1$ and $n_2$ for $\varphi_A$ and $\varphi_B$, respectively, according to (1)-(7). In this way, a fine beam weight and phase shift is achieved.
When the control bit is disabled, the prior weight and phase shift are stored as the desired spatial weighting for the beam to be formed. The amplitude and phase input can now be added to the stored beam weighting, resulting in the computation of new values for $n_1$ and $n_2$ and selection of new phases for a vector output, where the output amplitude is the product of the modulation envelope with the desired beam weight and the output phase is the sum of the beam phase and modulation phase, as discussed in Section 3.3.
The decoders that set the weights for $n_1$ and $n_2$ are identical cascaded binary-to-thermometer decoders [13]. The first 16b binary-to-thermometer decoder selects how many cells are switched by phase $\varphi_A$. The second 16b binary-to-thermometer decoder selects whether the balance of the cells are switched by $\varphi_B$ or held at ground. All decoders are written in Verilog and then synthesized and automatically placed and routed.
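The behavior of the two cascaded decoders can be illustrated with a small behavioral model. The sketch below is not the synthesized Verilog; it assumes a purely unary array for readability (the actual array is split into unary and C-2C sub-arrays) and assumes that the second decoder thermometer-codes the sum of the two weights. The names `binary_to_thermometer` and `cell_states` are hypothetical.

```python
def binary_to_thermometer(code, width):
    """Return a thermometer code of the given width with `code` leading ones."""
    if not 0 <= code <= width:
        raise ValueError("code must lie between 0 and width")
    return [1 if i < code else 0 for i in range(width)]


def cell_states(n1, n2, n_cells):
    """Assign each unary cell to phase A, phase B or ground.

    The first decoder marks the n1 cells switched on phase A. The second
    decoder (assumed here to encode n1 + n2) decides whether each remaining
    cell is switched on phase B or held at ground.
    """
    if n1 + n2 > n_cells:
        raise ValueError("n1 + n2 must not exceed the number of cells")
    therm_a = binary_to_thermometer(n1, n_cells)
    therm_ab = binary_to_thermometer(n1 + n2, n_cells)
    states = []
    for on_a, on_a_or_b in zip(therm_a, therm_ab):
        if on_a:
            states.append("A")
        elif on_a_or_b:
            states.append("B")
        else:
            states.append("GND")
    return states


if __name__ == "__main__":
    print(cell_states(3, 2, 8))
```

Cascading the decoders in this way guarantees that no cell is assigned to both phases, so the bound on the sum of the weights is enforced by construction.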

16-b SAMP-SCPA
The schematic for one of the SAMP-SCPA cores is shown in Figure 13. The SAMP-SCPA was chosen due to its ability to achieve good linearity and output power with high efficiency in a compact area [14]. The schematic and layout were performed in slices, where the entire cell was designed from input logic through to capacitor as an individual slice. Unary weighted cells were designed to be identical and C-2C cells were optimized so that their delay matched the unary cells, given that each C-2C cell drives a slightly different impedance. The individual layout slices were then tiled to realize the layout of the completed SCPA. The design of each element in the slice, shown in Figure 14, is now described starting from the input logic.

SAMP-SCPA Input Logic Design
The input to each cell of the SAMP-SCPA consists of two clock signals with identical frequency and a phase separation equal to 360°/M, a phase selection control (SEL) that determines whether to switch on the leading or lagging phase and an enable bit (EN) that enables/disables switching on the selected phase.
The clock signals are input into a static MUX, whose output is controlled by SEL. Any MUX implementation is appropriate, but a static NAND-NAND MUX was used for easy pitch matching with other cells in the layout. The output of the MUX drives one input of a static NOR gate whose other input is EN. When EN is low, the clock signal propagates to the switch driver chain, and when it is high, the clock signal is blocked from the driver chain. This saves power as the driver chain in each cell does not consume power when the cell is off. This also effectively closes the output switch to ground, helping the SCPA to present a constant impedance to the load.
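The gating behavior of the cell input logic can be captured in a short truth-table model. The sketch below is a logic-level illustration only, with no timing; the polarity in which SEL = 0 picks the leading phase is an assumption, and the function names are hypothetical.

```python
def nand(a, b):
    """Two-input NAND on 0/1 values."""
    return 0 if (a and b) else 1


def mux_nand(clk_lead, clk_lag, sel):
    """NAND-NAND 2:1 MUX. Assumed polarity: sel = 0 selects the leading clock."""
    sel_n = nand(sel, sel)                       # inverter built from a NAND
    return nand(nand(clk_lead, sel_n), nand(clk_lag, sel))


def cell_gate(clk_lead, clk_lag, sel, en):
    """NOR of the MUX output and EN, driving the switch driver chain.

    With en = 0 the selected clock propagates (inverted by the NOR);
    with en = 1 the output is held low and the driver chain does not toggle.
    """
    selected = mux_nand(clk_lead, clk_lag, sel)
    return 0 if (selected or en) else 1


if __name__ == "__main__":
    for en in (0, 1):
        for sel in (0, 1):
            for lead in (0, 1):
                for lag in (0, 1):
                    print(en, sel, lead, lag, cell_gate(lead, lag, sel, en))
```

Because the NOR output is static whenever EN is high, a disabled cell draws no dynamic power in its driver chain, which matches the power-saving behavior described in the text.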

Switch Driver Slice Design
After the input logic, a driver chain consisting of two parallel inverter chains (Figure 14a) was used to drive the output switch. In one path, a level shifter like the one proposed in [37] is used to convert the input logic level to the shifted supply rails. This path is used to drive the PMOS transistor in the switch (Figure 14c). Inverters after the level shifters are placed in deep N-wells to allow operation from the shifted supply rails. The other path operates between $V_{GND}$ and $V_{DD}$ to drive the NMOS transistor in the switch. Each path is comprised of a cascade of scaled buffer cells based upon the unit cell, as shown in Figure 14b.
The driver slice is located adjacent to the switch and takes its input from the MP logic decoder, where the decoder's outputs have been pitch-matched to the appropriate input in the array. Co-location of the logic and driving chains minimizes the parasitic routing capacitance and eases timing synchronization of the switching signals.

Output Switch and Capacitor
To provide for a larger optimum termination impedance, which allows for reduced loss in the output matching network, the switch is composed of a cascoded CMOS inverter (Figure 14c). This topology allows each transistor in the stack to maintain no more than $V_{DD}$ across any two terminals. This is a feature of the SCPA that is unique amongst CMOS PAs. The switch widths are optimized to drive the slice capacitance while optimizing the power/delay product.
In the unary MSB sub-array, all the unit capacitors in each path are identical, so the size of every switch in the unary MSB path is identical. In the C-2C LSB sub-array, the total equivalent input capacitance for each successive bit increases linearly as the number of C-2C bits is increased. In addition, the nodal parasitic capacitance cannot be ignored when considering the total equivalent capacitance. Hence, the size of the transistor in the switch and the drivers in the C-2C LSB paths should be optimized, so that the delay is matched to that of the unary paths.
The capacitors are arrayed so that the output of the switch pairs with the top plate of the capacitor, which minimizes exposure of this node to the substrate. The bottom plates of every capacitor slice in the array are shared. The array is sub-divided into a 12b C-2C LSB sub-array and 4b unary MSB sub-array. The choice of array resolution was primarily dictated by the signal fidelity requirements and complexity/area in the layout, noting that for every additional unary bit, the number of cells doubles, whereas an additional binary bit only increases the cell count by one. In our case, it has been shown that signal fidelity requirements can be largely met using 9b-10b of array resolution. The choice of where to sub-divide the array also depends on the desired linearity and complexity of the thermometer decoder required for the unary weighted bits [14].
For the presented design, all these options were considered for the array size and segmentation before settling on the chosen values. Using a 4b unary and a 12b binary sub-array allows for a design that can meet the signal fidelity requirements and also makes digital predistortion possible if needed, due to excess "throw-away states", all while not exceeding the assigned dimensions for each transmitter, which were dominated by I/O pad requirements and the size of the matching network, as discussed in the following section on the matching network.
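The complexity argument behind the array segmentation can be made concrete by counting switching cells. The sketch below assumes one cell per thermometer-coded unary element (2^u − 1 cells for u unary bits) and one cell per C-2C bit; the function name `cell_count` is hypothetical, and the count ignores any dummy or termination cells.

```python
def cell_count(unary_bits, c2c_bits):
    """Number of switching cells for a split unary / C-2C array.

    Assumes 2**unary_bits - 1 thermometer-coded unary cells plus one cell
    per binary-weighted C-2C bit.
    """
    if unary_bits < 0 or c2c_bits < 0:
        raise ValueError("bit counts must be non-negative")
    return (2 ** unary_bits - 1) + c2c_bits


if __name__ == "__main__":
    # Chosen segmentation in the text: 4b unary MSBs and 12b C-2C LSBs.
    print("4b unary + 12b C-2C:", cell_count(4, 12))
    # Alternatives with the same 16b total resolution, for comparison.
    print("8b unary + 8b C-2C: ", cell_count(8, 8))
    print("16b fully unary:    ", cell_count(16, 0))
```

Under these assumptions, moving one bit from the C-2C sub-array to the unary sub-array roughly doubles the unary cell count, whereas adding a C-2C bit adds a single cell, which illustrates why the unary portion is kept to a few MSBs.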

Matching Network
The total capacitance in the array seen from the matching network remains constant, regardless of the input code. This is because when a switching cell is disabled, it holds the top plate of the capacitor at a constant potential through a fixed-value resistance such that the impedance seen looking into each slice of the array is constant. Hence, the matching network is unchanged for any choice of input code. The matching network shown in Figure 13 comprises a shunt inductor $L_{sh}$, a series inductor $L_{ser}$ and a shunt capacitor $C_{sh}$, forming a band-pass network that presents $R_{opt}$ to the capacitor array and is series resonant with the total array capacitance.
Each PA is matched to 50 Ω differentially at the pads on the chip. The bondwire inductance is in series with the PA output and is resonated with an off-chip capacitor at the center frequency of the band. An off-chip ceramic transformer balun is used to convert the output from differential to single-ended before connection to an SMP jack.
The on-chip matching and off-chip network serve to filter high-frequency harmonics that arise due to the switching behavior. The −3 dB output power bandwidth is ∼700 MHz, centered at 1.8 GHz. The bandwidth is determined by the $Q_{NW}$ (∼3) of the band-pass matching network. $Q_{NW}$ is primarily chosen to maximize the efficiency of the topology while minimizing loss in the impedance transformation network. If off-chip impedance transformations are used, a higher $Q_{NW}$ can be chosen.

Experimental Results
A prototype four-element beamforming TX was fabricated in a 65 nm RF CMOS process with nine metal layers, including an ultra-thick top metal for high-quality passive components. The chip microphotograph is shown in Figure 15a. The combined area of all four TXs is 5 mm², including the matching network, output stage, logic decoders and the I/O pad frame. The chip area is dominated by the I/O pad requirements and could be reduced in an SOC implementation. All circuits operate at 1.4 V, except for the cascoded switches that operate at 2.8 V. The TX array is chip-on-board bonded to a PCB, and an off-chip transformer balun converts the differential signal to single-ended to drive an SMP jack, as shown in Figure 15b. An external clock signal is received by an on-chip low-voltage differential signaling (LVDS) amplifier and is used to injection-lock the multistage ring oscillators in every path. The digital I/O is input from a high-speed digital I/O (HSDIO) pattern generator and is composed of the bits to control phase selection and the fine phase and amplitude control of each SAMP-SCPA. The HSDIO that was used limited the number of available I/O lines to 24, meaning that only the upper 9b of the array for each SAMP-SCPA could be utilized. MSBs were favored due to their larger impact on output power and efficiency. Individual TXs were characterized for their static and dynamic (modulated) characteristics and for beamforming, as follows.

Static Measurements
The individual TXs showed similar measured performance. The measurements considered all losses including those of the off-chip balun and SMP connector. The measured peak output power $P_{out}$ and the system efficiency (SE) versus frequency are plotted in Figure 16a,b, respectively. A peak output power of 24.4 dBm and a peak SE of 24.2% were observed at the center frequency of 1.75 GHz. Also plotted in Figure 16a are the simulated output power at the connector, $P_{out,smp}$, and the simulated output power at the IC periphery, $P_{out,IC}$. In Figure 16b, the simulated SE is plotted at the connector, $SE_{smp}$, and at the IC periphery, $SE_{IC}$. The measured output power versus input code and measured SE versus output power at three frequencies across the band are shown in Figure 16c,d, respectively. It is again noted that SE is measured at the connector, rather than wafer probed, which accounts for the reduced peak efficiency compared to other recently reported DPAs. The sharp roll-off in performance away from the center frequency is due to the external capacitor that is used to resonate the bondwire inductance from the packaging, and also to the ceramic transformer balun that is placed at the output of each PA.

Dynamic Measurements
To verify the individual TX performance with signals with large PAPR, each TX was tested in both a polar mode, where the injected clock is phase modulated, and in a full multiphase mode, where the injected clock has a static input phase. In the polar modulation mode, a 15 MHz, 64 QAM, OFDM modulated signal was input to the PA, and in the multiphase mode, a 10 MHz, 64 QAM, OFDM modulated signal was input to the PA. The bandwidth was limited because the data clock buffering was undersized for driving the input parasitics at the higher desired data rate. This was due to legacy circuits that were reused on the much larger four-element chip. The problem was discovered after fabrication, and it limited the data clock to a rate of 150 MHz.
The measured PSD is shown for both the polar case and the multiphase case at low, center and high frequency, in Figure 17. The measured ACLR was <−30 dBc for all measurements. The ACLR level is largely determined by de-troughing. The signal is de-troughed to reduce the PAPR until it just meets the −30 dBc ACLR limit for E-UTRA [38]. De-troughing could be reduced to improve the ACLR at the expense of reduced output power and efficiency. It is noted that, in the polar case, systematic nonlinearities due to bandwidth limitation and timing mismatch create large spectral aliases. The measured signal constellations are shown for the polar and multiphase modulation cases at low, center and high frequencies, in Figure 18. The measured EVM for the signal is found to be <4% RMS across all frequencies and output beam angles. Due to the linearity of the transmitter, digital pre-distortion (DPD) is not used at any frequency, and ACLR and EVM are maintained across the band. Prior work has shown that for the 64 QAM OFDM signal used, the primary degradation to linearity is due to supply network parasitics (e.g., packaging inductance and resistance and on-chip/off-chip decoupling capacitance) [39]. To mitigate supply-network-dependent nonlinearity, we adopted staggered "de-Q" decoupling capacitors to maintain a low supply network impedance across frequency, while reducing ringing due to high-Q resonances [31,40].
The wideband power spectral density for both polar and multiphase transmission is plotted in Figure 19a. The out-of-band noise is typical for a DPA with 9b of resolution. Additionally, the signal is measured as the phase angle is arbitrarily controlled between 0° and 360°. The EVM is plotted vs. output phase angle for an individual transmitter, as shown in Figure 19b.
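For readers less familiar with the EVM figures quoted above, the following sketch shows one common way to compute RMS EVM from demodulated constellation points. The normalization (RMS error relative to RMS reference power) is an assumption; the text does not state which convention the measurement equipment used, and the toy data are synthetic rather than measured.

```python
import math


def evm_rms_percent(measured, reference):
    """RMS EVM in percent, normalized to the RMS power of the reference.

    Both inputs are equal-length sequences of complex constellation points.
    The normalization convention is an assumption, not taken from the paper.
    """
    if len(measured) != len(reference) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    err_power = sum(abs(m - r) ** 2 for m, r in zip(measured, reference))
    ref_power = sum(abs(r) ** 2 for r in reference)
    return 100.0 * math.sqrt(err_power / ref_power)


# Synthetic toy data: four ideal points, each shifted by a constant 0.05 error.
reference = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
measured = [r + 0.05 for r in reference]
evm = evm_rms_percent(measured, reference)
print(f"EVM = {evm:.2f} %-rms")
```

Applied to a real capture, the reference points would be the ideal 64 QAM symbols recovered after synchronization and equalization.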

Beamforming Measurements
To validate the ability to form beams, four TXs on a single die were measured for their performance across the 9b phase control code range. The phase error is plotted as a function of phase code in Figure 20a. There was a static phase offset in the initial measurements that was corrected with a static off-line calibration. After the calibration, the phase error was <±1° across the 9b code range. The output power is plotted as a function of output phase in Figure 20b for each of the four TXs and varies by <±0.5 dB for outputs across the code range. No amplitude calibration was applied.
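The calibration procedure is not detailed in the text. As a minimal sketch of what a static off-line phase calibration could look like, the code below estimates a single per-TX offset as the mean phase error over all codes and subtracts it. It assumes the 9b code maps uniformly onto 0° to 360° (an LSB of 360/512 ≈ 0.70°), which is an assumption, and it uses synthetic data with a hypothetical 7.5° offset and 0.5° ripple rather than measured values.

```python
import math


def wrap_deg(x: float) -> float:
    """Wrap an angle in degrees into the interval [-180, 180)."""
    return (x + 180.0) % 360.0 - 180.0


def ideal_phase_deg(code: int, bits: int = 9) -> float:
    """Ideal phase for a code, assuming a uniform mapping onto 0..360 degrees."""
    return 360.0 * code / (1 << bits)


def calibrate_static_offset(measured_deg, bits: int = 9):
    """Return (offset, residual errors) after removing one static offset.

    The offset is the mean wrapped phase error, which is valid as long as
    the errors are clustered well away from the +/-180 degree wrap point.
    """
    errors = [wrap_deg(m - ideal_phase_deg(k, bits))
              for k, m in enumerate(measured_deg)]
    offset = sum(errors) / len(errors)
    residual = [wrap_deg(e - offset) for e in errors]
    return offset, residual


# Synthetic stand-in for one TX: hypothetical 7.5 degree offset plus ripple.
n_codes = 1 << 9
measured_deg = [(ideal_phase_deg(k) + 7.5
                 + 0.5 * math.sin(2 * math.pi * k / n_codes)) % 360.0
                for k in range(n_codes)]
offset_deg, residual_deg = calibrate_static_offset(measured_deg)
lsb_deg = 360.0 / n_codes
print(f"LSB = {lsb_deg:.3f} deg, offset = {offset_deg:.2f} deg, "
      f"max residual = {max(abs(r) for r in residual_deg):.2f} deg")
```

Because only one constant per TX is removed, code-dependent errors remain in the residual, which is the quantity compared against the <±1° bound.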
To estimate the performance of the four independent elements in beamforming, the array factor was synthesized using the measured data for several pointing directions for a linear array with λ/2 spacing. The four transmitters on a single die were fully characterized across the desired beam amplitude and phase. This measured data was then used to synthesize the array factor and hence the beam pattern, given the assumptions about antenna element spacing. The same pattern was synthesized for a linear array with ideal transmitters, and the two patterns are compared in Figure 21. The synthesized beam from the measured data matches well with the ideal synthesized beam across all output phases. The beam phase error as a function of ideal output phase angle is plotted for the synthesized pattern based on measured transmitters, in Figure 22. The resulting phase error across all patterns was 0.32° RMS.
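The synthesis step can be illustrated with the standard array factor of a uniform linear array, AF(θ) = Σ w_n exp(j 2π d n sin θ), with d = 0.5 wavelengths. The sketch below is not the authors' script: since the measured element data are not reproduced here, ideal steering weights with phases rounded to a 9b grid are used as a hypothetical stand-in for measured weights, and all function names are illustrative.

```python
import cmath
import math


def array_factor(weights, theta_deg, spacing_wavelengths=0.5):
    """Array factor of a uniform linear array at angle theta (from broadside)."""
    psi = 2 * math.pi * spacing_wavelengths * math.sin(math.radians(theta_deg))
    return sum(w * cmath.exp(1j * n * psi) for n, w in enumerate(weights))


def steering_weights(n_elements, steer_deg, spacing_wavelengths=0.5):
    """Ideal unit-amplitude weights that point the main beam at steer_deg."""
    psi0 = 2 * math.pi * spacing_wavelengths * math.sin(math.radians(steer_deg))
    return [cmath.exp(-1j * n * psi0) for n in range(n_elements)]


def quantize_phase(w, bits=9):
    """Round the phase of a weight to a uniform grid (assumed 9b over 360 deg)."""
    step = 2 * math.pi / (1 << bits)
    return abs(w) * cmath.exp(1j * step * round(cmath.phase(w) / step))


def peak_angle(weights, points_per_deg=10):
    """Angle in degrees of the largest |AF| on a grid from -90 to +90."""
    angles = [i / points_per_deg
              for i in range(-90 * points_per_deg, 90 * points_per_deg + 1)]
    return max(angles, key=lambda a: abs(array_factor(weights, a)))


ideal = steering_weights(4, 20.0)               # four elements, 20 deg beam
quantized = [quantize_phase(w) for w in ideal]  # stand-in for measured data
print(f"ideal peak at {peak_angle(ideal):.1f} deg, "
      f"quantized peak at {peak_angle(quantized):.1f} deg")
```

Replacing `quantized` with the measured complex gain of each TX at its programmed code would yield a pattern of the kind compared against the ideal one in Figure 21.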

Discussion
A four-element beamforming TX was introduced and implemented in 65 nm CMOS. The beamformer leverages multiphase interpolation to enable beam steering and beam weighting without the need for external phase shifters or individual DACs for each element. The technique can leverage any DPA to act as the weighting element, because DPAs are band-pass transmitters that simultaneously enable frequency translation and data conversion, and can operate with a large effective gain in a small die area. This is because in band-pass DPAs, the output reconstruction filter is a band-pass filter that is often smaller than the low-pass baseband filters that are used in conventional up-converting transmitters. The SCPA, which in recent years has been relabeled an SC-RFDAC or a C-DAC DPA, was chosen as the DPA to simultaneously realize high linearity and power and system efficiency with a small die area. When operating at 1.75 GHz, all four TXs can deliver a peak Pout of 24.4 dBm with 24.2% SE, while achieving <1° phase resolution and <1 dB gain error. The performance was validated by static and dynamic (modulation) measurements using both polar mode and multiphase mode without the use of DPD. The ACLR was below the required −30 dBc LTE standard for both modes, and the measured EVM was 3.27% RMS and 3.13% RMS, respectively.
A comparison with the prior state of the art for digital beamforming transmitters is provided in Table 1. Compared to [9], which is the most closely related work, the proposed work achieves similar phase and amplitude resolution but at higher output power and without the use of DPD, while achieving better linearity (ACLR and EVM). This is partially due to the use of the multiphase technique, which does not have the systematic nonlinearities of polar modulation, but is primarily due to the use of the SCPA, which is more linear than the current-mode DPA. Additionally, beam amplitude weighting is natively included in the proposed work. A comparison with the prior state of the art for recent DPAs and DTXs is provided in Table 2. The DPA performance alone compares well to other recent DPAs, and we note that it is the only one measured at the connector, rather than wafer probed. Because of the relatively high output power and small area, and the ability to independently weight the output beam and adapt its angle on an element-by-element basis in the digital domain, this can be deployed in large-scale antenna arrays used for both beamforming and for MIMO.