A Tree-Based Architecture for High-Performance Ultra-Low-Voltage Amplifiers

Abstract: In this paper, we introduce a novel tree-based architecture which allows the implementation of Ultra-Low-Voltage (ULV) amplifiers. The architecture exploits a body-driven input stage to guarantee a rail-to-rail input common mode range and body-diode loading to avoid Miller compensation, thanks to the absence of high-impedance internal nodes. The tree-based structure improves the CMRR of the proposed amplifier with respect to conventional OTA architectures and allows achievement of a reasonable CMRR even at supply voltages as low as 0.3 V and without tail current generators, which cannot be used in ULV circuits. The bias currents and the static output voltages of all the stages implementing the architecture are accurately set through the gate terminals of biasing transistors in order to guarantee good robustness against PVT variations. The proposed architecture and the implementing stages are investigated analytically, and design equations for the main performance metrics are presented to provide insight into circuit behavior. A 0.3 V supply voltage, subthreshold, ultra-low-power (ULP) OTA, based on the proposed tree-based architecture, was designed in a commercial 130 nm CMOS process. Simulation results show a dc gain higher than 52 dB with a gain-bandwidth product of about 35 kHz and reasonable values of CMRR and PSRR, even at such a low supply voltage and considering mismatches. The power consumption is as low as 21.89 nW, and state-of-the-art small-signal and large-signal FoMs are achieved. Extensive parametric and Monte Carlo simulations show the robustness of the proposed circuit to PVT variations and mismatch. These results confirm that the proposed OTA is a good candidate to implement ULV, ULP, high-performance analog building blocks for directly harvested IoT nodes.


Introduction
The continuous evolution of electronic systems and the ever-increasing symbiotic relationship between humans and electronic devices characterize the era of the Internet of Things (IoT) [1,2]. Smart and portable devices, such as laptops, smartphones, smartwatches and fitness trackers, are used more and more often for checking emails, banking, counter services and the like. Indeed, most of these electronic devices have changed the way we work, study or play.
This IoT revolution has also driven the development of body area networks [3], which exploit implantable and wearable devices and are widely used in healthcare monitoring and in the study of neurodegenerative diseases such as Parkinson's and Alzheimer's [4–7].
The growing popularity of these electronic devices is also due to their increasing capability to work with low power consumption and low supply voltage in order to maximize battery life or employ energy harvesting techniques.
The stringent requirements in terms of ultra-low-power (ULP) and ultra-low-voltage (ULV) operation set by the above applications have brought about a revolution also in [...]
The bias currents and the static output voltages of all the stages implementing the proposed architecture are accurately set through the gate terminals of biasing transistors in order to guarantee good robustness against PVT variations. However, this biasing strategy results in pseudo-differential stages and therefore has a negative impact on CMRR performance. The proposed tree-like structure improves the CMRR of the OTA with respect to conventional pseudo-differential amplifiers and allows achievement of a reasonable CMRR even in ULV conditions. A 0.3 V supply voltage ULP OTA based on this architecture was designed in a 130 nm CMOS process, and simulation results show state-of-the-art small-signal and large-signal figures of merit (FoMs).
The paper is organized as follows: Section 2 introduces the proposed OTA architecture. Circuit analysis is reported in Section 3. Section 4 deals with design and simulation results and conclusions are drawn in Section 5.

Proposed Topology
The block scheme of the proposed OTA architecture is depicted in Figure 1. This ULV OTA architecture was derived from the OTA introduced by the authors in [20] and is a three-stage, tree-like OTA, made up of a cascade of differential-to-single-ended converter stages, to maximize CMRR. Three different topologies are exploited in the three stages of the OTA to optimize the tradeoff between performance and efficiency. Each of these stages was extensively investigated, and their behavior is discussed in the next subsections. It has to be remarked that the proposed ULV OTA makes extensive use of the body terminals of MOS devices, and thus it can be implemented only in CMOS technologies (such as triple-well bulk or FDSOI) where both NMOS and PMOS transistors have available body connections. However, this is not a strong limitation, since most modern processes provide body connections for both PMOS and NMOS transistors.

Stage 1
The topology of the blocks denoted as stage 1 in Figure 1 is reported in Figure 2, and is made up of transistors M_1A, M_1B and M_2A, M_2B. This input stage has the same topology adopted for the OTA in [40]. It is a bulk-driven stage in which the bias current is accurately set through the V_GN voltage applied to the gate of transistor M_2A. The bias voltage V_GN is generated by the biasing circuit reported in Figure 3. The current flowing in M_2A is mirrored through M_1A and M_1B, so that the standby current of all MOS devices is accurately set. The body terminals of transistors M_1A and M_1B are connected to the input voltages, V_IP and V_IM, respectively. The output of stage 1 is loaded through a body-diode connection on transistor M_2B, whose gate is connected to the bias voltage V_GN; this results in an output impedance lower than that of conventional input stages. This stage thus provides limited gain, but allows achievement of a rail-to-rail input common mode range and improvement of the bandwidth.
As a consequence, noise and mismatch of the second stage contribute to the total input-referred noise and offset. However, even if noise and offset performance are suboptimal, the OTA can still be designed to exhibit acceptable noise and offset, while achieving very good bandwidth efficiency.

Stage 2
The topology of stage 2 is shown in Figure 4. This stage converts the input differential signal to single-ended, providing some gain and a well-defined bias point and contributing to the overall CMRR. The input signal is applied to the gates of M_4A and M_4B, and the bias current is set through the gates of M_3A and M_3B, connected to the bias voltage V_GP generated by the circuit in Figure 3. The current cancellation given by the body-to-body (B2B) current mirror (Appendix B) M_4A, M_4B allows a good common mode rejection ratio to be attained, as shown in the next sections. Since the output is body-loaded, this stage too has no high-impedance internal node and thus does not require any internal compensation.

Stage 3
The topology of stage 3 is shown in Figure 5. This stage combines the signal behavior of an inverter-based pseudo-differential pair (Arbel topology) with differential-to-single-ended conversion through the body current mirror and robust biasing. It is composed of an n-input and a p-input stage similar to that of Figure 4, but without diode loading, connected together. The signal is applied to the gates of two PMOS and two NMOS devices, respectively M_6A, M_6B and M_8A, M_8B, and the body-diode connections in M_6A and M_7B implement body-driven current mirrors performing differential-to-single-ended conversion and common mode current cancellation. Transistors M_5A, M_5B and M_7A, M_7B act as current sources and are exploited to set the bias current in all the branches of the third stage through V_GP and V_GN, respectively; thus, each transistor has a well-defined bias point.

Architectural Considerations
It has to be noted that, in the proposed architecture, the interfaces between stage 1 and stage 2 and between stage 2 and stage 3 are body-to-gate (B2G) connections. These B2G connections result in lower voltage gain with respect to conventional drain-to-gate connections, but the lower gain allows avoidance of high-impedance internal nodes, and therefore of compensation capacitors. In fact, even if each B2G interface generates a pole (as shown in Appendix A), it is placed at a much higher frequency than the one given by the output stage, which provides the dominant pole.

Circuit Analysis
In this section, the small-signal and large-signal performance of the proposed architecture is analyzed analytically, and design equations for the main performance parameters, such as gain, frequency response, slew-rate and noise, are presented to provide insight into circuit behavior.

Differential Gain
Referring to the small-signal equivalent circuits of stage 1, stage 2 and stage 3, the differential mode gain of each stage was computed. Using the standard notation for small-signal parameters of MOS devices, the differential gain of the first stage is expressed by Equation (1); according to usual approximations, the pole-zero doublet in Equation (1) can be neglected.
The differential gain of stage 2 is then given by Equation (3); in this case too, the pole-zero doublet can be neglected. Finally, the stage 3 differential gain can be computed by neglecting the pole-zero doublets given by the body-diode connections of M_6A,B and M_7A,B, and considering that M_5 = M_8 and M_6 = M_7. The overall gain of the amplifier follows from combining the stage gains and can be rewritten as in Equation (8). It is evident from Equation (8) that the output capacitance sets the dominant pole, since the poles of stage 1 and stage 2 are at higher frequencies due to the body-diode connected loads and the smaller load capacitances.
Starting from the above results, the gain-bandwidth product (GBW) of the proposed OTA can be computed, and the phase margin of the whole OTA is expressed by Equation (11). According to Equation (11), the proposed OTA requires a minimum value of C_L for stability. However, Equation (11) also shows that the desired phase margin can be set by properly sizing the MOS devices for a given load capacitor; a higher C_L results in a smaller GBW and a larger phase margin.
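The trade-off between load capacitance, GBW and phase margin can be illustrated with a minimal numerical sketch. It does not reproduce Equation (11): it assumes a generic amplifier with one dominant pole at the output and two real, load-independent internal poles, and it approximates the unity-gain frequency with the GBW. The effective transconductance `GM_EFF` and the pole frequencies in `INTERNAL_POLES_HZ` are hypothetical values chosen only for illustration.

```python
import math


def gbw_hz(gm_eff, c_load):
    """Gain-bandwidth product of a single-dominant-pole amplifier, in Hz."""
    return gm_eff / (2.0 * math.pi * c_load)


def phase_margin_deg(gbw, nondominant_poles_hz):
    """Phase margin, assuming the unity-gain frequency is close to the GBW."""
    lag = sum(math.degrees(math.atan(gbw / p)) for p in nondominant_poles_hz)
    return 90.0 - lag


# Hypothetical values, chosen only for illustration (not taken from the paper).
GM_EFF = 1.1e-5                     # effective transconductance, in siemens
INTERNAL_POLES_HZ = [80e3, 150e3]   # assumed internal (B2G interface) poles, in Hz

for c_load in (25e-12, 50e-12, 100e-12):
    gbw = gbw_hz(GM_EFF, c_load)
    pm = phase_margin_deg(gbw, INTERNAL_POLES_HZ)
    print(f"C_L = {c_load * 1e12:5.1f} pF  GBW = {gbw / 1e3:6.2f} kHz  PM = {pm:5.1f} deg")
```

Under these assumptions, doubling the load capacitance halves the GBW and moves the unity-gain frequency further below the internal poles, so the phase margin grows, which is the qualitative behavior described in the text.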

Common Mode Gain
The common mode gain of stage 1 and the resulting CMRR were derived first. The common mode gain and the CMRR of stage 2 and of stage 3 were then obtained in the same way. Due to the body current mirror, the CMRR of these stages is reduced with respect to stage 1. Combining the above results, the common mode gain of the proposed tree-like architecture can be derived, and the CMRR of the overall OTA is finally expressed by Equation (22). By looking at Equation (22), it is evident that the CMRR in typical conditions is high, due both to the cascade of several stages and to the scaling factor of the tree architecture, and that it can be enhanced by further iterating the tree-like structure of the proposed OTA architecture. However, in ULV conditions, PVT variations and mismatch may affect the stability of the operating point, especially in the presence of a B2G interface, and significantly degrade the CMRR of the i-th stage of the OTA. As a consequence, the CMRR of this architecture is more sensitive to PVT variations and mismatch than that of other architectures which adopt higher supply voltages and/or a more stable operating point. To cope with this problem, design centering techniques are exploited in this work in order to increase the overall CMRR in a given range of PVT and mismatch conditions, achieving reasonable robustness. The frequency analysis reported above shows that the common mode gain presents some zeros that could appear before the unity-gain frequency (depending on the C_L/C_gs ratio), thus reducing the CMRR at high frequency. A large load capacitance is usually required to achieve stability; therefore, the resulting CMRR reduction is often limited.

Large-Signal Performances
The large-signal performance of the proposed OTA has been investigated by assuming that the load capacitance C_L is much larger than the other circuit capacitances. The slew-rate is thus determined by the output stage, and it can be assumed that the output voltage v_O2 of stage 2, which drives stage 3, is a rail-to-rail signal.
With reference to Figure 5, the output current is evaluated using the standard relationship for the sub-threshold current of each device, where U_t = kT/q is the thermal voltage and |V_thn,p| = V_thn,p0 − α_n,p |V_bs|. For the positive slew-rate, we have v_1 = V_DD and v_2 = 0, and we can assume that the body voltages of M_6B and M_7A are approximately 0. By denoting with I_ref the quiescent current of the devices of stage 3, we obtain Equation (25), where ∆V_BH = −V_B0 and ∆V_GH = V_DD − V_GP, with V_B0 and V_GP the quiescent voltages at the body and gate terminals of the NMOS and PMOS devices.
For the negative slew-rate, we have v_1 = 0 and v_2 = V_DD, and we can assume that the body voltages of M_6B and M_7A are approximately V_DD. In this case we derive Equation (26), where ∆V_BL = V_DD − V_B0 and ∆V_GL = V_DD − V_GN, with V_GN the quiescent voltage at the gate terminals of the NMOS devices. Equations (25) and (26) show that, in general, positive and negative slew-rates give different results.
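To give a feeling for how strongly gate and body excursions modulate a sub-threshold current, the following sketch evaluates the ratio between the large-signal current and the quiescent current of a single device. It is not the paper's slew-rate expression: it assumes the textbook exponential sub-threshold law with the drain-source voltage well above U_t, together with the linearized threshold shift |V_th| = V_th0 − α|V_bs| quoted above. The slope factor `n` and the body coefficient `alpha` are hypothetical defaults, not values from the paper.

```python
import math

K_BOLTZMANN = 1.380649e-23        # J/K
Q_ELECTRON = 1.602176634e-19      # C


def thermal_voltage(temp_kelvin):
    """Thermal voltage U_t = kT/q, in volts."""
    return K_BOLTZMANN * temp_kelvin / Q_ELECTRON


def current_ratio(delta_vg, delta_vb, n=1.3, alpha=0.25, temp_kelvin=300.0):
    """I/I_ref for a saturated sub-threshold device.

    delta_vg: increase of |V_gs| with respect to the quiescent point, in volts.
    delta_vb: increase of the forward body bias |V_bs|, in volts (may be negative).
    n and alpha are assumed, illustrative values.
    """
    u_t = thermal_voltage(temp_kelvin)
    return math.exp((delta_vg + alpha * delta_vb) / (n * u_t))


print(f"U_t at 300 K: {thermal_voltage(300.0) * 1e3:.2f} mV")
print(f"I/I_ref for a 150 mV gate step: {current_ratio(0.15, 0.0):.1f}")
print(f"I/I_ref with a 150 mV loss of body bias as well: {current_ratio(0.15, -0.15):.1f}")
```

Because the dependence is exponential, a gate overdrive of a few U_t multiplies the available output current many times over, while an opposing body excursion takes back part of that boost; different excursions on the two edges are one way to read the asymmetry between the positive and negative slew-rates.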

Noise Analysis
The noise analysis has been carried out assuming that each transistor can be modelled with a single noise current generator, which includes both thermal and flicker noise; the power spectral density of this generator accounts for both contributions. Since the noise sources of stage 3 can be neglected due to the high gain of the preceding stages (considering also the contribution of the tree structure), the equivalent input noise mainly results from the first two stages and can be expressed as follows:

$$S_{v_{eq}} = \frac{S_{n_1} + S_{n_2}}{2\, g_{mb_1}^{2}} \qquad (30)$$

As can be observed from Equation (30), the noise performance of the amplifier is worsened by body driving, whose transconductance gain (i.e., g_mb) is n times lower than g_m. Consequently, in order to reduce the equivalent input noise, larger transistors are required. The result in Equation (30) can be written in a less concise form as Equation (31), in terms of the input-referred noise spectra of the first and second stage (contribution of the single cell). The factor 16 in the denominator of Equation (31) accounts for the 2^(N−1) gain contribution of an N-level tree architecture, whereas the factors 4 and 2 in the numerator account for how many identical cells are present.
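The counting factors mentioned for Equation (31) can be checked with a short sketch. It assumes a binary tree in which level k of an N-level structure contains 2^(N−k) identical cells, and it assumes that the 2^(N−1) gain factor enters the noise power spectral density squared; both readings are inferred from the factors 4, 2 and 16 quoted in the text, and the function names are illustrative.

```python
def cells_per_level(n_levels):
    """Number of identical cells at each level of a binary tree, input side first."""
    return [2 ** (n_levels - level) for level in range(1, n_levels + 1)]


def tree_gain_factor(n_levels):
    """Gain contribution 2^(N-1) attributed to an N-level tree architecture."""
    return 2 ** (n_levels - 1)


N_LEVELS = 3  # the proposed OTA is a three-stage tree
print("cells per level:", cells_per_level(N_LEVELS))
print("gain factor:", tree_gain_factor(N_LEVELS))
print("factor on the noise power:", tree_gain_factor(N_LEVELS) ** 2)
```

For N = 3 this gives four stage 1 cells, two stage 2 cells and one stage 3 cell, with a gain factor of 4 and hence a factor of 16 on the power spectral density.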

Amplifier Design and Simulation Results
The proposed OTA has been designed and simulated in a 130 nm CMOS process from STMicroelectronics. Small-signal and large-signal figures of merit (FoMs) were used to compare it against recently published OTAs with supply voltages lower than 0.5 V. Extensive parametric and Monte Carlo simulations were carried out in order to assess the robustness of the amplifier to PVT variations and mismatch referring to both open-loop and closed-loop simulation test benches.

Sizing
The transistors in the stages implementing the architecture in Figure 1 were sized as reported in Table 1. The bias voltages V_GN and V_GP in Figures 2, 4 and 5 are generated by the biasing circuit shown in Figure 3. The sizing of the NMOS transistors M_9A and M_9B and of the PMOS transistor M_10 of the biasing circuit is also reported in Table 1. The voltages V_GN and V_GP propagate the bias current, I_B = 4 nA, through body-mirroring or gate-mirroring.

Circuit Simulations
The proposed OTA was simulated within the Cadence Virtuoso environment assuming a supply voltage of 0.3 V and an output load capacitance of 50 pF.
Referring to the open-loop simulation test bench, the differential gain (magnitude and phase) was evaluated as reported in Figure 6. As can be observed from the figure, the phase margin is about 52.40°, whereas the gain-bandwidth product is about 35.16 kHz. Figure 6 also shows the common mode gain in typical conditions.
The amplifier was then tested in unity-gain configuration and its transfer characteristic is reported in Figure 8, highlighting the rail-to-rail capabilities of the OTA.
Sinusoidal waves at different amplitudes and with a frequency of 200 Hz were used to excite the unity-gain amplifier and evaluate distortion. The OTA exhibits very good total harmonic distortion (THD), even with an input signal swing equal to the supply voltage (as depicted in Figure 9). As can be observed from Figure 9, when a 90% signal swing is considered, the THD is about 0.673%, whereas when a full-swing signal is used the THD is still good and equal to about 1.38%. Furthermore, to assess the slew-rate (SR) performance of the amplifier, a full-range square wave was used, and results are shown in Figure 10. The amplifier shows positive and negative slew-rates (SR_p and SR_n) equal to 18.61 and 11.51 V/ms, respectively. Though not symmetrical, the worst-case slew-rate is not much worse than the best one; hence, large-signal performance is good on both signal edges. The input-referred noise spectrum of the proposed OTA is reported in Figure 11 and shows a value of about 1.60 µV/√Hz at 1 kHz.
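A few derived quantities help put these simulation results in context. The sketch below is only a back-of-the-envelope check on the reported numbers: it assumes a single-pole open-loop response (so that the dominant pole is GBW divided by the dc gain) and an ideal slew-limited ramp across the full 0.3 V swing. The 52 dB dc gain is a reported lower bound, so the pole estimate is an upper bound.

```python
V_DD = 0.3                 # supply voltage and full output swing, in volts
GBW_HZ = 35.16e3           # reported gain-bandwidth product, in Hz
DC_GAIN_DB = 52.0          # reported lower bound on the dc gain, in dB
SR_P_V_PER_S = 18.61e3     # positive slew-rate (18.61 V/ms), in V/s
SR_N_V_PER_S = 11.51e3     # negative slew-rate (11.51 V/ms), in V/s


def db_to_ratio(gain_db):
    """Convert a voltage gain in dB to a linear ratio."""
    return 10.0 ** (gain_db / 20.0)


def dominant_pole_hz(gbw_hz, gain_db):
    """Dominant pole of a single-pole amplifier: GBW / A0."""
    return gbw_hz / db_to_ratio(gain_db)


def full_swing_time_s(swing_v, slew_rate_v_per_s):
    """Time needed to traverse the given swing at a constant slew-rate."""
    return swing_v / slew_rate_v_per_s


print(f"dominant pole: about {dominant_pole_hz(GBW_HZ, DC_GAIN_DB):.1f} Hz")
print(f"rising edge:  {full_swing_time_s(V_DD, SR_P_V_PER_S) * 1e6:.1f} us")
print(f"falling edge: {full_swing_time_s(V_DD, SR_N_V_PER_S) * 1e6:.1f} us")
```

Under these assumptions the dominant pole lies below 100 Hz, and a rail-to-rail transition takes a few tens of microseconds on either edge.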

Robustness to Mismatch and PVT Variations
The OTA was then extensively tested by means of parametric and Monte Carlo simulations to demonstrate its robustness to PVT and mismatch variations. Table 2 reports the results of 200 Monte Carlo iterations. Power dissipation (P_D) has a standard deviation lower than 10% of the mean value. Large-signal performance (i.e., SR_p and SR_n) is close to the nominal value, whereas the attained mean value of the phase margin m_φ is about 53°. The standard deviation of the offset is relatively large, confirming the suboptimal performance of the proposed OTA in terms of noise and offset. Its value is, however, similar to that of other ULV OTAs reported in the literature.
Figure 12 reports the histogram of the CMRR, which clearly shows a log-normal distribution, probably due to the sub-threshold operating condition of the circuit. The architecture exhibits a CMRR up to 98 dB for some iterations (as expected from the theoretical results in Section 3.2), and remains relatively high under mismatch variations, with a mean value of about 42 dB. The power supply rejection ratio (PSRR) of the proposed OTA is also quite good despite the very low supply voltage. Figure 13 reports the histogram of the PSRR, which shows a mean value of about 56.13 dB with a limited variation under mismatch.
The performance under PVT variations was investigated taking into account a ±10% supply voltage variation and a [0, 70] °C temperature range. In Table 3, the performance under temperature variations is summarized. Total power consumption, the gain-bandwidth product, noise and total harmonic distortion are adequately stable across the considered temperature range. However, it is evident from Table 3 that the differential gain and CMRR degrade at high temperatures; this is probably due to variations in the bias point of stage 2, and in particular to transistors M_4A and M_4B entering the triode region.
A temperature-dependent current biasing approach would probably allow achievement of better results, but this has not been considered in this work. Furthermore, it has to be noted that an ideal constant current source was considered: while such a generator can be devised (e.g., see [49], or using a higher supply voltage for the current reference), this clearly remains a critical issue, dependent on the application environment of the OTA.
Table 4 shows that the amplifier is stable under power supply variations, with power dissipation and slew-rate increasing significantly with the supply voltage, whereas CMRR improves at lower supply voltages owing to the adopted design centering approach. The OTA was then tested under different process corners and results are reported in Table 5. As is evident from Table 5, the proposed OTA shows good performance, even assuming the worst-case process conditions.

Discussion and Comparison with the Literature
In order to compare the amplifier with the literature, we employ the two standard figures of merit (FOMs) for small-signal and large-signal performance, namely FOM_S and FOM_L. The FOM_S is defined as:

$$\mathrm{FOM}_S = \frac{GBW \cdot C_L}{P_D}$$

where C_L is the load capacitance and P_D is the power dissipation; the FOM_L is defined as:

$$\mathrm{FOM}_L = \frac{SR_{avg} \cdot C_L}{P_D}$$

where SR_avg is the average (between the positive and negative edge) slew-rate. However, since most works presented in the literature show an asymmetric slew-rate, it is more meaningful to consider the worst-case slew-rate. Consequently, as in [40], we define the FOM_L,WC as:

$$\mathrm{FOM}_{L,WC} = \frac{SR_{WC} \cdot C_L}{P_D}$$

where SR_WC is the worst-case slew-rate between the positive and negative signal edges. The proposed amplifier exhibits the largest small-signal FOM among the comparable ULV literature, with a FOM_S of about 80.29 k against the previously reported record of about 20.16 k attained by [42]. The proposed OTA outperforms gate-driven, body-driven and also digital OTAs. Large-signal performance is also very good, especially if the worst-case FOM is considered: the proposed amplifier is the best in the literature. Indeed, the FOM_L is about 34.40 k; furthermore, the worst-case FOM_L,WC is also very good, approximately 26.30 k, which is a remarkable result given that previous works attained at best FOM_L ≈ 21.00 k and FOM_L,WC ≈ 8.36 k. The proposed amplifier has a small area occupation with respect to comparable body-driven designs, though the area is larger than that of digital and gate-driven designs (Table 6).
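The reported figures of merit can be cross-checked from the simulation results quoted earlier. The sketch below assumes the usual definitions, FOM_S = GBW·C_L/P_D in MHz·pF/mW and FOM_L = SR·C_L/P_D in (V/µs)·pF/mW; these units are an assumption, adopted because they reproduce the quoted values to within rounding. The function names are illustrative.

```python
GBW_HZ = 35.16e3           # gain-bandwidth product, in Hz
C_LOAD_F = 50e-12          # load capacitance, in F
P_D_W = 21.89e-9           # power dissipation, in W
SR_P_V_PER_S = 18.61e3     # positive slew-rate (18.61 V/ms), in V/s
SR_N_V_PER_S = 11.51e3     # negative slew-rate (11.51 V/ms), in V/s


def fom_small_signal(gbw_hz, c_load_f, p_d_w):
    """FOM_S = GBW * C_L / P_D, expressed in MHz * pF / mW."""
    return (gbw_hz / 1e6) * (c_load_f * 1e12) / (p_d_w * 1e3)


def fom_large_signal(slew_rate_v_per_s, c_load_f, p_d_w):
    """FOM_L = SR * C_L / P_D, expressed in (V/us) * pF / mW."""
    return (slew_rate_v_per_s / 1e6) * (c_load_f * 1e12) / (p_d_w * 1e3)


sr_avg = 0.5 * (SR_P_V_PER_S + SR_N_V_PER_S)   # average of the two edges
sr_wc = min(SR_P_V_PER_S, SR_N_V_PER_S)        # worst-case edge

print(f"FOM_S    = {fom_small_signal(GBW_HZ, C_LOAD_F, P_D_W) / 1e3:.2f} k")
print(f"FOM_L    = {fom_large_signal(sr_avg, C_LOAD_F, P_D_W) / 1e3:.2f} k")
print(f"FOM_L,WC = {fom_large_signal(sr_wc, C_LOAD_F, P_D_W) / 1e3:.2f} k")
```

With the rounded inputs above, the three values come out within a few tens of units of the quoted 80.29 k, 34.40 k and 26.30 k; the residual difference is consistent with rounding of the GBW and slew-rate figures.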

Conclusions
In this work, we propose a novel tree-based OTA architecture that exploits body-driven stages to achieve rail-to-rail ICMR, and body-diode loads to avoid Miller compensation, improving the bandwidth efficiency. A ULV ULP OTA exploiting this approach was designed in a 130 nm CMOS process from STMicroelectronics. Simulation results show a dc gain higher than 52 dB and a gain-bandwidth product of about 35.16 kHz, with nominal CMRR and PSRR equal to 42.11 dB and 56.13 dB, respectively. Large-signal characteristics are also very good both in terms of THD and slew-rate. Due to the very limited power consumption of about 21.89 nW, the OTA exhibits state-of-the-art small-signal and large-signal FoMs. In summary, the proposed OTA shows record-breaking small-signal and large-signal performance, relatively large dc gain and reasonable PSRR and CMRR performance. The OTA exhibits good stability and robustness against PVT and mismatch variations.
Funding: This research received no external funding.

Conflicts of Interest: The authors declare no conflict of interest.

Appendix A. Body-to-Gate (B2G) Interface
This section explains the body-to-gate (B2G) interface, which is exploited at each interface between the (i−1)-th and the i-th stage. Following the notation in Figure A1a, the current gain of the interface can be expressed in terms of a factor χ_α, which derives from the Miller approximation on C_gd,B and depends on the load conductance g_load; the latter can be equal to g_mb,load (for stages 1 and 2) or g_ds,load (for stage 3). It is therefore possible to conclude that the interface behaves as a small-signal current mirror with gain.

Appendix B. Body-to-Body (B2B) Mirror
This section explains the body-to-body (B2B) interface, which is exploited in each stage. Following the notation in Figure A1b, the current gain can be expressed in terms of a factor χ_β, which in this case too denotes the Miller approximation. Finally, it can be concluded that the interface can be regarded as a B2B mirror, that is, a small-signal current mirror whose gain is fixed by properly sizing M_A and M_B.