A 0.3 V Rail-to-Rail Ultra-Low-Power OTA with Improved Bandwidth and Slew Rate

Abstract: In this paper, we present a novel operational transconductance amplifier (OTA) topology based on a dual-path body-driven input stage that exploits a body-driven current mirror-active load and targets ultra-low-power (ULP) and ultra-low-voltage (ULV) applications, such as IoT or biomedical devices. The proposed OTA exhibits only one high-impedance node, and can therefore be compensated at the output stage, thus not requiring Miller compensation. The input stage ensures rail-to-rail input common-mode range, whereas the gate-driven output stage ensures both a high open-loop gain and an enhanced slew rate. The proposed amplifier was designed in an STMicroelectronics 130 nm CMOS process with a nominal supply voltage of only 0.3 V, and it achieved very good values for both the small-signal and large-signal Figures of Merit. Extensive PVT (process, supply voltage, and temperature) and mismatch simulations are reported to prove the robustness of the proposed amplifier.

Author Contributions: Methodology, F.C., R.D.S., P.M. and G.S.; software, R.D.S.; validation, F.C., R.D.S., P.M. and A.T.; formal analysis, F.C.; investigation, F.C., R.D.S. and G.S.; resources, A.T.; writing (original draft preparation), R.D.S.; writing (review and editing), F.C., P.M., G.S. and A.T.; supervision, G.S.; funding acquisition, A.T. All authors have read and agreed to the published version of the manuscript.


Introduction
The development of ultra-low-power (ULP) and ultra-low-voltage (ULV) integrated circuits is driven by applications such as the Internet of Things (IoT) [1][2][3][4] and implanted biomedical devices [5][6][7][8]. The number of IoT devices connected to the internet for gathering and processing information is increasing at an accelerating rate. The enormous number of connected devices, and the fact that they are often battery powered or must scavenge energy from the environment, make low power consumption a key feature of IoT devices [1]. Therefore, the design of integrated circuits (ICs) for IoT applications is becoming increasingly difficult due to the stringent constraints in terms of power dissipation, minimum supply voltage, and area footprint [1,2]. Due to these constraints and the low intrinsic gain of nanometer MOS transistors, analog interfaces are, in many cases, the most challenging building blocks of ICs for IoT applications [2,3].
In the context of implanted biomedical applications, front-end amplifiers for neural recording systems often rely on AC coupling with cutoff frequencies below 1 Hz to eliminate the DC offset of the electrodes and to properly process the neural signals [5,6]. Such devices are usually designed for different frequency slots in the range from 1 Hz to 10 kHz. For example, in the case of epileptic seizure detection, band-pass filters in the range of 250 to 500 Hz are required to extract the signals of interest. Low-voltage operation, low power consumption, and a small silicon area are the main requirements of ICs for these systems [7,8].
ULP and ULV operational transconductance amplifiers (OTAs) are key components for both the IoT and implanted biomedical devices, and a huge number of OTAs have been presented in recent years [9]. In [10], Stockstad et al. presented a 0.9 V operational amplifier that, for the first time, exploited the bulk-driven technique to attain rail-to-rail input-output swing. Over the years, the research community has increasingly focused on reducing the supply voltage and power consumption at the expense of common-mode rejection ratio (CMRR) performance [11][12][13][14]. Several common-mode feedback (CMFB) approaches, which exploit triode-biased devices or current cancellation, have been proposed with the aim of minimizing the common-mode gain in fully differential OTAs, thus improving their CMRR performance [15][16][17][18][19].
Multi-stage OTAs based on folded cascode or gain-boosting topologies and with supply voltages in the range from 0.6 to 0.9 V have been presented in the literature [20][21][22][23][24]. However, when targeting supply voltages lower than 0.4 V, the adoption of the topologies and design strategies reported above is no longer possible, and pseudo-differential or inverter-based architectures are often used [25][26][27][28][29][30][31][32]. In fact, at these supply voltages, gate-driven amplifiers cannot ensure a rail-to-rail input common-mode range (ICMR). Because they lack tail current generators, however, pseudo-differential and inverter-based circuits typically exhibit a poorly defined bias point and a poor CMRR, and body bias strategies are less effective due to the limited available voltage swing on body terminals.
ULV circuits can employ floating gate devices [33] or body-driven stages [34,35] to increase the input common-mode swing. Body-driven input stages are inherently rail-to-rail in ULV applications, and forward biasing of the NP junctions is not a concern when supply voltages are lower than about 0.6 V. Unlike gate-driven stages, however, they tend to have higher noise, lower bandwidth, and resistive input impedance. Despite these shortcomings, there is no alternative at supply voltages around 0.3 V if a large common-mode input signal swing is desired. Furthermore, the gate terminals of the input devices can be used for biasing, the biasing current of the amplifiers can be accurately set, and pseudo-differential architectures can even be exploited.
Additional interesting OTA architectures based on fully digital operation were recently proposed in [36,37], which showed the feasibility of full standard-cell-based analog amplifiers.
In addition to gain and robustness considerations, ULP and ULV topologies may have low bandwidth (because of sub-threshold biasing) and poor slew-rate performance (because of low biasing currents). Hence, it is important to assess the OTA performance in terms of bandwidth and slew rate for a given current or power consumption and capacitive load, which is what the popular small-signal and large-signal Figures of Merit (FOMs) measure.
In this paper, we present a novel OTA topology based on a body-driven input stage with a dual path, which improves the CMRR, and on a body-driven current mirror load for differential-to-single-ended conversion at the output. The proposed OTA has only one high-impedance node, and can therefore be compensated at the output stage, thus avoiding Miller compensation. The body-driven input stage ensures the rail-to-rail input common-mode range, whereas the coupling between the first and second stages and the gate-driven output stage ensure high open-loop gain and good slew-rate performance.
The paper is organized as follows: Section 2 describes the proposed topology; Section 3 analyzes the small-signal and large-signal circuit responses, including the CMRR; Section 4 reports the design and simulation results, with emphasis on the process, supply voltage, and temperature (PVT) variations, as well as on stochastic mismatch variations; a comparison with the related literature to highlight the advantages of the proposed topology in terms of the FOMs is also presented in Section 4. Finally, the conclusions are reported in Section 5.

Proposed Topology
The block scheme of the proposed OTA is reported in Figure 1. It uses two matched transconductors, $G_{MA}$ and $G_{MB}$, whose output currents are sent to a transimpedance stage $T_Z$, in which the common-mode currents are ideally cancelled and the differential currents are summed to double the transconductance gain.

To allow rail-to-rail input swing, a body-driven first stage is used. Due to the ULV supply, particular care was taken to identify a biasing strategy able to guarantee robustness with respect to PVT and mismatch variations. In fact, since each transistor requires a minimum $V_{ds}$ to operate properly, tail current generators are not an option, and the lack of tail current generators reduces common-mode rejection. Hence, the dual transconductance path in the input stage was exploited to guarantee high CMRR under the hypothesis of well-matched transconductances. Furthermore, since $M_{1A,B}$ and $M_{2A,B}$ form a current mirror and $M_{3A,B}$ acts as a current source, a well-defined bias current is obtained. The two transconductors $G_{MA}$ and $G_{MB}$ must be designed symmetrically and must be well matched with each other in order to optimize common-mode cancellation. The second stage of the OTA exploits two body-diode NMOS devices, so its input impedance is equal to $1/g_{mb}$. Transistors $M_{4A}$, $M_5$ and $M_{4B}$, $M_6$ act as current amplifiers. The gate voltage of transistors $M_7$ and $M_8$ determines the biasing current of the second stage. The body-driven current mirror $M_7$-$M_8$ provides differential-to-single-ended conversion and allows further rejection of the common-mode current component, since the common-mode current in $M_6$ and $M_8$ is the same. The output resistance $1/(g_{ds8} + g_{ds6})$ of the stage provides the voltage gain of the amplifier.
The differential input stage is very similar to the one used in [35,38,39]. However, in these works, the transconductors $G_{MA}$ and $G_{MB}$ are loaded by a conventional gate-driven current mirror, which performs the differential-to-single-ended conversion in the first stage. The first stage is then followed by one or two stages with Miller or Nested Miller compensation. In the proposed circuit, instead, the transconductors $G_{MA}$ and $G_{MB}$ are loaded by a differential body-diode load, and therefore, the input stage can be considered as a fully differential amplifier. The second stage of the proposed topology can be seen as a pseudo-differential pair loaded with a body-driven current mirror that performs the differential-to-single-ended conversion at the output, thus further increasing the CMRR and avoiding Miller compensation. The relatively high CMRR results from the combination of two effects: the intrinsic CMRR of the simple differential pairs $M_{1A}$-$M_{3A}$ and $M_{1B}$-$M_{3B}$, as explained in [38], and the symmetry and cancellation of currents due to the body-driven current mirror in the output stage.
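The common-mode cancellation described for the block scheme can be illustrated with a minimal behavioral sketch. It is not the circuit-level analysis of the paper: it assumes an idealized model in which path A sees $v_{cm} + v_d/2$, path B sees $v_{cm} - v_d/2$, and the transimpedance stage simply subtracts the two path currents. The function names and the numerical values are hypothetical.

```python
import math

def two_path_gains(g_ma, g_mb, t_z):
    """Differential and common-mode gain of an idealized two-path subtractor.

    Assumed model (not taken from the paper):
        i_a = g_ma * (v_cm + v_d / 2)
        i_b = g_mb * (v_cm - v_d / 2)
        v_out = t_z * (i_a - i_b)
    """
    a_d = t_z * (g_ma + g_mb) / 2.0   # the two differential contributions add
    a_cm = t_z * (g_ma - g_mb)        # common-mode terms cancel when matched
    return a_d, a_cm

def cmrr_db(g_ma, g_mb):
    """CMRR in dB of the idealized model; infinite for perfect matching."""
    a_d, a_cm = two_path_gains(g_ma, g_mb, 1.0)
    if a_cm == 0:
        return math.inf
    return 20.0 * math.log10(abs(a_d / a_cm))

# Hypothetical normalized transconductances with a 1% mismatch.
print(f"matched:     CMRR = {cmrr_db(1.0, 1.0)} dB")
print(f"1% mismatch: CMRR = {cmrr_db(1.0, 0.99):.1f} dB")
```

In this simplified picture the common-mode gain vanishes only for perfectly matched paths, which is why the matching of the two transconductors is stressed above. The real circuit also has a finite common-mode gain from finite output resistances, which this sketch ignores.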

Circuit Analysis
In this section, we present a small-signal and large-signal analysis of the proposed amplifier, including the frequency response, the CMRR, and the slew-rate performance.

Differential Gain Frequency Response
Referring to the block scheme in Figure 1, the differential gain of the proposed OTA is given by Equation (1), where $G_{MA(B)}$ is the transconductance gain of the first stage and $T_Z$ is the transimpedance gain of the second stage. To calculate the gains, we refer to the detailed schematic in Figure 2 and denote with subscript $p1$ the small-signal parameters of [...], with subscript $n2$ the parameters of $M_5$ and $M_6$, and with subscript $p2$ those of $M_7$ and $M_8$. Note that we have exploited both the symmetry of the matched devices and the identity of their bias point due to the appropriate sizing of the devices. Therefore, the identities in Equations (2)-(4) hold, including:

$$g_{mb4A(B)} = g_{mbn1}; \qquad g_{mb7(8)} = g_{mbp2}; \qquad g_{m5(6)} = g_{mn2}$$

Under the hypotheses [...], $G_{MA(B)}$ can be calculated as in Equation (8). Since the ratio between the pole and the zero in Equation (8) is 2 and both lie at a high frequency, the effect of the pole-zero doublet can usually be neglected.

Now, focusing on the second stage, the transimpedance gain $T_Z$ is given by Equation (9), with the poles $p_1$ and $p_2$ defined in Equations (10) and (11). In Equation (9), we neglect the effect of the pole-zero doublet of the current mirror $M_7$-$M_8$, analogously to the one in Equation (8), and that of the mirror $M_{4A}$-$M_5$ ($M_{4B}$-$M_6$), since they have a negligible impact on the frequency response. The additional pole-zero doublets due to transistors $M_{4A(B)}$ and $M_{5(6)}$ were also neglected, which introduces a limited error in the frequency response derivation.

The DC differential gain is given by Equation (12). Considering Equations (8) and (9), it is clear that the dominant pole is set by the output conductance $g_{ds8} + g_{ds6}$, whereas the poles and zeros proportional to $g_{mb,i}$ or $g_{m,i}$ are positioned at a higher frequency. Therefore, the GBW follows as in Equation (13).

Common-Mode Gain
The common-mode gain was estimated under the hypothesis of there being no mismatch between the two input transconductors (i.e., G M A = G M B ).
According to these approximations and to Equation (3), the common-mode gain is given by Equation (14), where $\alpha$ depends on the $M_1$-$M_2$ current mirror:

$$\alpha = \frac{g_{dsn1} + g_{dsp1}}{g_{mn1}} \tag{15}$$

$\beta$ is defined as:

$$\beta = \frac{g_{dsn1} + g_{dsp1}}{g_{mbp1}} \tag{16}$$

and the third term $\gamma$, defined in Equation (17), depends on the $M_7$-$M_8$ mirror. The factors in Equations (15)-(17) are inversely proportional to $g_m/g_{ds}$ (the intrinsic gain of the device) or $g_{mb}/g_{ds}$ (the intrinsic gain when driven by the body terminal). Equation (14) thus shows that, for the ideal case of infinite output resistance (infinite intrinsic gain), the common-mode gain would be zero. This is, however, not the case for deep submicron devices, which present a limited intrinsic gain; thus, the gain in Equation (15) is not negligible. Equations (13)-(15) allow the calculation of the CMRR, which is given in Equation (18).

Noise Analysis
The noise analysis of the amplifier can be performed by considering a current noise generator between the drain and source terminals of each device. The generator includes both thermal channel noise and flicker noise; thus, its power spectral density can be expressed with Equation (19), whose terms are defined in Equations (20) and (21). The equivalent input noise voltage can be derived by calculating the output voltage due to the noise sources and dividing it by the differential gain in Equation (12).
The resulting power spectral density is given by Equation (22), where $i^2_{n(p)i}$ ($i = 1, 2$) denotes the noise spectrum of the $i$-th NMOS (PMOS) stage. Equation (22) shows that the main noise contributions are due to the N(P)MOS input stages, as can be expected. However, the body-driven stage rejects the noise contributions less efficiently than the gate-driven ones. Thus, a larger device size must be chosen to achieve the same noise performance.

Large-Signal Analysis
The amplifier was optimized for large-signal performance by exploiting class AB behavior. Indeed, the first-stage output current is not limited by the $M_{3(A,B)}$ bias current $I_{bias}$, but by the voltage swing on the input body terminals. The output terminal of the first stage can thus practically reach the supply voltage $V_{DD}$. Therefore, the current in the output stage is also not limited by the bias current, but by the maximum $V_{gs}$ of $M_5$ and $M_6$.
To be more precise, the negative slew rate is limited by the gate voltage of $M_6$ reaching the supply voltage. On the other hand, the maximum current of $M_8$ is limited by the available voltage swing of the body terminal of $M_7$ and $M_8$, thus resulting in a slightly asymmetrical slew rate. The positive and negative slew rates are given by Equations (23) and (24), respectively, where $V_{thp0}$ represents the absolute value of the PMOS threshold voltage when $V_{sb}$ is equal to 0, and $V_{thn}$ in Equation (24) represents the NMOS threshold voltage at the selected $V_{bsn}$.
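The claim that the output current is not limited by the bias current can be checked with a short back-of-the-envelope sketch. It uses the load capacitance, the second-stage bias current, and the simulated slew rates reported in the Sizing and Circuit Simulations sections, together with the standard relation $I = C_L \, dV/dt$. The helper names are hypothetical, and the calculation assumes that the whole slewing current flows into the load capacitance.

```python
# Values quoted in the Sizing and Circuit Simulations sections.
C_LOAD = 40e-12          # load capacitance, F
V_SWING = 0.3            # rail-to-rail output step, V
SR_POS = 10.83e3         # positive slew rate, V/s (10.83 V/ms)
SR_NEG = 32.37e3         # negative slew rate, V/s (32.37 V/ms)
I_BIAS_STAGE2 = 95e-9    # second-stage bias current, A (about 95 nA)

def slew_time(delta_v, slew_rate):
    """Time needed to traverse delta_v at a constant slew rate."""
    return delta_v / slew_rate

def slew_current(c_load, slew_rate):
    """Current that must flow into c_load to sustain the slew rate."""
    return c_load * slew_rate

for name, sr in (("SR+", SR_POS), ("SR-", SR_NEG)):
    i_slew = slew_current(C_LOAD, sr)
    t_slew = slew_time(V_SWING, sr)
    print(f"{name}: {i_slew * 1e9:.0f} nA into the load, "
          f"{i_slew / I_BIAS_STAGE2:.1f}x the stage bias, "
          f"{t_slew * 1e6:.1f} us for a {V_SWING} V step")
```

Both slewing currents come out several times larger than the quiescent bias of the output stage, which is consistent with the class AB behavior described above.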

Amplifier Design and Simulation Results
The proposed amplifier was designed with the 130 nm STMicroelectronics CMOS technology and simulated within the Cadence Virtuoso environment.

Sizing
To achieve the minimum power consumption with a supply voltage $V_{dd} - V_{ss}$ of only 0.3 V and $|V_{th}| \approx 0.35$ V, all devices were biased in the sub-threshold region, with $|V_{ds}|$ and $|V_{gs}|$ equal to 150 mV. A load capacitance $C_L$ of 40 pF was assumed in reference to a typical IoT sensor interface application [16,40-43].
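The bias targets above follow a half-supply guideline, which the robustness discussion later states explicitly ($|V_{ds}|$ and $|V_{gs}|$ equal to half of $|V_{dd} - V_{ss}|$). A minimal sketch of that rule is given below; the function name and the returned dictionary are hypothetical, and the sub-threshold check is only a coarse comparison against the quoted $|V_{th}| \approx 0.35$ V.

```python
V_TH_ABS = 0.35  # approximate |V_th| quoted in the text, V

def bias_targets(v_supply, v_th=V_TH_ABS):
    """Half-supply bias targets for a given total supply voltage.

    Returns the target |V_gs| and |V_ds| and a coarse flag telling whether
    |V_gs| stays below |V_th| (i.e., the device is in sub-threshold).
    """
    half = v_supply / 2.0
    return {"vgs": half, "vds": half, "sub_threshold": half < v_th}

for v_supply in (0.3, 0.4):
    t = bias_targets(v_supply)
    print(f"supply {v_supply} V: |Vgs| = |Vds| = {t['vgs'] * 1e3:.0f} mV, "
          f"sub-threshold: {t['sub_threshold']}")
```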
Large transistor widths and lengths were used to reduce the contributions of thermal and flicker noise, which represent a crucial issue in body-driven amplifiers [21,39,44]. In fact, larger devices have lower narrow- and short-channel effects, lower flicker noise, and larger body transconductance, which are useful for ULV and ULP applications. The bias current of the two symmetrical stages $G_{MA}$ and $G_{MB}$ was set to 13.1 nA as a tradeoff between noise and power consumption. The bias current of the second stage was set to about 95 nA as a tradeoff between slew rate, GBW, and phase margin for the assigned load capacitance. The sizing of the MOS transistors and their small-signal parameters are reported in Table 1.

Circuit Simulations
The results of the open-loop AC simulations of the proposed OTA are reported in Figures 3 and 4. In particular, Figure 3 shows that the amplifier attains an overall DC gain and GBW of about 40 dB and 18.65 kHz, respectively, as expected from the model in Equations (12) and (13). Furthermore, as previously mentioned, the pole-zero doublets have a marginal effect on the amplifier frequency response. The positive zero $g_{mn2}/C_{gdn2}$ is at a much greater frequency than the poles $p_1$ and $p_2$, and is therefore negligible. The phase margin ($m_\phi$) of the amplifier is set by the position of pole $p_2$ with respect to the unity-gain frequency: the transistor sizing reported in Table 1 results in $m_\phi = 52°$. Figure 4 shows the CMRR of the amplifier, which is about 67 dB.

The amplifier was then simulated in a non-inverting buffer configuration, as shown in Figure 5a. The closed-loop frequency response of the OTA is depicted in Figure 5b, whereas the DC transfer characteristics are shown in Figure 6a, which highlights a rail-to-rail behavior. Figure 6b shows how the first-stage bias current varies when the amplifier is excited with a rail-to-rail signal.

The positive and negative slew rates of the proposed amplifier were simulated with a full-swing input square wave. Figure 7a shows the time-domain response of the non-inverting buffer in the slew-rate test. The positive and negative slew rates are $SR_+ = 10.83$ V/ms and $SR_- = 32.37$ V/ms, respectively, and are thus in good agreement with Equations (23) and (24). The simulation results validate the preliminary analysis. The total harmonic distortion (THD) was simulated for different values of the peak-to-peak amplitude of a sinusoidal input, and is plotted in Figure 7b, which shows a THD lower than 1% for peak-to-peak amplitudes lower than about 230 mV.
The plot of the power supply rejection ratio (PSRR) as a function of frequency is reported in Figure 8a, which shows a PSRR of about 45 dB at low frequencies.
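The small-signal figures reported in this section can be cross-checked with a simple two-pole sketch. The block below assumes that the dominant pole is set by the output conductance and the load capacitance, that the unity-gain frequency coincides with the GBW, and that all zeros and pole-zero doublets are negligible. These are simplifying assumptions rather than the paper's exact model, and the variable names are hypothetical.

```python
import math

# Values quoted in the text.
A0_DB = 40.0       # DC gain, dB (about 40 dB)
GBW_HZ = 18.65e3   # gain-bandwidth product, Hz
C_LOAD = 40e-12    # load capacitance, F
PM_DEG = 52.0      # phase margin, degrees

a0 = 10.0 ** (A0_DB / 20.0)

# Dominant pole of a single-pole response: GBW = A0 * f_p1.
f_p1 = GBW_HZ / a0

# Output conductance implied by f_p1 = g_out / (2 * pi * C_L).
g_out = 2.0 * math.pi * f_p1 * C_LOAD

# Effective transconductance implied by GBW = g_m_eff / (2 * pi * C_L).
g_m_eff = 2.0 * math.pi * GBW_HZ * C_LOAD

# Second pole implied by PM ~ 90 deg - atan(GBW / f_p2), valid when f_p1 << GBW.
f_p2 = GBW_HZ / math.tan(math.radians(90.0 - PM_DEG))

print(f"f_p1    ~ {f_p1:.1f} Hz")
print(f"g_out   ~ {g_out * 1e9:.1f} nS")
print(f"g_m_eff ~ {g_m_eff * 1e6:.2f} uS")
print(f"f_p2    ~ {f_p2 / 1e3:.1f} kHz")
```

Under these assumptions the second pole lies only slightly above the GBW, which is consistent with a phase margin close to 50° rather than 90°.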

Robustness to Mismatch and PVT Variations
The robustness of the amplifier to device mismatches was validated through Monte Carlo simulations, whose results are summarized in Table 2. The results are consistent with those of typical simulations. However, mismatches negatively affect the CMRR of the amplifier, as it strongly depends on the matching between the two symmetrical paths. The Monte Carlo simulations still show a CMRR of about 52 dB, which is a good result considering the low supply voltage and the lack of tail generators. The amplifier area, estimated with the aid of Cadence Layout XL, amounts to about 0.0036 mm$^2$. The amplifier performance was also validated under PVT variations. The amplifier shows good stability under ±10% supply voltage variations, as shown in Table 3.
The proposed topology can also work at higher supply voltages, provided that the circuit is properly resized according to the main guidelines reported at the beginning of this section (e.g., $|V_{ds}|$ and $|V_{gs}|$ equal to half of $|V_{dd} - V_{ss}|$). As a confirmation, we successfully resized the circuit to operate at 0.4 V, thus obtaining an improvement of about 12 dB in the DC gain. If a supply voltage in the range of 0.6 V is available, the topology can still work, but in this case, it would be preferable to add cascode MOS devices in order to keep the devices in the sub-threshold region and to achieve a much larger DC gain. Table 4 reports the performance of the OTA as a function of the operating temperature, showing that the amplifier is robust in a range of [−10, 80] °C, whereas at higher temperatures, the performance exhibits larger variations. However, if needed, the effects of temperature variations could be mitigated by exploiting PTAT (proportional to absolute temperature) current sources.

Results and Comparison with the Literature
In Table 5, the proposed amplifier is compared to other ULV-ULP topologies from the literature. To fairly compare the performance of different designs, we considered the well-known small-signal and large-signal FOMs defined in Equations (25)-(28). The small-signal FOMs (25) and (27) measure the efficiency of the OTA in providing high-frequency performance, since they measure the gain-bandwidth product, normalized to the load capacitance, for a given power consumption. In particular, both the total DC current and the power consumption are taken into account. This also allows the comparison of the effects of different supply voltages. Analogously, the FOMs (26) and (28) measure the efficiency for large-signal behavior, since they measure the slew rate, normalized to the load capacitance, for a given power or current consumption. This is particularly significant for class AB amplifiers, where the large-signal performance is often the most significant one. As in the literature, the average slew rate is considered; however, for practical applications, the parameter of interest is the worst-case slew rate; thus, the FOMs (29) and (30), based on the worst-case slew rate, were also considered. In the cases where the OTA presents an asymmetric slew rate, both kinds of large-signal FOMs could be of interest.
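The structure of these FOMs can be sketched as follows. The block assumes the commonly used forms GBW·$C_L$ and SR·$C_L$ divided by either the total current or the power, evaluated in plain SI units; the exact definitions and unit scaling of Equations (25)-(30) may differ. The total supply current `I_TOTAL_HYPOTHETICAL` is a placeholder chosen only for illustration and is not a value reported in the paper.

```python
def fom_small_signal(gbw_hz, c_load_f, consumption):
    """GBW * C_L divided by the total current (A) or the power (W)."""
    return gbw_hz * c_load_f / consumption

def fom_large_signal(slew_rate, c_load_f, consumption):
    """SR * C_L divided by the total current (A) or the power (W)."""
    return slew_rate * c_load_f / consumption

def average_slew_rate(sr_pos, sr_neg):
    return (sr_pos + sr_neg) / 2.0

def worst_case_slew_rate(sr_pos, sr_neg):
    return min(sr_pos, sr_neg)

# Values quoted in the text.
GBW_HZ = 18.65e3    # Hz
C_LOAD = 40e-12     # F
SR_POS = 10.83e3    # V/s
SR_NEG = 32.37e3    # V/s
V_SUPPLY = 0.3      # V

# HYPOTHETICAL placeholder: the total supply current is not stated in the text.
I_TOTAL_HYPOTHETICAL = 100e-9                  # A
P_HYPOTHETICAL = V_SUPPLY * I_TOTAL_HYPOTHETICAL  # W

sr_avg = average_slew_rate(SR_POS, SR_NEG)
sr_wc = worst_case_slew_rate(SR_POS, SR_NEG)

print("current-based small-signal FOM:",
      fom_small_signal(GBW_HZ, C_LOAD, I_TOTAL_HYPOTHETICAL))
print("power-based small-signal FOM:  ",
      fom_small_signal(GBW_HZ, C_LOAD, P_HYPOTHETICAL))
print("average-SR large-signal FOM:   ",
      fom_large_signal(sr_avg, C_LOAD, I_TOTAL_HYPOTHETICAL))
print("worst-case large-signal FOM:   ",
      fom_large_signal(sr_wc, C_LOAD, I_TOTAL_HYPOTHETICAL))
```

With the slew rates quoted in the text, the worst-case slew rate is about half of the average one, so for an asymmetric design the two kinds of large-signal FOM differ by roughly a factor of two regardless of the consumption used in the denominator.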
The comparison in Table 5 shows that the proposed amplifier offers a good trade-off between small-signal and large-signal FOMs with respect to the state of the art. The OTAs in [28,36,44] present higher values for the small-signal FOMs; however, their large-signal performance is not optimized. The authors of [44] also used a higher supply voltage; nevertheless, their $FOM_S$ in Equation (27) was lower than that of the OTA in this work. The topology in [28] is made up of an Arbel-like body-driven pseudo-differential input stage followed by a dual-path differential-to-single-ended converter with gain, and the resulting OTA is compensated by a Miller capacitance. The OTA presented in this paper has a common-mode cancelling body-driven input stage with a body-diode load, which produces a high-frequency pole, followed by a differential-to-single-ended converter, without the need for Miller compensation, as the OTA is output-compensated. The proposed design exhibits a much better slew rate and, consequently, better large-signal FOM performance than the design in [28]. The amplifier in [35] presents a higher value for the large-signal FOMs due to its class-AB approach, even if its worst-case SR is much lower than the average value. On the other hand, the amplifier in [38] presents good values for this latter FOM. Both of these amplifiers are, however, optimized for large-signal behavior, and provide lower values than the proposed one for the small-signal FOMs.

Conclusions
In this paper, we proposed an ultra-low-voltage amplifier that operates at a supply voltage of 0.3 V and attains good Figures of Merit for both small-signal and large-signal behaviors. The amplifier exploits a pseudo-differential body-driven fully differential input stage and a gate-driven second stage, which performs differential-to-single-ended conversion. This provides a rail-to-rail input common-mode signal swing, large DC gain and CMRR, and good frequency performance. In particular, the amplifier does not present high-impedance internal nodes thanks to the body-diode active loads; thus, it does not require Miller compensation. The gate terminals are used for biasing, thus achieving robustness against PVT variations.
The comparison against the state of the art of ULV OTAs has shown how the proposed topology optimizes the trade-off between small-signal and large-signal performance, and achieves very good values for all of the considered FOMs.
On the other hand, the proposed architecture presents limitations in terms of noise performance, exhibits an input impedance that is not purely capacitive, and shows a CMRR that is strongly dependent on mismatch variations. Another limitation is due to the body-driven current mirror load of the second stage, which causes an asymmetric slew rate. Nevertheless, these drawbacks are common to many ULV and ULP OTAs, in which a trade-off between many requirements has to be pursued.
It also has to be pointed out that, even if we presented results referring to a 130 nm CMOS process, the proposed topology can be implemented in even more advanced CMOS technology nodes, provided that the design guidelines outlined in the manuscript are properly followed. In particular, if a fully depleted silicon-on-insulator (FDSOI) CMOS process is available, the implementation of the proposed OTA (as well as of all body-driven OTAs) results in an input impedance similar to those of gate-driven circuits, and this is an interesting advantage over implementations based on conventional bulk CMOS processes, especially if switched applications are targeted.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: