A gm/ID-Based Design Strategy for IoT and Ultra-Low-Power OTAs with Fast-Settling and Large Capacitive Loads

In this paper, a new strategy for the design of ultra-low-power CMOS operational transconductance amplifiers (OTAs), using the gm/ID approach, is proposed for the Internet-of-things (IoT) scenario. The strategy optimizes the speed/dissipation of the OTA in terms of settling time, including slew-rate effects. It was designed for large capacitive loads and for transistors biased in the sub-threshold region, but it is also suitable for low-capacitive loads or for transistors biased in the saturation region. To validate the proposed strategy, a well-known three-stage OTA was designed starting from capacitive load and settling time requirements. Simulations confirmed that the OTA satisfies the specifications (even under Monte Carlo analysis), thus proving the correctness of the proposed approach.


Introduction
At present, the Internet of Things (IoT) paradigm plays a crucial role in the electronics market. It is estimated that more than 30 billion devices operate in a wide range of applications spanning health care, agriculture, industrial manufacturing and automotive, to mention only the most significant [1]. As is well known, an IoT node is required to have sensing, processing and wireless capabilities [2,3]. Sensing means that the IoT node is interfaced with different kinds of sensors whose analog signals (e.g., temperature, pressure, humidity, CO₂, light, acceleration) must be read correctly by suitable sensing circuitry. Processing means that the IoT node converts the sensor signals into digital words that are subsequently processed as required by the given application. Wireless means that the IoT node can send/receive data to/from other nodes or a central processing unit. In addition to these features, it is not uncommon for applications to require energetically autonomous IoT nodes, in which a section of the node is dedicated to harvesting energy from the environment [4][5][6] and to managing the harvested power [7][8][9][10]. Whatever the overall application or the specific sub-circuit of the IoT node, energy efficiency is one of the paramount factors for survival in the semiconductor market; therefore, careful energy management is a critical task to be addressed during the design phase.
As far as the analog section is concerned, power consumption is mainly optimized by reducing the operating voltage (1 V or less). This makes the analog section compatible with the same sub-100 nm CMOS process adopted for the digital circuitry, which generally occupies most of the silicon area. In the ultra-low-power context, the bias current must also be reduced, at the expense of a higher noise level and a slower operating speed. In this scenario, design is not a straightforward task, because technology scaling, low supply voltage and low bias current introduce many degrading effects.
Among the different analog circuits that coexist in an IoT node, the operational transconductance amplifier (OTA) is a very common and versatile building block [19][20][21][22][23]. Owing to its characteristics, it suits the sensing section, where high precision and moderate bandwidth are required. Because of the reduced power-supply voltage, multistage topologies are often adopted in these amplifiers, and the optimization of power consumption must deal with noise, dynamic-range and speed trade-offs [24][25][26][27][28][29][30][31][32][33][34][35][36]. Ultra-low-power amplifiers cut down power consumption by biasing all the transistors in the sub-threshold region; however, speed requirements then become harder to meet, partly because slew-rate effects impose serious limitations.
In this paper, we propose a new strategy for the design of ultra-low-power CMOS OTAs using the $g_m/I_D$ approach. Because it is widely known and used, we focus on the three-stage OTA based on reverse nested Miller compensation with a feed-forward stage (RNMC-FF) and on the case of large capacitive loads. Unlike previous works reported in the literature [37][38][39][40][41][42][43][44][45][46][47][48][49], the proposed design strategy (a) optimizes the speed/dissipation trade-off of the amplifier, taking into account the settling time and slew-rate effects; (b) although developed for large capacitive loads, also holds for low capacitive loads; and (c) although developed for transistors biased in the sub-threshold region, also holds for transistors biased in the saturation region.
The paper is structured as follows. In Section 2, the OTA topology is presented at the block-schematic level and as a transistor-level implementation. Section 3 discusses the design strategy, focusing on the $g_m/I_D$ approach, the sub-threshold region, the settling time (in terms of small- and large-signal behavior), the noise specification, and the optimization of the overall power dissipation. In Section 4, an RNMC-FF three-stage OTA is designed and validated through simulations. Finally, conclusions are drawn in Section 5.

The Three-Stage OTA
The design strategy introduced in the next section was applied to the three-stage OTA whose simplified block schematic is shown in Figure 1. The OTA consists of three main transconductors, $G_{mi}$, whose output terminals are connected to the respective $i$-th node of the amplifier. In general, the first transconductor, $G_{m1}$, is a differential-input stage, which better rejects external disturbances. The second and third transconductors, $G_{m2}$ and $G_{m3}$, are implemented with an inverting stage and a non-inverting stage, respectively. Stability is achieved through reverse nested Miller compensation (RNMC), made up of a main compensation capacitor, $C_{C1}$, connected between nodes 3 and 1, and a secondary compensation capacitor, $C_{C2}$, connected between nodes 2 and 1. Under equal design specifications, RNMC topologies are intrinsically faster than their nested Miller compensated counterparts, since the inner capacitor of the RNMC network is not connected to the output node [50,51].
Some RNMC topologies also adopt a nulling resistor to improve the bandwidth/consumption performance of the compensation [51]. However, its resistance must be made proportional to the reciprocal of the amplifier transconductances which, in an ultra-low-power design, can be on the order of a few microsiemens or less. The resulting resistance would therefore be on the order of hundreds of kiloohms, which in turn would worsen both area and noise performance. For this reason, we did not use any nulling resistor and kept the compensation network as simple as possible.
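As a rough order-of-magnitude check of this argument, the sketch below assumes that the nulling resistance is simply the reciprocal of the relevant transconductance, $R \approx k/g_m$ with $k = 1$. The constant $k$ and the function name are assumptions for illustration; the actual factor depends on the compensation scheme adopted [51].

```python
def nulling_resistance(gm: float, k: float = 1.0) -> float:
    """Order-of-magnitude nulling resistance R = k / gm, in ohms.

    k = 1 is an assumed proportionality constant; the real factor depends
    on the compensation scheme adopted.
    """
    if gm <= 0.0:
        raise ValueError("gm must be positive")
    return k / gm


# Transconductances typical of an ultra-low-power design (a few uS or less)
for gm in (1e-6, 3e-6, 10e-6):
    print(f"gm = {gm * 1e6:4.1f} uS -> R ~ {nulling_resistance(gm) / 1e3:6.0f} kOhm")
```

At 10 µS the estimate is already 100 kΩ, and it reaches the megaohm range at 1 µS.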
An additional transconductor, $G_{mf}$, forms a feed-forward (FF) path that simplifies the transfer function and helps the final stage drive large capacitive loads. In general, this additional stage requires no extra current, since it is obtained simply by connecting the bias transistor of the third stage to the output node of the first transconductor. Equivalent output resistances, $R_{o1}$, $R_{o2}$ and $R_{o3}$ (not shown in the figure), are assumed to be placed between the output node of each $i$-th transconductor and ground. Similarly, parasitic capacitors, $C_{o1}$ and $C_{o2}$, are assumed to be in parallel with $R_{o1}$ and $R_{o2}$, respectively. In our models, these output resistances and parasitic capacitors can be neglected, provided that $G_{mi}R_{oi} \gg 1$ and $C_{C1}, C_{C2} \gg C_{o1}, C_{o2}$.
A transistor-level implementation of the block schematic of Figure 1 is shown in Figure 2. The circuit is designed in a 65 nm CMOS process and optimized for operation at $V_{DD} = 1$ V, taking advantage of three types of complementary transistors that differ in threshold voltage. Low-threshold-voltage (LVT) transistors are used for the two input devices of the differential pair; in this manner, we optimize the common-mode input range, whose bounds depend on the threshold voltage of the input devices. High-threshold-voltage (HVT) transistors are used for the current mirror M9-M10. This makes the gate-source voltage of M9 similar to the source-drain bias voltage of M10 and reduces the systematic error of the current mirror due to channel-length modulation. HVT transistors are also used for M12 and M14 of the slew-rate (SR) enhancer, whose role is discussed later. Regular-threshold-voltage (RVT) transistors are used for the remaining active devices of the OTA.
In terms of the transistor transconductances, the stage transconductances of the block schematic in Figure 1 are $G_{m1} = g_{m1,2}$, $G_{m2} = g_{m6}$, $G_{m3} = g_{m8}\,(g_{m10}/g_{m9}) = r\,g_{m8}$ and $G_{mf} = g_{m11}$. Bias currents $2I_1$ and $I_2$ are provided through the current mirrors M5-MBP and M7-MBP, respectively. As mentioned before, the feed-forward transconductor, $G_{mf}$, is implemented by simply connecting the gate of M11 to node $v_1$. This also sets the bias current of the third stage to $I_3 = [(W/L)_{11}/(W/L)_6]\,I_2$. The current through M8 is set to $I_3/r$ by the 1-to-$r$ current mirror M9-M10.
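The mapping between device and stage transconductances, together with the currents set by the mirrors, can be summarized in a few lines. This is a hypothetical helper (names such as `stage_gm` and `third_stage_bias` are illustrative), assuming that M6 and M11 are of the same device type and source potential, so that sharing the gate node $v_1$ makes their currents scale with aspect ratio, and that $r = g_{m10}/g_{m9}$ equals the current ratio when M9 and M10 share the same $g_m/I_D$.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class StageGm:
    gm1: float  # input differential pair, g_m1,2 (S)
    gm2: float  # second stage, g_m6 (S)
    gm3: float  # third stage, r * g_m8 (S)
    gmf: float  # feed-forward stage, g_m11 (S)


def stage_gm(gm12: float, gm6: float, gm8: float,
             gm9: float, gm10: float, gm11: float) -> StageGm:
    """Map device transconductances (S) of Figure 2 onto the stages of Figure 1."""
    r = gm10 / gm9  # mirror ratio (equals the current ratio if M9, M10 share gm/ID)
    return StageGm(gm1=gm12, gm2=gm6, gm3=r * gm8, gmf=gm11)


def third_stage_bias(i2: float, wl11: float, wl6: float, r: float) -> Tuple[float, float]:
    """Return (I3, I8).

    Assumption: M11 and M6 share gate node v1 (same type and source potential),
    so I3 scales with (W/L)11 / (W/L)6; M8 then carries I3 / r.
    """
    i3 = wl11 / wl6 * i2
    return i3, i3 / r
```

For instance, with the mirror ratio $r = 4$ used in Section 4, `third_stage_bias` returns a current in the M8-M9 branch equal to one quarter of $I_3$.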

Design Strategy of the Three-Stage OTA for Sub-Threshold Region
In analog IoT applications, whether the system is powered by an energy harvesting/scavenging mechanism or by a battery, power and energy constraints are very tight. Especially when the power of the IoT system is harvested from the environment, the designed circuits cannot draw more current than the power source can instantaneously deliver. Consequently, in the design of low-power IoT interface circuits, special attention must be paid to optimizing power consumption [1]. Typical available power levels in IoT are on the order of a few microwatts, which implies that the transistors operate in weak inversion (i.e., in the sub-threshold region).
The proposed design strategy makes use of the $g_m/I_D$ parameter, which directly relates the transconductance of a single transistor (or of an amplification stage) to its bias current [52,53]. The $g_m/I_D$ parameter describes MOSFET behavior in short-channel devices (even in the moderate- and weak-inversion regions) and overcomes the limitations of the simple square-law MOSFET model.
Thanks to these properties, the $g_m/I_D$ parameter has recently been exploited to develop new and interesting design strategies, many of them based on process datasets generated from simulation sweeps (i.e., look-up tables) [54][55][56][57][58]. These strategies make it possible to explore a complex design space with a large number of degrees of freedom in a reasonable simulation time, so that the designer can find the optimum point for the circuit specifications. On the one hand, this approach has the evident advantage of being easy to implement in an automated design procedure; on the other hand, the designer risks losing sight of the intimate operation of the circuit, since no design equations are provided.
In contrast to the aforementioned design strategies, we used the $g_m/I_D$ parameter to overcome the limitations of the square-law MOSFET model, and the subsequent optimization was carried out by manipulating the design equations derived in the following subsections. Moreover, the $g_m/I_D$ parameter allows the same design strategy to be applied regardless of the bias region of the transistors, that is, whether they are in the saturation or in the sub-threshold region. Figure 3 plots $\Gamma = g_m/I_D$ versus the gate-source overdrive, $V_{GS} - V_{TH}$, for the two regular-threshold complementary devices of our 65 nm CMOS process. The remaining transistors of this process exhibit similar plots, and in other nanometer CMOS technologies the $g_m/I_D$ ratio behaves similarly, with no practical differences with respect to the curves in Figure 3. In the figure, the sub-threshold and saturation regions are labeled simply according to the sign of the gate-source overdrive. In the following, we shall refer to these two regions even though this abrupt transition does not actually exist: as the gate-source overdrive is swept from negative to positive values, the channel moves smoothly from weak inversion towards moderate and strong inversion.
In base-band analog applications, a transistor is generally biased in the saturation region but with a small overdrive (i.e., $V_{GS} - V_{TH} \le 100$ mV). This corresponds to the moderate-inversion region, where $\Gamma$ is in the range of 8-16 V$^{-1}$. When speed is of maximum concern, the saturation region becomes mandatory and the overdrive is increased well above 100 mV, so that $\Gamma$ may fall below 5 V$^{-1}$ (strong-inversion region). Conversely, in the ultra-low-power context, the transistor is biased in the sub-threshold region, where $\Gamma$ can be as high as 30 V$^{-1}$ (weak-inversion region). The choice between a low and a high $g_m/I_D$ ratio depends mainly on the application. The sub-threshold region offers good power efficiency but only for limited speed requirements; on the contrary, the saturation region meets higher speed requirements at the expense of poorer power efficiency. Whatever the region, the key point is that the choice of the $g_m/I_D$ ratio is the starting point in the design of analog circuits [59].
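The upper value of $\Gamma$ in weak inversion can be estimated from first principles: there the drain current is exponential in $V_{GS}$, so $g_m/I_D$ approaches $1/(n U_T)$, where $U_T = kT/q$ is the thermal voltage and $n$ the sub-threshold slope factor. The sketch below evaluates this ceiling; the slope factor $n \approx 1.3$ is an assumed, process-dependent value, not one extracted from the 65 nm process used here.

```python
K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C


def gm_over_id_weak_inversion(n: float = 1.3, temp_k: float = 300.0) -> float:
    """Weak-inversion limit of gm/ID in 1/V: 1 / (n * kT/q).

    n is the sub-threshold slope factor (assumed value, process dependent).
    """
    ut = K_B * temp_k / Q_E   # thermal voltage, about 25.9 mV at 300 K
    return 1.0 / (n * ut)


for n in (1.2, 1.3, 1.5):
    print(f"n = {n}: gm/ID max ~ {gm_over_id_weak_inversion(n):.1f} 1/V")
```

With $n \approx 1.3$ at 300 K, the ceiling is about 30 V$^{-1}$, in line with the upper value quoted above for the weak-inversion region.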

Small-Signal Analysis and Stability Requirements
Referring to the block schematic in Figure 1, the open-loop gain multiplied by the feedback factor can be written as
$$\beta A(s) = \beta a_0\,\frac{1 + b_1 s + b_2 s^2}{\left(1 + s/\omega_d\right)\left(1 + a_1 s + a_2 s^2\right)} \tag{1}$$
where $\beta$ is the feedback factor (not shown in the figure),
$$a_0 = G_{m1}G_{m2}G_{m3}R_{o1}R_{o2}R_{o3} \tag{2}$$
$$\omega_d = \frac{1}{R_{o1}R_{o2}R_{o3}\,G_{m2}G_{m3}\,C_{C1}} \tag{3}$$
and $\mathrm{GBW} = \beta a_0\omega_d = \beta G_{m1}/C_{C1}$ is the gain-bandwidth product of the OTA. Finally, assuming the usual and convenient constraint $G_{mf} = G_{m3}$ [51,60], the remaining coefficients are
$$a_1 = \frac{C_{C2}\,(C_{C1} + C_L)}{G_{m3}\,C_{C1}} \tag{4}$$
$$a_2 = \frac{C_{C2}\,C_L}{G_{m2}G_{m3}} \tag{5}$$
$$b_1 = 0 \tag{6}$$
$$b_2 = -\frac{C_{C1}C_{C2}}{G_{m2}G_{m3}} \tag{7}$$
The stability of the amplifier can be achieved using the global separation factors introduced in [60] and used in [61,62]. Without going into theoretical detail, our three-stage OTA can be regarded as a feedback circuit with two nested loops, each of which must be frequency compensated. Referring to Figure 1, the internal loop consists of the amplification stages in the direct path between node 1 and OUT, with capacitor $C_{C1}$ acting as the feedback element. The external loop consists of the amplification stages in the path between IN and OUT, with the overall feedback factor, $\beta$ (not shown in the figure), acting as the feedback element. The internal separation factor, $\hat{K}_i$, governs the stability of the internal loop, while the external separation factor, $\hat{K}_e$, governs the stability of the external one.
As long as the zeros of the open-loop gain are placed above the GBW of the OTA, the two global separation factors are defined by [60]
$$\hat{K}_e = \frac{\left(1 + b_1\,\mathrm{GBW}\right)^2}{a_1\,\mathrm{GBW} + b_2\,\mathrm{GBW}^2} \tag{8}$$
$$\hat{K}_i = \frac{\left(a_1\,\mathrm{GBW} + b_2\,\mathrm{GBW}^2\right)^2}{\left(1 + b_1\,\mathrm{GBW}\right)a_2\,\mathrm{GBW}^2} \tag{9}$$
For the overall circuit to be stable, both global separation factors must be larger than unity (i.e., $\hat{K}_e > 1$ and $\hat{K}_i > 1$). Among the various possibilities, setting $\hat{K}_e = \hat{K}_i = 2$ turns the denominator of the closed-loop transfer function into a third-order Butterworth polynomial with cut-off frequency $\omega_0 = 2\,\mathrm{GBW}/(1 + b_1\mathrm{GBW})$. Similarly, as demonstrated in [62], setting $\hat{K}_e = 8/3$ and $\hat{K}_i = 9/4$ optimizes the step response of the OTA, since it minimizes the settling time for a given GBW.
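The separation factors can be checked numerically. The sketch below assumes that, for a large DC loop gain, the closed-loop denominator reduces to $1 + c_1 s + c_2 s^2 + c_3 s^3$ with $c_1 = 1/\mathrm{GBW} + b_1$, $c_2 = a_1/\mathrm{GBW} + b_2$ and $c_3 = a_2/\mathrm{GBW}$, and that $\hat{K}_e = c_1^2/c_2$ and $\hat{K}_i = c_2^2/(c_1 c_3)$. This ratio-of-coefficients form is an assumption of the sketch, chosen because it reproduces the Butterworth case $\hat{K}_e = \hat{K}_i = 2$ with $\omega_0 = 2\,\mathrm{GBW}/(1 + b_1\mathrm{GBW})$ stated above; the function name is hypothetical.

```python
def separation_factors(a1: float, a2: float, b1: float, b2: float, gbw: float):
    """Return (Ke, Ki) from the loop-gain coefficients.

    gbw in rad/s, a1 and b1 in s, a2 and b2 in s^2. Uses the large-loop-gain
    closed-loop denominator 1 + c1*s + c2*s^2 + c3*s^3.
    """
    c1 = 1.0 / gbw + b1   # coefficient of s
    c2 = a1 / gbw + b2    # coefficient of s^2
    c3 = a2 / gbw         # coefficient of s^3
    ke = c1 ** 2 / c2
    ki = c2 ** 2 / (c1 * c3)
    return ke, ki


# Third-order Butterworth with w0 = 2*GBW, GBW = 1 rad/s and b1 = b2 = 0:
# 1 + s + s^2/2 + s^3/8, hence a1 = 1/2 s and a2 = 1/8 s^2
print(separation_factors(a1=0.5, a2=0.125, b1=0.0, b2=0.0, gbw=1.0))  # (2.0, 2.0)
```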
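In our specific case, $b_1 = 0$, and (8) and (9) simplify into
$$\hat{K}_e = \frac{1}{a_1\,\mathrm{GBW} + b_2\,\mathrm{GBW}^2} \tag{10}$$
$$\hat{K}_i = \frac{\left(a_1\,\mathrm{GBW} + b_2\,\mathrm{GBW}^2\right)^2}{a_2\,\mathrm{GBW}^2} \tag{11}$$
Multiplying the square of (10) by (11), we obtain the simpler and equivalent set of equations
$$\hat{K}_e\left(a_1\,\mathrm{GBW} + b_2\,\mathrm{GBW}^2\right) = 1 \tag{12}$$
$$\hat{K}_e^2\,\hat{K}_i\,a_2\,\mathrm{GBW}^2 = 1 \tag{13}$$
Substituting (4)-(7) and $\mathrm{GBW} = \beta G_{m1}/C_{C1}$ into (12) and (13), the two equations that govern stability become
$$\hat{K}_e\left[\frac{C_{C1}/C_{C2} + C_L/C_{C2}}{\left(C_{C1}/C_{C2}\right)^2}\,\frac{\beta G_{m1}}{G_{m3}} - \frac{C_{C2}}{C_{C1}}\,\frac{\beta G_{m1}}{G_{m2}}\,\frac{\beta G_{m1}}{G_{m3}}\right] = 1 \tag{14}$$
$$\hat{K}_e^2\,\hat{K}_i\,\frac{C_L/C_{C2}}{\left(C_{C1}/C_{C2}\right)^2}\,\frac{\beta G_{m1}}{G_{m2}}\,\frac{\beta G_{m1}}{G_{m3}} = 1 \tag{15}$$
where we placed emphasis on four normalized parameters, i.e., $C_{C1}/C_{C2}$, $C_L/C_{C2}$, $G_{m2}/(\beta G_{m1})$ and $G_{m3}/(\beta G_{m1})$. Closed-form solutions are obtained by solving (14) and (15) for $G_{m2}/(\beta G_{m1})$ and $G_{m3}/(\beta G_{m1})$ as functions of the normalized capacitances $C_{C1}/C_{C2}$ and $C_L/C_{C2}$. Therefore, the small-signal analysis and the stability requirements lead to the two design equations
$$\frac{G_{m2}}{\beta G_{m1}} = \frac{C_{C1}/C_{C2} + \hat{K}_e\hat{K}_i\,C_L/C_{C2}}{C_{C1}/C_{C2} + C_L/C_{C2}} \tag{16}$$
$$\frac{G_{m3}}{\beta G_{m1}} = \hat{K}_e^2\hat{K}_i\,\frac{C_L/C_{C2}}{\left(C_{C1}/C_{C2}\right)^2}\,\frac{C_{C1}/C_{C2} + C_L/C_{C2}}{C_{C1}/C_{C2} + \hat{K}_e\hat{K}_i\,C_L/C_{C2}} \tag{17}$$
As a final remark, recall that the zeros of the open-loop gain must be placed above the GBW of the OTA. In our case, the zeros lie at $\pm 1/\sqrt{|b_2|}$, so the constraint $\mathrm{GBW} < 1/\sqrt{|b_2|}$ must be satisfied or, using (7),
$$\frac{C_{C2}}{C_{C1}}\,\frac{\left(\beta G_{m1}\right)^2}{G_{m2}G_{m3}} < 1 \tag{18}$$
Substituting (16) and (17) into (18), we obtain the equivalent condition
$$C_{C1} < \hat{K}_e^2\hat{K}_i\,C_L \tag{19}$$
which is easily guaranteed.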
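The closed-form design ratios can be verified by rebuilding the loop-gain coefficients from a candidate design and recovering $\hat{K}_e$ and $\hat{K}_i$. The sketch below assumes the coefficients $a_1 = C_{C2}(C_{C1} + C_L)/(G_{m3}C_{C1})$, $a_2 = C_{C2}C_L/(G_{m2}G_{m3})$, $b_1 = 0$ and $b_2 = -C_{C1}C_{C2}/(G_{m2}G_{m3})$, which follow from the block schematic of Figure 1 with $G_{mf} = G_{m3}$ when output resistances and parasitic capacitances are neglected; the helper names are hypothetical.

```python
def gm_ratios(x: float, y: float, ke: float, ki: float):
    """G_m2/(beta*G_m1) and G_m3/(beta*G_m1) for x = CC1/CC2 and y = CL/CC2."""
    g2 = (x + ke * ki * y) / (x + y)
    g3 = ke ** 2 * ki * y * (x + y) / (x ** 2 * (x + ke * ki * y))
    return g2, g3


def check_design(cc1, cc2, cl, gm1, beta, ke, ki):
    """Size Gm2 and Gm3, rebuild a1, a2, b2 (b1 = 0) and recover (Ke, Ki)."""
    g2, g3 = gm_ratios(cc1 / cc2, cl / cc2, ke, ki)
    gm2, gm3 = g2 * beta * gm1, g3 * beta * gm1
    gbw = beta * gm1 / cc1                         # rad/s
    a1 = cc2 * (cc1 + cl) / (gm3 * cc1)
    a2 = cc2 * cl / (gm2 * gm3)
    b2 = -cc1 * cc2 / (gm2 * gm3)
    p = a1 * gbw + b2 * gbw ** 2
    ke_out = 1.0 / p                               # Ke with b1 = 0
    ki_out = p ** 2 / (a2 * gbw ** 2)              # Ki with b1 = 0
    zeros_ok = gbw ** 2 * abs(b2) < 1.0            # GBW below 1/sqrt(|b2|)
    return ke_out, ki_out, zeros_ok


print(check_design(cc1=2.5e-12, cc2=150e-15, cl=100e-12, gm1=4e-6,
                   beta=1.0, ke=8 / 3, ki=9 / 4))
```

With these inputs the function returns $\hat{K}_e \approx 8/3$, $\hat{K}_i \approx 9/4$ and `True`, i.e., the closed-form ratios reproduce the requested separation factors and the zeros stay above the GBW.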

Settling Time, Slew Rate and Gain-Bandwidth Product
The speed of an amplifier can be defined either in terms of its settling time, $t_s$, or in terms of its gain-bandwidth product, GBW. In single-pole amplifiers operating under small-signal conditions, the two quantities are simply related by $t_s = |\ln\varepsilon|/\mathrm{GBW}$, where $\varepsilon$ is the maximum allowed settling error. The situation is less straightforward in multi-pole feedback amplifiers or when slew-rate (SR) limitations occur. The SR affects the time response in a nonlinear fashion, making the final settling time a function of the input step amplitude. In addition, in multi-pole feedback amplifiers, a poorly designed compensation network can produce undesired overshoots or oscillations that severely undermine the benefits of a promising GBW. This problem is even more relevant when the transistors of the amplifier are biased in the sub-threshold region, as in the ultra-low-power context.
Recently, in [60,63], the authors obtained an approximate but useful design equation that estimates the settling time when slew-rate limitations occur in the first stage of the amplifier. With respect to the small-signal settling time, $t_{s0}$, the settling time under slew-rate limitations is
$$t_s = t_{s0} + \frac{1}{\mathrm{GBW}}\left[\frac{\Delta V_o}{\nu} - 1 - \ln\frac{\Delta V_o}{\nu}\right] \tag{20}$$
where $\Delta V_o$ is the amplitude of the step at the output node of the OTA and $\nu = I_o/(\beta G_{m1})$ is the equivalent saturation limit of the first stage, $I_o$ being the maximum current that this stage can deliver at its output node. The small-signal settling time, $t_{s0}$, depends on the GBW of the amplifier and on the values of the global separation factors, $\hat{K}_e$ and $\hat{K}_i$. As demonstrated in [62], choosing $\hat{K}_e = 8/3$ and $\hat{K}_i = 9/4$ makes the small-signal settling time lower than that of a single-pole amplifier with the same GBW. In other words, under small-signal conditions the amplifier settles in $t_{s0} \le |\ln\varepsilon|/\mathrm{GBW}$ and, for our purposes, (20) can be simplified into
$$t_s \le \frac{1}{\mathrm{GBW}}\left[|\ln\varepsilon| + \frac{\Delta V_o}{\nu} - 1 - \ln\frac{\Delta V_o}{\nu}\right] \tag{21}$$
which relates the settling time, the slew-rate effects and the gain-bandwidth product. Equation (21) can be used to obtain the necessary GBW from a settling-time specification. Suppose that the circuit in Figure 2 is required to settle in $t_s$ seconds within a given error, $\varepsilon$. The transconductance and the maximum output current of the first stage are $G_{m1} = g_{m1,2}$ and $I_o = 2I_1$, respectively. Assuming that the source-coupled pair, M1-M2, is biased with a known $\Gamma = g_m/I_D$, the equivalent saturation limit becomes $\nu = 2I_1/(\beta g_{m1,2}) = 2/(\beta\Gamma)$. Substituting this value into (21) and taking the maximum possible output step, $\Delta V_o = V_{DD}$, we obtain the design equation for the GBW:
$$\mathrm{GBW} = \frac{1}{t_s}\left[|\ln\varepsilon| + \frac{\beta\Gamma V_{DD}}{2} - 1 - \ln\frac{\beta\Gamma V_{DD}}{2}\right] \tag{22}$$
Regarding slew-rate limitations, one final consideration must be pointed out: the design equation (21) is valid only if the slew-rate limitation resides in the first stage.
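Equation (22) is straightforward to evaluate. The sketch below (function name hypothetical) computes the required GBW in rad/s, assuming the settling model $t_s \le [|\ln\varepsilon| + \Delta V_o/\nu - 1 - \ln(\Delta V_o/\nu)]/\mathrm{GBW}$ with $\Delta V_o = V_{DD}$ and $\nu = 2/(\beta\Gamma)$, and treating the slewing term as zero when $\Delta V_o \le \nu$ (no slewing).

```python
import math


def gbw_from_settling(ts: float, eps: float, gamma: float, vdd: float,
                      beta: float = 1.0) -> float:
    """Required GBW (rad/s) to settle within error eps in ts seconds.

    The output step is taken as VDD and the first-stage saturation limit as
    nu = 2 / (beta * gamma), so Delta_Vo / nu = beta * gamma * vdd / 2.
    """
    m = beta * gamma * vdd / 2.0
    slew_term = m - 1.0 - math.log(m) if m > 1.0 else 0.0   # no slewing if m <= 1
    return (abs(math.log(eps)) + slew_term) / ts


gbw = gbw_from_settling(ts=10e-6, eps=0.01, gamma=30.0, vdd=1.0, beta=1.0)
print(f"GBW = {gbw:.4g} rad/s = {gbw / (2 * math.pi) / 1e3:.0f} kHz")
```

With the specifications used in Section 4 ($t_s = 10$ µs, $\varepsilon = 1\%$, $\Gamma = 30$ V$^{-1}$, $V_{DD} = 1$ V, $\beta = 1$), the sketch gives $\mathrm{GBW}/2\pi \approx 253$ kHz, the expected value quoted there.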
If this condition is not satisfied, the amplifier can experience positive feedback during slewing, which leads to large overshoots and degrades the settling-time performance [60,64]. To prevent this situation, as is usually done in sub-threshold OTAs, slew-rate enhancers [65][66][67][68] or class-AB topologies [26,69,70,71] must be used for the stages after the first one. Accordingly, a slew-rate enhancer is adopted for the last stage of the OTA in Figure 2, where the large capacitive load could otherwise dominate the slew-rate limitation. The slew-rate enhancer works as follows. Transistors M12-M13 act as a current comparator that senses the output node of the second stage, $v_2$. The comparator is sized so that transistor M14 is off when node $v_2$ is in its quiescent state. Conversely, when node $v_2$ goes down, M12 increases its current, so that M14 is switched on and helps M11 discharge the capacitive load.

Noise Analysis and First-Stage Transconductance
Noise is expressed in terms of the input equivalent noise spectral density and, as is well known, in multistage amplifiers it depends mainly on the first stage. Neglecting flicker noise for simplicity, the general expression (23) of the noise spectral density is a function of the first-stage transconductance, $G_{m1}$, of the Boltzmann constant, $k_B = 1.38 \times 10^{-23}$ J/K, of the absolute temperature, $T$, and of a coefficient, $c$, that depends on the topology of the first stage. Referring to Figure 2, in our specific case $c = g_{m3,4}/g_{m1,2} \approx 1$, where the approximation assumes that the current mirror M3-M4 has the same $g_m/I_D$ ratio as the differential pair, M1-M2. Solving (23) for $G_{m1}$ yields the design equation (24), which establishes the minimum first-stage transconductance on the basis of the noise specification.

Gain-Bandwidth Product and Current Dissipation
As mentioned previously, especially when the power of the IoT system is harvested from the environment, the designed circuits cannot exceed the driving capability of the power source. In low-power interface circuits, the amplifiers are the most critical blocks, since they significantly affect power consumption [1]. Therefore, the design of the OTA cannot be separated from the optimization of its power dissipation.
The overall current dissipation of the OTA can be minimized for a given GBW specification. We take $\Gamma = g_m/I_D$ as the starting point for the transistors of our amplifier. Specifically, $\Gamma$ is a known quantity set by the designer on the basis of the application [59], and in the ultra-low-power context it is high (i.e., $\Gamma > 20$ V$^{-1}$).
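To simplify the discussion, we assume that all the transistors of the amplifier are biased with the same $\Gamma$; the following analysis is easily adapted when different values of $\Gamma$ are used. Referring to Figure 2, the total current required to bias the amplifier is
$$I_{DD} = 2I_1 + I_2 + \left(1 + \frac{1}{r}\right)I_3 \tag{25}$$
where, as far as the current-mirror ratio, $r$, is concerned, the higher its value, the lower the total current, since the current in the branch M8-M9 is reduced. However, if the current in M9 is too small, the internal pole at the drain of M8 (whose resistive contribution is $1/g_{m9}$) may move to too low a frequency and degrade the phase margin. As a rule of thumb, a good trade-off is a value of $r$ between 3 and 6, and in the following we treat it as a known parameter. Solving (25) for $I_1$ and substituting the result into $\mathrm{GBW} = \beta G_{m1}/C_{C1} = \beta\Gamma I_1/C_{C1}$ leads to
$$\mathrm{GBW} = \frac{\beta\Gamma}{2C_{C1}}\left[I_{DD} - I_2 - \left(1 + \frac{1}{r}\right)I_3\right] = \frac{\beta}{2C_{C1}}\left[\Gamma I_{DD} - G_{m2} - \left(1 + \frac{1}{r}\right)G_{m3}\right] \tag{26}$$
where, for the latter expression, we used $G_{mi} = \Gamma I_i$. Finally, using (16)-(17), we obtain
$$\mathrm{GBW} = \frac{\Gamma I_{DD}}{C_{C2}\left[\dfrac{2}{\beta}\dfrac{C_{C1}}{C_{C2}} + F_c\!\left(\dfrac{C_{C1}}{C_{C2}}\right) + \left(1 + \dfrac{1}{r}\right)\dfrac{\hat{K}_e^2\hat{K}_i\,C_L/C_{C2}}{F_c\!\left(C_{C1}/C_{C2}\right)}\right]} \tag{27}$$
where we defined the function
$$F_c\!\left(\frac{C_{C1}}{C_{C2}}\right) = \frac{C_{C1}}{C_{C2}}\cdot\frac{C_{C1}/C_{C2} + \hat{K}_e\hat{K}_i\,C_L/C_{C2}}{C_{C1}/C_{C2} + C_L/C_{C2}} \tag{28}$$
Since the load capacitor, $C_L$, is specified by the application, in the proposed design strategy we first choose $C_{C2}$ as small as possible but sufficiently larger than the other parasitic capacitive contributions, and then we find the ratio $C_{C1}/C_{C2}$ that maximizes the GBW in (27). This can be done analytically only for $C_L \gg C_{C1}, C_{C2}$ or, from a practical point of view, when $C_{C2}$ can be chosen at least two orders of magnitude smaller than $C_L$. In this case, $F_c(C_{C1}/C_{C2}) \approx \hat{K}_e\hat{K}_i\,C_{C1}/C_{C2}$, and the ratio that maximizes the GBW is
$$\frac{C_{C1}}{C_{C2}} = \sqrt{\frac{\left(1 + 1/r\right)\hat{K}_e\,C_L/C_{C2}}{2/\beta + \hat{K}_e\hat{K}_i}} \tag{29}$$
In any other case, the ratio $C_{C1}/C_{C2}$ that maximizes the GBW has to be determined numerically from (27).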
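When $C_L$ is not much larger than $C_{C1}$ and $C_{C2}$, the optimum ratio has to be found numerically. The sketch below (hypothetical function names) minimizes the bracketed denominator of the GBW expression with a golden-section search on a logarithmic scale and compares the result with the large-$C_L$ closed form. It assumes $F_c(x) = x(x + \hat{K}_e\hat{K}_i y)/(x + y)$, with $x = C_{C1}/C_{C2}$ and $y = C_L/C_{C2}$, and a bracket of the form $2x/\beta + F_c(x) + (1 + 1/r)\hat{K}_e^2\hat{K}_i\,y/F_c(x)$. Golden-section search is our own choice here and assumes a single minimum inside the search interval.

```python
import math


def fc(x, y, ke, ki):
    """F_c(x) = x * (x + Ke*Ki*y) / (x + y), with x = CC1/CC2 and y = CL/CC2."""
    return x * (x + ke * ki * y) / (x + y)


def bracket(x, y, ke, ki, r, beta):
    """Bracketed term of GBW = Gamma*IDD / (CC2 * bracket); smaller is better."""
    f = fc(x, y, ke, ki)
    return 2.0 * x / beta + f + (1.0 + 1.0 / r) * ke ** 2 * ki * y / f


def optimum_ratio_numeric(y, ke, ki, r, beta, lo=1e-2, hi=1e4, iters=100):
    """Golden-section search on log(x) for the x that maximizes GBW at fixed IDD."""
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log(lo), math.log(hi)
    cost = lambda t: bracket(math.exp(t), y, ke, ki, r, beta)
    for _ in range(iters):
        c, d = b - phi * (b - a), a + phi * (b - a)
        if cost(c) < cost(d):
            b = d
        else:
            a = c
    return math.exp(0.5 * (a + b))


def optimum_ratio_approx(y, ke, ki, r, beta):
    """Closed-form optimum for CL >> CC1, CC2 (where F_c ~ Ke*Ki*x)."""
    return math.sqrt((1.0 + 1.0 / r) * ke * y / (2.0 / beta + ke * ki))


y = 100e-12 / 150e-15   # CL / CC2 for the example of Section 4
print(optimum_ratio_numeric(y, 8 / 3, 9 / 4, 4, 1.0),
      optimum_ratio_approx(y, 8 / 3, 9 / 4, 4, 1.0))
```

For the example of Section 4 ($C_L/C_{C2} \approx 667$, $r = 4$, $\beta = 1$, $\hat{K}_e = 8/3$, $\hat{K}_i = 9/4$), the closed form gives $C_{C1}/C_{C2} \approx 16.7$, and the numerical optimum lies within a few percent of it, so the approximation is adequate there.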

The Design Procedure in the Sub-Threshold Region
In the proposed design procedure, we assume that the power supply, $V_{DD}$, the load capacitor, $C_L$, and the feedback factor, $\beta$, are established by the specific application. The other OTA specifications generally concern speed (given either as a minimum GBW or as a maximum settling time within a given error, $\varepsilon$) and maximum equivalent input noise. However, as discussed in the following, the design steps that use the GBW specification are a subset of those that use the settling-time specification. Therefore, without loss of generality, the proposed procedure assumes that the speed requirements are specified in terms of maximum settling time. Noise is set aside for the moment and is discussed at the end of this subsection.
The first step of the design procedure is to choose the bias region of the transistors in terms of $\Gamma = g_m/I_D$. The simplest option is to choose the same $\Gamma$ for all the active devices of the OTA; however, if different values of $\Gamma$ are needed for different amplification stages (or groups of transistors), the procedure can be adapted without much effort. In our ultra-low-power context, any value of $\Gamma$ higher than 20 V$^{-1}$ is a good design choice. In this step, we also choose the secondary compensation capacitor, $C_{C2}$, and the ratio $r$ of the current mirror M9-M10. As discussed in Section 3.4, $C_{C2}$ has to be as small as possible, provided that, at the end of the procedure, it turns out to be sufficiently larger than the other parasitic capacitive contributions. For the current-mirror ratio, $r$, a value between 3 and 6 is a reasonable choice.
In the second step, we use (22) to evaluate the required GBW from the settling-time specification. If the speed requirement is already given in terms of GBW, that value is used directly.
In the third step, we assign proper values to the separation factors and find the optimum ratio $C_{C1}/C_{C2}$. Setting $\hat{K}_e = 8/3$ and $\hat{K}_i = 9/4$ optimizes the small-signal settling time, while setting $\hat{K}_e = \hat{K}_i = 2$ optimizes the closed-loop bandwidth of the amplifier. In our specific case, $C_L$ is large (i.e., about two orders of magnitude larger than $C_{C2}$), and the approximate equation (29) can be used to evaluate $C_{C1}/C_{C2}$; otherwise, (27) must be used to compute numerically the ratio $C_{C1}/C_{C2}$ that maximizes the GBW. Since $C_{C2}$ is known, we then evaluate $C_{C1}$ and, subsequently,
$$G_{m1} = \frac{\mathrm{GBW}\,C_{C1}}{\beta} \tag{30}$$
In the fourth step, we use (16)-(17) to evaluate the ratios $G_{m2}/(\beta G_{m1})$ and $G_{m3}/(\beta G_{m1})$ and, from them, the remaining transconductances $G_{m2}$ and $G_{m3}$; of course, $G_{mf} = G_{m3}$.
Finally, in the fifth step, we evaluate the stage currents from $I_i = G_{mi}/\Gamma$. If a noise specification is given, a minimum value for $G_{m1}$ is established from (24). If this minimum is higher than the value obtained from (30), the procedure uses the transconductance set by the noise specification. This is accomplished simply by taking the maximum of (30) and (24) when computing $G_{m1}$ in the third step of the procedure.
A MATLAB script implementing the design procedure is reported in Appendix A. The 'Specifications' section can be changed by the designer on the basis of the final application. In section 'Step 1', the designer chooses the parameters gamma, CC2 and r; similarly, in 'Step 3', the designer sets the separation factors, Ke and Ki. The remaining part of the script evaluates the amplifier parameters. Note that the procedure can be carried out without any advanced computational tool; the script is provided only to summarize the steps. In addition, unlike procedures based on more advanced ad hoc tools (such as the one offered in [72]), fine-tuning at the circuit-simulator level is in general required to finalize the design.
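The five steps can also be sketched in Python. This is an illustration only and not the MATLAB script of Appendix A: it assumes the settling-based GBW equation, the large-$C_L$ optimum for $C_{C1}/C_{C2}$, the normalized transconductances of the second and third stages, $G_{m1} = \mathrm{GBW}\,C_{C1}/\beta$ and $I_{DD} = 2I_1 + I_2 + (1 + 1/r)I_3$, and it takes the noise-based minimum $G_{m1}$ as an input (the hypothetical parameter `gm1_min_noise`) because its expression depends on the noise model.

```python
import math
from dataclasses import dataclass


@dataclass
class OtaDesign:
    gbw: float   # rad/s (divide by 2*pi for Hz)
    cc1: float   # F
    gm1: float   # S
    gm2: float   # S
    gm3: float   # S (Gmf = Gm3)
    i1: float    # A, current of each input transistor
    i2: float    # A
    i3: float    # A
    idd: float   # A, total bias current


def design_ota(vdd, cl, beta, ts, eps, gamma, cc2, r,
               ke=8 / 3, ki=9 / 4, gm1_min_noise=0.0):
    """Five-step sizing sketch (closed-form path, CL >> CC1, CC2 assumed)."""
    # Step 1: gamma, cc2 and r are chosen by the designer (function arguments).
    # Step 2: required GBW from the settling-time spec, output step = VDD.
    m = beta * gamma * vdd / 2.0                       # Delta_Vo / nu
    slew = m - 1.0 - math.log(m) if m > 1.0 else 0.0
    gbw_req = (abs(math.log(eps)) + slew) / ts
    # Step 3: optimum CC1/CC2 (large-CL approximation), then Gm1.
    y = cl / cc2
    x = math.sqrt((1.0 + 1.0 / r) * ke * y / (2.0 / beta + ke * ki))
    cc1 = x * cc2
    gm1 = max(gbw_req * cc1 / beta, gm1_min_noise)     # noise bound wins if larger
    # Step 4: normalized transconductances of the second and third stages.
    g2 = (x + ke * ki * y) / (x + y)
    g3 = ke ** 2 * ki * y * (x + y) / (x ** 2 * (x + ke * ki * y))
    gm2, gm3 = g2 * beta * gm1, g3 * beta * gm1
    # Step 5: bias currents from Ii = Gmi / gamma, and the total current.
    i1, i2, i3 = gm1 / gamma, gm2 / gamma, gm3 / gamma
    idd = 2.0 * i1 + i2 + (1.0 + 1.0 / r) * i3
    return OtaDesign(beta * gm1 / cc1, cc1, gm1, gm2, gm3, i1, i2, i3, idd)


d = design_ota(vdd=1.0, cl=100e-12, beta=1.0, ts=10e-6, eps=0.01,
               gamma=30.0, cc2=150e-15, r=4)
print(f"GBW = {d.gbw / (2 * math.pi) / 1e3:.0f} kHz, CC1 = {d.cc1 * 1e12:.2f} pF, "
      f"IDD = {d.idd * 1e6:.2f} uA")
```

With the Section 4 specifications, the sketch reproduces the expected GBW of about 253 kHz and gives $C_{C1} = 2.5$ pF and a total current of roughly 2.1 µA; these intermediate values are produced by the sketch and are not copied from Table 1. If `gm1_min_noise` exceeds the settling-based value, the resulting GBW is simply higher than required.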

OTA Design and Validation Results
The three-stage OTA discussed in the previous sections is designed using a 65 nm CMOS process provided by STMicroelectronics. The power supply is set to 1 V.
As far as the specifications are concerned, the OTA is required to settle in 10 µs within a 1% error in unity-gain configuration (i.e., $\beta = 1$) and with a load capacitor of 100 pF. These values are inserted in the MATLAB script in Appendix A. The input noise spectral density is specified as 200 nV/√Hz; however, it plays no role in this design, since it leads to a minimum transconductance smaller than that set by (30).
To improve efficiency, we set $\Gamma = 30$ V$^{-1}$. We also set $C_{C2} = 150$ fF, $r = 4$, $\hat{K}_e = 8/3$ and $\hat{K}_i = 9/4$. The design procedure leads to the OTA parameters in Table 1. With these parameters, the expected GBW is 253 kHz.
The circuit is designed in the Cadence environment following the transistor-level schematic in Figure 2. The transistors are sized according to the values in Table 1, and their aspect ratios are reported in Table 2.
The simulated Bode plot of the open-loop gain of the OTA is shown in Figure 4, where both magnitude and phase are depicted. The black lines in the magnitude and phase graphs come from the simulation with typical transistor models. The colored regions (violet and green, respectively) are the results of a 400-run Monte Carlo simulation that includes both intra-die (local) and inter-die (global) variations. In the format µ ± σ, the OTA exhibits a DC gain of 82.9 ± 0.24 dB, a GBW of 240 ± 27.0 kHz and a phase margin of 58.3 ± 2.8 deg.
The time response to a step input in unity-gain configuration is reported in Figure 5. Specifically, Figure 5a shows the response to a ±100 mV input step and Figure 5b the response to a ±500 mV input step. Again, the black lines come from the simulation with typical transistor models, while the colored regions are the results of a 400-run Monte Carlo simulation that includes both intra-die (local) and inter-die (global) variations. In the format µ ± σ, for the ±100 mV case, the rising step settles in 2.64 ± 0.35 µs and the falling step in 3.84 ± 0.48 µs. When the step increases to ±500 mV, slew-rate limitations slow down the OTA response, which settles in 6.37 ± 0.94 µs and 6.68 ± 0.57 µs for the rising and falling steps, respectively.
Finally, Figure 6 plots the distributions of the settling times obtained from the 400-run Monte Carlo simulation for the various cases. The cases with ±100 mV steps are reported in Figure 6a,b and the cases with ±500 mV steps in Figure 6c.

Comparison with Other Recent Sub 1-V Amplifiers
A comparison with a small but significant selection of recent sub-1-V amplifiers was carried out, and the results are summarized in Table 3. The selection includes gate-driven and bulk-driven three-stage OTAs, the latter group operating at lower supply voltages. Power dissipations range from a few tens of nanowatts to a few tens of microwatts. For an effective comparison, the speed of the OTAs was measured in terms of the GBW instead of the settling time, because the step response was only partially characterized in some of the papers used for the comparison. The comparison was made in terms of two well-known figures of merit (FOMs), both measuring the trade-off between speed, load capacitance and current/power dissipation. One of them is defined as
$$\mathrm{FOM}_S = \frac{\mathrm{GBW}\times C_L}{V_{DD}\times I_{DD}}\quad\left[\frac{\mathrm{MHz}\times\mathrm{pF}}{\mathrm{mW}}\right] \tag{32}$$
From the last two rows of Table 3, it is apparent that the OTA designed with the proposed approach has the best performance, even compared with the most recent ultra-low-voltage bulk-driven amplifiers.
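The power-normalized figure of merit is simple to compute once the units are handled consistently. The helper below (name hypothetical) converts SI inputs into MHz, pF and mW; the operating point in the usage line is hypothetical and is not taken from Table 3.

```python
def fom_power(gbw_hz: float, cl_f: float, vdd_v: float, idd_a: float) -> float:
    """FOM = GBW * CL / (VDD * IDD), expressed in MHz * pF / mW."""
    gbw_mhz = gbw_hz / 1e6
    cl_pf = cl_f / 1e-12
    power_mw = vdd_v * idd_a / 1e-3
    return gbw_mhz * cl_pf / power_mw


# Hypothetical operating point, for illustration only
print(f"{fom_power(gbw_hz=240e3, cl_f=100e-12, vdd_v=1.0, idd_a=2e-6):.0f} MHz*pF/mW")
```

Here 0.24 MHz × 100 pF divided by 0.002 mW gives 12000 MHz·pF/mW; doubling the supply voltage at the same current halves the figure, which is why low-voltage designs are favored by this metric.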

Conclusions
In this paper, we proposed a new strategy for the design of ultra-low-power CMOS OTAs for the IoT scenario, based on the $g_m/I_D$ approach. The design strategy optimizes the speed/dissipation trade-off in terms of settling time, including slew-rate effects. Although the procedure was tailored to large capacitive loads and to transistors biased in the sub-threshold region, it is also suitable for low capacitive loads and for transistors biased in the saturation region. The procedure was validated through the design of the well-known three-stage RNMC-FF OTA, starting from capacitive-load and settling-time requirements. Simulations confirmed that the OTA satisfies the specifications (even under Monte Carlo analysis), thus proving the correctness of the proposed design strategy.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to the non-disclosure agreement signed with the owner of the technology process.