Energy Efficiency in Slew-Rate Enhanced Single-Stage OTAs for Switched-Capacitor Applications

Abstract: Slew-rate enhancement (SRE) techniques assist the charge transfer process in OTA-based switched-capacitor circuits. Parallel-type slew-rate enhancement circuits, i.e., circuits that provide a feed-forward path external to the main OTA, are attractive solutions, since they introduce a further degree of freedom in the speed/power-consumption design space without affecting other specifications of the main OTA. This technique lends itself to being employed jointly with advanced OTA topologies in order to compose a highly energy-efficient OTA/SRE system. However, insight into design choices such as power optimization is still missing for such systems. Here we discuss system-level choices with the help of a simple model. Using precise electrical simulations, we demonstrate energy savings greater than 30% for different OTA/SRE systems implemented in a standard 180-nm CMOS technology.


Introduction
The settling behaviour of switched-capacitor (SC) stages, such as the one depicted in Figure 1a, has been conveniently described by a simplified model [1][2][3][4][5][6][7][8]. This model breaks down the charge-transfer operation in an SC stage, whether an SC amplifier or an SC integrator, into two phases, corresponding to the idealized operating regions of the OTA: the slew-rate and linear regions. Hence, the total settling time, $t_S$, is given by two contributions, $t_1$ (slew-rate time) and $t_2$ (linear time), as stated in Equation (1), where: $\Delta V_i(0^+)$ is the initial step seen at the OTA's input due to the charge redistribution at $t = 0$, proportional to $\Delta V_S$ through the attenuation factor $C_S/[C_S + C_P + C_F C_L/(C_F + C_L)]$; $I_{omax}$ is the OTA's maximum output current; $C_S^* = (C_S + C_P)(1 + C_L/C_F) + C_L$; $V_{dmax}$ discriminates the OTA's operating region (slew rate for $|V_i| \geq V_{dmax}$, linear for $|V_i| < V_{dmax}$); $V_{in} = \varepsilon_R \Delta V_S C_S/(C_S + C_P + C_F)$, where $\varepsilon_R$ is the relative error on the output voltage step; and finally $\tau = C_S^*/G_m$ is the time constant of the linear transient of a single-pole OTA, with $G_m$ being the OTA's transconductance. As shown in Figure 1a, we are interested in the case of large voltage steps, which trigger the OTA to operate initially in its slewing region. Both $t_1$ and $t_2$ therefore depend on $\Delta V_S$, through $\Delta V_i(0^+)$ and $V_{in}$, respectively. On the other hand, Equation (1) explicitly shows how $V_{dmax}$, which is a design parameter, influences the settling time in both the $t_1$ and the $t_2$ terms. For a Class-A OTA, like the common folded-cascode (FC) OTA, $V_{dmax}$ can be identified with the range of operation of the input differential pair. When the pair is sized to operate in weak inversion to achieve the maximum current efficiency [9], $V_{dmax} = 2nU_T$, with $n$ being the subthreshold slope factor and $U_T$ the thermal voltage ($\approx 25.7$ mV at 25 °C).
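A form of the two-phase model that is consistent with the definitions above can be sketched as follows. It assumes that the OTA input voltage slews linearly from $|\Delta V_i(0^+)|$ down to $V_{dmax}$ at the rate $I_{omax}/C_S^*$, and then decays exponentially with time constant $\tau$ until it reaches $V_{in}$. The symbol $C_S^*$ for the effective capacitance $(C_S + C_P)(1 + C_L/C_F) + C_L$ is a notational assumption.

```latex
t_S = t_1 + t_2, \qquad
t_1 = \frac{C_S^*}{I_{omax}} \left( |\Delta V_i(0^+)| - V_{dmax} \right), \qquad
t_2 = \tau \, \ln\!\left( \frac{V_{dmax}}{V_{in}} \right)
```

With $\tau = C_S^*/G_m$ and $I_{omax} = G_m V_{dmax}$, both terms share the factor $C_S^*/I_{omax}$, which is why the ratio $t_1/t_S$ does not depend on transistor-level parameters other than $V_{dmax}$.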
Figure 1b shows the relative impact of $t_1$ on the overall settling as a function of the input voltage step, assuming the conventional FC OTA modeled by (1). For this OTA, a simple linear relationship exists between its $G_m$ (weak-inversion operation of the input devices) and $I_{omax}$, which can be easily shown to be $\tau I_{omax} = 2nU_T C_S^*$, with $C_S^*$ the effective capacitance defined above. This fact allows us to eliminate all the transistor-level design parameters from $t_1/t_S$. The design space, for this particular case, can be represented as a family of curves parameterized by a given capacitive network ($C_S$, $C_F$, $C_L$, $C_P$) and the relative error $\varepsilon_R$. In Figure 1b, two design cases, named "high-resolution" and "low-resolution", are shown. They correspond to the respective parameter sets in the inset table. These are typical values, mainly derived from kT/C-noise specifications, that can be found in high-resolution (≥16 bits) and low-resolution (12 bits or less) systems. Specific transistor-level design choices (sizing of the input devices) marginally affect $C_P$, but as long as $C_P$ is sufficiently smaller than $C_S$, the model accuracy is not compromised. The main OTA, depending on the specific application case, follows a design optimization that takes into account many other aspects, such as offset and low-frequency noise. Using a parallel-type SRE, the speed/power-consumption trade-off can be targeted without affecting other design parameters.

Figure 1. (a) SC circuit and its relevant waveforms under the charge transfer process for a stimulus of $\Delta V_S$; (b) slew-rate time over total settling time as a function of the input voltage step $\Delta V_S$, considering a folded-cascode OTA with the input pair working in weak inversion ($V_{dmax} = 2nU_T$). Colored bands correspond to supply voltage ($V_{dd}$) regions, considering $n$ ranging from 1.5 to 2.0 under room-temperature conditions. The numerical values of the circuit parameters for the high-resolution and low-resolution cases are indicated in the inset table.
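The ratio $t_1/t_S$ discussed above can be evaluated numerically with a few lines of code. The following is a minimal sketch, assuming the two-phase model with $\tau I_{omax} = V_{dmax} C_S^*$ so that the effective capacitance cancels out of the ratio. The function name `slew_fraction` and the capacitor values in the usage lines are hypothetical; they are not the values of the inset table of Figure 1b.

```python
import math


def slew_fraction(dv_s, c_s, c_f, c_l, c_p, eps_r, v_dmax):
    """Return t1/tS for an OTA obeying tau * I_omax = V_dmax * C_S*.

    dv_s   : input step amplitude (V)
    c_*    : capacitances of the SC network (F)
    eps_r  : relative settling error on the output step
    v_dmax : boundary between slewing and linear operation (V)
    """
    # Initial step at the OTA input after charge redistribution at t = 0.
    dv_i0 = dv_s * c_s / (c_s + c_p + c_f * c_l / (c_f + c_l))
    # Residual input voltage that corresponds to the relative error eps_r.
    v_in = eps_r * dv_s * c_s / (c_s + c_p + c_f)
    if dv_i0 <= v_dmax:
        return 0.0  # the step is too small to push the OTA into slewing
    # Both times are expressed in units of C_S*/I_omax, which cancels out.
    t1 = dv_i0 - v_dmax
    t2 = v_dmax * math.log(v_dmax / v_in)
    return t1 / (t1 + t2)


# Hypothetical network: 1 pF sampling, feedback and load, 0.1 pF parasitic.
U_T = 0.0257               # thermal voltage (V), value quoted in the text
V_DMAX = 2 * 1.5 * U_T     # weak inversion with n = 1.5 (assumed)
for v_dd in (1.2, 1.8, 3.3):
    frac = slew_fraction(v_dd, 1e-12, 1e-12, 1e-12, 0.1e-12, 2**-12, V_DMAX)
    print(f"Vdd = {v_dd} V -> t1/tS = {frac:.2f}")
```

The sketch reproduces the qualitative trend only: the slewing share of the settling time grows with the step amplitude and shrinks when the accuracy requirement tightens.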
In order to achieve the best signal-to-noise ratio, $\Delta V_S$ is set equal to the maximum magnitude of the input step voltage, which we assume to be the supply voltage $V_{dd}$. Moving from one technology node to another, distinct supply voltage domains are given. In Figure 1b, the $V_{dd}$ regions for 1.2 V, 1.8 V and 3.3 V can be associated with the 65-nm, 180-nm and 350-nm CMOS processes, respectively. As expected, the lower $V_{dd}$, the lower the impact of $t_1$ on $t_S$. However, even for the 1.2-V/low-resolution case, $t_1$ is approximately 75% of $t_S$, justifying the need for power-efficient circuit techniques to reduce $t_1$. For example, by reducing $t_1$ to one third of its original value, $t_S$ would be halved. Intuitively, this $t_S$ reduction can be translated to a design situation where $t_S$ is maintained but the total power consumption is scaled down. We recently proposed a capacitive-boosted slew-rate enhancer (SRE) technique suited to this purpose [10], based on Nagaraj's SRE [11,12]. Nevertheless, a systematic study of this technique to provide clear system-level design insight is still missing.
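The halving claim follows directly from the 75% figure. As a worked check, take $t_1 = 0.75\,t_S$ (the approximate value quoted for the 1.2-V/low-resolution case), so that $t_2 = 0.25\,t_S$, and leave $t_2$ unchanged while $t_1$ is divided by three:

```latex
t_S' = \frac{t_1}{3} + t_2 = 0.25\,t_S + 0.25\,t_S = 0.5\,t_S
```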
Here we address this issue by analyzing the performance of the SRE technique when combined with more advanced OTA configurations, such as the recycling folded cascode (RFC) [13]. The remainder of this paper is organized as follows: Section 2 develops the settling-time model for power-aware system-level choices; Section 3 introduces energy metrics to evaluate the performance of different OTA/SRE systems by means of accurate electrical simulations; Section 4 concludes this work by stating the major findings.

System-Level Settling Model
We recently introduced an extension of the model in (1) concerning system-level parameters [10]. This model is useful to describe the settling behaviour of the SC stage in Figure 1a, whether it employs a single-stage OTA or an OTA/SRE system, and is expressed by Equation (2). The two addends on the right-hand side of (2) correspond to $t_1$ and $t_2$ of (1), respectively, normalized by the time $t_X$. Indeed, Equation (2) descends from Equation (1) once the identities in (3) are defined. As evident from (3), the model of (2) emphasizes the role of the quiescent current $I_{sup}$ drawn by the circuit from the supply rail. The parameters in (2) have the following meaning:
• $t_X$ acts as a normalizing unit for $t_S$, taking into account system-level specifications regarding the capacitive load $C_S$, the input step amplitude $\Delta V_S$, and the current consumption $I_{sup}$.
• $c_1$ and $c_2$: both parameters mainly depend on the capacitive feedback network ($C_S$, $C_F$, $C_L$) and the OTA's input parasitic capacitance $C_P$.
• $k_{AB}$ expresses the efficiency with which the OTA uses the given $I_{sup}$ to produce its output current when operating in the slew-rate region.
• $k_G$ expresses the efficiency with which the OTA uses the given $I_{sup}$ to produce a large transconductance when operating in the linear region.
By applying the transformation in (4), we observe that the same expression in (2) can be used to estimate the static current consumption $I_{sup}$ needed for a given $t_S$, while maintaining the rest of the constraints.
The normalizing current $I_X$, similarly to $t_X$, takes into account only system-level specifications. Figure 2a shows a typical fully-differential SC integrator configuration. It is possible to demonstrate its equivalence in terms of settling time with the single-ended circuit in Figure 1a, provided that the voltages and currents of the single-ended model represent the total differential-mode components of the fully-differential circuit in Figure 2a. It can be easily shown that the only transformation that has to be applied regards the input capacitance of the OTA in the equivalent circuit, $C_P$ in Figure 1a, which should be set to twice the input capacitance of the fully-differential OTA. Considering Figure 2a, the other capacitances of the circuit are simply replicated in the single-ended equivalent, i.e.,

Model Extension to OTA/SRE Systems
The SRE of [10] provides a parallel signal path in order to assist the OTA during the charge transfer process; its transistor-level schematic is also shown in Figure 2a. The SRE delivers non-zero output currents only during the slew-rate time $t_1$, while mirrors Mm1-Mm8 are ideally turned off afterwards. This behaviour is achieved thanks to the current comparison occurring at nodes A-A' and B-B' under the action of the Mb1-Mb2 and Mb3-Mb4 transistors, which are set to subtract a fixed amount of current $I_{th}$ that determines the turn-on/off threshold of the SRE. For a large input differential voltage ($> V_{dmax}$), the SRE provides an amount of current equal in magnitude to $I_{omax,SRE}$.
The introduction of the SRE adds a static current consumption indicated with $I_{sup,SRE} = 2I_{tail}$. Its maximum output current capability is then measured by (5), since the SRE is statically biased by $2I_{tail}$ (bias chain not included) and, considering a robust sizing with $I_{th} = 3I_{tail}/4$, it is capable of delivering $kI_{tail}/4$ at the output under the maximum unbalanced condition. In practice, SREs with large values of $k$ together with low $I_{tail}$, which would represent an optimum design choice for the SRE, show a degradation of their effectiveness. The capacitive boosting proposed in [10] and implemented by the capacitor $C_B$ shown in Figure 2a solves this issue and has been adopted in this work. From the design point of view, the SRE input commutation threshold is designed to coincide with the voltage $V_{dmax}$, which defines the boundary between the input regions where the OTA behaves in a linear and in a non-linear (saturated) fashion. The threshold-conditioned behaviour of the SRE is similar to that of the comparator-based SC circuits introduced in [14]. Ideally, comparator-based SC circuits have a null linear settling time ($t_2 = 0$, corresponding to $V_{dmax} = 0$), making $t_S = t_1$. This condition is extremely beneficial from the power-efficiency point of view, since the current drawn from the supply rail is almost entirely used to charge the load directly. In practice, the absence of a virtual ground prevents comparator-based SC solutions from being used in medium/high-resolution applications. In our case the SRE always operates in parallel with an OTA in order to ensure precise virtual-ground settling and thus circumvent the linearity limitations typical of comparator-based SC circuits.
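The threshold mechanism described above can be illustrated with an idealized static model. The sketch below assumes that, under maximum unbalance, the whole $I_{tail}$ flows in one branch, and that the mirror amplifies by $k$ only the part of the branch current that exceeds $I_{th}$. The function name `sre_output_current` is hypothetical, and the model ignores turn-on/turn-off transients and the capacitive boosting.

```python
def sre_output_current(i_branch, i_tail, k, i_th_ratio=0.75):
    """Idealized static SRE output current.

    i_branch   : current steered into one branch of the input pair
    i_tail     : tail current of the pair
    k          : current-mirror gain
    i_th_ratio : threshold current as a fraction of i_tail (3/4 in the text)
    """
    i_th = i_th_ratio * i_tail
    # The mirror only sees the current left after I_th is subtracted.
    return k * max(0.0, i_branch - i_th)


I_TAIL = 1.0   # normalized tail current
K = 30         # mirror ratio used in the design example of the text

# Balanced input: each branch carries I_tail/2 < I_th, so the SRE is off.
quiescent = sre_output_current(I_TAIL / 2, I_TAIL, K)
# Fully unbalanced input: one branch carries I_tail, output is k*I_tail/4.
peak = sre_output_current(I_TAIL, I_TAIL, K)
print(quiescent, peak)
```

Choosing $I_{th}$ well above the balanced branch current $I_{tail}/2$ is what keeps the mirrors off near the end of settling, at the cost of using only a quarter of $I_{tail}$ as mirror input at full unbalance.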
From the system point of view, the overall static current consumption is now composed of the OTA's contribution $I_{sup,OTA}$ and the SRE's contribution $I_{sup,SRE}$; this can be accounted for by defining the parameter $\eta$ such that $\eta = I_{sup,SRE}/I_{sup,OTA}$ and $I_{sup} = I_{sup,OTA} + I_{sup,SRE} = (1 + \eta)I_{sup,OTA}$.
As already stated, the settling-time model in (2) is also valid for an OTA/SRE system, with due attention to the expressions of $k_{AB}$ and $k_G$. The former, being related to the maximum output current, is given by the combined action of the OTA and the SRE, while the latter is strictly related to the $G_m$ of the OTA alone, which is now biased by a portion of the total supply current, namely $I_{sup}/(1 + \eta)$, as expressed in (7). Figure 2b shows the settling time $t_S$ and $I_{sup}$ while increasing the $k_{AB}$ of the OTA/SRE system for both the FC and RFC OTA topologies. Solid traces show the possible reduction in $t_S/t_X$ or $I_{sup}/I_X$ when using an ideal SRE (no static power) to increase $k_{AB}$. A more realistic prediction is shown by the dotted traces, which account for an $\eta = 10\%$ budget. In any case, a substantial benefit of the SRE action is predicted by the model. A detailed discussion of the RFC vs. FC parameters ($k_{AB}$, $k_G$ and $V_{dmax}$) is presented in the following section.

Figure 2. (b) Plot of $t_S/t_X$ as a function of $k_{AB}$ for two different OTA topologies combined with an ideal (no power consumption) and a real ($\eta = 10\%$) SRE. Note that different $k_{AB}$ values correspond to different design choices for the SRE. The relative settling time $t_S/t_X$ can be converted to a relative supply current consumption $I_{sup}/I_X$ through (4).

Model Extension to Advanced OTA Topologies
The validity of the model in (2) for advanced single-stage OTA architectures, such as the RFC [13], Super-Class AB [15] and VMA [16], has not yet been demonstrated. The model in (1), and hence (2), hinges on a piecewise-linear approximation of the OTA's characteristic of the output differential current as a function of the input differential voltage, i.e., $I_{od}(V_{id})$. A hard threshold, $V_{dmax}$, is set between the large-signal and small-signal regions. Within the $V_{dmax}$ range, i.e., $|V_{id}| < V_{dmax}$, the model considers the linear small-signal circuit approximation. Outside the $V_{dmax}$ range, i.e., $|V_{id}| \geq V_{dmax}$, the model considers perfect saturation of the currents to the $I_{omax}$ value. This highly simplified model is intrinsically prone to inaccuracy [2,3] and cannot be used to fine-tune any final design, for which accurate electrical simulations are still needed [17]. Nevertheless, it provides a uniform and simple analytic tool useful to compare different OTA and OTA/SRE architectures, as will be shown in the following discussion.
Here we discuss the RFC topology [13] as an exemplary case study for mapping advanced single-stage OTAs to the model in (2). Although the methodology is of general applicability, an exhaustive mapping of other advanced OTA families, such as those stemming from [15,16], is beyond the scope of this paper. Figure 3a shows a conceptual schematic of folded-cascode architectures formed by a current-steering core and an output section. The current-steering core is in charge of providing the differential-voltage to differential-current conversion and of properly biasing the rest of the circuit. The output section provides low-impedance inputs for the differential current (through Mc1-Mc2) and the high-impedance output of the whole OTA. The FC and the RFC OTAs are obtained when the current-steering core is implemented as the standard core and the current-recycling core, respectively, as shown in Figure 3a. The RFC core is obtained by equally splitting the input devices to create an auxiliary current path. Thanks to the action of mirrors Mm1-Mm2 and Mm3-Mm4, both $G_m$ and $I_{omax}$ are enhanced with respect to the FC. Theoretically, in the case of $k_R = 3$, $G_m$ is multiplied by 2 and $I_{omax}$ is multiplied by 3. This enhancement comes without any static power penalty [13].
Considering now the piecewise approximation, since no discontinuities are present, the relationship $I_{omax} = G_m V_{dmax}$ holds for both the FC and the RFC OTA. Since in the RFC architecture $I_{omax}$ and $G_m$ scale differently, the $V_{dmax}$ parameter needs to scale accordingly, i.e., $V_{dmax,RFC} = \frac{3}{2} V_{dmax,FC}$. From the circuit point of view, the wider $V_{dmax}$ is due to the mirrors Mm1-Mm2 and Mm3-Mm4, which provide both biasing and signal propagation, differently from what happens in the standard core, where the NMOS section only provides biasing currents. Electrical simulations confirm the theoretical behaviour, as shown in Figure 3b, where the $I_{od}(V_{id})$ characteristics are shown for both the FC and the RFC. The inset shows the actual FC and RFC characteristics, resulting in $I_{omax,RFC}/I_{omax,FC} = 3.13$ and $G_{m,RFC}/G_{m,FC} = 1.97$. The main plot is normalized to the maximum output current of each topology. The relative piecewise asymptotes are also reported for comparison. The extracted $V_{dmax}$ parameters are found to be 97.0 mV and 154.0 mV for the FC and the RFC, respectively, which are in good agreement with the expected scaling factor. The normalized $I_{od}$ values for $V_{id} = V_{dmax}$ are 77.6% (FC) and 77.2% (RFC), indicating that the impact of non-linearities on the model prediction accuracy is very similar in the two cases and that the analytical techniques proposed in [2,3] would be equally effective in mitigating the inaccuracy.
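The consistency between the simulated enhancement factors and the extracted thresholds can be checked with simple arithmetic. The sketch below uses only the figures quoted above and the relation $I_{omax} = G_m V_{dmax}$; the variable names are hypothetical.

```python
# Ideal enhancement factors of the RFC over the FC for k_R = 3.
GM_GAIN_IDEAL = 2.0
IOMAX_GAIN_IDEAL = 3.0
# Since I_omax = G_m * V_dmax, V_dmax scales as the ratio of the two gains.
vdmax_scale_ideal = IOMAX_GAIN_IDEAL / GM_GAIN_IDEAL

# Simulated enhancement factors quoted in the text.
IOMAX_GAIN_SIM = 3.13
GM_GAIN_SIM = 1.97
vdmax_scale_from_gains = IOMAX_GAIN_SIM / GM_GAIN_SIM

# Extracted thresholds quoted in the text (mV).
VDMAX_FC_MV = 97.0
VDMAX_RFC_MV = 154.0
vdmax_scale_extracted = VDMAX_RFC_MV / VDMAX_FC_MV

print(f"ideal scaling:      {vdmax_scale_ideal:.3f}")
print(f"from simulated I/G: {vdmax_scale_from_gains:.3f}")
print(f"from extracted V:   {vdmax_scale_extracted:.3f}")
```

The two simulated estimates agree with each other to within about 0.1% and sit roughly 6% above the ideal factor of 3/2.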

Energy Efficiency of OTA/SRE Systems
The transient of the SC circuit in Figure 1a implies a quantity of charge delivered to the effective load capacitance $C_L^*$ seen at the OTA's output, defined in (8). The magnitude of the total charge delivered from the OTA to $C_L^*$ depends on the total voltage swing at the output node, $\Delta V_{oL}$, where $\Delta V_o(0^+)$ and $\Delta V_o(\infty)$ are easily calculated from the initial charge redistribution and from the asymptotic value for an OTA with ideally infinite open-loop gain. Under the action of the fully-differential OTA (or OTA/SRE system), the power supply delivers the charge $Q_L$ given by (11). Note that in differential circuits the $\Delta V_{oL}$ variation is equally distributed between the output nodes ($V_{op}$ and $V_{on}$ in Figure 2a) around the common mode of the OTA. For the sake of clarity, let us assume that $V_{op}$ and $V_{on}$ undergo a variation of $+\frac{1}{2}\Delta V_{oL}$ and $-\frac{1}{2}\Delta V_{oL}$, respectively. The discharge at the $V_{on}$ node occurs through charge flow to the ground rail, so $Q_L$ is only given by the charge variation at the $V_{op}$ node. This fact accounts for the 1/2 factor in (11). Finally, the energy $E_L$ needed for the charge transfer is calculated considering (8)-(11). It is important to observe that $E_L$ is proportional to $\Delta V_S$ (see (10)) through a rather complex function of the capacitor network. While the $C_S$, $C_F$ and $C_L$ values derive directly from system-level specifications, $C_P$ is the result of a specific OTA design. A first-hand estimation of $E_L$, prior to any OTA design, can be made by neglecting $C_P$ in (8) and (10) and verifying the condition $C_P \ll C_S$ afterwards. In a system where the stochastic or pseudo-stochastic characteristics of $\Delta V_S$ are known, as in an SC ΔΣ modulator [18], $E_L$ can be used to estimate the energy needed for signal processing purposes, regardless of the overheads due to the employment of actual circuits. A simple electrical testbench can be employed to numerically calculate the actual energy, $E_{sup}$, drawn from the supply rail by the OTA or OTA/SRE system for a single transition step ($\Delta V_S$).
The normalized quantity $E_{sup}/E_L$ may be employed as a useful indicator to optimize the OTA or OTA/SRE system for its final application. In such a testbench, different OTA(/SRE) topologies can be tested for efficiency comparison, aiding the search for power optimization among different topological solutions.
As exemplary design cases, the specifications in Table 1 are assumed. We discuss the application of the SRE technique to both the FC and the RFC topologies, in comparison with the FC and the RFC alone, aiming to fulfill the same settling speed and precision ($t_S = 15$ ns, $\varepsilon_R = 100$ ppm). Regular NMOS and PMOS devices from the UMC 180-nm CMOS process under a 1.8-V supply are assumed. The comparison methodology starts by designing an FC OTA (FC1) compliant with the specifications in Table 1. The next step is to consider the FC/SRE system for $\eta = 10\%$, $C_B = 500$ fF and $k = 30$, which are a valid set of parameters for nearly optimum behaviour of the SRE [10]. For this configuration, the FC biasing currents and its input devices are scaled to maintain the same current density in the input devices and to attain the same settling time (FC2). A further step is to consider the RFC topology (RFC1), which embeds power-efficient class-AB behaviour. Finally, the RFC/SRE system is considered, with the corresponding scaling of currents and input devices (RFC2). Since the SRE is completely turned off in the last part of the settling, the noise, offset and gain properties of the original OTA are left unchanged.
Actual transistor parameters and biasing currents are also reported in Table 1. The $I_{tail}$ currents predicted by the model Equations (2), (4) and (7), with the mirror ratios $k = 30$ for the SRE and $k_R = 3$ for the RFC and with $\eta = 10\%$, are: 375 µA for FC1, 195 µA for FC2, 152 µA for RFC1 and 96 µA for RFC2. The FC-OTA parameters are $k_{AB,FC} = k_{G,FC} = 0.5$, while the RFC-OTA parameters are $k_{AB,RFC} = k_{G,RFC} = 1.5$, calculated by applying the definitions in (3). The input devices are biased in weak inversion in all cases, so that $V_{dmax,FC} = 98.8$ mV and $V_{dmax,RFC} = 148.2$ mV. Note that the SRE commutation threshold has been kept around $V_{dmax,FC}$ in both cases for the sake of simplicity; further optimization can be achieved in the SRE/RFC2 design. The parameter $k_{AB,SRE}$ has been estimated from the electrical simulations, due to the lack of a proper description of the effects of the capacitive-boosting technique in the modeling approach. As discussed in [10], the relation (5) is valid only when neglecting the turn-on and turn-off transients of the SRE circuit. The actual $k_{AB,SRE}$, calculated through electrical simulations, is found to be 41.5.
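The predicted tail currents can be turned into a rough estimate of the static supply current saved by adding the SRE. The sketch below assumes that the quoted $I_{tail}$ values refer to the OTA, that the OTA supply current scales proportionally to its $I_{tail}$ within a given topology, and that the SRE adds the overhead $\eta$ through $I_{sup} = (1 + \eta)I_{sup,OTA}$. The function name `relative_supply` is hypothetical, and the results are model estimates, not the simulated energy figures.

```python
ETA = 0.10  # SRE static current budget relative to the OTA

# Model-predicted tail currents quoted in the text (microamperes).
I_TAIL_UA = {"FC1": 375.0, "FC2": 195.0, "RFC1": 152.0, "RFC2": 96.0}


def relative_supply(i_tail_with_sre, i_tail_alone, eta=ETA):
    """Supply current of the OTA/SRE system relative to the OTA alone.

    Assumes the OTA supply current is proportional to its tail current
    and that the SRE adds a fraction eta of the scaled OTA current.
    """
    return (1.0 + eta) * i_tail_with_sre / i_tail_alone


fc_ratio = relative_supply(I_TAIL_UA["FC2"], I_TAIL_UA["FC1"])
rfc_ratio = relative_supply(I_TAIL_UA["RFC2"], I_TAIL_UA["RFC1"])
print(f"FC/SRE  vs FC : {fc_ratio:.3f} of the original supply current")
print(f"RFC/SRE vs RFC: {rfc_ratio:.3f} of the original supply current")
```

Under these assumptions the model anticipates a static-current saving of roughly 43% for the FC and 31% for the RFC, which is of the same order as the simulated energy reduction reported for both topologies.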
The $I_{tail}$ estimation is quite accurate for FC1 and FC2, while the evident underestimation for RFC1 and RFC2 can be ascribed to the simplistic modeling. More specifically, it derives from the phase-margin degradation of the RFC topology due to the non-dominant pole determined by the current mirrors Mm1-Mm2 and Mm3-Mm4 (see Figure 3a). In such a condition the single-pole OTA approximation used in Equation (1) is not accurate, reinforcing the need for more refined models that abstract the circuit behaviour including the presence of non-dominant singularities in the OTA frequency response [8]. Table 2 lists the results of the electrical simulations performed with Spectre/Cadence. As expected, the use of the SRE greatly enhances the $E_{sup}/E_L$ figure of merit in both cases, i.e., FC1 vs. FC2 + SRE and RFC1 vs. RFC2 + SRE, and it proved to be even more beneficial in the second case than in the first. Interestingly, the SRE coupled to the standard FC surpassed the efficiency of the RFC alone, proving to be a quite effective and versatile technique. In absolute terms, the energy reduction enabled by the SRE is 34% for both the FC and the RFC topology. The ratio $t_1/t_S$ has also been estimated using the model with the corrected $k_G$ values. In the first case, a reduction of almost 1/5 is obtained, while in the second case the slew-rate time is approximately divided by three. As already mentioned, offset and noise performance are not affected by the action of the SRE with respect to the OTAs considered individually. For this reason we do not report comparative figures in Table 2.

Conclusions
This work discusses energy-efficiency optimization by using parallel-type SRE circuits to assist single-stage OTAs in the charge transfer process. Detailed electrical simulations demonstrated that power savings greater than 30% are achieved both when using standard Class-A OTAs and when using more advanced OTA topologies such as the recycling folded cascode. The optimization process is aided by a simple model useful to fairly compare different OTA topologies. Model accuracy limitations, when the model is used to predict absolute power figures, are also discussed.