0.5-V Frequency Dividers in Folded MCML Exploiting Forward Body Bias: Analysis and Comparison

Two frequency divider architectures in Folded MOS Current Mode Logic (FMCML), which can operate at ultra-low supply voltage thanks to forward body bias, are presented, analyzed, and compared. The first architecture exploits nType and pType divide-by-two building blocks (DIV2s) without level shifters, whereas the second is based on a cascade of nType DIV2s with input level shifters. Both architectures were previously proposed by the same authors for higher supply voltages, but are able to work at a supply voltage as low as 0.5 V thanks to the threshold lowering allowed by forward body bias. For each architecture, analytical design strategies to optimize the divider under different operating scenarios are considered, and a comparison among all the treated case studies is presented. Simulation results for a commercial 28 nm FDSOI CMOS process are reported to confirm the advantages and features of the different architectures and design strategies. The analysis shows that the use of forward body bias allows the design of frequency dividers with the best efficiency. Moreover, we have found that the architecture based on nType and pType DIV2s without level shifters always provides better performance in terms of both speed and power consumption, approaching a maximum operating frequency of about 17 GHz with less than 30 μW of power consumption.


Introduction
Thanks to its very low switching noise and its intrinsic robustness, MOS current mode logic (MCML) is still a very popular digital circuit approach which finds use in a wide range of applications, from high-accuracy mixed-signal circuits to very high-speed integrated systems [1][2][3][4][5][6][7][8][9][10][11][12]. Indeed, in mixed-signal applications, the low switching noise dramatically reduces the digital noise induced on the analog circuits, and intrinsically provides better signal integrity [13,14].
Despite these features, one of the most important issues traditional MCML has to cope with is the dramatic reduction of the power supply in recently developed, deeply scaled technologies. Indeed, the stacking of several transistors adopted in conventional MCML is strongly limited by the low voltage environment, making conventional MCML unsuitable for supply voltages lower than 1 V.
To implement MCML topologies able to work with a reduced supply voltage, a solution based on a bipolar CML triple tail cell [26] was originally suggested and analyzed [27][28][29]. Moreover, recently, to implement a MCML suitable for low voltage operation,

FMCML Latch and FlipFlops Topologies
The topology of the conventional FMCML latch is depicted in Figure 2a, whereas Figure 2b shows the topology of the FMCML latch exploiting forward body bias (FBB) and the improved current mirror. The FBB technique is a widely adopted strategy not only in digital VLSI circuits (for example to cope with process parameter variations) [40,41], but also in analog design to improve circuit performance or lower the minimum allowable supply voltage [37,38]. The implementation of FBB requires a triple well CMOS technology, but it is worth noting that nowadays this is the standard for advanced CMOS processes.
In the FMCML latch scheme represented in Figure 2b, the improved current mirror which should be adopted in actual implementations is also explicitly indicated. In particular, transistors M7A and M8A equalize the drain-source voltages of M7 and M8 to strongly increase the current mirror accuracy under low voltage conditions, thus improving latch noise margin and propagation delay [30,39].
Starting from the topology in Figure 2b, the FMCML D Flip-Flop (DFF) with FBB has the scheme in Figure 3a [39]. This topology has the D and the clock inputs on NMOS and PMOS differential pairs, respectively, and will be denoted as nType DFF in the following. However, to implement the frequency divider architecture in [35], the pType FMCML DFF (which has the D and the clock inputs on PMOS and NMOS differential pairs, respectively) is also required. The topology of the pType FMCML DFF with FBB is reported in Figure 3b.


FMCML Latch Propagation Delay Model
Electronics 2021, 10, 1383
The propagation delay from the clock input node to the output of the FMCML latch, $t_{LATCH}$, is the key parameter to optimize the speed of the FMCML DFFs and to estimate the performance of the resulting frequency dividers. It can be calculated, as shown in [30,39], by using a simplified linearized model and the open-circuit time-constant method [42][43][44]. In particular, as detailed in [30], by inspection of the small-signal differential half-circuit model of the FMCML latch, we can find three main contributions to $t_{LATCH}$, which arise from the three circuit sections below (see Figure 2a):

• The clock input sub part, which gives the time constant $\tau_1$ and includes the differential pair $M_1$-$M_2$, loaded by the diode connected devices $M_7$-$M_8$;
• The folding section, given by the unity gain current mirrors and, in particular, the output transistors of the current mirror, $M_9$-$M_{10}$, which determines the time constant $\tau_2$;
• The output section, which gives the time constant $\tau_3$ and is due to the differential pair $M_3$-$M_4$, loaded by the triode devices $M_D$.
Thus, from these three contributions, whose detailed derivation can be found in [34,35], we get:
$$t_{LATCH} = \ln 2\,(\tau_1 + \tau_2 + \tau_3) \tag{1}$$
Following the MCML design strategy reported in [45], it can be demonstrated, as shown in [34,35], that the first two time constants, $\tau_1$ and $\tau_2$, do not depend on the latch bias current $I_{SS}$, while the third one, $\tau_3$, consists of a part that is inversely proportional to $I_{SS}$ and another that is constant. (As shown in [34,35], $\tau_3$ contains a third contribution, named $\tau_{RD}$ in those papers, which is related to the kind of load adopted (a MOS in triode region or a resistive load [46]) and may have a slight $I_{SS}$ dependence. However, this contribution is in any case negligible.) Hence, relationship (1) can be rewritten as:
$$t_{LATCH} = \ln 2\left(\tau_{const} + \frac{\Delta V\,C_{Lex}}{I_{SS}}\right) \tag{2}$$
where $\Delta V = R_D I_{SS}$ is the half voltage swing, $R_D$ is the equivalent resistance of the triode-biased MOS load $M_D$, $C_{Lex}$ is the extrinsic capacitive load (i.e., the load contribution which is not due to the latch itself, but to the circuit loading the latch) and
$$\tau_{const} = \tau_1 + \tau_2 + \tau_{MOS3} \tag{3}$$
is the constant contribution, where $\tau_{MOS3}$ also includes the time constant at the output node due to the intrinsic capacitive load, $R_D C_{Li}$. The intrinsic capacitive load $C_{Li}$ includes both the output capacitance of the DFF and the loading effect due to the DFF itself, which is closed in unitary feedback as shown in Figure 1b.
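The dependence of the latch delay on the bias current can be sketched numerically. The snippet below is a minimal illustration, assuming the delay is a constant term plus a term equal to the half swing times the extrinsic load divided by $I_{SS}$, all scaled by ln 2. The function name and every numeric value are hypothetical and chosen only to show the trend; they are not taken from the paper.

```python
import math


def latch_delay(tau_const, delta_v, c_lex, i_ss):
    """Latch delay in seconds: ln2 * (tau_const + delta_v * c_lex / i_ss)."""
    return math.log(2) * (tau_const + delta_v * c_lex / i_ss)


# Hypothetical values, for illustration only (not from the paper).
TAU_CONST = 40e-12  # s, bias-independent contribution
DELTA_V = 0.3       # V, half voltage swing
C_LEX = 0.2e-15     # F, extrinsic load capacitance

for i_ss in (4e-6, 8e-6, 16e-6):
    t = latch_delay(TAU_CONST, DELTA_V, C_LEX, i_ss)
    print(f"I_SS = {i_ss * 1e6:4.1f} uA -> t_LATCH = {t * 1e12:5.2f} ps")
```

With a dominant constant term, doubling the bias current only shaves a small fraction off the delay, which is the behavior the model predicts when the extrinsic load is light.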

Level Shifter Propagation Delay Model
The frequency divider architecture presented in [34] is based on the cascade of nType DIV2s. In such a configuration, the output of the i-th DIV2 (i.e., the output Q of the nType DFF) has to drive the clock input of the (i + 1)-th DIV2 (i.e., the CK input of the DFF). Since in the nType FMCML DFF depicted in Figure 3a the common mode voltage of the output Q is higher than the common mode voltage of the CK input, a source follower level shifter at the clock input of the DFF is needed in the nType DIV2 block for the architecture in [34] (see Figure 4).
Evaluation of the level shifter time constant and its inclusion in the DIV2 design is a key factor to achieve an optimized design. In particular, according to [30], the level shifter propagation delay is given by
$$t_{pLS} = \tau_p \ln 2 \tag{4}$$
where the time constant $\tau_p$ can be divided into two contributions:
• A constant part, $\tau_{Bconst}$, i.e., independent of the level shifter bias current, $I_B$, which accounts for all the capacitances except $C_{LB}$, the one which loads the level shifter but is extrinsic to it;
• The contribution related to the level shifter extrinsic load (i.e., the clock input capacitance of the latch), which is inversely proportional to the level shifter bias current $I_B$.
(The contribution due to the zero which arises from the circuit analysis can be shown to be negligible.) Since it can be demonstrated that the input capacitance of the latch is linearly related to the latch bias current $I_{SS}$, the level shifter propagation delay in (4) can be rewritten as:
$$t_{pLS} = \ln 2\left(\tau_{Bconst} + \frac{\tau_{BLex}}{K}\right) \tag{5}$$
where, like $\tau_{Bconst}$, $\tau_{BLex}$ is also independent of the level shifter bias current $I_B$, and $K = I_B/I_{SS}$ is the ratio between the level shifter bias current and the latch bias current.
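As a rough illustration of how the current ratio $K$ acts on the level shifter delay, the sketch below assumes the form implied by the text: a constant part plus a part inversely proportional to $K$, scaled by ln 2. The function name and the two time constants are hypothetical placeholders, not values from the paper.

```python
import math


def level_shifter_delay(tau_bconst, tau_blex, k):
    """Level shifter delay in seconds for a current ratio k = I_B / I_SS."""
    return math.log(2) * (tau_bconst + tau_blex / k)


# Hypothetical time constants, for illustration only.
TAU_BCONST = 10e-12  # s
TAU_BLEX = 10e-12    # s

for k in (0.1, 0.5, 1.0, 2.0, 3.0):
    t = level_shifter_delay(TAU_BCONST, TAU_BLEX, k)
    print(f"K = {k:3.1f} -> t_pLS = {t * 1e12:6.2f} ps")
```

The delay falls quickly for small $K$ and flattens out once the constant part dominates, which is why moderate values of $K$ already give most of the available speed.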


FMCML DIV2 Speed Performance
In general, the DIV2 maximum toggle frequency defines the speed performance of the static frequency divider [27,47]. Since the FMCML DFF in Figure 3 has a master-slave configuration (i.e., two cascaded D latches having counter-phase clock signals implemented through a single clock differential pair and current mirrors [39]), the static frequency divider speed performance is set by the clock-to-output propagation delay, $t_{CKQ}$, of the DFF (strictly related to the propagation delay from the clock input node to the output of the FMCML DFF slave latch, $t_{LATCH}$). Moreover, since the DIV2 is implemented through a unitary feedback DFF (Figure 1b), the signal at its clock input must exhibit a minimum period greater than two $t_{CKQ}$ for the DIV2 to work properly: starting from one clock edge, we need a $t_{CKQ}$ of the slave latch to obtain a stable output (the output of the master DFF) and another $t_{CKQ}$ from the input to obtain a stable intermediate output (the output of the master latch).
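The timing constraint above translates directly into a first-order estimate of the maximum input frequency, $f_{max} \approx 1/(2\,t_{CKQ})$. The sketch below applies it to the two $t_{CKQ}$ values reported later in the simulation section (31 ps and 34 ps). The helper name is hypothetical, and the estimate is conservative: the paper notes that the simulated minimum clock period is slightly lower than $2\,t_{CKQ}$.

```python
def max_input_frequency(t_ckq):
    """First-order bound: the clock period must exceed two t_CKQ."""
    return 1.0 / (2.0 * t_ckq)


for t_ckq_ps in (31.0, 34.0):
    f_max = max_input_frequency(t_ckq_ps * 1e-12)
    print(f"t_CKQ = {t_ckq_ps:.0f} ps -> f_max ~ {f_max / 1e9:.1f} GHz")
```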
In order to analyze in more detail the dependence of $t_{CKQ}$ on design parameters, we consider two possible implementations of the DIV2 block:
1. DIV2 block implemented by a unitary feedback DFF without level shifter on the clock input (Figure 3);
2. DIV2 block implemented by a unitary feedback DFF with level shifter on the clock input (Figure 4).
Referring to Case 1, in which the DIV2 is realized with a DFF without the input level shifter, the $t_{CKQ}$ of the DFF is given by (2) (i.e., $t_{CKQ} = t_{LATCH}$), where $C_{Lex}$ is the capacitive load due to the next DIV2. In general, especially for the applications under consideration, the constant part inside relationship (2) is dominant with respect to the other one [34] and the DFF speed is almost constant versus $I_{SS}$. Thus, even when optimizing the DIV2 performance, the DFF bias current should be set as low as possible. However, if the external load is so heavy that it degrades the DIV2 speed too much, its effect can be reduced by properly increasing the DFF bias current $I_{SS}$.
For Case 2, in which the DIV2 includes the level shifter at the clock input to provide the required common mode voltage level shift, a wider design scenario is opened. In particular, the $t_{CKQ}$ of the DFF is:
$$t_{CKQ} = \ln 2\left(\tau_{Bconst} + \frac{\tau_{BLex}}{K} + \tau_{const} + \frac{\Delta V\,C_{Lex}}{I_{SS}}\right) \tag{6}$$
Again, we can consider the last term of (6) negligible; the bias current $I_{SS}$ can, therefore, be set as low as possible to minimize the power consumption without affecting speed, and the only open variable to optimize the design is the ratio $K$, which in turn means properly setting the bias current of the input level shifter with respect to the minimum value of $I_{SS}$. In particular, if we want the DIV2 with the minimum $t_{CKQ}$, i.e., a delay determined almost entirely by $\tau_{Bconst} + \tau_{const}$, a sufficiently high $K$ has to be set, which may require an excessive current consumption.
On the other hand, a different design strategy can be pursued, in which the $K$ parameter is set to minimize the power delay product (PDP), defined as the product between the power consumption of the DIV2, $P_{DIV2}$, and the propagation delay $t_{CKQ}$ of the DFF which implements the DIV2 itself. The power consumption of the DIV2 with the level shifter at the input (Figure 4) is given by (7). Multiplying (7) by the approximated (6), in which the last term is neglected, the DIV2 PDP versus $K$ is a hyperbolic curve whose minimum is located at the value of $K$ given by (8), which is surely lower than one. Moreover, if we want to minimize the energy-delay product (EDP) (i.e., the optimum tradeoff between the energy per operation and speed), combining (7) and the approximated (6) we get (9), whose minimum, evaluated by setting to zero its derivative with respect to $K$, is located at the value of $K$ given by (10).
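The two optimizations can be illustrated with a small numerical search over $K$. This is only a sketch under stated assumptions: the power model `V_DD * I_SS * (A + B * K)` is a hypothetical stand-in that is linear in $K$ (the paper's own power expression is not reproduced here), the coefficients `A` and `B` and both time constants are invented for illustration, and the delay uses a constant part plus a part inversely proportional to $K$.

```python
import math

V_DD = 0.5         # V, supply voltage considered in the paper
I_SS = 6e-6        # A, latch bias current
A, B = 3.0, 2.0    # hypothetical coefficients of the assumed power model
TAU_C = 50e-12     # s, hypothetical constant part (tau_Bconst + tau_const)
TAU_BLEX = 10e-12  # s, hypothetical K-dependent part


def power(k):
    """Assumed DIV2 power, linear in K (not the paper's expression)."""
    return V_DD * I_SS * (A + B * k)


def delay(k):
    """Approximated t_CKQ with the I_SS-dependent load term neglected."""
    return math.log(2) * (TAU_C + TAU_BLEX / k)


def pdp(k):
    return power(k) * delay(k)


def edp(k):
    return power(k) * delay(k) ** 2


ks = [0.05 + 0.001 * n for n in range(4951)]  # K from 0.05 to 5.0
k_pdp = min(ks, key=pdp)
k_edp = min(ks, key=edp)
# For this assumed model the PDP optimum also has a closed form.
k_pdp_closed = math.sqrt(A * TAU_BLEX / (B * TAU_C))

print(f"K minimizing PDP: {k_pdp:.3f} (closed form {k_pdp_closed:.3f})")
print(f"K minimizing EDP: {k_edp:.3f}")
```

For these placeholder values the PDP optimum falls below one, and the EDP optimum sits at a higher $K$ because the delay is weighted twice.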

Architecture with nType and pType DIV2 without Level Shifters
A $2^N$ static frequency divider can be implemented by cascading $N$ DIV2 building blocks, but, if we want to use the simplest DIV2 without the input level shifter, we cannot use the same DIV2 type for each stage. Indeed, considering an nType DIV2, the output common mode voltage, $V_{CM,0}$ (equal to $V_{DD} - \Delta V/2$), is significantly higher than the maximum input common mode at the PMOS differential pair clock input (equal to $V_{DD} - |V_{TH}| - 2V_{DSsat}$). For example, with a deep submicron CMOS technology with $|V_{TH}|$ of about 0.25 V, which can be easily achieved by exploiting FBB, and assuming $V_{DSsat}$ to be about 50 mV and $\Delta V$ between 0.2 V and 0.3 V, the difference between the maximum allowable common mode at the clock input and the output common mode (i.e., $\Delta V/2 - |V_{TH}| - 2V_{DSsat}$) is negative.
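The numerical example above can be checked in a few lines. The sketch evaluates the margin $\Delta V/2 - |V_{TH}| - 2V_{DSsat}$ with the values quoted in the text ($|V_{TH}|$ = 0.25 V, $V_{DSsat}$ = 50 mV, $\Delta V$ = 0.2 V and 0.3 V); the function name is a hypothetical helper.

```python
def clock_cm_margin(delta_v, v_th, v_dssat):
    """Max clock-input common mode minus output common mode (V_DD cancels)."""
    return delta_v / 2 - abs(v_th) - 2 * v_dssat


for delta_v in (0.2, 0.3):
    margin = clock_cm_margin(delta_v, v_th=0.25, v_dssat=0.05)
    print(f"dV = {delta_v:.1f} V -> margin = {margin * 1e3:+.0f} mV")
```

The margin is negative by 200 mV to 250 mV over the whole swing range, so an nType DIV2 cannot directly drive the PMOS clock pair of another nType DIV2.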
The problem discussed above can be solved by alternating nType and pType DIV2 stages in the frequency divider architecture [35] (see Figure 5), since the output common mode of the nType DIV2 is compatible with the maximum input common mode at the NMOS differential pair clock input and vice versa. Of course, since the speed of the first DIV2 is crucial for the maximum divider operating frequency, the first DIV2 has to be an nType one, since it surely has a lower $t_{CKQ}$ than the pType one, due to the lower transition frequency of PMOS devices.

After the first nType DIV2, the second stage is a pType DIV2. Thus, in order to guarantee that the frequency divider speed performance is set by the first stage, condition (11), which relates the propagation delays of the nType and pType DIV2, $t_{CKQ,nType}$ and $t_{CKQ,pType}$, has to be satisfied. Moreover, since the DIV2 input capacitance is linearly related to the DIV2 bias current, expressing the DIV2 input capacitances at the clock inputs as:
$$C_{in,p,nType} = c_{in,p,nType}\,I_{SS,nType}, \qquad C_{in,n,pType} = c_{in,n,pType}\,I_{SS,pType} \tag{12}$$
from (2) we can write
$$t_{CKQ,nType} = \ln 2\left(\tau_{const} + c_{in,n,pType}\,\frac{I_{SS,pType}}{I_{SS,nType}}\,\Delta V\right) \tag{13}$$
If we want to design the frequency divider for maximum speed, hence the first nType DIV2 with the minimum $t_{CKQ,nType}$, from (13) we have to set $I_{SS,nType}$ sufficiently higher than $I_{SS,pType}$.
As shown in [35], when $I_{SS,nType} = I_{SS,pType}$ in (13), 3/4 of the contribution to $t_{CKQ,nType}$ is due to $\tau_{const}$; hence, an $I_{SS,nType}$ at least two times higher than $I_{SS,pType}$ allows a $t_{CKQ,nType}$ value very close to its minimum asymptotic value. Alternatively, if we want to minimize the PDP, which for this DIV2 without level shifters is given by (14), the minimum allowable $I_{SS,nType}$ has to be used. Indeed, since the constant term $\tau_{const}$ has the greatest weight in (13), from (14) the increase in power consumption with current is larger than the decrease in delay.
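The effect of the current ratio can be made concrete with a normalized version of the delay. The sketch assumes only what the text states: with equal bias currents, 3/4 of $t_{CKQ,nType}$ comes from the constant term, and the remaining share scales inversely with the ratio $I_{SS,nType}/I_{SS,pType}$. The function name is hypothetical, and delays are normalized to the equal-current case.

```python
def normalized_delay(ratio, const_share=0.75):
    """t_CKQ,nType relative to its value at I_SS,nType = I_SS,pType."""
    return const_share + (1.0 - const_share) / ratio


for ratio in (1.0, 2.0, 4.0):
    print(f"I_nType / I_pType = {ratio:.0f} -> "
          f"relative t_CKQ = {normalized_delay(ratio):.4f}")
```

Doubling the nType current recovers half of the available improvement (0.875 against an asymptote of 0.75), while the power of that stage doubles, which is why the minimum PDP design keeps the current at its minimum.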

Architecture Based on the Cascade of nType DIV2 with Input Level Shifter
If we design a frequency divider using DIV2 blocks with a level shifter at the input, the simplest procedure is to use identical nType DIV2 blocks designed for one of the optimizing conditions: maximum speed, minimum PDP or minimum EDP. In particular, for all three cases, $I_{SS}$ has its minimum allowable value, while $K$ (i.e., the bias current of the source follower input stage, $I_B$) has to be set according to:
• Highest speed, $K$ = 2-3;
• Minimum PDP, $K$ given by (8);
• Minimum EDP, $K$ given by (10).
A more efficient design strategy can be pursued by customizing the DIV2 blocks of the frequency divider, thus achieving the maximum speed performance with the minimum allowed power dissipation. Indeed, remembering that each DIV2 cell operates at half the frequency of the previous one, we can tune the bias current of the level shifter in each DIV2 accordingly. In particular, starting from (6) with its last term neglected, and assuming the first DIV2 is designed for maximum speed, i.e., $K_1 = 2$, the $K_i$ of the following stages can be derived from relationship (16) (see the Appendix of [34] for more details), whose $\alpha_i$ values are summarized in Table 1.

Preliminary Remarks and Comparison among the Topologies and Design Strategies
Preliminary considerations on the various frequency divider architectures and design strategies can be drawn by comparing the results from the original papers, in which FBB was not implemented and the supply voltage was 0.8 V [34,35]. The comparison can outline the direction and potential of each approach, to be further confirmed for the ultra-low voltage implementation allowed by the FBB technique.
Considering the divide-by-16 frequency divider presented in [35], whose architecture exploits nType and pType DIV2 blocks without level shifters, two cases of interest can be considered. The first design case targets the minimum PDP and the other one the maximum speed performance. A third design case, the minimum power one, in which all the DIV2 bias currents are set to the minimum allowable value, could also be considered, but as shown in [35] this case is not of practical value because the speed achieved is too low.
For the first design case, the first DIV2, which is an nType one, is designed for the minimum PDP (i.e., the minimum allowable $I_{SS,nType}$, equal to 5 µA, is set for the first DIV2). Moreover, to satisfy (11), the second DIV2, which is a pType one, has a slightly higher bias current (the optimal current is 7 µA in the considered design) and all the following DIV2 blocks, regardless of their type, have the minimum allowable bias current (i.e., 5 µA). In the second case, at the price of a higher power consumption, the first nType DIV2 is designed for the minimum propagation delay, which needs a bias current about equal to two times the optimal current of the following pType DIV2 (i.e., 14 µA), and all the other DIV2 blocks are biased with the same current level as in the previous design case. The results in [35] show that the minimum PDP design reaches a maximum operating frequency of 10.5 GHz with a power consumption of 52.8 µW, whereas the maximum speed design operates up to 12.2 GHz with a power consumption of 74.4 µW. It is worth noting that, as stated in [35], the divider power consumption increases linearly with the number of stages $N$ and, except for the initial offset due to the contribution of the first two DIV2 blocks, the increase is due to the power required by a minimum power DIV2.
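The power figures quoted from [35] can be cross-checked against the listed bias currents. The sketch below assumes that each DIV2 draws three times its bias current from the 0.8 V supply. This factor is not stated in the text; it is an assumption, chosen because it reproduces both reported totals. The helper name is hypothetical.

```python
V_DD = 0.8    # V, supply voltage of the designs in [35]
BRANCHES = 3  # assumption: each DIV2 draws 3 * I_SS (not stated in the text)


def divider_power_uw(bias_currents_ua, v_dd=V_DD, branches=BRANCHES):
    """Total divider power in microwatts from per-stage currents in uA."""
    return branches * v_dd * sum(bias_currents_ua)


min_pdp = divider_power_uw([5, 7, 5, 5])     # minimum PDP design
max_speed = divider_power_uw([14, 7, 5, 5])  # maximum speed design

print(f"minimum PDP design:   {min_pdp:.1f} uW (reported: 52.8 uW)")
print(f"maximum speed design: {max_speed:.1f} uW (reported: 74.4 uW)")
```

Under this assumption every additional minimum-current stage adds the same fixed amount of power, consistent with the linear growth with the number of stages noted above.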
Considering the frequency divider architecture based on the cascade of nType DIV2 blocks with a level shifter at the clock input discussed in [34], inspection of the results reported for a divide-by-eight frequency divider shows that only the cases with DIV2 blocks customized according to (16) seem of interest, since the others, despite the simpler design procedure, pay a non-negligible price in terms of power consumption. According to the results summarized in that paper (not reported here for brevity), all these designs consume more power than the cases with nType and pType DIV2. In addition, the power consumption again increases linearly with $N$, but with a steeper slope, because even the minimum power DIV2 consumes more (mainly due to the level shifter bias current).
On the other hand, despite being more energy hungry, the architecture with a cascade of nType DIV2 with level shifters at the input is able to achieve the best speed performance if designed for this target. Of course, we expect almost the same behavior also when applying the FBB. To verify this point, we consider in the following both the divider architectures, and in particular the design cases of maximum speed and minimum PDP with customized DIV2 blocks, that seem to be the most significant.

Simulation Results and Comparison
In this section we report the simulation results of the frequency dividers by 16 based on the FMCML latches exploiting FBB, designed according to the different approaches and design guidelines described in the previous sections.
To quantitatively evaluate and compare the frequency divider architectures using DIV2 with FBB, the commercial 28 nm FD-SOI CMOS technology by STMicroelectronics [48] has been considered. The main parameters of this technology are summarized in Table 2. Since we are adopting an FD-SOI CMOS process, the body-to-source voltage is not limited by the forward biasing of bulk diodes, and $V_{BBP}$ can be set lower than ground and $V_{BBN}$ higher than $V_{DD}$ to maximize the threshold lowering effect due to FBB. In particular, $V_{BBN}$ has been set to 1 V and $V_{BBP}$ to −1.5 V in order to have $V_{TN}$ about equal to $|V_{TP}|$, around 0.25 V. The adopted values are in the range allowed by the technology (which is −3 V to 3 V as reported in [49]) and can be implemented by suitable back bias generators as in [50].
Simulations have been carried out in the Cadence Virtuoso environment by using the Spectre simulator and the accurate models provided by the IC manufacturer.
A commonly used FOM in the literature to compare the performance of frequency dividers is simply given by:
$$FOM_1 = \frac{f_{max}}{P_{TOT}}$$
where $f_{max}$ is the maximum operating frequency and $P_{TOT}$ the total power consumption. However, $FOM_1$ is mainly adopted to compare divide-by-two frequency dividers; when dividers with different division factors $N_{DIV}$ have to be compared, some sort of normalization to the number of cascaded DIV2 building blocks is required. To allow a better comparison among the different designs, we have also adopted the figure of merit $FOM_2$ defined in (19) [35], which takes into account the maximum operating frequency, the total power consumption and the division factor $N_{DIV}$ (the base-2 logarithm of the division factor is the number of DIV2 stages).
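As a small worked example of the first figure of merit, the sketch below assumes the plain ratio $f_{max}/P_{TOT}$ and applies it to the two 0.8 V designs from [35] quoted earlier (10.5 GHz at 52.8 µW and 12.2 GHz at 74.4 µW). The function name and the choice of GHz/mW as unit are illustrative assumptions.

```python
def fom1_ghz_per_mw(f_max_ghz, p_tot_uw):
    """Ratio of maximum frequency to total power, in GHz/mW."""
    return f_max_ghz / (p_tot_uw * 1e-3)


designs = {
    "minimum PDP design [35]": (10.5, 52.8),
    "maximum speed design [35]": (12.2, 74.4),
}
for name, (f_max_ghz, p_tot_uw) in designs.items():
    print(f"{name}: FOM1 = {fom1_ghz_per_mw(f_max_ghz, p_tot_uw):.1f} GHz/mW")
```

The minimum PDP design scores higher on this metric even though it is slower, because its power saving outweighs the loss in speed.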

Architecture with nType and pType DIV2 Without Level Shifters and Exploiting FBB
Both nType and pType DFFs (used to implement the DIV2 blocks) have been designed exploiting FBB, which allowed a supply voltage as low as 0.5 V; the voltage swing has been set to 0.6 V ($\Delta V$ = 0.3 V) and a triode-biased MOS has been used as load device. The minimum current to avoid operating the devices in subthreshold is 4 µA in both cases (note that the lower power supply and the use of FBB allow a minimum current slightly lower than the value in the authors' previous paper, equal to 5 µA), whereas the optimum current, corresponding to minimum size load devices and hence minimum propagation delay [46], is 6 µA and 5.5 µA for nType and pType latches, respectively. Table 3 reports the sizing of the devices of the DFFs (see Figure 3) in the case of optimum bias current (hence minimum PDP). The frequency divider architecture using nType and pType DIV2 blocks with FBB, as in Figure 5, follows the design procedure in [35], summarized in the previous section. In particular, the bias current of the first nType DIV2 is set to 6 µA for the minimum PDP design and to 11 µA (twice the current of the pType DFF) for the maximum speed design. The second pType DIV2 is biased at its optimum current of 5.5 µA, and the other two nType and pType DIV2 are biased at the minimum bias current that allows strong inversion operation, i.e., 4 µA. In these conditions, the propagation delay $t_{CKQ}$ of the nType DFF biased at 11 µA and loaded by the pType DFF biased at 5.5 µA has been found to be 31 ps, whereas the $t_{CKQ}$ of the nType DFF biased at 6 µA and loaded by the pType DFF biased at 5.5 µA has been found to be 34 ps.
The bias currents together with the resulting divider performance are summarized in Table 4. It can be observed that in this case the advantage of biasing the first DIV2 stage for maximum speed is very limited (an increase in the maximum speed slightly lower than 5%). This is in agreement with the theoretical results in [35], where it is shown that the time constant $\tau_{3MOS,nType}$ is dominant and increasing $I_{SS,nType}$ results only in a small improvement. Results in Table 4 also confirm that $T_{CK,MIN}$ is slightly lower than $2t_{CKQ}$, as pointed out in [35].
Table 4. Summary of dividers by 16 with nType and pType DIV2 without level shifters (with FBB).

Architecture with Only nType DIV2 with Input Level Shifters and Exploiting FBB
The divide-by-16 frequency divider using only nType DIV2 blocks with input level shifters and FBB follows the guidelines described in [34] and summarized in the previous section. We have considered nType DFFs biased at the optimum current of 6 µA (design parameters are the same as those of the nType DFF discussed in the previous subsection); the input level shifters (see Figure 4) exploit FBB and feature devices with minimum gate length and a gate width of 100 nm for each microampere of bias current $I_B$. It has to be noted that the required voltage shift is slightly lower than the threshold value, thus forcing the level shifters to work in the near-threshold region: this requires a lower current density for such devices and, hence, larger gate widths than those of the devices in the latches for similar current levels.
For the case study of the divide-by-16 frequency divider, only the procedures customizing the DIV2s have been considered, in the light of the results reported in [34]. We have considered both the maximum speed and the minimum PDP cases to set the bias current of the first level shifter, whereas for the following stages we have used the design guideline given by (16); we have, however, limited the minimum current ratio $K$ to 0.1 (i.e., in this case a 600 nA bias current for each level shifter). The resulting bias current ratios $K = I_B/I_{SS}$ together with the divider performance are summarized in Table 5. The output waveforms of the divide-by-16 frequency dividers reported in Tables 4 and 5 are shown in Figure 6a,b, respectively. Each plot contains the output of the maximum operating frequency design and that of the minimum PDP design, each at its respective maximum operating frequency.
Figure 6. Output waveforms of the divide-by-16 frequency dividers with nType and pType DIV2 blocks without level shifters (a) and with only nType DIV2 with input level shifters (b). Solid lines refer to the maximum speed design, dashed lines to the minimum PDP design.
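The sizing of the level shifters described above (a current ratio K = I_B/I_SS with a floor of 0.1, and a gate width proportional to I_B) can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the ideal K values would come from guideline (16), which is not reproduced here, and the width rule is read as 100 nm of gate width per microampere of bias current. The ratios 2.0 and 0.25 correspond to the 12 µA and 1.5 µA level-shifter currents quoted in the paper for I_SS = 6 µA, while 0.05 is an arbitrary value chosen to show the floor.

```python
def level_shifter_bias(i_ss_ua: float, k_ideal: float, k_min: float = 0.1):
    """Return (K, I_B in uA, gate width in nm) for one input level shifter.

    k_ideal is the current ratio suggested by the design guideline;
    it is clamped so that K never drops below k_min.
    Assumption: gate width scales as 100 nm per microampere of I_B.
    """
    k = max(k_ideal, k_min)
    i_b_ua = k * i_ss_ua
    width_nm = 100.0 * i_b_ua
    return k, i_b_ua, width_nm


if __name__ == "__main__":
    i_ss_ua = 6.0  # optimum nType DFF bias current from the text
    for k_ideal in (2.0, 0.25, 0.05):  # 0.05 is a hypothetical value
        k, i_b, w = level_shifter_bias(i_ss_ua, k_ideal)
        print(f"K_ideal={k_ideal:.2f} -> K={k:.2f}, "
              f"I_B={i_b:.2f} uA, W={w:.0f} nm")
```

With I_SS = 6 µA, any ideal ratio below 0.1 is clamped to K = 0.1, which gives the 600 nA minimum level-shifter current mentioned in the text.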

Effects of Process, Supply Voltage and Temperature Variations
To assess the robustness of the proposed circuits to process, supply voltage, and temperature (PVT) variations, we have performed parametric and corner simulations of the t_CKQ of both the nType and pType DFFs that implement the first DIV2 stages of the maximum speed designs in Tables 4 and 5.
The nType DFF biased with I_SS = 11 µA and loaded by a pType DFF biased with I_SS = 5.5 µA exhibits t_CKQ = 31.1 ps in typical conditions, which ranges from 33 ps to 28.5 ps when the temperature is swept between −20 °C and 120 °C. When the supply voltage is changed from 0.45 V to 0.55 V (±10% variation), t_CKQ ranges from 37.5 ps to 29.5 ps.
The pType DFF biased with I_SS = 5.5 µA and loaded by an nType DFF biased with I_SS = 4 µA exhibits a nominal t_CKQ = 85.6 ps, which ranges from 96.3 ps to 70.6 ps when the temperature is swept between −20 °C and 120 °C. When the supply voltage is changed from 0.45 V to 0.55 V, t_CKQ ranges from 128 ps to 65 ps.
The nType DFF biased with I_SS = 6 µA, with the input level shifter biased with I_B = 12 µA and loaded by the following DIV2 stage (i.e., input level shifter biased with I_B = 1.5 µA and nType DFF biased with I_SS = 6 µA), exhibits t_CKQ = 55.0 ps in typical conditions, which ranges from 51.0 ps to 55.5 ps when the temperature is swept between −20 °C and 120 °C. When the supply voltage is changed from 0.45 V to 0.55 V, t_CKQ ranges from 41 ps to 55.1 ps.
As a further check, we have performed corner simulations, and the results are summarized in Table 6 for all the considered DFFs. The parametric and corner simulations confirm the robustness of the proposed dividers to PVT variations, although some performance degradation is evident in the pType DFF when the supply voltage is 10% lower than the nominal value. These results show a greater sensitivity of the pType DFF (and hence of the nType/pType architecture) to the supply voltage. In particular, reducing the supply voltage results in a net increase of t_CKQ. This is probably due to some of the devices approaching the triode region, and could be counteracted by scaling the body bias voltages with the supply voltage to further reduce the threshold.
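The sensitivities discussed above can be made more explicit by expressing each t_CKQ excursion as a percentage of the nominal value. The short script below is a sketch for illustration only: the numbers are those quoted in this section, while the dictionary layout, the labels, and the helper name are assumptions introduced here.

```python
# t_CKQ values in ps, taken from the text: nominal value, then the two
# extremes of the temperature sweep (-20 C, 120 C) and of the supply
# sweep (0.45 V, 0.55 V).
T_CKQ_PS = {
    "nType DFF, no level shifter": {
        "nominal": 31.1, "temp": (33.0, 28.5), "vdd": (37.5, 29.5)},
    "pType DFF, no level shifter": {
        "nominal": 85.6, "temp": (96.3, 70.6), "vdd": (128.0, 65.0)},
    "nType DFF, input level shifter": {
        "nominal": 55.0, "temp": (51.0, 55.5), "vdd": (41.0, 55.1)},
}


def worst_case_deviation_pct(nominal: float, extremes) -> float:
    """Largest absolute deviation from nominal, in percent of nominal."""
    return max(abs(x - nominal) for x in extremes) / nominal * 100.0


if __name__ == "__main__":
    for name, d in T_CKQ_PS.items():
        temp = worst_case_deviation_pct(d["nominal"], d["temp"])
        vdd = worst_case_deviation_pct(d["nominal"], d["vdd"])
        print(f"{name}: temperature {temp:.1f}%, supply {vdd:.1f}%")
```

With these figures the pType DFF shows the largest supply-induced deviation (close to 50% of its nominal t_CKQ), in line with the observation that the nType/pType architecture is the more sensitive one to supply voltage.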

Final Comparison and Remarks
A comparison of the results in Tables 4 and 5 shows that, when FBB is used, the divider architecture based on the cascade of complementary DFFs without level shifters provides both a higher maximum frequency and a lower power consumption. This differs from what was observed in [34,35], where the use of level shifters allowed a higher maximum frequency at the expense of increased power. The difference is due to the fact that the low supply voltage adopted here requires large devices in the level shifters, which operate in the near-threshold region. The level shifters are therefore no longer able to minimize the loading effect of a DFF on the previous one in the cascade, and instead add a contribution to the overall propagation delay t_CKQ.
In particular, comparing the best FOMs of the dividers with only nType DIV2 blocks with input level shifters and of the dividers with nType and pType DIV2 blocks without level shifters, the FOM of the latter architecture is found to be about 65% higher. Moreover, comparing the best-speed cases that allow the same speed performance for the two architectures, we find that the divider with nType and pType DIV2 blocks has a FOM close to 40% higher than that of the topology with only nType DIV2 blocks.
Table 7 compares the performance of the proposed dividers with that of CMOS frequency dividers operating in the multi-GHz range. The comparison shows a very high efficiency for the FMCML dividers exploiting FBB, as highlighted by the FOM. Both topologies with forward body bias outperform the original topologies in [34,35] without forward body bias in terms of both power consumption and speed. Moreover, especially for the frequency divider architecture using nType and pType DIV2 with FBB, the values are much better even than those of a true single-phase clock (TSPC) static frequency divider implementation working at a lower supply voltage. A maximum operating frequency beyond 15 GHz is achieved; this is, however, lower than that of many other dividers in Table 7, which are optimized to work at very high frequencies.

Conclusions
Two frequency divider architectures in MOS Current Mode Logic suited for ultra-low voltage operation have been developed, analyzed, and compared. Both architectures are based on the FMCML DFF, in which forward body bias is exploited to further reduce the supply voltage to a value as low as 0.5 V. The adoption of forward body bias improves the original architectures previously proposed by the same authors in [34,35] and achieves better performance in terms of both power consumption and speed.
Four case studies of a divide-by-16 circuit, the two most significant for each architecture, have been presented with reference to a 28 nm FDSOI CMOS process. The results, reported in Tables 4 and 5, show that, unlike the case without forward body bias, the dividers with nType and pType DIV2 blocks outperform the architecture with only nType DIV2 blocks and input level shifters. Moreover, both architectures, which are well suited to realizing frequency dividers in the 12-17 GHz range with a power consumption lower than 50 µW, show an efficiency in terms of the considered FOM significantly higher than that of the other dividers in the literature (see the results summarized in Table 7).

Conflicts of Interest:
The authors declare no conflict of interest.