<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xml:lang="en" article-type="review-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Journal of Low Power Electronics and Applications</journal-id>
<journal-title>Journal of Low Power Electronics and Applications</journal-title>
<issn pub-type="epub">2079-9268</issn>
<publisher>
<publisher-name>Molecular Diversity Preservation International (MDPI)</publisher-name></publisher></journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3390/jlpea1010001</article-id>
<article-id pub-id-type="publisher-id">jlpea-01-00001</article-id>
<article-categories>
<subj-group>
<subject>Review</subject></subj-group></article-categories>
<title-group>
<article-title>Robust and Energy-Efficient Ultra-Low-Voltage Circuit Design under Timing Constraints in 65/45 nm CMOS</article-title></title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Bol</surname><given-names>David</given-names></name></contrib>
<aff id="af1-jlpea-01-00001">ICTEAM institute, Université catholique de Louvain, Place du Levant 3, Louvain-la-Neuve, Belgium; E-Mail: <email>david.bol@uclouvain.be</email>; Tel.: +32-10472148; Fax: +32-10472598</aff></contrib-group>
<pub-date pub-type="collection">
<year>2011</year></pub-date>
<pub-date pub-type="epub">
<day>25</day>
<month>01</month>
<year>2011</year></pub-date>
<volume>1</volume>
<issue>1</issue>
<fpage>1</fpage>
<lpage>19</lpage>
<history>
<date date-type="received">
<day>23</day>
<month>11</month>
<year>2010</year></date>
<date date-type="rev-recd">
<day>18</day>
<month>01</month>
<year>2011</year></date>
<date date-type="accepted">
<day>21</day>
<month>01</month>
<year>2011</year></date></history>
<permissions>
<copyright-statement>© 2011 by the author; licensee MDPI, Basel, Switzerland.</copyright-statement>
<copyright-year>2011</copyright-year>
<license>
<p>This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).</p></license></permissions>
<abstract>
<p>Ultra-low-voltage operation improves energy efficiency of logic circuits by a factor of 10×, at the expense of speed, which is acceptable for applications with low-to-medium performance requirements such as RFID, biomedical devices and wireless sensors. However, in 65/45 nm CMOS, variability and short-channel effects significantly harm robustness and timing closure of ultra-low-voltage circuits by reducing noise margins and jeopardizing gate delays. The consequent guardband on the supply voltage to meet a reasonable manufacturing yield potentially ruins energy efficiency. Moreover, high leakage currents in these technologies degrade energy efficiency in case of long stand-by periods. In this paper, we review recently published techniques to design robust and energy-efficient ultra-low-voltage circuits in 65/45 nm CMOS under relaxed yet strict timing constraints.</p></abstract>
<kwd-group>
<kwd>digital CMOS circuits</kwd>
<kwd>ultra-low power</kwd>
<kwd>subthreshold logic</kwd>
<kwd>variability</kwd>
<kwd>leakage currents</kwd>
<kwd>yield</kwd></kwd-group></article-meta></front>
<body>
<sec sec-type="intro">
<label>1.</label>
<title>Introduction</title>
<p>Low power consumption is now paramount for digital integrated circuits. High-performance chips such as multi-core processors for servers are power constrained by the die temperature limit and by both cooling and electricity costs [<xref ref-type="bibr" rid="b1-jlpea-01-00001">1</xref>]. Portable applications such as smart phones have an even tighter power budget owing to battery-life concerns, which drove innovation in advanced power management techniques during the last decade [<xref ref-type="bibr" rid="b2-jlpea-01-00001">2</xref>]. Besides these mainstream designs stands another chip category: ultra-low-power circuits for applications such as RFID, biomedical devices and sensor networks [<xref ref-type="bibr" rid="b3-jlpea-01-00001">3</xref>]. These applications share a minute power budget, from a few nW to hundreds of <italic>μ</italic>W, as the circuits must either operate on tiny batteries (&lt;1 cm<sup>3</sup> [<xref ref-type="bibr" rid="b4-jlpea-01-00001">4</xref>]) or harvest energy from their environment [<xref ref-type="bibr" rid="b5-jlpea-01-00001">5</xref>]. Fortunately, these applications feature low-to-medium speed requirements with target clock frequencies <italic>f<sub>target</sub></italic> from 10 kHz to 50 MHz, depending on the application and circuit topology. These relaxed speed constraints give room for power savings beyond simple frequency scaling or duty-cycled operation. Indeed, the supply voltage <italic>V<sub>dd</sub></italic> can be scaled down to reduce the energy required to switch on-chip capacitances at each clock cycle 
<inline-formula>
<mml:math id="mm1" display="inline">
<mml:semantics id="sm1">
<mml:mrow>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mrow>
<mml:mi>s</mml:mi>
<mml:mi>w</mml:mi></mml:mrow></mml:msub>
<mml:mo>∝</mml:mo>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mi>L</mml:mi></mml:msub>
<mml:msubsup>
<mml:mi>V</mml:mi>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi></mml:mrow>
<mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:semantics></mml:math></inline-formula> as the associated delay penalty is acceptable given the relaxed cycle time <italic>T<sub>cycle</sub></italic> at low-to-medium <italic>f<sub>target</sub></italic>. Ultra-low-voltage operation is the extreme case where <italic>V<sub>dd</sub></italic> is aggressively scaled down to 0.3–0.5 V with potential energy savings above 10× when compared to nominal-<italic>V<sub>dd</sub></italic> operation at 1–1.2 V.</p>
<p>Ultra-low-voltage (ULV) operation was proposed in the 1970s [<xref ref-type="bibr" rid="b7-jlpea-01-00001">7</xref>,<xref ref-type="bibr" rid="b8-jlpea-01-00001">8</xref>] and brought back into the spotlight for digital circuits in 1999 at the <italic>Int. Symp. on Low-Power Electronics and Design</italic> [<xref ref-type="bibr" rid="b9-jlpea-01-00001">9</xref>]. When <italic>V<sub>dd</sub></italic> is reduced to or below the threshold voltage <italic>V<sub>t</sub></italic>, MOSFETs start to operate in the near-threshold or subthreshold regime [<xref ref-type="bibr" rid="b8-jlpea-01-00001">8</xref>,<xref ref-type="bibr" rid="b9-jlpea-01-00001">9</xref>]. As the subthreshold <italic>I<sub>on</sub></italic> current is exponentially dependent on <italic>V<sub>dd</sub></italic>, the gate delay dramatically increases. As shown in <xref ref-type="fig" rid="f1-jlpea-01-00001">Figure 1</xref>, this significantly reduces the maximum clock frequency of digital circuits. The resulting <italic>T<sub>cycle</sub></italic> penalty also has a detrimental side effect on the total energy per cycle, which comprises switching and leakage contributions: <italic>E<sub>cycle</sub></italic> = <italic>E<sub>sw</sub></italic> + <italic>E<sub>leak</sub></italic>. Indeed, the leakage energy increases when reaching the subthreshold regime as it results from the integration of leakage power over <italic>T<sub>cycle</sub></italic>: <italic>E<sub>leak</sub></italic> = <italic>V<sub>dd</sub>I<sub>leak</sub></italic> × <italic>T<sub>cycle</sub></italic>. There is thus an optimum supply voltage <italic>V<sub>min</sub></italic>, which minimizes the energy to an <italic>E<sub>min</sub></italic> level [<xref ref-type="bibr" rid="b10-jlpea-01-00001">10</xref>], as depicted in <xref ref-type="fig" rid="f1-jlpea-01-00001">Figure 1</xref>. 
The <italic>V<sub>min</sub></italic> level typically lies between 0.25 and 0.5 V depending on the ratio between <italic>E<sub>sw</sub></italic> and <italic>E<sub>leak</sub></italic>, which varies according to circuit parameters and technology characteristics through the total leakage current <italic>I<sub>leak</sub></italic>, the average switched capacitance per cycle <italic>C<sub>L</sub></italic>, the gate delay and the number of gates in the critical path [<xref ref-type="bibr" rid="b11-jlpea-01-00001">11</xref>]. This concept, known as the minimum-energy point, has received a lot of attention in the research community during the last decade [<xref ref-type="bibr" rid="b3-jlpea-01-00001">3</xref>,<xref ref-type="bibr" rid="b12-jlpea-01-00001">12</xref>] with numerous successful ULV chip implementations: microcontrollers for biomedical applications [<xref ref-type="bibr" rid="b13-jlpea-01-00001">13</xref>,<xref ref-type="bibr" rid="b14-jlpea-01-00001">14</xref>] and for wireless sensor nodes [<xref ref-type="bibr" rid="b5-jlpea-01-00001">5</xref>,<xref ref-type="bibr" rid="b15-jlpea-01-00001">15</xref>] as well as dedicated ASICs for biomedical applications [<xref ref-type="bibr" rid="b16-jlpea-01-00001">16</xref>,<xref ref-type="bibr" rid="b17-jlpea-01-00001">17</xref>], communication [<xref ref-type="bibr" rid="b18-jlpea-01-00001">18</xref>], image processing [<xref ref-type="bibr" rid="b19-jlpea-01-00001">19</xref>,<xref ref-type="bibr" rid="b20-jlpea-01-00001">20</xref>] or RFIDs [<xref ref-type="bibr" rid="b21-jlpea-01-00001">21</xref>].</p>
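<p>The trade-off behind the minimum-energy point can be illustrated numerically. The following Python sketch is not taken from the cited works: it assumes a purely subthreshold <italic>CV</italic>/<italic>I</italic> delay model without DIBL, and every parameter value (gate capacitance, gate count, activity factor, logic depth, 100 mV/decade subthreshold swing) is a hypothetical choice made only so that the toy optimum falls inside the <italic>V<sub>min</sub></italic> range quoted above.</p>
<preformat>
```python
def energy_per_cycle(vdd, c_gate=1e-15, n_gates=1000, activity=0.02,
                     logic_depth=40, swing=0.1):
    """Return (E_sw, E_leak) in joules for a toy subthreshold model.

    All default parameter values are hypothetical illustration values.
    """
    # Switching energy: only a fraction `activity` of the gates toggles per cycle.
    e_sw = activity * n_gates * c_gate * vdd ** 2
    # Leakage energy: Vdd * I_leak * T_cycle, where T_cycle is the CV/I delay
    # of `logic_depth` gates; the reference current I0 cancels out and leaves
    # the 10 ** (-Vdd / swing) on/off current ratio.
    e_leak = n_gates * logic_depth * c_gate * vdd ** 2 * 10 ** (-vdd / swing)
    return e_sw, e_leak


def minimum_energy_point(v_lo=0.15, v_hi=0.80, step=0.001):
    """Grid-search the supply voltage that minimizes E_sw + E_leak."""
    n_steps = int(round((v_hi - v_lo) / step))
    candidates = [v_lo + i * step for i in range(n_steps + 1)]
    v_min = min(candidates, key=lambda v: sum(energy_per_cycle(v)))
    return v_min, sum(energy_per_cycle(v_min))


if __name__ == "__main__":
    v_min, e_min = minimum_energy_point()
    print("V_min = %.3f V, E_min = %.3e J" % (v_min, e_min))
```
</preformat>
<p>With these assumed values, the search should settle at a supply voltage in the vicinity of 0.38 V. Raising the logic depth or lowering the activity factor increases the weight of <italic>E<sub>leak</sub></italic> and pushes the optimum towards a higher <italic>V<sub>dd</sub></italic>, which is the qualitative dependence on the <italic>E<sub>sw</sub></italic>/<italic>E<sub>leak</sub></italic> ratio described above.</p>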
<p>Along with this ULV trend, new CMOS technology nodes have been introduced to maintain the historical increase in on-chip device density. Unfortunately, in nanometer CMOS technologies, reaching <italic>E<sub>min</sub></italic> in practice raises important challenges because ULV operation magnifies the sensitivity of circuits to MOSFET variability, short-channel effects and leakage currents [<xref ref-type="bibr" rid="b6-jlpea-01-00001">6</xref>,<xref ref-type="bibr" rid="b12-jlpea-01-00001">12</xref>]. Several design solutions have recently been proposed to reliably operate nanometer CMOS logic circuits at ultra-low voltage under relaxed yet strict timing constraints: gate-length upsizing [<xref ref-type="bibr" rid="b6-jlpea-01-00001">6</xref>], process flavor [<xref ref-type="bibr" rid="b22-jlpea-01-00001">22</xref>] and MOSFET selection [<xref ref-type="bibr" rid="b23-jlpea-01-00001">23</xref>], circuit adaptation [<xref ref-type="bibr" rid="b22-jlpea-01-00001">22</xref>] and power gating [<xref ref-type="bibr" rid="b24-jlpea-01-00001">24</xref>]. In this paper, we provide for the first time a unified review of:
<list list-type="bullet">
<list-item>
<p>The pitfalls of nanometer ULV circuits limiting their minimum <italic>V<sub>dd</sub></italic> for functional robustness and timing closure;</p></list-item>
<list-item>
<p>The detrimental impact of stand-by periods on energy efficiency;</p></list-item>
<list-item>
<p>The proposed techniques to overcome these limitations.</p></list-item></list></p>
<p>We specifically target 65 and 45 nm CMOS nodes as they share many characteristics: multiple process flavors, std-<italic>κ</italic> oxide/poly-Si gate stack and strained-Si, which give similar behaviors at ultra-low voltage as shown in <xref ref-type="fig" rid="f1-jlpea-01-00001">Figure 1</xref>. To illustrate the findings, we combine chip measurements in 65 nm and simulation results in 45 nm. The results are based on the work carried out in this field at <italic>UCLouvain</italic> and more specifically on papers [<xref ref-type="bibr" rid="b6-jlpea-01-00001">6</xref>,<xref ref-type="bibr" rid="b22-jlpea-01-00001">22</xref>,<xref ref-type="bibr" rid="b24-jlpea-01-00001">24</xref>,<xref ref-type="bibr" rid="b25-jlpea-01-00001">25</xref>].</p>
<p>The paper is organized as follows. In Section 2, we recall the impact of CMOS technology scaling on ULV circuits and set up a framework for evaluating energy-efficiency under robustness and timing constraints. We then address the impact of these constraints on the minimum ultra-low <italic>V<sub>dd</sub></italic>: the speed limit and the functional limit in Sections 3 and 4, respectively. Existing solutions are also presented. Section 5 finally deals with the impact of stand-by periods on energy efficiency, given these constraints on minimum <italic>V<sub>dd</sub></italic>.</p></sec>
<sec>
<label>2.</label>
<title>Energy Efficiency of ULV Circuits in Nanometer CMOS Technologies</title>
<p>CMOS technology scaling driven by Moore's law increases MOSFET density on a chip by a factor of two every 18–24 months. This is particularly useful for increasing the functionality of CMOS circuits without increasing die area, thereby keeping manufacturing costs acceptable. It also boosts speed performance at each technology generation while reducing the energy required to perform a given function [<xref ref-type="bibr" rid="b26-jlpea-01-00001">26</xref>]. ULV circuits for ultra-low-power applications similarly benefit from these enhancements. Indeed, <italic>E<sub>sw</sub></italic> is effectively reduced thanks to lower on-chip capacitances <italic>C<sub>L</sub></italic> while gate delay at ultra-low voltage is improved thanks to a higher subthreshold <italic>I<sub>on</sub></italic> current resulting from the scaled <italic>V<sub>t</sub></italic> [<xref ref-type="bibr" rid="b12-jlpea-01-00001">12</xref>]. This leads to boosted speed performance at the minimum-energy point.</p>
<p>However, CMOS technology scaling also comes with severe drawbacks when reaching nanometer CMOS nodes: leakage currents including subthreshold <italic>I<sub>off</sub></italic> current and gate tunneling leakage [<xref ref-type="bibr" rid="b27-jlpea-01-00001">27</xref>], short-channel effects [<xref ref-type="bibr" rid="b27-jlpea-01-00001">27</xref>] and variability [<xref ref-type="bibr" rid="b28-jlpea-01-00001">28</xref>]. The impact of these nanometer MOSFET effects is magnified at ultra-low voltage [<xref ref-type="bibr" rid="b12-jlpea-01-00001">12</xref>]. A first consequence at device level is a reduction of the effective <italic>I<sub>on</sub>/I<sub>off</sub></italic> ratio due to lower <italic>V<sub>t</sub></italic> values and to the short-channel increase of the subthreshold swing and of the drain-induced barrier lowering (DIBL) effect. A major consequence at circuit level concerns the minimum-energy level <italic>E<sub>min</sub></italic>, which has stopped scaling since the 90 nm node and actually increases significantly at the 45 nm node because of the combined effects of subthreshold swing, DIBL, gate leakage and statistical variability [<xref ref-type="bibr" rid="b29-jlpea-01-00001">29</xref>]. Fortunately, this <italic>E<sub>min</sub></italic> increase can be limited by choosing the optimum MOSFET (medium gate length and low <italic>V<sub>t</sub></italic>) within a versatile yet standard CMOS technology menu, with good speed performance and negligible area penalty [<xref ref-type="bibr" rid="b23-jlpea-01-00001">23</xref>]. Moreover, fully-depleted Silicon-on-Insulator (SOI) technology can further save 60% of <italic>E<sub>min</sub></italic> [<xref ref-type="bibr" rid="b29-jlpea-01-00001">29</xref>], although this technology is not yet commercially available for industrial circuit design.</p>
<p>Beyond the <italic>E<sub>min</sub></italic> scaling trend, a key challenge for ULV circuit design in nanometer CMOS technologies is to reliably operate at the corresponding supply voltage <italic>V<sub>min</sub></italic> of the minimum-energy point. Indeed, as shown in <xref ref-type="fig" rid="f2-jlpea-01-00001">Figure 2(a)</xref>, the minimum <italic>V<sub>dd</sub></italic> for a logic circuit is given by both timing and robustness constraints [<xref ref-type="bibr" rid="b6-jlpea-01-00001">6</xref>]. Speed must be sufficient to meet the timing constraint associated with the target frequency <italic>f<sub>target</sub></italic> of the application: the delay of the critical path has to be shorter than the cycle time <italic>T<sub>cycle</sub></italic> = 1/<italic>f<sub>target</sub></italic>. Moreover, even if safe timing closure is achieved, there is a functional limit <italic>V<sub>limit</sub></italic> on <italic>V<sub>dd</sub></italic>, which is independent of <italic>f<sub>target</sub></italic>. We set up a framework for evaluating energy efficiency under timing and robustness constraints in [<xref ref-type="bibr" rid="b6-jlpea-01-00001">6</xref>], illustrated in <xref ref-type="fig" rid="f2-jlpea-01-00001">Figure 2</xref>. This framework shows that the <italic>f<sub>target</sub></italic> range of ultra-low-power applications can be divided into three regions for ULV circuits:
<list list-type="bullet">
<list-item>
<p>R1 region where <italic>E<sub>sw</sub></italic> dominates and minimum <italic>V<sub>dd</sub></italic> is speed limited,</p></list-item>
<list-item>
<p>R2 region where <italic>E<sub>leak</sub></italic> dominates and minimum <italic>V<sub>dd</sub></italic> is speed limited,</p></list-item>
<list-item>
<p>R3 region where <italic>E<sub>leak</sub></italic> dominates and minimum <italic>V<sub>dd</sub></italic> is limited by functionality.</p></list-item></list></p>
<p>Within this framework, it is obvious that <italic>E<sub>min</sub></italic> is only reached at one particular clock frequency <italic>f<sub>min</sub></italic> corresponding to a <italic>T<sub>cycle</sub></italic> equal to the critical path delay at <italic>V<sub>min</sub></italic>. <italic>f<sub>min</sub></italic> is in R1 region as <italic>E<sub>leak</sub></italic> accounts for 30% of <italic>E<sub>cycle</sub></italic> at the minimum-energy point. <italic>E<sub>min</sub></italic> can thus only be reached for one particular target frequency. If <italic>f<sub>target</sub></italic> is higher than <italic>f<sub>min</sub></italic>, switching energy is wasted because <italic>V<sub>dd</sub></italic> is higher than <italic>V<sub>min</sub></italic> and, if <italic>f<sub>target</sub></italic> is below <italic>f<sub>min</sub></italic>, leakage energy is wasted because leakage power is integrated over a prohibitively long <italic>T<sub>cycle</sub></italic>. For example, <italic>E<sub>min</sub></italic> of an 8-bit multiplier in a 45 nm LP (Low-Power) CMOS technology is reached at <italic>V<sub>min</sub></italic> = 0.38 V and <italic>f<sub>min</sub></italic> = 630 kHz, as shown in <xref ref-type="fig" rid="f2-jlpea-01-00001">Figure 2</xref>. <italic>E<sub>cycle</sub></italic> is within <italic>E<sub>min</sub></italic> + 10% between 200 kHz and 2 MHz. For <italic>f<sub>target</sub></italic> outside this range, <xref ref-type="fig" rid="f2-jlpea-01-00001">Figure 2</xref> shows that practical energy under robustness and timing constraints can thus significantly differ from <italic>E<sub>min</sub></italic> [<xref ref-type="bibr" rid="b6-jlpea-01-00001">6</xref>].</p>
<p>As the minimum-energy point (<italic>V<sub>min</sub>,f<sub>min</sub>,E<sub>min</sub></italic>) varies with technology generations according to technological characteristics [<xref ref-type="bibr" rid="b12-jlpea-01-00001">12</xref>], there is an optimum CMOS technology node for each <italic>f<sub>target</sub></italic> that minimizes <italic>E<sub>cycle</sub></italic> under timing and robustness constraints [<xref ref-type="bibr" rid="b12-jlpea-01-00001">12</xref>,<xref ref-type="bibr" rid="b30-jlpea-01-00001">30</xref>]. However, using an older CMOS technology is not optimum regarding die area and thus high-volume manufacturing costs. For this reason, we focus in this paper on techniques to reliably operate ULV logic at the minimum-energy point in 65/45 nm CMOS technologies.</p>
<p>Finally, let us introduce here that statistical MOSFET variations in nanometer CMOS technologies due to random dopant fluctuations, line edge roughness, oxide thickness variations, etc. have an important impact on energy efficiency. Indeed, these variability sources induce local within-die random <italic>V<sub>t</sub></italic> variations that exponentially affect subthreshold <italic>I<sub>on</sub></italic> and <italic>I<sub>off</sub></italic> [<xref ref-type="bibr" rid="b31-jlpea-01-00001">31</xref>]. The consequences at circuit level are [<xref ref-type="bibr" rid="b6-jlpea-01-00001">6</xref>]:
<list list-type="bullet">
<list-item>
<p>A guardband on <italic>T<sub>cycle</sub></italic> or on the minimum <italic>V<sub>dd</sub></italic> for sufficient timing (parametric) yield because worst-case delay of critical paths has to be considered given its large statistical distribution;</p></list-item>
<list-item>
<p>Increase in functional limit <italic>V<sub>limit</sub></italic> voltage to ensure sufficient functional yield for large chips;</p></list-item>
<list-item>
<p>Increase in mean leakage <italic>I<sub>leak</sub></italic> because <italic>I<sub>leak</sub></italic> is a lognormal distribution (exponentially dependent on the normally-distributed <italic>V<sub>t</sub></italic>) with a mean value higher than the typical one.</p></list-item></list></p>
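<p>The third consequence follows directly from the properties of the lognormal distribution. As a minimal sketch, assuming that the off current scales as 10 to the power of (−<italic>V<sub>t</sub></italic>/<italic>S</italic>) and that <italic>V<sub>t</sub></italic> is Gaussian with standard deviation <italic>σ</italic>, the ratio between mean and typical leakage has a closed form that a Monte Carlo draw can cross-check. The function names and the example values (<italic>σ</italic> = 40 mV, <italic>S</italic> = 100 mV/decade) are hypothetical and are not taken from the cited measurements.</p>
<preformat>
```python
import math
import random


def mean_to_typical_leakage(sigma_vt, swing):
    """Closed-form ratio E[I_off] / I_off(typical V_t).

    Assumes I_off proportional to 10 ** (-V_t / swing) with Gaussian V_t,
    so that I_off is lognormal with log-standard deviation sigma_ln.
    """
    sigma_ln = math.log(10.0) * sigma_vt / swing
    return math.exp(0.5 * sigma_ln ** 2)


def monte_carlo_ratio(sigma_vt, swing, n_samples=200000, seed=1):
    """Estimate the same ratio by sampling random V_t shifts."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        delta_vt = rng.gauss(0.0, sigma_vt)
        total += 10.0 ** (-delta_vt / swing)
    return total / n_samples


if __name__ == "__main__":
    # Hypothetical example: 40 mV V_t sigma, 100 mV/decade swing.
    print(mean_to_typical_leakage(0.04, 0.1))
    print(monte_carlo_ratio(0.04, 0.1))
```
</preformat>
<p>Under these assumptions the mean leakage exceeds the typical value by a factor that grows exponentially with the square of <italic>σ</italic>/<italic>S</italic>, which is why the penalty becomes significant for small, highly variable devices.</p>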
<p>It has further been reported that <italic>E<sub>sw</sub></italic> is also statistically distributed in nanometer ULV circuits because local gate delay distribution introduces random glitches with <italic>E<sub>sw</sub></italic> penalties [<xref ref-type="bibr" rid="b32-jlpea-01-00001">32</xref>]. However, for the sake of simplicity we do not consider this effect in this paper.</p>
<p>As shown in <xref ref-type="fig" rid="f2-jlpea-01-00001">Figure 2</xref>, statistical variability leads to energy penalties. As local variations can hardly be compensated by circuit adaptation due to their randomness from one MOSFET to another, it is important to consider statistical variability when designing ULV circuits in nanometer CMOS technologies. In the next sections, we review the constraints on minimum <italic>V<sub>dd</sub></italic> to ensure circuit robustness given this high local variability in nanometer CMOS technologies.</p></sec>
<sec>
<label>3.</label>
<title>Speed Limit on <italic>V<sub>dd</sub></italic></title>
<sec>
<label>3.1.</label>
<title>Timing Constraint and the Minimum-Energy Point</title>
<p>The first constraint on minimum <italic>V<sub>dd</sub></italic> is a timing constraint on the critical path delay, which has to be lower than the <italic>T<sub>cycle</sub></italic> given by the <italic>f<sub>target</sub></italic> of the application. Typical <italic>f<sub>target</sub></italic> for ultra-low-power circuits ranges from 10 kHz to 50 MHz. As explained in Section 2, minimum energy of ULV circuits can only be reached at a single clock frequency <italic>f<sub>min</sub></italic>. The challenge for the designers is thus to tune the circuit to make <italic>f<sub>min</sub></italic> meet the <italic>f<sub>target</sub></italic> of the application. This can be done by changing the <italic>V<sub>t</sub></italic> of MOSFETs in the circuit. Indeed, reducing <italic>V<sub>t</sub></italic> exponentially boosts speed performance at ultra-low voltage through an exponential increase of the subthreshold <italic>I<sub>on</sub></italic>, which can be derived from the subthreshold drain current expression [<xref ref-type="bibr" rid="b6-jlpea-01-00001">6</xref>]:
<disp-formula id="FD1">
<label>(1)</label>
<mml:math id="mm2" display="block">
<mml:semantics id="sm2">
<mml:mrow>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mrow>
<mml:mi>sub</mml:mi></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mn>0</mml:mn></mml:msub>
<mml:mo>×</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn></mml:mrow>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>V</mml:mi>
<mml:mrow>
<mml:mi>g</mml:mi>
<mml:mi>s</mml:mi></mml:mrow></mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>η</mml:mi>
<mml:mrow>
<mml:mtext mathvariant="italic">DIBL</mml:mtext></mml:mrow></mml:msub>
<mml:msub>
<mml:mi>V</mml:mi>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow>
<mml:mi>S</mml:mi></mml:mfrac></mml:mrow></mml:msup>
<mml:mo>×</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>−</mml:mo>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mi>V</mml:mi>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>U</mml:mi>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mrow></mml:msup></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:semantics></mml:math></disp-formula>where <italic>I<sub>0</sub></italic> is a reference current that is proportional to the MOSFET size <italic>W</italic>/<italic>L<sub>g</sub></italic> and exponentially dependent on <italic>V<sub>t</sub></italic>, <italic>S</italic> is the subthreshold swing, <italic>U<sub>th</sub></italic> the thermal voltage and <italic>η<sub>DIBL</sub></italic> the drain-induced barrier lowering (DIBL) factor. The impact of <italic>V<sub>t</sub></italic> reduction on <italic>E<sub>leak</sub></italic> at a given ultra-low <italic>V<sub>dd</sub></italic> is not significant [<xref ref-type="bibr" rid="b11-jlpea-01-00001">11</xref>]. Indeed, as <italic>I<sub>leak</sub></italic> is often dominated by subthreshold leakage in 65/45 nm CMOS, the exponential <italic>I<sub>leak</sub></italic> increase caused by a <italic>V<sub>t</sub></italic> reduction through the <italic>I<sub>0</sub></italic> parameter is compensated by the shorter critical path delay and thus shorter <italic>T<sub>cycle</sub></italic>, as long as the MOSFETs remain in the subthreshold regime:
<disp-formula id="FD2">
<label>(2)</label>
<mml:math id="mm3" display="block">
<mml:semantics id="sm3">
<mml:mrow>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mrow>
<mml:mi>leak</mml:mi></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>V</mml:mi>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi></mml:mrow></mml:msub>
<mml:mo>×</mml:mo>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mrow>
<mml:mtext>leak</mml:mtext></mml:mrow></mml:msub>
<mml:mo>×</mml:mo>
<mml:msub>
<mml:mi>T</mml:mi>
<mml:mrow>
<mml:mtext>cycle</mml:mtext></mml:mrow></mml:msub>
<mml:mo>∝</mml:mo>
<mml:msub>
<mml:mi>V</mml:mi>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi></mml:mrow></mml:msub>
<mml:mo>×</mml:mo>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mn>0</mml:mn></mml:msub>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn></mml:mrow>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>η</mml:mi>
<mml:mrow>
<mml:mi>DIBL</mml:mi></mml:mrow></mml:msub>
<mml:msub>
<mml:mi>V</mml:mi>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi></mml:mrow></mml:msub></mml:mrow>
<mml:mi>S</mml:mi></mml:mfrac></mml:mrow></mml:msup>
<mml:mo>×</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>L</mml:mi>
<mml:mi>D</mml:mi></mml:msub>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mi>L</mml:mi></mml:msub>
<mml:msub>
<mml:mi>V</mml:mi>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi></mml:mrow></mml:msub></mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>I</mml:mi>
<mml:mn>0</mml:mn></mml:msub>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn></mml:mrow>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>η</mml:mi>
<mml:mrow>
<mml:mi>DIBL</mml:mi></mml:mrow></mml:msub></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow>
<mml:msub>
<mml:mi>V</mml:mi>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi></mml:mrow></mml:msub></mml:mrow>
<mml:mi>S</mml:mi></mml:mfrac></mml:mrow></mml:msup></mml:mrow></mml:mfrac>
<mml:mo>∝</mml:mo>
<mml:msub>
<mml:mi>L</mml:mi>
<mml:mi>D</mml:mi></mml:msub>
<mml:msub>
<mml:mi>C</mml:mi>
<mml:mi>L</mml:mi></mml:msub>
<mml:msup>
<mml:mrow>
<mml:mn>10</mml:mn></mml:mrow>
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mi>V</mml:mi>
<mml:mrow>
<mml:mi>dd</mml:mi></mml:mrow></mml:msub></mml:mrow>
<mml:mi>S</mml:mi></mml:mfrac></mml:mrow></mml:msup>
<mml:msubsup>
<mml:mi>V</mml:mi>
<mml:mrow>
<mml:mi>d</mml:mi>
<mml:mi>d</mml:mi></mml:mrow>
<mml:mn>2</mml:mn></mml:msubsup></mml:mrow></mml:semantics></mml:math></disp-formula>where <italic>L<sub>D</sub></italic> is the logic depth (number of gates in the critical path) and gate delay is modeled with the <italic>CV</italic>/<italic>I</italic> approximation. By changing the <italic>I<sub>0</sub></italic> reference current through <italic>V<sub>t</sub></italic> tuning, the voltage <italic>V<sub>min</sub></italic> and energy level <italic>E<sub>min</sub></italic> of the minimum-energy point are thus not modified while <italic>f<sub>min</sub></italic> can be exponentially tuned to make it correspond to the <italic>f<sub>target</sub></italic> of the application.</p>
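<p>The cancellation of <italic>I<sub>0</sub></italic> in Equation (2) can be checked numerically. The sketch below evaluates Equations (1) and (2) under the stated <italic>CV</italic>/<italic>I</italic> approximation, neglecting the drain-saturation factor of Equation (1), which is close to one when <italic>V<sub>ds</sub></italic> is several times <italic>U<sub>th</sub></italic>. The function name and all numerical values are hypothetical illustration choices.</p>
<preformat>
```python
import math


def leakage_energy_and_fmax(vdd, i0, swing=0.1, eta_dibl=0.1,
                            logic_depth=40, c_load=1e-15):
    """Return (E_leak, f_max) for one logic path from Equations (1)-(2).

    All default parameter values are hypothetical illustration values.
    """
    # Off current: Vgs = 0 and Vds = Vdd, so only the DIBL term remains.
    i_leak = i0 * 10 ** (eta_dibl * vdd / swing)
    # On current: Vgs = Vds = Vdd.
    i_on = i0 * 10 ** ((1 + eta_dibl) * vdd / swing)
    # Cycle time: CV/I gate delay multiplied by the logic depth.
    t_cycle = logic_depth * c_load * vdd / i_on
    return vdd * i_leak * t_cycle, 1.0 / t_cycle


if __name__ == "__main__":
    # Lowering V_t by one subthreshold swing multiplies I0 by ten.
    e_high_vt, f_high_vt = leakage_energy_and_fmax(0.35, i0=1e-9)
    e_low_vt, f_low_vt = leakage_energy_and_fmax(0.35, i0=1e-8)
    print(math.isclose(e_high_vt, e_low_vt, rel_tol=1e-9))
    print(f_low_vt / f_high_vt)
```
</preformat>
<p>In this idealized model the leakage energy is unchanged while the maximum frequency scales with <italic>I<sub>0</sub></italic>. The cancellation holds only while both the on and off currents follow the subthreshold expression, which is the condition stated above.</p>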
<p>Standard nanometer CMOS technologies feature a versatile technology menu with several process flavors targeting different applications: the general-purpose (GP) process, also called generic (G), targets high-performance applications with short gate delay and relaxed leakage constraints, while the low-power (LP) process targets portable applications with relaxed speed and tight leakage constraints [<xref ref-type="bibr" rid="b22-jlpea-01-00001">22</xref>]. The GP flavor features short gate length, low <italic>V<sub>t</sub></italic> and thin oxide for maximizing <italic>I<sub>on</sub></italic> at nominal <italic>V<sub>dd</sub></italic>, whereas the LP flavor features longer gate length and higher <italic>V<sub>t</sub></italic> to limit subthreshold leakage and thicker oxide to limit gate leakage. As a result, subthreshold current varies by several orders of magnitude between GP and LP flavors through the <italic>I<sub>0</sub></italic> reference current. <xref ref-type="fig" rid="f3-jlpea-01-00001">Figure 3</xref> illustrates this fact with the measured frequency of 65 nm ring oscillators in GP and LP flavors. At 0.35 V for example, GP flavor frequency (11 MHz) is 125× higher than LP flavor frequency (88 kHz). Notice that this speed difference is much higher than at nominal 1–1.2 V <italic>V<sub>dd</sub></italic> because of the exponential dependence of subthreshold current on <italic>V<sub>t</sub></italic> at ultra-low voltage.</p>
<p>We thus showed in [<xref ref-type="bibr" rid="b22-jlpea-01-00001">22</xref>] that process flavor selection can effectively be used to operate at the minimum-energy point (<italic>V<sub>min</sub></italic>, <italic>E<sub>min</sub></italic>) for a wide <italic>f<sub>target</sub></italic> range. LP flavor can be used for frequencies between 10 kHz and 1 MHz, and GP flavor can be used for frequencies between 1 and 50 MHz. Moreover, nanometer CMOS technologies feature MOSFETs with two or three different <italic>V<sub>t</sub></italic> values within each flavor. Fine <italic>f<sub>min</sub></italic> tuning to meet <italic>f<sub>target</sub></italic> can thus further be achieved by proper <italic>V<sub>t</sub></italic> selection for the MOSFETs. As shown in <xref ref-type="fig" rid="f3-jlpea-01-00001">Figure 3</xref>, moving from standard-<italic>V<sub>t</sub></italic> (SVT) to low-<italic>V<sub>t</sub></italic> (LVT) devices in 65 nm boosts frequency and thus <italic>f<sub>min</sub></italic> by factors of 5.6× and 1.75× in LP and GP flavors, respectively. The frequency difference between SVT and LVT is lower in GP flavor because at 0.35 V, GP MOSFETs operate more in the near-threshold regime (<italic>V<sub>t</sub></italic> ≈ 350 mV) than in subthreshold regime and the <italic>I<sub>on</sub></italic> dependence on <italic>V<sub>t</sub></italic> is not fully exponential. Let us mention here that the curve of energy <italic>vs. f<sub>target</sub></italic> is quite flat in the vicinity of the minimum-energy point, as shown in <xref ref-type="fig" rid="f2-jlpea-01-00001">Figure 2</xref>. 
Therefore, once a proper process flavor and <italic>V<sub>t</sub></italic> selection has been performed to bring <italic>f<sub>min</sub></italic> close to <italic>f<sub>target</sub></italic>, fine tuning of <italic>V<sub>dd</sub></italic> by a few tens of mV can be used for meeting exactly the timing constraint with negligible energy overhead [<xref ref-type="bibr" rid="b22-jlpea-01-00001">22</xref>].</p></sec>
<sec>
<label>3.2.</label>
<title>Timing Constraint and Process/Temperature Variations</title>
<p>As MOSFETs in ULV circuits operate in the near- or sub-threshold regime, not only their <italic>I<sub>off</sub></italic> but also their <italic>I<sub>on</sub></italic> current depend exponentially on <italic>V<sub>t</sub></italic> through <italic>I</italic><sub>0</sub> parameter from Equation (1). Gate delay is thus very sensitive to <italic>V<sub>t</sub></italic> variations [<xref ref-type="bibr" rid="b31-jlpea-01-00001">31</xref>] coming either from local random variations, global process corners or temperature variations. The frequency distribution of a ring oscillator at 0.3 V on 20 dies in 65 nm LP CMOS is plotted in <xref ref-type="fig" rid="f4-jlpea-01-00001">Figure 4</xref> for three different operating temperatures. This figure also compares the results with simulations at extreme SS (Slow NMOS/Slow PMOS) and FF (Fast NMOS/Fast PMOS) process corners. At 25 °C, the frequency at SS corner is 6.5× lower than typical frequency, which induces a large <italic>T<sub>cycl</sub></italic><sub>e</sub> guardband to ensure sufficient timing (parametric) yield regarding the <italic>f<sub>target</sub></italic> timing constraint. However, we showed in [<xref ref-type="bibr" rid="b25-jlpea-01-00001">25</xref>] that the main concern regarding timing constraint in ULV circuits comes from low-temperature operation. Indeed, low-temperature operation dramatically reduces subthreshold <italic>I<sub>on</sub></italic> due to <italic>V<sub>t</sub></italic> increase and subthreshold swing reduction. The measured impact of a −40 °C operation on speed is a 8.5× delay increase at 0.3 V. The <italic>T<sub>cycle</sub></italic> guardband to ensure safe timing closure over the standard temperature range from −40 to +85 °C is thus more important than the guardband for handling global process variations. This obviously implies energy penalties as minimum <italic>V<sub>dd</sub></italic> for speed constraint has to be increased to handle low-temperature operation. 
The simulated combined effect of the SS corner and −40 °C operation on speed is a 40× increase in gate delay. Notice that the speed of ULV circuits in the GP flavor suffers less from process/temperature variations [<xref ref-type="bibr" rid="b25-jlpea-01-00001">25</xref>]. Indeed, their near-threshold operation limits the exponential dependence of gate delay on <italic>V<sub>t</sub></italic>.</p>
<p>Although local variations have a strong impact on gate delay as mentioned in Section 2, the consequence on speed performance is smaller than the effect of global process/temperature variations. Indeed, gate delay variability is averaged out over the large number of gates in critical paths [<xref ref-type="bibr" rid="b31-jlpea-01-00001">31</xref>] and the guardband on <italic>T<sub>cycle</sub></italic> is thus reduced. For example, simulations of the 8-bit multiplier from Section 2 in 45 nm LP at 0.3 V show a 3<italic>σ</italic> worst-case delay due to local variations that is 2.3× higher than the typical delay. This is further mitigated by the use of an upsized <italic>L<sub>g</sub></italic> required to improve noise margins, as will be explained in Section 4.1.</p>
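<p>The averaging effect can be illustrated with a minimal Monte-Carlo sketch. It is not the SPICE setup used in this paper: it assumes that gate delays are independent and log-normally distributed (a common approximation when delay depends exponentially on a Gaussian <italic>V<sub>t</sub></italic>), and the function name, the log-normal spread of 0.5 and the 25-gate path depth are hypothetical values chosen for illustration only.</p>
<preformat preformat-type="code">
```python
import random
import statistics


def path_delay_spread(n_gates, sigma_ln=0.5, n_trials=2000, seed=0):
    """Relative spread (std/mean) of the delay of a path of n_gates gates.

    Assumption (not from the paper): each gate delay is an independent
    log-normal sample with unit median and log-domain sigma `sigma_ln`.
    """
    rng = random.Random(seed)
    totals = [
        sum(rng.lognormvariate(0.0, sigma_ln) for _ in range(n_gates))
        for _ in range(n_trials)
    ]
    return statistics.pstdev(totals) / statistics.fmean(totals)


# Hypothetical comparison: one gate versus a 25-gate critical path.
single_gate_spread = path_delay_spread(1)
path_spread = path_delay_spread(25)
print(f"single gate: {single_gate_spread:.3f}, 25-gate path: {path_spread:.3f}")
```
</preformat>
<p>Under these assumptions the relative spread of the path delay shrinks roughly as the inverse square root of the number of gates, which is the reason why the guardband for local variations is smaller at path level than at gate level.</p>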
<p>In order to limit <italic>T<sub>cycle</sub></italic> guardbands and <italic>E<sub>cycle</sub></italic> penalties due to process/temperature variations, adaptive techniques can be used. Assuming that the clock frequency is fixed at <italic>f<sub>target</sub></italic> by the application, adaptation can be achieved through either <italic>V<sub>dd</sub></italic> scaling or body biasing. We showed in [<xref ref-type="bibr" rid="b22-jlpea-01-00001">22</xref>] that adaptive body biasing is potentially more energy-efficient than adaptive voltage scaling as it exactly compensates <italic>V<sub>t</sub></italic> variations while the circuit is constantly operated at <italic>V<sub>min</sub></italic>. However, adaptive body biasing raises practical implementation issues. Indeed, the body bias voltages needed to compensate process/temperature variations are quite high in 65/45 nm CMOS technologies due to the vanishing body effect in short-channel thin-oxide MOSFETs [<xref ref-type="bibr" rid="b22-jlpea-01-00001">22</xref>]. Measurement results of ring oscillators in 65 nm LP CMOS at 0.3 V show that forward body biasing by 300 mV only reduces gate delay by a factor of 5×, which is not sufficient when compared to the 8.5× delay increase due to −40 °C operation alone. Adaptive voltage scaling is more effective at mitigating the delay increase at low temperature. <xref ref-type="fig" rid="f5-jlpea-01-00001">Figure 5</xref> shows the measured minimum <italic>V<sub>dd</sub></italic> to keep the delay constant <italic>vs.</italic> temperature. A 75 mV <italic>V<sub>dd</sub></italic> boost is capable of fully compensating the −40 °C delay increase. This comes at the expense of a 50% <italic>E<sub>sw</sub></italic> penalty at such a low temperature.</p></sec></sec>
<sec>
<label>4.</label>
<title>Functional Limits on <italic>V<sub>dd</sub></italic></title>
<sec>
<label>4.1.</label>
<title>Noise Margin Constraint</title>
<p>When <italic>V<sub>dd</sub></italic> is reduced from 1–1.2 V to ultra-low values, the <italic>I<sub>on</sub></italic> reduction leads to a lower <italic>I<sub>on</sub></italic>/<italic>I<sub>off</sub></italic> ratio for subthreshold MOSFETs. The impact on ULV logic is not only a speed penalty but also a strong reduction of noise margins [<xref ref-type="bibr" rid="b33-jlpea-01-00001">33</xref>,<xref ref-type="bibr" rid="b34-jlpea-01-00001">34</xref>]. The vanishing noise margins can lead to soft errors due to a higher sensitivity to transient noise from crosstalk [<xref ref-type="bibr" rid="b35-jlpea-01-00001">35</xref>] or radiation [<xref ref-type="bibr" rid="b36-jlpea-01-00001">36</xref>]. In 65/45 nm CMOS, local <italic>V<sub>t</sub></italic> variations further degrade output logic levels of ULV circuits, which can even lead to hard “stuck-at” faults and thus a functional failure of several manufactured chips [<xref ref-type="bibr" rid="b33-jlpea-01-00001">33</xref>,<xref ref-type="bibr" rid="b37-jlpea-01-00001">37</xref>]. When the number of gates in a circuit increases, the probability of a hard fault increases and the minimum <italic>V<sub>dd</sub></italic> for functionality <italic>V<sub>limit</sub></italic> increases rapidly. Measurements of 90 nm ring oscillators in [<xref ref-type="bibr" rid="b38-jlpea-01-00001">38</xref>] show that the mean <italic>V<sub>limit</sub></italic> increases from 0.2 to 0.35 V when the number of gates grows from 1 k to 1 M. Robust ULV operation can thus only be achieved by taking <italic>V<sub>limit</sub></italic> into account, which might significantly degrade energy efficiency if <italic>V<sub>limit</sub></italic> gets close to the minimum-energy voltage <italic>V<sub>min</sub></italic>.</p>
<p>A convenient way to evaluate noise margins of ULV logic was proposed in [<xref ref-type="bibr" rid="b33-jlpea-01-00001">33</xref>] with the simulation of a NAND gate cross-coupled with a NOR gate, similarly to SRAM static noise margin extraction. This benchmark circuit represents an infinite chain of alternating NAND/NOR gates, which is a worst case regarding noise margins as the NAND (resp. NOR) gate features the highest <italic>V<sub>ih</sub></italic> (resp. lowest <italic>V<sub>il</sub></italic>) level with the highest <italic>V<sub>ol</sub></italic> (resp. lowest <italic>V<sub>oh</sub></italic>) due to stacking of on transistors and parallel combination of off transistors [<xref ref-type="bibr" rid="b33-jlpea-01-00001">33</xref>]. Precise noise margin extraction for a given circuit can be performed according to the method from [<xref ref-type="bibr" rid="b39-jlpea-01-00001">39</xref>] but, for the sake of generality, we stick to the NAND/NOR method in this paper. <xref ref-type="fig" rid="f6-jlpea-01-00001">Figure 6(a)</xref> shows the noise margins of ULV logic in 45 nm LP technology from statistical Monte-Carlo simulation with this benchmark circuit at 0.3 and 0.4 V. The wide noise margin distribution implies that many gates with low noise margins exhibit a high susceptibility to transient noise. The probability of gates with a negative noise margin is even non-zero, which means that hard errors might be encountered in a large ULV chip with many gates. At 0.4 V, noise margins are higher, which decreases the susceptibility to transient noise and the probability of hard errors but might also degrade energy efficiency.</p>
<p>In order to reliably operate at <italic>V<sub>min</sub></italic>, several techniques have been proposed to increase noise margins and thereby improve <italic>V<sub>limit</sub></italic>. In [<xref ref-type="bibr" rid="b33-jlpea-01-00001">33</xref>], the authors propose to upsize the transistor width of critical gates to improve their resilience to local <italic>V<sub>t</sub></italic> variations and thereby improve their worst-case noise margins. However, this comes at the cost of energy penalties due to higher <italic>C<sub>L</sub></italic> and <italic>I<sub>leak</sub></italic> in the circuit [<xref ref-type="bibr" rid="b6-jlpea-01-00001">6</xref>]. A widespread technique for robust ULV operation consists in restricting the logic gates used from the standard-cell library [<xref ref-type="bibr" rid="b20-jlpea-01-00001">20</xref>]. Indeed, gates with large transistor stacks or a large number of parallel branches, such as NAND/NOR gates with 4 inputs, feature worse noise margins. Eliminating these cells for ULV operation is thus very effective at improving circuit robustness at the cost of a slight area overhead. Another solution was proposed in [<xref ref-type="bibr" rid="b40-jlpea-01-00001">40</xref>,<xref ref-type="bibr" rid="b41-jlpea-01-00001">41</xref>]: <italic>V<sub>t</sub></italic> balancing, also called adaptive <italic>β</italic> ratio. This technique can be used to match the subthreshold current between NMOS and PMOS devices in case of “crossed” process corners with slow NMOS/fast PMOS or the opposite. Implemented with an adaptive body biasing scheme, this technique can only address global process variations as the area overhead for compensating statistical local variations would be unacceptable. Therefore, this technique significantly improves nominal noise margins but is not capable of mitigating local noise margin variations.</p>
<p>We showed in [<xref ref-type="bibr" rid="b6-jlpea-01-00001">6</xref>,<xref ref-type="bibr" rid="b12-jlpea-01-00001">12</xref>] that both the degradation of the subthreshold swing and the increase of the DIBL factor due to short-channel effects in nanometer CMOS technologies threaten ULV circuit robustness by further degrading output logic levels and thereby increasing <italic>V<sub>limit</sub></italic>. Therefore, upsizing the gate length <italic>L<sub>g</sub></italic> of MOSFETs in ULV logic is able to significantly improve noise margins [<xref ref-type="bibr" rid="b6-jlpea-01-00001">6</xref>,<xref ref-type="bibr" rid="b12-jlpea-01-00001">12</xref>]. As shown in <xref ref-type="fig" rid="f6-jlpea-01-00001">Figure 6(b)</xref>, an upsize of the drawn <italic>L<sub>g</sub></italic> by 20 nm in 45 nm LP CMOS tightens the noise margin distributions. The impact on functional die yield is computed for 0.3 V logic circuits with a varying number of gates <italic>N<sub>gates</sub></italic> from 1 k to 1000 M. We constrained the minimum noise margins to 20 mV for robustness against transient noise and extrapolated die yield with a simple model:
<disp-formula id="FD3">
<label>(3)</label>
<mml:math id="mm4" display="block">
<mml:semantics id="sm4">
<mml:mrow>
<mml:msub>
<mml:mi>η</mml:mi>
<mml:mrow>
<mml:mi>die</mml:mi></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:msubsup>
<mml:mi>η</mml:mi>
<mml:mrow>
<mml:mi>gate</mml:mi></mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mi>gates</mml:mi></mml:mrow></mml:msub></mml:mrow>
<mml:mo>/</mml:mo>
<mml:mn>2</mml:mn></mml:mrow></mml:mrow></mml:msubsup></mml:mrow></mml:semantics></mml:math></disp-formula>with <italic>η<sub>gate</sub></italic> the functional yield with a 20 mV noise margin constraint for the NAND2/NOR2 benchmark circuit (2 gates). Notice that this is quite a pessimistic assumption as it considers a logic circuit made only of alternating NAND2/NOR2 gates. The resulting die yield is plotted for both 40 and 60 nm drawn <italic>L<sub>g</sub></italic> in <xref ref-type="fig" rid="f7-jlpea-01-00001">Figure 7</xref>. It shows that the maximum number of logic gates in a circuit for 95% die yield increases from 15 k at the minimum <italic>L<sub>g</sub></italic> to 4 M at the upsized <italic>L<sub>g</sub></italic>. This technique is thus very effective at improving <italic>V<sub>limit</sub></italic> for robust ULV operation. Moreover, it does not incur an energy penalty, as the <italic>C<sub>L</sub></italic> increase due to an upsized <italic>L<sub>g</sub></italic> is compensated by the <italic>E<sub>leak</sub></italic> reduction thanks to reduced subthreshold swing, DIBL and variability [<xref ref-type="bibr" rid="b23-jlpea-01-00001">23</xref>,<xref ref-type="bibr" rid="b29-jlpea-01-00001">29</xref>].</p></sec>
<sec>
<label>4.2.</label>
<title>Hold Time Constraint</title>
<p>As ULV logic features a magnified sensitivity to local <italic>V<sub>t</sub></italic> variations, the statistical distribution of gate delay is quite wide. It not only limits speed due to the <italic>T<sub>cycle</sub></italic> guardband reported in Section 2 but also threatens the functionality of ULV circuits due to potential hold time failures [<xref ref-type="bibr" rid="b37-jlpea-01-00001">37</xref>]. Indeed, local delay variations in the clock tree of ULV circuits might lead to large clock skew values between two branches of the clock tree, and short logic paths might thus exhibit violations of the hold constraint [<xref ref-type="bibr" rid="b42-jlpea-01-00001">42</xref>]. This is a critical point as hold time violations cannot be fixed by relaxing the clock frequency and generate a fault each time the path is triggered. Hold time failures thus set another limit on the minimum <italic>V<sub>dd</sub></italic> for functionality <italic>V<sub>limit</sub></italic>.</p>
<p>We further showed in [<xref ref-type="bibr" rid="b25-jlpea-01-00001">25</xref>] that low-temperature operation magnifies the sensitivity of ULV gate delay to <italic>V<sub>t</sub></italic> variations because of the steeper subthreshold swing. Low temperature thus increases the probability of hold time violations due to this variability-induced clock skew. As this raises <italic>V<sub>limit</sub></italic> with potential energy penalties, low temperature has to be carefully addressed when checking the timing closure of hold constraints in ULV logic.</p>
<p>Although this problem can be addressed by upsizing the width or length of MOSFETs within the clock tree, it comes with <italic>E<sub>sw</sub></italic> penalty from <italic>C<sub>L</sub></italic> increase because the clock tree has a high activity factor. Another technique was recently proposed in [<xref ref-type="bibr" rid="b42-jlpea-01-00001">42</xref>]. The idea comes from the fact that RC interconnect delays are less important than gate delays at ultra-low voltage [<xref ref-type="bibr" rid="b43-jlpea-01-00001">43</xref>]. Therefore, the distributed buffering of a standard H-type clock tree at each level in the tree can be replaced by a single yet stronger bufferization stage at the clock root without incurring delay penalties within the clock tree. In this case, all leaf flip-flops in the tree share a common buffer stage, which can be composed of several series-connected buffers, and delay variations in this buffer thus do not introduce clock skew. This significantly reduces the probability of hold time failures. For circuits with more than a few kgates, a single bufferization stage might not be practical due to the prohibitively large dimension of buffers. In this case, the approach can be extended to a clock tree with a reduced depth of 2–4 buffer stages.</p>
<p>To validate this technique, we measured <italic>V<sub>limit</sub></italic> of two versions of a small logic circuit presented in [<xref ref-type="bibr" rid="b21-jlpea-01-00001">21</xref>]: one version with a standard distributed clock tree bufferization and a second with a single bufferization stage at the clock root, made of two large series-connected buffers. The <italic>V<sub>limit</sub></italic> histograms are plotted in <xref ref-type="fig" rid="f8-jlpea-01-00001">Figure 8</xref>. The use of a single bufferization stage allows safe operation down to 0.23 V, whereas several dies of the circuit with standard distributed bufferization fail below 0.5 V. Let us recall here that ULV circuits in the GP process flavor exhibit smaller delay variations as MOSFETs operate in the near-threshold regime. They are thus less sensitive to variability-induced hold time violations.</p></sec></sec>
<sec>
<label>5.</label>
<title>Energy Efficiency and Stand-By Periods</title>
<p>Many ultra-low-power applications such as data logging in environmental [<xref ref-type="bibr" rid="b44-jlpea-01-00001">44</xref>] or biomedical [<xref ref-type="bibr" rid="b5-jlpea-01-00001">5</xref>] domains typically operate on a duty-cycled basis with long stand-by periods. Power consumed in stand-by mode degrades energy efficiency by adding an overhead to the effective energy per active cycle <italic>E<sub>cycle</sub></italic> [<xref ref-type="bibr" rid="b45-jlpea-01-00001">45</xref>]. When assuming ideal clock gating for eliminating switching power during stand-by periods, the effective <italic>E<sub>cycle</sub></italic> can be expressed as [<xref ref-type="bibr" rid="b45-jlpea-01-00001">45</xref>]:
<disp-formula id="FD4">
<label>(4)</label>
<mml:math id="mm5" display="block">
<mml:semantics id="sm5">
<mml:mrow>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mrow>
<mml:mi>cycle</mml:mi></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mrow>
<mml:mi>act</mml:mi></mml:mrow></mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mrow>
<mml:mi>stb</mml:mi></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mrow>
<mml:mi>act</mml:mi></mml:mrow></mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>leak</mml:mi></mml:mrow></mml:msub>
<mml:msub>
<mml:mi>T</mml:mi>
<mml:mrow>
<mml:mi>cycle</mml:mi></mml:mrow></mml:msub>
<mml:mo>×</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mi>α</mml:mi>
<mml:mrow>
<mml:mi>duty</mml:mi></mml:mrow></mml:msub></mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>α</mml:mi>
<mml:mrow>
<mml:mi>duty</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mrow></mml:semantics></mml:math></disp-formula>where <italic>α<sub>duty</sub></italic> is the duty cycle, <italic>i.e.</italic>, the fraction of time that the circuit spends in active mode. As illustrated in <xref ref-type="fig" rid="f9-jlpea-01-00001">Figure 9</xref>, an <italic>α<sub>duty</sub></italic> of 0.1% increases the effective <italic>E<sub>cycle</sub></italic> by a factor of 220× at the minimum-energy point.</p>
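<p>Equation (4) can be evaluated directly. The minimal sketch below is given for illustration only: the function names are hypothetical and energies are normalized to <italic>E<sub>act</sub></italic>. The leakage ratio of 219/999 is not a value reported in this paper; it is the value that Equation (4) would require if the 220× figure is read as the ratio between the effective <italic>E<sub>cycle</sub></italic> and <italic>E<sub>act</sub></italic>.</p>
<preformat preformat-type="code">
```python
def effective_energy_per_cycle(e_act, p_leak, t_cycle, duty):
    """Equation (4): E_cycle = E_act + P_leak * T_cycle * (1 - duty) / duty.

    `duty` is the fraction of time spent in active mode (0.001 for 0.1%).
    """
    if not (duty > 0.0 and 1.0 >= duty):
        raise ValueError("duty must be in the interval (0, 1]")
    return e_act + p_leak * t_cycle * (1.0 - duty) / duty


def standby_overhead_factor(leak_ratio, duty):
    """E_cycle / E_act, with leak_ratio = P_leak * T_cycle / E_act."""
    return effective_energy_per_cycle(1.0, leak_ratio, 1.0, duty)


# Assumed reading of the 220x figure: solve 1 + r * 999 = 220 for r.
leak_ratio = 219.0 / 999.0
overhead = standby_overhead_factor(leak_ratio, duty=0.001)
print(f"leak ratio = {leak_ratio:.3f}, overhead = {overhead:.1f}x")
```
</preformat>
<p>The (1 − <italic>α<sub>duty</sub></italic>)/<italic>α<sub>duty</sub></italic> term equals 999 for a 0.1% duty cycle, so even a leakage energy that is a modest fraction of <italic>E<sub>act</sub></italic> during an active cycle dominates the effective energy once stand-by periods are accounted for.</p>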
<p>To mitigate the <italic>E<sub>cycle</sub></italic> overhead, <italic>P<sub>leak</sub></italic> can be reduced either with an active-mode reduction technique or with a sleep-mode reduction technique [<xref ref-type="bibr" rid="b6-jlpea-01-00001">6</xref>]. Active-mode leakage reduction techniques rely on a <italic>V<sub>t</sub></italic> increase either globally in the whole circuit or selectively in gates from non-critical paths. At ultra-low voltage, a global <italic>V<sub>t</sub></italic> increase induces an exponential delay increase that requires a subsequent <italic>V<sub>dd</sub></italic> increase to maintain speed. If the <italic>V<sub>t</sub></italic> was already properly selected for making <italic>f<sub>min</sub></italic> meet the <italic>f<sub>target</sub></italic> of the application as proposed in Section 3, a global <italic>V<sub>t</sub></italic> increase will make the minimum <italic>V<sub>dd</sub></italic> for speed deviate from <italic>V<sub>min</sub></italic> and in turn increase <italic>E<sub>cycle</sub></italic> [<xref ref-type="bibr" rid="b6-jlpea-01-00001">6</xref>]. Moreover, a selective <italic>V<sub>t</sub></italic> increase in non-critical paths is not efficient at ultra-low voltages because the exponential delay dependence on <italic>V<sub>t</sub></italic> due to MOSFET subthreshold operation limits the high-<italic>V<sub>t</sub></italic> assignment to a few logic gates in very short paths [<xref ref-type="bibr" rid="b22-jlpea-01-00001">22</xref>]. Besides <italic>V<sub>t</sub></italic> increase, serial operation was proposed in [<xref ref-type="bibr" rid="b46-jlpea-01-00001">46</xref>] to limit the number of gates and thereby reduce <italic>P<sub>leak</sub></italic>. This is an efficient technique which comes at the cost of a more complex architectural design. 
In any case, active-mode leakage reduction techniques can only cut <italic>P<sub>leak</sub></italic> by a factor of 3–10× [<xref ref-type="bibr" rid="b6-jlpea-01-00001">6</xref>], which is not sufficient with <italic>α<sub>duty</sub></italic> values below 5%.</p>
<p>Therefore, a sleep-mode leakage reduction technique is preferred. Amongst them, power gating relies on the addition of a high-<italic>V<sub>t</sub></italic> sleep transistor to cut off the leakage path in sleep mode. The effective <italic>E<sub>cycle</sub></italic> can thus be expressed as [<xref ref-type="bibr" rid="b24-jlpea-01-00001">24</xref>]:
<disp-formula id="FD5">
<label>(5)</label>
<mml:math id="mm6" display="block">
<mml:semantics id="sm6">
<mml:mrow>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mrow>
<mml:mi>cycle</mml:mi></mml:mrow></mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mrow>
<mml:mi>act</mml:mi></mml:mrow></mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:mi>sleep</mml:mi></mml:mrow></mml:msub>
<mml:msub>
<mml:mi>T</mml:mi>
<mml:mrow>
<mml:mi>cycle</mml:mi></mml:mrow></mml:msub>
<mml:mo>×</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mi>α</mml:mi>
<mml:mrow>
<mml:mi>duty</mml:mi></mml:mrow></mml:msub></mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>α</mml:mi>
<mml:mrow>
<mml:mi>duty</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac>
<mml:mo>+</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mi>E</mml:mi>
<mml:mrow>
<mml:mi>wake-up</mml:mi></mml:mrow></mml:msub></mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:mi>cycles</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mrow></mml:semantics></mml:math></disp-formula> where <italic>P<sub>sleep</sub></italic> is the leakage power in stand-by mode, <italic>E<sub>wake-up</sub></italic> the energy required to wake up from sleep mode and <italic>N<sub>cycles</sub></italic> the number of cycles in active mode between two stand-by periods in sleep mode. Notice that both wake-up and sleep-mode energies are amortized over <italic>N<sub>cycles</sub></italic> to calculate the effective energy per active cycle <italic>E<sub>cycle</sub></italic>. For sleep-mode energy, this is done through the (1 − <italic>α<sub>duty</sub></italic>)/<italic>α<sub>duty</sub></italic> term, which also corresponds to the ratio between cycles in sleep and active modes. Wake-up energy can usually be neglected when <italic>N<sub>cycles</sub></italic> is high (e.g., above 80 cycles in [<xref ref-type="bibr" rid="b24-jlpea-01-00001">24</xref>]). As in nominal-<italic>V<sub>dd</sub></italic> operation, the sleep transistor introduces a series resistance on the supply rails, which degrades ULV logic delay [<xref ref-type="bibr" rid="b45-jlpea-01-00001">45</xref>]. Sizing the sleep transistor thus results from a trade-off between large <italic>P<sub>sleep</sub></italic> reduction for narrow sleep transistors and small delay overhead for wide sleep transistors. Indeed, the delay overhead requires a <italic>V<sub>dd</sub></italic> increase to meet the speed constraint, which in turn incurs an <italic>E<sub>cycle</sub></italic> penalty [<xref ref-type="bibr" rid="b45-jlpea-01-00001">45</xref>]. Moreover, we showed in [<xref ref-type="bibr" rid="b24-jlpea-01-00001">24</xref>] that the series resistance of the sleep transistor also reduces noise margins of ULV logic. The consequence on <italic>V<sub>limit</sub></italic> for functional robustness is even worse than on the minimum <italic>V<sub>dd</sub></italic> for speed. 
<xref ref-type="fig" rid="f10-jlpea-01-00001">Figure 10</xref> shows the impact of the sleep transistor sizing on <italic>P<sub>leak</sub></italic> reduction and the noise margin degradation at 0.35 V. A <italic>P<sub>leak</sub></italic> reduction by a factor of 100× reduces the noise margins by more than 50%, which makes ULV logic prone to functional failures. The impact of the sleep transistor on noise margins should thus be carefully addressed when designing a power-gated ULV circuit.</p>
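<p>The amortization in Equation (5) can be sketched as follows. This is a minimal illustration, not a model of the circuits measured in this paper: the function name is hypothetical, energies are normalized to <italic>E<sub>act</sub></italic>, and the stand-by leakage ratio (0.22), the wake-up energy (10 <italic>E<sub>act</sub></italic>) and <italic>N<sub>cycles</sub></italic> (1000) are assumed values. Only the 100× leakage reduction and the 0.1% duty cycle are taken from the text.</p>
<preformat preformat-type="code">
```python
def power_gated_energy_per_cycle(e_act, p_sleep, t_cycle, duty,
                                 e_wake_up, n_cycles):
    """Equation (5): effective energy per active cycle with power gating.

    Sleep-mode energy is amortized through (1 - duty) / duty and the
    wake-up energy through n_cycles.
    """
    standby = p_sleep * t_cycle * (1.0 - duty) / duty
    wake_up = e_wake_up / n_cycles
    return e_act + standby + wake_up


# Assumed, normalized operating point (E_act = 1, T_cycle = 1).
LEAK_RATIO = 0.22        # hypothetical P_leak * T_cycle / E_act without gating
DUTY = 0.001             # 0.1% duty cycle
REDUCTION = 100.0        # P_leak reduction factor of the sleep transistor

ungated = power_gated_energy_per_cycle(1.0, LEAK_RATIO, 1.0, DUTY, 0.0, 1)
gated = power_gated_energy_per_cycle(1.0, LEAK_RATIO / REDUCTION, 1.0, DUTY,
                                     e_wake_up=10.0, n_cycles=1000)
print(f"ungated: {ungated:.2f} E_act, gated: {gated:.2f} E_act")
```
</preformat>
<p>With these assumed numbers the wake-up term is small compared with the residual sleep-mode term, in line with the remark that wake-up energy can usually be neglected when <italic>N<sub>cycles</sub></italic> is high. The sketch ignores the <italic>V<sub>dd</sub></italic> increase needed to recover the delay and noise margin lost to the sleep transistor.</p>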
<p>In order to limit this noise margin degradation, we showed in [<xref ref-type="bibr" rid="b24-jlpea-01-00001">24</xref>] that standard-<italic>V<sub>t</sub></italic> (SVT) MOSFETs with an upsized gate length should be preferred as they usually show better subthreshold characteristics than high-<italic>V<sub>t</sub></italic> MOSFETs in 65/45 nm LP CMOS. As shown in <xref ref-type="fig" rid="f10-jlpea-01-00001">Figure 10</xref>, this optimum sleep transistor in 45 nm LP CMOS degrades the noise margins by less than 20% for a <italic>P<sub>leak</sub></italic> reduction of 100×. These results with the optimum sleep transistor further show that power gating is much more efficient than dynamic reverse body biasing in ULV circuits with long stand-by periods, as dynamic reverse body biasing only enables a 10× <italic>P<sub>leak</sub></italic> reduction [<xref ref-type="bibr" rid="b6-jlpea-01-00001">6</xref>].</p></sec>
<sec sec-type="conclusions">
<label>6.</label>
<title>Conclusions</title>
<p>Ultra-low-voltage (ULV) operation between 0.3 and 0.5 V leads to minimum-energy consumption at the expense of speed for ultra-low-power applications. However, ensuring robust and energy-efficient ULV operation in nanometer CMOS technologies raises a number of design challenges due to the strong short-channel effects, leakage currents and variability of these technologies. In this paper, we reviewed these challenges and the potential circuit solutions, as summarized in <xref ref-type="table" rid="t1-jlpea-01-00001">Table 1</xref>.</p>
<p>First, we set up a general framework for analyzing energy efficiency under timing and robustness constraints for the whole range of target clock frequencies <italic>f<sub>target</sub></italic> in ultra-low-power applications. We specifically took the impact of local <italic>V<sub>t</sub></italic> variations into account in this framework through statistical circuit simulation.</p>
<p>We then reported that the frequency of the minimum-energy point <italic>f<sub>min</sub></italic> can significantly differ from <italic>f<sub>target</sub></italic> with large <italic>E<sub>cycle</sub></italic> energy penalties. Process flavor and <italic>V<sub>t</sub></italic> selection in a versatile yet standard CMOS technology menu can be used to operate at the minimum-energy point under the timing constraint of the considered application, <italic>i.e.</italic>, make <italic>f<sub>min</sub></italic> meet <italic>f<sub>target</sub></italic>. We investigated the impact of global process/temperature variations on the timing constraint set by <italic>f<sub>target</sub></italic>. Low-temperature operation was shown to be a primary concern as it dramatically degrades delay and thereby involves large cycle time <italic>T<sub>cycle</sub></italic> guardbands. Adaptive voltage scaling was shown to be able to fix this at reasonable energy penalty.</p>
<p>We then analyzed how the minimum supply voltage for functionality <italic>V<sub>limit</sub></italic> is set by degraded noise margins and variability-induced clock skew. The first phenomenon induces soft errors due to increased noise sensitivity and even hard errors due to “stuck-at” faults. This can be fixed by gate length upsize and restriction of the logic gates within the standard-cell library to only low fan-in gates. The second phenomenon can lead to hold time violations and can be addressed by single-stage bufferization in the clock tree.</p>
<p>We finally analyzed the impact of stand-by periods on effective <italic>E<sub>cycle</sub></italic>. Applications with low duty cycles need a leakage reduction technique to reduce leakage power in stand-by mode. The power-gating technique is preferred thanks to its high leakage power reduction capability. However, the addition of the sleep transistor harms noise margins and thereby increases <italic>V<sub>limit</sub></italic>. This side effect can be effectively mitigated by the choice of an optimum sleep transistor.</p></sec></body>
<back>
<sec sec-type="display-objects">
<title>Figures and Table</title>
<fig id="f1-jlpea-01-00001" position="float">
<label>Figure 1.</label>
<caption>
<p>Maximum clock frequency <italic>f<sub>clk</sub></italic> and corresponding energy per cycle <italic>E<sub>cycle</sub></italic> at ultra-low voltage (SPICE simulations of an 8-bit multiplier [<xref ref-type="bibr" rid="b6-jlpea-01-00001">6</xref>] in 65 and 45 nm LP CMOS technologies, at 25 °C, nominal results).</p></caption>
<graphic xlink:href="jlpea-01-00001f1.gif"/></fig>
<fig id="f2-jlpea-01-00001" position="float">
<label>Figure 2.</label>
<caption>
<p>Minimum <italic>V<sub>dd</sub></italic> and energy per cycle <italic>E<sub>cycle</sub></italic> <italic>vs.</italic> the target frequency of the application <italic>f<sub>target</sub></italic> (SPICE simulations of an 8-bit multiplier [<xref ref-type="bibr" rid="b6-jlpea-01-00001">6</xref>] in 45 nm LP CMOS technology, at 25 °C; Monte-Carlo simulations address local variations through statistical extraction of worst-case speed and functional limits as well as mean <italic>I<sub>leak</sub></italic>).</p></caption>
<graphic xlink:href="jlpea-01-00001f2.gif"/></fig>
<fig id="f3-jlpea-01-00001" position="float">
<label>Figure 3.</label>
<caption>
<p>Measured speed for different CMOS flavors and <italic>V<sub>t</sub></italic>'s (measurements of 251-stage ring oscillators with FO1 inverters [<xref ref-type="bibr" rid="b25-jlpea-01-00001">25</xref>] in 65 nm LP/GP CMOS technology, at 25 °C, mean frequency of 20 measured dies).</p></caption>
<graphic xlink:href="jlpea-01-00001f3.gif"/></fig>
<fig id="f4-jlpea-01-00001" position="float">
<label>Figure 4.</label>
<caption>
<p>Distribution of maximum frequency with process and temperature variations (measurements of 251-stage ring oscillators with FO1 inverters [<xref ref-type="bibr" rid="b25-jlpea-01-00001">25</xref>] in 65 nm LP CMOS technology with simulation results of global process corners, <italic>L<sub>g</sub></italic> = 60 nm).</p></caption>
<graphic xlink:href="jlpea-01-00001f4.gif"/></fig>
<fig id="f5-jlpea-01-00001" position="float">
<label>Figure 5.</label>
<caption>
<p>Minimum <italic>V<sub>dd</sub></italic> for compensating temperature-induced speed variations (measurements of 251-stage ring oscillators with FO1 inverters [<xref ref-type="bibr" rid="b25-jlpea-01-00001">25</xref>] in 65 nm LP CMOS technology).</p></caption>
<graphic xlink:href="jlpea-01-00001f5.gif"/></fig>
<fig id="f6-jlpea-01-00001" position="float">
<label>Figure 6.</label>
<caption>
<p>Noise margin distribution of ULV logic (SPICE simulations of NAND2/NOR2 gates [<xref ref-type="bibr" rid="b33-jlpea-01-00001">33</xref>] in 45 nm LP CMOS technology, at 25 °C, 1 k Monte-Carlo runs).</p></caption>
<graphic xlink:href="jlpea-01-00001f6.gif"/></fig>
<fig id="f7-jlpea-01-00001" position="float">
<label>Figure 7.</label>
<caption>
<p>Functional yield at 0.3 V with a 20 mV constraint on minimum noise margin (SPICE simulations of NAND2/NOR2 gates in 45 nm LP CMOS technology, at 25 °C, 50 k Monte-Carlo runs with 95% confidence interval plotted).</p></caption>
<graphic xlink:href="jlpea-01-00001f7.gif"/></fig>
<fig id="f8-jlpea-01-00001" position="float">
<label>Figure 8.</label>
<caption>
<p><italic>V<sub>limit</sub></italic> distribution for two versions of a small logic circuit (measurements of an 8-bit AES coprocessor with 3500 gates [<xref ref-type="bibr" rid="b21-jlpea-01-00001">21</xref>] in 65 nm LP CMOS technology, at 25 °C). With standard distributed bufferization, hold time violations due to clock tree variability prevent reliable operation below 0.5 V. The use of a clock tree with a single bufferization stage significantly improves <italic>V<sub>limit</sub></italic> thanks to mitigation of hold time violations.</p></caption>
<graphic xlink:href="jlpea-01-00001f8.gif"/></fig>
<fig id="f9-jlpea-01-00001" position="float">
<label>Figure 9.</label>
<caption>
<p>Impact of stand-by periods on effective energy per cycle <italic>E<sub>cycle</sub></italic> (SPICE simulations of an 8-bit multiplier [<xref ref-type="bibr" rid="b6-jlpea-01-00001">6</xref>] in 45 nm LP CMOS technology, at 25 °C).</p></caption>
<graphic xlink:href="jlpea-01-00001f9.gif"/></fig>
<fig id="f10-jlpea-01-00001" position="float">
<label>Figure 10.</label>
<caption>
<p>Degradation of noise margins with sleep transistor sizing (SPICE simulations of an 8-bit multiplier for the leakage reduction [<xref ref-type="bibr" rid="b6-jlpea-01-00001">6</xref>] and of the NAND2/NOR2 circuit for noise margins [<xref ref-type="bibr" rid="b33-jlpea-01-00001">33</xref>] in 45 nm LP CMOS technology, at 25 °C; sleep transistor width is normalized to the total width of parallel NMOS branches).</p></caption>
<graphic xlink:href="jlpea-01-00001f10.gif"/></fig>
<table-wrap id="t1-jlpea-01-00001" position="float">
<label>Table 1.</label>
<caption>
<p>Design challenges for robust and energy-efficient ULV operation under timing constraints in 65/45 nm CMOS technologies.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center" valign="middle">Challenge</th>
<th align="center" valign="middle">Circuit consequence</th>
<th align="center" valign="middle">Preferred solution</th></tr></thead>
<tbody>
<tr>
<td align="center" valign="top">Mismatch between <italic>f<sub>target</sub></italic> and <italic>f<sub>min</sub></italic></td>
<td align="center" valign="top"><italic>E<sub>cycle</sub></italic> penalty</td>
<td align="center" valign="top">Process flavor &amp; <italic>V<sub>t</sub></italic> selection</td></tr>
<tr>
<td align="center" valign="top">Operation at −40 °C</td>
<td align="center" valign="top">Delay increase—<italic>T<sub>cycle</sub></italic> guardband</td>
<td align="center" valign="top">Adaptive voltage scaling</td></tr>
<tr>
<td align="center" valign="top">Degraded noise margins</td>
<td align="center" valign="top">Soft and hard errors—<italic>V<sub>limit</sub></italic> increase</td>
<td align="center" valign="top">Upsized <italic>L<sub>g</sub></italic> &amp; logic gate restriction</td></tr>
<tr>
<td align="center" valign="top">Variability-induced clock skew</td>
<td align="center" valign="top">Hold time violations—<italic>V<sub>limit</sub></italic> increase</td>
<td align="center" valign="top">Single-stage clock bufferization</td></tr>
<tr>
<td align="center" valign="top">Long stand-by periods</td>
<td align="center" valign="top">Effective <italic>E<sub>cycle</sub></italic> penalty</td>
<td align="center" valign="top">Power gating with opt. sleep transistor</td></tr>
</tbody></table></table-wrap></sec>
<ack>
<p>Dr. Bol is a postdoctoral researcher of the National Foundation for Scientific Research (FNRS) of Belgium at UCLouvain. Chip manufacturing was supported by the Walloon region of Belgium under the TABLOID and E.USER projects. The author would like to thank C. Hocquet from UCLouvain for his valuable help with chip measurements.</p></ack>
<ref-list>
<title>References</title>
<ref id="b1-jlpea-01-00001"><label>1.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bianchini</surname><given-names>R.</given-names></name><name><surname>Rajamony</surname><given-names>R.</given-names></name></person-group><article-title>Power and energy management for server systems</article-title><source>Computer</source><year>2004</year><volume>37</volume><fpage>68</fpage><lpage>76</lpage></citation></ref>
<ref id="b2-jlpea-01-00001"><label>2.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gammie</surname><given-names>G.</given-names></name><name><surname>Wang</surname><given-names>A.</given-names></name><name><surname>Mair</surname><given-names>H.</given-names></name><name><surname>Lagerquist</surname><given-names>R.</given-names></name><name><surname>Chau</surname><given-names>M.</given-names></name><name><surname>Royannez</surname><given-names>P.</given-names></name><name><surname>Gururajarao</surname><given-names>S.</given-names></name><name><surname>Ko</surname><given-names>U.</given-names></name></person-group><article-title>SmartReflex Power and Performance Management Technologies for 90 nm, 65 nm, and 45 nm Mobile Application Processors</article-title><source>Proc. IEEE</source><year>2010</year><volume>98</volume><fpage>144</fpage><lpage>159</lpage><pub-id pub-id-type="doi">10.1109/JPROC.2009.2034684</pub-id></citation></ref>
<ref id="b3-jlpea-01-00001"><label>3.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Wang</surname><given-names>A.</given-names></name><name><surname>Calhoun</surname><given-names>B.</given-names></name><name><surname>Chandrakasan</surname><given-names>A.</given-names></name></person-group><source>Sub-Threshold Design for Ultra-Low-Power Systems</source><publisher-name>Springer</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>2006</year></citation></ref>
<ref id="b4-jlpea-01-00001"><label>4.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Warneke</surname><given-names>B.</given-names></name><name><surname>Pister</surname><given-names>K.</given-names></name></person-group><article-title>An ultra-low energy microcontroller for Smart Dust wireless sensor networks</article-title><conf-name>Proceedings of the 2004 IEEE International Solid-State Circuits Conference</conf-name><conf-loc>San Francisco, CA, USA</conf-loc><year>2004</year><fpage>316</fpage><lpage>317</lpage></citation></ref>
<ref id="b5-jlpea-01-00001"><label>5.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Chen</surname><given-names>G.</given-names></name><name><surname>Fojtik</surname><given-names>M.</given-names></name><name><surname>Kim</surname><given-names>D.</given-names></name><name><surname>Fick</surname><given-names>D.</given-names></name><name><surname>Park</surname><given-names>J.</given-names></name><name><surname>Seok</surname><given-names>M.</given-names></name><name><surname>Chen</surname><given-names>M.T.</given-names></name><name><surname>Foo</surname><given-names>Z.</given-names></name><name><surname>Sylvester</surname><given-names>D.</given-names></name><name><surname>Blaauw</surname><given-names>D.</given-names></name></person-group><article-title>Millimeter-scale nearly perpetual sensor system with stacked battery and solar cells</article-title><conf-name>Proceedings of the 2010 IEEE International Solid-State Circuits Conference</conf-name><conf-loc>San Francisco, CA, USA</conf-loc><year>2010</year><fpage>288</fpage><lpage>289</lpage></citation></ref>
<ref id="b6-jlpea-01-00001"><label>6.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Bol</surname><given-names>D.</given-names></name><name><surname>Ambroise</surname><given-names>R.</given-names></name><name><surname>Flandre</surname><given-names>D.</given-names></name><name><surname>Legat</surname><given-names>J.D.</given-names></name></person-group><article-title>Analysis and minimization of practical energy in 45nm subthreshold logic circuits</article-title><conf-name>Proceedings of the 2008 IEEE International Conference on Computer Design</conf-name><conf-loc>Lake Tahoe, CA, USA</conf-loc><year>2008</year><fpage>294</fpage><lpage>300</lpage></citation></ref>
<ref id="b7-jlpea-01-00001"><label>7.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Swanson</surname><given-names>R.</given-names></name><name><surname>Meindl</surname><given-names>J.</given-names></name></person-group><article-title>Ion-implanted complementary MOS transistors in low-voltage circuits</article-title><source>IEEE J. Solid-State Circuits</source><year>1972</year><volume>7</volume><fpage>146</fpage><lpage>153</lpage><pub-id pub-id-type="doi">10.1109/JSSC.1972.1050260</pub-id></citation></ref>
<ref id="b8-jlpea-01-00001"><label>8.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vittoz</surname><given-names>E.</given-names></name><name><surname>Fellrath</surname><given-names>J.</given-names></name></person-group><article-title>CMOS analog integrated circuits based on weak inversion operations</article-title><source>IEEE J. Solid-State Circuits</source><year>1977</year><volume>12</volume><fpage>224</fpage><lpage>231</lpage><pub-id pub-id-type="doi">10.1109/JSSC.1977.1050882</pub-id></citation></ref>
<ref id="b9-jlpea-01-00001"><label>9.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Soeleman</surname><given-names>H.</given-names></name><name><surname>Roy</surname><given-names>K.</given-names></name></person-group><article-title>Ultra-low power digital subthreshold logic circuits</article-title><conf-name>Proceedings of the 1999 ACM/IEEE International Symposium on Low Power Electronics and Design</conf-name><conf-loc>San Diego, CA, USA</conf-loc><year>1999</year><fpage>94</fpage><lpage>96</lpage></citation></ref>
<ref id="b10-jlpea-01-00001"><label>10.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kao</surname><given-names>J.</given-names></name><name><surname>Miyazaki</surname><given-names>M.</given-names></name><name><surname>Chandrakasan</surname><given-names>A.</given-names></name></person-group><article-title>A 175-mV multiply-accumulate unit using an adaptive supply voltage and body bias architecture</article-title><source>IEEE J. Solid-State Circuits</source><year>2002</year><volume>37</volume><fpage>1545</fpage><lpage>1554</lpage><pub-id pub-id-type="doi">10.1109/JSSC.2002.803957</pub-id></citation></ref>
<ref id="b11-jlpea-01-00001"><label>11.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Calhoun</surname><given-names>B.</given-names></name><name><surname>Wang</surname><given-names>A.</given-names></name><name><surname>Chandrakasan</surname><given-names>A.</given-names></name></person-group><article-title>Modeling and sizing for minimum energy operation in subthreshold circuits</article-title><source>IEEE J. Solid-State Circuits</source><year>2005</year><volume>40</volume><fpage>1778</fpage><lpage>1786</lpage><pub-id pub-id-type="doi">10.1109/JSSC.2005.852162</pub-id></citation></ref>
<ref id="b12-jlpea-01-00001"><label>12.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bol</surname><given-names>D.</given-names></name><name><surname>Ambroise</surname><given-names>R.</given-names></name><name><surname>Flandre</surname><given-names>D.</given-names></name><name><surname>Legat</surname><given-names>J.D.</given-names></name></person-group><article-title>Interests and limitations of technology scaling for subthreshold logic</article-title><source>IEEE Trans. VLSI Syst.</source><year>2009</year><volume>17</volume><fpage>1508</fpage><lpage>1519</lpage><pub-id pub-id-type="doi">10.1109/TVLSI.2008.2005413</pub-id></citation></ref>
<ref id="b13-jlpea-01-00001"><label>13.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kwong</surname><given-names>J.</given-names></name><name><surname>Ramadass</surname><given-names>Y.</given-names></name><name><surname>Verma</surname><given-names>N.</given-names></name><name><surname>Chandrakasan</surname><given-names>A.</given-names></name></person-group><article-title>A 65 nm sub-<italic>V<sub>t</sub></italic> microcontroller with integrated SRAM and switched capacitor DC-DC converter</article-title><source>IEEE J. Solid-State Circuits</source><year>2009</year><volume>44</volume><fpage>115</fpage><lpage>126</lpage><pub-id pub-id-type="doi">10.1109/JSSC.2008.2007160</pub-id></citation></ref>
<ref id="b14-jlpea-01-00001"><label>14.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Sridhara</surname><given-names>S.</given-names></name><name><surname>DiRenzo</surname><given-names>M.</given-names></name><name><surname>Lingam</surname><given-names>S.</given-names></name><name><surname>Lee</surname><given-names>S.J.</given-names></name><name><surname>Blazquez</surname><given-names>R.</given-names></name><name><surname>Maxey</surname><given-names>J.</given-names></name><name><surname>Ghanem</surname><given-names>S.</given-names></name><name><surname>Lee</surname><given-names>Y.H.</given-names></name><name><surname>Abdallah</surname><given-names>R.</given-names></name><name><surname>Singh</surname><given-names>P.</given-names></name><name><surname>Goe</surname><given-names>M.</given-names></name></person-group><article-title>Microwatt embedded processor platform for medical system-on-chip applications</article-title><conf-name>Proceedings of the 2010 IEEE Symposium on VLSI Circuits (VLSIC)</conf-name><conf-loc>Honolulu, HI, USA</conf-loc><year>2010</year><fpage>15</fpage><lpage>16</lpage></citation></ref>
<ref id="b15-jlpea-01-00001"><label>15.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Zhai</surname><given-names>B.</given-names></name><name><surname>Nazhandali</surname><given-names>L.</given-names></name><name><surname>Olson</surname><given-names>J.</given-names></name><name><surname>Reeves</surname><given-names>A.</given-names></name><name><surname>Minuth</surname><given-names>M.</given-names></name><name><surname>Helfand</surname><given-names>R.</given-names></name><name><surname>Pant</surname><given-names>S.</given-names></name><name><surname>Blaauw</surname><given-names>D.</given-names></name><name><surname>Austin</surname><given-names>T.</given-names></name></person-group><article-title>A 2.60 pJ/Inst Subthreshold Sensor Processor for Optimal Energy Efficiency</article-title><conf-name>Proceedings of the 2006 IEEE Symposium on VLSI Circuits (VLSIC)</conf-name><conf-loc>Honolulu, HI, USA</conf-loc><year>2006</year></citation></ref>
<ref id="b16-jlpea-01-00001"><label>16.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname><given-names>C.I.</given-names></name><name><surname>Soeleman</surname><given-names>H.</given-names></name><name><surname>Roy</surname><given-names>K.</given-names></name></person-group><article-title>Ultra-low-power DLMS adaptive filter for hearing aid applications</article-title><source>IEEE Trans. VLSI Syst.</source><year>2003</year><volume>11</volume><fpage>1058</fpage><lpage>1067</lpage><pub-id pub-id-type="doi">10.1109/TVLSI.2003.819573</pub-id></citation></ref>
<ref id="b17-jlpea-01-00001"><label>17.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Jocke</surname><given-names>S.C.</given-names></name><name><surname>Bolus</surname><given-names>J.F.</given-names></name><name><surname>Wooters</surname><given-names>S.N.</given-names></name><name><surname>Jurik</surname><given-names>A.D.</given-names></name><name><surname>Weaver</surname><given-names>A.C.</given-names></name><name><surname>Blalock</surname><given-names>T.N.</given-names></name><name><surname>Calhoun</surname><given-names>B.H.</given-names></name></person-group><article-title>A 2.6-<italic>μW</italic> sub-threshold mixed-signal ECG SoC</article-title><conf-name>Proceedings of the 2009 IEEE Symposium on VLSI Circuits (VLSIC)</conf-name><conf-loc>Kyoto, Japan</conf-loc><year>2009</year><fpage>60</fpage><lpage>61</lpage></citation></ref>
<ref id="b18-jlpea-01-00001"><label>18.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Sze</surname><given-names>V.</given-names></name><name><surname>Chandrakasan</surname><given-names>A.</given-names></name></person-group><article-title>A 0.4-V UWB baseband processor</article-title><conf-name>Proceedings of the 2007 ACM/IEEE International Symposium on Low Power Electronics and Design</conf-name><conf-loc>Portland, OR, USA</conf-loc><year>2007</year><fpage>262</fpage><lpage>267</lpage></citation></ref>
<ref id="b19-jlpea-01-00001"><label>19.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sze</surname><given-names>V.</given-names></name><name><surname>Finchelstein</surname><given-names>D.</given-names></name><name><surname>Sinangil</surname><given-names>M.</given-names></name><name><surname>Chandrakasan</surname><given-names>A.</given-names></name></person-group><article-title>A 0.7-V 1.8-mW H.264/AVC 720p Video Decoder</article-title><source>IEEE J. Solid-State Circuits</source><year>2009</year><volume>44</volume><fpage>2943</fpage><lpage>2956</lpage><pub-id pub-id-type="doi">10.1109/JSSC.2009.2028933</pub-id></citation></ref>
<ref id="b20-jlpea-01-00001"><label>20.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pu</surname><given-names>Y.</given-names></name><name><surname>Pineda de Gyvez</surname><given-names>J.</given-names></name><name><surname>Corporaal</surname><given-names>H.</given-names></name><name><surname>Ha</surname><given-names>Y.</given-names></name></person-group><article-title>An Ultra-Low-Energy Multi-Standard JPEG Co-Processor in 65 nm CMOS With Sub/Near Threshold Supply Voltage</article-title><source>IEEE J. Solid-State Circuits</source><year>2010</year><volume>45</volume><fpage>668</fpage><lpage>680</lpage><pub-id pub-id-type="doi">10.1109/JSSC.2009.2039684</pub-id></citation></ref>
<ref id="b21-jlpea-01-00001"><label>21.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hocquet</surname><given-names>C.</given-names></name><name><surname>Kamel</surname><given-names>D.</given-names></name><name><surname>Regazzoni</surname><given-names>F.</given-names></name><name><surname>Legat</surname><given-names>J.D.</given-names></name><name><surname>Flandre</surname><given-names>D.</given-names></name><name><surname>Bol</surname><given-names>D.</given-names></name><name><surname>Standaert</surname><given-names>F.-X.</given-names></name></person-group><article-title>Harvesting the potential of nano-CMOS for lightweight cryptography: An ultra-low-voltage 65 nm AES coprocessor for passive RFID tags</article-title><source>J. Cryptogr. Eng.</source><year>2011</year><volume>1</volume><fpage>8</fpage></citation></ref>
<ref id="b22-jlpea-01-00001"><label>22.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Bol</surname><given-names>D.</given-names></name><name><surname>Flandre</surname><given-names>D.</given-names></name><name><surname>Legat</surname><given-names>J.D.</given-names></name></person-group><article-title>Technology flavor selection and adaptive techniques for timing-constrained 45 nm subthreshold circuits</article-title><conf-name>Proceedings of the 2009 ACM/IEEE International Symposium on Low Power Electronics and Design</conf-name><conf-loc>San Francisco, CA, USA</conf-loc><year>2009</year><fpage>21</fpage><lpage>26</lpage></citation></ref>
<ref id="b23-jlpea-01-00001"><label>23.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Bol</surname><given-names>D.</given-names></name><name><surname>Kamel</surname><given-names>D.</given-names></name><name><surname>Flandre</surname><given-names>D.</given-names></name><name><surname>Legat</surname><given-names>J.D.</given-names></name></person-group><article-title>Nanometer MOSFET effects on the minimum-energy point of 45 nm subthreshold logic</article-title><conf-name>Proceedings of the 2009 ACM/IEEE International Symposium on Low Power Electronics and Design</conf-name><conf-loc>San Francisco, CA, USA</conf-loc><year>2009</year><fpage>3</fpage><lpage>8</lpage></citation></ref>
<ref id="b24-jlpea-01-00001"><label>24.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Bol</surname><given-names>D.</given-names></name><name><surname>Hocquet</surname><given-names>C.</given-names></name><name><surname>Flandre</surname><given-names>D.</given-names></name><name><surname>Legat</surname><given-names>J.D.</given-names></name></person-group><article-title>Robustness-aware sleep transistor engineering for power-gated nanometer subthreshold circuits</article-title><conf-name>Proceedings of the 2010 IEEE International Symposium on Circuits and Systems</conf-name><conf-loc>Paris, France</conf-loc><year>2010</year><fpage>1484</fpage><lpage>1487</lpage></citation></ref>
<ref id="b25-jlpea-01-00001"><label>25.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Bol</surname><given-names>D.</given-names></name><name><surname>Hocquet</surname><given-names>C.</given-names></name><name><surname>Flandre</surname><given-names>D.</given-names></name><name><surname>Legat</surname><given-names>J.D.</given-names></name></person-group><article-title>The detrimental impact of negative Celsius temperature on ultra-low-voltage CMOS logic</article-title><conf-name>Proceedings of the 2010 European Solid-State Circuits Conference</conf-name><conf-loc>Seville, Spain</conf-loc><year>2010</year><fpage>522</fpage><lpage>525</lpage></citation></ref>
<ref id="b26-jlpea-01-00001"><label>26.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dennard</surname><given-names>R.</given-names></name><name><surname>Gaensslen</surname><given-names>F.</given-names></name><name><surname>Rideout</surname><given-names>V.</given-names></name><name><surname>Bassous</surname><given-names>E.</given-names></name><name><surname>LeBlanc</surname><given-names>A.</given-names></name></person-group><article-title>Design of ion-implanted MOSFET's with very small physical dimensions</article-title><source>IEEE J. Solid-State Circuits</source><year>1974</year><volume>9</volume><fpage>256</fpage><lpage>268</lpage><pub-id pub-id-type="doi">10.1109/JSSC.1974.1050511</pub-id></citation></ref>
<ref id="b27-jlpea-01-00001"><label>27.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Roy</surname><given-names>K.</given-names></name><name><surname>Mukhopadhyay</surname><given-names>S.</given-names></name><name><surname>Mahmoodi-Meimand</surname><given-names>H.</given-names></name></person-group><article-title>Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits</article-title><source>Proc. IEEE</source><year>2003</year><volume>91</volume><fpage>305</fpage><lpage>327</lpage><pub-id pub-id-type="doi">10.1109/JPROC.2002.808156</pub-id></citation></ref>
<ref id="b28-jlpea-01-00001"><label>28.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Asenov</surname><given-names>A.</given-names></name><name><surname>Brown</surname><given-names>A.</given-names></name><name><surname>Davies</surname><given-names>J.</given-names></name><name><surname>Kaya</surname><given-names>S.</given-names></name><name><surname>Slavcheva</surname><given-names>G.</given-names></name></person-group><article-title>Simulation of intrinsic parameter fluctuations in decananometer and nanometer-scale MOSFETs</article-title><source>IEEE Trans. Electron Devices</source><year>2003</year><volume>50</volume><fpage>1837</fpage><lpage>1852</lpage><pub-id pub-id-type="doi">10.1109/TED.2003.815862</pub-id></citation></ref>
<ref id="b29-jlpea-01-00001"><label>29.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bol</surname><given-names>D.</given-names></name><name><surname>Flandre</surname><given-names>D.</given-names></name><name><surname>Legat</surname><given-names>J.D.</given-names></name></person-group><article-title>Nanometer MOSFET Effects on the Minimum-Energy Point of Sub-45 nm Subthreshold Logic—Mitigation at Technology and Circuit Levels</article-title><source>ACM Trans. Design Autom. Electr. Syst.</source><year>2010</year><volume>16</volume><fpage>2</fpage><lpage>26</lpage></citation></ref>
<ref id="b30-jlpea-01-00001"><label>30.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Seok</surname><given-names>M.</given-names></name><name><surname>Sylvester</surname><given-names>D.</given-names></name><name><surname>Blaauw</surname><given-names>D.</given-names></name></person-group><article-title>Optimal technology selection for minimizing energy and variability in low voltage applications</article-title><conf-name>Proceedings of the 2008 ACM/IEEE International Symposium on Low Power Electronics and Design</conf-name><conf-loc>Bangalore, India</conf-loc><year>2008</year><fpage>9</fpage><lpage>14</lpage></citation></ref>
<ref id="b31-jlpea-01-00001"><label>31.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Zhai</surname><given-names>B.</given-names></name><name><surname>Hanson</surname><given-names>S.</given-names></name><name><surname>Blaauw</surname><given-names>D.</given-names></name><name><surname>Sylvester</surname><given-names>D.</given-names></name></person-group><article-title>Analysis and mitigation of variability in subthreshold design</article-title><conf-name>Proceedings of the 2005 ACM/IEEE International Symposium on Low Power Electronics and Design</conf-name><conf-loc>San Diego, CA, USA</conf-loc><year>2005</year><fpage>20</fpage><lpage>25</lpage></citation></ref>
<ref id="b32-jlpea-01-00001"><label>32.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Kamel</surname><given-names>D.</given-names></name><name><surname>Hocquet</surname><given-names>C.</given-names></name><name><surname>Standaert</surname><given-names>F.X.</given-names></name><name><surname>Flandre</surname><given-names>D.</given-names></name><name><surname>Bol</surname><given-names>D.</given-names></name></person-group><article-title>Glitch-induced within-die variations of dynamic energy in voltage-scaled nano-CMOS circuits</article-title><conf-name>Proceedings of the 2010 European Solid-State Circuits Conference</conf-name><conf-loc>Seville, Spain</conf-loc><year>2010</year><fpage>518</fpage><lpage>521</lpage></citation></ref>
<ref id="b33-jlpea-01-00001"><label>33.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Kwong</surname><given-names>J.</given-names></name><name><surname>Chandrakasan</surname><given-names>A.</given-names></name></person-group><article-title>Variation-Driven Device Sizing for Minimum Energy Sub-threshold Circuits</article-title><conf-name>Proceedings of the 2006 ACM/IEEE International Symposium on Low Power Electronics and Design</conf-name><conf-loc>Tegernsee, Germany</conf-loc><year>2006</year><fpage>8</fpage><lpage>13</lpage></citation></ref>
<ref id="b34-jlpea-01-00001"><label>34.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Alioto</surname><given-names>M.</given-names></name></person-group><article-title>Understanding DC Behavior of Subthreshold CMOS Logic Through Closed-Form Analysis</article-title><source>IEEE Trans. Circuits Syst. I</source><year>2010</year><volume>57</volume><fpage>1597</fpage><lpage>1607</lpage><pub-id pub-id-type="doi">10.1109/TCSI.2009.2034233</pub-id></citation></ref>
<ref id="b35-jlpea-01-00001"><label>35.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Nanua</surname><given-names>M.</given-names></name><name><surname>Blaauw</surname><given-names>D.</given-names></name></person-group><article-title>Investigating Crosstalk in Sub-Threshold Circuits</article-title><conf-name>Proceedings of the 8th International Symposium on Quality Electronic Design</conf-name><conf-loc>San Jose, CA, USA</conf-loc><year>2007</year><fpage>639</fpage><lpage>646</lpage></citation></ref>
<ref id="b36-jlpea-01-00001"><label>36.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dhillon</surname><given-names>Y.</given-names></name><name><surname>Diril</surname><given-names>A.</given-names></name><name><surname>Chatterjee</surname><given-names>A.</given-names></name><name><surname>Singh</surname><given-names>A.</given-names></name></person-group><article-title>Analysis and optimization of nanometer CMOS circuits for soft-error tolerance</article-title><source>IEEE Trans. VLSI Syst.</source><year>2006</year><volume>14</volume><fpage>514</fpage><lpage>524</lpage><pub-id pub-id-type="doi">10.1109/TVLSI.2006.876104</pub-id></citation></ref>
<ref id="b37-jlpea-01-00001"><label>37.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Verma</surname><given-names>N.</given-names></name><name><surname>Kwong</surname><given-names>J.</given-names></name><name><surname>Chandrakasan</surname><given-names>A.</given-names></name></person-group><article-title>Nanometer MOSFET variation in minimum energy subthreshold circuits</article-title><source>IEEE Trans. Electron Devices</source><year>2008</year><volume>55</volume><fpage>163</fpage><lpage>174</lpage><pub-id pub-id-type="doi">10.1109/TED.2007.911352</pub-id></citation></ref>
<ref id="b38-jlpea-01-00001"><label>38.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Niiyama</surname><given-names>T.</given-names></name><name><surname>Piao</surname><given-names>Z.</given-names></name><name><surname>Ishida</surname><given-names>K.</given-names></name><name><surname>Murakata</surname><given-names>M.</given-names></name><name><surname>Takamiya</surname><given-names>M.</given-names></name><name><surname>Sakurai</surname><given-names>T.</given-names></name></person-group><article-title>Increasing minimum operating voltage (VDDmin) with number of CMOS logic gates and experimental verification with up to 1 Mega-stage ring oscillators</article-title><conf-name>Proceedings of the 2008 ACM/IEEE International Symposium on Low Power Electronics and Design</conf-name><conf-loc>Bangalore, India</conf-loc><year>2008</year><fpage>117</fpage><lpage>122</lpage></citation></ref>
<ref id="b39-jlpea-01-00001"><label>39.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Pu</surname><given-names>Y.</given-names></name><name><surname>de Gyvez</surname><given-names>J.</given-names></name><name><surname>Corporaal</surname><given-names>H.</given-names></name><name><surname>Ha</surname><given-names>Y.</given-names></name></person-group><article-title>Statistical noise margin estimation for sub-threshold combinational circuits</article-title><conf-name>Proceedings of the 13th Asia and South Pacific Design Automation Conference</conf-name><conf-loc>Seoul, Korea</conf-loc><year>2008</year><fpage>176</fpage><lpage>179</lpage></citation></ref>
<ref id="b40-jlpea-01-00001"><label>40.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Pu</surname><given-names>Y.</given-names></name><name><surname>de Jesus Pineda de Gyvez</surname><given-names>J.</given-names></name><name><surname>Corporaal</surname><given-names>H.</given-names></name><name><surname>Ha</surname><given-names>Y.</given-names></name></person-group><article-title>Vt balancing and device sizing towards high yield of sub-threshold static logic gates</article-title><conf-name>Proceedings of the 2007 ACM/IEEE International Symposium on Low Power Electronics and Design</conf-name><conf-loc>Portland, OR, USA</conf-loc><year>2007</year><fpage>355</fpage><lpage>358</lpage></citation></ref>
<ref id="b41-jlpea-01-00001"><label>41.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hwang</surname><given-names>M.E.</given-names></name><name><surname>Roy</surname><given-names>K.</given-names></name></person-group><article-title>ABRM: Adaptive <italic>β</italic>-Ratio Modulation for Process-Tolerant Ultradynamic Voltage Scaling</article-title><source>IEEE Trans. VLSI Syst.</source><year>2010</year><volume>18</volume><fpage>281</fpage><lpage>290</lpage><pub-id pub-id-type="doi">10.1109/TVLSI.2008.2010767</pub-id></citation></ref>
<ref id="b42-jlpea-01-00001"><label>42.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Seok</surname><given-names>M.</given-names></name><name><surname>Blaauw</surname><given-names>D.</given-names></name><name><surname>Sylvester</surname><given-names>D.</given-names></name></person-group><article-title>Clock network design for ultra-low power applications</article-title><conf-name>Proceedings of the 2010 ACM/IEEE International Symposium on Low Power Electronics and Design</conf-name><conf-loc>Austin, TX, USA</conf-loc><year>2010</year><fpage>271</fpage><lpage>276</lpage></citation></ref>
<ref id="b43-jlpea-01-00001"><label>43.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hanson</surname><given-names>S.</given-names></name><name><surname>Zhai</surname><given-names>B.</given-names></name><name><surname>Seok</surname><given-names>M.</given-names></name><name><surname>Cline</surname><given-names>B.</given-names></name><name><surname>Zhou</surname><given-names>K.</given-names></name><name><surname>Singhal</surname><given-names>M.</given-names></name><name><surname>Minuth</surname><given-names>M.</given-names></name><name><surname>Olson</surname><given-names>J.</given-names></name><name><surname>Nazhandali</surname><given-names>L.</given-names></name><name><surname>Austin</surname><given-names>T.</given-names></name><name><surname>Sylvester</surname><given-names>D.</given-names></name><name><surname>Blaauw</surname><given-names>D.</given-names></name></person-group><article-title>Exploring Variability and Performance in a Sub-200-mV Processor</article-title><source>IEEE J. Solid-State Circuits</source><year>2008</year><volume>43</volume><fpage>881</fpage><lpage>891</lpage><pub-id pub-id-type="doi">10.1109/JSSC.2008.917505</pub-id></citation></ref>
<ref id="b44-jlpea-01-00001"><label>44.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hanson</surname><given-names>S.</given-names></name><name><surname>Seok</surname><given-names>M.</given-names></name><name><surname>Lin</surname><given-names>Y.S.</given-names></name><name><surname>Foo</surname><given-names>Z.Y.</given-names></name><name><surname>Kim</surname><given-names>D.</given-names></name><name><surname>Lee</surname><given-names>Y.</given-names></name><name><surname>Liu</surname><given-names>N.</given-names></name><name><surname>Sylvester</surname><given-names>D.</given-names></name><name><surname>Blaauw</surname><given-names>D.</given-names></name></person-group><article-title>A Low-Voltage Processor for Sensing Applications With Picowatt Standby Mode</article-title><source>IEEE J. Solid-State Circuits</source><year>2009</year><volume>44</volume><fpage>1145</fpage><lpage>1155</lpage><pub-id pub-id-type="doi">10.1109/JSSC.2009.2014205</pub-id></citation></ref>
<ref id="b45-jlpea-01-00001"><label>45.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Seok</surname><given-names>M.</given-names></name><name><surname>Hanson</surname><given-names>S.</given-names></name><name><surname>Sylvester</surname><given-names>D.</given-names></name><name><surname>Blaauw</surname><given-names>D.</given-names></name></person-group><article-title>Analysis and optimization of sleep modes in subthreshold circuit design</article-title><conf-name>Proceedings of the 44th ACM/IEEE Design Automation Conference</conf-name><conf-loc>San Diego, CA, USA</conf-loc><year>2007</year><fpage>604</fpage><lpage>699</lpage></citation></ref>
<ref id="b46-jlpea-01-00001"><label>46.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Khanna</surname><given-names>S.</given-names></name><name><surname>Calhoun</surname><given-names>B.H.</given-names></name></person-group><article-title>Serial sub-threshold circuits for ultra-low-power systems</article-title><conf-name>Proceedings of the 14th ACM/IEEE International Symposium on Low Power Electronics and Design</conf-name><conf-loc>New York, NY, USA</conf-loc><year>2009</year><fpage>27</fpage><lpage>32</lpage></citation></ref></ref-list></back></article>
