<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xml:lang="en" article-type="research-article">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">jlpea</journal-id>
      <journal-title>Journal of Low Power Electronics and Applications</journal-title>
      <abbrev-journal-title abbrev-type="publisher">JLPEA</abbrev-journal-title>
      <abbrev-journal-title abbrev-type="pubmed">JLPEA</abbrev-journal-title>
      <issn pub-type="epub">2079-9268</issn>
      <publisher>
        <publisher-name>MDPI</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.3390/jlpea2020180</article-id>
      <article-id pub-id-type="publisher-id">jlpea-02-00180</article-id>
      <article-categories>
        <subj-group>
          <subject>Article</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Timing-Error Detection Design Considerations in Subthreshold: An 8-bit Microprocessor in 65 nm CMOS</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Mäkipää</surname>
            <given-names>Jani</given-names>
          </name>
          <xref rid="af1-jlpea-02-00180" ref-type="aff">1</xref>
          <xref rid="c1-jlpea-02-00180" ref-type="corresp">*</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Turnquist</surname>
            <given-names>Matthew J.</given-names>
          </name>
          <xref rid="af2-jlpea-02-00180" ref-type="aff">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Laulainen</surname>
            <given-names>Erkka</given-names>
          </name>
          <xref rid="af2-jlpea-02-00180" ref-type="aff">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Koskinen</surname>
            <given-names>Lauri</given-names>
          </name>
          <xref rid="af2-jlpea-02-00180" ref-type="aff">2</xref>
        </contrib>
      </contrib-group>
      <aff id="af1-jlpea-02-00180"><label>1 </label>VTT Technical Research Centre of Finland, FI-02044 VTT, Finland</aff>
      <aff id="af2-jlpea-02-00180"><label>2 </label>Department of Micro- and Nanosciences, Aalto University, FI-00076 Aalto, Finland; Email: <email>matthew.turnquist@aalto.fi</email> (M.J.T.); <email>elaulain@ecdl.tkk.fi</email> (E.L.); <email>lkoskine@ecdl.tkk.fi</email> (L.K.)</aff>
      <author-notes>
        <corresp id="c1-jlpea-02-00180"><label>*</label> Author to whom correspondence should be addressed; Email: <email>jani.makipää@vtt.fi</email>; Tel.: +358-(0)20-722-111; Fax: +358-(0)20-722-7001. </corresp>
      </author-notes>
      <pub-date pub-type="epub">
        <day>06</day>
        <month>06</month>
        <year>2012</year>
      </pub-date>
      <pub-date pub-type="collection">
        <month>06</month>
        <year>2012</year>
      </pub-date>
      <volume>2</volume>
      <issue>2</issue>
      <fpage>180</fpage>
      <lpage>196</lpage>
      <history>
        <date date-type="received">
          <day>02</day>
          <month>03</month>
          <year>2012</year>
        </date>
        <date date-type="rev-recd">
          <day>29</day>
          <month>05</month>
          <year>2012</year>
        </date>
        <date date-type="accepted">
          <day>30</day>
          <month>05</month>
          <year>2012</year>
        </date>
      </history>
      <permissions>
        <copyright-statement>© 2012 by the authors; licensee MDPI, Basel, Switzerland.</copyright-statement>
        <copyright-year>2012</copyright-year>
        <license xmlns:xlink="http://www.w3.org/1999/xlink" license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/3.0/">
          <p>This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).</p>
        </license>
      </permissions>
      <abstract>
        <p>This paper presents the first known timing-error detection (TED) microprocessor able to operate in subthreshold. Since the minimum energy point (MEP) of static CMOS logic is in subthreshold, there is a strong motivation to design ultra-low-power systems that can operate in this region. However, exponential dependencies in subthreshold require systems that either have excessively large safety margins or utilize adaptive techniques. Typically, these techniques include replica paths, sensors, or TED. Each of these methods adds system complexity, area, and energy overhead. As a run-time technique, TED is the only method that accounts for both local and global variations. The microprocessor presented in this paper utilizes adaptable error-detection sequential (EDS) circuits that can adjust to process and environmental variations. The results demonstrate the feasibility of the microprocessor, as well as energy savings of up to 28%, when using the TED method in subthreshold. The microprocessor is an 8-bit core, which is compatible with a commercial microcontroller. The microprocessor is fabricated in 65 nm CMOS, consumes as little as 4.35 pJ/instruction, occupies an area of 50,000 μm<sup>2</sup>, and operates down to 300 mV.</p>
      </abstract>
      <kwd-group>
        <kwd>subthreshold</kwd>
        <kwd>ultra-low-power</kwd>
        <kwd>timing-error detection</kwd>
        <kwd>subthreshold source-coupled logic</kwd>
        <kwd>SCL</kwd>
        <kwd>weak inversion</kwd>
        <kwd>dynamic supply voltage</kwd>
        <kwd>dynamic voltage scaling</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec sec-type="intro">
      <title>1. Introduction</title>
      <p>Exploiting the full potential of ubiquitous ambient intelligence, smart sensor networks, and energy harvesting requires extremely low-power processing. One of the saving graces is that, in many cases, power can be traded for performance, and thus the main target in these systems should be low energy per operation. Targeting low energy per operation, while simultaneously taking advantage of the relaxed performance requirements, can mainly be achieved by using a lower operating voltage. Low energy (and low power) operation extends the operating time of the systems, which reduces maintenance costs, device size, and unit cost. Systems with a small form-factor and low energy operation can also utilize alternate energy sources (e.g., they can harvest energy from body heat). These systems might be deployed in smart sensor network applications where it is cost-prohibitive or not feasible to replace batteries [<xref ref-type="bibr" rid="B1-jlpea-02-00180">1</xref>,<xref ref-type="bibr" rid="B2-jlpea-02-00180">2</xref>,<xref ref-type="bibr" rid="B3-jlpea-02-00180">3</xref>].</p>
      <p>In addition to sensor networks, a large number of applications exist that benefit from extremely low energy processing. One application that benefits from low energy processing is a fully autonomous robot capable of learning and adapting. The intelligence behind such robots is likely to be enabled by neuromorphic algorithms [<xref ref-type="bibr" rid="B4-jlpea-02-00180">4</xref>]. Such algorithms are inherently parallelizable, run efficiently on architectures resembling graphics processing units (GPU), and, if parallelized sufficiently, do not require high performance in a single processing element. Therefore, the brain behind a future small autonomous robot could very likely be a massively parallel computing unit running at a low energy point for a single processing node.</p>
      <p>For CMOS static logic technologies down to 45 nm, the minimum energy per operation point (MEP) is achieved in the subthreshold operation region [<xref ref-type="bibr" rid="B3-jlpea-02-00180">3</xref>,<xref ref-type="bibr" rid="B5-jlpea-02-00180">5</xref>], thereby making subthreshold operation a target for the above-mentioned applications. However, design for the subthreshold region is more complicated than for strong inversion. The effects of process, supply voltage, temperature, and aging (PVTA) variance are amplified in the subthreshold region due to the exponential dependency of the subthreshold current on parameters that are susceptible to PVTA variance. Without intelligent design solutions, countering the increased variance effects requires large design margins or individual post-fabrication measurements of the components. In terms of these options, the former negates the minimum energy operation, while the latter increases production costs considerably. Further, in a massively parallel system these measurements would have to be performed separately for each processing node. Otherwise, the system would operate at the speed of the slowest node. In strong inversion, a popular solution for overcoming margining has been to use canary (replica) circuits [<xref ref-type="bibr" rid="B6-jlpea-02-00180">6</xref>]. However, canary circuits cannot compensate for local variations and, therefore, they are not suitable for subthreshold operation.</p>
      <p>To compensate for global and local variations, timing-error detection (TED) can be used [<xref ref-type="bibr" rid="B6-jlpea-02-00180">6</xref>]. By allowing for the detection and correction of timing errors, TED systems are able to reduce the safety margins required to ensure the correct timing under PVTA variations [<xref ref-type="bibr" rid="B6-jlpea-02-00180">6</xref>,<xref ref-type="bibr" rid="B7-jlpea-02-00180">7</xref>,<xref ref-type="bibr" rid="B8-jlpea-02-00180">8</xref>]. Furthermore, TED can be used to mitigate thermal and power supply variations across the chip in massively parallel systems and to take into account the effects of aging without extra effort.</p>
      <p>This paper presents a subthreshold TED microprocessor, which could represent a computation node for a future, massively parallel system. To our knowledge, this is the first subthreshold TED system. The paper is organized as follows. <xref ref-type="sec" rid="sec2-jlpea-02-00180">Section 2</xref> explains the characteristics and benefits of subthreshold design, describes the motivation behind using TED techniques, and discusses previous works on subthreshold and TED. <xref ref-type="sec" rid="sec3-jlpea-02-00180">Section 3</xref> explains the architecture and operation of the subthreshold TED microprocessor that we designed. <xref ref-type="sec" rid="sec4-jlpea-02-00180">Section 4</xref> provides design and measurement results. Finally, we present our conclusions in <xref ref-type="sec" rid="sec5-jlpea-02-00180">Section 5</xref>.</p>
    </sec>
    <sec id="sec2-jlpea-02-00180">
      <title>2. Background</title>
      <sec>
        <title>2.1. Minimum Energy Point and Subthreshold Operation</title>
        <p>The minimum energy point (MEP) denotes the operating point where the energy per operation is minimized. The energy per operation is composed of switching and leakage energy. Theoretically, the MEP for static CMOS logic depends on the technology [<xref ref-type="bibr" rid="B5-jlpea-02-00180">5</xref>]. For a given technology, the absolute MEP is tied to a certain threshold voltage. For newer technologies, there is typically a choice of devices with different threshold voltages (e.g., high threshold voltage, HVT, or low threshold voltage, LVT). These devices have their own respective MEPs which may differ from the absolute MEP. When the threshold voltage is fixed, the MEP is mainly dependent upon the technology and activity factor. For example, a 90 nm CMOS process has a MEP that ranges from 250 mV to 400 mV depending on the architecture and activity factor [<xref ref-type="bibr" rid="B3-jlpea-02-00180">3</xref>,<xref ref-type="bibr" rid="B5-jlpea-02-00180">5</xref>]. </p>
        <p>The MEP is situated in the subthreshold region for technologies down to 45 nm [<xref ref-type="bibr" rid="B5-jlpea-02-00180">5</xref>]. <xref ref-type="fig" rid="jlpea-02-00180-f001">Figure 1</xref> shows the MEP for a 65 nm process. A ring oscillator, with an activity factor (α) of 0.1, was used to generate the MEP curves. Different process corners change the leakage energy and, thus, change the MEP.</p>
        <fig id="jlpea-02-00180-f001" position="anchor">
          <label>Figure 1</label>
          <caption>
            <p>Simulation of a 65 nm ring oscillator with an activity factor (α) of 0.1. The energy per operation (E/op) is normalized to the SS corner.</p>
          </caption>
          <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jlpea-02-00180-g001.tif"/>
        </fig>
        <p>As shown in <xref ref-type="fig" rid="jlpea-02-00180-f001">Figure 1</xref>, the MEP of 65 nm CMOS lies in the subthreshold region. The functional Boolean design of static CMOS gates for the subthreshold region is comparable to a design for the strong inversion region, with a few exceptions that arise mainly from logic-level deterioration caused by leakage. However, in the subthreshold region, <italic>I<sub>ds</sub></italic> has exponential dependencies [<xref ref-type="bibr" rid="B3-jlpea-02-00180">3</xref>]:</p>
        <p><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jlpea-02-00180-i001.tif"/></p>
        <p>where <italic>I<sub>O</sub></italic> is the drain current when <italic>V<sub>gs</sub></italic> = <italic>V<sub>t</sub></italic>, <italic>V<sub>t</sub></italic> is the threshold voltage, <italic>n</italic> is the subthreshold slope factor, and <italic>V<sub>th</sub></italic> is the thermal voltage. As can be seen from Equation (1), PVTA variations cause exponential changes in the subthreshold current (e.g., a change in <italic>V<sub>t</sub></italic> due to process variations). </p>
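        <p>As a rough numerical illustration of this sensitivity, the sketch below evaluates a simplified subthreshold current model, I_ds = I_O·exp((V_gs − V_t)/(n·V_th)). This form is an assumption consistent with the symbol definitions above (any drain-voltage term is ignored), and the slope factor of 1.5, the temperature of 300 K, and the function names are illustrative choices rather than values taken from this work.</p>
        <preformat>
```python
import math

BOLTZMANN = 1.380649e-23           # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_kelvin):
    """Thermal voltage V_th = kT/q in volts."""
    return BOLTZMANN * temp_kelvin / ELECTRON_CHARGE


def subthreshold_current(v_gs, v_t, i_o=1.0, n=1.5, temp_kelvin=300.0):
    """Simplified model (assumed form): I_O * exp((V_gs - V_t) / (n * V_th))."""
    return i_o * math.exp((v_gs - v_t) / (n * thermal_voltage(temp_kelvin)))


def current_ratio_for_vt_shift(delta_vt, n=1.5, temp_kelvin=300.0):
    """Factor by which the current changes when V_t shifts by delta_vt volts."""
    return math.exp(-delta_vt / (n * thermal_voltage(temp_kelvin)))


# Hypothetical example: a 50 mV increase in V_t at 300 K with n = 1.5.
ratio = current_ratio_for_vt_shift(0.050)
print(f"Current scales by a factor of {ratio:.3f}")
```
</preformat>
        <p>Under these assumed values, a 50 mV threshold shift reduces the current to roughly a quarter of its nominal value, which is the kind of exponential sensitivity that motivates adaptive techniques in subthreshold.</p>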
        <p>To show the impact of the exponential effects of Equation (1), different process corners and temperatures were simulated on an inverter chain. As shown in <xref ref-type="fig" rid="jlpea-02-00180-f002">Figure 2</xref>(a), at 1.2 V, the SS and FF corners are, respectively, 1.26 and 0.78 times the delay at the TT corner. At 0.3 V, the SS and FF corners are, respectively, 2.56 and 0.39 times the delay at the TT corner. Low temperatures further exacerbate the variation impact. For example, the delay is 60 times larger at a voltage of 0.3 V, a temperature of −40 °C, and the SS corner, than at the TT corner. <xref ref-type="fig" rid="jlpea-02-00180-f002">Figure 2</xref>(b) shows the coefficient of variation (σ/µ) for the local variance at 0.3 V and 1.2 V. The σ/µ at 0.3 V is 10 times larger than at 1.2 V. A 1000-point Monte Carlo simulation was used to generate both <xref ref-type="fig" rid="jlpea-02-00180-f002">Figure 2</xref>(a) and (b).</p>
        <fig id="jlpea-02-00180-f002" position="anchor">
          <label>Figure 2</label>
          <caption>
            <p>(<bold>a</bold>) Relative delay compared to the TT corner at different corner and voltage combinations. The delay at each corner is the mean value of the distribution generated from the Monte Carlo runs; (<bold>b</bold>) Coefficient of variation for the TT and SS corners.</p>
          </caption>
          <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jlpea-02-00180-g002.tif"/>
        </fig>
        <p>Several subthreshold processors have been presented previously. In a recent study by Kwong <italic>et al</italic>. a 16-bit processor that is based on the MSP430 microcontroller and built in 65 nm is presented [<xref ref-type="bibr" rid="B9-jlpea-02-00180">9</xref>]. The processor achieves a frequency of 434 kHz and consumes 27.2 pJ/cycle at a <italic>V<sub>dd</sub></italic> of 0.5 V. In another study, Zhai <italic>et al</italic>. present an 8-bit custom ISA processor fabricated in 130 nm [<xref ref-type="bibr" rid="B10-jlpea-02-00180">10</xref>]. The processor achieves a frequency of 833 kHz and consumes 2.6 pJ/instruction at a <italic>V<sub>dd</sub></italic> of 360 mV. </p>
        <p>Typically, the functionality of modern low-voltage processors under variations in temperature is not analyzed. Recently, Bol <italic>et al</italic>. addressed the issue of global PVT variation by utilizing a compensation system [<xref ref-type="bibr" rid="B11-jlpea-02-00180">11</xref>]. However, both the study by Kwong <italic>et al</italic>. [<xref ref-type="bibr" rid="B9-jlpea-02-00180">9</xref>] and the one by Zhai <italic>et al</italic>. [<xref ref-type="bibr" rid="B10-jlpea-02-00180">10</xref>] show frequency measurements over temperature but do not comment on the functionality of the circuit during variations in temperature. Active variance-robustness methods are also rarely discussed in prior studies. In a study by Hanson <italic>et al</italic>. [<xref ref-type="bibr" rid="B12-jlpea-02-00180">12</xref>], body bias is used to achieve variance robustness in 130 nm technology. However, the effect of body bias decreases with smaller process nodes [<xref ref-type="bibr" rid="B2-jlpea-02-00180">2</xref>].</p>
      </sec>
      <sec id="sec2dot2-jlpea-02-00180">
        <title>2.2. Timing-Error Detection</title>
        <p>Timing-error detection (TED) has been shown to remove PVTA variation-incurred safety margins [<xref ref-type="bibr" rid="B6-jlpea-02-00180">6</xref>,<xref ref-type="bibr" rid="B7-jlpea-02-00180">7</xref>,<xref ref-type="bibr" rid="B13-jlpea-02-00180">13</xref>], which would conventionally guarantee operation across all corners with a sufficient yield. The lower safety margins can then either be turned into power savings (<italic>i.e.</italic>, lower <italic>V<sub>dd</sub></italic> [<xref ref-type="bibr" rid="B13-jlpea-02-00180">13</xref>]) or a higher yield [<xref ref-type="bibr" rid="B6-jlpea-02-00180">6</xref>]. The TED methodology is based on having the system operate at a voltage and frequency point in which the timing of critical paths fails intermittently. The failed timing occurrences are detected and corrected, for example, with an instruction replay system. If the error rate is low enough (e.g., 0.04% in a study by Blaauw <italic>et al</italic>. [<xref ref-type="bibr" rid="B13-jlpea-02-00180">13</xref>]), then an energy consumption benefit is achieved as a result of operating at a lower <italic>V<sub>dd</sub></italic>. If the error rate is too high, the instruction replay portion of the TED system begins to consume too much energy.</p>
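        <p>The trade-off between a lower <italic>V<sub>dd</sub></italic> and the replay overhead can be illustrated with a simple first-order sketch. It assumes that the energy per instruction scales with the square of the supply voltage (switching energy only; leakage, which matters in subthreshold, is ignored) and that each detected error costs a fixed number of extra cycles. The function names, the quadratic model, and the example voltages are illustrative assumptions and not measurements from this work.</p>
        <preformat>
```python
def relative_energy(v_ted, v_margined, error_rate, replay_cycles=2.0):
    """Energy per instruction with TED, relative to the margined design.

    Assumes energy scales with V_dd squared and that every timing error
    adds replay_cycles extra cycles of work (both are simplifications).
    """
    voltage_gain = (v_ted / v_margined) ** 2
    replay_overhead = 1.0 + error_rate * replay_cycles
    return voltage_gain * replay_overhead


def break_even_error_rate(v_ted, v_margined, replay_cycles=2.0):
    """Error rate at which the replay overhead cancels the voltage gain."""
    voltage_gain = (v_ted / v_margined) ** 2
    return (1.0 / voltage_gain - 1.0) / replay_cycles


# Hypothetical operating points: 0.36 V with TED versus 0.40 V with margins.
print(relative_energy(0.36, 0.40, error_rate=0.0004))
print(break_even_error_rate(0.36, 0.40))
```
</preformat>
        <p>In this simplified model, a low error rate leaves the voltage-related saving almost intact, whereas an error rate above the break-even value makes the TED operating point more expensive than the margined one.</p>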
        <p>The key component of a TED system is an error-detection sequential (EDS) circuit. EDS circuits generate error signals when the path setup timing fails. This is also known as late signal detection, and it is a well-known synchronization concept. With a TED system, the EDS circuits are placed at critical logic paths where timing errors can occur. When using an EDS circuit, a timing error is flagged when a transition of D occurs in the TED window, as shown in <xref ref-type="fig" rid="jlpea-02-00180-f003">Figure 3</xref>(a). The TED window for the EDS circuits can be tied to the clock signal [<xref ref-type="bibr" rid="B13-jlpea-02-00180">13</xref>], or it can be independently generated [<xref ref-type="bibr" rid="B6-jlpea-02-00180">6</xref>]. </p>
        <p>There are two main types of EDS architectures: a dynamic node [<xref ref-type="bibr" rid="B13-jlpea-02-00180">13</xref>,<xref ref-type="bibr" rid="B14-jlpea-02-00180">14</xref>,<xref ref-type="bibr" rid="B15-jlpea-02-00180">15</xref>] and a delayed shadow latch [<xref ref-type="bibr" rid="B7-jlpea-02-00180">7</xref>,<xref ref-type="bibr" rid="B8-jlpea-02-00180">8</xref>]. Of these architectures, the dynamic node can achieve lower power and lower clock node capacitance. The dynamic node implementation typically uses an inverter delay chain and a logic gate (e.g., XOR) to produce a signal pulse. The signal pulse, or PULSE, as shown in <xref ref-type="fig" rid="jlpea-02-00180-f003">Figure 3</xref>(a), is used to change the state of a dynamic node and generate a timing error signal. The inverters and logic gates used to produce the PULSE signal require a high level of precision across all PVTA variations, especially at low voltage levels. In addition to being robust, the PULSE should be as narrow as possible, since its width limits the speed of the entire TED system, as is further explained in <xref ref-type="sec" rid="sec3dot3-jlpea-02-00180">Section 3.3</xref>.</p>
        <fig id="jlpea-02-00180-f003" position="anchor">
          <label>Figure 3</label>
          <caption>
            <p>(<bold>a</bold>) Basic timing-error detection (TED) operation with a dynamic node-style error-detection sequential (EDS) circuit. The generated PULSE signal is used to flip the state of a dynamic node and generate a timing error, or ERRf [<xref ref-type="bibr" rid="B15-jlpea-02-00180">15</xref>]; (<bold>b</bold>) Block diagram of a TED system. </p>
          </caption>
          <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jlpea-02-00180-g003.tif"/>
        </fig>
        <p><xref ref-type="fig" rid="jlpea-02-00180-f003">Figure 3</xref>(b) shows a high-level block diagram of a TED system using EDS circuits rather than normal FFs. The EDS circuits, called <italic>TEDsc</italic>, are explained in more detail in <xref ref-type="sec" rid="sec3dot3-jlpea-02-00180">Section 3.3</xref>. Since the TED window is reserved for detecting timing errors in the previous clock cycle, no signals from the current clock cycle can arrive within the TED window. A signal that propagates too quickly through the combinational logic leads to a false error being flagged. Thus, the minimum delay for the combinational logic is the TED window. To prevent these false errors from being generated because of fast transitions, additional buffers are required as is described in more detail in <xref ref-type="sec" rid="sec3dot4-jlpea-02-00180">Section 3.4</xref>. </p>
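        <p>The minimum-delay requirement can be sketched as a simple padding calculation: any path whose delay is shorter than the TED window needs enough buffers to cover the shortfall. The function name, the uniform per-buffer delay, and the example numbers below are hypothetical; an actual flow would apply such constraints in the place and route tools with variation-aware delays.</p>
        <preformat>
```python
import math


def buffers_needed(path_delay, ted_window, buffer_delay):
    """Number of identical buffers needed so the path delay covers the TED window.

    All arguments use the same time unit. A uniform buffer delay is an
    illustrative simplification.
    """
    shortfall = ted_window - path_delay
    if shortfall > 0:
        return math.ceil(shortfall / buffer_delay)
    return 0


# Hypothetical example: 10 ns TED window, 2 ns per buffer.
for delay in (4.0, 9.5, 12.0):
    print(delay, buffers_needed(delay, ted_window=10.0, buffer_delay=2.0))
```
</preformat>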
        <p>In practice, the minimum and maximum delays are both limited by the design uncertainties rather than by the logical operation. More specifically, the design of the EDS circuit defines two uncertainty regions, during which an error is captured with a finite probability (<xref ref-type="fig" rid="jlpea-02-00180-f004">Figure 4</xref>). Local variation in an EDS circuit results in an uncertainty region at the positive and negative edges of the microprocessor’s clock signal (CLK), as shown in <xref ref-type="fig" rid="jlpea-02-00180-f004">Figure 4</xref>. Near the CLK edges, the probability that N EDS circuits in a system would generate a timing error may not be 100% at some positions of the CLK (<italic>i.e.</italic>, uncertainty regions A<sub>2</sub> and A<sub>4</sub>). In <xref ref-type="fig" rid="jlpea-02-00180-f004">Figure 4</xref>, <italic>t<sub>edge</sub></italic><sub>2</sub> and <italic>t<sub>edge</sub></italic><sub>4</sub> are defined as the position before the positive CLK edge at which the probability of a timing error is 100% and the position before the falling CLK edge at which the probability of detecting a timing error is 0%, respectively.</p>
        <p>Thus, the uncertainty region can be defined as the location within a CLK cycle (T<sub>CLK</sub>) in which the probability of a timing error for N EDS circuits (EDS<sub>0</sub> to EDS<sub>N</sub>) is between 0 and 100%:</p>
        <p><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jlpea-02-00180-i002.tif"/></p>
        <p>A study by Bull <italic>et al</italic>. refers to a similar concept at the positive CLK edge as setup pessimism [<xref ref-type="bibr" rid="B6-jlpea-02-00180">6</xref>]. For new processes or applications with weak to moderate inversion voltage levels, it is essential to understand the size and location of the uncertainty region. Since the uncertainty region is largely determined by the EDS circuit, it needs to be considered at the same time as the EDS design.</p>
        <fig id="jlpea-02-00180-f004" position="anchor">
          <label>Figure 4</label>
          <caption>
            <p>Local variation (e.g., in PULSE) causes the uncertainty regions <italic>t<sub>a</sub></italic><sub>2</sub> and <italic>t<sub>a</sub></italic><sub>4</sub>. At the positive edge of the microprocessor’s clock signal (CLK), local variation in the PULSE signal makes it possible for <italic>t<sub>CLKh,min</sub></italic> to be met earlier for some EDS circuits than for others.</p>
          </caption>
          <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jlpea-02-00180-g004.tif"/>
        </fig>
      </sec>
    </sec>
    <sec id="sec3-jlpea-02-00180">
      <title>3. Subthreshold TED Microprocessor</title>
      <p>We studied timing-error detection in a microprocessor that is capable of subthreshold operation. The central processing unit (CPU) that we implemented has an 8-bit core, which is compatible with a commercial microcontroller. The design was done in VHDL and the entire code was developed in-house for TED design testing purposes. By using an existing instruction set, we were also able to make use of a readily available assembler and other software development tools.</p>
      <sec>
        <title>3.1. Architecture</title>
        <p>The general-purpose processor has an accumulator-based architecture in which the second operand is always the accumulator register. The processor core is pipelined into three stages: “Fetch”, “Execute”, and “Write”. The instruction memory, which has a size of 256 bytes, resides in a separate block. Due to design-time resource constraints, we do not consider here the memory design associated with the processor. The memory is designed for functionality and is not optimized in any way.</p>
        <p>As explained in <xref ref-type="sec" rid="sec2-jlpea-02-00180">Section 2</xref>, the EDS circuits are inserted on the critical paths. The three-stage pipeline is configured so that the first stage, “Fetch,” and the last stage, “Write,” are shorter than the “Execute” stage. Thus, neither the “Fetch” nor the “Write” stage fails before the “Execute” stage, and only the paths in the “Execute” stage have to be considered as potential candidates for critical paths. This design choice limits both the length of the clock cycle and the number of EDS circuits, and it facilitates the placement of the EDS latches by limiting the critical paths to one pipeline stage of the core. Since the error signals from the EDS circuits are combined using a logical OR tree, this design choice keeps the OR tree shallow. This simplifies the error control, keeps the control delay short, and reduces the control overhead. The study by Bull <italic>et al</italic>. solves this control delay by adding two stages to the pipeline [<xref ref-type="bibr" rid="B6-jlpea-02-00180">6</xref>]; in that design, the clock cycle remains unchanged, but the clock cycles per instruction may increase. In the solution presented in this paper, the length of the clock cycle may be limited depending on how balanced the logic is between the pipeline stages of the core.</p>
        <p><xref ref-type="fig" rid="jlpea-02-00180-f005">Figure 5</xref> shows the block diagram of the core. The paths that can generate timing errors are highlighted in red. The core contains a total of 20 EDS circuits: 8 in the accumulator register, 8 in the register file write buffer, and 4 for the arithmetic and logic unit (ALU) flags. The error signal paths are highlighted in blue.</p>
        <p>The design requires more circuit modifications than a conventional design. For example, we inserted buffers on the fastest paths during the place and route stage to ensure that the hold time requirement for the TED window was met. During the “Decode” stage, there are significant modifications to allow for error recovery.</p>
        <fig id="jlpea-02-00180-f005" position="anchor">
          <label>Figure 5</label>
          <caption>
            <p><italic>TEDsc</italic>-enabled subthreshold microprocessor architecture. The timing error signal propagation paths (EPP) are highlighted in blue and the critical paths (CP) in red.</p>
          </caption>
          <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jlpea-02-00180-g005.tif"/>
        </fig>
      </sec>
      <sec>
        <title>3.2. Timing-Error Detection and Recovery</title>
        <p>Both the architecture of the core and the timing constraints set during the synthesis ensure that timing errors can only occur during the “Execute” stage. A timing error occurs when a data signal on a critical path arrives too late at the subsequent EDS data storage element (<italic>i.e.</italic>, the latch). At this point, incorrect values can be written to the accumulator and register file. In addition, the Program Counter (PC) and the stack might be incorrectly updated due to incorrect ALU flags.</p>
        <p>After a timing error, the core must restore the previous state using the following steps. First, when a timing error is detected, the system operation is halted by disabling the clocking. Next, the data stored during the previous cycle is restored (<italic>i.e.</italic>, the previous values of the PC, the accumulator, and the last stack push/pop, which are kept in the data FFs). Thus, the system state reverts to the previous state. Finally, the failed instruction is re-executed using two clock cycles instead of one to guarantee an error-free operation. After the two clock cycle execution, the normal operation frequency is restored.</p>
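        <p>The recovery sequence can be illustrated with a small behavioral sketch. The toy model below is not the VHDL implementation: the instruction set is reduced to accumulator increments, the names <italic>CoreState</italic> and <italic>run_with_replay</italic> are hypothetical, and the cycle accounting (one cycle for the failed attempt plus two for the replay) is an assumption, since the cost of halting the clock is not modeled.</p>
        <preformat>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreState:
    pc: int   # program counter
    acc: int  # accumulator


def execute(state, increments):
    """Execute one toy instruction: add an increment to the accumulator."""
    return CoreState(pc=state.pc + 1, acc=state.acc + increments[state.pc])


def run_with_replay(increments, failing_pcs):
    """Run the toy program; instructions at failing_pcs fail on the first try."""
    state = CoreState(pc=0, acc=0)
    cycles = 0
    while len(increments) > state.pc:
        checkpoint = state            # previous values kept for restoration
        state = execute(state, increments)
        cycles += 1                   # normal single-cycle attempt
        if checkpoint.pc in failing_pcs:
            state = checkpoint        # error flagged: restore previous state
            state = execute(state, increments)
            cycles += 2               # replay stretched over two clock cycles
    return state, cycles


# Hypothetical three-instruction program where the second instruction fails once.
print(run_with_replay([1, 2, 3], failing_pcs={1}))
```
</preformat>
        <p>In this sketch the final architectural state is the same with and without errors; only the cycle count grows, which mirrors the intent of the replay mechanism described above.</p>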
        <p>The error signals are not distinguished from each other, but are, instead, combined with one another. Thus, the system does not know which path generated an error. This arrangement is simple and it enables fast operation. With regard to functionality, it is not necessary to know on which path an error occurred.</p>
        <p>Correct TED operation requires that signals do not arrive too early or too late with respect to a TED window (<italic>TED<sub>win,N</sub></italic>), since these signals are not accounted for in real time at the system level or within the EDS. A signal that arrives too early has an insufficient delay time and, thus, it incorrectly arrives in the previous TED window (<italic>TED<sub>win,N</sub></italic><sub>−1</sub>). In other words, a timing error is incorrectly generated (false positive). False positives are avoided by constructing correctly sized delay buffers. When a signal arrives too late (<italic>i.e.</italic>, at <italic>TED<sub>win,N</sub></italic><sub>+1</sub>), it means that the delay is too large and that an error has not been correctly detected. To avoid these false negatives, timing constraints within the design are implemented to ensure that a signal cannot be delayed excessively.</p>
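        <p>The four possible outcomes can be summarized with a small classification sketch. It assumes an idealized timing model in which each TED window opens at a capturing clock edge and lasts for a fixed width, and in which delays are measured from the launching clock edge; uncertainty regions are ignored. The function name and the example numbers are hypothetical.</p>
        <preformat>
```python
def classify_arrival(delay, clock_period, ted_window):
    """Classify a data arrival measured from the launching clock edge.

    Idealized model: the previous window covers [0, ted_window) and the
    window for this cycle covers (clock_period, clock_period + ted_window].
    """
    if ted_window > delay:
        return "false positive"    # too early: lands in the previous window
    if clock_period >= delay:
        return "on time"           # meets setup, no error expected
    if clock_period + ted_window >= delay:
        return "error detected"    # late, but inside the TED window
    return "false negative"        # too late: missed by the TED window


# Hypothetical example: 100 ns clock period, 20 ns TED window.
for delay in (5, 60, 110, 130):
    print(delay, classify_arrival(delay, clock_period=100, ted_window=20))
```
</preformat>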
      </sec>
      <sec id="sec3dot3-jlpea-02-00180">
        <title>3.3. <italic>TEDsc</italic></title>
        <p><italic>TEDsc</italic> is an EDS circuit [<xref ref-type="fig" rid="jlpea-02-00180-f006">Figure 6</xref>(a)] that uses subthreshold source-coupled logic (STSCL) to detect timing errors [<xref ref-type="bibr" rid="B15-jlpea-02-00180">15</xref>]. Depending on the logic depth, the leakage current, the activity factor, and the operation frequency of a system, STSCL can have several advantages over static CMOS (e.g., tunability, reduced power consumption, and a decreased sensitivity to supply noise [<xref ref-type="bibr" rid="B16-jlpea-02-00180">16</xref>,<xref ref-type="bibr" rid="B17-jlpea-02-00180">17</xref>]). STSCL has been shown to be advantageous for ultra-low-power (ULP) systems.</p>
        <fig id="jlpea-02-00180-f006" position="anchor">
          <label>Figure 6</label>
          <caption>
            <p>(<bold>a</bold>) <italic>TEDsc</italic> circuit [<xref ref-type="bibr" rid="B15-jlpea-02-00180">15</xref>]; (<bold>b</bold>) <italic>TEDsc</italic> timing.</p>
          </caption>
          <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jlpea-02-00180-g006.tif"/>
        </fig>
        <p>An STSCL gate is composed of a network of differential NMOS pairs, an adjustable PMOS load (M<sub>3</sub>,M<sub>4</sub>) with output resistance <italic>R<sub>P</sub></italic>, and an adjustable tail current <italic>I<sub>SS</sub></italic> [<xref ref-type="fig" rid="jlpea-02-00180-f007">Figure 7</xref>(a)]. The NMOS pairs are used to construct logic gates. The voltage swing is defined as <italic>V<sub>SW</sub></italic> = <italic>R<sub>P</sub></italic>·<italic>I<sub>SS</sub></italic>, and it is maintained by dynamically adjusting the size of <italic>R<sub>P</sub></italic> and the magnitude of <italic>I<sub>SS</sub></italic>. Since <italic>I<sub>SS</sub></italic> can be reduced to the pA range, <italic>R<sub>P</sub></italic> needs to be in the GΩ range to achieve a proper <italic>V<sub>SW</sub></italic> (<italic>i.e.</italic>, <italic>V<sub>SW</sub></italic> &gt; 150 mV). By connecting the bulk of the PMOS load devices to the drain, a large <italic>R<sub>P</sub></italic> is achieved without excessively large transistor lengths [<xref ref-type="bibr" rid="B16-jlpea-02-00180">16</xref>,<xref ref-type="bibr" rid="B17-jlpea-02-00180">17</xref>]. </p>
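        <p>As a rough numerical check of the relation <italic>V<sub>SW</sub></italic> = <italic>R<sub>P</sub></italic>·<italic>I<sub>SS</sub></italic>, the following Python sketch computes the load resistance needed to hold a 150 mV swing at a few tail currents. The function name and the example currents (10 pA, 100 pA, and 1 nA) are illustrative assumptions and are not values taken from the design.</p>
        <preformat>
```python
def required_load_resistance(v_sw_volts, i_ss_amps):
    """Return R_P in ohms from V_SW = R_P * I_SS."""
    if i_ss_amps == 0:
        raise ValueError("tail current must be nonzero")
    return v_sw_volts / i_ss_amps


V_SW_MIN = 0.150  # 150 mV swing floor quoted in the text

# Example tail currents are assumptions chosen only for illustration.
for i_ss in (10e-12, 100e-12, 1e-9):
    r_p = required_load_resistance(V_SW_MIN, i_ss)
    print(f"I_SS = {i_ss:.0e} A -> R_P = {r_p / 1e9:.2f} GOhm")
```
        </preformat>
        <p>Under these assumed currents, a tail current of 100 pA calls for a load of about 1.5 GΩ, which is consistent with the GΩ range stated above.</p>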
        <p>The size of <italic>R<sub>P</sub></italic> and the magnitude of <italic>I<sub>SS</sub></italic> are both adjusted by the voltage swing control (VSC) block as shown in <xref ref-type="fig" rid="jlpea-02-00180-f007">Figure 7</xref>(b). The VSC decreases the dependence on global variations (e.g., supply noise, temperature fluctuations, and ageing) and ensures a voltage swing greater than 150 mV across all global variations. The VSC for <italic>TEDsc</italic> uses a two-stage, Miller-compensated opamp for ASW. The opamp is able to maintain an open-loop gain of 40 dB for all the global process corners. The bias voltage (<italic>V<sub>P</sub></italic>) from one VSC can be used for a large number of <italic>TEDsc</italic> gates [<xref ref-type="bibr" rid="B16-jlpea-02-00180">16</xref>].</p>
        <fig id="jlpea-02-00180-f007" position="anchor">
          <label>Figure 7</label>
          <caption>
            <p>(<bold>a</bold>) Subthreshold source-coupled logic (STSCL) circuit; (<bold>b</bold>) Voltage swing control (VSC).</p>
          </caption>
          <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jlpea-02-00180-g007.tif"/>
        </fig>
        <p>Since <italic>TEDsc</italic> uses STSCL, it has the unique ability to adjust its D-to-timing error delay (D-ERRf delay); this results in an adjustable TED window. This ability to adjust the D-ERRf delay can be explained by first understanding that during a D transition, <italic>TEDsc</italic> requires a minimum amount of charge (<italic>Q<sub>emin</sub></italic>) to move from the dynamic output node in order to induce a differential timing error [<xref ref-type="bibr" rid="B15-jlpea-02-00180">15</xref>]. Reaching <italic>Q<sub>emin</sub></italic> is dependent on <italic>I<sub>TEDsc</sub></italic> and the β-delay that is extended under the CLK high (<italic>i.e.</italic>, <italic>t<sub>βCLK</sub></italic>). For example, when <italic>I<sub>TEDsc</sub></italic> is increased, the TED window is widened at both of the CLK’s edges since the required <italic>t<sub>βCLK</sub></italic> is decreased to meet <italic>Q<sub>emin</sub></italic>.</p>
        <p>The starting point of the TED window (<italic>t<sub>a</sub></italic><sub>2 </sub>+ <italic>t<sub>edge</sub></italic><sub>2</sub> from <xref ref-type="fig" rid="jlpea-02-00180-f004">Figure 4</xref>) has two important implications. First, at the positive CLK edge, an excessively early starting point of the TED window (<italic>i.e.</italic>, (<italic>t<sub>a</sub></italic><sub>2 </sub>+ <italic>t<sub>edge</sub></italic><sub>2</sub>)/T<sub>CLK</sub> is too large) does not allow for the maximum clock frequency to be reached and, thus, the energy consumption is increased. Second, for a flip-flop based pipeline, an overly delayed TED window starting point (<italic>i.e.</italic>, due to a low <italic>I<sub>TEDsc</sub></italic>) does not correctly report all setup time failures as timing errors, which results in a non-functional design. In the presence of large global variation susceptibility, as found in subthreshold, the tunable TED window enables fine tuning on the system level. </p>
        <p>Fine tuning of the TED window is achieved by adjusting <italic>I<sub>TEDsc</sub></italic> within <italic>TEDsc</italic>. To understand how <italic>I<sub>TEDsc</sub></italic> affects the TED window, three <italic>TEDsc</italic> circuits were measured on the same die. <italic>TEDsc</italic> and VSC used the following settings: <italic>V<sub>dd</sub></italic><sub>,scl</sub> = 400 mV, <italic>V<sub>L</sub></italic> = 200 mV, and <italic>V<sub>dd</sub></italic> = 300 mV. A total of 500 positions of D were applied as input to <italic>TEDsc</italic>, with 16,384 transitions of D at each of the 500 positions. The duty cycle of the CLK was 50%, and the frequency of the CLK was 10.37 kHz. <xref ref-type="fig" rid="jlpea-02-00180-f008">Figure 8</xref> shows the error probability of the three <italic>TEDsc</italic> circuits as a function of the D transition. The TED window for <italic>TEDsc</italic> in <xref ref-type="fig" rid="jlpea-02-00180-f008">Figure 8</xref> is located between (Position of D Transition) 250 and 500.</p>
        <p>As shown in <xref ref-type="fig" rid="jlpea-02-00180-f008">Figure 8</xref>, by adjusting <italic>I<sub>TEDsc</sub></italic>, <italic>TEDsc</italic> can adjust its D-ERRf delay. This subsequently makes fine tuning of the TED window (and the uncertainty region) possible. For example, to reduce the D-ERRf delay, <italic>I<sub>TEDsc</sub></italic> was increased from 300 pA to 1.56 nA (<xref ref-type="fig" rid="jlpea-02-00180-f008">Figure 8</xref>). In previous designs [<xref ref-type="bibr" rid="B14-jlpea-02-00180">14</xref>,<xref ref-type="bibr" rid="B15-jlpea-02-00180">15</xref>], the uncertainty region and TED window have been fully defined at design time, which is not favorable for weak inversion TED design. Simulations showed an uncertainty region (<italic>i.e.</italic>, A<sub>2</sub>, A<sub>4</sub>) of approximately the same size as found in measurement [<xref ref-type="bibr" rid="B15-jlpea-02-00180">15</xref>].</p>
        <fig id="jlpea-02-00180-f008" position="anchor">
          <label>Figure 8</label>
          <caption>
            <p>(<bold>a</bold>) <italic>I<sub>TEDsc</sub></italic> at 300 pA; (<bold>b</bold>) When <italic>I<sub>TEDsc</sub></italic> is increased to 1.5 nA, the size of the TED window increases.</p>
          </caption>
          <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jlpea-02-00180-g008.tif"/>
        </fig>
        <p>As the microprocessor’s performance is altered by local and global variations, it is essential that the EDS circuit operate correctly and accurately. Through simulations, <italic>TEDsc</italic> was shown to be robust to both local and global variations. Local variations were accounted for by applying Monte-Carlo simulations at each process corner (<italic>i.e.</italic>, TT, FF, SS, SF, and FS). This simulation also showed robustness to global process corners due to the VSC. Additionally, <italic>TEDsc</italic> showed correct functionality from −40 °C to 90 °C as a result of the VSC. Using STSCL also reduces the sensitivity of <italic>TEDsc</italic> to changes in the supply voltage [<xref ref-type="bibr" rid="B17-jlpea-02-00180">17</xref>]. In addition, the probability of a fast change in the supply voltage at the exact same time that D transitions is low. To verify this, we applied a sawtooth-wave ripple voltage from 0 to 40 mV with a frequency from 10 MHz to 100 MHz to <italic>TEDsc</italic>; correct functionality was observed under these ripple conditions.</p>
        <p>The effects of local variations on <italic>TEDsc</italic> are minimized by proper sizing techniques developed by Wang, Calhoun and Chandrakasan [<xref ref-type="bibr" rid="B3-jlpea-02-00180">3</xref>] and Alioto and Leblebici [<xref ref-type="bibr" rid="B18-jlpea-02-00180">18</xref>]. The effects of global variations on <italic>TEDsc</italic> are minimized due to the STSCL design choice. As described earlier in this section, STSCL uses the VSC to maintain proper operation during the application of both static and dynamic global variations [<xref ref-type="bibr" rid="B17-jlpea-02-00180">17</xref>]. As mentioned in <xref ref-type="sec" rid="sec2dot2-jlpea-02-00180">Section 2.2</xref>, larger local variations increase the size of <italic>t<sub>a2</sub></italic> and <italic>t<sub>edge</sub></italic><sub>2</sub>. This fundamentally limits the speed of the entire TED system: if (<italic>t<sub>a2</sub></italic> + <italic>t<sub>edge</sub></italic><sub>2</sub>)/<italic>T<sub>CLK</sub></italic> is too large, there is not ample time to detect errors.</p>
      </sec>
      <sec id="sec3dot4-jlpea-02-00180">
        <title>3.4. Implementation of Core 1 and 2</title>
        <p>To evaluate the benefits of TED, we designed a TED-enabled core (Core 1) and a non-TED core (Core 2). Both cores were fabricated in 65 nm CMOS. The supply voltage range of both designs is from 300 mV to 500 mV, which is at the edge of or below the strong inversion region for the process and all the digital cells. However, we optimized <italic>TEDsc</italic> to work deep into subthreshold; the analysis below therefore includes only the 300 mV and 400 mV operation points.</p>
        <p>To simplify the design process of Core 1, two power domains were used in the design. The instruction memory and the error propagation path are located within one power domain, while the rest of the design is in a second power domain. The size of the instruction memory is 256 instructions and the size of the register file is 68 bytes. The area of the TED core (without instruction memory) is approximately 50,000 μm<sup>2</sup>. The length of the CLK period is approximately 160 times the FO4 delay. The clock period is limited by the “Execute” stage and EDS design.</p>
        <p>The foundries did not provide digital EDA tool library information for subthreshold operation. To acquire the library’s timing and power information for the EDA tools, we re-characterized the standard cells for subthreshold operation by using the Synopsys library characterization workflow. During the re-characterization process, we used the standard libraries as templates, considered all the timing arcs, and acquired the new timing and power information via analog simulation. The re-characterization process was repeated for the typical, best, and worst corners. The acquired library information was used by the EDA tools in the automated design flow. Due to their sensitivity to variation in subthreshold, the smallest gates were removed from the libraries.</p>
        <p>It was not possible to characterize the EDS element and include it in the digital library due to the asynchronous nature of the element’s error signal. Furthermore, the VSC block that generates bias voltages for the <italic>TEDsc</italic> blocks is inherently analog. Therefore, a digital simulation of the full system was not possible. An analog simulation of the system would have been excessively long. Thus, we performed a mixed-mode simulation on the system. The VSC and <italic>TEDsc</italic> blocks were simulated using SPICE transistor-level models. All of the digital blocks were simulated using the post-layout netlist (including parasitics). Mentor Graphics Questa ADMS was used to perform the mixed-mode simulation.</p>
        <p>The die microphotograph of Core 1 (TED) and Core 2 without instruction memory is shown in <xref ref-type="fig" rid="jlpea-02-00180-f009">Figure 9</xref>. Both cores include all the logic, delays, and buffers. The VSC block and the EDS circuits are also shown in Core 1 (TED).</p>
        <fig id="jlpea-02-00180-f009" position="anchor">
          <label>Figure 9</label>
          <caption>
            <p>(<bold>a</bold>) The microcontroller cores with and without TED are shown as Core 1 (TED) and Core 2, respectively; (<bold>b</bold>) Core 1 (TED) including the VSC and <italic>TEDsc</italic> circuits.</p>
          </caption>
          <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jlpea-02-00180-g009.tif"/>
        </fig>
        <p><xref ref-type="table" rid="jlpea-02-00180-t001">Table 1</xref> shows a comparison of Core 1 and Core 2. The area of Core 2 is approximately 18,000 µm<sup>2</sup>, which is approximately 64% smaller than that of the TED version. For the comparison, the chip I/O compatibility level-shifters present in the subthreshold version are excluded, which gives the total area for the TED version as approximately 50,000 µm<sup>2</sup>. The VSC block occupies an area of approximately 1750 µm<sup>2</sup> in the subthreshold design. It should be noted that in a larger design, the VSC area gets proportionally smaller. The areas of the different blocks were measured so that only the active area occupied by the blocks was taken into account.</p>
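        <p>The area figures quoted above can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the three areas stated in the text; the helper names and the 4× scaling factor in the last step are assumptions introduced for illustration, and the scaling step further assumes that a single VSC of unchanged size would serve the larger design.</p>
        <preformat>
```python
CORE2_AREA_UM2 = 18_000  # non-TED core, from the text
CORE1_AREA_UM2 = 50_000  # TED core without level shifters, from the text
VSC_AREA_UM2 = 1_750     # voltage swing control block, from the text


def percent_smaller(small, large):
    """How much smaller `small` is than `large`, in percent of `large`."""
    return 100.0 * (large - small) / large


def percent_share(part, whole):
    """Share of `whole` taken by `part`, in percent."""
    return 100.0 * part / whole


print(f"Core 2 vs Core 1: {percent_smaller(CORE2_AREA_UM2, CORE1_AREA_UM2):.0f}% smaller")
print(f"VSC share of Core 1: {percent_share(VSC_AREA_UM2, CORE1_AREA_UM2):.1f}%")

# Hypothetical scaling: a design 4x larger that still uses one VSC.
# The 4x factor and the fixed VSC size are assumptions, not measured data.
scaled = percent_share(VSC_AREA_UM2, 4 * CORE1_AREA_UM2)
print(f"VSC share at 4x core area: {scaled:.2f}%")
```
        </preformat>
        <p>With the stated areas, the VSC accounts for a few percent of Core 1, and that share falls as the design grows if the VSC itself does not need to grow.</p>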
        <p>The data in <xref ref-type="table" rid="jlpea-02-00180-t001">Table 1</xref> show that both the clock delay cells and the buffer cells occupy a substantially larger area than in the nominal voltage design. The number and the area of the logic ports are comparable. The area of the data storage elements is approximately two times larger in the subthreshold design, which can be explained by the fact that EDS cells are generally larger than their conventional counterparts. This applies especially to the EDS circuits designed for subthreshold operation due to their variation immunity requirements as explained in <xref ref-type="sec" rid="sec3dot3-jlpea-02-00180">Section 3.3</xref>. The area in the table that is unaccounted for is occupied by the decap and antenna protection elements, and in Core 1 by the VSC block. It should be noted that the Core 1 (TED) design has not been optimized for area. In addition, the I/O port functionality has been excluded from the Core 1 design, which makes the comparison somewhat less favorable for Core 1 in terms of area. The error recovery mechanism modification also adds slightly to the logic size. The last column of the table shows the area of the Core 1 cells as a percentage of the area of the corresponding cells in the nominal voltage design (Core 2).</p>
        <table-wrap id="jlpea-02-00180-t001" position="anchor">
          <object-id pub-id-type="pii">jlpea-02-00180-t001_Table 1</object-id>
          <label>Table 1</label>
          <caption>
            <p>An area comparison of Core 1 (TED) and Core 2.</p>
          </caption>
          <table rules="all" style="border: solid thin">
            <thead>
              <tr>
                <th rowspan="2" align="center" valign="middle" style="border-left: hidden; border-top: hidden"> </th>
                <th colspan="2" align="center" valign="middle" style="background: #9FD3A4">Core 2 (Total Area ≈ 18,000 µm<sup>2</sup>)</th>
                <th colspan="2" align="center" valign="middle" style="background: #9FD3A4">Core 1 (TED) (Total Area ≈ 50,000 µm<sup>2</sup>)</th>
                <th rowspan="2" align="center" valign="middle" style="background: #9FD3A4">Area of Cells in Core 1 ÷ Area of Cells in Core 2 (<italic>i.e.</italic>, the Core 1 cell area as a percentage of the Core 2 cell area)</th>
              </tr>
              <tr>
                <th align="center" valign="middle" style="background: #9FD3A4">Number of Cells</th>
                <th align="center" valign="middle" style="background: #9FD3A4">% of the Total Area</th>
                <th align="center" valign="middle" style="background: #9FD3A4">Number of Cells</th>
                <th align="center" valign="middle" style="background: #9FD3A4">% of the Total Area</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td align="center" valign="middle" style="background: #9FD3A4">
                  <bold>Buffer Cells</bold>
                </td>
                <td align="center" valign="middle">139</td>
                <td align="center" valign="middle">2.5%</td>
                <td align="center" valign="middle">934</td>
                <td align="center" valign="middle">5%</td>
                <td align="center" valign="middle">538%</td>
              </tr>
              <tr>
                <td align="center" valign="middle" style="background: #9FD3A4">
                  <bold>Clock Buffer Cells</bold>
                </td>
                <td align="center" valign="middle">223</td>
                <td align="center" valign="middle">3.5%</td>
                <td align="center" valign="middle">66</td>
                <td align="center" valign="middle">&lt;1%</td>
                <td align="center" valign="middle">45%</td>
              </tr>
              <tr>
                <td align="center" valign="middle" style="background: #9FD3A4">
                  <bold>Clock Delay Cells</bold>
                </td>
                <td align="center" valign="middle">37</td>
                <td align="center" valign="middle">1%</td>
                <td align="center" valign="middle">1580</td>
                <td align="center" valign="middle">21%</td>
                <td align="center" valign="middle">4644%</td>
              </tr>
              <tr>
                <td align="center" valign="middle" style="background: #9FD3A4">
                  <bold>Data Storage Cells</bold>
                </td>
                <td align="center" valign="middle">777</td>
                <td align="center" valign="middle">35%</td>
                <td align="center" valign="middle">897</td>
                <td align="center" valign="middle">27%</td>
                <td align="center" valign="middle">205%</td>
              </tr>
              <tr>
                <td align="center" valign="middle" style="background: #9FD3A4">
                  <bold>Logic Port Cells</bold>
                </td>
                <td align="center" valign="middle">1942</td>
                <td align="center" valign="middle">45%</td>
                <td align="center" valign="middle">2191</td>
                <td align="center" valign="middle">17%</td>
                <td align="center" valign="middle">108%</td>
              </tr>
              <tr>
                <td align="center" valign="middle" style="background: #9FD3A4">
                  <bold>Filler Cells</bold>
                </td>
                <td align="center" valign="middle">563</td>
                <td align="center" valign="middle">10%</td>
                <td align="center" valign="middle">4756</td>
                <td align="center" valign="middle">19%</td>
                <td align="center" valign="middle">539%</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
    </sec>
    <sec id="sec4-jlpea-02-00180">
      <title>4. Silicon Measurement Results</title>
      <p>Measurements were performed using an automated setup in which LabVIEW managed the measurement instruments. Due to the effects of PVT variation in subthreshold, the correct start-up values for the supply voltage and operating frequency were not known in advance. Thus, these parameters were swept to bring the core into its safe area of operation. This was accomplished by inputting test vectors to the core and monitoring the register dump and the timing error signals.</p>
      <p>The test programs were coded using an assembler and uploaded to the instruction memory using a pattern generator. The error rate was recorded during run-time. After the program execution, the register dump was loaded from the chip, and the dumped register values were compared against known results to verify the correct functionality. </p>
      <p><xref ref-type="table" rid="jlpea-02-00180-t002">Table 2</xref> and <xref ref-type="table" rid="jlpea-02-00180-t003">Table 3</xref> show shmoo plots for Core 1 (TED) running at 300 mV and 400 mV, respectively. The x-axis indicates the CLK operating frequency (the reciprocal of the CLK period, <italic>T<sub>CLK</sub></italic>) and the y-axis indicates the duty cycle (<italic>D<sub>cycle</sub></italic>). The green squares display the duty cycle and frequency pairs in which the circuit is able to operate correctly. As the duty cycle is increased, the size of the TED window also increases, and with it the required minimum delay, which is directly proportional to the TED window size. Additionally, as the frequency is decreased, the minimum delay requirement increases.</p>
      <p><xref ref-type="table" rid="jlpea-02-00180-t002">Table 2</xref> shows that the maximum usable duty cycle at <italic>V<sub>dd</sub></italic> = 300 mV is 20%–25%. These values still give a frequency tuning range of approximately 50%. As <xref ref-type="table" rid="jlpea-02-00180-t002">Table 2</xref> shows, the circuit does not function above 2.95 kHz. This limitation is set by the HVT logic speed at a supply voltage of 300 mV.</p>
      <p><xref ref-type="table" rid="jlpea-02-00180-t003">Table 3</xref> shows that the maximum usable duty cycle at <italic>V<sub>dd</sub></italic> = 400 mV is 15%–20%. As <xref ref-type="table" rid="jlpea-02-00180-t003">Table 3</xref> shows, the circuit does not function above 37.2 kHz. This limitation is set by the HVT logic speed at a supply voltage of 400 mV.</p>
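      <p>The tuning range quoted for <italic>V<sub>dd</sub></italic> = 300 mV can be checked against the 20% duty-cycle row of <xref ref-type="table" rid="jlpea-02-00180-t002">Table 2</xref>, where the core passes from 2000 Hz up to 2950 Hz. The Python sketch below is a minimal calculation under one assumption: the tuning range is taken as the frequency span relative to the lowest passing frequency, a definition that the text does not state explicitly. The function names are illustrative.</p>
      <preformat>
```python
def tuning_range_percent(f_low_hz, f_high_hz):
    """Frequency span relative to the lowest passing frequency, in percent."""
    return 100.0 * (f_high_hz - f_low_hz) / f_low_hz


def clock_period_us(f_hz):
    """Clock period in microseconds for a frequency given in hertz."""
    return 1e6 / f_hz


# Lowest and highest passing frequencies in the 20% duty-cycle row of Table 2.
F_LOW_HZ, F_HIGH_HZ = 2000.0, 2950.0

print(f"Tuning range: {tuning_range_percent(F_LOW_HZ, F_HIGH_HZ):.1f}%")
print(f"T_CLK at the highest passing frequency: {clock_period_us(F_HIGH_HZ):.0f} us")
```
      </preformat>
      <p>Because 2000 Hz is the lowest frequency in the table, this estimate only bounds the tuning range from below under the assumed definition.</p>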
      <table-wrap id="jlpea-02-00180-t002" position="anchor">
        <object-id pub-id-type="pii">jlpea-02-00180-t002_Table 2</object-id>
        <label>Table 2</label>
        <caption>
          <p>Shmoo plot of Core 1 (TED) at <italic>V<sub>dd</sub></italic> = 300 mV. The maximum clock frequency, <italic>f<sub>max</sub></italic><sub>1</sub>, is 2.95 kHz. The green squares (and checkmarks) display the duty cycle and frequency pairs in which the circuit is able to operate correctly.</p>
        </caption>
        <table rules="all" style="border: solid thin">
          <thead>
            <tr align="center">
              <th rowspan="2" colspan="2" align="center" valign="middle" style="border-left: hidden; border-top: hidden"> </th>
              <th colspan="14" align="center" valign="middle">1/T<sub>CLK</sub></th>
            </tr>
            <tr>
              <th align="center" valign="middle">2000 Hz</th>
              <th align="center" valign="middle">2088 Hz</th>
              <th align="center" valign="middle">2180 Hz</th>
              <th align="center" valign="middle">2276 Hz</th>
              <th align="center" valign="middle">2237 Hz</th>
              <th align="center" valign="middle">2482 Hz</th>
              <th align="center" valign="middle">2591 Hz</th>
              <th align="center" valign="middle">2706 Hz</th>
              <th align="center" valign="middle">2825 Hz</th>
              <th align="center" valign="middle">2950 Hz</th>
              <th align="center" valign="middle">3080 Hz</th>
              <th align="center" valign="middle">3216 Hz</th>
              <th align="center" valign="middle">3358 Hz</th>
              <th align="center" valign="middle">3506 Hz</th>
            </tr>
          </thead>
          <tbody>
            <tr align="center">
              <td rowspan="8" align="center" valign="middle">
                <bold>D<sub>cycle</sub></bold>
              </td>
              <td align="center" valign="middle">
                <bold>2%</bold>
              </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
            </tr>
            <tr>
              <td align="center" valign="middle">
                <bold>5%</bold>
              </td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
            </tr>
            <tr>
              <td align="center" valign="middle">
                <bold>10%</bold>
              </td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
            </tr>
            <tr>
              <td align="center" valign="middle">
                <bold>15%</bold>
              </td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
            </tr>
            <tr>
              <td align="center" valign="middle">
                <bold>20%</bold>
              </td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
            </tr>
            <tr>
              <td align="center" valign="middle">
                <bold>25%</bold>
              </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
            </tr>
            <tr>
              <td align="center" valign="middle">
                <bold>30%</bold>
              </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
            </tr>
            <tr>
              <td align="center" valign="middle">
                <bold>35%</bold>
              </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap id="jlpea-02-00180-t003" position="anchor">
        <object-id pub-id-type="pii">jlpea-02-00180-t003_Table 3</object-id>
        <label>Table 3</label>
        <caption>
          <p>Shmoo plot of Core 1 (TED) at <italic>V<sub>dd</sub></italic> = 400 mV. For this Core, <italic>f<sub>max</sub></italic><sub>1</sub> is 37.2 kHz.</p>
        </caption>
        <table rules="all" style="border: solid thin">
          <thead>
            <tr>
              <th rowspan="2" colspan="2" align="center" valign="middle" style="border-left: hidden; border-top: hidden"> </th>
              <th colspan="14" align="center" valign="middle">Clock frequency (1/T<sub>CLK</sub>)</th>
            </tr>
            <tr>
              <th align="center" valign="middle">20.0 kHz</th>
              <th align="center" valign="middle">21.7 kHz</th>
              <th align="center" valign="middle">23.4 kHz</th>
              <th align="center" valign="middle">25.2 kHz</th>
              <th align="center" valign="middle">26.9 kHz</th>
              <th align="center" valign="middle">28.6 kHz</th>
              <th align="center" valign="middle">30.3 kHz</th>
              <th align="center" valign="middle">32.1 kHz</th>
              <th align="center" valign="middle">33.8 kHz</th>
              <th align="center" valign="middle">35.5 kHz</th>
              <th align="center" valign="middle">37.2 kHz</th>
              <th align="center" valign="middle">39.0 kHz</th>
              <th align="center" valign="middle">40.7 kHz</th>
              <th align="center" valign="middle">42.4 kHz</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td rowspan="8" align="center" valign="middle">
                <bold>D<sub>cycle</sub></bold>
              </td>
              <td align="center" valign="middle">
                <bold>2%</bold>
              </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
            </tr>
            <tr>
              <td align="center" valign="middle">
                <bold>5%</bold>
              </td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
            </tr>
            <tr>
              <td align="center" valign="middle">
                <bold>10%</bold>
              </td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
            </tr>
            <tr>
              <td align="center" valign="middle">
                <bold>15%</bold>
              </td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
            </tr>
            <tr>
              <td align="center" valign="middle">
                <bold>20%</bold>
              </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
            </tr>
            <tr>
              <td align="center" valign="middle">
                <bold>25%</bold>
              </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: #66FF33">√</td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
            </tr>
            <tr>
              <td align="center" valign="middle">
                <bold>30%</bold>
              </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
            </tr>
            <tr>
              <td align="center" valign="middle">
                <bold>35%</bold>
              </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
              <td align="center" valign="middle" style="background: red"> </td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
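      <p>The pass region of the shmoo plot in Table 3 can also be read off programmatically. The Python sketch below is illustrative only and is not part of the original measurement flow. It transcribes the pass/fail cells of Table 3, assuming that a check mark denotes correct operation, and reports the highest passing clock frequency together with the duty cycles that reach it. The names FREQS_KHZ, SHMOO, passing_freqs, and max_passing_freq are hypothetical and were introduced for this example.</p>
      <preformat><![CDATA[
```python
# Clock frequencies (kHz) of the 14 columns of Table 3.
FREQS_KHZ = [20.0, 21.7, 23.4, 25.2, 26.9, 28.6, 30.3,
             32.1, 33.8, 35.5, 37.2, 39.0, 40.7, 42.4]

# Pass/fail cells of Table 3, one string per duty-cycle row (in %).
# "P" = check mark (assumed to mean correct operation), "F" = fail.
SHMOO = {
    2:  "F" * 14,
    5:  "P" * 10 + "F" * 4,
    10: "P" * 10 + "F" * 4,
    15: "P" * 11 + "F" * 3,
    20: "F" * 3 + "P" * 8 + "F" * 3,
    25: "F" * 8 + "P" * 3 + "F" * 3,
    30: "F" * 14,
    35: "F" * 14,
}


def passing_freqs(duty_pct):
    """Return the clock frequencies (kHz) that pass at the given duty cycle."""
    row = SHMOO[duty_pct]
    return [f for f, cell in zip(FREQS_KHZ, row) if cell == "P"]


def max_passing_freq():
    """Return (highest passing frequency in kHz, duty cycles that reach it)."""
    best = {}
    for duty in SHMOO:
        freqs = passing_freqs(duty)
        if freqs:  # skip rows with no passing cell
            best[duty] = max(freqs)
    f_max = max(best.values())
    duties = sorted(d for d, f in best.items() if f == f_max)
    return f_max, duties


if __name__ == "__main__":
    f_max, duties = max_passing_freq()
    print(f"f_max = {f_max} kHz at duty cycles {duties} %")
```
]]></preformat>
      <p>For the transcribed data, the highest passing frequency is 37.2 kHz, reached at duty cycles of 15%, 20%, and 25%, which agrees with the value of <italic>f<sub>max</sub></italic><sub>1</sub> stated in the caption of Table 3.</p>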
      <p>To compare the energy per operation of Core 1 (TED) and Core 2, Core 1 (TED) was first set to a nominal <italic>V<sub>dd</sub></italic> (e.g., 300 mV). At this supply voltage, denoted <italic>V<sub>ddCore</sub></italic><sub>1</sub>, the maximum clock frequency (<italic>f<sub>max</sub></italic><sub>1</sub>) was determined as explained in <xref ref-type="table" rid="jlpea-02-00180-t002">Table 2</xref>. To guarantee that Core 2 operates correctly under worst-case conditions at the same frequency as Core 1 (<italic>i.e.</italic>, <italic>f<sub>max</sub></italic><sub>2</sub> = <italic>f<sub>max</sub></italic><sub>1</sub>), a safety margin was determined for Core 2. As in other TED implementations [<xref ref-type="bibr" rid="B19-jlpea-02-00180">19</xref>], the safety margin was derived from the worst-case delay at the SS process corner, a temperature of −40 °C, and a voltage droop of 10%. Meeting this worst-case delay required raising <italic>V<sub>ddCore</sub></italic><sub>2</sub> above 300 mV so that <italic>f<sub>max</sub></italic><sub>2</sub> = <italic>f<sub>max</sub></italic><sub>1</sub>. The higher supply voltage in turn increased the energy per operation of Core 2 relative to Core 1 when operating at <italic>f<sub>max</sub></italic><sub>1</sub>.</p>
      <p><xref ref-type="fig" rid="jlpea-02-00180-f010">Figure 10</xref> shows the energy per operation for both cores. At 300 mV, Core 1 (TED) uses 28% less energy per operation than Core 2. At 400 mV, Core 1 and Core 2 consume approximately the same energy per operation. However, Core 1 retains an advantage because it can compensate for local and global variations. <xref ref-type="table" rid="jlpea-02-00180-t004">Table 4</xref> summarizes the results for Core 1. Notably, at 400 mV the clock frequency is more than 10 times higher than at 300 mV (37.2 kHz versus 2.95 kHz), while the energy per operation rises only from 4.35 to 4.71 pJ/op.</p>
      <fig id="jlpea-02-00180-f010" position="anchor">
        <label>Figure 10</label>
        <caption>
          <p>Energy per operation of both Core 1 (TED) and Core 2.</p>
        </caption>
        <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jlpea-02-00180-g010.tif"/>
      </fig>
      <table-wrap id="jlpea-02-00180-t004" position="anchor">
        <object-id pub-id-type="pii">jlpea-02-00180-t004_Table 4</object-id>
        <label>Table 4</label>
        <caption>
          <p>Summary of the subthreshold-capable microprocessor (Core 1) performance in 65 nm CMOS.</p>
        </caption>
        <table rules="all" style="border: solid thin">
          <thead>
            <tr>
              <th colspan="3" align="center" valign="middle" style="background: #9FD3A4">Core 1 (TED) Summary</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td align="center" valign="middle">Process Technology</td>
              <td colspan="2" align="center" valign="middle">65 nm CMOS</td>
            </tr>
            <tr>
              <td align="center" valign="middle">Number of <italic>TEDsc</italic>s</td>
              <td colspan="2" align="center" valign="middle">20</td>
            </tr>
            <tr>
              <td align="center" valign="middle">Clock cycle length (T<sub>CLK</sub>)</td>
              <td colspan="2" align="center" valign="middle">160 FO4</td>
            </tr>
            <tr style="background: #9FD3A4">
              <td align="center" valign="middle">
                <bold>
                  <italic>V<sub>dd</sub></italic>
                </bold>
              </td>
              <td align="center" valign="middle">
                <bold>Clock Frequency</bold>
              </td>
              <td align="center" valign="middle">
                <bold>Energy/Operation (pJ/op)</bold>
              </td>
            </tr>
            <tr>
              <td align="center" valign="middle">300 mV</td>
              <td align="center" valign="middle">2.95 kHz</td>
              <td align="center" valign="middle">4.35 </td>
            </tr>
            <tr>
              <td align="center" valign="middle">400 mV</td>
              <td align="center" valign="middle">37.2 kHz</td>
              <td align="center" valign="middle">4.71 </td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
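      <p>The comparison between the two operating points in Table 4 is simple arithmetic, sketched below in Python for illustration. The values are transcribed from Table 4. The names CORE1_SUMMARY and compare_operating_points are hypothetical and do not come from the original work.</p>
      <preformat><![CDATA[
```python
# Values transcribed from Table 4:
# Vdd (mV) -> (clock frequency in kHz, energy per operation in pJ/op)
CORE1_SUMMARY = {
    300: (2.95, 4.35),
    400: (37.2, 4.71),
}


def compare_operating_points(v_low_mv=300, v_high_mv=400):
    """Return (frequency ratio, energy-per-operation increase in percent)
    when moving from the lower to the higher supply voltage."""
    f_low, e_low = CORE1_SUMMARY[v_low_mv]
    f_high, e_high = CORE1_SUMMARY[v_high_mv]
    speedup = f_high / f_low
    energy_increase_pct = (e_high - e_low) / e_low * 100.0
    return speedup, energy_increase_pct


if __name__ == "__main__":
    speedup, increase = compare_operating_points()
    print(f"speed-up: {speedup:.1f}x, energy increase: {increase:.1f}%")
```
]]></preformat>
      <p>With the Table 4 values, the clock frequency ratio is about 12.6 and the energy per operation is about 8.3% higher at 400 mV than at 300 mV.</p>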
    </sec>
    <sec sec-type="conclusions" id="sec5-jlpea-02-00180">
      <title>5. Conclusions</title>
      <p>The presented microprocessor demonstrates that timing-error detection (TED) is feasible in subthreshold operation and that TED reduces the energy per operation. When combined with a start-up algorithm, TED would guarantee correct operation at time zero even under a wide range of global and local variations. However, the adaptive system presented here was not optimized, especially in terms of area. This was mainly because the conventional synthesis and place-and-route flow was not optimized for subthreshold operation; the subthreshold-specific optimization was restricted to characterizing the library. Thus, there is still considerable potential for optimizing the subthreshold TED system to further reduce energy consumption and area.</p>
    </sec>
  </body>
  <back>
    <ack>
      <title>Acknowledgments</title>
      <p>This work is funded by the Academy of Finland (Projects #124029, #140340, and #13139458), and the Finnish Graduate School of Electronics, Telecommunications, and Automation (GETA).</p>
    </ack>
    <ref-list>
      <title>References</title>
      <ref id="B1-jlpea-02-00180">
        <label>1.</label>
        <citation citation-type="book">
          <person-group person-group-type="author">
            <name>
              <surname>Priya</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Inman</surname>
              <given-names>D.</given-names>
            </name>
          </person-group>
          <source>Energy Harvesting Technologies</source>
          <edition>1st</edition>
          <publisher-name>Springer</publisher-name>
          <publisher-loc>New York, NY, USA</publisher-loc>
          <year>2009</year>
        </citation>
      </ref>
      <ref id="B2-jlpea-02-00180">
        <label>2.</label>
        <citation citation-type="book">
          <person-group person-group-type="author">
            <name>
              <surname>Rabaey</surname>
              <given-names>J.</given-names>
            </name>
          </person-group>
          <article-title>Optimizing Power at Standby: Circuits and Systems</article-title>
          <source>Low Power Design Essentials</source>
          <edition>1st</edition>
          <publisher-name>Springer</publisher-name>
          <publisher-loc>New York, NY, USA</publisher-loc>
          <year>2009</year>
          <comment>Chapter 8</comment>
        </citation>
      </ref>
      <ref id="B3-jlpea-02-00180">
        <label>3.</label>
        <citation citation-type="book">
          <person-group person-group-type="author">
            <name>
              <surname>Wang</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Calhoun</surname>
              <given-names>B.</given-names>
            </name>
            <name>
              <surname>Chandrakasan</surname>
              <given-names>A.P.</given-names>
            </name>
          </person-group>
          <source>Sub-Threshold Design for Ultra Low-Power Systems</source>
          <edition>1st</edition>
          <publisher-name>Springer</publisher-name>
          <publisher-loc>New York, NY, USA</publisher-loc>
          <year>2006</year>
        </citation>
      </ref>
      <ref id="B4-jlpea-02-00180">
        <label>4.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Versace</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Chandler</surname>
              <given-names>B.</given-names>
            </name>
          </person-group>
          <article-title>The brain of a new machine</article-title>
          <source>IEEE Spectr.</source>
          <year>2010</year>
          <volume>12</volume>
          <fpage>30</fpage>
          <lpage>37</lpage>
          <pub-id pub-id-type="doi">10.1109/MSPEC.2010.5644776</pub-id>
        </citation>
      </ref>
      <ref id="B5-jlpea-02-00180">
        <label>5.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Bol</surname>
              <given-names>D.</given-names>
            </name>
            <name>
              <surname>Ambroise</surname>
              <given-names>R.</given-names>
            </name>
            <name>
              <surname>Flandre</surname>
              <given-names>D.</given-names>
            </name>
            <name>
              <surname>Legat</surname>
              <given-names>J.</given-names>
            </name>
          </person-group>
          <article-title>Interests and Limitations of Technology Scaling for Subthreshold Logic</article-title>
          <source>IEEE Trans. Very Large Scale Integr. Syst.</source>
          <year>2009</year>
          <volume>17</volume>
          <fpage>1508</fpage>
          <lpage>1519</lpage>
          <pub-id pub-id-type="doi">10.1109/TVLSI.2008.2005413</pub-id>
        </citation>
      </ref>
      <ref id="B6-jlpea-02-00180">
        <label>6.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Bull</surname>
              <given-names>D.</given-names>
            </name>
            <name>
              <surname>Das</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Shivashankar</surname>
              <given-names>K.</given-names>
            </name>
            <name>
              <surname>Dasika</surname>
              <given-names>G.</given-names>
            </name>
            <name>
              <surname>Flautner</surname>
              <given-names>K.</given-names>
            </name>
            <name>
              <surname>Blaauw</surname>
              <given-names>D.</given-names>
            </name>
          </person-group>
          <article-title>A power-efficient 32 bit ARM processor using timing-error detection and correction for transient-error tolerance and adaptation to PVT variation</article-title>
          <source>IEEE J. Solid State Circ.</source>
          <year>2011</year>
          <volume>46</volume>
          <fpage>18</fpage>
          <lpage>31</lpage>
          <pub-id pub-id-type="doi">10.1109/JSSC.2010.2079410</pub-id>
        </citation>
      </ref>
      <ref id="B7-jlpea-02-00180">
        <label>7.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Bowman</surname>
              <given-names>K.</given-names>
            </name>
            <name>
              <surname>Tschanz</surname>
              <given-names>J.</given-names>
            </name>
            <name>
              <surname>Lu</surname>
              <given-names>L.</given-names>
            </name>
            <name>
              <surname>Aseron</surname>
              <given-names>P.</given-names>
            </name>
            <name>
              <surname>Khellah</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Raychowdhury</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Geuskens</surname>
              <given-names>B.</given-names>
            </name>
            <name>
              <surname>Tokunaga</surname>
              <given-names>C.</given-names>
            </name>
            <name>
              <surname>Wilkerson</surname>
              <given-names>C.</given-names>
            </name>
            <name>
              <surname>Karnik</surname>
              <given-names>T.</given-names>
            </name>
            <name>
              <surname>De</surname>
              <given-names>V.</given-names>
            </name>
          </person-group>
          <article-title>A 45 nm resilient microprocessor core for dynamic variation tolerance</article-title>
          <source>IEEE J. Solid State Circ.</source>
          <year>2011</year>
          <volume>46</volume>
          <fpage>194</fpage>
          <lpage>208</lpage>
          <pub-id pub-id-type="doi">10.1109/JSSC.2010.2089657</pub-id>
        </citation>
      </ref>
      <ref id="B8-jlpea-02-00180">
        <label>8.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Crop</surname>
              <given-names>J.</given-names>
            </name>
            <name>
              <surname>Krimer</surname>
              <given-names>E.</given-names>
            </name>
            <name>
              <surname>Moezzi-Madani</surname>
              <given-names>N.</given-names>
            </name>
            <name>
              <surname>Pawlowski</surname>
              <given-names>R.</given-names>
            </name>
            <name>
              <surname>Ruggeri</surname>
              <given-names>T.</given-names>
            </name>
            <name>
              <surname>Chiang</surname>
              <given-names>P.</given-names>
            </name>
            <name>
              <surname>Erez</surname>
              <given-names>M.</given-names>
            </name>
          </person-group>
          <article-title>Error detection and recovery techniques for variation-aware CMOS computing: A comprehensive review</article-title>
          <source>J. Low Power Electron. Appl.</source>
          <year>2011</year>
          <volume>1</volume>
          <fpage>334</fpage>
          <lpage>356</lpage>
          <pub-id pub-id-type="doi">10.3390/jlpea1030334</pub-id>
        </citation>
      </ref>
      <ref id="B9-jlpea-02-00180">
        <label>9.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Kwong</surname>
              <given-names>J.</given-names>
            </name>
            <name>
              <surname>Ramadass</surname>
              <given-names>Y.K.</given-names>
            </name>
            <name>
              <surname>Verma</surname>
              <given-names>N.</given-names>
            </name>
            <name>
              <surname>Chandrakasan</surname>
              <given-names>A.P.</given-names>
            </name>
          </person-group>
          <article-title>A 65 nm sub-<italic>V<sub>t</sub></italic> microcontroller with integrated SRAM and switched capacitor DC-DC converter</article-title>
          <source>IEEE J. Solid State Circ.</source>
          <year>2009</year>
          <volume>44</volume>
          <fpage>115</fpage>
          <lpage>126</lpage>
          <pub-id pub-id-type="doi">10.1109/JSSC.2008.2007160</pub-id>
        </citation>
      </ref>
      <ref id="B10-jlpea-02-00180">
        <label>10.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Zhai</surname>
              <given-names>B.</given-names>
            </name>
            <name>
              <surname>Pant</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Nazhandali</surname>
              <given-names>L.</given-names>
            </name>
            <name>
              <surname>Hanson</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Olson</surname>
              <given-names>J.</given-names>
            </name>
            <name>
              <surname>Reeves</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Minuth</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Helfand</surname>
              <given-names>R.</given-names>
            </name>
            <name>
              <surname>Austin</surname>
              <given-names>T.</given-names>
            </name>
            <name>
              <surname>Sylvester</surname>
              <given-names>D.</given-names>
            </name>
            <name>
              <surname>Blaauw</surname>
              <given-names>D.</given-names>
            </name>
          </person-group>
          <article-title>Energy-efficient subthreshold processor design</article-title>
          <source>IEEE Trans. Very Large Scale Integr. Syst.</source>
          <year>2009</year>
          <volume>17</volume>
          <fpage>1127</fpage>
          <lpage>1137</lpage>
          <pub-id pub-id-type="doi">10.1109/TVLSI.2008.2007564</pub-id>
        </citation>
      </ref>
	  <ref id="B11-jlpea-02-00180">
        <label>11.</label>
        <citation citation-type="confproc">
          <person-group person-group-type="author">
            <name>
              <surname>Bol</surname>
              <given-names>D.</given-names>
            </name>
            <name>
              <surname>De Vos</surname>
              <given-names>J.</given-names>
            </name>
            <name>
              <surname>Hocquet</surname>
              <given-names>C.</given-names>
            </name>
            <name>
              <surname>Botman</surname>
              <given-names>F.</given-names>
            </name>
            <name>
              <surname>Durvaux</surname>
              <given-names>F.</given-names>
            </name>
            <name>
              <surname>Boyd</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Flandre</surname>
              <given-names>D.</given-names>
            </name>
            <name>
              <surname>Legat</surname>
              <given-names>J.-D.</given-names>
            </name>
          </person-group>
          <article-title>A 25 MHz 7 µW/MHz ultra-low-voltage microcontroller SoC in 65 nm LP/GP CMOS for low-carbon wireless sensor nodes</article-title>
          <source>Proceedings of the 2012 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)</source>
          <conf-loc>San Francisco, CA, USA</conf-loc>
          <year>2012</year>
          <fpage>490</fpage>
          <lpage>492</lpage>
        </citation>
      </ref>
      <ref id="B12-jlpea-02-00180">
        <label>12.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Zhai</surname>
              <given-names>B.</given-names>
            </name>
            <name>
              <surname>Seok</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Cline</surname>
              <given-names>B.</given-names>
            </name>
            <name>
              <surname>Zhou</surname>
              <given-names>K.</given-names>
            </name>
            <name>
              <surname>Singhal</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Minuth</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Olson</surname>
              <given-names>J.</given-names>
            </name>
            <name>
              <surname>Nazhandali</surname>
              <given-names>L.</given-names>
            </name>
            <name>
              <surname>Austin</surname>
              <given-names>T.</given-names>
            </name>
            <name>
              <surname>Sylvester</surname>
              <given-names>D.</given-names>
            </name>
            <name>
              <surname>Blaauw</surname>
              <given-names>D.</given-names>
            </name>
          </person-group>
          <article-title>Exploring variability and performance in a sub-200-mV processor</article-title>
          <source>IEEE J. Solid State Circuit</source>
          <year>2008</year>
          <volume>43</volume>
          <fpage>881</fpage>
          <lpage>891</lpage>
          <pub-id pub-id-type="doi">10.1109/JSSC.2008.917505</pub-id>
        </citation>
      </ref>
      <ref id="B13-jlpea-02-00180">
        <label>13.</label>
        <citation citation-type="confproc">
          <person-group person-group-type="author">
            <name>
              <surname>Blaauw</surname>
              <given-names>D.</given-names>
            </name>
            <name>
              <surname>Kalaiselvan</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Lai</surname>
              <given-names>K.</given-names>
            </name>
            <name>
              <surname>Ma</surname>
              <given-names>W.H.</given-names>
            </name>
            <name>
              <surname>Pant</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Tokunaga</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Das</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Bull</surname>
              <given-names>D.</given-names>
            </name>
          </person-group>
          <article-title>Razor II: <italic>In situ</italic> error detection and correction for PVT and SER tolerance</article-title>
          <source>Proceedings of the 2008 IEEE International Solid-State Circuits Conference</source>
          <conf-loc>San Francisco, CA, USA</conf-loc>
          <conf-date>3–7 February 2008</conf-date>
          <fpage>400</fpage>
        </citation>
      </ref>
      <ref id="B14-jlpea-02-00180">
        <label>14.</label>
        <citation citation-type="confproc">
          <person-group person-group-type="author">
            <name>
              <surname>Turnquist</surname>
              <given-names>M.J.</given-names>
            </name>
            <name>
              <surname>Laulainen</surname>
              <given-names>E.</given-names>
            </name>
            <name>
              <surname>Mäkipää</surname>
              <given-names>J.</given-names>
            </name>
            <name>
              <surname>Pulkkinen</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Koskinen</surname>
              <given-names>L.</given-names>
            </name>
          </person-group>
          <article-title>Measurement of a timing error detection latch capable of sub-threshold operation</article-title>
          <source>Proceedings of the 2009 IEEE NORCHIP Circuit Conference</source>
          <conf-loc>Trondheim, Norway</conf-loc>
          <conf-date>16–17 November 2009</conf-date>
          <fpage>1</fpage>
          <lpage>4</lpage>
        </citation>
      </ref>
      <ref id="B15-jlpea-02-00180">
        <label>15.</label>
        <citation citation-type="confproc">
          <person-group person-group-type="author">
            <name>
              <surname>Turnquist</surname>
              <given-names>M.J.</given-names>
            </name>
            <name>
              <surname>Laulainen</surname>
              <given-names>E.</given-names>
            </name>
            <name>
              <surname>Mäkipää</surname>
              <given-names>J.</given-names>
            </name>
            <name>
              <surname>Koskinen</surname>
              <given-names>L.</given-names>
            </name>
          </person-group>
          <article-title>Measurement of a system-adaptive error-detection sequential circuit with subthreshold SCL</article-title>
          <source>Proceedings of the 2011 IEEE NORCHIP Circuit Conference</source>
          <conf-loc>Lund, Sweden</conf-loc>
          <conf-date>14–15 November 2011</conf-date>
          <fpage>1</fpage>
          <lpage>4</lpage>
        </citation>
      </ref>
      <ref id="B16-jlpea-02-00180">
        <label>16.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Tajalli</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Leblebici</surname>
              <given-names>Y.</given-names>
            </name>
          </person-group>
          <article-title>Leakage current reduction using subthreshold source-coupled logic</article-title>
          <source>IEEE Trans. Circuit Syst. II</source>
          <year>2009</year>
          <volume>56</volume>
          <fpage>374</fpage>
          <lpage>378</lpage>
          <pub-id pub-id-type="doi">10.1109/TCSII.2009.2019167</pub-id>
        </citation>
      </ref>
      <ref id="B17-jlpea-02-00180">
        <label>17.</label>
        <citation citation-type="book">
          <person-group person-group-type="author">
            <name>
              <surname>Tajalli</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Leblebici</surname>
              <given-names>Y.</given-names>
            </name>
          </person-group>
          <source>Low-Power Mixed Signal IC Design</source>
          <edition>1st</edition>
          <publisher-name>Springer</publisher-name>
          <publisher-loc>New York, NY, USA</publisher-loc>
          <year>2010</year>
        </citation>
      </ref>
      <ref id="B18-jlpea-02-00180">
        <label>18.</label>
        <citation citation-type="confproc">
          <person-group person-group-type="author">
            <name>
              <surname>Alioto</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Leblebici</surname>
              <given-names>Y.</given-names>
            </name>
          </person-group>
          <article-title>Analysis and design of ultra-low power subthreshold MCML gates</article-title>
          <source>Proceedings of the IEEE International Symposium on Circuits and Systems</source>
          <conf-loc>Taipei, Taiwan</conf-loc>
          <conf-date>24–27 May 2009</conf-date>
          <fpage>2557</fpage>
          <lpage>2560</lpage>
        </citation>
      </ref>
      <ref id="B19-jlpea-02-00180">
        <label>19.</label>
        <citation citation-type="confproc">
          <person-group person-group-type="author">
            <name>
              <surname>Bowman</surname>
              <given-names>K.</given-names>
            </name>
            <name>
              <surname>Tschanz</surname>
              <given-names>J.</given-names>
            </name>
            <name>
              <surname>Kim</surname>
              <given-names>N.</given-names>
            </name>
            <name>
              <surname>Lee</surname>
              <given-names>J.</given-names>
            </name>
            <name>
              <surname>Wilkerson</surname>
              <given-names>C.</given-names>
            </name>
            <name>
              <surname>Lu</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Karnik</surname>
              <given-names>T.</given-names>
            </name>
            <name>
              <surname>De</surname>
              <given-names>V.</given-names>
            </name>
          </person-group>
          <article-title>Energy-efficient and metastability-immune timing-error detection and instruction-replay-based recovery circuits for dynamic-variation tolerance</article-title>
          <source>Proceedings of the IEEE International Solid-State Circuits Conference</source>
          <conf-loc>San Francisco, CA, USA</conf-loc>
          <conf-date>3–7 February 2008</conf-date>
          <fpage>402</fpage>
          <lpage>403</lpage>
        </citation>
      </ref>
    </ref-list>
  </back>
</article>
