<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xml:lang="en" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Journal of Low Power Electronics and Applications</journal-id>
<journal-title>Journal of Low Power Electronics and Applications</journal-title>
<issn pub-type="epub">2079-9268</issn>
<publisher>
<publisher-name>Molecular Diversity Preservation International (MDPI)</publisher-name></publisher></journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3390/jlpea1010097</article-id>
<article-id pub-id-type="publisher-id">jlpea-01-00097</article-id>
<article-categories>
<subj-group>
<subject>Article</subject></subj-group></article-categories>
<title-group>
<article-title>Design of Energy Aware Adder Circuits Considering Random Intra-Die Process Variations</article-title></title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Lanuzza</surname><given-names>Marco</given-names></name></contrib>
<contrib contrib-type="author">
<name><surname>Frustaci</surname><given-names>Fabio</given-names></name></contrib>
<contrib contrib-type="author">
<name><surname>Perri</surname><given-names>Stefania</given-names></name></contrib>
<contrib contrib-type="author">
<name><surname>Corsonello</surname><given-names>Pasquale</given-names></name><xref ref-type="corresp" rid="c1-jlpea-01-00097"><sup>*</sup></xref></contrib>
<aff id="af1-jlpea-01-00097">Department of Electronics, Computer Science and Systems (DEIS), University of Calabria, Arcavacata di Rende-87036-Rende (CS), Italy; E-Mails: <email>lanuzza@deis.unical.it</email> (M.L.); <email>ffrustaci@deis.unical.it</email> (F.F.); <email>perri@deis.unical.it</email> (S.P.)</aff></contrib-group>
<author-notes>
<corresp id="c1-jlpea-01-00097">
<label>*</label> Author to whom correspondence should be addressed; E-Mail: <email>p.corsonello@unical.it</email>; Tel.: +39-0984-494708; Fax: +39-0984-494834.</corresp></author-notes>
<pub-date pub-type="collection">
<year>2011</year></pub-date>
<pub-date pub-type="epub">
<day>06</day>
<month>04</month>
<year>2011</year></pub-date>
<volume>1</volume>
<issue>1</issue>
<fpage>97</fpage>
<lpage>108</lpage>
<history>
<date date-type="received">
<day>25</day>
<month>11</month>
<year>2010</year></date>
<date date-type="rev-recd">
<day>18</day>
<month>03</month>
<year>2011</year></date>
<date date-type="accepted">
<day>01</day>
<month>04</month>
<year>2011</year></date></history>
<permissions>
<copyright-statement>© 2011 by the authors; licensee MDPI, Basel, Switzerland.</copyright-statement>
<copyright-year>2011</copyright-year>
<license>
<p>This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).</p></license></permissions>
<abstract>
<p>Energy consumption is one of the main barriers to current high-performance designs. Moreover, the increased variability experienced in advanced process technologies implies further timing yield concerns and therefore intensifies this obstacle. Thus, proper techniques to achieve robust designs are a critical requirement for integrated circuit success. In this paper, the influence of intra-die random process variations is analyzed considering the particular case of the design of energy aware adder circuits. Five well known adder circuits were designed exploiting an industrial 45 nm static complementary metal-oxide semiconductor (CMOS) standard cell library. The designed adders were comparatively evaluated under different energy constraints. As a main result, the performed analysis demonstrates that, for a given energy budget, simpler circuits (which are conventionally identified as low-energy slow architectures) operating at higher power supply voltages can achieve a timing yield significantly better than more complex faster adders when used in low-power design with supply voltages lower than nominal.</p></abstract>
<kwd-group>
<kwd>intra-die process variations</kwd>
<kwd>yield-driven design</kwd>
<kwd>adder design</kwd></kwd-group></article-meta></front>
<body>
<sec sec-type="intro">
<label>1.</label>
<title>Introduction</title>
<p>The rapid scaling of silicon technology has enabled designers to integrate millions and even billions of transistors into a single chip. This ability, to achieve very high integration density, has contributed to the success of integrated circuit (IC) design during the past few decades. Unfortunately, technology scaling is leading to a significant increase in process variability due to random doping effects, imperfections in lithographic patterning of small devices, and related effects [<xref ref-type="bibr" rid="b1-jlpea-01-00097">1</xref>]. Process variations (PVs) introduce statistical inter-die/intra-die fluctuations both in physical properties (e.g., transistor threshold voltage and transconductance, interconnect resistances and capacitances) and geometries of the different layers, which in turn result in uncertainties in speed and power characteristics of ICs [<xref ref-type="bibr" rid="b2-jlpea-01-00097">2</xref>,<xref ref-type="bibr" rid="b3-jlpea-01-00097">3</xref>]. This potentially impacts the parametric yield in advanced process technologies (like the 45 nm and beyond technological nodes) [<xref ref-type="bibr" rid="b2-jlpea-01-00097">2</xref>]. Moreover, the yield loss is also expected to grow in the future technology nodes where physical device parameters will approach the atomic scale and therefore will be subject to atomic uncertainties [<xref ref-type="bibr" rid="b1-jlpea-01-00097">1</xref>].</p>
<p>PVs can be compensated by using appropriate circuit techniques like Adaptive Body Bias (ABB) and Adaptive Supply Voltage (ASV) [<xref ref-type="bibr" rid="b4-jlpea-01-00097">4</xref>]. The ABB technique is based on the use of the transistor body effect to change transistor threshold voltage during circuit operation. This is accomplished by applying an adaptive body bias (either forward or reverse bias) to the transistors belonging to an IC. In [<xref ref-type="bibr" rid="b5-jlpea-01-00097">5</xref>] the effectiveness of the ABB methodology was demonstrated. Body biasing was applied to both <italic>n</italic>-channel and <italic>p</italic>-channel transistors through separate on-chip power distribution networks. The <italic>p</italic>-channel transistors were forward biased to improve performance, whereas the <italic>n</italic>-channel transistors were reverse biased to reduce leakage. Results obtained in [<xref ref-type="bibr" rid="b5-jlpea-01-00097">5</xref>] demonstrate that the ABB technique can be very effective in controlling the distribution of maximum operating frequency (F<sub>max</sub>) and maximum power consumption (P<sub>max</sub>), and thus in improving the parametric yield.</p>
<p>Another well known method for performing PV compensation is the ASV approach, which consists of suitably tuning the power supply voltage (VDD). This technique was originally proposed to trade performance for power [<xref ref-type="bibr" rid="b6-jlpea-01-00097">6</xref>,<xref ref-type="bibr" rid="b7-jlpea-01-00097">7</xref>]. In addition, as demonstrated in [<xref ref-type="bibr" rid="b4-jlpea-01-00097">4</xref>], the ASV method can also be a very good choice to tighten the performance and power consumption distributions and to improve product yield. Moreover, from the cost perspective, the ASV scheme is a less expensive solution than the ABB scheme, since ABB requires additional routing resources to distribute the bias voltage [<xref ref-type="bibr" rid="b4-jlpea-01-00097">4</xref>].</p>
<p>Whereas the above mentioned design methodologies are effective for compensating inter-die PVs, they are less useful for mitigating intra-die PVs, since it is not physically possible to measure the variations of every single transistor on the chip and then generate and apply the appropriate body bias or supply voltage to it.</p>
<p>The most well-known technique for reducing device-to-device (<italic>i.e.</italic>, intra-die) random variations consists of increasing the size of the transistors. However, in digital circuits this approach can lead to considerable power and area overheads [<xref ref-type="bibr" rid="b1-jlpea-01-00097">1</xref>].</p>
<p>In this paper, the influence of intra-die random PVs is analyzed considering the particular case of the adder circuits, which are a very important class of digital circuits since they are frequently used in the critical path of the control unit and the data-path of microprocessors and digital signal processors (DSPs) [<xref ref-type="bibr" rid="b8-jlpea-01-00097">8</xref>–<xref ref-type="bibr" rid="b10-jlpea-01-00097">10</xref>].</p>
<p>As a first step of our analysis, the speed uncertainty due to PVs is evaluated for different power supply voltages. It is shown that the impact of intra-die PVs on delay strongly depends on the considered VDD. Moreover, the delay sensitivity worsens at lower supply voltages. This information is particularly important for low power applications, where the supply voltage may be reasonably low. In fact, if the delay variation becomes too large, timing yield fallout may occur. As a subsequent step of this work, the sensitivity to process variations is comparatively analyzed for low-energy slow and high-energy fast adder architectures. As a fundamental result, our study demonstrates that, for an equal energy budget, low-complexity circuits operating at higher VDDs can be significantly faster and less delay sensitive to random PVs than high-complexity adders operating at lower power supply voltages. This suggests some criteria for suitably choosing the VDD and logic architecture when designing energy aware, high yield adders. We believe that this result can be very useful, as it provides effective suggestions for managing the impact of intra-die process variability on Deep-Submicron (DSM) multi-VDD digital systems.</p>
<p>This paper is organized as follows: in Section 2, the analyzed adder topologies are briefly reviewed and their main characteristics are discussed; Section 3 deals with the impact of intra-die process variability on the analyzed adders; timing yield issues and important design guidelines for energy-aware adder circuits are discussed in Section 4; finally, conclusions are drawn in Section 5.</p></sec>
<sec>
<label>2.</label>
<title>Adder Circuits and Nominal Performances</title>
<p>Addition of binary numbers is implemented in a bitwise approach. At each bit position, the sum value can be determined based upon the corresponding bit values of the operands and the incoming carry value from the previous position. Since, in the worst case, the incoming carry value should be propagated from the least significant bit position to the most significant, the delay of an addition operation is dependent on the operand word length (<italic>n</italic>). In order to reduce addition time, different carry propagation techniques have been proposed at both the logical and circuit level [<xref ref-type="bibr" rid="b10-jlpea-01-00097">10</xref>].</p>
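<p>The bitwise scheme described above can be sketched in a few lines of Python. This is only an illustrative behavioral model of ripple carry addition, not the gate-level netlist used in this work; the function name <italic>ripple_carry_add</italic> and its parameters are hypothetical.</p>
<preformat>
```python
def ripple_carry_add(a, b, n=16, carry_in=0):
    """Add two n-bit unsigned integers one bit position at a time.

    Behavioral sketch only: each loop iteration plays the role of one
    full adder, and the carry must travel from bit 0 up to bit n-1.
    """
    carry = carry_in
    total = 0
    for i in range(n):  # least significant bit first
        a_i = (a >> i) % 2
        b_i = (b >> i) % 2
        s_i = a_i ^ b_i ^ carry                       # sum bit
        carry = (a_i * b_i) | (carry * (a_i ^ b_i))   # carry out
        total += s_i * 2 ** i
    return total, carry


if __name__ == "__main__":
    # Worst case for carry propagation: the carry ripples through all 16 bits.
    print(ripple_carry_add(0xFFFF, 1))
```
</preformat>
<p>The loop makes the dependence on the word length explicit: the carry computed at position <italic>i</italic> is needed before position <italic>i</italic> + 1 can be resolved, which is why the worst-case delay grows with <italic>n</italic>.</p>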
<p>Five 16-bit adder architectures have been considered as the case study in this work. They were synthesized by Synopsys Module Compiler (MC) [<xref ref-type="bibr" rid="b11-jlpea-01-00097">11</xref>] forcing a speed optimization (by using the <italic>max-delay</italic> timing constraint) and exploiting the STMicroelectronics (ST) 45-nm static Low-Power CMOS standard-cells library. The different logic architectures available for the MC automatic synthesis are summarized in <xref ref-type="table" rid="t1-jlpea-01-00097">Table 1</xref>, which also shows adder delay and area characteristics. The ripple carry adder (<italic>Ripple</italic>) is a low-area, slow and low-energy structure. By specifying this kind of architecture, MC maps an alternating polarity chain of full adders with inverted carry-ins and carry-outs [<xref ref-type="bibr" rid="b11-jlpea-01-00097">11</xref>].</p>
<p>The fast carry look-ahead (<italic>fast_CLA</italic>) adder is the fastest available MC architecture, but it is also the largest. It uses a dense carry tree to propagate the carries to each bit, in only log<sub>2</sub> <italic>n</italic> inverting AND-OR delays [<xref ref-type="bibr" rid="b11-jlpea-01-00097">11</xref>]. The carry look-ahead (<italic>CLA</italic>) adder exploits a sparse carry tree that roughly doubles the delay (actually 2(log<sub>2</sub> <italic>n</italic> – 1)) in the carry tree, relative to the <italic>fast_CLA</italic> but it provides significant area savings [<xref ref-type="bibr" rid="b11-jlpea-01-00097">11</xref>]. The carry select adder (<italic>CSA</italic>) is also a high-performance circuit. However, by increasing the adder size, the growing loading on the carry-select lines can degrade performance below the expected level. The carry look-ahead/select adder (<italic>CLSA</italic>) is by far the most flexible architecture (by specifying this kind of architecture MC automatically creates a structure ranging from a <italic>Ripple</italic> to a <italic>fast_CLA</italic> adder, depending on the desired delay [<xref ref-type="bibr" rid="b11-jlpea-01-00097">11</xref>]).</p>
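<p>As a quick numerical check of the carry tree depths quoted above, the following sketch evaluates both expressions. It assumes that <italic>n</italic> is a power of two, and the function names are hypothetical.</p>
<preformat>
```python
import math


def dense_tree_levels(n):
    """Carry tree depth quoted for the fast_CLA: log2(n) AND-OR levels."""
    return int(math.log2(n))


def sparse_tree_levels(n):
    """Carry tree depth quoted for the CLA: 2 * (log2(n) - 1) levels."""
    return 2 * (int(math.log2(n)) - 1)


if __name__ == "__main__":
    for n in (16, 64):  # assumption: n is a power of two
        print(n, dense_tree_levels(n), sparse_tree_levels(n))
```
</preformat>
<p>For the 16-bit adders considered here, the expressions give 4 levels for the dense tree and 6 for the sparse one; the ratio approaches two only as <italic>n</italic> grows.</p>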
<p>When a digital circuit is designed using the semi-custom standard cell based approach, the degrees of freedom available to a designer to satisfy given energy consumption and performance specifications are essentially the choice of logic architecture and the choice of supply voltage. Of these, tuning the VDD value is a straightforward technique to meet a given delay (or energy) constraint. In fact, increasing the power supply voltage improves the device drive currents and thus circuit performance, but it also increases both dynamic and leakage power, which depend quadratically and exponentially on VDD, respectively. Conversely, reducing the power supply voltage lowers dynamic and leakage power but degrades performance.</p>
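<p>The quadratic dependence of the dynamic component on VDD can be illustrated with a first-order sketch. It assumes a fixed switched capacitance and ignores leakage and short-circuit contributions, so it is not a substitute for the circuit simulations reported below; the function name is hypothetical.</p>
<preformat>
```python
def relative_dynamic_energy(vdd, vdd_nominal=1.0):
    """Dynamic switching energy relative to its value at the nominal VDD.

    First-order assumption: energy scales as C * VDD**2 with the switched
    capacitance C unchanged, so C cancels out of the ratio.
    """
    return (vdd / vdd_nominal) ** 2


if __name__ == "__main__":
    for vdd in (0.8, 1.0, 1.2):  # the VDD range considered in this work
        print(vdd, round(relative_dynamic_energy(vdd), 2))
```
</preformat>
<p>Under this assumption, moving from 1.0 V to 0.8 V reduces the dynamic energy to about 0.64 of its nominal value, whereas moving to 1.2 V raises it to about 1.44.</p>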
<p>In order to characterize the sensitivity of the considered adder architectures to different VDDs, the circuits were simulated in the Cadence environment for VDD ranging from 0.8 V up to 1.2 V. Simulations were performed placing input buffers between ideal voltage sources and operand inputs to provide realistic input signals. Moreover, each output signal was loaded with a 0.8 fF capacitance (which corresponds to the input capacitance of a D-type Flip-Flop in the referred technology). This choice allows realistic running conditions to be examined.</p>
<p><xref ref-type="fig" rid="f1-jlpea-01-00097">Figure 1</xref> compares the nominal performances of the different adder topologies in the considered VDD range. The simulation results were obtained for the TT 75 °C process corner and are plotted with a step of 0.1 V. As expected, the <italic>fast_CLA</italic> and the <italic>CLSA</italic> achieve the lowest delays. In contrast, the <italic>Ripple</italic> architecture is the slowest implementation, always more than three times slower than the fastest circuit.</p>
<p>The energy dissipation evaluated under different supply voltages is plotted in <xref ref-type="fig" rid="f2-jlpea-01-00097">Figure 2</xref>. Note that the energy consumption plotted here is an average energy value per operation (<italic>E</italic><sub>op</sub>), evaluated over 200 input patterns. The latter were randomly provided at a running frequency of 166 MHz. From <xref ref-type="fig" rid="f2-jlpea-01-00097">Figure 2</xref>, it is easy to observe that the <italic>Ripple</italic> circuit exhibits energy consumption significantly lower than the remaining counterparts (<italic>i.e.</italic>, up to 56%, 62%, 81% and 83% less than the <italic>CLA</italic>, <italic>CSA</italic>, <italic>fast_CLA</italic> and <italic>CLSA</italic>, respectively), proving to be the most suitable choice when low power consumption is mandatory. In contrast, the fastest adders (<italic>i.e.</italic>, the <italic>fast_CLA</italic> and <italic>CLSA</italic>) are the most energy hungry architectures, thus useful only when speed is the primary concern.</p>
<p>For the sake of completeness, the leakage current evaluated in the considered VDD range is plotted in <xref ref-type="fig" rid="f3-jlpea-01-00097">Figure 3</xref>. The <italic>Ripple</italic> circuit shows minimum leakage current due to its low-area structure, whereas relatively high leakage currents were measured for the fastest circuits due to their larger area occupancy.</p>
<p>The analyzed adder topologies can be thoroughly and fairly compared by combining the results of <xref ref-type="fig" rid="f1-jlpea-01-00097">Figure 1</xref> and <xref ref-type="fig" rid="f2-jlpea-01-00097">Figure 2</xref> in the Energy-Delay (E-D) space (<italic>i.e.</italic>, the set of design points showing, for a given energy/delay value, the corresponding delay/energy characteristic), as illustrated in <xref ref-type="fig" rid="f4-jlpea-01-00097">Figure 4</xref>. By suitably tuning the VDD value, correct operation of the <italic>Ripple</italic> circuit is also guaranteed with energy consumption lower than 420 fJ. This can be obtained by using a power supply voltage equal to or lower than the nominal VDD (<italic>i.e.</italic>, 1 V).</p>
<p>As highlighted in <xref ref-type="fig" rid="f5-jlpea-01-00097">Figure 5</xref>, the <italic>CLA</italic>, the <italic>Ripple</italic> and the <italic>CSA</italic> circuits offer very similar speed results in the 460–560 fJ energy range, with a small advantage for the <italic>CLA</italic> adder, which was up to 4% and 6% faster than the <italic>Ripple</italic> and the <italic>CSA</italic> architectures, respectively. The <italic>CLA</italic> is also the fastest circuit for an energy budget up to 750 fJ. Beyond that, the <italic>CSA</italic> is the most suitable adder architecture, since it was up to 1.6× and 2.2× faster than the <italic>fast_CLA</italic> and the <italic>CLSA</italic>, respectively. Finally, when very high speed is required (<italic>i.e.</italic>, for a delay constraint lower than 160 ps), the <italic>fast_CLA</italic> circuit is the obvious choice, at the expense of considerable energy consumption (<italic>i.e.</italic>, more than 2.5 pJ).</p>
<p>The performed analysis suggests that, with the power supply voltage as a tuning parameter, different architecture choices can be made on the basis of the available energy budget. In the following, we analyze how the possible choices are impacted by random intra-die PVs.</p></sec>
<sec sec-type="discussion">
<label>3.</label>
<title>Impact of Intra-Die Process Variability for Different Power Supply Voltages</title>
<p>The impact of intra-die PVs was evaluated through Monte Carlo simulations performed on 1000 samples. In this case, driving circuits of the simulation setup are not influenced by random process variations in order to isolate process variability effects on circuits under test.</p>
<p>The ratio between the maximum spread 3σ and the mean value μ (<italic>i.e.</italic>, 3σ/μ [<xref ref-type="bibr" rid="b1-jlpea-01-00097">1</xref>]) was considered as a measure of the uncertainty of the delay. As can be easily observed in <xref ref-type="fig" rid="f6-jlpea-01-00097">Figure 6</xref>, during the optimization for power savings (<italic>i.e.</italic>, VDD lower than the nominal value) the delay variability increases at a rate similar to that of the nominal delay degradation, and hence timing yield worsens during this optimization. Conversely, the delay variability is reduced for higher power supply voltages. The <italic>Ripple</italic> circuit is the least PV delay sensitive circuit (its delay variability ranges from 10.2%@1.2 V to 20.7%@0.8 V). In contrast, the <italic>fast_CLA</italic> is the most PV delay sensitive structure (its delay variability spreads from 12.9%@1.2 V to 28%@0.8 V), which makes it from 1.26× to 1.35× more delay sensitive than the <italic>Ripple</italic> architecture. It is interesting to observe that, at the same VDD value, circuits with longer critical paths always present lower delay variability than those with shorter critical paths. The reduced delay variability of slower circuits can be explained by the higher number of logic gates in the critical path: each gate experiences a different delay shift, possibly of opposite sign, so a more pronounced averaging effect exists on longer logic gate chains.</p>
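<p>The averaging effect can be reproduced with a toy Monte Carlo experiment. The sketch below is not the simulation setup used in this work: it assumes identical gates whose delays are independent normal random variables with unit mean, and the function name and the per-gate spread <italic>sigma_gate</italic> are hypothetical.</p>
<preformat>
```python
import random
import statistics


def delay_variability(num_gates, sigma_gate=0.1, samples=1000, seed=0):
    """Estimate 3*sigma/mu of a path made of num_gates gates in series.

    Assumption: every gate delay is drawn independently from a normal
    distribution with mean 1.0 and standard deviation sigma_gate.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    path_delays = [
        sum(rng.gauss(1.0, sigma_gate) for _ in range(num_gates))
        for _ in range(samples)
    ]
    mu = statistics.mean(path_delays)
    sigma = statistics.stdev(path_delays)
    return 3.0 * sigma / mu


if __name__ == "__main__":
    for n_gates in (4, 16):
        print(n_gates, delay_variability(n_gates))
```
</preformat>
<p>Under the independence assumption, the path mean grows with the number of gates while the path standard deviation grows only with its square root, so 3σ/μ shrinks roughly as the inverse square root of the chain length. Correlated variations would weaken this effect.</p>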
<p>The 3-sigma delay value (defined as μ + 3σ) was evaluated for different VDD and is plotted in <xref ref-type="fig" rid="f7-jlpea-01-00097">Figure 7</xref>.</p>
<p>It is worth noting that the 3-sigma delay value provides very practical information to evaluate the achievable post fabrication timing yield. In fact, considering the 3-sigma delay value as a timing constraint, it is statistically assured that about 99.87% of the fabricated circuits satisfy the target speed [<xref ref-type="bibr" rid="b1-jlpea-01-00097">1</xref>]. As the main effect of the intra-die PVs, all the curves are shifted up with respect to those drawn in <xref ref-type="fig" rid="f1-jlpea-01-00097">Figure 1</xref>. Obviously, the experienced shift amount depends on the particular circuit delay sensitivity to intra-die PVs.</p>
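<p>The 99.87% figure follows from the normal cumulative distribution function evaluated three standard deviations above the mean. A minimal sketch, assuming normally distributed delays; the function names and the numerical values of μ and σ are purely illustrative.</p>
<preformat>
```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def three_sigma_delay(mu, sigma):
    """3-sigma delay value, defined in the text as mu + 3*sigma."""
    return mu + 3.0 * sigma


def timing_yield(t_max, mu, sigma):
    """Fraction of circuits whose delay does not exceed t_max."""
    return normal_cdf((t_max - mu) / sigma)


if __name__ == "__main__":
    mu, sigma = 100.0, 5.0  # illustrative values, not measured data
    t_max = three_sigma_delay(mu, sigma)
    print(round(100.0 * timing_yield(t_max, mu, sigma), 2))
```
</preformat>
<p>The result does not depend on the chosen μ and σ, since the constraint is always placed exactly three standard deviations above the mean.</p>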
<p><xref ref-type="fig" rid="f8-jlpea-01-00097">Figure 8</xref> compares the Energy-3sigma Delay curves of the different adders. It is worth noting that the average energy consumption per operation is strongly dominated by the switching component which is relatively insensitive to process variations [<xref ref-type="bibr" rid="b12-jlpea-01-00097">12</xref>]. For this reason process induced variations on energy can be considered negligible and, thus, they were not taken into account in this work.</p>
<p>Results shown in <xref ref-type="fig" rid="f8-jlpea-01-00097">Figure 8</xref> describe a quite different scenario with respect to those given in <xref ref-type="fig" rid="f4-jlpea-01-00097">Figure 4</xref>. The E-D curves are now shifted toward the right by an amount that depends on the influence of process variability on each adder architecture. As a result, the <italic>Ripple</italic> circuit has the lowest 3-sigma delay value when an energy budget up to 575 fJ is available. This is highlighted in <xref ref-type="fig" rid="f9-jlpea-01-00097">Figure 9</xref>, which plots the <italic>Ripple</italic>, the <italic>CLA</italic> and the <italic>CSA</italic> E-D curves for energy values ranging from 350 fJ to 800 fJ. It can be seen that the <italic>Ripple</italic> architecture can achieve a 3-sigma delay value 9.5% and 16% lower than the <italic>CLA</italic> and <italic>CSA</italic> circuits, respectively. As shown in <xref ref-type="fig" rid="f10-jlpea-01-00097">Figure 10</xref>, the <italic>CLA</italic> and the <italic>CSA</italic> E-D plots almost overlap in the 750–1250 fJ energy range. In the same range, these circuits achieve a 3-sigma delay up to 46.2% and 59.6% lower than the <italic>fast_CLA</italic> and the <italic>CLSA</italic> adders, respectively. Although the <italic>fast_CLA</italic> presents the highest delay variability, it remains almost the only choice when very high speed is mandatory and energy consumption is not a concern.</p>
<p>The analysis discussed above provides important suggestions for designing robust circuits under energy constraints. This is highlighted in the next section.</p></sec>
<sec sec-type="methods">
<label>4.</label>
<title>Timing Yield Issues and Design Guidelines for Energy-Aware Adder Circuits</title>
<p>Under process variations, the delay of a given circuit can be modeled by a normal distribution with a probability density function (PDF) characterized by the mean and the standard deviation values [<xref ref-type="bibr" rid="b1-jlpea-01-00097">1</xref>]. By analyzing the PDF of the delay for a given energy constraint, useful information about the achievable timing yield can be obtained.</p>
<p><xref ref-type="fig" rid="f11-jlpea-01-00097">Figure 11</xref> shows the PDF of the delay for the analyzed adder architectures under the 500 fJ energy constraint. Only the circuits which can meet the energy consumption requirement were considered in this analysis. It can be seen that, for the considered energy point, the <italic>Ripple</italic>@1.18 V, the <italic>CLA</italic>@0.85 V and the <italic>CSA</italic>@0.80 V can achieve a very similar mean delay. However, as highlighted in <xref ref-type="fig" rid="f11-jlpea-01-00097">Figure 11</xref>, the <italic>Ripple</italic> circuit presents a significantly tighter performance distribution, because its delay variability is 47% and 57% lower than that of the <italic>CLA</italic> and the <italic>CSA</italic> circuits, respectively.</p>
<p>The delay distributions for the <italic>Ripple</italic>@1.20 V, the <italic>CLA</italic>@1.07 V, the <italic>CSA</italic>@1.00 V and the <italic>fast_CLA</italic>@0.82 V are plotted for the 1000 fJ energy constraint in <xref ref-type="fig" rid="f12-jlpea-01-00097">Figure 12</xref>. It is worth noting that the <italic>Ripple</italic> circuit has been included in the comparison because it can achieve a 3-sigma delay value very close to that of the <italic>fast_CLA</italic> architecture working at VDD = 0.82 V (see <xref ref-type="fig" rid="f7-jlpea-01-00097">Figure 7</xref>), while consuming about 43% less energy per operation. As can be observed in <xref ref-type="fig" rid="f12-jlpea-01-00097">Figure 12</xref>, the <italic>CLA</italic> and the <italic>CSA</italic> present an almost equal mean delay value, which is significantly lower than those of the <italic>fast_CLA</italic> (about 39%) and the <italic>Ripple</italic> (about 46%) architectures. This higher speed is achieved with relatively low delay variability (<italic>i.e.</italic>, 12.4% for the <italic>CLA</italic> and 14.5% for the <italic>CSA</italic>). It is worth noting that the <italic>fast_CLA</italic> circuit presents the highest delay variability, which is 1.9×, 2.2× and 2.7× larger than that of the <italic>CSA</italic>, the <italic>CLA</italic> and the <italic>Ripple</italic> architectures, respectively.</p>
<p><xref ref-type="fig" rid="f13-jlpea-01-00097">Figure 13</xref> plots the delay PDFs for the <italic>CLA</italic>@1.20 V, the <italic>CSA</italic>@1.20 V, the <italic>fast_CLA</italic>@0.92 V and the <italic>CLSA</italic>@0.87 V under the 1500 fJ energy constraint. The <italic>CLA</italic> circuit achieves a mean delay value 27% and 47% better than that of the <italic>fast_CLA</italic> and the <italic>CLSA</italic> circuits, respectively. Moreover, the <italic>CLA</italic> architecture has the lowest delay variability and consumes about 15% less energy per operation with respect to its counterparts.</p>
<p>The results discussed above clearly demonstrate that, for a given energy constraint, low-complexity adder architectures supplied at a suitably high VDD can achieve better timing characteristics and reduced delay sensitivity to random PVs with respect to complex adders operating at lower power supply voltages.</p></sec>
<sec sec-type="conclusions">
<label>5.</label>
<title>Conclusions</title>
<p>In this paper, the influence of intra-die random PVs was analyzed considering five well known adder circuits, designed exploiting the ST 45 nm static CMOS standard cell library. As a first step of our analysis, the speed uncertainty due to PVs was evaluated for different power supply voltages. It was shown that the impact of intra-die PVs on timing yield strongly depends on the considered logic architecture and the chosen power supply voltage. For a given VDD, slower adder circuits present reduced delay variability due to the averaging effect of longer critical paths. In the second part of this work, the sensitivity to process variations was comparatively analyzed for low-energy slow and high-energy fast adder architectures. As the main result, it was demonstrated that, for an equal energy budget, low-complexity circuits operating at higher VDDs can be significantly faster and less delay sensitive to random PVs than high-complexity adders operating at lower power supply voltages. This suggests some criteria for suitably choosing the VDD and logic architecture when designing energy aware, high yield adders: for a given energy constraint, it is preferable to use lower complexity adders supplied at an appropriately high VDD.</p></sec></body>
<back>
<sec sec-type="display-objects">
<title>Figures and Table</title>
<fig id="f1-jlpea-01-00097" position="float">
<label>Figure 1.</label>
<caption>
<p>Delay characteristics.</p></caption>
<graphic xlink:href="jlpea-01-00097f1.gif"/></fig>
<fig id="f2-jlpea-01-00097" position="float">
<label>Figure 2.</label>
<caption>
<p>Energy characteristics.</p></caption>
<graphic xlink:href="jlpea-01-00097f2.gif"/></fig>
<fig id="f3-jlpea-01-00097" position="float">
<label>Figure 3.</label>
<caption>
<p>Leakage current.</p></caption>
<graphic xlink:href="jlpea-01-00097f3.gif"/></fig>
<fig id="f4-jlpea-01-00097" position="float">
<label>Figure 4.</label>
<caption>
<p>Energy-Delay characteristics.</p></caption>
<graphic xlink:href="jlpea-01-00097f4.gif"/></fig>
<fig id="f5-jlpea-01-00097" position="float">
<label>Figure 5.</label>
<caption>
<p>Energy-Delay characteristics in the 350–800 fJ energy range.</p></caption>
<graphic xlink:href="jlpea-01-00097f5.gif"/></fig>
<fig id="f6-jlpea-01-00097" position="float">
<label>Figure 6.</label>
<caption>
<p>Delay variability.</p></caption>
<graphic xlink:href="jlpea-01-00097f6.gif"/></fig>
<fig id="f7-jlpea-01-00097" position="float">
<label>Figure 7.</label>
<caption>
<p>3σ Delay characteristics.</p></caption>
<graphic xlink:href="jlpea-01-00097f7.gif"/></fig>
<fig id="f8-jlpea-01-00097" position="float">
<label>Figure 8.</label>
<caption>
<p>Energy-3σ Delay characteristics.</p></caption>
<graphic xlink:href="jlpea-01-00097f8.gif"/></fig>
<fig id="f9-jlpea-01-00097" position="float">
<label>Figure 9.</label>
<caption>
<p>Energy-3σ Delay characteristics in the 350–800 fJ energy range.</p></caption>
<graphic xlink:href="jlpea-01-00097f9.gif"/></fig>
<fig id="f10-jlpea-01-00097" position="float">
<label>Figure 10.</label>
<caption>
<p>Energy-3σ Delay characteristics in the 550–1650 fJ energy range.</p></caption>
<graphic xlink:href="jlpea-01-00097f10.gif"/></fig>
<fig id="f11-jlpea-01-00097" position="float">
<label>Figure 11.</label>
<caption>
<p>Delay Probability Density Function (PDF) for <italic>E</italic><sub>op</sub> = 500 fJ.</p></caption>
<graphic xlink:href="jlpea-01-00097f11.gif"/></fig>
<fig id="f12-jlpea-01-00097" position="float">
<label>Figure 12.</label>
<caption>
<p>Delay Probability Density Function (PDF) for <italic>E</italic><sub>op</sub> = 1000 fJ.</p></caption>
<graphic xlink:href="jlpea-01-00097f12.gif"/></fig>
<fig id="f13-jlpea-01-00097" position="float">
<label>Figure 13.</label>
<caption>
<p>Delay Probability Density Function (PDF) for <italic>E</italic><sub>op</sub> = 1500 fJ.</p></caption>
<graphic xlink:href="jlpea-01-00097f13.gif"/></fig>
<table-wrap id="t1-jlpea-01-00097" position="float">
<label>Table 1.</label>
<caption>
<p>Asymptotic time and area requirements of <italic>n</italic>-bit adders.</p></caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="center" valign="top"><bold>Adder Type</bold></th>
<th align="center" valign="top"><bold>Description</bold></th>
<th align="center" valign="top"><bold>Area</bold></th>
<th align="center" valign="top"><bold>Delay</bold></th></tr></thead>
<tbody>
<tr>
<td align="center" valign="top"><bold><italic>Ripple</italic></bold></td>
<td align="center" valign="top">ripple carry adder</td>
<td align="center" valign="top">O(<italic>n</italic>)</td>
<td align="center" valign="top">O(<italic>n</italic>)</td></tr>
<tr>
<td align="center" valign="top"><bold><italic>Fast</italic>_<italic>CLA</italic></bold></td>
<td align="center" valign="top">fast carry look-ahead adder</td>
<td align="center" valign="top">O(<italic>n</italic>log<sub>2</sub> <italic>n</italic>)</td>
<td align="center" valign="top">O(log<sub>2</sub> <italic>n</italic>)</td></tr>
<tr>
<td align="center" valign="top"><bold><italic>CLA</italic></bold></td>
<td align="center" valign="top">carry look-ahead adder</td>
<td align="center" valign="top">O(<italic>n</italic>)</td>
<td align="center" valign="top">O(log<sub>2</sub> <italic>n</italic>)</td></tr>
<tr>
<td align="center" valign="top"><bold><italic>CSA</italic></bold></td>
<td align="center" valign="top">carry select adder</td>
<td align="center" valign="top">O(<italic>n</italic>)</td>
<td align="center" valign="top">O(√<italic>n</italic>)</td></tr>
<tr>
<td align="center" valign="middle"><bold><italic>CLSA</italic></bold></td>
<td align="center" valign="middle">carry look-ahead/select adder</td>
<td align="center" valign="top">Variable<break/>(<italic>Ripple</italic> ≥ <italic>Fast</italic>_<italic>CLA</italic>)</td>
<td align="center" valign="top">Variable<break/>(<italic>Ripple</italic> ≥ <italic>Fast</italic>_<italic>CLA</italic>)</td></tr></tbody></table></table-wrap></sec>
<ref-list>
<title>References</title>
<ref id="b1-jlpea-01-00097"><label>1.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Wong</surname><given-names>B.</given-names></name><name><surname>Zach</surname><given-names>F.</given-names></name><name><surname>Moroz</surname><given-names>V.</given-names></name><name><surname>Mittal</surname><given-names>A.</given-names></name><name><surname>Starr</surname><given-names>G.</given-names></name><name><surname>Kahng</surname><given-names>A.</given-names></name></person-group><source>Nano-CMOS Design for Manufacturability</source><publisher-name>John Wiley &amp; Sons</publisher-name><publisher-loc>Hoboken, NJ, USA</publisher-loc><year>2009</year></citation></ref>
<ref id="b2-jlpea-01-00097"><label>2.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sylvester</surname><given-names>D.</given-names></name><name><surname>Agarwal</surname><given-names>K.</given-names></name><name><surname>Shah</surname><given-names>S.</given-names></name></person-group><article-title>Variability in nanometer CMOS: Impact, analysis, and minimization</article-title><source>Integration</source><year>2008</year><volume>41</volume><fpage>319</fpage><lpage>339</lpage></citation></ref>
<ref id="b3-jlpea-01-00097"><label>3.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Borkar</surname><given-names>S.</given-names></name><name><surname>Karnik</surname><given-names>T.</given-names></name><name><surname>Narendra</surname><given-names>S.</given-names></name><name><surname>Tschanz</surname><given-names>J.</given-names></name><name><surname>Keshavarzi</surname><given-names>A.</given-names></name><name><surname>De</surname><given-names>V.</given-names></name></person-group><article-title>Parameter variations and impact on circuits and microarchitecture</article-title><conf-name>Proceedings of the 40th Conference on Design Automation</conf-name><conf-loc>Anaheim, CA, USA</conf-loc><month>June</month><year>2003</year></citation></ref>
<ref id="b4-jlpea-01-00097"><label>4.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname><given-names>T.</given-names></name><name><surname>Naffziger</surname><given-names>S.</given-names></name></person-group><article-title>Comparison of Adaptive Body Bias (ABB) and Adaptive Supply Voltage (ASV) for Improving Delay and Leakage under the Presence of Process Variation</article-title><source>IEEE Trans. Very Large Scale Integr. VLSI Syst.</source><year>2003</year><volume>11</volume><fpage>888</fpage><lpage>899</lpage><pub-id pub-id-type="doi">10.1109/TVLSI.2003.817120</pub-id></citation></ref>
<ref id="b5-jlpea-01-00097"><label>5.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tschanz</surname><given-names>J.</given-names></name><name><surname>Kao</surname><given-names>J.</given-names></name><name><surname>Narendra</surname><given-names>S.</given-names></name><name><surname>Nair</surname><given-names>R.</given-names></name><name><surname>Antoniadis</surname><given-names>D.</given-names></name><name><surname>Chandrakasan</surname><given-names>A.</given-names></name><name><surname>De</surname><given-names>V.</given-names></name></person-group><article-title>Adaptive body bias for reducing impacts of die-to-die and within-die parameter variations on microprocessor frequency and leakage</article-title><source>IEEE J. Solid-State Circuits</source><year>2002</year><volume>37</volume><fpage>422</fpage><lpage>423</lpage></citation></ref>
<ref id="b6-jlpea-01-00097"><label>6.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wei</surname><given-names>G.-Y.</given-names></name><name><surname>Horowitz</surname><given-names>M.</given-names></name></person-group><article-title>A fully digital, energy-efficient, adaptive power-supply regulator</article-title><source>IEEE J. Solid-State Circuits</source><year>1999</year><volume>34</volume><fpage>520</fpage><lpage>528</lpage><pub-id pub-id-type="doi">10.1109/4.753685</pub-id></citation></ref>
<ref id="b7-jlpea-01-00097"><label>7.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname><given-names>J.</given-names></name><name><surname>Horowitz</surname><given-names>M.</given-names></name></person-group><article-title>An efficient digital sliding controller for adaptive power supply regulation</article-title><source>IEEE J. Solid-State Circuits</source><year>2002</year><volume>37</volume><fpage>639</fpage><lpage>647</lpage><pub-id pub-id-type="doi">10.1109/4.997858</pub-id></citation></ref>
<ref id="b8-jlpea-01-00097"><label>8.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Rabaey</surname><given-names>J.</given-names></name><name><surname>Chandrakasan</surname><given-names>A.</given-names></name><name><surname>Nikolic</surname><given-names>B.</given-names></name></person-group><source>Digital Integrated Circuits: A Design Perspective</source><publisher-name>Prentice Hall</publisher-name><publisher-loc>Englewood Cliffs, NJ, USA</publisher-loc><year>2003</year></citation></ref>
<ref id="b9-jlpea-01-00097"><label>9.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nagendra</surname><given-names>C.</given-names></name><name><surname>Irwin</surname><given-names>M.</given-names></name><name><surname>Owens</surname><given-names>R.</given-names></name></person-group><article-title>Area-time-power tradeoffs in parallel adders</article-title><source>IEEE Trans. Circuits Syst. II</source><year>1996</year><volume>43</volume><fpage>689</fpage><lpage>702</lpage><pub-id pub-id-type="doi">10.1109/82.539001</pub-id></citation></ref>
<ref id="b10-jlpea-01-00097"><label>10.</label><citation citation-type="book"><person-group person-group-type="author"><name><surname>Parhami</surname><given-names>B.</given-names></name></person-group><source>Computer Arithmetic: Algorithms and Hardware Designs</source><publisher-name>Oxford University Press</publisher-name><publisher-loc>New York, NY, USA</publisher-loc><year>2000</year></citation></ref>
<ref id="b11-jlpea-01-00097"><label>11.</label><citation citation-type="web"><person-group person-group-type="author"><collab>Synopsys Documentation</collab></person-group><comment>Available online: <ext-link xlink:href="http://www.synopsys.com/home.aspx" ext-link-type="uri">http://www.synopsys.com/home.aspx</ext-link> (accessed on 28 March 2011)</comment></citation></ref>
<ref id="b12-jlpea-01-00097"><label>12.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Srivastava</surname><given-names>A.</given-names></name><name><surname>Kachru</surname><given-names>T.</given-names></name><name><surname>Sylvester</surname><given-names>D.</given-names></name></person-group><article-title>Low-Power-Design Space Exploration Considering Process Variation Using Robust Optimization</article-title><source>IEEE Trans. Comput. Aided Des. Integr. Circuits Syst.</source><year>2007</year><volume>26</volume><fpage>67</fpage><lpage>79</lpage><pub-id pub-id-type="doi">10.1109/TCAD.2006.882491</pub-id></citation></ref></ref-list></back></article>
