J. Low Power Electron. Appl.2015, 5(2), 101-115; doi:10.3390/jlpea5020101 (registering DOI) - published 21 May 2015 Show/Hide Abstract
Abstract: In this paper, we analyze the variability of III-V homojunction tunnel FET (TFET) and FinFET devices and 32-bit carry-lookahead adder (CLA) circuit operating in near-threshold region. The impacts of the most severe intrinsic device variations including work function variation (WFV) and fin line-edge roughness (fin LER) on TFET and FinFET device Ion, Ioff, Cg, 32-bit CLA delay and power-delay product (PDP) are investigated and compared using 3D atomistic TCAD mixed-mode Monte-Carlo simulations and HSPICE simulations with look-up table based Verilog-A models calibrated with TCAD simulation results. The results indicate that WFV and fin LER have different impacts on device Ion and Ioff. Besides, at low operating voltage (<0.3 V), the CLA circuit delay and power-delay product (PDP) of TFET are significantly better than FinFET due to its better Ion and Cg,ave and their smaller variability. However, the leakage power of TFET CLA is larger than FinFET CLA due to the worse Ioff variability of TFET devices.
J. Low Power Electron. Appl.2015, 5(2), 81-100; doi:10.3390/jlpea5020081 - published 18 May 2015 Show/Hide Abstract
Abstract: This paper develops an ultra-low power asynchronous circuit design methodology, called Multi-Threshold NULL Convention Logic (MTNCL), also known as Sleep Convention Logic (SCL), which combines Multi-Threshold CMOS (MTCMOS) with NULL Convention Logic (NCL), to yield significant power reduction without any of the drawbacks of applying MTCMOS to synchronous circuits. In contrast to other power reduction techniques that usually result in large area overhead, MTNCL circuits are actually smaller than their original NCL versions. MTNCL utilizes high-Vttransistors to gate power and ground of a low-Vtlogic block to provide for both fast switching and very low leakage power when idle. To demonstrate the advantages of MTNCL, a number of 32-bit IEEE single-precision floating-point co-processors were designed for comparison using the 1.2 V IBM 8RF-LM 130 nm CMOS process: original NCL, MTNCL with just combinational logic (C/L) slept, Bit-Wise MTNCL (BWMTNCL), MTNCL with C/L and completion logic slept, MTNCL with C/L, completion logic, and registers slept, MTNCL with Safe Sleep architecture, and synchronous MTCMOS. These designs are compared in terms of throughput, area, dynamic energy, and idle power, showing the tradeoffs between the various MTNCL architectures, and that the best MTNCL design is much better than the original NCL design in all aspects, and much better than the synchronous MTCMOS design in terms of area, energy per operation, and idle power, although the synchronous design can operate faster.
J. Low Power Electron. Appl.2015, 5(2), 69-80; doi:10.3390/jlpea5020069 - published 29 April 2015 Show/Hide Abstract
Abstract: This work presents an analysis about the influence of the gate and source/drain underlap length (LUL) on UTBB FDSOI (UltraThin-Body-and-Buried-oxide Fully-Depleted-Silicon-On-Insulator) devices operating in conventional (VB = 0 V), dynamic threshold (DT, VB = VG), and the enhanced DT (eDT, VB = kVG) configurations, focusing on low power applications. It is shown that the underlap devices present a lower off-state current (IOFF at VG = 0 V), lower subthreshold swing (S), lower gate-induced drain leakage (GIDL), higher transconductance over drain current (gm/ID) ratio and higher intrinsic voltage gain (|AV|) due to their longer effective channel length in weak inversion and lower lateral electric field, while the eDT mode presents higher on-state current (ION) with the same IOFF, lower S, higher maximum transconductance (gmmax), lower threshold voltage (VT), higher gm/ID ratio and higher |AV| due to the dynamically reduced threshold voltage and stronger transversal electric field.
J. Low Power Electron. Appl.2015, 5(2), 57-68; doi:10.3390/jlpea5020057 - published 17 April 2015 Show/Hide Abstract
Abstract: To minimize energy consumption of a digital circuit, logic can be operated at sub- or near-threshold voltage. Operation at this region is challenging due to device and environment variations, and resulting performance may not be adequate to all applications. This article presents two variants of a 32-bit RISC CPU targeted for near-threshold voltage. Both CPUs are placed on the same die and manufactured in 28 nm CMOS process. They employ timing-error prevention with clock stretching to enable operation with minimal safety margins while maximizing performance and energy efficiency at a given operating point. Measurements show minimum energy of 3.15 pJ/cyc at 400 mV, which corresponds to 39% energy saving compared to operation based on static signoff timing.
J. Low Power Electron. Appl.2015, 5(2), 38-56; doi:10.3390/jlpea5020038 - published 27 March 2015 Show/Hide Abstract
Abstract: Modern systems-on-chip (SoCs) today contain hundreds of cores, and this number is predicted to reach the thousands by the year 2020. As the number of communicating elements increases, there is a need for an efficient, scalable and reliable communication infrastructure. As technology geometries shrink to the deep submicron regime, however, the communication delay and power consumption of global interconnections become the major bottleneck. The network-on-chip (NoC) design paradigm, based on a modular packet-switched mechanism, can address many of the on-chip communication issues, such as the performance limitations of long interconnects and integration of large number of cores on a chip. Recently, new communication technologies based on the NoC concept have emerged with the aim of improving the scalability limitations of conventional NoC-based architectures. Among them, wireless NoCs (WiNoCs) use the radio medium for reducing the performance and energy penalties of long-range and multi-hop communications. As the radio medium can be accessed by a single transmitter at a time, a radio access control mechanism (RACM) is needed. In this paper, we present a novel RACM, which allows one to improve both the performance and energy figures of the WiNoC. Experiments, carried out on both synthetic and real traffic scenarios, have shown the effectiveness of the proposed RACM. On average, a 30% reduction in communication delay and a 25% energy savings have been observed when the proposed RACM is applied to a known WiNoC architecture.
J. Low Power Electron. Appl.2015, 5(1), 3-37; doi:10.3390/jlpea5010003 - published 13 March 2015 Show/Hide Abstract
Abstract: This paper considers the problem of how to efficiently measure a large and complex information field with optimally few observations. Specifically, we investigate how to stochastically estimate modular criticality values in a large-scale digital circuit with a very limited number of measurements in order to minimize the total measurement efforts and time. We prove that, through sparsity-promoting transform domain regularization and by strategically integrating compressive sensing with Bayesian learning, more than 98% of the overall measurement accuracy can be achieved with fewer than 10% of measurements as required in a conventional approach that uses exhaustive measurements. Furthermore, we illustrate that the obtained criticality results can be utilized to selectively fortify large-scale digital circuits for operation with narrow voltage headrooms and in the presence of soft-errors rising at near threshold voltage levels, without excessive hardware overheads. Our numerical simulation results have shown that, by optimally allocating only 10% circuit redundancy, for some large-scale benchmark circuits, we can achieve more than a three-times reduction in its overall error probability, whereas if randomly distributing such 10% hardware resource, less than 2% improvements in the target circuit’s overall robustness will be observed. Finally, we conjecture that our proposed approach can be readily applied to estimate other essential properties of digital circuits that are critical to designing and analyzing them, such as the observability measure in reliability analysis and the path delay estimation in stochastic timing analysis. The only key requirement of our proposed methodology is that these global information fields exhibit a certain degree of smoothness, which is universally true for almost any physical phenomenon.