Open Access Article
Evaluating the Impact of Max Transition Constraint Variations on Power Reduction Capabilities in Cell-Based Designs
J. Low Power Electron. Appl. 2017, 7(4), 25; doi:10.3390/jlpea7040025
Abstract
Power optimization is a very important and challenging step in the physical design flow, and it is a critical success factor of an application-specific integrated circuit (ASIC) chip. Many techniques are used by the place and route (P&R) electronic design automation (EDA) tools to meet the power requirement. In this paper, we evaluate, independently of the library file, the impact of redefining the max transition constraint (MTC) before the power optimization phase, and we study the impact of over-constraining or under-constraining a design on power in order to find the best trade-off between design constraining and power optimization. Experimental results show that power optimization depends on the applied MTC and that the MTC value corresponding to the best power reduction is different from the default MTC. By using a new MTC definition method on several designs, we found that the power gain between the default methodology and the new one reaches 2.34%.
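The evaluation amounts to sweeping candidate MTC values, rerunning power optimization for each, and keeping the value that minimizes total power. The sketch below illustrates only that idea; the helpers apply_mtc and run_power_opt are hypothetical stand-ins for the P&R tool flow, not functions from the paper.

```python
# Hypothetical sketch of the MTC sweep described above: try several max
# transition constraints, run power optimization for each, and keep the one
# that yields the lowest optimized power. apply_mtc() and run_power_opt() are
# assumed stand-ins for calls into a P&R tool flow.

def find_best_mtc(design, candidate_mtcs_ns, apply_mtc, run_power_opt):
    """Return (best_mtc, best_power) over the candidate MTC values (in ns)."""
    best_mtc, best_power = None, float("inf")
    for mtc in candidate_mtcs_ns:
        apply_mtc(design, mtc)           # redefine the MTC before power optimization
        power = run_power_opt(design)    # total power reported after optimization
        if power < best_power:
            best_mtc, best_power = mtc, power
    return best_mtc, best_power
```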

Open Access Article
Ultra-Low Power, Process-Tolerant 10T (PT10T) SRAM with Improved Read/Write Ability for Internet of Things (IoT) Applications
J. Low Power Electron. Appl. 2017, 7(3), 24; doi:10.3390/jlpea7030024
Abstract
In this paper, an ultra-low power (ULP) 10T static random access memory (SRAM) is presented for Internet of Things (IoT) applications, which operates at sub-threshold voltage. The proposed SRAM can operate at low supply voltages with high static and dynamic noise margins. IoT applications require battery-enabled, low-leakage memory architectures in the subthreshold regime. Therefore, to improve leakage power consumption and provide better cell stability, a power-gated robust 10T SRAM is presented in this paper. The proposed cell uses a power-gated p-MOS transistor to reduce the leakage (static) power in standby mode. Moreover, due to the stacking of n-MOS transistors in the 10T SRAM latch and the separation of the read path from the latch, the static and dynamic noise margins in read and write operations show significant tolerance to variations in device process, voltage, and temperature (PVT). The proposed SRAM shows significantly improved performance in terms of leakage power, read static noise margin (RSNM), write static noise margin (WSNM), write ability or write trip point (WTP), read–write energy, and dynamic read margin (DRM). Furthermore, these parameters of the proposed cell are observed for an 8-kilobit (Kb) SRAM and compared with existing SRAM architectures. From the Monte Carlo simulation results, it is observed that the leakage power of the proposed low-threshold-voltage (LVT) 10T SRAM is reduced by 98.76%, 98.6%, 6.7%, and 98.2% as compared to the LVT C6T, RD8T, LP9T, and ST10T SRAMs, respectively, at 0.3 V VDD. Additionally, in the proposed 10T SRAM, parameters such as RSNM, WSNM, WTP, and DRM are improved by 3×, 2×, 1.11×, and 1.32×, respectively, as compared to C6T SRAM. Similarly, the proposed 10T SRAM shows improvements of 1.48×, 1.25×, and 1.1× in RSNM, WSNM, and WTP, respectively, as compared to RD8T SRAM at 0.3 V VDD.

Open Access Article
DESTINY: A Comprehensive Tool with 3D and Multi-Level Cell Memory Modeling Capability
J. Low Power Electron. Appl. 2017, 7(3), 23; doi:10.3390/jlpea7030023
Abstract
To enable the design of large capacity memory structures, novel memory technologies such as non-volatile memory (NVM) and novel fabrication approaches, e.g., 3D stacking and multi-level cell (MLC) design, have been explored. The existing modeling tools, however, cover only a few memory technologies, technology nodes and fabrication approaches. We present DESTINY, a tool for modeling 2D/3D memories designed using SRAM, resistive RAM (ReRAM), spin transfer torque RAM (STT-RAM), phase change RAM (PCM) and embedded DRAM (eDRAM), as well as 2D memories designed using spin orbit torque RAM (SOT-RAM), domain wall memory (DWM) and Flash memory. In addition to single-level cell (SLC) designs for all of these memories, DESTINY also supports modeling MLC designs for NVMs. We have extensively validated DESTINY against commercial and research prototypes of these memories. DESTINY is very useful for performing design-space exploration across several dimensions, such as optimizing for a target (e.g., latency, area or energy-delay product) for a given memory technology, or choosing the most suitable memory technology or fabrication method (i.e., 2D vs. 3D) for a given optimization target. We believe that DESTINY will boost studies of next-generation memory architectures used in systems ranging from mobile devices to extreme-scale supercomputers. The latest source code of DESTINY is available from the following git repository: https://bitbucket.org/sparshmittal/destinyv2.

Open Access Article
Review and Comparison of Clock Jitter Noise Reduction Techniques for Lowpass Continuous-Time Delta-Sigma Modulators
J. Low Power Electron. Appl. 2017, 7(3), 22; doi:10.3390/jlpea7030022
Abstract
It is well known that continuous-time Delta-Sigma modulators are very sensitive to clock jitter effects. In the literature, a number of techniques have been proposed to cope with them. In this brief, we present a detailed review and comparison of the reported techniques. While the effectiveness in reducing clock jitter effects is of most importance in this comparison, we also consider other performance metrics, such as the circuit complexity and overhead needed to implement the technique, the power consumption overhead of the technique, the synthesis complexity incurred in system-level design, the extensibility of the technique from single-bit to multi-bit operation, and the robustness to process variation. When clock jitter is relatively large, the fixed-width pulse feedback technique is the most effective at reducing clock jitter effects among all techniques at high sampling frequencies, while the switched-capacitor-resistor and switched-shaped current techniques perform best at medium frequencies or below.
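As a point of reference for how strongly jitter limits resolution (a standard textbook bound on sampling a sine wave of frequency f_in with RMS clock jitter sigma_t, not a result from this brief):

```latex
\mathrm{SNR}_{\mathrm{jitter}} \approx -20\,\log_{10}\!\left( 2\pi f_{\mathrm{in}}\,\sigma_{t} \right)\ \text{dB}
```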

Open Access Article
Models and Techniques for Temperature Robust Systems on a Reconfigurable Platform
J. Low Power Electron. Appl. 2017, 7(3), 21; doi:10.3390/jlpea7030021
Abstract
This paper investigates the variability of various circuits and systems over temperature and presents several methods to improve their performance over temperature. The work demonstrates the use of a large-scale reconfigurable System-on-Chip (SoC) for reducing the variability of circuits and systems compiled on a Floating Gate (FG) based Field Programmable Analog Array (FPAA). Temperature dependencies of circuits are modeled using an open-source simulator built in the Scilab/Xcos environment, and the results are compared with measurement data obtained from the FPAA. This comparison gives further insight into the temperature dependence of various circuits and signal processing systems and allows us to compensate for, as well as predict, their behavior. The work also presents several different current and voltage references that can help reduce the variability caused by changes in temperature. These references are standard blocks in the Scilab/Xcos environment that can be easily compiled on the FPAA. An FG-based current reference is then used for biasing a 12×1 Vector Matrix Multiplication (VMM) circuit and a second-order Gm-C bandpass filter to demonstrate the compilation and usage of these voltage/current references in a reconfigurable fabric. The large-scale FG FPAA presented here is fabricated in a 350 nm CMOS process.
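For context on the bandpass example (a standard relation for a two-integrator-loop Gm-C biquad, assumed here rather than taken from the paper), the center frequency is set by the transconductances and capacitors, which is where the FG current reference enters, since each g_m tracks its bias current:

```latex
\omega_{0} = \sqrt{\frac{g_{m1}\,g_{m2}}{C_{1}\,C_{2}}}, \qquad f_{0} = \frac{\omega_{0}}{2\pi}
```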

Open Access Article
Ultra-Low Power Consuming Direct Radiation Sensors Based on Floating Gate Structures
J. Low Power Electron. Appl. 2017, 7(3), 20; doi:10.3390/jlpea7030020
Abstract
In this paper, we report on ultra-low power consuming single-poly floating gate direct radiation sensors. The developed devices are intended for total ionizing dose (TID) measurements and are fabricated in a standard CMOS process flow. The sensor design and operation are discussed in detail. Original array sensors were proposed and fabricated, allowing high statistical significance of the radiation measurements as well as radiation imaging functions. Single sensors and array sensors were analyzed in combination with specially developed test structures. This allowed insight into the physics of sensor operation and the exclusion of phenomena related to material degradation under irradiation when interpreting the measurement results. The response of the developed sensors to various sources of ionizing radiation (gamma, X-ray, UV, energetic ions) was investigated. An optimal sensor design for implementation in dosimetry systems is proposed, and a roadmap for future improvement of sensor performance is suggested.

Open Access Article
Characterization of an ISFET with Built-in Calibration Registers through Segmented Eight-Bit Binary Search in Three-Point Algorithm Using FPGA
J. Low Power Electron. Appl. 2017, 7(3), 19; doi:10.3390/jlpea7030019
Abstract
Sensors play the most important role in observing changes in the environment of which they are a part. They detect even the smallest changes and send the information to other electronic devices. Making sure that these sensors provide an accurate output is equally crucial, as the data they measure and collect are used for analysis. Until now, calibrating sensors has been done manually by following a sequence of procedures, and is usually performed on-site or in a laboratory prior to deployment. To eliminate the manual procedure in the calibration (at the very least), an ion-sensitive field-effect transistor (ISFET) with a built-in calibration registers circuit was created through a segmented eight-bit binary search in a three-point algorithm using a field-programmable gate array (FPGA). The circuit was created using a three-point calibration algorithm and three standard buffers (pH 4, pH 7, and pH 10). The block diagram, schematic diagram, and the number of logic gates were derived after synthesizing the Verilog program in Xilinx/FPGA. An average error of 0.30% was computed to prove the reliability of the created circuit using the FPGA. Having an ISFET with built-in calibration registers will alleviate the work of experts in performing calibrations. It also follows the plug-and-play standard, making it a calibration-ready ISFET device. With this feature, it could be used as a pH level meter or a remote sensor node in several applications.
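The sketch below illustrates the core of an eight-bit binary search per calibration point: converge on the register code whose readout best matches the reference reading for a standard buffer. It is a software illustration of the concept only, not the paper's Verilog design; read_output is a hypothetical readout function, and a monotonically increasing readout versus code is assumed.

```python
# Illustrative eight-bit binary search for one calibration point. read_output()
# is a hypothetical stand-in for the ISFET readout at a given register code,
# assumed to increase monotonically with the code.

def calibrate_point(target, read_output):
    """Return the 8-bit code whose readout is closest to the target reading."""
    lo, hi = 0, 255
    best_code, best_err = 0, float("inf")
    while lo <= hi:
        code = (lo + hi) // 2
        err = read_output(code) - target
        if abs(err) < best_err:
            best_code, best_err = code, abs(err)
        if err < 0:
            lo = code + 1        # readout too low: raise the code
        elif err > 0:
            hi = code - 1        # readout too high: lower the code
        else:
            break                # exact match
    return best_code

# A three-point calibration repeats this search for the pH 4, pH 7 and pH 10 buffers.
```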

Open Access Editorial
A Summary of the Special Issue “Emerging Network-on-Chip Architectures for Low Power Embedded Systems”
J. Low Power Electron. Appl. 2017, 7(3), 18; doi:10.3390/jlpea7030018
Open Access Article
Starting Framework for Analog Numerical Analysis for Energy-Efficient Computing
J. Low Power Electron. Appl. 2017, 7(3), 17; doi:10.3390/jlpea7030017
Abstract
The focus of this work is to develop a starting framework for analog numerical analysis and related algorithm questions. Digital computation is enabled by a framework developed over the last 80 years. Having an analog framework enables wider capability while giving the designer tools to make reasonable choices. Analog numerical analysis concerns computation on physical structures utilizing the real-valued representations of that physical system. This work starts the conversation on analog numerical analysis, including exploring the relevance of and need for this framework. A complexity framework based on computational strengths and weaknesses builds on an analysis of analog and digital numerical precision, as well as of analog and digital error propagation due to computation. The complementary analog and digital computational techniques enable wider computational capabilities.

Open Access Article
Flexible, Scalable and Energy Efficient Bio-Signals Processing on the PULP Platform: A Case Study on Seizure Detection
J. Low Power Electron. Appl. 2017, 7(2), 16; doi:10.3390/jlpea7020016
Abstract
Ultra-low power operation and extreme energy efficiency are strong requirements for a number of high-growth application areas requiring near-sensor processing, including the elaboration of biosignals. Parallel near-threshold computing is emerging as an approach to achieve significant improvements in energy efficiency while overcoming the performance degradation typical of low-voltage operation. In this paper, we demonstrate the capabilities of the PULP (Parallel Ultra-Low Power) platform on an algorithm for seizure detection, representative of a wide range of EEG signal processing applications. Starting from the 28-nm FD-SOI (Fully Depleted Silicon On Insulator) technology implementation of the third embodiment of the PULP architecture, we analyze the energy-efficient implementation of the seizure detection algorithm on PULP. The proposed parallel implementation exploits the dynamic voltage and frequency scaling capabilities, as well as the embedded power knobs of the PULP platform, reducing the energy consumption of a seizure detection by up to 10× with respect to a sequential implementation at the nominal supply voltage and by 4.2× with respect to a sequential implementation with voltage scaling. Moreover, we analyze the trans-precision optimization of the algorithm on PULP by means of a hybrid fixed- and floating-point implementation. This approach reduces the energy consumption by up to 43% with respect to the plain fixed-point and floating-point implementations, leveraging the precision requirements of the kernels composing the processing chain to improve energy efficiency. Thanks to the proposed architecture and system-level approach for optimization, we demonstrate that PULP reduces energy consumption by up to 140× with respect to commercial low-power microcontrollers, being able to satisfy the real-time constraints typical of bio-medical applications, breaking the barrier of microwatts for a 50-ms complete seizure detection and a few milliwatts for a 5-ms detection latency on a fully programmable architecture.

Open Access Article
Predictive Direct Torque Control Application-Specific Integrated Circuit of an Induction Motor Drive with a Fuzzy Controller
J. Low Power Electron. Appl. 2017, 7(2), 15; doi:10.3390/jlpea7020015
Abstract
This paper proposes a modified predictive direct torque control (PDTC) application-specific integrated circuit (ASIC) for a motor drive with a fuzzy controller, which eliminates the sampling and calculation delay times of hysteresis controllers. These delay times degrade the control quality and increase both the torque and flux ripples in a motor drive. The proposed fuzzy PDTC ASIC calculates the stator's magnetic flux and torque by detecting the three-phase current, three-phase voltage, and rotor speed, and eliminates the ripples in the torque and flux by using a fuzzy controller and a predictive scheme. The Verilog hardware description language was used to implement the hardware architecture, and the ASIC was fabricated by the Taiwan Semiconductor Manufacturing Company in a 0.18-μm 1P6M CMOS process using a cell-based design method. The measurements revealed that the proposed fuzzy PDTC ASIC for the three-phase induction motor yielded a test coverage of 96.03%, fault coverage of 95.06%, chip area of 1.81 × 1.81 mm², and power consumption of 296 mW, at an operating frequency of 50 MHz and a supply voltage of 1.8 V.
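For background on the flux and torque estimation step, direct torque control schemes typically reconstruct the stator flux and the electromagnetic torque from the measured voltages and currents using the standard stationary-frame relations below (textbook forms assumed here, not equations quoted from the paper; R_s is the stator resistance and p the number of pole pairs):

```latex
\psi_{s\alpha} = \int \left( v_{s\alpha} - R_{s}\,i_{s\alpha} \right) dt, \qquad
\psi_{s\beta}  = \int \left( v_{s\beta}  - R_{s}\,i_{s\beta}  \right) dt, \qquad
T_{e} = \tfrac{3}{2}\,p\left( \psi_{s\alpha}\,i_{s\beta} - \psi_{s\beta}\,i_{s\alpha} \right)
```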

Open Access Article
Architectural Techniques for Improving the Power Consumption of NoC-Based CMPs: A Case Study of Cache and Network Layer
J. Low Power Electron. Appl. 2017, 7(2), 14; doi:10.3390/jlpea7020014
Abstract
The disparity between memory and CPU has been ameliorated by the introduction of Network-on-Chip-based Chip-Multiprocessors (NoC-based CMPs). However, power consumption continues to be an aggressive stumbling block halting the progress of technology. Miniaturized transistors enable many-core integration at the cost of the high power consumption caused by the components in NoC-based CMPs, particularly caches and routers. If NoC-based CMPs are to be standardized as the future of technology design, it is imperative that the power demands of their components are optimized. Much research effort has been put into finding techniques that can improve the power efficiency of both cache and router architectures. This work presents a survey of power-saving techniques for efficient NoC designs with a focus on the cache and router components, such as the buffer and crossbar. Ultimately, the aim of this work is to compile a quick reference guide of power-saving techniques for engineers and researchers.

Open Access Article
Global Adaptation Controlled by an Interactive Consistency Protocol
J. Low Power Electron. Appl. 2017, 7(2), 13; doi:10.3390/jlpea7020013
Abstract
Static schedules can lead to an inefficient usage of a system's resources, because the system's behavior cannot be adapted at runtime. To improve the runtime system performance in current time-triggered Multi-Processor Systems on Chip (MPSoCs), a dynamic reaction to events is performed locally on the cores. The effects of this optimization can be increased by coordinating the changes globally. To perform such global changes, a consistent view of the system state is needed on which to base the adaptation decisions. This paper proposes such an interactive consistency protocol with a low impact on the system with respect to latency and overhead. We show that an energy-optimizing adaptation controlled by the protocol can enable a system to save up to 43% of its energy compared to a system without adaptation.

Open Access Article
SoC Hardware Implementation of Real-Time Video Segmentation Based on the Mixture of Gaussian Algorithm
J. Low Power Electron. Appl. 2017, 7(2), 12; doi:10.3390/jlpea7020012
Abstract
Video segmentation based on the Mixture of Gaussian (MoG) algorithm is widely used in video processing systems, and hardware implementations have been proposed in past years. Most previous work focused on high-performance custom design of the MoG algorithm to meet the real-time requirements of high-frame-rate, high-resolution video segmentation tasks. This paper focuses on the System-on-Chip (SoC) design: the priority is SoC integration of the system for flexibility and adaptability, while custom design of the original MoG algorithm is still included. To maximally retain the accuracy of the MoG algorithm for the best segmentation performance, we minimally modified the MoG algorithm for hardware implementation at the cost of hardware resources. The MoG algorithm is custom-implemented as a hardware IP (Intellectual Property), which is then integrated within an SoC platform together with other video processing components, so that some key control parameters can be configured on-line, which makes the video segmentation system well suited to different scenarios. The proposed implementation has been demonstrated and tested on a Xilinx Spartan-3A DSP Video Starter Board. Experimental results show that, under a clock frequency of 25 MHz, this design meets the real-time requirement for VGA resolution (640 × 480) at 30 fps (frames per second).
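As background on the algorithm itself, the sketch below shows a generic Stauffer-Grimson-style per-pixel MoG update under simplifying assumptions; it is not the paper's hardware-adapted version. Each pixel keeps K weighted Gaussians that are matched and updated every frame.

```python
# Generic per-pixel Mixture-of-Gaussians update (simplified sketch).
# gaussians: list of dicts {'w': weight, 'mu': mean, 'var': variance}.

import math

def mog_update(pixel, gaussians, alpha=0.01, match_sigma=2.5):
    """Update the mixture for one grayscale pixel; return True if it is background."""
    matched = None
    for g in gaussians:                              # find the first matching component
        if abs(pixel - g['mu']) <= match_sigma * math.sqrt(g['var']):
            matched = g
            break
    for g in gaussians:                              # weight update
        g['w'] = (1 - alpha) * g['w'] + (alpha if g is matched else 0.0)
    if matched is not None:                          # adapt the matched component
        rho = alpha                                  # simplified learning rate
        matched['mu'] = (1 - rho) * matched['mu'] + rho * pixel
        matched['var'] = (1 - rho) * matched['var'] + rho * (pixel - matched['mu']) ** 2
    else:                                            # replace the least probable component
        weakest = min(gaussians, key=lambda g: g['w'])
        weakest.update(mu=pixel, var=225.0, w=alpha)
    total = sum(g['w'] for g in gaussians)
    for g in gaussians:
        g['w'] /= total                              # renormalize the weights
    if matched is None:
        return False                                 # unmatched pixel: foreground
    ranked = sorted(gaussians, key=lambda g: g['w'] / math.sqrt(g['var']), reverse=True)
    return matched in ranked[:2]                     # simplified background test (top-2 components)
```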

Open Access Article
Ultra Low Energy FDSOI Asynchronous Reconfiguration Network for Adaptive Circuits
J. Low Power Electron. Appl. 2017, 7(2), 11; doi:10.3390/jlpea7020011
Abstract
This paper introduces a plug-and-play on-chip asynchronous communication network aimed at the dynamic reconfiguration of a low-power adaptive circuit such as an internet of things (IoT) system. By using a separate communication network, we can address both digital and analog blocks at a lower configuration cost, increasing the overall system power efficiency. As reconfiguration only occurs in response to specific events and the network has to be automatically in stand-by most of the time, our design is fully asynchronous and uses handshake protocols. The paper presents the circuit's architecture, performance results, and an example of the reconfiguration of frequency locked loops (FLLs) to validate our work. We obtain an energy of 0.07 pJ/bit and a latency of 1.1 ns/bit per stage in a 28 nm Fully Depleted Silicon On Insulator (FDSOI) technology at 0.6 V.

Open Access Article
A General-Purpose Graphics Processing Unit (GPGPU)-Accelerated Robotic Controller Using a Low Power Mobile Platform
J. Low Power Electron. Appl. 2017, 7(2), 10; doi:10.3390/jlpea7020010
Abstract
Robotic controllers have to execute various complex independent tasks repeatedly. Massive processing power is required by the motion controllers to compute the solutions of these computationally intensive algorithms. General-purpose graphics processing unit (GPGPU)-enabled mobile phones can be leveraged to accelerate these motion controllers. Embedded GPUs can replace several dedicated computing boards with a single powerful and less power-consuming GPU. In this paper, an inverse-kinematics-based numeric controller is proposed and realized using the GPGPU of a handheld mobile device. This work is an extension of a desktop GPU-accelerated robotic controller presented at DAS’16, where a comparative analysis of different sequential and concurrent controllers is discussed. First, the inverse kinematic algorithm is realized sequentially using an Arduino Due microcontroller, and a field-programmable gate array (FPGA) is used for its parallel implementation. The execution speeds of these controllers are compared with two different GPGPU architectures (Nvidia Quadro K2200 and Nvidia Shield K1 Tablet), programmed with the Compute Unified Device Architecture (CUDA) computing language. Experimental data show that the proposed mobile platform-based scheme outperforms the FPGA by 5× and achieves a 100× speedup over the Arduino-based sequential implementation.
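As an illustration of the kind of per-target computation such a controller parallelizes, the sketch below solves closed-form inverse kinematics for a two-link planar arm (a generic example; the paper's actual manipulator model is not specified in the abstract). On a GPGPU, the same routine would be evaluated for many target points concurrently, one thread per solution.

```python
# Closed-form inverse kinematics for a two-link planar arm (illustrative only).

import math

def ik_two_link(x, y, l1, l2, elbow_up=True):
    """Return joint angles (theta1, theta2) reaching (x, y), or None if unreachable."""
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if not -1.0 <= c2 <= 1.0:
        return None                                  # target outside the workspace
    theta2 = math.acos(c2)
    if elbow_up:
        theta2 = -theta2                             # pick one of the two elbow solutions
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2),
                                           l1 + l2 * math.cos(theta2))
    return theta1, theta2
```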

Open Access Article
High Performance Receiver Design for RX Carrier Aggregation
J. Low Power Electron. Appl. 2017, 7(2), 9; doi:10.3390/jlpea7020009
Abstract
Carrier aggregation is one of the key features for increasing the data rate given a scarce bandwidth spectrum. This paper describes the design of a high performance receiver suitable for carrier aggregation in LTE-Advanced and future 5G standards. The proposed architecture is versatile enough to support legacy mode (single carrier), inter-band carrier aggregation, and intra-band carrier aggregation. Performance with carrier-aggregation support is as good as that of legacy receivers. The contradicting requirements of high linearity and low noise are satisfied with the single-gm receiver architecture, in addition to supporting carrier aggregation. The proposed cascode-shutoff low-noise trans-conductance amplifier (LNTA) achieves 57.1 dB voltage gain, 1.76 dB NF (noise figure), and -6.7 dBm IIP3 (third-order intercept point) with a power consumption of 21.3 mW in the intra-band carrier aggregation scenario. In legacy mode, the same receiver signal path achieves 56.6 dB voltage gain, 1.33 dB NF, and -6.2 dBm IIP3 with a low power consumption of 7.4 mW.
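For context on why the front-end gain and noise figure dominate the receiver's sensitivity, the cascaded noise factor follows Friis' formula (a textbook relation, not a result of this paper), where F_i and G_i are the noise factor and available gain of the i-th stage:

```latex
F_{\mathrm{total}} = F_{1} + \frac{F_{2} - 1}{G_{1}} + \frac{F_{3} - 1}{G_{1} G_{2}} + \cdots
```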

Open Access Article
The Design and Implementation of a Low-Power Gating Scan Element in 32/28 nm CMOS Technology
J. Low Power Electron. Appl. 2017, 7(2), 7; doi:10.3390/jlpea7020007
Abstract
Excessive power consumption during test application has severely negative effects on chip reliability, since it contributes to the appearance of hot spots, performance degradation, premature circuit destruction, and functional failures. In scan-based designs, rippling transitions caused by test patterns shifting along the scan chain not only elevate power consumption in the scan chain but also introduce spurious switching activity in the combinational logic. In this work, a new low-power gating scan cell for scan-based designs is proposed in order to reduce power consumption in the scan chain as well as in the combinational part during shifting. We have modified the conventional scan cell and augmented it with state-preserving and gating logic that enables an average power reduction in the combinational logic during shift mode. The new scan cell reduces the number of transitions during shift and capture cycles. Thus, it reduces the average power consumption inside the scan cell, and as a result in the scan chain during scan shifting, with a low impact on peak power during the capture cycle. Furthermore, due to the introduction of a new, shorter shift path, improvements are observed in terms of propagation delay and power consumption in the scan chain during shifting. This allows a higher feasible shift frequency in cases where the shift frequency is limited by the maximum power budget, and hence reduces the test application time. The post-layout SPICE simulation results show a 7.21% reduction in total power consumption, an average 12.25% reduction in shift power consumption, and a 50.7% improvement in the clock (CLK)-to-shift propagation delay over the conventional scan cell in Synopsys 32/28 nm standard CMOS technology.
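The reason fewer transitions translate directly into lower test power is the standard first-order dynamic power relation (assumed background, not an equation from the paper), where alpha is the switching activity, C_L the switched capacitance, V_DD the supply voltage and f the shift frequency:

```latex
P_{\mathrm{dyn}} = \alpha\, C_{L}\, V_{DD}^{2}\, f
```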

Open Access Article
Extending the Performance of Hybrid NoCs beyond the Limitations of Network Heterogeneity
J. Low Power Electron. Appl. 2017, 7(2), 8; doi:10.3390/jlpea7020008
Abstract
To meet the performance and scalability demands of the fast-paced technological growth towards exascale and big-data processing, and given the performance bottleneck of conventional metal-based (wireline) interconnects, alternative interconnect fabrics, such as the inhomogeneous three-dimensional integrated network-on-chip (3D NoC) and the hybrid wired-wireless network-on-chip (WiNoC), have emerged as a cost-effective solution for emerging system-on-chip (SoC) design. However, these interconnects trade off optimized performance for cost by restricting the number of area- and power-hungry 3D routers and wireless nodes. Moreover, the non-uniformly distributed traffic in a chip multiprocessor (CMP) demands an on-chip communication infrastructure that can avoid congestion under high-traffic conditions while possessing minimal pipeline delay at low-load conditions. To this end, in this paper, we propose a low-latency adaptive router with a low-complexity single-cycle bypassing mechanism to alleviate the performance degradation due to the slow 2D routers in such emerging hybrid NoCs. The proposed router transmits a flit using dimension-ordered routing (DoR) in the bypass datapath at low loads. When the output port required for intra-dimension bypassing is not available, the packet is routed adaptively to avoid congestion. The router also has a simplified virtual channel allocation (VA) scheme that yields a non-speculative low-latency pipeline. By combining the low-complexity bypassing technique with adaptive routing, the proposed router is able to balance the traffic in hybrid NoCs and achieve low-latency communication under various traffic loads. Simulations show that the proposed router can reduce applications’ execution time by an average of 16.9% compared to low-latency routers such as SWIFT. By reducing the latency between 2D routers (or wired nodes) and 3D routers (or wireless nodes), the proposed router can improve the performance efficiency in terms of average packet delay by an average of 45% (or 50%) in 3D NoCs (or WiNoCs).
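For reference, the dimension-ordered routing (DoR) used in the low-load bypass path reduces, on a 2D mesh, to the familiar XY rule sketched below (coordinates and port names are illustrative assumptions, not taken from the paper):

```python
# Dimension-ordered (XY) routing on a 2D mesh: resolve X first, then Y.

def xy_route(cur, dst):
    """Return the output port ('E', 'W', 'N', 'S' or 'LOCAL') for one hop."""
    (cx, cy), (dx, dy) = cur, dst
    if cx != dx:
        return 'E' if dx > cx else 'W'   # travel along the X dimension first
    if cy != dy:
        return 'N' if dy > cy else 'S'   # then along the Y dimension
    return 'LOCAL'                       # arrived: eject to the local core

# Example: a flit at router (1, 1) heading to (3, 0) is first sent east.
assert xy_route((1, 1), (3, 0)) == 'E'
```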

Open Access Article
Design of a Wideband Antenna for Wireless Network-On-Chip in Multimedia Applications
J. Low Power Electron. Appl. 2017, 7(2), 6; doi:10.3390/jlpea7020006
Abstract
To allow fast communication (at several Gb/s) of multimedia content among processors and memories in a multi-processor system-on-chip, a new approach is emerging in the literature: the Wireless Network-on-Chip (WiNoC). With reference to this scenario, this paper presents the design of the key element of the WiNoC: the antenna. Specifically, a bow-tie antenna is proposed, which operates at mm-waves and can be implemented on-chip using the top metal layer of a conventional silicon CMOS (Complementary Metal Oxide Semiconductor) technology. The antenna performance is discussed in the paper and compared to the state of the art, including the zig-zag antenna topology that is typically used in the literature as a reference for WiNoC. The proposed bow-tie antenna design for WiNoC stands out for its good trade-off among bandwidth, gain, size and beamwidth versus the state of the art.
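As a first-order sizing guide for such on-chip mm-wave antennas (a standard relation with purely illustrative numbers, not dimensions from the paper), the guided wavelength shrinks with the effective permittivity of the surrounding dielectric stack, and a resonant bow-tie spans roughly half of it; at 60 GHz with an effective permittivity of about 4, this gives a guided wavelength of about 2.5 mm and an antenna span on the order of 1.25 mm:

```latex
\lambda_{g} = \frac{c}{f\,\sqrt{\varepsilon_{\mathrm{eff}}}}, \qquad L \approx \frac{\lambda_{g}}{2}
```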