Open AccessArticle
Starting Framework for Analog Numerical Analysis for Energy-Efficient Computing
J. Low Power Electron. Appl. 2017, 7(3), 17; doi:10.3390/jlpea7030017 -
Abstract
The focus of this work is to develop a starting framework for analog numerical analysis and related algorithm questions. Digital computation is enabled by a framework developed over the last 80 years. Having an analog framework enables wider capability while giving the designer
[...] Read more.
The focus of this work is to develop a starting framework for analog numerical analysis and related algorithm questions. Digital computation is enabled by a framework developed over the last 80 years. Having an analog framework enables wider capability while giving the designer tools to make reasonable choices. Analog numerical analysis concerns computation on physical structures utilizing the real-valued representations of that physical system. This work starts the conversation of analog numerical analysis, including exploring the relevancy and need for this framework. A complexity framework based on computational strengths and weaknesses builds from addressing analog and digital numerical precision, as well as addresses analog and digital error propagation due to computation. The complimentary analog and digital computational techniques enable wider computational capabilities. Full article
Figures

Figure 1

Open AccessArticle
Flexible, Scalable and Energy Efficient Bio-Signals Processing on the PULP Platform: A Case Study on Seizure Detection
J. Low Power Electron. Appl. 2017, 7(2), 16; doi:10.3390/jlpea7020016 -
Abstract
Ultra-low power operation and extreme energy efficiency are strong requirements for a number of high-growth application areas requiring near-sensor processing, including elaboration of biosignals. Parallel near-threshold computing is emerging as an approach to achieve significant improvements in energy efficiency while overcoming the performance
[...] Read more.
Ultra-low power operation and extreme energy efficiency are strong requirements for a number of high-growth application areas requiring near-sensor processing, including elaboration of biosignals. Parallel near-threshold computing is emerging as an approach to achieve significant improvements in energy efficiency while overcoming the performance degradation typical of low-voltage operations. In this paper, we demonstrate the capabilities of the PULP (Parallel Ultra-Low Power) platform on an algorithm for seizure detection, representative of a wide range of EEG signal processing applications. Starting from the 28-nm FD-SOI (Fully Depleted Silicon On Insulator) technology implementation of the third embodiment of the PULP architecture, we analyze the energy-efficient implementation of the seizure detection algorithm on PULP. The proposed parallel implementation exploits the dynamic voltage and frequency scaling capabilities, as well as the embedded power knobs of the PULP platform, reducing energy consumption for a seizure detection by up to 10× with respect to a sequential implementation at the nominal supply voltage and by 4.2× with respect to a sequential implementation with voltage scaling. Moreover, we analyze the trans-precision optimization of the algorithm on PULP, by means of a hybrid fixed- and floating-point implementation. This approach reduces the energy consumption by up to 43% with respect to the plain fixed-point and floating-point implementations, leveraging the requirements in terms of the precision of the kernels composing the processing chain to improve energy efficiency. Thanks to the proposed architecture and system-level approach for optimization, we demonstrate that PULP reduces energy consumption by up to 140× with respect to commercial low-power microcontrollers, being able to satisfy the real-time constraints typical of bio-medical applications, breaking the barrier of microwatts for a 50-ms complete seizure detection and a few milliwatts for a 5-ms detection latency on a fully-programmable architecture. Full article
Figures

Figure 1

Open AccessArticle
Predictive Direct Torque Control Application-Specific Integrated Circuit of an Induction Motor Drive with a Fuzzy Controller
J. Low Power Electron. Appl. 2017, 7(2), 15; doi:10.3390/jlpea7020015 -
Abstract
This paper proposes a modified predictive direct torque control (PDTC) application-specific integrated circuit (ASIC) of a motor drive with a fuzzy controller for eliminating sampling and calculating delay times in hysteresis controllers. These delay times degrade the control quality and increase both torque
[...] Read more.
This paper proposes a modified predictive direct torque control (PDTC) application-specific integrated circuit (ASIC) of a motor drive with a fuzzy controller for eliminating sampling and calculating delay times in hysteresis controllers. These delay times degrade the control quality and increase both torque and flux ripples in a motor drive. The proposed fuzzy PDTC ASIC calculates the stator’s magnetic flux and torque by detecting the three-phase current, three-phase voltage, and rotor speed, and eliminates the ripples in the torque and flux by using a fuzzy controller and predictive scheme. The Verilog hardware description language was used to implement the hardware architecture, and the ASIC was fabricated by the Taiwan Semiconductor Manufacturing Company through a 0.18-μm 1P6M CMOS process that involved a cell-based design method. The measurements revealed that the proposed fuzzy PDTC ASIC of the three-phase induction motor yielded a test coverage of 96.03%, fault coverage of 95.06%, chip area of 1.81 × 1.81 mm2, and power consumption of 296 mW, at an operating frequency of 50 MHz and a supply voltage of 1.8 V. Full article
Figures

Figure 1

Open AccessArticle
Architectural Techniques for Improving the Power Consumption of NoC-Based CMPs: A Case Study of Cache and Network Layer
J. Low Power Electron. Appl. 2017, 7(2), 14; doi:10.3390/jlpea7020014 -
Abstract
The disparity between memory and CPU have been ameliorated by the introduction of Network-on-Chip-based Chip-Multiprocessors (NoC-based CMPS). However, power consumption continues to be an aggressive stumbling block halting the progress of technology. Miniaturized transistors invoke many-core integration at the cost of high power
[...] Read more.
The disparity between memory and CPU have been ameliorated by the introduction of Network-on-Chip-based Chip-Multiprocessors (NoC-based CMPS). However, power consumption continues to be an aggressive stumbling block halting the progress of technology. Miniaturized transistors invoke many-core integration at the cost of high power consumption caused by the components in NoC-based CMPs; particularly caches and routers. If NoC-based CMPs are to be standardised as the future of technology design, it is imperative that the power demands of its components are optimized. Much research effort has been put into finding techniques that can improve the power efficiency for both cache and router architectures. This work presents a survey of power-saving techniques for efficient NoC designs with a focus on the cache and router components, such as the buffer and crossbar. Nonetheless, the aim of this work is to compile a quick reference guide of power-saving techniques for engineers and researchers. Full article
Figures

Figure 1

Open AccessArticle
Global Adaptation Controlled by an Interactive Consistency Protocol
J. Low Power Electron. Appl. 2017, 7(2), 13; doi:10.3390/jlpea7020013 -
Abstract
Static schedules for systems can lead to an inefficient usage of the resources, because the system’s behavior cannot be adapted at runtime. To improve the runtime system performance in current time-triggered Multi-Processor System on Chip (MPSoC), a dynamic reaction to events is performed
[...] Read more.
Static schedules for systems can lead to an inefficient usage of the resources, because the system’s behavior cannot be adapted at runtime. To improve the runtime system performance in current time-triggered Multi-Processor System on Chip (MPSoC), a dynamic reaction to events is performed locally on the cores. The effects of this optimization can be increased by coordinating the changes globally. To perform such global changes, a consistent view on the system state is needed, on which to base the adaptation decisions. This paper proposes such an interactive consistency protocol with low impact on the system w.r.t. latency and overhead. We show that an energy optimizing adaptation controlled by the protocol can enable a system to save up to 43% compared to a system without adaptation. Full article
Figures

Figure 1

Open AccessArticle
SoC Hardware Implementation of Real-Time Video Segmentation based on the Mixture of Gaussian Algorithm
J. Low Power Electron. Appl. 2017, 7(2), 12; doi:10.3390/jlpea7020012 -
Abstract
Video segmentation based on the Mixture of Gaussian (MoG) algorithm is widely used in video processing systems, and hardware implementations have been proposed in the past years. Most previous work focused on high-performance custom design of the MoG algorithm to meet real-time requirement
[...] Read more.
Video segmentation based on the Mixture of Gaussian (MoG) algorithm is widely used in video processing systems, and hardware implementations have been proposed in the past years. Most previous work focused on high-performance custom design of the MoG algorithm to meet real-time requirement of high-frame-rate high-resolution video segmentation tasks. This paper focuses on the System-on-Chip (SoC) design and the priority is SoC integration of the system for flexibility/adaptability, while at the same time, custom design of the original MoG algorithm is included. To maximally retain the accuracy of the MoG algorithm for best segmentation performance, we minimally modified the MoG algorithm for hardware implementation at the cost of hardware resources. The MoG algorithm is custom-implemented as a hardware IP (Intellectual Property), which is then integrated within an SoC platform together with other video processing components, so that some key control parameters can be configured on-line, which makes the video segmentation system most suitable for different scenarios. The proposed implementation has been demonstrated and tested on a Xilinx Spartan-3A DSP Video Starter Board. Experiment results show that under a clock frequency of 25 MHz, this design meets the real-time requirement for VGA resolution (640 × 480) at 30 fps (frame-per-second). Full article
Figures

Figure 1

Open AccessArticle
Ultra Low Energy FDSOI Asynchronous Reconfiguration Network for Adaptive Circuits
J. Low Power Electron. Appl. 2017, 7(2), 11; doi:10.3390/jlpea7020011 -
Abstract
This paper introduces a plug-and-play on-chip asynchronous communication network aimed at the dynamic reconfiguration of a low-power adaptive circuit such as an internet of things (IoT) system. By using a separate communication network, we can address both digital and analog blocks at a
[...] Read more.
This paper introduces a plug-and-play on-chip asynchronous communication network aimed at the dynamic reconfiguration of a low-power adaptive circuit such as an internet of things (IoT) system. By using a separate communication network, we can address both digital and analog blocks at a lower configuration cost, increasing the overall system power efficiency. As reconfiguration only occurs according to specific events and has to be automatically in stand-by most of the time, our design is fully asynchronous using handshake protocols. The paper presents the circuit’s architecture, performance results, and an example of the reconfiguration of frequency locked loops (FLL) to validate our work. We obtain an overall energy per bit of 0.07 pJ/bit for one stage, in a 28 nm Fully Depleted Silicon On Insulator (FDSOI) technology at 0.6 V and a 1.1 ns/bit latency per stage. Full article
Figures

Figure 1

Open AccessArticle
A General-Purpose Graphics Processing Unit (GPGPU)-Accelerated Robotic Controller Using a Low Power Mobile Platform
J. Low Power Electron. Appl. 2017, 7(2), 10; doi:10.3390/jlpea7020010 -
Abstract
Robotic controllers have to execute various complex independent tasks repeatedly. Massive processing power is required by the motion controllers to compute the solution of these computationally intensive algorithms. General-purpose graphics processing unit (GPGPU)-enabled mobile phones can be leveraged for acceleration of these motion
[...] Read more.
Robotic controllers have to execute various complex independent tasks repeatedly. Massive processing power is required by the motion controllers to compute the solution of these computationally intensive algorithms. General-purpose graphics processing unit (GPGPU)-enabled mobile phones can be leveraged for acceleration of these motion controllers. Embedded GPUs can replace several dedicated computing boards by a single powerful and less power-consuming GPU. In this paper, the inverse kinematic algorithm based numeric controllers is proposed and realized using the GPGPU of a handheld mobile device. This work is the extension of a desktop GPU-accelerated robotic controller presented at DAS’16 where the comparative analysis of different sequential and concurrent controllers is discussed. First of all, the inverse kinematic algorithm is sequentially realized using Arduino-Due microcontroller and the field-programmable gate array (FPGA) is used for its parallel implementation. Execution speeds of these controllers are compared with two different GPGPU architectures (Nvidia Quadro K2200 and Nvidia Shield K1 Tablet), programmed with Compute Unified Device Architecture (CUDA) computing language. Experimental data shows that the proposed mobile platform-based scheme outperforms the FPGA by 5× and boasts a 100× speedup over the Arduino-based sequential implementation. Full article
Figures

Open AccessArticle
High Performance Receiver Design for RX Carrier Aggregation
J. Low Power Electron. Appl. 2017, 7(2), 9; doi:10.3390/jlpea7020009 -
Abstract
Carrier aggregation is one of the key features to increase the data rate given a scarce bandwidth spectrum. This paper describes the design of a high performance receiver suitable for carrier aggregation in LTE-Advanced and future 5 G standards. The proposed architecture is
[...] Read more.
Carrier aggregation is one of the key features to increase the data rate given a scarce bandwidth spectrum. This paper describes the design of a high performance receiver suitable for carrier aggregation in LTE-Advanced and future 5 G standards. The proposed architecture is versatile to support legacy mode (single carrier), inter-band carrier aggregation, and intra-band carrier aggregation. Performance with carrier-aggregation support is as good as legacy receivers. Contradicting requirements of high linearity and the low noise is satisfied with the single-gm receiver architecture in addition to supporting carrier aggregation. The proposed cascode-shutoff low-noise trans-conductance amplifier (LNTA) achieves 57.1 dB voltage gain, 1.76 dB NF (noise figure) , and -6.7 dBm IIP3 (Third-order intercept point) with the power consumption of 21.3 mW in the intra-band carrier aggregation scenario. With legacy mode, the same receiver signal path achieves 56.6 dB voltage gain, 1.33 dB NF, and -6.2 dBm IIP3 with a low power consumption of 7.4 mW. Full article
Figures

Figure 1

Open AccessArticle
The Design and Implementation of a Low-Power Gating Scan Element in 32/28 nm CMOS Technology
J. Low Power Electron. Appl. 2017, 7(2), 7; doi:10.3390/jlpea7020007 -
Abstract
Excessive power consumption during test application time has severely negative effects on chip reliability since it has an inevitable role in hot spots that appear, degradation of performance, circuit premature destruction, and functional failures. In scan-based designs, rippling transitions caused by test patterns
[...] Read more.
Excessive power consumption during test application time has severely negative effects on chip reliability since it has an inevitable role in hot spots that appear, degradation of performance, circuit premature destruction, and functional failures. In scan-based designs, rippling transitions caused by test patterns shifting along the scan chain not only elevate power consumption in the scan chain but also introduce spurious switching activities in the combinational logic. In this work, a new low power gating scan cell for scan based designs has been proposed in order to reduce power consumption in the scan chain as well as the combinational part during shifting. We have modified the conventional scan cell and augmented it with state preserving and gating logic that enables an average power reduction in combinational logic during shift mode. The new scan cell mitigates the number of transitions during shift and capture cycles. Thus, it reduces the average power consumption inside the scan cell and as a result the scan chain during scan shifting with a low impact on peak power during the capture cycle. Furthermore, due to introducing a new shorter shift path, improvements are observed in terms of propagation delay and power consumption in the scan chain during shifting. This leads to higher feasible shift frequency whereby the shift frequency is limited by the maximum power budget and hence results in reducing the test application time. The post-layout spice simulation results show a 7.21% reduction in total power consumption, an average 12.25% reduction of shift power consumption, and a 50.7% improvement in the clock (CLK)-to-shift propagation delay over the conventional scan cell in Synopsys 32/28 nm standard CMOS technology. Full article
Figures

Figure 1

Open AccessArticle
Extending the Performance of Hybrid NoCs beyond the Limitations of Network Heterogeneity
J. Low Power Electron. Appl. 2017, 7(2), 8; doi:10.3390/jlpea7020008 -
Abstract
To meet the performance and scalability demands of the fast-paced technological growth towards exascale and big data processing with the performance bottleneck of conventional metal-based interconnects (wireline), alternative interconnect fabrics, such as inhomogeneous three-dimensional integrated network-on-chip (3D NoC) and hybrid wired-wireless network-on-chip (WiNoC),
[...] Read more.
To meet the performance and scalability demands of the fast-paced technological growth towards exascale and big data processing with the performance bottleneck of conventional metal-based interconnects (wireline), alternative interconnect fabrics, such as inhomogeneous three-dimensional integrated network-on-chip (3D NoC) and hybrid wired-wireless network-on-chip (WiNoC), have emanated as a cost-effective solution for emerging system-on-chip (SoC) design. However, these interconnects trade off optimized performance for cost by restricting the number of area and power hungry 3D routers and wireless nodes. Moreover, the non-uniform distributed traffic in a chip multiprocessor (CMP) demands an on-chip communication infrastructure that can avoid congestion under high traffic conditions while possessing minimal pipeline delay at low-load conditions. To this end, in this paper, we propose a low-latency adaptive router with a low-complexity single-cycle bypassing mechanism to alleviate the performance degradation due to the slow 2D routers in such emerging hybrid NoCs. The proposed router transmits a flit using dimension-ordered routing (DoR) in the bypass datapath at low-loads. When the output port required for intra-dimension bypassing is not available, the packet is routed adaptively to avoid congestion. The router also has a simplified virtual channel allocation (VA) scheme that yields a non-speculative low-latency pipeline. By combining the low-complexity bypassing technique with adaptive routing, the proposed router is able to balance the traffic in hybrid NoCs to achieve low-latency communication under various traffic loads. Simulation shows that the proposed router can reduce applications’ execution time by an average of 16.9% compared to low-latency routers, such as SWIFT. By reducing the latency between 2D routers (or wired nodes) and 3D routers (or wireless nodes), the proposed router can improve the performance efficiency in terms of average packet delay by an average of 45% (or 50%) in 3D NoCs (or WiNoCs). Full article
Figures

Figure 1

Open AccessArticle
Design of a Wideband Antenna for Wireless Network-On-Chip in Multimedia Applications
J. Low Power Electron. Appl. 2017, 7(2), 6; doi:10.3390/jlpea7020006 -
Abstract
To allow fast communication—at several Gb/s—of multimedia content among processors and memories in a multi-processor system-on-chip, a new approach is emerging in literature: Wireless Network-on-Chip (WiNoC). With reference to this scenario, this paper presents the design of the key element of the WiNoC:
[...] Read more.
To allow fast communication—at several Gb/s—of multimedia content among processors and memories in a multi-processor system-on-chip, a new approach is emerging in literature: Wireless Network-on-Chip (WiNoC). With reference to this scenario, this paper presents the design of the key element of the WiNoC: the antenna. Specifically, a bow-tie antenna is proposed, which operates at mm-waves and can be implemented on-chip using the top metal layer of a conventional silicon CMOS (Complementary Metal Oxide Semiconductor) technology. The antenna performance is discussed in the paper and is compared to the state-of-the-art, including the zig-zag antenna topology that is typically used in literature as a reference for WiNoC. The proposed bow-tie antenna design for WiNoC stands out for its good trade-off among bandwidth, gain, size and beamwidth vs. the state-of-the-art. Full article
Figures

Figure 1

Open AccessArticle
Energy Efficiency Effects of Vectorization in Data Reuse Transformations for Many-Core Processors—A Case Study †
J. Low Power Electron. Appl. 2017, 7(1), 5; doi:10.3390/jlpea7010005 -
Abstract
Thread-level and data-level parallel architectures have become the design of choice in many of today’s energy-efficient computing systems. However, these architectures put substantially higher requirements on the memory subsystem than scalar architectures, making memory latency and bandwidth critical in their overall efficiency. Data
[...] Read more.
Thread-level and data-level parallel architectures have become the design of choice in many of today’s energy-efficient computing systems. However, these architectures put substantially higher requirements on the memory subsystem than scalar architectures, making memory latency and bandwidth critical in their overall efficiency. Data reuse exploration aims at reducing the pressure on the memory subsystem by exploiting the temporal locality in data accesses. In this paper, we investigate the effects on performance and energy from a data reuse methodology combined with parallelization and vectorization in multi- and many-core processors. As a test case, a full-search motion estimation kernel is evaluated on Intel® CoreTM i7-4700K (Haswell) and i7-2600K (Sandy Bridge) multi-core processors, as well as on an Intel® Xeon PhiTM many-core processor (Knights Landing) with Streaming Single Instruction Multiple Data (SIMD) Extensions (SSE) and Advanced Vector Extensions (AVX) instruction sets. Results using a single-threaded execution on the Haswell and Sandy Bridge systems show that performance and EDP (Energy Delay Product) can be improved through data reuse transformations on the scalar code by a factor of ≈3× and ≈6×, respectively. Compared to scalar code without data reuse optimization, the SSE/AVX2 version achieves ≈10×/17× better performance and ≈92×/307× better EDP, respectively. These results can be improved by 10% to 15% using data reuse techniques. Finally, the most optimized version using data reuse and AVX512 achieves a speedup of ≈35× and an EDP improvement of ≈1192× on the Xeon Phi system. While single-threaded execution serves as a common reference point for all architectures to analyze the effects of data reuse on both scalar and vector codes, scalability with thread count is also discussed in the paper. Full article
Figures

Figure 1

Open AccessArticle
A Novel Design Flow for a Security-Driven Synthesis of Side-Channel Hardened Cryptographic Modules
J. Low Power Electron. Appl. 2017, 7(1), 4; doi:10.3390/jlpea7010004 -
Abstract
Over the last few decades, computer-aided engineering (CAE) tools have been developed and improved in order to ensure a short time-to-market in the chip design business. Up to now, these design tools do not yet support an integrated design strategy for the development
[...] Read more.
Over the last few decades, computer-aided engineering (CAE) tools have been developed and improved in order to ensure a short time-to-market in the chip design business. Up to now, these design tools do not yet support an integrated design strategy for the development of side-channel-resistant hardware implementations. In order to close this gap, a novel framework named AMASIVE (Adaptable Modular Autonomous SIde-Channel Vulnerability Evaluator) was developed. It supports the designer in implementing devices hardened against power attacks by exploiting novel security-driven synthesis methods. The article at hand can be seen as the second of the two contributions that address the AMASIVE framework. While the first one describes how the framework automatically detects vulnerabilities against power attacks, the second one explains how a design can be hardened in an automatic way by means of appropriate countermeasures, which are tailored to the identified weaknesses. In addition to the theoretical introduction of the fundamental concepts, we demonstrate an application to the hardening of a complete hardware implementation of the block cipher PRESENT. Full article
Figures

Open AccessArticle
Completing the Complete ECC Formulae with Countermeasures
J. Low Power Electron. Appl. 2017, 7(1), 3; doi:10.3390/jlpea7010003 -
Abstract
This work implements and evaluates the recent complete addition formulae for the prime order elliptic curves of Renes, Costello and Batina on an FPGA platform. We implement three different versions:(1) an unprotected architecture; (2) an architecture protected through coordinate randomization; and (3) an
[...] Read more.
This work implements and evaluates the recent complete addition formulae for the prime order elliptic curves of Renes, Costello and Batina on an FPGA platform. We implement three different versions:(1) an unprotected architecture; (2) an architecture protected through coordinate randomization; and (3) an architecture with both coordinate randomization and scalar splitting in place. The evaluation is done through timing analysis and test vector leakage assessment (TVLA). The results show that applying an increasing level of countermeasures leads to an increasing resistance against side-channel attacks. This is the first work looking into side-channel security issues of hardware implementations of the complete formulae. Full article
Figures

Figure 1

Open AccessArticle
On Improving Reliability of SRAM-Based Physically Unclonable Functions
J. Low Power Electron. Appl. 2017, 7(1), 2; doi:10.3390/jlpea7010002 -
Abstract
Physically unclonable functions (PUFs) have been touted for their inherent resistance to invasive attacks and low cost in providing a hardware root of trust for various security applications. SRAM PUFs in particular are popular in industry for key/ID generation. Due to intrinsic process
[...] Read more.
Physically unclonable functions (PUFs) have been touted for their inherent resistance to invasive attacks and low cost in providing a hardware root of trust for various security applications. SRAM PUFs in particular are popular in industry for key/ID generation. Due to intrinsic process variations, SRAM cells, ideally, tend to have the same start-up behavior. SRAM PUFs exploit this start-up behavior. Unfortunately, not all SRAM cells exhibit reliable start-up behavior due to noise susceptibility. Hence, design enhancements are needed for improving reliability. Some of the proposed enhancements in literature include fuzzy extraction, error-correcting codes and voting mechanisms. All enhancements involve a trade-off between area/power/performance overhead and PUF reliability. This paper presents a design enhancement technique for reliability that improves upon previous solutions. We present simulation results to quantify improvement in SRAM PUF reliability and efficiency. The proposed technique is shown to generate a 128-bit key in ≤0.2 μs at an area estimate of 4538 μm2 with error rate as low as 106 for intrinsic error probability of 15%. Full article
Figures

Figure 1

Open AccessEditorial
Acknowledgement to Reviewers of Journal of Low Power Electronics and Applications in 2016
J. Low Power Electron. Appl. 2017, 7(1), 1; doi:10.3390/jlpea7010001 -
Abstract The editors of Journal of Low Power Electronics and Applications would like to express their sincere gratitude to the following reviewers for assessing manuscripts in 2016.[...] Full article
Open AccessReview
Low Power Design for Future Wearable and Implantable Devices
J. Low Power Electron. Appl. 2016, 6(4), 20; doi:10.3390/jlpea6040020 -
Abstract
With the fast progress in miniaturization of sensors and advances in micromachinery systems, a gate has been opened to the researchers to develop extremely small wearable/implantable microsystems for different applications. However, these devices are reaching not to a physical limit but a power
[...] Read more.
With the fast progress in miniaturization of sensors and advances in micromachinery systems, a gate has been opened to the researchers to develop extremely small wearable/implantable microsystems for different applications. However, these devices are reaching not to a physical limit but a power limit, which is a critical limit for further miniaturization to develop smaller and smarter wearable/implantable devices (WIDs), especially for multi-task continuous computing purposes. Developing smaller and smarter devices with more functionality requires larger batteries, which are currently the main power provider for such devices. However, batteries have a fixed energy density, limited lifetime and chemical side effect plus the fact that the total size of the WID is dominated by the battery size. These issues make the design very challenging or even impossible. A promising solution is to design batteryless WIDs scavenging energy from human or environment including but not limited to temperature variations through thermoelectric generator (TEG) devices, body movement through Piezoelectric devices, solar energy through miniature solar cells, radio-frequency (RF) harvesting through antenna etc. However, the energy provided by each of these harvesting mechanisms is very limited and thus cannot be used for complex tasks. Therefore, a more comprehensive solution is the use of different harvesting mechanisms on a single platform providing enough energy for more complex tasks without the need of batteries. In addition to this, complex tasks can be done by designing Integrated Circuits (ICs), as the main core and the most power consuming component of any WID, in an extremely low power mode by lowering the supply voltage utilizing low-voltage design techniques. Having the ICs operational at very low voltages, will enable designing battery-less WIDs for complex tasks, which will be discussed in details throughout this paper. In this paper, a path towards battery-less computing is drawn by looking at device circuit co-design for future system-on-chips (SoCs). Full article
Figures

Figure 1

Open AccessArticle
InGaAs-OI Substrate Fabrication on a 300 mm Wafer
J. Low Power Electron. Appl. 2016, 6(4), 19; doi:10.3390/jlpea6040019 -
Abstract
In this work, we demonstrate for the first time a 300-mm indium–gallium–arsenic (InGaAs) wafer on insulator (InGaAs-OI) substrates by splitting in an InP sacrificial layer. A 30-nm-thick InGaAs layer was successfully transferred using low temperature direct wafer bonding (DWB) and Smart CutTM
[...] Read more.
In this work, we demonstrate for the first time a 300-mm indium–gallium–arsenic (InGaAs) wafer on insulator (InGaAs-OI) substrates by splitting in an InP sacrificial layer. A 30-nm-thick InGaAs layer was successfully transferred using low temperature direct wafer bonding (DWB) and Smart CutTM technology. Three key process steps of the integration were therefore specifically developed and optimized. The first one was the epitaxial growing process, designed to reduce the surface roughness of the InGaAs film. Second, direct wafer bonding conditions were investigated and optimized to achieve non-defective bonding up to 600 °C. Finally, we adapted the splitting condition to detach the InGaAs layer according to epitaxial stack specifications. The paper presents the overall process flow that achieved InGaAs-OI, the required optimization, and the associated characterizations, namely atomic force microscopy (AFM), scanning acoustic microscopy (SAM), and HR-XRD, to insure the crystalline quality of the post transferred layer. Full article
Figures

Figure 1

Open AccessArticle
A 4.1 W/mm2 Hybrid Inductive/Capacitive Converter for 2–140 mA-DVS Load under Inductor
J. Low Power Electron. Appl. 2016, 6(3), 18; doi:10.3390/jlpea6030018 -
Abstract
This work presents a fully integrated hybrid inductive/capacitive converter maintaining high efficiency for a load range of 2 mA to 140 mA (70×) suitable for the dynamic voltage scaling (DVS) based loads. This high efficiency is achieved by using an inductive converter for
[...] Read more.
This work presents a fully integrated hybrid inductive/capacitive converter maintaining high efficiency for a load range of 2 mA to 140 mA (70×) suitable for the dynamic voltage scaling (DVS) based loads. This high efficiency is achieved by using an inductive converter for higher loads (15–140 mA, 0.50–0.9 V) and a capacitive converter for lighter loads (2–5 mA, 0.40–0.55 V) with a 50 mV hysteresis margin. A digital state machine activates the appropriate converter based on the power efficiency and enables the converter hand-over. The functional feasibility of implementing digital circuits as representative loads under the inductor is shown thereby increasing the peak converter power density from 0.387 W/mm2 to 4.1 W/mm2 with only a minor hit on the efficiency. The maximum measured efficiency is achieved in inductive mode of operation and decreases from 76.4% to 71% when digital circuits are present under the inductor. The design was fabricated in IBM’s 32 nm SOI technology. Full article
Figures

Figure 1