J. Low Power Electron. Appl. 2013, 3(4), 337-367; doi:10.3390/jlpea3040337 - published online 29 October 2013
Abstract: Tone mapping algorithms are used to adapt captured wide dynamic range (WDR) scenes to the limited dynamic range of available display devices. Although several tone mapping algorithms are available, most of them require manual tuning of their rendering parameters. In addition, the high complexity of some of these algorithms makes it difficult to implement efficient real-time hardware systems. In this work, a real-time hardware implementation of an exponent-based tone mapping algorithm is presented. The algorithm performs a mixture of both global and local compression on colored WDR images. An automatic parameter selector has been proposed for the tone mapping algorithm in order to achieve good tone-mapped images without manual reconfiguration of the algorithm for each WDR image. Both algorithms are described in Verilog and synthesized for a field programmable gate array (FPGA). The hardware architecture employs a combination of parallelism and system pipelining to achieve good performance in terms of power consumption, hardware resource usage and processing speed. Results show that the hardware architecture produces images of good visual quality that are comparable to those of software-based tone mapping algorithms. High peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) scores were obtained when the results were compared with output images obtained from software simulations using MATLAB.
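The PSNR comparison mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard PSNR definition for 8-bit data and is not the authors' evaluation code; the function name `psnr` and the flat pixel lists are assumptions made for illustration.

```python
import math


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists.

    `reference` stands in for the software (e.g. MATLAB) output and `test`
    for the hardware output. Both are hypothetical flat lists of pixel values.
    """
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


# Every pixel differs by exactly one grey level, so MSE = 1.
software = [10, 20, 30, 40]
hardware = [11, 19, 31, 39]
print(round(psnr(software, hardware), 2))
```

A higher PSNR means the hardware output is closer to the software reference; identical images give an infinite score under this definition.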
J. Low Power Electron. Appl. 2013, 3(4), 300-336; doi:10.3390/jlpea3040300 - published online 25 October 2013
Abstract: As portable devices become more ubiquitous, data security in these devices is becoming increasingly important. Traditional circuit design techniques leave otherwise secure systems vulnerable due to the characteristics of the hardware implementation, rather than weaknesses in the security algorithms. These characteristics, called side-channels, are exploitable because they can be measured and correlated with processed data, potentially giving an attacker insight into the device’s secret data. Alternative design techniques such as dual-rail asynchronous designs are capable of minimizing these potential side-channels by decoupling them from the data being processed. However, these techniques are either expensive to implement compared to standard designs or leave exploitable imbalances in the dual-rail implementation itself. Multi-Threshold Dual-Spacer Dual-Rail Delay-Insensitive Logic (MTD3L) offers security by balancing side-channels both in general and between the dual-rail signals themselves, as well as reduction in circuit overhead compared to previous secure design techniques. Results show that the Advanced Encryption Standard (AES) cores designed using MTD3L exhibit similar security to previous secure techniques with substantially less area and energy overhead.
J. Low Power Electron. Appl. 2013, 3(3), 279-299; doi:10.3390/jlpea3030279 - published online 9 September 2013
Abstract: Brain neuroprostheses for neuromodulation are being designed to monitor the neural activity of the brain in the vicinity of the region being stimulated using a single macro-electrode. Using a single macro-electrode, recent neuromodulation studies show that recording systems with a low gain neuronal amplifier and successive amplifier stages can reduce or reject stimulation artifacts. These systems were made with off-the-shelf components that are not amenable to future implant design. A low-gain, low-noise integrated neuronal amplifier (NA) with the capability of recording local field potentials (LFP) and spike activity is presented. In vitro and in vivo characterizations of the tissue/electrode interface, with equivalent impedance as an electrical model for recording in the LFP band using macro-electrodes for rodents, contribute to the NA design constraints. The NA occupies 0.15 mm² and dissipates 6.73 µW, and was fabricated using a 0.35 µm CMOS process. Test-bench validation indicates that the NA provides a mid-band gain of 20 dB and achieves a low input-referred noise of 4 µVrms. The ability of the NA to perform spike recording in test-bench experiments is presented. Additionally, an awake and freely moving rodent setup was used to illustrate the integrated NA’s ability to record LFPs, paving the way for future implantable systems for neuromodulation.
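As a rough worked example of the figures quoted above, the sketch below converts the 20 dB mid-band gain to a linear voltage ratio and refers the 4 µVrms input noise to the amplifier output. The helper names are hypothetical, and the calculation assumes the gain is a voltage gain and that only the amplifier's own input-referred noise is considered.

```python
def db_to_voltage_ratio(gain_db):
    """Convert a voltage gain in dB to a linear ratio (20*log10 convention)."""
    return 10 ** (gain_db / 20)


def output_noise_uvrms(input_noise_uvrms, gain_db):
    """Scale input-referred noise (in µVrms) to the amplifier output.

    Assumes mid-band operation and ignores noise added by later stages.
    """
    return input_noise_uvrms * db_to_voltage_ratio(gain_db)


ratio = db_to_voltage_ratio(20)        # a gain of 20 dB is a factor of 10
noise_out = output_noise_uvrms(4, 20)  # 4 µVrms at the input, seen at the output
print(ratio, noise_out)
```

Under these assumptions the amplifier multiplies signal amplitudes by ten, so its own noise contribution appears as about 40 µVrms at the output.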
J. Low Power Electron. Appl. 2013, 3(3), 267-278; doi:10.3390/jlpea3030267 - published online 8 August 2013
Abstract: A novel low power CMOS imaging system with smart image capture and adaptive complexity 2D-Discrete Cosine Transform (DCT) is proposed. Unlike existing imaging systems, its smart image capture and image processing stages cooperate, which makes it very efficient. The type of each 8 × 8 block is determined during the image capture stage, and then input into the DCT block, along with the pixel values. The 2D-DCT calculation has adaptive computation complexity according to block types. Since the block type prediction has been moved to the front end, no extra time or calculation is needed during image processing or image capturing for prediction. The image sensor with block type decision circuit is implemented in TSMC 0.18 µm CMOS technology. The adaptive complexity 2D-DCT compression is implemented on a Cyclone EP1C20F400C8 device. The performance, including the image quality of the reconstructed picture and the power consumption of the imaging system, is compared with that of traditional CMOS imaging systems to show the benefit of the proposed low power algorithm. According to simulation, up to 46% of power consumption can be saved during 2D-DCT calculation without extra loss of image quality for the reconstructed pictures compared with the conventional compression methods.
J. Low Power Electron. Appl. 2013, 3(3), 250-266; doi:10.3390/jlpea3030250 - published online 29 July 2013
Abstract: The increasing popularity of DVFS (dynamic voltage and frequency scaling) schemes for portable low power applications demands highly efficient on-chip DC-DC converters. The primary aim of this work is to enable increased efficiency of on-chip DC-DC conversion for near-threshold operation of multicore chips. The idea is to supply nominal (high) off-chip voltage to the cores which are then “voltage-stacked” to generate the near-threshold (low) voltages based on Kirchhoff’s voltage law through charge recycling. However, the effectiveness of this implicit down-conversion is affected by the current imbalance among the cores. The paper presents a design methodology and optimization strategy for highly efficient charge recycling on-chip regulation using a push-pull switched capacitor (SC) circuit. A dual-boundary hysteretic feedback control circuit has been designed for stacked loads. A stacked-voltage domain with its self-regulation capability, combined with an SC converter, has shown an average efficiency of 78%–93% for 2:1 down-conversion with a maximum load current ILoad(max) of 200 mA and workload imbalance varying from 0–100%.
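The voltage-stacking idea can be sketched for the simplest case of two cores in series. This is an idealized illustration of the 2:1 split and of the current imbalance at the shared node, not the paper's design methodology; the function name `stack_balance`, the ideal equal split and the example values are assumptions.

```python
def stack_balance(v_in, i_top, i_bottom):
    """Idealized two-core voltage stack.

    Kirchhoff's voltage law: the two core voltages sum to v_in, so an ideal
    2:1 stack gives each core v_in / 2.
    Kirchhoff's current law at the mid node: the regulator must supply
    i_bottom - i_top (a negative value means it must sink current).
    The current common to both cores is recycled through the stack.
    """
    v_core = v_in / 2
    i_regulator = i_bottom - i_top
    i_recycled = min(i_top, i_bottom)
    return v_core, i_regulator, i_recycled


# Hypothetical operating point: 2.0 V supply, top core 200 mA, bottom core 50 mA.
v_core, i_reg, i_rec = stack_balance(2.0, 0.200, 0.050)
print(v_core, i_reg, i_rec)
```

With perfectly balanced cores the regulator current is zero and all load current is recycled; the larger the imbalance, the more current the SC converter has to handle, which is why imbalance affects efficiency.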
J. Low Power Electron. Appl. 2013, 3(3), 233-249; doi:10.3390/jlpea3030233 - published online 15 July 2013
Abstract: Scaling the voltage to the sub-threshold region is a convincing technique to achieve low power in digital circuits. The problem is that process variability severely impacts the performance of circuits operating in the sub-threshold domain. In this paper, we evaluate the sub-threshold sizing methodology of [1,2] on 40 nm and 90 nm standard cell libraries. The concept of the proposed sizing methodology consists of balancing the mean of the sub-threshold current of the equivalent N and P networks. In this paper, the equivalent N and P networks are derived based on the best and worst case transition times. The slack available in the best-case timing arc is reduced by using smaller transistors on that path, while the timing of the worst-case timing arc is improved by using bigger transistors. The optimization is done such that the overall area remains constant with regard to the area before optimization. Two sizing styles are applied, one is based on both transistor width and length tuning, and the other one is based on width tuning only. Compared to super-threshold libraries, at 0.3 V, the proposed libraries achieve 49% and 89% average cell timing improvement and 55% and 31% power delay product improvement at 40 nm and 90 nm respectively. From ITC (International Test Conference 99) benchmark circuit synthesis results, at 0.3 V the proposed library achieves up to 52% timing improvement and 53% power savings in the 40 nm technology node.