Previous Issue
Volume 12, June
 
 

J. Low Power Electron. Appl., Volume 12, Issue 3 (September 2022) – 12 articles

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive table of contents of newly released issues.
  • PDF is the official format for papers published in both, html and pdf forms. To view the papers in pdf format, click on the "PDF Full-text" link, and use the free Adobe Readerexternal link to open them.
Order results
Result details
Select all
Export citation of selected articles as:
Article
High-Speed and Energy-Efficient Carry Look-Ahead Adder
J. Low Power Electron. Appl. 2022, 12(3), 46; https://doi.org/10.3390/jlpea12030046 - 10 Aug 2022
Viewed by 209
Abstract
The carry look-ahead adder (CLA) is well known among the family of high-speed adders. However, a conventional CLA is not faster than other high-speed adders such as a conditional sum adder (CSA), a carry-select adder (CSLA), and the Kogge–Stone adder (KSA), which is [...] Read more.
The carry look-ahead adder (CLA) is well known among the family of high-speed adders. However, a conventional CLA is not faster than other high-speed adders such as a conditional sum adder (CSA), a carry-select adder (CSLA), and the Kogge–Stone adder (KSA), which is the fastest parallel-prefix adder. Further, in terms of power-delay product (PDP) that characterizes the energy of digital circuits, the conventional CLA is not efficient compared to CSLA and KSA. In this context, this paper presents a high-speed and energy-efficient architecture for the CLA. Many adders ranging from ripple carry to parallel-prefix adders were implemented using a 32-28 nm CMOS standard digital cell library by considering a 32-bit addition. The adders were structurally described in Verilog and synthesized using Synopsys Design Compiler. From the results obtained, it is observed that the proposed CLA achieves a reduction in critical path delay by 55.3% and a reduction in PDP by 45% compared to the conventional CLA. Compared to the CSA, the proposed CLA achieves a reduction in critical path delay by 33.9%, a reduction in power by 26.1%, and a reduction in PDP by 51.1%. Compared to an optimized CSLA, the proposed CLA achieves a reduction in power by 35.4%, a reduction in area by 37.3%, and a reduction in PDP by 37.1% without sacrificing the speed. Although the KSA is faster, the proposed CLA achieves a reduction in power by 39.6%, a reduction in PDP by 6.5%, and a reduction in area by 55.6% in comparison. Full article
Show Figures

Figure 1

Article
Computer Engineering Education Experiences with RISC-V Architectures—From Computer Architecture to Microcontrollers
J. Low Power Electron. Appl. 2022, 12(3), 45; https://doi.org/10.3390/jlpea12030045 - 09 Aug 2022
Viewed by 233
Abstract
With the growing popularity of RISC-V and various open-source released RISC-V processors, it is now possible for computer engineers students to explore this simple and relevant architecture, and also, these students can explore and design a microcontroller at a low-level using real tool-flows [...] Read more.
With the growing popularity of RISC-V and various open-source released RISC-V processors, it is now possible for computer engineers students to explore this simple and relevant architecture, and also, these students can explore and design a microcontroller at a low-level using real tool-flows and implement and test their hardware. In this work, we describe our experiences with undergraduate engineers building RISC-V architectures on an FPGA and then extending their experiences to implement an Arduino-like RISC-V tool-flow and the respective hardware and software to handle input-output ports, interrupts, hardware timers, and communication protocols. The microcontroller is implemented on an FPGA as a Senior Design project to test the viability of such efforts. In this work, we will explain how undergraduates can achieve these experiences including preparation for these projects, the tool-flows they use, the challenges in understanding and extending a RISC-V processor with microcontroller functionality, and a suggestion of how to integrate this learning into an existing curriculum, including a discussion on if we should include these deeper experiences in the Computer Engineering undergraduate curriculum. Full article
(This article belongs to the Special Issue RISC-V Architectures and Systems: Hardware and Software Perspectives)
Review
Review on the Basic Circuit Elements and Memristor Interpretation: Analysis, Technology and Applications
J. Low Power Electron. Appl. 2022, 12(3), 44; https://doi.org/10.3390/jlpea12030044 - 03 Aug 2022
Viewed by 429
Abstract
Circuit or electronic components are useful elements allowing the realization of different circuit functionalities. The resistor, capacitor and inductor represent the three commonly known basic passive circuit elements owing to their fundamental nature relating them to the four circuit variables, namely voltage, magnetic [...] Read more.
Circuit or electronic components are useful elements allowing the realization of different circuit functionalities. The resistor, capacitor and inductor represent the three commonly known basic passive circuit elements owing to their fundamental nature relating them to the four circuit variables, namely voltage, magnetic flux, current and electric charge. The memory resistor (or memristor) was claimed to be the fourth basic passive circuit element, complementing the resistor, capacitor and inductor. This paper presents a review on the four basic passive circuit elements. After a brief recall on the first three known basic passive circuit elements, a thorough description of the memristor follows. Memristor sparks interest in the scientific community due to its interesting features, for example nano-scalability, memory capability, conductance modulation, connection flexibility and compatibility with CMOS technology, etc. These features among many others are currently in high demand on an industrial scale. For this reason, thousands of memristor-based applications are reported. Hence, the paper presents an in-depth overview of the philosophical argumentations of memristor, technologies and applications. Full article
Article
A Subthreshold Layout Strategy for Faster and Lower Energy Complex Digital Circuits
J. Low Power Electron. Appl. 2022, 12(3), 43; https://doi.org/10.3390/jlpea12030043 - 02 Aug 2022
Viewed by 296
Abstract
This work presents complex circuitry from subthreshold standard cell libraries created by geometric STI spacer patterning for bulk planar CMOS technology nodes. Performance/leakage granularity enhancement affords safer multi-Vt synthesis in aggressive voltage scaling schemes. Libraries are evaluated in silicon through implementation of 32-bit [...] Read more.
This work presents complex circuitry from subthreshold standard cell libraries created by geometric STI spacer patterning for bulk planar CMOS technology nodes. Performance/leakage granularity enhancement affords safer multi-Vt synthesis in aggressive voltage scaling schemes. Libraries are evaluated in silicon through implementation of 32-bit datapath 128-bit AES cores. Intra-die nominal temperature (20 °C) analysis reveals improvements of up to 8.65×/24% MEP-to-MEP in frequency and energy-per-cycle respectively, compared to a state-of-the-art subthreshold library. A negative temperature correlation with performance enhancement is demonstrated extending beyond the cell level and into more complex designs. MEP-to-MEP performance enhancement and energy-per-cycle reduction are demonstrated over a temperature range of 0 °C to 85 °C. Full article
Show Figures

Figure 1

Article
The Benefits and Costs of Netlist Randomization Based Side-Channel Countermeasures: An In-Depth Evaluation
J. Low Power Electron. Appl. 2022, 12(3), 42; https://doi.org/10.3390/jlpea12030042 - 23 Jul 2022
Viewed by 353
Abstract
Exchanging FPGA-based implementations of cryptographic algorithms during run-time using netlist randomized versions has been introduced recently as a unique countermeasure against side channel attacks. Using partial reconfiguration, it is possible to shuffle between structurally different but functionally similar versions of a cryptographic implementation. [...] Read more.
Exchanging FPGA-based implementations of cryptographic algorithms during run-time using netlist randomized versions has been introduced recently as a unique countermeasure against side channel attacks. Using partial reconfiguration, it is possible to shuffle between structurally different but functionally similar versions of a cryptographic implementation. The resulting varying power profile enhances the resistance against power-based side channel attacks. While side channel leakage is reduced, costs in terms of additional resources and/or lowered throughput are often increased due to the overheads of the required online partial reconfiguration. In this work, we provide an in-depth evaluation of the leakage-area-throughput trade-off. Full article
(This article belongs to the Special Issue Low-Power Hardware Security)
Article
Electrical Impedance Tomography for Hand Gesture Recognition for HMI Interaction Applications
J. Low Power Electron. Appl. 2022, 12(3), 41; https://doi.org/10.3390/jlpea12030041 - 18 Jul 2022
Viewed by 505
Abstract
Electrical impedance tomography (EIT) is based on the physical principle of bioimpedance defined as the opposition that biological tissues exhibit to the flow of a rotating alternating electrical current. Consequently, here, we propose studying the characterization and classification of bioimpedance patterns based on [...] Read more.
Electrical impedance tomography (EIT) is based on the physical principle of bioimpedance defined as the opposition that biological tissues exhibit to the flow of a rotating alternating electrical current. Consequently, here, we propose studying the characterization and classification of bioimpedance patterns based on EIT by measuring, on the forearm with eight electrodes in a non-invasive way, the potential drops resulting from the execution of six hand gestures. The starting point was the acquisition of bioimpedance patterns studied by means of principal component analysis (PCA), validated through the cross-validation technique, and classified using the k-nearest neighbor (kNN) classification algorithm. As a result, it is concluded that reduction and classification is feasible, with a sensitivity of 0.89 in the worst case, for each of the reduced bioimpedance patterns, leading to the following direct advantage: a reduction in the numbers of electrodes and electronics required. In this work, bioimpedance patterns were investigated for monitoring subjects’ mobility, where, generally, these solutions are based on a sensor system with moving parts that suffer from significant problems of wear, lack of adaptability to the patient, and lack of resolution. Whereas, the proposal implemented in this prototype, based on the so-called electrical impedance tomography, does not have these problems. Full article
Show Figures

Figure 1

Article
Dynamic SIMD Parallel Execution on GPU from High-Level Dataflow Synthesis
J. Low Power Electron. Appl. 2022, 12(3), 40; https://doi.org/10.3390/jlpea12030040 - 17 Jul 2022
Viewed by 407
Abstract
Developing and fine-tuning software programs for heterogeneous hardware such as CPU/GPU processing platforms comprise a highly complex endeavor that demands considerable time and effort of software engineers and requires evaluating various fundamental components and features of both the design and of the platform [...] Read more.
Developing and fine-tuning software programs for heterogeneous hardware such as CPU/GPU processing platforms comprise a highly complex endeavor that demands considerable time and effort of software engineers and requires evaluating various fundamental components and features of both the design and of the platform to maximize the overall performance. The dataflow programming approach has proven to be an appropriate methodology for reaching such a difficult and complex goal for the intrinsic portability and the possibility of easily decomposing a network of actors on different processing units of the heterogeneous hardware. Nonetheless, such a design method might not be enough on its own to achieve the desired performance goals, and supporting tools are useful to be able to efficiently explore the design space so as to optimize the desired performance objectives. This article presents a methodology composed of several stages for enhancing the performance of dataflow software developed in RVC-CAL and generating low-level implementations to be executed on GPU/CPU heterogeneous hardware platforms. The stages are composed of a method for the efficient scheduling of parallel CUDA partitions, an optimization of the performance of the data transmission tasks across computing kernels, and the exploitation of dynamic programming for introducing SIMD-capable graphics processing unit systems. The methodology is validated on both the quantitative and qualitative side by means of dataflow software application examples running on platforms according to various different mapping configurations. Full article
Show Figures

Figure 1

Article
Efficiency of Priority Queue Architectures in FPGA
J. Low Power Electron. Appl. 2022, 12(3), 39; https://doi.org/10.3390/jlpea12030039 - 14 Jul 2022
Viewed by 504
Abstract
This paper presents a novel SRAM-based architecture of a data structure that represents a set of multiple priority queues that can be implemented in FPGA or ASIC. The proposed architecture is based on shift registers, systolic arrays and SRAM memories. Such architecture, called [...] Read more.
This paper presents a novel SRAM-based architecture of a data structure that represents a set of multiple priority queues that can be implemented in FPGA or ASIC. The proposed architecture is based on shift registers, systolic arrays and SRAM memories. Such architecture, called MultiQueue, is optimized for minimum chip area costs, which leads to lower energy consumption too. The MultiQueue architecture has constant time complexity, constant critical path length and constant latency. Therefore, it is highly predictable and very suitable for real-time systems too. The proposed architecture was verified using a simplified version of UVM and applying millions of instructions with randomly generated input values. Achieved FPGA synthesis results are presented and discussed. These results show significant savings in FPGA Look-Up Tables consumption in comparison to existing solutions. More than 63% of Look-Up Tables can be saved using the MultiQueue architecture instead of the existing priority queues. Full article
(This article belongs to the Special Issue Advanced Researches in Embedded Systems)
Show Figures

Figure 1

Article
Analysis and Comparison of Different Approaches to Implementing a Network-Based Parallel Data Processing Algorithm
J. Low Power Electron. Appl. 2022, 12(3), 38; https://doi.org/10.3390/jlpea12030038 - 09 Jul 2022
Viewed by 485
Abstract
It is well known that network-based parallel data processing algorithms are well suited to implementation in reconfigurable hardware recurring to either Field-Programmable Gate Arrays (FPGA) or Programmable Systems-on-Chip (PSoC). The intrinsic parallelism of these devices makes it possible to execute several data-independent network [...] Read more.
It is well known that network-based parallel data processing algorithms are well suited to implementation in reconfigurable hardware recurring to either Field-Programmable Gate Arrays (FPGA) or Programmable Systems-on-Chip (PSoC). The intrinsic parallelism of these devices makes it possible to execute several data-independent network operations in parallel. However, the approaches to designing the respective systems vary significantly with the experience and background of the engineer in charge. In this paper, we analyze and compare the pros and cons of using an embedded processor, high-level synthesis methods, and register-transfer low-level design in terms of design effort, performance, and power consumption for implementing a parallel algorithm to find the two smallest values in a dataset. This problem is easy to formulate, has a number of practical applications (for instance, in low-density parity check decoders), and is very well suited to parallel implementation based on comparator networks. Full article
Show Figures

Figure 1

Review
Deep Learning Approaches to Source Code Analysis for Optimization of Heterogeneous Systems: Recent Results, Challenges and Opportunities
J. Low Power Electron. Appl. 2022, 12(3), 37; https://doi.org/10.3390/jlpea12030037 - 05 Jul 2022
Viewed by 544
Abstract
To cope with the increasing complexity of digital systems programming, deep learning techniques have recently been proposed to enhance software deployment by analysing source code for different purposes, ranging from performance and energy improvement to debugging and security assessment. As embedded platforms for [...] Read more.
To cope with the increasing complexity of digital systems programming, deep learning techniques have recently been proposed to enhance software deployment by analysing source code for different purposes, ranging from performance and energy improvement to debugging and security assessment. As embedded platforms for cyber-physical systems are characterised by increasing heterogeneity and parallelism, one of the most challenging and specific problems is efficiently allocating computational kernels to available hardware resources. In this field, deep learning applied to source code can be a key enabler to face this complexity. However, due to the rapid development of such techniques, it is not easy to understand which of those are suitable and most promising for this class of systems. For this purpose, we discuss recent developments in deep learning for source code analysis, and focus on techniques for kernel mapping on heterogeneous platforms, highlighting recent results, challenges and opportunities for their applications to cyber-physical systems. Full article
Show Figures

Figure 1

Article
Performance Estimation of High-Level Dataflow Program on Heterogeneous Platforms by Dynamic Network Execution
J. Low Power Electron. Appl. 2022, 12(3), 36; https://doi.org/10.3390/jlpea12030036 - 23 Jun 2022
Viewed by 583
Abstract
The performance of programs executed on heterogeneous parallel platforms largely depends on the design choices regarding how to partition the processing on the various different processing units. In other words, it depends on the assumptions and parameters that define the partitioning, mapping, scheduling, [...] Read more.
The performance of programs executed on heterogeneous parallel platforms largely depends on the design choices regarding how to partition the processing on the various different processing units. In other words, it depends on the assumptions and parameters that define the partitioning, mapping, scheduling, and allocation of data exchanges among the various processing elements of the platform executing the program. The advantage of programs written in languages using the dataflow model of computation (MoC) is that executing the program with different configurations and parameter settings does not require rewriting the application software for each configuration setting, but only requires generating a new synthesis of the execution code corresponding to different parameters. The synthesis stage of dataflow programs is usually supported by automatic code generation tools. Another competitive advantage of dataflow software methodologies is that they are well-suited to support designs on heterogeneous parallel systems as they are inherently free of memory access contention issues and naturally expose the available intrinsic parallelism. So as to fully exploit these advantages and to be able to efficiently search the configuration space to find the design points that better satisfy the desired design constraints, it is necessary to develop tools and associated methodologies capable of evaluating the performance of different configurations and to drive the search for good design configurations, according to the desired performance criteria. The number of possible design assumptions and associated parameter settings is usually so large (i.e., the dimensions and size of the design space) that intuition as well as trial and error are clearly unfeasible, inefficient approaches. This paper describes a method for the clock-accurate profiling of software applications developed using the dataflow programming paradigm such as the formal RVL-CAL language. The profiling can be applied when the application program has been compiled and executed on GPU/CPU heterogeneous hardware platforms utilizing two main methodologies, denoted as static and dynamic. This paper also describes how a method for the qualitative evaluation of the performance of such programs as a function of the supplied configuration parameters can be successfully applied to heterogeneous platforms. The technique was illustrated using two different application software examples and several design points. Full article
Show Figures

Figure 1

Article
±0.3V Bulk-Driven Fully Differential Buffer with High Figures of Merit
J. Low Power Electron. Appl. 2022, 12(3), 35; https://doi.org/10.3390/jlpea12030035 - 22 Jun 2022
Viewed by 586
Abstract
A high performance bulk-driven rail-to-rail fully differential buffer operating from ±0.3V supplies in 180 nm CMOS technology is reported. It has a differential–difference input stage and common mode feedback circuits implemented with no-tail, high CMRR bulk-driven pseudo-differential cells. It operates in subthreshold, has [...] Read more.
A high performance bulk-driven rail-to-rail fully differential buffer operating from ±0.3V supplies in 180 nm CMOS technology is reported. It has a differential–difference input stage and common mode feedback circuits implemented with no-tail, high CMRR bulk-driven pseudo-differential cells. It operates in subthreshold, has infinite input impedance, low output impedance (1.4 kΩ), 86.77 dB DC open-loop gain, 172.91 kHz bandwidth and 0.684 μW static power dissipation with a 50-pF load capacitance. The buffer has power efficient class AB operation, a small signal figure of merit FOMSS = 12.69 MHzpFμW−1, a large signal figure of merit FOMLS = 34.89 (V/μs) pFμW−1, CMRR = 102 dB, PSRR+ = 109 dB, PSRR− = 100 dB, 1.1 μV/√Hz input noise spectral density, 0.3 mVrms input noise and 3.5 mV input DC offset voltage. Full article
Show Figures

Figure 1

Previous Issue
Back to TopTop