Special Issue "Low Power AI"

A special issue of Journal of Low Power Electronics and Applications (ISSN 2079-9268).

Deadline for manuscript submissions: closed (1 June 2022) | Viewed by 11216

Special Issue Editor

Dr. Weidong Kuang
E-Mail Website
Guest Editor
Department of Electrical and Computer Engineering, The University of Texas Rio Grande Valley, Edinburg, TX 78539-2999, USA
Interests: machine learning; low power ICs design; asynchronous digital circuits

Special Issue Information

Dear Colleagues,

As artificial intelligence (AI) continues to proliferate into new domains such as IoT, mobile devices, autonomous self-driving cars, mobile devices, drones, etc., one of the obstacles for the deployment of AI algorithms is the power consumption. The primary focus of this special issue is on the exploration of power management, energy efficient techniques and architectures for cognitive computing and machine learning.  

Authors are invited to submit regular papers following the JLPEA submission guidelines, within the remit of this Special Issue call. Topics include but are not limited to 

  • Low power design at device/circuit level for machine learning 
  • Architectures for the edge: IoT, automotive, and mobile 
  • Approximation, quantization reduced precision computing 
  • Neural network pruning, tuning and automatic architecture search 
  • Hardware/software co-design techniques for low power 
  • Neural network architectures for resource constrained devices 
  • Novel memory architectures for machine learning 
  • Communication/computation scheduling for better performance and energy 
  • Load balancing and efficient task distribution techniques 
  • Exploring the balance between precision, performance, power and energy 
  • Exploration of new and efficient applications for machine learning 
  • Energy efficient on-device learning techniques 

Dr. Weidong Kuang
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Journal of Low Power Electronics and Applications is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (8 papers)

Order results
Result details
Select all
Export citation of selected articles as:

Research

Article
The Potential of SoC FPAAs for Emerging Ultra-Low-Power Machine Learning
J. Low Power Electron. Appl. 2022, 12(2), 33; https://doi.org/10.3390/jlpea12020033 - 06 Jun 2022
Cited by 1 | Viewed by 1267
Abstract
Large-scale field-programmable analog arrays (FPAA) have the potential to handle machine inference and learning applications with significantly low energy requirements, potentially alleviating the high cost of these processes today, even in cloud-based systems. FPAA devices enable embedded machine learning, one form of physical [...] Read more.
Large-scale field-programmable analog arrays (FPAA) have the potential to handle machine inference and learning applications with significantly low energy requirements, potentially alleviating the high cost of these processes today, even in cloud-based systems. FPAA devices enable embedded machine learning, one form of physical mixed-signal computing, enabling machine learning and inference on low-power embedded platforms, particularly edge platforms. This discussion reviews the current capabilities of large-scale field-programmable analog arrays (FPAA), as well as considering the future potential of these SoC FPAA devices, including questions that enable ubiquitous use of FPAA devices similar to FPGA devices. Today’s FPAA devices include integrated analog and digital fabric, as well as specialized processors and infrastructure, becoming a platform of mixed-signal development and analog-enabled computing. We address and show that next-generation FPAAs can handle the required load of 10,000–10,000,000,000 PMAC, required for present and future large fielded applications, at orders of magnitude of lower energy levels than those expected by current technology, motivating the need to develop these new generations of FPAA devices. Full article
(This article belongs to the Special Issue Low Power AI)
Show Figures

Figure 1

Article
Big–Little Adaptive Neural Networks on Low-Power Near-Subthreshold Processors
J. Low Power Electron. Appl. 2022, 12(2), 28; https://doi.org/10.3390/jlpea12020028 - 18 May 2022
Viewed by 1126
Abstract
This paper investigates the energy savings that near-subthreshold processors can obtain in edge AI applications and proposes strategies to improve them while maintaining the accuracy of the application. The selected processors deploy adaptive voltage scaling techniques in which the frequency and voltage levels [...] Read more.
This paper investigates the energy savings that near-subthreshold processors can obtain in edge AI applications and proposes strategies to improve them while maintaining the accuracy of the application. The selected processors deploy adaptive voltage scaling techniques in which the frequency and voltage levels of the processor core are determined at the run-time. In these systems, embedded RAM and flash memory size is typically limited to less than 1 megabyte to save power. This limited memory imposes restrictions on the complexity of the neural networks model that can be mapped to these devices and the required trade-offs between accuracy and battery life. To address these issues, we propose and evaluate alternative ‘big–little’ neural network strategies to improve battery life while maintaining prediction accuracy. The strategies are applied to a human activity recognition application selected as a demonstrator that shows that compared to the original network, the best configurations obtain an energy reduction measured at 80% while maintaining the original level of inference accuracy. Full article
(This article belongs to the Special Issue Low Power AI)
Show Figures

Figure 1

Article
A Generalistic Approach to Machine-Learning-Supported Task Migration on Real-Time Systems
J. Low Power Electron. Appl. 2022, 12(2), 26; https://doi.org/10.3390/jlpea12020026 - 03 May 2022
Viewed by 1212
Abstract
Consolidating tasks to a smaller number of electronic control units (ECUs) is an important strategy for optimizing costs and resources in the automotive industry. In our research, we aim to enable ECU consolidation by migrating tasks at runtime between different ECUs, which adds [...] Read more.
Consolidating tasks to a smaller number of electronic control units (ECUs) is an important strategy for optimizing costs and resources in the automotive industry. In our research, we aim to enable ECU consolidation by migrating tasks at runtime between different ECUs, which adds redundancy and fail-safety capabilities to the system. In this paper, we present a setup with a generalistic and modular architecture that allows for integrating and testing different ECU architectures and machine learning (ML) models. As part of a holistic testbed, we introduce a collection of reproducible tasks, as well as a toolchain that controls the dynamic migration of tasks depending on ECU status and load. The migration is aided by the machine learning predictions on the schedulability analysis of possible future task distributions. To demonstrate the capabilities of the setup, we show its integration with FreeRTOS-based ECUs and two ML models—a long short-term memory (LSTM) network and a spiking neural network—along with a collection of tasks to distribute among the ECUs. Our approach shows a promising potential for machine-learning-based schedulability analysis and enables a comparison between different ML models. Full article
(This article belongs to the Special Issue Low Power AI)
Show Figures

Figure 1

Article
Low-Power Deep Learning Model for Plant Disease Detection for Smart-Hydroponics Using Knowledge Distillation Techniques
J. Low Power Electron. Appl. 2022, 12(2), 24; https://doi.org/10.3390/jlpea12020024 - 26 Apr 2022
Cited by 1 | Viewed by 1515
Abstract
Recent advances in computing allows researchers to propose the automation of hydroponic systems to boost efficiency and reduce manpower demands, hence increasing agricultural produce and profit. A completely automated hydroponic system should be equipped with tools capable of detecting plant diseases in real-time. [...] Read more.
Recent advances in computing allows researchers to propose the automation of hydroponic systems to boost efficiency and reduce manpower demands, hence increasing agricultural produce and profit. A completely automated hydroponic system should be equipped with tools capable of detecting plant diseases in real-time. Despite the availability of deep-learning-based plant disease detection models, the existing models are not designed for an embedded system environment, and the models cannot realistically be deployed on resource-constrained IoT devices such as raspberry pi or a smartphone. Some of the drawbacks of the existing models are the following: high computational resource requirements, high power consumption, dissipates energy rapidly, and occupies large storage space due to large complex structure. Therefore, in this paper, we proposed a low-power deep learning model for plant disease detection using knowledge distillation techniques. The proposed low-power model has a simple network structure of a shallow neural network. The parameters of the model were also reduced by more than 90%. This reduces its computational requirements as well as its power consumption. The proposed low-power model has a maximum power consumption of 6.22 w, which is significantly lower compared to the existing models, and achieved a detection accuracy of 99.4%. Full article
(This article belongs to the Special Issue Low Power AI)
Show Figures

Figure 1

Article
DSCU: Accelerating CNN Inference in FPGAs with Dual Sizes of Compute Unit
J. Low Power Electron. Appl. 2022, 12(1), 11; https://doi.org/10.3390/jlpea12010011 - 13 Feb 2022
Cited by 2 | Viewed by 1333
Abstract
FPGA-based accelerators have shown great potential in improving the performance of CNN inference. However, the existing FPGA-based approaches suffer from a low compute unit (CU) efficiency due to their large number of redundant computations, thus leading to high levels of performance degradation. In [...] Read more.
FPGA-based accelerators have shown great potential in improving the performance of CNN inference. However, the existing FPGA-based approaches suffer from a low compute unit (CU) efficiency due to their large number of redundant computations, thus leading to high levels of performance degradation. In this paper, we show that no single CU can perform best across all the convolutional layers (CONV-layers). To this end, we propose the use of dual sizes of compute unit (DSCU), an approach that aims to accelerate CNN inference in FPGAs. The key idea of DSCU is to select the best combination of CUs via dynamic programming scheduling for each CONV-layer and then assemble each CONV-layer combination into a computing solution for the given CNN to deploy in FPGAs. The experimental results show that DSCU can achieve a performance density of 3.36 × 103 GOPs/slice on a Xilinx Zynq ZU3EG, which is 4.29 times higher than that achieved by other approaches. Full article
(This article belongs to the Special Issue Low Power AI)
Show Figures

Figure 1

Article
Mapping Transformation Enabled High-Performance and Low-Energy Memristor-Based DNNs
J. Low Power Electron. Appl. 2022, 12(1), 10; https://doi.org/10.3390/jlpea12010010 - 10 Feb 2022
Viewed by 1479
Abstract
When deep neural network (DNN) is extensively utilized for edge AI (Artificial Intelligence), for example, the Internet of things (IoT) and autonomous vehicles, it makes CMOS (Complementary Metal Oxide Semiconductor)-based conventional computers suffer from overly large computing loads. Memristor-based devices are emerging as [...] Read more.
When deep neural network (DNN) is extensively utilized for edge AI (Artificial Intelligence), for example, the Internet of things (IoT) and autonomous vehicles, it makes CMOS (Complementary Metal Oxide Semiconductor)-based conventional computers suffer from overly large computing loads. Memristor-based devices are emerging as an option to conduct computing in memory for DNNs to make them faster, much more energy efficient, and accurate. Despite having excellent properties, the memristor-based DNNs are yet to be commercially available because of Stuck-At-Fault (SAF) defects. A Mapping Transformation (MT) method is proposed in this paper to mitigate Stuck-at-Fault (SAF) defects from memristor-based DNNs. First, the weight distribution for the VGG8 model with the CIFAR10 dataset is presented and analyzed. Then, the MT method is used for recovering inference accuracies at 0.1% to 50% SAFs with two typical cases, SA1 (Stuck-At-One): SA0 (Stuck-At-Zero) = 5:1 and 1:5, respectively. The experiment results show that the MT method can recover DNNs to their original inference accuracies (90%) when the ratio of SAFs is smaller than 2.5%. Moreover, even when the SAF is in the extreme condition of 50%, it is still highly efficient to recover the inference accuracy to 80% and 21%. What is more, the MT method acts as a regulator to avoid energy and latency overhead generated by SAFs. Finally, the immunity of the MT Method against non-linearity is investigated, and we conclude that the MT method can benefit accuracy, energy, and latency even with high non-linearity LTP = 4 and LTD = −4. Full article
(This article belongs to the Special Issue Low Power AI)
Show Figures

Figure 1

Article
Hardware/Software Solution for Low Power Evaluation of Tsunami Danger
J. Low Power Electron. Appl. 2022, 12(1), 6; https://doi.org/10.3390/jlpea12010006 - 21 Jan 2022
Cited by 1 | Viewed by 1121
Abstract
Carbon footprint reduction issues have been drawing more and more attention these days. Reducing the energy consumption is among the basic directions along this line. In the paper, a low-energy approach to tsunami danger evaluation is concerned. After several disaster tsunamis of the [...] Read more.
Carbon footprint reduction issues have been drawing more and more attention these days. Reducing the energy consumption is among the basic directions along this line. In the paper, a low-energy approach to tsunami danger evaluation is concerned. After several disaster tsunamis of the XXIst century, the question arises whether is it possible to evaluate in a couple of minutes the tsunami wave parameters, expected at the particular geo location. The point is that it takes around 20 min for the wave to approach the nearest coast after a seismic event offshore of Japan. Currently, the main tool for studying tsunamis is computer modeling. In particular, the expected tsunami height near the coastline, when a major underwater earthquake is detected, can be estimated by a series of numerical experiments of various scenarios of generation and the following wave propagation. Reducing the calculation time of such scenarios and the necessary energy consumption for this is the scope of this study. Moreover, in case of the major earthquake, the electric power shutdown is possible (e.g., the accident at the Fukushima nuclear power station in Japan on 11 May 2011), so the solution should be of low energy-consuming, preferably based at regular personal computers (PCs) or laptops. The way to achieve the requested performance of numerical modeling at the PC platform is a combination of efficient algorithms and their hardware acceleration. Following this strategy, a solution for the fast numerical simulation of tsunami wave propagation has been proposed. Most of tsunami researchers use the shallow-water approximation to simulate tsunami wave propagation at deep water areas. For software implementation, the MacCormack finite-difference scheme has been chosen, as it is suitable for pipelining. For hardware code acceleration, a special processor, that is, the calculator, has been designed at a field-programmable gate array (FPGA) platform. This combination was tested in terms of precision by comparison with the reference code and with the exact solutions (known for some special cases of the bottom profile). The achieved performance made it possible to calculate the wave propagation over a 1000 × 500 km water area in 1 min (the mesh size was compared to 250 m). It was nearly 300 times faster compared to that of a regular PC and 10 times faster compared to the use of a central processing unit (CPU). This result, being implemented into tsunami warning systems, will make it possible to reduce human casualties and economy losses for the so-called near-field tsunamis. The presented paper discussed the new aspect of such implementation, namely low energy consumption. The corresponding measurements for three platforms (PC and two types of FPGA) have been performed, and a comparison of the obtained results of energy consumption was given. As the numerical simulation of numerous tsunami propagation scenarios from different sources are needed for the purpose of coastal tsunami zoning, the integrated amount of the saving energy is expected to be really valuable. For the time being, tsunami researchers have not used the FPGA-based acceleration of computer code execution. Perhaps, the energy-saving aspect is able to promote the use of FPGAs in tsunami researches. The approach to designing special FPGA-based processors for the fast solution of various engineering problems using a PC could be extended to other areas, such as bioinformatics (motif search in DNA sequences and other algorithms of genome analysis and molecular dynamics) and seismic data processing (three-dimensional (3D) wave package decomposition, data compression, noise suppression, etc.). Full article
(This article belongs to the Special Issue Low Power AI)
Show Figures

Figure 1

Article
CORDIC Hardware Acceleration Using DMA-Based ISA Extension
J. Low Power Electron. Appl. 2022, 12(1), 4; https://doi.org/10.3390/jlpea12010004 - 15 Jan 2022
Cited by 2 | Viewed by 1310
Abstract
The use of RISC-based embedded processors aimed at low cost and low power is becoming an increasingly popular ecosystem for both hardware and software development. High-performance yet low-power embedded processors may be attained via the use of hardware acceleration and Instruction Set Architecture [...] Read more.
The use of RISC-based embedded processors aimed at low cost and low power is becoming an increasingly popular ecosystem for both hardware and software development. High-performance yet low-power embedded processors may be attained via the use of hardware acceleration and Instruction Set Architecture (ISA) extension. Recent publications of AI have demonstrated the use of Coordinate Rotation Digital Computer (CORDIC) as a dedicated low-power solution for solving nonlinear equations applied to Neural Networks (NN). This paper proposes ISA extension to support floating-point CORDIC, providing efficient hardware acceleration for mathematical functions. A new DMA-based ISA extension approach integrated with a pipeline CORDIC accelerator is proposed. The CORDIC ISA extension is directly interfaced with a standard processor data path, allowing efficient implementation of new trigonometric ALU-based custom instructions. The proposed DMA-based CORDIC accelerator can also be used to perform repeated array calculations, offering a significant speedup over software implementations. The proposed accelerator is evaluated on Intel Cyclone-IV FPGA as an extension to Nios processor. Experimental results show a significant speedup of over three orders of magnitude compared with software implementation, while applied to trigonometric arrays, and outperforms the existing commercial CORDIC hardware accelerator. Full article
(This article belongs to the Special Issue Low Power AI)
Show Figures

Figure 1

Back to TopTop