Special Issue "Low-Power Techniques for Embedded Systems and Network-on-Chip Architectures"

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Networks".

Deadline for manuscript submissions: closed (31 July 2020).

Special Issue Editor

Dr. Davide Patti
Guest Editor

Special Issue Information

Dear Colleagues,

Last-generation embedded systems have evolved from traditional standalone systems into complex environments where computational elements interact tightly with physical entities such as sensor networks and I/O devices.

The mobility and pervasiveness requirements of such environments impose power and energy consumption constraints that must be met despite growing computational demands, driven by the processing of large amounts of data from sensing and input devices.

This Special Issue will explore emerging approaches, ideas, and contributions addressing the challenges in the design of energy-efficient, computation-centric embedded systems and on-chip networks.

Potential topics include, but are not limited to:

  • Approximate computing for energy-efficient applications
  • Novel architectures for embedded low-power computing
  • Communication infrastructures for energy-efficient IoT environments
  • Energy-efficient neural networks
  • Energy-aware parallel architectures for high-performance computing
  • Design platforms and tools for optimizing energy/performance trade-offs

Dr. Davide Patti
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for the submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • embedded systems
  • network-on-chip
  • approximate computing
  • power and energy
  • cyber-physical systems
  • neural networks
  • Internet of Things

Published Papers (8 papers)


Research

Open Access Article
Unified System Network Architecture: Flexible and Area-Efficient NoC Architecture with Multiple Ports and Cores
Electronics 2020, 9(8), 1316; https://doi.org/10.3390/electronics9081316 - 15 Aug 2020
Abstract
In recent years, as semiconductor manufacturing processes have been steadily scaled down, the transistor count fabricated on a single silicon die has reached up to a billion units. Therefore, current multiprocessor system-on-chips (MPSoCs) can include hundreds or even thousands of cores and additional accelerators for high-performance systems. Network-on-chips (NoCs) have become an attractive solution for interconnects, which are critical components of MPSoCs in terms of system performance. In this study, a highly flexible and area-efficient NoC architecture, the unified system network architecture (USNA), which can be tailored to various topologies, is proposed. The USNA provides high flexibility in port placement with varying numbers of local cores and router linkers. It also supports quality-of-service operation for both the router and the linker. The network performance (e.g., average latency and saturation throughput) and implementation cost of the USNA were investigated for various network configurations with the same number of local cores under uniform random traffic conditions. According to the simulation results, the performance of the USNA is better than or similar to that of other NoCs, with a significantly smaller area and lower power consumption.

Open Access Article
Towards Efficient Neuromorphic Hardware: Unsupervised Adaptive Neuron Pruning
Electronics 2020, 9(7), 1059; https://doi.org/10.3390/electronics9071059 - 27 Jun 2020
Cited by 2
Abstract
To solve real-time challenges, neuromorphic systems generally require deep and complex network structures. Thus, it is crucial to search for effective solutions that can reduce network complexity, improve energy efficiency, and maintain high accuracy. To this end, we propose unsupervised strategies for pruning neurons during training in spiking neural networks (SNNs) by exploiting network dynamics. Neuron importance is judged by activity: neurons that fire more spikes contribute more to network performance. Based on this criterion, we demonstrate that pruning with an adaptive spike-count threshold provides a simple and effective approach that can reduce network size significantly while maintaining high classification accuracy. Online adaptive pruning shows potential for developing energy-efficient training techniques, owing to fewer memory accesses and less weight-update computation. Furthermore, a parallel digital implementation scheme is proposed for implementing SNNs on field-programmable gate arrays (FPGAs). Notably, the proposed pruning strategies preserve the dense format of the weight matrices, so the implementation architecture remains the same after network compression. The adaptive pruning strategy enables a 2.3× reduction in memory size and a 2.8× improvement in energy efficiency when 400 neurons are pruned from an 800-neuron network, with a classification accuracy loss of only 1.69%. The best choice of pruning percentage depends on the trade-off among accuracy, memory, and energy. This work therefore offers a promising solution for effective network compression and energy-efficient hardware implementation of neuromorphic systems in real-time applications.
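The adaptive spike-count pruning idea can be sketched in a few lines. This is a minimal illustration of the general criterion described above (prune the least-active neurons against a threshold derived from the current activity distribution); the function names, the percentile-based threshold, and the toy sizes are our assumptions, not the paper's exact procedure:

```python
import random

def prune_by_spike_count(spike_counts, weights, keep_fraction=0.5):
    """Prune the least-active neurons: neurons whose cumulative spike
    count falls below an adaptive threshold (here, the value below which
    a (1 - keep_fraction) share of the population falls) are removed.

    spike_counts : spikes emitted by each hidden neuron during training
    weights      : input weight matrix as list of rows, one column per neuron
    Returns the compacted weight matrix and the surviving neuron indices.
    """
    ranked = sorted(spike_counts)
    # Adaptive threshold derived from the current activity distribution
    cut = ranked[int(len(ranked) * (1.0 - keep_fraction))]
    keep = [i for i, c in enumerate(spike_counts) if c >= cut]
    # Dense weight format is preserved: whole columns are dropped,
    # not individual weights
    pruned = [[row[i] for i in keep] for row in weights]
    return pruned, keep

rng = random.Random(0)
counts = [rng.randrange(100) for _ in range(800)]          # toy spike counts
W = [[rng.random() for _ in range(800)] for _ in range(16)]  # toy weights
W_small, kept = prune_by_spike_count(counts, W, keep_fraction=0.5)
```

Because whole columns are dropped, the surviving weight matrix stays dense, which matches the paper's observation that the implementation architecture is unchanged after compression.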

Open Access Feature Paper Article
Delta Multi-Stage Interconnection Networks for Scalable Wireless On-Chip Communication
Electronics 2020, 9(6), 913; https://doi.org/10.3390/electronics9060913 - 30 May 2020
Cited by 3
Abstract
The Network-on-Chip (NoC) paradigm emerged as a viable solution to provide an efficient and scalable communication backbone for next-generation Multiprocessor Systems-on-Chip. As the number of integrated cores keeps growing, alternatives to the traditional multi-hop wired NoCs, such as wireless Networks-on-Chip (WiNoCs), have been proposed to provide long-range communication in a single hop. In this work, we propose and analyze the integration of Delta multistage interconnection networks (MINs) as a backbone for wireless-enabled NoCs. After extending the well-known Noxim platform to implement a cycle-accurate model of a wireless Delta MIN, we perform a comprehensive set of SystemC simulations to analyze how wireless-augmented Delta MINs can improve both average delay and saturation. Further, we compare the results with traditional mesh-based topologies, reporting energy profiles that show an overall reduction in energy cost in both the wired and the wireless scenarios.
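Delta MINs are attractive partly because they are self-routing: the destination address itself steers the packet, one bit per switch stage. A minimal sketch of that property (assuming 2×2 switches and MSB-first destination tags; the function name and setup are ours, not the paper's):

```python
def delta_route(dest, stages):
    """Self-routing in a delta-class MIN built from 2x2 switches: at
    stage k the switch inspects bit k of the destination address (MSB
    first) and forwards the packet to output port 0 or 1 accordingly.
    Returns the per-stage output-port choices for a packet to `dest`."""
    assert 0 <= dest < 2 ** stages
    ports = []
    for k in range(stages):
        bit = (dest >> (stages - 1 - k)) & 1  # MSB-first destination tag
        ports.append(bit)
    return ports

# A 3-stage (8-endpoint) network: destination 5 = 0b101
assert delta_route(5, 3) == [1, 0, 1]
```

No routing tables are needed anywhere in the network, which is one reason MINs scale well as a backbone.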

Open Access Article
Fuzzy-Based Thermal Management Scheme for 3D Chip Multicores with Stacked Caches
Electronics 2020, 9(2), 346; https://doi.org/10.3390/electronics9020346 - 18 Feb 2020
Cited by 1
Abstract
By using through-silicon vias (TSVs), three-dimensional integration technology can stack a large memory on top of the cores as a last-level on-chip cache (LLC) to reduce off-chip memory accesses and enhance system performance. However, integrating more on-chip cache increases chip power density, which can lead to temperature-related issues in power consumption, reliability, cooling cost, and performance. An effective thermal management scheme is required to ensure the performance and reliability of the system. In this study, a fuzzy-based thermal management scheme (FBTM) is proposed that considers cores and stacked caches simultaneously. The proposed method combines a dynamic cache reconfiguration scheme with a fuzzy-based control policy in a temperature-aware manner. The dynamic cache reconfiguration scheme determines the cache size for each processor core according to the running application, achieving substantial power savings. The fuzzy-based control policy then adjusts the frequency level of the processor core based on the cache reconfiguration, which can further improve system performance. Experiments show that, compared with other thermal management schemes, the proposed FBTM achieves, on average, a 3-degree reduction in temperature and a 41% reduction in leakage energy.
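A fuzzy control policy of this kind maps a crisp temperature reading onto fuzzy sets (e.g., cool/warm/hot) and defuzzifies the rule outputs into a frequency level. The sketch below is illustrative only; the membership breakpoints, rule table, and output levels are our assumptions, not the FBTM's actual parameters:

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b,
    falling to zero at c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzy_freq_level(temp_c):
    """Toy fuzzy DVFS controller: map die temperature (Celsius) to a
    frequency level in [0, 1], where 1 means full speed."""
    cool = tri(temp_c, 20, 45, 70)
    warm = tri(temp_c, 55, 72, 90)
    hot  = tri(temp_c, 80, 100, 121)
    # Rule outputs: cool -> full speed, warm -> half, hot -> quarter;
    # weighted-average defuzzification
    num = cool * 1.0 + warm * 0.5 + hot * 0.25
    den = cool + warm + hot
    return num / den if den else 0.25
```

Overlapping membership functions give a smooth frequency ramp instead of the abrupt steps of a hard-threshold policy, which is the usual motivation for fuzzy control in DVFS.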

Open Access Article
Approximate CPU Design for IoT End-Devices with Learning Capabilities
Electronics 2020, 9(1), 125; https://doi.org/10.3390/electronics9010125 - 09 Jan 2020
Cited by 3
Abstract
With the rise of the Internet of Things (IoT), low-cost resource-constrained devices have to be more capable than traditional embedded systems while operating on stringent power budgets. In order to add new capabilities such as learning, power consumption planning has to be revised. Approximate computing is a promising paradigm for reducing power consumption at the expense of some inaccuracy introduced into the computations. In this paper, we set forth the approximate computing features of a processor for the next generation of low-cost, resource-constrained, learning-capable IoT devices. Based on these features, we design an approximate IoT processor built on the RISC-V ISA. Targeting machine learning applications such as classification and clustering, we demonstrate that our processor, reinforced with approximate operations, can save up to 23% power in an ASIC implementation while achieving at least 90% top-1 accuracy on the trained models and test data set.
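Approximate operations of this kind typically trade low-order precision for power. As a toy model (not the paper's actual circuit), an approximate multiplier can truncate the least significant bits of each operand, shrinking the hardware multiplier at the cost of a small, bounded relative error:

```python
def approx_mul(a, b, drop_bits=4):
    """Approximate unsigned multiply: zero out the `drop_bits` least
    significant bits of each operand before multiplying. In hardware
    this lets the multiplier operate on narrower inputs; here it just
    models the resulting numerical error."""
    mask = ~((1 << drop_bits) - 1)
    return (a & mask) * (b & mask)

exact = 1000 * 3000
approx = approx_mul(1000, 3000, drop_bits=4)
rel_err = abs(exact - approx) / exact  # small for large operands
```

Error-tolerant workloads like classification absorb this kind of noise well, which is why approximate arithmetic pairs naturally with on-device learning.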

Open Access Article
Improved Ant Colony Algorithm Based on Task Scale in Network on Chip (NoC) Mapping
Electronics 2020, 9(1), 6; https://doi.org/10.3390/electronics9010006 - 19 Dec 2019
Cited by 2
Abstract
Multi-core processors integrate multiple computing units on one chip. As this technology matures, communication between cores has become a major research topic. As the number of cores continues to increase, the traditional bus structure can no longer satisfy the communication demands of multi-core processors. Network-on-chip (NoC) connects components through routing, which greatly enhances communication efficiency. However, its communication power consumption and network latency are issues that cannot be ignored, and an efficient mapping algorithm is an effective way to reduce both. This paper proposes such a mapping method. First, tasks are divided according to their scale. When the task scale is small, a given NoC substructure is selected for mapping to reduce the communication distance between resource nodes; when the task scale is large, the tasks are clustered so that tasks with dependencies are assigned to the same resource node, reducing inter-task communication. The tasks are then mapped using an improved ant colony optimization (ACO) algorithm. The proposed method was experimentally verified on NoC platforms of different scales, and the results show that it is very effective in reducing communication power and network latency during NoC mapping.
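The core of ACO-based mapping is a pheromone table that biases task-to-node assignments toward previously low-cost mappings. The following is a deliberately minimal sketch on a 2×2 mesh (the parameter values, traffic table, and plain pheromone update are our assumptions; the paper's improved ACO and task-scale partitioning are not reproduced):

```python
import random

def hops(p, q, cols=2):
    """Manhattan hop distance between two nodes of a 2x2 mesh."""
    return abs(p // cols - q // cols) + abs(p % cols - q % cols)

def comm_cost(mapping, traffic):
    """Total communication volume weighted by hop distance."""
    return sum(vol * hops(mapping[i], mapping[j])
               for (i, j), vol in traffic.items())

def aco_map(traffic, n_tasks=4, n_nodes=4, ants=20, iters=30, rho=0.5, seed=1):
    """Minimal ant-colony sketch for NoC task mapping: each ant assigns
    tasks to free nodes with probability proportional to the pheromone
    tau[task][node]; pheromone evaporates each iteration and is
    reinforced in proportion to the inverse cost of each ant's mapping."""
    rnd = random.Random(seed)
    tau = [[1.0] * n_nodes for _ in range(n_tasks)]
    best, best_cost = None, float("inf")
    for _ in range(iters):
        for _ in range(ants):
            free = list(range(n_nodes))
            mapping = {}
            for t in range(n_tasks):
                node = rnd.choices(free, weights=[tau[t][n] for n in free])[0]
                mapping[t] = node
                free.remove(node)
            c = comm_cost(mapping, traffic)
            if c < best_cost:
                best, best_cost = mapping, c
            for t, n in mapping.items():          # reinforce
                tau[t][n] += 1.0 / (1.0 + c)
        for t in range(n_tasks):                  # evaporate
            tau[t] = [rho * x for x in tau[t]]
    return best, best_cost

# Toy task graph: task 0 talks heavily to task 1, task 2 to task 3
traffic = {(0, 1): 10, (2, 3): 10, (0, 2): 1}
mapping, cost = aco_map(traffic)
```

On this toy instance the heavy pairs end up on adjacent mesh nodes, which is exactly the behavior a good mapping algorithm should exhibit.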

Open Access Article
Router-Integrated Cache Hierarchy Design for Highly Parallel Computing in Efficient CMP Systems
Electronics 2019, 8(11), 1363; https://doi.org/10.3390/electronics8111363 - 17 Nov 2019
Cited by 1
Abstract
In current Chip Multi-Processor (CMP) systems, data sharing in the cache hierarchy is a critical issue that costs many clock cycles for maintaining data coherence. As the number of integrated cores increases, the single shared cache serves too many processing threads to maintain shared data efficiently. In this work, an enhanced router network is integrated into the private cache level to quickly interconnect shared data accesses from different threads. Through experimental pattern analysis, all shared data accesses at the private cache level can be classified into seven access types. Both shared accesses and thread-crossing accesses can then be rapidly detected and handled in the proposed router network. As a result, the access latency of the private cache is decreased and the conventional coherence-traffic problem is alleviated. The proposed path operates in three steps. First, target accesses are detected by exploration in the router network. Then, the proposed replacement logic handles those accesses to maintain data coherence. Finally, the accesses are delivered by the proposed data deliverer. Harmful data-sharing accesses are thus resolved within the first chip layer of the 3D-IC structure. The proposed system was implemented in a cycle-accurate simulation platform, and experimental results show that our model can improve the Instructions Per Cycle (IPC) of on-chip execution by up to 31.85%, while saving about 17.61% of energy consumption compared to the base system.

Open Access Article
Intelligent Mapping Method for Power Consumption and Delay Optimization Based on Heterogeneous NoC Platform
Electronics 2019, 8(8), 912; https://doi.org/10.3390/electronics8080912 - 19 Aug 2019
Cited by 4
Abstract
As integrated circuit processes become more advanced, feature sizes become smaller and smaller, and more and more processing cores and memory components are integrated on a single chip. However, traditional bus-based System-on-Chip (SoC) communication is inefficient, scales poorly, and cannot handle the communication between processing cores well. Network-on-chip (NoC) has become an important development direction in this field by virtue of its efficient data transmission between multiple cores and its scalability. The mapping problem is a hot spot in NoC research, and mapping results directly affect the power consumption, latency, and other properties of the chip. Since mapping is NP-hard, effectively obtaining low-power, low-latency mapping schemes is a research challenge. To address this problem, this paper proposes TI_GA, a task-mapping algorithm based on an improved genetic algorithm with two populations and an enhanced initial population, targeting the two metrics of power consumption and delay. The quality of each initial individual is optimized while constructing the population, improving the quality of the initial population as a whole. In addition, a two-population genetic mechanism is added to the iterative process of the algorithm. The experimental results show that TI_GA is very effective at optimizing network power consumption and delay for heterogeneous multi-core systems.
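A genetic approach to the same mapping problem evolves a population of task-to-node permutations under a communication-cost fitness. The sketch below is illustrative only: it uses plain elitist selection with swap mutation on a 3×3 mesh, and does not reproduce TI_GA's two-population mechanism or its enhanced initial population:

```python
import random

def hop(p, q, cols=3):
    """Manhattan hop distance on a 3x3 mesh."""
    return abs(p // cols - q // cols) + abs(p % cols - q % cols)

def cost(perm, traffic):
    """Weighted communication cost of mapping task i -> node perm[i]."""
    return sum(v * hop(perm[i], perm[j]) for (i, j), v in traffic.items())

def ga_map(traffic, n=9, pop_size=30, gens=40, seed=7):
    """Minimal GA for NoC mapping: individuals are permutations; each
    generation keeps the best half (elitism) and fills the other half
    with swap-mutated copies of the survivors."""
    rnd = random.Random(seed)
    pop = [rnd.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(gens):
        survivors = sorted(pop, key=lambda p: cost(p, traffic))[:pop_size // 2]
        children = []
        for p in survivors:
            child = p[:]
            i, j = rnd.sample(range(n), 2)   # swap mutation
            child[i], child[j] = child[j], child[i]
            children.append(child)
        pop = survivors + children
    best = min(pop, key=lambda p: cost(p, traffic))
    return best, cost(best, traffic)

# Toy task graph: a small communication chain plus one light pair
traffic = {(0, 1): 8, (1, 2): 8, (3, 4): 5}
best, c = ga_map(traffic)
```

Seeding part of the initial population with good individuals, as TI_GA does, would replace the purely random `rnd.sample` initialization above with cost-aware constructions.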
