Special Issue "Multi-Core Systems-On-Chips Design and Optimization"

A special issue of Computers (ISSN 2073-431X).

Deadline for manuscript submissions: closed (28 February 2018)

Special Issue Editor

Guest Editor
Assoc. Prof. Ann Gordon-Ross

Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611, USA
Interests: computer engineering; low-power design; embedded systems; reconfigurable computing; platform design; aerospace systems; fault tolerance; dynamic optimizations; hardware design; real-time systems; computer architecture; multi-core platforms

Special Issue Information

Dear Colleagues,

Systems-on-Chip (SoC) design has rapidly evolved from simple uni-core systems to complex multi-core systems with tens to tens of thousands of heterogeneous cores that communicate and cooperate via complex communication networks and shared resources. Traditional system design and optimization methods are becoming infeasible. This necessitates new research contributions in scalable and adaptive solutions that efficiently and effectively leverage available resources, which are often unknown and changing, for systems ranging from small handheld/wearable devices to large-scale datacenters and exascale high-performance computing. Multi-core systems must be able to quickly reconfigure/adapt their operation by monitoring runtime environmental and resource conditions to maintain performance/power requirements. Consequently, innovative architectural and system design, optimization approaches, design exploration assistance, etc., have become crucial to achieving system performance and efficiency goals.

This Special Issue addresses all aspects of energy-, power- and/or performance-efficient computing and system design for complex multi-core systems. Authors are invited to submit original papers in (but not limited to) the following topics:

    • Tools, methodologies, and design techniques
    • Adaptive runtime optimization approaches, and application mapping, scheduling, and resource sharing based on changing runtime conditions
    • Aging-aware design, energy- and thermal-related reliability issues
    • Efficient off-chip/on-chip communication architectures, including networks-on-chip
    • Shared memory architectures and technologies (e.g., coherence protocols)

Prof. Dr. Ann Gordon-Ross

Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Computers is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 350 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (8 papers)

Research

Open Access Article ASIR: Application-Specific Instruction-Set Router for NoC-Based MPSoCs
Received: 30 April 2018 / Revised: 9 June 2018 / Accepted: 22 June 2018 / Published: 27 June 2018
Abstract
The end of Dennard scaling led to the use of heterogeneous multi-processor systems-on-chip (MPSoCs). Heterogeneous MPSoCs provide high energy and performance efficiency because each processing element can be optimized for an application task. However, the evolution of MPSoCs shows a growing number of processing elements (PEs), which leads to tremendous communication costs that tend to become the performance bottleneck. Networks-on-chip (NoCs) are a promising and scalable intra-chip communication technology for MPSoCs. However, these technological advances require novel and effective programming methodologies to efficiently exploit them. This work presents a novel router architecture called application-specific instruction-set router (ASIR) for field-programmable gate array (FPGA)-based MPSoCs. It combines data transfers with application-specific processing by adding high-level synthesized processing units to routers of the NoC. The execution of application-specific operations during data exchange between PEs efficiently exploits the transmission time. Furthermore, the processing units can be programmed in C/C++ using high-level synthesis, and accordingly, they can be specifically optimized for an application. This approach enables transferred data to be processed by a processing element, such as a MicroBlaze processor, before the transmission or by a router during the transmission. Moreover, a static mapping algorithm for applications modeled by a Kahn process network-based graph is introduced that maps tasks to the MicroBlaze processors and processing units. The mapping algorithm optimizes the communication cost by allocating tasks to nearest neighboring PEs. This complete methodology significantly simplifies the design and programming of ASIR-based MPSoCs. Furthermore, it efficiently exploits the heterogeneity of processing capabilities inside the routers and MicroBlaze processors.
(This article belongs to the Special Issue Multi-Core Systems-On-Chips Design and Optimization)

Open Access Article Hardware-Assisted Secure Communication in Embedded and Multi-Core Computing Systems
Received: 16 April 2018 / Revised: 7 May 2018 / Accepted: 12 May 2018 / Published: 15 May 2018
Abstract
With the sharp rise in functionality and connectivity in multi-core embedded systems, these systems have become notably vulnerable to security attacks. Conventional software security mechanisms fail to deliver full safety and also affect the system performance significantly. In this paper, a hardware-based security procedure is proposed to handle critical information in real time through comprehensive separation without needing any help from the software. To evaluate the proposed system, an authentication system based on an image processing solution has been implemented on a reconfigurable device. In addition, the proposed security mechanism is evaluated for networks-on-chip, where minimal area, power consumption, and performance overheads are achieved.

Open Access Feature Paper Article Mixed Cryptography Constrained Optimization for Heterogeneous, Multicore, and Distributed Embedded Systems
Received: 28 February 2018 / Revised: 11 April 2018 / Accepted: 22 April 2018 / Published: 24 April 2018
Abstract
Embedded systems continue to execute computation- and memory-intensive applications with vast data sets, dynamic workloads, and dynamic execution characteristics. Adaptive distributed and heterogeneous embedded systems are increasingly critical in supporting dynamic execution requirements. With pervasive network access within these systems, security is a critical design concern that must be considered and optimized within such dynamically adaptive systems. This paper presents a modeling and optimization framework for distributed, heterogeneous embedded systems. A dataflow-based modeling framework for adaptive streaming applications integrates models for computational latency, mixed cryptographic implementations for inter-task and intra-task communication, security levels, communication latency, and power consumption. For the security model, we present a level-based modeling of cryptographic algorithms using mixed cryptographic implementations. This level-based security model enables the development of an efficient, multi-objective genetic optimization algorithm to optimize security and energy consumption subject to current application requirements and security policy constraints. The presented methodology is evaluated using a video-based object detection and tracking application and several synthetic benchmarks representing various application types and dynamic execution characteristics. Experimental results demonstrate the benefits of a mixed cryptographic algorithm security model compared to using a single, fixed cryptographic algorithm. Results also highlight how security policy constraints can yield increased security strength and cryptographic diversity for the same energy constraint.

Open Access Article Designing Domain-Specific Heterogeneous Architectures from Dataflow Programs
Received: 1 March 2018 / Revised: 15 April 2018 / Accepted: 21 April 2018 / Published: 22 April 2018
Abstract
The last ten years have seen performance and power requirements pushing computer architectures from single-core designs towards so-called manycore systems with hundreds of cores on a single chip. To further increase performance and energy efficiency, we are now seeing the development of heterogeneous architectures with specialized and accelerated cores. However, designing these heterogeneous systems is a challenging task due to their inherent complexity. We proposed an approach for designing domain-specific heterogeneous architectures based on instruction augmentation through the integration of hardware accelerators into simple cores. These hardware accelerators were determined based on their common use among applications within a certain domain. The objective was to generate heterogeneous architectures by integrating many of these accelerated cores and connecting them with a network-on-chip. The proposed approach aimed to ease the design of heterogeneous manycore architectures (and, consequently, exploration of the design space) by automating the design steps. To evaluate our approach, we enhanced our software tool chain with a tool that can generate accelerated cores from dataflow programs. This new tool chain was evaluated with the aid of two use cases: radar signal processing and mobile baseband processing. We could achieve an approximately 4× improvement in performance, while executing complete applications on the augmented cores with a small impact (2.5–13%) on area usage. The generated accelerators are competitive, achieving more than 90% of the performance of hand-written implementations.

Open Access Article Feedback-Based Admission Control for Firm Real-Time Task Allocation with Dynamic Voltage and Frequency Scaling
Received: 13 March 2018 / Revised: 12 April 2018 / Accepted: 13 April 2018 / Published: 16 April 2018
Abstract
Feedback-based mechanisms can be employed to monitor the performance of Multiprocessor Systems-on-Chips (MPSoCs) and steer the task execution even if the workload is not known exactly a priori. In particular, traditional proportional-integral controllers can be used with firm real-time tasks to either admit them to the processing cores or reject them so as not to violate the timeliness of the already admitted tasks. During periods with a lower computational power demand, dynamic voltage and frequency scaling (DVFS) can be used to reduce the dissipation of energy in the cores while still not violating the tasks’ time constraints. Depending on the workload pattern and weight, platform size, and the granularity of DVFS, energy savings can reach as much as 60% at the cost of a slight performance degradation.

Open Access Article Scheduling and Tuning for Low Energy in Heterogeneous and Configurable Multicore Systems
Received: 8 March 2018 / Revised: 7 April 2018 / Accepted: 11 April 2018 / Published: 14 April 2018
Abstract
Heterogeneous and configurable multicore systems provide hardware specialization to meet disparate application hardware requirements. However, effective multicore system specialization can require a priori knowledge of the applications, application profiling information, and/or dynamic hardware tuning to schedule and execute applications on the most energy-efficient cores. Furthermore, even though highly disparate core heterogeneity and/or highly configurable parameters with numerous potential parameter values result in more fine-grained specialization and higher energy savings potential, these large design spaces are challenging to explore efficiently. To address these challenges, we propose a novel configuration-subsetted heterogeneous and configurable multicore system, wherein each core offers a small subset of the design space, and propose a novel scheduling and tuning (SaT) algorithm to efficiently exploit the energy savings potential of this system. Our proposed architecture and algorithm require no a priori application knowledge or profiling, and incur minimal runtime overhead. Results reveal energy savings potential and insights on energy trade-offs in heterogeneous, configurable systems.

Open Access Feature Paper Article Low Effort Design Space Exploration Methodology for Configurable Caches
Received: 12 February 2018 / Revised: 16 March 2018 / Accepted: 17 March 2018 / Published: 27 March 2018
Abstract
Designers can reduce design space exploration time and effort using the design space subsetting method, which removes energy-redundant configurations. However, the subsetting method requires a priori knowledge of all applications. We analyze the impact of a priori application knowledge on the subset quality by varying the amount of a priori application information available to designers during design time from no information to a general knowledge of the application domain. The results showed that only a small set of applications representative of the anticipated applications’ general domains reduced the design effort and was sufficient to provide energy savings within 5.6% of the complete, unsubsetted design space. Furthermore, since using a small set of applications was likely to reduce the design space exploration time, we analyze and quantify the impact of a priori application knowledge on the speedup in the execution time to select the desired configurations. The results revealed that a basic knowledge of the anticipated applications reduced the subset design space exploration time by up to 6.6×.

Open Access Feature Paper Article TaPT: Temperature-Aware Dynamic Cache Optimization for Embedded Systems
Received: 24 November 2017 / Revised: 19 December 2017 / Accepted: 20 December 2017 / Published: 22 December 2017
Abstract
Embedded systems have stringent design constraints, which have necessitated much prior research focused on optimizing energy consumption and/or performance. Since embedded systems typically have fewer cooling options, rising temperature, and thus temperature optimization, is an emergent concern. Most embedded systems only dissipate heat by passive convection, due to the absence of dedicated thermal management hardware mechanisms. The embedded system’s temperature not only affects the system’s reliability, but can also affect the performance, power, and cost. Thus, embedded systems require efficient thermal management techniques. However, thermal management can conflict with other optimization objectives, such as execution time and energy consumption. In this paper, we focus on managing the temperature using a synergy of cache optimization and dynamic frequency scaling, while also optimizing the execution time and energy consumption. This paper provides new insights on the impact of cache parameters on efficient temperature-aware cache tuning heuristics. In addition, we present temperature-aware phase-based tuning, TaPT, which determines Pareto optimal clock frequency and cache configurations for fine-grained execution time, energy, and temperature tradeoffs. TaPT enables autonomous system optimization and also allows designers to specify temperature constraints and optimization priorities. Experiments show that TaPT can effectively reduce execution time, energy, and temperature, while imposing minimal hardware overhead.