MDPI - Publisher of Open Access Journals

20 pages, 578 KiB

Open AccessReview

A Survey on RISC-V-Based Machine Learning Ecosystem

by Stavros Kalapothas, Manolis Galetakis, Georgios Flamis, Fotis Plessas and Paris Kitsos

Information 2023, 14(2), 64; https://doi.org/10.3390/info14020064 - 21 Jan 2023

Cited by 24 | Viewed by 16838

In recent years, the advancements in specialized hardware architectures have supported the industry and the research community to address the computation power needed for more enhanced and compute intensive artificial intelligence (AI) algorithms and applications that have already reached a substantial growth, such as in natural language processing (NLP) and computer vision (CV). The developments of open-source hardware (OSH) and the contribution towards the creation of hardware-based accelerators with implication mainly in machine learning (ML), has also been significant. In particular, the reduced instruction-set computer-five (RISC-V) open standard architecture has been widely adopted by a community of researchers and commercial users, worldwide, in numerous openly available implementations. The selection through a plethora of RISC-V processor cores and the mix of architectures and configurations combined with the proliferation of ML software frameworks for ML workloads, is not trivial. In order to facilitate this process, this paper presents a survey focused on the assessment of the ecosystem that entails RISC-V based hardware for creating a classification of system-on-chip (SoC) and CPU cores, along with an inclusive arrangement of the latest released frameworks that have supported open hardware integration for ML applications. Moreover, part of this work is devoted to the challenges that are concerned, such as power efficiency and reliability, when designing and building application with OSH in the AI/ML domain. This study presents a quantitative taxonomy of RISC-V SoC and reveals the opportunities in future research in machine learning with RISC-V open-source hardware architectures. Full article

► Show Figures

Figure 1

12 pages, 2486 KiB

Open AccessEditor’s ChoiceArticle

Efficient Edge-AI Application Deployment for FPGAs

by Stavros Kalapothas, Georgios Flamis and Paris Kitsos

Information 2022, 13(6), 279; https://doi.org/10.3390/info13060279 - 28 May 2022

Cited by 30 | Viewed by 9204

Abstract

Field Programmable Gate Array (FPGA) accelerators have been widely adopted for artificial intelligence (AI) applications on edge devices (Edge-AI) utilizing Deep Neural Networks (DNN) architectures. FPGAs have gained their reputation due to the greater energy efficiency and high parallelism than microcontrollers (MCU) and graphical processing units (GPU), while they are easier to develop and more reconfigurable than the Application Specific Integrated Circuit (ASIC). The development and building of AI applications on resource constraint devices such as FPGAs remains a challenge, however, due to the co-design approach, which requires a valuable expertise in low-level hardware design and in software development. This paper explores the efficacy and the dynamic deployment of hardware accelerated applications on the Kria KV260 development platform based on the Xilinx Kria K26 system-on-module (SoM), which includes a Zynq multiprocessor system-on-chip (MPSoC). The platform supports the Python-based PYNQ framework and maintains a high level of versatility with the support of custom bitstreams (overlays). The demonstration proved the reconfigurabibilty and the overall ease of implementation with low-footprint machine learning (ML) algorithms. Full article

(This article belongs to the Special Issue Design Automation, Computer Engineering, Computer Networks and Social Media (SEEDA-CECNSM 2021))

► Show Figures

Figure 1

23 pages, 8091 KiB

Open AccessFeature PaperArticle

GAINESIS: Generative Artificial Intelligence NEtlists SynthesIS

by Konstantinos G. Liakos, Georgios K. Georgakilas, Fotis C. Plessas and Paris Kitsos

Electronics 2022, 11(2), 245; https://doi.org/10.3390/electronics11020245 - 13 Jan 2022

Cited by 14 | Viewed by 4088

Abstract

A significant problem in the field of hardware security consists of hardware trojan (HT) viruses. The insertion of HTs into a circuit can be applied for each phase of the circuit chain of production. HTs degrade the infected circuit, destroy it or leak encrypted data. Nowadays, efforts are being made to address HTs through machine learning (ML) techniques, mainly for the gate-level netlist (GLN) phase, but there are some restrictions. Specifically, the number and variety of normal and infected circuits that exist through the free public libraries, such as Trust-HUB, are based on the few samples of benchmarks that have been created from circuits large in size. Thus, it is difficult, based on these data, to develop robust ML-based models against HTs. In this paper, we propose a new deep learning (DL) tool named Generative Artificial Intelligence Netlists SynthesIS (GAINESIS). GAINESIS is based on the Wasserstein Conditional Generative Adversarial Network (WCGAN) algorithm and area–power analysis features from the GLN phase and synthesizes new normal and infected circuit samples for this phase. Based on our GAINESIS tool, we synthesized new data sets, different in size, and developed and compared seven ML classifiers. The results demonstrate that our new generated data sets significantly enhance the performance of ML classifiers compared with the initial data set of Trust-HUB. Full article

(This article belongs to the Special Issue Circuits and Systems of Security Applications)

► Show Figures

Figure 1

15 pages, 1891 KiB

Open AccessFeature PaperArticle

Best Practices for the Deployment of Edge Inference: The Conclusions to Start Designing

by Georgios Flamis, Stavros Kalapothas and Paris Kitsos

Electronics 2021, 10(16), 1912; https://doi.org/10.3390/electronics10161912 - 9 Aug 2021

Cited by 6 | Viewed by 4172

Abstract

The number of Artificial Intelligence (AI) and Machine Learning (ML) designs is rapidly increasing and certain concerns are raised on how to start an AI design for edge systems, what are the steps to follow and what are the critical pieces towards the most optimal performance. The complete development flow undergoes two distinct phases; training and inference. During training, all the weights are calculated through optimization and back propagation of the network. The training phase is executed with the use of 32-bit floating point arithmetic as this is the convenient format for GPU platforms. The inference phase on the other hand, uses a trained network with new data. The sensitive optimization and back propagation phases are removed and forward propagation is only used. A much lower bit-width and fixed point arithmetic is used aiming a good result with reduced footprint and power consumption. This study follows the survey based process and it is aimed to provide answers such as to clarify all AI edge hardware design aspects from the concept to the final implementation and evaluation. The technology as frameworks and procedures are presented to the order of execution for a complete design cycle with guaranteed success. Full article

(This article belongs to the Section Artificial Intelligence)

► Show Figures

Figure 1

14 pages, 824 KiB

Open AccessFeature PaperArticle

Compact Hardware Architectures of Enocoro-128v2 Stream Cipher for Constrained Embedded Devices

by Lampros Pyrgas and Paris Kitsos

Electronics 2020, 9(9), 1505; https://doi.org/10.3390/electronics9091505 - 14 Sep 2020

Cited by 4 | Viewed by 3293

Abstract

Lightweight cryptography is a vital and fast growing field in today’s world where billions of constrained devices interact with each other. In this paper, two novel compact architectures of the Enocoro-128v2 stream cipher are presented. The Enocoro-128v2 is part of the ISO/IEC 29192-3 standard. The first architecture has an 8-bit datapath while the second one has a 4-bit datapath. The proposed architectures were implemented on the BASYS3 board (Artix 7 XC7A35T) using the VERILOG hardware description language. The hardware implementation of the proposed 8-bit architecture runs at a 189 MHz clock and reaches a throughput equal to 302 Mbps, while at the same time, it utilizes only 254 Look-up Tables (LUTs) and 330 Flip-flops (FFs). Each round of computations requires 5 clock cycles. The 4-bit implementation has an operating frequency of 204 MHz and reaches a throughput equal to 181 Mbps, with each round requiring 9 clock cycles. The 4-bit implementation utilizes 249 LUTs and 343 FFs. To our knowledge, this is the first time that such implementations of the Enocoro-128v2 are presented. Both implementations utilize a very low number of resources (only 78 FPGA slices are required for the 8-bit architecture and only 83 for the 4-bit one) and the results demonstrate that they are sustainable for area constrained embedded devices. Full article

(This article belongs to the Section Circuit and Signal Processing)

► Show Figures

Figure 1

Search Results (5)

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

Saved Queries

Search Filter Reset All

Years

Feature Papers

Subjects

Journals

Article Types

Countries / Regions

Search Results (5)

Further Information

Guidelines

MDPI Initiatives

Follow MDPI