
146 Results Found

  • Article
  • Open Access
4 Citations
5,719 Views
19 Pages

Hardware–Software Co-Design for Decimal Multiplication

  • Riaz-ul-haque Mian,
  • Michihiro Shintani and
  • Michiko Inoue

27 January 2021

Decimal arithmetic using software is slow for very large-scale applications. On the other hand, when hardware is employed, extra area overhead is required. A balanced strategy can overcome both issues. Our proposed methods are compliant with the IEEE...

  • Article
  • Open Access
6 Citations
3,641 Views
20 Pages

A Novel Hardware–Software Co-Design and Implementation of the HOG Algorithm

  • Sina Ghaffari,
  • Parastoo Soleimani,
  • Kin Fun Li and
  • David W. Capson

2 October 2020

The histogram of oriented gradients is a commonly used feature extraction algorithm in many applications. Hardware acceleration can boost the speed of this algorithm due to its large number of computations. We propose a hardware–software co-des...

  • Article
  • Open Access
22 Citations
14,998 Views
28 Pages

Hardware/Software Co-Design of a Traffic Sign Recognition System Using Zynq FPGAs

  • Yan Han,
  • Kushal Virupakshappa,
  • Esdras Vitor Silva Pinto and
  • Erdal Oruklu

4 December 2015

Traffic sign recognition (TSR), taken as an important component of an intelligent vehicle system, has been an emerging research topic in recent years. In this paper, a traffic sign detection system based on color segmentation, speeded-up robust featu...

  • Article
  • Open Access
8 Citations
4,403 Views
29 Pages

Hardware/Software Co-Design of Fractal Features Based Fall Detection System

  • Ahsen Tahir,
  • Gordon Morison,
  • Dawn A. Skelton and
  • Ryan M. Gibson

18 April 2020

Falls are a leading cause of death in older adults and result in high levels of mortality, morbidity and immobility. Fall Detection Systems (FDS) are imperative for timely medical aid and have been known to reduce death rate by 80%. We propose a nove...

  • Article
  • Open Access
5 Citations
5,984 Views
15 Pages

26 September 2022

Owing to their high accuracy, deep convolutional neural networks (CNNs) are extensively used. However, they are characterized by high complexity. Real-time performance and acceleration are required in current CNN systems. A graphics processing unit (...

  • Article
  • Open Access
1,156 Views
23 Pages

25 October 2025

The gravity forward modeling algorithm is a compute-intensive method and is widely used in scientific computing, particularly in geophysics, to predict the impact of subsurface structures on surface gravity fields. Traditional implementations rely on...

  • Article
  • Open Access
989 Views
27 Pages

Hardware–Software Co-Design Architecture for Real-Time EMG Feature Processing in FPGA-Based Prosthetic Systems

  • Carlos Gabriel Mireles-Preciado,
  • Diana Carolina Toledo-Pérez,
  • Roberto Augusto Gómez-Loenzo,
  • Marcos Aviles and
  • Juvenal Rodríguez-Reséndiz

30 September 2025

This paper presents a novel hardware architecture for implementing real-time EMG feature extraction and dimensionality reduction in resource-constrained FPGA environments. The proposed co-processing architecture integrates four time-domain feature ex...

  • Article
  • Open Access
7 Citations
6,621 Views
17 Pages

Accelerating SuperBE with Hardware/Software Co-Design

  • Andrew Tzer-Yeu Chen,
  • Rohaan Gupta,
  • Anton Borzenko,
  • Kevin I-Kai Wang and
  • Morteza Biglari-Abhari

18 October 2018

Background Estimation is a common computer vision task, used for segmenting moving objects in video streams. This can be useful as a pre-processing step, isolating regions of interest for more complicated algorithms performing detection, recognition,...

  • Article
  • Open Access
20 Citations
5,375 Views
16 Pages

Robust and High-Performance Machine Vision System for Automatic Quality Inspection in Assembly Processes

  • Fabio Frustaci,
  • Fanny Spagnolo,
  • Stefania Perri,
  • Giuseppe Cocorullo and
  • Pasquale Corsonello

7 April 2022

This paper addresses the problem of automatic quality inspection in assembly processes by discussing the design of a computer vision system realized by means of a heterogeneous multiprocessor system-on-chip. Such an approach was applied to a real cat...

  • Article
  • Open Access
3 Citations
3,423 Views
20 Pages

4 November 2022

In embedded electronic system applications being developed today, complex datasets are required to be obtained, processed, and communicated. These can be from various sources such as environmental sensors, still image cameras, and video cameras. Once...

  • Article
  • Open Access
1 Citation
2,447 Views
22 Pages

Hardware/Software Co-Design of a Circle Detection System Based on Evolutionary Computing

  • Luis Felipe Rojas-Muñoz,
  • Horacio Rostro-González,
  • Carlos Hugo García-Capulín and
  • Santiago Sánchez-Solano

27 August 2022

In recent years, the strategy of co-designing Hardware/Software (HW/SW) systems has been widely adopted to exploit the synergy between both approaches thanks to technological advances that have led to more powerful devices providing an increasingly b...

  • Article
  • Open Access
8 Citations
3,788 Views
22 Pages

29 January 2022

Microarchitectural attacks exploit target hardware properties to break software isolation techniques used by the processor. These attacks are extremely powerful and hard to detect since the determination of the program execution’s impact on the...

  • Article
  • Open Access
3 Citations
5,108 Views
26 Pages

Hardware/Software Co-Design Optimization for Training Recurrent Neural Networks at the Edge

  • Yicheng Zhang,
  • Bojian Yin,
  • Manil Dev Gomony,
  • Henk Corporaal,
  • Carsten Trinitis and
  • Federico Corradi

Edge devices execute pre-trained Artificial Intelligence (AI) models optimized on large Graphics Processing Units (GPUs); however, they frequently require fine-tuning when deployed in the real world. This fine-tuning, referred to as edge learning, i...

  • Article
  • Open Access
1 Citation
1,157 Views
22 Pages

This paper proposes a hardware–software co-design for adaptive lossless compression based on Hybrid Arithmetic–Huffman Coding, a table-driven approximation of arithmetic coding that preserves near-optimal compression efficiency while elim...

  • Feature Paper
  • Article
  • Open Access
1,046 Views
34 Pages

2 November 2025

The increasing adoption of high-performance DC motor control in embedded systems has driven the development of cost-effective solutions that extend beyond traditional software-based optimization techniques. This work presents a refined hardware-centr...

  • Feature Paper
  • Article
  • Open Access
3 Citations
1,793 Views
18 Pages

30 December 2024

Graph-based neural networks have proven to be useful in molecular property prediction, a critical component of computer-aided drug discovery. In this application, in response to the growing demand for improved computational efficiency and localized e...

  • Article
  • Open Access
1,045 Views
22 Pages

Design of Real-Time Gesture Recognition with Convolutional Neural Networks on a Low-End FPGA

  • Rui Policarpo Duarte,
  • Tiago Gonçalves,
  • Gustavo Jacinto,
  • Paulo Flores and
  • Mário Véstias

29 August 2025

Hand gesture recognition is used in human–computer interaction, with multiple applications in assistive technologies, virtual reality, and smart systems. While vision-based methods are commonly employed, they are often computationally intensive...

  • Article
  • Open Access
6 Citations
2,682 Views
17 Pages

High repetition rate lidar is typically equipped with a low-energy, high repetition rate laser, and small aperture telescopes. Therefore, it is small, compact, low-cost, and can be networked for observation. However, its data acquisition and control...

  • Article
  • Open Access
17 Citations
2,944 Views
23 Pages

12 November 2022

Digital design complexity has increased exponentially over the last decades. Heterogeneous Systems-on-Chip integrate many different hardware components which require a reliable and scalable verification environment. The effort to set up such environme...

  • Article
  • Open Access
32 Citations
10,971 Views
21 Pages

The LinoSPAD camera system is a modular, compact and versatile time-resolved camera system, combining a linear CMOS SPAD (single-photon avalanche diode) sensor of 256 high-fill-factor pixels with an FPGA (field-programmable gate array) and USB 3.0 transc...

  • Article
  • Open Access
347 Views
25 Pages

Privacy-Preserving Set Intersection Protocol Based on SM2 Oblivious Transfer

  • Zhibo Guan,
  • Hai Huang,
  • Haibo Yao,
  • Qiong Jia,
  • Kai Cheng,
  • Mengmeng Ge,
  • Bin Yu and
  • Chao Ma

10 January 2026

Private Set Intersection (PSI) is a fundamental cryptographic primitive in privacy-preserving computation and has been widely applied in federated learning, secure data sharing, and privacy-aware data analytics. However, most existing PSI protocols r...

  • Article
  • Open Access
812 Views
23 Pages

SparseDroop: Hardware–Software Co-Design for Mitigating Voltage Droop in DNN Accelerators

  • Arnab Raha,
  • Shamik Kundu,
  • Arghadip Das,
  • Soumendu Kumar Ghosh and
  • Deepak A. Mathaikutty

Modern deep neural network (DNN) accelerators must sustain high throughput while avoiding performance degradation from supply voltage (VDD) droop, which occurs when large arrays of multiply–accumulate (MAC) units switch concurrently and induce...

  • Review
  • Open Access
9 Citations
8,966 Views
21 Pages

Dealing with resource constraints is an inevitable feature of embedded systems. Power and performance are the main concerns, among others. Pre-silicon analysis of power and performance in today’s complex embedded designs is a big challenge. Alt...

  • Article
  • Open Access
1,435 Views
21 Pages

Karatsuba Algorithm Revisited for 2D Convolution Computation Optimization

  • Qi Wang,
  • Jianghan Zhu,
  • Can He,
  • Shihang Wang,
  • Xingbo Wang,
  • Yuan Ren and
  • Terry Tao Ye

8 May 2025

Convolution plays a significant role in many scientific and technological computations, such as artificial intelligence and signal processing. Convolutional computations consist of many dot-product operations (multiplication–accumulation, or MA...

  • Article
  • Open Access
2 Citations
4,704 Views
42 Pages

iDocChip: A Configurable Hardware Accelerator for an End-to-End Historical Document Image Processing

  • Menbere Kina Tekleyohannes,
  • Vladimir Rybalkin,
  • Muhammad Mohsin Ghaffar,
  • Javier Alejandro Varela,
  • Norbert Wehn and
  • Andreas Dengel

3 September 2021

In recent years, there has been an increasing demand to digitize and electronically access historical records. Optical character recognition (OCR) is typically applied to scanned historical archives to transcribe them from document images into machin...

  • Article
  • Open Access
4 Citations
3,094 Views
14 Pages

24 February 2024

Keyword spotting is an important part of modern speech recognition pipelines. Typical contemporary keyword-spotting systems are based on Mel-Frequency Cepstral Coefficient (MFCC) audio features, which are relatively complex to compute. Considering th...

  • Article
  • Open Access
1 Citation
2,160 Views
20 Pages

Generative Design of the Architecture Platform in Multiprocessor System Design

  • Luise Müller,
  • Nico Schumacher,
  • Lukas Steffen and
  • Christian Haubelt

When designing a system at the Electronic System Level (ESL), designers are confronted with a very large number of design decisions, each affecting the characteristics of the resulting system. Simultaneously, the demands for the system’s perfor...

  • Article
  • Open Access
14 Citations
4,793 Views
21 Pages

2 November 2020

Convolutional neural networks (CNNs) are a computation- and memory-demanding class of deep neural networks. Field-programmable gate arrays (FPGAs) are often used to accelerate the networks deployed in embedded platforms due to the high computa...

  • Article
  • Open Access
1 Citation
2,262 Views
24 Pages

31 May 2024

Batteryless, self-sustaining embedded sensing devices are key enablers for scalable and long-term operations of Internet of Things (IoT) applications. While advancements in both energy harvesting and intermittent computing have helped pave the way fo...

  • Article
  • Open Access
8 Citations
2,958 Views
16 Pages

Energy-Efficient and Real-Time Wearable for Wellbeing-Monitoring IoT System Based on SoC-FPGA

  • Maria Inês Frutuoso,
  • Horácio C. Neto,
  • Mário P. Véstias and
  • Rui Policarpo Duarte

4 March 2023

Wearable devices used for personal monitoring applications have been improved over the last decades. However, these devices are limited in terms of size, processing capability and power consumption. This paper proposes an efficient hardware/software...

  • Feature Paper
  • Article
  • Open Access
6 Citations
4,664 Views
35 Pages

M3-AC: A Multi-Mode Multithread SoC FPGA Based Acoustic Camera

  • Jurgen Vandendriessche,
  • Bruno da Silva,
  • Lancelot Lhoest,
  • An Braeken and
  • Abdellah Touhafi

Acoustic cameras allow the visualization of sound sources using microphone arrays and beamforming techniques. The required computational power increases with the number of microphones in the array, the acoustic image resolution, and in particular, w...

  • Article
  • Open Access
1,351 Views
26 Pages

18 August 2025

The need for efficient and real-time traffic sign recognition has become increasingly important as autonomous vehicles and Advanced Driver Assistance Systems (ADASs) continue to evolve. This study introduces TSRACE-AI, a system that accelerates traff...

  • Article
  • Open Access
1,344 Views
19 Pages

Extending a Moldable Computer Architecture to Accelerate DL Inference on FPGA

  • Mirko Mariotti,
  • Giulio Bianchini,
  • Igor Neri,
  • Daniele Spiga,
  • Diego Ciangottini and
  • Loriano Storchi

3 September 2025

Over the past years, the field of Machine Learning (ML) and Deep Learning (DL) has seen strong developments both in terms of software and hardware, with the increase of specialized devices. One of the biggest challenges in this field is the infe...

  • Article
  • Open Access
2,995 Views
40 Pages

12 October 2025

In recent years, the demand for efficient neural networks in embedded contexts has grown, driven by the need for real-time inference with limited resources. While GPUs offer high performance, their size, power consumption, and cost often make them un...

  • Article
  • Open Access
5 Citations
1,630 Views
29 Pages

26 December 2024

This study presents a comprehensive workflow for developing and deploying Multi-Layer Perceptron (MLP)-based soft sensors on embedded FPGAs, addressing diverse deployment objectives. The proposed workflow extends our prior research by introducing gre...

  • Feature Paper
  • Article
  • Open Access
7 Citations
5,479 Views
19 Pages

The ongoing era of the Internet of Things is opening up new opportunities towards the integration and interoperation of heterogeneous technologies at different abstraction layers, going from the so-called Edge Computing up to the Cloud and IoT Data A...

  • Article
  • Open Access
5 Citations
6,103 Views
27 Pages

Data-Driven Multiresolution Camera Using the Foveal Adaptive Pyramid

  • Martin González,
  • Antonio Sánchez-Pedraza,
  • Rebeca Marfil,
  • Juan A. Rodríguez and
  • Antonio Bandera

26 November 2016

There exist image processing applications, such as tracking or pattern recognition, that are not necessarily precise enough to maintain the same resolution across the whole image sensor. In fact, they must only keep it as high as possible in a relati...

  • Article
  • Open Access
11 Citations
6,974 Views
23 Pages

4 May 2015

The complexity of hardware designs is still increasing according to Moore’s law. With embedded systems being more and more intertwined and working together not only with each other, but also with their environments as cyber physical systems (CPSs), m...

  • Article
  • Open Access
2 Citations
6,286 Views
12 Pages

Automatic RTL Generation Tool of FPGAs for DNNs

  • Seojin Jang,
  • Wei Liu,
  • Sangun Park and
  • Yongbeom Cho

With the increasing use of multi-purpose artificial intelligence of things (AIOT) devices, embedded field-programmable gate arrays (FPGA) represent excellent platforms for deep neural network (DNN) acceleration on edge devices. FPGAs possess the adva...

  • Article
  • Open Access
25 Citations
7,171 Views
15 Pages

22 November 2021

On-device artificial intelligence has attracted attention globally, and attempts to combine the internet of things and TinyML (machine learning) applications are increasing. Although most edge devices have limited resources, time and energy costs are...

  • Article
  • Open Access
1 Citation
2,801 Views
23 Pages

17 January 2025

Deep learning significantly advances object detection. Post-processing, a critical component of this pipeline, selects valid bounding boxes to represent the true targets during inference and assigns boxes and labels to these objects during training to op...

  • Article
  • Open Access
5 Citations
6,357 Views
13 Pages

Multicore and multithreaded architectures increase the performance of computing systems. The increase in cores and threads, however, raises further issues in the efficiency achieved in terms of speedup and parallelization, particularly for the real-t...

  • Feature Paper
  • Review
  • Open Access
15 Citations
20,271 Views
73 Pages

Hardware Design and Verification with Large Language Models: A Scoping Review, Challenges, and Open Issues

  • Meisam Abdollahi,
  • Seyedeh Faegheh Yeganli,
  • Mohammad (Amir) Baharloo and
  • Amirali Baniasadi

30 December 2024

Background: Large Language Models (LLMs) are emerging as promising tools in hardware design and verification, with recent advancements suggesting they could fundamentally reshape conventional practices. Objective: This study examines the significance...

  • Article
  • Open Access
1 Citation
3,617 Views
20 Pages

27 August 2022

Human-in-the-loop driving simulation aims to create the illusion of driving by stimulating the driver’s sensory systems in as realistic conditions as possible. However, driving simulators can only produce a subset of the sensory stimuli that wo...

  • Feature Paper
  • Article
  • Open Access
6 Citations
2,265 Views
18 Pages

Implementation of the permanent magnet synchronous motor vector control implies strong time dependencies. The control process requires precise measurement of motor shaft position and winding currents to establish correct driving. The tight time depen...

  • Article
  • Open Access
10 Citations
10,336 Views
28 Pages

Designing Domain-Specific Heterogeneous Architectures from Dataflow Programs

  • Süleyman Savas,
  • Zain Ul-Abdin and
  • Tomas Nordström

The last ten years have seen performance and power requirements pushing computer architectures using only a single core towards so-called manycore systems with hundreds of cores on a single chip. To further increase performance and energy efficiency,...

  • Article
  • Open Access
42 Citations
4,354 Views
10 Pages

A Fully-Integrated Analog Machine Learning Classifier for Breast Cancer Classification

  • Sanjeev T. Chandrasekaran,
  • Ruobing Hua,
  • Imon Banerjee and
  • Arindam Sanyal

We propose a fully integrated common-source amplifier based analog artificial neural network (ANN). The performance of the proposed ANN with a custom non-linear activation function is demonstrated on the breast cancer classification task. A hardware-...

  • Review
  • Open Access
392 Views
31 Pages

The integration of Artificial Intelligence (AI) into Internet of Things (IoT) medical devices has revolutionized arrhythmia monitoring. However, the high computational and power demands of traditional Deep Learning (DL) models pose significant challe...

  • Review
  • Open Access
2,344 Views
24 Pages

A Review on AI Miniaturization: Trends and Challenges

  • Bin Tang,
  • Shengzhi Du and
  • Antonie Johan Smith

12 October 2025

Artificial intelligence (AI) often suffers from high energy consumption and complex deployment in resource-constrained environments, leading to a structural mismatch between capability and deployability. This review takes two representative scenarios...
