Search Results (121)

Search Parameters:
Keywords = Hardware–Software Co-Design

19 pages, 3044 KiB  
Review
Deep Learning-Based Sound Source Localization: A Review
by Kunbo Xu, Zekai Zong, Dongjun Liu, Ran Wang and Liang Yu
Appl. Sci. 2025, 15(13), 7419; https://doi.org/10.3390/app15137419 - 2 Jul 2025
Viewed by 368
Abstract
As a fundamental technology in environmental perception, sound source localization (SSL) plays a critical role in public safety, marine exploration, and smart home systems. However, traditional methods such as beamforming and time-delay estimation rely on manually designed physical models and idealized assumptions, which struggle to meet practical demands in dynamic and complex scenarios. Recent advancements in deep learning have revolutionized SSL by leveraging its end-to-end feature adaptability, cross-scenario generalization capabilities, and data-driven modeling, significantly enhancing localization robustness and accuracy in challenging environments. This review systematically examines the progress of deep learning-based SSL across three critical domains: marine environments, indoor reverberant spaces, and unmanned aerial vehicle (UAV) monitoring. In marine scenarios, complex-valued convolutional networks combined with adversarial transfer learning mitigate environmental mismatch and multipath interference through phase information fusion and domain adaptation strategies. For indoor high-reverberation conditions, attention mechanisms and multimodal fusion architectures achieve precise localization under low signal-to-noise ratios by adaptively weighting critical acoustic features. In UAV surveillance, lightweight models integrated with spatiotemporal Transformers address dynamic modeling of non-stationary noise spectra and edge computing efficiency constraints. Despite these advancements, current approaches face three core challenges: the insufficient integration of physical principles, prohibitive data annotation costs, and the trade-off between real-time performance and accuracy. Future research should prioritize physics-informed modeling to embed acoustic propagation mechanisms, unsupervised domain adaptation to reduce reliance on labeled data, and sensor-algorithm co-design to optimize hardware-software synergy. These directions aim to propel SSL toward intelligent systems characterized by high precision, strong robustness, and low power consumption. This work provides both theoretical foundations and technical references for algorithm selection and practical implementation in complex real-world scenarios. Full article
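For readers unfamiliar with the traditional baseline the review contrasts against, the sketch below shows classical time-delay estimation between two microphones using GCC-PHAT; the signal, sampling rate, and delay are illustrative assumptions, not material from the review.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the time delay between two channels with GCC-PHAT.

    A classical baseline for sound source localization: the delay is the
    argmax of the phase-transform-weighted cross-correlation.
    """
    n = sig.size + ref.size
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12           # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs                        # delay in seconds

# Illustrative usage: a broadband source delayed by 25 samples between two mics.
fs = 16000
rng = np.random.default_rng(0)
src = rng.standard_normal(fs // 10)                      # 100 ms of noise
mic1 = np.concatenate([np.zeros(25), src])[:src.size]    # delayed copy
mic2 = src
print(f"estimated delay: {gcc_phat(mic1, mic2, fs):.6f} s, true delay: {25 / fs:.6f} s")
```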

26 pages, 1929 KiB  
Article
PASS: A Flexible Programmable Framework for Building Integrated Security Stack in Public Cloud
by Wenwen Fu, Jinli Yan, Jian Zhang, Yinhan Sun, Yong Wang, Ziwen Zhang, Qianming Yang and Yongwen Wang
Electronics 2025, 14(13), 2650; https://doi.org/10.3390/electronics14132650 - 30 Jun 2025
Viewed by 234
Abstract
Integrated security stacks, which offer diverse security function chains in a single device, hold substantial potential to satisfy the security requirements of multiple tenants on a public cloud. However, it is difficult for a software-only or hardware-customized security stack to strike a good trade-off between performance and flexibility. SmartNICs overcome these limitations by providing a programmable platform for implementing these functions with hardware acceleration. However, without a careful CPU/SmartNIC co-design, developing security function chains from scratch with low-level APIs is challenging and tedious for network operators. This paper presents PASS, a flexible programmable framework for the fast development of high-performance security stacks with SmartNIC acceleration. In the data plane, PASS provides modular abstractions to extract the shared security logic and eliminate redundant operations by reusing intermediate results through customized metadata. In the control plane, PASS offloads the tedious security policy conversion to the proposed security auxiliary plane. With well-defined APIs, developers only need to focus on the core logic instead of the labor-intensive shared logic. We built a PASS prototype based on a CPU-FPGA platform and developed three typical security components. Compared to implementations from scratch, PASS reduces code size by 65% on average. Additionally, PASS improves security processing performance by 76% compared to software-only implementations and reduces the latency of policy translation and distribution by 90% versus an architecture without offloading. Full article
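As a rough illustration of the data-plane idea described in the abstract (chained security functions reusing shared per-packet metadata instead of re-parsing), here is a minimal Python sketch; every name in it (Packet, SecurityChain, acl, RateLimiter) is hypothetical and does not reflect PASS's actual abstractions or APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Toy illustration of chaining security functions that reuse shared per-packet
# metadata (here a parsed 5-tuple) instead of re-parsing it in every function.
# All names here are hypothetical placeholders, not PASS's real interfaces.

@dataclass
class Packet:
    raw: bytes
    meta: dict = field(default_factory=dict)   # shared intermediate results

def parse_five_tuple(pkt: Packet) -> Optional[Packet]:
    # Parse once; downstream functions reuse pkt.meta["five_tuple"].
    pkt.meta["five_tuple"] = ("10.0.0.1", "10.0.0.2", 1234, 443, "tcp")
    return pkt

def acl(pkt: Packet) -> Optional[Packet]:
    blocked = {("10.0.0.9", "10.0.0.2", 0, 0, "tcp")}
    return None if pkt.meta["five_tuple"] in blocked else pkt

class RateLimiter:
    def __init__(self, limit: int):
        self.limit, self.count = limit, 0
    def __call__(self, pkt: Packet) -> Optional[Packet]:
        self.count += 1
        return pkt if self.count <= self.limit else None

class SecurityChain:
    def __init__(self, *functions: Callable[[Packet], Optional[Packet]]):
        self.functions = functions
    def process(self, pkt: Packet) -> Optional[Packet]:
        for fn in self.functions:
            pkt = fn(pkt)
            if pkt is None:                    # dropped by a security function
                return None
        return pkt

chain = SecurityChain(parse_five_tuple, acl, RateLimiter(limit=1000))
print(chain.process(Packet(raw=b"\x00" * 64)) is not None)   # True: packet passes
```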

54 pages, 2065 KiB  
Review
Edge Intelligence: A Review of Deep Neural Network Inference in Resource-Limited Environments
by Dat Ngo, Hyun-Cheol Park and Bongsoon Kang
Electronics 2025, 14(12), 2495; https://doi.org/10.3390/electronics14122495 - 19 Jun 2025
Viewed by 749
Abstract
Deploying deep neural networks (DNNs) in resource-limited environments—such as smartwatches, IoT nodes, and intelligent sensors—poses significant challenges due to constraints in memory, computing power, and energy budgets. This paper presents a comprehensive review of recent advances in accelerating DNN inference on edge platforms, with a focus on model compression, compiler optimizations, and hardware–software co-design. We analyze the trade-offs between latency, energy, and accuracy across various techniques, highlighting practical deployment strategies on real-world devices. In particular, we categorize existing frameworks based on their architectural targets and adaptation mechanisms and discuss open challenges such as runtime adaptability and hardware-aware scheduling. This review aims to guide the development of efficient and scalable edge intelligence solutions. Full article
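As a concrete example of one compression technique covered by such reviews, the sketch below performs symmetric post-training INT8 quantization of a weight tensor with numpy; it is a generic illustration, not a method attributed to this particular paper.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor post-training quantization to INT8.

    Returns the quantized weights and the scale needed to dequantize:
    w is approximately scale * q, with q in [-127, 127].
    """
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(256, 128)).astype(np.float32)
q, scale = quantize_int8(w)
err = np.abs(w - scale * q.astype(np.float32)).max()
print(f"max abs quantization error: {err:.5f}, scale: {scale:.5f}")
# INT8 storage is 4x smaller than FP32; the error stays within half a
# quantization step for this symmetric scheme.
```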

36 pages, 25977 KiB  
Article
How to Win Bosch Future Mobility Challenge: Design and Implementation of the VROOM Autonomous Scaled Vehicle
by Theodoros Papafotiou, Emmanouil Tsardoulias, Alexandros Nikolaou, Aikaterini Papagiannitsi, Despoina Christodoulou, Ioannis Gkountras and Andreas L. Symeonidis
Machines 2025, 13(6), 514; https://doi.org/10.3390/machines13060514 - 12 Jun 2025
Viewed by 1467
Abstract
Over the last decade, a transformation in the automotive industry has been witnessed, as advancements in artificial intelligence and sensor technology have continued to accelerate the development of driverless vehicles. These systems are expected to significantly reduce traffic accidents and associated costs, making their integration into future transportation systems highly impactful. To explore this field in a controlled and flexible manner, scaled autonomous vehicle platforms are increasingly adopted for experimentation. In this work, we propose a set of methodologies to perform autonomous driving tasks through a software–hardware co-design approach. The developed system focuses on deploying a modular and reconfigurable software stack tailored to run efficiently on constrained embedded hardware, demonstrating a balance between real-time capability and computational resource usage. The proposed platform was implemented on a 1:10 scale vehicle that participated in the Bosch Future Mobility Challenge (BFMC) 2024. It integrates a high-performance embedded computing unit and a heterogeneous sensor suite to achieve reliable perception, decision-making, and control. The architecture is structured across four interconnected layers—Input, Perception, Control, and Output—allowing flexible module integration and reusability. The effectiveness of the system was validated throughout the competition scenarios, leading the team to secure first place. Although the platform was evaluated on a scaled vehicle, its underlying software–hardware principles are broadly applicable and scalable to larger autonomous systems. Full article
(This article belongs to the Special Issue Emerging Approaches to Intelligent and Autonomous Systems)
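The following is a minimal sketch of the four-layer structure the abstract describes (Input, Perception, Control, Output) as a modular pipeline; all class names, the proportional steering gain, and the placeholder data are assumptions for illustration, not the VROOM code.

```python
from dataclasses import dataclass

# Illustrative four-layer pipeline (Input -> Perception -> Control -> Output)
# mirroring the modular structure described in the abstract. All classes,
# the controller gain, and the placeholder data are hypothetical.

@dataclass
class Frame:
    pixels: list        # raw camera data (placeholder)
    speed: float        # odometry speed in m/s

class InputLayer:
    def read(self) -> Frame:
        return Frame(pixels=[0] * (64 * 48), speed=0.8)

class PerceptionLayer:
    def lane_offset(self, frame: Frame) -> float:
        # Placeholder for lane detection: lateral offset from lane centre, in m.
        return 0.05

class ControlLayer:
    def steering(self, offset: float, speed: float) -> float:
        k_p = 2.0                      # illustrative proportional gain
        return -k_p * offset

class OutputLayer:
    def actuate(self, steering: float) -> None:
        print(f"steering command: {steering:+.2f} rad")

def run_once() -> None:
    frame = InputLayer().read()
    offset = PerceptionLayer().lane_offset(frame)
    command = ControlLayer().steering(offset, frame.speed)
    OutputLayer().actuate(command)

run_once()   # prints "steering command: -0.10 rad"
```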

15 pages, 126037 KiB  
Article
An Improved Dark Channel Prior Method for Video Defogging and Its FPGA Implementation
by Lin Wang, Zhongqiang Luo and Li Gao
Symmetry 2025, 17(6), 839; https://doi.org/10.3390/sym17060839 - 27 May 2025
Viewed by 437
Abstract
In fog, rain, snow, haze, and other complex environments, scenes captured by imaging equipment are prone to image blurring, contrast degradation, and other quality problems. The resulting decline in image quality fails to satisfy the requirements of application scenarios such as video surveillance, satellite reconnaissance, and target tracking. To address the shortcomings of the traditional dark channel prior algorithm in video defogging, this paper proposes an improved guided filtering algorithm that refines the transmittance image and reduces the halo effect of the traditional algorithm. In addition, a gamma correction method is proposed to recover the defogged image and enhance image details in low-light environments. The parallel symmetric pipeline design of the FPGA is used to improve the system’s overall stability. The improved dark channel prior algorithm is realized through the hardware–software co-design of an ARM processor and the FPGA. Experiments show that the algorithm improves the Underwater Image Quality Measure (UIQM), Average Gradient (AG), and Information Entropy (IE) of the image, while the system is capable of stably processing video at a resolution of 1280 × 720 @ 60 fps. A board-level analysis of power consumption and resource usage shows that the FPGA consumes only 2.242 W, placing the hardware design in the low-power category. Full article
(This article belongs to the Section Engineering and Materials)
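For context, the sketch below implements the classic dark channel prior steps the paper improves on (dark channel, atmospheric light, transmission estimate, radiance recovery, gamma correction), using common default parameters rather than the paper's tuned values and omitting its improved guided filtering.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def defog_dark_channel(img, patch=15, omega=0.95, t0=0.1, gamma=0.8):
    """Classic dark channel prior defogging with a gamma correction step.

    img: HxWx3 float array in [0, 1]. Patch size, omega, t0, and gamma are
    common defaults, not the tuned values from the paper.
    """
    # 1) Dark channel: per-pixel channel minimum, then local minimum filter.
    dark = minimum_filter(img.min(axis=2), size=patch)
    # 2) Atmospheric light: mean colour of the brightest 0.1% dark-channel pixels.
    flat = dark.ravel()
    idx = np.argsort(flat)[-max(1, flat.size // 1000):]
    A = img.reshape(-1, 3)[idx].mean(axis=0)
    # 3) Transmission estimate (the paper refines this map with guided filtering).
    t = 1.0 - omega * minimum_filter((img / A).min(axis=2), size=patch)
    t = np.clip(t, t0, 1.0)[..., None]
    # 4) Recover the scene radiance, then gamma-correct for low-light detail.
    J = np.clip((img - A) / t + A, 0.0, 1.0)
    return np.power(J, gamma)

foggy = np.clip(np.random.default_rng(0).random((120, 160, 3)) * 0.3 + 0.6, 0, 1)
print(defog_dark_channel(foggy).shape)   # (120, 160, 3)
```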

12 pages, 870 KiB  
Article
An Improved Strategy for Data Layout in Convolution Operations on FPGA-Based Multi-Memory Accelerators
by Yongchang Wang and Hongzhi Zhao
Electronics 2025, 14(11), 2127; https://doi.org/10.3390/electronics14112127 - 23 May 2025
Viewed by 382
Abstract
Convolutional Neural Networks (CNNs) are fundamental to modern AI applications but often suffer from significant memory bottlenecks due to non-contiguous access patterns during convolution operations. Although previous work has optimized data layouts at the software level, hardware-level solutions for multi-memory accelerators remain underexplored. In this paper, we propose a hardware-level approach to mitigate memory row conflicts in FPGA-based CNN accelerators. Specifically, we introduce a dynamic DDR controller generated using Vivado 2019.1, which optimizes feature map allocation across memory banks and operates in conjunction with a multi-memory architecture to enable parallel access. Our method reduces row conflicts by up to 21% and improves throughput by 17% on the KCU1500 FPGA, with validation across YOLOv2, VGG16, and AlexNet. The key innovation lies in the layer-specific address mapping strategy and hardware-software co-design, providing a scalable and efficient solution for CNN inference across both edge and cloud platforms. Full article
(This article belongs to the Special Issue FPGA-Based Reconfigurable Embedded Systems)
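A toy model of the row-conflict problem the abstract targets: when feature maps and weights share a DDR bank, alternating bursts reopen rows constantly, while a layer-aware mapping that separates them across banks avoids this. The geometry and access pattern below are assumptions for illustration, not the generated controller.

```python
# Toy model of row conflicts in DDR: alternating reads of the input feature
# map and the weights conflict on every access when both live in the same
# bank, but rarely when a layer-aware mapping puts them in different banks.

def count_row_conflicts(accesses):
    """accesses: iterable of (bank, row). Counts row reopenings per bank."""
    open_row, conflicts = {}, 0
    for bank, row in accesses:
        if bank in open_row and open_row[bank] != row:
            conflicts += 1
        open_row[bank] = row
    return conflicts

N = 256  # interleaved bursts: one feature-map burst, then one weight burst

# Naive layout: feature map and weights both in bank 0, in distant rows.
naive = []
for i in range(N):
    naive.append((0, i // 8))          # feature-map burst
    naive.append((0, 512 + i // 8))    # weight burst reopens a different row

# Layer-aware layout: weights moved to bank 1, feature map stays in bank 0.
mapped = []
for i in range(N):
    mapped.append((0, i // 8))
    mapped.append((1, 512 + i // 8))

print("row conflicts:", count_row_conflicts(naive), "->", count_row_conflicts(mapped))
# prints: row conflicts: 511 -> 62
```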

21 pages, 3009 KiB  
Article
Karatsuba Algorithm Revisited for 2D Convolution Computation Optimization
by Qi Wang, Jianghan Zhu, Can He, Shihang Wang, Xingbo Wang, Yuan Ren and Terry Tao Ye
Entropy 2025, 27(5), 506; https://doi.org/10.3390/e27050506 - 8 May 2025
Viewed by 430
Abstract
Convolution plays a significant role in many scientific and technological computations, such as artificial intelligence and signal processing. Convolutional computations consist of many dot-product operations (multiplication–accumulation, or MAC), and the Winograd algorithm is currently the most widely used method to reduce the number of MACs. The Karatsuba algorithm, since its introduction in the 1960s, has traditionally been used as a fast arithmetic method to perform multiplication between large-bit-width operands; it had not previously been exploited to accelerate 2D convolution computations. In this paper, we revisit the Karatsuba algorithm and exploit it to reduce the number of MACs in 2D convolutions. The matrices are first segmented into tiles in a divide-and-conquer manner, and the resulting submatrices are overlapped to construct the final output matrix. Our analysis and benchmarks show that, for convolution operations of the same dimensions, the Karatsuba algorithm requires the same number of multiplications but fewer additions than the Winograd algorithm. A pseudocode implementation is also provided to demonstrate the complexity reduction of Karatsuba-based convolution. An FPGA implementation of Karatsuba-based convolution also achieves a 33.6% reduction in LUTs (Look-Up Tables) compared with a Winograd-based implementation. Full article
(This article belongs to the Section Information Theory, Probability and Statistics)
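The multiplication saving at the heart of the paper is easiest to see in the classic one-dimensional Karatsuba identity, sketched below for two linear polynomials; the paper's 2D tiling and overlap construction is not reproduced here.

```python
# Classic Karatsuba for (a0 + a1*x) * (b0 + b1*x): the schoolbook product
# needs 4 multiplications, Karatsuba needs 3 by reusing (a0 + a1) * (b0 + b1)
# at the cost of extra additions. This is the 1D identity that the paper
# extends to 2D convolution tiles.

def poly_mul_schoolbook(a, b):
    a0, a1 = a
    b0, b1 = b
    return [a0 * b0, a0 * b1 + a1 * b0, a1 * b1]        # 4 multiplications

def poly_mul_karatsuba(a, b):
    a0, a1 = a
    b0, b1 = b
    low = a0 * b0
    high = a1 * b1
    mid = (a0 + a1) * (b0 + b1) - low - high            # 3rd multiplication
    return [low, mid, high]                              # 3 multiplications total

a, b = (3, 5), (2, 7)
assert poly_mul_schoolbook(a, b) == poly_mul_karatsuba(a, b) == [6, 31, 35]
print(poly_mul_karatsuba(a, b))
```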

17 pages, 4831 KiB  
Article
Achieving Low-Latency, High-Throughput Online Partial Particle Identification for the NA62 Experiment Using FPGAs and Machine Learning
by Pierpaolo Perticaroli, Roberto Ammendola, Andrea Biagioni, Carlotta Chiarini, Andrea Ciardiello, Paolo Cretaro, Ottorino Frezza, Francesca Lo Cicero, Michele Martinelli, Roberto Piandani, Luca Pontisso, Mauro Raggi, Cristian Rossi, Francesco Simula, Matteo Turisini, Piero Vicini and Alessandro Lonardo
Electronics 2025, 14(9), 1892; https://doi.org/10.3390/electronics14091892 - 7 May 2025
Viewed by 396
Abstract
FPGA-RICH is an FPGA-based online partial particle identification system for the NA62 experiment employing AI techniques. Integrated between the readout of the Ring Imaging Cherenkov detector (RICH) and the low-level trigger processor (L0TP+), FPGA-RICH implements a fast pipeline to process the RICH raw hit data stream in real time, producing trigger primitives containing elaborate physics information—e.g., the number of charged particles in a physics event—that L0TP+ can use to improve trigger decision efficiency. Deployed on a single FPGA, the system combines classical online processing with a compact Neural Network algorithm to achieve efficient event classification while managing the challenging ∼10 MHz throughput requirement of NA62. The streaming pipeline ensures ∼1 μs latency, comparable to that of the NA62 detectors, allowing its seamless integration in the existing TDAQ setup as an additional detector. Development leverages a software–hardware co-design workflow based on High-Level Synthesis (HLS) and the open-source hls4ml package, enabling fast and flexible reprogramming, debugging, and performance optimization. We describe the implementation of the full processing pipeline, the Neural Network classifier, their functional validation, performance metrics, and the system’s current status and outlook. Full article
(This article belongs to the Special Issue Emerging Applications of FPGAs and Reconfigurable Computing System)
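To give a flavour of the hls4ml co-design workflow mentioned above, here is a minimal sketch (assuming a recent hls4ml and TensorFlow installation) that converts a small Keras classifier into an HLS project; the layer sizes, precision, and FPGA part are placeholders, not the FPGA-RICH network or target device.

```python
# Sketch of an hls4ml workflow: define a compact Keras classifier, derive an
# HLS configuration, and convert it into an HLS project for FPGA synthesis.
# Layer sizes, the precision, and the FPGA part below are placeholders.
import hls4ml
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(64,)),              # e.g., compressed hit features
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(4, activation="softmax"),  # e.g., multiplicity classes
])

config = hls4ml.utils.config_from_keras_model(model, granularity="model")
config["Model"]["Precision"] = "ap_fixed<16,6>"   # fixed-point precision budget

hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir="hls_rich_classifier",
    part="xcvu9p-flga2104-2-e",                   # placeholder FPGA part
)
hls_model.compile()                               # builds a C simulation library
```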

35 pages, 9206 KiB  
Article
New Strategies Based on Hierarchical Matrices for Matrix Polynomial Evaluation in Exascale Computing Era
by Luisa Carracciuolo and Valeria Mele
Mathematics 2025, 13(9), 1378; https://doi.org/10.3390/math13091378 - 23 Apr 2025
Viewed by 321
Abstract
Advancements in computing platforms have acted as both push and pull factors for the advancement of engineering design and scientific knowledge. Historically, improvements in computing platforms were mostly dependent on simultaneous developments in hardware, software, architecture, and algorithms (a process known as co-design), which raised the performance of computational models. However, there are many obstacles to using the sophisticated computing platforms of the Exascale Computing Era effectively. These include, but are not limited to, the effective exploitation of massive parallelism and the high complexity of programming heterogeneous computing facilities. Now is therefore the time to create new algorithms that are more resilient, energy-aware, and able to address the demands of increasing data locality and achieve much higher concurrency through high levels of scalability and granularity. In this context, some methods, such as those based on hierarchical matrices (HMs), are considered among the most promising for exploiting the new computing resources, precisely because of their strongly hierarchical nature. This work begins to assess the advantages and limits of using HMs in operations such as the evaluation of matrix polynomials, which are crucial, for example, in a Graph Convolutional Deep Neural Network (GC-DNN) context. A case study from the GC-DNN context provides some insights into the effectiveness, in terms of accuracy, of employing HMs. Full article
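To make the target operation concrete, the sketch below evaluates a matrix polynomial with Horner's scheme using dense numpy matrices; the hierarchical-matrix representation studied in the paper is not reproduced here.

```python
import numpy as np

def matrix_polynomial(coeffs, A):
    """Evaluate p(A) = coeffs[0]*I + coeffs[1]*A + ... + coeffs[k]*A^k
    with Horner's scheme: p(A) = ((ck*I)*A + ck-1*I)*A + ... + c0*I.
    Dense matrices only; hierarchical-matrix formats are out of scope here.
    """
    n = A.shape[0]
    result = coeffs[-1] * np.eye(n)
    for c in reversed(coeffs[:-1]):
        result = result @ A + c * np.eye(n)
    return result

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
coeffs = [1.0, 0.5, 0.25]                      # p(A) = I + 0.5*A + 0.25*A^2
direct = np.eye(4) + 0.5 * A + 0.25 * (A @ A)
assert np.allclose(matrix_polynomial(coeffs, A), direct)
print("Horner evaluation matches the direct expansion.")
```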

10 pages, 7224 KiB  
Article
On-Chip Photonic Convolutional Processing Lights Up Fourier Neural Operator
by Zilong Tao, Hao Ouyang, Qiuquan Yan, Shiyin Du, Hao Hao, Jun Zhang and Jie You
Photonics 2025, 12(3), 253; https://doi.org/10.3390/photonics12030253 - 12 Mar 2025
Viewed by 1047
Abstract
Fourier Neural Operators (FNOs) have gained increasing attention for their effectiveness in extracting frequency-domain features and efficiently approximating functions, making them well-suited for classification tasks. However, the absence of specialized photonic hardware has limited the acceleration of FNO inference. In this study, we introduce what we believe is the first photonic hardware framework dedicated to speeding up the Fourier layer of an FNO. Our approach employs a frequency-domain convolutional photonic chip and a micro-ring array chip, achieving 5-bit quantization precision in the inference process. On the Radio ML 2016.10b dataset, our Fourier convolutional neural network achieves a peak identification accuracy of 95.50%, outperforming standard convolution-based networks. These findings highlight the transformative potential of co-designing software and hardware, demonstrating how photonic computing can deliver specialized acceleration for critical AI components and substantially improve inference efficiency. Ultimately, this work lays a foundation for integrating photonic technologies into next-generation AI accelerators, pointing to a promising direction for further research and development in optoelectronic hybrid computing. Full article
(This article belongs to the Special Issue The Principle and Application of Photonic Metasurfaces)
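For context on what the photonic hardware accelerates, here is a minimal numpy sketch of an FNO-style Fourier layer on a 1D signal: transform to the frequency domain, weight a small set of retained low modes, and transform back. The mode count and weights are random placeholders.

```python
import numpy as np

def fourier_layer(x, weights, n_modes):
    """FNO-style spectral convolution on a 1D signal.

    x:        (length,) real input signal
    weights:  (n_modes,) complex spectral weights for the retained low modes
    n_modes:  number of low-frequency modes kept (the rest are truncated)
    """
    x_hat = np.fft.rfft(x)
    out_hat = np.zeros_like(x_hat)
    out_hat[:n_modes] = x_hat[:n_modes] * weights      # pointwise multiply in frequency
    return np.fft.irfft(out_hat, n=x.size)             # back to the signal domain

rng = np.random.default_rng(0)
x = rng.standard_normal(128)
n_modes = 16
weights = rng.standard_normal(n_modes) + 1j * rng.standard_normal(n_modes)
y = fourier_layer(x, weights, n_modes)
print(y.shape)    # (128,) -- same length, low modes kept and re-weighted
```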

26 pages, 2271 KiB  
Article
Hardware/Software Co-Design Optimization for Training Recurrent Neural Networks at the Edge
by Yicheng Zhang, Bojian Yin, Manil Dev Gomony, Henk Corporaal, Carsten Trinitis and Federico Corradi
J. Low Power Electron. Appl. 2025, 15(1), 15; https://doi.org/10.3390/jlpea15010015 - 11 Mar 2025
Viewed by 1956
Abstract
Edge devices execute pre-trained Artificial Intelligence (AI) models optimized on large Graphics Processing Units (GPUs); however, they frequently require fine-tuning when deployed in the real world. This fine-tuning, referred to as edge learning, is essential for personalized tasks such as speech and gesture recognition, which often necessitate the use of recurrent neural networks (RNNs). However, training RNNs on edge devices presents major challenges due to limited memory and computing resources. In this study, we propose a system for RNN training through sequence partitioning using the Forward Propagation Through Time (FPTT) training method, thereby enabling edge learning. Our optimized hardware/software co-design for FPTT represents a novel contribution in this domain. This research demonstrates the viability of FPTT for fine-tuning real-world applications by implementing a complete computational framework for training Long Short-Term Memory (LSTM) networks utilizing FPTT. Moreover, this work incorporates the optimization and exploration of a scalable digital hardware architecture using an open-source hardware-design framework named Chipyard, and its implementation on a Field-Programmable Gate Array (FPGA) for cycle-accurate verification. The empirical results demonstrate that partitioned training on the proposed architecture enables an 8.2-fold reduction in memory usage with only a 0.2× increase in latency for small-batch sequential MNIST (S-MNIST) compared to traditional non-partitioned training. Full article
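The memory saving comes from training on partitions of a sequence rather than the fully unrolled sequence. The sketch below shows plain sequence partitioning with detached hidden-state carry-over in PyTorch; it omits FPTT's regularization toward running parameter averages, so it illustrates only the partitioning idea, not the FPTT algorithm or the paper's hardware.

```python
# Illustration of sequence-partitioned LSTM training: the sequence is split
# into chunks, hidden state is carried across chunks but detached so that
# backpropagation (and its memory) stays within one partition. This omits
# FPTT's regularization term and is not the paper's hardware implementation.
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, n_parts, batch, in_dim, hid_dim = 64, 4, 8, 16, 32
model = nn.LSTM(in_dim, hid_dim, batch_first=True)
head = nn.Linear(hid_dim, 10)
opt = torch.optim.Adam(list(model.parameters()) + list(head.parameters()), lr=1e-3)

x = torch.randn(batch, seq_len, in_dim)
y = torch.randint(0, 10, (batch,))

state = None
for chunk in x.chunk(n_parts, dim=1):          # partitions of length seq_len / n_parts
    out, state = model(chunk, state)
    state = tuple(s.detach() for s in state)   # cut the graph between partitions
    loss = nn.functional.cross_entropy(head(out[:, -1]), y)
    opt.zero_grad()
    loss.backward()                            # gradients only span this partition
    opt.step()
print("finished one partitioned pass, final loss:", loss.item())
```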

23 pages, 13406 KiB  
Article
Object Detection Post Processing Accelerator Based on Co-Design of Hardware and Software
by Dengtian Yang, Lan Chen, Xiaoran Hao and Yiheng Zhang
Information 2025, 16(1), 63; https://doi.org/10.3390/info16010063 - 17 Jan 2025
Viewed by 1539
Abstract
Deep learning significantly advances object detection. Post processes, a critical component of this process, select valid bounding boxes to represent the true targets during inference and assign boxes and labels to these objects during training to optimize the loss function. However, post processes constitute a substantial portion of the total processing time for a single image. This inefficiency primarily arises from the extensive Intersection over Union (IoU) calculations required between numerous redundant bounding boxes in post processing algorithms. To reduce these redundant IoU calculations, we introduce a classification prioritization strategy during both training and inference post processes. Additionally, post processes involve sorting operations that contribute to their inefficiency. To minimize unnecessary comparisons in Top-K sorting, we have improved the bitonic sorter by developing a hybrid bitonic algorithm. These improvements have effectively accelerated the post processing. Given the similarities between the training and inference post processes, we unify four typical post processing algorithms and design a hardware accelerator based on this framework. Our accelerator achieves at least 7.55 times the speed in inference post processing compared to that of recent accelerators. When compared to the RTX 2080 Ti system, our proposed accelerator offers at least 21.93 times the speed for the training post process and 19.89 times for the inference post process, thereby significantly enhancing the efficiency of loss function minimization. Full article
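The bottleneck the accelerator attacks is the IoU-heavy suppression step. Below is a minimal numpy sketch of IoU plus a non-maximum suppression that filters by classification score before computing any IoU, which is the general spirit of prioritizing classification results; thresholds and boxes are illustrative, and this is not the paper's unified hardware algorithm.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, boxes as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)

def nms(boxes, scores, score_thr=0.3, iou_thr=0.5):
    """Filter by classification score first, then greedy IoU suppression."""
    keep_mask = scores >= score_thr            # cheap score test before any IoU
    boxes, scores = boxes[keep_mask], scores[keep_mask]
    order = np.argsort(-scores)
    kept = []
    while order.size:
        i = order[0]
        kept.append(i)
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) < iou_thr]
    return boxes[kept], scores[kept]

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30], [0, 0, 5, 5]], float)
scores = np.array([0.9, 0.85, 0.8, 0.2])
kept_boxes, kept_scores = nms(boxes, scores)
print(kept_boxes, kept_scores)   # the 0.2 box is dropped before IoU; one overlap suppressed
```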

20 pages, 7167 KiB  
Article
Accelerating Deep Learning-Based Morphological Biometric Recognition with Field-Programmable Gate Arrays
by Nourhan Zayed, Nahed Tawfik, Mervat M. A. Mahmoud, Ahmed Fawzy, Young-Im Cho and Mohamed S. Abdallah
AI 2025, 6(1), 8; https://doi.org/10.3390/ai6010008 - 9 Jan 2025
Viewed by 1847
Abstract
Convolutional neural networks (CNNs) are increasingly recognized as an important and potent artificial intelligence approach, widely employed in many computer vision applications, such as facial recognition. Their importance resides in their capacity to acquire hierarchical features, which is essential for recognizing complex patterns. Nevertheless, the intricate architectural design of CNNs leads to significant computing requirements. To tackle these issues, it is essential to construct a system based on field-programmable gate arrays (FPGAs) to speed up CNNs. FPGAs provide fast development capabilities, energy efficiency, decreased latency, and advanced reconfigurability. A facial recognition solution that leverages deep learning and is subsequently deployed on an FPGA platform is proposed. The system detects whether a person has the necessary authorization to enter/access a place. The FPGA runs this system with utmost security and without any internet connectivity. Various facial recognition networks are evaluated, including the AlexNet, ResNet, and VGG-16 networks. The findings show that the GoogLeNet network is the best fit due to its lower computational resource requirements, speed, and accuracy. The system was deployed on three hardware kits to appraise the performance of different programming approaches in terms of accuracy, latency, cost, and power consumption. The software programming on the Raspberry Pi-3B kit had a recognition accuracy of around 70–75% and relied on a stable internet connection for processing. This dependency on internet connectivity increases bandwidth consumption and fails to meet the required security criteria, unlike the hardware programming on the ZYBO-Z7 board. In contrast, the hardware/software co-design on the PYNQ-Z2 board achieved an accuracy rate of 85% to 87%; it operates independently of an internet connection, making it a standalone system and saving costs. Full article
(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)

73 pages, 3621 KiB  
Review
Hardware Design and Verification with Large Language Models: A Scoping Review, Challenges, and Open Issues
by Meisam Abdollahi, Seyedeh Faegheh Yeganli, Mohammad (Amir) Baharloo and Amirali Baniasadi
Electronics 2025, 14(1), 120; https://doi.org/10.3390/electronics14010120 - 30 Dec 2024
Cited by 1 | Viewed by 7662
Abstract
Background: Large Language Models (LLMs) are emerging as promising tools in hardware design and verification, with recent advancements suggesting they could fundamentally reshape conventional practices. Objective: This study examines the significance of LLMs in shaping the future of hardware design and verification. It offers an extensive literature review, addresses key challenges, and highlights open research questions in this field. Design: In this scoping review, we survey over 360 papers, most of them published between 2022 and 2024, including 71 directly relevant to the topic, to evaluate the current role of LLMs in advancing automation, optimization, and innovation in hardware design and verification workflows. Results: Our review highlights LLM applications across synthesis, simulation, and formal verification, emphasizing their potential to streamline development processes while upholding high standards of accuracy and performance. We identify critical challenges, such as scalability, model interpretability, and the alignment of LLMs with domain-specific languages and methodologies. Furthermore, we discuss open issues, including the necessity for tailored model fine-tuning, integration with existing Electronic Design Automation (EDA) tools, and effective handling of complex data structures typical of hardware projects. Conclusions: This survey not only consolidates existing knowledge but also outlines prospective research directions, underscoring the transformative role LLMs could play in the future of hardware design and verification. Full article
(This article belongs to the Special Issue Machine Learning in Network-on-Chip Architectures)

18 pages, 3376 KiB  
Article
Heterogeneous Edge Computing for Molecular Property Prediction with Graph Convolutional Networks
by Mahdieh Grailoo and Jose Nunez-Yanez
Electronics 2025, 14(1), 101; https://doi.org/10.3390/electronics14010101 - 30 Dec 2024
Cited by 2 | Viewed by 986
Abstract
Graph-based neural networks have proven to be useful in molecular property prediction, a critical component of computer-aided drug discovery. In this application, in response to the growing demand for improved computational efficiency and localized edge processing, this paper introduces a novel approach that leverages specialized accelerators on a heterogeneous edge computing platform. Our focus is on graph convolutional networks, a leading graph-based neural network variant that integrates graph convolution layers with multi-layer perceptrons. Molecular graphs are typically characterized by a low number of nodes, leading to low-dimensional dense matrix multiplications within multi-layer perceptrons—conditions that are particularly well-suited for Edge TPUs. These TPUs feature a systolic array of multiply–accumulate units optimized for dense matrix operations. Furthermore, the inherent sparsity in molecular graph adjacency matrices offers additional opportunities for computational optimization. To capitalize on this, we developed an FPGA GFADES accelerator, using high-level synthesis, specifically tailored to efficiently manage the sparsity in both the graph structure and node features. Our hardware/software co-designed GCN+MLP architecture delivers performance improvements, achieving up to 58× increased speed compared to conventional software implementations. This architecture is implemented using the Pynq framework and TensorFlow Lite Runtime, running on a multi-core ARM CPU within an AMD/Xilinx Zynq Ultrascale+ device, in combination with the Edge TPU and programmable logic. Full article
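A minimal sketch of the GCN+MLP computation pattern described above: a sparse normalized adjacency multiplies the node features (the sparsity GFADES exploits), followed by dense feature and weight products (the part suited to the Edge TPU), using scipy/numpy with random data rather than the authors' accelerator code.

```python
import numpy as np
import scipy.sparse as sp

def gcn_layer(adj, x, w):
    """One graph-convolution layer: symmetrically normalized A_hat @ X @ W, then ReLU."""
    a_hat = adj + sp.eye(adj.shape[0])                 # add self-loops
    deg = np.asarray(a_hat.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(deg))
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt           # sparse adjacency work
    return np.maximum(a_norm @ (x @ w), 0.0)           # dense X @ W, then sparse product

rng = np.random.default_rng(0)
n_atoms, in_feats, hidden, out_dim = 20, 16, 32, 1
adj = sp.random(n_atoms, n_atoms, density=0.2, random_state=0, format="csr")
adj = ((adj + adj.T) > 0).astype(float)                # symmetric molecular graph
x = rng.standard_normal((n_atoms, in_feats))

h = gcn_layer(adj, x, rng.standard_normal((in_feats, hidden)))     # graph convolution
graph_repr = h.mean(axis=0)                                        # readout over atoms
w1, w2 = rng.standard_normal((hidden, hidden)), rng.standard_normal((hidden, out_dim))
prediction = np.maximum(graph_repr @ w1, 0.0) @ w2                  # MLP head
print("predicted property:", prediction.item())
```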
