Search Results (76)

Search Parameters:
Keywords = FPGA-based edge computing

17 pages, 2212 KB  
Article
A Lightweight Model for Power Quality Disturbance Recognition Targeting Edge Deployment
by Hao Bai, Ruotian Yao, Tong Liu, Ziji Ma, Shangyu Liu, Yiyong Lei and Yawen Zheng
Energies 2026, 19(2), 368; https://doi.org/10.3390/en19020368 - 12 Jan 2026
Viewed by 188
Abstract
To address the dual demands of accuracy and real-time performance in power quality disturbance (PQD) recognition for new power systems, this paper proposes a lightweight model named the Cross-Channel Attention Three-Layer Convolutional Model (1D-CCANet-3), specifically designed for edge deployment. Based on the one-dimensional convolutional neural network (1D-CNN), the model features an ultra-compact architecture with only three convolutional layers and one fully connected layer. By incorporating a set of cross-channel attention (CCA) mechanisms in the final convolutional layer, the model further enhances disturbance recognition accuracy. Compared to other deep learning models, 1D-CCANet-3 significantly reduces computational and storage requirements for edge devices while achieving accurate and efficient PQD recognition. The model demonstrates robust performance in recognizing 10 types of PQD under varying signal-to-noise ratio (SNR) conditions. Furthermore, the model has been successfully deployed on an FPGA platform and exhibits high recognition accuracy and efficiency in real-world data validation. This work provides a feasible and effective solution for accurate and real-time PQD monitoring on edge devices in new power systems.

28 pages, 1828 KB  
Article
Edge Detection on a 2D-Mesh NoC with Systolic Arrays: From FPGA Validation to GDSII Proof-of-Concept
by Emma Mascorro-Guardado, Susana Ortega-Cisneros, Francisco Javier Ibarra-Villegas, Jorge Rivera, Héctor Emmanuel Muñoz-Zapata and Emilio Isaac Baungarten-Leon
Appl. Sci. 2026, 16(2), 702; https://doi.org/10.3390/app16020702 - 9 Jan 2026
Viewed by 147
Abstract
Edge detection is a key building block in real-time image-processing applications such as drone-based infrastructure inspection, autonomous navigation, and remote sensing. However, its computational cost remains a challenge for resource-constrained embedded systems. This work presents a hardware-accelerated edge detection architecture based on a homogeneous 2D-mesh Network-on-Chip (NoC) integrating systolic arrays to efficiently perform the convolution operations required by the Sobel filter. The proposed architecture was first developed and validated as a 3 × 3 mesh prototype on FPGA (Xilinx Zynq-7000, Zynq-7010, XC7Z010-CLG400A, Zybo board, utilizing 26,112 LUTs, 24,851 flip-flops, and 162 DSP blocks), achieving a throughput of 8.8 Gb/s with a power consumption of 0.79 W at 100 MHz. Building upon this validated prototype, a reduced 2 × 2 node cluster with 14-bit word width was subsequently synthesized at the physical level as a proof-of-concept using the OpenLane RTL-to-GDSII open-source flow targeting the SkyWater 130 nm PDK (sky130A). Post-layout analysis confirms the manufacturability of the design, with a total power consumption of 378 mW and compliance with timing constraints, demonstrating the feasibility of mapping the proposed architecture to silicon and its suitability for drone-based infrastructure monitoring applications.
(This article belongs to the Special Issue Advanced Integrated Circuit Design and Applications)

25 pages, 6136 KB  
Article
Design and Implementation of a Decentralized Node-Level Battery Management System Chip Based on Deep Neural Network Algorithms
by Muh-Tian Shiue, Yang-Chieh Ou, Chih-Feng Wu, Yi-Fong Wang and Bing-Jun Liu
Electronics 2026, 15(2), 296; https://doi.org/10.3390/electronics15020296 - 9 Jan 2026
Viewed by 215
Abstract
As Battery Management Systems (BMSs) continue to expand in both scale and capacity, conventional state-of-charge (SOC) estimation methods, such as Coulomb counting and model-based observers, face increasing challenges in meeting the requirements for cell-level precision, scalability, and adaptability under aging and operating variability. To address these limitations, this study integrates a Deep Neural Network (DNN)-based estimation framework into a node-level BMS architecture, enabling edge-side computation at each individual battery cell. The proposed architecture adopts a decentralized node-level structure with distributed parameter synchronization, in which each BMS node independently performs SOC estimation using shared model parameters. Global battery characteristics are learned through offline training and subsequently synchronized to all nodes, ensuring estimation consistency across large battery arrays while avoiding centralized online computation. This design enhances system scalability and deployment flexibility, particularly in high-voltage battery strings with isolated measurement requirements. The proposed DNN framework consists of two identical functional modules: an offline training module and a real-time estimation module. The training module operates on high-performance computing platforms, such as in-vehicle microcontrollers during idle periods or charging-station servers, using historical charge–discharge data to extract and update battery characteristic parameters. These parameters are then transferred to the real-time estimation chip for adaptive SOC inference. The decentralized BMS node chip integrates preprocessing circuits, a momentum-based optimizer, a first-derivative sigmoid unit, and a weight update module. The design is implemented using the TSMC 40 nm CMOS process and verified on a Xilinx Virtex-5 FPGA. Experimental results using real BMW i3 battery data demonstrate a Root Mean Square Error (RMSE) of 1.853%, with an estimation error range of [−4.346%, 4.324%].
(This article belongs to the Special Issue New Insights in Power Electronics: Prospects and Challenges)

19 pages, 1559 KB  
Article
FPGA Modular Scalability Framework for Real-Time Noise Reduction in Images
by Ng Boon Khai, Norfadila Mahrom, Rafikha Aliana A. Raof, Teo Sje Yin and Phaklen Ehkan
Computers 2026, 15(1), 13; https://doi.org/10.3390/computers15010013 - 1 Jan 2026
Viewed by 333
Abstract
Image noise degrades image quality in applications such as medical imaging, surveillance, and remote sensing, where real-time processing and high accuracy are critical. Software-based denoising can be flexible, but often suffers from latency and low throughput when deployed on embedded or edge systems. A Field Programmable Gate Array (FPGA)-based system offers parallelism and lower latency, but existing work typically focuses on fixed architectures rather than scalable frameworks supporting multiple filter models. This paper presents a high-performance, resource-efficient FPGA-based framework for real-time noise reduction. The modular, pipelined architecture integrates median and adaptive filters, managed by a state machine-based control unit to enhance processing efficiency. Implemented on a Cyclone V FPGA using Quartus Prime 22.1std, the system provides scalability through adjustable Random Access Memory (RAM) and supports multiple denoising algorithms. Tested on Lena images with salt-and-pepper noise, it processes 10% noise in 1.724 ms in a simulated environment running at 800 MHz; by comparison, a Python 3.11.2 implementation using the OpenCV library (version 4.8.0.76) on a general-purpose Central Processing Unit (CPU) took 0.0201 ms. The proposed solution demonstrates low latency and high throughput, making it well-suited for embedded and edge computing applications.
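As a rough illustration of the median filtering this abstract mentions, the sketch below applies a 3 × 3 median to a small grayscale image containing salt-and-pepper outliers. It is a plain software sketch, not the paper's pipelined FPGA design; the function name `median_filter_3x3`, the choice to leave border pixels unchanged, and the toy 5 × 5 image are all assumptions made for the example.

```python
def median_filter_3x3(image):
    """Return a copy of a 2D grayscale image with a 3x3 median applied.

    Border pixels are copied unchanged (an assumption of this sketch).
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Gather the 9 pixels of the 3x3 window centred on (y, x).
            window = [image[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            # The median of 9 values is the 5th smallest.
            out[y][x] = sorted(window)[4]
    return out


# Toy image: uniform background with one "salt" and one "pepper" pixel.
noisy = [[100] * 5 for _ in range(5)]
noisy[2][2] = 255  # salt
noisy[1][3] = 0    # pepper
cleaned = median_filter_3x3(noisy)
```

Because every 3 × 3 window here contains at most two outliers out of nine pixels, the median restores the background value at both corrupted positions.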

17 pages, 558 KB  
Article
FPGA-Accelerated Multi-Resolution Spline Reconstruction for Real-Time Multimedia Signal Processing
by Manuel J. C. S. Reis
Electronics 2026, 15(1), 173; https://doi.org/10.3390/electronics15010173 - 30 Dec 2025
Viewed by 312
Abstract
This paper presents an FPGA-based architecture for real-time spline-based signal reconstruction, targeted at multimedia signal processing applications. Leveraging the multi-resolution properties of B-splines, the proposed design enables efficient upsampling, denoising, and feature preservation for image and video signals. Implemented on a mid-range FPGA, the system supports parallel processing of multiple channels, with low-latency memory access and pipelined arithmetic units. The proposed pipeline achieves a throughput of up to 33.1 megasamples per second for 1D signals and 19.4 megapixels per second for 2D images, while maintaining average power consumption below 250 mW. Compared to CPU and embedded GPU implementations, the design delivers >15× improvement in energy efficiency and deterministic low-latency performance (8–12 clock cycles). A key novelty lies in combining multi-resolution B-spline reconstruction with fixed-point arithmetic and streaming-friendly pipelining, making the architecture modular, compact, and robust to varying input rates. Benchmarking results on synthetic and real multimedia datasets show significant improvements in throughput and energy efficiency compared to conventional CPU and GPU implementations. The architecture supports flexible resolution scaling, making it suitable for edge-computing scenarios in multimedia environments.
(This article belongs to the Special Issue Digital Signal and Image Processing for Multimedia Technology)

25 pages, 7245 KB  
Article
A Hardware-Friendly Joint Denoising and Demosaicing System Based on Efficient FPGA Implementation
by Jiqing Wang, Xiang Wang and Yu Shen
Micromachines 2026, 17(1), 44; https://doi.org/10.3390/mi17010044 - 29 Dec 2025
Viewed by 316
Abstract
This paper designs a hardware-implementable joint denoising and demosaicing acceleration system. Firstly, a lightweight network architecture with multi-scale feature extraction based on partial convolution is proposed at the algorithm level. The partial convolution scheme can reduce the redundancy of filters and feature maps, thereby reducing memory accesses, and achieve excellent visual effects with a smaller model complexity. In addition, multi-scale extraction can expand the receptive field while reducing model parameters. Then, we apply separable convolution and partial convolution to reduce the parameters of the model. Compared with the standard convolutional solution, the parameters and MACs are reduced by 83.38% and 77.71%, respectively. Moreover, different networks bring different memory access and complex computing methods; thus, we introduce a unified and flexibly configurable hardware acceleration processing platform and implement it on the Xilinx Zynq UltraScale+ FPGA board. Finally, compared with the state-of-the-art neural network solution on the Kodak24 set, the peak signal-to-noise ratio and the structural similarity index measure are improved by approximately 2.36 dB and 0.0806, respectively, and the computing efficiency is improved by 2.09×. Furthermore, the hardware architecture supports multi-parallelism and can adapt to different edge-embedded scenarios. Overall, the image processing task solution proposed in this paper has positive advantages in the joint denoising and demosaicing system.
(This article belongs to the Special Issue Advances in Field-Programmable Gate Arrays (FPGAs))

13 pages, 1258 KB  
Article
A Binary Convolution Accelerator Based on Compute-in-Memory
by Wenpeng Cui, Zhe Zheng, Pan Li, Ming Li, Yu Liu and Yingying Chi
Electronics 2026, 15(1), 117; https://doi.org/10.3390/electronics15010117 - 25 Dec 2025
Viewed by 322
Abstract
As AI workloads move to edge devices, the von Neumann architecture is hindered by memory- and power-wall limitations. We present an SRAM-based compute-in-memory binary convolution accelerator that stores and transports only 1-bit weights and activations, maps MACs to bitwise XNOR–popcount, and fuses BatchNorm, HardTanh, and binarization into a single affine-and-threshold unit. Residual paths are handled by in-accumulator summation to minimize data movement. FPGA validation shows 87.6% CIFAR-10 accuracy consistent with a bit-accurate software reference, a compute-only latency of 2.93 ms per 32 × 32 image at 50 MHz, sustained at only 1.52 W. These results demonstrate an efficient and practical path to deploying edge models under tight power and memory budgets.
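The XNOR–popcount mapping mentioned in this abstract can be illustrated with a small software sketch. For two vectors with entries in {+1, −1}, encoded as bits (1 for +1, 0 for −1), the dot product equals 2 × popcount(XNOR(a, w)) − n. The helper names `pack_bits` and `xnor_popcount_dot` and the 8-element vectors are hypothetical and chosen only for this example; the paper's in-memory circuit is not reproduced here.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an int; bit i is 1 when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, w_bits, n):
    """Dot product of two length-n {+1, -1} vectors given their packed bits."""
    mask = (1 << n) - 1
    # XNOR is 1 wherever the two signs agree; mask off bits beyond length n.
    matches = bin(~(a_bits ^ w_bits) & mask).count("1")
    # Each agreement contributes +1 and each disagreement -1.
    return 2 * matches - n


activations = [1, -1, 1, 1, -1, -1, 1, -1]
weights = [1, 1, -1, 1, -1, 1, 1, -1]

dot = xnor_popcount_dot(pack_bits(activations), pack_bits(weights), len(weights))
reference = sum(a * w for a, w in zip(activations, weights))
```

Both `dot` and `reference` compute the same value, which is why a binary multiply-accumulate can be replaced by bitwise logic plus a bit count.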

17 pages, 9727 KB  
Article
An Energy-Efficient Neuromorphic Processor Using Unified Refractory Control-Based NoC for Edge AI
by Su-Hwan Na and Dong-Sun Kim
Electronics 2025, 14(24), 4959; https://doi.org/10.3390/electronics14244959 - 17 Dec 2025
Viewed by 416
Abstract
Neuromorphic computing has emerged as a promising paradigm for edge AI systems owing to its event-driven operation and high energy efficiency. However, conventional spiking neural network (SNN) architectures often suffer from redundant computation and inefficient power control, particularly during on-chip learning. This paper proposes a network-on-chip (NoC) architecture featuring a unified refractory-enabled neuron (UREN)-based router that globally coordinates spike-driven computation across multiple neuron cores. The router applies a unified refractory time to all neurons following a winner spike event, effectively enabling clock gating and suppressing redundant activity. The proposed design adopts a star-routing topology with multicasting support and integrates nearest-neighbor spike-timing-dependent plasticity (STDP) for local online learning. FPGA-based experiments demonstrate a 30% reduction in computation and 86.1% online classification accuracy on the MNIST dataset compared with baseline SNN implementations. These results confirm that the UREN-based router provides a scalable and power-efficient neuromorphic processor architecture, well suited for energy-constrained edge AI applications.

30 pages, 10600 KB  
Article
Edge-to-Cloud Continuum Orchestrator Based on Heterogeneous Nodes for Urban Traffic Monitoring
by Pietro Ruiu, Andrea Lagorio, Claudio Rubattu, Matteo Anedda, Michele Sanna and Mauro Fadda
Future Internet 2025, 17(12), 574; https://doi.org/10.3390/fi17120574 - 13 Dec 2025
Viewed by 514
Abstract
This paper presents an edge-to-cloud orchestrator capable of supporting services running at the edge on heterogeneous nodes based on general-purpose processing units and a Field Programmable Gate Array (FPGA) platform (i.e., the AMD Kria K26 SoM) in an urban environment, integrated with a series of cloud-based services and capable of minimizing energy consumption. A use case of vehicle traffic monitoring is considered in a mobility scenario involving computing nodes equipped with video acquisition systems to evaluate the feasibility of the system. Since the use case concerns the monitoring of vehicular traffic by AI-based image and video processing, specific support for application orchestration in the form of containers was required. The development concerned the feasibility of managing containers with hardware acceleration derived from the Vitis AI design flow, leveraged to accelerate AI inference on the AMD Kria K26 SoM. A Kubernetes-based controller node was designed to facilitate the tracking and monitoring of specific vehicles. These vehicles may either be flagged by law enforcement authorities due to legal concerns or identified by the system itself through detection mechanisms deployed in computing nodes. Strategically distributed across the city, these nodes continuously analyze traffic, identifying vehicles that match the search criteria. Using containerized microservices and Kubernetes orchestration, the infrastructure ensures that tracking operations remain uninterrupted even in high-traffic scenarios.
(This article belongs to the Special Issue Convergence of IoT, Edge and Cloud Systems)

19 pages, 3327 KB  
Article
Design and Research of High-Energy-Efficiency Underwater Acoustic Target Recognition System
by Ao Ma, Wenhao Yang, Pei Tan, Yinghao Lei, Liqin Zhu, Bingyao Peng and Ding Ding
Electronics 2025, 14(19), 3770; https://doi.org/10.3390/electronics14193770 - 24 Sep 2025
Viewed by 827
Abstract
Recently, with the rapid development of underwater resource exploration and underwater activities, underwater acoustic (UA) target recognition has become crucial in marine resource exploration. However, traditional underwater acoustic recognition systems face challenges such as low energy efficiency, poor accuracy, and slow response times. Systems for UA target recognition using deep learning networks have garnered widespread attention. Convolutional neural networks (CNNs) consume significant computational resources and energy during convolution operations, which exacerbates the issues of energy consumption and complicates edge deployment. This paper explores a high-energy-efficiency UA target recognition system. Based on the DenseNet CNN, the system uses fine-grained pruning for sparsification and sparse convolution computations. The UA target recognition CNN was deployed on FPGAs and chips to achieve low-power recognition. Using the noise-disturbed ShipsEar dataset, the system reaches a recognition accuracy of 98.73% at 0 dB signal-to-noise ratio (SNR). After 50% fine-grained pruning, the accuracy is 96.11%. The circuit prototype on FPGA shows that the circuit achieves an accuracy of 95% at 0 dB SNR. This work implements the circuit design and layout of the UA target recognition chip based on a 65 nm CMOS process. DC synthesis results show that the power consumption is 90.82 mW, and the single-target recognition time is 7.81 ns.
(This article belongs to the Special Issue Digital Intelligence Technology and Applications)

16 pages, 2270 KB  
Article
Performance Evaluation of FPGA, GPU, and CPU in FIR Filter Implementation for Semiconductor-Based Systems
by Muhammet Arucu and Teodor Iliev
J. Low Power Electron. Appl. 2025, 15(3), 40; https://doi.org/10.3390/jlpea15030040 - 21 Jul 2025
Viewed by 3282
Abstract
This study presents a comprehensive performance evaluation of field-programmable gate array (FPGA), graphics processing unit (GPU), and central processing unit (CPU) platforms for implementing finite impulse response (FIR) filters in semiconductor-based digital signal processing (DSP) systems. Utilizing a standardized FIR filter designed with the Kaiser window method, we compare computational efficiency, latency, and energy consumption across the ZYNQ XC7Z020 FPGA, Tesla K80 GPU, and Arm-based CPU, achieving processing times of 0.004 s, 0.008 s, and 0.107 s, respectively, with FPGA power consumption of 1.431 W and comparable energy profiles for GPU and CPU. The FPGA is 27 times faster than the CPU and 2 times faster than the GPU, demonstrating its suitability for low-latency DSP tasks. A detailed analysis of resource utilization and scalability underscores the FPGA’s reconfigurability for optimized DSP implementations. This work provides novel insights into platform-specific optimizations, addressing the demand for energy-efficient solutions in edge computing and IoT applications, with implications for advancing sustainable DSP architectures.
(This article belongs to the Topic Advanced Integrated Circuit Design and Application)
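To make the Kaiser-window FIR design named in the abstract above more concrete, here is a minimal pure-Python sketch of a windowed-sinc low-pass design followed by direct-form filtering. The tap count (21), normalized cutoff (0.1 of the sampling rate), and beta (5.0) are illustrative assumptions and are not taken from the paper; the function names are hypothetical.

```python
import math


def bessel_i0(x, terms=25):
    """Zeroth-order modified Bessel function via its power series."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / 2.0) / k      # term is now (x/2)^k / k!
        total += term * term
    return total


def kaiser_lowpass(num_taps, cutoff, beta):
    """Low-pass FIR taps; cutoff is a fraction of the sampling rate (0 to 0.5).

    num_taps is assumed odd so the filter has a single centre tap.
    """
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        t = n - mid
        # Ideal (sinc) low-pass impulse response.
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        # Kaiser window value at this tap.
        r = t / mid
        window = bessel_i0(beta * math.sqrt(max(0.0, 1.0 - r * r))) / bessel_i0(beta)
        taps.append(ideal * window)
    gain = sum(taps)
    return [c / gain for c in taps]  # normalise to unit DC gain


def fir_filter(signal, taps):
    """Direct-form FIR: y[n] = sum_k taps[k] * x[n - k], with zero initial state."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, c in enumerate(taps):
            if n - k >= 0:
                acc += c * signal[n - k]
        out.append(acc)
    return out


taps = kaiser_lowpass(num_taps=21, cutoff=0.1, beta=5.0)
settled = fir_filter([1.0] * 40, taps)
```

The inner multiply-accumulate loop in `fir_filter` is the part that an FPGA can unroll into parallel multipliers, which is one common reason such filters map well to hardware.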

35 pages, 8431 KB  
Article
Integrating Physical Unclonable Functions with Machine Learning for the Authentication of Edge Devices in IoT Networks
by Abdul Manan Sheikh, Md. Rafiqul Islam, Mohamed Hadi Habaebi, Suriza Ahmad Zabidi, Athaur Rahman Bin Najeeb and Adnan Kabbani
Future Internet 2025, 17(7), 275; https://doi.org/10.3390/fi17070275 - 21 Jun 2025
Cited by 2 | Viewed by 1483
Abstract
Edge computing (EC) faces unique security threats due to its distributed architecture, resource-constrained devices, and diverse applications, making it vulnerable to data breaches, malware infiltration, and device compromise. The mitigation strategies against EC data security threats include encryption, secure authentication, regular updates, tamper-resistant hardware, and lightweight security protocols. Physical Unclonable Functions (PUFs) are digital fingerprints for device authentication that enhance interconnected devices’ security due to their cryptographic characteristics. PUFs produce output responses against challenge inputs based on the physical structure and intrinsic manufacturing variations of an integrated circuit (IC). These challenge-response pairs (CRPs) enable secure and reliable device authentication. Our work implements the Arbiter PUF (APUF) on Altera Cyclone IV FPGAs installed on the ALINX AX4010 board. The proposed APUF has achieved performance metrics of 49.28% uniqueness, 38.6% uniformity, and 89.19% reliability. The robustness of the proposed APUF against machine learning (ML)-based modeling attacks is tested using supervised Support Vector Machines (SVMs), logistic regression (LR), and an ensemble of gradient boosting (GB) models. These ML models were trained over more than 19K CRPs, achieving prediction accuracies of 61.1%, 63.5%, and 63%, respectively, thus cementing the resiliency of the device against modeling attacks. However, the proposed APUF exhibited its vulnerability to Multi-Layer Perceptron (MLP) and random forest (RF) modeling attacks, with 95.4% and 95.9% prediction accuracies, gaining successful authentication. APUFs are well-suited for device authentication due to their lightweight design and can produce a vast number of challenge-response pairs (CRPs), even in environments with limited resources. Our findings confirm that our approach effectively resists widely recognized attack methods to model PUFs.
(This article belongs to the Special Issue Distributed Machine Learning and Federated Edge Computing for IoT)
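The uniqueness, uniformity, and reliability figures quoted in the abstract above are typically computed from Hamming distances between response bit strings. The sketch below uses the commonly cited definitions (uniformity as the percentage of 1 bits, uniqueness as the mean pairwise inter-device Hamming distance, reliability as 100% minus the mean intra-device Hamming distance); whether the paper uses exactly these formulas is an assumption, and the 8-bit responses are made-up toy data.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def uniformity(response):
    """Percentage of 1 bits in a single response (ideal: 50%)."""
    return 100.0 * sum(response) / len(response)


def uniqueness(responses):
    """Mean pairwise inter-device Hamming distance in percent (ideal: 50%)."""
    pairs = list(combinations(responses, 2))
    n_bits = len(responses[0])
    total = sum(hamming(a, b) for a, b in pairs)
    return 100.0 * total / (len(pairs) * n_bits)


def reliability(reference, repeats):
    """100% minus the mean intra-device Hamming distance (ideal: 100%)."""
    n_bits = len(reference)
    intra = sum(hamming(reference, r) for r in repeats) / (len(repeats) * n_bits)
    return 100.0 * (1.0 - intra)


# Toy responses of three hypothetical devices to the same challenge set.
chip_a = [1, 0, 1, 0, 1, 0, 1, 0]
chip_b = [1, 1, 0, 0, 1, 1, 0, 0]
chip_c = [0, 0, 1, 1, 0, 0, 1, 1]

# Two repeated measurements of chip_a, the second with one flipped bit.
remeasured = [chip_a, [1, 0, 1, 0, 1, 0, 1, 1]]

u_form = uniformity(chip_a)
u_niq = uniqueness([chip_a, chip_b, chip_c])
rel = reliability(chip_a, remeasured)
```

Under these definitions, a uniformity well below 50% indicates responses biased toward 0, which is one way to read the reported 38.6% figure.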

20 pages, 7927 KB  
Article
Efficient License Plate Alignment and Recognition Using FPGA-Based Edge Computing
by Chao-Hsiang Hsiao, Hoi Lee, Yin-Tien Wang and Min-Jie Hsu
Electronics 2025, 14(12), 2475; https://doi.org/10.3390/electronics14122475 - 18 Jun 2025
Cited by 2 | Viewed by 2451
Abstract
Efficient and accurate license plate recognition (LPR) in unconstrained environments remains a critical challenge, particularly when confronted with skewed imaging angles and the limited computational capabilities of edge devices. In this study, we propose a high-performance, FPGA-based license plate alignment and recognition (LPAR) system to address these issues. Our LPAR system integrates lightweight deep learning models, including YOLOv4-tiny for license plate detection, a refined convolutional pose machine (CPM) for pose estimation and alignment, and a modified LPRNet for character recognition. By restructuring the pose estimation and alignment architectures to enhance the geometric correction of license plates and adding channel and spatial attention mechanisms to LPRNet for better character recognition, the proposed LPAR system improves recognition accuracy from 88.33% to 95.00%. The complete pipeline achieved a processing speed of 2.00 frames per second (FPS) on a resource-constrained FPGA platform, demonstrating its practical viability for real-time deployment in edge computing scenarios.

19 pages, 2339 KB  
Article
Parallel Processing of Sobel Edge Detection on FPGA: Enhancing Real-Time Image Analysis
by Sanmugasundaram Ravichandran, Hui-Kai Su, Wen-Kai Kuo, Dileepan Dhanasekaran, Manikandan Mahalingam and Jui-Pin Yang
Sensors 2025, 25(12), 3649; https://doi.org/10.3390/s25123649 - 11 Jun 2025
Cited by 3 | Viewed by 2962
Abstract
Detecting object boundaries and significant features within an image is one of the most important processes in image processing and computer vision. In applications such as autonomous vehicles, surveillance systems, and medical imaging, real-time processing has become increasingly important, which requires hardware accelerators. In this paper, an improved Sobel edge detection algorithm is implemented in Verilog on an FPGA to perform real-time image processing, specifically for RGB images. The proposed design applies the horizontal and vertical Sobel kernels in parallel to compute gradient magnitudes for 1028 × 720 RGB images over 3 × 3 pixel windows. The work focuses on reducing algorithmic complexity by using an eight-directional approach, and parallel processing reduces architectural utilization.
(This article belongs to the Section Sensor Networks)
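The horizontal and vertical Sobel kernels referred to in the abstract above can be sketched in software as follows. This is a simplified grayscale, two-kernel version and not the paper's RGB, eight-directional Verilog design; the |Gx| + |Gy| magnitude approximation, the clamp to 255, the zeroed border, and the toy 4 × 6 image are assumptions chosen for the example.

```python
# Standard 3x3 Sobel kernels for horizontal (GX) and vertical (GY) gradients.
GX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]
GY = [[-1, -2, -1],
      [0, 0, 0],
      [1, 2, 1]]


def sobel_magnitude(gray):
    """Approximate gradient magnitude |Gx| + |Gy| for a 2D grayscale image.

    Border pixels are left at 0; results are clamped to the 8-bit range.
    """
    height, width = len(gray), len(gray[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            # Both kernels read the same 3x3 window, so hardware can
            # evaluate them side by side on each window.
            for dy in range(3):
                for dx in range(3):
                    pixel = gray[y + dy - 1][x + dx - 1]
                    gx += GX[dy][dx] * pixel
                    gy += GY[dy][dx] * pixel
            out[y][x] = min(255, abs(gx) + abs(gy))
    return out


# Toy image with a vertical step edge between columns 2 and 3.
image = [[0, 0, 0, 200, 200, 200] for _ in range(4)]
edges = sobel_magnitude(image)
```

In this toy image the response is saturated at the two columns adjacent to the step and zero in the flat regions on either side.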

12 pages, 345 KB  
Article
NeuroAdaptiveNet: A Reconfigurable FPGA-Based Neural Network System with Dynamic Model Selection
by Achraf El Bouazzaoui, Omar Mouhib and Abdelkader Hadjoudja
Chips 2025, 4(2), 24; https://doi.org/10.3390/chips4020024 - 8 May 2025
Cited by 1 | Viewed by 1525
Abstract
This paper presents NeuroAdaptiveNet, an FPGA-based neural network framework that dynamically self-adjusts its architectural configurations in real time to maximize performance across diverse datasets. The core innovation is a Dynamic Classifier Selection mechanism, which harnesses the k-Nearest Centroid algorithm to identify the most competent neural network model for each incoming data sample. By adaptively selecting the most suitable model configuration, NeuroAdaptiveNet achieves significantly improved classification accuracy and optimized resource usage compared to conventional, statically configured neural networks. Experimental results on four datasets demonstrate that NeuroAdaptiveNet can reduce FPGA resource utilization by as much as 52.85%, increase classification accuracy by 4.31%, and lower power consumption by up to 24.5%. These gains illustrate the clear advantage of real-time, per-input reconfiguration over static designs. These advantages are particularly crucial for edge computing and embedded applications, where computational constraints and energy efficiency are paramount. The ability of NeuroAdaptiveNet to tailor its neural network parameters and architecture on a per-input basis paves the way for more efficient and accurate AI solutions in resource-constrained environments.