Search Results (306)

Search Parameters:
Keywords = System-on-Chip (SoC)

20 pages, 999 KiB  
Article
Efficient Real-Time Isotope Identification on SoC FPGA
by Katherine Guerrero-Morejón, José María Hinojo-Montero, Jorge Jiménez-Sánchez, Cristian Rocha-Jácome, Ramón González-Carvajal and Fernando Muñoz-Chavero
Sensors 2025, 25(12), 3758; https://doi.org/10.3390/s25123758 - 16 Jun 2025
Viewed by 879
Abstract
Efficient real-time isotope identification is a critical challenge in nuclear spectroscopy, with important applications such as radiation monitoring, nuclear waste management, and medical imaging. This work presents a novel approach for isotope classification using a System-on-Chip FPGA, integrating hardware-accelerated principal component analysis (PCA) for feature extraction and a software-based random forest classifier. The system leverages the FPGA’s parallel processing capabilities to implement PCA, reducing the dimensionality of digitized nuclear signals and optimizing computational efficiency. A key feature of the design is its ability to perform real-time classification without storing ADC samples, directly processing nuclear pulse data as it is acquired. The extracted features are classified by a random forest model running on the embedded microprocessor. PCA quantization is applied to minimize power consumption and resource usage without compromising accuracy. The experimental validation was conducted using datasets from high-resolution pulse-shape digitization, including closely matched isotope pairs (¹²C/¹³C, ³⁶Ar/⁴⁰Ar, and ⁸⁰Kr/⁸⁴Kr). The results demonstrate that the proposed SoC FPGA system significantly outperforms conventional software-only implementations, reducing latency while maintaining classification accuracy above 98%. This study provides a scalable, precise, and energy-efficient solution for real-time isotope identification.
(This article belongs to the Section Internet of Things)
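As a rough illustration of the feature-extraction stage described above, the leading principal component of a set of digitized pulses can be found by power iteration. This is a plain-Python sketch under assumed names; the paper's PCA is a quantized, streaming FPGA implementation, not this code:

```python
def top_component(X, iters=200):
    """Leading principal component of the rows of X via power iteration.
    Illustrative only: the paper implements quantized PCA in FPGA fabric."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]   # center data
    v = [1.0] * d
    for _ in range(iters):
        proj = [sum(x[j] * v[j] for j in range(d)) for x in Xc]            # Xc v
        w = [sum(proj[i] * Xc[i][j] for i in range(n)) for j in range(d)]  # Xc^T (Xc v)
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    scores = [sum(x[j] * v[j] for j in range(d)) for x in Xc]  # 1-D features
    return v, scores
```

In the paper, the resulting low-dimensional features feed the random forest classifier running on the embedded processor.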

22 pages, 2918 KiB  
Article
Design and Development of a Low-Power IoT System for Continuous Temperature Monitoring
by Luis Miguel Pires, João Figueiredo, Ricardo Martins, João Nascimento and José Martins
Designs 2025, 9(3), 73; https://doi.org/10.3390/designs9030073 - 12 Jun 2025
Viewed by 949
Abstract
This article presents the development of a compact, high-precision, and energy-efficient temperature monitoring system designed for tracking applications where continuous and accurate thermal monitoring is essential. Built around the HY0020 System-on-Chip (SoC), the system integrates two bandgap-based temperature sensors—one internal to the SoC and one external (Si7020-A20)—mounted on a custom PCB and powered by a coin cell battery. A distinctive feature of the system is its support for real-time parameterization of the internal sensor, which enables advanced capabilities such as thermal profiling, cross-validation, and onboard diagnostics. The system was evaluated under both room temperature and refrigeration conditions, demonstrating high accuracy with the internal sensor showing an average error of 0.041 °C and −0.36 °C, respectively, and absolute errors below ±0.5 °C. With an average current draw of just 0.01727 mA, the system achieves an estimated autonomy of 6.6 years on a 1000 mAh battery. Data are transmitted via Bluetooth Low Energy (BLE) to a Raspberry Pi 4 gateway and forwarded to an IoT cloud platform for remote access and analysis. With a total cost of approximately EUR 20 and built entirely from commercially available components, this system offers a scalable and cost-effective solution for a wide range of temperature-sensitive applications. Its combination of precision, long-term autonomy, and advanced diagnostic capabilities makes it suitable for deployment in diverse fields such as supply chain monitoring, environmental sensing, biomedical storage, and smart infrastructure—where reliable, low-maintenance thermal tracking is essential.
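The reported 6.6-year autonomy follows directly from the stated average current draw and battery capacity; a quick arithmetic check (idealized: it ignores self-discharge and temperature effects):

```python
# Battery-life estimate from the figures reported in the abstract.
capacity_mah = 1000.0       # coin cell capacity, mAh
avg_current_ma = 0.01727    # reported average current draw, mA
hours = capacity_mah / avg_current_ma
years = hours / (24 * 365.25)   # roughly 6.6 years
```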

35 pages, 2630 KiB  
Article
AHA: Design and Evaluation of Compute-Intensive Hardware Accelerators for AMD-Xilinx Zynq SoCs Using HLS IP Flow
by David Berrazueta-Mena and Byron Navas
Computers 2025, 14(5), 189; https://doi.org/10.3390/computers14050189 - 13 May 2025
Viewed by 944
Abstract
The increasing complexity of algorithms in embedded applications has amplified the demand for high-performance computing. Heterogeneous embedded systems, particularly FPGA-based systems-on-chip (SoCs), enhance execution speed by integrating hardware accelerator intellectual property (IP) cores. However, traditional low-level IP-core design presents significant challenges. High-level synthesis (HLS) offers a promising alternative, enabling efficient FPGA development through high-level programming languages. Yet, effective methodologies for designing and evaluating heterogeneous FPGA-based SoCs remain crucial. This study surveys HLS tools and design concepts and presents the development of the AHA IP cores, a set of five benchmarking accelerators for rapid Zynq-based SoC evaluation. These accelerators target compute-intensive tasks, including matrix multiplication, Fast Fourier Transform (FFT), Advanced Encryption Standard (AES), Back-Propagation Neural Network (BPNN), and Artificial Neural Network (ANN). We establish a streamlined design flow using AMD-Xilinx tools for rapid prototyping and testing FPGA-based heterogeneous platforms. We outline criteria for selecting algorithms to improve speed and resource efficiency in HLS design. Our performance evaluation across various configurations highlights performance–resource trade-offs and demonstrates that ANN and BPNN achieve significant parallelism, while AES optimization increases resource utilization the most. Matrix multiplication shows strong optimization potential, whereas FFT is constrained by data dependencies.
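Matrix multiplication is singled out above as having strong optimization potential. The loop-tiling structure that HLS directives (pipelining, unrolling, array partitioning) typically target can be sketched in plain Python; the tile size and function name here are illustrative, not from the paper:

```python
def tiled_matmul(A, B, tile=4):
    """Tiled matrix multiply: the loop structure HLS optimization
    directives commonly exploit (illustrative sketch only)."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, tile):          # tile over rows of A
        for kk in range(0, m, tile):      # tile over the shared dimension
            for jj in range(0, p, tile):  # tile over columns of B
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, m)):
                        a = A[i][k]
                        for j in range(jj, min(jj + tile, p)):
                            C[i][j] += a * B[k][j]
    return C
```

In hardware, each tile maps to on-chip buffers so the innermost loops can be pipelined without off-chip memory stalls.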

16 pages, 24435 KiB  
Article
Real-Time Bio-Inspired Polarization Heading Resolution System Based on ZYNQ Heterogeneous Computing
by Yuan Li, Zhuo Liu, Xiaohui Dong and Fangchen Dong
Sensors 2025, 25(9), 2744; https://doi.org/10.3390/s25092744 - 26 Apr 2025
Viewed by 396
Abstract
Polarization navigation is an emerging navigation technology that exhibits significant advantages, including strong anti-interference capability and non-cumulative errors over time, making it highly promising for applications in aerospace, autonomous driving, and robotics. To address the requirements of high integration and low power consumption for tri-directional polarization navigation sensors, this study proposes a system-on-chip (SoC) design solution. The system employs the ZYNQ MPSoC (Xilinx Inc., San Jose, CA, USA) as its core, leveraging hardware acceleration on the Programmable Logic (PL) side for three-angle polarization image data acquisition, image preprocessing, and edge detection. Simultaneously, the Processing System (PS) side orchestrates task coordination, performs polarization angle resolution, and extracts the solar meridian via Hough transform. Experimental results demonstrate that the system achieves an average heading angle output time interval of 9.43 milliseconds (ms) with a mean error of 0.50°, fulfilling the real-time processing demands of mobile devices.
(This article belongs to the Special Issue Optoelectronic Devices and Sensors)
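For a tri-directional polarization sensor, the angle of polarization follows from the standard Stokes-parameter recovery for linear analyzers at 0°, 60°, and 120°. The sketch below shows that textbook computation; the paper's exact channel geometry and resolution pipeline are assumptions here:

```python
import math

def aop_from_three(i0, i60, i120):
    """Angle of polarization (degrees) from intensities behind linear
    analyzers at 0, 60, and 120 degrees (standard Stokes recovery)."""
    s1 = (2.0 / 3.0) * (2.0 * i0 - i60 - i120)   # Stokes Q
    s2 = (2.0 / math.sqrt(3.0)) * (i60 - i120)   # Stokes U
    return 0.5 * math.degrees(math.atan2(s2, s1))
```

The PS side would then fit the solar meridian (e.g., via Hough transform, as in the paper) across the per-pixel angles.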

24 pages, 6840 KiB  
Article
A Tree Crown Segmentation Approach for Unmanned Aerial Vehicle Remote Sensing Images on Field Programmable Gate Array (FPGA) Neural Network Accelerator
by Jiayi Ma, Lingxiao Yan, Baozhe Chen and Li Zhang
Sensors 2025, 25(9), 2729; https://doi.org/10.3390/s25092729 - 25 Apr 2025
Viewed by 531
Abstract
Tree crown detection in high-resolution UAV forest remote sensing images using computer technology has been widely performed in the last ten years. In forest resource inventory management based on remote sensing data, crown detection is the most important and essential part. Deep learning technology has achieved good results in tree crown segmentation and species classification, but its reliance on high-performance computing platforms means that edge computing and real-time processing cannot be realized. In this study, UAV images of coniferous Pinus tabuliformis and broad-leaved Salix matsudana collected at Jingyue Ecological Forest Farm in Changping District, Beijing, are used as the dataset, and a lightweight neural network, U-Net-Light, based on U-Net and VGG16 is designed and trained. At the same time, the IP core and SoC architecture of the neural network accelerator are designed and implemented on the Xilinx ZYNQ 7100 SoC platform. The results show that U-Net-Light uses only 1.56 MB of parameters to classify and segment the crown images of the two tree species, reaching an accuracy of 85%. The designed SoC architecture and accelerator IP core achieved a 31× speedup over the ZYNQ hard core and a 1.3× speedup over a high-end CPU (Intel Core™ i9-10900K). The hardware resource overhead is less than 20% of the total deployment platform, and the total on-chip power consumption is 2.127 W. The shorter prediction time and higher energy efficiency demonstrate the effectiveness and rationality of the architecture design and IP development. This work departs from conventional canopy segmentation methods that rely heavily on ground-based high-performance computing. Instead, it proposes a lightweight neural network model deployed on an FPGA for real-time inference on unmanned aerial vehicles (UAVs), thereby significantly lowering both latency and system resource consumption. The proposed approach provides a meaningful reference for the automation and intelligent development of forest resource monitoring and precision agriculture.
(This article belongs to the Section Sensor Networks)

24 pages, 896 KiB  
Article
The Ubimus Plugging Framework: Deploying FPGA-Based Prototypes for Ubiquitous Music Hardware Design
by Damián Keller, Aman Jagwani and Victor Lazzarini
Computers 2025, 14(4), 155; https://doi.org/10.3390/computers14040155 - 21 Apr 2025
Viewed by 829
Abstract
The emergent field of embedded computing presents a challenging scenario for ubiquitous music (ubimus) design. Available tools demand specific technical knowledge—as exemplified in the techniques involved in programming integrated circuits of configurable logic units, known as field-programmable gate arrays (FPGAs). Low-level hardware description languages used for handling FPGAs involve a steep learning curve. Hence, FPGA programming offers a unique challenge to probe the boundaries of ubimus frameworks as enablers of fast and versatile prototyping. State-of-the-art hardware-oriented approaches point to the use of high-level synthesis as a promising programming technique. Furthermore, current FPGA system-on-chip (SoC) hardware with an associated onboard general-purpose processor may foster the development of flexible platforms for musical signal processing. Taking into account the emergence of an FPGA-based ecology of tools, we introduce the ubimus plugging framework. The procedures employed in the construction of a modular-synthesis library based on field-programmable gate array hardware, ModFPGA, are documented, and examples of musical projects applying key design principles are discussed.

13 pages, 5610 KiB  
Article
An Approach to Thermal Management and Performance Throttling for Federated Computation on a Low-Cost 3D ESP32-S3 Package Stack
by Yi Liu, Parth Sandeepbhai Shah, Tian Xia and Dryver Huston
Computers 2025, 14(4), 147; https://doi.org/10.3390/computers14040147 - 11 Apr 2025
Viewed by 541
Abstract
The rise of 3D heterogeneous packaging holds promise for increased performance in applications such as AI by bringing compute and memory modules into close proximity. This increased performance comes with increased thermal management challenges. This research explores the use of thermal sensing and load throttling combined with federated computation to manage localized internal heating in a multi-3D chip package. The overall concept is that individual chiplets may heat at different rates due to operational and geometric factors. Shifting computational loads from hot to cooler chiplets can prevent local overheating while maintaining overall computational output. This concept is verified with experiments in a low-cost test vehicle that mimics a 3D chiplet stack with a tightly stacked assembly of SoC devices. These devices can sense and report internal temperature and dynamically adjust frequency. In this configuration, ESP32-S3 microcontrollers work on a federated computational task while reporting internal temperature to a host controller. The tight packing of processors causes temperatures to rise, with those internal to the stack rising more quickly than external ones. With real-time temperature monitoring, when a temperature exceeds a threshold, the system reduces the processor frequency, i.e., throttles the processor, to save power and dynamically shifts part of the workload to other ESP32-S3s with lower temperatures. This approach maximizes overall efficiency while maintaining thermal safety without compromising computational power. Experimental results with up to six processors confirm the validity of the concept.
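The hot-to-cool workload shifting described above can be captured in a toy model; the threshold, step fraction, and function name are invented for illustration and do not reflect the paper's actual control policy:

```python
def rebalance(loads, temps, threshold=70.0, step=0.1):
    """Move a fraction of work from processors above a thermal threshold
    to the coolest one (toy model of temperature-driven load shifting)."""
    coolest = min(range(len(temps)), key=temps.__getitem__)
    for i, t in enumerate(temps):
        if t > threshold and i != coolest:
            moved = loads[i] * step       # shed part of the hot node's work
            loads[i] -= moved
            loads[coolest] += moved       # coolest node absorbs it
    return loads
```

Total work is conserved; only its placement changes, which is the essence of maintaining computational output while avoiding local overheating.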

18 pages, 1821 KiB  
Article
Embedded Streaming Hardware Accelerators Interconnect Architectures and Latency Evaluation
by Cristian-Tiberius Axinte, Andrei Stan and Vasile-Ion Manta
Electronics 2025, 14(8), 1513; https://doi.org/10.3390/electronics14081513 - 9 Apr 2025
Viewed by 593
Abstract
In the age of hardware accelerators, increasing pressure is applied on computer architects and hardware engineers to improve the balance between the cost and benefits of specialized computing units, in contrast to more general-purpose architectures. The first part of this study presents the embedded Streaming Hardware Accelerator (eSAC) architecture, which can reduce the idle time of specialized logic. The remainder of this paper explores the integration of an eSAC into a Central Processing Unit (CPU) core embedded inside a System-on-Chip (SoC) design, using the AXI-Stream protocol specification. The three evaluated architectures are the Tightly Coupled Streaming, Protocol Adapter FIFO, and Direct Memory Access (DMA) Streaming architectures. When comparing the tightly coupled architecture with the one including the DMA, the experiments in this paper show an almost 3× decrease in frame latency when using the DMA. Nevertheless, this comes at the price of an increase in FPGA resource utilization as follows: LUT (2.5×), LUTRAM (3×), FF (3.4×), and BRAM (1.2×). Four different test scenarios were run for the DMA architecture, showcasing the best and worst practices for data organization. The evaluation results highlight that poor data organization can lead to a more than 7× increase in latency. The newly released MicroBlaze-V soft-core processor was selected as the CPU model. The designs presented herein successfully operate on a popular low-cost Field-Programmable Gate Array (FPGA) development board at 100 MHz. Block diagrams, FPGA resource utilization, and latency metrics are presented. Finally, based on the evaluation results, possible improvements are discussed.
(This article belongs to the Section Computer Science & Engineering)

38 pages, 1737 KiB  
Article
Deep Learning Scheduling on a Field-Programmable Gate Array Cluster Using Configurable Deep Learning Accelerators
by Tianyang Fang, Alejandro Perez-Vicente, Hans Johnson and Jafar Saniie
Information 2025, 16(4), 298; https://doi.org/10.3390/info16040298 - 8 Apr 2025
Viewed by 2440
Abstract
This paper presents the development and evaluation of a distributed system employing low-latency embedded field-programmable gate arrays (FPGAs) to optimize scheduling for deep learning (DL) workloads and to configure multiple deep learning accelerator (DLA) architectures. Aimed at advancing FPGA applications in real-time edge computing, this study focuses on achieving optimal latency for a distributed computing system. A novel methodology was adopted, using configurable hardware to examine clusters of DLAs varying in architecture and scheduling techniques. The system demonstrated its capability to parallel-process diverse neural network (NN) models, manage compute graphs in a pipelined sequence, and allocate computational resources efficiently to intensive NN layers. We examined five configurable DLAs—Versatile Tensor Accelerator (VTA), Nvidia DLA (NVDLA), Xilinx Deep Processing Unit (DPU), Tensil Compute Unit (CU), and Pipelined Convolutional Neural Network (PipeCNN)—across two FPGA cluster types consisting of Zynq-7000 and Zynq UltraScale+ System-on-Chip (SoC) processors, respectively. Four deep neural network (DNN) scheduling methods were tested: Scatter-Gather, AI Core Assignment, Pipeline Scheduling, and Fused Scheduling. These methods showed an exponential decay in processing time, with speedups of up to 90%, although deviations were noted depending on the workload and cluster configuration. This research substantiates FPGAs’ utility in adaptable, efficient DL deployment, setting a precedent for future experimental configurations and performance benchmarks.
(This article belongs to the Special Issue Machine Learning and Data Mining: Innovations in Big Data Analytics)

16 pages, 4100 KiB  
Article
Analysis and Experiments of Resonant Coupling Wireless Power Transfer System for Nonuniform Powering of Multiple Sensors
by Thuc Phi Duong, Ngoc Hung Phi, Bilal Ahmad, Sasani Jayasekara and Jong-Wook Lee
Sensors 2025, 25(7), 2342; https://doi.org/10.3390/s25072342 - 7 Apr 2025
Viewed by 545
Abstract
With the rapidly increasing number of Internet of Things (IoT) devices operating at different power levels, wireless power transfer (WPT) systems need the capability to deliver energy to multiple receivers simultaneously. Nonuniform powering of multiple receivers is also necessary, considering the different power levels that IoT sensors demand. This paper investigates asymmetric resonant coupling WPT systems for powering multiple receivers. We propose a simple method for achieving a specified power ratio among the receivers using an equivalent circuit model and the reflected impedance technique. The results are generalized to a system with N receivers. Experiments performed for powering two receivers with power ratios of 1.5 and 2.5 achieve power transfer efficiencies of 91.7% and 88.6%, respectively. Another experiment powering four receivers with power ratios of 1.0, 1.5, 2.0, and 0.75 shows an efficiency of up to 89.9%, which agrees well with the simulation result. Our results show that the distance between the source loop and the transmitting resonator can be varied to maximize efficiency without altering the power division.

20 pages, 26851 KiB  
Article
Precise Position Estimation of Road Users by Extracting Object-Specific Key Points for Embedded Edge Cameras
by Gahyun Kim, Ju Hee Yoo, Ho Gi Jung and Jae Kyu Suhr
Electronics 2025, 14(7), 1291; https://doi.org/10.3390/electronics14071291 - 25 Mar 2025
Viewed by 569
Abstract
Detecting road users and accurately estimating their positions are essential tasks in intelligent transportation systems (ITS). Most monocular camera-based systems for this purpose use 2D bounding box detectors to obtain real-time operability. However, this approach causes large positioning errors because upright rectangles are used for every type of object. To overcome this shortcoming, this paper proposes a method that improves the positioning accuracy of road users by modifying a conventional 2D bounding box detector to extract one or two additional object-specific key points. Since these key points are where the road users contact the ground plane, their accurate positions can be estimated from the relation between the ground plane in the image and that on the map. The proposed method handles four types of road users: cars, pedestrians, cyclists (including motorcyclists), and e-scooter riders. It is easy to implement, requiring only extra heads on the conventional object detector, and improves positioning accuracy at negligible additional computational cost. In experiments, the proposed method was evaluated under various practical situations and showed a 66.5% improvement in road user position estimation. Furthermore, the method was simplified via channel pruning and embedded on an edge camera with a Qualcomm QCS 610 System on Chip (SoC) to demonstrate its real-time capability.
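Relating a ground-contact key point in the image to a position on the map is conventionally done with a planar homography between the two ground planes. A minimal sketch, assuming a 3×3 matrix H calibrated offline (the matrix values in the test are illustrative, not from the paper):

```python
def ground_position(u, v, H):
    """Map an image key point (pixel u, v) to map coordinates using a
    ground-plane homography H (3x3 nested list), assumed pre-calibrated."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return x / w, y / w   # perspective divide yields map coordinates
```

Because the key points lie on the ground plane by construction, this mapping is exact up to calibration error, which is why they position road users better than bounding-box bottom edges.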

25 pages, 478 KiB  
Review
Electromyography Signals in Embedded Systems: A Review of Processing and Classification Techniques
by José Félix Castruita-López, Marcos Aviles, Diana C. Toledo-Pérez, Idalberto Macías-Socarrás and Juvenal Rodríguez-Reséndiz
Biomimetics 2025, 10(3), 166; https://doi.org/10.3390/biomimetics10030166 - 10 Mar 2025
Cited by 1 | Viewed by 1565
Abstract
This article provides an overview of the implementation of electromyography (EMG) signal classification algorithms in various embedded system architectures. The reviewed works address the specifications used for implementation on different devices, such as the number of movements and the type of classification method. The architectures analyzed, including microcontrollers, DSPs, FPGAs, SoCs, and neuromorphic computers/chips, are compared in terms of precision, processing time, energy consumption, and cost. This analysis highlights the capabilities of each technology for real-time wearable applications such as smart prosthetics and gesture-control devices, as well as the importance of local inference in artificial intelligence models to minimize execution times and resource consumption. The results show that the choice of device depends on the required system specifications, the robustness of the model, the number of movements to be classified, and the constraints on design knowledge and budget. This work provides a reference for selecting technologies for developing embedded biomedical solutions based on EMG.
(This article belongs to the Special Issue Artificial Intelligence (AI) in Biomedical Engineering)

17 pages, 432 KiB  
Article
Efficient Modeling and Usage of Scratchpad Memory for Artificial Intelligence Accelerators
by Cagla Irmak Rumelili Köksal and Sıddıka Berna Örs Yalçın
Electronics 2025, 14(5), 1032; https://doi.org/10.3390/electronics14051032 - 5 Mar 2025
Viewed by 1610
Abstract
Deep learning accelerators play a crucial role in enhancing computation-intensive AI applications. Optimizing system resources—such as shared caches, on-chip SRAM, and data movement mechanisms—is essential for achieving peak performance and energy efficiency. This paper explores the trade-off between last-level cache (LLC) and scratchpad memory (SPM) usage in accelerator-based SoCs. To evaluate this trade-off, we introduce a high-speed simulator for estimating the timing performance of complex SoCs and demonstrate the benefits of SPM utilization. Our work shows that dynamic reconfiguration of the LLC into an SPM with prefetching capabilities reduces cache misses while improving resource utilization, performance, and energy efficiency. With SPM usage, we achieve up to a 13× speedup and a 10% reduction in energy consumption for CNN backbones. Additionally, our simulator significantly outperforms state-of-the-art alternatives, running 3000× faster than gem5-SALAM for fixed-weight convolution computations and up to 64,000× faster as weight size increases. These results validate the effectiveness of both the proposed architecture and simulator in optimizing deep learning workloads.
(This article belongs to the Special Issue Recent Advances in AI Hardware Design)

20 pages, 3901 KiB  
Article
Design and Implementation of a Lightweight and Energy-Efficient Semantic Segmentation Accelerator for Embedded Platforms
by Hui Li, Jinyi Li, Bowen Li, Zhengqian Miao and Shengli Lu
Micromachines 2025, 16(3), 258; https://doi.org/10.3390/mi16030258 - 25 Feb 2025
Viewed by 740
Abstract
With the rapid development of lightweight network models and efficient hardware deployment techniques, the demand for real-time semantic segmentation in areas such as autonomous driving and medical image processing has increased significantly. However, realizing efficient semantic segmentation on resource-constrained embedded platforms still faces many challenges. As a classical lightweight semantic segmentation network, ENet has attracted much attention due to its low computational complexity. In this study, we optimize the ENet semantic segmentation network, significantly reducing its computational complexity through structural simplification and 8-bit quantization and improving its hardware compatibility through optimized on-chip data storage and data transfer, while maintaining 51.18% mIoU. In addition, we optimize the computational units for transposed convolution and dilated convolution. The optimized network is successfully deployed as a hardware accelerator and SoC system based on the Xilinx ZYNQ ZCU104 FPGA. The optimized system achieves a frame rate of 130.75 FPS, which meets the real-time processing requirements in areas such as autonomous driving and medical imaging. Meanwhile, the power consumption of the accelerator is 3.479 W, the throughput reaches 460.8 GOPS, and the energy efficiency reaches 132.2 GOPS/W. These results fully demonstrate the effectiveness of the optimization and deployment strategies in achieving a balance between computational efficiency and accuracy, making the system well suited for resource-constrained embedded platform applications.
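Symmetric per-tensor 8-bit quantization is one common way to realize the quantization step mentioned above; the abstract does not specify the exact quantizer used, so this is a generic sketch:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization of a list of float weights.
    Returns (quantized values, scale); a generic scheme, not necessarily
    the one used in the paper."""
    m = max(abs(w) for w in weights)
    scale = m / 127.0 if m > 0 else 1.0           # map max |w| to 127
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [qi * scale for qi in q]
```

The reconstruction error per weight is bounded by half the scale, which is why accuracy (here, mIoU) can be largely preserved while storage and bandwidth drop 4× versus float32.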

21 pages, 1055 KiB  
Article
Verification of SPI Protocol Using Universal Verification Methodology for Modern IoT and Wearable Devices
by Chin-Wen Liao, Hsiu-Chou Yu and Yu-Cheng Liao
Electronics 2025, 14(5), 837; https://doi.org/10.3390/electronics14050837 - 20 Feb 2025
Cited by 1 | Viewed by 1788
Abstract
The Serial Peripheral Interface (SPI) protocol plays a crucial role in wearable and IoT devices, enabling high-speed communication between microcontrollers and peripherals such as sensors, displays, and connectivity modules. With the increasing complexity of modern devices and system-on-chip (SoC) designs, robust verification methods are essential to ensure functionality and reliability. This paper utilizes the Universal Verification Methodology (UVM) to develop a scalable and reusable testbench for SPI verification. The process encompasses test planning, simulation, emulation, and top-level verification to validate multi-slave coordination and error-handling scenarios. The results demonstrate the critical importance of UVM in ensuring the performance and dependability of SPI in advanced electronics, contributing to the reliable integration of the protocol in future devices. The verification results demonstrated a functional coverage of 83.33% and 100% assertion coverage, confirming our approach’s robustness. Analysis of the uncovered functional bins revealed that specific edge cases, such as timing violations and multi-slave arbitration conflicts, require additional test scenarios for full verification. Furthermore, our testbench successfully identified and handled critical fault conditions, such as clock jitter, bus contention, and framing errors, ensuring reliable SPI operation in real-world deployments. These findings highlight the effectiveness of UVM-based verification in improving the reliability and robustness of SPI communication in modern low-power, resource-constrained embedded systems.
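A UVM scoreboard typically compares the design under test against a behavioral reference model. A full-duplex SPI byte exchange (mode 0, MSB first) can be modeled as below; this is an illustrative Python sketch of such a reference model, not the paper's SystemVerilog testbench:

```python
def spi_transfer(master_byte, slave_byte):
    """Behavioral model of one full-duplex SPI byte exchange,
    mode 0 (CPOL=0, CPHA=0), MSB first. Returns the bytes received
    by (master, slave)."""
    master_rx = slave_rx = 0
    for bit in range(7, -1, -1):
        mosi = (master_byte >> bit) & 1    # master drives MOSI each clock
        miso = (slave_byte >> bit) & 1     # slave drives MISO simultaneously
        slave_rx = (slave_rx << 1) | mosi  # both sides sample on the same edge
        master_rx = (master_rx << 1) | miso
    return master_rx, slave_rx
```

In a UVM environment, monitors would capture the serialized bits from the DUT and the scoreboard would assert that they match this model's prediction for every transaction.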
