Search Results (174)

Search Parameters:
Keywords = digital signal processor (DSP)

27 pages, 480 KB  
Article
Hardware-Oriented Lie-Group Optimization Library for FPGA-Accelerated SLAM Using Custom Numeric Precision
by Emanuel Trabes and Carlos Valderrama Sakuyama
Electronics 2026, 15(11), 2272; https://doi.org/10.3390/electronics15112272 - 25 May 2026
Viewed by 423
Abstract
Nonlinear optimization is a central component of visual odometry and simultaneous localization and mapping (SLAM), but its repeated small- and medium-scale linear algebra operations are difficult to deploy efficiently on embedded hardware. This paper presents a synthesizable C++ library for AMD/Xilinx Vitis high-level synthesis (HLS) that provides field-programmable gate array (FPGA)-oriented dense linear algebra kernels and Lie-group primitives on SO(3) and SE(3). The library supports configurable scalar types, including IEEE floating point, posit arithmetic, and reduced-precision floating-point formats, enabling design-space exploration between numerical accuracy and hardware cost. The proposed kernels are integrated into the back-end of a monocular direct mesh-based visual SLAM system and evaluated on an AMD/Xilinx Kria KV260 platform. Compared with the software reference running on the embedded processor, the integrated FPGA implementation reduces the end-to-end optimization iteration time from 32.0 ms to 8.9 ms, corresponding to a speed-up of 3.6×, and reduces the dominant kernel latency from 25.0 ms to 4.9 ms. The most resource-efficient reduced-precision configuration reduces lookup table (LUT) usage by 29.6%, flip-flop (FF) usage by 25.7%, block random-access memory (BRAM) usage by 25.9%, and digital signal processor (DSP) usage by 38.6% relative to the floating-point hardware baseline, while keeping the relative trajectory error within 1.42%. The results show that Lie-group-aware optimization back-ends can be mapped to embedded FPGAs efficiently when fixed-size algebraic kernels, synthesis-aware memory structures, and configurable arithmetic are considered together. Full article

21 pages, 6540 KB  
Article
HAPQ: A Hardware-Aware Pruning and Quantization Pipeline for Event-Based SNN Detection
by Zhengyinan Li and Jing Wu
Sensors 2026, 26(9), 2910; https://doi.org/10.3390/s26092910 - 6 May 2026
Viewed by 747
Abstract
Autonomous driving perception demands low latency, high temporal resolution, and stringent hardware efficiency. While event-based spiking neural networks (SNNs) offer bio-inspired sparse computation, their deployment on edge field-programmable gate arrays (FPGAs) is obstructed by irregular execution patterns and temporal state storage overhead. To address this, we propose HAPQ, a unified hardware-aware pruning and quantization pipeline for compact event-based object detection. Starting from an end-to-end adaptive sampling SNN detector (EAS-SNN), HAPQ conducts hardware-aware configuration search within discrete digital signal processor (DSP) and block RAM (BRAM) budgets, applies single-instruction-multiple-data (SIMD)-aligned structured pruning for computational regularity, and jointly quantizes synaptic weights and membrane potentials via a shift-friendly fixed-point recurrence. Evaluation on the Prophesee Gen1 dataset and an FPGA accelerator shows that HAPQ improves detection accuracy from 0.284 to 0.425 in mean average precision (mAP50:95) and achieves 0.722 mAP50. Hardware implementation reveals a reduction in lookup table (LUT) usage to 1680, complete DSP elimination, and a maximum operating frequency of 920.81 MHz at 0.630 W. These results confirm that effective temporal SNN deployment requires joint optimization of model architecture, state precision, and hardware-aligned workload organization. Full article

15 pages, 736 KB  
Article
Reducing Energy Footprint of LLM Inference Through FPGA-Based Heterogeneous Computing Platforms
by Thiago Cormie Monteiro and Andrea Guerrieri
Electronics 2026, 15(5), 1052; https://doi.org/10.3390/electronics15051052 - 3 Mar 2026
Cited by 2 | Viewed by 2612
Abstract
Artificial Intelligence (AI) has emerged as a transformative force, increasingly integrated into diverse aspects of modern society, from healthcare and education to business and entertainment. Among the most influential AI technologies are large language models (LLMs), such as generative pretrained transformers (GPTs). These models are designed to process vast amounts of data and perform complex computations, enabling advanced capabilities in natural language understanding and generation. However, deployment and operation of such systems requires significant computational resources, leading to substantial energy consumption. While general-purpose hardware such as GPUs is limited by fixed-precision architectures, field-programmable gate arrays (FPGAs) offer the bit-level reconfigurability needed to exploit ultra-low-bitwidth representations. This allows power-intensive multiplications to be replaced by streamlined logic-based accumulations, maximizing the energy benefits of model quantization. This paper addresses the problem of the energy impact of LLMs by leveraging innovative FPGA-based heterogeneous computing platforms. Results demonstrate that ternary matrix multiplication (MatMul) achieves a 23% speedup and a remarkable 96% reduction in digital signal processor (DSP) utilization. Furthermore, the final optimized design shows a 52% reduction in total energy consumption compared to the baseline, making heterogeneous computing a compelling solution for power- and resource-constrained embedded applications. Full article
(This article belongs to the Special Issue New Trends for Power Optimizations in FPGA-Based Embedded Systems)
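
The abstract's point that ternary weights let multiplications be replaced by accumulations can be illustrated with a minimal software sketch. This is not the authors' FPGA design; the function name `ternary_matvec` and the list-based weight encoding are assumptions made here for illustration only. With weights restricted to -1, 0, and +1, every product reduces to an add, a subtract, or a skip.

```python
def ternary_matvec(weights, x):
    """Multiply a ternary weight matrix by a vector using only add/subtract.

    weights: list of rows, each entry in {-1, 0, +1} (hypothetical encoding).
    x: list of numbers, one entry per weight column.
    """
    out = []
    for row in weights:
        if len(row) != len(x):
            raise ValueError("row length must match input length")
        acc = 0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi   # +1 weight: accumulate the input
            elif w == -1:
                acc -= xi   # -1 weight: subtract the input
            elif w != 0:
                raise ValueError("weights must be -1, 0, or +1")
            # 0 weight: nothing to do, the term is skipped
        out.append(acc)
    return out
```

Because no general multiplier is needed in the inner loop, a hardware mapping of this idea can plausibly avoid DSP multiplier blocks, which is consistent with the DSP reduction the abstract reports.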

23 pages, 13345 KB  
Article
Neural-Based Controller on Low-Density FPGAs for Dynamic Systems
by Edson E. Cruz-Miguel, José R. García-Martínez, Jorge Orrante-Sakanassi, José M. Álvarez-Alvarado, Omar A. Barra-Vázquez and Juvenal Rodríguez-Reséndiz
Electronics 2026, 15(1), 198; https://doi.org/10.3390/electronics15010198 - 1 Jan 2026
Cited by 1 | Viewed by 709
Abstract
This work introduces a logic resource-efficient Artificial Neural Network (ANN) controller for embedded control applications on low-density Field-Programmable Gate Array (FPGA) platforms. The proposed design relies on 32-bit fixed-point arithmetic and incorporates an online learning mechanism, enabling the controller to adapt to system variations while maintaining low hardware complexity. Unlike conventional artificial intelligence solutions that require high-performance processors or Graphics Processing Units (GPUs), the proposed approach targets platforms with limited logic, memory, and computational resources. The ANN controller was described using a Hardware Description Language (HDL) and validated via cosimulation between ModelSim and Simulink. A practical comparison was also made between Proportional-Integral-Derivative (PID) control and an ANN for motor position control. The results confirm that the architecture efficiently utilizes FPGA resources, consuming approximately 50% of the available Digital Signal Processor (DSP) units, less than 40% of logic cells, and only 6% of embedded memory blocks. Owing to its modular design, the architecture is inherently scalable, allowing additional inputs or hidden-layer neurons to be incorporated with minimal impact on overall resource usage. Additionally, the computational latency can be precisely determined and scales with (16n+39)m+31 clock cycles, enabling precise timing analysis and facilitating integration into real-time embedded control systems. Full article
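
The closed-form latency quoted in the abstract, (16n+39)m+31 clock cycles, can be evaluated directly for timing analysis. The sketch below is a minimal illustration: the abstract does not define n and m, so treating them as integer network-size parameters is an assumption, and the function names and the 50 MHz clock used in the usage note are hypothetical.

```python
def ann_latency_cycles(n, m):
    """Clock cycles per the abstract's formula (16n + 39)m + 31.

    n, m: network-size parameters (their exact meaning is assumed here,
    since the abstract does not define them).
    """
    return (16 * n + 39) * m + 31


def cycles_to_microseconds(cycles, clock_hz):
    """Convert a cycle count to microseconds for a given clock frequency."""
    return cycles / clock_hz * 1e6
```

For example, `ann_latency_cycles(2, 3)` evaluates (16·2 + 39)·3 + 31 = 244 cycles, which at a hypothetical 50 MHz clock is about 4.88 µs.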

24 pages, 19021 KB  
Article
Methodology for Impedance Spectroscopy of Photovoltaic Modules Using a Power Converter
by Diego Alejandro Herrera-Jaramillo, Juan David Bastidas-Rodríguez, Carlos Andrés Ramos-Paja, Carlos Pavon-Vargas, Luis E. Garcia-Marrero and Sergio Ignacio Serna-Garcés
Sensors 2026, 26(1), 161; https://doi.org/10.3390/s26010161 - 25 Dec 2025
Viewed by 984
Abstract
Impedance Spectroscopy (IS) is widely used to analyze the dynamic behavior and degradation of electrochemical systems such as batteries. IS has also been successfully applied to study the performance and degradation mechanisms of photovoltaic (PV) devices. Traditionally, IS is performed with Frequency Response Analyzers (FRA), which apply small-signal perturbations and measure the impedance response of the system. However, those instruments are costly and not suitable for in situ diagnostics. This work proposes a methodology to perform IS measurements on PV systems using a power converter, thereby eliminating the need for external specialized equipment. The proposed approach includes a theoretical analysis of the converter dynamics to derive an expression for the duty cycle amplitude, which is required to maintain a constant perturbation magnitude across a range of frequencies. The methodology is experimentally validated using a synchronous Boost converter connected to a PV panel and controlled by a Texas Instruments F28379D digital signal processor (DSP), which injects the perturbation signal in the converter’s duty cycle. Moreover, the voltage and current measurements are performed with an oscilloscope. The results demonstrate that the proposed converter-based IS method accurately reproduces the impedance spectra obtained with a commercial FRA, confirming its feasibility as a low-cost, flexible, and scalable solution for PV impedance characterization and diagnostics. Full article
(This article belongs to the Special Issue Sensing and Estimation Techniques in Electrical Systems)

24 pages, 3243 KB  
Article
A State-Space Framework for Parallelizing Digital Signal Processing in Coherent Optical Receivers
by Jinyang Wang, Zhugang Wang and Di Liu
Sensors 2025, 25(23), 7389; https://doi.org/10.3390/s25237389 - 4 Dec 2025
Cited by 1 | Viewed by 856
Abstract
Ultra-high sampling rates in coherent optical front-ends increasingly exceed the processing capabilities of real-time baseband processors, creating a bottleneck in coherent free-space optical communication systems. We propose a unified state-space framework to systematically parallelize digital signal processing (DSP) algorithms. This approach transforms an algorithm’s transfer function into a state-space representation from which a parallel architecture is derived through matrix operations, overcoming the complexity of traditional ad hoc methods. Crucially, our framework enables an analysis of parallelization-induced latency. We introduce the parallel equivalent delay (PED) metric and demonstrate that it introduces right-half-plane zeros into the loop’s transfer function, thereby fundamentally constraining stability. This analysis leads to the derivation of the “Throughput–Bandwidth Product” (TBP), a constant that provides a design guideline linking maximum stable loop bandwidth to the parallelization factor. The framework’s efficacy is demonstrated by designing a parallel Costas carrier recovery loop. Simulations validate its performance, confirm the TBP limit, and show significant advantages over conventional feedforward estimators, especially in low-SNR conditions. Implementation results on an AMD XCVU13P FPGA demonstrate that the proposed 50-parallel architecture achieves a throughput of 15.625 Gsps at a clock frequency of 312.5 MHz with a logic utilization below 7%. The experimental results confirm the theoretical trade-off between throughput and loop bandwidth, verifying the proposed design methodology. Full article
(This article belongs to the Section Communications)
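
The throughput figure in the abstract follows from the usual relation for a parallel datapath: samples per second equals the parallelization factor times the clock frequency. The sketch below only checks that arithmetic; the function name is hypothetical, and it assumes each parallel lane consumes one sample per clock cycle, which the abstract implies but does not state explicitly.

```python
def parallel_throughput_sps(parallel_factor, clock_hz):
    """Samples per second for a datapath processing `parallel_factor`
    samples per clock cycle (one sample per lane per cycle is assumed)."""
    return parallel_factor * clock_hz


# Figures from the abstract: 50 lanes at 312.5 MHz.
throughput = parallel_throughput_sps(50, 312.5e6)
print(f"{throughput / 1e9:.3f} Gsps")
```

With the abstract's figures, 50 × 312.5 MHz gives 15.625 Gsps, matching the reported throughput.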

22 pages, 5833 KB  
Article
A Codesign Framework for the Development of Next Generation Wearable Computing Systems
by Francesco Porreca, Fabio Frustaci and Raffaele Gravina
Sensors 2025, 25(21), 6624; https://doi.org/10.3390/s25216624 - 28 Oct 2025
Cited by 1 | Viewed by 1822
Abstract
Wearable devices can be developed using hardware platforms such as Application-Specific Integrated Circuits (ASICs), Graphics Processing Units (GPUs), Digital Signal Processors (DSPs), Microcontroller Units (MCUs), or Field-Programmable Gate Arrays (FPGAs), each with distinct advantages and limitations. ASICs offer high efficiency but lack flexibility. GPUs excel in parallel processing but consume significant power. DSPs are optimized for signal processing but are limited in versatility. CPUs provide low power consumption but lack computational power. FPGAs are highly flexible, enabling powerful parallel processing at lower energy costs than GPUs but with higher resource demands than ASICs. The combined use of FPGAs and CPUs balances power efficiency and computational capability, making it ideal for wearable systems requiring complex algorithms in far-edge computing, where data processing occurs onboard the device. This approach promotes green electronics, extending battery life and reducing user inconvenience. The primary goal of this work was to develop a versatile framework, similar to existing software development frameworks, but specifically tailored for mixed FPGA/MCU platforms. The framework was validated through a real-world use case, demonstrating significant improvements in execution speed and power consumption. These results confirm its effectiveness in developing green and smart wearable systems. Full article
(This article belongs to the Section Wearables)

23 pages, 2255 KB  
Article
Design and Implementation of a YOLOv2 Accelerator on a Zynq-7000 FPGA
by Huimin Kim and Tae-Kyoung Kim
Sensors 2025, 25(20), 6359; https://doi.org/10.3390/s25206359 - 14 Oct 2025
Cited by 3 | Viewed by 2652
Abstract
You Only Look Once (YOLO) is a convolutional neural network-based object detection algorithm widely used in real-time vision applications. However, its high computational demand leads to significant power consumption and cost when deployed in graphics processing units. Field-programmable gate arrays offer a low-power alternative. However, their efficient implementation requires architecture-level optimization tailored to limited device resources. This study presents an optimized YOLOv2 accelerator for the Zynq-7000 system-on-chip (SoC). The design employs 16-bit integer quantization, a filter reuse structure, an input feature map reuse scheme using a line buffer, and tiling parameter optimization for the convolution and max pooling layers to maximize resource efficiency. In addition, a stall-based control mechanism is introduced to prevent structural hazards in the pipeline. The proposed accelerator was implemented on the Zynq-7000 SoC board, and a system-level evaluation confirmed a negligible accuracy drop of only 0.2% compared with the 32-bit floating-point baseline. Compared with previous YOLO accelerators on the same SoC, the design achieved up to 26% and 15% reductions in flip-flop and digital signal processor usage, respectively. This result demonstrates feasible deployment on XC7Z020 with DSP 57.27% and FF 16.55% utilization. Full article
(This article belongs to the Special Issue Object Detection and Recognition Based on Deep Learning)

19 pages, 819 KB  
Article
Efficient CNN Accelerator Based on Low-End FPGA with Optimized Depthwise Separable Convolutions and Squeeze-and-Excite Modules
by Jiahe Shen, Xiyuan Cheng, Xinyu Yang, Lei Zhang, Wenbin Cheng and Yiting Lin
AI 2025, 6(10), 244; https://doi.org/10.3390/ai6100244 - 1 Oct 2025
Cited by 17 | Viewed by 3539
Abstract
With the rapid development of artificial intelligence technology in the field of intelligent manufacturing, convolutional neural networks (CNNs) have shown excellent performance and generalization capabilities in industrial applications. However, the huge computational and resource requirements of CNNs have brought great obstacles to their deployment on low-end hardware platforms. To address this issue, this paper proposes a scalable CNN accelerator that can operate on low-performance Field-Programmable Gate Arrays (FPGAs), which is aimed at tackling the challenge of efficiently running complex neural network models on resource-constrained hardware platforms. This study specifically optimizes depthwise separable convolution and the squeeze-and-excite module to improve their computational efficiency. The proposed accelerator allows for the flexible adjustment of hardware resource consumption and computational speed through configurable parameters, making it adaptable to FPGAs with varying performance and different application requirements. By fully exploiting the characteristics of depthwise separable convolution, the accelerator optimizes the convolution computation process, enabling flexible and independent module stackings at different stages of computation. This results in an optimized balance between hardware resource consumption and computation time. Compared to ARM CPUs, the proposed approach yields at least a 1.47× performance improvement, and compared to other FPGA solutions, it saves over 90% of Digital Signal Processors (DSPs). Additionally, the optimized computational flow significantly reduces the accelerator’s reliance on internal caches, minimizing data latency and further improving overall processing efficiency. Full article

20 pages, 5246 KB  
Article
Class E ZVS Resonant Inverter with CLC Filter and PLL-Based Resonant Frequency Tracking for Ultrasonic Piezoelectric Transducer
by Apinan Aurasopon, Boontan Sriboonrueng, Jirapong Jittakort and Saichol Chudjuarjeen
J. Low Power Electron. Appl. 2025, 15(3), 54; https://doi.org/10.3390/jlpea15030054 - 22 Sep 2025
Cited by 1 | Viewed by 2278
Abstract
This paper presents a Class E zero-voltage soft-switching (ZVS) resonant inverter integrated with a CLC filter and a digital resonant frequency tracking technique for driving a piezoelectric ceramic transducer (PZT) in ultrasonic cleaning applications. A digital signal processor (DSP) is used to dynamically monitor and adjust the operating frequency in response to slight variations in the cleaning load, employing a phase-locked loop (PLL) control scheme. The proposed method ensures that the inverter maintains ZVS operation across a frequency range from 30.0 kHz to 34.0 kHz, thereby improving energy efficiency and reducing switching losses. The system is capable of delivering a stable power output of 100 W. Both the simulation and experimental results validate the effectiveness of the proposed technique, demonstrating improved performance under varying load conditions. The combination of CLC filtering and frequency tracking offers a compact and robust solution suitable for ultrasonic cleaner systems and similar resonant-load applications. Full article

19 pages, 17187 KB  
Article
Controller Hardware-in-the-Loop Validation of a DSP-Controlled Grid-Tied Inverter Using Impedance and Time-Domain Approaches
by Leonardo Casey Hidalgo Monsivais, Yuniel León Ruiz, Julio Cesar Hernández Ramírez, Nancy Visairo-Cruz, Juan Segundo-Ramírez and Emilio Barocio
Electricity 2025, 6(3), 52; https://doi.org/10.3390/electricity6030052 - 6 Sep 2025
Cited by 3 | Viewed by 1891
Abstract
In this work, a controller hardware-in-the-loop (CHIL) simulation of a grid-connected three-phase inverter equipped with an LCL filter is implemented using a real-time digital simulator (RTDS) as the plant and a digital signal processor (DSP) as the control hardware. This work identifies and discusses the critical aspects of the CHIL implementation process, emphasizing the relevance of the control delays that arise from sampling, computation, and pulse width modulation (PWM), which also adversely affect system stability, accuracy, and performance. Time and frequency domains are used to validate the modeling of the system, either to represent large-signal or small-signal models. This work shows multiple representations of the system under study: the fundamental frequency model, the switched model, and the switched model controlled by the DSP, are used to validate the nonlinear model, whereas the impedance-based modeling is followed to validate the linear representation. The results demonstrate a strong correlation among the models, confirming that the delay effects are accurately captured in the different simulation approaches. This comparison provides valuable insights into configuration practices that improve the fidelity of CHIL-based validation and supports impedance-based stability analysis in power electronic systems. The findings are particularly relevant for wideband modeling and real-time studies in electromagnetic transient analysis. Full article

18 pages, 6610 KB  
Article
Design and Implementation of a Teaching Model for EESM Using a Modified Automotive Starter-Generator
by Patrik Resutík, Matúš Danko and Michal Praženica
World Electr. Veh. J. 2025, 16(9), 480; https://doi.org/10.3390/wevj16090480 - 22 Aug 2025
Viewed by 5308
Abstract
This project presents the development of an open-source educational platform based on an automotive Electrically Excited Synchronous Machine (EESM) repurposed from a KIA Sportage mild-hybrid vehicle. The introduction provides an overview of hybrid drive systems and the primary configurations employed in automotive applications, including classifications based on power flow and the placement of electric motors. The focus is placed on the parallel hybrid configuration, where a belt-driven starter-generator assists the internal combustion engine (ICE). Due to the proprietary nature of the original control system, the unit was disassembled, and a custom control board was designed using a Texas Instruments C2000 Digital Signal Processor (DSP). The motor features a six-phase dual three-phase stator, offering improved torque smoothness, fault tolerance, and reduced current per phase. A compact Anisotropic Magneto Resistive (AMR) position sensor was implemented for position and speed measurements. Current sensing was achieved using both direct and magnetic field-based methods. The control algorithm was verified on a modified six-phase inverter under simulated vehicle conditions utilizing a dynamometer. Results confirmed reliable operation and validated the control approach. Future work will involve complete hardware testing with the new control board to finalize the platform as a flexible, open-source tool for research and education in hybrid drive technologies. Full article

31 pages, 11216 KB  
Article
An Optimal Integral Fast Terminal Synergetic Control Scheme for a Grid-to-Vehicle and Vehicle-to-Grid Battery Electric Vehicle Charger Based on the Black-Winged Kite Algorithm
by Ishak Aris, Yanis Sadou and Abdelbaset Laib
Energies 2025, 18(13), 3397; https://doi.org/10.3390/en18133397 - 27 Jun 2025
Cited by 2 | Viewed by 1079
Abstract
The utilization of electric vehicles (EVs) has grown significantly and continuously in recent years, encouraging the creation of new implementation opportunities. The battery electric vehicle (BEV) charging system can be effectively used during peak load periods, for voltage regulation, and for the improvement of power system stability within the smart grid. It provides an efficient bidirectional interface for charging the battery from the grid and discharging the battery into the grid. These two operation modes are referred to as grid-to-vehicle (G2V) and vehicle-to-grid (V2G), respectively. The management of power flow in both directions is highly complex and sensitive, which requires employing a robust control scheme. In this paper, an Integral Fast Terminal Synergetic Control Scheme (IFTSC) is designed to control the BEV charger system through accurately tracking the required current and voltage in both G2V and V2G system modes. Moreover, the Black-Winged Kite Algorithm is introduced to select the optimal gains of the proposed IFTSC. The system stability is checked using the Lyapunov stability method. Comprehensive simulations using MATLAB/Simulink are conducted to assess the safety and efficacy of the suggested optimal IFTSC in comparison with IFTSC, optimal integral synergetic, and conventional PID controllers. Furthermore, processor-in-the-loop (PIL) co-simulation is carried out for the studied system using the C2000 LAUNCHXL-F28379D digital signal processing (DSP) board to confirm the practicability and effectiveness of the proposed optimal IFTSC. The analysis of the obtained quantitative comparison proves that the proposed optimal IFTSC provides higher control performance under several critical testing scenarios. Full article
(This article belongs to the Section D: Energy Storage and Application)

19 pages, 6410 KB  
Article
Optimized FPGA Architecture for CNN-Driven Subsurface Geotechnical Defect Detection
by Xiangyu Li, Linjian Che, Shunjiong Li, Zidong Wang and Wugang Lai
Electronics 2025, 14(13), 2585; https://doi.org/10.3390/electronics14132585 - 26 Jun 2025
Cited by 1 | Viewed by 1362
Abstract
Convolutional neural networks (CNNs) are widely used in geotechnical engineering. Real-time detection in complex geological environments, combined with the strict power constraints of embedded devices, makes Field-Programmable Gate Array (FPGA) platforms ideal for accelerating CNNs. Conventional parallelization strategies in FPGA-based accelerators often result in imbalanced resource utilization and computational inefficiency due to varying kernel sizes. To address this issue, we propose a customized heterogeneous hybrid parallel strategy and refine the bit-splitting approach for Digital Signal Processor (DSP) resources, improving timing performance and reducing Look-Up Table (LUT) consumption. Using this strategy, we deploy the lightweight YOLOv5n network on an FPGA platform, creating a high-speed, low-power subsurface geotechnical defect-detection system. A layer-wise quantization strategy reduces the model size with negligible mean average precision (mAP) loss. Operating at 300 MHz, the system reduces LUT usage by 33%, achieves a peak throughput of 328.25 GOPs in convolutional layers, and an overall throughput of 157.04 GOPs, with a power consumption of 9.4 W and energy efficiency of 16.7 GOPs/W. This implementation demonstrates more balanced performance improvements than existing solutions. Full article
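
The energy-efficiency figure in the abstract is the ratio of overall throughput to power. The sketch below simply reproduces that arithmetic from the reported numbers; the function name is hypothetical, and the use of overall (rather than peak) throughput is inferred from the figures rather than stated in the abstract.

```python
def energy_efficiency_gops_per_watt(throughput_gops, power_watts):
    """Energy efficiency as throughput (GOPs) divided by power (W)."""
    if power_watts <= 0:
        raise ValueError("power must be positive")
    return throughput_gops / power_watts


# Figures from the abstract: 157.04 GOPs overall throughput at 9.4 W.
efficiency = energy_efficiency_gops_per_watt(157.04, 9.4)
print(f"{efficiency:.1f} GOPs/W")
```

Dividing 157.04 GOPs by 9.4 W gives roughly 16.7 GOPs/W, which agrees with the reported value; the peak convolutional throughput of 328.25 GOPs would give a higher ratio, so the reported efficiency appears to be based on overall throughput.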

23 pages, 1475 KB  
Article
Learning Online MEMS Calibration with Time-Varying and Memory-Efficient Gaussian Neural Topologies
by Danilo Pietro Pau, Simone Tognocchi and Marco Marcon
Sensors 2025, 25(12), 3679; https://doi.org/10.3390/s25123679 - 12 Jun 2025
Cited by 4 | Viewed by 5176
Abstract
This work devised an on-device learning approach to self-calibrate Micro-Electro-Mechanical Systems-based Inertial Measurement Units (MEMS-IMUs), integrating a digital signal processor (DSP), an accelerometer, and a gyroscope in the same package. The accelerometer and gyroscope stream their data in real time to the DSP, which runs artificial intelligence (AI) workloads. The real-time sensor data are subject to errors, such as time-varying bias and thermal stress. To compensate for these drifts, the traditional calibration method based on a linear model is applicable but, unfortunately, does not work with nonlinear errors. The algorithm devised by this study to reduce such errors adopts Radial Basis Function Neural Networks (RBF-NNs). This method does not rely on the classical adoption of the backpropagation algorithm. Due to its low complexity, it is deployable in kibibyte-scale memory and runs in software on the DSP, thus performing interleaved in-sensor learning and inference by itself. This avoids using any off-package computing processor. The learning process is performed periodically to achieve consistent sensor recalibration over time. The devised solution was implemented in both a 32-bit floating-point data representation and a 16-bit quantized integer version. Both of these were deployed into the Intelligent Sensor Processing Unit (ISPU), integrated into the LSM6DSO16IS Inertial Measurement Unit (IMU), which is a programmable 5–10 MHz DSP on which the programmer can compile and execute AI models. It integrates 32 KiB of program RAM and 8 KiB of data RAM. No permanent memory is integrated into the package. The two (fp32 and int16) RBF-NN models occupied less than 21 KiB out of the 40 available, working in real time and independently in the sensor package. The models, respectively, compensated between 46% and 95% of the accelerometer measurement error and between 32% and 88% of the gyroscope measurement error. Finally, it has also been used for attitude estimation of a micro aerial vehicle (MAV), achieving an error of only 2.84°. Full article
(This article belongs to the Special Issue Sensors and IoT Technologies for the Smart Industry)