Search Results (21)

Search Parameters:
Keywords = neural processing unit (NPU)

22 pages, 831 KB  
Article
Energy-Efficient Dual-Core RISC-V Architecture for Edge AI Acceleration with Dynamic MAC Unit Reuse
by Cristian Andy Tanase
Computers 2026, 15(4), 219; https://doi.org/10.3390/computers15040219 - 1 Apr 2026
Viewed by 755
Abstract
This paper presents a dual-core RISC-V architecture designed for energy-efficient AI acceleration at the edge, featuring dynamic MAC unit sharing, dynamic frequency scaling (DFS), and FIFO-based resource arbitration. The system comprises two RISC-V cores that compete for shared computational resources—a single Multiply–Accumulate (MAC) unit and a shared external memory subsystem—governed by a channel-based arbitration mechanism with CPU-priority semantics, while each core maintains private instruction and data caches. The architecture implements a tightly coupled Neural Processing Unit (NPU) with CONV, GEMM, and POOL operations that execute opportunistically in the background when the MAC unit is available. DFS with three levels (100/200/400 MHz) is applied to the shared MAC unit, allowing the dynamic acceleration of CNN workloads. The arbitration mechanism uses SystemC sc_fifo channels with CPU-priority polling, ensuring that CPU execution is minimally impacted by background AI processing while the NPU makes progress during idle MAC slots. The NPU supports 3 × 3 convolutions, matrix multiplication (GEMM) with 10 × 10 tiles, and pooling operations. The implementation is cycle-accurate in SystemC, targeting FPGA deployment. Experimental evaluation demonstrates that the dual-core architecture achieves a 1.87× speedup with 93.5% efficiency for parallel workloads, while DFS enables a 70% power reduction at low frequency. The system successfully executes simultaneous CPU and AI workloads, with CPU-priority arbitration ensuring no CPU starvation under contention. The proposed design offers a practical solution for embedded AI applications requiring both general-purpose computation and neural network acceleration, validated through comprehensive SystemC simulation targeting modern FPGA platforms.
(This article belongs to the Special Issue High-Performance Computing (HPC) and Computer Architecture)
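The CPU-priority arbitration described above is easy to picture as a toy simulation. The sketch below is a minimal Python stand-in, not the authors' SystemC model; `MacArbiter` and its methods are invented for illustration. Two queues share one MAC unit, and the NPU only advances in cycles where the CPU queue is empty.

```python
from collections import deque

class MacArbiter:
    """Toy model of CPU-priority arbitration for one shared MAC unit.

    Invented for illustration: CPU requests are always served first, and
    the NPU only progresses in cycles where the CPU queue is empty (an
    "idle MAC slot"), so the CPU is never starved by background AI work.
    """

    def __init__(self):
        self.cpu_queue = deque()   # stands in for a SystemC sc_fifo channel
        self.npu_queue = deque()

    def request(self, source, a, b):
        (self.cpu_queue if source == "cpu" else self.npu_queue).append((a, b))

    def step(self):
        """Advance one cycle: execute at most one multiply-accumulate."""
        if self.cpu_queue:                 # CPU-priority polling
            a, b = self.cpu_queue.popleft()
            return "cpu", a * b
        if self.npu_queue:                 # opportunistic background NPU work
            a, b = self.npu_queue.popleft()
            return "npu", a * b
        return None                        # MAC idle this cycle

arb = MacArbiter()
arb.request("npu", 3, 4)   # background CONV work queued first...
arb.request("cpu", 2, 5)   # ...but the CPU request wins the next slot
print(arb.step())          # ('cpu', 10)
print(arb.step())          # ('npu', 12)
```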

31 pages, 4949 KB  
Article
Attention Distribution-Aware Softmax for NPU-Accelerated On-Device Inference of LLMs: An Edge-Oriented Approximation Design
by Sanoop Sadheerthan, Min-Jie Hsu, Chih-Hsiang Huang and Yin-Tien Wang
Electronics 2026, 15(6), 1312; https://doi.org/10.3390/electronics15061312 - 20 Mar 2026
Viewed by 670
Abstract
Low-power NPUs enable on-device LLM inference through efficient integer and fixed-point algebra, yet their lack of native exponential support makes Transformer softmax a critical performance bottleneck. Existing NPU kernels approximate e^x using uniform piecewise polynomials to enable O(1) SIMD indexing, but this wastes computation by applying high-degree arithmetic indiscriminately in every segment. Conversely, fully adaptive approaches maximize statistical fidelity but introduce pipeline stalls due to comparator-based boundary search. To bridge this gap, we propose an attention distribution-aware softmax that uses Particle Swarm Optimization (PSO) to define non-uniform segments and variable polynomial degrees, prioritizing finer granularity and lower arithmetic complexity in attention-dense regions. To ensure efficiency, we snap boundaries into a 128-bin LUT, enabling O(1) retrieval of segment parameters without branching. Inference measurements show that this favors low-degree execution, minimizing exp-kernel overhead. Using TinyLlama-1.1B-Chat as a testbed, the proposed weighted design reduces cycles per call (CPC) of the exp kernel by 18.5% versus an equidistant uniform Degree-4 baseline and 13.1% versus uniform Degree-3, while preserving ranking fidelity. These results show that grid-snapped, variable-degree approximation can improve softmax efficiency while largely preserving attention ranking fidelity, enabling accurate edge LLM inference.
(This article belongs to the Special Issue Emerging Applications of FPGAs and Reconfigurable Computing System)
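A minimal sketch of the grid-snapped, variable-degree idea, assuming a softmax input range of [-8, 0] after max-subtraction; the per-bin degree assignment below is a hand-picked stand-in for the paper's PSO-optimized segmentation:

```python
import numpy as np

# Toy stand-in for the paper's PSO-optimized design: approximate exp(x) on
# [-8, 0] (the range softmax sees after max-subtraction) with one small
# polynomial per bin of a 128-bin grid, so segment parameters are fetched
# with O(1) indexing and no branching. Range and degrees are assumptions.
LO, HI, BINS = -8.0, 0.0, 128
WIDTH = (HI - LO) / BINS

def build_lut(degree_for_bin):
    """Fit one polynomial per grid bin; the degree may vary per bin."""
    lut = []
    for i in range(BINS):
        a = LO + i * WIDTH
        xs = np.linspace(a, a + WIDTH, 32)
        lut.append(np.polyfit(xs, np.exp(xs), degree_for_bin(i)))
    return lut

# Lower degree where attention mass concentrates (near 0), mimicking the
# "lower arithmetic complexity in attention-dense regions" idea.
lut = build_lut(lambda i: 2 if i >= BINS // 2 else 3)

def exp_approx(x):
    i = min(int((x - LO) / WIDTH), BINS - 1)   # O(1) grid-snapped lookup
    return np.polyval(lut[i], x)

print(exp_approx(-1.37), np.exp(-1.37))   # the two agree closely
```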

13 pages, 1144 KB  
Article
NPU-Aware Fault Injection and Statistical Sensitivity Analysis for CNN Reliability Evaluation
by Yang Hua, Jianyu Zhang, Quanyu Piao, Wei Zhuang and Yuanfu Zhao
Electronics 2026, 15(6), 1295; https://doi.org/10.3390/electronics15061295 - 20 Mar 2026
Viewed by 366
Abstract
Artificial intelligence (AI) is propelling space exploration into a new era. Synergistic breakthroughs in chip design and high-speed communications have facilitated the large-scale deployment of on-board satellite computing. Assessing the reliability of these systems via fault injection (FI) remains difficult due to the massive computational demands of Convolutional Neural Networks (CNNs) and the complex architectures of Neural Processing Units (NPUs). This research presents a high-precision, efficient FI methodology specifically tailored for NPU architectures to optimize both evaluation accuracy and execution efficiency. Implementing a hierarchical injection strategy to identify fault-sensitive layers minimizes computational overhead while ensuring statistical validity. Experimental results on the ResNet-50 network demonstrate that the proposed methodology constrains accuracy degradation to less than 0.1% while achieving a 60.80% reduction in total execution time.
(This article belongs to the Special Issue Artificial Intelligence and Microsystems)
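As a rough illustration of weight-level fault injection (the paper's NPU-aware campaign is more involved), a single-bit-upset model over a float32 weight tensor might look like the sketch below; `inject_bit_flips` is a name invented here:

```python
import numpy as np

def inject_bit_flips(weights, n_faults, seed=0):
    """Single-event-upset model: flip random bits in a float32 tensor.

    Invented illustration; the paper's hierarchical strategy first screens
    layers with a coarse campaign, then concentrates the injection budget
    on the layers found to be fault-sensitive.
    """
    rng = np.random.default_rng(seed)
    w = weights.astype(np.float32).copy()
    bits = w.view(np.uint32).reshape(-1)    # reinterpret floats as raw bits
    idx = rng.integers(0, bits.size, size=n_faults)
    pos = rng.integers(0, 32, size=n_faults).astype(np.uint32)
    bits[idx] ^= np.uint32(1) << pos        # flip one random bit per fault
    return w

layer = np.random.randn(64, 3, 3, 3).astype(np.float32)
faulty = inject_bit_flips(layer, n_faults=10)
print(int((faulty != layer).sum()))         # a handful of corrupted weights
```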

25 pages, 15600 KB  
Article
Filter Independence-Aware Pruning: Efficient Neural Networks for On-Device AI
by Jiali Wang, Hongxia Bie, Zhao Jing, Yichen Zhi, Yongkai Fan and Wentao Ma
Electronics 2026, 15(4), 794; https://doi.org/10.3390/electronics15040794 - 12 Feb 2026
Viewed by 518
Abstract
Filter pruning is an effective approach for improving the inference efficiency of neural networks and is particularly attractive for on-device artificial intelligence (AI) applications. However, many existing methods fail to accurately identify redundant filters due to limited modeling of inter-filter dependencies. A filter pruning method based on nuclear norm analysis is proposed to quantify filter independence and guide structured pruning. By analyzing the layer-wise distribution of independence scores, a principled trade-off between pruning rate and accuracy preservation is achieved. In most evaluation scenarios, the proposed method achieves 75–95% parameter reduction and 70–80% FLOPs reduction, while substantially higher compression ratios (up to 99%) can be obtained for more redundant network architectures, with consistent performance trends observed across multiple accuracy-related metrics. Furthermore, deployment on an RK3588 neural processing unit (NPU) demonstrates substantial reductions in memory consumption and inference latency, confirming the practical effectiveness of the method for mobile and edge AI applications.
(This article belongs to the Section Artificial Intelligence)
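One way to read the nuclear-norm idea: a filter is redundant if deleting its row from the flattened weight matrix barely lowers the matrix's nuclear norm. The sketch below implements that proxy; the paper's exact scoring may differ.

```python
import numpy as np

def independence_scores(conv_weight):
    """Score each output filter by its contribution to the nuclear norm.

    Rough proxy for the paper's idea: flatten each filter to a row, and
    measure how much the nuclear norm (sum of singular values) drops when
    that row is removed. Filters whose removal barely changes the norm
    are nearly linear combinations of the others, i.e. redundant.
    """
    n_out = conv_weight.shape[0]
    M = conv_weight.reshape(n_out, -1)
    full = np.linalg.norm(M, ord="nuc")
    return np.array([
        full - np.linalg.norm(np.delete(M, i, axis=0), ord="nuc")
        for i in range(n_out)
    ])

w = np.random.randn(16, 3, 3, 3)
w[3] = 0.5 * w[0] + 0.5 * w[1]          # make filter 3 linearly dependent
scores = independence_scores(w)
print(np.argsort(scores)[:4])           # prune the 4 least independent filters
```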

23 pages, 16184 KB  
Article
A Lightweight Drone Vision System for Autonomous Inspection with Real-Time Processing
by Zhengran Zhou, Wei Wang, Hao Wu, Tong Wang and Satoshi Suzuki
Drones 2026, 10(2), 126; https://doi.org/10.3390/drones10020126 - 11 Feb 2026
Viewed by 1405
Abstract
Automated inspection of power infrastructure with drones requires processing video streams in real time and performing object recognition from image data with constrained resources. Server-based object recognition algorithms depend on transmitting data over a network and require considerable computational resources. In this study, we present an automated system designed to inspect power infrastructure using drones in real time. The proposed system is implemented on the Rockchip RK3588 platform and uses a lightweight YOLOv8 architecture incorporating a Slim-Neck model with a VanillaBlock module integrated into the backbone. To support real-time operation, we developed a digital video stream processing system (DVSPS) to coordinate multimedia processor (MPP)-based hardware video decoding, with inference performed on a multicore neural processing unit (NPU) using thread pooling. The system can navigate autonomously using a closed-loop machine vision system that computes the latitude and longitude of electrical towers to perform multilevel inspections. The proposed model attained 84.2% mAP50 and 52.5% mAP50:95 at 3.7 GFLOPs, with an average throughput of 111.3 FPS and 34% fewer parameters. These results demonstrate that the proposed method is an efficient and scalable solution for autonomous inspection across diverse operational conditions.
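The decode-then-infer pipeline with a thread pool over NPU cores can be schematized as below; the decode and inference steps are stand-ins, not the RK3588 MPP/RKNN APIs, and the core count is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
import queue

# Schematic of the DVSPS idea: a decoder thread feeds a bounded frame queue
# and a pool of workers (one per NPU core) consumes it concurrently.
NPU_CORES = 3
frames = queue.Queue(maxsize=8)   # bounded: decode cannot outrun inference

def decoder(n_frames):
    for i in range(n_frames):
        frames.put(f"frame-{i}")  # stands in for MPP hardware decode
    for _ in range(NPU_CORES):
        frames.put(None)          # one poison pill per worker

def npu_worker(core_id):
    done = []
    while (frame := frames.get()) is not None:
        done.append((core_id, frame))   # stands in for NPU inference
    return done

with ThreadPoolExecutor(max_workers=NPU_CORES + 1) as pool:
    pool.submit(decoder, 12)
    workers = [pool.submit(npu_worker, c) for c in range(NPU_CORES)]
    for w in workers:
        print(w.result())
```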

20 pages, 1423 KB  
Article
Efficient Low-Precision GEMM on Ascend NPU: HGEMM’s Synergy of Pipeline Scheduling, Tiling, and Memory Optimization
by Erkun Zhang, Pengxiang Xu and Lu Lu
Computers 2026, 15(1), 39; https://doi.org/10.3390/computers15010039 - 8 Jan 2026
Viewed by 1625
Abstract
As one of the most widely used high-performance kernels, General Matrix Multiplication, or GEMM, plays a pivotal role in diverse application fields. With the growing prevalence of training for Convolutional Neural Networks (CNNs) and Large Language Models (LLMs), the design and implementation of high-efficiency, low-precision GEMM on modern Neural Processing Unit (NPU) platforms are of great significance. In this work, HGEMM for the Ascend NPU is presented, which enables collaborative processing of different computation types by Cube units and Vector units. The major contributions of this work are the following: (i) dual-stream pipeline scheduling is implemented, which synchronizes padding operations, matrix–matrix multiplications, and element-wise instructions across hierarchical buffers and compute units; (ii) a suite of tiling strategies and a corresponding strategy selection mechanism are developed, comprehensively accounting for the impacts from the M, N, and K directions; and (iii) SplitK and ShuffleK methods are proposed to address the challenges of memory access efficiency and AI Core utilization. Extensive evaluations demonstrate that the proposed HGEMM achieves an average 3.56× speedup over the CATLASS template-based implementation under identical Ascend NPU configurations, and an average 2.10× speedup relative to the cuBLAS implementation on Nvidia A800 GPUs under general random workloads. It also achieves a maximum computational utilization exceeding 90% under benchmark workloads. Moreover, the proposed HGEMM not only significantly outperforms the CATLASS template-based implementation but also delivers efficiency comparable to the cuBLAS implementation in OPT-based bandwidth-limited LLM inference workloads.
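The SplitK idea can be shown in a few lines: when M and N are small, partition the reduction dimension K across compute units and reduce the partial sums at the end. A NumPy sketch, with loop iterations standing in for AI Cores:

```python
import numpy as np

def splitk_gemm(A, B, splits=4):
    """SplitK sketch: partition the K (reduction) dimension across units.

    When M and N are small, plain output tiling leaves most AI Cores idle;
    splitting K lets several units cooperate on the same output tile and
    reduce their partial sums afterwards.
    """
    K = A.shape[1]
    bounds = np.linspace(0, K, splits + 1, dtype=int)
    partials = [A[:, lo:hi] @ B[lo:hi, :]       # one partial GEMM per unit
                for lo, hi in zip(bounds[:-1], bounds[1:])]
    return sum(partials)                         # final reduction step

A, B = np.random.rand(8, 1024), np.random.rand(1024, 8)
print(np.allclose(splitk_gemm(A, B), A @ B))    # True
```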

25 pages, 4235 KB  
Article
A Performance Study of Deep Neural Network Representations of Interpretable ML on Edge Devices with AI Accelerators
by Julian Schauer, Payman Goodarzi, Jannis Morsch and Andreas Schütze
Sensors 2025, 25(18), 5681; https://doi.org/10.3390/s25185681 - 11 Sep 2025
Cited by 4 | Viewed by 2546
Abstract
With the rising adoption of machine learning (ML) and deep learning (DL) applications, the demand for deploying these algorithms closer to sensors has grown significantly, particularly in sensor-driven use cases such as predictive maintenance (PM) and condition monitoring (CM). This study investigated a novel application-oriented approach to representing interpretable ML inference as deep neural networks (DNNs), evaluated in terms of latency and energy efficiency at the edge, to tackle the problem of inefficient, high-effort, and hard-to-interpret ML implementations. For this purpose, the interpretable deep neural network representation (IDNNRep) was integrated into an open-source interpretable ML toolbox to demonstrate the inference time and energy efficiency improvements. The goal of this work was to enable the utilization of generic artificial intelligence (AI) accelerators for interpretable ML algorithms to achieve efficient inference on edge hardware in smart sensor applications. This novel approach was applied to one regression and one classification task from the field of PM and validated by implementing the inference on the neural processing unit (NPU) of the QXSP-ML81 single-board computer and the tensor processing unit (TPU) of the Google Coral. Different quantization levels of the implementation were tested against common Python and C++ implementations. The novel implementation reduced the inference time by up to 80% and the mean energy consumption by up to 76% at the lowest precision, with only a 0.4% loss of accuracy compared to the C++ implementation. With the successful utilization of generic AI accelerators, the performance was further improved, with a 94% reduction in both inference time and mean energy consumption.
(This article belongs to the Section Intelligent Sensors)
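The general trick the study relies on can be rendered as a toy: an interpretable (here, linear) model re-expressed as a single dense layer runs natively on generic AI accelerators, and quantizing that layer trades a little accuracy for latency and energy. This is an illustration of the idea only, not the IDNNRep implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
W, b = rng.normal(size=(10, 3)), rng.normal(size=3)  # a "trained" linear model

def dense(x, W, b):
    return x @ W + b            # one dense layer: native on NPUs and TPUs

def quantize(W, bits=8):
    """Symmetric per-tensor quantization to signed integers."""
    scale = np.abs(W).max() / (2 ** (bits - 1) - 1)
    return np.round(W / scale).astype(np.int8), scale

Wq, s = quantize(W)
x = rng.normal(size=(4, 10))
full = dense(x, W, b).argmax(1)
quant = dense(x, Wq.astype(np.float32) * s, b).argmax(1)
print((full == quant).mean())   # typically 1.0: little accuracy lost at 8 bit
```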

24 pages, 8898 KB  
Article
Performance and Efficiency Gains of NPU-Based Servers over GPUs for AI Model Inference
by Youngpyo Hong and Dongsoo Kim
Systems 2025, 13(9), 797; https://doi.org/10.3390/systems13090797 - 11 Sep 2025
Cited by 2 | Viewed by 11053
Abstract
The exponential growth of AI applications has intensified the demand for efficient inference hardware capable of delivering low-latency, high-throughput, and energy-efficient performance. This study presents a systematic, empirical comparison of GPU- and NPU-based server platforms across key AI inference domains: text-to-text, text-to-image, multimodal understanding, and object detection. We configure representative models—Llama-family for text generation, Stable Diffusion variants for image synthesis, LLaVA-NeXT for multimodal tasks, and the YOLO11 series for object detection—on a dual NVIDIA A100 GPU server and an eight-chip RBLN-CA12 NPU server. Performance metrics including latency, throughput, power consumption, and energy efficiency are measured under realistic workloads. Results demonstrate that NPUs match or exceed GPU throughput in many inference scenarios while consuming 35–70% less power. Moreover, optimization with the vLLM library on NPUs nearly doubles the tokens-per-second and yields a 92% increase in power efficiency. Our findings validate the potential of NPU-based inference architectures to reduce operational costs and energy footprints, offering a viable alternative to the prevailing GPU-dominated paradigm.
(This article belongs to the Special Issue Data-Driven Analysis of Industrial Systems Using AI)
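A generic harness for the latency/throughput/energy metrics compared in the study might look like the sketch below; `run_inference` and `read_power_watts` are placeholders for a platform-specific model call and power sensor (PDU or on-board telemetry), not anything from the paper:

```python
import time

def benchmark(run_inference, n_requests, read_power_watts):
    """Measure latency, throughput, and energy efficiency for one workload.

    `run_inference` returns the tokens produced per request (assumed
    constant here); energy is integrated as instantaneous watts * seconds.
    """
    latencies, energy_j, tokens_total = [], 0.0, 0
    t0 = time.perf_counter()
    for _ in range(n_requests):
        start = time.perf_counter()
        tokens_total += run_inference()
        dt = time.perf_counter() - start
        latencies.append(dt)
        energy_j += read_power_watts() * dt    # joules = watts * seconds
    wall = time.perf_counter() - t0
    return {
        "p50_latency_s": sorted(latencies)[len(latencies) // 2],
        "throughput_rps": n_requests / wall,
        "tokens_per_joule": tokens_total / energy_j,
    }

# Dummy stand-ins so the sketch runs end to end:
print(benchmark(lambda: 128, n_requests=10, read_power_watts=lambda: 300.0))
```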

28 pages, 7302 KB  
Article
A Prototype of a Lightweight Structural Health Monitoring System Based on Edge Computing
by Yinhao Wang, Zhiyi Tang, Guangcai Qian, Wei Xu, Xiaomin Huang and Hao Fang
Sensors 2025, 25(18), 5612; https://doi.org/10.3390/s25185612 - 9 Sep 2025
Cited by 4 | Viewed by 2683
Abstract
Bridge Structural Health Monitoring (BSHM) is vital for assessing structural integrity and operational safety. Traditional wired systems are limited by high installation costs and complexity, while existing wireless systems still face issues with cost, synchronization, and reliability. Moreover, cloud-based methods for extreme event detection struggle to meet real-time and bandwidth constraints in edge environments. To address these challenges, this study proposes a lightweight wireless BSHM system based on edge computing, enabling local data acquisition and real-time intelligent detection of extreme events. The system consists of wireless sensor nodes for front-end acceleration data collection and an intelligent hub for data storage, visualization, and earthquake recognition. Acceleration data are converted into time–frequency images to train a MobileNetV2-based model. With model quantization and Neural Processing Unit (NPU) acceleration, efficient on-device inference is achieved. Experiments on a laboratory steel bridge verify the system's high acquisition accuracy, precise clock synchronization, and strong anti-interference performance. Compared with inference on a general-purpose ARM CPU running the unquantized model, the quantized model deployed on the NPU achieves a 26× speedup in inference, a 35% reduction in power consumption, and less than 1% accuracy loss. This solution provides a cost-effective, reliable BSHM framework for small-to-medium-sized bridges, offering local intelligence and rapid response with strong potential for real-world applications.
(This article belongs to the Section Fault Diagnosis & Sensors)
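The acceleration-to-image step ahead of the MobileNetV2 classifier can be sketched with a standard spectrogram; the sample rate, window, and normalization below are assumptions, not the authors' settings:

```python
import numpy as np
from scipy.signal import spectrogram

# Sketch of converting an acceleration record into a time-frequency image
# for a CNN. The synthetic "event" is a low-frequency burst, loosely
# mimicking strong ground motion against ambient vibration.
fs = 200.0                                    # accelerometer sample rate, Hz
t = np.arange(0, 30, 1 / fs)
accel = 0.02 * np.random.randn(t.size)        # ambient vibration noise
accel[3000:4000] += np.sin(2 * np.pi * 3 * t[3000:4000])  # 3 Hz event burst

f, tt, Sxx = spectrogram(accel, fs=fs, nperseg=256, noverlap=192)
img = 10 * np.log10(Sxx + 1e-12)              # power in dB
img = (img - img.min()) / (img.max() - img.min())  # normalize to [0, 1]
print(img.shape)   # (freq_bins, time_frames): resize to the CNN input size
```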

21 pages, 3725 KB  
Article
Pruning-Friendly RGB-T Semantic Segmentation for Real-Time Processing on Edge Devices
by Jun Young Hwang, Youn Joo Lee, Ho Gi Jung and Jae Kyu Suhr
Electronics 2025, 14(17), 3408; https://doi.org/10.3390/electronics14173408 - 27 Aug 2025
Viewed by 1774
Abstract
RGB-T semantic segmentation, which uses thermal and RGB images simultaneously, is actively being researched to robustly recognize the surroundings of vehicles under challenging lighting and weather conditions, and it is important for such networks to operate in real time on edge devices. Since the transformer-based approaches adopted by most recent RGB-T semantic segmentation studies are very difficult to run on edge devices, this paper considers only CNN-based RGB-T semantic segmentation networks that can run on edge devices in real time. Although EAEFNet shows the best performance among CNN-based networks on edge devices, its inference speed is too slow for real-time operation. Furthermore, even when channel pruning is applied, the speed improvement is minimal. The analysis of EAEFNet identifies the intermediate fusion of RGB and thermal features and the high complexity of the decoder as the main causes. To address these issues, this paper proposes a network using a ResNet encoder with an early-fused four-channel input and a U-Net decoder structure. To improve decoder performance, bilinear upsampling is replaced with PixelShuffle, and mini Atrous Spatial Pyramid Pooling (ASPP) and Progressive Transposed Module (PTM) modules are applied. Since the proposed network is primarily composed of convolutional layers, channel pruning is confirmed to be effectively applicable; it significantly improves inference speed and enables real-time operation on the neural processing unit (NPU) of edge devices. The proposed network is evaluated on the MFNet dataset, one of the most widely used public datasets for RGB-T semantic segmentation, and achieves performance comparable to EAEFNet while operating at over 30 FPS on an embedded board equipped with the Qualcomm QCS6490 SoC.
(This article belongs to the Special Issue New Insights in 2D and 3D Object Detection and Semantic Segmentation)
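Both architectural choices named in the abstract, early fusion of a four-channel input and PixelShuffle in place of bilinear upsampling, fit in a few lines of PyTorch; the channel widths and class count below are illustrative, not the paper's:

```python
import torch
import torch.nn as nn

class EarlyFusionHead(nn.Module):
    """Minimal sketch: 4-channel early fusion + PixelShuffle upsampling."""

    def __init__(self, n_classes=9):
        super().__init__()
        self.stem = nn.Conv2d(4, 64, 3, stride=2, padding=1)  # RGB+T input
        self.up = nn.Sequential(
            nn.Conv2d(64, n_classes * 4, 3, padding=1),
            nn.PixelShuffle(2),   # rearranges channels into a 2x upsample
        )

    def forward(self, rgb, thermal):
        x = torch.cat([rgb, thermal], dim=1)   # early fusion, 4 channels
        return self.up(torch.relu(self.stem(x)))

net = EarlyFusionHead()
rgb = torch.randn(1, 3, 480, 640)
thermal = torch.randn(1, 1, 480, 640)
print(net(rgb, thermal).shape)   # torch.Size([1, 9, 480, 640])
```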

26 pages, 4049 KB  
Article
A Versatile UAS Development Platform Able to Support a Novel Tracking Algorithm in Real-Time
by Dan-Marius Dobrea and Matei-Ștefan Dobrea
Aerospace 2025, 12(8), 649; https://doi.org/10.3390/aerospace12080649 - 22 Jul 2025
Viewed by 1774
Abstract
A primary objective of this research is the development of an innovative algorithm capable of tracking a drone in real time, a fundamental requirement across various applications including collision avoidance, formation flying, and the interception of moving targets. Nonetheless, regardless of the efficacy of any detection algorithm, achieving 100% performance remains unattainable. Deep neural networks (DNNs) were employed to enhance this performance. To facilitate real-time operation, the DNN must be executed on a Deep Learning Processing Unit (DPU), Neural Processing Unit (NPU), Tensor Processing Unit (TPU), or Graphics Processing Unit (GPU) system on board the UAV. Given the constraints of these processing units, it may be necessary to quantize the DNN or utilize a less complex variant, resulting in an additional reduction in performance. However, precise target detection at each control step is imperative for effective flight path control. By integrating multiple algorithms, the developed system can effectively track UAVs with improved detection performance. Furthermore, this paper aims to establish a versatile Unmanned Aerial System (UAS) development platform constructed from open-source components and capable of adapting and evolving seamlessly throughout the development and post-production phases.
(This article belongs to the Section Aeronautics)
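The multi-algorithm integration can be illustrated with a detector backed by a constant-velocity predictor that bridges missed detections, so the flight controller receives a target on every control step. This is an invented illustration of the principle; `detections` stands in for the on-board quantized DNN's per-frame output:

```python
# Detector-plus-predictor fusion: when the DNN misses, dead-reckon from
# the last observed position and velocity so tracking never drops out.
def make_tracker():
    state = {"pos": None, "vel": (0.0, 0.0)}

    def update(detection):
        if detection is not None:                  # DNN hit: refresh state
            if state["pos"] is not None:
                state["vel"] = (detection[0] - state["pos"][0],
                                detection[1] - state["pos"][1])
            state["pos"] = detection
        elif state["pos"] is not None:             # DNN miss: dead-reckon
            state["pos"] = (state["pos"][0] + state["vel"][0],
                            state["pos"][1] + state["vel"][1])
        return state["pos"]

    return update

track = make_tracker()
detections = [(100, 50), (104, 52), None, None, (113, 57)]  # two DNN misses
for d in detections:
    print(track(d))   # target positions keep flowing during the misses
```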

38 pages, 1737 KB  
Article
Deep Learning Scheduling on a Field-Programmable Gate Array Cluster Using Configurable Deep Learning Accelerators
by Tianyang Fang, Alejandro Perez-Vicente, Hans Johnson and Jafar Saniie
Information 2025, 16(4), 298; https://doi.org/10.3390/info16040298 - 8 Apr 2025
Cited by 2 | Viewed by 6209
Abstract
This paper presents the development and evaluation of a distributed system employing low-latency embedded field-programmable gate arrays (FPGAs) to optimize scheduling for deep learning (DL) workloads and to configure multiple deep learning accelerator (DLA) architectures. Aimed at advancing FPGA applications in real-time edge computing, this study focuses on achieving optimal latency for a distributed computing system. A novel methodology was adopted, using configurable hardware to examine clusters of DLAs varying in architecture and scheduling techniques. The system demonstrated its capability to parallel-process diverse neural network (NN) models, manage compute graphs in a pipelined sequence, and allocate computational resources efficiently to intensive NN layers. We examined five configurable DLAs—Versatile Tensor Accelerator (VTA), Nvidia DLA (NVDLA), Xilinx Deep Processing Unit (DPU), Tensil Compute Unit (CU), and Pipelined Convolutional Neural Network (PipeCNN)—across two FPGA cluster types consisting of Zynq-7000 and Zynq UltraScale+ System-on-Chip (SoC) processors, respectively. Four deep neural network (DNN) workloads were tested: Scatter-Gather, AI Core Assignment, Pipeline Scheduling, and Fused Scheduling. These methods revealed an exponential decay in processing time, with up to a 90% speedup, although deviations were noted depending on the workload and cluster configuration. This research substantiates FPGAs' utility in adaptable, efficient DL deployment, setting a precedent for future experimental configurations and performance benchmarks.
(This article belongs to the Special Issue Machine Learning and Data Mining: Innovations in Big Data Analytics)
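Pipeline scheduling's benefit is captured by the classic recurrence: input i enters stage s once stage s has finished input i-1 and stage s-1 has finished input i. A toy makespan model, with invented stage times rather than measured ones:

```python
# Toy makespan model of a linear pipeline over a DLA cluster: throughput
# ends up limited by the slowest stage instead of the sum of all stages.
def pipeline_makespan(stage_times, n_inputs):
    prev = [0.0] * len(stage_times)            # finish times of input i-1
    for _ in range(n_inputs):
        cur = []
        for s, t in enumerate(stage_times):
            start = max(prev[s], cur[s - 1] if s else 0.0)
            cur.append(start + t)
        prev = cur
    return prev[-1]

stages = [3.0, 5.0, 2.0]                       # ms per DLA pipeline stage
print(sum(stages) * 8)                         # 80.0 ms fully serial
print(pipeline_makespan(stages, 8))           # 45.0 ms pipelined
```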

25 pages, 4715 KB  
Article
PassRecover: A Multi-FPGA System for End-to-End Offline Password Recovery Acceleration
by Guangwei Xie, Xitian Fan, Zhongchen Huang, Wei Cao and Fan Zhang
Electronics 2025, 14(7), 1415; https://doi.org/10.3390/electronics14071415 - 31 Mar 2025
Cited by 3 | Viewed by 1974
Abstract
In the domain of password recovery, deep learning has emerged as a pivotal technology for enhancing recovery efficiency. Despite its effectiveness, the inherent computational complexity of deep learning-based password generation algorithms poses substantial challenges, particularly in achieving synergistic acceleration between deep learning inference and the plaintext encryption process. In this paper, we introduce PassRecover, a multi-FPGA-based computing system that can simultaneously accelerate deep learning-driven password generation and plaintext encryption in an end-to-end manner. The system architecture incorporates a neural processing unit (NPU) and an encryption array configured to operate under a streaming dataflow paradigm for parallel processing. It is the first approach to explore the benefits of end-to-end offline password recovery. For comprehensive evaluation, PassRecover is benchmarked against PassGAN and five industry-standard encryption algorithms (Office2010, Office2013, PDF1.7, Winzip, and RAR5). Experimental results demonstrate excellent performance: compared to the latest work that accelerates only the encryption algorithms, PassRecover achieves an average 101.5% speedup across all tested encryption algorithms, and compared to graphics processing unit (GPU)-based end-to-end implementations, it delivers 93.01% faster processing and 3.73× better energy efficiency. These results establish PassRecover as a promising solution for resource-constrained password recovery scenarios requiring high throughput and energy efficiency.
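The streaming, end-to-end dataflow can be miniaturized in software: a candidate generator (standing in for the NPU-side model) feeds the verification stage directly, with a key-derivation check standing in for the FPGA encryption array. All names and parameters below are illustrative:

```python
import hashlib

# Candidates flow straight from generation to verification without being
# materialized, mirroring the streaming-dataflow idea at toy scale.
def candidate_stream():
    for base in ["password", "dragon", "letmein"]:
        for suffix in ["", "1", "123", "!"]:
            yield base + suffix

def verify(candidate, target_digest, salt=b"s"):
    # Stand-in for a real check: document formats like Office/PDF/ZIP use
    # many KDF iterations, which is the cost the encryption array absorbs.
    digest = hashlib.pbkdf2_hmac("sha1", candidate.encode(), salt, 1000)
    return digest.hex() == target_digest

target = hashlib.pbkdf2_hmac("sha1", b"dragon123", b"s", 1000).hex()
hit = next(c for c in candidate_stream() if verify(c, target))
print(hit)   # dragon123
```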

24 pages, 32213 KB  
Article
ACMSPT: Automated Counting and Monitoring System for Poultry Tracking
by Edmanuel Cruz, Miguel Hidalgo-Rodriguez, Adiz Mariel Acosta-Reyes, José Carlos Rangel, Keyla Boniche and Franchesca Gonzalez-Olivardia
AgriEngineering 2025, 7(3), 86; https://doi.org/10.3390/agriengineering7030086 - 19 Mar 2025
Cited by 3 | Viewed by 6002
Abstract
The poultry industry faces significant challenges in efficiently monitoring large populations, especially under resource constraints and limited connectivity. This paper introduces the Automated Counting and Monitoring System for Poultry Tracking (ACMSPT), an innovative solution that integrates edge computing, Artificial Intelligence (AI), and the Internet of Things (IoT). The study begins by collecting a custom dataset of 1300 high-resolution images from real broiler farm environments, encompassing diverse lighting conditions, occlusions, and growth stages. Each image was manually annotated and used to train the YOLOv10 object detection model with carefully selected hyperparameters. The trained model was then deployed on an Orange Pi 5B single-board computer equipped with a Neural Processing Unit (NPU), enabling on-site inference and real-time poultry tracking. System performance was evaluated in both small- and commercial-scale sheds, achieving a precision of 93.1% and recall of 93.0%, with an average inference time under 200 milliseconds. The results demonstrate that ACMSPT can autonomously detect anomalies in poultry movement, facilitating timely interventions while reducing manual labor. Moreover, its cost-effective, low-connectivity design supports broader adoption in remote or resource-limited environments. Future work will focus on improving adaptability to extreme conditions and extending this approach to other livestock management contexts.
(This article belongs to the Special Issue Precision Farming Technologies for Monitoring Livestock and Poultry)
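Downstream of the detector, per-frame counting and movement-anomaly flagging can be as simple as a rolling baseline; the window and threshold below are invented for illustration, not the paper's method:

```python
from collections import deque

# Flag frames whose detection count deviates sharply from a rolling
# baseline of recent counts (a crude stand-in for anomaly detection).
def make_monitor(window=30, tolerance=0.2):
    history = deque(maxlen=window)

    def update(detections):
        count = len(detections)
        baseline = sum(history) / len(history) if history else count
        anomaly = abs(count - baseline) > tolerance * max(baseline, 1)
        history.append(count)
        return count, anomaly

    return update

monitor = make_monitor()
frames = [[0] * 50, [0] * 51, [0] * 49, [0] * 20]   # last frame: sharp drop
for boxes in frames:
    print(monitor(boxes))   # (count, anomaly_flag); the drop is flagged
```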

31 pages, 2916 KB  
Article
Physics-Guided Neural Network-Based Feedforward Control for Seamless Pipe Manufacturing Process
by Luka Filipović, Luka Miličić, Milan Ristanović, Vladan Dimitrijević and Petar Jovanović
Appl. Sci. 2025, 15(4), 2229; https://doi.org/10.3390/app15042229 - 19 Feb 2025
Cited by 1 | Viewed by 2555
Abstract
Artificial intelligence (AI) is increasingly being utilized in the industrial sector, revolutionizing traditional manufacturing processes with advanced automation systems. Despite their potential, neural networks have seen limited adoption in industrial control systems due to their lack of interpretability compared to traditional methods. The recently introduced physics-guided neural networks (PGNNs) address this limitation by embedding physical knowledge directly into the network structure, enhancing the interpretability and robustness. This study proposes a novel feedforward control framework that integrates a reduced-order physics-based model of a hydraulic actuator with a data-driven correction term for accurate force control in the seamless pipe manufacturing process. The coupled dynamics of the actuator and the continuously cast material being pushed into the piercing mill are identified through experimental data, and reduced-order models are developed for integration into the PGNN structure. The training of the networks is performed on a dataset from a scaled industrial hydraulic system, with the validation of the proposed methods conducted on a neural processing unit (NPU), a specialized industrial-grade platform for AI, operating within a PLC environment. The results demonstrate real-time execution with excellent force tracking, even with a limited training dataset—a typical constraint in industrial processes—while providing safer and more predictable behavior compared to traditional neural-network-only solutions.
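The PGNN structure the abstract describes, a reduced-order physics term plus a learned residual correction, can be sketched as follows; the actuator model and network sizes are assumptions, not the paper's identified dynamics:

```python
import torch
import torch.nn as nn

class PGNNFeedforward(nn.Module):
    """Sketch of a physics-guided feedforward term: physics + NN residual.

    The physics part (a toy stiffness/friction model here) keeps the
    controller interpretable; the small network only learns what the
    reduced-order model misses.
    """

    def __init__(self):
        super().__init__()
        self.k = nn.Parameter(torch.tensor(1.0))   # physics gain (trainable)
        self.c = nn.Parameter(torch.tensor(0.1))   # viscous friction coeff
        self.correction = nn.Sequential(           # data-driven residual
            nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 1)
        )

    def forward(self, x_ref, v_ref):
        physics = self.k * x_ref + self.c * v_ref  # reduced-order model
        residual = self.correction(torch.stack([x_ref, v_ref], dim=-1))
        return physics + residual.squeeze(-1)

model = PGNNFeedforward()
x = torch.linspace(0, 1, 5)      # reference positions
v = torch.full((5,), 0.2)        # reference velocities
print(model(x, v))               # feedforward force along the trajectory
```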
