Search Results (483)

Search Parameters:
Keywords = graphics processing unit (GPU)

37 pages, 1745 KB  
Article
Boundary-Aware Contrastive Learning for Log Anomaly Detection
by Fouad Ailabouni, Jesús-Ángel Román-Gallego, María-Luisa Pérez-Delgado and Laura Grande Pérez
Appl. Sci. 2026, 16(7), 3208; https://doi.org/10.3390/app16073208 - 26 Mar 2026
Viewed by 102
Abstract
Log anomaly detection in modern distributed systems is challenging: anomalous behaviors are rare, manual labeling is expensive, and session boundaries are often set by fixed heuristics before model training. This fixed-boundary assumption is problematic because segmentation errors propagate into representation learning and cannot be corrected during optimization. To address this, this paper proposes BASN (Boundary-Aware Sessionization Network), a boundary-aware contrastive learning framework that jointly learns session boundaries and anomaly representations using a differentiable soft-reset mechanism. Rather than treating sessionization as a separate step, BASN predicts boundary probabilities from event semantics and temporal gaps, then modulates session-state updates end-to-end. The session representations are optimized with self-supervised contrastive learning, enabling effective zero-shot anomaly detection and few-shot adaptation. Experiments on four benchmark datasets (BGL, HDFS, OpenStack, SSH) show strong zero-shot performance (area under the receiver operating characteristic curve, AUROC 0.935–0.975) and boundary alignment with expert-validated proxy segmentation (boundary F1 0.825–0.877). Comparative gains over baselines are reported in the article after bibliography correction, baseline verification, and expanded statistical analysis. BASN is also computationally efficient, requiring less than 10 ms per session on a Graphics Processing Unit (GPU) and less than 45 ms on a Central Processing Unit (CPU), which is compatible with real-time inference needs in the evaluated settings. However, cross-system transfer AUROC (0.735–0.812) remains below in-domain performance, so domain-specific adaptation is still needed for deployment in environments that differ greatly from the training domain. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
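The soft-reset idea in the abstract above can be illustrated with a toy sketch: a boundary probability derived from the temporal gap between consecutive events softly resets a running session state, instead of cutting sessions with a fixed heuristic. Everything here (function names, the gap-sigmoid parameterization, the scalar state) is a hypothetical illustration, not BASN's actual model.

```python
import math

def soft_reset_state(events, gap_scale=1.0, threshold_gap=5.0):
    """Toy soft-reset session tracker: a boundary probability computed from
    the temporal gap between consecutive events softly resets the running
    session state (illustrative sketch, not the paper's network)."""
    state = 0.0
    boundaries = []
    prev_t = None
    for t, value in events:
        if prev_t is None:
            b = 1.0  # first event always starts a session
        else:
            # sigmoid of (gap - threshold): large gaps -> b near 1 (reset)
            b = 1.0 / (1.0 + math.exp(-(t - prev_t - threshold_gap) / gap_scale))
        state = (1.0 - b) * state + value  # soft reset, then accumulate
        boundaries.append(b)
        prev_t = t
    return state, boundaries

# A long temporal gap between t=3 and t=20 yields a boundary probability
# near 1, so the session state is (softly) reset there.
_, probs = soft_reset_state([(0, 1.0), (1, 1.0), (3, 1.0), (20, 1.0)])
```

Because the reset is a differentiable blend rather than a hard cut, gradients can flow through the boundary decision, which is the property that lets boundaries and representations be learned jointly.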

18 pages, 2797 KB  
Article
Variation-Aware Memristor-Based Analog Accelerator for Vision Transformer
by Qianhou Qu, Sheng Lu, Liuting Shang, Sungyong Jung, Qilian Liang and Chenyun Pan
Electronics 2026, 15(5), 1116; https://doi.org/10.3390/electronics15051116 - 8 Mar 2026
Viewed by 303
Abstract
Vision transformers (ViTs) have emerged as one of the most popular computer vision models, achieving remarkable performance in image recognition. However, ViTs require large-scale, high-dimensional matrix computations, and traditional digital accelerators, such as graphics processing units (GPUs), have memory bandwidth limitations, leading to higher latency, increased energy consumption, and larger area. To address this challenge, this paper proposes a memristor-based analog accelerator that leverages memristor crossbar arrays for in-memory computing, reducing data movement and improving computational efficiency. Considering the non-ideal characteristics of memristor devices and the influence of analog circuitry, we incorporate Gaussian-distributed analog computation error at each step and memristor non-ideality modeling into the ViT inference to enable realistic evaluation under hardware-level conditions. Experimental evaluation on the ImageNet-1k dataset with TIMM-pretrained ViT models shows that the proposed analog accelerator can achieve the same Top-1 accuracy as a custom-designed 5 nm digital baseline accelerator, even with ~35% analog computation error and ~10% memristor conductance variation injected at each step. Compared to the digital counterpart, the proposed design achieves an 11.9× reduction in energy-delay product (EDP) and a 137.2× reduction in energy-delay-area product (EDAP). Full article
(This article belongs to the Section Circuit and Signal Processing)
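The error-injection methodology described above can be mimicked in miniature: multiply each ideal product term in a matrix-vector product by a Gaussian perturbation (std 0.35, echoing the ~35% figure). This is a hedged sketch of the evaluation idea, not the paper's simulator; all names and values other than the 35% error level are invented.

```python
import random

def analog_matvec(weights, x, err_std=0.35, seed=0):
    """Toy model of an analog crossbar matrix-vector product: each ideal
    multiply-accumulate term picks up multiplicative Gaussian error
    (err_std ~ 0.35 mirrors the ~35% injected analog computation error)."""
    rng = random.Random(seed)
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            acc += w * xi * (1.0 + rng.gauss(0.0, err_std))
        out.append(acc)
    return out

W = [[0.5, -0.2], [0.1, 0.3]]  # invented weights
x = [1.0, 2.0]                 # invented input
ideal = [sum(w * xi for w, xi in zip(row, x)) for row in W]
noisy = analog_matvec(W, x)
```

Averaged over many random seeds, the noisy output stays centered on the ideal result, which is the zero-mean property that lets accuracy survive large per-step error.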

24 pages, 504 KB  
Article
Feasibility Study of CUDA-Accelerated Homomorphic Encryption and Benchmarking on Consumer-Grade and Embedded GPUs
by Volodymyr Dubetskyy and Maria-Dolores Cano
Big Data Cogn. Comput. 2026, 10(3), 79; https://doi.org/10.3390/bdcc10030079 - 6 Mar 2026
Viewed by 517
Abstract
Fully Homomorphic Encryption (FHE) provides strong data confidentiality during computation but often suffers from high latency on Central Processing Units (CPUs). This study evaluates Graphics Processing Unit (GPU) acceleration for modern FHE libraries across a laptop (NVIDIA GTX 1650 Ti), a server (NVIDIA RTX 4060), and a Jetson Nano 2 GB embedded GPU. We benchmark key generation, arithmetic operations, Boolean-gate evaluation and scheme-specific tasks such as relinearization and key switching, using library-provided benchmarks with an explicit baseline (operation scope, timing boundaries, and parameter tuples). Moreover, we compare GPU-native libraries (NuFHE, Phantom-FHE, and Troy-Nova) with CPU-oriented ones (Microsoft SEAL, HElib, OpenFHE, Cupcake, and TFHE-rs). Results show GPUs deliver significant speedups for targeted operations. For example, NuFHE’s NVIDIA CUDA (Compute Unified Device Architecture) backend achieves about 1.4× faster Boolean-gate evaluation on the laptop and 3.4× faster on the server compared to its OpenCL backend. Likewise, RLWE (Ring Learning With Errors)-based schemes (BFV, CKKS, and BGV) see marked gains for polynomial arithmetic such as Number Theoretic Transform (NTT) when executed via Phantom-FHE. However, attempts to add CUDA support to Microsoft SEAL reveal four main challenges: high-precision modular arithmetic on GPUs, sequential dependencies in SEAL’s design, limited GPU memory and complex build-system changes. In light of these findings, we propose revised guidelines for GPU-first FHE libraries and practical recommendations for deploying high-throughput, privacy-preserving solutions on modern GPUs. Full article
(This article belongs to the Section Big Data)
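One of the operations the benchmarks target, the Number Theoretic Transform (NTT), is easy to state in a few lines: it is a DFT over the integers modulo a prime, which lets RLWE schemes turn polynomial multiplication into pointwise products. The naive O(n²) version below is a mathematical sketch only; real libraries such as Phantom-FHE use fast, GPU-parallel variants over much larger parameters.

```python
def ntt(x, p, w):
    """Naive Number Theoretic Transform: a DFT over Z_p using an n-th
    root of unity w (O(n^2) reference version, not a library routine)."""
    n = len(x)
    return [sum(x[j] * pow(w, j * k, p) for j in range(n)) % p
            for k in range(n)]

def intt(X, p, w):
    """Inverse NTT: same sum with the inverse root, scaled by 1/n mod p."""
    n = len(X)
    n_inv = pow(n, -1, p)  # modular inverse of n (Python 3.8+)
    w_inv = pow(w, -1, p)  # modular inverse of the root of unity
    return [n_inv * sum(X[k] * pow(w_inv, j * k, p) for k in range(n)) % p
            for j in range(n)]

# Tiny parameters: p = 17, n = 4, and w = 4 is a primitive 4th root of
# unity mod 17 (since 4^2 = 16 = -1 mod 17).
X = ntt([1, 2, 3, 4], 17, 4)
```

The forward/inverse pair round-trips exactly, and each output coefficient is an independent sum, which is what makes the transform attractive for the thread-parallel GPU execution the study benchmarks.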

27 pages, 13085 KB  
Article
End-to-End Tool Path Generation for Triangular Mesh Surfaces in Five-Axis CNC Machining
by Shi-Chu Li, Hong-Yu Ma, Bo-Wen Zhang and Li-Yong Shen
AppliedMath 2026, 6(3), 35; https://doi.org/10.3390/appliedmath6030035 - 24 Feb 2026
Viewed by 387
Abstract
Triangular mesh surface representation is widely adopted in geometric design and reverse engineering applications. However, in high-precision Computer Numerical Control (CNC) machining, significant limitations persist in automated Computer-Aided Manufacturing (CAM) tool path generation for such representations. Conventional CAM workflows heavily rely on manual engineering interventions, such as creating drive surfaces or tuning extensive parameters—a dependency that becomes particularly acute for generic free-form models. To address this critical challenge, this paper proposes a novel end-to-end single-step end-milling tool path generation methodology for triangular mesh surfaces in high-precision five-axis CNC machining. The framework includes clustering analysis for optimal workpiece orientation, normal vector distribution analysis to identify shallow and steep regions, Graphics Processing Unit (GPU)-accelerated collision detection for feasible tool orientation domains, and iso-planar tool path generation with Traveling Salesman Problem (TSP) optimization for efficient tool lifting and movement. Experimental validation confirms the framework ensures machining quality and algorithmic robustness. Full article
(This article belongs to the Section Computational and Numerical Mathematics)
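The TSP-based sequencing step mentioned above can be approximated with the classic nearest-neighbor heuristic: order path segments greedily by proximity so that lifting and travel moves between them stay short. This is a generic illustration of the idea, not the optimization the authors use; the coordinates are invented.

```python
import math

def nearest_neighbor_order(points, start=0):
    """Greedy nearest-neighbor ordering: a simple TSP heuristic for
    sequencing tool-path segments to shorten tool lifting/travel moves
    (illustrative stand-in for the paper's TSP optimization)."""
    unvisited = set(range(len(points)))
    order = [start]
    unvisited.remove(start)
    while unvisited:
        cur = points[order[-1]]
        # pick the closest remaining segment entry point
        nxt = min(unvisited, key=lambda i: math.dist(cur, points[i]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order

# Entry points of four iso-planar path segments (toy XY coordinates).
segments = [(0.0, 0.0), (9.0, 9.0), (1.0, 0.5), (8.5, 8.0)]
order = nearest_neighbor_order(segments)
```

Nearest-neighbor gives no optimality guarantee, but it captures why sequencing matters: visiting the two near clusters together avoids crossing the workpiece twice.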

22 pages, 1546 KB  
Article
Multimodal Fusion Attention Network for Real-Time Obstacle Detection and Avoidance for Low-Altitude Aircraft
by Xiaoqi Xu and Yiyang Zhao
Symmetry 2026, 18(2), 384; https://doi.org/10.3390/sym18020384 - 22 Feb 2026
Viewed by 342
Abstract
The rapid expansion of low-altitude unmanned aerial vehicles demands robust obstacle detection and avoidance systems capable of operating under diverse environmental conditions. This paper proposes a multimodal fusion attention network that integrates visual imagery and Light Detection and Ranging (LiDAR) point cloud data for real-time obstacle perception. The architecture incorporates a bidirectional cross-modal attention mechanism that learns dynamic correspondences between heterogeneous sensor modalities, enabling adaptive feature integration based on contextual reliability. An adaptive weighting component automatically modulates modal contributions according to estimated sensor confidence under varying environmental conditions. The network further employs gated fusion units and multi-scale feature pyramids to ensure comprehensive obstacle representation across different distances. A hierarchical avoidance decision framework translates detection outputs into executable control commands through threat assessment and graduated response strategies. Experimental evaluation on both public benchmarks and a purpose-collected low-altitude obstacle dataset demonstrates that the proposed method achieves 84.9% mean Average Precision (mAP) while maintaining 47.3 frames per second (FPS) on Graphics Processing Unit (GPU) hardware and 23.6 FPS on embedded platforms. Ablation studies confirm the contribution of each architectural component, with cross-modal attention providing the most substantial performance improvement. Full article
(This article belongs to the Section Computer)
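The adaptive weighting component can be sketched as a softmax over per-sensor confidence scores that gates how much each modality contributes to the fused feature. The code below is a deliberately tiny stand-in for the paper's learned module; the confidence values and feature vectors are invented.

```python
import math

def confidence_weighted_fusion(features, confidences):
    """Toy adaptive modality weighting: softmax over per-sensor confidence
    scores yields fusion weights, so a degraded sensor (e.g. a camera in
    fog) contributes less (sketch of the idea, not the learned module)."""
    exps = [math.exp(c) for c in confidences]
    total = sum(exps)
    weights = [e / total for e in exps]
    # weighted sum of the modality feature vectors
    fused = [sum(w * f[i] for w, f in zip(weights, features))
             for i in range(len(features[0]))]
    return fused, weights

camera = [0.9, 0.1]  # invented camera feature
lidar  = [0.2, 0.8]  # invented LiDAR feature
# Camera confidence low (fog), LiDAR confidence high.
fused, weights = confidence_weighted_fusion([camera, lidar], [-1.0, 2.0])
```

With the camera marked unreliable, the fused feature tracks the LiDAR branch, which is the behavior the paper's adaptive weighting is designed to learn automatically.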

17 pages, 1497 KB  
Article
SPARTA: Sparse Parallel Architecture for Real-Time Threat Analysis for Lightweight Edge Network Defense
by Shi Li, Xiyun Mi, Lin Zhang and Ye Lu
Future Internet 2026, 18(2), 88; https://doi.org/10.3390/fi18020088 - 6 Feb 2026
Viewed by 331
Abstract
AI-driven network security relies increasingly on Large Language Models (LLMs) to detect sophisticated threats; however, their deployment on resource-constrained edge devices is severely hindered by immense parameter scales. While unstructured pruning offers a theoretical reduction in model size, commodity Graphics Processing Unit (GPU) architectures fail to efficiently leverage element-wise sparsity due to the mismatch between fine-grained pruning patterns and the coarse-grained parallelism of Tensor Cores, leading to latency bottlenecks that compromise real-time analysis of high-volume security telemetry. To bridge this gap, we propose SPARTA (Sparse Parallel Architecture for Real-Time Threat Analysis), an algorithm–architecture co-design framework. Specifically, we integrate a hardware-based address remapping interface to enable flexible row-offset access. This mechanism facilitates a novel graph-based column vector merging strategy that aligns sparse data with Tensor Core parallelism, complemented by a pipelined execution scheme to mask decoding latencies. Evaluations on Llama2-7B and Llama2-13B benchmarks demonstrate that SPARTA achieves an average speedup of 2.35× compared to Flash-LLM, with peak speedups reaching 5.05×. These findings indicate that hardware-aware microarchitectural adaptations can effectively mitigate the penalties of unstructured sparsity, providing a viable pathway for efficient deployment in resource-constrained edge security. Full article
(This article belongs to the Special Issue DDoS Attack Detection for Cyber–Physical Systems)

34 pages, 6747 KB  
Article
Lightweight Semantic Segmentation for Fermentation Foam Monitoring: A Comparative Study of U-Net, DeepLabV3+, Fast-SCNN, and SegNet
by Maksym Vihuro, Andriy Malyar, Grzegorz Litawa, Kamila Kluczewska-Chmielarz, Tatiana Konrad and Piotr Migo
Appl. Sci. 2026, 16(3), 1487; https://doi.org/10.3390/app16031487 - 2 Feb 2026
Viewed by 360
Abstract
This study aims to identify an effective neural network architecture for the task of semantic segmentation of the surface of beer wort at the stage of primary fermentation, using deep learning methodologies. Four contemporary architectures were evaluated and contrasted. The following networks are presented in both baseline and optimized forms: U-Net, DeepLabV3+, Fast-SCNN, and SegNet. The models were trained on a dataset of images depicting real beer surfaces at the primary fermentation stage. This was followed by the validation of the models using key metrics, including pixel classification accuracy, Mean Intersection over Union (mIoU), Dice Coefficient, inference time per image, and Graphics Processing Unit (GPU) resource utilization. Results indicate that the optimized U-Net achieved the optimal balance between performance and efficiency, attaining a validation accuracy of 88.85%, mIoU of 76.72%, and a Dice score of 86.71%. With an inference time of 49.5 milliseconds per image, coupled with minimal GPU utilization (18%), the model proves suitable for real-time deployment in production environments. Conversely, complex architectures, such as DeepLabV3+, did not yield the anticipated benefits, thereby underscoring the viability of utilizing compact models for highly specialized industrial tasks. This study establishes a novel quantitative metric for the assessment of fermentation. This is based on the characteristics of the foam surface and thus offers an objective alternative to traditional subjective inspections. The findings emphasize the potential of adapting optimized deep learning architectures to quality control tasks within the food industry, particularly in the brewing sector, and they pave the way for further integration into automated computer vision systems. Full article
(This article belongs to the Special Issue Advances in Machine Vision for Industry and Agriculture)
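The reported mIoU and Dice metrics have simple definitions worth keeping at hand: intersection over union, and twice the intersection over the summed mask sizes, computed per class from binary masks. A minimal reference implementation (not the authors' evaluation code):

```python
def iou_and_dice(pred, target):
    """Per-class IoU and Dice coefficient from flattened binary masks
    (lists of 0/1), the segmentation metrics reported in the study."""
    inter = sum(p & t for p, t in zip(pred, target))
    p_sum, t_sum = sum(pred), sum(target)
    union = p_sum + t_sum - inter
    iou = inter / union if union else 1.0                      # |A∩B| / |A∪B|
    dice = 2 * inter / (p_sum + t_sum) if (p_sum + t_sum) else 1.0  # 2|A∩B| / (|A|+|B|)
    return iou, dice

pred   = [1, 1, 0, 0, 1, 0]  # invented prediction mask
target = [1, 0, 0, 1, 1, 0]  # invented ground-truth mask
iou, dice = iou_and_dice(pred, target)
```

mIoU is then the mean of per-class IoU values; Dice is always at least as large as IoU for the same masks, which is why the paper's Dice scores sit above its mIoU figures.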

17 pages, 2803 KB  
Article
GPU Ray Tracing Analysis of Plasma Plume Perturbations on Reflector Antenna Radiation Characteristics
by Yijing Wang, Weike Yin and Bing Wei
Symmetry 2026, 18(2), 243; https://doi.org/10.3390/sym18020243 - 29 Jan 2026
Viewed by 292
Abstract
During ion thruster operation, electromagnetic waves propagating through the plasma plume undergo absorption and refraction effects. This paper presents a graphics processing unit (GPU) parallel ray tracing (RT) algorithm for inhomogeneous media to analyze plasma plume-induced perturbations on the radiation characteristics of a satellite reflector antenna, substantially improving computational efficiency. This algorithm performs ray path tracing in the plume, with the vertex and central rays in each ray tube assigned to dedicated GPU threads. This enables the parallel computation of electromagnetic wave attenuation, phase, and polarization. By further applying aperture integration and the superposition principle, the influence of the plume on the far-field antenna radiation patterns is efficiently analyzed. Comparison with serial results validates the accuracy of the algorithm for plume calculation, achieving approximately 319 times speed-up for 586,928 ray tubes. Within the 2–5 GHz frequency range, the plume causes amplitude attenuation of less than 3 dB. This study provides an efficient solution for real-time analysis of plume-induced interference in satellite communications. Full article
(This article belongs to the Section Physics)

54 pages, 3083 KB  
Review
A Survey on Green Wireless Sensing: Energy-Efficient Sensing via WiFi CSI and Lightweight Learning
by Rod Koo, Xihao Liang, Deepak Mishra and Aruna Seneviratne
Energies 2026, 19(2), 573; https://doi.org/10.3390/en19020573 - 22 Jan 2026
Viewed by 638
Abstract
Conventional sensing expends energy at three stages: powering dedicated sensors, transmitting measurements, and executing computationally intensive inference. Wireless sensing re-purposes WiFi channel state information (CSI) inherent in every packet, eliminating extra sensors and uplink traffic, though reliance on deep neural networks (DNNs), often trained and run on graphics processing units (GPUs), can negate these gains. This review highlights two core energy-efficiency levers in CSI-based wireless sensing. First, ambient CSI harvesting cuts power use by an order of magnitude compared to radar and active Internet of Things (IoT) sensors. Second, integrated sensing and communication (ISAC) embeds sensing functionality into existing WiFi links, thereby reducing device count, battery waste, and carbon impact. We review conventional handcrafted and accuracy-first methods to set the stage for surveying green learning strategies and lightweight learning techniques, including compact hybrid neural architectures, pruning, knowledge distillation, quantisation, and semi-supervised training, that preserve accuracy while reducing model size and memory footprint. We also discuss hardware co-design, from low-power microcontrollers to edge application-specific integrated circuits (ASICs) and WiFi firmware extensions, that aligns computation with platform constraints. Finally, we identify open challenges in domain-robust compression, multi-antenna calibration, energy-proportionate model scaling, and standardised joules-per-inference metrics. Our aim is a practical, battery-friendly wireless sensing stack ready for smart home and 6G-era deployments. Full article
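Of the lightweight-learning levers the survey lists, magnitude pruning is the simplest to show concretely: zero out the smallest-magnitude fraction of weights. A toy global-pruning sketch (illustrative only; the function name and tie-breaking behavior are choices of this sketch, not any library's API):

```python
def magnitude_prune(weights, sparsity):
    """Global magnitude pruning: zero out the smallest-|w| fraction of a
    weight list. One of the model-compression levers (alongside
    quantisation and distillation) surveyed for green wireless sensing."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    cutoff = sorted(abs(w) for w in weights)[k - 1] if k else None
    pruned = []
    removed = 0
    for w in weights:
        if removed < k and abs(w) <= cutoff:
            pruned.append(0.0)  # pruned weight
            removed += 1
        else:
            pruned.append(w)    # surviving weight
    return pruned

w = [0.05, -0.8, 0.01, 0.6, -0.02, 0.3]  # invented weights
pruned = magnitude_prune(w, sparsity=0.5)
```

In practice the energy saving comes only when the resulting sparsity is exploited by the runtime (sparse kernels or structured pruning), which is exactly the hardware co-design point the survey raises.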

22 pages, 5297 KB  
Article
A Space-Domain Gravity Forward Modeling Method Based on Voxel Discretization and Multiple Observation Surfaces
by Rui Zhang, Guiju Wu, Jiapei Wang, Yufei Xi, Fan Wang and Qinhong Long
Symmetry 2026, 18(1), 180; https://doi.org/10.3390/sym18010180 - 19 Jan 2026
Viewed by 443
Abstract
Geophysical forward modeling serves as a fundamental theoretical approach for characterizing subsurface structures and material properties, essentially involving the computation of gravity responses at surface or spatial observation points based on a predefined density distribution. With the rapid development of data-driven techniques such as deep learning in geophysical inversion, forward algorithms are facing increasing demands in terms of computational scale, observable types, and efficiency. To address these challenges, this study develops an efficient forward modeling method based on voxel discretization, enabling rapid calculation of gravity anomalies and radial gravity gradients on multiple observational surfaces. Leveraging the parallel computing capabilities of graphics processing units (GPUs), together with tensor acceleration, Compute Unified Device Architecture (CUDA) execution, and Just-in-time (JIT) compilation strategies, the method achieves high efficiency and automation in the forward computation process. Numerical experiments conducted on several typical theoretical models demonstrate the convergence and stability of the calculated results, indicating that the proposed method significantly reduces computation time while maintaining accuracy, thus being well-suited for large-scale 3D modeling and fast batch simulation tasks. This research can efficiently generate forward datasets with multi-view and multi-metric characteristics, providing solid data support and a scalable computational platform for deep-learning-based geophysical inversion studies. Full article
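The forward computation being accelerated reduces, in its simplest form, to summing each voxel's attraction at every observation point. The sketch below uses a point-mass approximation with z positive downward; a production code (and presumably the paper's) would use analytic prism kernels and GPU batching rather than this toy double loop.

```python
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

def gravity_gz(voxels, obs):
    """Vertical gravity anomaly at observation points from density voxels,
    each voxel collapsed to a point mass at its centre (z positive down).
    A toy stand-in for analytic prism kernels on a GPU."""
    out = []
    for ox, oy, oz in obs:
        gz = 0.0
        for (x, y, z, rho, vol) in voxels:
            dx, dy, dz = x - ox, y - oy, z - oz
            r = math.sqrt(dx * dx + dy * dy + dz * dz)
            gz += G * rho * vol * dz / r ** 3  # z-component of attraction
        out.append(gz)
    return out

# One dense voxel 100 m below the surface, two observation points on z = 0.
voxels = [(0.0, 0.0, 100.0, 500.0, 10.0 ** 3)]  # centre (m), density (kg/m^3), volume (m^3)
obs = [(0.0, 0.0, 0.0), (200.0, 0.0, 0.0)]
gz = gravity_gz(voxels, obs)
```

Each observation point's sum is independent of the others, which is why the computation maps naturally onto the GPU thread parallelism the paper exploits.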

17 pages, 2889 KB  
Technical Note
Increasing Computational Efficiency of a River Ice Model to Help Investigate the Impact of Ice Booms on Ice Covers Formed in a Regulated River
by Karl-Erich Lindenschmidt, Mojtaba Jandaghian, Saber Ansari, Denise Sudom, Sergio Gomez, Stephany Valarezo Plaza, Amir Ali Khan, Thomas Puestow and Seok-Bum Ko
Water 2026, 18(2), 218; https://doi.org/10.3390/w18020218 - 14 Jan 2026
Viewed by 433
Abstract
The formation and stability of river ice covers in regulated waterways are critical for uninterrupted hydro-electric operations. This study investigates the modelling of ice cover development in the Beauharnois Canal along the St. Lawrence River with the presence and absence of ice booms. Ice booms are deployed in this canal to promote the rapid formation of a stable ice cover during freezing events, minimizing disruptions to dam operations. Remote sensing data were used to assess the spatial extent and temporal evolution of an ice cover and to calibrate the river ice model RIVICE. The model was applied to simulate ice formation for the 2019–2020 ice season, first for the canal with a series of three ice booms and then rerun under a scenario without booms. Comparative analysis reveals that the presence of ice booms facilitates the development of a relatively thinner and more uniform ice cover. In contrast, the absence of booms leads to thicker ice accumulations and increased risk of ice jamming, which could impact water management and hydroelectric generation operations. Computational efficiencies of the RIVICE model were also sought. RIVICE was originally compiled with a Fortran 77 compiler, which restricted modern optimization techniques. Recompiling with NVFortran significantly improved performance through advanced instruction scheduling, cache management, and automatic loop analysis, even without explicit optimization flags. Enabling optimization further accelerated execution, albeit marginally, reducing redundant operations and memory traffic while preserving numerical integrity. Tests across varying ice cross-sectional spacings confirmed that NVFortran reduced runtimes by roughly an order of magnitude compared to the original model. 
A test GPU (Graphics Processing Unit) version was able to run the data interpolation routines on the GPU, but frequent data transfers between the CPU (Central Processing Unit) and GPU caused by shared memory blocks and fixed-size arrays made it slower than the original CPU version. Achieving efficient GPU execution would require substantial code restructuring to eliminate global states, adopt persistent data regions, and parallelize at higher level loops, or alternatively, rewriting in a GPU-friendly language to fully exploit modern architectures. Full article

12 pages, 279 KB  
Perspective
Energy Demand, Infrastructure Needs and Environmental Impacts of Cryptocurrency Mining and Artificial Intelligence: A Comparative Perspective
by Marian Cătălin Voica, Mirela Panait and Ștefan Virgil Iacob
Energies 2026, 19(2), 338; https://doi.org/10.3390/en19020338 - 9 Jan 2026
Viewed by 1294
Abstract
This perspective paper aims to set the stage for current development in the field of energy consumption and environmental impacts in two major digital industries: cryptocurrency mining and artificial intelligence (AI). To better understand current developments, this paper uses a comparative analytical framework of life-cycle assessment principles and high-resolution grid modeling to explore the energy impacts from academic and industry data. On the one hand, while both sectors convert energy into digital value, they operate according to completely different logics, in the sense that cryptocurrencies rely on specialized hardware (application-specific integrated circuits) and seek cheap energy, where they can function as “virtual batteries” for the network, quickly shutting down at peak times, with increasing hardware efficiency. On the other hand, AI is a much more rigid emerging energy consumer, in the sense that it needs high-quality, uninterrupted energy and advanced infrastructure for high-performance Graphics Processing Units (GPUs). The training and inference stages generate massive consumption, difficult to quantify, and AI data centers put great pressure on the electricity grid. In this sense, the transition from mining to AI is limited due to differences in infrastructure, with the only reusable advantage being access to electrical capacity. Regarding competition between the two industries, this dynamic can fragment the energy grid, as AI tends to monopolize quality energy, and how states will manage this imbalance will influence the energy and digital security of the next decade. Full article
15 pages, 659 KB  
Article
Context-Aware Road Event Detection Using Hybrid CNN–BiLSTM Networks
by Abiel Aguilar-González and Alejandro Medina Santiago
Vehicles 2026, 8(1), 4; https://doi.org/10.3390/vehicles8010004 - 2 Jan 2026
Viewed by 968
Abstract
Road anomaly detection is essential for intelligent transportation systems and road maintenance. This work presents a MATLAB-native hybrid Convolutional Neural Network–Bidirectional Long Short-Term Memory (CNN–BiLSTM) framework for context-aware road event detection using multiaxial acceleration and vibration signals. The proposed architecture integrates short-term feature extraction via one-dimensional convolutional layers with bidirectional LSTM-based temporal modeling, enabling simultaneous capture of instantaneous signal morphology and long-range dependencies across driving trajectories. Multiaxial data were acquired at 50 Hz using an AQ-1 On-Board Diagnostics II (OBDII) Data Logger during urban and suburban routes in San Andrés Cholula, Puebla, Mexico. Our hybrid CNN–BiLSTM model achieved a global accuracy of 95.91% and a macro F1-score of 0.959. Per-class F1-scores ranged from 0.932 (none) to 0.981 (pothole), with specificity values above 0.98 for all event categories. Qualitative analysis demonstrates that this architecture outperforms previous CNN-only vibration-based models by approximately 2–3% in macro F1-score while maintaining balanced precision and recall across all event types. Visualization of BiLSTM activations highlights enhanced interpretability and contextual discrimination, particularly for events with similar short-term signatures. Further, the proposed framework’s low computational overhead and compatibility with MATLAB Graphics Processing Unit (GPU) Coder support its feasibility for real-time embedded deployment. These results demonstrate the effectiveness and robustness of our hybrid CNN–BiLSTM approach for road anomaly detection using only acceleration and vibration signals, establishing a validated continuation of previous CNN-based research. 
Beyond the experimental validation, the proposed framework provides a practical foundation for real-time pavement monitoring systems and can support intelligent transportation applications such as preventive road maintenance, driver assistance, and large-scale deployment on low-power embedded platforms. Full article
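The convolutional front end described above amounts to sliding short kernels over the acceleration signal to expose instantaneous morphology. A minimal valid-mode 1D convolution (cross-correlation, as deep learning frameworks implement it) on invented accelerometer numbers:

```python
def conv1d(signal, kernel):
    """Valid-mode 1D convolution (cross-correlation, as in deep learning
    frameworks): the short-term feature extractor at the front of a
    CNN-BiLSTM pipeline for vibration signals (illustrative sketch)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# A difference kernel highlights the sharp jolt a pothole leaves in the
# vertical-acceleration trace (toy numbers, not the 50 Hz OBDII data).
accel_z = [0.0, 0.1, 0.0, 2.5, -2.0, 0.1, 0.0]
edges = conv1d(accel_z, [1.0, -1.0])
```

In the full model, many learned kernels replace this hand-picked one, and the BiLSTM then relates such local responses across the whole driving trajectory.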

23 pages, 13345 KB  
Article
Neural-Based Controller on Low-Density FPGAs for Dynamic Systems
by Edson E. Cruz-Miguel, José R. García-Martínez, Jorge Orrante-Sakanassi, José M. Álvarez-Alvarado, Omar A. Barra-Vázquez and Juvenal Rodríguez-Reséndiz
Electronics 2026, 15(1), 198; https://doi.org/10.3390/electronics15010198 - 1 Jan 2026
Cited by 1 | Viewed by 416
Abstract
This work introduces a logic resource-efficient Artificial Neural Network (ANN) controller for embedded control applications on low-density Field-Programmable Gate Array (FPGA) platforms. The proposed design relies on 32-bit fixed-point arithmetic and incorporates an online learning mechanism, enabling the controller to adapt to system variations while maintaining low hardware complexity. Unlike conventional artificial intelligence solutions that require high-performance processors or Graphics Processing Units (GPUs), the proposed approach targets platforms with limited logic, memory, and computational resources. The ANN controller was described using a Hardware Description Language (HDL) and validated via cosimulation between ModelSim and Simulink. A practical comparison was also made between Proportional-Integral-Derivative (PID) control and an ANN for motor position control. The results confirm that the architecture efficiently utilizes FPGA resources, consuming approximately 50% of the available Digital Signal Processor (DSP) units, less than 40% of logic cells, and only 6% of embedded memory blocks. Owing to its modular design, the architecture is inherently scalable, allowing additional inputs or hidden-layer neurons to be incorporated with minimal impact on overall resource usage. Additionally, the computational latency can be precisely determined and scales with (16n+39)m+31 clock cycles, enabling precise timing analysis and facilitating integration into real-time embedded control systems. Full article
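The closed-form latency expression makes timing easy to budget. In the sketch below, n and m are assumed to be the inputs per neuron and the neuron count respectively (the abstract does not define them), and the 50 MHz clock is likewise an assumption of this example:

```python
def ann_latency_cycles(n, m):
    """Clock-cycle latency of the FPGA ANN controller per the paper's
    closed form (16n + 39)m + 31. The meaning of n (inputs per neuron)
    and m (neuron count) is assumed here, not stated in the abstract."""
    return (16 * n + 39) * m + 31

# A small layer: 4 inputs, 8 neurons, on an assumed 50 MHz fabric clock.
cycles = ann_latency_cycles(4, 8)
latency_us = cycles / 50e6 * 1e6  # convert cycles to microseconds
```

Because the expression is affine in m, doubling the hidden layer roughly doubles the latency, which is the deterministic scaling behavior that makes real-time integration straightforward.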

22 pages, 3408 KB  
Article
A High-Performance Branch Control Mechanism for GPGPU Based on RISC-V Architecture
by Yao Cheng, Yi Man and Xinbing Zhou
Electronics 2026, 15(1), 125; https://doi.org/10.3390/electronics15010125 - 26 Dec 2025
Viewed by 607
Abstract
General-Purpose Graphics Processing Units (GPGPUs) rely on warp scheduling and control flow management to organize parallel thread execution, making efficient control flow mechanisms essential for modern GPGPU design. Currently, the mainstream RISC-V GPGPU Vortex adopts the Single Instruction Multiple Threads (SIMT) stack control mechanism. This approach introduces high complexity and performance overhead, becoming a major limitation for further improving control efficiency. To address this issue, this paper proposes a thread-mask-based branch control mechanism for the RISC-V architecture. The mechanism introduces explicit mask primitives at the Instruction Set Architecture (ISA) level and directly manages the active status of threads within a warp through logical operations, enabling branch execution without jumps and thus reducing the overhead of the original control flow mechanism. Unlike traditional thread mask mechanisms in GPUs, our design centers on RISC-V and realizes co-optimization at both the ISA and microarchitecture levels. The mechanism was modeled and validated on Vortex SimX. Experimental results show that, compared with the Vortex SIMT stack mechanism, the proposed approach maintains correct control semantics while reducing branch execution cycles by an average of 31% and up to 40%, providing a new approach for RISC-V GPGPU control flow optimization. Full article
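The mask-primitive idea can be emulated in software: compute an active-lane mask from the branch condition, run the "then" path under that mask and the "else" path under its logical complement, with no jumps and no stack. This is a conceptual sketch of mask-based divergence handling, not Vortex's ISA or microarchitecture:

```python
def run_branch(threads, cond, then_fn, else_fn):
    """Toy SIMT branch via explicit thread masks: both sides of the
    branch execute in sequence, and per-lane masks decide which lanes
    commit results (sketch of the mask-primitive idea, not the SIMT
    stack mechanism it replaces)."""
    mask_then = [cond(t) for t in threads]
    mask_else = [not m for m in mask_then]  # logical complement mask
    out = list(threads)
    for i, t in enumerate(threads):         # "then" path, masked commit
        if mask_then[i]:
            out[i] = then_fn(t)
    for i, t in enumerate(threads):         # "else" path, masked commit
        if mask_else[i]:
            out[i] = else_fn(t)
    return out

# A 4-thread warp: even lanes double their value, odd lanes negate it.
result = run_branch([0, 1, 2, 3], lambda t: t % 2 == 0,
                    lambda t: t * 2, lambda t: -t)
```

Both paths still cost execution cycles, but eliminating the jump-and-reconverge bookkeeping of a SIMT stack is where the cycle savings reported above come from.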
