Search Results (475)

Search Parameters:
Keywords = graphics processing units (GPU)

55 pages, 3089 KB  
Review
A Survey on Green Wireless Sensing: Energy-Efficient Sensing via WiFi CSI and Lightweight Learning
by Rod Koo, Xihao Liang, Deepak Mishra and Aruna Seneviratne
Energies 2026, 19(2), 573; https://doi.org/10.3390/en19020573 - 22 Jan 2026
Abstract
Conventional sensing expends energy at three stages: powering dedicated sensors, transmitting measurements, and executing computationally intensive inference. Wireless sensing re-purposes the WiFi channel state information (CSI) inherent in every packet, eliminating extra sensors and uplink traffic, though reliance on deep neural networks (DNNs), often trained and run on graphics processing units (GPUs), can negate these gains. This review highlights two core energy-efficiency levers in CSI-based wireless sensing. First, ambient CSI harvesting cuts power use by an order of magnitude compared to radar and active Internet of Things (IoT) sensors. Second, integrated sensing and communication (ISAC) embeds sensing functionality into existing WiFi links, thereby reducing device count, battery waste, and carbon impact. We review conventional handcrafted and accuracy-first methods to set the stage for surveying green learning strategies and lightweight learning techniques that preserve accuracy while reducing model size and memory footprint, including compact hybrid neural architectures, pruning, knowledge distillation, quantisation, and semi-supervised training. We also discuss hardware co-design, from low-power microcontrollers to edge application-specific integrated circuits (ASICs) and WiFi firmware extensions, that aligns computation with platform constraints. Finally, we identify open challenges in domain-robust compression, multi-antenna calibration, energy-proportionate model scaling, and standardised joules-per-inference metrics. Our aim is a practical, battery-friendly wireless sensing stack ready for smart-home and 6G-era deployments. Full article
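
The compression levers this survey names are easy to make concrete. Below is a minimal sketch, assuming a toy PyTorch classifier in place of a real CSI sensing network, of two of them: unstructured magnitude pruning and post-training dynamic quantisation. The layer sizes and class count are illustrative, not taken from the survey.

```python
# Sketch: two lightweight-learning levers from the survey, applied to a toy
# CSI classifier. The pruning/quantisation utilities are standard PyTorch;
# the model and sizes are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(           # stand-in for a CSI-based sensing network
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 8),            # e.g. 8 activity classes
)

# 1) Unstructured magnitude pruning: zero out the 50% smallest weights.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make the sparsity permanent

# 2) Post-training dynamic quantisation: int8 weights for Linear layers,
#    shrinking model size and memory footprint for edge deployment.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
print(quantized)
```
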
22 pages, 5297 KB  
Article
A Space-Domain Gravity Forward Modeling Method Based on Voxel Discretization and Multiple Observation Surfaces
by Rui Zhang, Guiju Wu, Jiapei Wang, Yufei Xi, Fan Wang and Qinhong Long
Symmetry 2026, 18(1), 180; https://doi.org/10.3390/sym18010180 - 19 Jan 2026
Viewed by 189
Abstract
Geophysical forward modeling serves as a fundamental theoretical approach for characterizing subsurface structures and material properties, essentially involving the computation of gravity responses at surface or spatial observation points based on a predefined density distribution. With the rapid development of data-driven techniques such as deep learning in geophysical inversion, forward algorithms are facing increasing demands in terms of computational scale, observable types, and efficiency. To address these challenges, this study develops an efficient forward modeling method based on voxel discretization, enabling rapid calculation of gravity anomalies and radial gravity gradients on multiple observation surfaces. Leveraging the parallel computing capabilities of graphics processing units (GPUs), together with tensor acceleration, Compute Unified Device Architecture (CUDA) execution, and just-in-time (JIT) compilation strategies, the method achieves high efficiency and automation in the forward computation process. Numerical experiments conducted on several typical theoretical models demonstrate the convergence and stability of the calculated results, indicating that the proposed method significantly reduces computation time while maintaining accuracy, making it well-suited for large-scale 3D modeling and fast batch simulation tasks. This research can efficiently generate forward datasets with multi-view and multi-metric characteristics, providing solid data support and a scalable computational platform for deep-learning-based geophysical inversion studies. Full article
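
For readers unfamiliar with voxel-based forward modeling, the sketch below illustrates the core voxel-to-observation summation for the vertical gravity anomaly, using a point-mass approximation per voxel. The paper's actual solver uses exact kernels, GPU tensors, CUDA, and JIT compilation; none of that is reproduced here, and the grid sizes are toy values.

```python
# Sketch: vectorized voxel summation for the vertical gravity anomaly g_z,
# using the point-mass approximation per voxel. Illustrative only; not the
# paper's GPU/CUDA/JIT implementation.
import numpy as np

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

def gz_point_mass(obs_xyz, voxel_xyz, density, voxel_volume):
    """g_z (m/s^2) at each observation point from a voxelized density model."""
    # pairwise separation vectors: (n_obs, n_vox, 3)
    d = voxel_xyz[None, :, :] - obs_xyz[:, None, :]
    r3 = np.linalg.norm(d, axis=2) ** 3
    mass = density * voxel_volume                  # (n_vox,)
    return G * np.sum(mass * d[:, :, 2] / r3, axis=1)

# toy model: a dense cube buried below a flat observation surface (z down)
zs, ys, xs = np.meshgrid(*[np.arange(10)] * 3, indexing="ij")
voxels = np.stack([xs, ys, zs + 5.0], axis=-1).reshape(-1, 3) * 10.0  # metres
rho = np.full(voxels.shape[0], 500.0)              # density contrast, kg/m^3
obs = np.stack([xs[0].ravel() * 10.0, ys[0].ravel() * 10.0,
                np.zeros(100)], axis=-1)           # observation surface z = 0
print(gz_point_mass(obs, voxels, rho, voxel_volume=1000.0)[:5])
```
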
17 pages, 2889 KB  
Technical Note
Increasing Computational Efficiency of a River Ice Model to Help Investigate the Impact of Ice Booms on Ice Covers Formed in a Regulated River
by Karl-Erich Lindenschmidt, Mojtaba Jandaghian, Saber Ansari, Denise Sudom, Sergio Gomez, Stephany Valarezo Plaza, Amir Ali Khan, Thomas Puestow and Seok-Bum Ko
Water 2026, 18(2), 218; https://doi.org/10.3390/w18020218 - 14 Jan 2026
Viewed by 184
Abstract
The formation and stability of river ice covers in regulated waterways are critical for uninterrupted hydro-electric operations. This study investigates the modelling of ice cover development in the Beauharnois Canal along the St. Lawrence River with the presence and absence of ice booms. Ice booms are deployed in this canal to promote the rapid formation of a stable ice cover during freezing events, minimizing disruptions to dam operations. Remote sensing data were used to assess the spatial extent and temporal evolution of an ice cover and to calibrate the river ice model RIVICE. The model was applied to simulate ice formation for the 2019–2020 ice season, first for the canal with a series of three ice booms and then rerun under a scenario without booms. Comparative analysis reveals that the presence of ice booms facilitates the development of a relatively thinner and more uniform ice cover. In contrast, the absence of booms leads to thicker ice accumulations and increased risk of ice jamming, which could impact water management and hydroelectric generation operations. Improvements to the computational efficiency of the RIVICE model were also sought. RIVICE was originally compiled with a Fortran 77 compiler, which restricted modern optimization techniques. Recompiling with NVFortran significantly improved performance through advanced instruction scheduling, cache management, and automatic loop analysis, even without explicit optimization flags. Enabling optimization further accelerated execution, albeit marginally, reducing redundant operations and memory traffic while preserving numerical integrity. Tests across varying ice cross-sectional spacings confirmed that NVFortran reduced runtimes by roughly an order of magnitude compared to the original model. A test GPU (Graphics Processing Unit) version was able to run the data interpolation routines on the GPU, but frequent data transfers between the CPU (Central Processing Unit) and GPU, caused by shared memory blocks and fixed-size arrays, made it slower than the original CPU version. Achieving efficient GPU execution would require substantial code restructuring to eliminate global state, adopt persistent data regions, and parallelize higher-level loops, or alternatively, rewriting in a GPU-friendly language to fully exploit modern architectures. Full article
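
The transfer bottleneck the authors describe is a generic GPU-porting pitfall. The sketch below is a hypothetical PyTorch stand-in rather than RIVICE code: it contrasts shipping an array to the device on every call against keeping it resident in a persistent data region.

```python
# Sketch: why per-call host<->device transfers can erase GPU gains, the effect
# the authors hit with shared memory blocks and fixed-size arrays. The array
# size and the interpolation stand-in are illustrative, not RIVICE code.
import time
import torch

x_host = torch.rand(2_000_000)

def interp_like_kernel(x):          # stand-in for a data interpolation routine
    return 0.5 * (x[:-1] + x[1:])

if torch.cuda.is_available():
    # Anti-pattern: ship data to the GPU and back on every iteration.
    t0 = time.perf_counter()
    for _ in range(100):
        y = interp_like_kernel(x_host.cuda()).cpu()
    torch.cuda.synchronize()
    t_transfer = time.perf_counter() - t0

    # Persistent data region: move once, iterate on-device.
    x_dev = x_host.cuda()
    t0 = time.perf_counter()
    for _ in range(100):
        y = interp_like_kernel(x_dev)
    torch.cuda.synchronize()
    t_resident = time.perf_counter() - t0
    print(f"per-call transfers: {t_transfer:.3f}s, resident: {t_resident:.3f}s")
```
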
12 pages, 279 KB  
Perspective
Energy Demand, Infrastructure Needs and Environmental Impacts of Cryptocurrency Mining and Artificial Intelligence: A Comparative Perspective
by Marian Cătălin Voica, Mirela Panait and Ștefan Virgil Iacob
Energies 2026, 19(2), 338; https://doi.org/10.3390/en19020338 - 9 Jan 2026
Viewed by 348
Abstract
This perspective paper aims to set the stage for current developments in energy consumption and environmental impacts across two major digital industries: cryptocurrency mining and artificial intelligence (AI). To better understand these developments, the paper uses a comparative analytical framework of life-cycle assessment principles and high-resolution grid modeling to explore the energy impacts reported in academic and industry data. While both sectors convert energy into digital value, they operate according to completely different logics. On the one hand, cryptocurrencies rely on specialized hardware (application-specific integrated circuits) of increasing efficiency and seek cheap energy; they can function as "virtual batteries" for the network, quickly shutting down at peak times. On the other hand, AI is a far more rigid emerging energy consumer: it needs high-quality, uninterrupted energy and advanced infrastructure for high-performance Graphics Processing Units (GPUs). The training and inference stages generate massive consumption that is difficult to quantify, and AI data centers put great pressure on the electricity grid. Consequently, the transition from mining to AI is limited by differences in infrastructure, with the only reusable advantage being access to electrical capacity. Regarding competition between the two industries, this dynamic can fragment the energy grid, as AI tends to monopolize quality energy, and how states manage this imbalance will influence the energy and digital security of the next decade. Full article
15 pages, 659 KB  
Article
Context-Aware Road Event Detection Using Hybrid CNN–BiLSTM Networks
by Abiel Aguilar-González and Alejandro Medina Santiago
Vehicles 2026, 8(1), 4; https://doi.org/10.3390/vehicles8010004 - 2 Jan 2026
Viewed by 243
Abstract
Road anomaly detection is essential for intelligent transportation systems and road maintenance. This work presents a MATLAB-native hybrid Convolutional Neural Network–Bidirectional Long Short-Term Memory (CNN–BiLSTM) framework for context-aware road event detection using multiaxial acceleration and vibration signals. The proposed architecture integrates short-term feature extraction via one-dimensional convolutional layers with bidirectional LSTM-based temporal modeling, enabling simultaneous capture of instantaneous signal morphology and long-range dependencies across driving trajectories. Multiaxial data were acquired at 50 Hz using an AQ-1 On-Board Diagnostics II (OBDII) Data Logger during urban and suburban routes in San Andrés Cholula, Puebla, Mexico. Our hybrid CNN–BiLSTM model achieved a global accuracy of 95.91% and a macro F1-score of 0.959. Per-class F1-scores ranged from 0.932 (none) to 0.981 (pothole), with specificity values above 0.98 for all event categories. Qualitative analysis demonstrates that this architecture outperforms previous CNN-only vibration-based models by approximately 2–3% in macro F1-score while maintaining balanced precision and recall across all event types. Visualization of BiLSTM activations highlights enhanced interpretability and contextual discrimination, particularly for events with similar short-term signatures. Further, the proposed framework’s low computational overhead and compatibility with MATLAB Graphics Processing Unit (GPU) Coder support its feasibility for real-time embedded deployment. These results demonstrate the effectiveness and robustness of our hybrid CNN–BiLSTM approach for road anomaly detection using only acceleration and vibration signals, establishing a validated continuation of previous CNN-based research. Beyond the experimental validation, the proposed framework provides a practical foundation for real-time pavement monitoring systems and can support intelligent transportation applications such as preventive road maintenance, driver assistance, and large-scale deployment on low-power embedded platforms. Full article
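
As a rough illustration of the architecture (the paper's implementation is MATLAB-native; the PyTorch transcription below, with its 6-channel input, layer widths, and 4 event classes, is an assumed configuration), the hybrid couples a 1-D convolutional front end to a bidirectional LSTM:

```python
# Sketch: the hybrid 1-D CNN -> BiLSTM shape described above, transcribed to
# PyTorch. All sizes are illustrative assumptions, not the paper's values.
import torch
import torch.nn as nn

class CnnBiLstm(nn.Module):
    def __init__(self, in_channels=6, n_classes=4):
        super().__init__()
        self.cnn = nn.Sequential(              # short-term signal morphology
            nn.Conv1d(in_channels, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.bilstm = nn.LSTM(32, 64, batch_first=True,
                              bidirectional=True)  # long-range dependencies
        self.head = nn.Linear(2 * 64, n_classes)

    def forward(self, x):                      # x: (batch, channels, time)
        z = self.cnn(x).transpose(1, 2)        # -> (batch, time, features)
        out, _ = self.bilstm(z)
        return self.head(out[:, -1])           # classify from last time step

logits = CnnBiLstm()(torch.randn(8, 6, 200))   # 8 windows of 200 samples @ 50 Hz
print(logits.shape)                            # torch.Size([8, 4])
```
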
23 pages, 13345 KB  
Article
Neural-Based Controller on Low-Density FPGAs for Dynamic Systems
by Edson E. Cruz-Miguel, José R. García-Martínez, Jorge Orrante-Sakanassi, José M. Álvarez-Alvarado, Omar A. Barra-Vázquez and Juvenal Rodríguez-Reséndiz
Electronics 2026, 15(1), 198; https://doi.org/10.3390/electronics15010198 - 1 Jan 2026
Viewed by 183
Abstract
This work introduces a logic resource-efficient Artificial Neural Network (ANN) controller for embedded control applications on low-density Field-Programmable Gate Array (FPGA) platforms. The proposed design relies on 32-bit fixed-point arithmetic and incorporates an online learning mechanism, enabling the controller to adapt to system variations while maintaining low hardware complexity. Unlike conventional artificial intelligence solutions that require high-performance processors or Graphics Processing Units (GPUs), the proposed approach targets platforms with limited logic, memory, and computational resources. The ANN controller was described using a Hardware Description Language (HDL) and validated via cosimulation between ModelSim and Simulink. A practical comparison was also made between Proportional-Integral-Derivative (PID) control and an ANN for motor position control. The results confirm that the architecture efficiently utilizes FPGA resources, consuming approximately 50% of the available Digital Signal Processor (DSP) units, less than 40% of logic cells, and only 6% of embedded memory blocks. Owing to its modular design, the architecture is inherently scalable, allowing additional inputs or hidden-layer neurons to be incorporated with minimal impact on overall resource usage. Additionally, the computational latency can be precisely determined and scales with (16n+39)m+31 clock cycles, enabling precise timing analysis and facilitating integration into real-time embedded control systems. Full article
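
A fixed-point multiply-accumulate like the one underlying such a controller can be sketched in a few lines. The Q16.16 split, toy weights, and hard-limit activation below are illustrative assumptions, not the paper's design:

```python
# Sketch: a 32-bit fixed-point (Q16.16) neuron, mirroring the arithmetic style
# the FPGA controller relies on. Format split and weights are assumptions.
FRAC_BITS = 16                      # Q16.16: 16 integer, 16 fractional bits

def to_fixed(x: float) -> int:
    return int(round(x * (1 << FRAC_BITS)))

def fixed_mul(a: int, b: int) -> int:
    return (a * b) >> FRAC_BITS     # rescale after the multiply

def neuron(inputs, weights, bias):
    """Multiply-accumulate in fixed point, then a saturating activation."""
    acc = bias
    for x, w in zip(inputs, weights):
        acc += fixed_mul(x, w)
    # hard-limit activation: cheap in hardware compared with tanh/sigmoid
    limit = to_fixed(1.0)
    return max(-limit, min(limit, acc))

x = [to_fixed(v) for v in (0.25, -0.5, 0.75)]
w = [to_fixed(v) for v in (0.1, 0.2, -0.3)]
y = neuron(x, w, to_fixed(0.05))
print(y / (1 << FRAC_BITS))   # ~ 0.25*0.1 - 0.5*0.2 - 0.75*0.3 + 0.05 = -0.25
```
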
22 pages, 3408 KB  
Article
A High-Performance Branch Control Mechanism for GPGPU Based on RISC-V Architecture
by Yao Cheng, Yi Man and Xinbing Zhou
Electronics 2026, 15(1), 125; https://doi.org/10.3390/electronics15010125 - 26 Dec 2025
Viewed by 273
Abstract
General-Purpose Graphics Processing Units (GPGPUs) rely on warp scheduling and control flow management to organize parallel thread execution, making efficient control flow mechanisms essential for modern GPGPU design. Currently, the mainstream RISC-V GPGPU Vortex adopts the Single Instruction Multiple Threads (SIMT) stack control mechanism. This approach introduces high complexity and performance overhead, becoming a major limitation for further improving control efficiency. To address this issue, this paper proposes a thread-mask-based branch control mechanism for the RISC-V architecture. The mechanism introduces explicit mask primitives at the Instruction Set Architecture (ISA) level and directly manages the active status of threads within a warp through logical operations, enabling branch execution without jumps and thus reducing the overhead of the original control flow mechanism. Unlike traditional thread mask mechanisms in GPUs, our design centers on RISC-V and realizes co-optimization at both the ISA and microarchitecture levels. The mechanism was modeled and validated on Vortex SimX. Experimental results show that, compared with the Vortex SIMT stack mechanism, the proposed approach maintains correct control semantics while reducing branch execution cycles by an average of 31% and up to 40%, providing a new approach for RISC-V GPGPU control flow optimization. Full article
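
The mask-based idea can be illustrated without any GPU at all: execute both branch paths for every lane and let an explicit mask select which results are written back, so divergent branches need no jumps or SIMT stack. The warp width and toy branch below are illustrative, not the Vortex ISA:

```python
# Sketch of the thread-mask idea: a warp evaluates both paths of a branch and
# masks the writebacks. Illustrative only; not the proposed RISC-V primitives.
import numpy as np

values = np.array([3, -1, 4, -5, 9, -2, 6, -8])   # one warp, 8 threads
mask_taken = values < 0                            # explicit mask primitive

# The "then" path runs for all lanes but commits only where the mask is set...
then_result = -values
# ...and the "else" path commits to the complementary lanes.
else_result = values * 2

out = np.where(mask_taken, then_result, else_result)
print(out)   # [ 6  1  8  5 18  2 12  8]
```
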
31 pages, 8756 KB  
Article
Mammogram Analysis with YOLO Models on an Affordable Embedded System
by Anongnat Intasam, Nicholas Piyawattanametha, Yuttachon Promworn, Titipon Jiranantanakorn, Soonthorn Thawornwanchai, Pakpawee Pichayakul, Sarawan Sriwanichwiphat, Somchai Thanasitthichai, Sirihattaya Khwayotha, Methininat Lertkowit, Nucharee Phakwapee, Aniwat Juhong and Wibool Piyawattanametha
Cancers 2026, 18(1), 70; https://doi.org/10.3390/cancers18010070 - 25 Dec 2025
Viewed by 409
Abstract
Background/Objectives: Breast cancer persists as a leading cause of female mortality globally. Mammograms are a key screening tool for early detection, although many resource-limited hospitals lack access to skilled radiologists and advanced diagnostic tools. Deep learning-based computer-aided detection (CAD) systems can assist radiologists by automating lesion detection and classification. This study investigates the performance of various You Only Look Once (YOLO) models and a Hybrid Convolutional-Transformer Architecture (YOLOv5, YOLOv8, YOLOv10, YOLOv11, and Real-Time-DEtection Transformer (RT-DETR)) for detecting mammographic lesions on an affordable embedded system. Methods: We developed a custom web-based annotation tool to enhance mammogram labeling accuracy, using a dataset of 3169 patients from Thailand and expert annotations from three radiologists. Lesions were classified into six categories: Masses Benign (MB), Calcifications Benign (CB), Associated Features Benign (AFB), Masses Malignant (MM), Calcifications Malignant (CM), and Associated Features Malignant (AFM). Results: Our results show that the YOLOv11n model is the optimal choice for the NVIDIA Jetson Nano, achieving an accuracy of 0.86 and an inference speed of 6.16 ± 0.31 frames per second. A comparative analysis with a graphics processing unit (GPU)-powered system revealed that the Jetson Nano achieves comparable detection performance at a fraction of the cost. Conclusions: The current research landscape has not yet integrated advanced YOLO versions for embedded deployment in mammography. This method could facilitate screening in clinics without high-end workstations, demonstrating the feasibility of deploying CAD systems in low-resource environments and underscoring its potential for real-world clinical applications. Full article
(This article belongs to the Section Methods and Technologies Development)
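
For orientation, inference with this model family typically looks like the following ultralytics sketch; the weight file mammo_yolov11n.pt and the image path are hypothetical placeholders for a model fine-tuned on the six lesion classes:

```python
# Sketch: single-image lesion detection with an ultralytics YOLO model, the
# family benchmarked above. Weight file and image path are hypothetical.
from ultralytics import YOLO

model = YOLO("mammo_yolov11n.pt")          # fine-tuned nano model (assumed)
results = model.predict("mammogram.png", conf=0.25, imgsz=640)

for box in results[0].boxes:
    cls_name = results[0].names[int(box.cls)]
    print(f"{cls_name}: conf={float(box.conf):.2f}, xyxy={box.xyxy.tolist()}")
```
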
8 pages, 446 KB  
Proceeding Paper
Enhanced Early Detection of Epileptic Seizures Through Advanced Line Spectral Estimation and XGBoost Machine Learning
by K. Rama Krishna and B. B. Shabarinath
Comput. Sci. Math. Forum 2025, 12(1), 4; https://doi.org/10.3390/cmsf2025012004 - 17 Dec 2025
Viewed by 336
Abstract
This paper proposes a fast epileptic seizure detection method to allow for early clinical intervention. The primary goal is to enhance computational and predictive performance to make the method viable for online implementation. An advanced Line Spectral Estimation (LSE)-based method for EEG analysis was developed with Bayesian inference and Toeplitz-structure-based fast inversion, using Capon and non-uniform Fourier transforms to reduce computational requirements. An XGBoost classifier with parallel boosting was employed to increase prediction performance. The method was tested on patients’ EEG data using multiple embedded Graphics Processing Unit (GPU) platforms and achieved 95.5% accuracy, with average and maximum lead times of 23.48 and 33.46 min before a seizure, respectively. The sensitivity and specificity values (92.23% and 93.38%) show the method to be reliable. The integration of LSE and XGBoost can be extended to create an efficient and practical online seizure detection and management tool. Full article
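
The classification stage is standard XGBoost machinery. Below is a minimal sketch with synthetic features standing in for the LSE spectral front end; the hyperparameters are chosen for illustration only:

```python
# Sketch: an XGBoost classifier over spectral features, as in the pipeline
# above. The feature matrix is synthetic; the LSE front end is not reproduced.
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 24))                 # stand-in for LSE features
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)   # pre-ictal vs. inter-ictal

clf = XGBClassifier(
    n_estimators=200, max_depth=4, learning_rate=0.1,
    tree_method="hist", n_jobs=-1,              # parallel, histogram boosting
)
clf.fit(X[:800], y[:800])
print("held-out accuracy:", clf.score(X[800:], y[800:]))
```
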
13 pages, 729 KB  
Article
A Single-Neuron-per-Class Readout for Image-Encoded Sensor Time Series
by David Bernal-Casas and Jaime Gallego
Mathematics 2025, 13(24), 3893; https://doi.org/10.3390/math13243893 - 5 Dec 2025
Viewed by 325
Abstract
We introduce an ultra-compact, single-neuron-per-class end-to-end readout for binary classification of noisy, image-encoded sensor time series. The approach compares a linear single-unit perceptron (E2E-MLP-1) with a resonate-and-fire (RAF) neuron (E2E-RAF-1), which merges feature selection and decision-making in a single block. Beyond empirical evaluation, we provide a mathematical analysis of the RAF readout: starting from its subthreshold ordinary differential equation, we derive the transfer function H(jω), characterize the frequency response, and relate the output signal-to-noise ratio (SNR) to |H(jω)|² and the noise power spectral density S_n(ω) ∝ ω^α (brown, pink, and blue noise). We present a stable discrete-time implementation compatible with surrogate gradient training and discuss the associated stability constraints. As a case study, we classify walk-in-place (WIP) in a virtual reality (VR) environment, a vision-based motion encoding (72 × 56 grayscale) derived from 3D trajectories, comprising 44,084 samples from 15 participants. On clean data, both single-neuron-per-class models approach ceiling accuracy. At the same time, under colored noise, the RAF readout yields consistent gains (typically +5–8% absolute accuracy at medium/high perturbations), indicative of intrinsic band-selective filtering induced by resonance. With ~8k parameters and sub-2 ms inference on commodity graphics processing units (GPUs), the RAF readout provides a mathematically grounded, robust, and efficient alternative for stochastic signal processing across domains, with virtual reality locomotion used here as an illustrative validation. Full article
(This article belongs to the Special Issue Computer Vision, Image Processing Technologies and Machine Learning)
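
The discrete-time RAF update the authors analyze can be prototyped directly from the subthreshold ODE dz/dt = (b + iω₀)z + I(t). In the sketch below, the damping b, resonance ω₀, time step, and sinusoidal drive are illustrative choices; the stability check mirrors the constraint |1 + Δt(b + iω₀)| < 1 on the forward-Euler one-step operator:

```python
# Sketch: a discrete-time resonate-and-fire (RAF) subthreshold update. The
# complex state z obeys dz/dt = (b + i*w0) z + I, so the neuron acts as a
# band-pass filter resonant near w0. All parameters are illustrative.
import numpy as np

b, w0, dt = -2.0, 2 * np.pi * 5.0, 1e-3     # damping, 5 Hz resonance, step
decay = 1.0 + dt * (b + 1j * w0)            # forward-Euler one-step operator
assert abs(decay) < 1.0                     # stability constraint on dt

t = np.arange(0, 2.0, dt)
drive = np.sin(2 * np.pi * 5.0 * t)         # input at the resonant frequency

z = np.zeros(len(t), dtype=complex)
for k in range(len(t) - 1):
    z[k + 1] = decay * z[k] + dt * drive[k]

# The imaginary part is the usual "voltage"; it grows at resonance and would
# be compared with a firing threshold in the full model.
print(np.abs(z.imag).max())
```
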
28 pages, 1010 KB  
Review
Recent Advances in B-Mode Ultrasound Simulators
by Cindy M. Solano-Cordero, Nerea Encina-Baranda, Mailyn Pérez-Liva and Joaquin L. Herraiz
Appl. Sci. 2025, 15(23), 12535; https://doi.org/10.3390/app152312535 - 26 Nov 2025
Viewed by 1218
Abstract
Ultrasound (US) imaging is one of the most accessible, non-invasive, and real-time diagnostic techniques in clinical medicine. However, conventional B-mode US suffers from intrinsic limitations such as speckle noise, operator dependence, and variability in image interpretation, which reduce diagnostic reproducibility and hinder skill acquisition. Because accurate image acquisition and interpretation rely heavily on the operator’s experience, mastering ultrasound requires extensive hands-on training under diverse anatomical and pathological conditions. Yet, traditional educational settings rarely provide consistent exposure to such variability, making simulation-based environments essential for developing and standardizing operator expertise. This scoping review synthesizes advances from 2014 to 2024 in B-mode ultrasound simulation, identifying 80 studies through structured searches in PubMed, Scopus, Web of Science, and IEEE. Simulation methods were organized into interpolative, wave-based, ray-based, and convolution-based models, as well as emerging Artificial Intelligence (AI)-driven approaches. The review emphasizes recent simulation engines and toolboxes reported in this period and highlights the growing role of learning-based pipelines (e.g., Generative Adversarial Networks (GANs) and diffusion) for realism, scalability, and data augmentation. The results show steady progress toward high realism and computational efficiency, including Graphics Processing Unit (GPU)-accelerated transport models, physics-informed convolution, and AI-enhanced translation and synthesis. Remaining challenges include the modeling of nonlinear and dynamic effects at scale, standardizing evaluation across tasks, and integrating physics with learning to balance fidelity and speed. These findings outline current capabilities and future directions for training, validation, and diagnostic support in ultrasound imaging. Full article
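
Of the model families reviewed, the convolution-based one is the simplest to sketch: a B-mode-style image is formed as the envelope of a point-spread function convolved with a random scatterer map. The PSF shape, pulse frequency, and image sizes below are illustrative:

```python
# Sketch of the convolution-based simulation family: simulated RF data as
# scatterers convolved with a separable PSF, then envelope detection and log
# compression. All parameters are illustrative, not from any cited engine.
import numpy as np
from scipy.signal import fftconvolve, hilbert

rng = np.random.default_rng(1)
scatterers = rng.normal(size=(256, 128))        # tissue scattering amplitudes

# separable PSF: axial RF pulse (Gaussian-windowed cosine) x lateral Gaussian
ax = np.arange(-16, 17)
axial = np.cos(2 * np.pi * 0.25 * ax) * np.exp(-(ax / 6.0) ** 2)
lateral = np.exp(-(np.arange(-8, 9) / 4.0) ** 2)
psf = np.outer(axial, lateral)

rf = fftconvolve(scatterers, psf, mode="same")  # simulated RF image
envelope = np.abs(hilbert(rf, axis=0))          # axial envelope detection
bmode = 20 * np.log10(envelope / envelope.max() + 1e-6)  # log compression, dB
print(bmode.shape, bmode.max())
```
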
31 pages, 11710 KB  
Article
An Efficient GPU-Accelerated High-Order Upwind Rotated Lattice Boltzmann Flux Solver for Simulating Three-Dimensional Compressible Flows with Strong Shock Waves
by Yunhao Wang, Qite Wang and Yan Wang
Entropy 2025, 27(12), 1193; https://doi.org/10.3390/e27121193 - 24 Nov 2025
Viewed by 399
Abstract
This paper presents an efficient and high-order WENO-based Upwind Rotated Lattice Boltzmann Flux Solver (WENO-URLBFS) on graphics processing units (GPUs) for simulating three-dimensional (3D) compressible flow problems. The proposed approach extends the baseline Rotated Lattice Boltzmann Flux Solver (RLBFS) by redefining the interface tangential velocity based on the theoretical solution of the Euler equations. This improvement, combined with a weighted decomposition of the numerical fluxes in two mutually perpendicular directions, effectively reduces numerical dissipation and enhances solution stability. To achieve high-order accuracy, the WENO interpolation is applied in the characteristic space to reconstruct physical quantities on both sides of the interface. The density perturbation test is employed to assess the accuracy of the scheme, which demonstrates 5th- and 7th-order convergence as expected. In addition, this test case is also employed to confirm the consistency between the CPU serial and GPU parallel implementations of the WENO-URLBFS scheme and to assess the acceleration performance across different grid resolutions, yielding a maximum speedup factor of 1208.27. The low-dissipation property of the scheme is further assessed through the inviscid Taylor–Green vortex problem. Finally, a series of challenging three-dimensional benchmark cases demonstrate that the present scheme achieves high accuracy, low dissipation, and excellent computational efficiency in simulating strongly compressible flows with complex features such as strong shock waves and discontinuities. Full article
(This article belongs to the Section Statistical Physics)
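
As background for the reconstruction step, the classic fifth-order Jiang-Shu WENO formula combines three third-order stencil values with smoothness-weighted coefficients. The scalar 1-D sketch below shows only that building block, not the paper's characteristic-space, GPU-parallel implementation:

```python
# Sketch: classic 5th-order WENO (Jiang-Shu) reconstruction of the interface
# value from five cell averages; scalar and 1-D only, for illustration.
import numpy as np

def weno5_left(f):
    """Left-biased reconstruction at x_{i+1/2} from f[i-2..i+2]."""
    fm2, fm1, f0, fp1, fp2 = f
    # candidate 3rd-order stencil values
    p0 = (2 * fm2 - 7 * fm1 + 11 * f0) / 6
    p1 = (-fm1 + 5 * f0 + 2 * fp1) / 6
    p2 = (2 * f0 + 5 * fp1 - fp2) / 6
    # smoothness indicators
    b0 = 13 / 12 * (fm2 - 2 * fm1 + f0) ** 2 + 0.25 * (fm2 - 4 * fm1 + 3 * f0) ** 2
    b1 = 13 / 12 * (fm1 - 2 * f0 + fp1) ** 2 + 0.25 * (fm1 - fp1) ** 2
    b2 = 13 / 12 * (f0 - 2 * fp1 + fp2) ** 2 + 0.25 * (3 * f0 - 4 * fp1 + fp2) ** 2
    # nonlinear weights from the linear weights (1/10, 6/10, 3/10)
    alpha = np.array([0.1, 0.6, 0.3]) / (1e-6 + np.array([b0, b1, b2])) ** 2
    w = alpha / alpha.sum()
    return w @ np.array([p0, p1, p2])

print(weno5_left(np.array([1.0, 1.0, 1.0, 0.0, 0.0])))  # sharp jump: ~1, no overshoot
print(weno5_left(np.array([1.0, 2.0, 3.0, 4.0, 5.0])))  # smooth ramp: 3.5
```
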
15 pages, 4097 KB  
Article
Optimization Algorithm for the Unstructured UGKWP Particle Tracking Process Based on a GPU
by Zhengyu Tian, Yuhang Chu, Hang Yu, Qianyue Fu and Weijie Ren
Aerospace 2025, 12(11), 1005; https://doi.org/10.3390/aerospace12111005 - 11 Nov 2025
Viewed by 494
Abstract
The Unified Gas–Kinetic Wave–Particle (UGKWP) method is a multiscale method that offers high computational efficiency when solving complex high-Mach-number flows around spacecraft. When the UGKWP method, based on a Graphics Processing Unit (GPU) platform, is used to simulate flow, threads within the same warp are responsible for tracking different particles, leading to a significant warp divergence problem that affects overall computational efficiency. Therefore, this study introduces a dynamic marking tracking algorithm based on block sharing to enhance the efficiency of particle tracking. This algorithm rebuilds the original tracking process by marking and tracking particles, aligning thread computations within the same warp as much as possible to reduce warp divergence. As a result, the average number of active threads increased by over 46% across different testing platforms. The optimized UGKWP platform was used to simulate a re-entry capsule case, and the results showed that the optimized UGKWP can accurately and efficiently simulate the flow details around the capsule. This research provides an efficient and accurate tool for simulating complex multiscale flows at high Mach numbers. Full article
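
The reordering idea can be demonstrated on the host: sorting (marking) particles by state before assigning them to threads collapses most mixed-state warps. The four toy states below stand in for UGKWP tracking phases, and warp size 32 follows the CUDA convention:

```python
# Sketch of the reordering idea behind the marking/tracking optimization:
# grouping particles so neighbouring threads in a warp handle particles in
# the same state shrinks divergence. States are toy stand-ins.
import numpy as np

rng = np.random.default_rng(2)
n = 1024
state = rng.integers(0, 4, size=n)     # e.g. free-stream / collide / exit ...

def divergent_warps(order, warp=32):
    """Warps whose 32 lanes would follow more than one branch path."""
    s = state[order].reshape(-1, warp)
    return int((s.min(axis=1) != s.max(axis=1)).sum())

unsorted = np.arange(n)
marked = np.argsort(state, kind="stable")   # the "marking" pass: sort by state
print("divergent warps, original order:", divergent_warps(unsorted))
print("divergent warps, marked order:  ", divergent_warps(marked))
```
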
20 pages, 15574 KB  
Article
Temporal Encoding Strategies for YOLO-Based Detection of Honeybee Trophallaxis Behavior in Precision Livestock Systems
by Gabriela Vdoviak and Tomyslav Sledevič
Agriculture 2025, 15(22), 2338; https://doi.org/10.3390/agriculture15222338 - 11 Nov 2025
Viewed by 759
Abstract
Trophallaxis, a fundamental social behavior observed among honeybees, involves the redistribution of food and chemical signals. The automation of its detection under field-realistic conditions poses a significant challenge due to the presence of crowding, occlusions, and brief, fine-scale motions. In this study, we propose a markerless, deep learning-based approach that injects short- and mid-range temporal features into single-frame You Only Look Once (YOLO) detectors via temporal-to-RGB encodings. A new dataset for trophallaxis detection, captured under diverse illumination and density conditions, has been released. On an NVIDIA RTX 4080 graphics processing unit (GPU), temporal-to-RGB inputs consistently outperformed RGB-only baselines across YOLO families. The YOLOv8m model improved from 84.7% mean average precision (mAP50) with RGB inputs to 91.9% with stacked-grayscale encoding and to 95.5% with temporally encoded motion and averaging over a 1 s window (TEMA-1s). Similar improvements were observed for larger models, with best mAP50 values approaching 94–95%. On an NVIDIA Jetson AGX Orin embedded platform, TensorRT-optimized YOLO models sustained real-time throughput, reaching 30 frames per second (fps) for small and 23–25 fps for medium models with temporal-to-RGB inputs. The results showed that the TEMA-1s encoded YOLOv8m model has achieved the highest mAP50 of 95.5% with real-time inference on both workstation and edge hardware. These findings indicate that temporal-to-RGB encodings provide an accurate and computationally efficient solution for markerless trophallaxis detection in field-realistic conditions. This approach can be further extended to multi-behavior recognition or integration of additional sensing modalities in precision beekeeping. Full article
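
Two of the encodings compared above are easy to sketch. The first packs three consecutive grayscale frames into the R, G, and B channels; the second is a hedged guess at a TEMA-style channel layout (current frame, short-term mean, frame difference), since the exact TEMA-1s definition is not reproduced here:

```python
# Sketch: temporal-to-RGB encodings that let a single-frame YOLO detector see
# short-range motion. The TEMA-style variant is an assumed channel layout.
import numpy as np

def stacked_grayscale(frames: np.ndarray) -> np.ndarray:
    """frames: (3, H, W) uint8 grayscale -> (H, W, 3), channel = time step."""
    assert frames.shape[0] == 3
    return np.transpose(frames, (1, 2, 0))

def temporal_average(frames: np.ndarray) -> np.ndarray:
    """Assumed TEMA-style variant: current frame, window mean, frame diff."""
    current = frames[-1].astype(np.float32)
    mean = frames.mean(axis=0)
    diff = np.abs(current - frames[0])
    out = np.stack([current, mean, diff], axis=-1)
    return np.clip(out, 0, 255).astype(np.uint8)

clip = np.random.randint(0, 256, size=(3, 640, 640), dtype=np.uint8)
print(stacked_grayscale(clip).shape, temporal_average(clip).shape)
```
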
22 pages, 5833 KB  
Article
A Codesign Framework for the Development of Next Generation Wearable Computing Systems
by Francesco Porreca, Fabio Frustaci and Raffaele Gravina
Sensors 2025, 25(21), 6624; https://doi.org/10.3390/s25216624 - 28 Oct 2025
Cited by 1 | Viewed by 966
Abstract
Wearable devices can be developed using hardware platforms such as Application-Specific Integrated Circuits (ASICs), Graphics Processing Units (GPUs), Digital Signal Processors (DSPs), Microcontroller Units (MCUs), or Field-Programmable Gate Arrays (FPGAs), each with distinct advantages and limitations. ASICs offer high efficiency but lack flexibility. GPUs excel in parallel processing but consume significant power. DSPs are optimized for signal processing but are limited in versatility. MCUs provide low power consumption but lack computational power. FPGAs are highly flexible, enabling powerful parallel processing at lower energy costs than GPUs but with higher resource demands than ASICs. The combined use of FPGAs and MCUs balances power efficiency and computational capability, making it ideal for wearable systems requiring complex algorithms in far-edge computing, where data processing occurs onboard the device. This approach promotes green electronics, extending battery life and reducing user inconvenience. The primary goal of this work was to develop a versatile framework, similar to existing software development frameworks, but specifically tailored for mixed FPGA/MCU platforms. The framework was validated through a real-world use case, demonstrating significant improvements in execution speed and power consumption. These results confirm its effectiveness in developing green and smart wearable systems. Full article
(This article belongs to the Section Wearables)