Search Results (446)

Search Parameters:
Keywords = graphical processing unit (GPU)

26 pages, 4049 KiB  
Article
A Versatile UAS Development Platform Able to Support a Novel Tracking Algorithm in Real-Time
by Dan-Marius Dobrea and Matei-Ștefan Dobrea
Aerospace 2025, 12(8), 649; https://doi.org/10.3390/aerospace12080649 - 22 Jul 2025
Viewed by 308
Abstract
A primary objective of this research entails the development of an innovative algorithm capable of tracking a drone in real-time. This objective serves as a fundamental requirement across various applications, including collision avoidance, formation flying, and the interception of moving targets. Nonetheless, no detection algorithm, however effective, can achieve 100% performance. Deep neural networks (DNNs) were employed to enhance this performance. To facilitate real-time operation, the DNN must be executed within a Deep Learning Processing Unit (DPU), Neural Processing Unit (NPU), Tensor Processing Unit (TPU), or Graphics Processing Unit (GPU) system on board the UAV. Given the constraints of these processing units, it may be necessary to quantize the DNN or utilize a less complex variant, resulting in an additional reduction in performance. However, precise target detection at each control step is imperative for effective flight path control. By integrating multiple algorithms, the developed system can effectively track UAVs with improved detection performance. Furthermore, this paper aims to establish a versatile Unmanned Aerial System (UAS) development platform constructed using open-source components and possessing the capability to adapt and evolve seamlessly throughout the development and post-production phases. Full article
(This article belongs to the Section Aeronautics)
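A minimal sketch of the quantization step the abstract alludes to, using PyTorch post-training dynamic quantization on a small backbone; the model, input shape, and printed deviation are illustrative assumptions, not the authors' setup.

    import torch
    import torchvision

    # Hedged sketch: dynamic quantization of a small backbone, illustrating
    # the extra accuracy loss an on-board, quantized DNN can incur.
    model = torchvision.models.mobilenet_v2(weights="DEFAULT").eval()
    quantized = torch.ao.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )

    x = torch.randn(1, 3, 224, 224)
    with torch.no_grad():
        ref, out = model(x), quantized(x)
    # This deviation is the "additional reduction in performance" that
    # per-step flight-path control must tolerate.
    print(f"max output deviation: {(ref - out).abs().max():.4f}")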

16 pages, 2270 KiB  
Article
Performance Evaluation of FPGA, GPU, and CPU in FIR Filter Implementation for Semiconductor-Based Systems
by Muhammet Arucu and Teodor Iliev
J. Low Power Electron. Appl. 2025, 15(3), 40; https://doi.org/10.3390/jlpea15030040 - 21 Jul 2025
Viewed by 429
Abstract
This study presents a comprehensive performance evaluation of field-programmable gate array (FPGA), graphics processing unit (GPU), and central processing unit (CPU) platforms for implementing finite impulse response (FIR) filters in semiconductor-based digital signal processing (DSP) systems. Utilizing a standardized FIR filter designed with the Kaiser window method, we compare computational efficiency, latency, and energy consumption across the ZYNQ XC7Z020 FPGA, Tesla K80 GPU, and Arm-based CPU, achieving processing times of 0.004 s, 0.008 s, and 0.107 s, respectively, with FPGA power consumption of 1.431 W and comparable energy profiles for GPU and CPU. The FPGA is 27 times faster than the CPU and 2 times faster than the GPU, demonstrating its suitability for low-latency DSP tasks. A detailed analysis of resource utilization and scalability underscores the FPGA’s reconfigurability for optimized DSP implementations. This work provides novel insights into platform-specific optimizations, addressing the demand for energy-efficient solutions in edge computing and IoT applications, with implications for advancing sustainable DSP architectures. Full article
(This article belongs to the Topic Advanced Integrated Circuit Design and Application)
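A minimal CPU-side sketch of a Kaiser-window FIR design and timing with SciPy, for readers who want to reproduce the filter itself; the tap count, cutoff, sample rate, and signal length are assumptions, not the paper's exact configuration.

    import time
    import numpy as np
    from scipy.signal import firwin, lfilter

    fs = 48_000                                       # assumed sample rate
    taps = firwin(101, cutoff=6_000, window=("kaiser", 8.6), fs=fs)

    x = np.random.randn(1_000_000)                    # assumed test signal
    t0 = time.perf_counter()
    y = lfilter(taps, 1.0, x)
    print(f"CPU FIR filtering time: {time.perf_counter() - t0:.4f} s")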

8 pages, 702 KiB  
Proceeding Paper
Overview of Training LLMs on One Single GPU
by Mohamed Ben jouad and Lotfi Elaachak
Comput. Sci. Math. Forum 2025, 10(1), 14; https://doi.org/10.3390/cmsf2025010014 - 9 Jul 2025
Viewed by 367
Abstract
Large language models (LLMs) are developing at a rapid pace, which has made it necessary to better understand how they train, especially when faced with resource limitations. This paper examines in detail how various state-of-the-art LLMs train on a single Graphical Processing Unit (GPU), paying close attention to crucial elements like throughput, memory utilization and training time. We find important trade-offs between model size, batch size and computational efficiency through empirical evaluation, offering practical advice for streamlining fine-tuning processes in the face of hardware constraints. Full article
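A minimal sketch of the kind of measurement the paper performs, sweeping batch size on one GPU while recording step time and peak memory; the toy transformer and sizes are assumptions, not the evaluated LLMs.

    import time
    import torch
    import torch.nn as nn

    device = "cuda"
    model = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
        num_layers=6,
    ).to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for batch in (1, 4, 16):                          # assumed batch sizes
        x = torch.randn(batch, 256, 512, device=device)
        torch.cuda.reset_peak_memory_stats()
        t0 = time.perf_counter()
        loss = model(x).pow(2).mean()                 # dummy objective
        loss.backward(); opt.step(); opt.zero_grad()
        torch.cuda.synchronize()                      # first step includes warmup
        mem = torch.cuda.max_memory_allocated() / 2**20
        print(f"batch {batch:2d}: {time.perf_counter() - t0:.3f} s/step, {mem:.0f} MiB peak")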

17 pages, 2698 KiB  
Article
An Integrated Hydrological–Hydrodynamic Model Based on GPU Acceleration for Catchment-Scale Rainfall Flood Simulation
by Ruixiao Ma, Hao Han and Zhaoan Zhang
Atmosphere 2025, 16(7), 809; https://doi.org/10.3390/atmos16070809 - 1 Jul 2025
Viewed by 323
Abstract
Extreme rainstorms are difficult to predict and often result in catchment-scale rainfall flooding, leading to substantial economic losses globally. Enhancing the numerical computational efficiency of flood models is essential for improving flood forecasting capabilities. This study presents an integrated hydrological–hydrodynamic model accelerated using GPU (Graphics Processing Unit) technology to perform high-efficiency and high-precision rainfall flood simulations at the catchment scale. The model couples hydrological and hydrodynamic processes by solving the fully two-dimensional shallow water equations (2D SWEs) and achieves accelerated rainstorm flood simulations through GPU parallel computing, significantly enhancing computational efficiency while maintaining numerical stability. Validations are conducted using an idealized V-shaped catchment and an experimental benchmark, followed by application to a small catchment on the Chinese Loess Plateau. The computational experiments reveal a strong positive correlation between grid cell numbers and GPU acceleration efficiency. The results also demonstrate that the proposed model offers better computational accuracy and acceleration performance than the single-GPU model. This GPU-accelerated hydrological–hydrodynamic modeling framework enables rapid, high-fidelity rainfall flood simulations and provides critical support for timely and effective flood emergency decision making. Full article
(This article belongs to the Special Issue Advances in Rainfall-Induced Hazard Research)
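For reference, the fully two-dimensional shallow water equations the abstract refers to are usually written in conservative form as below; the rainfall, bed-slope, and friction source terms shown are the standard ones and may differ in detail from the paper's formulation.

    \frac{\partial \mathbf{U}}{\partial t}
      + \frac{\partial \mathbf{F}}{\partial x}
      + \frac{\partial \mathbf{G}}{\partial y} = \mathbf{S}, \qquad
    \mathbf{U} = \begin{pmatrix} h \\ hu \\ hv \end{pmatrix}, \quad
    \mathbf{F} = \begin{pmatrix} hu \\ hu^{2} + \tfrac{1}{2} g h^{2} \\ huv \end{pmatrix}, \quad
    \mathbf{G} = \begin{pmatrix} hv \\ huv \\ hv^{2} + \tfrac{1}{2} g h^{2} \end{pmatrix}, \quad
    \mathbf{S} = \begin{pmatrix} r \\ gh\,(S_{0x} - S_{fx}) \\ gh\,(S_{0y} - S_{fy}) \end{pmatrix}

where h is the water depth, (u, v) are depth-averaged velocities, g is gravitational acceleration, r is the rainfall rate (the hydrological coupling term), and S_0 and S_f are the bed-slope and friction-slope terms.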

23 pages, 2579 KiB  
Article
Multimodal Particulate Matter Prediction: Enabling Scalable and High-Precision Air Quality Monitoring Using Mobile Devices and Deep Learning Models
by Hirokazu Madokoro and Stephanie Nix
Sensors 2025, 25(13), 4053; https://doi.org/10.3390/s25134053 - 29 Jun 2025
Viewed by 395
Abstract
This paper presents a novel approach for predicting Particulate Matter (PM) concentrations using mobile camera devices. In response to persistent air pollution challenges across Japan, we developed a system that utilizes cutting-edge transformer-based deep learning architectures to estimate PM values from imagery captured by smartphone cameras. Our approach employs Contrastive Language–Image Pre-Training (CLIP) as a multimodal framework to extract visual features associated with PM concentration from environmental scenes. We first developed a baseline through comparative analysis of time-series models for 1D PM signal prediction, finding that linear models, particularly NLinear, outperformed complex transformer architectures for short-term forecasting tasks. Building on these insights, we implemented a CLIP-based system for 2D image analysis that achieved a Top-1 accuracy of 0.24 and a Top-5 accuracy of 0.52 when tested on diverse smartphone-captured images. The performance evaluations on Graphics Processing Unit (GPU) and Single-Board Computer (SBC) platforms highlight a viable path toward edge deployment. Processing times of 0.29 s per image on the GPU versus 2.68 s on the SBC demonstrate the potential for scalable, real-time environmental monitoring. We consider that this research connects high-performance computing with energy-efficient hardware solutions, creating a practical framework for distributed environmental monitoring that reduces reliance on costly centralized monitoring systems. Our findings indicate that transformer-based multimodal models present a promising approach for mobile sensing applications, with opportunities for further improvement through seasonal data expansion and architectural refinements. Full article
(This article belongs to the Special Issue Machine Learning and Image-Based Smart Sensing and Applications)
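A minimal sketch of zero-shot CLIP scoring in the spirit of the paper's multimodal setup; the checkpoint, prompt wording, and PM bins below are assumptions, not the authors' trained classes.

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    levels = ("low", "moderate", "high")          # assumed PM bins
    prompts = [f"a photo of a city sky with {lv} air pollution" for lv in levels]
    inputs = processor(text=prompts, images=Image.open("scene.jpg"),
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1)
    print(dict(zip(levels, probs[0].tolist())))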

21 pages, 3967 KiB  
Article
An Efficient Parallelization of Microscopic Traffic Simulation
by Benyamin Heidary, Joerg Schweizer, Ngoc An Nguyen, Federico Rupi and Cristian Poliziani
Appl. Sci. 2025, 15(13), 6960; https://doi.org/10.3390/app15136960 - 20 Jun 2025
Viewed by 448
Abstract
Large-scale traffic simulations at a microscopic level can mimic the physical reality in great detail so that innovative transport services can be evaluated. However, the simulation times of such scenarios are currently too long to be practical. (1) Background: With the availability of Graphical Processing Units (GPUs), is it possible to exploit parallel computing to reduce the simulation times of large microscopic simulations, such that they can run on normal PCs at reasonable runtimes? (2) Methods: ParSim, a microsimulator with a monolithic microsimulation kernel, has been developed for CUDA-compatible GPUs, with the aim to efficiently parallelize the simulation processes; particular care has been taken regarding memory usage and thread synchronization, and visualization software has been optionally added. (3) Results: The parallelized simulations were performed on a GPU of average performance: a 24 h microsimulation scenario for Bologna with 1 million trips was completed in 40 s. The average speeds and waiting times are similar to the results from an established microsimulator (SUMO), but the execution time is up to 5000 times faster with respect to SUMO; the 28 million trips of the 24 h San Francisco Bay Area scenario were completed in 26 min. With cutting-edge GPUs, the simulation time can possibly be further reduced by a factor of seven. (4) Conclusions: The parallelized simulator presented in this paper can perform large-scale microsimulations in a reasonable time on readily available and inexpensive computer hardware. This means microsimulations could now be used in new application fields such as activity-based demand generation, reinforced AI learning, traffic forecasting, or crisis response management. Full article
(This article belongs to the Special Issue Recent Advances in Parallel Computing and Big Data)
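The core parallelization idea, one GPU thread advancing one vehicle per time step, can be sketched with Numba's CUDA backend as below; the toy speed-adaptation rule stands in for ParSim's actual car-following kernel, and the fleet size and constants are invented.

    import numpy as np
    from numba import cuda

    @cuda.jit
    def step(pos, vel, gap_min, dt):
        i = cuda.grid(1)                 # one thread per vehicle
        if i < pos.size - 1:             # the last vehicle leads freely
            gap = pos[i + 1] - pos[i]
            # toy rule: accelerate, but never close below the minimum gap
            vel[i] = min(vel[i] + 1.0 * dt, max(0.0, (gap - gap_min) / dt))
            pos[i] += vel[i] * dt

    n = 100_000                          # assumed fleet size
    pos = cuda.to_device(np.sort(np.random.rand(n) * 1e5))
    vel = cuda.to_device(np.full(n, 10.0))
    threads = 256
    step[(n + threads - 1) // threads, threads](pos, vel, 2.0, 0.5)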

23 pages, 1119 KiB  
Article
Improving Text Classification of Imbalanced Call Center Conversations Through Data Cleansing, Augmentation, and NER Metadata
by Sihyoung Jurn and Wooje Kim
Electronics 2025, 14(11), 2259; https://doi.org/10.3390/electronics14112259 - 31 May 2025
Viewed by 643
Abstract
Call center conversation categories are valuable for reporting business results and for marketing analysis. However, they typically lack clear patterns and suffer from severe imbalance in the number of instances across categories. The call center conversation categories used in this study are Payment, Exchange, Return, Delivery, Service, and After-sales service (AS), with a significant imbalance where Service accounts for 26% of the total data and AS only 2%. To address these challenges, this study proposes a model that ensembles meta-information generated through Named Entity Recognition (NER) with machine learning inference results. Utilizing KoBERT (Korean Bidirectional Encoder Representations from Transformers) as our base model, we employed Easy Data Augmentation (EDA) to augment data in categories with insufficient instances. Through the training of nine models, encompassing KoBERT category probability weights and a CatBoost (Categorical Boosting) model that ensembles meta-information derived from named entities, we ultimately improved the F1 score from the baseline of 0.9117 to 0.9331, demonstrating a solution that circumvents the need for expensive LLMs (Large Language Models) or high-performance GPUs (Graphics Processing Units). This improvement is particularly significant considering that, when focusing solely on the category with a 2% data proportion, our model achieved an F1 score of 0.9509, representing a 4.6% increase over the baseline. Full article
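Two of the four EDA operations used to augment the minority categories can be sketched language-agnostically as follows; the Korean pipeline in the paper would additionally need a Korean synonym source for the replacement and insertion operations, which are omitted here.

    import random

    def random_swap(tokens, n=1):
        tokens = tokens[:]
        for _ in range(n):
            i, j = random.sample(range(len(tokens)), 2)
            tokens[i], tokens[j] = tokens[j], tokens[i]
        return tokens

    def random_deletion(tokens, p=0.1):
        kept = [t for t in tokens if random.random() > p]
        return kept or [random.choice(tokens)]   # never return an empty text

    sample = "refund requested because the delivered item was damaged".split()
    print(random_swap(sample), random_deletion(sample))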

18 pages, 2645 KiB  
Article
A Deep Learning Methodology for Screening New Natural Therapeutic Candidates for Pharmacological Cardioversion and Anticoagulation in the Treatment and Management of Atrial Fibrillation
by Tim Dong, Rhys D. Llewellyn, Melanie Hezzell and Gianni D. Angelini
Biomedicines 2025, 13(6), 1323; https://doi.org/10.3390/biomedicines13061323 - 28 May 2025
Viewed by 517
Abstract
Background: The treatment and management of atrial fibrillation poses substantial complexity. Anticoagulant optimisation requires a delicate balance between minimising the risk of stroke and avoiding an increased risk of bleeding. Natural compounds are often associated with low-toxicity effects, and their effects on atrial fibrillation have yet to be fully understood. Whilst deep learning (a subtype of machine learning that uses multiple layers of artificial neural networks) methods may be useful for drug compound interaction and discovery analysis, graphical processing units (GPUs) are expensive and often required for deep learning. Furthermore, in limited-resource settings, such as low- and middle-income countries, such technology may not be easily available. Objectives: This study aims to discover the presence of any new therapeutic candidates from a large set of natural compounds that may support the future treatment and management of atrial fibrillation anywhere using a low-cost technique. The objective is to develop a deep learning approach under a low-resource setting where suitable high-performance NVIDIA graphics processing units (GPUs) are not available and to apply it to atrial fibrillation as a case study. Methods: The primary training dataset is the MINER-DTI dataset from the BIOSNAP collection. It includes 13,741 DTI pairs from DrugBank, 4510 drug compounds, and 2181 protein targets. Deep cross-modal attention modelling was developed and applied. The Database of Useful Decoys (DUD-E) was used to fine-tune the model using contrastive learning. The model was then applied and evaluated on the natural compound NPASS 2018 dataset as well as a dataset curated by a clinical pharmacist and a clinical scientist. Results: The new model showed good performance when compared to existing state-of-the-art approaches under low-resource settings in both the validation set (PR AUC: 0.8118 vs. 0.7154) and test set (PR AUC: 0.8134 vs. 0.7206). Tenascin-C (TNC; NPC306696) and deferoxamine (NPC262615) were identified as strong natural compound interactors of the arrhythmogenic targets ADRB1 and HCN1, respectively. A strong natural compound interactor of the bleeding-related target Factor X was also identified in sequoiaflavone (NPC194593). Conclusions: This study presented a new high-performing model under low-resource settings that identified new natural therapeutic candidates for pharmacological cardioversion and anticoagulation. Full article
(This article belongs to the Special Issue Role of Natural Product in Cardiovascular Disease—2nd Edition)
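The cross-modal attention mechanism named in the abstract can be sketched as drug tokens attending over protein tokens; the dimensions, head count, and pooling below are illustrative assumptions, not the paper's architecture.

    import torch
    import torch.nn as nn

    class CrossModalDTI(nn.Module):
        def __init__(self, dim=128):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
            self.score = nn.Linear(dim, 1)

        def forward(self, drug_tokens, protein_tokens):
            # drug tokens query the protein sequence (cross-modal attention)
            ctx, _ = self.attn(drug_tokens, protein_tokens, protein_tokens)
            return torch.sigmoid(self.score(ctx.mean(dim=1)))

    model = CrossModalDTI()
    prob = model(torch.randn(2, 40, 128), torch.randn(2, 300, 128))
    print(prob.shape)   # (2, 1): predicted interaction probability per pair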

22 pages, 1502 KiB  
Article
PCMINN: A GPU-Accelerated Conditional Mutual Information-Based Feature Selection Method
by Nikolaos Papaioannou, Georgios Myllis, Alkiviadis Tsimpiris, Stamatis Aggelopoulos and Vasiliki Vrana
Information 2025, 16(6), 445; https://doi.org/10.3390/info16060445 - 27 May 2025
Cited by 1 | Viewed by 585
Abstract
In feature selection, it is crucial to identify features that are not only relevant to the target variable but also non-redundant. Conditional Mutual Information Nearest-Neighbor (CMINN) is an algorithm developed to address this challenge by using Conditional Mutual Information (CMI) to assess the relevance of individual features to the target variable, while identifying redundancy among similar features. Although effective, the original CMINN algorithm can be computationally intensive, particularly with large and high-dimensional datasets. In this study, we extend the CMINN algorithm by parallelizing it for execution on Graphics Processing Units (GPUs), significantly enhancing its efficiency and scalability for high-dimensional datasets. The parallelized CMINN (PCMINN) leverages the massive parallelism of modern GPUs to handle the computational complexity inherent in sequential feature selection, particularly when dealing with large-scale data. To evaluate the performance of PCMINN across various scenarios, we conduct both an extensive simulation study using datasets with combined feature effects and a case study using financial data. Our results show that PCMINN not only maintains the effectiveness of the original CMINN in selecting the optimal feature subset, but also achieves faster execution times. The parallelized approach allows for the efficient processing of large datasets, making PCMINN a valuable tool for high-dimensional feature selection tasks. We also provide a package that includes two Python implementations to support integration into future research workflows: a sequential version of CMINN and a parallel GPU-based version of PCMINN. Full article
(This article belongs to the Special Issue Feature Papers in Information in 2024–2025)
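A minimal sequential relevance-minus-redundancy loop in the spirit of CMI-based selection, with scikit-learn's kNN mutual-information estimators standing in for the paper's CMI estimator; PCMINN would evaluate the candidate scores in parallel on the GPU rather than in this Python loop.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

    X, y = make_classification(n_samples=500, n_features=30, random_state=0)
    relevance = mutual_info_classif(X, y, random_state=0)

    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(5):                       # pick five features (assumed)
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = np.mean([
                mutual_info_regression(X[:, [s]], X[:, j], random_state=0)[0]
                for s in selected
            ])
            return relevance[j] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    print("selected features:", selected)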

26 pages, 5823 KiB  
Article
OGAIS: OpenGL-Driven GPU Acceleration Methodology for 3D Hyperspectral Image Simulation
by Xiangyu Li, Wenjuan Zhang, Bowen Wang, Huaili Qiu, Mengnan Jin and Peng Qi
Remote Sens. 2025, 17(11), 1841; https://doi.org/10.3390/rs17111841 - 25 May 2025
Viewed by 512
Abstract
Hyperspectral remote sensing, which can acquire data in both spectral and spatial dimensions, has been widely applied in various fields. However, the available data are limited by factors such as revisit time, imaging width, and weather conditions. Three-dimensional (3D) hyperspectral simulation based on ray tracing can overcome these limitations by enabling physics-based modeling of arbitrary imaging geometries, solar conditions, and atmospheric effects. This type of simulation offers advantages in acquiring multi-angle and multi-condition quantitative results. However, the 3D hyperspectral simulation requires substantial computational resources. With the development of hardware, a graphics processing unit (GPU) offers a potential way to accelerate it. This paper proposes a 3D hyperspectral simulation model based on GPU-accelerated ray tracing, which is realized by modifying and using a common graphics API (OpenGL). Through experiments, we demonstrate that this model enables 600-band hyperspectral simulation with a computational time of just 2.4 times that of RGB simulation. Furthermore, we analyzed the balance between calculation efficiency and accuracy, and carried out a correlation analysis between ray count and accuracy. Additionally, we verified the accuracy of this model by using UAV-based data. The results demonstrate over 90% spectral curve similarity between simulated and UAV-acquired images. Finally, based on this model, we conducted additional simulation experiments under different environmental variables and observation conditions to analyze the model’s ability to characterize different situations. The results show that the model effectively captures the effects of environmental variables and observation conditions on the hyperspectral characteristics of vehicles. Full article
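The abstract's "over 90% spectral curve similarity" invites a per-pixel spectral comparison; one plausible metric (an assumption, since the paper's exact definition is not given here) is the cosine similarity between band vectors.

    import numpy as np

    def spectral_similarity(sim: np.ndarray, ref: np.ndarray) -> float:
        """Cosine similarity between two spectra with the same band count."""
        return float(np.dot(sim, ref) /
                     (np.linalg.norm(sim) * np.linalg.norm(ref)))

    bands = 600                                     # band count from the abstract
    ref = np.exp(-np.linspace(0.0, 3.0, bands))     # toy reference spectrum
    sim = ref + np.random.normal(0.0, 0.01, bands)  # toy simulated spectrum
    print(f"similarity: {spectral_similarity(sim, ref):.3f}")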

19 pages, 1619 KiB  
Article
A Structured Method to Generate Self-Test Libraries for Tensor Cores
by Robert Limas Sierra, Juan David Guerrero Balaguera, Josie E. Rodriguez Condia and Matteo Sonza Reorda
Electronics 2025, 14(11), 2148; https://doi.org/10.3390/electronics14112148 - 25 May 2025
Viewed by 520
Abstract
Modern computing systems increasingly rely on specialized hardware accelerators, such as Graphics Processing Units (GPUs), to meet growing computational demands. GPUs are essential for accelerating a wide range of applications, from machine learning and scientific computing to safety-critical domains like autonomous systems and aerospace. To enhance performance, modern GPUs integrate dedicated in-chip units, such as Tensor Cores (TCs), which are designed for efficient mixed-precision matrix operations. However, as semiconductor technologies scale down, reliability challenges emerge. Permanent hardware faults caused by aging, process variations, or environmental stress can lead to Silent Data Corruptions, which silently compromise computation results. In order to detect such faults, self-test libraries (STLs) are widely used, corresponding to suitably crafted pieces of code, able to activate faults and propagate their effects to visible points (e.g., the memory) and possibly signal their occurrence. This work introduces a structured method for generating STLs to detect permanent hardware faults that may arise in TCs. By leveraging the parallelism and regular structure of TCs, the method facilitates the creation of effective STLs for in-field fault detection without hardware modifications and with minimal requirements in terms of test time and memory. The proposed approach was validated on an NVIDIA GeForce RTX 3060 Ti GPU, installed in a Hewlett-Packard Z2 G5 workstation with an Intel Core i9-10800 CPU and 32 GB RAM, available at the Department of Control and Computer Engineering (DAUIN), Politecnico di Torino, Turin, Italy. This setup was used to address stuck-at faults in the arithmetic units of TCs. The results demonstrate that the methodology offers a practical, scalable, and non-intrusive solution for enhancing GPU reliability, applicable in both high-performance and safety-critical environments. Full article
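The essence of an STL, driving the unit with deterministic stimuli and propagating any fault to a visible, checkable output, can be sketched in PyTorch as below; the matrix size and tolerance are illustrative, and the paper's generated test programs target fault coverage far more systematically.

    import torch

    def tensor_core_self_test(n=256, tol=1e-2):
        torch.manual_seed(0)                      # reproducible test stimulus
        a = torch.randn(n, n, dtype=torch.half, device="cuda")
        b = torch.randn(n, n, dtype=torch.half, device="cuda")
        got = (a @ b).float().cpu()               # FP16 matmul on Tensor Cores
        ref = a.float().cpu() @ b.float().cpu()   # trusted reference result
        rel_err = ((got - ref).abs().max() / ref.abs().max()).item()
        return rel_err < tol, rel_err             # a fault becomes a visible mismatch

    ok, err = tensor_core_self_test()
    print(f"self-test {'passed' if ok else 'FAILED'} (max relative error {err:.2e})")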

12 pages, 1880 KiB  
Article
Feasibility of Implementing Motion-Compensated Magnetic Resonance Imaging Reconstruction on Graphics Processing Units Using Compute Unified Device Architecture
by Mohamed Aziz Zeroual, Natalia Dudysheva, Vincent Gras, Franck Mauconduit, Karyna Isaieva, Pierre-André Vuissoz and Freddy Odille
Appl. Sci. 2025, 15(11), 5840; https://doi.org/10.3390/app15115840 - 22 May 2025
Viewed by 371
Abstract
Motion correction in magnetic resonance imaging (MRI) has become increasingly complex due to the high computational demands of iterative reconstruction algorithms and the heterogeneity of emerging computing platforms. However, the clinical applicability of these methods requires fast processing to ensure rapid and accurate diagnostics. Graphics processing units (GPUs) have demonstrated substantial performance gains in various reconstruction tasks. In this work, we present a GPU implementation of the reconstruction kernel for the generalized reconstruction by inversion of coupled systems (GRICS), an iterative joint optimization approach that enables 3D high-resolution image reconstruction with motion correction. Three implementations were compared: (i) a C++ CPU version, (ii) a Matlab–GPU version (with minimal code modifications allowing data storage in GPU memory), and (iii) a native GPU version using CUDA. Six distinct datasets, including various motion types, were tested. The results showed that the Matlab–GPU approach achieved speedups ranging from 1.2× to 2.0× compared to the CPU implementation, whereas the native CUDA version attained speedups of 9.7× to 13.9×. Across all datasets, the normalized root mean square error (NRMSE) remained on the order of 10⁻⁶ to 10⁻⁴, indicating that the CUDA-accelerated method preserved image quality. Furthermore, a roofline analysis was conducted to quantify the kernel’s performance on one of the evaluated datasets. The kernel achieved 250 GFLOP/s, representing a 15.6× improvement over the performance of the Matlab–GPU version. These results confirm that GPU-based implementations of GRICS can drastically reduce reconstruction times while maintaining diagnostic fidelity, paving the way for more efficient clinical motion-compensated MRI workflows. Full article
(This article belongs to the Special Issue Data Structures for Graphics Processing Units (GPUs))
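The roofline analysis mentioned in the abstract bounds attainable throughput by min(peak compute, bandwidth × arithmetic intensity); a minimal sketch of the model follows, with placeholder hardware numbers rather than the paper's GPU. The reported 250 GFLOP/s would sit under such a roof at the kernel's measured arithmetic intensity.

    def roofline_gflops(intensity_flop_per_byte: float,
                        peak_gflops: float = 10_000.0,   # assumed GPU peak
                        bandwidth_gbps: float = 500.0) -> float:
        # attainable FLOP/s is capped either by compute or by memory traffic
        return min(peak_gflops, bandwidth_gbps * intensity_flop_per_byte)

    for ai in (0.5, 5.0, 50.0):
        print(f"AI {ai:5.1f} FLOP/B -> bound {roofline_gflops(ai):8.1f} GFLOP/s")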

15 pages, 2611 KiB  
Article
GPU-Optimized Implementation for Accelerating CSAR Imaging
by Mengting Cui, Ping Li, Zhaohui Bu, Meng Xun and Li Ding
Electronics 2025, 14(10), 2073; https://doi.org/10.3390/electronics14102073 - 20 May 2025
Viewed by 310
Abstract
The direct porting of the Range Migration Algorithm to GPUs for three-dimensional (3D) cylindrical synthetic aperture radar (CSAR) imaging faces difficulties in achieving real-time performance, since the architecture and programming models of GPUs differ significantly from those of CPUs. This paper proposes a GPU-optimized implementation for accelerating CSAR imaging. The proposed method first exploits the concentric-square-grid (CSG) interpolation to reduce the computational complexity for reconstructing a uniform 2D wave-number domain. Although the CSG method transforms the 2D traversal interpolation into two independent 1D interpolations, the interval search to determine the position intervals for interpolation results in a substantial computational burden. Therefore, binary search is applied to avoid traditional point-to-point matching for efficiency improvement. Additionally, leveraging the partition independence of the grid distribution of CSG, the 360° data are divided into four streams along the diagonal for parallel processing. Furthermore, high-speed shared memory is utilized instead of high-latency global memory in the Hadamard product for the phase compensation stage. The experimental results demonstrate that the proposed method achieves CSAR imaging on a 1440×100×128 dataset in 0.794 s, with an acceleration ratio of 35.09 compared to the CPU implementation and 5.97 compared to the conventional GPU implementation. Full article
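The interval-search step that replaces point-to-point matching can be sketched with a vectorized binary search (np.searchsorted), here feeding a 1D linear interpolation on a non-uniform grid; the grid and samples are toy data, not CSAR wave-number domains.

    import numpy as np

    grid = np.cumsum(np.random.rand(1024) + 0.1)   # monotone non-uniform grid
    vals = np.sin(grid)                            # toy samples on that grid
    queries = np.random.uniform(grid[0], grid[-1], 100_000)

    idx = np.searchsorted(grid, queries) - 1       # O(log n) interval lookup
    idx = np.clip(idx, 0, grid.size - 2)
    t = (queries - grid[idx]) / (grid[idx + 1] - grid[idx])
    interp = (1 - t) * vals[idx] + t * vals[idx + 1]   # linear interpolation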

25 pages, 3464 KiB  
Article
A Comparative Analysis of the Usability of Consumer Graphics Cards for Deep Learning in the Aspects of Inland Navigational Signs Detection for Vision Systems
by Pawel Adamski and Jacek Lubczonek
Appl. Sci. 2025, 15(9), 5142; https://doi.org/10.3390/app15095142 - 6 May 2025
Viewed by 980
Abstract
Consumer-grade graphics processing units (GPUs) offer a potentially affordable and energy-efficient alternative to enterprise-class hardware for real-time image processing tasks, but systematic multi-criteria analyses of their suitability remain rare. This article fills that gap by evaluating the performance, power consumption, and cost-effectiveness of GPUs from three leading vendors, AMD, Intel, and Nvidia, in an inland water transport (IWT) context. The main objective is to assess the feasibility of using consumer GPUs for deep learning tasks involving navigational sign detection, a critical component for ensuring safe and efficient inland transportation. The evaluation includes the use of image datasets of inland water transport signs processed by widely used detector and classifier models such as YOLO (you only look once), ResNet (residual neural network), and MobileNet. To achieve this, we propose a multi-criteria framework based on a weighted scoring method (WSM), covering 21 different characteristics such as compatibility, resting power, energy efficiency in learning and inference, and the financial threshold for technology adoption. The results confirm that consumer-grade GPUs can deliver competitive performance with lower initial costs and lower power consumption. The findings underscore the enduring value of our analysis, as its framework can be adapted for ongoing comparisons of evolving GPU technologies using the proposed methodology. Full article
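The weighted scoring method (WSM) used to aggregate the 21 characteristics can be sketched as below; the three candidate rows, criteria, and weights are invented for illustration only.

    import numpy as np

    # rows: candidate GPUs; columns: throughput (img/s), power (W), price (EUR)
    raw = np.array([
        [120.0, 180.0, 350.0],
        [150.0, 220.0, 500.0],
        [ 90.0, 140.0, 250.0],
    ])
    benefit = np.array([True, False, False])   # power and price are costs

    norm = raw / raw.max(axis=0)               # benefit: higher is better
    norm[:, ~benefit] = raw[:, ~benefit].min(axis=0) / raw[:, ~benefit]

    weights = np.array([0.5, 0.25, 0.25])      # assumed weights, sum to 1
    scores = norm @ weights
    print("WSM scores:", scores.round(3), "-> best candidate:", scores.argmax())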

9 pages, 2062 KiB  
Article
Versal Adaptive Compute Acceleration Platform Processing for ATLAS-TileCal Signal Reconstruction
by Francisco Hervás Álvarez, Alberto Valero Biot, Luca Fiorini, Héctor Gutiérrez Arance, Fernando Carrió, Sonakshi Ahuja and Francesco Curcio
Particles 2025, 8(2), 49; https://doi.org/10.3390/particles8020049 - 1 May 2025
Viewed by 470
Abstract
Particle detectors at accelerators generate large amounts of data, requiring analysis to derive insights. Collisions lead to signal pile-up, where multiple particles produce signals in the same detector sensors, complicating individual signal identification. This contribution describes the implementation of a deep-learning algorithm on a Versal Adaptive Compute Acceleration Platform (ACAP) device for improved processing via parallelization and concurrency. Connected to a host computer via Peripheral Component Interconnect express (PCIe), this system aims for enhanced speed and energy efficiency over Central Processing Units (CPUs) and Graphics Processing Units (GPUs). We describe in detail the data processing chain and the hardware, firmware, and software components of the system, including an efficient scheme for transferring data to the device. Full article
