Data Structures for Graphics Processing Units (GPUs)

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 20 January 2026 | Viewed by 8864

Special Issue Editors


Guest Editor
Department of Computer and Information Science, University of Mississippi, University, MS 38677, USA
Interests: hardware architecture and compilers for parallel and heterogeneous processors; GPU computing (GPGPU) and CPU-GPU heterogeneous computing

Special Issue Information

Dear Colleagues,

This Special Issue will delve into the innovative and rapidly evolving field of data structures tailored to graphics processing units (GPUs). GPUs, originally designed for rendering graphics, have emerged as powerful parallel processors, revolutionizing computational tasks across diverse domains. This Special Issue will explore the development and optimization of data structures that leverage the parallel processing capabilities of GPUs to achieve significant performance enhancements.

Contributors to this Special Issue will present cutting-edge research into a variety of GPU-optimized data structures, including, but not limited to, stacks, queues, trees, graphs, hash tables, and priority queues. The articles will highlight novel approaches to memory management, data access patterns, and algorithmic modifications that harness the massive parallelism of GPUs. Furthermore, this Special Issue will address practical challenges, such as synchronization, load balancing, and efficient data transfer between CPU and GPU memory.

By featuring both theoretical advancements and practical implementations, this Special Issue will bridge the gap between traditional CPU-centric data structures and their GPU-optimized counterparts. Readers will gain insights into the latest techniques for maximizing GPU performance, making this Special Issue an essential resource for researchers and practitioners seeking to exploit the full potential of GPU computing for data-intensive applications.

Dr. Byunghyun Jang
Prof. Dr. Juan A. Gómez-Pulido
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • GPU computing
  • concurrent data structures
  • GPGPU
  • parallel computing

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (8 papers)


Research

20 pages, 4896 KB  
Article
GPU-Driven Acceleration of Wavelet-Based Autofocus for Practical Applications in Digital Imaging
by HyungTae Kim, Duk-Yeon Lee, Dongwoon Choi and Dong-Wook Lee
Appl. Sci. 2025, 15(19), 10455; https://doi.org/10.3390/app151910455 - 26 Sep 2025
Abstract
A parallel implementation of wavelet-based autofocus (WBA) was presented to accelerate recursive operations and reduce computational costs. WBA evaluates digital focus indices (DFIs) using first- or second-order moments of the wavelet coefficients in high-frequency subbands. WBA is generally accurate and reliable; however, its computational cost is high owing to biorthogonal decomposition. Thus, this study parallelized the Daubechies-6 wavelet and norms of the high-frequency subbands for the DFI. The kernels of the DFI computation were constructed using open sources for driving multicore processors (MCPs) and general processing units (GPUs). The standard C++, OpenCV, OpenMP, OpenCL, and CUDA open-source platforms were selected to construct the DFI kernels, considering hardware compatibility. The experiment was conducted using the MCP, peripheral GPUs, and CPU-resident GPUs on desktops for advanced users and compact devices for industrial applications. The results demonstrated that the GPUs provided sufficient performance to achieve WBA even when using budget GPUs, indicating that the GPUs are advantageous for practical applications of WBA. This study also implies that although budget GPUs are left unused, they can potentially be great resources for wavelet-based processing.
(This article belongs to the Special Issue Data Structures for Graphics Processing Units (GPUs))
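The core idea of a wavelet-based focus index can be sketched in a few lines: transform the signal, keep only the high-frequency (detail) coefficients, and take their second-order moment. The sketch below uses a single-level 1-D Haar transform for brevity; the paper itself parallelizes the Daubechies-6 wavelet on GPUs, and the sample signals are hypothetical.

```python
# Illustrative sketch: a digital focus index (DFI) as the energy of the
# high-frequency wavelet subband. Single-level 1-D Haar for brevity
# (the paper parallelizes Daubechies-6 on GPUs).
def haar_highpass(signal):
    """Detail (high-frequency) coefficients of one Haar level."""
    return [(signal[i] - signal[i + 1]) / 2.0
            for i in range(0, len(signal) - 1, 2)]

def focus_index(signal):
    """Second-order moment (mean energy) of the high-frequency subband."""
    detail = haar_highpass(signal)
    return sum(d * d for d in detail) / len(detail)

# A sharper (higher-contrast) signal yields a larger focus index.
sharp = [0, 10, 0, 10, 0, 10, 0, 10]
blurry = [4, 6, 4, 6, 4, 6, 4, 6]
assert focus_index(sharp) > focus_index(blurry)
```

In an autofocus loop, this index would be evaluated per candidate lens position and the position maximizing it selected; the GPU work in the paper parallelizes exactly this per-pixel transform and reduction.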

17 pages, 1460 KB  
Article
GPU-Accelerated High-Efficiency PSO with Initialization and Thread Self-Adaptation
by Zhikun Liu, Jia Wu, Bolei Dong and Ye Liu
Appl. Sci. 2025, 15(19), 10429; https://doi.org/10.3390/app151910429 - 25 Sep 2025
Abstract
Particle Swarm Optimization (PSO) is a widely used heuristic algorithm valued for its simplicity and robustness in solving diverse optimization problems. However, its high computational cost often limits large-scale applications. With the rapid development of parallel computing and Graphics Processing Units (GPUs), researchers have increasingly leveraged these technologies to enhance PSO efficiency. This paper introduces a High-Efficiency PSO (HEPSO) algorithm designed for GPU-based architectures. HEPSO improves computational performance through two key strategies: (1) transferring data initialization from the CPU to the GPU to reduce I/O overhead caused by repeated data migration, and (2) incorporating a self-adaptive thread management mechanism to enhance execution efficiency. Experiments conducted on nine benchmark optimization functions demonstrate that HEPSO achieves more than a sixfold speedup compared to conventional GPU-PSO. Moreover, in terms of convergence time, HEPSO requires only about one-third of the runtime in most cases.
(This article belongs to the Special Issue Data Structures for Graphics Processing Units (GPUs))
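For readers unfamiliar with the baseline being accelerated, the serial PSO update rule is compact enough to show directly. This is a minimal sketch with hypothetical parameters on the sphere function; GPU variants such as HEPSO parallelize the per-particle loop, and the paper's GPU-side initialization and thread self-adaptation are not modeled here.

```python
# Minimal serial PSO on the sphere function f(x) = sum(x_i^2).
# Parameters (w, c1, c2, swarm size) are illustrative choices only.
import random

def pso(dim=2, particles=20, iters=200, seed=1):
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5          # inertia and acceleration weights
    f = lambda x: sum(v * v for v in x)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    pbest = [p[:] for p in pos]        # per-particle best positions
    gbest = min(pbest, key=f)[:]       # global best position
    for _ in range(iters):
        for i in range(particles):     # GPU versions map this loop to threads
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if f(pos[i]) < f(pbest[i]):
                pbest[i] = pos[i][:]
                if f(pbest[i]) < f(gbest):
                    gbest = pbest[i][:]
    return gbest

best = pso()
assert sum(v * v for v in best) < 1e-2   # converges near the origin
```

The inner per-particle loop is embarrassingly parallel, which is why the dominant cost in GPU implementations shifts to data movement and thread scheduling, the two overheads HEPSO targets.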

19 pages, 1845 KB  
Article
GPU-Accelerated PSO for High-Performance American Option Valuation
by Leon Xing Li and Ren-Raw Chen
Appl. Sci. 2025, 15(18), 9961; https://doi.org/10.3390/app15189961 - 11 Sep 2025
Viewed by 423
Abstract
Using artificial intelligence tools to evaluate financial derivatives has become increasingly popular. PSO (particle swarm optimization) is one such tool. We present a comprehensive study of PSO for pricing American options on GPUs using OpenCL. PSO is an increasingly popular heuristic for financial parameter search; however, its high computational cost (especially for path-dependent derivatives) poses a challenge. We review PSO-based pricing and survey prior GPU acceleration efforts. We then describe our OpenCL optimization pipeline on an Apple M3 Max GPU (OpenCL 1.2 via PyOpenCL 2024.1). Starting from a NumPy baseline (36.7 s), we apply successive enhancements: an initial GPU offload (8.0 s), restructuring loops (forward/backward) to minimize divergence (2.3 s → 0.95 s), kernel fusion (0.94 s), and explicit SIMD vectorization (float4) (0.25 s). The fully fused float4 kernel achieves 0.246 s, a ~150X speedup over CPU. We analyzed all eight intermediate kernels (named by file), detailing techniques (memory coalescing, branch avoidance, etc.) and their effects on throughput. Our results exceed prior art in speed and vector efficiency, illustrating the power of combined OpenCL strategies.
(This article belongs to the Special Issue Data Structures for Graphics Processing Units (GPUs))

12 pages, 1880 KB  
Article
Feasibility of Implementing Motion-Compensated Magnetic Resonance Imaging Reconstruction on Graphics Processing Units Using Compute Unified Device Architecture
by Mohamed Aziz Zeroual, Natalia Dudysheva, Vincent Gras, Franck Mauconduit, Karyna Isaieva, Pierre-André Vuissoz and Freddy Odille
Appl. Sci. 2025, 15(11), 5840; https://doi.org/10.3390/app15115840 - 22 May 2025
Viewed by 513
Abstract
Motion correction in magnetic resonance imaging (MRI) has become increasingly complex due to the high computational demands of iterative reconstruction algorithms and the heterogeneity of emerging computing platforms. However, the clinical applicability of these methods requires fast processing to ensure rapid and accurate diagnostics. Graphics processing units (GPUs) have demonstrated substantial performance gains in various reconstruction tasks. In this work, we present a GPU implementation of the reconstruction kernel for the generalized reconstruction by inversion of coupled systems (GRICS), an iterative joint optimization approach that enables 3D high-resolution image reconstruction with motion correction. Three implementations were compared: (i) a C++ CPU version, (ii) a Matlab–GPU version (with minimal code modifications allowing data storage in GPU memory), and (iii) a native GPU version using CUDA. Six distinct datasets, including various motion types, were tested. The results showed that the Matlab–GPU approach achieved speedups ranging from 1.2× to 2.0× compared to the CPU implementation, whereas the native CUDA version attained speedups of 9.7× to 13.9×. Across all datasets, the normalized root mean square error (NRMSE) remained on the order of 10⁻⁶ to 10⁻⁴, indicating that the CUDA-accelerated method preserved image quality. Furthermore, a roofline analysis was conducted to quantify the kernel's performance on one of the evaluated datasets. The kernel achieved 250 GFLOP/s, representing a 15.6× improvement over the performance of the Matlab–GPU version. These results confirm that GPU-based implementations of GRICS can drastically reduce reconstruction times while maintaining diagnostic fidelity, paving the way for more efficient clinical motion-compensated MRI workflows.
(This article belongs to the Special Issue Data Structures for Graphics Processing Units (GPUs))
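The roofline analysis mentioned above has a simple arithmetic core: a kernel's attainable throughput is capped either by peak compute or by memory bandwidth times its arithmetic intensity (FLOPs per byte moved). The sketch below uses hypothetical peak numbers, not the paper's measurements.

```python
# Roofline model sketch (illustrative hardware numbers, not the paper's):
# attainable GFLOP/s = min(compute roof, intensity * bandwidth roof).
def attainable_gflops(intensity, peak_gflops, peak_bw_gb_s):
    """Performance cap for a kernel with the given arithmetic intensity."""
    return min(peak_gflops, intensity * peak_bw_gb_s)

peak_gflops = 10000.0   # hypothetical GPU peak compute, GFLOP/s
peak_bw = 1000.0        # hypothetical memory bandwidth, GB/s
ridge = peak_gflops / peak_bw   # intensity where the two roofs meet

# Below the ridge point the kernel is memory-bound...
assert attainable_gflops(1.0, peak_gflops, peak_bw) == 1000.0
# ...above it, compute-bound.
assert attainable_gflops(2 * ridge, peak_gflops, peak_bw) == peak_gflops
```

Placing a measured kernel (such as the GRICS reconstruction kernel) against these two roofs tells you whether further optimization effort should target data movement or arithmetic.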

24 pages, 740 KB  
Article
GPU-Accelerated Fock Matrix Computation with Efficient Reduction
by Satoki Tsuji, Yasuaki Ito, Haruto Fujii, Nobuya Yokogawa, Kanta Suzuki, Koji Nakano, Victor Parque and Akihiko Kasagi
Appl. Sci. 2025, 15(9), 4779; https://doi.org/10.3390/app15094779 - 25 Apr 2025
Viewed by 762
Abstract
In quantum chemistry, constructing the Fock matrix is essential to compute Coulomb interactions among atoms and electrons and, thus, to determine electron orbitals and densities. In fundamental frameworks of quantum chemistry such as the Hartree–Fock method, the iterative computation of the Fock matrix is a dominant process, constituting a critical computational bottleneck. Although the Fock matrix computation has been accelerated by parallel processing using GPUs, the issue of performance degradation due to memory contention remains unresolved. This is due to frequent conflicts of atomic operations accessing the same memory addresses when multiple threads update the Fock matrix elements concurrently. To address this issue, we propose a parallel algorithm that efficiently distributes the atomic operations and significantly reduces memory contention by decomposing the Fock matrix into multiple replicas, allowing each GPU thread to contribute to different replicas. Experimental results using a relevant set of molecules on an NVIDIA A100 GPU show that our approach achieves up to a 3.75× speedup in Fock matrix computation compared to conventional high-contention approaches. Furthermore, our proposed method can also be readily combined with existing implementations that reduce the number of atomic operations, leading to a 1.98× improvement.
(This article belongs to the Special Issue Data Structures for Graphics Processing Units (GPUs))
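The replica idea generalizes beyond Fock matrices: instead of every worker atomically updating one shared accumulator, each worker updates one of several private replicas, which are summed in a final reduction. The serial sketch below illustrates the data layout only; the worker-to-replica mapping is a hypothetical round-robin, and the GPU atomics it replaces are not modeled.

```python
# Contention-reduction sketch: scatter updates across replicas, then
# reduce. On a GPU this trades atomic conflicts on one array for extra
# memory and a cheap final element-wise sum.
def replicated_accumulate(updates, size, n_replicas=4):
    """updates: list of (index, value) pairs, one per worker."""
    replicas = [[0.0] * size for _ in range(n_replicas)]
    for worker_id, (index, value) in enumerate(updates):
        # Each worker writes to "its" replica (round-robin mapping here),
        # so concurrent workers would rarely touch the same address.
        replicas[worker_id % n_replicas][index] += value
    # Final reduction: sum the replicas element-wise.
    return [sum(rep[i] for rep in replicas) for i in range(size)]

# Three workers hit index 0; with replicas they no longer collide.
updates = [(0, 1.0), (0, 2.0), (1, 3.0), (0, 4.0)]
assert replicated_accumulate(updates, size=2) == [7.0, 3.0]
```

The trade-off is memory: replica count must balance contention relief against the O(replicas × matrix size) storage cost, which is why the paper tunes how threads are distributed over replicas.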

30 pages, 1684 KB  
Article
Efficient GPU Implementation of the McMurchie–Davidson Method for Shell-Based ERI Computations
by Haruto Fujii, Yasuaki Ito, Nobuya Yokogawa, Kanta Suzuki, Satoki Tsuji, Koji Nakano, Victor Parque and Akihiko Kasagi
Appl. Sci. 2025, 15(5), 2572; https://doi.org/10.3390/app15052572 - 27 Feb 2025
Cited by 1 | Viewed by 1306
Abstract
Quantum chemistry offers the formal machinery to derive molecular and physical properties arising from (sub)atomic interactions. However, as molecules of practical interest are largely polyatomic, contemporary approximation schemes such as the Hartree–Fock scheme are computationally expensive due to the large number of electron repulsion integrals (ERIs). Central to the Hartree–Fock method is the efficient computation of ERIs over Gaussian functions (GTO-ERIs). Here, the well-known McMurchie–Davidson method (MD) offers an elegant formalism by incrementally extending Hermite Gaussian functions and auxiliary tabulated functions. Although the MD method offers a high degree of versatility to acceleration schemes through Graphics Processing Units (GPUs), the current GPU implementations limit the practical use of supported values of the azimuthal quantum number. In this paper, we propose a generalized framework capable of computing GTO-ERIs for arbitrary azimuthal quantum numbers, provided that the intermediate terms of the MD method can be stored. Our approach benefits from extending the MD recurrence relations through shells, batches, and triple-buffering of the shared memory, and ordering similar ERIs, thus enabling the effective parallelization and use of GPU resources. Furthermore, our approach proposes four GPU implementation schemes considering the suitable mappings between Gaussian basis and CUDA blocks and threads. Our computational experiments involving the GTO-ERI computations of molecules of interest on an NVIDIA A100 Tensor Core GPU (NVIDIA, Santa Clara, CA, USA) have revealed the merits of the proposed acceleration schemes in terms of computation time, including up to a 72× improvement over our previous GPU implementation and up to a 4500× speedup compared to a naive CPU implementation, highlighting the effectiveness of our method in accelerating ERI computations for both monatomic and polyatomic molecules. Our work has the potential to explore new parallelization schemes of distinct and complex computation paths involved in ERI computation.
(This article belongs to the Special Issue Data Structures for Graphics Processing Units (GPUs))
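Gaussian-basis ERI schemes such as McMurchie–Davidson rest on the Gaussian product theorem: the product of two Gaussians is itself a Gaussian centered between them, which is what makes the Hermite-Gaussian recurrences tractable. The sketch below verifies the 1-D identity numerically; the exponents and centers are arbitrary illustration values.

```python
# Gaussian product theorem sketch: exp(-a(x-A)^2) * exp(-b(x-B)^2)
# equals K * exp(-p(x-P)^2) with p = a+b, P the weighted midpoint,
# and K a pre-exponential factor. Values below are illustrative.
import math

def gaussian_product(a, A, b, B):
    """Combine two 1-D Gaussian exponents/centers into one Gaussian."""
    p = a + b
    P = (a * A + b * B) / p                    # exponent-weighted midpoint
    K = math.exp(-a * b / p * (A - B) ** 2)    # pre-exponential factor
    return p, P, K

a, A, b, B = 0.5, 0.0, 0.8, 1.0
p, P, K = gaussian_product(a, A, b, B)
for x in (-1.0, 0.3, 2.5):
    lhs = math.exp(-a * (x - A) ** 2) * math.exp(-b * (x - B) ** 2)
    rhs = K * math.exp(-p * (x - P) ** 2)
    assert abs(lhs - rhs) < 1e-12              # identity holds pointwise
```

Because every pairwise product collapses to a single Gaussian this way, four-center ERIs reduce to sums over Hermite Gaussians at two composite centers, the structure the MD recurrences (and the paper's GPU mappings) exploit.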

20 pages, 899 KB  
Article
Boundary-Aware Concurrent Queue: A Fast and Scalable Concurrent FIFO Queue on GPU Environments
by Md. Sabbir Hossain Polak, David A. Troendle and Byunghyun Jang
Appl. Sci. 2025, 15(4), 1834; https://doi.org/10.3390/app15041834 - 11 Feb 2025
Viewed by 1352
Abstract
This paper presents Boundary-Aware Concurrent Queue (BACQ), a high-performance queue designed for modern GPUs, which focuses on high concurrency in massively parallel environments. BACQ operates at the warp level, leveraging intra-warp locality to improve throughput. A key to BACQ's design is its ability to replace conflicting accesses to shared data with independent accesses to private data. It uses a ticket-based system to ensure fair ordering of operations and supports infinite growth of the head and tail across its ring buffer. The leader thread of each warp coordinates enqueue and dequeue operations, broadcasting offsets for intra-warp synchronization. BACQ dynamically adjusts operation priorities based on the queue's state, especially as it approaches boundary conditions such as overfilling the buffer. It also uses a virtual caching layer for intra-warp communication, reducing memory latency. Rigorous benchmarking results show that BACQ outperforms the BWD (Broker Queue Work Distributor), the fastest known GPU queue, by more than 2× while preserving FIFO semantics. The paper demonstrates BACQ's superior performance through real-world empirical evaluations.
(This article belongs to the Special Issue Data Structures for Graphics Processing Units (GPUs))
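The "infinite growth of head and tail across a ring buffer" idea is easiest to see in serial form: tickets increase monotonically and are mapped onto a fixed ring by modulo indexing, so slot reuse never reorders items. This is only an illustration of the ticketing scheme; BACQ's warp-level leader coordination, priority adjustment, and virtual caching are GPU-specific and not modeled here.

```python
# Serial sketch of a ticket-ordered ring-buffer FIFO. On a GPU the
# head/tail increments would be atomic fetch-and-adds performed by
# warp leaders; here a single thread stands in for them.
class TicketQueue:
    def __init__(self, capacity):
        self.ring = [None] * capacity
        self.capacity = capacity
        self.head = 0          # next ticket to dequeue (grows forever)
        self.tail = 0          # next ticket to enqueue (grows forever)

    def enqueue(self, item):
        if self.tail - self.head == self.capacity:
            return False       # full: the boundary case BACQ prioritizes
        self.ring[self.tail % self.capacity] = item
        self.tail += 1
        return True

    def dequeue(self):
        if self.head == self.tail:
            return None        # empty
        item = self.ring[self.head % self.capacity]
        self.head += 1
        return item

q = TicketQueue(2)
assert q.enqueue(1) and q.enqueue(2) and not q.enqueue(3)
assert q.dequeue() == 1 and q.enqueue(3)   # ticket 2 wraps onto slot 0
assert [q.dequeue(), q.dequeue()] == [2, 3]
```

Because tickets never reset, comparing `tail - head` against capacity gives exact full/empty detection without ambiguous wrap-around flags, which is what makes fair FIFO ordering cheap to enforce.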

21 pages, 6218 KB  
Article
Multi-GPU Acceleration for Finite Element Analysis in Structural Mechanics
by David Herrero-Pérez and Humberto Martínez-Barberá
Appl. Sci. 2025, 15(3), 1095; https://doi.org/10.3390/app15031095 - 22 Jan 2025
Viewed by 2787
Abstract
This work evaluates the computing performance of finite element analysis in structural mechanics using modern multi-GPU systems. Using multiple GPUs for scientific computing avoids the usual memory limitations of many-core computing on a single GPU device. We use a GPU-aware MPI approach implementing a suitable smoothed aggregation multigrid for preconditioning an iterative distributed conjugate gradient solver for GPU computing. We evaluate the performance and scalability of different models, problem sizes, and computing resources. We take an efficient multi-core implementation as the reference to assess the computing performance of the numerical results. The numerical results show the advantages and limitations of using distributed many-core architectures to address structural mechanics problems.
(This article belongs to the Special Issue Data Structures for Graphics Processing Units (GPUs))
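The solver being distributed here is the classic conjugate gradient iteration for symmetric positive definite systems. The minimal serial, unpreconditioned sketch below shows the kernel structure; in the paper's setting the matrix-vector products and dot products are distributed across GPUs via MPI and preconditioned with smoothed-aggregation multigrid, none of which is modeled here. The example matrix is an arbitrary 2×2 SPD illustration.

```python
# Minimal serial conjugate gradient for an SPD system Ax = b.
# The matvec and dot products are the operations a multi-GPU solver
# distributes; the preconditioner is omitted for brevity.
def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(A, b, tol=1e-12, max_iters=100):
    x = [0.0] * len(b)
    r = b[:]                   # residual r = b - A*x with x = 0
    p = r[:]                   # initial search direction
    rs = dot(r, r)
    for _ in range(max_iters):
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]   # symmetric positive definite
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
assert all(abs(v - t) < 1e-6 for v, t in zip(matvec(A, x), b))
```

Each iteration needs one matvec, two dot products, and three vector updates; the global reductions in the dot products are the main communication cost a GPU-aware MPI implementation must hide.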
