Data Structures for Graphics Processing Units (GPUs)

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 20 July 2025 | Viewed by 3375

Special Issue Editors


Dr. Byunghyun Jang
Guest Editor
Department of Computer and Information Science, University of Mississippi, University, MS 38677, USA
Interests: hardware architecture and compilers for parallel and heterogeneous processors; GPU computing (GPGPU) and CPU-GPU heterogeneous computing

Prof. Dr. Juan A. Gómez-Pulido
Guest Editor

Special Issue Information

Dear Colleagues,

This Special Issue will delve into the innovative and rapidly evolving field of data structures tailored to graphics processing units (GPUs). GPUs, originally designed for rendering graphics, have emerged as powerful parallel processors, revolutionizing computational tasks across diverse domains. This Special Issue will explore the development and optimization of data structures that leverage the parallel processing capabilities of GPUs to achieve significant performance enhancements.

Contributors to this Special Issue will present cutting-edge research into a variety of GPU-optimized data structures, including, but not limited to, stacks, queues, trees, graphs, hash tables, and priority queues. The articles will highlight novel approaches to memory management, data access patterns, and algorithmic modifications that harness the massive parallelism of GPUs. Furthermore, this Special Issue will address practical challenges, such as synchronization, load balancing, and efficient data transfer between CPU and GPU memory.
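To give a concrete flavor of the structures in scope, the sketch below (purely illustrative, not taken from any submission) shows a fixed-capacity, open-addressing GPU hash table whose inserts synchronize through atomicCAS; the capacity, hash function, and EMPTY sentinel are assumptions chosen for brevity.

    #include <cstdint>
    #include <cuda_runtime.h>

    // Illustrative parameters (assumptions, not taken from the call text).
    constexpr uint32_t EMPTY    = 0xFFFFFFFFu;  // sentinel marking a free slot
    constexpr uint32_t CAPACITY = 1u << 20;     // table size (power of two)

    __device__ uint32_t hash_key(uint32_t k) {  // simple multiplicative hash
        return (k * 2654435761u) & (CAPACITY - 1);
    }

    // Lock-free insert with linear probing: each thread claims a slot via atomicCAS.
    __global__ void insert_keys(uint32_t* table, const uint32_t* keys, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        uint32_t key  = keys[i];
        uint32_t slot = hash_key(key);
        for (uint32_t probe = 0; probe < CAPACITY; ++probe) {
            uint32_t prev = atomicCAS(&table[slot], EMPTY, key);
            if (prev == EMPTY || prev == key) return;  // slot claimed or key already present
            slot = (slot + 1) & (CAPACITY - 1);        // linear probing
        }
    }

    int main() {
        const int n = 1024;
        uint32_t *d_table, *d_keys;
        cudaMalloc(&d_table, CAPACITY * sizeof(uint32_t));
        cudaMemset(d_table, 0xFF, CAPACITY * sizeof(uint32_t));  // every slot starts EMPTY
        cudaMalloc(&d_keys, n * sizeof(uint32_t));
        // ... copy the keys to insert into d_keys ...
        insert_keys<<<(n + 255) / 256, 256>>>(d_table, d_keys, n);
        cudaDeviceSynchronize();
        cudaFree(d_keys);
        cudaFree(d_table);
        return 0;
    }

Linear probing keeps each probe a single 32-bit compare-and-swap over a flat array, the kind of contiguous, pointer-free access pattern that GPU-resident data structures tend to favor.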

By featuring both theoretical advancements and practical implementations, this Special Issue will bridge the gap between traditional CPU-centric data structures and their GPU-optimized counterparts. Readers will gain insights into the latest techniques for maximizing GPU performance, making this Special Issue an essential resource for researchers and practitioners seeking to exploit the full potential of GPU computing for data-intensive applications.

Dr. Byunghyun Jang
Prof. Dr. Juan A. Gómez-Pulido
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • GPU computing
  • concurrent data structures
  • GPGPU
  • parallel computing

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (4 papers)


Research

24 pages, 740 KiB  
Article
GPU-Accelerated Fock Matrix Computation with Efficient Reduction
by Satoki Tsuji, Yasuaki Ito, Haruto Fujii, Nobuya Yokogawa, Kanta Suzuki, Koji Nakano, Victor Parque and Akihiko Kasagi
Appl. Sci. 2025, 15(9), 4779; https://doi.org/10.3390/app15094779 - 25 Apr 2025
Viewed by 109
Abstract
In quantum chemistry, constructing the Fock matrix is essential to compute Coulomb interactions among atoms and electrons and, thus, to determine electron orbitals and densities. In the fundamental framework of quantum chemistry such as the Hartree–Fock method, the iterative computation of the Fock matrix is a dominant process, constituting a critical computational bottleneck. Although the Fock matrix computation has been accelerated by parallel processing using GPUs, the issue of performance degradation due to memory contention remains unresolved. This is due to frequent conflicts of atomic operations accessing the same memory addresses when multiple threads update the Fock matrix elements concurrently. To address this issue, we propose a parallel algorithm that distributes the atomic operations efficiently and significantly reduces memory contention by decomposing the Fock matrix into multiple replicas, allowing each GPU thread to contribute to a different replica. Experimental results using a relevant set/configuration of molecules on an NVIDIA A100 GPU show that our approach achieves up to a 3.75× speedup in Fock matrix computation compared to conventional high-contention approaches. Furthermore, our proposed method can also be readily combined with existing implementations that reduce the number of atomic operations, leading to a 1.98× improvement.
(This article belongs to the Special Issue Data Structures for Graphics Processing Units (GPUs))
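The replica idea described in the abstract can be pictured with a short, hedged sketch (this is not the authors' code; the matrix dimension N, the replica count R, and the coordinate-list input format are assumptions): contributions are scattered into one of R private copies of the matrix chosen from the thread index, and the copies are summed at the end, so atomic traffic to any single address drops by roughly a factor of R.

    #include <cuda_runtime.h>

    constexpr int N = 256;   // Fock matrix dimension (assumption)
    constexpr int R = 8;     // number of replicas (assumption)

    // Scatter phase: thread t adds its contribution into replica (t % R), so
    // concurrent updates to the same (row, col) rarely hit the same address.
    // Double-precision atomicAdd requires compute capability 6.0+ (e.g., an A100).
    __global__ void accumulate(double* replicas, const int* rows, const int* cols,
                               const double* vals, int m) {
        int t = blockIdx.x * blockDim.x + threadIdx.x;
        if (t >= m) return;
        double* F = replicas + (size_t)(t % R) * N * N;   // this thread's replica
        atomicAdd(&F[rows[t] * N + cols[t]], vals[t]);    // contention spread over R copies
    }

    // Reduction phase: sum the R replicas element-wise into replica 0.
    __global__ void reduce_replicas(double* replicas) {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx >= N * N) return;
        double sum = 0.0;
        for (int r = 0; r < R; ++r) sum += replicas[(size_t)r * N * N + idx];
        replicas[idx] = sum;
    }

Choosing the replica from the thread index is the simplest possible policy; the paper's contribution lies in distributing the atomic operations more carefully and in combining replication with techniques that reduce the number of atomics outright.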

30 pages, 1684 KiB  
Article
Efficient GPU Implementation of the McMurchie–Davidson Method for Shell-Based ERI Computations
by Haruto Fujii, Yasuaki Ito, Nobuya Yokogawa, Kanta Suzuki, Satoki Tsuji, Koji Nakano, Victor Parque and Akihiko Kasagi
Appl. Sci. 2025, 15(5), 2572; https://doi.org/10.3390/app15052572 - 27 Feb 2025
Viewed by 648
Abstract
Quantum chemistry offers the formal machinery to derive molecular and physical properties arising from (sub)atomic interactions. However, as molecules of practical interest are largely polyatomic, contemporary approximation schemes such as the Hartree–Fock scheme are computationally expensive due to the large number of electron repulsion integrals (ERIs). Central to the Hartree–Fock method is the efficient computation of ERIs over Gaussian functions (GTO-ERIs). Here, the well-known McMurchie–Davidson method (MD) offers an elegant formalism by incrementally extending Hermite Gaussian functions and auxiliary tabulated functions. Although the MD method offers a high degree of versatility to acceleration schemes through Graphics Processing Units (GPUs), the current GPU implementations limit the practical use of supported values of the azimuthal quantum number. In this paper, we propose a generalized framework capable of computing GTO-ERIs for arbitrary azimuthal quantum numbers, provided that the intermediate terms of the MD method can be stored. Our approach benefits from extending the MD recurrence relations through shells, batches, and triple-buffering of the shared memory, and ordering similar ERIs, thus enabling the effective parallelization and use of GPU resources. Furthermore, our approach proposes four GPU implementation schemes considering the suitable mappings between Gaussian basis and CUDA blocks and threads. Our computational experiments involving the GTO-ERI computations of molecules of interest on an NVIDIA A100 Tensor Core GPU (NVIDIA, Santa Clara, CA, USA) have revealed the merits of the proposed acceleration schemes in terms of computation time, including up to a 72× improvement over our previous GPU implementation and up to a 4500× speedup compared to a naive CPU implementation, highlighting the effectiveness of our method in accelerating ERI computations for both monatomic and polyatomic molecules. Our work has the potential to explore new parallelization schemes of distinct and complex computation paths involved in ERI computation.
(This article belongs to the Special Issue Data Structures for Graphics Processing Units (GPUs))
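As a very rough illustration of the block/thread mapping discussed above (and only of the mapping: the McMurchie–Davidson recurrences, Boys function evaluation, triple buffering, and ERI ordering from the paper are not reproduced), one scheme assigns one shell pair per CUDA block and lets the threads stride over primitive-pair combinations. The function primitive_term, the prim_count array, and the 2D grid layout below are placeholders and assumptions.

    #include <cuda_runtime.h>

    // Placeholder for the per-primitive-pair work; the real McMurchie-Davidson
    // recurrences are far more involved and are deliberately not reproduced here.
    __device__ double primitive_term(int pa, int pb) {
        return 0.0;  // stand-in value
    }

    // One CUDA block per shell pair (blockIdx.x, blockIdx.y); threads stride over
    // the primitive-pair combinations and reduce their partial sums in shared memory.
    // Launch with 256 threads per block.
    __global__ void shell_pair_kernel(const int* prim_count,  // primitives per shell
                                      double* out,            // one value per shell pair
                                      int num_shells) {
        int sa = blockIdx.x;   // first shell of the pair
        int sb = blockIdx.y;   // second shell of the pair
        if (sa >= num_shells || sb >= num_shells) return;

        __shared__ double partial[256];
        int na = prim_count[sa], nb = prim_count[sb];
        double acc = 0.0;
        for (int p = threadIdx.x; p < na * nb; p += blockDim.x)
            acc += primitive_term(p / nb, p % nb);
        partial[threadIdx.x] = acc;
        __syncthreads();

        for (int s = blockDim.x / 2; s > 0; s >>= 1) {  // block-wide tree reduction
            if (threadIdx.x < s) partial[threadIdx.x] += partial[threadIdx.x + s];
            __syncthreads();
        }
        if (threadIdx.x == 0) out[sa * num_shells + sb] = partial[0];
    }

Batching shells of similar size into the same grid launch, as the paper describes, keeps the per-block loop counts uniform and avoids idle threads.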

20 pages, 899 KiB  
Article
Boundary-Aware Concurrent Queue: A Fast and Scalable Concurrent FIFO Queue on GPU Environments
by Md. Sabbir Hossain Polak, David A. Troendle and Byunghyun Jang
Appl. Sci. 2025, 15(4), 1834; https://doi.org/10.3390/app15041834 - 11 Feb 2025
Viewed by 630
Abstract
This paper presents Boundary-Aware Concurrent Queue (BACQ), a high-performance queue designed for modern GPUs, which focuses on high concurrency in massively parallel environments. BACQ operates at the warp level, leveraging intra-warp locality to improve throughput. A key to BACQ’s design is its ability to replace conflicting accesses to shared data with independent accesses to private data. It uses a ticket-based system to ensure fair ordering of operations and supports infinite growth of the head and tail across its ring buffer. The leader thread of each warp coordinates enqueue and dequeue operations, broadcasting offsets for intra-warp synchronization. BACQ dynamically adjusts operation priorities based on the queue’s state, especially as it approaches boundary conditions such as overfilling the buffer. It also uses a virtual caching layer for intra-warp communication, reducing memory latency. Rigorous benchmarking results show that BACQ outperforms the BWD (Broker Queue Work Distributor), the fastest known GPU queue, by more than 2× while preserving FIFO semantics. The paper demonstrates BACQ’s superior performance through real-world empirical evaluations.
(This article belongs to the Special Issue Data Structures for Graphics Processing Units (GPUs))
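The warp-level coordination described in the abstract is reminiscent of the standard warp-aggregated reservation pattern sketched below; this is not the BACQ algorithm itself (in particular, the ticket system and the boundary handling near a full or empty buffer are omitted), and RING_SIZE and the global tail counter are assumptions.

    #include <cuda_runtime.h>

    constexpr unsigned RING_SIZE = 1u << 20;   // ring-buffer capacity (assumption)
    __device__ unsigned long long g_tail;      // monotonically growing tail counter

    // The lowest active lane acts as warp leader: it reserves `count` slots with a
    // single atomicAdd and broadcasts the base offset, so the other lanes write
    // their elements without touching the shared tail at all.
    __device__ void warp_enqueue(int* ring, int value) {
        unsigned active = __activemask();
        int lane   = threadIdx.x & 31;
        int leader = __ffs(active) - 1;             // lowest active lane leads
        int count  = __popc(active);                // items this warp enqueues
        unsigned long long base = 0;
        if (lane == leader)
            base = atomicAdd(&g_tail, (unsigned long long)count);
        base = __shfl_sync(active, base, leader);   // leader broadcasts the base offset
        int rank = __popc(active & ((1u << lane) - 1));  // this lane's position in the warp
        ring[(base + rank) % RING_SIZE] = value;    // no overflow check in this sketch
    }

    __global__ void producer(int* ring, const int* items, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) warp_enqueue(ring, items[i]);
    }

One atomic per warp instead of one per thread is exactly the kind of contention reduction the abstract attributes to the leader-and-broadcast design.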

21 pages, 6218 KiB  
Article
Multi-GPU Acceleration for Finite Element Analysis in Structural Mechanics
by David Herrero-Pérez and Humberto Martínez-Barberá
Appl. Sci. 2025, 15(3), 1095; https://doi.org/10.3390/app15031095 - 22 Jan 2025
Viewed by 1281
Abstract
This work evaluates the computing performance of finite element analysis in structural mechanics using modern multi-GPU systems. Using multiple GPUs for scientific computing avoids the memory limitations usually encountered when a single GPU device is used for many-core computing. We use a GPU-aware MPI approach implementing a suitable smoothed aggregation multigrid to precondition an iterative distributed conjugate gradient solver for GPU computing. We evaluate the performance and scalability of different models, problem sizes, and computing resources. We take an efficient multi-core implementation as the reference to assess the computing performance of the numerical results. The numerical results show the advantages and limitations of using distributed many-core architectures to address structural mechanics problems.
(This article belongs to the Special Issue Data Structures for Graphics Processing Units (GPUs))
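Two of the ingredients mentioned in the abstract, a distributed dot product inside the conjugate gradient iteration and communication that works directly on GPU buffers, might look roughly like the hedged sketch below; it assumes a CUDA-aware MPI build and cuBLAS, and the partition size, device assignment, and neighbor ranks are illustrative.

    #include <mpi.h>
    #include <cublas_v2.h>
    #include <cuda_runtime.h>

    // Global dot product for the CG iteration: local cuBLAS dot on this rank's
    // partition, then an MPI_Allreduce of the scalar across all ranks.
    double distributed_dot(cublasHandle_t h, const double* d_x, const double* d_y,
                           int local_n, MPI_Comm comm) {
        double local = 0.0, global = 0.0;
        cublasDdot(h, local_n, d_x, 1, d_y, 1, &local);
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, comm);
        return global;   // e.g. r.r or p.Ap inside the CG loop
    }

    // Halo exchange passing device pointers straight to MPI (needs a CUDA-aware
    // MPI build), which avoids explicit staging through host memory.
    void exchange_halos(double* d_send, double* d_recv, int halo_n,
                        int left, int right, MPI_Comm comm) {
        MPI_Sendrecv(d_send, halo_n, MPI_DOUBLE, right, 0,
                     d_recv, halo_n, MPI_DOUBLE, left,  0,
                     comm, MPI_STATUS_IGNORE);
    }

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        cudaSetDevice(rank % 4);             // assumption: up to four GPUs per node
        cublasHandle_t handle;
        cublasCreate(&handle);

        const int local_n = 1 << 20;         // local partition size (assumption)
        double *d_x, *d_y;
        cudaMalloc(&d_x, local_n * sizeof(double));
        cudaMalloc(&d_y, local_n * sizeof(double));
        // ... fill d_x and d_y with this rank's partition of the vectors ...
        double rho = distributed_dot(handle, d_x, d_y, local_n, MPI_COMM_WORLD);
        (void)rho;

        cudaFree(d_x); cudaFree(d_y);
        cublasDestroy(handle);
        MPI_Finalize();
        return 0;
    }

The smoothed aggregation multigrid preconditioner described in the paper would sit around this kernel of operations; the sketch only shows the communication pattern that GPU-aware MPI makes convenient.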
