Advances in High Performance Computing and Scalable Software

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information Applications".

Deadline for manuscript submissions: closed (30 September 2024) | Viewed by 14500

Special Issue Editors


Prof. Dr. Lenore Mullin
Guest Editor
Department of Computer Science, University at Albany, Albany, NY 12222, USA
Interests: high performance computing; arrays; tensors; scalable software; optimizations

Prof. Dr. John L. Gustafson
Guest Editor
1. Department of Computer Science, National University of Singapore, Singapore City, Singapore
2. Vq Research, Inc., Mountain View, CA 94043, USA
Interests: parallel computing; numerical analysis

Special Issue Information

Dear Colleagues,

High-performance computing (HPC) and scalable software pose numerous challenges to developers: achieving scalable performance, ensuring numerical accuracy and bitwise reproducibility, and making diverse architectures easy to program through tools (OpenMP, OpenACC, etc.) and advanced compiler techniques that let application programmers exploit current and future hardware designs without being exposed to machine-specific details. New fields such as performance engineering are emerging to address some of these issues. However, without strong mathematical foundations and a commitment to co-designing all components of software and hardware systems, such problems will grow in number and complexity. Is it time to reformulate operating systems and memory management? Can existing languages still work in an era where massive HPC and AI data structures like arrays (tensors) must map automatically to a vast range of distributed memory designs? How can numerical accuracy and bitwise reproducibility be guaranteed on systems where the arithmetic and language standards do not specify bitwise-reproducible rounding? This Special Issue is addressed to all those who use mathematical foundations to seek portable and scalable ways to optimize hardware utilization without sacrificing programmer productivity, who use co-design to guarantee performance and accuracy, and who seek verification of designs. We can prove semantic designs, but can we prove operational designs? Can we prove that the outputs of two compilers on a single machine are equivalent? Can we predict performance automatically (using source code and a sufficient description of target hardware) to reduce the need for “ninja programmers”? Which issues that have so far been avoided should be studied to solve these problems? This Special Issue encourages visionary scientists to submit papers on mathematical foundations that address all of the topics above, as well as the following:

(1) Numerical accuracy and bitwise reproducibility across machines and languages;

(2) Verification of semantic and operational designs;

(3) Domain-specific machines, operating systems, and languages;

(4) Software tools to automate scalable performance on HPC machines without sacrificing reproducibility.
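
To make topic (1) concrete: floating-point addition is not associative, so a parallel reduction that groups the same operands differently can produce different bits. The following is a minimal sketch, assuming IEEE 754 double precision with round-to-nearest-even; the helper name `left_to_right_sum` is hypothetical and chosen only for illustration. It contrasts an order-dependent sequential sum with Python's `math.fsum`, which returns the correctly rounded sum and is therefore insensitive to operand order on typical platforms.

```python
import math
import random


def left_to_right_sum(values):
    """Plain sequential reduction; the result depends on the order of `values`."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same three operands, grouped in two different ways.
grouped_left = (0.1 + 0.2) + 0.3
grouped_right = 0.1 + (0.2 + 0.3)
print(grouped_left == grouped_right)  # the two groupings round differently

# Reordering the operands changes the sequential result as well:
# the 1.0 is absorbed when it is added to 1e16 before the cancellation.
absorbed = left_to_right_sum([1e16, 1.0, -1e16])
preserved = left_to_right_sum([1e16, -1e16, 1.0])
print(absorbed, preserved)

# math.fsum keeps exact partial sums and rounds once at the end,
# so any permutation of the operands gives the same double.
values = [0.1, 0.2, 0.3, 1e16, 1.0, -1e16]
shuffled = values[:]
random.Random(0).shuffle(shuffled)  # fixed seed so the run is repeatable
print(math.fsum(values) == math.fsum(shuffled))
```

An exactly rounded sum is only one possible route to reproducibility; fixing the reduction order is another, usually cheaper, option, at the price of constraining how the work can be parallelized.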

Prof. Dr. Lenore Mullin
Prof. Dr. John L. Gustafson
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information is available in MDPI's Special Issue policies.

Published Papers (6 papers)


Research

17 pages, 1369 KiB  
Article
Enabling Parallel Performance and Portability of Solid Mechanics Simulations Across CPU and GPU Architectures
by Nathaniel Morgan, Caleb Yenusah, Adrian Diaz, Daniel Dunning, Jacob Moore, Erin Heilman, Evan Lieberman, Steven Walton, Sarah Brown, Daniel Holladay, Russell Marki, Robert Robey and Marko Knezevic
Information 2024, 15(11), 716; https://doi.org/10.3390/info15110716 - 7 Nov 2024
Viewed by 689
Abstract
Efficiently simulating solid mechanics is vital across various engineering applications. As constitutive models grow more complex and simulations scale up in size, harnessing the capabilities of modern computer architectures has become essential for achieving timely results. This paper presents advancements in running parallel simulations of solid mechanics on multi-core CPUs and GPUs using a single-code implementation. This portability is made possible by the C++ matrix and array (MATAR) library, which interfaces with the C++ Kokkos library, enabling the selection of fine-grained parallelism backends (e.g., CUDA, HIP, OpenMP, pthreads, etc.) at compile time. MATAR simplifies the transition from Fortran to C++ and Kokkos, making it easier to modernize legacy solid mechanics codes. We applied this approach to modernize a suite of constitutive models and to demonstrate substantial performance improvements across different computer architectures. This paper includes comparative performance studies using multi-core CPUs along with AMD and NVIDIA GPUs. Results are presented using a hypoelastic–plastic model, a crystal plasticity model, and the viscoplastic self-consistent generalized material model (VPSC-GMM). The results underscore the potential of using the MATAR library and modern computer architectures to accelerate solid mechanics simulations. Full article
(This article belongs to the Special Issue Advances in High Performance Computing and Scalable Software)

24 pages, 830 KiB  
Article
On a Simplified Approach to Achieve Parallel Performance and Portability Across CPU and GPU Architectures
by Nathaniel Morgan, Caleb Yenusah, Adrian Diaz, Daniel Dunning, Jacob Moore, Erin Heilman, Calvin Roth, Evan Lieberman, Steven Walton, Sarah Brown, Daniel Holladay, Marko Knezevic, Gavin Whetstone, Zachary Baker and Robert Robey
Information 2024, 15(11), 673; https://doi.org/10.3390/info15110673 - 28 Oct 2024
Cited by 1 | Viewed by 1999
Abstract
This paper presents software advances to easily exploit computer architectures consisting of a multi-core CPU and CPU+GPU to accelerate diverse types of high-performance computing (HPC) applications using a single code implementation. The paper describes and demonstrates the performance of the open-source C++ matrix and array (MATAR) library that uniquely offers: (1) a straightforward syntax for programming productivity, (2) usable data structures for data-oriented programming (DOP) for performance, and (3) a simple interface to the open-source C++ Kokkos library for portability and memory management across CPUs and GPUs. The portability across architectures with a single code implementation is achieved by automatically switching between diverse fine-grained parallelism backends (e.g., CUDA, HIP, OpenMP, pthreads, etc.) at compile time. The MATAR library solves many longstanding challenges associated with easily writing software that can run in parallel on any computer architecture. This work benefits projects seeking to write new C++ codes while also addressing the challenges of quickly making existing Fortran codes performant and portable over modern computer architectures with minimal syntactical changes from Fortran to C++. We demonstrate the feasibility of readily writing new C++ codes and modernizing existing codes with MATAR to be performant, parallel, and portable across diverse computer architectures. Full article
(This article belongs to the Special Issue Advances in High Performance Computing and Scalable Software)

20 pages, 305 KiB  
Article
Revisiting Database Indexing for Parallel and Accelerated Computing: A Comprehensive Study and Novel Approaches
by Maryam Abbasi, Marco V. Bernardo, Paulo Váz, José Silva and Pedro Martins
Information 2024, 15(8), 429; https://doi.org/10.3390/info15080429 - 24 Jul 2024
Viewed by 1880
Abstract
While the importance of indexing strategies for optimizing query performance in database systems is widely acknowledged, the impact of rapidly evolving hardware architectures on indexing techniques has been an underexplored area. As modern computing systems increasingly leverage parallel processing capabilities, multi-core CPUs, and specialized hardware accelerators, traditional indexing approaches may not fully capitalize on these advancements. This comprehensive experimental study investigates the effects of hardware-conscious indexing strategies tailored for contemporary and emerging hardware platforms. Through rigorous experimentation on a real-world database environment using the industry-standard TPC-H benchmark, this research evaluates the performance implications of indexing techniques specifically designed to exploit parallelism, vectorization, and hardware-accelerated operations. By examining approaches such as cache-conscious B-Tree variants, SIMD-optimized hash indexes, and GPU-accelerated spatial indexing, the study provides valuable insights into the potential performance gains and trade-offs associated with these hardware-aware indexing methods. The findings reveal that hardware-conscious indexing strategies can significantly outperform their traditional counterparts, particularly in data-intensive workloads and large-scale database deployments. Our experiments show improvements ranging from 32.4% to 48.6% in query execution time, depending on the specific technique and hardware configuration. However, the study also highlights the complexity of implementing and tuning these techniques, as they often require intricate code optimizations and a deep understanding of the underlying hardware architecture. Additionally, this research explores the potential of machine learning-based indexing approaches, including reinforcement learning for index selection and neural network-based index advisors. 
While these techniques show promise, with performance improvements of up to 48.6% in certain scenarios, their effectiveness varies across different query types and data distributions. By offering a comprehensive analysis and practical recommendations, this research contributes to the ongoing pursuit of database performance optimization in the era of heterogeneous computing. The findings inform database administrators, developers, and system architects on effective indexing practices tailored for modern hardware, while also paving the way for future research into adaptive indexing techniques that can dynamically leverage hardware capabilities based on workload characteristics and resource availability. Full article
(This article belongs to the Special Issue Advances in High Performance Computing and Scalable Software)
19 pages, 1277 KiB  
Article
Top-Down Models across CPU Architectures: Applicability and Comparison in a High-Performance Computing Environment
by Fabio Banchelli, Marta Garcia-Gasulla and Filippo Mantovani
Information 2023, 14(10), 554; https://doi.org/10.3390/info14100554 - 10 Oct 2023
Cited by 1 | Viewed by 1998
Abstract
Top-Down models are defined by hardware architects to provide information on the utilization of different hardware components. The target is to isolate the users from the complexity of the hardware architecture while giving them insight into how efficiently the code uses the resources. In this paper, we explore the applicability of four Top-Down models defined for different hardware architectures powering state-of-the-art HPC clusters (Intel Skylake, Fujitsu A64FX, IBM Power9, and Huawei Kunpeng 920) and propose a model for AMD Zen 2. We study a parallel CFD code used for scientific production to compare these five Top-Down models. We evaluate the level of insight achieved, the clarity of the information, the ease of use, and the conclusions each allows us to reach. Our study indicates that the Top-Down model makes it very difficult for a performance analyst to spot inefficiencies in complex scientific codes without delving deep into micro-architecture details. Full article
(This article belongs to the Special Issue Advances in High Performance Computing and Scalable Software)

29 pages, 1605 KiB  
Article
Energy-Efficient Parallel Computing: Challenges to Scaling
by Alexey Lastovetsky and Ravi Reddy Manumachu
Information 2023, 14(4), 248; https://doi.org/10.3390/info14040248 - 20 Apr 2023
Cited by 5 | Viewed by 3595
Abstract
The energy consumption of Information and Communications Technology (ICT) presents a new grand technological challenge. The two main approaches to tackle the challenge include the development of energy-efficient hardware and software. The development of energy-efficient software employing application-level energy optimization techniques has become an important category owing to the paradigm shift in the composition of digital platforms from single-core processors to heterogeneous platforms integrating multicore CPUs and graphics processing units (GPUs). In this work, we present an overview of application-level bi-objective optimization methods for energy and performance that address two fundamental challenges, non-linearity and heterogeneity, inherent in modern high-performance computing (HPC) platforms. Applying the methods requires energy profiles of the application’s computational kernels executing on the different compute devices of the HPC platform. Therefore, we summarize the research innovations in the three mainstream component-level energy measurement methods and present their accuracy and performance tradeoffs. Finally, scaling the optimization methods for energy and performance is crucial to achieving energy efficiency objectives and meeting quality-of-service requirements in modern HPC platforms and cloud computing infrastructures. We introduce the building blocks needed to achieve this scaling and conclude with the challenges to scaling. Briefly, two significant challenges are described, namely fast optimization methods and accurate component-level energy runtime measurements, especially for components running on accelerators. Full article
(This article belongs to the Special Issue Advances in High Performance Computing and Scalable Software)

20 pages, 752 KiB  
Article
Realizing Mathematics of Arrays Operations as Custom Architecture Hardware-Software Co-Design Solutions
by Ian Andrew Grout and Lenore Mullin
Information 2022, 13(11), 528; https://doi.org/10.3390/info13110528 - 4 Nov 2022
Cited by 1 | Viewed by 2476
Abstract
In embedded electronic system applications being developed today, complex datasets are required to be obtained, processed, and communicated. These can be from various sources such as environmental sensors, still image cameras, and video cameras. Once obtained and stored in electronic memory, the data is accessed and processed using suitable mathematical algorithms. How the data are stored, accessed, processed, and communicated will impact on the cost to process the data. Such algorithms are traditionally implemented in software programs that run on a suitable processor. However, different approaches can be considered to create the digital system architecture that would consist of the memory, processing, and communications operations. When considering the mathematics at the centre of the design making processes, this leads to system architectures that can be optimized for the required algorithm or algorithms to realize. Mathematics of Arrays (MoA) is a class of operations that supports n-dimensional array computations using array shapes and indexing of values held within the array. In this article, the concept of MoA is considered for realization in software and hardware using Field Programmable Gate Array (FPGA) and Application Specific Integrated Circuit (ASIC) technologies. The realization of MoA algorithms will be developed along with the design choices that would be required to map a MoA algorithm to hardware, software or hardware-software co-designs. Full article
(This article belongs to the Special Issue Advances in High Performance Computing and Scalable Software)
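
The abstract above describes n-dimensional array computation in terms of array shapes and indexing. As a rough illustration of that idea only, and not the article's own formulation or notation, the sketch below maps a full index vector to an offset in flat storage using nothing but the shape. The row-major layout and the function names `flat_offset` and `element_at` are assumptions made for this example.

```python
from math import prod


def flat_offset(shape, index):
    """Map a full index vector to a row-major offset, using only the shape."""
    if len(shape) != len(index):
        raise ValueError("index must have one component per dimension")
    offset = 0
    for extent, i in zip(shape, index):
        if not 0 <= i < extent:
            raise IndexError(f"index {i} out of range for extent {extent}")
        # Horner-style accumulation: shift by this dimension's extent, then add.
        offset = offset * extent + i
    return offset


def element_at(flat_data, shape, index):
    """Read one element of an n-dimensional array held in flat storage."""
    if len(flat_data) != prod(shape):
        raise ValueError("storage size does not match the shape")
    return flat_data[flat_offset(shape, index)]


shape = (2, 3, 4)
data = list(range(prod(shape)))  # element value equals its own offset
print(element_at(data, shape, (1, 2, 3)))  # → 23
```

Because the mapping depends only on the shape and the index, the same arithmetic could in principle be evaluated in software or fixed into an address-generation circuit, which is the kind of design choice the abstract alludes to.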
