High Performance Computing (HPC) Software Design

A special issue of Computation (ISSN 2079-3197). This special issue belongs to the section "Computational Engineering".

Deadline for manuscript submissions: closed (30 October 2016) | Viewed by 15065

Special Issue Editor


Dr. Harald Köstler
Guest Editor
Department Informatik, Friedrich-Alexander-Universität Erlangen-Nürnberg, Lehrstuhl für Systemsimulation, Cauerstr. 11, 91058 Erlangen, Germany
Interests: high performance computing; software design; computational science and engineering

Special Issue Information

Dear Colleagues,

High performance computing (HPC) spans a variety of application domains, ranging from image and video processing to simulation and computational science in several areas of natural science. As consumer devices and high-end systems become more powerful, HPC plays an increasingly important role in research and applications alike. Today, real-world application codes are often hand-tuned, which requires a huge engineering effort given the variety of codes in use. Simplifying the construction of HPC codes that deliver high performance has therefore become an important research topic. This Special Issue aims to show different approaches to HPC software design that increase the performance, productivity, or portability of application codes.

Topics of interest include several aspects of HPC codes:

  • performance optimization
  • auto-tuning and machine learning
  • software technology
  • code generation for GPUs, accelerators and distributed systems
  • applications in embedded systems
  • hardware/high-level synthesis
  • reaching exascale performance
  • static analysis and verification
  • scalability of numerical algorithms

Dr. Harald Köstler
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Computation is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (3 papers)


Research

Article
Evaluation of External Memory Access Performance on a High-End FPGA Hybrid Computer
by Konstantinos Kalaitzis, Evripidis Sotiriadis, Ioannis Papaefstathiou and Apostolos Dollas
Computation 2016, 4(4), 41; https://doi.org/10.3390/computation4040041 - 25 Oct 2016
Cited by 1 | Viewed by 3950
Abstract
The motivation of this research was to evaluate the main memory performance of a hybrid supercomputer such as the Convey HC-x, and to ascertain how the controller performs in several access scenarios, vis-à-vis hand-coded memory prefetches. Such memory patterns are very useful in stencil computations. The theoretical bandwidth of the memory of the Convey is compared with the results of our measurements. An accurate study of the memory subsystem is particularly useful for users when they are developing their application-specific personality. Experiments were performed to measure the bandwidth between the coprocessor and the memory subsystem. The experiments aimed mainly at measuring the read access speed of the memory from the Application Engines (FPGAs). Different ways of accessing data were used in order to find the most efficient way to access memory. This way was proposed for future work on the Convey HC-x. When performing a series of accesses to memory, non-uniform latencies occur. The Memory Controller of the Convey HC-x in the coprocessor attempts to cover this latency. We measure memory efficiency as the ratio of the number of memory accesses to the number of execution cycles. The result of this measurement converges to one in most cases. In addition, we performed experiments with hand-coded memory accesses. The analysis of the experimental results shows how the memory subsystem and Memory Controllers work. From this work we conclude that the memory controllers do an excellent job, largely because (transparently to the user) they seem to cache large amounts of data, and hence hand-coding is not needed in most situations.
(This article belongs to the Special Issue High Performance Computing (HPC) Software Design)
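
The memory-efficiency metric mentioned in the abstract above (memory accesses divided by execution cycles) can be illustrated with a minimal sketch. The function name and the counts below are hypothetical and chosen only for illustration; they are not taken from the article or from any Convey tooling.

```python
def memory_efficiency(num_accesses: int, num_cycles: int) -> float:
    """Return the ratio of memory accesses to execution cycles.

    A value close to 1.0 means that roughly one access completes per
    cycle, i.e. the latency is almost fully hidden.
    """
    if num_cycles <= 0:
        raise ValueError("num_cycles must be positive")
    return num_accesses / num_cycles


# Hypothetical counts, not measurements from the article.
print(memory_efficiency(990, 1000))  # 0.99
```

A ratio well below one would indicate cycles in which no access completes, which is the situation hand-coded prefetching is meant to avoid.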

Article
DiamondTorre Algorithm for High-Performance Wave Modeling
by Vadim Levchenko, Anastasia Perepelkina and Andrey Zakirov
Computation 2016, 4(3), 29; https://doi.org/10.3390/computation4030029 - 12 Aug 2016
Cited by 15 | Viewed by 5110
Abstract
Efficient algorithms for the numerical modeling of physical media are discussed. The computation rate of such problems is limited by memory bandwidth if implemented with traditional algorithms. The numerical solution of the wave equation is considered. A finite difference scheme with a cross stencil and a high order of approximation is used. The DiamondTorre algorithm is constructed with regard to the specifics of the memory hierarchy and parallelism of the GPGPU (general-purpose graphics processing unit). The advantages of these algorithms are a high level of data localization, as well as the property of asynchrony, which allows one to effectively utilize all levels of GPGPU parallelism. The computational intensity of the algorithm is greater than that of the best traditional algorithms with stepwise synchronization. As a consequence, it becomes possible to overcome the above-mentioned limitation. The algorithm is implemented with CUDA. For the scheme with the second order of approximation, a calculation performance of 50 billion cells per second is achieved. This exceeds the result of the best traditional algorithm by a factor of five.
(This article belongs to the Special Issue High Performance Computing (HPC) Software Design)
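
For orientation, the traditional stepwise scheme that the abstract above contrasts with DiamondTorre can be sketched in one dimension. This is not the DiamondTorre algorithm and not the authors' code: it is a plain second-order finite-difference update with a three-point stencil (the 1D analogue of a cross stencil), with a hypothetical function name and fixed zero boundaries assumed.

```python
def wave_step(u_prev, u_curr, courant):
    """Advance the 1D wave equation by one time step.

    u_prev, u_curr: field values at the two previous time levels.
    courant: the Courant number c * dt / dx (assumed <= 1 for stability).
    Boundary cells are held at zero (an assumption of this sketch).
    """
    c2 = courant * courant
    n = len(u_curr)
    u_next = [0.0] * n
    for i in range(1, n - 1):
        # Three-point stencil: second difference in space.
        laplacian = u_curr[i - 1] - 2.0 * u_curr[i] + u_curr[i + 1]
        # Second-order leapfrog update in time.
        u_next[i] = 2.0 * u_curr[i] - u_prev[i] + c2 * laplacian
    return u_next


# A unit pulse initially at rest; values are illustrative only.
u0 = [0.0, 0.0, 1.0, 0.0, 0.0]
print(wave_step(u0, u0, 1.0))  # [0.0, 1.0, -1.0, 1.0, 0.0]
```

Each call sweeps the whole array once per time step while doing only a few arithmetic operations per cell, which is why such stepwise schemes tend to be limited by memory bandwidth rather than by compute.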

Article
Automatic Generation of Massively Parallel Codes from ExaSlang
by Sebastian Kuckuk and Harald Köstler
Computation 2016, 4(3), 27; https://doi.org/10.3390/computation4030027 - 4 Aug 2016
Cited by 15 | Viewed by 5445
Abstract
Domain-specific languages (DSLs) have the potential to provide an intuitive interface for specifying problems and solutions for domain experts. Based on this, code generation frameworks can produce compilable source code. However, apart from optimizing execution performance, parallelization is key for pushing the limits in problem size and an essential ingredient for exascale performance. We discuss necessary concepts for the introduction of such capabilities in code generators. In particular, those for partitioning the problem to be solved and accessing the partitioned data are elaborated. Furthermore, possible approaches to expose parallelism to users through a given DSL are discussed. Moreover, we present the implementation of these concepts in the ExaStencils framework. In its scope, a code generation framework for highly optimized and massively parallel geometric multigrid solvers is developed. It uses specifications from its multi-layered external DSL ExaSlang as input. Based on a general version for generating parallel code, we develop and implement widely applicable extensions and optimizations. Finally, a performance study of generated applications is conducted on the JuQueen supercomputer.
(This article belongs to the Special Issue High Performance Computing (HPC) Software Design)