Current Trends in Computer Architecture and High Performance Computing (HPC) with Their Mathematical Foundations

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: closed (31 January 2022) | Viewed by 12277

Special Issue Editors


Guest Editor
CITIC Research Center, Universidade da Coruña, 15001 A Coruña, Spain
Interests: compilers; emerging memory technologies; low energy computing; high performance computing

Guest Editor
CITIC Research Center, Universidade da Coruña, A Coruña, Spain
Interests: high performance computing; big data; cloud computing; compilers; parallel algorithms; low-latency communications

Special Issue Information

Dear Colleagues,

The performance of computing devices has improved tremendously over the last decade. The design of computer architectures has adapted to the physical limitations hindering the traditional scaling of production processes, seeking new designs that exploit the ever-growing transistor count while respecting the power delivery and dissipation limits imposed by the so-called “power wall”.

In response to this architectural evolution, the software community has undertaken huge efforts to adapt their scientific and engineering computations to the new highly parallel, highly heterogeneous computational units. The quest for high performance computing has triggered algorithmic paradigm changes, as well as the development of more powerful automatic optimizers and compilers.

We invite researchers to submit papers related to modern architectures, algorithms, and code analysis and synthesis techniques, highlighting the mathematical concepts and applications that constitute the fundamental pillars of modern computer engineering. These mathematical concepts include, but are not limited to, combinatorics, graph theory, probability and statistics, operations research, linear and computational algebra, geometry, numerical analysis, and optimization.

Prof. Dr. Gabriel Rodríguez
Prof. Dr. Juan Touriño
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Processor, memory, storage systems, and interconnection network architecture
  • HPC programming models, systems, and infrastructures
  • Code generation, translation, transformation, and optimization
  • Optimizations for heterogeneous or specialized targets
  • Instruction, thread, and data-level parallelism
  • Architectural support and modeling
  • Effects of circuits or technology on architecture

Published Papers (4 papers)


Research

29 pages, 2147 KiB  
Article
Representing Integer Sequences Using Piecewise-Affine Loops
by Gabriel Rodríguez, Louis-Noël Pouchet and Juan Touriño
Mathematics 2021, 9(19), 2368; https://doi.org/10.3390/math9192368 - 24 Sep 2021
Viewed by 1233
Abstract
A formal, high-level representation of programs is typically needed for static and dynamic analyses performed by compilers. However, the source code of target applications is not always available in an analyzable form, e.g., to protect intellectual property. To reason about such applications, it becomes necessary to build models from observations of their execution. This paper details an algebraic approach which, taking as input the trace of memory addresses accessed by a single memory reference, synthesizes an affine loop with a single perfectly nested reference that generates the original trace. This approach is extended to support the synthesis of unions of affine loops, useful for minimally modeling traces generated by automatic transformations of polyhedral programs, such as tiling. The resulting system is capable of processing hundreds of gigabytes of trace data in minutes, minimally reconstructing 100% of the static control parts in PolyBench/C applications and 99.99% in the Pluto-tiled versions of these benchmarks. As an application example of the trace modeling method, trace compression is explored. The affine representations built for the memory traces of PolyBench/C codes achieve compression factors of the order of 10⁶ and 10³ with respect to gzip for the original and tiled versions of the traces, respectively.
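The core idea, recovering loop bounds and strides from a flat address stream, can be illustrated with a toy sketch for a single two-level nest (the function name is hypothetical; the paper's actual algorithm handles arbitrary nesting depths and unions of affine loops):

```python
def synthesize_affine_2d(trace):
    """Try to model a trace as addr = base + i*s1 + j*s2 with
    0 <= i < n1, 0 <= j < n2, iterated in row-major order.
    Returns (base, n1, s1, n2, s2), or None if no such loop fits."""
    if len(trace) < 2:
        return (trace[0], 1, 0, 1, 0) if trace else None
    diffs = [b - a for a, b in zip(trace, trace[1:])]
    s2 = diffs[0]                       # candidate inner stride
    n2 = 1                              # inner trip count = first run of s2
    for d in diffs:
        if d != s2:
            break
        n2 += 1
    if len(trace) % n2 != 0:
        return None
    n1 = len(trace) // n2
    # the jump between inner rows reveals the outer stride
    jump = diffs[n2 - 1] if len(diffs) >= n2 else 0
    s1 = jump + (n2 - 1) * s2
    # verify the candidate loop reproduces the full trace
    rebuilt = [trace[0] + i * s1 + j * s2
               for i in range(n1) for j in range(n2)]
    return (trace[0], n1, s1, n2, s2) if rebuilt == trace else None
```

For example, the trace of `A[i][j]` over a padded 4×3 array with a 40-byte row stride and 8-byte elements is recovered as `(base, 4, 40, 3, 8)`, while a non-affine stream is rejected.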

20 pages, 732 KiB  
Article
Evaluation of Clustering Algorithms on HPC Platforms
by Juan M. Cebrian, Baldomero Imbernón, Jesús Soto and José M. Cecilia
Mathematics 2021, 9(17), 2156; https://doi.org/10.3390/math9172156 - 04 Sep 2021
Cited by 2 | Viewed by 1790
Abstract
Clustering algorithms are among the most widely used kernels to generate knowledge from large datasets. These algorithms group a set of data elements (i.e., images, points, patterns, etc.) into clusters to identify patterns or common features of a sample. However, these algorithms are very computationally expensive, as they often involve the computation of expensive fitness functions that must be evaluated for all points in the dataset. This computational cost is even higher for fuzzy methods, where each data point may belong to more than one cluster. In this paper, we evaluate different parallelisation strategies on different heterogeneous platforms for state-of-the-art fuzzy clustering algorithms such as Fuzzy C-means (FCM), Gustafson–Kessel FCM (GK-FCM), and Fuzzy Minimals (FM). The experimental evaluation includes performance and energy trade-offs. Our results show that, depending on the computational pattern of each algorithm, its mathematical foundation, and the amount of data to be processed, each algorithm performs better on a different platform.
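As a rough illustration of why fuzzy methods cost more than hard clustering, here is a minimal NumPy sketch of plain Fuzzy C-means, where every point carries a membership in every cluster (names and parameters are illustrative, not the paper's optimized GPU kernels):

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=100, seed=0):
    """Minimal Fuzzy C-means: alternate between updating fuzzy
    memberships U (c x n) and cluster centers (c x dim)."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                  # memberships sum to 1 per point
    for _ in range(iters):
        W = U ** m                      # fuzzified weights
        centers = (W @ X) / W.sum(axis=1, keepdims=True)
        # squared distance from every center to every point: the
        # all-pairs term that dominates the computational cost
        d2 = ((centers[:, None, :] - X[None, :, :]) ** 2).sum(-1) + 1e-12
        U = d2 ** (-1.0 / (m - 1))
        U /= U.sum(axis=0)
    return centers, U
```

Note that the distance matrix couples all points with all clusters on every iteration, which is exactly the kind of dense, data-parallel workload the paper maps onto heterogeneous platforms.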

19 pages, 1183 KiB  
Article
OpenCNN: A Winograd Minimal Filtering Algorithm Implementation in CUDA
by Roberto L. Castro, Diego Andrade and Basilio B. Fraguela
Mathematics 2021, 9(17), 2033; https://doi.org/10.3390/math9172033 - 24 Aug 2021
Cited by 4 | Viewed by 4931
Abstract
Improving the performance of the convolution operation has become a key target for High Performance Computing (HPC) developers due to its prevalence in deep learning, applied mainly to video processing. The improvement is being pushed by both algorithmic and implementation innovations. Algorithmically, the convolution can be computed as it is mathematically enunciated, but other methods recast it as a Fast Fourier Transform (FFT) or a GEneral Matrix Multiplication (GEMM). In this latter group, the Winograd algorithm is a state-of-the-art variant that is especially suitable for small convolutions. In this paper, we present openCNN, an optimized CUDA C++ implementation of the Winograd convolution algorithm. Our approach achieves speedups of up to 1.76× on a Turing RTX 2080Ti and up to 1.85× on an Ampere RTX 3090 with respect to the Winograd convolution in cuDNN 8.2.0. OpenCNN is released as open-source software.
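For intuition, the minimal-filtering idea behind Winograd convolution can be shown in its smallest 1D instance, F(2,3), which produces two outputs of a 3-tap filter using 4 multiplications instead of the naive 6 (a textbook sketch in plain Python, not the paper's CUDA implementation):

```python
def winograd_f23(d, g):
    """Winograd minimal filtering F(2,3): two outputs of a 3-tap
    correlation over 4 inputs, using 4 multiplications instead of 6.
    Equivalent to y0 = d0*g0 + d1*g1 + d2*g2,
                  y1 = d1*g0 + d2*g1 + d3*g2."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # four products on transformed inputs and filter taps
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    # additions recombine the products into the two outputs
    return [m1 + m2 + m3, m2 - m3 - m4]
```

The filter-side factors can be precomputed once per filter, so in a CNN the savings in multiplications apply to every input tile, which is why the method pays off for the small 3×3 kernels common in deep learning.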

14 pages, 288 KiB  
Article
A Theoretical Model for Global Optimization of Parallel Algorithms
by Julian Miller, Lukas Trümper, Christian Terboven and Matthias S. Müller
Mathematics 2021, 9(14), 1685; https://doi.org/10.3390/math9141685 - 17 Jul 2021
Cited by 4 | Viewed by 2036
Abstract
With the quickly evolving hardware landscape of high-performance computing (HPC) and its increasing specialization, the implementation of efficient software applications becomes more challenging. This is especially prevalent for domain scientists and may hinder advances in large-scale simulation software. One idea to overcome these challenges is software abstraction. We present a model of parallel algorithms that allows for global optimization of their synchronization and dataflow and their optimal mapping to complex and heterogeneous architectures. The presented model strictly separates the structure of an algorithm from its executed functions. It utilizes a hierarchical decomposition of parallel design patterns as well-established building blocks for algorithmic structures and captures them in an abstract pattern tree (APT). A data-centric flow graph is constructed based on the APT, which acts as an intermediate representation for rich and automated structural transformations. We demonstrate the applicability of this model to three representative algorithms and show runtime speedups between 1.83× and 2.45× on a typical heterogeneous CPU/GPU architecture.
