High-Performance Computing (HPC) and Computer Architecture

A special issue of Computers (ISSN 2073-431X).

Deadline for manuscript submissions: 30 November 2026

Editors


Guest Editor
School of Computing and Engineering, Quinnipiac University, Hamden, CT 06518, USA
Interests: machine learning; computer networks; high-performance computing

Guest Editor
School of Computing and Engineering, Quinnipiac University, Hamden, CT 06518, USA
Interests: AI for science; multi-modal LLM systems

Special Issue Information

Dear Colleagues, 

High-Performance Computing (HPC) and Computer Architecture are undergoing rapid transformation, shaped by large-scale applications, heterogeneous accelerators, and the convergence of cloud computing and supercomputing. At the same time, the rise of AI-driven workloads such as large language model (LLM) training, Mixture-of-Experts (MoE) systems, and other AI-HPC infrastructures is redefining performance requirements and architectural priorities. This Special Issue seeks original research, surveys, and applied contributions that advance our understanding of performance, scalability, and system design across these evolving domains.

Topics of interest include parallel and distributed systems, supercomputing and cluster infrastructures, multi-core and GPU computing, hardware acceleration, quantum-AI computing, and energy-efficient system design. We also welcome work that addresses resource allocation, workload management, programming models, memory and interconnect architectures, and cloud or data center integration. Both theoretical advances and applied research are welcome, provided they deliver clear insights into performance optimization or system-level innovation.

By integrating contributions from both traditional HPC and emerging AI-HPC communities, this Special Issue aims to provide a comprehensive perspective on next-generation computing platforms and inspire solutions for scalable, efficient, and intelligent computation.

Prof. Dr. Taskin Kocak
Dr. Ron (Rongyu) Lin
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-anonymized peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Computers is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • high-performance and parallel computing
  • computer architecture
  • supercomputing and cluster systems
  • multi-core and GPU computing
  • performance optimization and workload management
  • energy efficiency and scalability
  • cloud computing and data centers
  • hardware acceleration
  • AI-HPC systems and infrastructures
  • hybrid quantum–AI computing

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information is available in MDPI's Special Issue policies.

Published Papers (2 papers)


Research

20 pages, 2304 KB  
Article
AGP-GEMM: Adaptive Grouping and Partitioning Framework for Accelerating Small and Irregular Matrices on CPUs
by Hongzhe Zhou, Lu Lu, Haibiao Yang and Yu Zhang
Computers 2026, 15(4), 223; https://doi.org/10.3390/computers15040223 - 3 Apr 2026
Abstract
General Matrix Multiplication (GEMM) is a fundamental computational kernel in scientific computing, serving as the foundation for numerous complex tasks. However, in practical applications, the performance of GEMM is often constrained by irregular matrix dimensions and the diversity of hardware architectures. In particular, when processing small and irregular matrices, GEMM typically exhibits reduced computational efficiency. To address these challenges, this paper proposes a GEMM acceleration method based on an adaptive core grouping strategy. The method consists of two key components: a core grouping mechanism that alleviates workload imbalance among multi-core CPUs, and an adaptive block partitioning algorithm that dynamically selects optimal tiling schemes according to the matrix dimensions, achieving both load balance and cache-friendly data access. Experimental results on the Kunpeng CPU platform demonstrate that the proposed method achieves significant performance improvements compared to the Kunpeng KML math library, reaching a peak acceleration of up to 2.1× and an average speedup of 1.64×. These results validate the effectiveness and efficiency of the proposed approach in handling small and irregular matrix computation scenarios.
(This article belongs to the Special Issue High-Performance Computing (HPC) and Computer Architecture)
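
As an illustration of what selecting a tiling scheme according to the matrix dimensions can look like, the following is a minimal Python sketch. It is not the algorithm from the paper: the function name `choose_core_grid` and the cost heuristic (smallest largest tile first, then the squarest tile shape) are assumptions introduced here for illustration only.

```python
from math import ceil


def choose_core_grid(m, n, cores):
    """Hypothetical helper: split an m x n output matrix across `cores` workers.

    Tries every (rows, cols) factorization of `cores` and keeps the one whose
    largest tile is smallest (a proxy for load balance), breaking ties by the
    smallest tile_m + tile_n (a proxy for a squarer, cache-friendlier tile).
    """
    best_cost = None
    best_grid = None
    for rows in range(1, cores + 1):
        if cores % rows != 0:
            continue  # only exact factorizations use every core
        cols = cores // rows
        tile_m = ceil(m / rows)  # rows of the output handled by one core
        tile_n = ceil(n / cols)  # columns of the output handled by one core
        cost = (tile_m * tile_n, tile_m + tile_n)
        if best_cost is None or cost < best_cost:
            best_cost = cost
            best_grid = (rows, cols)
    return best_grid
```

Under this heuristic, a short and wide output such as 8 × 512 on 8 cores is split only along columns, a tall and narrow one only along rows, and a square one into a square grid of cores.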

22 pages, 831 KB  
Article
Energy-Efficient Dual-Core RISC-V Architecture for Edge AI Acceleration with Dynamic MAC Unit Reuse
by Cristian Andy Tanase
Computers 2026, 15(4), 219; https://doi.org/10.3390/computers15040219 - 1 Apr 2026
Abstract
This paper presents a dual-core RISC-V architecture designed for energy-efficient AI acceleration at the edge, featuring dynamic MAC unit sharing, dynamic frequency scaling (DFS), and FIFO-based resource arbitration. The system comprises two RISC-V cores that compete for shared computational resources, namely a single Multiply–Accumulate (MAC) unit and a shared external memory subsystem, governed by a channel-based arbitration mechanism with CPU-priority semantics, while each core maintains private instruction and data caches. The architecture implements a tightly coupled Neural Processing Unit (NPU) with CONV, GEMM, and POOL operations that execute opportunistically in the background when the MAC unit is available. DFS with three levels (100/200/400 MHz) is applied to the shared MAC unit, allowing the dynamic acceleration of CNN workloads. The arbitration mechanism uses SystemC sc_fifo channels with CPU-priority polling, ensuring that CPU execution is minimally impacted by background AI processing while the NPU makes progress during idle MAC slots. The NPU supports 3 × 3 convolutions, matrix multiplication (GEMM) with 10 × 10 tiles, and pooling operations. The implementation is cycle-accurate in SystemC, targeting FPGA deployment. Experimental evaluation demonstrates that the dual-core architecture achieves 1.87× speedup with 93.5% efficiency for parallel workloads, while DFS enables 70% power reduction at low frequency. The system successfully executes simultaneous CPU and AI workloads, with CPU-priority arbitration ensuring no CPU starvation under contention. The proposed design offers a practical solution for embedded AI applications requiring both general-purpose computation and neural network acceleration, validated through comprehensive SystemC simulation on modern FPGA platforms.
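
The speedup and efficiency figures quoted in the abstract above are consistent with the standard definition of parallel efficiency, speedup divided by the number of cores. The short Python check below assumes that this is the definition the authors used. The function name `parallel_efficiency` is introduced here for illustration.

```python
def parallel_efficiency(speedup, cores):
    """Parallel efficiency under the usual definition: speedup per core."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


# Figures from the abstract: 1.87x speedup on a dual-core design.
efficiency = parallel_efficiency(1.87, 2)
print(f"{efficiency:.1%}")
```

With a 1.87× speedup on two cores, this gives 1.87 / 2 = 0.935, matching the reported 93.5%.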