Article

Analyzing the Performance of the S3 Object Storage API for HPC Workloads

by Frank Gadban 1,* and Julian Kunkel 2,*
1 MIN Faculty, University of Hamburg, 20146 Hamburg, Germany
2 Institute of Computer Science, Faculty of Mathematics and Computer Science, Georg-August-Universität Göttingen/GWDG, 37018 Göttingen, Germany
* Authors to whom correspondence should be addressed.
Academic Editors: Antonio J. Pena and Pedro Valero-Lara
Appl. Sci. 2021, 11(18), 8540; https://doi.org/10.3390/app11188540
Received: 19 July 2021 / Revised: 3 September 2021 / Accepted: 9 September 2021 / Published: 14 September 2021
(This article belongs to the Special Issue State-of-the-Art High-Performance Computing and Networking)
The line between HPC and the Cloud is blurring: performance remains the main driver in HPC, while cloud storage systems are assumed to offer low latency, high throughput, high availability, and scalability. The Simple Storage Service (S3) has emerged as the de facto API for object storage in the Cloud. This paper examines whether the S3 API is already a viable alternative for HPC access patterns in terms of performance, or whether further performance advancements are necessary. For this purpose: (a) We extend two common HPC I/O benchmarks, the IO500 and MD-Workbench, to quantify the performance of the S3 API. We perform the analysis on the Mistral supercomputer by launching the enhanced benchmarks against different S3 implementations: on-premises (Swift, MinIO) and in the Cloud (Google, IBM…). We find that these implementations do not yet meet the demanding performance and scalability expectations of HPC workloads. (b) We aim to identify the cause of the performance loss by systematically replacing parts of a popular S3 client library with lightweight replacements of lower stack components. The resulting S3Embedded library is highly scalable and leverages the shared cluster file systems of HPC infrastructure to accommodate arbitrary S3 client applications. A second library, S3remote, uses TCP/IP for communication instead of HTTP; it provides a single local S3 gateway on each node. By broadening the scope of the IO500, this research enables the community to track the performance growth of S3 and encourages the sharing of best practices for performance optimization. The analysis also shows that, at the storage level, Cloud and HPC performance can converge over time by using a high-performance S3 library such as S3Embedded.
Keywords: HPC; cloud; convergence; storage; AWS S3; S3Embedded
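The abstract describes S3Embedded as serving S3 client calls from the shared cluster file system rather than from a remote HTTP service. The sketch below illustrates that general idea only. It is not the authors' library: the class `FileBackedObjectStore`, its method names, and the choice to percent-encode keys into flat file names are all assumptions made for illustration.

```python
import os
import tempfile
from urllib.parse import quote, unquote


class FileBackedObjectStore:
    """Hypothetical S3-like store: one file per object under a root directory.

    On an HPC system the root would be a path on the shared file system,
    so every node sees the same buckets and objects. Keys such as "." or
    ".." and concurrent writers are not handled in this sketch.
    """

    def __init__(self, root):
        self.root = root

    def _path(self, bucket, key):
        # Percent-encode the key so "a/b" becomes a single flat file name.
        return os.path.join(self.root, bucket, quote(key, safe=""))

    def create_bucket(self, bucket):
        os.makedirs(os.path.join(self.root, bucket), exist_ok=True)

    def put_object(self, bucket, key, data):
        with open(self._path(bucket, key), "wb") as f:
            f.write(data)

    def get_object(self, bucket, key):
        with open(self._path(bucket, key), "rb") as f:
            return f.read()

    def list_objects(self, bucket):
        names = os.listdir(os.path.join(self.root, bucket))
        return sorted(unquote(name) for name in names)

    def delete_object(self, bucket, key):
        os.remove(self._path(bucket, key))


def demo():
    # A temporary directory stands in for the shared file system mount.
    with tempfile.TemporaryDirectory() as root:
        store = FileBackedObjectStore(root)
        store.create_bucket("results")
        store.put_object("results", "run1/output.dat", b"42")
        data = store.get_object("results", "run1/output.dat")
        keys = store.list_objects("results")
        store.delete_object("results", "run1/output.dat")
        return data, keys, store.list_objects("results")
```

Calling `demo()` writes one object, reads it back, lists the bucket, and deletes the object. No network stack is involved at any point, which is the property the abstract attributes to the file-system-backed approach.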

MDPI and ACS Style

Gadban, F.; Kunkel, J. Analyzing the Performance of the S3 Object Storage API for HPC Workloads. Appl. Sci. 2021, 11, 8540. https://doi.org/10.3390/app11188540
