Article
Peer-Review Record

A Case Study on Virtual HPC Container Clusters and Machine Learning Applications

Appl. Sci. 2025, 15(13), 7433; https://doi.org/10.3390/app15137433
by Piotr Krogulski and Tomasz Rak *
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 15 May 2025 / Revised: 24 June 2025 / Accepted: 30 June 2025 / Published: 2 July 2025
(This article belongs to the Special Issue Novel Insights into Parallel and Distributed Computing)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

1. The paper’s present flow is hard to follow because system design, code listings, and performance data are interleaved, section numbering drifts (“Section gives the technical specifics…”, line 70), and multi-page Dockerfiles interrupt the narrative. Later, a 400-line Dockerfile appears in the middle of the text, and multi-page results tables precede the methodological explanation that produced them. These jumps make it difficult for readers to track the argument.

2. The authors claim that container solutions are “lightweight and energy efficient,” but this claim is not supported by power-consumption or resource-utilization data; the paper lacks any energy or resource-utilization analysis.

3. The lack of a non-container baseline makes it impossible to determine the additional overhead of containerization and leaves the conclusion that containers do not affect performance unsupported.

4. The paper provides incomplete information about the hardware and software environments of the experiment. Please include the host CPU model, core count, RAM, storage, host OS, and the Docker, OpenMPI, MPICH, TensorFlow, and Horovod versions.

5. Multiple pages of Dockerfile/Keras code laced with layer definitions and training parameters are hard to digest. It is recommended that the authors replace (or at least supplement) the lengthy code listings with a visual diagram of the network structure so that the reader can quickly understand the model.

6. Please briefly discuss the selection of threshold parameters, for example in lines 838-851.

The paper’s current structure is difficult to follow. Extensive blocks of Dockerfile and Keras code inserted in the middle of the text disrupt the flow and overwhelm the reader. The organization lacks clarity, making it hard to identify the main narrative thread.

The overall delivery of the article would also benefit from improvements in clarity and language. For example, line 48: "Important is verify its operation, examine its performance, and demonstrate its practical application."

I encourage the authors to address the methodological and presentation issues listed above.

Author Response

Comments 1: The paper’s present flow is hard to follow because system design, code listings, and performance data are interleaved, section numbering drifts (“Section gives the technical specifics…”, line 70), and multi-page Dockerfiles interrupt the narrative. Later, a 400-line Dockerfile appears in the middle of the text, and multi-page results tables precede the methodological explanation that produced them. These jumps make it difficult for readers to track the argument.

 

Response 1: Thank you for this valuable remark. We agree that long code listings and broken references disrupted the flow of reading. Therefore, the full Dockerfiles (Sections 3.1.1 & 3.1.2) and the complete result tables have been moved to Appendix A to improve narrative clarity.

 

 

Comments 2: The authors claim that container solutions are “lightweight and energy efficient,” but this claim is not supported by power-consumption or resource-utilization data; the paper lacks any energy or resource-utilization analysis.

 

Comments 4: The paper provides incomplete information about the hardware and software environments of the experiment. Please include the host CPU model, core count, RAM, storage, host OS, and the Docker, OpenMPI, MPICH, TensorFlow, and Horovod versions.

 

Response 2 and 4: Thank you for pointing this out. We have added detailed information about the hardware and software environment in a new table titled "Measurement environment configuration and parameters", including CPU model, core count, memory, storage, host OS, Docker, OpenMPI, MPICH, TensorFlow, and Horovod versions.

 

 

Comments 3: The lack of a non-container baseline makes it impossible to determine the additional overhead of containerization and leaves the conclusion that containers do not affect performance unsupported.

 

Response 3: We agree, and the native baseline measurements (see *Table 3*) confirm that Docker introduces ≤2% latency overhead, consistent with the findings of Kononowicz & Czarnul. This comparison and the corresponding justification have been added to the revised manuscript.

 

 

Comments 5: Multiple pages of Dockerfile/Keras code laced with layer definitions and training parameters are hard to digest. It is recommended that the authors replace (or at least supplement) the lengthy code listings with a visual diagram of the network structure so that the reader can quickly understand the model.

 

Response 5: We have included a diagram visualizing the model topology to improve accessibility and reduce reliance on lengthy code listings.

 

 

Comments 6: Please briefly discuss the selection of threshold parameters, for example in lines 838-851.

 

Response 6: A short rationale has been added to explain the selection of the threshold percentile:

\textcolor{red}{The value \texttt{normal\_percentile=0.99} was chosen after a grid-search (0.90–0.995) that maximised F1-score on a hand-labelled validation subset; lower percentiles produced excessive false positives, whereas 0.995 missed short spikes.}
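For readers who want to reproduce this kind of selection procedure, a minimal sketch of a percentile grid search is given below. It assumes reconstruction errors computed on normal training data plus a hand-labelled validation set; the function names, variable names, and grid values are illustrative and not taken from the manuscript's code.

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 from binary arrays, computed directly to avoid extra dependencies."""
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_percentile(train_errors, val_errors, val_labels,
                      grid=(0.90, 0.95, 0.99, 0.995)):
    """Return the percentile (and its F1) whose threshold over the
    training-set reconstruction errors best separates the labelled
    validation set. Illustrative sketch, not the authors' code."""
    best_p, best_f1 = grid[0], -1.0
    for p in grid:
        threshold = np.quantile(train_errors, p)      # threshold from "normal" data
        preds = (val_errors > threshold).astype(int)  # error above threshold -> anomaly
        score = f1_score(val_labels, preds)
        if score > best_f1:
            best_p, best_f1 = p, score
    return best_p, best_f1
```

A low percentile flags too many normal sequences (false positives), while a percentile that is too high misses short spikes, which is exactly the trade-off the added manuscript text describes.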

 

Comments 7: The overall delivery of the article would also benefit from improvements in clarity and language. For example, line 48: "Important is verify its operation, examine its performance, and demonstrate its practical application."

 

Response 7: Thank you for catching the stylistic error. The sentence has been rephrased.

\textcolor{red}{It is therefore essential to verify the cluster’s operation, assess its performance, and demonstrate a practical application.}

Reviewer 2 Report

Comments and Suggestions for Authors

The authors bring detailed instructions and considerations on how to use Docker container clusters for High Performance Computing (HPC) environments. They specifically address the application of such a virtual HPC cluster environment for anomaly detection in system logs using LSTM autoencoder neural networks. They evaluate the performance and identify some drawbacks in resource management.


Concerning line 10, I would argue that the application of Docker containers for HPC does not represent an innovative application, as there have been papers on the subject since 2014. Please rather put emphasis on the relevancy of such an approach for today's research directions.

Also, please add more references concerning Docker for HPC to Related work, especially recent, like https://doi.org/10.1007/978-3-030-34356-9_5, https://doi.org/10.1007/s11227-022-04848-y, https://doi.org/10.1007/s10586-022-03798-7, and others.

Either remove reference [1], because now it seems like unnecessary self-citation, or add more references to support your claim that this is an emerging field in line 62.

In lines 63-72 use Section 2 instead of second section 2, and do the same for other sections.

There are entire paragraphs of text repeated in subsections 3.1 and 4.1, particularly lines 258-270, 312-359, 627-674. Please revise the organization of the paper to avoid the repetition.

The directive EXPOSE mentioned in line 462 is missing at the end of Listing 1.

Line 569 should be joined with the previous line.

The paper would greatly benefit from more visualization. I suggest several figures:
- architecture of the whole system
- architecture of the neural network, from lines 901-929, with details on layers, inputs and outputs, like in reference [37]
- a flow diagram from lines 963-...

Why did the authors choose to do the experimental setup on one computer instead of on several?

The explanation of procedures and listings is very thorough, but it would be improved with the above-mentioned additional figures, and any others the authors have ideas for.
The discussion of results and conclusions are well written and address the questions and doubts that arise while reading the paper. 

Comments on the Quality of English Language

There are minor problems with the English language, so please do a thorough proofreading.

Author Response

Comments 1: Concerning line 10, I would argue that the application of Docker containers for HPC does not represent an innovative application, as there have been papers on the subject since 2014. Please rather put emphasis on the relevancy of such an approach for today's research directions.

 

Response 1: Thank you for this comment. We agree that the idea itself is no longer a novelty. However, its current applicability remains crucial. Therefore, the introductory sentence has been rephrased to emphasize the significance of containerization for today’s heterogeneous and AI-centered research workloads, rather than its historical "innovation".

 

\textcolor{red}{While the first Docker-enabled HPC studies date back several years, the approach remains highly relevant today because modern AI-driven science demands portable, reproducible software stacks deployable across heterogeneous accelerator-rich clusters.}

 

 

Comments 2: Also, please add more references concerning Docker for HPC to Related work, especially recent, like https://doi.org/10.1007/978-3-030-34356-9_5, https://doi.org/10.1007/s11227-022-04848-y, https://doi.org/10.1007/s10586-022-03798-7, and others.

 

Response 2: Thank you for the suggestion. We have extended the Related Work section to include these recent references, which broaden the perspective on container-based approaches in HPC research.

 

 

Comments 3: Either remove reference [1], because now it seems like unnecessary self-citation, or add more references to support your claim that this is an emerging field in line 62.

 

Response 3: Thank you for the observation. We decided to retain reference [1] but supplemented it with additional citations and reworded the claim to highlight the ongoing momentum of containerization in the exascale-AI era.

 

 

Comments 4: In lines 63-72 use Section 2 instead of second section 2, and do the same for other sections.

 

Response 4: Thank you. All incorrect section references have been corrected to a consistent “Section n” format throughout the manuscript.

 

 

Comments 5: There are entire paragraphs of text repeated in subsections 3.1 and 4.1, particularly lines 258-270, 312-359, 627-674. Please revise the organization of the paper to avoid the repetition.

 

Response 5: We agree. The repeated fragments have been removed, and the structure of the relevant chapters has been simplified to eliminate redundancy and improve clarity.

 

 

Comments 6: The directive EXPOSE mentioned in line 462 is missing at the end of Listing 1. Line 569 should be joined with the previous line.

 

Response 6: Thank you. The missing EXPOSE directive has been added, and the formatting of line 569 has been corrected by joining it with the preceding line.

 

 

Comments 7: The paper would greatly benefit from more visualization. I suggest several figures:
- architecture of the whole system
- architecture of the neural network, from lines 901-929, with details on layers, inputs and outputs, like in reference [37]
- a flow diagram from lines 963-...

 

Response 7: We fully agree. We have added illustrations, including the full system architecture, a diagram of the neural network (with layer and I/O details), and a flow diagram illustrating the execution pipeline.

 

 

Comments 8: Why did the authors choose to do the experimental setup on one computer instead of on several?

 

Response 8: Thank you for the question. We have clarified the hardware configuration used for the experiments in a dedicated table (Measurement environment configuration and parameters). The use of a single machine was a deliberate choice to isolate container overhead without introducing variability from inter-node communication.

Reviewer 3 Report

Comments and Suggestions for Authors

This paper presents a case study on using Docker containers to build lightweight, flexible HPC environments. The study highlights Docker’s practical potential for distributed ML tasks and its value in improving HPC accessibility where traditional resources are limited. However, the presentation of this paper should be strengthened.

1) The manuscript would benefit from a more in-depth comparison with traditional HPC setups or container orchestration systems such as Kubernetes. Even a basic performance benchmark would strengthen the argument.

2) Ensure that all figures are clearly labeled and include descriptive captions. This will help readers understand the experimental setup and results more easily.

3) While the anomaly detection example is useful, a more complex use case could better demonstrate the cluster's scalability and performance.

4) The literature search is weak, the references are outdated, and more recent research is required.

5) Typographical issues in the title and throughout the text (e.g., “ACaseStudy” should be “A Case Study”).

6) Some details, such as network configuration and resource allocation among containers, could be more clearly specified.

Comments on the Quality of English Language

The overall quality of English is acceptable; however, there are several instances of awkward phrasing, missing spaces (e.g., in the title), and minor grammatical issues throughout the manuscript. A thorough proofreading and language polish are recommended to improve clarity and readability.

Author Response

Comments 1: The manuscript would benefit from a more in-depth comparison with traditional HPC setups or container orchestration systems such as Kubernetes. Even a basic performance benchmark would strengthen the argument.

 

Response 1: Thank you for this suggestion. We agree that a comparison with Kubernetes would strengthen the conclusion. However, the focus of the manuscript is on Docker and its configuration. No experimental setup was prepared for other container orchestration systems in this study.

 

 

Comments 2: Ensure that all figures are clearly labeled and include descriptive captions. This will help readers understand the experimental setup and results more easily.

 

Response 2: All three figures now include axis labels, data series tags, and detailed legends to clearly explain the experimental setup and results.

 

 

Comments 3: While the anomaly detection example is useful, a more complex use case could better demonstrate the cluster's scalability and performance.

 

Response 3: We agree. However, the primary goal of the paper is to demonstrate how to set up a useful and reproducible environment. In future work, we plan to publish results involving classification models to evaluate scalability in more complex scenarios.

 

 

Comments 4: The literature search is weak, and the reference is outdated, and more recent research is required

 

Response 4: Thank you for pointing this out. We have extended the Related Work section to include recent publications and improve the coverage of current research.

 

 

Comments 5: Typographical issues in the title and throughout the text (e.g., “ACaseStudy” should be “A Case Study”).

 

Response 5: Thank you for your attention to detail. However, we could not reproduce these typographical issues. In both the submitted PDF and the revised version, the spacing and formatting of the title and main text appear correct.

 

 

Comments 6: Some details, such as network configuration and resource allocation among containers, could be more clearly specified.

 

Response 6: We agree—additional details regarding network setup and resource allocation have been added to the configuration table.

Reviewer 4 Report

Comments and Suggestions for Authors

1. There are some spelling errors; please correct them (nothing major, but still requires some work).
2. The paper is heavily focused on Docker. Please add at least some information about Podman, as Kubernetes and the entire ecosystem are moving away from Docker for well-known reasons. A paragraph or two about it, maybe a reference or two, should suffice.
3. Please revise the lit review section - you have a lot of papers (close to 20) that pre-date 2020, and not a lot of papers from the past 3-4 years; in the last 3-4 years, a significant amount of papers has been published on the topic of Docker and HPC.
4. Please reference your github repository as it should be referenced in the reference section and in the paper.
5. Please put the Dockerfiles and similar content to Appendix - the code is too long to be used in the paper sections directly.
6. 4.2 and then no text before 4.2.1 - please go through your paper top-to-bottom and make sure that you don't have some kind of a section after a section without any text in between. Please write a paragraph of text and use it as a lead-in to the next subsection.
7. Please don't end a section or any subsection level with a code/Dockerfile/table/figure. Example: listing 3. Please add a bit of text before the next (sub)section. Again, use it as a lead-in.
8. Please don't put non-text content in sequence one after the other. Example: table 2 and 3. Please write a bit of text between those two tables as a lead-in to table 3. Check this type of formatting top to bottom and correct, please.
9. Future works - absolutely nothing. Please write a decent section or a couple of paragraphs in the "Conclusion" section that should then also be renamed to "Future works and conclusion".


The general problem with this paper is that I just don't see the novelty of this paper vs many others:
https://doi.org/10.36713/epra18562
https://ieeexplore.ieee.org/document/9286208
https://ieeexplore.ieee.org/document/10467287
https://link.springer.com/article/10.1007/s00607-023-01179-5
https://ieeexplore.ieee.org/document/10046607
https://ieeexplore.ieee.org/document/9284294
https://link.springer.com/article/10.1631/FITEE.2100016

 

Comments on the Quality of English Language

Minor improvements could be made to this paper, but it's not a showstopper.

Author Response

Comments 1: There are some spelling errors; please correct them (nothing major, but still requires some work).

 

Response 1: Thank you for this observation. We agree that minor typos may distract readers. We therefore performed a full spellcheck and corrected all identified issues.

 

 

Comments 2: The paper is heavily focused on Docker. Please add at least some information about Podman, as Kubernetes and the entire ecosystem are moving away from Docker for well-known reasons. A paragraph or two about it, maybe a reference or two, should suffice.

 

Response 2: Thank you for this important point. Indeed, Podman is becoming a significant alternative. Podman offers a rootless, daemon-less architecture that avoids the long-running dockerd service and is thus being adopted by Kubernetes (e.g., via CRI-O). We added a paragraph discussing these features and included relevant publications in the Related Work section that illustrate similar container use cases with Docker.

 

 

Comments 3: Please revise the lit review section - you have a lot of papers (close to 20) that pre-date 2020, and not a lot of papers from the past 3-4 years; in the last 3-4 years, a significant amount of papers has been published on the topic of Docker and HPC.

 

Response 3: Thank you. We have updated the literature review to include recent publications from the last 3–4 years, especially those focusing on Docker in HPC contexts.

 

 

Comments 4: Please reference your github repository as it should be referenced in the reference section and in the paper.

 

Response 4: Thank you—we have formally cited the GitHub repository both in the main text and in the References section.

 

 

Comments 5: Please put the Dockerfiles and similar content to Appendix - the code is too long to be used in the paper sections directly.

 

Response 5: Full listings, including Dockerfiles, have been moved to Appendix A to improve the readability of the main body.

 

 

Comments 6: 4.2 and then no text before 4.2.1 - please go through your paper top-to-bottom and make sure that you don't have some kind of a section after a section without any text in between. Please write a paragraph of text and use it as a lead-in to the next subsection.

 

Response 6: Thank you. We have added an introductory sentence to Section 4.2 as a lead-in to its first subsection.

 

 

Comments 7: Please don't end a section or any subsection level with a code/Dockerfile/table/figure. Example: listing 3. Please add a bit of text before the next (sub)section. Again, use it as a lead in.

 

Response 7: We agree. We revised the structure and inserted brief concluding or transition text after each element that ends this way to improve flow and readability.

 

 

Comments 8: Please don't put non-text content in sequence one after the other. Example: table 2 and 3. Please write a bit of text between those two tables as a lead-in to table 3. Check this type of formatting top to bottom and correct, please.

 

Response 8: Thank you. While some table placement was handled automatically by LaTeX, we have added transition text between consecutive tables (e.g., Tables 2 and 3) to improve narrative continuity.

 

 

Comments 9: Future works - absolutely nothing. Please write a decent section or a couple of paragraphs in the "Conclusion" section that should then also be renamed to "Future works and conclusion".

 

Response 9: We agree. The Conclusion section has been expanded to include a Future Work component.

 

 

Comments 10: The general problem with this paper is that I just don't see the novelty of this paper vs many others:

https://doi.org/10.36713/epra18562

https://ieeexplore.ieee.org/document/9286208

https://ieeexplore.ieee.org/document/10467287

https://link.springer.com/article/10.1007/s00607-023-01179-5

https://ieeexplore.ieee.org/document/10046607

https://ieeexplore.ieee.org/document/9284294

https://link.springer.com/article/10.1631/FITEE.2100016

 

 

Response 10: Thank you for this valuable comparison. While most of the cited papers offer useful insights, only the last one directly addresses ML model deployment within containers. We have extended our comparison and explicitly highlighted our unique contributions, particularly in terms of practical reproducibility, automation, and integration of AI workflows with Docker on HPC-like clusters.

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

Comments 4: In lines 63-72 use Section 2 instead of second section 2, and do the same for

other sections.

Response 4: Thank you. All incorrect section references have been corrected to a consistent “Section n” format throughout the manuscript.

 

Better, but still not using the “Section n” format. Please use this usual format: The article is divided into sections. Section 2 provides ... . Section 3 contains ... . Section 4 discusses ... . The final Section 5 ...

----------------------------------------------------------------------------------------------

Please, put Table 1 in Section 4, not on Page 3, and reference it in Section 4, not in the Introduction. 

------------------------------------------------------------------------------------------------

Comments 5: There are entire paragraphs of text repeated in subsections 3.1 and 4.1,

particularly lines 258-270, 312-359, 627-674. Please revise the organization

of the paper to avoid the repetition.

Response 5: We agree. The repeated fragments have been removed, and the structure of the relevant chapters has been simplified to eliminate redundancy and improve clarity.

 

There is still some repetition, but can be accepted.

Still, is it possible to leave the information from 3.1. only in that subsection, and not repeat it (even in the revised form) in 4.1, but only to refer the reader to subsection 3.1 for details? 

--------------------------------------------------------------------------------------------------------

 

Comments 7: The paper would greatly benefit from more visualization. I suggest several

figures...

Response 7: We fully agree. We have added illustrations including the full system architecture, a diagram of the neural network (with layer and I/O details) illustrating the execution pipeline.

 

Thank you. I also believe that "Figure 2.3. LSTM Autoencoder Flow Diagram" could be included or redrawn and referenced from [46] to better visualize lines 639-668.

 

Author Response

------------------------------------------------------------------------------------------------
1. Better, but still not using the “Section n” format. Please use this usual format: The article is divided into sections. Section 2 provides ... . Section 3 contains ... . Section 4 discusses ... . The final Section 5 ...


Answer:
The current version has this division (62-75):

The article is divided into sections.
\textcolor{red}{Section~\ref{sec:2} provides a comprehensive overview of the evolution and significance of computational clusters, detailing their transition from exclusive use in government and academic organizations to more widespread accessibility due to technological advancements like miniaturization and the development of high-speed local computer networks.
Section~\ref{sec:3} contains a brief exposition of the system architecture used in the research, including details on the Docker environment, the configuration of the virtual HPC clusters, and the implementation of ML algorithms. This section gives the technical specifics and methodologies employed in the ML applications, particularly focusing on the use of Long Short-Term Memory (LSTM) networks for anomaly detection, along with detailed descriptions of the data transformation and neural network training processes.
Section~\ref{sec:4} discusses the real-world performance and application of the models and systems described, providing empirical data and analysis to showcase the efficacy of the proposed methods (Table~\ref{Tab:measurementSetup}).} The final Section~\ref{sec:5} provides a comprehensive summary and conclusion of the findings, along with implications for future research in this field.


----------------------------------------------------------------------------------------------
2. Please, put Table 1 in Section 4, not on Page 3, and reference it in Section 4, not in the Introduction. 


Answer:
Changed.

------------------------------------------------------------------------------------------------
3. There is still some repetition, but can be accepted.
Still, is it possible to leave the information from 3.1. only in that subsection, and not repeat it (even in the revised form) in 4.1, but only to refer the reader to subsection 3.1 for details? 

Answer:
The "General Characteristics" subsection has been shortened to a minimum.

------------------------------------------------------------------------------------------------
4. Thank you. I also believe that "Figure 2.3. LSTM Autoencoder Flow Diagram" could be included or redrawn and referenced from [46] to better visualize lines 639-668.


Answer:
The article does not provide the figure: "Figure 2.3. LSTM Autoencoder Flow Diagram".
You probably mean the listing: "Listing 4: Neural network model code for anomaly detection. Based on [46]".
The flows implemented in the anomaly detection were modeled exactly on [46]. The article contrasts the architecture and behavior of an LSTM Autoencoder with a regular LSTM network: the Autoencoder compresses input into a latent representation and reconstructs it, whereas the regular LSTM model processes sequences without such compression.
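To make this compress-and-reconstruct contract concrete, the data flow of an autoencoder can be sketched in a few lines. This is a schematic only: untrained dense maps stand in for the LSTM encoder and decoder, so the shapes and the bottleneck are meaningful, not the numerical outputs, and none of the names come from the manuscript's code.

```python
import numpy as np

rng = np.random.default_rng(42)

T, d, k = 10, 3, 4        # sequence length, features per step, latent size (k << T*d)
W_enc = 0.1 * rng.normal(size=(T * d, k))   # stand-in for the trained LSTM encoder
W_dec = 0.1 * rng.normal(size=(k, T * d))   # stand-in for the trained LSTM decoder

def reconstruct(x):
    """Compress a (T, d) sequence into a k-dimensional latent vector,
    then expand it back to the original shape."""
    z = np.tanh(x.reshape(-1) @ W_enc)      # bottleneck: k numbers summarize T*d inputs
    return (z @ W_dec).reshape(T, d)

def reconstruction_error(x):
    """Mean squared error between a sequence and its reconstruction."""
    return float(np.mean((x - reconstruct(x)) ** 2))
```

In a trained model, this reconstruction error is what drives anomaly detection: normal sequences, which resemble the training data, reconstruct well, while anomalous ones yield large errors that exceed the chosen percentile threshold.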

Reviewer 4 Report

Comments and Suggestions for Authors

Happy with the changes made, this paper can proceed to the publishing phase.

Author Response

Thank you again for the positive review.
