Open Access Article

Peer-Review Record

Distributed Big Data Storage Infrastructure for Biomedical Research Featuring High-Performance and Rich-Features

Future Internet 2022, 14(10), 273; https://doi.org/10.3390/fi14100273

by Xingjian Xu *, Lijun Sun and Fanjun Meng

Reviewer 1: Anonymous

Reviewer 2: Naoki Ohshima

Submission received: 27 August 2022 / Revised: 21 September 2022 / Accepted: 22 September 2022 / Published: 24 September 2022

(This article belongs to the Special Issue Software Engineering and Data Science II)

Round 1

Reviewer 1 Report

The paper is titled – “Distributed Big Data Storage Infrastructure For Biomedical Research Featuring High-performance and Rich-features”. It aligns with the scope of the special issue to which it has been submitted. In this paper, the authors present F3BFS, a functional, fundamental, and future-oriented distributed file system, specially designed for various kinds of biomedical data. The paper claims that F3BFS makes it possible to boost existing software's performance without modifying its main algorithms by transmitting raw datasets from generic file systems. Further, F3BFS has various built-in features to help researchers manage biology datasets more efficiently and productively, including metadata management, fuzzy search, automatic backup, transparent compression, etc. The work seems novel. However, the presentation of the paper needs significant improvement. It is suggested that the authors make the necessary changes/updates to their paper as per the following comments:

1. In 3.2.1 the authors state: “users cannot upload or download too large dataset to cluster in web console, and the IO speed is also limited”. Additional details should be provided here, as the definition of “too large” can vary from user to user. Please provide specific information (such as dataset size, file types, etc.) in this context.

2. “Compared to web console and FUSE mount, API bindings are more efficient and generally have better runtime performance” – Are there any data or results to support this claim? If yes, please provide the data/results and explain the same.

3. Several references are incomplete. For instance in [2], [3], and [5] the names of the conference/journal and year of publication are missing.

4. The authors state – “Many attempts have been made to accelerate biology big data analysis….” but just one paper has been cited. As the authors are stating “many attempts”, it is recommended that the authors cite at least two recent papers in this area. A couple of suggested citations are - https://doi.org/10.3390/covid2080076 and https://doi.org/10.1155/2021/5520366

5. Please proofread the entire paper and correct all the spelling mistakes. For instance, in 2.4, "metadata" is spelled as "metadta".

6. Please comment on the security protocols that are implemented to protect datasets uploaded to F3BFS, which might contain confidential data in some form. Specifically, via the web console approach, if people use the “View Page Source” or “Inspect” option of a modern web browser, will they be able to see the code with clickable links/paths to the datasets, which might make the web console vulnerable to a SQL injection attack?
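The SQL injection concern raised above can be illustrated with a minimal sketch. It assumes a hypothetical `datasets` table and hypothetical lookup helpers; nothing here is taken from F3BFS, whose web console internals are not described in this record. The sketch shows that the risk depends on how the server builds queries from user-supplied names or paths: concatenating input into the SQL text is exploitable, while binding it as a parameter is not.

```python
import sqlite3


def make_db():
    # In-memory database with a hypothetical table mapping dataset names to paths.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE datasets (name TEXT, path TEXT, owner TEXT)")
    conn.executemany(
        "INSERT INTO datasets VALUES (?, ?, ?)",
        [
            ("genome_a", "/data/genome_a.fa", "alice"),
            ("genome_b", "/data/genome_b.fa", "bob"),
        ],
    )
    return conn


def lookup_unsafe(conn, owner, name):
    # Vulnerable: user input is concatenated into the SQL text itself.
    query = (
        "SELECT path FROM datasets "
        f"WHERE owner = '{owner}' AND name = '{name}'"
    )
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn, owner, name):
    # Parameterized: input is bound as data and is never parsed as SQL.
    query = "SELECT path FROM datasets WHERE owner = ? AND name = ?"
    return [row[0] for row in conn.execute(query, (owner, name))]


if __name__ == "__main__":
    conn = make_db()
    payload = "x' OR '1'='1"  # crafted dataset name
    print("unsafe:", lookup_unsafe(conn, "alice", payload))
    print("safe:  ", lookup_safe(conn, "alice", payload))
```

With the crafted name, the concatenated query matches every row regardless of owner, whereas the parameterized query matches none.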

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments

The authors introduce F3BFS, a distributed file system dedicated to the construction and operation of biomedical big data, because "few programs focus on providing a solid foundation of file systems for biomedical big data." The purpose of this paper is, I hope, to provide the reader with useful scientific information.

However, an adequate discussion is not provided in the following aspects.

(1) F3BFS

What does F3BFS stand for? The original spelling of F3BFS is not indicated anywhere in the paper.

(2) Technical Advancements of F3BFS

The authors note that F3BFS has a variety of features that allow researchers to manage biological datasets more efficiently and productively, including metadata management, fuzzy search, automatic backup, and transparent compression. However, there is no mention of fuzzy search or transparent compression in the text. Section 3.1.3 mentions the Encoder and chunking in the Data Router, but how do these help researchers manage biological datasets more efficiently and productively?
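For readers unfamiliar with the terms in this comment, the following is a minimal sketch of how chunking and an encoder chain can give transparent compression. It is an assumption-laden illustration, not the F3BFS Data Router: the function names, the fixed chunk size, and the use of `zlib` as the only encoder are all hypothetical.

```python
import zlib
from typing import Callable, List, Tuple

# A codec is a hypothetical (encode, decode) pair; zlib stands in for any encoder.
Codec = Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]
ZLIB: Codec = (zlib.compress, zlib.decompress)


def split_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    # Cut the payload into fixed-size chunks; the last one may be shorter.
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def encode_chunks(data: bytes, chunk_size: int, chain: List[Codec]) -> List[bytes]:
    # Pass every chunk through each encoder in the chain, in order.
    encoded = []
    for chunk in split_chunks(data, chunk_size):
        for encode, _ in chain:
            chunk = encode(chunk)
        encoded.append(chunk)
    return encoded


def decode_chunks(chunks: List[bytes], chain: List[Codec]) -> bytes:
    # Undo the chain in reverse order and reassemble the original bytes.
    out = []
    for chunk in chunks:
        for _, decode in reversed(chain):
            chunk = decode(chunk)
        out.append(chunk)
    return b"".join(out)


if __name__ == "__main__":
    data = b"ACGT" * 10000  # repetitive, sequence-like payload
    stored = encode_chunks(data, 4096, [ZLIB])
    assert decode_chunks(stored, [ZLIB]) == data
    print(len(data), "bytes in,", sum(len(c) for c in stored), "bytes stored")
```

The caller reads back exactly the bytes it wrote, which is what "transparent" means here; whether and how F3BFS does this is the question the comment asks the authors to answer.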

(3) Verifiability

The authors state that "F3BFS makes it possible to transfer raw data from a general-purpose file system and improve its performance without changing key algorithms in existing software." However, the paper does not describe how to build an F3BFS environment (e.g., where to obtain the F3BFS program and how to install it), making it difficult for the reader to verify the usefulness of F3BFS.

(4) Are F3FS and F3BFS the same distributed file system?

In the Introduction (L.54-56), the system is labeled "F3FS." In "Methods" and later, it is referred to as "F3BFS." Are "F3FS" in the Introduction and "F3BFS" from "Methods" onward the same storage service? If so, is "F3FS" used only in the Introduction? Please explain why.

Spelling and Grammatical Errors

There are too many spelling and grammatical errors. Please correct the errors that the reviewer has noticed.

1. Paper Title

Prepositions should be written in lower case.

Distributed Big Data Storage Infrastructure For Biomedical Research Featuring High-performance and Rich-features -> Distributed Big Data Storage Infrastructure for Biomedical Research Featuring High-Performance and Rich-Features

2. Spelling and grammatical errors

･ Insert a single-byte space after the separator

L.12 ;biomedical big data -> ; biomedical big data

L.12 ;big data storage -> ; big data storage

･ Insert a single-byte space before the parentheses

L.16 Law[1].

L.22 technologies[2].

L.28 data[3,4].

L.35 system[5].

L.50 FASTAFS[8].

･ Misspelling

L.60 distribued

L.63 dta

L.72 strucure

L.112 implmenet

L.114 Metadta

L.161, L.163, L.164, L.166 cordinator

L.176 Figure 2 procudure.

L.183 hightly

L.205 recive

L.213 visist

L.226 userspace -> user space

L.271 distribued

L.284 bluestore -> blue store

L.303 prefetche -> prefetch

L.342 caculated

L.357 itslef.

L.367 currenly

L.384 availa-ble

L.386 resoruces

L.392

Input/Output Operations Per senconds -> Input/Output Operations Per Second

Input/Output Sequences Per senconds -> Input/Output Sequences Per Second

Grammatical errors

L.150 inode

Italicize "inode" since it is a concept specific to some operating systems.

L.156 to speedup

"Speedup" is a noun, not a verb.

The correct form of this sentence is "to speed up".

L.183

The encoders in chain is -> The encoders in chain are

L.187 chunks.For

L.341 affectthe

Insert a single-byte space

L.204 nodes -> node

L.281 are -> is

L.342 of an system is -> of a system is

L.371 hardwares

"Hardware" is an uncountable noun.

Comments for author File: Comments.docx

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The authors have updated their paper as per all my comments and suggestions. I do not have any additional comments at this point. I recommend the publication of the paper in its current form.

Author Response

This reviewer declares no additional comments at this moment.

Reviewer 2 Report

The revised manuscript seems to have been improved in response to the reviewer's remarks. However, the authors still use "F3FS" in the paragraph from Line 54 to Line 57.

Why do the authors write "F3FS" only in this paragraph (Line 54 to Line 57), while using "F3BFS" in almost all other parts of the paper?

Is this an intentional usage? Or is it a misspelling? If the former, please provide a clear reason in the main document. If the latter, please correct "F3FS" to "F3BFS" immediately.

Author Response

Thanks for your careful review. We have corrected all occurrences of F3FS to F3BFS in the manuscript. F3FS is the legacy name of our project.

Round 3

Reviewer 2 Report

No comment
