Review Reports
- Songfu Tan
- Ligu Zhu*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The paper considers the problem of distributed storage from a software perspective and offers an efficient solution for large-scale virtual disk construction by developing a new deduplication-aware thin provisioning mechanism and tailored SSD caching to improve I/O. The results are interesting and will be popular among Electronics readers. However, at the current stage the paper is far from acceptance, and I ask the authors to address the following comments during a major revision round:
- Is it possible to further reduce redundancy by applying novel information-theoretic techniques? See, for example, link.springer.com/article/10.1134/S1064226920120116
- Is it possible to employ software acceleration techniques to further reduce I/O, as in ieeexplore.ieee.org/document/9499750?
- Could you please extend the comparison to other Linux virtual disk implementations?
- Could you please extend the comparison with existing thin-provisioning + deduplication systems?
- Could you please elaborate more on the system overhead?
- Could you please elaborate more on the impact of larger physical backends?
- Could you please estimate the expected savings?
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
This paper proposes an Over-Capacity Mapping (OCM) virtual disk technology aimed at reducing the overhead of large-scale storage infrastructure. Write experiments were conducted on a 30 GB OverCap volume using mixed real-world data containing audio, video, image, and document files. The experimental results show that the relevant performance has been improved. The following issues should be noted in the paper:
1) The abstract should be shortened appropriately; it is currently too long.
2) There should be a space between the references cited and the text.
3) The introduction should cite the latest published articles on virtual disk technology, especially those published in 2025, to demonstrate the innovation of the article.
4) "OCM has achieved a write acceleration of up to 7.8 times and a read acceleration of up to 248.2 times." How does this acceleration perform compared to other articles? It is suggested to add relevant comparative analyses.
5) Will the "SSD-based hierarchical asynchronous I/O acceleration strategy" add additional overhead?
6) Why is the curve fluctuation in Figure 14 relatively small, whereas the fluctuations in Figures 10 and 13 are relatively large?
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
Summary
The paper introduces Over-Capacity Mapping (OCM), a method that integrates deduplication-aware thin provisioning at the Device Mapper layer. OCM allows logical volumes to exceed physical storage capacity while reducing redundant data. The work also proposes a hierarchical asynchronous I/O strategy using SSD caching to address performance bottlenecks. The authors validate their approach through experiments on a 30GB OverCap volume with mixed real-world data (audio, video, image, document files). The results suggest significant improvements in storage space utilization and performance.
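To make the reviewed mechanism concrete, the following is a minimal conceptual sketch of deduplication-aware thin provisioning written as plain Python rather than the authors' Device Mapper code; the class name, SHA-256 fingerprinting, and byte-level verification step are illustrative assumptions, not details taken from the paper. Logical blocks are mapped through content fingerprints to shared physical blocks, which is why the logical capacity offered to users can exceed the physical capacity actually allocated.

```python
import hashlib

class DedupThinVolume:
    """Conceptual model of a deduplication-aware thin-provisioned volume.

    Logical blocks map to physical blocks via content fingerprints, so
    identical blocks share one physical copy and the logical address space
    can be larger than physical storage. Illustrative only; this is not
    the OCM kernel implementation.
    """

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.logical_map = {}        # logical block number -> physical block number
        self.fingerprint_index = {}  # SHA-256 digest -> physical block number
        self.physical_blocks = []    # allocated physical blocks (simulated store)

    def write(self, lbn, data):
        digest = hashlib.sha256(data).digest()
        pbn = self.fingerprint_index.get(digest)
        # Safeguard against misclassification: verify bytes on a fingerprint hit.
        if pbn is not None and self.physical_blocks[pbn] == data:
            self.logical_map[lbn] = pbn            # duplicate: remap, no new allocation
            return
        pbn = len(self.physical_blocks)            # unique: allocate physical space lazily
        self.physical_blocks.append(data)
        self.fingerprint_index[digest] = pbn
        self.logical_map[lbn] = pbn

    def read(self, lbn):
        pbn = self.logical_map.get(lbn)
        if pbn is None:
            return b"\x00" * self.block_size       # unwritten logical block reads as zeros
        return self.physical_blocks[pbn]
```

In this toy model, writing the same block content under many logical addresses allocates physical space only once, so a large logical volume (e.g., the 30 GB OverCap volume used in the experiments) can be backed by far less physical storage when the workload contains duplicates.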
Strengths
- Proposes a novel integration of thin provisioning and deduplication at the Device Mapper layer.
- Introduces an SSD-based cache layer to mitigate I/O bottlenecks, which is practical and impactful.
- Validates the system through multidimensional experiments covering both write and read performance.
- Demonstrates practical relevance by using real-world datasets.
- The solution directly addresses challenges of storage efficiency and scalability in large-scale environments.
Weaknesses
- The distinction between logical and physical capacity is not clearly explained.
- The deduplication detection mechanism is underexplained, raising concerns about false positives (e.g., original data incorrectly marked as duplicate).
- No clear discussion of error-handling mechanisms in case of misclassification, deduplication failure, or I/O issues.
- Performance improvements of the SSD caching strategy are not quantified in detail.
- Experimental setup details are limited, reducing reproducibility.
Required Changes (Major)
- Clarify Logical vs. Physical Capacity: Provide a precise definition and an illustrative example in the abstract/introduction.
- Deduplication Strategy: Explain how duplicates are identified, especially for multimedia data (photos, videos), and what safeguards prevent misclassification of original data.
- Error Handling: Describe how the system handles errors in deduplication and I/O operations under high concurrency to ensure data integrity.
- Caching Performance Analysis: Add quantitative results comparing performance with and without SSD caching (an illustrative sketch of an asynchronous write-back path is given after this list).
- Experimental Setup: Expand on workload distribution, parameter selection, and system configuration to improve reproducibility.
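As a companion to the caching-performance point above, here is a simplified sketch of what a hierarchical asynchronous write path can look like: writes are absorbed by a fast tier standing in for the SSD cache and are flushed to the slower backend by a background worker. This is an assumed, illustrative model rather than the system described in the manuscript; AsyncWriteBackCache, backend_write, and the queue-based flusher are hypothetical names.

```python
import queue
import threading

class AsyncWriteBackCache:
    """Simplified hierarchical asynchronous write path.

    Writes land in a fast in-memory tier (standing in for the SSD cache)
    and are drained to the slow backend by a background thread, so
    foreground write latency is decoupled from backend latency.
    Illustrative only; not the paper's implementation.
    """

    def __init__(self, backend_write):
        self.fast_tier = {}                  # block -> data held in the fast tier
        self.dirty = queue.Queue()           # blocks awaiting flush to the backend
        self.backend_write = backend_write   # slow-path write function (e.g., HDD)
        threading.Thread(target=self._flusher, daemon=True).start()

    def write(self, block, data):
        # Foreground path: absorb the write in the fast tier and return quickly.
        self.fast_tier[block] = data
        self.dirty.put(block)

    def read(self, block, backend_read):
        # Serve from the fast tier when possible; otherwise fall back to the backend.
        data = self.fast_tier.get(block)
        return data if data is not None else backend_read(block)

    def _flusher(self):
        # Background path: drain dirty blocks to the backend asynchronously.
        while True:
            block = self.dirty.get()
            self.backend_write(block, self.fast_tier[block])
            self.dirty.task_done()
```

The overhead question raised by Reviewer 2 shows up in such a design as the space for the fast tier, the dirty-block queue, and the background flush work; a with/without-cache comparison, as requested above, would quantify whether the absorbed foreground latency outweighs that cost.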
Minor Suggestions
Improve readability by briefly defining technical terms (e.g., thin provisioning, over-capacity mapping) when first introduced.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
All comments have been addressed; I can recommend acceptance.
Reviewer 2 Report
Comments and Suggestions for Authors
The article has been well revised, and this version can now be published.
Reviewer 3 Report
Comments and Suggestions for Authors
I thank the authors for accommodating all the requested changes. The manuscript is now in good shape, and I accept it as it is.