Article
Peer-Review Record

CAESAR: A Unified Framework for Foundation and Generative Models for Efficient Compression of Scientific Data

Appl. Sci. 2025, 15(16), 8977; https://doi.org/10.3390/app15168977 (registering DOI)
by Xiao Li 1,*, Liangji Zhu 1, Jaemoon Lee 1, Rahul Sengupta 1, Scott Klasky 2, Sanjay Ranka 1,* and Anand Rangarajan 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 8 July 2025 / Revised: 6 August 2025 / Accepted: 10 August 2025 / Published: 14 August 2025
(This article belongs to the Section Computing and Artificial Intelligence)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The paper introduces a timely and technically sound contribution to scientific data compression using foundation models. The dual architecture—CAESAR-V for efficient transform-based compression and CAESAR-D for generative modeling—offers a balanced solution to scalability, fidelity, and generalization. The experiments are thorough and include diverse datasets, with detailed evaluation of key design decisions (e.g., keyframe selection, denoising steps, GPU acceleration).

 

Suggestions:

Consider explicitly discussing potential limitations (e.g., high training cost, domain transferability to medical data like DICOM).

It is recommended that the project's experimental code be published in a public repository, which would improve the transparency, replicability, and dissemination of the work within the scientific and technical community.

Clarify in the abstract that the method does not yet apply to clinical image data, even though the framework is extensible.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

1. The discussion of prior work mixes scientific data compression algorithms with image and video compression algorithms/standards (DPCM and MPEG, respectively) that are optimized for human viewer perception. This should be sorted out - I'd recommend either explaining why the latter are relevant to the paper's topic or replacing them with examples specialized for scientific data compression, ideally ones that also make use of differential data compression and/or interpolation.
2. The Related Work section does not end with any synthetic summary of the state of the art that would point to the main achievements of the prior work the authors build on and indicate in what way they strive to extend it.
3. Although the authors present detailed diagrams of the architecture of the proposed framework and its key components, there is no clear presentation of the compression/decompression procedure steps. For instance, the authors mention the use of a pretrained model, but it is not clear whether it is a prerequisite for the encoding step (if so, how large is it?) or whether building it is part of the encoding step (if so, is the time needed for it included in the reported results?).
4. In several places, the authors refer to their other work (arXiv:2507.02129) for more details. First, whenever feasible (i.e., it does not take too much space), the relevant information should be copied into this manuscript. Second, the authors should make clear (preferably in the introduction) what the relation between these two papers is.
5. Fig. 9 casts doubt on the proposed method's effectiveness: according to it, at low noise ratios the compression ratios achieved are about the same as those of algorithms known so far. This should be corrected or commented on.
6. The compression ratios are reported for the whole test datasets. It would be valuable to inform the reader whether they remain the same or degrade for smaller data chunks.
7. The comparison with other algorithms comprises only compression ratios, not compression/decompression times. These should be compared as well - at various noise levels.
8. The authors mention the algorithm's parallelization and GPU acceleration but do not report any measurements that would show how much these improved the processing speed.
9. There is no Threats to Validity or Limitations section. One is necessary when dealing with lossy compression, as compression-incurred information loss that is acceptable for most purposes may render the data useless for a specific type of analysis.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

All my comments from the first round review have been adequately addressed.

I have no further remarks.
