Article
Peer-Review Record

CAESAR: A Unified Framework for Foundation and Generative Models for Efficient Compression of Scientific Data

Appl. Sci. 2025, 15(16), 8977; https://doi.org/10.3390/app15168977 (registering DOI)
by Xiao Li 1,*, Liangji Zhu 1, Jaemoon Lee 1, Rahul Sengupta 1, Scott Klasky 2, Sanjay Ranka 1,* and Anand Rangarajan 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 8 July 2025 / Revised: 6 August 2025 / Accepted: 10 August 2025 / Published: 14 August 2025
(This article belongs to the Section Computing and Artificial Intelligence)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The paper introduces a timely and technically sound contribution to scientific data compression using foundation models. The dual architecture—CAESAR-V for efficient transform-based compression and CAESAR-D for generative modeling—offers a balanced solution to scalability, fidelity, and generalization. The experiments are thorough and include diverse datasets, with detailed evaluation of key design decisions (e.g., keyframe selection, denoising steps, GPU acceleration).

 

Suggestions:

Consider explicitly discussing potential limitations (e.g., high training cost, domain transferability to medical data like DICOM).

It is recommended that the project's experimental code be published in a public repository, which would improve the transparency, replicability, and dissemination of the work within the scientific and technical community.

Clarify in the abstract that the method does not yet apply to clinical image data, even though the framework is extensible.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

1. The discussion of prior work mixes scientific data compression algorithms with image and video compression algorithms/standards (DPCM and MPEG, respectively) that are optimized for human viewer perception. This should be sorted out - I'd recommend either explaining why the latter are relevant to the paper's topic or replacing them with examples specialized for scientific data compression, ideally ones that also make use of differential data compression and/or interpolation.
2. The Related Work section does not end with any synthetic summary of the state of the art that would point to the main achievements of the prior work the authors build on and indicate in what way they strive to extend it.
3. Although the authors present detailed diagrams of the architecture of the proposed framework and its key components, there is no clear presentation of the compression/decompression procedure steps. For instance, the authors mention the use of a pretrained model, but it is not clear whether it is a prerequisite for the encoding step (if so, how large is it?) or whether building it is part of the encoding step (if so, is the time needed for it included in the reported results?).
4. In several places, the authors refer to their other work (arXiv:2507.02129) for more details. First, whenever feasible (i.e., it does not take too much space), the relevant information should be copied into this manuscript. Second, the authors should make clear (preferably in the introduction) what the relation between these two papers is.
5. Fig. 9 casts doubt on the proposed method's effectiveness: according to it, at low noise ratios the compression ratios achieved are about the same as those of algorithms known so far. This should be corrected or commented on.
6. The compression ratios are reported for the whole test datasets. It would be valuable to inform the reader whether they remain the same or degrade for smaller data chunks.
7. The comparison with other algorithms comprises only compression ratios, not compression/decompression times. These should be compared as well - at various noise levels.
8. The authors mention the algorithm's parallelization and GPU acceleration but do not report any measurements that would show how much these improved the processing speed.
9. There is no Threats to Validity or Limitations section. One is necessary when dealing with lossy compression, as compression-incurred information loss that is acceptable for most purposes may render the data useless for a specific type of analysis.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

All my comments from the first round review have been adequately addressed.

I have no further remarks.
