Waveshift 2.0: An Improved Physics-Driven Data Augmentation Strategy in Fine-Grained Image Classification
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The paper introduces Waveshift Augmentation 2.0 (WS 2.0), a novel data augmentation method inspired by physics-based optical principles, specifically wavefront propagation and diffraction effects, aimed at improving generalization in fine-grained image classification. WS 2.0 enhances the previous WS 1.0 method by adding an aperture-dependent hyperparameter, allowing realistic simulation of optical attenuation and frequency modulation. Through comprehensive experiments across medical imaging, plant disease diagnosis, and fine-grained object recognition tasks, WS 2.0 demonstrates consistent performance improvements compared to traditional geometric augmentation and WS 1.0. In general, this paper is well written and very easy to follow. However, some weaknesses need to be addressed before publication.
- While WS 2.0 shows improvement over WS 1.0 and basic geometric augmentations, the manuscript lacks a comparative analysis against other leading augmentation methods widely recognized in related work (e.g., CutMix, MixUp, or Neural Augmentation strategies).
- The manuscript claims computational efficiency, but does not substantiate this claim sufficiently with quantitative analyses. Fourier domain transformations and aperture modeling potentially introduce computational overhead.
- Although Optuna optimization is employed for hyperparameter tuning, there is limited sensitivity analysis and little explanation of why certain hyperparameter ranges (e.g., z and R) were chosen initially; the exploration seems somewhat empirical.
- Several visualization figures (e.g., Fig. 2, Fig. 4) could be improved in clarity, labeling, and presentation quality. Axis labeling and explanations of color scales in hyperparameter optimization graphs lack sufficient clarity, making them less interpretable to readers unfamiliar with the methodology.
- The authors should consider citing additional works on fine-grained recognition to make the submission more comprehensive, such as: "Fine-grained image analysis with deep learning: A survey"; "Learning attention-guided pyramidal features for few-shot fine-grained recognition"; and "Learning Contrastive Self-Distillation for Ultra-Fine-Grained Visual Categorization Targeting Limited Samples".
Author Response
Thank you very much for taking the time to review our manuscript. We have carefully addressed all the comments and revised the paper accordingly. For detailed responses to each point and an explanation of the corresponding changes, please refer to the attached PDF document.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
This paper proposes a data augmentation technique called Waveshift Augmentation 2.0 (WS 2.0). Specifically, a Fourier transform is used to convert an optical image into the frequency domain for frequency modulation. Then, an inverse Fourier transform is employed to bring the modulated frequency spectrum back to the spatial domain to generate an augmented image. Its predecessor, WS 1.0, introduced the parameter z to simulate the imaging of an object at different distances through phase modulation. WS 2.0 builds on this by introducing an aperture-controlled hyperparameter R, thus adding amplitude modulation. This improvement enables WS 2.0 to simulate light intensity attenuation and capture the natural formation process of images under finite-aperture rather than idealized optical conditions.
- Why is the Fourier transform used rather than other frequency-domain transformation techniques?
- Please provide a detailed definition of norm[·] in Equation (3).
- In Figure 3, "optical blurring" only describes the phenomenon and does not explain its relationship with high-frequency attenuation in the frequency domain (for example, high-frequency components correspond to image details, and attenuating high-frequency components leads to blurring).
- Currently, the hyperparameter ranges (z: 15-151 m, R: 0.0001-0.01) are manually set. Although they are optimized by Optuna, the physical meanings of the parameters are not explained further.
- No ablation experiments were designed to separate the independent effects of phase modulation (z) and amplitude modulation (R), making it impossible to quantify the specific contribution of R to performance. For example, when R = 0 (degenerating to WS 1.0), it is unclear how much the performance will decline.
- The time complexity of the algorithm should be provided to evaluate its practical application value.
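For readers of this report, the pipeline summarized above (Fourier transform, phase modulation tied to distance, amplitude modulation tied to aperture, inverse transform) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `frequency_augment`, the paraxial (Fresnel) phase factor, the ideal circular low-pass mask, and the parameters `wavelength`, `pixel`, and `cutoff` are all hypothetical choices, and `cutoff` only stands in for the role the report attributes to R.

```python
import numpy as np


def frequency_augment(image, z=0.0, cutoff=None, wavelength=0.5, pixel=1.0):
    """Hypothetical sketch: FFT -> phase/amplitude modulation -> inverse FFT.

    image      : 2D array (grayscale).
    z          : propagation distance, in the same arbitrary units as
                 `wavelength` and `pixel` (assumed, not from the paper).
    cutoff     : radius of an ideal circular low-pass mask in cycles per unit
                 length; None disables amplitude modulation (assumed).
    """
    h, w = image.shape
    # Spatial-frequency grids matching the layout of np.fft.fft2.
    fy = np.fft.fftfreq(h, d=pixel)[:, None]
    fx = np.fft.fftfreq(w, d=pixel)[None, :]
    rho2 = fx**2 + fy**2

    spectrum = np.fft.fft2(image)

    # Phase modulation: paraxial (Fresnel) transfer function, unit modulus.
    phase = np.exp(-1j * np.pi * wavelength * z * rho2)

    # Amplitude modulation: keep only frequencies inside the circular mask.
    if cutoff is None:
        mask = np.ones_like(rho2)
    else:
        mask = (rho2 <= cutoff**2).astype(float)

    field = np.fft.ifft2(spectrum * phase * mask)
    # Return the magnitude of the resulting complex field.
    return np.abs(field)
```

In this sketch the phase factor has unit modulus, so it redistributes energy without removing any, whereas the mask discards high-frequency components, which carry fine image detail, and therefore blurs the output. Setting `cutoff=None` isolates the phase-only case, which is one way the ablation requested above could be framed. The cost is dominated by the two FFTs, which scale as O(HW log(HW)) for an H × W image.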
Author Response
Thank you very much for taking the time to review our manuscript. We have carefully addressed all the comments and revised the paper accordingly. For detailed responses to each point and an explanation of the corresponding changes, please refer to the attached PDF document.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
No more comments.
Reviewer 2 Report
Comments and Suggestions for Authors
No further comments.