Exploiting Diffusion Priors for Generalizable Few-Shot Satellite Image Semantic Segmentation
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The topic is relevant and timely, particularly given the growing interest in applying generative diffusion priors to perception tasks under limited supervision. The paper is well organized, technically detailed, and demonstrates a clear experimental setup. The proposed DiffSatSeg framework is methodologically sound, with good motivation and theoretical grounding. The empirical evaluation is comprehensive and convincing, including both quantitative and qualitative comparisons. However, some areas could be strengthened to improve clarity, reproducibility, and scientific rigor:
1. Novelty and Positioning: 1) While the integration of diffusion priors into few-shot satellite segmentation is novel, the manuscript could better differentiate DiffSatSeg from existing diffusion-based segmentation frameworks (e.g., DiffusionSeg, MaskDiffusion, or Diff-U-Net). A brief discussion comparing architectural design choices and adaptation strategies would help establish the degree of originality; 2) The introduction could also highlight why diffusion priors are more effective than vision transformer backbones or pre-trained contrastive representations (e.g., CLIP-Seg, SAM-based few-shot methods) for this specific domain.
2. Methodological: The proxy-query design is central to the framework. Please clarify whether the queries are shared across diffusion layers or unique to each layer. Also specify how they are initialized and regularized to prevent collapse or redundancy.
3. Experimental: 1) The experiments are primarily based on the SatelliteDataset and Speed+. It would be useful to know whether the proposed model generalizes to other spacecraft or aerial datasets (e.g., SpacecraftNet, URSO); 2) The low-light/backlit robustness claims are interesting but currently rely on qualitative examples. Quantitative evaluation (e.g., mIoU under illumination perturbations) would make the claim more convincing; 3) The paper mentions that the model was trained for 3,000 iterations on a single RTX 4090 GPU. Please specify the effective number of epochs, dataset splits, and approximate training time to allow reproduction.
References: The reference list is generally comprehensive, but it omits the most recent diffusion-based segmentation and few-shot learning studies from 2024–2025. Including them would better reflect the current state of the field and strengthen the manuscript’s grounding in the literature.
Overall, the manuscript makes a strong and technically sound contribution to the field of satellite image analysis, particularly in introducing diffusion priors for few-shot semantic segmentation. The proposed DiffSatSeg framework is innovative, well-motivated, and supported by convincing experimental results. The methodology is carefully designed and the writing is generally clear. However, the paper would benefit from improvements in the presentation of novelty, more detailed methodological clarification, and broader experimental validation to enhance its reproducibility and generality.
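For illustration, the quantitative illumination evaluation suggested in comment 3 could look like the sketch below. It is not the authors' protocol: the gamma-based perturbation, the function names (`adjust_gamma`, `mean_iou`, `miou_under_gamma`), and the convention of averaging IoU only over classes present in the prediction or the ground truth are all assumptions made here. Images are taken to be grayscale arrays with values in [0, 1], and `model` is any callable that maps an image to a per-pixel class mask.

```python
from typing import Callable, Dict, List, Sequence

Image = List[List[float]]  # grayscale intensities in [0, 1] (assumption)
Mask = List[List[int]]     # per-pixel class ids


def adjust_gamma(image: Image, gamma: float) -> Image:
    """Darken (gamma > 1) or brighten (gamma < 1) an image with values in [0, 1]."""
    return [[pixel ** gamma for pixel in row] for row in image]


def mean_iou(pred: Mask, target: Mask, num_classes: int) -> float:
    """Mean IoU over the classes that occur in the prediction or the target."""
    ious = []
    for c in range(num_classes):
        inter = 0
        union = 0
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                inter += int(p == c and t == c)
                union += int(p == c or t == c)
        if union > 0:  # skip classes absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def miou_under_gamma(
    model: Callable[[Image], Mask],
    images: Sequence[Image],
    masks: Sequence[Mask],
    gammas: Sequence[float],
    num_classes: int,
) -> Dict[float, float]:
    """Average per-image mIoU for each illumination level (gamma value)."""
    results = {}
    for gamma in gammas:
        scores = [
            mean_iou(model(adjust_gamma(image, gamma)), mask, num_classes)
            for image, mask in zip(images, masks)
        ]
        results[gamma] = sum(scores) / len(scores)
    return results
```

Reporting the resulting mIoU for each gamma value, for the proposed model and for each baseline, would turn the qualitative low-light examples into a comparable robustness curve.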
Author Response
Please see the attachment.
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
After carefully reading the paper, I do not think it is ready for publication. The writing is clear, but the research purpose and originality are not convincing.
In the introduction, the authors talk about diffusion models and satellite image segmentation, but they do not clearly say what new problem they are solving. The challenge of limited training data for satellites has already been discussed in many past studies. The paper does not show what is still missing or how their method solves something that others could not. So, the problem statement is weak.
The novelty is also not clear. Using diffusion models for segmentation has already been done by other researchers. The paper mixes diffusion features with “proxy queries,” but it is not explained why this idea is new or important. The authors need to show what specific issue in few-shot learning is fixed by their method.
The objectives and results do not match well. The paper claims strong generalization and real-world usefulness, but the tests are very limited and use the same few datasets. The performance numbers look higher, but there is no deeper analysis to prove that the improvement really comes from the new design. The results seem incremental, not a major advance.
The method section is full of equations but not easy to follow. It is not clear how someone could reproduce the work. Many details like random seed, parameter settings, and data split strategy are missing. The experiments are not compared with the latest diffusion-based segmentation works, so the “state-of-the-art” claim is not supported.
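As a minimal sketch of the kind of detail requested here, a data split can be made reproducible by fixing the seed and the ordering of sample identifiers before shuffling. The function name `split_dataset`, the 80/20 ratio, and the identifier format are hypothetical and do not come from the manuscript.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(
    sample_ids: Sequence[str],
    seed: int,
    train_fraction: float = 0.8,  # hypothetical ratio, not the authors' setting
) -> Tuple[List[str], List[str]]:
    """Return (train_ids, test_ids) determined only by the id set and the seed."""
    ids = sorted(sample_ids)   # fix the order so the input ordering does not matter
    rng = random.Random(seed)  # local generator, leaves global random state untouched
    rng.shuffle(ids)
    cut = int(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]
```

Reporting the seed together with the resulting identifier lists would let readers rebuild exactly the same training and test sets.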
In short, the paper needs a major rewrite. The authors should:
- Clearly define the unique research problem in the introduction.
- Explain why existing diffusion segmentation methods cannot solve it.
- Simplify the method and provide enough details for reproducibility.
- Add stronger comparisons and experiments to prove novelty.
Right now, the paper feels more like a technical combination of known ideas than a clear new contribution. My recommendation is to reject this version and ask for a major revision before reconsideration.
Author Response
Please see the attachment.
Author Response File:
Author Response.pdf
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
I accept the article in its present form.

