Multi-Weather DomainShifter: A Comprehensive Multi-Weather Transfer LLM Agent for Handling Domain Shift in Aerial Image Processing
Abstract
1. Introduction
1.1. The Challenge of Weather Change-Caused Domain Shift in Aerial Imagery
1.2. Recent Developments in Generative Model and Image Synthesis
1.3. Essence and Contributions of This Paper
2. Related Work
2.1. Semantic Segmentation
2.2. Image Style Transfer
2.3. Domain Shift
3. Methodology
3.1. Multi-Weather DomainShifter
3.1.1. System Architecture
- Image Resources: This component is the data foundation for all operations. It is subdivided into three libraries: (1) a Style Image Library containing target-domain style references from our synthetic AWSD dataset (e.g., overcast, foggy, dusty), detailed in Section 4.4; (2) a Content Image Library storing source-domain images from real-world datasets such as ISPRS [10,11]; and (3) a Content Mask Library holding the corresponding semantic segmentation masks for the content images. Samples of the style references, content images, and corresponding segmentation masks are shown in the top part of Figure 3.
- Tool Resources: As shown in the bottom part of Figure 3, this is a curated library of specialized generative models and general-purpose utilities. Every function in the library is abstracted as a tool with a natural-language description, enabling the LLM agent to understand when and how it should be used. The primary generative tools are our proposed (1) LAST model, designed for efficient style transfer of illumination and atmospheric changes (overcast, foggy, dusty), detailed in Section 3.2; and (2) the MSDM, a multi-modal diffusion model for handling complex physical scene alterations such as snowy conditions, detailed in Section 3.3. The library is augmented with general tools for tasks such as resource listing and data transfer.
- LLM Agent (ReAct Framework): The system’s intelligence is orchestrated by an LLM agent operating on the ReAct paradigm [94]. As illustrated in Figure 3, the agent combines reasoning and acting to fulfill user requests: at each step, it generates a thought (reasoning), devises an action to execute, and then observes the outcome of that action. This iterative Thought → Action → Observation cycle allows the agent to dynamically plan, execute, and self-correct until the user’s goal is fully accomplished.
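The Thought → Action → Observation cycle described above can be sketched as a minimal agent loop. This is an illustrative sketch only: the tool names, the `llm` interface, and the `FINISH` stopping convention are assumptions, not the paper’s implementation.

```python
# Minimal sketch of a ReAct-style agent loop (illustrative; tool names,
# the llm() interface, and the FINISH convention are assumptions).

def run_agent(goal, tools, llm, max_steps=10):
    """Iterate Thought -> Action -> Observation until the LLM signals FINISH."""
    transcript = [f"Goal: {goal}"]
    for _ in range(max_steps):
        # Reasoning: the LLM proposes the next step given the history so far.
        thought, action, arg = llm("\n".join(transcript))
        transcript.append(f"Thought: {thought}")
        if action == "FINISH":
            return arg  # final answer returned to the user
        # Acting: dispatch to a registered tool and record the observation.
        observation = tools[action](arg)
        transcript.append(f"Action: {action}({arg})")
        transcript.append(f"Observation: {observation}")
    return None  # step budget exhausted without reaching the goal

# Toy usage: a scripted "LLM" that lists available styles, then finishes.
tools = {"list_styles": lambda _: ["overcast", "foggy", "dusty", "snowy"]}
script = iter([
    ("I should check available styles", "list_styles", ""),
    ("Styles found; task done", "FINISH", "done"),
])
result = run_agent("enumerate weather styles", tools, lambda _: next(script))
```

The observation appended after each tool call is what lets the agent self-correct: a failed or unexpected result becomes part of the context for the next thought.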

3.1.2. Agent Workflow
3.2. LAST
3.2.1. VAE for Image Compression
3.2.2. Latent Style Transformer
3.2.3. Perceptual Loss for Model Optimization
3.3. MSDM
3.3.1. ControlNet for Segmentation Mask Conditioning Diffusion Model
3.3.2. LLM-Assisted Scene Descriptor
4. Experiments
4.1. ISPRS Dataset
4.2. Model Implementation Details
4.2.1. Detailed Setup of LAST
| Model | Parameters (M) | Trainable (M) | Batch Size | Time (hours) | Iters/Epochs |
|---|---|---|---|---|---|
| Generative Models: | |||||
| LAST | 128.5 | 31.9 (24.8%) | 8 | 48 | 160K iters |
| MSDM | 1427.5 | 361.3 (25.2%) | 32 | 12 | 50 epochs |
| ResNet-50-Based Models: | |||||
| PointRend-R50 [18] | 28.7 | 28.7 (100.0%) | 24 | 7.8 | 80K iters |
| DeepLabV3+-R50 [16] | 43.6 | 43.6 (100.0%) | 24 | 21.4 | 80K iters |
| PSPNet-R50 [21] | 49.0 | 49.0 (100.0%) | 24 | 20.4 | 80K iters |
| FCN-R50 [19] | 49.5 | 49.5 (100.0%) | 24 | 16.4 | 80K iters |
| DANet-R50 [17] | 49.8 | 49.8 (100.0%) | 24 | 18.7 | 80K iters |
| UperNet-R50 [15] | 66.4 | 66.4 (100.0%) | 24 | 16.2 | 80K iters |
| Transformer-Based Models: | |||||
| UperNet-Swin | 59.8 | 59.8 (100.0%) | 24 | 18.7 | 80K iters |
| UperNet-ViT | 144.1 | 144.1 (100.0%) | 20 | 20.4 | 80K iters |
| Segmenter-ViT [20] | 102.4 | 102.4 (100.0%) | 1 | 1.0 | 80K iters |
4.2.2. Detailed Setup of MSDM
4.2.3. Detailed Setup of Semantic Segmentation Models
4.2.4. Inference Cost and Performance
| Model | FLOPs (TFLOPs) | Inference Time (ms) | FPS (img/s) | VRAM (GB) |
|---|---|---|---|---|
| Generative Models: | ||||
| LAST | 2.330 | 105.9 | 9.45 | 1.44 |
| MSDM | 15.446 | 3426.9 | 0.29 | 4.37 |
| ResNet-50-Based Models: | ||||
| FCN-R50 [19] | 0.198 | 12.4 | 80.96 | 0.40 |
| PSPNet-R50 [21] | 0.179 | 11.8 | 84.45 | 0.47 |
| DANet-R50 [17] | 0.211 | 12.6 | 79.33 | 0.41 |
| DeepLabV3+-R50 [16] | 0.177 | 12.4 | 80.85 | 0.50 |
| UperNet-R50 [15] | 0.237 | 17.2 | 58.20 | 0.68 |
| PointRend-R50 [18] | 0.034 | 14.7 | 68.15 | 0.30 |
| Transformer-Based Models: | ||||
| UperNet-Swin | 0.236 | 25.3 | 39.51 | 0.64 |
| UperNet-ViT | 0.443 | 22.2 | 44.99 | 1.00 |
| Segmenter-ViT [20] | 0.126 | 13.8 | 72.45 | 0.51 |
| Task | Model | Images | Total Time | GPU Hours |
|---|---|---|---|---|
| Atmospheric transfer | LAST | 1000 | 1.77 min | 0.029 h |
| Snowy scene generation | MSDM | 1000 | 57.1 min | 0.95 h |
| Segmentation inference | DeepLabV3+ | 1000 | 12.4 s | 0.0034 h |
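The batch-processing times above follow directly from the per-image latencies in the inference-cost table; a quick sanity check (table values only, no new measurements):

```python
# Cross-check: per-image latency (ms) from the inference-cost table against
# the reported cost of processing 1000 images in the batch-task table.
latency_ms = {"LAST": 105.9, "MSDM": 3426.9, "DeepLabV3+": 12.4}
reported_gpu_hours = {"LAST": 0.029, "MSDM": 0.95, "DeepLabV3+": 0.0034}

for model, ms in latency_ms.items():
    total_s = ms * 1000 / 1000.0   # 1000 images; ms per image -> total seconds
    gpu_hours = total_s / 3600.0
    # Agrees with the reported GPU-hours to within rounding.
    assert abs(gpu_hours - reported_gpu_hours[model]) < 5e-3, model
```

In other words, the batch figures are simple extrapolations of single-image latency, implying negligible per-batch overhead at this scale.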
4.3. Effect of Weather Change-Caused Domain Shift
4.4. Synthetic Dataset
4.5. Ablation Study of Synthetic Data Verification
4.6. Comprehensive Study of Domain Adaptation
5. Discussion
5.1. Comparison with Existing Approaches and Advantages
5.2. Real-World Implementation and Practical Significance
5.3. Limitations and Unsuccessful Cases
5.3.1. Generative Model Limitations
5.3.2. Dataset and Resolution Limitations
5.4. Future Deployment and System Architecture
6. Conclusions
6.1. Technical Contributions and Experimental Validation
6.2. Practical Significance and Real-World Implementation
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| AIS | Aerial Image Segmentation |
| AWSD | Aerial Weather Synthetic Dataset |
| LAST | Latent Aerial Style Transfer |
| MSDM | Multi-Modal Snowy Scene Diffusion Model |
| LLM | Large Language Model |
| VAE | Variational Autoencoder |
| GAN | Generative Adversarial Network |
| LDM | Latent Diffusion Model |
| DM | Diffusion Model |
| T2I | Text-to-Image |
| I2I | Image-to-Image |
| MSA | Multi-head Self-Attention |
| MCA | Multi-head Cross-Attention |
| FFN | Feed-Forward Network |
| FCN | Fully Convolutional Network |
| CNN | Convolutional Neural Network |
| ViT | Vision Transformer |
| ISPRS | International Society for Photogrammetry and Remote Sensing |
| mIoU | mean Intersection over Union |
| GSD | Ground Sampling Distance |
Appendix A
Appendix A.1
| ID | Training Config | Weather | Imp. Surf. | Building | Low Veg. | Tree | Car | Clutter |
|---|---|---|---|---|---|---|---|---|
| Vaihingen Domain | | | | | | | | |
| Exp. 1 | VN Ori | Original | 85.53 | 91.04 | 70.74 | 79.76 | 74.87 | 35.11 |
| | | Overcast | 82.2 | 88.79 | 68.05 | 79.2 | 69.97 | 29.03 |
| | | Foggy | 81.02 | 88.76 | 67.53 | 79.21 | 70.94 | 25.84 |
| | | Dusty | 55.05 | 85.8 | 52.03 | 78.16 | 64.11 | 17.63 |
| | | Snowy | 55.18 | 64.89 | 29.18 | 43.01 | 65.88 | 2.09 |
| Exp. 2 | + VN Weather (w/o. snow) | Original | 84.66 | 91.23 | 69.54 | 78.93 | 74.67 | 43.09 |
| | | Overcast | 84.65 | 90.95 | 69.48 | 78.84 | 73.91 | 40.01 |
| | | Foggy | 84.86 | 91.12 | 69.53 | 78.9 | 74.33 | 41.01 |
| | | Dusty | 84.65 | 91.12 | 69.48 | 78.94 | 74.47 | 39.99 |
| | | Snowy | 58.18 | 67.62 | 27.69 | 54.05 | 68.88 | 0.69 |
| Exp. 3 | + VN All Weather (w. snow) | Original | 82.76 | 90.13 | 67.44 | 79.35 | 74.01 | 46.42 |
| | | Overcast | 82.61 | 89.64 | 67.61 | 78.86 | 70.23 | 44.26 |
| | | Foggy | 82.57 | 89.31 | 67.76 | 78.95 | 71.18 | 44.42 |
| | | Dusty | 83.14 | 89.86 | 68.62 | 79.38 | 72.64 | 43.74 |
| | | Snowy | 74.67 | 84.41 | 55.45 | 70.99 | 74.31 | 16.75 |
| Potsdam Domain | | | | | | | | |
| Exp. 4 | Potsdam Ori | Original | 83.99 | 91.72 | 73.14 | 75.17 | 83.36 | 37.06 |
| | | Overcast | 79.97 | 89.04 | 63.71 | 69.04 | 80.99 | 29.9 |
| | | Foggy | 78.02 | 88.64 | 66.13 | 71.7 | 80.99 | 29.08 |
| | | Dusty | 26.81 | 70.62 | 23.48 | 58.51 | 57.37 | 7.72 |
| | | Snowy | 65.22 | 49.05 | 15.73 | 57.04 | 40.1 | 14.46 |
| Exp. 5 | + VN Original | Original | 83.77 | 91.67 | 73.18 | 75.48 | 82.91 | 39.03 |
| | | Overcast | 74.36 | 89.04 | 56.01 | 69.25 | 80.81 | 25.44 |
| | | Foggy | 71.61 | 88.16 | 52.72 | 70.62 | 81.0 | 26.47 |
| | | Dusty | 44.3 | 76.9 | 40.87 | 59.63 | 72.79 | 11.14 |
| | | Snowy | 65.58 | 49.94 | 15.27 | 55.95 | 39.89 | 14.34 |
| Exp. 6 | + VN Weather (w/o. snow) | Original | 83.76 | 91.67 | 72.61 | 74.98 | 82.3 | 39.42 |
| | | Overcast | 78.05 | 89.01 | 64.45 | 70.77 | 79.78 | 27.6 |
| | | Foggy | 77.4 | 89.5 | 65.26 | 71.77 | 80.12 | 26.45 |
| | | Dusty | 76.46 | 85.59 | 65.68 | 71.44 | 80.01 | 33.2 |
| | | Snowy | 66.61 | 52.04 | 18.11 | 55.58 | 41.7 | 14.95 |
| Exp. 7 | + VN All Weather (w. snow) | Original | 83.94 | 91.69 | 72.7 | 75.15 | 83.34 | 39.8 |
| | | Overcast | 80.97 | 89.6 | 66.85 | 70.67 | 81.81 | 35.46 |
| | | Foggy | 79.84 | 89.93 | 66.73 | 72.22 | 82.11 | 34.03 |
| | | Dusty | 77.67 | 88.16 | 67.32 | 72.81 | 82.07 | 36.02 |
| | | Snowy | 69.83 | 56.58 | 30.66 | 63.44 | 38.37 | 17.96 |
| ID | Training Config | Weather | Imp. Surf. | Building | Low Veg. | Tree | Car | Clutter |
|---|---|---|---|---|---|---|---|---|
| Vaihingen Domain | | | | | | | | |
| Exp. 1 | VN Ori | Original | 92.2 | 95.31 | 82.86 | 88.74 | 85.63 | 51.97 |
| | | Overcast | 90.23 | 94.06 | 80.99 | 88.39 | 82.34 | 45.00 |
| | | Foggy | 89.52 | 94.05 | 80.62 | 88.4 | 83.0 | 41.07 |
| | | Dusty | 71.01 | 92.36 | 68.45 | 87.74 | 78.13 | 29.98 |
| | | Snowy | 71.09 | 78.66 | 44.95 | 59.96 | 79.43 | 4.09 |
| Exp. 2 | + VN Weather (w/o. snow) | Original | 91.7 | 95.41 | 82.04 | 88.23 | 85.5 | 60.23 |
| | | Overcast | 91.69 | 95.26 | 81.99 | 88.17 | 85.0 | 57.15 |
| | | Foggy | 91.81 | 95.35 | 82.03 | 88.21 | 85.28 | 58.17 |
| | | Dusty | 91.69 | 95.35 | 81.99 | 88.23 | 85.36 | 57.13 |
| | | Snowy | 73.55 | 80.67 | 43.16 | 70.13 | 81.57 | 1.37 |
| Exp. 3 | + VN All Weather (w. snow) | Original | 90.57 | 94.81 | 80.55 | 88.48 | 85.06 | 63.4 |
| | | Overcast | 90.48 | 94.54 | 80.68 | 88.18 | 82.51 | 61.36 |
| | | Foggy | 90.45 | 94.35 | 80.78 | 88.23 | 83.16 | 61.52 |
| | | Dusty | 90.79 | 94.66 | 81.39 | 88.5 | 84.15 | 60.86 |
| | | Snowy | 85.43 | 91.52 | 71.11 | 82.96 | 85.25 | 26.56 |
| Potsdam Domain | | | | | | | | |
| Exp. 4 | Potsdam Ori | Original | 91.3 | 95.68 | 84.49 | 85.82 | 90.92 | 54.08 |
| | | Overcast | 88.87 | 94.2 | 77.83 | 81.69 | 89.5 | 46.03 |
| | | Foggy | 87.65 | 93.98 | 79.61 | 83.52 | 89.5 | 45.05 |
| | | Dusty | 42.28 | 82.78 | 38.03 | 73.83 | 72.91 | 14.33 |
| | | Snowy | 78.93 | 65.79 | 27.01 | 72.61 | 57.22 | 25.18 |
| Exp. 5 | + VN Original | Original | 91.17 | 95.66 | 84.52 | 86.03 | 90.66 | 56.15 |
| | | Overcast | 85.3 | 94.2 | 71.8 | 81.83 | 89.39 | 40.56 |
| | | Foggy | 83.45 | 93.71 | 69.04 | 82.78 | 89.5 | 41.86 |
| | | Dusty | 61.4 | 86.94 | 58.02 | 74.71 | 84.25 | 20.04 |
| | | Snowy | 79.2 | 66.58 | 26.34 | 71.72 | 57.01 | 24.97 |
| Exp. 6 | + VN Weather (w/o. snow) | Original | 91.16 | 95.66 | 84.13 | 85.7 | 90.29 | 56.55 |
| | | Overcast | 87.67 | 94.18 | 78.38 | 82.89 | 88.75 | 43.27 |
| | | Foggy | 87.26 | 94.46 | 78.98 | 83.56 | 88.96 | 41.84 |
| | | Dusty | 86.66 | 92.24 | 79.29 | 83.34 | 88.9 | 49.85 |
| | | Snowy | 79.94 | 68.43 | 30.5 | 71.41 | 58.83 | 25.96 |
| Exp. 7 | + VN All Weather (w. snow) | Original | 91.27 | 95.67 | 84.19 | 85.81 | 90.91 | 56.94 |
| | | Overcast | 89.48 | 94.52 | 80.13 | 82.82 | 90.0 | 52.35 |
| | | Foggy | 88.79 | 94.7 | 80.04 | 83.87 | 90.17 | 50.78 |
| | | Dusty | 87.43 | 93.71 | 80.47 | 84.27 | 90.15 | 52.96 |
| | | Snowy | 82.21 | 72.24 | 46.88 | 77.62 | 55.42 | 30.41 |
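Assuming the per-class figures above are IoU percentages, the aggregate mIoU is their unweighted mean over the six classes. For example, for Exp. 1 on the original Vaihingen split:

```python
# Unweighted mean of the six per-class scores (Exp. 1, Vaihingen, Original).
# Treating the table entries as per-class IoU (%) is an assumption here.
per_class = {
    "Imp. Surf.": 85.53, "Building": 91.04, "Low Veg.": 70.74,
    "Tree": 79.76, "Car": 74.87, "Clutter": 35.11,
}
miou = sum(per_class.values()) / len(per_class)
print(round(miou, 2))  # -> 72.84
```

Note that the unweighted mean gives the rare Clutter class the same influence as the dominant classes, which is why the large Clutter drops under dusty and snowy conditions pull the aggregate down sharply.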
Appendix A.1.1. Per-Class Performance Analysis: Intra-Distribution Validation
Appendix A.1.2. Per-Class Performance Analysis: Cross-Distribution Validation
References
- Pi, Y.; Nath, N.D.; Behzadan, A.H. Convolutional neural networks for object detection in aerial imagery for disaster response and recovery. Adv. Eng. Inform. 2020, 43, 101009. [Google Scholar] [CrossRef]
- Wang, Y.; Wang, Z.; Nakano, Y.; Nishimatsu, K.; Hasegawa, K.; Ohya, J. Context Enhanced Traffic Segmentation: Traffic jam and road surface segmentation from aerial image. In Proceedings of the 2022 IEEE 14th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP), Nafplio, Greece, 26–29 June 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–5. [Google Scholar]
- Liang, Y.; Li, X.; Tsai, B.; Chen, Q.; Jafari, N. V-FloodNet: A video segmentation system for urban flood detection and quantification. Environ. Model. Softw. 2023, 160, 105586. [Google Scholar] [CrossRef]
- Li, X.; He, H.; Li, X.; Li, D.; Cheng, G.; Shi, J.; Weng, L.; Tong, Y.; Lin, Z. Pointflow: Flowing semantics through points for aerial image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 4217–4226. [Google Scholar]
- Wang, Y.; Wang, Z.; Nakano, Y.; Hasegawa, K.; Ishii, H.; Ohya, J. MAC: Multi-Scales Attention Cascade for Aerial Image Segmentation. In Proceedings of the 13th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2024, Science and Technology Publications, Lda, Rome, Italy, 24–26 February 2024; pp. 37–47. [Google Scholar]
- Toker, A.; Eisenberger, M.; Cremers, D.; Leal-Taixé, L. Satsynth: Augmenting image-mask pairs through diffusion models for aerial semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–21 June 2024; pp. 27695–27705. [Google Scholar]
- Dai, D.; Van Gool, L. Dark model adaptation: Semantic image segmentation from daytime to nighttime. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 3819–3824. [Google Scholar]
- Michaelis, C.; Mitzkus, B.; Geirhos, R.; Rusak, E.; Bringmann, O.; Ecker, A.S.; Bethge, M.; Brendel, W. Benchmarking robustness in object detection: Autonomous driving when winter is coming. arXiv 2019, arXiv:1907.07484. [Google Scholar]
- Sun, T.; Segu, M.; Postels, J.; Wang, Y.; Van Gool, L.; Schiele, B.; Tombari, F.; Yu, F. SHIFT: A synthetic driving dataset for continuous multi-task domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 21371–21382. [Google Scholar]
- International Society for Photogrammetry and Remote Sensing (ISPRS). ISPRS 2D Semantic Labeling Contest. Available online: https://www.isprs.org/resources/datasets/benchmarks/UrbanSemLab/semantic-labeling.aspx (accessed on 21 July 2025).
- Rottensteiner, F.; Sohn, G.; Gerke, M.; Wegner, J.D.; Breitkopf, U.; Jung, J. Results of the ISPRS benchmark on urban object detection and 3D building reconstruction. ISPRS J. Photogramm. Remote Sens. 2014, 93, 256–271. [Google Scholar] [CrossRef]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Virtual, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Xiao, T.; Liu, Y.; Zhou, B.; Jiang, Y.; Sun, J. Unified perceptual parsing for scene understanding. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 418–434. [Google Scholar]
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818. [Google Scholar]
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3146–3154. [Google Scholar]
- Kirillov, A.; Wu, Y.; He, K.; Girshick, R. Pointrend: Image segmentation as rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–18 June 2020; pp. 9799–9808. [Google Scholar]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Strudel, R.; Garcia, R.; Laptev, I.; Schmid, C. Segmenter: Transformer for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 10–17 October 2021; pp. 7262–7272. [Google Scholar]
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890. [Google Scholar]
- Waqas Zamir, S.; Arora, A.; Gupta, A.; Khan, S.; Sun, G.; Shahbaz Khan, F.; Zhu, F.; Shao, L.; Xia, G.S.; Bai, X. iSAID: A large-scale dataset for instance segmentation in aerial images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Long Beach, CA, USA, 16–20 June 2019; pp. 28–37. [Google Scholar]
- Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; et al. Language models are few-shot learners. arXiv 2020, arXiv:2005.14165. [Google Scholar] [CrossRef]
- Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. Gpt-4 technical report. arXiv 2023, arXiv:2303.08774. [Google Scholar] [CrossRef]
- Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. Llama: Open and efficient foundation language models. arXiv 2023, arXiv:2302.13971. [Google Scholar] [CrossRef]
- Touvron, H.; Martin, L.; Stone, K.; Albert, P.; Almahairi, A.; Babaei, Y.; Bashlykov, N.; Batra, S.; Bhargava, P.; Bhosale, S.; et al. Llama 2: Open foundation and fine-tuned chat models. arXiv 2023, arXiv:2307.09288. [Google Scholar] [CrossRef]
- Bai, J.; Bai, S.; Chu, Y.; Cui, Z.; Dang, K.; Deng, X.; Fan, Y.; Ge, W.; Han, Y.; Huang, F.; et al. Qwen technical report. arXiv 2023, arXiv:2309.16609. [Google Scholar] [CrossRef]
- Yang, A.; Li, A.; Yang, B.; Zhang, B.; Hui, B.; Zheng, B.; Yu, B.; Gao, C.; Huang, C.; Lv, C.; et al. Qwen3 technical report. arXiv 2025, arXiv:2505.09388. [Google Scholar] [CrossRef]
- Liu, A.; Feng, B.; Xue, B.; Wang, B.; Wu, B.; Lu, C.; Zhao, C.; Deng, C.; Zhang, C.; Ruan, C.; et al. Deepseek-v3 technical report. arXiv 2024, arXiv:2412.19437. [Google Scholar]
- Guo, D.; Yang, D.; Zhang, H.; Song, J.; Zhang, R.; Xu, R.; Zhu, Q.; Ma, S.; Wang, P.; Bi, X.; et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv 2025, arXiv:2501.12948. [Google Scholar]
- Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–22 June 2022; pp. 10684–10695. [Google Scholar]
- Podell, D.; English, Z.; Lacey, K.; Blattmann, A.; Dockhorn, T.; Müller, J.; Penna, J.; Rombach, R. Sdxl: Improving latent diffusion models for high-resolution image synthesis. arXiv 2023, arXiv:2307.01952. [Google Scholar] [CrossRef]
- Ramesh, A.; Dhariwal, P.; Nichol, A.; Chu, C.; Chen, M. Hierarchical text-conditional image generation with clip latents. arXiv 2022, arXiv:2204.06125. [Google Scholar] [CrossRef]
- Betker, J.; Goh, G.; Jing, L.; Brooks, T.; Wang, J.; Li, L.; Ouyang, L.; Zhuang, J.; Lee, J.; Guo, Y.; et al. Improving image generation with better captions. Comput. Sci. 2023, 2, 8. [Google Scholar]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2014; Volume 27, Available online: https://proceedings.neurips.cc/paper_files/paper/2014/hash/f033ed80deb0234979a61f95710dbe25-Abstract.html (accessed on 25 October 2025).
- Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
- Zhang, H.; Xu, T.; Li, H.; Zhang, S.; Wang, X.; Huang, X.; Metaxas, D.N. Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5907–5915. [Google Scholar]
- Brock, A.; Donahue, J.; Simonyan, K. Large Scale GAN Training for High Fidelity Natural Image Synthesis. arXiv 2018, arXiv:1809.11096. [Google Scholar]
- Karras, T.; Laine, S.; Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4401–4410. [Google Scholar]
- Zhang, H.; Goodfellow, I.; Metaxas, D.; Odena, A. Self-attention generative adversarial networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 7354–7363. [Google Scholar]
- Kingma, D.P.; Welling, M. Auto-encoding variational Bayes. arXiv 2013, arXiv:1312.6114. [Google Scholar]
- Vahdat, A.; Kautz, J. NVAE: A deep hierarchical variational autoencoder. Adv. Neural Inf. Process. Syst. 2020, 33, 19667–19679. [Google Scholar]
- Sohl-Dickstein, J.; Weiss, E.; Maheswaranathan, N.; Ganguli, S. Deep Unsupervised Learning using Nonequilibrium Thermodynamics. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 7–9 July 2015; Volume 37, pp. 2256–2265. [Google Scholar]
- Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851. [Google Scholar]
- Zhang, L.; Rao, A.; Agrawala, M. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 3836–3847. [Google Scholar]
- Luo, Z.; Gustafsson, F.K.; Zhao, Z.; Sjölund, J.; Schön, T.B. Refusion: Enabling large-size realistic image restoration with latent-space diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 1680–1691. [Google Scholar]
- Li, T.; Chang, H.; Mishra, S.; Zhang, H.; Katabi, D.; Krishnan, D. Mage: Masked generative encoder to unify representation learning and image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 2142–2152. [Google Scholar]
- Khanna, S.; Liu, P.; Zhou, L.; Meng, C.; Rombach, R.; Burke, M.; Lobell, D.; Ermon, S. Diffusionsat: A generative foundation model for satellite imagery. arXiv 2023, arXiv:2312.03606. [Google Scholar] [CrossRef]
- Peebles, W.; Xie, S. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 4195–4205. [Google Scholar]
- Xu, Y.; Yu, W.; Ghamisi, P.; Kopp, M.; Hochreiter, S. Txt2Img-MHN: Remote sensing image generation from text using modern Hopfield networks. IEEE Trans. Image Process. 2023, 32, 5737–5750. [Google Scholar] [CrossRef] [PubMed]
- Sastry, S.; Khanal, S.; Dhakal, A.; Jacobs, N. Geosynth: Contextually-aware high-resolution satellite image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–21 June 2024; pp. 460–470. [Google Scholar]
- Gatys, L.A.; Ecker, A.S.; Bethge, M. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2414–2423. [Google Scholar]
- Li, C.; Wand, M. Precomputed real-time texture synthesis with markovian generative adversarial networks. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 702–716. [Google Scholar]
- Deng, Y.; Tang, F.; Dong, W.; Ma, C.; Pan, X.; Wang, L.; Xu, C. Stytr2: Image style transfer with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11326–11336. [Google Scholar]
- Brooks, T.; Holynski, A.; Efros, A.A. InstructPix2Pix: Learning To Follow Image Editing Instructions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 18392–18402. [Google Scholar]
- Wang, Z.; Zhao, L.; Xing, W. StyleDiffusion: Controllable Disentangled Style Transfer via Diffusion Models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–6 October 2023; pp. 7677–7689. [Google Scholar]
- Zhang, Y.; Huang, N.; Tang, F.; Huang, H.; Ma, C.; Dong, W.; Xu, C. Inversion-based style transfer with diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 10146–10156. [Google Scholar]
- Sohn, K.; Jiang, L.; Barber, J.; Lee, K.; Ruiz, N.; Krishnan, D.; Chang, H.; Li, Y.; Essa, I.; Rubinstein, M.; et al. Styledrop: Text-to-image synthesis of any style. Adv. Neural Inf. Process. Syst. 2024, 36, 66860–66889. [Google Scholar]
- Chung, J.; Hyun, S.; Heo, J.P. Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 8795–8805. [Google Scholar]
- Shen, Y.; Song, K.; Tan, X.; Li, D.; Lu, W.; Zhuang, Y. Hugginggpt: Solving ai tasks with chatgpt and its friends in hugging face. Adv. Neural Inf. Process. Syst. 2023, 36, 38154–38180. [Google Scholar]
- Qin, J.; Wu, J.; Chen, W.; Ren, Y.; Li, H.; Wu, H.; Xiao, X.; Wang, R.; Wen, S. Diffusiongpt: Llm-driven text-to-image generation system. arXiv 2024, arXiv:2401.10061. [Google Scholar]
- Liu, Z.; He, Y.; Wang, W.; Wang, W.; Wang, Y.; Chen, S.; Zhang, Q.; Yang, Y.; Li, Q.; Yu, J.; et al. Internchat: Solving vision-centric tasks by interacting with chatbots beyond language. arXiv 2023, arXiv:2305.05662. [Google Scholar]
- Wang, Z.; Xie, E.; Li, A.; Wang, Z.; Liu, X.; Li, Z. Divide and conquer: Language models can plan and self-correct for compositional text-to-image generation. arXiv 2024, arXiv:2401.15688. [Google Scholar]
- Wang, Z.; Li, A.; Li, Z.; Liu, X. Genartist: Multimodal llm as an agent for unified image generation and editing. Adv. Neural Inf. Process. Syst. 2024, 37, 128374–128395. [Google Scholar]
- Epic Games. Unreal Engine. 2025. Available online: https://www.unrealengine.com/en-us (accessed on 21 July 2025).
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) 2017, 39, 2481–2495. [Google Scholar] [CrossRef]
- Lin, G.; Milan, A.; Shen, C.; Reid, I. Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1925–1934. [Google Scholar]
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848. [Google Scholar] [CrossRef]
- Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587. [Google Scholar] [CrossRef]
- Guo, M.H.; Lu, C.Z.; Hou, Q.; Liu, Z.; Cheng, M.M.; Hu, S.M. Segnext: Rethinking convolutional attention design for semantic segmentation. Adv. Neural Inf. Process. Syst. 2022, 35, 1140–1156. [Google Scholar]
- Zheng, S.; Lu, J.; Zhao, H.; Zhu, X.; Luo, Z.; Wang, Y.; Fu, Y.; Feng, J.; Xiang, T.; Torr, P.H.; et al. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 20–25 June 2021; pp. 6881–6890. [Google Scholar]
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090. [Google Scholar]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Zheng, Z.; Zhong, Y.; Wang, J.; Ma, A. Foreground-aware relation network for geospatial object segmentation in high spatial resolution remote sensing imagery. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 4096–4105. [Google Scholar]
- Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3974–3983. [Google Scholar]
- Johnson, J.; Alahi, A.; Fei-Fei, L. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 694–711. [Google Scholar]
- Huang, X.; Belongie, S. Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1501–1510. [Google Scholar]
- Chen, H.; Wang, Z.; Zhang, H.; Zuo, Z.; Li, A.; Xing, W.; Lu, D. Artistic style transfer with internal-external learning and contrastive learning. Adv. Neural Inf. Process. Syst. 2021, 34, 26561–26573. [Google Scholar]
- Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. Lora: Low-rank adaptation of large language models. In Proceedings of the International Conference on Learning Representations (ICLR), 2022. [Google Scholar]
- Shah, V.; Ruiz, N.; Cole, F.; Lu, E.; Lazebnik, S.; Li, Y.; Jampani, V. Ziplora: Any subject in any style by effectively merging loras. In Proceedings of the European Conference on Computer Vision, Milano, Italy, 28 September–4 October 2024; Springer: Berlin/Heidelberg, Germany, 2024; pp. 422–438. [Google Scholar]
- Liu, C.; Shah, V.; Cui, A.; Lazebnik, S. Unziplora: Separating content and style from a single image. arXiv 2024, arXiv:2412.04465. [Google Scholar]
- Jones, M.; Wang, S.Y.; Kumari, N.; Bau, D.; Zhu, J.Y. Customizing text-to-image models with a single image pair. In Proceedings of the SIGGRAPH Asia 2024 Conference Papers, Tokyo, Japan, 3–6 December 2024; pp. 1–13. [Google Scholar]
- Frenkel, Y.; Vinker, Y.; Shamir, A.; Cohen-Or, D. Implicit style-content separation using B-LoRA. In Proceedings of the European Conference on Computer Vision, Milano, Italy, 28 September–4 October 2024; Springer: Berlin/Heidelberg, Germany, 2024; pp. 181–198. [Google Scholar]
- Chen, B.; Zhao, B.; Xie, H.; Cai, Y.; Li, Q.; Mao, X. ConsisLoRA: Enhancing content and style consistency for LoRA-based style transfer. arXiv 2025, arXiv:2503.10614. [Google Scholar]
- Ben-David, S.; Blitzer, J.; Crammer, K.; Kulesza, A.; Pereira, F.; Vaughan, J.W. A theory of learning from different domains. Mach. Learn. 2010, 79, 151–175. [Google Scholar] [CrossRef]
- Khosla, A.; Zhou, T.; Malisiewicz, T.; Efros, A.A.; Torralba, A. Undoing the damage of dataset bias. In Proceedings of the European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 158–171. [Google Scholar]
- Muandet, K.; Balduzzi, D.; Schölkopf, B. Domain generalization via invariant feature representation. In Proceedings of the International Conference on Machine Learning, PMLR, Atlanta, GA, USA, 16–21 June 2013; pp. 10–18. [Google Scholar]
- Tobin, J.; Fong, R.; Ray, A.; Schneider, J.; Zaremba, W.; Abbeel, P. Domain randomization for transferring deep neural networks from simulation to the real world. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 23–30. [Google Scholar]
- Volpi, R.; Namkoong, H.; Sener, O.; Duchi, J.C.; Murino, V.; Savarese, S. Generalizing to unseen domains via adversarial data augmentation. Adv. Neural Inf. Process. Syst. 2018, 31, 5339–5349. Available online: https://proceedings.neurips.cc/paper_files/paper/2018/file/1d94108e907bb8311d8802b48fd54b4a-Paper.pdf (accessed on 25 October 2025).
- Tzeng, E.; Hoffman, J.; Saenko, K.; Darrell, T. Adversarial discriminative domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7167–7176. [Google Scholar]
- Wang, M.; Deng, W. Deep visual domain adaptation: A survey. Neurocomputing 2018, 312, 135–153. [Google Scholar] [CrossRef]
- Farahani, A.; Voghoei, S.; Rasheed, K.; Arabnia, H.R. A brief review of domain adaptation. In Advances in Data Science and Information Engineering: Proceedings from ICDATA 2020 and IKE 2020; Springer: Cham, Switzerland, 2021; pp. 877–894. [Google Scholar]
- Yao, S.; Zhao, J.; Yu, D.; Du, N.; Shafran, I.; Narasimhan, K.; Cao, Y. ReAct: Synergizing reasoning and acting in language models. In Proceedings of the International Conference on Learning Representations (ICLR), Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Figurnov, M.; Mohamed, S.; Mnih, A. Implicit reparameterization gradients. Adv. Neural Inf. Process. Syst. 2018, 31, 439–450. [Google Scholar]
- Song, J.; Meng, C.; Ermon, S. Denoising diffusion implicit models. arXiv 2020, arXiv:2010.02502. [Google Scholar]
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 8748–8763. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
- Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101. [Google Scholar]
- OpenMMLab. MMSegmentation: OpenMMLab Semantic Segmentation Toolbox and Benchmark. 2020. Available online: https://github.com/open-mmlab/mmsegmentation (accessed on 24 October 2025).
- Cordts, M.; Omran, M.; Ramos, S.; Scharwächter, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes dataset. In Proceedings of the CVPR Workshop on the Future of Datasets in Vision, Boston, MA, USA, 7–12 June 2015; Volume 2, p. 1. [Google Scholar]
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755. [Google Scholar]
- Zhu, L. THOP: PyTorch-OpCounter. 2025. Available online: https://pypi.org/project/thop/ (accessed on 25 October 2025).
- Wang, J.; Zheng, Z.; Ma, A.; Lu, X.; Zhong, Y. LoveDA: A remote sensing land-cover dataset for domain adaptive semantic segmentation. arXiv 2021, arXiv:2110.08733. [Google Scholar]
- Anthropic. Model Context Protocol: Getting Started. 2025. Available online: https://modelcontextprotocol.io/docs/getting-started/intro (accessed on 26 October 2025).

| Model | Backbone | Original | Overcast | Foggy | Dusty | Snowy |
|---|---|---|---|---|---|---|
| UperNet [15] | Swin-T [12] | 73.26 | 68.27 | 66.66 | 66.46 | 43.54 |
| UperNet | ResNet-50 [13] | 73.33 | 68.47 | 68.18 | 56.16 | 42.52 |
| UperNet | ViT-B [14] | 72.47 | 71.47 | 71.43 | 67.62 | 46.66 |
| DeepLabv3+ [16] | ResNet-50 | 72.84 | 69.54 | 68.89 | 58.80 | 43.37 |
| DANet [17] | ResNet-50 | 72.47 | 69.17 | 68.44 | 60.82 | 42.47 |
| PointRend [18] | ResNet-50 | 72.67 | 69.56 | 69.48 | 57.71 | 42.58 |
| FCN [19] | ResNet-50 | 72.79 | 67.78 | 66.81 | 61.98 | 43.62 |
| Segmenter [20] | ViT-B | 68.93 | 67.28 | 67.10 | 64.61 | 44.98 |
| PSPNet [21] | ResNet-50 | 72.91 | 70.00 | 69.41 | 62.55 | 44.60 |
| Average | | 72.41 | 69.06 | 68.49 | 61.86 | 43.82 |

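As a quick sanity check, the "Average" row of the table above can be reproduced by taking the unweighted mean of the nine per-model scores in each weather column. A minimal sketch (values transcribed from the table; only two columns shown):

```python
# Per-model scores transcribed from the table above.
original = [73.26, 73.33, 72.47, 72.84, 72.47, 72.67, 72.79, 68.93, 72.91]
snowy = [43.54, 42.52, 46.66, 43.37, 42.47, 42.58, 43.62, 44.98, 44.60]

def column_average(scores):
    """Unweighted column mean, rounded to two decimals as in the table."""
    return round(sum(scores) / len(scores), 2)

print(column_average(original))  # matches the reported 72.41
print(column_average(snowy))     # matches the reported 43.82
```

The same procedure reproduces the Average rows of the other result tables, confirming they are plain column means rather than weighted aggregates.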
| Model | Backbone | Original | Overcast | Foggy | Dusty | Snowy |
|---|---|---|---|---|---|---|
| UperNet [15] | Swin-T [12] | 83.00 | 78.89 | 76.91 | 78.27 | 56.83 |
| UperNet | ResNet-50 [13] | 83.12 | 79.42 | 79.12 | 68.77 | 55.55 |
| UperNet | ViT-B [14] | 82.52 | 81.90 | 81.96 | 78.46 | 59.14 |
| DeepLabv3+ [16] | ResNet-50 | 82.78 | 80.17 | 79.44 | 71.28 | 56.36 |
| DANet [17] | ResNet-50 | 82.57 | 79.92 | 79.20 | 73.16 | 55.45 |
| PointRend [18] | ResNet-50 | 82.77 | 80.46 | 80.36 | 70.20 | 55.56 |
| FCN [19] | ResNet-50 | 82.84 | 78.63 | 77.41 | 74.15 | 56.39 |
| Segmenter [20] | ViT-B | 79.46 | 78.39 | 78.38 | 75.86 | 57.41 |
| PSPNet [21] | ResNet-50 | 82.75 | 80.47 | 79.85 | 73.74 | 57.25 |
| Average | | 82.42 | 79.81 | 79.18 | 73.77 | 56.66 |

| ID | Training Configuration | Original | Overcast | Foggy | Dusty | Snowy |
|---|---|---|---|---|---|---|
| *Vaihingen Domain* | | | | | | |
| Exp. 1 | Vaihingen (VN) Ori | 72.84 | 69.54 | 68.89 | 58.80 | 43.37 |
| Exp. 2 | + VN Weather (w/o. snow) | 73.69 | 72.97 | 73.29 | 73.11 | 46.18 |
| Exp. 3 | + VN All Weather (w. snow) | 73.35 | 72.20 | 72.36 | 72.90 | 62.76 |
| *Potsdam Domain* | | | | | | |
| Exp. 4 | Potsdam Ori | 74.07 | 68.77 | 69.09 | 40.75 | 40.27 |
| Exp. 5 | + VN Original | 74.34 | 65.82 | 65.10 | 50.94 | 40.16 |
| Exp. 6 | + VN Weather (w/o. snow) | 74.12 | 68.28 | 68.42 | 68.73 | 41.50 |
| Exp. 7 | + VN All Weather (w. snow) | 74.44 | 70.89 | 70.81 | 70.67 | 46.14 |

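The Vaihingen-domain ablation above (Exp. 1 vs. Exp. 3) can be summarized as per-condition score gains from adding the synthetic all-weather training data. A small sketch using the Exp. 1 and Exp. 3 rows transcribed from the table:

```python
# Vaihingen-domain rows from the ablation table above:
# Exp. 1 trains on original imagery only; Exp. 3 adds all synthetic weather.
conditions = ["Original", "Overcast", "Foggy", "Dusty", "Snowy"]
exp1 = [72.84, 69.54, 68.89, 58.80, 43.37]
exp3 = [73.35, 72.20, 72.36, 72.90, 62.76]

# Gain per condition, rounded to two decimals.
gains = {c: round(b - a, 2) for c, a, b in zip(conditions, exp1, exp3)}
for condition, gain in gains.items():
    print(f"{condition}: {gain:+.2f}")
# The largest gains appear under the dusty and snowy conditions,
# i.e., exactly where the un-augmented baseline degrades most.
```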
| ID | Training Configuration | Original | Overcast | Foggy | Dusty | Snowy |
|---|---|---|---|---|---|---|
| *Vaihingen Domain* | | | | | | |
| Exp. 1 | Vaihingen (VN) Ori | 82.78 | 80.17 | 79.44 | 71.28 | 56.36 |
| Exp. 2 | + VN Weather (w/o. snow) | 83.85 | 83.21 | 83.47 | 83.29 | 58.41 |
| Exp. 3 | + VN All Weather (w. snow) | 83.81 | 82.96 | 83.08 | 83.39 | 73.80 |
| *Potsdam Domain* | | | | | | |
| Exp. 4 | Potsdam Ori | 83.72 | 79.69 | 79.89 | 54.03 | 54.46 |
| Exp. 5 | + VN Original | 84.03 | 77.18 | 76.73 | 64.23 | 54.30 |
| Exp. 6 | + VN Weather (w/o. snow) | 83.92 | 79.19 | 79.18 | 80.04 | 55.85 |
| Exp. 7 | + VN All Weather (w. snow) | 84.13 | 81.55 | 81.39 | 81.50 | 60.80 |

| Model | Backbone | Original | Overcast | Foggy | Dusty | Snowy |
|---|---|---|---|---|---|---|
| UperNet [15] | Swin-T [12] | 72.91 | 72.25 | 72.34 | 73.07 | 61.75 |
| UperNet | ResNet-50 [13] | 73.84 | 73.14 | 73.35 | 73.52 | 61.49 |
| UperNet | ViT-B [14] | 72.80 | 72.03 | 72.24 | 73.10 | 63.20 |
| DeepLabv3+ [16] | ResNet-50 | 73.35 | 72.20 | 72.36 | 72.90 | 62.76 |
| DANet [17] | ResNet-50 | 72.44 | 72.06 | 72.44 | 72.81 | 61.34 |
| PointRend [18] | ResNet-50 | 72.09 | 71.64 | 71.75 | 72.12 | 60.12 |
| FCN [19] | ResNet-50 | 72.68 | 71.37 | 71.55 | 72.42 | 60.37 |
| Segmenter [20] | ViT-B | 69.38 | 68.86 | 68.94 | 68.96 | 59.68 |
| PSPNet [21] | ResNet-50 | 73.07 | 72.76 | 73.01 | 73.13 | 61.64 |
| Average | | 72.51 | 71.81 | 72.00 | 72.45 | 61.37 |

| Model | Backbone | Original | Overcast | Foggy | Dusty | Snowy |
|---|---|---|---|---|---|---|
| UperNet [15] | Swin-T [12] | 82.68 | 82.14 | 82.23 | 82.88 | 72.64 |
| UperNet | ResNet-50 [13] | 84.04 | 83.42 | 83.61 | 83.74 | 72.42 |
| UperNet | ViT-B [14] | 82.63 | 82.78 | 82.10 | 82.92 | 74.13 |
| DeepLabv3+ [16] | ResNet-50 | 83.81 | 82.96 | 83.08 | 83.39 | 73.80 |
| DANet [17] | ResNet-50 | 82.60 | 82.27 | 82.61 | 82.93 | 71.92 |
| PointRend [18] | ResNet-50 | 82.68 | 82.32 | 82.40 | 82.69 | 71.65 |
| FCN [19] | ResNet-50 | 82.94 | 81.91 | 82.07 | 82.69 | 71.34 |
| Segmenter [20] | ViT-B | 80.20 | 79.78 | 79.88 | 79.86 | 71.04 |
| PSPNet [21] | ResNet-50 | 83.26 | 82.99 | 83.21 | 83.27 | 72.65 |
| Average | | 82.76 | 82.29 | 82.36 | 82.70 | 72.40 |

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, Y.; Wen, R.; Ishii, H.; Ohya, J. Multi-Weather DomainShifter: A Comprehensive Multi-Weather Transfer LLM Agent for Handling Domain Shift in Aerial Image Processing. J. Imaging 2025, 11, 395. https://doi.org/10.3390/jimaging11110395
