GCCG-RSI: Ground LiDAR and Image-Guided Geometry-Constrained Controllable Generation for Remote Sensing Image
Highlights
- A novel geometry-constrained controllable generation model is proposed to synthesize remote sensing images from ground-level images and corresponding point clouds.
- A dual remote sensing feature fusion module is designed to exploit the complementary characteristics of image and point cloud data and to guide the diffusion model toward realistic remote sensing imagery.
- This approach significantly enhances the fidelity and realism of synthesized remote sensing images while effectively reducing spatial structural randomness.
- It establishes a robust and efficient solution for cross-modal and cross-view image generation, offering significant value for observation in inaccessible areas like UAV no-fly zones and underground regions.
Abstract
1. Introduction
- (1) We propose a geometry-constrained controllable generation model called GCCG-RSI. This model mitigates geometric structural inaccuracies arising from the inherent randomness of remote sensing image generation.
- (2) We design a dual remote sensing image feature fusion module that leverages an attention mechanism to facilitate mutual guidance and information complementarity between the ground image and point clouds. Incorporating the fused features into the diffusion model as control conditions enhances the realism and geometric fidelity of the generated images.
- (3) We conduct a comprehensive experimental evaluation on two datasets across diverse environments. The results demonstrate that the proposed method robustly and consistently generates remote sensing images that can serve as valuable references for downstream tasks.
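As a rough illustration of the attention-based mutual guidance described in contribution (2), the sketch below applies single-head scaled dot-product cross-attention in both directions between two token sets. It is a minimal NumPy sketch under stated assumptions, not the module used in the paper: the function names, the token shapes, the residual connection, and the final concatenation are all hypothetical, and the learned query/key/value projections and multiple heads of a real attention layer are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_tokens, context_tokens):
    """Single-head scaled dot-product attention: queries attend to context.

    Hypothetical simplification: no learned projections, one head.
    """
    d = query_tokens.shape[-1]
    scores = query_tokens @ context_tokens.T / np.sqrt(d)  # (Nq, Nc)
    weights = softmax(scores, axis=-1)                      # rows sum to 1
    return weights @ context_tokens, weights

def fuse(image_tokens, cloud_tokens):
    """Assumed fusion: each modality attends to the other, then concatenate."""
    img_guided, _ = cross_attention(image_tokens, cloud_tokens)
    cld_guided, _ = cross_attention(cloud_tokens, image_tokens)
    # Residual connections keep each modality's own features (an assumption).
    return np.concatenate(
        [image_tokens + img_guided, cloud_tokens + cld_guided], axis=-1
    )

rng = np.random.default_rng(0)
image_tokens = rng.normal(size=(16, 32))  # 16 tokens, 32 channels (illustrative)
cloud_tokens = rng.normal(size=(16, 32))
fused = fuse(image_tokens, cloud_tokens)
print(fused.shape)  # (16, 64)
```

In a setup like this, the fused array would play the role of the control condition passed to the diffusion model; how the paper actually combines the two branches is not specified here.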
2. Related Work
2.1. Image Generation in Cross-View Localization
2.2. Diffusion Model for Image Generation
3. Materials and Methods
3.1. Image-Point Clouds Geometric Projection
3.2. Dual Remote Sensing Images Feature Fusion
3.2.1. Cross Attention Branch
3.2.2. Self Attention Branch
3.3. Geometric-Constrained Conditional Diffusion Model
Algorithm 1 Controlled Latent Diffusion for Cross-View Satellite Image Generation

Require: Ground-truth satellite image, two feature images, fused feature

Parameters: Encoder, decoder, denoising U-Net with ControlNet branch, noise schedule, learning rate

Training phase:

Inference phase:
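The step bodies of Algorithm 1 are not reproduced here, so the following is only a sketch of the standard noise-prediction objective that latent diffusion models with a control condition are commonly trained with: encode the target image, add noise at a random timestep, and regress the noise. The linear beta schedule, the function names, and the toy encoder and denoiser are assumptions for illustration; only the scale factor 0.18215 comes from the parameter settings in this article.

```python
import numpy as np

SCALE_FACTOR = 0.18215  # latent scale factor listed in the parameter settings

def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    # Assumed linear beta schedule; returns the cumulative product of (1 - beta).
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(z0, t, alpha_bar, rng):
    # Forward diffusion: z_t = sqrt(abar_t) * z0 + sqrt(1 - abar_t) * eps.
    eps = rng.standard_normal(z0.shape)
    z_t = np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return z_t, eps

def training_loss(encode, denoise, x, cond, alpha_bar, rng):
    # One training step's loss: mean squared error between true and predicted noise.
    z0 = SCALE_FACTOR * encode(x)
    t = int(rng.integers(0, len(alpha_bar)))
    z_t, eps = add_noise(z0, t, alpha_bar, rng)
    eps_pred = denoise(z_t, t, cond)
    return float(np.mean((eps_pred - eps) ** 2))

# Hypothetical stand-ins for the learned encoder and the conditioned U-Net.
def toy_encode(x):
    # 8x average pooling as a placeholder for a learned image encoder.
    h, w = x.shape[0] // 8, x.shape[1] // 8
    return x.reshape(h, 8, w, 8).mean(axis=(1, 3))

def toy_denoise(z_t, t, cond):
    # Placeholder that always predicts zero noise and ignores the condition.
    return np.zeros_like(z_t)

rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar()
x = rng.standard_normal((64, 64))   # stand-in for a satellite image
cond = np.zeros((8, 8))             # stand-in for the fused control feature
loss = training_loss(toy_encode, toy_denoise, x, cond, alpha_bar, rng)
print(loss)
```

In an actual training loop, the loss gradient would update the control branch (and any other trainable parts) at the configured learning rate; inference would start from Gaussian noise and iteratively denoise under the same condition before decoding.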
4. Results
4.1. Experimental Data and Evaluation Metrics
4.2. Evaluation Results
4.3. Ablation Study
4.4. Downstream Applications
| Type | Method | Generated Image: Dist (m) | Generated Image: Angle (°) | Original Image: Dist (m) | Original Image: Angle (°) |
|---|---|---|---|---|---|
| Image-to-image | LM [41] | 14.62 | 4.91 | 12.08 | 3.72 |
| | SliceMatch [48] | 8.52 | 5.05 | 7.96 | 4.12 |
| | CCVPE [49] | 3.77 | 4.51 | 1.22 | 0.67 |
| | Hu et al. [50] | 4.83 | 5.72 | 2.10 | 3.94 |
| | Song et al. [47] | 2.62 | 3.16 | 1.48 | 0.49 |
| LiDAR-to-image | Zhang et al. [51] | 6.39 | 5.12 | 5.88 | 3.42 |
| | Zhang et al. [52] | 5.76 | 3.90 | 4.49 | 2.27 |
| | Sun et al. [53] | 7.14 | 3.25 | 5.25 | 2.83 |
| | Wang et al. [54] | 8.47 | 5.31 | 6.76 | 3.21 |
| | Hu et al. [3] | 4.58 | 2.86 | 3.66 | 1.85 |
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Hu, D.; Yuan, X.; Xu, X.; Zhao, C. A Review of Ground-to-Aerial Cross-View Localization Research. J. Electron. Inf. Technol. 2025, 47, 5016–5032.
2. Szász, B.; Heil, B.; Kovács, G.; Mészáros, D.; Czimber, K. Comparison of Advanced Terrestrial and Aerial Remote Sensing Methods for Above-Ground Carbon Stock Estimation—A Comparative Case Study for a Hungarian Temperate Forest. Remote Sens. 2025, 17, 2173.
3. Hu, D.; Yuan, X.; Xi, H.; Li, J.; Song, Z.; Xiong, F.; Zhang, K.; Zhao, C. Road structure inspired UGV-satellite cross-view geo-localization. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 16767–16786.
4. Zhu, Y.; Chen, S.; Lu, X.; Chen, J. Cross-view image synthesis from a single image with progressive parallel GAN. IEEE Trans. Geosci. Remote Sens. 2023, 61, 4701513.
5. Niu, Z.; Li, Y.; Gong, Y.; Zhang, B.; He, Y.; Zhang, J.; Tian, M.; He, L. Multi-Class Guided GAN for Remote-Sensing Image Synthesis Based on Semantic Labels. Remote Sens. 2025, 17, 344.
6. Lai, Z.; Tang, C.; Lv, J. Multi-view image generation by cycle CVAE-GAN networks. In Proceedings of the International Conference on Neural Information Processing; Springer: Cham, Switzerland, 2019; pp. 43–54.
7. Cai, H.; Huang, W.; Yang, S.; Ding, S.; Zhang, Y.; Hu, B.; Zhang, F.; Cheung, Y.M. Realize generative yet complete latent representation for incomplete multi-view learning. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 46, 3637–3652.
8. Sordo, Z.; Chagnon, E.; Hu, Z.; Donatelli, J.J.; Andeer, P.; Nico, P.S.; Northen, T.; Ushizima, D. Synthetic scientific image generation with VAE, GAN, and diffusion model architectures. J. Imaging 2025, 11, 252.
9. Li, W.; He, J.; Ye, J.; Zhong, H.; Zheng, Z.; Huang, Z.; Lin, D.; He, C. Crossviewdiff: A cross-view diffusion model for satellite-to-street view synthesis. arXiv 2024, arXiv:2408.14765.
10. Seo, M.; Jung, J.; Choi, D.G. Improved flood insights: Diffusion-based SAR-to-EO image translation. Remote Sens. 2025, 17, 2260.
11. Guo, Z.; Hu, W.; Zheng, S.; Zhang, B.; Zhou, M.; Peng, J.; Yao, Z.; Feng, M. Efficient Conditional Diffusion Model for SAR Despeckling. Remote Sens. 2025, 17, 2970.
12. Lee, Y.; Kim, K.; Kim, H.; Sung, M. Syncdiffusion: Coherent montage via synchronized joint diffusions. Adv. Neural Inf. Process. Syst. 2023, 36, 50648–50660.
13. Lin, T.J.; Wang, W.; Shi, Y.; Perincherry, A.; Vora, A.; Li, H. Geometry-guided cross-view diffusion for one-to-many cross-view image synthesis. In Proceedings of the 2025 International Conference on 3D Vision (3DV); IEEE Computer Society: Washington, DC, USA, 2025; pp. 866–881.
14. Hu, D.; Yuan, X.; Zhao, C. Active layered topology mapping driven by road intersection. Knowl.-Based Syst. 2025, 315, 113305.
15. Ding, X.; Zhang, X.; Song, S.; Li, B.; Hui, L.; Dai, Y. Cross-View Geo-Localization via 3D Gaussian Splatting-Based Novel View Synthesis. Remote Sens. 2025, 17, 3673.
16. Bajbaa, K.; Anwar, A.; Saqib, M.; Anwar, H.; Sharma, N.; Usman, M. From Satellite to Street: A Hybrid Framework Integrating Stable Diffusion and PanoGAN for Consistent Cross-View Synthesis. arXiv 2025, arXiv:2509.24369.
17. Regmi, K.; Borji, A. Cross-view image synthesis using geometry-guided conditional GANs. Comput. Vis. Image Underst. 2019, 187, 102788.
18. Zhao, L.; Zhou, Y.; Hu, X.; Gan, W.; Huang, G.; Zhang, C.; Hou, M. Street-to-satellite view synthesis for cross-view geo-localization. In Proceedings of the International Conference on Remote Sensing Technology and Survey Mapping (RSTSM 2024); SPIE: Bellingham, WA, USA, 2024; Volume 13166, pp. 48–54.
19. Wu, S.; Tang, H.; Jing, X.Y.; Zhao, H.; Qian, J.; Sebe, N.; Yan, Y. Cross-view panorama image synthesis. IEEE Trans. Multimed. 2022, 25, 3546–3559.
20. Shi, Y.; Campbell, D.; Yu, X.; Li, H. Geometry-guided street-view panorama synthesis from satellite imagery. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 10009–10022.
21. Li, G.; Qian, M.; Xia, G.S. Unleashing unlabeled data: A paradigm for cross-view geo-localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2024; pp. 16719–16729.
22. Ze, X.; Song, Z.; Wang, Q.; Lu, J.; Shi, Y. Controllable satellite-to-street-view synthesis with precise pose alignment and zero-shot environmental control. arXiv 2025, arXiv:2502.03498.
23. Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2022; pp. 10684–10695.
24. Wang, X.; Cai, W.; Ding, Y.; Di, X.; Li, S.; Yin, Z.; Jia, H.; Fu, J. RGB to Infrared Image Translation Based on Diffusion Bridges Under Aerial Perspective. Remote Sens. 2025, 17, 3703.
25. Croitoru, F.A.; Hondru, V.; Ionescu, R.T.; Shah, M. Diffusion models in vision: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 10850–10869.
26. Zheng, G.; Li, S.; Wang, H.; Yao, T.; Chen, Y.; Ding, S.; Li, X. Entropy-Driven Sampling and Training Scheme for Conditional Diffusion Generation. In Proceedings of the Computer Vision—ECCV 2022; Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T., Eds.; Springer: Cham, Switzerland, 2022; pp. 754–769.
27. Kawar, B.; Elad, M.; Ermon, S.; Song, J. Denoising Diffusion Restoration Models. arXiv 2022, arXiv:2201.11793.
28. Dhariwal, P.; Nichol, A.Q. Diffusion Models Beat GANs on Image Synthesis. Adv. Neural Inf. Process. Syst. 2021, 34, 8780–8794.
29. Ho, J.; Salimans, T. Classifier-Free Diffusion Guidance. arXiv 2021, arXiv:2207.12598.
30. Mou, C.; Wang, X.; Xie, L.; Wu, Y.; Zhang, J.; Qi, Z.; Shan, Y. T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models. In Proceedings of the AAAI Conference on Artificial Intelligence; AAAI: Singapore, 2024; Volume 38, pp. 4296–4304.
31. Esser, P.; Kulal, S.; Blattmann, A.; Entezari, R.; Müller, J.; Saini, H.; Levi, Y.; Lorenz, D.; Sauer, A.; Boesel, F.; et al. Scaling rectified flow transformers for high-resolution image synthesis. In Proceedings of the Forty-First International Conference on Machine Learning, Vienna, Austria, 21–27 July 2024.
32. Arrabi, A.; Zhang, X.; Sultani, W.; Chen, C.; Wshah, S. Cross-view meets diffusion: Aerial image synthesis with geometry and text guidance. In Proceedings of the 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Tucson, AZ, USA, 26 February–6 March 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 5356–5366.
33. Ye, J.; He, J.; Li, W.; Lv, Z.; Lin, Y.; Yu, J.; Yang, H.; He, C. Leveraging BEV paradigm for ground-to-aerial image synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2025; pp. 28451–28461.
34. Yu, Z.; Liu, C.; Liu, L.; Shi, Z.; Zou, Z. Metaearth: A generative foundation model for global-scale remote sensing image generation. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 47, 1764–1781.
35. Liu, C.; Chen, K.; Zhao, R.; Zou, Z.; Shi, Z. Text2Earth: Unlocking text-driven remote sensing image generation with a global-scale dataset and a foundation model. IEEE Geosci. Remote Sens. Mag. 2025, 13, 238–259.
36. Wang, X.; Xu, R.; Cui, Z.; Wan, Z.; Zhang, Y. Fine-grained cross-view geo-localization using a correlation-aware homography estimator. Adv. Neural Inf. Process. Syst. 2023, 36, 5301–5319.
37. Song, J.; Meng, C.; Ermon, S. Denoising diffusion implicit models. arXiv 2020, arXiv:2010.02502.
38. Zhang, L.; Rao, A.; Agrawala, M. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2023; pp. 3836–3847.
39. Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The KITTI dataset. Int. J. Robot. Res. 2013, 32, 1231–1237.
40. Agarwal, S.; Vora, A.; Pandey, G.; Williams, W.; Kourous, H.; McBride, J. Ford multi-AV seasonal dataset. Int. J. Robot. Res. 2020, 39, 1367–1376.
41. Shi, Y.; Li, H. Beyond cross-view image retrieval: Highly accurate vehicle localization using satellite image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2022; pp. 17010–17020.
42. Toker, A.; Zhou, Q.; Maximov, M.; Leal-Taixé, L. Coming down to earth: Satellite-to-street view synthesis for geo-localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2021; pp. 6488–6497.
43. Lu, X.; Li, Z.; Cui, Z.; Oswald, M.R.; Pollefeys, M.; Qin, R. Geometry-aware satellite-to-ground image synthesis for urban areas. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2020; pp. 859–867.
44. Shi, Y.; Liu, L.; Yu, X.; Li, H. Spatial-aware feature aggregation for image based cross-view geo-localization. Adv. Neural Inf. Process. Syst. 2019, 32, 1–11.
45. Tang, H.; Xu, D.; Sebe, N.; Wang, Y.; Corso, J.J.; Yan, Y. Multi-channel attention selection GAN with cascaded semantic guidance for cross-view image translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2019; pp. 2417–2426.
46. Brooks, T.; Holynski, A.; Efros, A.A. Instructpix2pix: Learning to follow image editing instructions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2023; pp. 18392–18402.
47. Song, Z.; Lu, J.; Shi, Y. Learning dense flow field for highly-accurate cross-view camera localization. Adv. Neural Inf. Process. Syst. 2024, 36, 70612–70625.
48. Lentsch, T.; Xia, Z.; Caesar, H.; Kooij, J.F. SliceMatch: Geometry-guided Aggregation for Cross-View Pose Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2023; pp. 17225–17234.
49. Xia, Z.; Booij, O.; Kooij, J.F.P. Convolutional Cross-View Pose Estimation. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 3813–3831.
50. Hu, W.; Zhang, Y.; Liang, Y.; Han, X.; Yin, Y.; Kruppa, H.; Ng, S.K.; Zimmermann, R. PetalView: Fine-grained Location and Orientation Extraction of Street-view Images via Cross-view Local Search. In Proceedings of the 31st ACM International Conference on Multimedia; ACM: New York, NY, USA, 2023; pp. 56–66.
51. Zhang, Y.; Wang, J.; Wang, X.; Li, C.; Wang, L. 3D LiDAR-based intersection recognition and road boundary detection method for unmanned ground vehicle. In Proceedings of the 2015 IEEE 18th International Conference on Intelligent Transportation Systems, Gran Canaria, Spain, 15–18 September 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 499–504.
52. Zhang, Y.; Wang, J.; Wang, X.; Dolan, J.M. Road-Segmentation-Based Curb Detection Method for Self-Driving via a 3D-LiDAR Sensor. IEEE Trans. Intell. Transp. Syst. 2018, 19, 3981–3991.
53. Sun, P.; Zhao, X.; Xu, Z.; Wang, R.; Min, H. A 3D LiDAR data-based dedicated road boundary detection algorithm for autonomous vehicles. IEEE Access 2019, 7, 29623–29638.
54. Wang, G.; Wu, J.; He, R.; Tian, B. Speed and Accuracy Tradeoff for LiDAR Data Based Road Boundary Detection. IEEE/CAA J. Autom. Sin. 2021, 8, 1210–1220.






| Parameter | Setting Value |
|---|---|
| Resolution of satellite image | 512 × 512 |
| Resolution of ground image | 512 × 1024 |
| Vertical FoV of projection | 17.5° |
| Pitch angle of projection | 1.9° |
| Scale of projection | 4 |
| Number of attention heads | 8 |
| in SSIM | (1.0, 1.0, 1.0) |
| Batch size | 1 |
| Learning rate | 0.00001 |
| Learning rate scheduler | Cosine annealing |
| Optimizer | AdamW |
| Number of epochs | 300 |
| Scale factor | 0.18215 |
| Model of GPU | NVIDIA GeForce RTX 3090 |
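
The settings above list a cosine annealing scheduler with a learning rate of 0.00001 over 300 epochs. As a small worked sketch of what that implies, the function below evaluates the standard closed-form cosine annealing curve. The per-epoch stepping, the minimum learning rate of zero, and the function name are assumptions, since the table does not state them.

```python
import math

BASE_LR = 1e-5     # learning rate from the settings table
NUM_EPOCHS = 300   # number of epochs from the settings table

def cosine_annealed_lr(epoch, base_lr=BASE_LR, total_epochs=NUM_EPOCHS, min_lr=0.0):
    # Closed-form cosine annealing from base_lr (epoch 0) to min_lr (last epoch).
    # min_lr = 0 and per-epoch updates are assumptions, not stated in the table.
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

for epoch in (0, 75, 150, 225, 300):
    print(epoch, cosine_annealed_lr(epoch))
```

Under these assumptions the rate starts at the configured value, passes through half of it at the midpoint, and decays to the minimum at the final epoch.
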

| Dataset | Method | PSNR ↑ | SSIM ↑ | LPIPS ↓ | FID ↓ | ↓ |
|---|---|---|---|---|---|---|
| KITTI | SelGAN [45] | 11.02 | 0.108 | 0.724 | 80.90 | 0.512 |
| | GPG2A [32] | 10.08 | 0.117 | 0.710 | 67.92 | 0.399 |
| | Instr-p2p [46] | 10.14 | 0.126 | 0.701 | 52.18 | 0.495 |
| | ControlNet [38] | 10.93 | 0.135 | 0.676 | 42.21 | 0.392 |
| | SkyDiffusion [33] | 12.26 | 0.153 | 0.655 | 39.39 | 0.367 |
| | GCC [13] | 11.65 | 0.172 | 0.671 | 47.36 | 0.382 |
| | Proposed | 11.33 | 0.145 | 0.684 | 39.58 | 0.334 |
| Ford | SelGAN [45] | 10.92 | 0.109 | 0.775 | 91.02 | 0.523 |
| | GPG2A [32] | 9.81 | 0.120 | 0.752 | 65.57 | 0.412 |
| | Instr-p2p [46] | 9.89 | 0.118 | 0.741 | 59.46 | 0.513 |
| | ControlNet [38] | 10.88 | 0.140 | 0.723 | 47.36 | 0.421 |
| | SkyDiffusion [33] | 11.69 | 0.131 | 0.713 | 45.52 | 0.377 |
| | GCC [13] | 11.19 | 0.164 | 0.736 | 53.18 | 0.394 |
| | Proposed | 11.25 | 0.139 | 0.698 | 45.87 | 0.326 |
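
For readers unfamiliar with the PSNR column, the following sketch shows how peak signal-to-noise ratio is conventionally computed between a reference image and a generated one. It assumes 8-bit images with a peak value of 255; the exact evaluation protocol behind the reported numbers (color space, resizing, averaging over the test set) is not given here, and the synthetic arrays are purely illustrative.

```python
import numpy as np

def psnr(reference, generated, max_value=255.0):
    # PSNR in dB: 10 * log10(MAX^2 / MSE); infinite for identical images.
    reference = np.asarray(reference, dtype=np.float64)
    generated = np.asarray(generated, dtype=np.float64)
    mse = np.mean((reference - generated) ** 2)
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))

rng = np.random.default_rng(0)
# Synthetic stand-ins for a 512 x 512 RGB satellite image and a noisy copy.
reference = rng.integers(0, 256, size=(512, 512, 3))
noisy = np.clip(reference + rng.normal(0, 25, reference.shape), 0, 255)
print(psnr(reference, noisy))
```

Higher values indicate a smaller pixel-wise error, which is why the column is marked with an upward arrow.
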

| IP-GP | DRF | PSNR ↑ | SSIM ↑ | LPIPS ↓ | FID ↓ | ↓ |
|---|---|---|---|---|---|---|
| ✕ | ✕ | 10.05 | 0.134 | 0.731 | 43.88 | 0.445 |
| ✕ | ✓ | 10.16 | 0.135 | 0.718 | 43.51 | 0.437 |
| ✓ | ✕ | 10.31 | 0.132 | 0.713 | 42.12 | 0.382 |
| ✓ | ✓ | 11.33 | 0.145 | 0.684 | 39.58 | 0.334 |

| I-RSI | CP-RSI | PSNR ↑ | SSIM ↑ | LPIPS ↓ | FID ↓ | ↓ |
|---|---|---|---|---|---|---|
| ✕ | ✕ | 10.05 | 0.134 | 0.731 | 43.88 | 0.432 |
| ✕ | ✓ | 10.28 | 0.134 | 0.721 | 42.18 | 0.362 |
| ✓ | ✕ | 10.38 | 0.124 | 0.708 | 43.58 | 0.412 |
| ✓ | ✓ | 11.33 | 0.145 | 0.684 | 39.58 | 0.334 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Hu, D.; Qin, R.; Yuan, X.; Yang, S.; Zhao, C. GCCG-RSI: Ground LiDAR and Image-Guided Geometry-Constrained Controllable Generation for Remote Sensing Image. Remote Sens. 2026, 18, 1512. https://doi.org/10.3390/rs18101512

