Lightweight Unsupervised Homography Estimation for Infrared and Visible Images Based on UAV Perspective Enabling Real-Time Processing in Space–Air–Ground Integrated Network
Highlights
- We propose LFHomo, a lightweight unsupervised method for estimating the homography between infrared and visible images in low-altitude scenarios.
- Experimental results show that the proposed method significantly outperforms existing methods on UAV-perspective infrared and visible datasets, with clear advantages in computational complexity and inference speed.
- LFHomo balances model accuracy and computational efficiency, demonstrating the potential of deep networks for low-altitude multimodal image registration tasks with excellent scalability.
- We constructed a novel unregistered UAV-based infrared and visible image dataset, which provides support for research on multimodal UAV remote sensing image registration and fusion.
Abstract
1. Introduction
- We introduce an Inverted Residual Shift Convolution (IRSC) block that embeds a shift module into an inverted residual structure to capture local contextual features in blurred low-altitude UAV images.
- We design a Spatial-Reduction Channel-Sequential Shuffle Attention (SRCSSA) module that suppresses redundant and enhances informative features through spatial reduction, channel grouping and shuffling, and attention-based fusion.
- We develop a lightweight CNN–GNN hybrid homography estimator, LFHomoE, which achieves a favorable trade-off between accuracy and efficiency and delivers fast inference on both synthetic benchmark datasets and unregistered infrared–visible UAV image pairs.
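Most unsupervised deep homography pipelines, including the estimator described above, regress the offsets of the four image corners and recover the 3×3 homography from those four correspondences via the Direct Linear Transformation (DLT). The following numpy sketch illustrates this standard parameterization; it is a generic illustration, not LFHomo's exact implementation, and the corner/offset values are made-up examples:

```python
import numpy as np

def homography_from_corners(src, dst):
    """Solve for the 3x3 homography mapping src -> dst (4 point pairs) via DLT."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(A, dtype=float)
    # The homography vector is the null vector of A: the last right-singular vector.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # normalize so H[2, 2] = 1

# A network that predicts four corner offsets implies this homography:
corners = np.array([[0, 0], [127, 0], [127, 127], [0, 127]], dtype=float)
offsets = np.array([[2, 1], [-1, 3], [0, -2], [1, 0]], dtype=float)  # example values
H = homography_from_corners(corners, corners + offsets)
```

Parameterizing by corner offsets rather than the 8 matrix entries keeps the regression targets on a comparable pixel scale, which is why it is the dominant choice in deep homography estimation.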
2. Related Work
2.1. Homography Estimation for Multimodal Images
2.2. Lightweight Hybrid Backbones
3. Method
3.1. Network Structure
3.2. Anti-Blurring Feature Extractor
3.2.1. Inverted Residual Shift Convolution Block
3.2.2. Spatial-Reduction Channel-Sequential Shuffle Attention
3.3. Lightweight and Fast Homography Estimator
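A CNN–GNN hybrid estimator of this kind typically treats feature-map patches as graph nodes, connects each node to its k nearest neighbors in feature space, and aggregates neighbor information with a graph convolution such as the max-relative operator popularized by ViG. The sketch below is a hedged numpy illustration of that generic pattern, not the paper's exact operator; the token count, feature width, and k are arbitrary example values:

```python
import numpy as np

def knn_graph(feats, k):
    """For (N, D) node features, return indices of each node's k nearest
    neighbors by Euclidean distance (excluding the node itself)."""
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a node is never its own neighbor
    return np.argsort(d, axis=1)[:, :k]

def max_relative_conv(feats, nbrs):
    """ViG-style max-relative graph convolution: concatenate each node's
    feature with the elementwise max over (neighbor - node) differences."""
    diffs = feats[nbrs] - feats[:, None, :]                    # (N, k, D)
    return np.concatenate([feats, diffs.max(axis=1)], axis=-1)  # (N, 2D)

rng = np.random.default_rng(0)
nodes = rng.normal(size=(16, 8))        # 16 patch tokens with 8-dim features
out = max_relative_conv(nodes, knn_graph(nodes, k=4))
```

Building the graph in feature space rather than image space lets distant but visually similar patches exchange information, which is the usual motivation for GNN components in cross-modal matching.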
3.4. Loss Function
3.4.1. Detail Feature Loss
3.4.2. Feature Identity Loss
4. Experimental Results
4.1. Dataset and Implementation Details
4.1.1. Dataset
4.1.2. Implementation Details
4.2. Evaluation Metric
4.2.1. Average Corner Error
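Average Corner Error (ACE) is conventionally computed as the mean Euclidean distance between the four image corners warped by the estimated homography and the same corners warped by the ground-truth one. A minimal numpy sketch of this standard metric (illustrative; the image size of 128×128 is an assumption, not taken from the paper):

```python
import numpy as np

def warp_points(H, pts):
    """Apply a 3x3 homography to an (N, 2) array of points."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coordinates
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]

def average_corner_error(H_est, H_gt, h=128, w=128):
    """Mean L2 distance between corners warped by H_est and by H_gt."""
    corners = np.array([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]],
                       dtype=float)
    err = np.linalg.norm(warp_points(H_est, corners) - warp_points(H_gt, corners),
                         axis=1)
    return float(err.mean())
```

For example, an estimated homography that is a pure (3, 4) translation relative to the ground truth yields an ACE of exactly 5 pixels.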
4.2.2. Structural Similarity
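Structural Similarity (SSIM) compares two images through their means, variances, and covariance with the standard stabilizing constants C1 = (0.01·L)² and C2 = (0.03·L)². The sketch below is a simplified single-window version for illustration; library implementations (and most evaluation protocols) compute it over local sliding windows and average:

```python
import numpy as np

def ssim_global(x, y, data_range=255.0):
    """Simplified single-window SSIM. Library versions use local
    (often Gaussian-weighted) windows; this global form is for illustration."""
    x = x.astype(float)
    y = y.astype(float)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()  # covariance between the two images
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

Identical images give an SSIM of 1; structurally dissimilar images score lower, which is why SSIM over the overlap region is a common proxy for registration quality.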
4.2.3. Adaptive Feature Registration Rate
4.2.4. Average Inference Time, Parameters, and Peak Memory Usage
4.3. Qualitative Comparison
Analysis of Non-Advantageous Cases on UHBD
4.4. Quantitative Comparison
4.5. Comparison on the Real-World Dataset
4.6. Ablation Studies
4.6.1. Effectiveness of Proposed Components
4.6.2. Effectiveness of Attention
4.6.3. Hyperparameter Settings in Loss Function
5. Discussion
5.1. Analysis of SRCSSA and Standard Attention Mechanisms
5.2. Limitations and Future Work
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| SIFT | Scale Invariant Feature Transform |
| ORB | Oriented FAST and Rotated BRIEF |
| DLT | Direct Linear Transformation |
| OAN | Order Aware Network |
| LCTrans | Local Correlation Transformer |
| GNNs | Graph Neural Networks |
| K-NN | K-Nearest Neighbor |
| UAV | Unmanned Aerial Vehicle |
| ViT | Vision Transformer |
| SM | Shift Module |
| IRSC | Inverted Residual Shift Convolution |
| SRCSSA | Spatial Reduction Channel-Sequential Shuffle Attention |
| SURF | Speeded Up Robust Features |
| BEBLID | Boosted Efficient Binary Local Image Descriptor |
| RANSAC | Random Sample Consensus |
| MAGSAC++ | Marginalizing Sample Consensus |
| LIFT | Learned Invariant Feature Transform |
| LoFTR | Local Feature Matching with Transformers |
| GAN | Generative Adversarial Network |
| FCTrans | Feature Correlation Transformer |
| CNN | Convolutional Neural Network |
| NAS | Neural Architecture Search |
| ViG | Vision Graph Neural Network |
| SVGA | Sparse Vision Graph Attention |
| CSSA | Channel Sequence Shuffling Attention |
| GSA | Global Spatial Attention |
| AFF | Attention Feature Fusion |
| IRBlock | Inverted Residual Block |
| DFL | Detail Feature Loss |
| FIL | Feature Identity Loss |
| NIUHBD | Near-Infrared UAV Homography Benchmark Dataset |
| UHBD | UAV Homography Benchmark Dataset |
| ACE | Average Corner Error |
| STN | Spatial Transformer Network |
| SENet | Squeeze and Excitation Network |
| CBAM | Convolutional Block Attention Module |
| ECA | Efficient Channel Attention |
| CA | Coordinate Attention |
| TA | Triplet Attention |
| SSIM | Structural Similarity |
| AFRR | Adaptive Feature Registration Rate |
| ELA | Efficient Local Attention |
References
- Ye, T.; Qin, W.; Zhao, Z.; Gao, X.; Deng, X.; Ouyang, Y. Real-time object detection network in UAV-vision based on CNN and transformer. IEEE Trans. Instrum. Meas. 2023, 72, 2505713. [Google Scholar] [CrossRef]
- Kakaletsis, E.; Symeonidis, C.; Tzelepi, M.; Mademlis, I.; Tefas, A.; Nikolaidis, N.; Pitas, I. Computer vision for autonomous UAV flight safety: An overview and a vision-based safe landing pipeline example. ACM Comput. Surv. CSUR 2021, 54, 1–37. [Google Scholar] [CrossRef]
- Chen, Q.; Zhu, H.; Yang, L.; Chen, X.; Pollin, S.; Vinogradov, E. Edge computing assisted autonomous flight for UAV: Synergies between vision and communications. IEEE Commun. Mag. 2021, 59, 28–33. [Google Scholar] [CrossRef]
- McEnroe, P.; Wang, S.; Liyanage, M. A Survey on the Convergence of Edge Computing and AI for UAVs: Opportunities and Challenges. IEEE Internet Things J. 2022, 9, 15435–15459. [Google Scholar] [CrossRef]
- Zhao, D.; Zhou, L.; Li, Y.; He, W.; Arun, P.V.; Zhu, X.; Hu, J. Visibility estimation via near-infrared bispectral real-time imaging in bad weather. Infrared. Phys. Technol. 2024, 136, 105008. [Google Scholar] [CrossRef]
- Qin, H.; Xu, T.; Li, T.; Chen, Z.; Feng, T.; Li, J. MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking. In Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville, TN, USA, 10–17 June 2025; pp. 16882–16891. [Google Scholar]
- Zhao, D.; Hu, B.; Jiang, W.; Zhong, W.; Arun, P.V.; Cheng, K.; Zhao, Z.; Zhou, H. Hyperspectral video tracker based on spectral difference matching reduction and deep spectral target perception features. Opt. Lasers Eng. 2025, 194, 109124. [Google Scholar] [CrossRef]
- Ramos, L.; Sappa, A.D. Multispectral semantic segmentation for land cover classification: An overview. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 14295–14336. [Google Scholar] [CrossRef]
- Memari, M.; Shekaramiz, M.; Masoum, M.A.; Seibi, A.C. Data Fusion and Ensemble Learning for Advanced Anomaly Detection Using Multi-Spectral RGB and Thermal Imaging of Small Wind Turbine Blades. Energies 2024, 17, 673. [Google Scholar] [CrossRef]
- Zhao, D.; Yan, W.; You, M.; Zhang, J.; Arun, P.V.; Jiao, C.; Wang, Q.; Zhou, H. Hyperspectral Anomaly Detection Based on Empirical Mode Decomposition and Local Weighted Contrast. IEEE Sens. J. 2024, 24, 33847–33861. [Google Scholar] [CrossRef]
- Zhao, D.; Asano, Y.; Gu, L.; Sato, I.; Zhou, H. City-scale distance sensing via bispectral light extinction in bad weather. Remote Sens. 2020, 12, 1401. [Google Scholar] [CrossRef]
- Shin, U.; Park, K.; Lee, B.-U.; Lee, K.; Kweon, I.S. Self-Supervised Monocular Depth Estimation from Thermal Images via Adversarial Multi-Spectral Adaptation. In Proceedings of the 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2–7 January 2023; pp. 5787–5796. [Google Scholar]
- Luo, Y.; Wang, X.; Liao, Y.; Fu, Q.; Shu, C.; Wu, Y.; He, Y. A Review of Homography Estimation: Advances and Challenges. Electronics 2023, 12, 4977. [Google Scholar] [CrossRef]
- Lin, B.; Xu, X.; Shen, Z.; Yang, X.; Zhong, L.; Zhang, X. A Registration Algorithm for Astronomical Images Based on Geometric Constraints and Homography. Remote Sens. 2023, 15, 1921. [Google Scholar] [CrossRef]
- Debaque, B.; Perreault, H.; Mercier, J.P.; Drouin, M.A.; David, R.; Chatelais, B.; Duclos-Hindié, N.; Roy, S. Thermal and Visible Image Registration Using Deep Homography. In Proceedings of the 2022 25th International Conference on Information Fusion (FUSION), Linköping, Sweden, 4–7 July 2022; pp. 1–8. [Google Scholar]
- Bazargani, H.; Bilaniuk, O.; Laganière, R. A Fast and Robust Homography Scheme for Real-Time Planar Target Detection. J. Real-Time Image Proc. 2018, 15, 739–758. [Google Scholar] [CrossRef]
- Lu, F.; Dong, S.; Zhang, L.; Liu, B.; Lan, X.; Jiang, D.; Yuan, C. Deep homography estimation for visual place recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 10341–10349. [Google Scholar]
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
- Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An Efficient Alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2564–2571. [Google Scholar]
- Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 1981, 24, 381–395. [Google Scholar] [CrossRef]
- Barath, D.; Noskova, J.; Ivashechkin, M.; Matas, J. MAGSAC++, a Fast, Reliable and Accurate Robust Estimator. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 1304–1312. [Google Scholar]
- Luo, Y.; Wang, X.; Wu, Y.; Shu, C. Detail-Aware Deep Homography Estimation for Infrared and Visible Image. Electronics 2022, 11, 4185. [Google Scholar] [CrossRef]
- Luo, Y.; Wang, X.; Wu, Y.; Shu, C. Infrared and Visible Image Homography Estimation Using Multiscale Generative Adversarial Network. Electronics 2023, 12, 788. [Google Scholar] [CrossRef]
- Wang, X.; Luo, Y.; Fu, Q.; Rui, Y.; Shu, C.; Wu, Y.; He, Z.; He, Y. Infrared and Visible Image Homography Estimation Based on Feature Correlation Transformers for Enhanced 6G Space–Air–Ground Integrated Network Perception. Remote Sens. 2023, 15, 3535. [Google Scholar] [CrossRef]
- Wang, X.; Luo, Y.; Fu, Q.; He, Y.; Shu, C.; Wu, Y.; Liao, Y. Coarse-to-Fine Homography Estimation for Infrared and Visible Images. Electronics 2023, 12, 4441. [Google Scholar] [CrossRef]
- Liao, Y.; Luo, Y.; Fu, Q.; Shu, C.; Wu, Y.; Liu, Q.; He, Y. Deep Unsupervised Homography Estimation for Single-Resolution Infrared and Visible Images Using GNN. Electronics 2024, 13, 4173. [Google Scholar] [CrossRef]
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar] [CrossRef]
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520. [Google Scholar]
- Howard, A.; Pang, R.; Adam, H.; Le, Q.; Sandler, M.; Chen, B.; Wang, W.; Chen, L.C.; Tan, M.; Chu, G.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
- Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More Features from Cheap Operations. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, Seattle, WA, USA, 13–19 June 2020; pp. 1577–1586. [Google Scholar]
- Mehta, S.; Rastegari, M. MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer. In Proceedings of the ICLR, Virtual-Only, 25 April 2022. [Google Scholar]
- Li, Y.; Yuan, G.; Wen, Y.; Hu, J.; Evangelidis, G.; Tulyakov, S.; Wang, Y.; Ren, J. Efficientformer: Vision transformers at mobilenet speed. Adv. Neural Inf. Process. Syst. 2022, 35, 12934–12949. [Google Scholar]
- Munir, M.; Avery, W.; Marculescu, R. Mobilevig: Graph-based sparse attention for mobile vision applications. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 2211–2219. [Google Scholar]
- Gao, T.; Zhang, Y.; Zhang, Z.; Geng, T.; Li, A.; Fang, Z.; Shi, L.; Di, X.; Li, H. BHViT: Binarized Hybrid Vision Transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 11–15 June 2025; pp. 3563–3572. [Google Scholar]
- Han, K.; Wang, Y.; Guo, J.; Tang, Y.; Wu, E. Vision gnn: An image is worth graph of nodes. In Proceedings of the 35th Advances in Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022; pp. 8291–8303. [Google Scholar]
- Bay, H.; Tuytelaars, T.; Gool, L.V. Surf: Speeded Up Robust Features. In Proceedings of the European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; pp. 404–417. [Google Scholar]
- Suárez, I.; Sfeir, G.; Buenaposada, J.M.; Baumela, L. BEBLID: Boosted efficient binary local image descriptor. Pattern Recognit. Lett. 2020, 133, 366–372. [Google Scholar] [CrossRef]
- Yi, K.M.; Trulls, E.; Lepetit, V.; Fua, P. Lift: Learned Invariant Feature Transform. In Proceedings of the Computer Vision—ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 10–16 October 2016; pp. 467–483. [Google Scholar]
- DeTone, D.; Malisiewicz, T.; Rabinovich, A. Superpoint: Self-Supervised Interest Point Detection and Description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 224–236. [Google Scholar]
- Sarlin, P.E.; DeTone, D.; Malisiewicz, T.; Rabinovich, A. Superglue: Learning feature matching with graph neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 4938–4947. [Google Scholar]
- Sun, J.; Shen, Z.; Wang, Y.; Bao, H.; Zhou, X. LoFTR: Detector-Free Local Feature Matching with Transformers. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 8918–8927. [Google Scholar]
- Munir, M.; Avery, W.; Rahman, M.M.; Marculescu, R. GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNs. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Los Alamitos, CA, USA, 17–21 June 2024; pp. 6118–6127. [Google Scholar]
- Li, B.; Zhao, H.; Wang, W.; Liu, J.; Jiang, P.; Liu, Y. MAIR: A Locality-and Continuity-Preserving Mamba for Image Restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 11–15 June 2025; pp. 7491–7501. [Google Scholar]
- Li, G.; Muller, M.; Thabet, A.; Ghanem, B. Deepgcns: Can Gcns Go as Deep as Cnns? In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27–28 October 2019; pp. 9267–9276. [Google Scholar]
- Dai, Y.; Gieseke, F.; Oehmcke, S.; Wu, Y.; Barnard, K. Attentional feature fusion. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 5–9 January 2021; pp. 3560–3569. [Google Scholar]
- Hong, M.; Lu, Y.; Ye, N.; Lin, C.; Zhao, Q.; Liu, S. Unsupervised Homography Estimation with Coplanarity-Aware GAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 17663–17672. [Google Scholar]
- Zhang, J.; Wang, C.; Liu, S.; Jia, L.; Ye, N.; Wang, J.; Zhou, J.; Sun, J. Content-Aware Unsupervised Deep Homography Estimation. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 653–669. [Google Scholar]
- Jaderberg, M.; Simonyan, K.; Zisserman, A.; Kavukcuoglu, K. Spatial Transformer Networks. In Proceedings of the 29th Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 2017–2025. [Google Scholar]
- Razakarivony, S.; Jurie, F. Vehicle detection in aerial imagery: A small target detection benchmark. J. Vis. Commun. Image Represent. 2016, 34, 187–203. [Google Scholar] [CrossRef]
- Peng, T.; Li, Q.; Zhu, P. Rgb-t crowd counting from drone: A benchmark and mmccn network. In Proceedings of the Asian Conference on Computer Vision, Kyoto, Japan, 30 November–4 December 2020. [Google Scholar]
- Sun, Y.; Cao, B.; Zhu, P. Drone-based RGB-infrared cross-modality vehicle detection via uncertainty-aware learning. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 6700–6713. [Google Scholar] [CrossRef]
- Aguilera, C.; Barrera, F.; Lumbreras, F.; Sappa, A.D.; Toledo, R. Multispectral Image Feature Points. Sensors 2012, 12, 12661–12672. [Google Scholar] [CrossRef]
- Leutenegger, S.; Chli, M.; Siegwart, R.Y. BRISK: Binary robust invariant scalable keypoints. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2548–2555. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-excitation networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2011–2023. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542. [Google Scholar]
- Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722. [Google Scholar]
- Misra, D.; Nalamada, T.; Arasanipalai, A.U.; Hou, Q. Rotate to attend: Convolutional triplet attention module. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 5–9 January 2021; pp. 3139–3148. [Google Scholar]
- Xu, W.; Wan, Y. ELA: Efficient Local Attention for Deep Convolutional Neural Networks. arXiv 2024, arXiv:2403.01123. [Google Scholar] [CrossRef]
- Zhao, D.; Zhong, W.; Ge, M.; Jiang, W.; Zhu, X.; Arun, P.V.; Zhou, H. SiamBSI: Hyperspectral video tracker based on band correlation grouping and spatial-spectral information interaction. Infrared Phys. Technol. 2025, 151, 106063. [Google Scholar] [CrossRef]
- Filali, A.; Abouaomar, A.; Cherkaoui, S.; Kobbane, A.; Guizani, M. Multi-access edge computing: A Survey. IEEE Access 2020, 8, 197017–197046. [Google Scholar] [CrossRef]
- Qi, Q.; Chen, X.; Khalili, A.; Zhong, C.; Zhang, Z.; Ng, D.W.K. Integrating Sensing, Computing, and Communication in 6G Wireless Networks: Design and Optimization. IEEE Trans. Commun. 2022, 70, 6212–6227. [Google Scholar] [CrossRef]
- Zhao, D.; Tang, L.; Arun, P.V.; Asano, Y.; Zhang, L.; Xiong, Y.; Tao, X.; Hu, J. City-scale distance estimation via near-infrared trispectral light extinction in bad weather. Infrared Phys. Technol. 2023, 128, 104507. [Google Scholar] [CrossRef]
- Asadzadeh, S.; de Oliveira, W.J.; de Souza Filho, C.R. UAV-based remote sensing for the petroleum industry and environmental monitoring: State-of-the-art and perspectives. J. Pet. Sci. Eng. 2022, 208, 109633. [Google Scholar] [CrossRef]
- Daud, S.M.S.M.; Yusof, M.Y.P.M.; Heo, C.C.; Khoo, L.S.; Singh, M.K.C.; Mahmood, M.S.; Nawawi, H. Applications of drone in disaster management: A scoping review. Sci. Justice 2022, 62, 30–42. [Google Scholar] [CrossRef]
- Khan, A.; Gupta, S.; Gupta, S.K. Emerging UAV technology for disaster detection, mitigation, response, and preparedness. J. Field Robot. 2022, 39, 905–955. [Google Scholar] [CrossRef]
- Wudunn, M.; Zakhor, A.; Touzani, S.; Granderson, J. Aerial 3d building reconstruction from rgb drone imagery. Geospat. Inform. X 2020, 11398, 9–19. [Google Scholar]
- Maboudi, M.; Homaei, M.; Song, S.; Malihi, S.; Saadatseresht, M.; Gerke, M. A Review on Viewpoints and Path Planning for UAV-Based 3D Reconstruction. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 5026–5048. [Google Scholar] [CrossRef]
- Mittal, M.; Mohan, R.; Burgard, W.; Valada, A. Vision-based autonomous UAV navigation and landing for urban search and rescue. Springer Proc. Adv. Robot. 2022, 20, 575–592. [Google Scholar]
- Du, Y. Multi-UAV Search and Rescue with Enhanced A* Algorithm Path Planning in 3D Environment. Int. J. Aerosp. Eng. 2023, 2023, 8614117. [Google Scholar] [CrossRef]
- Zhao, D.; Zhang, H.; Arun, P.V.; Jiao, C.; Zhou, H.; Xiang, P.; Cheng, K. SiamSTU: Hyperspectral video tracker based on spectral spatial angle mapping enhancement and state aware template update. Infrared Phys. Technol. 2025, 151, 105919. [Google Scholar] [CrossRef]
- Zhao, D.; Zhang, H.; Huang, K.; Zhu, X.; Arun, P.V.; Jiang, W.; Li, S.; Pei, X.; Zhou, H. SASU-Net: Hyperspectral video tracker based on spectral adaptive aggregation weighting and scale updating. Expert Syst. Appl. 2025, 272, 126721. [Google Scholar] [CrossRef]
| Method | Core Strategy | Advantages | Limitations |
|---|---|---|---|
| DADHN [22] | Designs a fine-grained feature extractor to support homography estimation. | Preserves channel and spatial information of multi-scale features, enhancing meaningful feature representations. | Limited shallow feature extraction capability; struggles in heavily blurred regions. |
| HomoMGAN [23] | Formulates the homography estimation process as a generative adversarial process. | Self-optimizes the homography matrix via GAN without explicitly optimizing images or attention maps. | Sensitive to illumination changes. |
| FCTrans [24] | Uses cross-image attention to explicitly model feature correlations. | Converts the multi-source homography problem into a single-source one. | Difficult to handle large-baseline image pairs. |
| LCTrans [25] | Adopts a coarse-to-fine strategy to iteratively refine the homography matrix. | Obtains multi-scale feature maps within a single network without additional matrix fusion. | Sensitive to noise. |
| homoViG [26] | First applies a graph attention mechanism to homography estimation. | Uses graph structures to mitigate modality discrepancies and strengthen feature matching. | Generalization to motion blur, adverse weather, and low-resolution UAV scenes remains unexplored. |
| Layer | Type | Input Channels | Output Channels | Output Size |
|---|---|---|---|---|
| Layer-1 | IRSC | 1 | 8 | |
| Layer-2 | IRSC | 8 | 16 | |
| Layer-3 | IRSC | 16 | 32 | |
| Layer-4 | Downsampling | 32 | 32 | |
| Layer-5 | CSSA and GSA | 32 | 32 | |
| Layer-6 | Upsampling | 32 | 16 | |
| Layer-7 | IRSC | 16 | 1 |
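The shift operation inside the IRSC layers listed above can be illustrated as follows: channels are split into groups and each group is displaced by one pixel in a different direction with zero padding, so that a subsequent pointwise convolution mixes spatially shifted information at essentially zero FLOP cost. This sketch assumes the classic five-group shift (left/right/up/down/identity); the paper's exact grouping may differ:

```python
import numpy as np

def shift_module(x):
    """Zero-padded 1-pixel spatial shift over a (C, H, W) tensor.
    Channels are split into 5 groups: left, right, up, down, identity."""
    c = x.shape[0]
    g = c // 5
    out = np.zeros_like(x)
    out[0 * g:1 * g, :, :-1] = x[0 * g:1 * g, :, 1:]   # shift left
    out[1 * g:2 * g, :, 1:] = x[1 * g:2 * g, :, :-1]   # shift right
    out[2 * g:3 * g, :-1, :] = x[2 * g:3 * g, 1:, :]   # shift up
    out[3 * g:4 * g, 1:, :] = x[3 * g:4 * g, :-1, :]   # shift down
    out[4 * g:] = x[4 * g:]                            # remaining channels unchanged
    return out
```

Because the shift itself is parameter-free, embedding it in an inverted residual block enlarges the receptive field without increasing the model size, which suits the lightweight design goal.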
| Stage | Type | Input Channels | Output Channels | Input Size | Output Size | Para (M) | FLOPs |
|---|---|---|---|---|---|---|---|
| Embedding | Conv × 3 | 2 | 42 | | | 0.008 | |
| Stage 1 | | 42 | 42 | | | 0.047 | |
| Downsampling | Conv | 42 | 84 | | | 0.032 | |
| Stage 2 | | 84 | 84 | | | 0.182 | |
| Downsampling | Conv | 84 | 168 | | | 0.127 | |
| Stage 3 | | 168 | 168 | | | 1.413 | |
| Downsampling | Conv | 168 | 192 | | | 0.291 | |
| Stage 4 | | 192 | 192 | | | 0.935 | |
| FC | Pooling & MLP | 192 | 8 | | | 0.001 | |
| Type | Dataset | Original Dataset | Original Number | Original Resolution | Unregistered Number | Unregistered Resolution |
|---|---|---|---|---|---|---|
| Training Set | NIUHBD | VEDAI | 1246 | | 452 | |
| | UHBD | DroneRGBT | 1807 | | 293 | |
| | | DroneVehicle | 17,990 | | 3705 | |
| Test Set | NIUHBD | VEDAI | 1246 | | 24 | |
| | UHBD | DroneRGBT | 1800 | | 6 | |
| | | DroneVehicle | 8980 | | 23 | |
| (1) | Method | Easy | Moderate | Hard | Average | Failure Rate |
|---|---|---|---|---|---|---|
| (2) | I3×3 | 4.59 | 5.71 | 6.77 | 5.79 | 0% |
| (3) | SIFT + RANSAC | 50.87 | - | - | 50.87 | 93% |
| (4) | SIFT + MAGSAC++ | 131.72 | - | - | 131.72 | 93% |
| (5) | ORB + RANSAC | 82.64 | 118.29 | 313.74 | 160.89 | 17% |
| (6) | ORB + MAGSAC++ | 85.99 | 109.14 | 142.54 | 109.13 | 19% |
| (7) | BRISK + RANSAC | 104.06 | 126.8 | 244.01 | 143.2 | 24% |
| (8) | BRISK + MAGSAC++ | 101.37 | 136.01 | 234.14 | 143.4 | 24% |
| (9) | DADHN | 3.84 | 5.01 | 6.09 | 5.08 | 0% |
| (10) | HomoMGAN | 3.85 | 4.99 | 6.05 | 5.06 | 0% |
| (11) | FCTrans | 3.75 | 4.70 | 5.94 | 4.91 | 0% |
| (12) | LCTrans | 3.66 | 4.65 | 5.77 | 4.80 | 0% |
| (13) | homoViG | 3.72 | 4.50 | 5.55 | 4.68 | 0% |
| (14) | LFHomo (Ours) | 3.65 | 4.59 | 5.68 | 4.73 | 0% |
| Dataset | Method | ACE | SSIM | AFRR |
|---|---|---|---|---|
| Synthetic Benchmark | DADHN [22] | 5.08 | 0.97 | 0.74 |
| | HomoMGAN [23] | 5.06 | 0.97 | 0.72 |
| | FCTrans [24] | 4.91 | 0.97 | 0.72 |
| | LCTrans [25] | 4.80 | 0.97 | 0.71 |
| | homoViG [26] | 4.68 | 0.96 | 0.70 |
| | LFHomo (Ours) | 4.73 | 0.97 | 0.72 |
| UHBD | DADHN [22] | 7.03 | 0.95 | 0.87 |
| | HomoMGAN [23] | 6.69 | 0.95 | 0.85 |
| | FCTrans [24] | 6.59 | 0.96 | 0.88 |
| | LCTrans [25] | 6.41 | 0.95 | 0.87 |
| | homoViG [26] | 6.30 | 0.94 | 0.82 |
| | LFHomo (Ours) | 6.14 | 0.95 | 0.88 |
| NIUHBD | DADHN [22] | 6.59 | 0.94 | 0.87 |
| | HomoMGAN [23] | 6.54 | 0.94 | 0.88 |
| | FCTrans [24] | 6.39 | 0.95 | 0.87 |
| | LCTrans [25] | 6.36 | 0.92 | 0.85 |
| | homoViG [26] | 6.32 | 0.94 | 0.86 |
| | LFHomo (Ours) | 6.12 | 0.94 | 0.87 |
| Method | Time (s) | Parameters (M) | Peak Memory (MB) |
|---|---|---|---|
| DADHN [22] | 0.36 | 85.23 | 85.32 |
| HomoMGAN [23] | 0.31 | 20.44 | 20.44 |
| FCTrans [24] | 0.36 | 20.12 | 20.12 |
| LCTrans [25] | 0.30 | 21.99 | 21.99 |
| homoViG [26] | 0.17 | 5.243 | 21.27 |
| LFHomo (Ours) | 0.09 | 3.233 | 19.35 |
| (1) | Method | Easy | Moderate | Hard | Average | Failure Rate |
|---|---|---|---|---|---|---|
| (2) | I3×3 | 2.56 | 4.13 | 6.17 | 4.28 | - |
| (3) | SIFT + RANSAC | 81.19 | - | - | 81.19 | 89% |
| (4) | SIFT + MAGSAC++ | 81.91 | - | - | 81.91 | 98% |
| (5) | ORB + RANSAC | 112.31 | - | - | 112.31 | 81% |
| (6) | ORB + MAGSAC++ | 64.29 | - | - | 64.29 | 81% |
| (7) | BRISK + RANSAC | 63.58 | - | - | 63.58 | 79% |
| (8) | BRISK + MAGSAC++ | 52.86 | - | - | 52.86 | 81% |
| (9) | DADHN | 2.17 | 3.95 | 5.50 | 4.03 | 0% |
| (10) | HomoMGAN | 2.47 | 3.89 | 5.52 | 4.12 | 0% |
| (11) | FCTrans | 2.26 | 3.76 | 5.44 | 3.98 | 0% |
| (12) | LCTrans | 2.14 | 3.68 | 5.39 | 3.90 | 0% |
| (13) | homoViG | 2.03 | 3.64 | 5.36 | 3.82 | 0% |
| (14) | LFHomo (Ours) | 2.17 | 3.31 | 4.59 | 3.57 | 0% |
| Shift Module | SRCSSA | Backbone | ACE (Synthetic Benchmark) | ACE (UHBD) | ACE (NIUHBD) | Time (s) | Parameters (M) |
|---|---|---|---|---|---|---|---|
| ✓ | × | LFHomoE | 5.02 | 6.33 | 6.23 | 0.09 | 3.232 |
| × | ✓ | LFHomoE | 5.01 | 6.31 | 6.21 | 0.09 | 3.233 |
| ✓ | ✓ | ResNet-34 [54] | 5.08 | 6.63 | 6.52 | 0.27 | 23.53 |
| ✓ | ✓ | ViT [55] | 5.11 | 6.49 | 6.35 | 0.13 | 20.45 |
| ✓ | ✓ | ViG [35] | 4.68 | 6.26 | 6.20 | 0.11 | 5.877 |
| ✓ | ✓ | MobileViT [31] | 5.17 | 6.64 | 6.45 | 0.09 | 1.848 |
| ✓ | ✓ | MobileViG [33] | 4.76 | 6.25 | 6.19 | 0.09 | 3.232 |
| ✓ | ✓ | RFViG [26] | 4.65 | 6.22 | 6.17 | 0.12 | 5.879 |
| ✓ | ✓ | LFHomoE (Ours) | 4.73 | 6.14 | 6.12 | 0.09 | 3.233 |
| Attention | ACE (Synthetic Benchmark) | ACE (UHBD) | ACE (NIUHBD) | Time (s) | Parameters (M) |
|---|---|---|---|---|---|
| SENet [56] | 5.09 | 6.71 | 6.67 | 0.09 | 3.232 |
| CBAM [57] | 4.98 | 6.54 | 6.60 | 0.11 | 3.233 |
| ECA [58] | 5.02 | 4.56 | 6.64 | 0.09 | 3.232 |
| CA [59] | 4.85 | 6.48 | 6.41 | 0.10 | 3.234 |
| TA [60] | 4.80 | 6.25 | 6.29 | 0.10 | 3.233 |
| ELA [61] | 4.82 | 6.46 | 6.53 | 0.10 | 3.233 |
| SRCSSA (Ours) | 4.73 | 6.14 | 6.12 | 0.09 | 3.233 |
| | | ACE (Synthetic Benchmark) | ACE (UHBD) | ACE (NIUHBD) |
|---|---|---|---|---|
| S | 0.5 | 4.99 | 6.34 | 6.25 |
| 1 | 0.5 | 5.08 | 6.58 | 6.51 |
| 2 | 0.5 | 5.06 | 6.53 | 6.44 |
| 4 | 0.5 | 4.73 | 6.14 | 6.12 |
| 8 | 0.5 | 5.02 | 6.42 | 6.37 |
| 4 | 1.0 | 4.76 | 6.17 | 6.16 |
| 4 | 0.25 | 4.76 | 6.33 | 6.28 |
| | | ACE (Synthetic Benchmark) | ACE (UHBD) | ACE (NIUHBD) |
|---|---|---|---|---|
| 0.1 | 0.001 | 4.93 | 6.41 | 6.32 |
| 1.0 | 0.001 | 4.86 | 6.24 | 6.14 |
| 1.0 | 0.01 | 4.73 | 6.14 | 6.12 |
| 1.0 | 0.05 | 5.11 | 6.62 | 6.56 |
| 0.1 | 0.01 | 5.06 | 6.42 | 6.27 |
| 2.0 | 0.001 | 4.80 | 6.35 | 6.24 |
| 2.0 | 0.01 | 5.10 | 6.44 | 6.40 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Liao, Y.; Luo, Y.; Qian, J.; Wu, Y.; Li, C.; Chen, H. Lightweight Unsupervised Homography Estimation for Infrared and Visible Images Based on UAV Perspective Enabling Real-Time Processing in Space–Air–Ground Integrated Network. Remote Sens. 2025, 17, 3884. https://doi.org/10.3390/rs17233884