Abstract
Homography estimation between infrared and visible-light images is a key visual technique that enables drones to perceive their environment and localize autonomously at low altitude. Combined with edge computing and 5G, it has the potential to support real-time drone control within air–ground integrated networks. However, homography estimation under low-altitude dynamic viewpoints remains underexplored, and images captured in low-altitude scenarios suffer from blurring and jitter, which poses new challenges for the task. To address these issues, this paper proposes a lightweight homography estimation method, LFHomo, comprising two components: a pair of anti-blurring feature extractors with non-shared parameters and a lightweight homography estimator, LFHomoE. The anti-blurring feature extractors introduce inverse residual layers and feature displacement modules to capture sufficient contextual information in blurred regions and to enable lossless and rapid propagation of feature information. In addition, a spatial-reduction-based channel shuffle and spatial joint attention module is designed to suppress the redundant features introduced by lossless transmission, allowing informative features to be extracted and refined efficiently at low computational cost. The homography estimator LFHomoE adopts a CNN–GNN hybrid architecture to model the geometric relationships between cross-modal features efficiently and to predict homography matrices quickly. We also construct and annotate an unregistered infrared and visible image dataset captured from drone perspectives for model training and evaluation. Experimental results show that LFHomo maintains high registration accuracy while significantly reducing model size and inference time.
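For readers less familiar with the quantity being predicted, a homography is a 3×3 matrix, defined up to scale, that maps pixel coordinates in one image to the other through homogeneous coordinates. The sketch below is not LFHomo itself. It only illustrates how a homography acts on points and how a corner-based error between two homographies can be computed, which is a common way to assess registration accuracy; the abstract does not state which metric the paper uses. The function names, the 128×128 corner layout, and the example matrices are all illustrative assumptions.

```python
def apply_homography(H, points):
    """Map 2D points through a 3x3 homography H (nested lists, row-major)."""
    mapped = []
    for x, y in points:
        # Lift (x, y) to homogeneous coordinates (x, y, 1) and multiply by H.
        u = H[0][0] * x + H[0][1] * y + H[0][2]
        v = H[1][0] * x + H[1][1] * y + H[1][2]
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        if w == 0:
            raise ValueError("point maps to infinity under H")
        # Divide by w to return to ordinary pixel coordinates.
        mapped.append((u / w, v / w))
    return mapped


def mean_corner_error(H_pred, H_true, corners):
    """Average Euclidean distance between corners warped by two homographies."""
    pred = apply_homography(H_pred, corners)
    true = apply_homography(H_true, corners)
    dists = [((px - tx) ** 2 + (py - ty) ** 2) ** 0.5
             for (px, py), (tx, ty) in zip(pred, true)]
    return sum(dists) / len(dists)


# Hypothetical example: the ground truth is a pure translation by (3, 4) pixels
# and the "prediction" is the identity, so every corner is off by 5 pixels.
H_true = [[1.0, 0.0, 3.0], [0.0, 1.0, 4.0], [0.0, 0.0, 1.0]]
H_identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
corners = [(0.0, 0.0), (127.0, 0.0), (127.0, 127.0), (0.0, 127.0)]
error = mean_corner_error(H_identity, H_true, corners)
```

In this toy case `error` evaluates to 5.0 pixels. A learned estimator would take the place of `H_identity`, and lower values indicate better alignment between the infrared and visible images.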