UAV-TIRVis: A Benchmark Dataset for Thermal–Visible Image Registration from Aerial Platforms
Abstract
1. Introduction
- Preprocessing (often needed in multimodal settings, since the images come from different sensors: SAR vs. optical, IR vs. visible, CT vs. MRI)
- Keypoint detection (features as invariant as possible: points (e.g., corners), edges (e.g., contours), and regions (e.g., anatomical structures) in both images)
- Keypoint matching
- Transform estimation
- Resampling and Image Warping (apply the transformation to align the source with the target)
- Evaluation of Registration Accuracy (if ground truth exists)
- Area-based: the alignment is found by optimizing a similarity measure (cross-correlation, mean squared error, mutual information, structural similarity index, etc.)
- Feature-based: identify distinctive points, lines, or regions in both images and match them to estimate the transformation
- Intensity-based: directly compare pixel intensities across images using similarity metrics such as normalized cross-correlation, mutual information, or sum of squared differences
- Learning-based: use neural networks to predict deformation fields
- Hybrid: combine at least two of the above
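As a concrete instance of an area-based similarity measure, mutual information can be estimated from the joint intensity histogram of the two images. The sketch below (function name and NumPy usage are ours, for illustration only — not part of the benchmark pipeline) shows why it works across modalities: it rewards statistical dependence of intensities rather than direct intensity agreement.

```python
import numpy as np

def mutual_information(a: np.ndarray, b: np.ndarray, bins: int = 32) -> float:
    """Mutual information (in nats) estimated from the joint intensity histogram.

    An area-based similarity measure: it peaks when the intensity distributions
    of the two images are statistically dependent, which makes it usable across
    modalities (e.g., thermal vs. visible) where direct intensity comparison fails.
    """
    hist, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = hist / hist.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal distribution of image a
    py = pxy.sum(axis=0, keepdims=True)   # marginal distribution of image b
    nz = pxy > 0                          # skip empty bins to avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())
```

An image compared with itself yields high mutual information, while two statistically independent images yield a value near zero — which is exactly the behavior an optimizer exploits during area-based alignment.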
2. Related Work
3. Methods
3.1. Dataset Acquisition
- 4000 × 3000 visible-spectrum image
- 640 × 512 thermal image
- 4000 × 3000 warped thermal image
- 4000 × 3000 overlay of the warped thermal over the visible image
- The coordinates of the keypoints used to register the images
3.2. Manual Image Registration
- Display the visible (reference) and thermal (moving) images side by side in an interactive GUI
- Manually select corresponding landmarks on both images (typically 40–80 per pair, but could go up to 100 or more depending on the scenario complexity), using easily identifiable structures such as building corners or roads
- Estimate a smooth non-rigid transformation using a thin-plate spline (TPS) based on the selected landmark pairs
- Warp the thermal image according to the TPS transformation to align it with the visible image
- Inspect the overlay; if misalignment remains, add or adjust landmarks and recompute the TPS (iterative refinement)
- Once alignment is satisfactory, save the landmark coordinates and TPS parameters for reproducibility
- Generate the final warped thermal image (4000 × 3000) and a blended overlay
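The workflow above can be sketched with SciPy's `Rbf` interpolator and a thin-plate kernel (the interpolator cited in [28]); the function name `tps_warp` and the exact argument layout are our own illustrative choices, and a full 4000 × 3000 grid would need to be evaluated in chunks rather than in one call as shown here.

```python
import numpy as np
from scipy.interpolate import Rbf
from scipy.ndimage import map_coordinates

def tps_warp(moving, src_pts, dst_pts, out_shape):
    """Warp `moving` so its landmarks src_pts land on dst_pts in the reference grid.

    Backward mapping: for every reference pixel, two thin-plate RBF interpolants
    predict the (x, y) source coordinate to sample from; bilinear resampling
    then produces the warped image.
    """
    src = np.asarray(src_pts, dtype=float)   # (N, 2) landmarks in the moving image, (x, y)
    dst = np.asarray(dst_pts, dtype=float)   # (N, 2) landmarks in the reference image, (x, y)
    fx = Rbf(dst[:, 0], dst[:, 1], src[:, 0], function="thin_plate")
    fy = Rbf(dst[:, 0], dst[:, 1], src[:, 1], function="thin_plate")
    h, w = out_shape
    gy, gx = np.mgrid[0:h, 0:w]
    sx = fx(gx.ravel().astype(float), gy.ravel().astype(float)).reshape(h, w)
    sy = fy(gx.ravel().astype(float), gy.ravel().astype(float)).reshape(h, w)
    # map_coordinates indexes as (row, col) = (y, x)
    return map_coordinates(moving, [sy, sx], order=1, mode="constant")
```

Backward mapping (reference → moving) is the standard choice here because it guarantees every output pixel gets a value, whereas forward-warping the moving image would leave holes.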
3.3. Annotation Accuracy and Consistency
- Landmark influence (LOO). A leave-one-out experiment was performed, in which each landmark was removed in turn and the thin-plate spline (TPS) transformation was recomputed. The resulting warped image differed from the full-set warp by an average of 25.48 px on the 4000 × 3000 visible grid, confirming that each selected landmark contributes substantially to the global alignment.
- Transformation stability. Introducing small random deviations to landmark positions yielded 95% confidence intervals of 3.48 px (x) and 3.28 px (y), demonstrating that the estimated TPS transformation is highly stable to minor landmark uncertainty.
- Inter-annotator consistency. Although the published dataset was annotated by a single experienced annotator, scaling it up substantially would require multiple annotators. To assess this, we compared the landmarks selected by two additional annotators against those of the trained annotator. The mean landmark discrepancy was 4.49 px for visible images and 0.84 px for thermal images, indicating strong consistency among operators and confirming that the manual registration framework remains reproducible even in the hands of newly trained annotators.
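The leave-one-out experiment above can be mirrored in spirit by refitting the thin-plate mapping without each landmark and measuring how much the backward map moves on a subsampled grid. This is a minimal sketch under our own naming (`loo_influence`) and grid subsampling, not the paper's exact evaluation code, which compares the full warped images.

```python
import numpy as np
from scipy.interpolate import Rbf

def loo_influence(src_pts, dst_pts, grid_shape, step=10):
    """Leave-one-out landmark influence.

    For each landmark, refit the thin-plate backward mapping without it and
    report the mean displacement (in px) of the mapping over a subsampled grid,
    i.e., how much the warp would change if that landmark were dropped.
    """
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    h, w = grid_shape
    gy, gx = np.mgrid[0:h:step, 0:w:step]
    gx = gx.ravel().astype(float)
    gy = gy.ravel().astype(float)

    def backward_map(s, d):
        fx = Rbf(d[:, 0], d[:, 1], s[:, 0], function="thin_plate")
        fy = Rbf(d[:, 0], d[:, 1], s[:, 1], function="thin_plate")
        return np.stack([fx(gx, gy), fy(gx, gy)], axis=1)

    full = backward_map(src, dst)          # mapping fitted with all landmarks
    keep = np.arange(len(src))
    return [
        float(np.linalg.norm(backward_map(src[keep != i], dst[keep != i]) - full,
                             axis=1).mean())
        for i in range(len(src))
    ]
```

A large influence value for a landmark means it constrains the warp in a region no other landmark covers — the property the 25.48 px LOO figure quantifies for the published annotations.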
3.4. Automated Image Registration
- For each scale factor f in trialFactors:
  - Resize moving image by f
  - Estimate similarity transform (moving → ref)
  - Refine with affine registration (moving → ref)
  - Obtain warped moving image
  - Compute metrics: RMSE, PSNR, SSIM, NCC
  - If the metrics beat bestResult (by NCC, tie-broken by SSIM), update the stored factor, warped image, and metrics in bestResult
- End for
- Save the best warped image and overlays
- Save timing logs
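The loop above can be sketched in Python. Since the MATLAB routines used in the pipeline (imregtform, imregister) have no direct equivalent here, a simple FFT phase-correlation translation estimate stands in for the similarity/affine stages; the point of the sketch is the best-of selection over trialFactors with NCC scoring, and the function names are ours.

```python
import numpy as np
from scipy.ndimage import zoom, shift as nd_shift

def ncc(a, b):
    """Zero-mean normalized cross-correlation; 1.0 = perfect match."""
    a = a - a.mean()
    b = b - b.mean()
    d = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / d) if d > 0 else 0.0

def register_multiscale(ref, moving, trial_factors):
    """Best-of search over trial scale factors.

    For each factor: resize the moving image, estimate a translation by FFT
    phase correlation (a stand-in for the similarity + affine stages), warp,
    and score by NCC; the best-scoring result is kept.
    """
    best = {"ncc": -np.inf}
    for f in trial_factors:
        m = zoom(moving, f, order=1)
        canvas = np.zeros_like(ref)                 # place resized image on the ref grid
        h = min(ref.shape[0], m.shape[0])
        w = min(ref.shape[1], m.shape[1])
        canvas[:h, :w] = m[:h, :w]
        cps = np.fft.fft2(ref) * np.conj(np.fft.fft2(canvas))
        cps /= np.abs(cps) + 1e-12                  # normalized cross-power spectrum
        corr = np.fft.ifft2(cps).real
        dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
        if dy > ref.shape[0] // 2: dy -= ref.shape[0]   # wrap to signed shifts
        if dx > ref.shape[1] // 2: dx -= ref.shape[1]
        warped = nd_shift(canvas, (dy, dx), order=1)
        score = ncc(ref, warped)
        if score > best["ncc"]:
            best = {"ncc": score, "factor": f, "warped": warped, "shift": (dy, dx)}
    return best
```

The per-factor results are independent, so in practice the trial factors can be evaluated in parallel and only the arg-max by NCC retained.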
3.5. Metrics
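The four metrics reported throughout the evaluation can be computed as follows. This is a minimal sketch with our own function names; images are assumed normalized to [0, 1], and a global single-window SSIM is shown for brevity where the standard metric averages the same expression over local windows.

```python
import numpy as np

def rmse(a, b):
    """Root-mean-square error between two aligned images."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

def psnr(a, b, peak=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    mse = np.mean((a - b) ** 2)
    return float("inf") if mse == 0 else float(10 * np.log10(peak ** 2 / mse))

def ncc(a, b):
    """Zero-mean normalized cross-correlation; 1.0 = perfect match."""
    a0, b0 = a - a.mean(), b - b.mean()
    d = np.sqrt((a0 ** 2).sum() * (b0 ** 2).sum())
    return float((a0 * b0).sum() / d) if d > 0 else 0.0

def ssim_global(a, b, peak=1.0):
    """Single-window SSIM; the standard metric averages this over local windows."""
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (a.var() + b.var() + c2)
    return float(num / den)
```

RMSE and PSNR penalize raw intensity differences, while SSIM and NCC are more tolerant of the global intensity offsets typical of thermal-visible pairs — which is why the best-of selection in Section 3.4 ranks candidates by the latter pair.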
4. Evaluation
4.1. Traditional Registration Methods
4.2. Other Automated Registration Methods
5. Discussion
- Resize moving image by f:
- Estimate similarity transform (moving → ref, imregtform):
- Refine with affine registration (moving → ref, imregister):
- Obtain warped moving image (final resampling onto ref grid):
- Compute metrics (RMSE, PSNR, SSIM, NCC) on aligned images:
- Compare/update bestResult (by NCC, tie-break SSIM):
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Zitová, B.; Flusser, J. Image registration methods: A survey. Image Vis. Comput. 2003, 21, 977–1000. [Google Scholar] [CrossRef]
- Sommervold, O.; Gazzea, M.; Arghandeh, R. A Survey on SAR and Optical Satellite Image Registration. Remote Sens. 2023, 15, 850. [Google Scholar] [CrossRef]
- Ulmamei, A.-A.; Bira, C. An Approach for Implementing Electronic Image Stabilization Using an FPGA System. ROMJIST 2024, 27, 267–280. [Google Scholar] [CrossRef]
- Velesaca, H.O.; Bastidas, G.; Rouhani, M.; Sappa, A.D. Multimodal image registration techniques: A comprehensive survey. Multimed. Tools Appl. 2024, 83, 63919–63947. [Google Scholar] [CrossRef]
- Zöllner, F. Multimodal ground truth datasets for abdominal medical image registration [data]. 2022. [Google Scholar] [CrossRef]
- Zambanini, S. H2OPM Image Registration Dataset. Computer Vision Lab, TU Wien. 2018. Available online: https://cvl.tuwien.ac.at/research/cvl-databases/h2opm-dataset/ (accessed on 4 September 2025).
- Liu, Y.; Liu, Y.; Yan, S.; Chen, C.; Zhong, J.; Peng, Y.; Zhang, M. A Multi-View Thermal–Visible Image Dataset for Cross-Spectral Matching. Remote Sens. 2023, 15, 174. [Google Scholar] [CrossRef]
- Hering, A.; Hansen, L.; Mok, T.C.W.; Chung, A.C.S.; Siebert, H.; Häger, S.; Lange, A.; Kuckertz, S.; Heldmann, S.; Shao, W.; et al. Learn2Reg: Comprehensive multi-task medical image registration challenge, dataset and evaluation in the era of deep learning. arXiv 2021, arXiv:2112.04489. [Google Scholar] [CrossRef] [PubMed]
- Castillo, R.; Castillo, E.; Fuentes, D.; Ahmad, M.; Wood, A.M.; Ludwig, M.S.; Guerrero, T. A Reference Dataset for Deformable Image Registration Spatial Accuracy Evaluation Using the COPDgene Study Archive. Phys. Med. Biol. 2013, 58, 2861–2877. [Google Scholar] [CrossRef] [PubMed]
- Lambert, Z.; Petitjean, C.; Dubray, B.; Kuan, S. SegTHOR: Segmentation of Thoracic Organs at Risk in CT Images. In Proceedings of the 2020 Tenth International Conference on Image Processing Theory, Tools and Applications (IPTA), Paris, France, 9–12 November 2020; pp. 1–6. [Google Scholar] [CrossRef]
- Hernández-Matas, C.; Zabulis, X.; Triantafyllou, A.; Anyfanti, P.; Douma, S.; Argyros, A.A. FIRE: Fundus Image Registration Dataset. Artif. Intell. Vis. Ophthalmol. 2017, 1, 16–28. [Google Scholar] [CrossRef]
- Ding, L.; Kang, T.; Kuriyan, A.; Ramchandran, R.; Wykoff, C.; Sharma, G. FLoRI21: Fluorescein Angiography Longitudinal Retinal Image Registration Dataset; IEEE Dataport: Piscataway, NJ, USA, 2021. [Google Scholar] [CrossRef]
- Hu, Y.; Gong, M.; Qiu, Z.; Liu, J.; Shen, H.; Yuan, M.; Zhang, X.; Li, H.; Lu, H.; Liu, J. COph100: A comprehensive fundus image registration dataset from infants constituting the “RIDIRP” database. Sci. Data 2025, 12, 99. [Google Scholar] [CrossRef]
- Wang, C.Y.; Sadrieh, F.K.; Shen, Y.T.; Chen, S.E.; Kim, S.; Chen, V.; Raghavendra, A.; Wang, D.; Saeedi, O.; Tao, Y. MEMO: A Multimodal Retinal Dataset for EMA and OCTA Registration. 2023. Available online: https://chiaoyiwang0424.github.io/MEMO/ (accessed on 4 September 2025).
- Wang, Y.; Li, W.; Pearce, T.; Wang, H. From Tissue Plane to Organ World: A Benchmark Dataset for Multimodal Biomedical Image Registration using Deep Co-Attention Networks. arXiv 2024, arXiv:2406.04105. [Google Scholar] [CrossRef]
- Li, J.; Yang, B.; Chen, C.; Habib, A. NRLI-UAV: Non-rigid registration of sequential raw laser scans and images for low-cost UAV LiDAR point cloud quality improvement. ISPRS J. Photogramm. Remote Sens. 2019, 158, 123–145. [Google Scholar] [CrossRef]
- Liao, Y.; Li, J.; Kang, S.; Li, Q.; Zhu, G.; Yuan, S.; Dong, Z.; Yang, B. SE-Calib: Semantic Edge-Based LiDAR–Camera Boresight Online Calibration in Urban Scenes. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1000513. [Google Scholar] [CrossRef]
- Liao, Y.; Kang, S.; Li, J.; Liu, Y.; Liu, Y.; Dong, Z.; Yang, B.; Chen, X. Mobile-Seed: Joint Semantic Segmentation and Boundary Detection for Mobile Robots. IEEE Robot. Autom. Lett. 2024, 9, 3902–3909. [Google Scholar] [CrossRef]
- Jia, X.; Bartlett, J.; Zhang, T.; Lu, W.; Qiu, Z.; Duan, J. U-Net vs. Transformer: Is U-Net Outdated in Medical Image Registration? arXiv 2022, arXiv:2208.04939. [Google Scholar]
- Sun, J.; Shen, Z.; Wang, Y.; Bao, H.; Zhou, X. LoFTR: Detector-Free Local Feature Matching with Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtually, 19–25 June 2021. [Google Scholar]
- Delaunay, R.; Zhang, R.; Pedrosa, F.C.; Feizi, N.; Sacco, D.; Patel, R.V.; Jagadeesan, J. Transformer-based local feature matching for multimodal image registration. In Proceedings of the Medical Imaging 2024: Image Processing, San Diego, CA, USA, 18–23 February 2024; p. 18. [Google Scholar] [CrossRef]
- Fu, Y.; Brown, N.; Saeed, S.; Casamitjana, A.; Baum, Z.; Delaunay, R.; Yang, Q.; Grimwood, A.; Min, Z.; Blumberg, S.; et al. DeepReg: A deep learning toolkit for medical image registration. J. Open Source Softw. 2020, 5, 2705. [Google Scholar] [CrossRef]
- Balakrishnan, G.; Zhao, A.; Sabuncu, M.R.; Guttag, J.; Dalca, A.V. VoxelMorph: A Learning Framework for Deformable Medical Image Registration. IEEE Trans. Med. Imaging 2019, 38, 1788–1800. [Google Scholar] [CrossRef] [PubMed]
- Yang, Y.; Liu, S.; Zhang, H.; Li, D.; Ma, L. Multi-Modal Remote Sensing Image Registration Method Combining Scale-Invariant Feature Transform with Co-Occurrence Filter and Histogram of Oriented Gradients Features. Remote Sens. 2025, 17, 2246. [Google Scholar] [CrossRef]
- Luo, X.; Wei, Z.; Jin, Y.; Wang, X.; Lin, P.; Wei, X.; Zhou, W. Fast Automatic Registration of UAV Images via Bidirectional Matching. Sensors 2023, 23, 8566. [Google Scholar] [CrossRef]
- Sun, W.; Gao, H.; Li, C. A Two-Stage Registration Strategy for Thermal–Visible Images in Substations. Appl. Sci. 2024, 14, 1158. [Google Scholar] [CrossRef]
- Kumawat, A.; Panda, S.; Gerogiannis, V.C.; Kanavos, A.; Acharya, B.; Manika, S. A Hybrid Approach for Image Acquisition Methods Based on Feature-Based Image Registration. J. Imaging 2024, 10, 228. [Google Scholar] [CrossRef] [PubMed]
- SciPy Community. scipy.interpolate.Rbf. Available online: https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.Rbf.html (accessed on 22 August 2025).
- Li, J.; Hu, Q.; Ai, M. RIFT: Multi-Modal Image Matching Based on Radiation-Variation Insensitive Feature Transform. IEEE Trans. Image Process. 2020, 29, 3296–3310. [Google Scholar] [CrossRef] [PubMed]
- Ye, Y.; Bruzzone, L.; Shan, J.; Bovolo, F.; Zhu, Q. Fast and Robust Matching for Multimodal Remote Sensing Image Registration. IEEE Trans. Geosci. Remote Sens. 2019, 57, 9059–9070. [Google Scholar] [CrossRef]
- Vasile, C.-E.; Ulmamei, A.-A.; Bira, C. Image Processing Hardware Acceleration-A Review of Operations Involved and Current Hardware Approaches. J. Imaging 2024, 10, 298. [Google Scholar] [CrossRef] [PubMed]

| Dataset | Domain/Modality | Year | Size Details | Ground Truth Notes |
|---|---|---|---|---|
| Learn2Reg [8] | Multi-task medical (CT, MR, US, histology) | 2021–2022 | Multiple datasets across anatomies | Labels + evaluation framework; challenge benchmark |
| COPDgene [9] | Thoracic CT (inhale/exhale) | 2013 | 10 BH-CT pairs, ∼7k landmarks | Manually validated landmark pairs; deformable registration reference |
| SegTHOR [10] | Thoracic CT (organs at risk) | 2020 | 60 3D CT scans (40 train, 20 test) | Manual segmentation of heart, aorta, trachea, esophagus |
| FIRE [11] | Retinal fundus images | 2017 | 134 pairs from 129 images | Landmark correspondences; standard fundus benchmark |
| FLoRI21 [12] | Retinal fluorescein angiography | 2021 | 15 reference-target pairs | Ground-truth alignments; longitudinal retina |
| COph100 [13] | Infant retinal fundus | 2025 | 491 image pairs (100 eyes, multi-session) | Correspondences + vessel masks; infant disease progression |
| MEMO [14] | Retinal EMA + OCTA | 2023 | EMA/OCTA image pairs | Landmarks + segmentation; vessel-density mismatch |
| ATOM [15] | Histology-Organ mapping | 2024 | Histology subregions in 3D organ | Spatial localization of histo sections within organ context |
| Multimodal Abdomen [5] | Synthetic MRI, CT, CBCT | 2020 | CycleGAN-generated volumes | Perfect co-registration; validation dataset |
| H2OPM [6] | Aerial orthophoto maps (Austria) | 2018 | 8 references, 42 historical pairs | Manual correspondences; groupwise aerial registration |
| MTV [7] | Multi-view thermal–visible images | 2022 | 40k image pairs, 640 × 512 | Camera metadata, 3D reference model, depth maps of the visible images, and 6-DoF poses of all images |
| Source | Type | Open-Source | Comments/Remarks |
|---|---|---|---|
| [16] NRLI-UAV: Non-rigid registration of sequential raw laser scans and images for low-cost UAV LiDAR point cloud quality improvement | LiDAR + imagery registration | No | A two-step “coarse-to-fine” non-rigid registration method addressing the low precision of low-cost UAV LiDAR systems; Final registration error <1 pixel in image space and <0.13 m in object space; Complex setup |
| [17] SE-Calib: Semantic Edges based LiDAR–Camera Boresight Online Calibration in Urban Scenes | LiDAR–Camera Boresight Calibration | No | This LiDAR–Camera calibration pipeline can be extended to thermal–visible sensors (especially semantic feature extraction, target-free operation, and multi-frame optimization) |
| [18] Mobile-Seed: Joint Semantic Segmentation and Boundary Detection for Mobile Robots | Lightweight semantic segmentation | No | Since this is a lightweight semantic segmentation method, it could be adapted/quantized to run onboard (on the drone or on-field on an embedded device) and achieve close to real-time performance |
| [19] U-Net vs. Transformer: Is U-Net Outdated in Medical Image Registration? | Learning-based image registration | No | Claims that U-Net with sufficient receptive field might still perform very well on most image registration tasks |
| [20] LoFTR: Detector-Free Local Feature Matching with Transformers | Learning-based image registration | Yes | Strong empirical results; Can be heavy in terms of GPU memory and compute, especially for large images or high resolution |
| [21] Transformer-Based Local Feature Matching for Multimodal Image Registration | Learning-based image registration | No | Built on top of LoFTR; Cross-modality, cross-dimensional registration |
| [22] DeepReg: a deep learning toolkit for medical image registration | Learning-based image registration | Yes | Open-source toolkit written in Python 3 (TensorFlow 2-based) designed for deep-learning-based image registration, originally in the medical imaging domain. Does not offer pretrained models |
| [23] VoxelMorph: A Learning Framework for Deformable Medical Image Registration | Learning-based image registration | Yes | CNN-based learning framework for deformable (dense) image registration in the medical-imaging domain; Claimed to have fast inference once trained |
| [24] Multi-Modal Remote Sensing Image Registration Method Combining Scale-Invariant Feature Transform with Co-Occurrence Filter and Histogram of Oriented Gradients Features | Multi-modal SIFT-based registration | No | Novel modification of SIFT that suppresses texture variations while preserving structural information |
| [25] Fast Automatic Registration of UAV Images via Bidirectional Matching | Visible-Visible ORB-based registration | No | Built on top of the lightweight feature-matching algorithm ORB; Targeted to visible spectrum UAV scenery |
| [26] A Two-Stage Registration Strategy for Thermal–Visible Images in Substations | Thermal–Visible registration | No | Domain-specific focus (electrical substations); Claims sub-5 pixel error across 30 images |
| [27] A Hybrid Approach for Image Acquisition Methods Based on Feature-Based Image Registration | Visible-Visible registration | No | Novel hybrid feature-detection/registration method aimed at image acquisition scenarios; Claimed to yield improved keypoint detection and computational efficiency compared to the conventional detectors (SIFT, ORB, etc.) |
| Method | RMSE | PSNR (dB) | SSIM | NCC |
|---|---|---|---|---|
| ORB | 0.25 | 12.31 | 0.46 | 0.26 |
| SURF | 0.23 | 13.51 | 0.52 | 0.38 |
| SIFT | 0.19 | 15.38 | 0.60 | 0.53 |
| KAZE | 0.19 | 14.78 | 0.54 | 0.50 |
| Cross-correlation | 0.20 | 15.10 | 0.55 | 0.51 |
| Intensity-based | 0.22 | 13.94 | 0.53 | 0.29 |
| Heuristic method * | 0.12 | 18.53 | 0.77 | 0.82 |
| Location | Method | RMSE | PSNR (dB) | SSIM | NCC | SAC ** |
|---|---|---|---|---|---|---|
| Mountain | ORB | 0.33 | 9.67 | 0.25 | 0.09 | 0/7 |
| | SURF | 0.28 | 11.07 | 0.33 | 0.13 | 1/7 |
| | SIFT | 0.30 | 10.42 | 0.25 | 0.08 | 0/7 |
| | KAZE | 0.30 | 10.36 | 0.27 | 0.05 | 0/7 |
| | Cross-correlation | 0.29 | 10.84 | 0.39 | 0.11 | 2/7 |
| | Intensity-based | 0.15 | 16.48 | 0.66 | 0.69 | 5/7 |
| | Heuristic method * | 0.13 | 17.52 | 0.72 | 0.79 | 7/7 |
| Seaside pier | ORB | 0.21 | 14.07 | 0.50 | 0.38 | 2/14 |
| | SURF | 0.23 | 13.12 | 0.46 | 0.29 | 1/14 |
| | SIFT | 0.14 | 18.13 | 0.70 | 0.68 | 10/14 |
| | KAZE | 0.16 | 15.86 | 0.50 | 0.56 | 2/14 |
| | Cross-correlation | 0.12 | 18.95 | 0.66 | 0.71 | 9/14 |
| | Intensity-based | 0.19 | 14.45 | 0.49 | 0.29 | 3/14 |
| | Heuristic method * | 0.11 | 18.88 | 0.78 | 0.78 | 11/14 |
| Residential area | ORB | 0.21 | 13.57 | 0.61 | 0.40 | 3/6 |
| | SURF | 0.13 | 17.58 | 0.79 | 0.81 | 5/6 |
| | SIFT | 0.15 | 16.61 | 0.74 | 0.72 | 5/6 |
| | KAZE | 0.12 | 17.92 | 0.80 | 0.83 | 6/6 |
| | Cross-correlation | 0.19 | 15.90 | 0.67 | 0.64 | 4/6 |
| | Intensity-based | 0.30 | 10.49 | 0.43 | −0.09 | 0/6 |
| | Heuristic method * | 0.11 | 18.92 | 0.81 | 0.86 | 6/6 |
| Mountain resort | ORB | 0.27 | 11.59 | 0.48 | 0.20 | 3/14 |
| | SURF | 0.23 | 13.37 | 0.57 | 0.42 | 5/14 |
| | SIFT | 0.20 | 14.56 | 0.61 | 0.54 | 6/14 |
| | KAZE | 0.19 | 14.57 | 0.61 | 0.54 | 6/14 |
| | Cross-correlation | 0.24 | 13.03 | 0.48 | 0.44 | 3/14 |
| | Intensity-based | 0.23 | 13.64 | 0.54 | 0.25 | 4/14 |
| | Heuristic method * | 0.11 | 19.51 | 0.79 | 0.88 | 14/14 |
| Method | ORB | SURF | SIFT | KAZE | Cross-Corr | Intensity | Ours |
|---|---|---|---|---|---|---|---|
| Duration (s) | 1.50 | 1.76 | 1.77 | 2.64 | 3.21 | 2.25 | 92.72 |
| Location | NCC | SAC ** |
|---|---|---|
| Mountain | 0.37 | 2/7 |
| Seaside pier | 0.70 | 11/14 |
| Residential area | 0.60 | 5/6 |
| Mountain resort | 0.54 | 8/14 |
| Specification | CPU | GPU | Comments |
|---|---|---|---|
| Independent Cores | 6 | 60 | SIMD-vectorization on CPU (16× fp32) is inferior to CUDA cores per SM (128×) |
| Memory bandwidth (GB/s) | 76.8 | 504.2 | dual-channel, DDR 4800 MT/s 64b per Transfer = 76.8 GB/s |
| Max compute power (TFlops, 32-bit) | 1 | 40 | depends on maximum clock frequency, number of cores, and vectorization capabilities |
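The bandwidth figure in the table follows directly from the stated memory configuration, and the peak-throughput entries can be sanity-checked the same way. In the sketch below the clock frequencies (≈5.2 GHz CPU boost, ≈2.6 GHz GPU) are our own assumptions chosen to reproduce the table's round numbers, not values taken from the paper.

```python
# Dual-channel DDR-4800: 2 channels x 4800e6 transfers/s x 8 bytes (64 bits) per transfer
bandwidth_gbs = 2 * 4800e6 * 8 / 1e9   # 76.8 GB/s, matching the CPU column

# Peak fp32 throughput: cores x SIMD lanes x 2 (FMA = multiply + add) x clock (GHz) -> TFLOP/s
def peak_tflops_fp32(cores, simd_lanes, clock_ghz):
    return cores * simd_lanes * 2 * clock_ghz / 1000

# 6 CPU cores x 16 fp32 SIMD lanes at an assumed ~5.2 GHz -> ~1 TFLOP/s
cpu_tflops = peak_tflops_fp32(cores=6, simd_lanes=16, clock_ghz=5.2)
# 60 SMs x 128 CUDA cores at an assumed ~2.6 GHz -> ~40 TFLOP/s
gpu_tflops = peak_tflops_fp32(cores=60 * 128, simd_lanes=1, clock_ghz=2.6)
```

The roughly 40x gap in both compute and a 6-7x gap in bandwidth is what motivates offloading the dense resampling and metric computations to the GPU.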
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Vasile, C.-E.; Bîră, C.; Hobincu, R. UAV-TIRVis: A Benchmark Dataset for Thermal–Visible Image Registration from Aerial Platforms. J. Imaging 2025, 11, 432. https://doi.org/10.3390/jimaging11120432

