Fine-Grained Multispectral Fusion for Oriented Object Detection in Remote Sensing
Highlights
- A novel visible-infrared fusion framework for oriented object detection, FGMF, is proposed, achieving mAP50 scores of 80.2% and 66.3% on the DroneVehicle and VEDAI datasets, respectively, with only 87.2M parameters.
- A dual-enhancement and fusion module (DEFM) is proposed for fine-grained multispectral feature calibration and fusion, and an orientation aggregation module (OAM) is designed to capture directional context.
- The study provides an effective solution to the critical challenges of modality misalignment and limited orientation sensitivity, significantly improving the robustness of object detection in complex scenarios such as low illumination, arbitrary orientations, and dense arrangements.
- The DEFM and OAM modules represent significant advancements in multispectral fusion and orientation modeling, offering transferable architectural designs that can benefit numerous vision tasks beyond remote sensing applications.
Abstract
1. Introduction
- We propose FGMF, a novel one-stage infrared–visible oriented object detection method that dynamically emphasizes object orientations and adaptively fuses complementary and similar information.
- To tackle the modality misalignment problem, we propose a dual-enhancement and fusion module (DEFM) with two single-modality enhancement processes that capture similar and distinct features, followed by a fusion step that achieves feature calibration.
- To handle the lack of directional priors in large square convolutional kernels, we propose an orientation aggregation module (OAM) that employs a series of rotated strip convolutions to encode orientation-aware features (illustrative sketches of both modules follow this list).
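To make these contributions concrete, the following sketches show one plausible PyTorch realization of the two modules. They are minimal sketches written for this outline, not the authors' released code: the class names, the gating scheme in the fusion block, and the diagonal-embedding trick for the ±45° strips are our assumptions; only the four orientations and the (1, 9) strip size are grounded in the ablation study of Section 4.4.

```python
import torch
import torch.nn as nn

class DualEnhanceFuse(nn.Module):
    """DEFM-style block (hypothetical): enhance each modality with a
    similarity cue and a complementary (difference) cue, then fuse."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate_rgb = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.gate_ir = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, f_rgb: torch.Tensor, f_ir: torch.Tensor) -> torch.Tensor:
        shared = f_rgb * f_ir  # similar features (assumed cue)
        diff = f_rgb - f_ir    # distinct features (assumed cue)
        f_rgb = f_rgb + self.gate_rgb(shared) * diff  # first enhancement
        f_ir = f_ir - self.gate_ir(shared) * diff     # second enhancement
        return self.fuse(torch.cat([f_rgb, f_ir], dim=1))
```

For the OAM, a depthwise strip convolution is applied along four orientations and the responses are aggregated. Here the ±45° strips are obtained by placing a learned length-k weight vector on the (anti-)diagonal of a k × k kernel; the paper's actual rotation mechanism may differ.

```python
import torch.nn.functional as F  # continues the imports above

class RotatedStripConv(nn.Module):
    """Depthwise strip convolution at 0°, 90°, or ±45° (hypothetical)."""

    def __init__(self, channels: int, k: int = 9, angle: int = 0):
        super().__init__()
        self.channels, self.k, self.angle = channels, k, angle
        self.weight = nn.Parameter(torch.randn(channels, 1, k) * k ** -0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        c, k = self.channels, self.k
        if self.angle == 0:   # horizontal (1, k) strip
            return F.conv2d(x, self.weight.view(c, 1, 1, k), padding=(0, k // 2), groups=c)
        if self.angle == 90:  # vertical (k, 1) strip
            return F.conv2d(x, self.weight.view(c, 1, k, 1), padding=(k // 2, 0), groups=c)
        w = x.new_zeros(c, 1, k, k)  # ±45°: embed the strip on a diagonal
        rows = torch.arange(k)
        cols = rows if self.angle == 45 else k - 1 - rows
        w[:, 0, rows, cols] = self.weight[:, 0, :]
        return F.conv2d(x, w, padding=k // 2, groups=c)

class OrientationAggregation(nn.Module):
    """OAM-style block (hypothetical): sum four oriented strip responses."""

    def __init__(self, channels: int, k: int = 9):
        super().__init__()
        self.branches = nn.ModuleList(
            RotatedStripConv(channels, k, a) for a in (0, 45, 90, -45))
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(sum(b(x) for b in self.branches))
```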
2. Related Work
2.1. Oriented Object Detection
2.2. Infrared–Visible Image Fusion
2.3. Long-Sequence Modeling
3. Methodology
3.1. Preliminaries
3.2. Overview
3.3. Dual-Enhancement and Fusion Module
3.4. Orientation Aggregation Module
3.5. Loss Function
4. Experiments and Analysis
4.1. Datasets
4.2. Implementation Details
4.3. Evaluation Metrics
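All comparisons below are reported as mAP50: the mean over classes of average precision at a rotated-IoU threshold of 0.5 (values in %). For reference, here is a minimal NumPy sketch of the standard VOC-style AP computation; the function name and interface are illustrative, and matching each detection to a rotated ground-truth box at IoU ≥ 0.5 is assumed to be done upstream (e.g., by the MMRotate evaluator).

```python
import numpy as np

def average_precision_50(scores, is_tp, num_gt):
    """All-points-interpolated AP for one class at IoU 0.5.
    scores: detection confidences; is_tp: whether each detection matched
    an unclaimed ground truth at IoU >= 0.5; num_gt: ground-truth count."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_tp, dtype=float)[order]
    tp_cum = np.cumsum(tp)
    fp_cum = np.cumsum(1.0 - tp)
    recall = tp_cum / max(num_gt, 1)
    precision = tp_cum / np.maximum(tp_cum + fp_cum, 1e-9)
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]  # precision envelope
    changed = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[changed + 1] - mrec[changed]) * mpre[changed + 1]))
```

mAP50 is then the unweighted mean of this quantity over the object classes in each table.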
4.4. Ablation Studies
4.5. Comparison with State-of-the-Art Methods
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Ding, J.; Xue, N.; Long, Y.; Xia, G.S.; Lu, Q. Learning RoI transformer for oriented object detection in aerial images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2849–2858. [Google Scholar]
- Yang, X.; Yan, J.; Feng, Z.; He, T. R3det: Refined single-stage detector with feature refinement for rotating object. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 3163–3171. [Google Scholar]
- Xie, X.; Cheng, G.; Wang, J.; Yao, X.; Han, J. Oriented R-CNN for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 3520–3529. [Google Scholar]
- Han, J.; Ding, J.; Li, J.; Xia, G.S. Align deep features for oriented object detection. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5602511. [Google Scholar] [CrossRef]
- Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3974–3983. [Google Scholar]
- Liu, Z.; Yuan, L.; Weng, L.; Yang, Y. A high resolution optical satellite image dataset for ship recognition and some new baselines. In Proceedings of the International Conference on Pattern Recognition Applications and Methods, Porto, Portugal, 24–26 February 2017; Volume 2, pp. 324–331. [Google Scholar]
- Deng, Q.; Tian, W.; Huang, Y.; Xiong, L.; Bi, X. Pedestrian detection by fusion of RGB and infrared images in low-light environment. In Proceedings of the 2021 IEEE 24th International Conference on Information Fusion (FUSION), Sun City, South Africa, 1–4 November 2021; pp. 1–8. [Google Scholar]
- Abdulfattah, M.H.; Sheikh, U.U.; Masud, M.I.; Othman, M.A.; Khamis, N.; Aman, M.; Arfeen, Z.A. Assessing the Detection Capabilities of RGB and Infrared Models for Robust Occluded and Unoccluded Pedestrian Detection. IEEE Access 2025, 13, 91834–91845. [Google Scholar] [CrossRef]
- Wang, Q.; Chi, Y.; Shen, T.; Song, J.; Zhang, Z.; Zhu, Y. Improving rgb-infrared pedestrian detection by reducing cross-modality redundancy. In Proceedings of the 2022 IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 16–19 October 2022; pp. 526–530. [Google Scholar]
- Peng, T.; Li, Q.; Zhu, P. Rgb-t crowd counting from drone: A benchmark and mmccn network. In Proceedings of the Asian Conference on Computer Vision, Kyoto, Japan, 30 November–4 December 2020. [Google Scholar]
- Gu, S.; Lian, Z. A unified RGB-T crowd counting learning framework. Image Vis. Comput. 2023, 131, 104631. [Google Scholar] [CrossRef]
- Mu, B.; Shao, F.; Xie, Z.; Xu, L.; Jiang, Q. RGBT-Booster: Detail-Boosted Fusion Network for RGB-Thermal Crowd Counting with Local Contrastive Learning. IEEE Internet Things J. 2025, 12, 18331–18349. [Google Scholar] [CrossRef]
- Guan, D.; Cao, Y.; Yang, J.; Cao, Y.; Yang, M.Y. Fusion of multispectral data through illumination-aware deep neural networks for pedestrian detection. Inf. Fusion 2019, 50, 148–157. [Google Scholar] [CrossRef]
- Li, C.; Liang, X.; Lu, Y.; Zhao, N.; Tang, J. RGB-T object tracking: Benchmark and baseline. Pattern Recognit. 2019, 96, 106977. [Google Scholar] [CrossRef]
- Liu, L.; Chen, J.; Wu, H.; Li, G.; Li, C.; Lin, L. Cross-modal collaborative representation learning and a large-scale rgbt benchmark for crowd counting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 4823–4833. [Google Scholar]
- Xu, D.; Ouyang, W.; Ricci, E.; Wang, X.; Sebe, N. Learning cross-modal deep representations for robust pedestrian detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5363–5371. [Google Scholar]
- Zhang, Q.; Huang, N.; Yao, L.; Zhang, D.; Shan, C.; Han, J. RGB-T salient object detection via fusing multi-level CNN features. IEEE Trans. Image Process. 2019, 29, 3321–3335. [Google Scholar] [CrossRef]
- Zhang, Q.; Zhao, S.; Luo, Y.; Zhang, D.; Huang, N.; Han, J. ABMDRNet: Adaptive-weighted bi-directional modality difference reduction network for RGB-T semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 2633–2642. [Google Scholar]
- Zhou, K.; Chen, L.; Cao, X. Improving multispectral pedestrian detection by addressing modality imbalance problems. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XVIII 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 787–803. [Google Scholar]
- Yuan, M.; Wang, Y.; Wei, X. Translation, scale and rotation: Cross-modal alignment meets RGB-infrared vehicle detection. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 509–525. [Google Scholar]
- Pan, X.; Ren, Y.; Sheng, K.; Dong, W.; Yuan, H.; Guo, X.; Ma, C.; Xu, C. Dynamic refinement network for oriented and densely packed object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 11207–11216. [Google Scholar]
- Li, X.; Wang, W.; Hu, X.; Yang, J. Selective kernel networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 510–519. [Google Scholar]
- Li, Y.; Li, X.; Dai, Y.; Hou, Q.; Liu, L.; Liu, Y.; Cheng, M.M.; Yang, J. Lsknet: A foundation lightweight backbone for remote sensing. Int. J. Comput. Vis. 2024, 133, 1410–1431. [Google Scholar] [CrossRef]
- Cai, X.; Lai, Q.; Wang, Y.; Wang, W.; Sun, Z.; Yao, Y. Poly kernel inception network for remote sensing detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–21 June 2024; pp. 27706–27716. [Google Scholar]
- Yuan, M.; Wei, X. C2former: Calibrated and complementary transformer for rgb-infrared object detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5403712. [Google Scholar] [CrossRef]
- Zhou, M.; Li, T.; Qiao, C.; Xie, D.; Wang, G.; Ruan, N.; Mei, L.; Yang, Y.; Shen, H.T. DMM: Disparity-guided Multispectral Mamba for Oriented Object Detection in Remote Sensing. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5404913. [Google Scholar] [CrossRef]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99. [Google Scholar] [CrossRef]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Cheng, Y.; Xu, C.; Kong, Y.; Wang, X. Short-Side Excursion for Oriented Object Detection. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6515205. [Google Scholar] [CrossRef]
- Pu, Y.; Wang, Y.; Xia, Z.; Han, Y.; Wang, Y.; Gan, W.; Wang, Z.; Song, S.; Huang, G. Adaptive rotated convolution for rotated object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 6589–6600. [Google Scholar]
- Wang, J.; Pu, Y.; Han, Y.; Guo, J.; Wang, Y.; Li, X.; Huang, G. Gra: Detecting oriented objects through group-wise rotating and attention. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; Springer: Berlin/Heidelberg, Germany, 2024; pp. 298–315. [Google Scholar]
- Yang, X.; Yan, J. Arbitrary-oriented object detection with circular smooth label. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part VIII 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 677–694. [Google Scholar]
- Yang, X.; Hou, L.; Zhou, Y.; Wang, W.; Yan, J. Dense label encoding for boundary discontinuity free rotation detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 15819–15829. [Google Scholar]
- Yang, X.; Yan, J.; Ming, Q.; Wang, W.; Zhang, X.; Tian, Q. Rethinking rotated object detection with gaussian wasserstein distance loss. In Proceedings of the International Conference on Machine Learning. PMLR, Virtual, 18–24 July 2021; pp. 11830–11841. [Google Scholar]
- Yang, X.; Yang, X.; Yang, J.; Ming, Q.; Wang, W.; Tian, Q.; Yan, J. Learning high-precision bounding box for rotated object detection via kullback-leibler divergence. Adv. Neural Inf. Process. Syst. 2021, 34, 18381–18394. [Google Scholar]
- Yang, X.; Zhou, Y.; Zhang, G.; Yang, J.; Wang, W.; Yan, J.; Zhang, X.; Tian, Q. The KFIoU Loss for Rotated Object Detection. In Proceedings of the The Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
- Jiang, C.; Ren, H.; Yang, H.; Huo, H.; Zhu, P.; Yao, Z.; Li, J.; Sun, M.; Yang, S. M2FNet: Multi-modal fusion network for object detection from visible and thermal infrared images. Int. J. Appl. Earth Obs. Geoinf. 2024, 130, 103918. [Google Scholar] [CrossRef]
- Yuan, M.; Shi, X.; Wang, N.; Wang, Y.; Wei, X. Improving RGB-infrared object detection with cascade alignment-guided transformer. Inf. Fusion 2024, 105, 102246. [Google Scholar] [CrossRef]
- Hu, Y.; Chen, X.; Wang, S.; Liu, L.; Shi, H.; Fan, L.; Tian, J.; Liang, J. Deformable Cross-Attention Transformer for Weakly Aligned RGB–T Pedestrian Detection. IEEE Trans. Multimed. 2025, 27, 4400–4411. [Google Scholar] [CrossRef]
- Liu, Y.; Guo, W.; Yao, C.; Zhang, L. Dual-Perspective Alignment Learning for Multimodal Remote Sensing Object Detection. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5404015. [Google Scholar] [CrossRef]
- Cao, B.; Guo, J.; Zhu, P.; Hu, Q. Bi-directional adapter for multimodal tracking. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 927–935. [Google Scholar]
- Zhang, P.; Zhao, J.; Bo, C.; Wang, D.; Lu, H.; Yang, X. Jointly modeling motion and appearance cues for robust RGB-T tracking. IEEE Trans. Image Process. 2021, 30, 3335–3347. [Google Scholar] [CrossRef]
- Gómez-Chova, L.; Tuia, D.; Moser, G.; Camps-Valls, G. Multimodal classification of remote sensing images: A review and future directions. Proc. IEEE 2015, 103, 1560–1584. [Google Scholar] [CrossRef]
- Zhang, J.; Lei, J.; Xie, W.; Fang, Z.; Li, Y.; Du, Q. SuperYOLO: Super resolution assisted object detection in multimodal remote sensing imagery. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5605415. [Google Scholar] [CrossRef]
- Hatamizadeh, A.; Kautz, J. Mambavision: A hybrid mamba-transformer vision backbone. arXiv 2024, arXiv:2407.08083. [Google Scholar]
- Liu, Y.; Tian, Y.; Zhao, Y.; Yu, H.; Xie, L.; Wang, Y.; Ye, Q.; Jiao, J.; Liu, Y. Vmamba: Visual state space model. Adv. Neural Inf. Process. Syst. 2024, 37, 103031–103063. [Google Scholar]
- Gu, A.; Goel, K.; Ré, C. Efficiently modeling long sequences with structured state spaces. arXiv 2021, arXiv:2111.00396. [Google Scholar]
- Mehta, H.; Gupta, A.; Cutkosky, A.; Neyshabur, B. Long range language modeling via gated state spaces. arXiv 2022, arXiv:2206.13947. [Google Scholar] [CrossRef]
- Malik, H.S.; Shamshad, F.; Naseer, M.; Nandakumar, K.; Khan, F.S.; Khan, S. Towards Evaluating the Robustness of Visual State Space Models. In Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville, TN, USA, 10–17 June 2025; pp. 3544–3553. [Google Scholar]
- He, X.; Cao, K.; Zhang, J.; Yan, K.; Wang, Y.; Li, R.; Xie, C.; Hong, D.; Zhou, M. Pan-mamba: Effective pan-sharpening with state space model. Inf. Fusion 2025, 115, 102779. [Google Scholar] [CrossRef]
- Chen, K.; Chen, B.; Liu, C.; Li, W.; Zou, Z.; Shi, Z. Rsmamba: Remote sensing image classification with state space model. IEEE Geosci. Remote Sens. Lett. 2024, 21, 8002605. [Google Scholar] [CrossRef]
- Zhu, Q.; Cai, Y.; Fang, Y.; Yang, Y.; Chen, C.; Fan, L.; Nguyen, A. Samba: Semantic segmentation of remotely sensed images with state space model. Heliyon 2024, 10, e38495. [Google Scholar] [CrossRef]
- Chen, H.; Song, J.; Han, C.; Xia, J.; Yokoya, N. ChangeMamba: Remote sensing change detection with spatiotemporal state space model. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4409720. [Google Scholar] [CrossRef]
- Gu, A.; Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. arXiv 2023, arXiv:2312.00752. [Google Scholar] [CrossRef]
- Wu, X.; Cao, Z.H.; Huang, T.Z.; Deng, L.J.; Chanussot, J.; Vivone, G. Fully-Connected Transformer for Multi-Source Image Fusion. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 2071–2088. [Google Scholar] [CrossRef] [PubMed]
- Vs, V.; Jose Valanarasu, J.M.; Oza, P.; Patel, V.M. Image Fusion Transformer. In Proceedings of the 2022 IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 16–19 October 2022; pp. 3566–3570. [Google Scholar] [CrossRef]
- Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
- Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
- Sun, Y.; Cao, B.; Zhu, P.; Hu, Q. Drone-based RGB-infrared cross-modality vehicle detection via uncertainty-aware learning. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 6700–6713. [Google Scholar] [CrossRef]
- Razakarivony, S.; Jurie, F. Vehicle detection in aerial imagery: A small target detection benchmark. J. Vis. Commun. Image Represent. 2016, 34, 187–203. [Google Scholar] [CrossRef]
- Zhou, Y.; Yang, X.; Zhang, G.; Wang, J.; Liu, Y.; Hou, L.; Jiang, X.; Liu, X.; Yan, J.; Lyu, C.; et al. Mmrotate: A rotated object detection benchmark using pytorch. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 7331–7334. [Google Scholar]
- Chen, K.; Wang, J.; Pang, J.; Cao, Y.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Xu, J.; et al. MMDetection: Open mmlab detection toolbox and benchmark. arXiv 2019, arXiv:1906.07155. [Google Scholar] [CrossRef]
- Zhang, L.; Liu, Z.; Zhang, S.; Yang, X.; Qiao, H.; Huang, K.; Hussain, A. Cross-modality interactive attention network for multispectral pedestrian detection. Inf. Fusion 2019, 50, 20–29. [Google Scholar] [CrossRef]
- Zhang, L.; Liu, Z.; Zhu, X.; Song, Z.; Yang, X.; Lei, Z.; Qiao, H. Weakly aligned feature fusion for multimodal object detection. IEEE Trans. Neural Netw. Learn. Syst. 2021, 36, 4145–4159. [Google Scholar] [CrossRef] [PubMed]
- He, X.; Tang, C.; Zou, X.; Zhang, W. Multispectral object detection via cross-modal conflict-aware learning. In Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, ON, Canada, 29 October–3 November 2023; pp. 1465–1474. [Google Scholar]
- Zhang, J.; Cao, M.; Xie, W.; Lei, J.; Li, D.; Huang, W.; Li, Y.; Yang, X. E2e-mfd: Towards end-to-end synchronous multimodal fusion detection. Adv. Neural Inf. Process. Syst. 2024, 37, 52296–52322. [Google Scholar]
Experimental environment:

| Configuration Item | Parameter |
|---|---|
| CPU | Intel(R) Xeon(R) Platinum 8336C |
| GPU | NVIDIA RTX 4090 (24 GB) |
| Memory | 100 GB |
| Operating System | Ubuntu 22.04 |
| Deep Learning Framework | MMRotate and MMDetection (based on PyTorch 2.5.1) |
Ablation on strip-convolution orientations and kernel sizes on DroneVehicle; Exp I is the square-kernel baseline, and "–" marks a disabled orientation branch:

| Exp | 0° | 45° | 90° | −45° | mAP50 (%) |
|---|---|---|---|---|---|
| I | (3, 3) | – | – | – | 78.5 |
| II | (1, 9) | – | – | – | 78.3 |
| III | – | – | (1, 9) | – | 78.4 |
| IV | (1, 9) | – | (1, 9) | – | 79.3 |
| V | – | (1, 9) | – | (1, 9) | 78.1 |
| VI | (1, 9) | (1, 9) | (1, 9) | (1, 9) | 79.6 |
| VII | (1, 11) | (1, 11) | (1, 11) | (1, 11) | 79.4 |
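Assuming the OrientationAggregation sketch from Section 1, the two strongest settings above differ only in strip length, a one-argument change (mAP50 values in the comments are copied from the table):

```python
import torch

x = torch.randn(2, 64, 32, 32)                        # dummy feature map
oam_exp6 = OrientationAggregation(channels=64, k=9)   # Exp VI: 79.6 mAP50
oam_exp7 = OrientationAggregation(channels=64, k=11)  # Exp VII: 79.4 mAP50
assert oam_exp6(x).shape == oam_exp7(x).shape == x.shape
```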
Ablation on the proposed modules on DroneVehicle ("–" marks a removed component):

| Exp | OAM | DEFM: DSSM | DEFM: 1st Enh. | DEFM: 2nd Enh. | mAP50 (%) |
|---|---|---|---|---|---|
| I | – | ✔ | – | – | 79.3 |
| II | ✔ | ✔ | – | – | 79.6 |
| III | ✔ | ✔ | ✔ | – | 79.8 |
| IV | ✔ | ✔ | ✔ | ✔ | 80.2 |
Comparison with state-of-the-art methods on the DroneVehicle dataset (per-class AP and mAP50 in %):

| Modality | Method | Venue | Year | Basic Detector | Car | Truck | Freight Car | Bus | Van | mAP50 (%) | Params (M) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| RGB | Faster R-CNN [27] | NeurIPS | 2015 | – | 79.0 | 49.0 | 37.2 | 77.0 | 37.0 | 55.9 | 41.1 |
| | RetinaNet [28] | ICCV | 2017 | – | 78.5 | 34.4 | 24.1 | 69.8 | 28.8 | 47.1 | 36.4 |
| | RoI Trans [1] | CVPR | 2019 | – | 61.6 | 55.1 | 42.3 | 85.5 | 44.8 | 61.6 | 55.1 |
| | S2ANet [4] | TGRS | 2021 | – | 80.0 | 54.2 | 42.2 | 84.9 | 43.8 | 61.0 | 38.6 |
| IR | Faster R-CNN [27] | NeurIPS | 2015 | – | 89.4 | 53.5 | 48.3 | 87.0 | 42.6 | 64.2 | 41.1 |
| | RetinaNet [28] | ICCV | 2017 | – | 88.8 | 35.4 | 39.5 | 76.5 | 32.1 | 54.5 | 36.4 |
| | RoI Trans [1] | CVPR | 2019 | – | 89.6 | 51.0 | 53.4 | 88.9 | 44.5 | 65.5 | 55.1 |
| | S2ANet [4] | TGRS | 2021 | – | 89.9 | 54.5 | 55.8 | 88.9 | 48.4 | 67.5 | 38.6 |
| RGB+IR | CIAN [63] | Inf. Fusion | 2019 | – | 89.98 | 62.47 | 60.22 | 88.90 | 49.59 | 70.23 | – |
| | AR-CNN [64] | TNNLS | 2021 | Faster R-CNN | 90.1 | 64.8 | 62.1 | 89.4 | 51.5 | 71.6 | – |
| | UA-CMDet [59] | TCSVT | 2022 | RoI Trans | 87.5 | 60.7 | 46.8 | 87.1 | 38.0 | 64.0 | – |
| | TSFADet [20] | ECCV | 2022 | – | 89.9 | 67.9 | 63.7 | 89.8 | 54.0 | 73.1 | 104.7 |
| | CALNet [65] | ACM MM | 2023 | – | 90.3 | 76.2 | 63.0 | 89.1 | 58.5 | 75.4 | – |
| | C2Former [25] | TGRS | 2024 | S2ANet | 90.2 | 68.3 | 64.4 | 89.8 | 58.5 | 74.2 | 100.8 |
| | E2E-MFD [66] | NeurIPS | 2024 | – | 90.3 | 79.3 | 64.6 | 89.8 | 63.1 | 77.4 | – |
| | DMM [26] | TGRS | 2025 | S2ANet | 90.5 | 77.7 | 73.2 | 90.0 | 65.1 | 79.3 | 88.0 |
| | FGMF (Ours) | – | 2025 | S2ANet | 90.5 | 78.5 | 74.7 | 90.3 | 67.1 | 80.2 | 87.2 |
Comparison with state-of-the-art methods on the VEDAI dataset (per-class AP and mAP50 in %):

| Modality | Method | Car | Truck | Tractor | Camping Car | Van | Pick-Up | Boat | Plane | Others | mAP50 (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| RGB | RetinaNet [28] | 48.9 | 16.8 | 15.9 | 21.4 | 5.9 | 37.5 | 4.4 | 21.2 | 14.1 | 20.7 |
| | S2ANet [4] | 74.5 | 47.3 | 55.6 | 61.7 | 32.5 | 65.1 | 16.7 | 7.1 | 39.8 | 44.5 |
| | Faster R-CNN [27] | 71.4 | 54.2 | 61.0 | 70.5 | 59.5 | 67.6 | 52.3 | 77.1 | 40.1 | 61.5 |
| | RoI Trans [1] | 77.3 | 56.1 | 64.7 | 73.6 | 60.2 | 71.5 | 56.7 | 85.7 | 42.8 | 65.4 |
| IR | RetinaNet [28] | 44.2 | 15.3 | 9.4 | 17.1 | 7.2 | 32.1 | 4.0 | 33.4 | 5.7 | 18.7 |
| | S2ANet [4] | 73.0 | 39.2 | 41.9 | 59.2 | 32.3 | 65.6 | 13.9 | 12.0 | 23.1 | 40.0 |
| | Faster R-CNN [27] | 71.6 | 49.1 | 49.2 | 68.1 | 57.0 | 66.5 | 35.6 | 71.6 | 29.5 | 55.4 |
| | RoI Trans [1] | 76.1 | 51.7 | 51.9 | 71.2 | 64.3 | 70.7 | 46.9 | 83.3 | 28.3 | 60.5 |
| RGB+IR | C2Former + S2ANet [25] | 76.7 | 52.0 | 59.8 | 63.2 | 48.0 | 68.7 | 43.3 | 47.0 | 41.9 | 55.6 |
| | DMM + S2ANet [26] | 77.9 | 59.3 | 68.1 | 70.8 | 57.4 | 75.8 | 61.2 | 77.5 | 43.5 | 65.7 |
| | FGMF + S2ANet (Ours) | 78.2 | 57.6 | 66.8 | 69.7 | 57.9 | 74.1 | 57.6 | 87.4 | 47.1 | 66.3 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Lan, X.; Zhang, S.; Bai, Y.; Qin, X. Fine-Grained Multispectral Fusion for Oriented Object Detection in Remote Sensing. Remote Sens. 2025, 17, 3769. https://doi.org/10.3390/rs17223769