MIFMNet: A Multimodal Interactions and Fusion Mamba for RGBT Tracking with UAV Platforms
Highlights
- A novel multimodal interaction and fusion Mamba network (MIFMNet) is proposed for UAV RGBT tracking. Its two core modules, scale differential enhanced Mamba (SDEM) and flow-guided multilayer interaction Mamba (FMIM), address the trade-off between interaction capability and computational efficiency in existing CNN/Transformer frameworks.
- MIFMNet achieves state-of-the-art performance on four mainstream RGBT benchmarks (LasHeR, RGBT210, RGBT234, VTUAV), with an inference speed of 35.3 FPS and superior robustness in UAV-specific challenges (scale variation, rapid motion, occlusion).
- The scale differential enhancement and flow-guided motion-aware interaction mechanisms of MIFMNet provide an efficient solution for multimodal fusion in dynamic remote sensing observation scenarios with resource constraints.
- Extending Mamba to RGBT tracking verifies its potential for linear-complexity long-range modeling in multimodal vision tasks, offering a new architectural alternative to CNNs and Transformers for UAV computer vision applications.
Abstract
1. Introduction
- We propose a novel multimodal interaction and fusion network for RGBT tracking with UAV platforms. Through multi-level Mamba, this network not only achieves efficient multimodal interaction but also specifically addresses the issues of weak response in multiscale targets and insufficient adaptability to UAV scenarios.
- We design a scale differential enhancement Mamba to model modal differences with linear computational complexity and expand the receptive field through parallel convolutions, enabling cross-modal enhancement and fusion while efficiently adapting to multiscale targets.
- We introduce flow-guided multilayer interaction Mamba, which integrates optical flow-derived motion information into the scanning order prediction, allowing the network to dynamically prioritize shallow-texture or deep-semantic features based on motion intensity. This mitigates information forgetting and enhances robustness in UAV dynamic scenarios.
- Extensive experiments on four RGBT benchmarks show that MIFMNet achieves SOTA results while maintaining a manageable computational load, with exceptional performance in UAV scenarios such as scale changes and rapid motion.
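The flow-guided ordering described in the third contribution can be illustrated with a toy rule. The sketch below is not the learned scanning-order predictor of FMIM: it assumes a hard threshold on the mean optical-flow magnitude, and it assumes that the features to be prioritized are scanned last, since a recurrent scan retains the most recent tokens best. The names `motion_intensity`, `scan_order`, and `threshold`, as well as the choice of which depth is favored under strong motion, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Flow = Sequence[Tuple[float, float]]  # per-pixel (dx, dy) displacements


def motion_intensity(flow: Flow) -> float:
    """Mean optical-flow magnitude over all pixels (0.0 for an empty field)."""
    if not flow:
        return 0.0
    return sum(math.hypot(dx, dy) for dx, dy in flow) / len(flow)


def scan_order(num_layers: int, flow: Flow, threshold: float = 1.0) -> List[int]:
    """Return layer indices (0 = shallowest) in the order they are scanned.

    Assumed rule, for illustration only: the last-scanned layer is the one
    the recurrent state remembers best. Under strong motion, deep (semantic)
    layers go last; otherwise shallow (texture) layers go last.
    """
    shallow_to_deep = list(range(num_layers))
    if motion_intensity(flow) > threshold:
        return shallow_to_deep        # deep features scanned last
    return shallow_to_deep[::-1]      # shallow features scanned last


if __name__ == "__main__":
    static_scene = [(0.0, 0.0)] * 16
    fast_scene = [(3.0, 4.0)] * 16    # magnitude 5 at every pixel
    print(scan_order(3, static_scene))
    print(scan_order(3, fast_scene))
```

In a learned variant, the threshold would presumably be replaced by a small network that maps flow statistics to a soft ordering; the hard switch here only shows how a motion cue can change the sequence fed to the scan.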
2. Related Works
2.1. RGBT Tracking
2.2. Visual State Space Models
3. Methodology
3.1. Preliminaries
3.2. Overall Framework
3.3. SDEM
- It fails to exploit modality-specific information.
- Its quadratic computational overhead impedes dense multimodal feature interaction across multiple layers.
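The second limitation can be made concrete with a rough operation count. Assuming the mechanism being criticized is attention-style token mixing, the sketch below compares its cost, which grows with the square of the token count, with that of a linear state-space scan. The function names, the feature width of 768, the state size of 16, and the token counts are illustrative assumptions rather than values from the paper, and projections and constant factors are ignored.

```python
def attention_mixing_macs(num_tokens: int, dim: int) -> int:
    """Multiply-accumulates for QK^T and the attention-weighted sum of V."""
    return 2 * num_tokens * num_tokens * dim


def ssm_scan_macs(num_tokens: int, dim: int, state_size: int) -> int:
    """Multiply-accumulates for a per-channel state update and readout per token."""
    return 2 * num_tokens * dim * state_size


if __name__ == "__main__":
    dim, state_size = 768, 16            # assumed widths, for illustration
    for num_tokens in (256, 512, 1024):  # e.g. more layers or modalities joined
        attn = attention_mixing_macs(num_tokens, dim)
        scan = ssm_scan_macs(num_tokens, dim, state_size)
        print(f"{num_tokens:5d} tokens: attention {attn:,} MACs, scan {scan:,} MACs")
```

Doubling the token count, for instance by concatenating features from two modalities or from several layers, quadruples the attention term but only doubles the scan term.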
3.4. FMIM
4. Experiment
4.1. Experimental Setup
4.2. Quantitative Comparison
4.2.1. Evaluation on RGBT210
4.2.2. Evaluation on RGBT234
4.2.3. Evaluation on VTUAV
4.2.4. Evaluation on LasHeR
4.3. Qualitative Comparison
4.4. Ablation Studies
4.4.1. Component Analysis
4.4.2. Impact of Input Resolution
4.4.3. Impact of Network Depth
4.4.4. Efficiency Analysis
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Sun, X.; Sun, H.; Liu, B.; Jiang, S.; Wang, J.; Li, D. Target-aware bidirectional fusion transformer for aerial object tracking. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 29071–29082.
2. He, B.; Zhao, X.; Chen, Y.; Liu, C.; Pang, X. Application of feature tracking using k-nearest-neighbor vector field consensus in sea ice tracking. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 4326–4336.
3. Gao, W.; Niu, W.; Lu, W.; Wang, P.; Qi, Z.; Peng, X.; Yang, Z. Dim small target detection and tracking: A novel method based on temporal energy selective scaling and trajectory association. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 17239–17262.
4. Hui, T.; Xun, Z.; Peng, F.; Huang, J.; Wei, X.; Wei, X.; Dai, J.; Han, J.; Liu, S. Bridging search region interaction with template for RGB-T tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 13630–13639.
5. Wang, W.; Li, C.; Zhang, D.; Zhou, H.; Xie, M.; Zhou, H.; Fu, K. FcFNet: A challenge-based feature complementary fusion network for RGB-T tracking. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 2239–2251.
6. Zhang, H.; Yuan, D.; Shu, X.; Li, Z.; Liu, Q.; Chang, X.; He, Z.; Shi, G. A comprehensive review of RGB-T tracking. IEEE Trans. Instrum. Meas. 2024, 73, 5027223.
7. Zhang, T.; Liu, X.; Zhang, Q.; Han, J. SiamCDA: Complementarity- and distractor-aware RGB-T tracking based on siamese network. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 1403–1417.
8. Cheng, Z.; Fan, H.; Tang, Y.; Wang, Q. RGB-T object tracking network based on multi-scale modality fusion. J. Shandong Univ. Sci. Technol. 2024, 43, 89–99.
9. Sun, D.; Pan, Y.; Lu, A.; Li, C.; Luo, B. Transformer RGB-T tracking with spatio-temporal multimodal tokens. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 12059–12072.
10. Fan, H.; Yu, Z.; Wang, Q.; Fan, B.; Tang, Y. QueryTrack: Joint-modality query fusion network for RGB-T tracking. IEEE Trans. Image Process. 2024, 33, 3187–3199.
11. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 26551–26561.
12. Li, M.; Zhang, P.; Yan, M.; Chen, H.; Wu, C. Dynamic feature-memory transformer network for RGB-T tracking. IEEE Sens. 2023, 23, 19692–19703.
13. Wu, Y.; Guan, X.; Zhao, B.; Huang, M. Vehicle detection based on adaptive multi-modal feature fusion and cross-modal vehicle index using RGB-T images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 8166–8177.
14. Guo, R.; Guo, X.; Sun, X.; Zhou, P.; Sun, B.; Su, S. Background-aware cross-attention multiscale fusion for multispectral object detection. Remote Sens. 2024, 16, 4034.
15. Xu, C.; Gao, L.; Liu, Y.; Zhang, Q.; Su, N.; Zhang, S.; Li, T.; Zheng, X. CMShipReID: A cross-modality ship dataset for the reidentification task. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 10503–10513.
16. Cawse-Nicholson, K.; Hook, S.J.; Miller, C.E.; Thompson, D.R. Intrinsic dimensionality in combined visible to thermal infrared imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 12, 4977–4984.
17. Zhang, L.; Danelljan, M.; Gonzalez-Garcia, A.; van de Weijer, J.; Khan, F.S. Multi-modal fusion for end-to-end RGB-T tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019; pp. 2252–2261.
18. Wu, B.; Zhang, R.; Liu, Y. Research on RGB-T multimodal interaction tracking algorithm with improved ViT. Comput. Eng. Appl. 2025, 61, 267–277.
19. Li, C.; Liu, L.; Lu, A.; Ji, Q.; Tang, J. Challenge-aware RGB-T tracking. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 222–237.
20. Xiao, Y.; Yang, M.; Li, C.; Liu, L.; Tang, J. Attribute-based progressive fusion network for RGB-T tracking. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 22 February–1 March 2022; pp. 2831–2838.
21. Zhang, P.; Zhao, J.; Bo, C.; Wang, D.; Lu, H.; Yang, X. Jointly modeling motion and appearance cues for robust RGB-T tracking. IEEE Trans. Image Process. 2021, 30, 3335–3347.
22. Zhang, Z.; Liu, A.; Reid, I.; Hartley, R.; Zhuang, B.; Tang, H. Motion Mamba: Efficient and long sequence motion generation. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024.
23. Zhu, L.; Liao, B.; Zhang, Q.; Wang, X.; Liu, W.; Wang, X. Vision Mamba: Efficient visual representation learning with bidirectional state space model. In Proceedings of the 41st International Conference on Machine Learning (ICML 2024), Vienna, Austria, 23–25 July 2024; pp. 62429–62442.
24. Kang, B.; Chen, X.; Lai, S.; Liu, Y.; Liu, Y.; Wang, D. Exploring enhanced contextual information for video-level object tracking. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025.
25. Wu, Y.; Yang, X.; Wang, X.; Ye, H.; Zeng, D.; Li, S. MambaNUT: Nighttime UAV tracking via Mamba-based adaptive curriculum learning. In Proceedings of the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hangzhou, China, 19–25 October 2025.
26. Yao, M.; Peng, J.; He, Q.; Peng, B.; Chen, H.; Chi, M.; Liu, C.; Benediktsson, J.A. MM-Tracker: Motion Mamba for UAV-platform multiple object tracking. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025.
27. Ye, B.; Chang, H.; Ma, B.; Shan, S.; Chen, X. Joint feature learning and relation modeling for tracking: A one-stream framework. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 341–357.
28. Hou, J.; Chen, X.; Wu, C.; Zhou, M.; Li, J.; Hong, D. Bilateral adaptive evolution transformer for multispectral image fusion. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5400612.
29. Roy, S.K.; Deria, A.; Hong, D.; Rasti, B.; Plaza, A.; Chanussot, J. Multimodal fusion transformer for remote sensing image classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5515620.
30. Wu, Q.; Yang, T.; Liu, Z.; Wu, B.; Shan, Y.; Chan, A.B. DropMAE: Masked autoencoders with spatial-attention dropout for tracking tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 14561–14571.
31. Li, C.; Xue, W.; Jia, Y.; Qu, Z.; Luo, B.; Tang, J.; Sun, D. LasHeR: A large-scale high-diversity benchmark for RGB-T tracking. IEEE Trans. Image Process. 2022, 31, 392–404.
32. Li, C.; Zhao, N.; Lu, Y.; Zhu, C.; Tang, J. Weighted sparse representation regularized graph learning for RGB-T object tracking. In Proceedings of the 25th ACM International Conference on Multimedia, Mountain View, CA, USA, 23–27 October 2017; pp. 1856–1864.
33. Li, C.; Liang, X.; Lu, Y.; Zhao, N.; Tang, J. RGB-T object tracking: Benchmark and baseline. Pattern Recognit. 2019, 96, 106977.
34. Zhang, P.; Zhao, J.; Wang, D.; Lu, H.; Ruan, X. Visible-thermal UAV tracking: A large-scale benchmark and new baseline. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 21–24 June 2022; pp. 8886–8895.
35. Zhang, P.; Wang, D.; Liu, H.; Yang, X. Learning adaptive attribute-driven representation for real-time RGB-T tracking. Int. J. Comput. Vis. 2021, 129, 2714–2729.
36. Lu, A.; Qian, C.; Li, C.; Tang, J.; Wang, L. Duality-gated mutual condition network for RGB-T tracking. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 4118–4131.
37. Zhang, T.; Guo, H.; Jiao, Q.; Zhang, Q.; Han, J. Efficient RGB-T tracking via cross-modality distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 5404–5413.
38. Zhu, Y.; Li, C.L.; Zhao, N.; Tang, J.; Lu, H. Quality-aware feature aggregation network for robust RGB-T tracking. IEEE Trans. Intell. Veh. 2021, 6, 121–130.
39. Liu, L.; Li, C.L.; Xiao, Y.; Tang, J. Quality-aware RGB-T tracking via supervised reliability learning and weighted residual guidance. In Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, ON, Canada, 29 October–3 November 2023; pp. 3129–3137.
40. Liu, L.; Li, C.L.; Xiao, Y.; Ruan, R.; Fan, M. RGBT tracking via challenge-based appearance disentanglement and interaction. IEEE Trans. Image Process. 2024, 33, 1753–1767.
41. Mei, J.; Zhou, D.; Cao, J.; Nie, R.; He, K. Differential reinforcement and global collaboration network for RGB-T tracking. IEEE Sens. J. 2023, 23, 7301–7311.
42. Qin, Y.; Zhang, J.; Fan, S.; Liu, Z.; Wang, J. MCIT: Multi-level cross-modal interactive transformer for RGB-T tracking. Neurocomputing 2025, 649, 130758.
43. Cao, B.; Guo, J.; Zhu, P.; Hu, Q. Bi-directional adapter for multimodal tracking. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; pp. 927–935.
44. Tang, Z.; Xu, T.; Wu, X.; Zhu, X.-F.; Kittler, J. Generative-based fusion mechanism for multimodal tracking. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; pp. 5189–5197.
45. Hu, X.; Zhong, B.; Liang, Q.; Zhang, S.; Li, N.; Li, X. Toward modalities correlation for RGB-T tracking. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 9102–9111.
46. Zhang, J.; Qin, Y.; Fan, S.; Xiao, Z.; Zhang, J. SiamTFA: Siamese triple-stream feature aggregation network for efficient RGB-T tracking. IEEE Trans. Intell. Transp. Syst. 2024, 26, 1900–1913.
47. Lu, A.; Wang, W.; Li, C.L.; Tang, J.; Luo, B. AFTER: Attention-based fusion router for RGB-T tracking. IEEE Trans. Image Process. 2025, 34, 4386–4401.
48. Hu, Y.; Shao, Z.; Fan, B.; Liu, H. Dual-level modality de-biasing for RGB-T tracking. IEEE Trans. Image Process. 2025, 34, 2667–2679.
49. Zhu, J.; Lai, S.; Chen, X.; Wang, D.; Lu, H. Visual prompt multi-modal tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 9516–9526.

| Methods | Backbone | LasHeR SR | LasHeR NPR | LasHeR PR | RGBT210 SR | RGBT210 PR | RGBT234 SR | RGBT234 PR | VTUAV SR | VTUAV PR | FPS |
|---|---|---|---|---|---|---|---|---|---|---|---|
| mfDiMP [17] | ResNet50 | 34.3 | 39.5 | 44.7 | 55.5 | 78.6 | 42.8 | 64.6 | 55.4 | 67.3 | 10.3 |
| ADRNet [35] | VGG-M | 30.8 | 39.5 | 44.4 | 53.4 | 77.8 | 57.1 | 80.9 | 46.6 | 62.2 | 25.0 |
| CAT [19] | VGG-M | 31.4 | 39.5 | 45.0 | 53.3 | 79.2 | 56.1 | 80.4 | - | - | 20 |
| HMFT [34] | ResNet50 | 32.6 | 41.3 | 46.0 | 53.5 | 78.6 | 56.8 | 78.8 | 62.7 | 75.8 | 30.2 |
| DMCNet [36] | VGG-M | 35.5 | 43.1 | 49.0 | 55.5 | 79.7 | 59.3 | 83.9 | - | - | 2.3 |
| CMD [37] | ResNet50 | 46.4 | 54.6 | 59.0 | - | - | 58.4 | 82.4 | - | - | 30 |
| FANet [38] | VGG-M | 30.9 | 38.4 | 44.1 | - | - | 55.3 | 78.7 | - | - | 19 |
| QAT [39] | ResNet50 | 50.1 | 59.6 | 64.2 | 61.9 | 86.7 | 64.4 | 88.4 | 66.7 | 80.1 | 22 |
| CAT++ [40] | VGG-M | 35.6 | 44.4 | 50.9 | 56.1 | 82.2 | 59.2 | 84.0 | - | - | - |
| DRGCNet [41] | VGG-M | 33.8 | 42.3 | 48.3 | - | - | 58.1 | 82.5 | - | - | 4.9 |
| APFNet [20] | VGG-M | 36.2 | 43.9 | 50.0 | - | - | 57.9 | 82.7 | - | - | 1.3 |
| MCIT [42] | ViT-B | 50.5 | - | 64.5 | 56.5 | 80.8 | 59.5 | 83.1 | 70.6 | 85.3 | - |
| TBSI [4] | ViT-B | 55.6 | 65.7 | 69.2 | 62.5 | 85.3 | 63.7 | 87.1 | - | - | 36.2 |
| BAT [43] | ViT-B | 56.3 | - | 70.2 | - | - | 64.1 | 86.8 | - | - | - |
| GMMT [44] | ViT-B | 56.6 | 67.0 | 70.7 | - | - | 64.7 | 87.9 | - | - | - |
| MCTrack [45] | ViT-B | 57.1 | 67.6 | 71.6 | - | - | 65.6 | 87.5 | - | - | - |
| SiamTFA [46] | ViT-B | 48.1 | - | 62.5 | 56.3 | 79.7 | 59.2 | 82.2 | 67.9 | 82.1 | 37.0 |
| AFTER [47] | ViT-B | 55.1 | 65.8 | 70.3 | 63.5 | 87.6 | 66.7 | 90.1 | 72.5 | 84.9 | 20.9 |
| DMD [48] | ViT-B | 57.6 | 68.6 | 72.6 | 63.7 | 87.0 | 66.7 | 89.3 | - | - | 17 |
| STMT [9] | ViT-B | 53.7 | 63.4 | 67.4 | 59.5 | 83.0 | 63.4 | 67.4 | - | - | 39.1 |
| Ours | ViT-B | 57.8 | 69.1 | 73.2 | 64.3 | 86.7 | 66.9 | 88.7 | 74.8 | 86.5 | 35.3 |
| Attribute (SR/PR) | APFNet | mfDiMP | ViPT | TBSI | Ours |
|---|---|---|---|---|---|
| NO | 46.7/66.7 | 57.5/76.5 | 68.4/84.1 | 74.1/91.4 | 72.4/88.4 |
| PO | 34.5/47.3 | 30.8/39.7 | 50.3/62.4 | 54.0/67.8 | 55.6/70.7 |
| TO | 31.4/41.7 | 25.0/32.2 | 46.1/57.6 | 51.0/64.3 | 51.9/66.0 |
| HO | 27.7/27.1 | 19.8/23.8 | 43.8/43.7 | 53.4/60.6 | 55.1/63.5 |
| MB | 32.8/45.9 | 28.7/37.6 | 45.9/57.3 | 49.5/63.1 | 51.9/67.0 |
| LI | 30.8/41.8 | 23.8/29.6 | 41.2/49.8 | 49.3/61.3 | 51.5/64.3 |
| HI | 41.2/60.4 | 35.1/46.7 | 54.2/67.8 | 58.2/73.8 | 62.2/79.5 |
| AIV | 26.2/32.1 | 16.6/16.4 | 34.2/36.3 | 49.8/58.2 | 50.2/59.1 |
| LR | 29.4/46.1 | 25.6/40.2 | 41.6/56.4 | 47.3/63.9 | 46.0/63.1 |
| DER | 36.8/45.8 | 34.2/40.3 | 55.7/67.4 | 58.7/71.6 | 61.0/75.1 |
| BC | 33.7/44.9 | 27.0/34.9 | 51.8/64.9 | 55.7/69.9 | 56.3/70.3 |
| SA | 31.7/42.8 | 29.5/37.2 | 46.5/57.3 | 50.2/62.2 | 50.7/64.2 |
| CM | 47.7/35.1 | 30.6/40.8 | 50.0/62.1 | 55.0/69.5 | 55.7/70.5 |
| TC | 31.6/43.1 | 28.8/38.0 | 46.0/57.3 | 50.1/62.6 | 51.6/65.3 |
| FL | 27.9/37.6 | 25.7/32.3 | 46.5/59.1 | 47.5/60.9 | 52.1/66.3 |
| OV | 34.2/36.4 | 34.9/40.6 | 65.0/76.2 | 55.9/64.6 | 64.0/74.2 |
| FM | 33.9/45.1 | 32.4/41.3 | 51.4/63.1 | 55.7/69.4 | 57.0/71.4 |
| SV | 36.0/49.8 | 34.9/45.2 | 52.5/65.0 | 56.2/70.2 | 58.2/73.4 |
| ARC | 31.0/40.5 | 30.9/37.8 | 49.5/59.3 | 52.5/64.3 | 55.3/68.1 |
| ALL | 36.2/50.0 | 34.3/44.7 | 52.5/65.1 | 56.3/70.5 | 57.8/73.2 |
| A | B | Resolution | SR | NPR | PR |
|---|---|---|---|---|---|
| × | × | 256 × 256 | 47.8 | 55.8 | 60.4 |
| √ | × | 256 × 256 | 55.3 | 63.6 | 66.9 |
| × | √ | 256 × 256 | 54.6 | 63.1 | 66.7 |
| × | √* | 256 × 256 | 54.1 | 62.3 | 65.2 |
| √ | √ | 256 × 256 | 57.8 | 69.1 | 73.2 |
| √ | √ | 384 × 384 | 58.3 | 69.5 | 73.8 |
| Layers | SR | NPR | PR |
|---|---|---|---|
| 1 | 55.3 | 66.7 | 75.3 |
| 3 | 56.4 | 67.3 | 75.9 |
| 6 | 57.2 | 68.2 | 77.5 |
| 12 | 57.8 | 69.1 | 78.2 |
| Methods | SR | NPR | Parameters | FLOPs | FPS |
|---|---|---|---|---|---|
| GMMT | 56.6 | 70.7 | 962.2 M | 146.5 G | 22.4 |
| TBSI | 55.6 | 69.2 | 99.3 M | 38.5 G | 36.2 |
| DMD | 57.6 | 68.2 | 121.6 M | 102.2 G | 16.8 |
| MIFMNet | 57.8 | 69.1 | 17.2 M | 5.6 G | 35.3 |
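The efficiency gap in the table above can be expressed as ratios. The short script below divides each competitor's parameter count and FLOPs by those of MIFMNet, using the values exactly as printed in the table. The variable and function names are hypothetical, and the table does not state whether backbone parameters are included in the counts.

```python
# (parameters in millions, FLOPs in giga, FPS), copied from the table above
EFFICIENCY = {
    "GMMT": (962.2, 146.5, 22.4),
    "TBSI": (99.3, 38.5, 36.2),
    "DMD": (121.6, 102.2, 16.8),
    "MIFMNet": (17.2, 5.6, 35.3),
}


def ratios_to(reference: str, table: dict) -> dict:
    """Return (parameter ratio, FLOPs ratio) of every other method to `reference`."""
    ref_params, ref_flops, _ = table[reference]
    return {
        name: (params / ref_params, flops / ref_flops)
        for name, (params, flops, _) in table.items()
        if name != reference
    }


if __name__ == "__main__":
    for name, (param_ratio, flops_ratio) in ratios_to("MIFMNet", EFFICIENCY).items():
        print(f"{name}: {param_ratio:.1f}x parameters, {flops_ratio:.1f}x FLOPs")
```

Reading the ratios next to the FPS column shows that, on these figures, TBSI runs at a similar speed to MIFMNet while using several times the parameters and FLOPs.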
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Guo, R.; Sun, X.; Sun, B.; Qian, H.; Dang, Z.; Zhou, P.; Liu, F.; Su, S. MIFMNet: A Multimodal Interactions and Fusion Mamba for RGBT Tracking with UAV Platforms. Remote Sens. 2026, 18, 1026. https://doi.org/10.3390/rs18071026