LF-SSM: Lightweight HiPPO-Free State Space Model for Real-Time UAV Tracking
Highlights
- We propose LF-SSM, a lightweight HiPPO-free state space model that reformulates state evolution on Riemannian manifolds through geodesic dynamics, eliminating complex discretization procedures and specialized hardware kernels required by existing SSM methods.
- The proposed Geodesic State Module (GSM) performs state updates through tangent space projection and exponential mapping on the unit sphere, providing adaptive local coordinate systems that preserve the geometric structure of tracking features.
- LF-SSM achieves state-of-the-art performance on multiple UAV tracking benchmarks, and its smallest variant (LF-SSM-S) runs at 69 FPS with only 18.5M parameters on an edge platform (Jetson Orin Nano), enabling practical real-time deployment on resource-constrained UAV systems.
- The manifold-based formulation demonstrates that geometric approaches can effectively replace HiPPO-derived state transitions in visual tracking, opening new research directions for efficient sequence modeling without relying on fixed polynomial bases.
Abstract
1. Introduction
- We propose LF-SSM, a lightweight HiPPO-free state space model that fundamentally reformulates state evolution on Riemannian manifolds for visual tracking. Unlike existing SSM methods that project features onto fixed Legendre polynomial bases, our geodesic dynamics naturally preserve the geometric structure of tracking features through adaptive local coordinate systems. This eliminates both the computational overhead of HiPPO discretization and the information loss from fixed basis compression, while maintaining linear complexity with respect to sequence length.
- We design a Geodesic State Module (GSM) that performs state updates through tangent space projection and exponential mapping on the unit sphere. The core innovation lies in replacing the HiPPO-derived state transition matrix with geometric operations involving only vector computations. This design eliminates the need for specialized hardware kernels (such as Mamba’s selective scan) while providing adaptive tangent spaces that change with the evolving state. The input-dependent step size enables content-aware updates, and the prior velocity mechanism implements geometric forgetting through parallel transport approximation.
- Extensive experiments on four benchmarks (UAV123, VisDrone, ARDMAV, LaSOT) show that LF-SSM achieves state-of-the-art performance on the three UAV benchmarks and competitive results on LaSOT, while enabling practical real-time deployment on resource-constrained platforms. Specifically, LF-SSM-L achieves 73.2% AUC on UAV123, at least 3.7 points above every SSM-based baseline in our comparisons.
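The two sphere operations named in the contributions above have standard closed forms. The block below is a reference sketch in generic notation: the symbols h (current unit-norm state), v (an arbitrary update vector), and u (a tangent vector) are assumptions made here for illustration and are not necessarily the notation used in the paper.

```latex
% Tangent-space projection at a unit vector h: remove the radial component of v
\Pi_{h}(v) = v - \langle h, v \rangle \, h, \qquad \|h\| = 1

% Exponential map on the unit sphere: move from h along the tangent vector u
\exp_{h}(u) = \cos(\|u\|)\, h + \sin(\|u\|)\, \frac{u}{\|u\|},
\qquad \langle h, u \rangle = 0,\ u \neq 0
```

Because h and u are orthogonal and h has unit norm, the result of the exponential map again has unit norm, so the state stays on the sphere without a separate normalization step.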
2. Related Work
2.1. UAV Visual Tracking
2.2. State Space Models
3. Preliminary
3.1. State Space Model
3.2. Riemannian Geometry
4. Methodology
4.1. Overall Architecture
4.2. Geodesic State Evolution
Algorithm 1 Geodesic state evolution
- Require: input sequence, initial state, stability constant, confidence weight
- Ensure: output sequence
- Initialize prior velocity as zero vector
- For each time step:
  - Project input to state space
  - Combine input with prior momentum
  - Project to tangent space
  - Compute input-dependent step size
  - Geodesic update
  - Transport prior to new tangent space
  - Output projection
- Return the output sequence
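The step comments of Algorithm 1 can be turned into a small runnable sketch. The code below is only one plausible reading, not the authors' implementation: the sigmoid step-size gate, the weights `W_in`, `W_out`, and `w_step`, the default values of `eps` (stability constant) and `lam` (confidence weight), and the use of re-projection as the parallel transport approximation are all assumptions introduced for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, x):
    return [dot(row, x) for row in W]

def project_to_tangent(h, v):
    """Remove the component of v along h, leaving a vector tangent to the sphere at h."""
    c = dot(h, v)
    return [vi - c * hi for vi, hi in zip(v, h)]

def geodesic_state_evolution(xs, h0, W_in, W_out, w_step, eps=1e-6, lam=0.5):
    """Hypothetical geodesic recurrence on the unit sphere (all functional forms assumed)."""
    n0 = math.sqrt(dot(h0, h0))
    h = [hi / n0 for hi in h0]                 # place the initial state on the sphere
    prior = [0.0] * len(h)                     # initialize prior velocity as zero vector
    ys = []
    for x in xs:
        v = matvec(W_in, x)                    # project input to state space
        v = [vi + lam * pi for vi, pi in zip(v, prior)]  # combine with prior momentum
        u = project_to_tangent(h, v)           # project to tangent space at h
        eta = 1.0 / (1.0 + math.exp(-dot(w_step, x)))    # input-dependent step size (assumed sigmoid)
        norm_u = math.sqrt(dot(u, u))
        theta = eta * norm_u                   # geodesic distance travelled this step
        scale = math.sin(theta) / (norm_u + eps)
        h_new = [math.cos(theta) * hi + scale * ui for hi, ui in zip(h, u)]  # exponential map
        n = math.sqrt(dot(h_new, h_new))
        h_new = [hi / n for hi in h_new]       # renormalize to remove the small eps-induced drift
        step = [eta * ui for ui in u]
        prior = project_to_tangent(h_new, step)  # transport approximated by re-projection
        h = h_new
        ys.append(matvec(W_out, h))            # output projection
    return ys, h

# Toy usage: 2-D inputs, 3-D state, 1-D output.
xs = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]
W_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_out = [[1.0, 0.0, 0.0]]
ys, h_final = geodesic_state_evolution(xs, [1.0, 0.0, 0.0], W_in, W_out, [0.1, 0.2])
```

Every operation in the loop is a dot product, a scaling, or a scalar trigonometric call, which is consistent with the claim that the update involves only vector computations.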
4.3. Geodesic State Module
4.4. GSM Block
4.5. Training and Inference
4.5.1. Training
4.5.2. Inference
Algorithm 2 LF-SSM Training
- Require: training set, learning rate, epochs E, loss weights
- Ensure: trained model parameters
- Initialize model parameters
- For each epoch:
  - For each batch in the training set
- Return the trained model parameters
Algorithm 3 LF-SSM Inference
- Require: template image, search images
- Ensure: bounding boxes
- Encode once and cache
- For each search image
- Return the bounding boxes
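The "encode once and cache" step in Algorithm 3 can be illustrated with a short sketch. It assumes that the cached item is the template encoding and that each search image is then processed against it. The names `track_sequence`, `encode_template`, and `predict_box` are hypothetical stand-ins, not functions from the paper or from any library.

```python
def track_sequence(template, search_frames, encode_template, predict_box):
    """Encode the template a single time, then reuse the cached features per frame."""
    cached_template = encode_template(template)   # encode once and cache
    boxes = []
    for frame in search_frames:
        # Only the search frame is processed per step; the template is not re-encoded.
        boxes.append(predict_box(cached_template, frame))
    return boxes

# Toy usage with stand-in callables that count how often the encoder runs.
calls = {"encode": 0}

def toy_encode(template):
    calls["encode"] += 1
    return sum(template) / len(template)

def toy_predict(cached, frame):
    return (cached, cached, frame, frame)          # dummy (x, y, w, h) tuple

boxes = track_sequence([1.0, 3.0], [10, 20, 30], toy_encode, toy_predict)
```

With this structure the per-frame cost excludes the template encoder, which matters most on edge hardware where the encoder is a fixed overhead.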
5. Efficiency Analysis
5.1. Time Complexity
5.2. Space Complexity
5.3. Computational Operations
6. Experiments and Analysis
6.1. Experimental Setup
6.1.1. Datasets
6.1.2. Evaluation Metrics
6.1.3. Implementation Details
6.1.4. Baselines
6.2. Comparison with SOTA Methods
6.3. Qualitative Evaluation
6.4. Ablation Study
6.4.1. Component Analysis
6.4.2. Comparison with SSM Methods
6.4.3. Quantitative Feature Analysis
6.5. Parameter Sensitivity Analysis
7. Discussion
7.1. Perturbation Stability of Geodesic Dynamics
7.2. Temporal Forgetting Through Geometric Transport
7.3. Relationship to HiPPO-Based State Transitions
7.4. Limitations and Future Directions
8. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Huang, X.; Bai, Y.; Ma, J.; Li, Y.; Shang, C.; Shen, Q. NightTrack: Joint Night-Time Image Enhancement and Object Tracking for UAVs. Drones 2025, 9, 824.
- Hossein Motlagh, N.; Kortoçi, P.; Su, X.; Lovén, L.; Hoel, H.K.; Bjerkestrand Haugsvær, S.; Srivastava, V.; Gulbrandsen, C.F.; Nurmi, P.; Tarkoma, S. Unmanned Aerial Vehicles for Air Pollution Monitoring: A Survey. IEEE Internet Things J. 2023, 10, 21687–21704.
- Liu, Z.; An, P.; Yang, Y.; Qiu, S.; Liu, Q.; Xu, X. Vision-Based Drone Detection in Complex Environments: A Survey. Drones 2024, 8, 643.
- Wang, Y.; Su, Z.; Xu, Q.; Li, R.; Luan, T.H.; Wang, P. A Secure and Intelligent Data Sharing Scheme for UAV-Assisted Disaster Rescue. IEEE/ACM Trans. Netw. 2023, 31, 2422–2438.
- Khaki, S.; Safaei, N.; Pham, H.; Wang, L. WheatNet: A Lightweight Convolutional Neural Network for High-Throughput Image-Based Wheat Head Detection and Counting. Neurocomputing 2022, 489, 78–89.
- Zheng, L.; Zeng, J.; Qin, L.; Ju, R. Multi-Attention Meets Pareto Optimization: A Reinforcement Learning Method for Adaptive UAV Formation Control. Drones 2025, 9, 845.
- Mueller, M.; Smith, N.; Ghanem, B. A Benchmark and Simulator for UAV Tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016; pp. 445–461.
- Du, D.; Qi, Y.; Yu, H.; Yang, Y.; Duan, K.; Li, G.; Zhang, W.; Huang, Q.; Tian, Q. The Unmanned Aerial Vehicle Benchmark: Object Detection, Tracking and Baseline. Int. J. Comput. Vis. 2020, 128, 1141–1159.
- Li, S.; Yeung, D.-Y. Visual Object Tracking for Unmanned Aerial Vehicles: A Benchmark and New Motion Models. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), San Francisco, CA, USA, 4–9 February 2017.
- Zhu, P.; Wen, L.; Du, D.; Bian, X.; Ling, H.; Hu, Q.; Wu, H.; Nie, Q.; Cheng, H.; Liu, C.; et al. Detection and Tracking Meet Drones Challenge. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 7380–7399.
- Ye, B.; Chang, H.; Ma, B.; Shan, S.; Chen, X. Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022.
- Lin, L.; Fan, H.; Xu, Y.; Ling, H. SwinTrack: A Simple and Strong Baseline for Transformer Tracking. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA, 28 November–9 December 2022.
- Chen, X.; Peng, H.; Wang, D.; Lu, H.; Hu, H. SeqTrack: Sequence to Sequence Learning for Visual Object Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 14572–14581.
- Yan, B.; Peng, H.; Fu, J.; Wang, D.; Lu, H. Learning Spatio-Temporal Transformer for Visual Tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 10448–10457.
- Chen, X.; Yan, B.; Zhu, J.; Wang, D.; Yang, X.; Lu, H. Transformer Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 8126–8135.
- Hamilton, J.D. State-Space Models. In Handbook of Econometrics; Elsevier: Amsterdam, The Netherlands, 1994; Volume 4, pp. 3039–3080.
- Yao, M.; Peng, J.; He, Q.; Peng, B.; Chen, H.; Chi, M.; Liu, C.; Benediktsson, J.A. MM-Tracker: Motion Mamba for UAV-platform Multiple Object Tracking. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 27 February–2 March 2025; Volume 39, pp. 9409–9417.
- Xie, J.; Zhong, B.; Liang, Q.; Li, N.; Mo, Z.; Song, S. Robust Tracking via Mamba-Based Context-Aware Token Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 27 February–2 March 2025; Volume 39, pp. 8727–8735.
- Gu, A.; Dao, T. Mamba: Linear-Time Sequence Modeling with Selective State Spaces. In Proceedings of the First Conference on Language Modeling, Philadelphia, PA, USA, 7–9 October 2024.
- Gu, A.; Dao, T.; Ermon, S.; Rudra, A.; Ré, C. HiPPO: Recurrent Memory with Optimal Polynomial Projections. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Virtual, 6–12 December 2020; Volume 33, pp. 1474–1487.
- Netto, C.F.D.; Wang, Z.; Ruiz, L. Improved image classification with manifold neural networks. In Proceedings of the ICASSP 2025—2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 6–11 April 2025; pp. 1–5.
- Narayanan, H.; Mitter, S. Sample complexity of testing the manifold hypothesis. Adv. Neural Inf. Process. Syst. 2010, 23, 1786–1794.
- Liu, W.; Wen, Y.; Yu, Z.; Li, M.; Raj, B.; Song, L. SphereFace: Deep Hypersphere Embedding for Face Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 212–220.
- Mettes, P.; Van der Pol, E.; Snoek, C. Hyperspherical Prototype Networks. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 8–14 December 2019; Volume 32.
- Bertinetto, L.; Valmadre, J.; Henriques, J.F.; Vedaldi, A.; Torr, P.H.S. Fully-Convolutional Siamese Networks for Object Tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 850–865.
- Hong, M.; Li, S.; Yang, Y.; Zhu, F.; Zhao, Q.; Lu, L. SSPNet: Scale Selection Pyramid Network for Tiny Person Detection From UAV Images. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
- Li, B.; Yan, J.; Wu, W.; Zhu, Z.; Hu, X. High Performance Visual Tracking with Siamese Region Proposal Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 8971–8980.
- Li, B.; Wu, W.; Wang, Q.; Zhang, F.; Xing, J.; Yan, J. SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 4282–4291.
- Zhu, Z.; Wang, Q.; Li, B.; Wu, W.; Yan, J.; Hu, W. Distractor-aware Siamese Networks for Visual Object Tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 101–117.
- Fu, C.; Cao, Z.; Li, Y.; Ye, J.; Feng, C. Siamese Anchor Proposal Network for High-Speed Aerial Tracking. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 510–516.
- Xu, Y.; Wang, Z.; Li, Z.; Yuan, Y.; Yu, G. SiamFC++: Towards Robust and Accurate Visual Tracking with Target Estimation Guidelines. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12549–12556.
- Guo, D.; Wang, J.; Cui, Y.; Wang, Z.; Chen, S. SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 6269–6277.
- Chen, Z.; Zhong, B.; Li, G.; Zhang, S.; Ji, R. Siamese Box Adaptive Network for Visual Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 6668–6677.
- Cao, Z.; Fu, C.; Ye, J.; Li, B.; Li, Y. HiFT: Hierarchical Feature Transformer for Aerial Tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 15457–15466.
- Cao, Z.; Huang, Z.; Pan, L.; Zhang, S.; Liu, Z.; Fu, C. TCTrack: Temporal Contexts for Aerial Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 14798–14808.
- Cao, Z.; Huang, Z.; Pan, L.; Zhang, S.; Liu, Z.; Fu, C. Towards Real-World Visual Tracking with Temporal Contexts. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 15834–15849.
- Ye, J.; Fu, C.; Zheng, G.; Paudel, D.P.; Chen, G. Unsupervised Domain Adaptation for Nighttime Aerial Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 8896–8905.
- Yao, L.; Fu, C.; Li, S.; Zheng, G.; Ye, J. SGDViT: Saliency-Guided Dynamic Vision Transformer for UAV Tracking. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 3353–3359.
- Wu, Y.; Li, Y.; Liu, M.; Wang, X.; Yang, X.; Ye, H.; Zeng, D.; Zhao, Q.; Li, S. Learning an Adaptive and View-Invariant Vision Transformer for Real-Time UAV Tracking. IEEE Trans. Circuits Syst. Video Technol. Early Access. 2025.
- Cui, Y.; Jiang, C.; Wang, L.; Wu, G. MixFormer: End-to-End Tracking with Iterative Mixed Attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 13608–13618.
- Cui, Y.; Song, T.; Wu, G.; Wang, L. MixFormerV2: Efficient Fully Transformer Tracking. Adv. Neural Inf. Process. Syst. 2023, 36, 58736–58751.
- Gao, S.; Zhou, C.; Ma, C.; Wang, X.; Yuan, J. AiATrack: Attention in Attention for Transformer Visual Tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022; pp. 146–164.
- Pauwels, K.; Kragic, D. SimTrack: A Simulation-Based Framework for Scalable Real-Time Object Pose Detection and Tracking. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28 September–2 October 2015; pp. 1300–1307.
- Yuan, X.; Xu, T.; Liu, X.; Wang, Y.; Qin, H.; Fang, Y.; Li, J. Multi-Step Temporal Modeling for UAV Tracking. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 7216–7230.
- Cai, Y.; Liu, J.; Tang, J.; Wu, G. Robust Object Modeling for Visual Tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 9589–9600.
- Wei, X.; Bai, Y.; Zheng, Y.; Shi, D.; Gong, Y. Autoregressive Visual Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 9697–9706.
- Zheng, Y.; Zhong, B.; Liang, Q.; Mo, Z.; Zhang, S.; Li, X. ODTrack: Online Dense Temporal Token Learning for Visual Tracking. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 7588–7596.
- Xie, F.; Chu, L.; Li, J.; Lu, Y.; Ma, C. VideoTrack: Learning to Track Objects via Video Transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 22826–22835.
- Shi, L.; Zhong, B.; Liang, Q.; Li, N.; Zhang, S.; Li, X. Explicit Visual Prompts for Visual Object Tracking. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 4838–4846.
- Cai, W.; Liu, Q.; Wang, Y. SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 11–15 June 2025; pp. 16871–16881.
- Cai, W.; Liu, Q.; Wang, Y. HIPTrack: Visual Tracking with Historical Prompts. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 19258–19267.
- Fan, H.; Lin, L.; Yang, F.; Chu, P.; Deng, G.; Yu, S.; Bai, H.; Xu, Y.; Liao, C.; Ling, H. LaSOT: A High-Quality Benchmark for Large-Scale Single Object Tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 5374–5383.
- Huang, L.; Zhao, X.; Huang, K. GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 1562–1577.
- Müller, M.; Bibi, A.; Giancola, S.; Alsubaihi, S.; Ghanem, B. TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 300–317.
- Li, H.; Xu, K. Innovative Adaptive Edge Detection for Noisy Images Using Wavelet and Gaussian Method. Sci. Rep. 2025, 15, 5838.
- Li, J.; Sun, S.; Wang, Y.; Zhang, J.; Zhuo, L. TSTrack: A Light-weight Transformer-based Spatiotemporal Feature Refinement Tracking Algorithm. IEEE Trans. Geosci. Remote Sens. 2025, 63, 4707416.
- Borsuk, V.; Vei, R.; Kupyn, O.; Martyniuk, T.; Krashenyi, I.; Matas, J. FEAR: Fast, Efficient, Accurate and Robust Visual Tracker. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022; pp. 644–663.
- Blatter, P.; Kanakis, M.; Danelljan, M.; Van Gool, L. Efficient Visual Tracking with Exemplar Transformers. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2–7 January 2023; pp. 1571–1581.
- Chen, X.; Kang, B.; Wang, D.; Li, D.; Lu, H. Efficient Visual Tracking via Hierarchical Cross-Attention Transformer. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022; pp. 461–477.
- Li, S.; Yang, Y.; Zeng, D.; Wang, X. Adaptive and Background-Aware Vision Transformer for Real-Time UAV Tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 13989–14000.
- Zhang, W.; Zhang, Y.; Liu, Y.; Wu, L.; Hu, X. REATA: An Efficient Vision Transformer Accelerator Featuring a Resource-Optimized Attention Design on Versal ACAP. ACM Trans. Reconfig. Technol. Syst. 2025.
- Wang, J.; Li, X.; Chen, J.; Zhou, L.; Guo, L.; He, Z.; Zhou, H.; Zhang, Z. DPH-YOLOv8: Improved YOLOv8 Based on Double Prediction Heads for the UAV Image Object Detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–15.
- Huang, S.; Lin, C.; Jiang, X.; Qu, Z. BRSTD: Bio-Inspired Remote Sensing Tiny Object Detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–15.
- Liu, X.; Zhang, C.; Huang, F.; Xia, S.; Wang, G.; Zhang, L. Vision Mamba: A Comprehensive Survey and Taxonomy. IEEE Trans. Neural Netw. Learn. Syst. Early Access. 2025.
- Patro, B.N.; Agneeswaran, V.S. Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges. Eng. Appl. Artif. Intell. 2025, 159, 111279.
- Gu, A.; Goel, K.; Ré, C. Efficiently Modeling Long Sequences with Structured State Spaces. In Proceedings of the International Conference on Learning Representations (ICLR), Addis Ababa, Ethiopia, 26–30 April 2022.
- Gu, A.; Goel, K.; Gupta, A.; Ré, C. On the Parameterization and Initialization of Diagonal State Space Models. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA, 28 November–9 December 2022; Volume 35, pp. 35971–35983.
- Smith, J.T.H.; Warrington, A.; Linderman, S. Simplified State Space Layers for Sequence Modeling. In Proceedings of the International Conference on Learning Representations (ICLR), Kigali, Rwanda, 1–5 May 2023.
- Gupta, A.; Gu, A.; Berant, J. Diagonal State Spaces Are as Effective as Structured State Spaces. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA, 28 November–9 December 2022; Volume 35, pp. 22982–22994.
- Gu, A.; Johnson, I.; Goel, K.; Saab, K.; Dao, T.; Rudra, A.; Ré, C. Combining Recurrent, Convolutional, and Continuous-Time Models with Linear State Space Layers. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Online, 6–14 December 2021; Volume 34, pp. 572–585.
- Fu, D.Y.; Dao, T.; Saab, K.K.; Thomas, A.W.; Rudra, A.; Ré, C. Hungry Hungry Hippos: Towards Language Modeling with State Space Models. In Proceedings of the International Conference on Learning Representations (ICLR), Kigali, Rwanda, 1–5 May 2023.
- Poli, M.; Massaroli, S.; Nguyen, E.; Fu, D.Y.; Dao, T.; Baccus, S.; Bengio, Y.; Ermon, S.; Ré, C. Hyena Hierarchy: Towards Larger Convolutional Language Models. In Proceedings of the International Conference on Machine Learning (ICML), Honolulu, HI, USA, 23–29 July 2023; pp. 28043–28078.
- Tay, Y.; Dehghani, M.; Abnar, S.; Shen, Y.; Bahri, D.; Pham, P.; Rao, J.; Yang, L.; Ruder, S.; Metzler, D. Long Range Arena: A Benchmark for Efficient Transformers. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual Event, 3–7 May 2021.
- Dao, T.; Gu, A. Transformers Are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality. In Proceedings of the International Conference on Machine Learning (ICML), Vienna, Austria, 21–27 July 2024.
- Lenz, B.; Lieber, O.; Arazi, A.; Bergman, A.; Manevich, A.; Peleg, B.; Aviram, B.; Almagor, C.; Fridman, C.; Padnos, D.; et al. Jamba: Hybrid Transformer-Mamba Language Models. In Proceedings of the International Conference on Learning Representations (ICLR), Singapore, 24–28 April 2025.
- Peng, B.; Alcaide, E.; Anthony, Q.; Albalak, A.; Arcadinho, S.; Biderman, S.; Cao, H.; Cheng, X.; Chung, M.; Grella, M.; et al. RWKV: Reinventing RNNs for the Transformer Era. arXiv 2023, arXiv:2305.13048.
- Han, D.; Wang, Z.; Xia, Z.; Han, Y.; Pu, Y.; Ge, C.; Song, J.; Song, S.; Zheng, B.; Huang, G. Demystify Mamba in Vision: A Linear Attention Perspective. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 9–15 December 2024; Volume 37, pp. 127181–127203.
- Zhu, L.; Liao, B.; Zhang, Q.; Wang, X.; Liu, W.; Wang, X. Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model. In Proceedings of the International Conference on Machine Learning (ICML), Vienna, Austria, 21–27 July 2024.
- Liu, Y.; Tian, Y.; Zhao, Y.; Yu, H.; Xie, L.; Wang, Y.; Ye, Q.; Jiao, J.; Liu, Y. VMamba: Visual State Space Model. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 9–15 December 2024; Volume 37, pp. 103031–103063.
- Huang, T.; Pei, X.; You, S.; Wang, F.; Qian, C.; Xu, C. LocalMamba: Visual State Space Model with Windowed Selective Scan. In Proceedings of the European Conference on Computer Vision (ECCV), Milan, Italy, 29 September–4 October 2024; pp. 12–22.
- Pei, X.; Huang, T.; Xu, C. EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 27 February–2 March 2025; Volume 39, pp. 6443–6451.
- Yang, C.; Chen, Z.; Espinosa, M.; Ericsson, L.; Wang, Z.; Liu, J.; Crowley, E.J. PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition. In Proceedings of the British Machine Vision Conference (BMVC), Glasgow, UK, 25–28 November 2024.
- Ma, C.; Wang, Z. Semi-Mamba-UNet: Pixel-Level Contrastive and Cross-Supervised Visual Mamba-Based UNet for Semi-Supervised Medical Image Segmentation. Knowl.-Based Syst. 2024, 300, 112203.
- Ruan, J.; Li, J.; Xiang, S. VM-UNet: Vision Mamba UNet for Medical Image Segmentation. ACM Trans. Multimed. Comput. Commun. Appl. 2024.
- Liu, J.; Yang, H.; Zhou, H.-Y.; Xi, Y.; Yu, L.; Li, C.; Liang, Y.; Shi, G.; Yu, Y.; Zhang, S.; et al. Swin-UMamba: Mamba-Based UNet with ImageNet-Based Pretraining. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Marrakesh, Morocco, 6–10 October 2024; pp. 615–625.
- Ma, J.; Li, F.; Wang, B. U-Mamba: Enhancing Long-Range Dependency for Biomedical Image Segmentation. arXiv 2024, arXiv:2401.04722.
- Xing, Z.; Ye, T.; Yang, Y.; Liu, G.; Zhu, L. SegMamba: Long-Range Sequential Modeling Mamba for 3D Medical Image Segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Marrakesh, Morocco, 6–10 October 2024.
- Hatamizadeh, A.; Kautz, J. MambaVision: A Hybrid Mamba-Transformer Vision Backbone. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 11–15 June 2025; pp. 25261–25270.
- Long, J.; Zhang, Y.; Hu, S. A Visual Object Tracking Method Based on Historical Prompts of Mamba. Knowl.-Based Syst. 2025, 330, 114741.
- Huang, J.; Wang, S.; Wang, S.; Wu, Z.; Wang, X.; Jiang, B. Mamba-FETrack: Frame-Event Tracking via State Space Model. In Proceedings of the Chinese Conference on Pattern Recognition and Computer Vision (PRCV), Urumqi, China, 18–20 October 2024; pp. 3–18.
- Lai, S.; Liu, C.; Zhu, J.; Kang, B.; Liu, Y.; Wang, D.; Lu, H. MambaVT: Spatio-Temporal Contextual Modeling for Robust RGB-T Tracking. IEEE Trans. Circuits Syst. Video Technol. 2025, 35, 9312–9323.
- Li, K.; Li, X.; Wang, Y.; He, Y.; Wang, Y.; Wang, L.; Qiao, Y. VideoMamba: State Space Model for Efficient Video Understanding. arXiv 2024, arXiv:2403.06977.
- Arjovsky, M.; Shah, A.; Bengio, Y. Unitary Evolution Recurrent Neural Networks. In Proceedings of the International Conference on Machine Learning (ICML), New York, NY, USA, 19–24 June 2016; pp. 1120–1128.
- Wisdom, S.; Powers, T.; Hershey, J.; Le Roux, J.; Atlas, L. Full-Capacity Unitary Recurrent Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Barcelona, Spain, 5–10 December 2016; Volume 29.
- Bonnabel, S. Stochastic Gradient Descent on Riemannian Manifolds. IEEE Trans. Autom. Control 2013, 58, 2217–2229.
- Cao, Y.; He, Z.; Wang, L.; Wang, W.; Yuan, Y.; Zhang, D.; Zhang, J.; Zhu, P.; Van Gool, L.; Han, J.; et al. VisDrone-DET2021: The Vision Meets Drone Object Detection Challenge Results. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 2847–2854.
- Guo, H.; Zheng, Y.; Zhang, Y.; Gao, Z.; Zhao, S. Global-Local MAV Detection under Challenging Conditions Based on Appearance and Motion. IEEE Trans. Intell. Transp. Syst. 2024, 25, 12005–12017.
- Wei, X.; Bai, Y.; Zheng, Y.; Shi, D.; Gong, Y. ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 18911–18920.
- Xie, J.; Zhong, B.; Mo, Z.; Zhang, S.; Shi, L.; Song, S.; Ji, R. Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 19300–19309.






| Method | UAV123 AUC | UAV123 P | VisDrone AUC | VisDrone P | ARDMAV AUC | ARDMAV P | LaSOT AUC | LaSOT P_Norm | LaSOT P |
|---|---|---|---|---|---|---|---|---|---|
| SeqTrack [13] | 68.5 | 88.2 | 64.2 | 83.5 | 61.5 | 79.8 | 72.5 | 81.5 | 79.3 |
| VideoTrack [48] | 67.2 | 86.8 | 62.8 | 81.5 | 59.8 | 77.5 | 70.2 | 81.9 | 76.4 |
| ARTrackV2 [98] | 69.8 | 89.5 | 65.8 | 85.2 | 63.5 | 81.8 | 73.6 | 82.8 | 81.1 |
| ODTrack [47] | 70.2 | 89.8 | 66.2 | 85.5 | 64.2 | 82.5 | 74.0 | 84.2 | 82.3 |
| AQATrack [99] | 68.2 | 87.5 | 63.5 | 82.8 | 60.8 | 78.5 | 72.7 | 82.9 | 80.2 |
| EVPTrack [49] | 68.8 | 88.2 | 64.5 | 83.8 | 61.8 | 79.8 | 72.7 | 82.9 | 80.3 |
| MixFormer [40] | 69.5 | 88.9 | 65.5 | 84.5 | 63.2 | 81.2 | 72.4 | 82.2 | 80.1 |
| SPMTrack [50] | 72.8 | 91.8 | 68.5 | 87.8 | 66.5 | 84.8 | 77.4 | 86.6 | 85.0 |
| HIPTrack [51] | 68.5 | 87.8 | 64.2 | 83.2 | 61.2 | 79.2 | 72.7 | 82.9 | 79.5 |
| Mamba-FETrack [90] | 68.2 | 88.5 | 64.5 | 83.2 | 61.8 | 80.2 | 71.5 | 80.8 | 77.5 |
| MambaVT [91] | 67.8 | 87.5 | 63.8 | 82.5 | 60.5 | 78.8 | 70.8 | 80.2 | 76.8 |
| MambaVision [88] | 68.5 | 88.2 | 64.2 | 83.5 | 61.5 | 79.5 | 71.2 | 80.5 | 77.2 |
| VideoMamba [92] | 68.8 | 88.5 | 64.8 | 83.8 | 62.2 | 80.5 | 71.8 | 81.2 | 78.2 |
| LF-SSM-S | 69.5 | 89.2 | 65.2 | 84.5 | 62.8 | 81.2 | 72.8 | 82.5 | 79.8 |
| LF-SSM-M | 71.2 | 90.8 | 67.5 | 86.8 | 65.5 | 84.2 | 74.5 | 84.2 | 82.5 |
| LF-SSM-L | 73.2 | 92.5 | 69.2 | 88.5 | 67.8 | 86.2 | 75.8 | 85.5 | 84.2 |
| Method | Params (M) | GFLOPs | FPS (4090) | FPS (Orin Nano) |
|---|---|---|---|---|
| SeqTrack [13] | 89 | 148 | 35 | 12 |
| VideoTrack [48] | 96 | 165 | 32 | 10 |
| ARTrackV2 [98] | 93 | 52 | 180 | 45 |
| ODTrack [47] | 92 | 73 | 95 | 28 |
| AQATrack [99] | 97 | 86 | 85 | 24 |
| EVPTrack [49] | 94 | 72 | 90 | 26 |
| MixFormer [40] | 196 | 245 | 25 | 8 |
| SPMTrack [50] | 105 | 78 | 155 | 38 |
| HIPTrack [51] | 98 | 92 | 105 | 28 |
| Mamba-FETrack [90] | 78 | 62 | 120 | 32 |
| MambaVT [91] | 82 | 58 | 135 | 35 |
| MambaVision [88] | 75 | 55 | 145 | 38 |
| VideoMamba [92] | 85 | 65 | 125 | 33 |
| Mamba-FETrack-S | 18.8 | 14.2 | 185 | 48 |
| MambaVT-S | 19.2 | 15.5 | 172 | 45 |
| VideoMamba-S | 18.5 | 13.8 | 168 | 42 |
| S4D-S | 18.6 | 13.5 | 205 | 55 |
| LF-SSM-S | 18.5 | 12.8 | 320 | 69 |
| LF-SSM-M | 35.2 | 24.5 | 265 | 52 |
| LF-SSM-L | 52.8 | 38.2 | 210 | 41 |
| No. | Components enabled (among TSP, ISS, Gate) | UAV123 AUC | UAV123 P | VisDrone AUC | VisDrone P | ARDMAV AUC | ARDMAV P | LaSOT AUC | LaSOT P_Norm | LaSOT P |
|---|---|---|---|---|---|---|---|---|---|---|
| #1 | | 68.5 | 87.8 | 64.5 | 83.2 | 62.2 | 80.5 | 71.2 | 80.5 | 77.2 |
| #2 | ✓ | 70.2 | 89.2 | 65.8 | 84.5 | 63.8 | 82.2 | 72.5 | 82.2 | 79.5 |
| #3 | ✓ | 69.5 | 88.5 | 65.2 | 84.2 | 63.2 | 81.5 | 72.2 | 81.5 | 78.8 |
| #4 | ✓ | 69.8 | 88.8 | 65.5 | 84.5 | 63.5 | 81.8 | 72.5 | 81.8 | 79.2 |
| #5 | ✓ ✓ | 71.5 | 90.5 | 67.2 | 86.2 | 65.5 | 84.2 | 74.2 | 83.8 | 81.8 |
| #6 | ✓ ✓ | 71.2 | 90.2 | 66.8 | 85.8 | 65.2 | 83.8 | 73.8 | 83.5 | 81.2 |
| #7 | ✓ ✓ | 70.8 | 89.8 | 66.5 | 85.5 | 64.8 | 83.5 | 73.5 | 83.2 | 80.8 |
| Ours | ✓ ✓ ✓ | 73.2 | 92.5 | 69.2 | 88.5 | 67.8 | 86.2 | 75.8 | 85.5 | 84.2 |
| SSM Variant | AUC | P | Params (M) | FPS |
|---|---|---|---|---|
| S4 [66] | 67.5 | 87.2 | 45.2 | 25 |
| S4D [67] | 68.2 | 87.8 | 42.8 | 32 |
| S5 [68] | 68.8 | 88.2 | 48.5 | 22 |
| Mamba [19] | 69.5 | 89.2 | 52.8 | 38 |
| Euclidean + Norm | 70.2 | 89.8 | 52.8 | 68 |
| Linear-RNN (Orthogonal) | 68.5 | 87.8 | 52.5 | 32 |
| GRU | 70.2 | 89.5 | 53.2 | 28 |
| LF-SSM (Ours) | 73.2 | 92.5 | 52.8 | 69 |
| State Update Mechanism | AUC | P | ΔAUC |
|---|---|---|---|
| Euclidean Linear | 68.5 | 87.8 | – |
| Diagonal SSM (S4D-style) | 69.2 | 88.5 | +0.7 |
| Euclidean + Normalization | 70.2 | 89.8 | +1.7 |
| Geodesic (Ours) | 73.2 | 92.5 | +4.7 |
| Operation | GSM (ms) | Mamba (ms) | S4D (ms) |
|---|---|---|---|
| Input Projection | 0.35 | 0.38 | 0.36 |
| Tangent Proj./Discretization | 0.08 | 0.42 | 0.35 |
| State Update/Selective Scan | 0.22 | 1.38 | 0.55 |
| Output Projection | 0.53 | 0.58 | 0.62 |
| Trigonometric/Exponential Ops | 0.02 | 0.04 | 0.02 |
| Total | 1.20 | 2.80 | 1.90 |
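
As a quick consistency check on the latency table above, the per-operation timings can be summed and compared. The snippet below only re-adds the numbers from the table; the variable names are illustrative.

```python
# Per-operation latencies in milliseconds, copied from the table above, in row order:
# input projection, tangent proj./discretization, state update/scan,
# output projection, trigonometric/exponential ops.
timings_ms = {
    "GSM":   [0.35, 0.08, 0.22, 0.53, 0.02],
    "Mamba": [0.38, 0.42, 1.38, 0.58, 0.04],
    "S4D":   [0.36, 0.35, 0.55, 0.62, 0.02],
}

# Sum each column; rounding removes floating-point noise from the addition.
totals = {name: round(sum(values), 2) for name, values in timings_ms.items()}

# Relative latency of each baseline with respect to GSM.
speedup_vs_mamba = totals["Mamba"] / totals["GSM"]   # about 2.33
speedup_vs_s4d = totals["S4D"] / totals["GSM"]       # about 1.58
```

The sums reproduce the reported totals (1.20, 2.80, and 1.90 ms), and most of the gap to Mamba comes from the state update row (0.22 ms versus 1.38 ms).
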
| Method | TRR ↑ | Entropy ↓ | Gradient ↑ |
|---|---|---|---|
| Linear-RNN (Orthogonal) | 2.92 | 5.05 | 0.45 |
| S4D | 2.89 | 4.82 | 0.48 |
| GRU | 3.05 | 4.42 | 0.55 |
| Mamba | 3.15 | 4.58 | 0.52 |
| LF-SSM (Ours) | 4.82 | 3.21 | 0.89 |
| State Dimension N | AUC | P | Params (M) | FPS |
|---|---|---|---|---|
| | 70.2 | 89.5 | 48.5 | 265 |
| | 71.8 | 91.2 | 50.2 | 248 |
| | 73.2 | 92.5 | 52.8 | 210 |
| | 73.5 | 92.8 | 58.5 | 175 |
| | 73.6 | 92.9 | 72.2 | 138 |
| Step Size Configuration | AUC | P | Params (M) | FPS |
|---|---|---|---|---|
| Fixed | 71.5 | 90.8 | 52.8 | 215 |
| Fixed | 72.3 | 91.5 | 52.8 | 212 |
| Fixed | 72.8 | 91.8 | 52.8 | 210 |
| Fixed | 71.8 | 90.5 | 52.8 | 208 |
| Fixed | 69.2 | 88.2 | 52.8 | 205 |
| Adaptive (Ours) | 73.2 | 92.5 | 52.8 | 210 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Wang, T.; Xu, X.; Qiu, S.; Sheng, C.; Wang, D.; Tian, H.; Yu, J. LF-SSM: Lightweight HiPPO-Free State Space Model for Real-Time UAV Tracking. Drones 2026, 10, 102. https://doi.org/10.3390/drones10020102

