IMM-DeepSort: An Adaptive Multi-Model Kalman Framework for Robust Multi-Fish Tracking in Underwater Environments
Abstract
1. Introduction
2. Materials and Method
2.1. Theoretical Foundation
2.2. Overall Architecture
2.3. YOLOv8n-Based Object Detection Module
2.4. Interacting Multiple Model Kalman Filter (IMM-KF)
2.4.1. Motivation and Model Selection Basis
- Cruising:
- Burst Swimming:
2.4.2. Implementation of the Proposed IMM-KF
- A. Model Formulation
- Constant Velocity (CV) Model: This model assumes the target moves with a nearly constant velocity between frames. Its state vector is defined as:
- Constant Acceleration (CA) Model: This model accounts for motions involving acceleration, such as burst swimming. Its state vector is defined as:
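For illustration only (the paper's own state-vector definitions are not reproduced in this text), the difference between the two motion hypotheses can be sketched with per-axis transition matrices. The sketch assumes a unit frame interval and a single image axis with state [position, velocity] for CV and [position, velocity, acceleration] for CA; the function names and the example numbers are hypothetical.

```python
import numpy as np


def cv_transition(dt: float = 1.0) -> np.ndarray:
    """Per-axis constant-velocity transition for state [position, velocity]."""
    return np.array([[1.0, dt],
                     [0.0, 1.0]])


def ca_transition(dt: float = 1.0) -> np.ndarray:
    """Per-axis constant-acceleration transition for state
    [position, velocity, acceleration]."""
    return np.array([[1.0, dt, 0.5 * dt ** 2],
                     [0.0, 1.0, dt],
                     [0.0, 0.0, 1.0]])


# Hypothetical target: at 10 px, moving 2 px/frame, accelerating 4 px/frame^2.
x_cv = np.array([10.0, 2.0])
x_ca = np.array([10.0, 2.0, 4.0])

pred_cv = cv_transition() @ x_cv   # CV ignores the acceleration entirely
pred_ca = ca_transition() @ x_ca   # CA carries it into position and velocity
```

Under these assumed numbers the CV model predicts the next position from velocity alone, while the CA model also adds the half-acceleration term, which is why a CA hypothesis can follow a burst that a CV hypothesis lags behind.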
- B. Workflow and Variable Definitions
- $\hat{x}$: Estimated state vector.
- $P$: Error covariance matrix, representing the uncertainty in the state estimate.
- $F$: State transition matrix, which models how the state evolves from one time step to the next without external input.
- $Q$: Process noise covariance, representing the uncertainty in the motion model itself.
- $z$: Measurement (observation) from the detector, i.e., the bounding box.
- $H$: Observation matrix, which maps the state vector into the measurement space.
- $R$: Measurement noise covariance, representing the uncertainty in the detector’s outputs.
- $K$: Kalman gain, which determines the weight given to the new measurement versus the prediction.
- Model Interaction/Mixing: Prior to prediction, the state estimates and covariances from the previous time step for both models (j = CV, CA) are mixed. This interaction is governed by a pre-defined Markov transition probability matrix, which models the likelihood of switching between the cruising and burst swimming states.
- Parallel Model-Conditional Prediction: Each model (CV and CA) performs a standard Kalman prediction step using its mixed input state.
- Model-Conditional Update: Upon receiving a new measurement (from the YOLOv8n detector), each model independently updates its prediction.
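The prediction and update steps that each model runs can be sketched as a standard linear Kalman filter. This is a minimal sketch in conventional notation (state x, covariance P, transition F, process noise Q, measurement z, observation matrix H, measurement noise R, gain K), not the authors' implementation; the function names and the toy one-axis example values are assumptions.

```python
import numpy as np


def kf_predict(x, P, F, Q):
    """Project the state and its covariance one step ahead."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred


def kf_update(x_pred, P_pred, z, H, R):
    """Correct a prediction with measurement z.

    Also returns the innovation y and its covariance S, which an IMM
    filter needs later to score how well this model explained z.
    """
    y = z - H @ x_pred                       # innovation (residual)
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_new, P_new, y, S


# Toy one-axis CV example (all values hypothetical).
F = np.array([[1.0, 1.0], [0.0, 1.0]])   # [position, velocity], dt = 1
Q = 0.01 * np.eye(2)
H = np.array([[1.0, 0.0]])               # only position is observed
R = np.array([[1.0]])

x, P = np.array([0.0, 1.0]), np.eye(2)
x_pred, P_pred = kf_predict(x, P, F, Q)
x_new, P_new, y, S = kf_update(x_pred, P_pred, np.array([1.2]), H, R)
```

In this toy case the corrected position lies between the predicted position and the measurement, and the position variance shrinks after the update, which is the behavior each model-conditional filter is expected to show.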
- C. Model Interaction and Fusion Mechanism
- Interaction (Mixing): At the start of each cycle, the estimates from all models are mixed based on their previous probabilities and a pre-defined Markov transition matrix. This provides a mixed initial state for each filter, allowing them to “consider” the other model’s previous estimate.
- Fusion: After the individual update steps, the likelihood of each model is computed from how well that model’s prediction matched the actual measurement. This likelihood is used to update the model’s posterior probability, and the final output is the probability-weighted combination of the model estimates.
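The interaction and fusion steps described above can be sketched as follows. This is a hedged illustration of the textbook IMM equations rather than the paper's code. It assumes both models share one state dimension (for example, the CV state padded with a zero acceleration), a Gaussian innovation likelihood, and a hypothetical transition matrix `Pi` with `Pi[i, j]` the probability of switching from model i to model j; all function names and numbers are illustrative.

```python
import numpy as np


def imm_mix(xs, Ps, mu, Pi):
    """Mix previous estimates. xs: (M, n), Ps: (M, n, n), mu: (M,)."""
    c = Pi.T @ mu                           # predicted model probabilities
    w = (Pi * mu[:, None]) / c[None, :]     # w[i, j]: weight of model i in model j's input
    x0 = w.T @ xs                           # mixed initial states, one row per model
    P0 = np.zeros_like(Ps)
    for j in range(len(mu)):
        for i in range(len(mu)):
            d = (xs[i] - x0[j])[:, None]
            P0[j] += w[i, j] * (Ps[i] + d @ d.T)   # add spread-of-means term
    return x0, P0, c


def gaussian_likelihood(y, S):
    """Likelihood of innovation y under a zero-mean Gaussian with covariance S."""
    norm = np.sqrt((2.0 * np.pi) ** len(y) * np.linalg.det(S))
    return float(np.exp(-0.5 * y @ np.linalg.solve(S, y)) / norm)


def imm_update_probs(likelihoods, c):
    """Bayesian update of the model probabilities."""
    mu = likelihoods * c
    return mu / mu.sum()


def imm_fuse(xs, Ps, mu):
    """Probability-weighted combination of the model-conditional estimates."""
    x = mu @ xs
    P = np.zeros_like(Ps[0])
    for j in range(len(mu)):
        d = (xs[j] - x)[:, None]
        P += mu[j] * (Ps[j] + d @ d.T)
    return x, P


# Hypothetical two-model example: index 0 = CV, index 1 = CA.
Pi = np.array([[0.95, 0.05], [0.10, 0.90]])
mu = np.array([0.5, 0.5])
xs = np.array([[10.0, 2.0, 0.0], [11.0, 3.0, 1.0]])
Ps = np.stack([np.eye(3), 2.0 * np.eye(3)])

x0, P0, c = imm_mix(xs, Ps, mu, Pi)

# Suppose the CA filter's innovation was much smaller than the CV filter's.
S = np.array([[1.0]])
L = np.array([gaussian_likelihood(np.array([3.0]), S),
              gaussian_likelihood(np.array([0.5]), S)])
mu_new = imm_update_probs(L, c)
x_fused, P_fused = imm_fuse(xs, Ps, mu_new)
```

With these assumed innovations the posterior probability shifts strongly toward the CA model, so the fused estimate sits close to the CA estimate. In a full tracker the mixed states `x0`, `P0` would be fed to each model's prediction step before the likelihoods are computed.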
- D. Complete IMM Algorithmic Flow and Parameter Adaptation Mechanism
- Input Interaction: Probabilistic mixing of the previous state estimates using the Markov transition matrix.
- Parallel Prediction: Independent state projection by CV and CA models.
- Detection Update: Refinement of model predictions using YOLOv8n bounding box measurements.
- Adaptive Fusion: Dynamic weighting of model outputs based on performance metrics.
- Position noise: Scaled with target height to accommodate size-dependent uncertainty.
- Velocity/acceleration noise: Balanced to prevent overfitting while maintaining responsiveness.
- Innovation-based likelihood computation: Models are evaluated by their prediction accuracy against YOLOv8n detections.
- Bayesian probability propagation: Combines current performance with historical behavior patterns.
- Real-time weight adjustment: Automatically emphasizes the model best explaining observed motion.
- Biological motivation informed initial parameter ranges based on fish ethology studies.
- Empirical refinement optimized parameters through extensive testing on underwater video sequences.
- Validation ensured parameter robustness across diverse swimming behaviors and environmental conditions.
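The height-scaled noise setting mentioned above can be illustrated with a small sketch. The paper's tuned values are not given in this text, so the weights below are assumptions: 1/20 and 1/160 are the position and velocity weights used in the widely circulated DeepSORT reference implementation, and the acceleration weight 1/320 is purely hypothetical. The function name is also hypothetical.

```python
import numpy as np


def process_noise(height, w_pos=1.0 / 20, w_vel=1.0 / 160, w_acc=None):
    """Diagonal per-axis process noise Q whose standard deviations
    grow linearly with the bounding-box height (in pixels)."""
    stds = [w_pos * height, w_vel * height]
    if w_acc is not None:            # CA model: add an acceleration term
        stds.append(w_acc * height)
    return np.diag(np.square(stds))


# Assumed box heights of 40 px and 80 px.
Q_cv_small = process_noise(height=40.0)
Q_cv_large = process_noise(height=80.0)
Q_ca = process_noise(height=40.0, w_acc=1.0 / 320)
```

Because the standard deviations scale with height, doubling the box height quadruples each variance, so larger (typically closer) fish are allowed proportionally larger pixel displacements between frames.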
2.4.3. Advantages and Integration
- Biologically Informed Interpretability: The CV and CA models are explicitly aligned with empirically observed fish behaviors, enhancing the physical relevance of the tracking system.
- Dynamic Adaptability: The probabilistic fusion mechanism allows for seamless and automatic switching between motion hypotheses, improving resilience to abrupt behavioral changes.
- Enhanced Robustness: Under challenging conditions, the IMM-KF suppresses the effect of outliers through model blending, leading to smoother trajectories and reduced identity switches [34].
2.5. Experimental Materials and Setup
2.5.1. Dataset Description
2.5.2. Evaluation Metrics
2.5.3. Experimental Setup
3. Results
3.1. Tracking Performance Comparison
3.2. Ablation Study on Motion Models
3.3. Robustness Evaluation Under Severe Occlusion
4. Discussion
4.1. Interpretation of Key Findings
4.2. Practical Implications and Performance Trade-Offs
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
- Peng, Y.T.; Cao, K.; Cosman, P.C. Generalization of the dark channel prior for single image restoration. IEEE Trans. Image Process. 2018, 27, 2856–2868.
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The kitti vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; IEEE: New York, NY, USA, 2012; pp. 3354–3361.
- Li, C.Y.; Guo, J.C.; Cong, R.M.; Pang, Y.W.; Wang, B. Underwater image enhancement by dehazing with minimum information loss and histogram distribution prior. IEEE Trans. Image Process. 2016, 25, 5664–5677.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Elmezain, M.; Saoud, L.S.; Sultan, A.; Heshmat, M.; Seneviratne, L.; Hussain, I. Advancing underwater vision: A survey of deep learning models for underwater object recognition and tracking. IEEE Access 2025, 13, 17830–17867.
- Kaur, R.; Singh, S. A comprehensive review of object detection with deep learning. Digit. Signal Process. 2023, 132, 103812.
- Wang, Z.; Zheng, L.; Liu, Y.; Li, Y.; Wang, S. Towards real-time multi-object tracking. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer International Publishing: Cham, Switzerland, 2020; pp. 107–122.
- Wang, G.; Song, M.; Hwang, J.N. Recent advances in embedding methods for multi-object tracking: A survey. arXiv 2022, arXiv:2205.10766.
- Aziz, L.; Salam, M.S.B.H.; Sheikh, U.U.; Ayub, S. Exploring deep learning-based architecture, strategies, applications and current trends in generic object detection: A comprehensive review. IEEE Access 2020, 8, 170461–170495.
- Wojke, N.; Bewley, A.; Paulus, D. Simple online and realtime tracking with a deep association metric. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; IEEE: New York, NY, USA, 2017; pp. 3645–3649.
- Butail, S.; Paley, D.A. Three-dimensional reconstruction of the fast-start swimming kinematics of densely schooling fish. J. R. Soc. Interface 2012, 9, 77–88.
- François, B. Physical Aspects of Fish Locomotion: An Experimental Study of Intermittent Swimming and Pair Interaction. Ph.D. Thesis, Université Paris Cité, Paris, France, 2021.
- Zhang, W.; Zhou, H.; Sun, S.; Wang, Z.; Shi, J.; Loy, C.C. Robust multi-modality multi-object tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2365–2374.
- Chen, C.; Tay, C.; Laugier, C.; Mekhnacha, K. Dynamic environment modeling with gridmap: A multiple-object tracking application. In Proceedings of the 2006 9th International Conference on Control, Automation, Robotics and Vision, Singapore, 5–8 December 2006; IEEE: New York, NY, USA, 2006; pp. 1–6.
- ElTobgui, R.M.F. Visual Perception of Underwater Robotic Swarms. Master’s Thesis, Khalifa University of Science, Abu Dhabi, United Arab Emirates, 2024.
- Bochinski, E.; Eiselein, V.; Sikora, T. High-speed tracking-by-detection without using image information. In Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, Italy, 29 August–1 September 2017; IEEE: New York, NY, USA, 2017; pp. 1–6.
- Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Weng, F.; Yuan, Z.; Luo, P.; Liu, W.; Wang, X. Bytetrack: Multi-object tracking by associating every detection box. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer Nature Switzerland: Cham, Switzerland, 2022; pp. 1–21.
- Yu, C.H.; Choi, J.W. Interacting multiple model filter-based distributed target tracking algorithm in underwater wireless sensor networks. Int. J. Control Autom. Syst. 2014, 12, 618–627.
- Umar, M.; Ahmad, Z.; Ullah, S.; Saleem, F.; Siddique, M.F.; Kim, J.M. Advanced Fault Diagnosis in Milling Machines Using Acoustic Emission and Transfer Learning. IEEE Access 2025, 13, 100776–100790.
- Siddique, M.F.; Ullah, S.; Kim, J.M. A Deep Learning Approach for Fault Diagnosis in Centrifugal Pumps through Wavelet Coherent Analysis and S-Transform Scalograms with CNN-KAN. Comput. Mater. Contin. 2025, 84, 3577–3603.
- Li, X.R.; Jilkov, V.P. Survey of maneuvering target tracking. Part V. Multiple-model methods. IEEE Trans. Aerosp. Electron. Syst. 2005, 41, 1255–1321.
- Blom, H.A.P.; Bar-Shalom, Y. The interacting multiple model algorithm for systems with Markovian switching coefficients. IEEE Trans. Autom. Control 1988, 33, 780–783.
- Du, Y.; Zhao, Z.; Song, Y.; Zhao, Y.; Su, F.; Gong, T.; Meng, H. Strongsort: Make deepsort great again. IEEE Trans. Multimed. 2023, 25, 8725–8737.
- Zhang, C.; Liu, L.; Huang, G.; Wen, H.; Zhou, X.; Wang, Y. Webuot-1m: Advancing deep underwater object tracking with a million-scale benchmark. Adv. Neural Inf. Process. Syst. 2024, 37, 50152–50167.
- Mazor, E.; Averbuch, A.; Bar-Shalom, Y.; Dayan, J. Interacting multiple model methods in target tracking: A survey. IEEE Trans. Aerosp. Electron. Syst. 1998, 34, 103–123.
- Bar-Shalom, Y.; Li, X.R.; Kirubarajan, T. Estimation with Applications to Tracking and Navigation: Theory Algorithms and Software; John Wiley & Sons: Hoboken, NJ, USA, 2001.
- Milan, A.; Leal-Taixé, L.; Reid, I.; Roth, S.; Schindler, K. MOT16: A benchmark for multiobject tracking. arXiv 2016, arXiv:1603.00831.
- Hao, Z.; Qiu, J.; Zhang, H.; Ren, G.; Liu, C. UMOTMA: Underwater multiple object tracking with memory aggregation. Front. Mar. Sci. 2022, 9, 1071618.
- Lin, K.; Guo, Z.; Yang, F.; Huang, J.; Zhang, Y. Kalman filter-based multi-object tracking algorithm by collaborative multi-feature. In Proceedings of the 2017 IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), Chongqing, China, 25–26 March 2017; IEEE: New York, NY, USA, 2017; pp. 1239–1244.
- Gwak, J. Multi-object tracking through learning relational appearance features and motion patterns. Comput. Vis. Image Underst. 2017, 162, 103–115.
- Bukey, C.M.; Kulkarni, S.V.; Chavan, R.A. Multi-object tracking using Kalman filter and particle filter. In Proceedings of the 2017 IEEE International Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI), Chennai, India, 21–22 September 2017; IEEE: New York, NY, USA, 2017; pp. 1688–1692.
- Fan, L.; Wang, Z.; Cail, B.; Tao, C.; Zhang, Z.; Wang, Y.; Li, S.; Huang, F.; Fu, S.; Zhang, F. A survey on multiple object tracking algorithm. In Proceedings of the 2016 IEEE International Conference on Information and Automation (ICIA), Ningbo, China, 1–3 August 2016; IEEE: New York, NY, USA, 2016; pp. 1855–1862.
- Hassan, S.; Mujtaba, G.; Rajput, A.; Fatima, N. Multi-object tracking: A systematic literature review. Multimed. Tools Appl. 2024, 83, 43439–43492.
- Sudderth, E.B. Graphical Models for Visual Object Recognition and Tracking. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 2006.
- Li, W.; Li, F.; Li, Z. CMFTNet: Multiple fish tracking based on counterpoised JointNet. Comput. Electron. Agric. 2022, 198, 107018.
- Ristic, B.; Vo, B.N.; Clark, D.; Vo, B.T. A metric for performance evaluation of multi-target tracking algorithms. IEEE Trans. Signal Process. 2011, 59, 3452–3457.
- Pal, S.K.; Pramanik, A.; Maiti, J.; Mitra, P. Deep learning in multi-object detection and tracking: State of the art. Appl. Intell. 2021, 51, 6400–6429.
- Zhao, Z.Q.; Zheng, P.; Xu, S.; Wu, X. Object detection with deep learning: A review. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 3212–3232.
- Dardagan, N.; Brđanin, A.; Džigal, D.; Akagić, A. Multiple object trackers in opencv: A benchmark. In Proceedings of the 2021 IEEE 30th International Symposium on Industrial Electronics (ISIE), Kyoto, Japan, 20–23 June 2021; IEEE: New York, NY, USA, 2021; pp. 1–6.









| Configuration | Parameter |
|---|---|
| Operating system | Ubuntu 22.04 |
| Accelerated environment | CUDA 12.1 |
| Language | Python 3.12 |
| Framework | PyTorch 2.4.0 |
| GPU | NVIDIA GeForce RTX A16 |

| Model | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 | AP (Small) | Params (M) |
|---|---|---|---|---|---|---|
| YOLOv5 | 0.83 | 0.81 | 0.84 | 0.53 | 0.54 | 2.65 |
| YOLOv7 | 0.87 | 0.87 | 0.91 | 0.60 | 0.61 | 5.92 |
| EfficientDet | 0.90 | 0.87 | 0.89 | 0.63 | 0.62 | 3.95 |
| YOLOv8s | 0.91 | 0.90 | 0.91 | 0.65 | 0.64 | 10.97 |
| YOLOv8n | 0.92 | 0.92 | 0.92 | 0.69 | 0.67 | 3.15 |

| Model | MOTA | MOTP | IDF1 | FP | FN | ID Switches |
|---|---|---|---|---|---|---|
| SORT | 50.4 | 70.2 | 46.3 | 138 | 123 | 108 |
| OC-SORT | 51.8 | 65.4 | 57.0 | 68 | 119 | 37 |
| DeepSORT | 49.0 | 71.3 | 75.9 | 103 | 104 | 32 |
| ByteTrack | 58.8 | 71.7 | 55.5 | 20 | 112 | 65 |
| IMM-DeepSORT | 62.2 | 72.6 | 77.9 | 15 | 109 | 16 |

| Model | MOTA | MOTP | IDF1 | FP | FN | FPS |
|---|---|---|---|---|---|---|
| DeepSORT + CV | 49.0 | 71.3 | 75.9 | 103 | 104 | 50.5 |
| DeepSORT + CA | 53.9 | 74.8 | 65.3 | 95 | 106 | 51.7 |
| IMM-DeepSORT | 62.2 | 72.6 | 77.9 | 15 | 109 | 54.1 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yu, Y.; Li, Y.; Li, S. IMM-DeepSort: An Adaptive Multi-Model Kalman Framework for Robust Multi-Fish Tracking in Underwater Environments. Fishes 2025, 10, 592. https://doi.org/10.3390/fishes10110592

