Article

A Stable Multi-Object Tracking Method for Unstable and Irregular Maritime Environments

Young-Suk Han and Jae-Yoon Jung *

1 Department of Big Data Analytics, Kyung Hee University, Yongin 17104, Republic of Korea
2 Department of Industrial and Management Systems Engineering, Kyung Hee University, Yongin 17104, Republic of Korea
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2024, 12(12), 2252; https://doi.org/10.3390/jmse12122252
Submission received: 11 November 2024 / Revised: 30 November 2024 / Accepted: 5 December 2024 / Published: 7 December 2024
(This article belongs to the Special Issue Unmanned Marine Vehicles: Navigation, Control and Sensing)

Abstract

In this study, an improved stable multi-object simple online and real-time tracking (StableSORT) algorithm, designed specifically for maritime environments, is proposed to address challenges such as camera instability and irregular object motion. StableSORT integrates a buffered IoU (B-IoU) and an observation-adaptive Kalman filter (OAKF) into the StrongSORT framework to improve tracking accuracy and robustness. A dataset was collected along the southern coast of Korea using a small autonomous surface vehicle to capture real-world maritime conditions. On this dataset, StableSORT achieved improvements of 2.7% in HOTA, 4.9% in AssA, and 2.6% in IDF1 over StrongSORT, and it outperformed ByteTrack and OC-SORT by 84% and 69% in HOTA, respectively. These results underscore StableSORT's ability to maintain identity consistency and enhance tracking performance under challenging maritime conditions. Ablation studies further validated the individual contributions of the B-IoU and OAKF modules to these gains.

1. Introduction

Misjudgments of the positions and actions of nearby ships are a major cause of maritime traffic accidents [1]. Autonomous surface vehicles (ASVs) have emerged as a critical research area to address this challenge and enhance navigational safety in the domain of marine automation technology. Achieving a reliable perception of the surrounding environment, particularly tracking nearby ships, is essential for navigation [2]. Automated ship-tracking mechanisms are playing an increasingly important role in analyzing the behavior of nearby vessels, thus contributing to improved marine surveillance and ship situational awareness systems [3].
Stable ship tracking remains a challenging task [4,5]. Difficulties arise primarily from environmental factors that degrade the quality of the captured video frames. Onboard cameras are frequently shaken by waves and wind, which disrupts frame-to-frame alignment and causes motion blur [6]. These issues degrade the performance of tracking algorithms, making it difficult to maintain reliable object trajectories over time. Such challenges are more severe for small ASVs, which are particularly susceptible to irregular acceleration, sharp turns, and nonlinear object motion because of their smaller size and lower stability. Consequently, tracking failures occur more frequently, adding to the difficulty of tracking in maritime environments.
Two critical challenges were identified by Yang et al. [7] for such environments. First, the detection and tracking of identical objects may fail to overlap between consecutive frames, leading to tracking failures. Second, temporarily occluded objects may continue to update their geometric features inaccurately over multiple frames, resulting in missed matches when they reappear. Although appearance-based matching can mitigate these problems by leveraging visual similarities across frames, the presence of similar objects and ambiguous visual features in maritime environments diminishes the effectiveness of such techniques.
To address these limitations, this study proposes an improved algorithm, stable simple online and real-time tracking (StableSORT), which extends StrongSORT [8]. StrongSORT combines appearance- and motion-based features to improve object association across frames, and StableSORT enhances this framework to specifically address challenges in maritime environments. StableSORT integrates a buffered intersection over union (B-IoU) [7] and an observation-adaptive Kalman filter (OAKF) to enable robust matching even under camera instability and irregular object motion. B-IoU extends the matching space across consecutive frames, overcoming the limitations of traditional IoU-based matching. The OAKF enhances trajectory prediction by adjusting the state update process according to detection confidence, ensuring that high-confidence detections strongly influence state estimation while the effect of low-confidence inputs is mitigated through adaptive noise scaling.
StableSORT was validated using a dataset of onboard perspectives collected using a small ASV. The results demonstrate that StableSORT can achieve superior tracking performance compared with state-of-the-art algorithms such as OC-SORT [9] and StrongSORT [8].
The contributions of this study are twofold.
  • The improved algorithm, StableSORT, enhances robustness and stability in multi-object tracking by integrating B-IoU and OAKF into the StrongSORT framework. These enhancements address camera instability and irregular object motion in maritime environments, ensuring more accurate tracking performance.
  • A real-world dataset was collected using small ASVs under challenging maritime conditions. This dataset was used to validate StableSORT’s performance against state-of-the-art algorithms, demonstrating improvements in key metrics, including HOTA, AssA, and IDF1.
The remainder of this paper is organized as follows: Section 2 reviews related work on object-tracking algorithms and deep-learning-based approaches for ship tracking. Section 3 outlines the proposed method, StableSORT, which focuses on the StrongSORT-based tracking algorithm with B-IoU and OAKF. Section 4 describes the experimental setup, including the dataset and evaluation metrics. Section 5 presents experimental results. Finally, Section 6 concludes the paper and suggests future research directions.

2. Related Work

2.1. Object Tracking

Object tracking is a critical task in computer vision and is a key technology in autonomous driving [10]. The primary goal of tracking is to estimate the state (e.g., position and size) of a target object in subsequent frames of a video, given its initial state in an earlier frame [11]. Object tracking can be broadly categorized into single object tracking (SOT) and multi-object tracking (MOT). SOT attempts to localize and track an unknown object described only by its location in the first frame, whereas MOT assumes object detection as prior knowledge [12].
SOT methods focus on training models to effectively distinguish a single target from its surrounding background [13]. For example, Yang et al. [14] improved tracking accuracy through a sample-squeezing method to refine training data and statistics-based losses to enhance feature robustness during online training, while Qi et al. [15] enhanced bounding box precision and occlusion handling using CNN-based methods. MOT can be viewed as a generalized version of the SOT problem, in which the locations of multiple targets are estimated by deploying multiple SOT models [16]. However, multiple independent SOT models were found not to be ideal for resolving occlusions and interactions between different ships [3]. Consequently, this study focuses on MOT in the context of ship tracking.
In MOT, two primary paradigms exist for initializing and maintaining object trajectories: tracking by detection (TBD) and detection-free tracking (DFT) [17]. TBD uses pre-trained object detectors to automatically initialize objects in each frame and link detections into trajectories [18]. By contrast, DFT requires manual initialization of a fixed number of objects in the first frame and localizes these objects in subsequent frames without relying on object detectors. DFT methods, such as the approach proposed by Qi et al. [19], employ a histogram of oriented gradients (HOG)-based SVM classifier to identify candidate objects and refine tracking boundaries using segmentation algorithms. However, DFT struggles to handle dynamically appearing or disappearing objects, making it less suitable for dynamic environments. By contrast, TBD automatically initializes and manages object trajectories, making it the dominant paradigm in modern MOT [17].
TBD methods generally involve extracting motion- and/or appearance-based features, followed by the execution of bipartite graph matching to associate objects across frames [8]. Early approaches in TBD included SORT [20], which was designed for high-speed tracking using a combination of Kalman filtering and the Hungarian algorithm. DeepSORT [21] extended this approach by incorporating deep-learning-based appearance features, significantly improving the tracking performance in more complex scenarios. However, both SORT and DeepSORT underperform compared with state-of-the-art methods because they rely on conventional tracking techniques that are less effective in complex scenarios.
Recent advancements in the field include motion-based models such as ByteTrack [1] and OC-SORT [9], which emphasize improved motion-based modeling for greater accuracy in challenging environments. Additionally, appearance-based models, such as StrongSORT [8], have been developed to enhance tracking accuracy using refined appearance features.
In maritime environments, however, MOT presents unique challenges. Frequent shaking from waves and wind reduces frame correlation and disrupts object trajectories [5,6]. Low frame rates, often below 5 fps, are common in ASV videos due to bandwidth and computational limitations, further complicating object association by reducing temporal continuity [4]. Nonlinear object motions, such as those of speedboats, make trajectory prediction more difficult [3]. Additionally, objects observed at a distance often appear small and blurred with ambiguous or identical visual features [4].
To address these issues, B-IoU has been proposed to expand the matching space, effectively handling irregular motions and ambiguous appearances [7]. Similarly, correcting Kalman filter estimation using detection results has been introduced to address large deviations caused by nonlinear motion, leveraging the superior accuracy of modern object detectors over Kalman filter predictions [4]. Building on these advancements, this study integrates B-IoU and OAKF into StableSORT, enhancing tracking robustness and accuracy under challenging maritime conditions.

2.2. Deep Learning-Based Ship Tracking

Recent advancements in deep learning have contributed to MOT in maritime environments. Park et al. [22] evaluated various you only look once (YOLO)-based object detection models on the Singapore maritime dataset and introduced an IoU-based tracking approach. Han et al. [23] combined a single shot multibox detector (SSD) with an extended Kalman filter to improve ship detection and tracking. Lee et al. [24] developed a ship-sensing system using YOLOv3 for detection and a Kalman filter for tracking, achieving reliable results under stable conditions.
Liu et al. [25] enhanced DeepSORT with scale-invariant feature transform (SIFT) to improve tracking under occlusions. Ding and Weng [26] combined YOLOv5 with DeepSORT to enhance automatic identification system (AIS) data. Zhou et al. [27] proposed a ship tracking and speed extraction framework in hazy weather conditions, utilizing YOLOv5 and DeepSORT to improve performance in low-visibility maritime environments. Guo et al. [28] integrated B-IoU with an improved YOLOv7 to stabilize the tracking of nonlinearly moving objects on the sea surface.
Although these studies have improved tracking accuracy, challenges with camera instability and irregular object motion remain unresolved. The limitations of existing deep-learning-based MOT models include accommodating shaking from waves and wind as well as irregular movements of small ASVs in maritime environments. To address these gaps, this study proposes StableSORT, an improved tracking algorithm based on StrongSORT, that integrates B-IoU and OAKF to enhance robustness in unstable maritime conditions.

3. Proposed Method

This section introduces StableSORT, an improved StrongSORT-based algorithm designed for robust multi-object tracking in maritime environments characterized by camera instability and irregular object motion. StableSORT incorporates two key improvements to address the specific challenges of maritime environments: B-IoU for improved matching under sudden motion and OAKF for adaptive state updates based on detection confidence. These enhancements were specifically designed to address issues such as camera instability and irregular object motion.

3.1. Overview of StrongSORT

Within tracking-by-detection MOT, motion models that rely on motion-based features are widely adopted for object association. Algorithms like ByteTrack and OC-SORT utilize this approach, employing Kalman filters under constant-velocity assumptions to predict object motion across frames. While effective in scenarios with stable and predictable motion patterns, these motion-based methods exhibit significant limitations in handling environments characterized by sudden accelerations, abrupt turns, or irregular object trajectories, often leading to failed associations [7].
By contrast, StrongSORT combines motion- and appearance-based features, enabling more reliable tracking in environments with occlusions, appearance ambiguities, and irregular object motions. By incorporating both types of features, StrongSORT improves its ability to maintain object associations in challenging conditions. A core component of StrongSORT is its cost matrix C [8], which combines appearance and motion costs to achieve robust object associations. The cost is calculated as follows:
C = \lambda A_a + (1 - \lambda) A_m        (1)
  • A_a denotes the appearance cost, which quantifies the visual similarity between detected objects across frames. This cost is derived from deep feature embeddings extracted by a neural network, ensuring that objects with similar visual features are more likely to be associated.
  • A_m denotes the motion cost, which estimates the positional consistency of objects between consecutive frames. This cost relies on a Kalman filter that forecasts the location of each object based on its previous trajectory.
  • λ is a weighting factor that balances the appearance and motion terms (a code sketch of this combined cost follows).
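To make Eq. (1) concrete, the following is a minimal sketch of the combined cost, assuming L2-normalized appearance embeddings (so cosine distance reduces to one minus a dot product) and a precomputed motion-cost matrix; the function name and default weight are illustrative rather than the authors' implementation.

```python
import numpy as np

def combined_cost(track_embs, det_embs, motion_cost, lam=0.98):
    """Blend appearance and motion costs per Eq. (1): C = lam*A_a + (1-lam)*A_m.

    track_embs:  (T, D) L2-normalized appearance embeddings of tracklets.
    det_embs:    (N, D) L2-normalized appearance embeddings of detections.
    motion_cost: (T, N) motion cost, e.g., a gated Mahalanobis distance.
    """
    appearance_cost = 1.0 - track_embs @ det_embs.T  # cosine distance, shape (T, N)
    return lam * appearance_cost + (1.0 - lam) * motion_cost
```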
To further enhance robustness, StrongSORT employs a two-stage data association process. In the first stage, objects are matched using the cost matrix C, and the Hungarian algorithm is applied to find the optimal associations that minimize the overall cost. This ensures reliable matching under typical tracking conditions.
If objects fail to associate in the initial stage due to insufficient appearance or motion similarity, StrongSORT performs a secondary matching process based on IoU. This re-matching step evaluates positional overlap to resolve associations for objects with temporary appearance ambiguities or inconsistent motion patterns, reducing the likelihood of tracking failures.
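A sketch of this two-stage association under stated assumptions: SciPy's linear_sum_assignment plays the role of the Hungarian algorithm, the second stage is shown as a greedy IoU fallback for brevity (an assignment solver could equally be used), and the gate thresholds are placeholders, not values from the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def two_stage_associate(cost, iou_cost, cost_gate=0.7, iou_gate=0.7):
    """Stage 1: Hungarian assignment on the combined cost matrix C.
    Stage 2: IoU-based re-matching of leftover tracklets and detections."""
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] < cost_gate]
    unmatched_trk = set(range(cost.shape[0])) - {r for r, _ in matches}
    unmatched_det = set(range(cost.shape[1])) - {c for _, c in matches}
    for t in sorted(unmatched_trk):            # greedy IoU fallback
        best = min(unmatched_det, key=lambda d: iou_cost[t, d], default=None)
        if best is not None and iou_cost[t, best] < iou_gate:
            matches.append((t, best))
            unmatched_det.remove(best)
    unmatched_trk -= {t for t, _ in matches}
    return matches, unmatched_trk, unmatched_det
```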

3.2. Buffered Intersection over Union

The B-IoU [7] was designed to overcome the limitations of conventional IoU-based matching in environments where sudden, irregular motion disrupts object tracking. Unlike the standard IoU, which strictly evaluates the overlap between detections and tracklets, B-IoU introduces a buffer around each bounding box to extend the matching space and enhance tolerance to positional variations. For a given detection box o = (x, y, w, h), the buffered version o_b [7] can be expressed as follows:
o_b = (x - bw, \; y - bh, \; w + 2bw, \; h + 2bh)        (2)
where b is the buffer scale that modulates the expansion of the matching area. This expanded space preserves associations, even with significant shifts due to environmental conditions, such as camera movement from waves and wind or sudden changes in object speed.
Using B-IoU in a cascaded matching scheme, tracking performance can be further optimized. The process begins with a smaller buffer to address slight deviations and applies a larger buffer, if necessary, to accommodate larger positional shifts. This hierarchical strategy balances precision and robustness, reducing false matches and ensuring continuity of objects.
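A minimal sketch of B-IoU following Eq. (2): both boxes are expanded by the buffer scale before the ordinary IoU is computed. The buffer values noted in the comment mirror the small/large cascade settings reported in Section 5.2.

```python
def buffered_iou(box_a, box_b, b=0.3):
    """B-IoU per Eq. (2): expand both (x, y, w, h) boxes by buffer scale b,
    then compute the ordinary IoU of the buffered boxes."""
    def buffered(box):
        x, y, w, h = box
        return x - b * w, y - b * h, w + 2 * b * w, h + 2 * b * h
    x1, y1, w1, h1 = buffered(box_a)
    x2, y2, w2, h2 = buffered(box_b)
    iw = max(0.0, min(x1 + w1, x2 + w2) - max(x1, x2))  # intersection width
    ih = max(0.0, min(y1 + h1, y2 + h2) - max(y1, y2))  # intersection height
    inter = iw * ih
    return inter / (w1 * h1 + w2 * h2 - inter + 1e-9)

# Cascaded use: try the small buffer first, then the large one for leftovers,
# e.g., buffered_iou(det, trk, b=0.3), then buffered_iou(det, trk, b=0.5).
```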

3.3. Observation-Adaptive Kalman Filter

The OAKF enhances the noise scale adaptive Kalman (NSA Kalman) algorithm [29] by introducing an observation-adaptive adjustment that dynamically modifies the measurement noise covariance based on detection confidence, i.e., the confidence score produced by the object detector embedded in the tracking model. This approach is particularly important in maritime environments, where detection reliability can vary significantly due to factors such as camera instability. By adapting the measurement noise to detection confidence, OAKF ensures that high-confidence detections have a greater impact on the state update, while low-confidence detections are treated with caution by increasing the noise to reduce their influence.
In OAKF, the observation-adaptive adjustment extends the measurement noise covariance concept originally proposed for NSA Kalman, in which the measurement noise covariance \tilde{R}_k [29] is calculated as follows:
\tilde{R}_k = (1 - c) R_k        (3)
where R_k is the preset constant measurement noise covariance, and c is the detection confidence score.
In OAKF, this confidence-based adjustment is applied with an additional confidence threshold \theta_c to better manage the effect of varying confidence levels.
  • High confidence (c \geq \theta_c): when the confidence meets or exceeds the threshold, the detection is fully trusted and c is set to 1.0, yielding \tilde{R}_k = 0. This removes any additional measurement noise, allowing the detection to exert maximum influence on the state update.
  • Low confidence (c < \theta_c): when the confidence is below the threshold, the measurement noise covariance \tilde{R}_k increases according to Eq. (3), scaling up the noise and thereby reducing the influence of the detection on the state update (a code sketch follows this list).
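The two cases fold naturally into a standard Kalman measurement update, as in the following sketch; the observation matrix H, base noise R_k, and state layout are assumptions of the example, and theta_c = 0.7 mirrors the setting reported in Section 5.2.

```python
import numpy as np

def oakf_update(x, P, z, H, R_k, conf, theta_c=0.7):
    """Kalman measurement update with observation-adaptive noise:
    conf >= theta_c: detection fully trusted (adaptive R becomes 0);
    conf <  theta_c: noise scaled up via Eq. (3), R~ = (1 - c) * R_k."""
    c = 1.0 if conf >= theta_c else conf
    R = (1.0 - c) * R_k                          # adaptive measurement noise
    S = H @ P @ H.T + R                          # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)               # Kalman gain
    x_new = x + K @ (z - H @ x)                  # state update
    P_new = (np.eye(P.shape[0]) - K @ H) @ P     # covariance update
    return x_new, P_new
```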

3.4. StableSORT: Integrating B-IoU and OAKF to StrongSORT

StableSORT enhances the StrongSORT framework by integrating it with B-IoU and OAKF, thereby improving the ability to handle challenges specific to maritime environments. As shown in Figure 1, the flow begins with object detection, followed by vanilla matching, which uses a combined appearance-motion cost to associate detected objects across frames.
In the original StrongSORT, standard IoU matching is typically used for association; however, in this study, it was replaced by a cascade B-IoU matching process. This cascade approach begins with a small buffer for initial B-IoU matching to handle minor deviations. If further matching flexibility is required, a larger buffer is applied, allowing the matching space to expand progressively, which helps maintain associations even under significant positional shifts.
Following the matching stage, the Kalman filter prediction step estimates the future state of each tracklet based on historical data and provides an initial forecast for the next frame. This prediction is then refined using the OAKF update, which dynamically adjusts the measurement noise covariance based on detection confidence. High-confidence detections have a greater influence on state estimation, whereas low-confidence detections are downweighted, thus enhancing tracking stability under varying conditions.
The outcome of this process is a robust and stable object-tracking solution. The integration of B-IoU and OAKF into the StableSORT framework is particularly effective in maritime environments, where challenges such as camera instability and irregular object motion frequently arise. Specifically, B-IoU improves object association by expanding the matching space across frames, handling large positional shifts. OAKF enhances trajectory updates by dynamically adjusting measurement noise based on detection confidence, prioritizing reliable detections. These enhancements address the limitations of StrongSORT and contribute to the improved tracking accuracy demonstrated in Section 5.
Since StableSORT extends the StrongSORT framework, its computational complexity remains O(n^3), determined by the Hungarian algorithm used for data association. The additions of B-IoU and OAKF do not significantly affect this complexity, so the algorithm remains computationally feasible while achieving improved robustness and stability.

4. Experimental Setup

4.1. Dataset

The dataset used in this study was collected to reflect real-world maritime conditions and the challenges inherent to ship tracking. Data were collected using a small ASV along the southern coast of Korea to capture diverse environmental scenarios. Table 1 summarizes the specifications of the dataset. The training set comprised 77 video sequences with 19,985 frames annotated with 10,681 ground truth boxes and 185 distinct trajectories. The test set included seven sequences, totaling 1944 frames with 3604 ground truth (GT) boxes and 38 trajectories.
In particular, the test sequences were recorded while the ASV was moving at a high speed, which caused camera instability and irregular object motion, posing significant challenges for tracking algorithms in maritime environments. Figure 2 shows examples of objects with irregular motion from the test sequences.

4.2. Evaluation Measures

Detection performance was evaluated using precision, recall, and mean average precision (mAP). Precision is defined as P = TP / (TP + FP), where TP is the number of true positives and FP is the number of false positives; it reflects the accuracy of the positive predictions made by the detection model. Recall measures the ability of the detection model to identify all relevant instances in the dataset and is defined as R = TP / (TP + FN), where FN is the number of false negatives.
Average precision (AP) was calculated by integrating the precision-recall curve, AP = \int_0^1 P(r)\,dr, where P(r) is the precision at recall level r. Because this study focused on a single object class, mAP reduces to AP. Integrating the precision-recall curve captures the tradeoff between precision and recall across different thresholds, making mAP a comprehensive measure of detection performance.
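As an illustration, AP can be approximated numerically as the area under the precision-recall curve; the trapezoidal integration below is a simple stand-in for the interpolated variants that detection toolkits typically use.

```python
import numpy as np

def average_precision(precision, recall):
    """Approximate AP = integral of P(r) dr over the precision-recall curve."""
    order = np.argsort(recall)                         # sort points by recall
    p = np.asarray(precision, dtype=float)[order]
    r = np.asarray(recall, dtype=float)[order]
    return float(np.trapz(p, r))                       # trapezoidal area under P(r)
```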
For tracking performance, several key measures were employed to provide a comprehensive evaluation. Higher-order tracking accuracy (HOTA) [30] was selected as the primary measure because of its balanced approach in assessing both detection and association accuracy. HOTA is defined as follows:
\mathrm{HOTA} = \sqrt{\frac{\sum_{c \in \mathrm{TP}} \mathcal{A}(c)}{|\mathrm{TP}| + |\mathrm{FN}| + |\mathrm{FP}|}}        (4)
where \mathcal{A}(c) is the association score of each correctly tracked target c. HOTA provides an improved trade-off between detection and association performance, making it a powerful measure in modern MOT evaluations.
Multiple object tracking accuracy (MOTA) [31] quantifies tracking reliability by considering errors from false positives, false negatives, and identity (ID) switches as follows:
\mathrm{MOTA} = 1 - \frac{\sum_t (\mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDSW}_t)}{\sum_t \mathrm{GT}_t}        (5)
where \mathrm{IDSW}_t is the number of ID switches at time t, and \mathrm{GT}_t is the total number of ground truth objects at time t. MOTA provides a comprehensive overview but may emphasize detection performance more than association accuracy.
Association accuracy (AssA) [30] specifically measures the tracker’s ability to maintain correct object associations across frames. This measure is crucial for understanding performance in scenarios involving occlusions and complex object interactions as follows:
\mathrm{AssA} = \frac{1}{|\mathrm{TP}|} \sum_{c \in \mathrm{TP}} \mathcal{A}(c)        (6)
The identity F1 score (IDF1) [32] evaluates the consistency of the identity assignment, balancing identity precision ( I D P ) and identity recall ( I D R ). It is calculated as follows:
\mathrm{IDF1} = \frac{TP_{id}}{TP_{id} + 0.5\,FP_{id} + 0.5\,FN_{id}}        (7)
where TP_{id} is the number of correctly matched object IDs, FP_{id} is the number of false positive matches, and FN_{id} is the number of missed matches. This measure is essential for assessing how well the tracker maintains stable object identities over time.
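A toy computation of MOTA and IDF1 from Eqs. (5) and (7), using made-up counts rather than values from this paper's experiments:

```python
# Toy counts (illustrative only; not from this paper's dataset).
FN, FP, IDSW, GT = 120, 80, 6, 1800
mota = 1 - (FN + FP + IDSW) / GT                     # Eq. (5)

TP_id, FP_id, FN_id = 1500, 90, 130
idf1 = TP_id / (TP_id + 0.5 * FP_id + 0.5 * FN_id)   # Eq. (7)

print(f"MOTA = {mota:.3f}, IDF1 = {idf1:.3f}")       # MOTA = 0.886, IDF1 = 0.932
```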
These measures were used to evaluate the detection performance of the input detector and to comprehensively assess the tracking performance of StableSORT and other comparison algorithms in this study. The results of these evaluations are presented in Section 5.

5. Experimental Results

All the experiments were conducted using a system equipped with Intel Xeon E5-2603 v4 CPU and 62 GB of RAM running CentOS Linux 7.4. The software environment consisted of Python 3.8 with relevant libraries for multi-object tracking and evaluation.

5.1. Detection Results

The detection algorithm in this study utilized YOLOv5, which is recognized for its efficiency and high accuracy in object detection tasks. Three versions of the YOLOv5 model, namely, YOLOv5m, YOLOv5l, and YOLOv5x, were evaluated to identify the most suitable model for ship detection in maritime environments. These models differ primarily in terms of the number of parameters, which affect their complexity and performance. YOLOv5m has fewer parameters, making it lightweight and fast, whereas YOLOv5x has the highest number of parameters, offering improved accuracy at the expense of increased computational demand.
As shown in Table 2, YOLOv5x demonstrated the highest detection performance among the evaluated models, achieving a precision of 0.914, recall of 0.844, mAP@0.5 of 0.905, and mAP@0.5:0.95 of 0.608. This result underscores its capability to detect and localize ships with high accuracy across various challenging maritime conditions, making it the most suitable model for this study.
Figure 3 illustrates the training results of the YOLOv5x model, displaying the training and validation losses, as well as precision, recall, and mAP over epochs. The consistent decline in training and validation losses indicates effective learning with minimal overfitting. In addition, the steady improvement in precision and recall demonstrates the robustness of the YOLOv5x model during the training process.

5.2. Tracking Results

To evaluate the effectiveness of StableSORT, its tracking performance was compared with that of three state-of-the-art tracking methods: ByteTrack, OC-SORT, and StrongSORT. Each of these algorithms has unique characteristics. ByteTrack and OC-SORT are motion-based models that prioritize computational efficiency and reliable motion estimation, whereas StrongSORT integrates motion and appearance features to improve association accuracy. StableSORT builds on StrongSORT by integrating B-IoU and OAKF, which are specifically designed to address challenges such as camera instability and irregular object motion in maritime environments.
For the experiments, the parameters of StableSORT were set as follows: the B-IoU buffer scale b was set to 0.3 for the small buffer and 0.5 for the large buffer to handle varying positional shifts, and the confidence threshold \theta_c in OAKF was set to 0.7, ensuring that only high-confidence detections significantly influenced state estimation.
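For reference, these reported settings can be gathered into a single configuration; the key names are ours, not identifiers from the authors' code.

```python
# Experiment settings reported above (key names are illustrative).
STABLESORT_PARAMS = {
    "biou_buffer_small": 0.3,    # b for the first B-IoU cascade stage
    "biou_buffer_large": 0.5,    # b for the second B-IoU cascade stage
    "oakf_conf_threshold": 0.7,  # theta_c in the OAKF update
}
```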
Table 3 lists the tracking performance across the test sequences using the evaluation measures described in Section 4.2. HOTA was chosen as the primary measure because of its balanced assessment of detection and association accuracy. StableSORT outperformed all other algorithms overall, achieving the best average HOTA (67.701), IDF1 (81.034), and AssA (68.417), along with the top scores on most individual test sequences.
ByteTrack and OC-SORT scored lower than StrongSORT and StableSORT across all evaluation measures. These results suggest that motion-based models such as ByteTrack and OC-SORT may be less suited to maritime environments where robustness is essential under challenging conditions. Although StrongSORT showed substantial improvements over ByteTrack and OC-SORT, achieving higher scores overall, it still faced difficulties in handling severe camera shakes and irregular object motions in maritime environments. StableSORT, which uses B-IoU and OAKF, surpassed StrongSORT in addressing these specific challenges more effectively. These improvements provide greater robustness against camera instability and irregular object motion, resulting in superior tracking stability and association accuracy compared to StrongSORT.
To further illustrate the tracking performance of StableSORT compared to StrongSORT, selected examples from Sequences 2 and 5 are presented in Figure 4. These examples demonstrate specific cases of camera shaking on a small ASV that led to ID switches, effectively highlighting the key differences between the two algorithms. In Sequence 2, StableSORT maintained consistent tracking IDs during irregular motion, whereas StrongSORT encountered ID switching. Similarly, in Sequence 5, the enhanced stability of StableSORT reduced ID changes under camera instability, demonstrating improved tracking precision compared to StrongSORT.

5.3. Ablation Study

To further validate the contributions of individual components to StableSORT, ablation experiments were conducted by incrementally adding the B-IoU and OAKF modules. As listed in Table 4, the basic StrongSORT algorithm served as the foundation for the experiments.
Adding the B-IoU module to StrongSORT yielded clear gains: HOTA improved to 67.479, AssA increased to 68.333, and IDF1 reached 80.805, indicating that B-IoU effectively improves matching capability across frames and addresses the challenges posed by irregular object motion and camera instability.
By adding the OAKF module to StrongSORT, HOTA increased from 65.920 to 66.113, reflecting an enhancement in tracking accuracy, whereas MOTA improved from 77.295 to 77.347. AssA and IDF1 also exhibited modest gains, rising to 65.529 and 79.444, respectively. This demonstrates the ability of the OAKF module to handle detection confidence variability, thereby enhancing tracking robustness.
The combination of OAKF and B-IoU modules achieved the most significant performance gains, with HOTA reaching 67.701, MOTA at 77.184, AssA improving to 68.417, and IDF1 increasing to 81.034. These results illustrate the complementary effects of the OAKF and B-IoU modules, providing StableSORT with superior adaptability and accuracy in challenging maritime environments.

6. Conclusions and Future Work

This study proposes an improved multi-object tracking method, StableSORT, specifically designed to address the challenges of maritime environments, such as camera instability and irregular object motion. StableSORT integrates B-IoU and OAKF into the StrongSORT framework. B-IoU expands the matching space to handle large positional shifts caused by environmental factors, while OAKF dynamically adjusts trajectory predictions by incorporating detection confidence to improve tracking stability.
In the experimental results using a dataset collected from real-world maritime environments, StableSORT demonstrated superior performance compared to state-of-the-art algorithms, including ByteTrack, OC-SORT, and StrongSORT. It achieved the highest scores across key metrics, with a HOTA of 67.701 (2.7% improvement), an AssA of 68.417 (4.9% improvement), and an IDF1 of 81.034 (2.6% improvement) compared to StrongSORT and significantly outperformed ByteTrack and OC-SORT by 84% and 69% in HOTA, respectively. These results underscore the effectiveness of StableSORT in maintaining identity consistency and enhancing tracking accuracy under challenging maritime conditions.
Future research will focus on integrating more advanced appearance modeling and deep-learning-based association strategies to further enhance the robustness and adaptability of the tracker. Additionally, the validation will be expanded to include a larger number of video sequences and more diverse datasets, such as those covering night-time, adverse weather conditions, and various benchmarks, to ensure a more comprehensive evaluation of the proposed method’s generalization ability. Exploring real-time processing capabilities and hardware optimization is also essential for the effective deployment of tracking systems in practical maritime applications.

Author Contributions

Conceptualization, writing—original draft preparation, and methodology, Y.-S.H.; funding acquisition and writing—review and editing, J.-Y.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by a Korea Research Institute for Defense Technology (KRIT) grant funded by the Korean government (DAPA) (V220023) and partly supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. RS-2022-00155911 and Artificial Intelligence Convergence Innovation Human Resources Development (Kyung Hee University)).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data supporting the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Weng, F.; Yuan, Z.; Luo, P.; Liu, W.; Wang, X. Bytetrack: Multi-object tracking by associating every detection box. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 1–21. [Google Scholar] [CrossRef]
  2. Shan, Y.; Liu, S.; Zhang, Y.; Jing, M.; Xu, H. LMD-TShip: Vision based large-scale maritime ship tracking benchmark for autonomous navigation applications. IEEE Access 2021, 9, 74370–74384. [Google Scholar] [CrossRef]
  3. Zhang, W.; He, X.; Li, W.; Zhang, Z.; Luo, Y.; Su, L.; Wang, P. A robust deep affinity network for multiple ship tracking. IEEE Trans. Instrum. Meas. 2021, 70, 2508920. [Google Scholar] [CrossRef]
  4. Liang, Z.; Xiao, G.; Hu, J.; Wang, J.; Ding, C. MotionTrack: Rethinking the motion cue for multiple object tracking in USV videos. Vis. Comput. 2024, 40, 2761–2773. [Google Scholar] [CrossRef]
  5. Prasad, D.K.; Dong, H.; Rajan, D.; Quek, C. Are object detection assessment criteria ready for maritime computer vision? IEEE Trans. Intell. Transp. Syst. 2019, 21, 5295–5304. [Google Scholar] [CrossRef]
  6. Fefilatyev, S.; Goldgof, D.; Shreve, M.; Lembke, C. Detection and tracking of ships in open sea with rapidly moving buoy-mounted camera system. Ocean Eng. 2012, 54, 1–12. [Google Scholar] [CrossRef]
  7. Yang, F.; Odashima, S.; Masui, S.; Jiang, S. Hard to track objects with irregular motions and similar appearances? make it easier by buffering the matching space. In Proceedings of the 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2–7 January 2023; pp. 4799–4808. [Google Scholar] [CrossRef]
  8. Du, Y.; Zhao, Z.; Song, Y.; Zhao, Y.; Su, F.; Gong, T.; Meng, H. Strongsort: Make deepsort great again. IEEE Trans. Multimed. 2023, 25, 8725–8737. [Google Scholar] [CrossRef]
  9. Cao, J.; Pang, J.; Weng, X.; Khirodkar, R.; Kitani, K. Observation-centric sort: Rethinking sort for robust multi-object tracking. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 9686–9696. [Google Scholar] [CrossRef]
  10. Cao, J.; Zhang, H.; Jin, L.; Lv, J.; Hou, G.; Zhang, C. A review of object tracking methods: From general field to autonomous vehicles. Neurocomputing 2024, 585, 127635. [Google Scholar] [CrossRef]
  11. Wu, Y.; Lim, J.; Yang, M.H. Online object tracking: A benchmark. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 2411–2418. [Google Scholar] [CrossRef]
  12. Karunasekera, H.; Wang, H.; Zhang, H. Multiple object tracking with attention to appearance, structure, motion and size. IEEE Access 2019, 7, 104423–104434. [Google Scholar] [CrossRef]
  13. Zheng, L.; Tang, M.; Chen, Y.; Zhu, G.; Wang, J.; Lu, H. Improving multiple object tracking with single object tracking. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 2453–2462. [Google Scholar] [CrossRef]
  14. Yang, Y.; Li, G.; Qi, Y.; Huang, Q. Release the power of online-training for robust visual tracking. In Proceedings of the AAAI Conference on Artificial Intelligence; AAAI Press: Washington, DC, USA, 2020; Volume 34, pp. 12645–12652. [Google Scholar] [CrossRef]
  15. Qi, Y.; Qin, L.; Zhang, S.; Huang, Q.; Yao, H. Robust visual tracking via scale-and-state-awareness. Neurocomputing 2019, 329, 75–85. [Google Scholar] [CrossRef]
  16. Chu, P.; Fan, H.; Tan, C.C.; Ling, H. Online multi-object tracking with instance-aware tracker and dynamic model refreshment. In Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 7–11 January 2019; pp. 161–170. [Google Scholar] [CrossRef]
  17. Luo, W.; Xing, J.; Milan, A.; Zhang, X.; Liu, W.; Kim, T.K. Multiple object tracking: A literature review. Artif. Intell. 2021, 293, 103448. [Google Scholar] [CrossRef]
  18. Stanojevic, V.D.; Todorovic, B.T. BoostTrack: Boosting the similarity measure and detection confidence for improved multiple object tracking. Mach. Vis. Appl. 2024, 35, 123. [Google Scholar] [CrossRef]
  19. Qi, Y.; Yao, H.; Sun, X.; Zhang, Y.; Huang, Q. Structure-aware multi-object discovery for weakly supervised tracking. In Proceedings of the 2014 IEEE International Conference on Image Processing (ICIP), Paris, France, 27–30 October 2014; pp. 466–470. [Google Scholar] [CrossRef]
  20. Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.; Upcroft, B. Simple online and realtime tracking. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3464–3468. [Google Scholar] [CrossRef]
  21. Wojke, N.; Bewley, A.; Paulus, D. Simple online and realtime tracking with a deep association metric. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3645–3649. [Google Scholar] [CrossRef]
  22. Park, H.; Ham, S.H.; Kim, T.; An, D. Object recognition and tracking in moving videos for maritime autonomous surface ships. J. Mar. Sci. Eng. 2022, 10, 841. [Google Scholar] [CrossRef]
  23. Han, J.; Cho, Y.; Kim, J.; Kim, J.; Son, N.S.; Kim, S.Y. Autonomous collision detection and avoidance for ARAGON USV: Development and field tests. J. Field Robot. 2020, 37, 987–1002. [Google Scholar] [CrossRef]
  24. Lee, W.J.; Roh, M.I.; Lee, H.W.; Ha, J.; Cho, Y.M.; Lee, S.J.; Son, N.S. Detection and tracking for the awareness of surroundings of a ship based on deep learning. J. Comput. Des. Eng. 2021, 8, 1407–1430. [Google Scholar] [CrossRef]
  25. Liu, Y.; Liu, Y.; Zhong, Z.; Chen, Y.; Xia, J.; Chen, Y. Depth tracking of occluded ships based on SIFT feature matching. KSII Trans. Internet Inf. Syst. 2023, 17, 1066–1079. [Google Scholar] [CrossRef]
  26. Ding, H.; Weng, J. A robust assessment of inland waterway collision risk based on AIS and visual data fusion. Ocean Eng. 2024, 307, 118242. [Google Scholar] [CrossRef]
  27. Zhou, Z.; Zhao, J.; Chen, X.; Chen, Y. A ship tracking and speed extraction framework in hazy weather based on deep learning. J. Mar. Sci. Eng. 2023, 11, 1353. [Google Scholar] [CrossRef]
  28. Guo, Y.; Shen, Q.; Ai, D.; Wang, H.; Zhang, S.; Wang, X. Sea-IoUTracker: A more stable and reliable maritime target tracking scheme for unmanned vessel platforms. Ocean Eng. 2024, 299, 117243. [Google Scholar] [CrossRef]
  29. Du, Y.; Wan, J.; Zhao, Y.; Zhang, B.; Tong, Z.; Dong, J. Giaotracker: A comprehensive framework for MCMOT with global information and optimizing strategies in VisDrone 2021. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal, QC, Canada, 11–17 October 2021; pp. 2809–2819. [Google Scholar] [CrossRef]
  30. Luiten, J.; Osep, A.; Dendorfer, P.; Torr, P.; Geiger, A.; Leal-Taixé, L.; Leibe, B. Hota: A higher order metric for evaluating multi-object tracking. Int. J. Comput. Vis. 2021, 129, 548–578. [Google Scholar] [CrossRef]
  31. Bernardin, K.; Stiefelhagen, R. Evaluating multiple object tracking performance: The clear mot metrics. EURASIP J. Image Video Process. 2008, 2008, 246309. [Google Scholar] [CrossRef]
  32. Ristani, E.; Solera, F.; Zou, R.; Cucchiara, R.; Tomasi, C. Performance measures and a data set for multi-target, multi-camera tracking. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–10, 15–16 October 2016; pp. 17–35. [Google Scholar] [CrossRef]
Figure 1. Flow diagram of StableSORT based on StrongSORT with B-IoU and OAKF enhancements.
Figure 2. Examples of objects with unstable and irregular motion from the test sequences. The green line shows the trace of the object in the image.
Figure 3. YOLOv5x model training and validation results over epochs.
Figure 4. Examples of tracking results for StrongSORT and StableSORT.
Table 1. Summary of the dataset utilized in the experiments.

Dataset   Sequences   Length (Frames)   GT Boxes   Trajectories
Train     77          19,985            10,681     185
Test      7           1944              3604       38
Table 2. Detection performance of YOLOv5 models. Bold indicates best performance.

Model     Precision   Recall   mAP@0.5   mAP@0.5:0.95
YOLOv5m   0.886       0.759    0.848     0.553
YOLOv5l   0.907       0.808    0.882     0.582
YOLOv5x   0.914       0.844    0.905     0.608
Table 3. Tracking performance of different algorithms across various test sequences.

Test Sequence             Tracking Method   HOTA     MOTA     AssA     IDF1
Sequence 1 (Length 273)   ByteTrack         15.082   17.204   10.580   25.210
                          OC-SORT           13.655   12.903   5.388    15.152
                          StrongSORT        62.334   74.194   62.334   85.366
                          StableSORT        63.325   74.194   63.325   85.542
Sequence 2 (Length 293)   ByteTrack         36.579   31.319   38.633   43.041
                          OC-SORT           50.197   36.081   51.538   49.948
                          StrongSORT        50.878   53.480   51.949   53.112
                          StableSORT        54.638   51.465   59.075   58.824
Sequence 3 (Length 271)   ByteTrack         20.576   39.786   11.688   23.349
                          OC-SORT           19.908   34.917   8.0646   16.286
                          StrongSORT        57.543   72.447   59.798   76.003
                          StableSORT        58.42    72.447   60.581   76.003
Sequence 4 (Length 275)   ByteTrack         31.654   54.487   20.800   41.901
                          OC-SORT           29.032   49.519   12.574   25.509
                          StrongSORT        67.611   86.378   62.634   76.818
                          StableSORT        68.167   87.179   63.63    77.92
Sequence 5 (Length 270)   ByteTrack         54.830   68.571   51.177   73.697
                          OC-SORT           44.864   59.560   26.909   40.670
                          StrongSORT        73.414   84.176   67.14    79.206
                          StableSORT        79.914   84.615   79.945   86.814
Sequence 6 (Length 271)   ByteTrack         62.513   63.127   65.323   81.049
                          OC-SORT           74.242   83.776   74.202   87.770
                          StrongSORT        75.306   83.776   76.983   89.224
                          StableSORT        75.157   83.776   76.768   89.224
Sequence 7 (Length 291)   ByteTrack         36.209   55.738   24.591   40.945
                          OC-SORT           48.033   48.087   33.213   52.199
                          StrongSORT        74.353   86.612   75.574   92.909
                          StableSORT        74.288   86.612   75.598   92.909
Overall Average           ByteTrack         36.778   47.176   31.827   47.027
                          OC-SORT           39.990   46.406   30.270   41.076
                          StrongSORT        65.920   77.295   65.202   78.948
                          StableSORT        67.701   77.184   68.417   81.034
Table 4. Ablation study on components of the proposed method.

Tracking Method                           HOTA     MOTA     AssA     IDF1
StrongSORT                                65.920   77.295   65.202   78.948
StrongSORT + B-IoU                        67.479   76.605   68.333   80.805
StrongSORT + OAKF                         66.113   77.347   65.529   79.444
StrongSORT + B-IoU + OAKF (StableSORT)    67.701   77.184   68.417   81.034
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
