MDPI - Publisher of Open Access Journals

29 pages, 9422 KB

Open Access Article

Context-Aware Identity Prediction for Anti-UAV Multi-Object Tracking in Remote Sensing Videos

by Bin Li, Tianyi Hu, Wenbo Wu and Jianming Hu

Remote Sens. 2026, 18(13), 2084; https://doi.org/10.3390/rs18132084 (registering DOI) - 25 Jun 2026

Anti-UAV multi-object tracking in remote sensing videos is challenging because UAV targets are small, weakly textured, and often affected by cluttered backgrounds, abrupt motion, occlusion, and intermittent visibility. To address these challenges, we formulate anti-UAV multi-object tracking as a context-aware identity prediction task, in which target identities and locations are inferred from historical trajectory priors instead of current-frame observations alone. Under this formulation, we propose a dual-track parallel tracking framework. The adaptive identity disambiguation (AID) module combines motion cues with appearance features according to their estimated reliability, improving short-term association when visual evidence is weak. In parallel, the motion-evolution temporal memory (METM) module models trajectory dynamics using motion anomaly detection and time-decayed memory, enabling spatiotemporal recovery after occlusion, temporary disappearance, or abrupt motion. The outputs of the two branches are integrated by a unified identity decision layer to produce stable tracking results. Experiments are conducted on the public 4th Anti-UAV Benchmark Track-3 and our newly constructed Anti-UAV Multi-Object Tracking dataset, AU-MOT. On the 4th Anti-UAV Benchmark Track-3, our method achieves 63.6% HOTA and 64.1% IDF1, outperforming the strongest competing method by 3.5% and 3.9%, respectively, while reducing identity switches and track fragments by 20.8% and 23.8%. On AU-MOT, it achieves 67.2% HOTA and 67.8% IDF1, with 20.2% fewer identity switches and 22.3% fewer track fragments. These results demonstrate its effectiveness under long-range observation, weak target appearance, cluttered backgrounds, abrupt motion, and intermittent target visibility.
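The abstract does not give formulas for the reliability-based cue combination or the time-decayed memory. As a rough illustration only, the sketch below blends a motion cost and an appearance cost with weights proportional to their reliabilities, and weights a memory entry by an exponential decay in its age. The function names, the proportional weighting, and the half-life parameter are all assumptions made here, not the AID or METM modules as published.

```python
def fuse_costs(motion_cost, appearance_cost, motion_rel, appearance_rel, eps=1e-9):
    """Blend two association costs with weights proportional to cue reliability.

    Hypothetical helper: reliabilities are assumed to be non-negative scores.
    """
    total = motion_rel + appearance_rel
    if total <= eps:
        # Neither cue is trusted: fall back to an equal blend.
        w_motion = 0.5
    else:
        w_motion = motion_rel / total
    return w_motion * motion_cost + (1.0 - w_motion) * appearance_cost


def decayed_weight(age_frames, half_life=10.0):
    """Weight of a memory entry that is `age_frames` old (assumed exponential decay)."""
    return 0.5 ** (age_frames / half_life)
```

With a weak appearance cue (reliability 0.1) and a strong motion cue (reliability 0.9), the fused cost stays close to the motion cost, which is the behaviour the abstract describes for small, weakly textured targets.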

(This article belongs to the Special Issue Object Detection in Remote Sensing Images Based on Artificial Intelligence)

18 pages, 2559 KB

Open Access Article

They Might Be Stalking Me: Edge-Based Multi-Object Tracking and Temporal Risk Modeling for Wearable Stalking Detection

by Aimoerfu, Yun Pan, Chunfang Li and Yao Deng

Electronics 2026, 15(12), 2657; https://doi.org/10.3390/electronics15122657 - 15 Jun 2026

Viewed by 244

Abstract

Computer vision (CV) has significantly advanced in object detection and multi-object tracking; however, its application to modeling safety-critical social behaviors for blind and low-vision (BLV) individuals remains limited. In particular, sustained behaviors such as stalking, characterized by persistent proximity and trajectory consistency, have not been systematically addressed within wearable assistive systems. To investigate this gap, we first conducted a formative user study combining semi-structured interviews and behavioral observations to identify safety concerns and wearable design requirements among BLV participants. The findings reveal recurring concerns regarding prolonged following behaviors and highlight the importance of privacy-preserving, socially unobtrusive device configurations. Guided by these insights, we develop a shoulder-slung wearable system integrating dual-camera sensing with an edge-based vision processing pipeline. We reformulate stalking detection as a temporal behavioral persistence problem built upon multi-object tracking (MOT). Leveraging FairMOT for identity-preserving tracking and monocular depth estimation for spatial modeling, we introduce an online temporal persistence-based risk scoring mechanism that accumulates proximity and directional consistency over time. The complete pipeline operates in real time on an embedded platform without cloud dependency. By bridging user-centered design and behavior-oriented visual inference, this work demonstrates how MOT outputs can be extended beyond identity preservation to support temporally coherent safety assessment in wearable assistive contexts.
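The abstract does not specify how proximity and directional consistency are accumulated. One simple way to picture a persistence-based score is a leaky integrator per tracked identity, sketched below. The function names, the decay factor, the equal weights, and the assumption that both inputs are normalised to [0, 1] are illustrative choices made here, not the paper's mechanism.

```python
def update_risk(risk, proximity, direction_consistency,
                decay=0.95, w_prox=0.5, w_dir=0.5):
    """One online step of a hypothetical persistence score for a single track.

    Inputs are assumed to lie in [0, 1], so the score also stays in [0, 1].
    """
    evidence = w_prox * proximity + w_dir * direction_consistency
    # Leaky integration: old risk fades, sustained evidence accumulates.
    return decay * risk + (1.0 - decay) * evidence


def accumulate(observations, **kwargs):
    """Run update_risk over a sequence of (proximity, direction_consistency) pairs."""
    risk = 0.0
    for proximity, direction_consistency in observations:
        risk = update_risk(risk, proximity, direction_consistency, **kwargs)
    return risk
```

Under these assumptions a person who is briefly close contributes little, while one who stays close and keeps moving in the same direction for many frames drives the score towards 1.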

(This article belongs to the Special Issue Deep/Machine Learning in Visual Recognition and Anomaly Detection)

29 pages, 3529 KB

Open Access Article

TrackRefine: A Plug-and-Play Decoupled Enhancement Framework for Online Multi-Object Tracking and Segmentation

by Longfei Qie, Chunlei Chai, Ruixue Wang, Chao Bi, Ruiqi Ma, Aijun Zhang and Jiakui Tang

Sensors 2026, 26(12), 3696; https://doi.org/10.3390/s26123696 - 10 Jun 2026

Viewed by 239

Abstract

Multi-object tracking and segmentation (MOTS) aims to jointly perform pixel-level instance segmentation and temporal identity association for multiple objects in video sequences. Existing online decoupled MOTS methods face several challenges in complex scenarios, including limited front-end mask quality, corruption of memory representations under prolonged occlusion, and unstable data association and trajectory recovery. To address these limitations, we propose TrackRefine, a plug-and-play decoupled enhancement framework. TrackRefine enhances overall performance through back-end refinement without modifying the architecture of the front-end instance segmenter or relying on additional end-to-end joint training. Specifically, we introduce a lightweight Fast GrabCut-based mask refinement module to optimize mask boundaries, a multimodal long-short-term memory bank that integrates appearance, semantic, and shape cues for identity modeling, and a progressive three-stage association strategy for stable matching and long-term trajectory recovery. Experimental results on MOTS20 show that TrackRefine achieves 69.4 sMOTSA, 82.7 MOTSA, and 478 Frag. Experimental results on KITTI MOTS show that it achieves 62.4/73.7 sMOTSA and 78.0/85.4 MOTSA for pedestrians and cars, respectively. Extensive experiments with different front-end instance segmenters verify its plug-and-play flexibility and decoupled design, while ablation studies confirm the effectiveness of each core module. These results show that TrackRefine provides an efficient and practical solution for online MOTS in complex scenarios.

(This article belongs to the Special Issue Smart Remote Sensing Images Processing for Sensor-Based Applications)

37 pages, 45876 KB

Open Access Article

Hierarchical Multi-Prototype Appearance Memory: A Plug-and-Play Module for Identity-Stable Online Multi-Object Tracking

by Wenning Zhang, Mintao Liu, Yangjie Cao, Jihao Cai, Chao Wang, Huili Xia and Kunming Xu

Electronics 2026, 15(11), 2357; https://doi.org/10.3390/electronics15112357 - 29 May 2026

Viewed by 282

Abstract

Online multi-object tracking (MOT) aims to maintain consistent target identities across video frames, yet it remains vulnerable to identity switches under occlusion and appearance variation. Many existing trackers rely on single-prototype exponential moving average (EMA) memory, which is efficient but prone to contamination, over-smoothing, and staleness. To address this issue, we propose Hierarchical Multi-Prototype Appearance Memory (HMP), a plug-and-play module for online MOT. HMP separates stable long-term identity anchors from short-term transitional evidence through a multi-prototype long-term memory and a short first-in-first-out (FIFO) queue. A unified joint reliability score governs memory writing and maintenance, and a frozen two-stage association strategy first performs stable primary matching and then allows conservative short-term recovery only on residual cases. Experiments on MOT17 and MOT20 show that HMP improves identity continuity while preserving competitive overall tracking quality. Controlled ablations further support the effectiveness of the proposed memory representation, reliability control, and staged evidence usage under fixed upstream modules.
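To make the contamination problem concrete, the sketch below shows a standard single-prototype EMA update next to a reliability-gated write with a bounded FIFO queue. The EMA formula itself is standard; the gating threshold, the function names, and the use of one prototype instead of several are simplifying assumptions made here and do not reproduce HMP.

```python
from collections import deque


def ema_update(prototype, feature, momentum=0.9):
    """Standard EMA: every new feature is blended into the single prototype."""
    return [momentum * p + (1.0 - momentum) * f
            for p, f in zip(prototype, feature)]


def gated_write(prototype, queue, feature, reliability,
                threshold=0.6, momentum=0.9):
    """Hypothetical gated write: unreliable features never touch the prototype.

    The short-term queue (a deque with maxlen) always records the feature and
    silently drops its oldest entry once full.
    """
    queue.append(feature)
    if reliability >= threshold:
        return ema_update(prototype, feature, momentum)
    return prototype
```

With plain EMA, a feature extracted during occlusion shifts the prototype on every frame; with the gate, the same feature is held only in the short queue until it ages out.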

(This article belongs to the Special Issue Advances in Image Processing and Computer Vision)

21 pages, 4214 KB

Open Access Article

AttriMOT: Semantic-Aware Multimodal 3D Multi-Object Tracking with Attribute-Level Alignment

by Youlin Liu, Mohammad Faidzul Nasrudin and Zainal Rasyid Mahayuddin

Symmetry 2026, 18(6), 907; https://doi.org/10.3390/sym18060907 - 26 May 2026

Viewed by 506

Abstract

3D multi-object tracking (MOT) in complex and dynamic environments remains challenging due to the time-varying reliability of sensor modalities, severe occlusions, and the difficulty of distinguishing instances with similar appearances. Existing methods mainly rely on coarse category-level semantics or heuristic multimodal fusion strategies, which limits fine-grained instance discrimination and leads to unstable trajectory association in complex scenarios. Moreover, current 3D MOT frameworks generally lack the ability to leverage attribute-level semantic information for robust tracking and semantic-aware target retrieval. To address these limitations, we propose AttriMOT, a semantic-aware multimodal 3D MOT framework. Specifically, a category semantic anchoring and competition suppression mechanism is introduced to preserve discriminative fine-grained attribute information among visually similar instances. An attribute-level multimodal alignment module establishes structured correspondences across 3D geometry, 2D appearance, and textual semantics, enabling robust cross-modal representation learning. Furthermore, a parameter-free adaptive confidence fusion strategy dynamically balances LiDAR- and camera-derived trajectory confidence to improve tracking stability under varying environmental conditions. In addition, a semantic-aware trajectory selector is designed to support text-specified target retrieval and trajectory locking, enabling controllable semantic-guided 3D tracking. Extensive experiments on challenging 3D MOT benchmarks demonstrate that AttriMOT consistently outperforms state-of-the-art methods in tracking accuracy and robustness. In particular, AttriMOT achieves a 1.33% improvement in HOTA and a 0.54% improvement in MOTA compared with the best existing method, while also providing enhanced semantic controllability and text-guided tracking capability.

(This article belongs to the Section Computer)

16 pages, 58544 KB

Open Access Article

D3SSTrack: Center-Focused State-Space Modeling for Monocular 3D Multi-Object Tracking

by Darius-Ovidiu Firan and Călin-Adrian Popa

Mathematics 2026, 14(10), 1737; https://doi.org/10.3390/math14101737 - 18 May 2026

Viewed by 223

Abstract

Monocular 3D multi-object tracking (3D MOT) remains challenging because it is hard to model how objects move over time and to keep correct identities without explicit depth information. In this context, we introduce D3SSTrack, a novel tracking-by-detection framework that integrates Mamba state-space modeling into the 3D tracking pipeline. At its core is the Solid State Track (SST) block, which extends the original Mamba block with dropout regularization and an additional projection layer to improve feature integration before temporal fusion. This design enables efficient modeling of long-range temporal dependencies while maintaining real-time performance at 38 FPS on a single GPU. The proposed tracker combines structured sequence modeling with effective temporal association, improving robustness against occlusions and abrupt motion changes. On the KITTI benchmark, D3SSTrack achieves the best sAMOTA (97.12%) and AMOTA (49.95%) among recent monocular 3D MOT methods, outperforming the strongest competing model, S3MOT, by 0.16% and 0.22%, respectively. Our results highlight the potential of state-space-based architectures for real-world monocular 3D MOT applications.

(This article belongs to the Special Issue Advanced Methods and Applications with Deep Learning in Object Recognition)

21 pages, 2140 KB

Open Access Article

Adaptive Multi-Level 3D Multi-Object Tracking with Transformer-Based Association and Scene-Aware Thresholds for Autonomous Driving

by Yongze Zhang, Feipeng Da and Haocheng Zhou

Machines 2026, 14(5), 472; https://doi.org/10.3390/machines14050472 - 23 Apr 2026

Viewed by 357

Abstract

3D multi-object tracking (MOT) for autonomous driving remains challenging due to frequent identity switches in crowded scenes, trajectory fragmentation during occlusions, and the difficulty of adapting association strategies to varying scene complexities. While existing methods rely on fixed geometric or appearance-based associations, they struggle to handle ambiguous cases and detection failures. We present an adaptive multi-level 3D MOT framework that achieves robust tracking through three key innovations: (1) multi-granularity temporal modeling that captures both fine-grained short-term motion and coarse long-term trends via dual-scale spatio-temporal attention, enabling accurate motion prediction across different object dynamics; (2) Transformer-based appearance association that employs cross-attention to model global inter-object relationships, resolving ambiguous associations in crowded scenarios where geometric cues alone fail; and (3) scene-adaptive learned thresholds that automatically adjust association strictness based on object density, motion complexity, and occlusion levels, avoiding the one-size-fits-all limitations of fixed thresholds. Our hierarchical four-level tracking strategy progressively handles cases from easy geometric matching (Level 1) to complex interval-frame recovery (Level 4), with SOT-based virtual detection generation bridging detector failures. Extensive experiments on the nuScenes benchmark demonstrate state-of-the-art performance.

(This article belongs to the Section Vehicle Engineering)

20 pages, 7422 KB

Open Access Article

MAAT: A Marine-Aware Adaptive Tracker for Robust and Real-Time Multi-Object Tracking in Maritime Environments

by Xinjie Han, Qi Han, Yunsheng Fan and Dongdong Mu

J. Mar. Sci. Eng. 2026, 14(8), 738; https://doi.org/10.3390/jmse14080738 - 16 Apr 2026

Viewed by 507

Abstract

Multi-object tracking (MOT) is a key technology for enabling autonomous navigation of unmanned surface vehicles (USVs), as it provides continuous perception of surrounding maritime targets and supports navigation decision-making. However, videos acquired on maritime platforms typically suffer from challenges such as platform-induced jitter and nonlinear object motion, which significantly degrade tracking performance. To address these challenges, this paper builds upon ByteTrack by incorporating an adaptive Kalman filtering scheme and proposing a density-aware association strategy, resulting in a novel tracker termed the Marine-Aware Adaptive Tracker (MAAT). Specifically, an adaptive Kalman filter is introduced to increase the contribution of high-confidence detections during the state update process, thereby enhancing the stability and robustness of state estimation. Furthermore, to better mitigate the frequent identity switches caused by severe jitter of the USV observation platform, a density-aware association strategy is proposed. This strategy dynamically adjusts the composition of the cost matrix according to the density of high-confidence targets, enabling more reliable data association under varying scene conditions. Finally, the proposed tracking algorithm is evaluated against several state-of-the-art methods on the Singapore Maritime Dataset. It achieves competitive performance, attaining 44.37 MOTA and 43.857 IDF1. Moreover, MAAT operates in real time, running at 41.4 FPS. The experimental results demonstrate that MAAT is capable of performing accurate and real-time multi-object tracking in dynamic maritime environments with surface fluctuations, thereby providing effective technical support for intelligent maritime surveillance applications.
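The abstract does not state how detection confidence enters the Kalman update. One common way to give high-confidence detections more influence is to shrink the measurement-noise variance as confidence rises; the scalar sketch below illustrates that idea. The (1 - confidence) scaling, the floor `min_scale`, and the function names are assumptions made for illustration, not MAAT's published formulation.

```python
def kalman_update(x, p, z, r):
    """Scalar Kalman measurement update.

    x: prior state estimate, p: prior variance,
    z: measurement, r: measurement-noise variance.
    """
    k = p / (p + r)              # Kalman gain
    x_new = x + k * (z - x)      # pull the estimate towards the measurement
    p_new = (1.0 - k) * p        # posterior variance shrinks
    return x_new, p_new


def confidence_scaled_update(x, p, z, confidence, base_r=1.0, min_scale=0.05):
    """Hypothetical variant: a confident detection gets a smaller noise variance."""
    scale = max(1.0 - confidence, min_scale)  # floor keeps r strictly positive
    return kalman_update(x, p, z, base_r * scale)
```

Starting from the same prior, a detection with confidence 0.9 moves the estimate much closer to the measurement than one with confidence 0.2, because its gain is larger.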

(This article belongs to the Special Issue New Technologies in Autonomous Ship Navigation)

23 pages, 3504 KB

Open Access Article

Spatially Time-Based Robust Tracking and Re-Identification of Kindergarten Students: A Hybrid Deep Learning Framework Combining YOLOv8n and Vision Transformer (ViT)

by Md. Rahatul Islam, Yui Kataoka, Keisuke Teramoto and Keiichi Horio

J. Imaging 2026, 12(4), 150; https://doi.org/10.3390/jimaging12040150 - 30 Mar 2026

Viewed by 977

Abstract

Detection, tracking, and re-identification (ReID) of children wearing similar uniforms in a kindergarten environment is a highly challenging computer vision problem. Traditional surveillance systems or simple convolutional neural network (CNN) models often fail to distinguish children in crowds and under occlusion. To address this challenge, this study proposes a novel hybrid framework combining YOLOv8 and Vision Transformer (ViT). Using YOLOv8 for detection and ViT for global feature extraction, we trained the model on a custom dataset of 31,521 images, achieving an overall accuracy of 93.75%, and on the public MOT20 benchmark dataset of 28,630 images, achieving an overall accuracy of 96.02%. In tracking, the system achieved 86.7% MOTA and 99.7% IDF1. This high IDF1 score indicates that the model is highly effective in preventing identity switches. The main novelty of this study is the behavioral analysis of children beyond the boundaries of surveillance, in which we measure walking distance, trajectory, and screen time. Finally, through cross-dataset comparison with the MOT20 public benchmark, we demonstrated that our proposed customized model is much more effective than current state-of-the-art methods in overcoming the domain gap in specific environments such as kindergartens.

(This article belongs to the Special Issue Recent Progress and Challenges in Computer Vision and Machine Learning)

28 pages, 3863 KB

Open Access Article

DeepSORT-OCR: Design and Application Research of a Maritime Ship Target Tracking Algorithm Incorporating Hull Number Features

by Jing Ma, Xihang Su, Kehui Xu, Hongliang Yin, Zhihong Xiao, Jiale Wang and Peng Liu

Mathematics 2026, 14(6), 1062; https://doi.org/10.3390/math14061062 - 20 Mar 2026

Viewed by 491

Abstract

Maritime ship target tracking plays an important role in applications such as maritime patrol and maritime surveillance. However, complex sea conditions, similar target appearances, and long-distance imaging often lead to target identity confusion and unstable trajectories. To address these issues, this paper proposes DeepSORT-OCR, a ship multi-object tracking algorithm that integrates hull number semantic features. Based on the YOLO detection framework and the DeepSORT tracking architecture, a CBAM-ResNet network is introduced to enhance the representation of ship appearance features. An Inner-SIoU metric is adopted to improve the geometric matching of slender ship targets, while an LSTM-Adaptive Kalman Filter is employed to model the nonlinear motion patterns of ships and improve trajectory prediction stability. In addition, a Hull Number Feature Extraction module is designed to recognize ship hull numbers using OCR and match them with a hull number database. The extracted hull number semantic features are dynamically fused with visual appearance features to strengthen identity constraints during target association. The experimental results show that the proposed method achieves an MOTA of 66.53% on the MOT16 dataset, representing an improvement of 5.13% over DeepSORT. On the self-constructed maritime ship dataset, the method achieves an MOTA of 70.89% and an MOTP of 80.84%. Furthermore, on the hull-number subset, the MOTA increases to 77.18%, an improvement of 7.31% compared with DeepSORT, while the number of ID switches is significantly reduced. In addition, experiments conducted on pure real data, pure synthetic data, and cross-domain evaluation settings demonstrate the stability and strong generalization capability of the proposed algorithm under different data distributions. The proposed method effectively improves the stability and identity consistency of ship multi-object tracking in complex maritime environments.

(This article belongs to the Special Issue Control Theory for Multi-Agent Systems: Recent Advances and Applications)

30 pages, 43984 KB

Open Access Article

Edge-Graph Enhanced Network for Multi-Object Tracking in UAV Videos

by Yiming Xu, Hongbing Ji and Yongquan Zhang

Remote Sens. 2026, 18(6), 936; https://doi.org/10.3390/rs18060936 - 19 Mar 2026

Viewed by 604

Abstract

Multi-Object Tracking (MOT) is a fundamental research topic in the field of computer vision, with broad application potential in unmanned aerial vehicle (UAV) videos. However, existing methods still face significant challenges in detection discriminability and identity association stability due to the small scale and weak appearance of objects under aerial viewpoints, as well as complex background interference. To address these issues, we propose an Edge-Graph Enhanced Network (EGEN) for UAV aerial MOT, aiming to improve the performance of small object detection (SOD) and tracking in complex scenes. The framework follows a one-step tracking paradigm and consists of three main components: object detection, embedding feature extraction, and data association. In the detection stage, we design an Edge-Guided Gaussian Enhancement Module (EGGEM), which models edge relationships between objects and backgrounds from a global perspective and selectively enhances Gaussian features guided by edge information, thereby strengthening key structural features of small objects while suppressing background interference. In the embedding feature extraction stage, we develop a Graph-Guided Embedding Enhancement Module (GGEEM), which explicitly represents re-identification (ReID) embeddings as a graph structure and jointly models nodes and their neighborhood relationships to fully capture inter-object associations and enhance embedding discriminability. In the data association stage, we introduce a hierarchical two-stage association strategy to match objects with different confidence levels separately, improving tracking stability and robustness. Extensive experiments on the VisDrone, UAVDT, and self-constructed WildDrone datasets show that the proposed method significantly outperforms state-of-the-art approaches in both SOD and MOT, indicating strong generalization and practical applicability.

(This article belongs to the Special Issue Advanced Image Processing Algorithms for Object Detection and Tracking in Aerial and Satellite Imagery)

24 pages, 28757 KB

Open Access Article

TASONet: A Spatial Enhancement and Temporal Modeling Framework for UAV Small Object Tracking

by Ruiqi Ma, Changcai Lai, Qinghua Sheng, Zehao Tao and Xiaorun Li

Remote Sens. 2026, 18(4), 561; https://doi.org/10.3390/rs18040561 - 11 Feb 2026

Viewed by 818

Abstract

Multi-object tracking (MOT) in UAV imagery is challenged by weak feature representation of small objects due to limited resolution, which leads to frequent missed detections. However, enhancing small-object features often amplifies background noise and increases false positives. To address this contradiction, we propose the Temporal Aware Small Object Enhancement Network (TASONet), which integrates spatial enhancement and temporal modeling for robust tracking. The Small Object Enhancement (SOE) module combines depthwise separable convolutions with contrast-aware attention mechanisms (SimAM and LCDAttn) to improve local discriminability. It further incorporates the Small Target Enhancement Path (STEP), which uses motion-difference cues and a confidence-adaptive suppression strategy to strengthen spatial features while mitigating noise. The Temporal Enhancement Module (TEM), consisting of Temporal Feature Alignment (TFA) and a Target Memory Unit (TMU), aggregates multi-frame information through adaptive inter-frame fusion and memory of high-confidence historical features, improving temporal consistency and reducing false positives potentially introduced by SOE. Experiments show that TASONet achieves significant gains over state-of-the-art methods: on UAVDT, MOTA increases from 68.33 to 75.97 and IDF1 from 83.50 to 88.51; on VisDrone-MOT, MOTA rises from 61.15 to 73.52 with an IDF1 of 88.83. These results validate the effectiveness of jointly enhancing spatial features and temporal coherence for UAV small-object MOT.

20 pages, 4296 KB

Open Access Article

Occlusion-Aware Multi-Object Tracking in Vineyards via SAM-Based Visibility Modeling

by Yanan Wang, Hagsong Kim, Muhammad Fayaz, Lien Minh Dang, Hyeonjoon Moon and Kang-Won Lee

Electronics 2026, 15(3), 621; https://doi.org/10.3390/electronics15030621 - 1 Feb 2026

Viewed by 642

Abstract

Multi-object tracking (MOT) in vineyard environments remains challenging due to frequent and long-term occlusions caused by dense foliage, overlapping grape clusters, and complex plant structures. These characteristics often result in identity switches and fragmented trajectories when using conventional tracking methods. This paper proposes OATSAM-Track, an occlusion-aware multi-object tracking framework designed for vineyard fruit monitoring. The framework integrates lightweight MobileSAM-assisted instance segmentation to estimate target visibility and occlusion severity. Occlusion-state reasoning is further incorporated into temporal association, appearance memory updating, and identity recovery. An adaptive temporal memory mechanism selectively updates appearance features according to predicted occlusion states, reducing identity drift under partial and severe occlusions. To facilitate occlusion-aware evaluation, an extended vineyard multi-object tracking dataset (GrapeOcclusionMOTS) with SAM-refined instance masks and fine-grained occlusion annotations is constructed. The experimental results demonstrate that OATSAM-Track improves identity consistency and tracking robustness compared to representative baseline trackers, particularly under medium and severe occlusion scenarios. These results indicate that explicit occlusion modeling is beneficial for reliable fruit monitoring in precision agriculture.

(This article belongs to the Special Issue Efficient Learning for Computer Vision: Few-Shot, Weakly Supervised and Unsupervised Approaches)

23 pages, 21878 KB

Open Access Article

STC-SORT: A Dynamic Spatio-Temporal Consistency Framework for Multi-Object Tracking in UAV Videos

by Ziang Ma, Chuanzhi Chen, Jinbao Chen and Yuhan Jiang

Appl. Sci. 2026, 16(2), 1062; https://doi.org/10.3390/app16021062 - 20 Jan 2026

Cited by 1 | Viewed by 749

Abstract

Multi-object tracking (MOT) in videos captured by Unmanned Aerial Vehicles (UAVs) is critically challenged by significant camera ego-motion, frequent occlusions, and complex object interactions. To address the limitations of conventional trackers that depend on static, rule-based association strategies, this paper introduces STC-SORT, a novel tracking framework whose core is a two-level reasoning architecture for data association. First, a Spatio-Temporal Consistency Graph Network (STC-GN) models inter-object relationships via graph attention to learn adaptive weights for fusing motion, appearance, and geometric cues. Second, these dynamic weights are integrated into a 4D association cost volume, enabling globally optimal matching across a temporal window. When integrated with an enhanced AEE-YOLO detector, STC-SORT achieves significant and statistically robust improvements on major UAV tracking benchmarks. It elevates MOTA by 13.0% on UAVDT and 6.5% on VisDrone, while boosting IDF1 by 9.7% and 9.9%, respectively. The framework also maintains real-time inference speed (75.5 FPS) and demonstrates substantial reductions in identity switches. These results validate STC-SORT as having strong potential for robust multi-object tracking in challenging UAV scenarios.

(This article belongs to the Section Aerospace Science and Engineering)

24 pages, 13052 KB

Open Access Article

FGO-PMB: A Factor Graph Optimized Poisson Multi-Bernoulli Filter for Accurate Online 3D Multi-Object Tracking

by Jingyi Jin, Jindong Zhang, Yiming Wang and Yitong Liu

Sensors 2026, 26(2), 591; https://doi.org/10.3390/s26020591 - 15 Jan 2026

Viewed by 763

Abstract

Three-dimensional multi-object tracking (3D MOT) plays a vital role in enabling reliable perception for LiDAR-based autonomous systems. However, LiDAR measurements often exhibit sparsity, occlusion, and sensor noise that lead to uncertainty and instability in downstream tracking. To address these challenges, we propose FGO-PMB, a unified probabilistic framework that integrates the Poisson Multi-Bernoulli (PMB) filter from Random Finite Set (RFS) theory with Factor Graph Optimization (FGO) for robust LiDAR-based object tracking. In the proposed framework, object states, existence probabilities, and association weights are jointly formulated as optimizable variables within a factor graph. Four factors, including state transition, observation, existence, and association consistency, are formulated to uniformly encode the spatio-temporal constraints among these variables. By unifying the uncertainty modeling capability of RFS with the global optimization strength of FGO, the proposed framework achieves temporally consistent and uncertainty-aware estimation across continuous LiDAR scans. Experiments on KITTI and nuScenes indicate that the proposed method achieves competitive 3D MOT accuracy while maintaining real-time performance.

(This article belongs to the Special Issue Recent Advances in LiDAR Sensing Technology for Autonomous Vehicles)

Search Results (138)
