Search Results (31)

Search Parameters:
Keywords = dynamic occlusion detector

21 pages, 9353 KB  
Article
YOLOv10n-Based Peanut Leaf Spot Detection Model via Multi-Dimensional Feature Enhancement and Geometry-Aware Loss
by Yongpeng Liang, Lei Zhao, Wenxin Zhao, Shuo Xu, Haowei Zheng and Zhaona Wang
Appl. Sci. 2026, 16(3), 1162; https://doi.org/10.3390/app16031162 (registering DOI) - 23 Jan 2026
Viewed by 38
Abstract
Precise identification of early peanut leaf spot is strategically significant for safeguarding oilseed supplies and reducing pesticide reliance. However, general-purpose detectors face severe domain adaptation bottlenecks in unstructured field environments due to small feature dissipation, physical occlusion, and class imbalance. To address this, this study constructs a dataset spanning two phenological cycles and proposes POD-YOLO, a physics-aware and dynamics-optimized lightweight framework. Anchored on the YOLOv10n architecture and adhering to a “data-centric” philosophy, the framework optimizes the parameter convergence path via a synergistic “Augmentation-Loss-Optimization” mechanism: (1) Input Stage: A Physical Domain Reconstruction (PDR) module is introduced to simulate physical occlusion, blocking shortcut learning and constructing a robust feature space; (2) Loss Stage: A Loss Manifold Reshaping (LMR) mechanism is established utilizing dual-branch constraints to suppress background gradients and enhance small target localization; and (3) Optimization Stage: A Decoupled Dynamic Scheduling (DDS) strategy is implemented, integrating AdamW with cosine annealing to ensure smooth convergence on small-sample data. Experimental results demonstrate that POD-YOLO achieves a 9.7% precision gain over the baseline and 83.08% recall, all while maintaining a low computational cost of 8.4 GFLOPs. This study validates the feasibility of exploiting the potential of lightweight architectures through optimization dynamics, offering an efficient paradigm for edge-based intelligent plant protection. Full article
(This article belongs to the Section Optics and Lasers)
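The abstract above pairs AdamW with cosine annealing in its Decoupled Dynamic Scheduling strategy. As a framework-agnostic sketch of the cosine-annealed schedule itself (the `lr_max`/`lr_min` values and step count are illustrative, not taken from the paper):

```python
import math

def cosine_annealed_lr(step, total_steps, lr_max=1e-3, lr_min=1e-5):
    """Cosine-annealed learning rate: decays smoothly from lr_max at
    step 0 to lr_min at total_steps, with no abrupt drops."""
    cos = math.cos(math.pi * step / total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + cos)

# The schedule starts at lr_max, ends at lr_min, and decreases monotonically.
lrs = [cosine_annealed_lr(s, 100) for s in range(101)]
```

In practice this is what optimizer schedulers (e.g. PyTorch's `CosineAnnealingLR`) compute per step; the smooth tail is what the abstract credits for stable convergence on small-sample data.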

23 pages, 21878 KB  
Article
STC-SORT: A Dynamic Spatio-Temporal Consistency Framework for Multi-Object Tracking in UAV Videos
by Ziang Ma, Chuanzhi Chen, Jinbao Chen and Yuhan Jiang
Appl. Sci. 2026, 16(2), 1062; https://doi.org/10.3390/app16021062 - 20 Jan 2026
Viewed by 80
Abstract
Multi-object tracking (MOT) in videos captured by Unmanned Aerial Vehicles (UAVs) is critically challenged by significant camera ego-motion, frequent occlusions, and complex object interactions. To address the limitations of conventional trackers that depend on static, rule-based association strategies, this paper introduces STC-SORT, a novel tracking framework whose core is a two-level reasoning architecture for data association. First, a Spatio-Temporal Consistency Graph Network (STC-GN) models inter-object relationships via graph attention to learn adaptive weights for fusing motion, appearance, and geometric cues. Second, these dynamic weights are integrated into a 4D association cost volume, enabling globally optimal matching across a temporal window. When integrated with an enhanced AEE-YOLO detector, STC-SORT achieves significant and statistically robust improvements on major UAV tracking benchmarks. It elevates MOTA by 13.0% on UAVDT and 6.5% on VisDrone, while boosting IDF1 by 9.7% and 9.9%, respectively. The framework also maintains real-time inference speed (75.5 FPS) and demonstrates substantial reductions in identity switches. These results validate STC-SORT as having strong potential for robust multi-object tracking in challenging UAV scenarios. Full article
(This article belongs to the Section Aerospace Science and Engineering)
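STC-SORT fuses motion, appearance, and geometric cues into one association cost with weights learned per frame. A minimal sketch of that fusion, with fixed illustrative weights and a greedy matcher standing in for the paper's globally optimal 4D matching:

```python
def fuse_costs(motion, appearance, geometry, w=(0.5, 0.3, 0.2)):
    """Blend per-pair cue costs (each in [0, 1]) into one association cost.
    In STC-SORT the weights are predicted adaptively by a graph network;
    here they are fixed constants for illustration."""
    wm, wa, wg = w
    n, m = len(motion), len(motion[0])
    return [[wm * motion[i][j] + wa * appearance[i][j] + wg * geometry[i][j]
             for j in range(m)] for i in range(n)]

def greedy_match(cost, max_cost=0.6):
    """Greedy one-to-one track/detection matching on the fused cost matrix
    (a simplification of the paper's global matching)."""
    pairs = sorted((cost[i][j], i, j) for i in range(len(cost))
                   for j in range(len(cost[0])))
    used_t, used_d, matches = set(), set(), []
    for c, i, j in pairs:
        if c <= max_cost and i not in used_t and j not in used_d:
            used_t.add(i); used_d.add(j); matches.append((i, j))
    return matches
```

With identical cue matrices `[[0.1, 0.9], [0.8, 0.2]]`, the fused cost equals the cue cost and greedy matching pairs track 0 with detection 0 and track 1 with detection 1.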

25 pages, 3879 KB  
Article
Robust Occluded Object Detection in Multimodal Autonomous Driving: A Fusion-Aware Learning Framework
by Zhengqing Li and Baljit Singh
Electronics 2026, 15(1), 245; https://doi.org/10.3390/electronics15010245 - 5 Jan 2026
Viewed by 282
Abstract
Reliable occluded object detection remains a persistent core challenge for autonomous driving perception systems, particularly in complex urban scenarios where targets are predominantly partially or fully obscured by static obstacles or dynamic agents. Conventional single-modality detectors often fail to capture adequate discriminative cues for robust recognition, while existing multimodal fusion strategies typically lack explicit occlusion modeling and effective feature completion mechanisms, ultimately degrading performance in safety-critical operating conditions. To address these limitations, we propose a novel Fusion-Aware Occlusion Detection (FAOD) framework that integrates explicit visibility reasoning with implicit cross-modal feature reconstruction. Specifically, FAOD leverages synchronized red–green–blue (RGB), light detection and ranging (LiDAR), and optional radar/infrared inputs, employs a visibility-aware attention mechanism to infer target occlusion states, and embeds a cross-modality completion module to reconstruct missing object features via complementary non-occluded modal information; it further incorporates an occlusion-aware data augmentation and annotation strategy to enhance model generalization across diverse occlusion patterns. Extensive evaluations on four benchmark datasets demonstrate that FAOD achieves state-of-the-art performance, including a +8.75% occlusion-level mean average precision (OL-mAP) improvement over existing methods on heavily occluded objects (O = 2) in the nuScenes dataset, while maintaining real-time efficiency. These findings confirm FAOD’s potential to advance reliable multimodal perception for next-generation autonomous driving systems in safety-critical environments. Full article
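The occlusion-aware augmentation mentioned above is commonly realized by masking random regions of training images so the model cannot rely on full visibility. A minimal sketch (parameters illustrative; images modeled as nested lists rather than tensors):

```python
import random

def occlude(image, frac=0.3, fill=0, rng=None):
    """Simulate partial occlusion by overwriting a random rectangle whose
    sides are roughly `frac` of the image's, a simplified stand-in for the
    paper's occlusion-aware augmentation. `image` is a list of row lists."""
    rng = rng or random.Random(0)
    h, w = len(image), len(image[0])
    oh, ow = max(1, int(h * frac)), max(1, int(w * frac))
    top, left = rng.randrange(h - oh + 1), rng.randrange(w - ow + 1)
    out = [row[:] for row in image]  # leave the input untouched
    for r in range(top, top + oh):
        for c in range(left, left + ow):
            out[r][c] = fill
    return out
```

On a 10×10 all-ones image with `frac=0.3`, exactly a 3×3 patch is zeroed; the original image is not modified.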

24 pages, 8304 KB  
Article
STAIR-DETR: A Synergistic Transformer Integrating Statistical Attention and Multi-Scale Dynamics for UAV Small Object Detection
by Linna Hu, Penghao Xue, Bin Guo, Yiwen Chen, Weixian Zha and Jiya Tian
Sensors 2025, 25(24), 7681; https://doi.org/10.3390/s25247681 - 18 Dec 2025
Viewed by 485
Abstract
Detecting small objects in unmanned aerial vehicle (UAV) imagery remains a challenging task due to the limited target scale, cluttered backgrounds, severe occlusion, and motion blur commonly observed in dynamic aerial environments. This study presents STAIR-DETR, a real-time synergistic detection framework derived from RT-DETR, featuring comprehensive enhancements in feature extraction, resolution transformation, and detection head design. A Statistical Feature Attention (SFA) module is incorporated into the neck to replace the original AIFI, enabling token-level statistical modeling that strengthens fine-grained feature representation while effectively suppressing background interference. The backbone is reinforced with a Diverse Semantic Enhancement Block (DSEB), which employs multi-branch pathways and dynamic convolution to enrich semantic expressiveness without sacrificing spatial precision. To mitigate information loss during scale transformation, an Adaptive Scale Transformation Operator (ASTO) is proposed by integrating Context-Guided Downsampling (CGD) and Dynamic Sampling (DySample), achieving context-aware compression and content-adaptive reconstruction across resolutions. In addition, a high-resolution P2 detection head is introduced to leverage shallow-layer features for accurate classification and localization of extremely small targets. Extensive experiments conducted on the VisDrone2019 dataset demonstrate that STAIR-DETR attains 41.7% mAP@50 and 23.4% mAP@50:95, outperforming contemporary state-of-the-art (SOTA) detectors while maintaining real-time inference efficiency. These results confirm the effectiveness and robustness of STAIR-DETR for precise small object detection in complex UAV-based imaging scenarios. Full article
(This article belongs to the Special Issue Dynamics and Control System Design for Robotics)

20 pages, 2397 KB  
Article
IMM-DeepSort: An Adaptive Multi-Model Kalman Framework for Robust Multi-Fish Tracking in Underwater Environments
by Ying Yu, Yan Li and Shuo Li
Fishes 2025, 10(11), 592; https://doi.org/10.3390/fishes10110592 - 18 Nov 2025
Viewed by 457
Abstract
Multi-object tracking (MOT) is a critical task in computer vision, with widespread applications in intelligent surveillance, behavior analysis, autonomous navigation, and marine ecological monitoring. In particular, accurate tracking of underwater fish plays a significant role in scientific fishery management, biodiversity assessment, and behavioral analysis of marine species. However, MOT remains particularly challenging due to low visibility, frequent occlusions, and the highly non-linear, burst-like motion of fish. To address these challenges, this paper proposes an improved tracking framework that integrates Interacting Multiple Model Kalman Filtering (IMM-KF) into DeepSORT, forming a self-adaptive multi-object tracking algorithm tailored for underwater fish tracking. First, a lightweight YOLOv8n (You Only Look Once v8 nano) detector is employed for target localization, chosen for its balance between detection accuracy and real-time efficiency in resource-constrained underwater scenarios. The tracking stage incorporates two complementary motion models—Constant Velocity (CV) for regular cruising and Constant Acceleration (CA) for rapid burst swimming. The IMM mechanism dynamically evaluates the posterior probability of each model given the observations, adaptively selecting and fusing predictions to maintain both responsiveness and stability. The proposed method is evaluated on a real-world underwater fish dataset collected from the East China Sea, comprising 19 species of marine fish annotated in YOLO format. Experimental results show that the IMM-DeepSORT framework outperforms the original DeepSORT in terms of MOTA, MOTP, and IDF1. In particular, it significantly reduces false matches and improves tracking continuity, demonstrating the method’s effectiveness and reliability in complex underwater multi-target tracking scenarios. Full article
(This article belongs to the Special Issue Technology for Fish and Fishery Monitoring)
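The core IMM mechanism described above keeps a probability per motion model (CV, CA), mixes those probabilities through a Markov switching matrix, and reweights them by each filter's measurement likelihood. A minimal sketch of that step (the numeric values are illustrative, not from the paper):

```python
def imm_update(mu, likelihoods, transition):
    """One IMM model-probability step: propagate prior model probabilities
    through the model-switching matrix (transition[j][i] = P(j -> i)),
    then reweight by each filter's measurement likelihood."""
    n = len(mu)
    pred = [sum(transition[j][i] * mu[j] for j in range(n)) for i in range(n)]
    post = [likelihoods[i] * pred[i] for i in range(n)]
    norm = sum(post)
    return [p / norm for p in post]

def fuse(predictions, mu):
    """Probability-weighted fusion of per-model state predictions."""
    return sum(p * m for p, m in zip(predictions, mu))

# Two models (CV, CA): a burst of acceleration makes the CA filter's
# likelihood dominate, shifting probability mass toward CA.
mu = imm_update([0.8, 0.2], [0.1, 0.9], [[0.95, 0.05], [0.05, 0.95]])
```

The fused state prediction `fuse([cv_pred, ca_pred], mu)` then leans toward whichever model currently explains the fish's motion, which is the adaptivity the abstract credits for handling burst swimming.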

29 pages, 21203 KB  
Article
Real-Time Parking Space Management System Based on a Low-Power Embedded Platform
by Kapyol Kim, Jongwon Lee, Incheol Jeong, Jungil Jung and Jinsoo Cho
Sensors 2025, 25(22), 7009; https://doi.org/10.3390/s25227009 - 17 Nov 2025
Viewed by 1105
Abstract
This study proposes an edge-centric outdoor parking management system that performs on-site inference on a low-power embedded device and outputs slot-level occupancy decisions in real time. A dataset comprising 13,691 images was constructed using two cameras capturing frames every 3–5 s under diverse weather and illumination conditions, and a YOLOv8-based detector was trained for vehicle recognition. Beyond raw detections, a temporal occupancy decision module is introduced to map detections to predefined slot regions of interest (ROIs) while applying temporal smoothing and occlusion-robust rules, thereby improving stability under rainy and nighttime conditions. When deployed on an AI-BOX edge platform, the proposed system achieves end-to-end latency p50/p95 of 195 ms and 400 ms, respectively, while sustaining 10 FPS at 3.35 W (2.99 FPS/W) during continuous 24-hour operation. Compared with conventional sensor-based architectures, the proposed design significantly reduces upfront deployment costs and recurring maintenance requirements. Furthermore, when integrated with dynamic pricing mechanisms, it enables accurate and automated fee calculation based on real-time occupancy data. Overall, the results demonstrate that the proposed approach provides a flexible, scalable, and cost-efficient foundation for next-generation smart parking infrastructure. Full article
(This article belongs to the Special Issue Edge Computing in IoT Networks Based on Artificial Intelligence)
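The temporal smoothing the abstract describes can be sketched as a per-slot majority vote over the last k frames, so a single missed or spurious detection does not flip the occupancy state (window size and tie behavior are illustrative assumptions, not the paper's exact rules):

```python
from collections import deque

class SlotSmoother:
    """Debounce one parking slot's occupancy with a majority vote over the
    last k frames; a simplified stand-in for the paper's temporal
    occupancy decision module."""
    def __init__(self, k=5):
        self.history = deque(maxlen=k)
        self.state = False

    def update(self, detected_in_roi):
        self.history.append(bool(detected_in_roi))
        votes = sum(self.history)
        # Flip only on a clear majority; on a tie, keep the previous state.
        if votes * 2 > len(self.history):
            self.state = True
        elif votes * 2 < len(self.history):
            self.state = False
        return self.state
```

With k=3, a one-frame detection dropout (e.g. a pedestrian briefly occluding the car) leaves the slot marked occupied; only a sustained run of empty frames releases it.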

29 pages, 8876 KB  
Article
Adaptive CNN Ensemble for Apple Detection: Enabling Sustainable Monitoring Orchard
by Alexey Kutyrev, Nikita Andriyanov, Dmitry Khort, Igor Smirnov and Valeria Zubina
AgriEngineering 2025, 7(11), 369; https://doi.org/10.3390/agriengineering7110369 - 3 Nov 2025
Viewed by 879
Abstract
Accurate detection of apples in orchards under variable weather and illumination remains a key challenge for precision horticulture. This study presents a flexible framework for automated ensemble selection and optimization of convolutional neural network (CNN) inference. The system integrates eleven ensemble methods, dynamically configured via Pareto-based multi-objective optimization balancing accuracy (mAP, F1-Score) and performance (FPS). A key innovation is its pre-deployment benchmarking whereby models are evaluated on a representative field sample to recommend a single optimal model or lightweight ensemble for real-time use. Experimental results show ensemble models consistently outperform individual detectors, achieving a 7–12% improvement in accuracy in complex scenes with occlusions and motion blur, underscoring the approach’s value for sustainable orchard management. Full article
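The Pareto-based selection above keeps every candidate model that is not beaten on both accuracy and speed at once. A minimal sketch (model names and numbers are invented for illustration):

```python
def pareto_front(candidates):
    """Return the non-dominated candidates, where each candidate is
    (name, accuracy, fps) and both objectives are maximized; a minimal
    sketch of the framework's multi-objective selection step."""
    front = []
    for name, acc, fps in candidates:
        dominated = any(a >= acc and f >= fps and (a > acc or f > fps)
                        for _, a, f in candidates)
        if not dominated:
            front.append((name, acc, fps))
    return front

models = [("yolov8n", 0.71, 120), ("yolov8s", 0.75, 80),
          ("ensemble-3", 0.80, 35), ("yolov5n", 0.69, 110)]
# yolov5n is dominated by yolov8n (lower accuracy and lower FPS),
# so only the other three survive as deployable trade-off points.
```

The deployment recommendation then reduces to picking one point on this front given the field device's FPS budget.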

11 pages, 1013 KB  
Proceeding Paper
A Comparative Evaluation of Classical and Deep Learning-Based Visual Odometry Methods for Autonomous Vehicle Navigation
by Armand Nagy and János Hollósi
Eng. Proc. 2025, 113(1), 16; https://doi.org/10.3390/engproc2025113016 - 29 Oct 2025
Viewed by 948
Abstract
This study introduces a comprehensive benchmarking framework for evaluating visual odometry (VO) methods, combining classical, learning-based, and hybrid approaches. We assess 52 configurations—spanning 19 keypoint detectors, 21 descriptors, and 4 matchers—across two widely used benchmark datasets: KITTI and EuRoC. Six key trajectory metrics, including Absolute Trajectory Error (ATE) and Final Displacement Error (FDE), provide a detailed performance comparison under various environmental conditions, such as motion blur, occlusions, and dynamic lighting. Our results highlight the critical role of feature matchers, with the LightGlue–SIFT combination consistently outperforming others across both datasets. Additionally, learning-based matchers can be integrated with classical pipelines, improving robustness without requiring end-to-end training. Hybrid configurations combining classical detectors with learned components offer a balanced trade-off between accuracy, robustness, and computational efficiency, making them suitable for real-world applications in autonomous systems and robotics. Full article
(This article belongs to the Proceedings of The Sustainable Mobility and Transportation Symposium 2025)
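Absolute Trajectory Error, one of the six metrics above, is conventionally the RMSE of per-pose position error between aligned trajectories. A minimal sketch that assumes the alignment has already been done (real VO pipelines first solve a rigid or similarity alignment):

```python
import math

def ate_rmse(gt, est):
    """Absolute Trajectory Error as the RMSE of per-pose position error
    between a ground-truth and an (already aligned) estimated trajectory,
    each a sequence of equal-length coordinate tuples."""
    sq = [sum((g - e) ** 2 for g, e in zip(p, q)) for p, q in zip(gt, est)]
    return math.sqrt(sum(sq) / len(sq))

gt  = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
est = [(0.0, 0.1), (1.0, -0.1), (2.0, 0.1)]
# Each pose is off by 0.1 m, so the ATE RMSE is 0.1.
```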

31 pages, 5190 KB  
Article
MDF-YOLO: A Hölder-Based Regularity-Guided Multi-Domain Fusion Detection Model for Indoor Objects
by Fengkai Luan, Jiaxing Yang and Hu Zhang
Fractal Fract. 2025, 9(10), 673; https://doi.org/10.3390/fractalfract9100673 - 18 Oct 2025
Viewed by 684
Abstract
With the rise of embodied agents and indoor service robots, object detection has become a critical component supporting semantic mapping, path planning, and human–robot interaction. However, indoor scenes often face challenges such as severe occlusion, large-scale variations, small and densely packed objects, and complex textures, making existing methods struggle in terms of both robustness and accuracy. This paper proposes MDF-YOLO, a multi-domain fusion detection framework based on Hölder regularity guidance. In the backbone, neck, and feature recovery stages, the framework introduces the CrossGrid Memory Block, Hölder-Based Regularity Guidance–Hierarchical Context Aggregation module, and Frequency-Guided Residual Block, achieving complementary feature modeling across the state space, spatial domain, and frequency domain. In particular, the HG-HCA module uses the Hölder regularity map as a guiding signal to balance the dynamic equilibrium between the macro and micro paths, thus achieving adaptive coordination between global consistency and local discriminability. Experimental results show that MDF-YOLO significantly outperforms mainstream detectors in metrics such as mAP@0.5, mAP@0.75, and mAP@0.5:0.95, achieving values of 0.7158, 0.6117, and 0.5814, respectively, while maintaining near real-time inference efficiency in terms of FPS and latency. Ablation studies further validate the independent and synergistic contributions of CGMB, HG-HCA, and FGRB in improving small-object detection, occlusion handling, and cross-scale robustness. This study demonstrates the potential of Hölder regularity and multi-domain fusion modeling in object detection, offering new insights for efficient visual modeling in complex indoor environments. Full article

21 pages, 11040 KB  
Article
DPDN-YOLOv8: A Method for Dense Pedestrian Detection in Complex Environments
by Yue Liu, Linjun Xu, Baolong Li, Zifan Lin and Deyue Yuan
Mathematics 2025, 13(20), 3325; https://doi.org/10.3390/math13203325 - 18 Oct 2025
Viewed by 966
Abstract
Accurate pedestrian detection from a robotic perspective has become increasingly critical, especially in complex environments such as crowded and high-density populations. Existing methods have low accuracy due to multi-scale pedestrians and dense occlusion in complex environments. To address the above drawbacks, a dense pedestrian detection network architecture based on YOLOv8n (DPDN-YOLOv8) was introduced for complex environments. The network aims to improve robots’ pedestrian detection in complex environments. Firstly, the C2f modules in the backbone network are replaced with C2f_ODConv modules integrating omni-dimensional dynamic convolution (ODConv) to enable the model’s multi-dimensional feature focusing on detected targets. Secondly, the up-sampling operator Content-Aware Reassembly of Features (CARAFE) is presented to replace the Up-Sample module to reduce the loss of the up-sampling information. Then, the Adaptive Spatial Feature Fusion detector head with four detector heads (ASFF-4) was introduced to enhance the system’s ability to detect small targets. Finally, to accelerate the convergence of the network, the Focaler-Shape-IoU is adopted as the bounding box regression loss function. The experimental results show that, compared with YOLOv8n, the mAP@0.5 of DPDN-YOLOv8 increases from 80.5% to 85.6%. Although model parameters increase from 3×10⁶ to 5.2×10⁶, it can still meet requirements for deployment on mobile devices. Full article
(This article belongs to the Special Issue Artificial Intelligence: Deep Learning and Computer Vision)
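Focaler-Shape-IoU and its relatives are all built on the plain Intersection-over-Union overlap term. A minimal sketch of that underlying computation (the shape/scale re-weighting the paper adds is omitted):

```python
def iou(a, b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2);
    the overlap term that Shape-IoU-style regression losses build on."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# An IoU-style regression loss is then 1 - iou(pred, target), optionally
# re-weighted by box shape and scale as in Focaler-Shape-IoU.
```

For two 2×2 boxes offset by (1, 1), the overlap is 1 and the union is 7, so the IoU is 1/7; disjoint boxes give 0.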

22 pages, 5772 KB  
Article
CF-DETR: A Lightweight Real-Time Model for Chicken Face Detection in High-Density Poultry Farming
by Bin Gao, Wanchao Zhang, Deqi Hao, Kaisi Yang and Changxi Chen
Animals 2025, 15(19), 2919; https://doi.org/10.3390/ani15192919 - 8 Oct 2025
Viewed by 784
Abstract
Reliable individual detection under dense and cluttered conditions is a prerequisite for automated monitoring in modern poultry systems. We propose CF-DETR, an end-to-end detector that builds on RT-DETR and is tailored to chicken face detection in production-like environments. CF-DETR advances three technical directions: Dynamic Inception Depthwise Convolution (DIDC) expands directional and multi-scale receptive fields while remaining lightweight, Polar Embedded Multi-Scale Encoder (PEMD) restores global context and fuses multi-scale information to compensate for lost high-frequency details, and a Matchability Aware Loss (MAL) aligns predicted confidence with localization quality to accelerate convergence and improve discrimination. On a comprehensive broiler dataset, CF-DETR achieves a mean average precision at IoU 0.50 of 96.9% and a mean average precision (IoU 0.50–0.95) of 62.8%. Compared to the RT-DETR baseline, CF-DETR reduces trainable parameters by 33.2% and lowers FLOPs by 23.0% while achieving 81.4 frames per second. Ablation studies confirm that each module contributes to performance gains and that the combined design materially enhances robustness to occlusion and background clutter. Owing to its lightweight design, CF-DETR is well-suited for deployment in real-time smart farming monitoring systems. These results indicate that CF-DETR delivers an improved trade-off between detection performance and computational cost for real-time visual monitoring in intensive poultry production. Full article
(This article belongs to the Section Poultry)

19 pages, 1948 KB  
Article
Graph-MambaRoadDet: A Symmetry-Aware Dynamic Graph Framework for Road Damage Detection
by Zichun Tian, Xiaokang Shao and Yuqi Bai
Symmetry 2025, 17(10), 1654; https://doi.org/10.3390/sym17101654 - 5 Oct 2025
Viewed by 1114
Abstract
Road-surface distress poses a serious threat to traffic safety and imposes a growing burden on urban maintenance budgets. While modern detectors based on convolutional networks and Vision Transformers achieve strong frame-level performance, they often overlook an essential property of road environments—structural symmetry within road networks and damage patterns. We present Graph-MambaRoadDet (GMRD), a symmetry-aware and lightweight framework that integrates dynamic graph reasoning with state–space modeling for accurate, topology-informed, and real-time road damage detection. Specifically, GMRD employs an EfficientViM-T1 backbone and two DefMamba blocks, whose deformable scanning paths capture sub-pixel crack patterns while preserving geometric symmetry. A superpixel-based graph is constructed by projecting image regions onto OpenStreetMap road segments, encoding both spatial structure and symmetric topological layout. We introduce a Graph-Generating State–Space Model (GG-SSM) that synthesizes sparse sample-specific adjacency in O(M) time, further refined by a fusion module that combines detector self-attention with prior symmetry constraints. A consistency loss promotes smooth predictions across symmetric or adjacent segments. The full INT8 model contains only 1.8 M parameters and 1.5 GFLOPs, sustaining 45 FPS at 7 W on a Jetson Orin Nano—eight times lighter and 1.7× faster than YOLOv8-s. On RDD2022, TD-RD, and RoadBench-100K, GMRD surpasses strong baselines by up to +6.1 mAP50:95 and, on the new RoadGraph-RDD benchmark, achieves +5.3 G-mAP and +0.05 consistency gain. Qualitative results demonstrate robustness under shadows, reflections, back-lighting, and occlusion. By explicitly modeling spatial and topological symmetry, GMRD offers a principled solution for city-scale road infrastructure monitoring under real-time and edge-computing constraints. Full article
(This article belongs to the Section Computer)

18 pages, 7743 KB  
Article
Improved Daytime Cloud Detection Algorithm in FY-4A’s Advanced Geostationary Radiation Imager
by Xiao Zhang, Song-Ying Zhao and Rui-Xuan Tang
Atmosphere 2025, 16(9), 1105; https://doi.org/10.3390/atmos16091105 - 20 Sep 2025
Viewed by 672
Abstract
Cloud detection is an indispensable step in satellite remote sensing of cloud properties and objects under the influence of cloud occlusion. Nevertheless, interfering targets such as snow and haze pollution are easily misjudged as clouds for most of the current algorithms. Hence, a robust cloud detection algorithm is urgently needed, especially for regions with high latitudes or severe air pollution. This paper demonstrated that the passive satellite detector Advanced Geosynchronous Radiation Imager (AGRI) onboard the FY-4A satellite has a great possibility to misjudge the dense aerosols in haze pollution as clouds during the daytime, and constructed an algorithm based on the spectral information of the AGRI’s 14 bands with a concise and high-speed calculation. This study adjusted the previously proposed cloud mask rectification algorithm of Moderate-Resolution Imaging Spectroradiometer (MODIS), rectified the MODIS cloud detection result, and used it as the accurate cloud mask data. The algorithm was constructed based on adjusted Fisher discrimination analysis (AFDA) and spectral spatial variability (SSV) methods over four different underlying surfaces (land, desert, snow, and water) and two seasons (summer and winter). This algorithm divides the identification into two steps to screen the confident cloud clusters and broken clouds, which are not easy to recognize, respectively. In the first step, channels with obvious differences in cloudy and cloud-free areas were selected, and AFDA was utilized to build a weighted sum formula across the normalized spectral data of the selected bands. This step transforms the traditional dynamic-threshold test on multiple bands into a simple test of the calculated summation value. In the second step, SSV was used to capture the broken clouds by calculating the standard deviation (STD) of spectra in every 3 × 3-pixel window to quantify the spectral homogeneity within a small scale. 
To assess the algorithm’s spatial and temporal generalizability, two evaluations were conducted: one examining four key regions and another assessing three different moments on a certain day in East China. The results showed that the algorithm has an excellent accuracy across four different underlying surfaces, insusceptible to the main interferences such as haze and snow, and shows a strong detection capability for broken clouds. This algorithm enables widespread application to different regions and times of day, with a low calculation complexity, indicating that a new method satisfying the requirements of fast and robust cloud detection can be achieved. Full article
(This article belongs to the Section Atmospheric Techniques, Instruments, and Modeling)
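The SSV step above flags a pixel when the standard deviation of its 3 × 3 spectral neighborhood is high: broken-cloud edges are spatially inhomogeneous while clear sky and solid cloud decks are smooth. A minimal sketch (the threshold is illustrative; the paper derives its thresholds per surface type and season):

```python
import statistics

def ssv_map(band, threshold):
    """Flag interior pixels whose 3x3 neighborhood has spectral standard
    deviation above `threshold`; a sketch of the spectral spatial
    variability (SSV) test for broken clouds."""
    h, w = len(band), len(band[0])
    flags = [[False] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            window = [band[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            flags[r][c] = statistics.pstdev(window) > threshold
    return flags
```

A uniform scene produces no flags, while a single bright pixel (a small cloud fragment against clear sky) raises the local standard deviation and gets flagged.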

16 pages, 881 KB  
Article
Text-Guided Spatio-Temporal 2D and 3D Data Fusion for Multi-Object Tracking with RegionCLIP
by Youlin Liu, Zainal Rasyid Mahayuddin and Mohammad Faidzul Nasrudin
Appl. Sci. 2025, 15(18), 10112; https://doi.org/10.3390/app151810112 - 16 Sep 2025
Viewed by 1359
Abstract
3D Multi-Object Tracking (3D MOT) is a critical task in autonomous systems, where accurate and robust tracking of multiple objects in dynamic environments is essential. Traditional approaches primarily rely on visual or geometric features, often neglecting the rich semantic information available in textual modalities. In this paper, we propose Text-Guided 3D Multi-Object Tracking (TG3MOT), a novel framework that incorporates Vision-Language Models (VLMs) into the YONTD architecture to improve 3D MOT performance. Our framework leverages RegionCLIP, a multimodal open-vocabulary detector, to achieve fine-grained alignment between image regions and textual concepts, enabling the incorporation of semantic information into the tracking process. To address challenges such as occlusion, blurring, and ambiguous object appearances, we introduce the Target Semantic Matching Module (TSM), which quantifies the uncertainty of semantic alignment and filters out unreliable regions. Additionally, we propose the 3D Feature Exponential Moving Average Module (3D F-EMA) to incorporate temporal information, improving robustness in noisy or occluded scenarios. Furthermore, the Gaussian Confidence Fusion Module (GCF) is introduced to weight historical trajectory confidences based on temporal proximity, enhancing the accuracy of trajectory management. We evaluate our framework on the KITTI dataset and compare it with the YONTD baseline. Extensive experiments demonstrate that although the overall HOTA gain of TG3MOT is modest (+0.64%), our method achieves substantial improvements in association accuracy (+0.83%) and significantly reduces ID switches (−16.7%). These improvements are particularly valuable in real-world autonomous driving scenarios, where maintaining consistent trajectories under occlusion and ambiguous appearances is crucial for downstream tasks such as trajectory prediction and motion planning. The code will be made publicly available. Full article
(This article belongs to the Section Computing and Artificial Intelligence)

28 pages, 2107 KB  
Article
A Scale-Adaptive and Frequency-Aware Attention Network for Precise Detection of Strawberry Diseases
by Kaijie Zhang, Yuchen Ye, Kaihao Chen, Zao Li and Hongxing Peng
Agronomy 2025, 15(8), 1969; https://doi.org/10.3390/agronomy15081969 - 15 Aug 2025
Viewed by 1108
Abstract
Accurate and automated detection of diseases is crucial for sustainable strawberry production. However, the small size, mutual occlusion, and high intra-class variance of symptoms in complex agricultural environments make this difficult, and mainstream deep learning detectors often perform poorly under these demanding conditions. To address this critical gap, we propose a novel detection framework designed for superior accuracy and robustness. Our framework introduces four key innovations. First, we propose a novel attention-driven detection head featuring our Parallel Pyramid Attention (PPA) module. Inspired by pyramid attention principles, the module's parallel multi-branch architecture is designed to overcome the limitations of serial processing: it simultaneously integrates global, local, and serial features to generate a fine-grained attention map, significantly improving the model's focus on targets of varying scales. Second, we enhance the core feature fusion blocks by integrating Monte Carlo Attention (MCAttn), effectively empowering the model to recognize targets across diverse scales. Third, to improve the feature representation capacity of the backbone without increasing the parameter overhead, we replace standard convolutions with Frequency-Dynamic Convolutions (FDConv), which construct highly diverse kernels in the frequency domain. Finally, we employ a Scale-Decoupled Loss function to optimize training dynamics: by adaptively re-weighting the localization and scale losses based on target size, we stabilize training and improve the precision of bounding box regression for small objects. Extensive experiments on a challenging strawberry disease dataset demonstrate that our proposed model achieves a mean average precision (mAP) of 81.1%, an improvement of 2.1% over the strong YOLOv12-n baseline, highlighting its practical value as an effective tool for intelligent disease protection. Full article
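The size-based re-weighting idea behind the Scale-Decoupled Loss can be sketched as below. The weighting function, the `small_frac` threshold, and the `boost` factor are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def scale_weight(box_area, img_area, small_frac=0.01, boost=2.0):
    """Up-weight localization loss for small targets: a box covering a
    vanishing fraction of the image gets weight `boost`, interpolating
    linearly down to 1.0 once it reaches `small_frac` of the image."""
    frac = np.clip(box_area / img_area, 1e-8, 1.0)
    t = np.clip(frac / small_frac, 0.0, 1.0)
    return boost - (boost - 1.0) * t

def scale_decoupled_loss(loc_losses, box_areas, img_area):
    """Re-weight per-box localization losses by target scale and average."""
    w = np.array([scale_weight(a, img_area) for a in box_areas])
    return float(np.mean(w * np.asarray(loc_losses)))
```

The effect is that small boxes contribute proportionally more gradient to the regression branch, which is one common way to stabilize training when small objects dominate the error.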
