
Search Results (1,424)

Search Parameters:
Keywords = video tracking

25 pages, 3673 KB  
Systematic Review
Recent Advances in Multi-Camera Computer Vision for Industry 4.0 and Smart Cities: A Systematic Review
by Carlos Julio Fierro-Silva, Carolina Del-Valle-Soto, Samih M. Mostafa and José Varela-Aldás
Algorithms 2026, 19(4), 249; https://doi.org/10.3390/a19040249 (registering DOI) - 25 Mar 2026
Abstract
The rapid deployment of surveillance cameras in urban, industrial, and domestic environments has intensified the need for intelligent systems capable of analyzing video streams beyond the limitations of single-camera setups. Unlike traditional single-camera approaches, multi-camera systems expand spatial coverage, reduce blind spots, and enable consistent tracking of people and objects across non-overlapping views, thereby improving robustness against occlusions and viewpoint changes. This article presents a comprehensive review of multi-camera vision systems published between 2020 and 2025, covering application domains including public security and biometrics, intelligent transportation, smart cities and IoT, healthcare monitoring, precision agriculture, industry and robotics, pan–tilt–zoom (PTZ) camera networks, and emerging areas such as retail and forensic analysis. The review synthesizes predominant technical approaches, including deep-learning-based detection, multi-target multi-camera tracking (MTMCT), re-identification (Re-ID), spatiotemporal fusion, and edge computing architectures. Persistent challenges are identified, particularly in inter-camera data association, scalability, computational efficiency, privacy preservation, and dataset availability. Emerging trends such as distributed edge AI, cooperative camera networks, and active perception are discussed to outline future research directions toward scalable, privacy-aware, and intelligent multi-camera infrastructures. Full article
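The multi-target multi-camera tracking (MTMCT) and re-identification (Re-ID) pipeline this review surveys hinges on associating appearance embeddings of the same person across cameras. A minimal sketch of that association step, assuming a greedy cosine-similarity matcher: the function names, the 0.7 gate, and the toy embeddings are illustrative assumptions, not taken from any surveyed system.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def associate_tracks(cam_a, cam_b, threshold=0.7):
    """Greedily match track embeddings across two cameras.

    cam_a, cam_b: dicts mapping track id -> appearance embedding.
    Returns (id_a, id_b) pairs whose similarity exceeds the
    threshold, each track matched at most once.
    """
    pairs = [(cosine_sim(ea, eb), ia, ib)
             for ia, ea in cam_a.items()
             for ib, eb in cam_b.items()]
    pairs.sort(key=lambda t: t[0], reverse=True)  # best matches first
    used_a, used_b, matches = set(), set(), []
    for sim, ia, ib in pairs:
        if sim >= threshold and ia not in used_a and ib not in used_b:
            used_a.add(ia)
            used_b.add(ib)
            matches.append((ia, ib))
    return matches
```

Real systems replace the toy embeddings with deep Re-ID features and the greedy pass with Hungarian assignment plus spatiotemporal gating, but the gated-similarity structure is the same.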

27 pages, 9896 KB  
Article
Refer-ASV: Referring Multi-Object Tracking in Autonomous Surface Vehicle Navigation Scenes
by Bin Xue, Qiang Yu, Kun Ding, Ying Wang, Shiming Xiang and Chunhong Pan
J. Imaging 2026, 12(4), 145; https://doi.org/10.3390/jimaging12040145 - 25 Mar 2026
Viewed by 56
Abstract
Water-surface perception is critical for autonomous surface vehicle navigation, where reliable tracking of task-relevant objects is essential for safe and robust operation. Referring multi-object tracking (RMOT) provides a flexible tracking paradigm by allowing users to specify objects of interest through natural language. However, existing RMOT benchmarks are mainly designed for ground or satellite scenes and fail to capture the distinctive visual and semantic characteristics of water-surface environments, including strong reflections, severe illumination variations, weak motion constraints, and a high proportion of small objects. To address this gap, we introduce Refer-ASV, the first RMOT dataset tailored for ASV navigation in complex water-surface scenes. Refer-ASV is constructed from real-world ASV videos and features diverse navigation scenes and fine-grained vessel categories. To facilitate systematic evaluation on Refer-ASV, we further propose RAMOT, an end-to-end baseline framework that enhances visual–language alignment and robustness throughout the tracking pipeline in challenging maritime environments. Experimental results show that RAMOT achieves a HOTA score of 39.97 on Refer-ASV, outperforming existing methods. Additional experiments on Refer-KITTI demonstrate its generalization ability across different scenes. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)

13 pages, 1473 KB  
Article
Enhancing Ophthalmologists’ Accuracy in Detecting Convergence Insufficiency Using AI-Derived Graphical Outputs
by Ahmad Khatib, Haneen Jabaly-Habib, Shmuel Raz and Ilan Shimshoni
J. Clin. Transl. Ophthalmol. 2026, 4(2), 9; https://doi.org/10.3390/jcto4020009 - 24 Mar 2026
Viewed by 101
Abstract
Background: Accurate evaluation of the Near Point of Convergence (NPC) is essential for diagnosing and managing convergence insufficiency (CI). Conventional assessment relies on the patient’s verbal feedback and the examiner’s visual observation, making it subjective and examiner-dependent. The AI-based MobileS platform, previously validated for both diagnosis and home-based therapy of CI, enables smartphone-based measurement and visualisation of NPC through eye tracking, without the need for verbal responses or additional equipment. This study, the third stage of our research programme, examined how ophthalmologists interpret NPC data when presented as videos versus AI-derived graphs. Methods: Twenty-two ophthalmologists completed an online questionnaire with 20 NPC test cases from the validated MobileS database, presented as both silent videos and AI-derived graphs. Accuracy was analysed using mixed-effects logistic regression, and continuous error was assessed using clustered bootstrap. Results: Graph-based interpretation showed higher odds of accurate NPC identification than video-based interpretation at the primary ±5 mm threshold (OR = 19.7, 95% CI: 13.50–28.74; p < 0.0001). Absolute error was lower for graphs than videos (Graphs − Videos: −22.73 mm; 95% CI: −26.88 to −18.59; p < 0.0001). “Uncertain” responses occurred in 28.2% of video-based assessments and 0% of graph-based assessments. Off-target errors decreased from 50.2% (videos) to 3.6% (graphs). Conclusions: AI-derived graphs of eye-movement data were associated with improved NPC estimation, suggesting a potential role in supporting clinical and tele-ophthalmology workflows. Full article

20 pages, 3850 KB  
Article
Optimization of Indoor Pedestrian Counting Based on Target Detection and Tracking
by Laihao Song, Litao Han, Jiayan Wang, Hengjian Feng and Ran Ji
ISPRS Int. J. Geo-Inf. 2026, 15(3), 136; https://doi.org/10.3390/ijgi15030136 - 21 Mar 2026
Viewed by 119
Abstract
Real-time, precise monitoring of the number and distribution of indoor personnel is crucial for building safety management, operational optimization, and personnel scheduling. However, narrow entrances and high-density passageways often lead to missed detections, false positives, and tracking failures in pedestrian detection, thereby reducing cross-line counting accuracy. Additionally, edge devices deployed in practical scenarios frequently process multiple video streams simultaneously, resulting in computational resource constraints. To address these challenges, this paper proposes a lightweight, enhanced multi-object pedestrian tracking and counting method tailored for indoor scenarios by optimizing deep learning models. Firstly, modular optimizations are applied to the YOLOv8n model to construct a more lightweight detector, RL_YOLOv8, reducing computational overhead while maintaining accuracy. Secondly, correlated pedestrian auxiliary prediction and pedestrian position change constraints are employed to mitigate ID switching, tracking interruptions, and trajectory jumps in dense scenes. Finally, a buffer zone auxiliary counting strategy is designed to further reduce missed detections of pedestrians crossing lines. Experimental results demonstrate that compared to the original detection-and-tracking-based line-crossing counting method, the improved approach effectively enhances counting accuracy and real-time performance, better meeting the requirements of practical intelligent security and crowd monitoring systems. Full article
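The buffer-zone auxiliary counting strategy described above can be sketched as a hysteresis counter: a crossing is registered only when a track moves from one side of the buffer to the other, so jitter near the line is not counted twice. Class name, buffer width, and the "above"/"below" convention are assumptions for illustration, not the authors' code.

```python
class LineCrossingCounter:
    """Count track crossings of a horizontal line with a buffer zone."""

    def __init__(self, line_y, buffer=10):
        self.low = line_y - buffer      # lower edge of buffer zone
        self.high = line_y + buffer     # upper edge of buffer zone
        self.side = {}                  # track id -> last confirmed side
        self.in_count = 0
        self.out_count = 0

    def update(self, track_id, y):
        """Feed one (track id, y-position) observation per frame."""
        if y < self.low:
            new_side = "above"
        elif y > self.high:
            new_side = "below"
        else:
            return                      # inside buffer: keep last side
        prev = self.side.get(track_id)
        if prev == "above" and new_side == "below":
            self.in_count += 1
        elif prev == "below" and new_side == "above":
            self.out_count += 1
        self.side[track_id] = new_side
```

Without the buffer a detection oscillating a pixel either side of the line produces spurious counts; the hysteresis removes that failure mode at the cost of a slightly delayed count.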

18 pages, 4159 KB  
Article
Advancing Breast Cancer Lesion Analysis in Real-Time Sonography Through Multi-Layer Transfer Learning and Adaptive Tracking
by Suliman Thwib, Radwan Qasrawi, Ghada Issa, Razan AbuGhoush, Hussein AlMasri and Marah Qawasmi
Mach. Learn. Knowl. Extr. 2026, 8(3), 82; https://doi.org/10.3390/make8030082 - 21 Mar 2026
Viewed by 155
Abstract
Background: Real-time and accurate analysis of breast ultrasounds is crucial for diagnosis but remains challenging due to issues like low image contrast and operator dependency. This study aims to address these challenges by developing an integrated framework for real-time lesion detection and tracking. Methods: The proposed system combines Contrast-Limited Adaptive Histogram Equalization (CLAHE) for image preprocessing, a transfer learning-enhanced YOLOv11 model following a continual learning paradigm for cross-center generalization in lesion detection, and a novel Detection-Based Tracking (DBT) approach that integrates Kernelized Correlation Filters (KCF) with periodic detection verification. The framework was evaluated on a dataset comprising 11,383 static images and 40 ultrasound video sequences, with a subset verified through biopsy and the remainder annotated by two radiologists based on radiological reports. Results: The proposed framework demonstrated high performance across all components. The transfer learning strategy (TL12) significantly improved detection outcomes, achieving a mean Average Precision (mAP) of 0.955, a sensitivity of 0.938, and an F1 score of 0.956. The DBT method (KCF + YOLO) achieved high tracking accuracy, with a success rate of 0.984, an Intersection over Union (IoU) of 0.85, and real-time operation at 54 frames per second (FPS) with a latency of 7.74 ms. The use of CLAHE preprocessing was shown to be a critical factor in improving both detection and tracking stability across diverse imaging conditions. Conclusions: This research presents a robust, fully integrated framework that bridges the gap between speed and accuracy in breast ultrasound analysis. The system’s high performance and real-time efficiency underscore its strong potential for clinical adoption to enhance diagnostic workflows, reduce operator variability, and improve breast cancer assessment. Full article
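The detection-based tracking idea here, a fast per-frame tracker periodically re-validated by a slower detector, follows a generic pattern that can be sketched independently of KCF and YOLO. In this sketch `tracker_step` and `detect` are stand-in callables (assumptions, not the authors' implementation); a real system would back them with an OpenCV KCF tracker and a YOLO model, re-initializing on drift when IoU falls below a gate.

```python
def iou(a, b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def track_with_verification(frames, tracker_step, detect, every=5, min_iou=0.5):
    """Run a fast tracker each frame, re-verifying with a detector.

    tracker_step(frame, box) -> predicted box for this frame (cheap).
    detect(frame) -> detector box (assumed reliable but slow).
    Every `every` frames the detector output replaces the tracked box
    when the two disagree (IoU below min_iou), mimicking periodic
    detection verification.
    """
    box = detect(frames[0])             # initialize from a detection
    boxes = [box]
    for i, frame in enumerate(frames[1:], start=1):
        box = tracker_step(frame, box)
        if i % every == 0:              # periodic detector check
            det = detect(frame)
            if iou(box, det) < min_iou:
                box = det               # drift detected: re-initialize
        boxes.append(box)
    return boxes
```

The `every` parameter trades latency for robustness: a smaller value catches drift sooner but spends more time in the expensive detector.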

21 pages, 1028 KB  
Article
Eating Habits, Knowledge and Perceptions of Functional Foods Among Primary School Students in Greece: Pilot Remote Educational Intervention Involving Children and Their Parents
by Irene Chrysovalantou Votsi and Antonios Ε. Koutelidakis
Appl. Sci. 2026, 16(6), 2983; https://doi.org/10.3390/app16062983 - 19 Mar 2026
Viewed by 181
Abstract
Background: Parental knowledge and perceptions towards Functional Foods (FFs) play a critical role in shaping children’s dietary behaviors. This study aimed to investigate dietary habits, FFs knowledge and perceptions among Greek primary school children and their parents and to evaluate the feasibility of a one-month pilot asynchronous nutrition education program. Methods: A cross-sectional study included 374 children aged 9–11 years and 159 parents from urban (Thessaloniki) and rural (Lemnos) areas. Children completed questionnaires on dietary habits, FFs knowledge and Mediterranean Diet (MD) adherence (KIDMED score), while parents provided sociodemographic information, BMI, dietary habits, FFs knowledge and perceptions. A pilot asynchronous nutrition education intervention was delivered via pre-recorded videos on FFs, the MD, portion sizes and food label interpretation, with participation tracked and program evaluation conducted among parents. Data was analyzed using IBM SPSS Statistics (version 28). Descriptive statistics were calculated, group differences were assessed with t-tests and ANOVA and associations between variables were examined using chi-square tests and Pearson correlations (p < 0.06). Results: Children showed moderate MD adherence, frequent fast-food and soft drinks consumption and low FF knowledge, with a substantial gap between perceived and actual understanding. Parental FF knowledge was uneven, higher among normal-weight participants and largely limited to fortified products. Positive associations were found between children’s and parents’ diet quality and natural FF consumption, as well as between parental and child physical activity. The asynchronous intervention was positively rated; substantial attrition was observed across sessions and follow-up, which limited the ability to assess the intervention’s effects on behavioral change. 
Conclusions: This study highlights critical gaps in FFs knowledge among families and demonstrates that asynchronous, family-based nutrition education is feasible but challenged by engagement attrition. Targeted interventions are needed to clarify FF concepts and promote healthier family dietary behaviors. Full article
(This article belongs to the Special Issue Functional Foods and Active Natural Products)

30 pages, 43984 KB  
Article
Edge-Graph Enhanced Network for Multi-Object Tracking in UAV Videos
by Yiming Xu, Hongbing Ji and Yongquan Zhang
Remote Sens. 2026, 18(6), 936; https://doi.org/10.3390/rs18060936 - 19 Mar 2026
Viewed by 165
Abstract
Multi-Object Tracking (MOT) is a fundamental research topic in the field of computer vision, with broad application potential in unmanned aerial vehicle (UAV) videos. However, existing methods still face significant challenges in detection discriminability and identity association stability due to the small scale and weak appearance of objects under aerial viewpoints, as well as complex background interference. To address these issues, we propose an Edge-Graph Enhanced Network (EGEN) for UAV aerial MOT, aiming to improve the performance of small object detection (SOD) and tracking in complex scenes. The framework follows a one-step tracking paradigm and consists of three main components: object detection, embedding feature extraction, and data association. In the detection stage, we design an Edge-Guided Gaussian Enhancement Module (EGGEM), which models edge relationships between objects and backgrounds from a global perspective and selectively enhances Gaussian features guided by edge information, thereby strengthening key structural features of small objects while suppressing background interference. In the embedding feature extraction stage, we develop a Graph-Guided Embedding Enhancement Module (GGEEM), which explicitly represents re-identification (ReID) embeddings as a graph structure and jointly models nodes and their neighborhood relationships to fully capture inter-object associations and enhance embedding discriminability. In the data association stage, we introduce a hierarchical two-stage association strategy to match objects with different confidence levels separately, improving tracking stability and robustness. Extensive experiments on the VisDrone, UAVDT, and self-constructed WildDrone datasets demonstrate that the proposed method significantly outperforms state-of-the-art approaches in both SOD and MOT, demonstrating strong generalization and practical applicability. Full article

22 pages, 3493 KB  
Article
Deepfake Detection Using Multimodal CLIP-Based SigLIP-2 Vision Transformers
by Joe Soundararajan and Dong Xu
AI 2026, 7(3), 115; https://doi.org/10.3390/ai7030115 - 19 Mar 2026
Viewed by 424
Abstract
Background: Deepfakes pose a growing threat to the integrity of visual media, motivating detectors that remain reliable as forgeries become increasingly realistic. Methods: We propose a deepfake detection framework built on CLIP-derived SigLIP-2 vision transformers and a multi-task design that jointly performs (i) classification and (ii) manipulated-region localization when pixel-level supervision is available. We evaluated the approach on three public benchmarks of increasing complexity—HiDF, SID_Set (SIDA), and CiFake—using each dataset’s official partitions where provided (SID_Set uses the predefined train/validation split) and a standardized preprocessing and training pipeline across experiments. Results: On HiDF, our model achieved strong performance on both video and image tracks (AUC up to 0.931 on video and 0.968 on images), yielding large gains relative to previously reported HiDF baselines under their published settings. On SID_Set, the model achieved 99.1% three-class accuracy (real/synthetic/tampered) and produced accurate localization masks for many tampered regions, while we explicitly documented the split protocol and leakage checks to support the validity of the evaluation. On CiFake, the model exceeded 95% accuracy and attained an AUC of 0.986. Conclusions: Overall, the results indicate that SigLIP-2 representations combined with multi-task training can deliver high detection accuracy and interpretable localization on challenging, realistic forgeries, while highlighting the importance of clearly stated evaluation protocols for fair comparison. Full article
(This article belongs to the Section AI Systems: Theory and Applications)

17 pages, 3986 KB  
Article
Miniature Multi-Target Tracking in Sonar Images Using Dual Trajectory Storage Method
by Zhen Huang, Peizhen Zhang, Rui Wang, Xiaoyan Xian, Qi Wang, Jiayu Hu and Qinyu Wu
J. Mar. Sci. Eng. 2026, 14(6), 568; https://doi.org/10.3390/jmse14060568 - 19 Mar 2026
Viewed by 135
Abstract
To address trajectory fragmentation and the trade-off between association efficiency and data integrity in underwater micro-scale multi-target sonar motion detection and tracking in video sequences, a multi-target motion detection and tracking algorithm based on a dual trajectory storage mechanism and adaptive trajectory association is proposed. The method first obtains target centroids through Gaussian mixture model foreground extraction, morphological post-processing, and connected region analysis. By employing a dual-storage structure consisting of real-time trajectories and complete trajectories, it dynamically adjusts association thresholds based on frame sampling rates to achieve adaptive distance calculation for trajectory tracking. Experimental results demonstrate that the proposed method achieves a completeness rate of 100% in recording valid trajectory point lengths. The adaptive threshold mechanism improves association accuracy to 96.07% while reducing trajectory fragmentation rate to 0.9%. The average association time is 0.28 ms per frame, enabling efficient real-time association while ensuring the integrity of motion trajectory tracking. This research contributes to enhancing real-time detection and tracking capabilities for micro-scale underwater targets and provides support for applications such as underwater security surveillance, marine resource exploration, and intelligent autonomous underwater vehicle navigation. Full article
(This article belongs to the Section Physical Oceanography)
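The dual-storage idea, active trajectories receiving new centroids while unmatched ones are archived intact, together with an association gate that scales with the frame interval, can be sketched as follows. The class name, the base gate value, and the linear gate scaling are assumptions for illustration; the paper's exact adaptive threshold is not specified here.

```python
import math

class DualTrajectoryTracker:
    """Nearest-neighbour trajectory association with dual storage.

    Active ("real-time") trajectories receive new centroids; when a
    trajectory goes unmatched it is moved whole to the completed
    store, so no recorded points are lost.
    """

    def __init__(self, base_gate=20.0):
        self.base_gate = base_gate
        self.active = {}        # track id -> list of (x, y) centroids
        self.completed = []     # archived trajectories
        self._next_id = 0

    def step(self, centroids, frame_interval=1):
        gate = self.base_gate * frame_interval  # gate grows with frame gap
        unmatched = list(centroids)
        still_active = {}
        for tid, traj in self.active.items():
            if unmatched:
                last = traj[-1]
                # claim the nearest unclaimed centroid if inside the gate
                best = min(unmatched, key=lambda c: math.dist(last, c))
                if math.dist(last, best) <= gate:
                    traj.append(best)
                    unmatched.remove(best)
                    still_active[tid] = traj
                    continue
            self.completed.append(traj)     # unmatched: archive whole track
        for c in unmatched:                 # leftover centroids start tracks
            still_active[self._next_id] = [c]
            self._next_id += 1
        self.active = still_active
```

Archiving the whole trajectory rather than discarding it is what gives the 100% completeness property the abstract emphasizes: fragmentation ends a track but never loses its points.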

12 pages, 1019 KB  
Proceeding Paper
Intelligent Drone Patrolling with Real-Time Object Detection and GPS-Based Path Adaptation
by Gurugubelli V. S. Narayana, Shiba Prasad Swain, Debabrata Pattnayak, Manas Ranjan Pradhan and P. Ankit Krishna
Eng. Proc. 2026, 124(1), 82; https://doi.org/10.3390/engproc2026124082 - 18 Mar 2026
Viewed by 217
Abstract
Background: The need for autonomous aerial surveillance originates from weaknesses in manual monitoring, such as late response, low scalability and rigid patrol plans. AI and GPS-driven smart aerial monitoring present an attractive solution for continuous adaptive wide-area surveillance. Objective: In this paper, we design and experimentally validate a low-cost drone-based autonomous mission patrolling system with waypoint navigation, real-time video backhauling, AI-based human/object detection and GPS path re-planning when an event occurs, ensuring the safety of patrol missions under battery constraints. Methods: The proposed architecture combines autonomous navigation and embedded flight control with online analog video streaming and ground-station-based computer vision processing. Deep-learning-based object detection is applied to live aerial video, and the system’s performance is tested at different altitudes, lighting states and GPS patrol plans. Results: Experimental results show that the proposed method obtains stable waypoint tracking with a clear real-time video downlink in patrol missions. The system adaptively modifies paths in reaction to detected events and commences safe return-to-home functionality during low-battery conditions. The detection model obtains a mean average precision of 87.4%, with an F1-score of 0.89 and real-time inference latency (20–25 ms per frame) that enables uninterrupted operation in practice during surveillance deployment. Conclusions: These results confirm the feasibility of low-cost, adaptive autonomous drone patrolling with AI-based detection and safe battery-aware operation in practical surveillance deployments. Full article
(This article belongs to the Proceedings of The 6th International Electronic Conference on Applied Sciences)

14 pages, 50163 KB  
Article
Stroke Asymmetry in Bird Wing Dynamics During Flight from Video Data
by Valentina Leontiuk, Innokentiy Kastalskiy, Waleed Khalid and Victor B. Kazantsev
Biomimetics 2026, 11(3), 212; https://doi.org/10.3390/biomimetics11030212 - 16 Mar 2026
Viewed by 570
Abstract
The aerodynamics of avian flight provides critical inspiration for the design of bioinspired aerial vehicles, yet the quantitative characterization of free-flight wing kinematics remains challenging. This study employs a neural-network-based motion tracking approach (DeepLabCut) to analyze wingbeat kinematics in free-flying birds from video data. We automatically digitize key wing points and reconstruct three-dimensional trajectories to quantify asymmetric flapping patterns. Our analysis reveals that while wing oscillations approximate sinusoidal motion, they exhibit statistically significant velocity differences between upstroke and downstroke phases, confirming the stroke asymmetry of avian flapping. Furthermore, using video of a flying frigatebird (Fregata ariel), we quantify the changes in the effective wing area throughout the wingbeat cycle, showing a ~19% variation that significantly impacts lift generation efficiency. These findings provide quantitative benchmarks for avian-inspired wing design and offer insights for optimizing flapping kinematics in bioinspired aerial systems, particularly for enhancing takeoff and landing capabilities in micro air vehicles. Full article
(This article belongs to the Section Development of Biomimetic Methodology)
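The stroke-asymmetry measurement, comparing mean vertical wingtip speed between downstroke and upstroke phases, reduces to a simple calculation once the tracked heights are available. This is an illustrative analysis over a toy sampled trajectory, not the authors' DeepLabCut pipeline; the function name and sampling interval are assumptions.

```python
def stroke_asymmetry(z, dt=1.0):
    """Mean vertical speed of downstroke vs upstroke phases.

    z: sampled wingtip heights over one or more wingbeat cycles.
    Returns (down_speed, up_speed), the mean |dz/dt| over frames where
    the wing moves down and up respectively. A ratio far from 1
    indicates asymmetric flapping.
    """
    down, up = [], []
    for z0, z1 in zip(z, z[1:]):
        v = (z1 - z0) / dt
        if v < 0:
            down.append(-v)             # downstroke frame
        elif v > 0:
            up.append(v)                # upstroke frame
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return mean(down), mean(up)
```

In practice the heights would come from 3D-reconstructed keypoint tracks, and a statistical test (as in the paper) would confirm the velocity difference rather than a single ratio.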

23 pages, 6668 KB  
Article
Development of a Visual SLAM-Based Autonomous UAV System for Greenhouse Plant Monitoring
by Jing-Heng Lin and Ta-Te Lin
Drones 2026, 10(3), 205; https://doi.org/10.3390/drones10030205 - 15 Mar 2026
Viewed by 371
Abstract
Autonomous monitoring is essential for precision agriculture in greenhouses, yet deploying unmanned aerial vehicles (UAVs) in confined, GPS-denied environments remains limited by payload, power, and cost constraints. This study developed and validated an autonomous UAV system for reliable, low-cost operation in such conditions. The proposed system employs a dual-link edge-computing architecture: a lightweight onboard controller handles flight control and sensor acquisition, while visual simultaneous localization and mapping (V-SLAM) is offloaded to an edge computer via the FPV video link. Phenotyping (flower detection and tracking/counting) is performed offline from the side-view RGB stream and does not participate in the flight control loop. Using muskmelon (Cucumis melo L.) flower development as a case study, the UAV autonomously executed daily missions for 27 days in a commercial greenhouse, performing flower detection and tracking to monitor phenological dynamics. Localization and control accuracy were evaluated against a validated UWB reference system, achieving 5.4~8.0 cm 2D RMSE for trajectory tracking and 12.7 cm translation RMSE for greenhouse mapping. This work demonstrates a practical architecture for autonomous monitoring in GPS-denied agricultural environments, with operational boundaries characterized through the sustained field deployment. The system’s design principles may extend to other indoor or communication-limited scenarios requiring lightweight, intelligent robotic operation. Full article
(This article belongs to the Section Drones in Agriculture and Forestry)
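The accuracy figures above are 2D RMSE values against a UWB reference track. A sketch of that metric over time-aligned position pairs (the function name is an assumption; this mirrors the reported metric, not the authors' evaluation code):

```python
import math

def trajectory_rmse_2d(estimated, reference):
    """2D RMSE between time-aligned trajectories.

    estimated, reference: equal-length lists of (x, y) positions,
    e.g. a V-SLAM trajectory vs a UWB ground-truth track.
    """
    if len(estimated) != len(reference):
        raise ValueError("trajectories must be time-aligned")
    sq = [(ex - rx) ** 2 + (ey - ry) ** 2
          for (ex, ey), (rx, ry) in zip(estimated, reference)]
    return math.sqrt(sum(sq) / len(sq))
```

Note this presumes the two tracks are already time-synchronized and expressed in the same frame; in practice an alignment step (timestamp interpolation plus a rigid transform) precedes the RMSE computation.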

32 pages, 10936 KB  
Article
PLM-Net: Perception Latency Mitigation Network for Vision-Based Lateral Control of Autonomous Vehicles
by Aws Khalil and Jaerock Kwon
Sensors 2026, 26(6), 1798; https://doi.org/10.3390/s26061798 - 12 Mar 2026
Viewed by 184
Abstract
This study introduces the Perception Latency Mitigation Network (PLM-Net), a modular deep learning framework designed to mitigate perception latency in vision-based imitation-learning lane-keeping systems. Perception latency, defined as the delay between visual sensing and steering actuation, can degrade lateral tracking performance and steering stability. While delay compensation has been extensively studied in classical predictive control systems, its treatment within vision-based imitation-learning architectures under constant and time-varying perception latency remains limited. Rather than reducing latency itself, PLM-Net mitigates its effect on control performance through a plug-in architecture that preserves the original control pipeline. The framework consists of a frozen Base Model (BM), representing an existing lane-keeping controller, and a Timed Action Prediction Model (TAPM), which predicts future steering actions corresponding to discrete latency conditions. Real-time mitigation is achieved by interpolating between model outputs according to the measured latency value, enabling adaptation to both constant and time-varying latency. The framework is evaluated in a closed-loop deterministic simulation environment under fixed-speed conditions to isolate the impact of perception latency. Results demonstrate significant reductions in steering error under multiple latency settings, achieving up to 62% and 78% reductions in Mean Absolute Error (MAE) for constant and time-varying latency cases, respectively. These findings demonstrate the architectural feasibility of modular latency mitigation for vision-based lateral control under controlled simulation settings. The project page including video demonstrations, code, and dataset is publicly released. Full article
(This article belongs to the Special Issue Intelligent Control Systems for Autonomous Vehicles)
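The real-time mitigation step described in the abstract above, interpolating between steering predictions made for discrete latency conditions, can be sketched as a simple linear blend. The function name, latency grid, and steering values below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def interpolate_action(latency_ms, latency_grid, actions):
    """Blend latency-conditioned steering predictions at the measured latency.

    latency_grid: sorted discrete latencies (ms) the prediction model covers;
    actions: predicted steering angle (rad) for each grid point.
    Latencies outside the grid are clamped to its endpoints.
    """
    lat = np.clip(latency_ms, latency_grid[0], latency_grid[-1])
    return float(np.interp(lat, latency_grid, actions))

grid = np.array([0.0, 50.0, 100.0, 150.0])   # hypothetical latency grid (ms)
preds = np.array([0.10, 0.14, 0.20, 0.28])   # hypothetical per-latency predictions (rad)
print(interpolate_action(75.0, grid, preds))  # midway between 0.14 and 0.20, ~0.17
```

Under time-varying latency, the same call is simply repeated each control cycle with the freshly measured latency value.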

27 pages, 2940 KB  
Article
A Unified Framework for Vehicle Detection, Tracking, and Counting Across Ground and Aerial Views Using Knowledge Distillation with YOLOv10-S
by Md Rezaul Karim Khan and Naphtali Rishe
Remote Sens. 2026, 18(5), 842; https://doi.org/10.3390/rs18050842 - 9 Mar 2026
Abstract
Accurate and reliable vehicle detection, tracking, and counting across different surveillance platforms are fundamental requirements for developing smart Traffic Management Systems (TMS) and promoting sustainable urban mobility. Recent advances in both ground-level surveillance and remote sensing using deep learning have opened new opportunities for extracting detailed vehicular information from high-resolution aerial and surveillance video data. This article presents a unified, real-time vehicle analysis framework that integrates lightweight deep learning–based detection, robust multi-object tracking, and trajectory-driven counting within a single modular pipeline. The proposed framework employs the "You Only Look Once" detector YOLOv10-S as its detection backbone and enhances its robustness through supervision-level knowledge distillation without introducing any architectural modifications. Temporal consistency is enforced using an observation-centric multi-object tracking algorithm (OC-SORT), enabling stable identity preservation under camera motion and dense traffic conditions. Vehicle counting is performed using a trajectory-based virtual gate strategy, reducing duplicate counts and improving counting reliability. Comprehensive experiments conducted on the UA-DETRAC and VisDrone benchmarks show that the proposed framework effectively balances detection performance, tracking robustness, counting accuracy, and real-time efficiency in both ground-based and aerial surveillance settings. Furthermore, cross-dataset evaluations under direct train–test transfer highlight the inherent challenges of domain shift while showing that knowledge distillation consistently improves robustness in detection, tracking identity consistency, and vehicle counting. Overall, this framework enables effective real-world traffic monitoring by adopting a scalable and practical system design, where reliability is prioritized over architectural complexity. Full article
(This article belongs to the Section Urban Remote Sensing)
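The trajectory-based virtual gate counting described above can be sketched as a segment-intersection test between each track's frame-to-frame displacement and a fixed gate line, counting each track ID at most once so that jittery detections do not produce duplicates. All names here are hypothetical sketch code, not the authors' implementation.

```python
def crossed_gate(p_prev, p_curr, a, b):
    """True if the segment p_prev -> p_curr strictly crosses the gate a -> b."""
    def side(p, q, r):
        # Sign of the cross product: which side of line p->q the point r lies on.
        return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (side(a, b, p_prev) * side(a, b, p_curr) < 0 and
            side(p_prev, p_curr, a) * side(p_prev, p_curr, b) < 0)

counted = set()  # track IDs already counted (prevents duplicate counts)

def update_count(track_id, p_prev, p_curr, gate):
    """Count a track once when its trajectory crosses the virtual gate."""
    if track_id not in counted and crossed_gate(p_prev, p_curr, *gate):
        counted.add(track_id)
    return len(counted)

gate = ((0.0, 0.0), (10.0, 0.0))              # gate endpoints in image coordinates
print(update_count(1, (5.0, -1.0), (5.0, 1.0), gate))  # track 1 crosses -> 1
```

In practice the per-frame points would come from the tracker's box centers, so identity switches in the tracker directly translate into counting errors, which is why stable ID preservation matters for this strategy.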

26 pages, 4796 KB  
Article
Research on Damage Identification of Suspension Bridges Based on Visual Image Recognition Technology
by Xingshun Liu and Kun Ma
Appl. Sci. 2026, 16(5), 2553; https://doi.org/10.3390/app16052553 - 6 Mar 2026
Abstract
To address the challenge of identifying damage in the hangers and bridge deck systems of long-span suspension bridges, this paper proposes a non-contact monitoring method based on video image recognition. The method extracts structural vibration displacement responses through video acquisition and image analysis and, combined with the strain mode change rate index, achieves damage localization, type identification, and severity assessment. The principle of extracting displacement time-history data from video images is first elaborated, and MATLAB-based computational code is developed, including pixel tracking and time-history curve generation. The eigensystem realization algorithm is used to identify displacement mode shapes, which are then converted into strain mode shapes via the central difference method. The strain mode change rate and its deviation rate are proposed as damage indicators: under undamaged conditions the curve is smooth; at damage locations peaks appear; the distribution range of the peaks distinguishes hanger damage from bridge deck cracks; and the deviation rate quantifies damage severity. The feasibility of the method is validated through finite element simulations and physical model experiments. The results show that hanger damage causes broad peaks while bridge deck cracks present narrow peaks, and the deviation rate increases monotonically with damage severity. Applied to an in-service suspension bridge, the method successfully identified hanger bending and weld cracking, with assessment results consistent with on-site inspections. This study demonstrates that strain mode change rate analysis based on video images enables damage identification without prior knowledge of the structural health state, relying solely on the damaged-state response. Offering advantages such as non-contact measurement, full-field monitoring, and no need for sensor deployment, it provides a new technical approach for the long-term monitoring of suspension bridge hanger systems. Full article
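The displacement-to-strain mode conversion via central differences, and a change-rate indicator of the kind described in the abstract above, can be sketched as follows. The exact normalization of the change rate is an assumed convention; the paper's definition may differ.

```python
import numpy as np

def strain_mode(displacement_mode, dx):
    """Curvature-based strain mode from a displacement mode shape phi,
    using the central difference eps_i ~ (phi_{i-1} - 2*phi_i + phi_{i+1}) / dx**2
    at interior measurement points (endpoints are dropped)."""
    phi = np.asarray(displacement_mode, dtype=float)
    return (phi[:-2] - 2.0 * phi[1:-1] + phi[2:]) / dx**2

def strain_mode_change_rate(eps_damaged, eps_intact):
    """Relative change of the strain mode; peaks flag candidate damage locations.
    Normalizing by the intact peak magnitude is an assumption of this sketch."""
    eps_damaged = np.asarray(eps_damaged, dtype=float)
    eps_intact = np.asarray(eps_intact, dtype=float)
    return np.abs(eps_damaged - eps_intact) / np.max(np.abs(eps_intact))

# Sanity check: a quadratic displacement mode has constant curvature 2.
print(strain_mode([0.0, 1.0, 4.0, 9.0, 16.0], dx=1.0))
```

A localized bump in the change-rate curve (a narrow or broad peak, per the abstract) would then be read off directly from the returned array.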
