Search Results (1,117)

Search Parameters:
Keywords = video object detection

25 pages, 2494 KB  
Article
User Evaluation by Remote Pilots of Two Types of Detect-and-Avoid Systems: Remain Well Clear Bands Versus Route Guidance
by Sybert Stroeve, Ana Tanevska, Mirco Kroon and Ginevra Castellano
Aerospace 2026, 13(3), 295; https://doi.org/10.3390/aerospace13030295 - 20 Mar 2026
Abstract
The remain well clear (RWC) function of a detect-and-avoid (DAA) system provides guidance to a remote pilot (RP) of a remotely piloted aircraft to prevent a conflict from developing into a collision hazard. The ACAS Xu standard is a decision support system that uses RWC bands to advise an RP which headings to avoid. A recent A* DAA system is a resolution support system that advises an RP which route to take. The objective of this study is to obtain structured feedback from professional RPs on the horizontal RWC guidance of both systems. Nine RPs participated in online experiments in which they were shown videos of DAA displays of encounter scenarios between two aircraft. At various stages the RPs were asked for their opinions about transparency, pilot manoeuvring, situation awareness, display orientation, risk perception, competence, trust, and overall system preference. The results show that the scores for competence, trust, and pilot manoeuvring were significantly higher, and the score for perceived risk was significantly lower, for the RWC route guidance. Overall, 89% of the RPs preferred the RWC route guidance, while one RP had no preference. An implication of the uncertainty in pilot behaviour is that ACAS Xu model-based optimisation may provide suboptimal RWC guidance strategies, whereas the A* DAA optimisation can be managed effectively. Full article
(This article belongs to the Section Air Traffic and Transportation)
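The contrast studied here is between band-style guidance (headings to avoid) and route-style guidance (a path to fly). As a minimal, purely illustrative sketch of the band concept, the snippet below checks whether a candidate heading falls inside a horizontal RWC avoid band; the band limits and headings are hypothetical and not taken from the study.

```python
# Illustrative only: RWC avoid bands as heading intervals (degrees).
def normalize(deg: float) -> float:
    """Wrap a heading into [0, 360)."""
    return deg % 360.0

def in_band(heading: float, band: tuple[float, float]) -> bool:
    """True if heading lies inside an avoid band (start, end), clockwise."""
    h = normalize(heading)
    start, end = normalize(band[0]), normalize(band[1])
    if start <= end:
        return start <= h <= end
    return h >= start or h <= end  # band wraps through north

avoid_bands = [(30.0, 75.0), (340.0, 10.0)]  # hypothetical avoid bands
for hdg in (45.0, 90.0, 350.0):
    blocked = any(in_band(hdg, b) for b in avoid_bands)
    print(f"heading {hdg:5.1f} deg -> {'avoid' if blocked else 'clear'}")
```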
14 pages, 18688 KB  
Article
Outdoor Motion Capture at Scale
by Michael Zwölfer, Martin Mössner, Helge Rhodin and Werner Nachbauer
Sensors 2026, 26(6), 1951; https://doi.org/10.3390/s26061951 - 20 Mar 2026
Abstract
Capturing kinematic data in outdoor sports is challenging, as motions span large capture volumes and occur under difficult environmental conditions. Video-based approaches, particularly with pan–tilt–zoom cameras, offer a practical solution, but the extensive manual post-processing required limits their use to short sequences and few athletes. This study presents a motion capture pipeline that automates the detection of both reference points and sport-specific keypoints to overcome this limitation. The field test employed eight cameras covering a 250×80×30 m capture volume with nearly 300 reference points. Ten state-certified ski instructors performed eight standardized maneuvers. Reference points were localized through a hybrid approach combining YOLO object detection and ArUco marker identification. AlphaPose was fine-tuned on a new manually annotated dataset to detect skier-specific keypoints (e.g., skis, poles) alongside anatomical landmarks. Continuous frame-wise calibration and 3D reconstruction were performed using Direct Linear Transformation. Evaluation compared automated detections with manual annotations. Automated reference point detection achieved a mean localization error of 4.1 pixels (0.1% of 4K width) and reduced 3D segment-length variation by 23%. The skier-specific keypoint model reached 98% PCK, mAP of 0.97, and an MPJPE of 10.3 pixels while lowering 3D segment-length variation by 0.5 cm compared to manual digitization and 0.6 cm relative to a pretrained model. Replacing manual digitization with automated detection improves accuracy and facilitates kinematic data collection in large outdoor fields with many athletes and trials. The approach also enables the creation of sport-specific datasets valuable for biomechanical research and training next-generation 3D pose estimation models. Full article
(This article belongs to the Special Issue Advanced Sensors in Biomechanics and Rehabilitation—2nd Edition)
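The reconstruction step named here, Direct Linear Transformation (DLT), is a standard technique; the sketch below shows two-view DLT triangulation with NumPy. The projection matrices are toy stand-ins for the continuous frame-wise calibration the paper describes, not the authors' code.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Recover a 3D point from pixel observations x1, x2 in two views."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],   # each observation contributes two rows
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)  # null vector of A is the homogeneous point
    X = vt[-1]
    return X[:3] / X[3]

P1 = np.hstack([np.eye(3), np.zeros((3, 1))])             # camera at origin
P2 = np.hstack([np.eye(3), np.array([[-1.0, 0, 0]]).T])   # offset along x
X_true = np.array([0.2, -0.1, 4.0, 1.0])
x1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]
x2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]
print(triangulate_dlt(P1, P2, x1, x2))  # ~ [0.2, -0.1, 4.0]
```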
30 pages, 43984 KB  
Article
Edge-Graph Enhanced Network for Multi-Object Tracking in UAV Videos
by Yiming Xu, Hongbing Ji and Yongquan Zhang
Remote Sens. 2026, 18(6), 936; https://doi.org/10.3390/rs18060936 - 19 Mar 2026
Abstract
Multi-Object Tracking (MOT) is a fundamental research topic in the field of computer vision, with broad application potential in unmanned aerial vehicle (UAV) videos. However, existing methods still face significant challenges in detection discriminability and identity association stability due to the small scale and weak appearance of objects under aerial viewpoints, as well as complex background interference. To address these issues, we propose an Edge-Graph Enhanced Network (EGEN) for UAV aerial MOT, aiming to improve the performance of small object detection (SOD) and tracking in complex scenes. The framework follows a one-step tracking paradigm and consists of three main components: object detection, embedding feature extraction, and data association. In the detection stage, we design an Edge-Guided Gaussian Enhancement Module (EGGEM), which models edge relationships between objects and backgrounds from a global perspective and selectively enhances Gaussian features guided by edge information, thereby strengthening key structural features of small objects while suppressing background interference. In the embedding feature extraction stage, we develop a Graph-Guided Embedding Enhancement Module (GGEEM), which explicitly represents re-identification (ReID) embeddings as a graph structure and jointly models nodes and their neighborhood relationships to fully capture inter-object associations and enhance embedding discriminability. In the data association stage, we introduce a hierarchical two-stage association strategy to match objects with different confidence levels separately, improving tracking stability and robustness. Extensive experiments on the VisDrone, UAVDT, and self-constructed WildDrone datasets demonstrate that the proposed method significantly outperforms state-of-the-art approaches in both SOD and MOT, exhibiting strong generalization and practical applicability. Full article
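The hierarchical two-stage association strategy is the most readily sketched component: match tracks against high-confidence detections first, then offer the leftover tracks to low-confidence detections. The thresholds below are assumptions, and the cost is plain IoU rather than the paper's graph-enhanced embeddings.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def associate(tracks, dets, thr):
    """Hungarian matching on an IoU cost; returns matches and leftover tracks."""
    if not tracks or not dets:
        return [], list(range(len(tracks)))
    cost = np.array([[1 - iou(t, d) for d in dets] for t in tracks])
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] < 1 - thr]
    unmatched = [i for i in range(len(tracks)) if i not in {r for r, _ in matches}]
    return matches, unmatched

def two_stage(tracks, dets, scores, hi=0.6, lo=0.1):
    high = [d for d, s in zip(dets, scores) if s >= hi]
    low = [d for d, s in zip(dets, scores) if lo <= s < hi]
    m1, leftover = associate(tracks, high, thr=0.5)
    m2, _ = associate([tracks[i] for i in leftover], low, thr=0.4)
    return m1, m2  # second-stage track indices are relative to `leftover`

tracks = [(0, 0, 10, 10), (50, 50, 60, 60)]
dets = [(1, 1, 11, 11), (48, 49, 59, 61)]
print(two_stage(tracks, dets, scores=[0.9, 0.3]))
```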
19 pages, 34223 KB  
Article
A Real Time Multi Modal Computer Vision Framework for Automated Autism Spectrum Disorder Screening
by Lehel Dénes-Fazakas, Ioan Catalin Mateas, Alexandru George Berciu, László Szilágyi, Levente Kovács and Eva-H. Dulf
Electronics 2026, 15(6), 1287; https://doi.org/10.3390/electronics15061287 - 19 Mar 2026
Abstract
Background: The early detection of autism spectrum disorder (ASD) is imperative for improving long-term developmental outcomes. Conventional screening methods, however, depend on time-consuming, expert-driven behavioral assessments and scale poorly. Automated video-based analysis provides a noninvasive and objective approach for extracting behavioral biomarkers from naturalistic recordings. Methods: A modular multimodal framework was developed that integrates motion-based video analysis and facial feature extraction for ASD versus typically developing (TD) classification. The system processes RGB videos, skeleton/stickman representations, and motion trajectory streams. A comprehensive set of kinematic features was extracted, encompassing joint trajectories, velocity and acceleration profiles, posture variability, movement smoothness, and bilateral asymmetry. Repetitive stereotypical behaviors were characterized using frequency-domain analysis via FFT within the 0.3–7.0 Hz band. Facial expression features derived from normalized face crops and landmark-based morphological descriptors were integrated as complementary modalities. Feature-level fusion was performed after z-score normalization, and classification used a Random Forest model with stratified 5-fold cross-validation. GPU acceleration enabled near real-time inference. Results: The motion-based ComplexVideos pipeline achieved a cross-validated accuracy of 94.2 ± 2.1% with an area under the ROC curve (AUC) of 0.93. Skeleton-based KinectStickman inputs yielded moderate performance, with accuracies of 60–80%, while facial-only models reached approximately 60%. Integrating multiple modalities through feature fusion improved classification robustness and reduced false negatives, surpassing single-modality models. Mean inference time remained below one second per video frame under standard operating conditions. Conclusions: The results demonstrate that combining multimodal cues, including motion and facial features, enables effective and efficient video-based ASD screening. The proposed framework offers a scalable, extensible, and computationally efficient solution that can support early screening in clinical and remote assessment settings. Full article
(This article belongs to the Special Issue Computer Vision and Machine Learning for Biometric Systems)
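The FFT step in the 0.3–7.0 Hz band is easy to make concrete. The sketch below computes band power for a joint-velocity signal; the sampling rate and the synthetic "rocking" signal are assumptions for illustration only.

```python
import numpy as np

def band_power(signal, fs, lo=0.3, hi=7.0):
    """Mean spectral power of `signal` between lo and hi Hz."""
    spectrum = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = (freqs >= lo) & (freqs <= hi)
    return spectrum[mask].mean()

fs = 30.0                                 # assumed 30 fps skeleton stream
t = np.arange(0, 10, 1 / fs)
rocking = np.sin(2 * np.pi * 1.5 * t)     # synthetic 1.5 Hz repetitive motion
print(band_power(rocking, fs))
```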
19 pages, 37608 KB  
Article
ZoomPatch: An Adaptive PTZ Scheduling Framework for Small Object Video Analytics
by Shutong Chen, Binhua Liang and Yan Chen
Appl. Sci. 2026, 16(6), 2934; https://doi.org/10.3390/app16062934 - 18 Mar 2026
Viewed by 30
Abstract
Accurate detection of small objects in video analytics is limited by low pixel resolution and insufficient visual cues. While software-based enhancements often fail to recover missing details, Pan–Tilt–Zoom (PTZ) cameras can physically increase spatial resolution through optical zoom. However, mechanical latency and configuration complexity hinder their real-time applicability. We propose ZoomPatch, a real-time video analytics framework tailored for small object detection. ZoomPatch actively schedules PTZ adjustments to capture optically enhanced subframes of regions of interest (ROIs) and fuses inference results back to the global reference frame. Specifically, it introduces a dynamic Cycle Length Proposer to adapt analysis cycles based on scene motion, and a Mixed Integer Linear Programming (MILP)-based Configuration Decider to determine the optimal sequence of pan, tilt, and zoom adjustments under time budget constraints. Simulation-based experimental evaluations across diverse workloads demonstrate that ZoomPatch significantly outperforms fixed-perspective, super-resolution (SR), and greedy baselines. Notably, in the detection task using YOLOv10, ZoomPatch improves the F1-score from 0.33 to 0.47 (a 42% increase) compared to the fixed-perspective baseline. Furthermore, ZoomPatch yields performance gains of 30% and 7% over the SR baseline (0.36) and the greedy baseline (0.44). Full article
(This article belongs to the Section Computing and Artificial Intelligence)
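The MILP-based Configuration Decider can be illustrated with a deliberately simplified model: choose which ROI configurations to visit so that expected detection gain is maximized under a cycle time budget. The utilities, costs, and budget below are invented, and the real Decider also sequences pan, tilt, and zoom moves, which this knapsack-style sketch omits (requires `pip install pulp`).

```python
import pulp

utility = [0.9, 0.7, 0.4, 0.3]   # expected detection gain per ROI (assumed)
cost = [1.2, 0.8, 0.5, 0.6]      # PTZ move + dwell time in seconds (assumed)
budget = 2.0                     # analysis-cycle time budget (assumed)

prob = pulp.LpProblem("ptz_schedule", pulp.LpMaximize)
x = [pulp.LpVariable(f"visit_{i}", cat="Binary") for i in range(len(cost))]
prob += pulp.lpSum(u * xi for u, xi in zip(utility, x))          # objective
prob += pulp.lpSum(c * xi for c, xi in zip(cost, x)) <= budget   # budget
prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([i for i, xi in enumerate(x) if xi.value() == 1])          # -> [0, 1]
```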
12 pages, 1019 KB  
Proceeding Paper
Intelligent Drone Patrolling with Real-Time Object Detection and GPS-Based Path Adaptation
by Gurugubelli V. S. Narayana, Shiba Prasad Swain, Debabrata Pattnayak, Manas Ranjan Pradhan and P. Ankit Krishna
Eng. Proc. 2026, 124(1), 82; https://doi.org/10.3390/engproc2026124082 - 18 Mar 2026
Viewed by 86
Abstract
Background: The need for autonomous aerial surveillance originates from weaknesses in manual monitoring, such as late response, low scalability, and rigid patrol plans. AI- and GPS-driven smart aerial monitoring offers an attractive solution for continuous, adaptive, wide-area surveillance. Objective: In this paper, we design and experimentally validate a low-cost, drone-based autonomous mission patrolling system with waypoint navigation, real-time video backhauling, AI-based human/object detection, and GPS path re-planning when an event occurs, ensuring the safety of patrol missions under battery constraints. Methods: The proposed architecture combines autonomous navigation and embedded flight control with online analog video streaming and ground-station-based computer vision processing. Deep learning-based object detection is applied to live aerial video, and the system's performance is tested at different altitudes, lighting conditions, and GPS patrol plans. Results: Experimental results show that the proposed method achieves stable waypoint tracking with a clear real-time video downlink during patrol missions. The system adaptively modifies paths in reaction to detected events and commences safe return-to-home functionality during low-battery conditions. The detection model obtains a mean average precision of 87.4%, an F1-score of 0.89, and real-time inference latency (20–25 ms per frame), enabling uninterrupted service in practical surveillance deployment. Conclusions: The experiments confirm that a low-cost drone platform combining AI-based detection with GPS path re-planning can provide reliable, adaptive autonomous patrolling under battery constraints. Full article
(This article belongs to the Proceedings of The 6th International Electronic Conference on Applied Sciences)
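The patrol logic described (waypoint following, event-triggered re-planning, low-battery return-to-home) reduces to a simple control loop. Everything below, including the waypoints, thresholds, and mock drone interface, is hypothetical and not the authors' flight stack.

```python
WAYPOINTS = [(12.97, 77.59), (12.98, 77.60), (12.99, 77.61)]  # made-up GPS
LOW_BATTERY = 0.25

def patrol(drone):
    for wp in WAYPOINTS:
        drone.goto(wp)
        if drone.battery() < LOW_BATTERY:
            drone.return_to_home()        # safety takes precedence
            return
        event = drone.detect_objects()    # AI detection on the live video
        if event is not None:
            drone.replan_path(event)      # divert toward the detected event

class MockDrone:
    """Toy stand-in so the sketch runs without hardware."""
    def __init__(self):
        self.charge = 1.0
    def goto(self, wp):
        self.charge -= 0.3
        print("goto", wp)
    def battery(self):
        return self.charge
    def return_to_home(self):
        print("RTH: low battery")
    def detect_objects(self):
        return None
    def replan_path(self, loc):
        print("replan ->", loc)

patrol(MockDrone())
```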
22 pages, 17744 KB  
Article
Task-Aware Low-Light Image Enhancement Method for Underground Coal Mine Monitoring
by Zhirui Yan, Yaru Li, Hongwei Wang, Zhixin Jin, Lei Tao and Yide Geng
Sensors 2026, 26(6), 1886; https://doi.org/10.3390/s26061886 - 17 Mar 2026
Viewed by 101
Abstract
Video AI recognition is crucial for coal mine safety, but complex environments often yield low-quality images, hindering intelligent monitoring. Existing enhancement methods typically focus on image quality alone and lack adaptability to specific tasks. We therefore propose Mine-DCE-YDT, a task-aware low-light image enhancement model that jointly optimizes enhancement with downstream object detection, ensuring enhanced images are both visually clearer and more conducive to accurate detection. First, an improved Zero-DCE algorithm (Mine-DCE) is presented by introducing a Brightness-aware Mask Coordinate Attention (BMCA) module to improve illumination balance in the Value channel of the HSV image and a Multi-scale Detail Enhancement (MDE) module to reinforce textures and suppress noise. Then, Mine-DCE is co-modeled with YOLOv11n and trained end-to-end via a joint loss fusing detection and enhancement-quality losses, forming Mine-DCE-YDT, which can enhance the specific details containing detection targets. Experimental results show that, compared with Zero-DCE, Mine-DCE-YDT achieves reductions of 9.5% in NIQE and 35.5% in BRISQUE on the custom-constructed MineDataset and exhibits strong enhancement performance on the public LOL-V1 dataset. For the miner detection task on MineDataset, integrating Mine-DCE-YDT with YOLOv11n yields increases of 2.8% and 8.3% in mAP@0.5 and mAP@0.5:0.95, respectively, demonstrating its effectiveness in enhancing task-critical image features. Full article
(This article belongs to the Section Sensing and Imaging)
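The end-to-end coupling amounts to training on a weighted sum of the detection loss and an enhancement-quality loss. The PyTorch sketch below uses an L1 term against a reference image purely as a stand-in for the paper's enhancement-quality losses (Zero-DCE-style objectives are reference-free); the weight `lam` is an assumption.

```python
import torch
import torch.nn.functional as F

def joint_loss(det_loss: torch.Tensor,
               enhanced: torch.Tensor,
               reference: torch.Tensor,
               lam: float = 0.5) -> torch.Tensor:
    """Fuse the detector's loss with an enhancement-quality term."""
    enh_loss = F.l1_loss(enhanced, reference)  # stand-in quality loss
    return det_loss + lam * enh_loss

det_loss = torch.tensor(1.2)            # stand-in YOLOv11n detection loss
enhanced = torch.rand(1, 3, 64, 64)
reference = torch.rand(1, 3, 64, 64)
print(joint_loss(det_loss, enhanced, reference))
```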
23 pages, 2071 KB  
Article
The Face of Low Back Pain: A Preliminary Method for Quantifying Pain-Related Facial Expressions
by Franciele Parolini, Ricardo Pires, Sara Dereste dos Santos, Márcio F. Goethel, Klaus Becker, João Paulo Vilas-Boas, Rubim Santos and Ulysses F. Ervilha
Appl. Sci. 2026, 16(6), 2830; https://doi.org/10.3390/app16062830 - 16 Mar 2026
Viewed by 120
Abstract
Background: Facial expressions of pain are essential for pain assessment, yet subjective pain reports often vary between sexes. Traditional self-report measures are prone to bias, and objective methods are needed for more reliable pain evaluation. Objective: To develop and validate a subjectivity-free automated tool to assess acute low back pain using facial expressions recorded during a functional spinal extension task. Participants: Thirty healthy adults, aged 18–40 years. Methods: Participants received intramuscular injections of hypertonic (pain) and isotonic (placebo) saline in the lumbar region during separate sessions. Facial expressions were video-recorded during a submaximal lumbar extension task and analyzed using custom software based on Haar Cascade and Local Binary Pattern Histogram algorithms, techniques that require neither training data nor subjective labeling, in contrast to deep learning solutions. Results: The tool successfully detected significant differences in facial expressions between pain, placebo, and pain-free conditions (p < 0.001). Test–retest reliability was good (ICC = 0.85). While both sexes showed similar facial expression patterns during pain, males reported higher pain scores on the numeric rating scale (p < 0.01). Pain significantly reduced steadiness of force in both sexes. Conclusion: The automated tool objectively quantified facial expressions associated with acute low back pain and revealed sex-related differences in subjective pain perception. This multimodal approach, integrating expression analysis, physical performance, and self-report, may enhance the accuracy of pain assessment in physiotherapy settings. Full article
(This article belongs to the Section Applied Biosciences and Bioengineering)
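Both building blocks named here are classical OpenCV/scikit-image techniques, so a minimal sketch is straightforward: detect the face with a Haar cascade, then summarize the crop with a local-binary-pattern histogram. The video path is hypothetical; requires opencv-python and scikit-image.

```python
import cv2
import numpy as np
from skimage.feature import local_binary_pattern

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture("trial_video.mp4")   # hypothetical recording
ok, frame = cap.read()
if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 5):
        face = cv2.resize(gray[y:y + h, x:x + w], (128, 128))
        lbp = local_binary_pattern(face, P=8, R=1, method="uniform")
        hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
        # histograms from pain vs. placebo trials can then be compared
cap.release()
```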
23 pages, 2397 KB  
Article
Video Anomaly Detection Through Spatial–Temporal Feature Relocalization and Calibrated Trajectory Modeling
by Jie Xu, Chenglizhao Chen, Xinyu Liu, Mengke Song and Huaye Zhang
Electronics 2026, 15(6), 1199; https://doi.org/10.3390/electronics15061199 - 13 Mar 2026
Viewed by 146
Abstract
To address the limitations of existing video anomaly detection methods that overly rely on pixel-space reconstruction and are sensitive to background noise and object scale variations, a self-supervised contrastive learning approach that integrates spatial–temporal feature relocalization with camera-calibrated trajectory modeling is proposed. The proposed method takes spatial–temporal feature relocalization as the core task and constructs a feature-level contrastive learning mechanism to guide the model to focus on discriminative local appearance variations and global temporal semantic evolution. While suppressing background interference and scale-related noise, the method enhances the modeling of fine-grained appearance anomalies and global action-related temporal anomalies. Furthermore, camera calibration is introduced to recover continuous object trajectories in physical space, and a temporal aggregation module is designed to jointly model object motion patterns in pixel space and physical space, thereby improving the model’s ability to perceive complex anomalous behaviors. Experimental results on multiple public video anomaly detection benchmarks demonstrate that the proposed method consistently outperforms existing approaches, validating its effectiveness and generalization capability. Full article
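The camera-calibration component maps pixel trajectories into physical space; the usual minimal version of this is a ground-plane homography. The correspondences and track below are synthetic, so this only illustrates the pixel-to-metric idea, not the paper's calibration procedure.

```python
import cv2
import numpy as np

# Hypothetical image->ground correspondences (four points fix a homography)
img_pts = np.float32([[100, 400], [540, 400], [620, 80], [40, 80]])
world_pts = np.float32([[0, 0], [5, 0], [5, 20], [0, 20]])  # meters
H, _ = cv2.findHomography(img_pts, world_pts)

track_px = np.float32([[[120, 380]], [[150, 350]], [[190, 300]]])
track_m = cv2.perspectiveTransform(track_px, H)                  # ground plane
speeds = np.linalg.norm(np.diff(track_m[:, 0], axis=0), axis=1)  # m per frame
print(track_m[:, 0], speeds)
```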
20 pages, 4366 KB  
Article
Intelligent Detection of Asphalt Pavement Cracks Based on Improved YOLOv8s
by Jinfei Su, Jicong Xu, Chuqiao Shi, Yuhan Wang, Shihao Dong and Xue Zhang
Coatings 2026, 16(3), 359; https://doi.org/10.3390/coatings16030359 - 12 Mar 2026
Viewed by 217
Abstract
The intelligent detection of asphalt pavement cracks has become increasingly important for ensuring the service performance of road infrastructure. Traditional manual detection poses significant safety hazards and offers insufficient accuracy. Furthermore, existing deep learning models still face challenges, including missed detections, false alarms, and poor performance in small-target detection under complex conditions. This investigation adopts unmanned aerial vehicles (UAVs) to acquire pavement distress information and develops an intelligent detection approach for asphalt pavement cracks based on an improved YOLOv8s. First, the Spatial Pyramid Pooling Fast (SPPF) module is replaced with the Spatial Pyramid Pooling Fast with Cross Stage Partial Connections (SPPFCSPC) module in the backbone network to enhance multi-scale feature fusion. Second, the Convolutional Block Attention Module (CBAM) is introduced into the neck network to optimize feature weights along both the channel and spatial dimensions. Meanwhile, the Efficient Intersection over Union (EIoU) loss is adopted to improve localization accuracy. Finally, the Crack_Dataset is established, and ablation experiments are conducted to verify the reliability of the detection model. The research indicates that the improved model achieves Precision, Recall, and mAP@0.5 of 83.9%, 79.6%, and 83.9%, respectively, representing increases of 1.5%, 1.3%, and 1.4% over the baseline model. In comparison with mainstream object detection algorithms such as YOLOv5s and YOLOv8s, the proposed method attains an F1-score, mAP@0.5, and mAP@[0.5–0.95] of 0.82, 83.9%, and 46.6%, respectively, demonstrating a clear performance improvement. Based on the improved detection model, a pavement crack detection system was designed and implemented using PyQt5; the system supports image, video, and real-time camera input and detection. Full article
(This article belongs to the Special Issue Pavement Surface Status Evaluation and Smart Perception)
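The EIoU loss adopted here is a published bounding-box regression loss: the IoU term plus center-distance, width, and height penalties normalized by the smallest enclosing box. The sketch below follows that standard formulation; the example boxes are arbitrary.

```python
def eiou_loss(a, b, eps=1e-9):
    """EIoU loss for two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    iou = inter / (area(a) + area(b) - inter + eps)

    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])   # smallest enclosing box
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    cw, ch = cx2 - cx1, cy2 - cy1

    d2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 \
       + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2   # center distance^2
    dw2 = ((a[2] - a[0]) - (b[2] - b[0])) ** 2          # width difference^2
    dh2 = ((a[3] - a[1]) - (b[3] - b[1])) ** 2          # height difference^2
    return 1 - iou + d2 / (cw**2 + ch**2 + eps) \
           + dw2 / (cw**2 + eps) + dh2 / (ch**2 + eps)

print(eiou_loss((10, 10, 50, 60), (15, 12, 55, 58)))
```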
21 pages, 4639 KB  
Article
Deep Learning-Based Real-Time Vehicle Tire and Tank Temperature Monitoring Using Thermal Cameras
by Yaoyao Hu, Jiaxin Li, Chuanyi Ma, Shuai Cheng, Ruolin Zheng and Xingang Zhang
Appl. Sci. 2026, 16(6), 2656; https://doi.org/10.3390/app16062656 - 11 Mar 2026
Viewed by 151
Abstract
Ensuring the driving safety of hazardous chemical vehicles is a critical priority. High temperatures in tires and tanks can lead to catastrophic accidents, including fires and road damage, particularly in bridge and tunnel sections. Therefore, the purpose of this study is to utilize deep learning to obtain the temperature of vehicle tires and tanks in real time. We constructed a comprehensive dataset by combining the FLIR infrared vehicle dataset, the SPT visible tire dataset, and self-collected thermal video frames captured in various environments. State-of-the-art object detection models, including different scales of YOLOv8, YOLOv9, and YOLOv10, were evaluated for the multi-target detection of vehicles, tires, and tanks. Comparative analysis reveals that the YOLOv8-L model optimized with the GIoU loss function delivers the best performance. Specifically, it achieves a mean Average Precision (mAP) of 97.9% with an average inference time of 6.9 ms per frame, effectively balancing accuracy and real-time efficiency. Finally, by mapping the detection bounding boxes to the radiometric temperature matrix, the system achieves precise, real-time temperature monitoring of the vehicle components. Full article
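The final step, mapping detection boxes onto the radiometric temperature matrix, is simple array indexing. The frame, hot region, and box below are synthetic placeholders for the thermal camera output and the YOLOv8-L detection.

```python
import numpy as np

temp = 20 + 5 * np.random.rand(512, 640)   # synthetic radiometric frame (°C)
temp[200:260, 300:380] = 85.0              # synthetic overheated tire region
box = (290, 195, 390, 265)                 # detector output (x1, y1, x2, y2)

x1, y1, x2, y2 = box
roi = temp[y1:y2, x1:x2]                   # temperatures inside the box
print(f"max {roi.max():.1f} °C, mean {roi.mean():.1f} °C")
```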
27 pages, 2940 KB  
Article
A Unified Framework for Vehicle Detection, Tracking, and Counting Across Ground and Aerial Views Using Knowledge Distillation with YOLOv10-S
by Md Rezaul Karim Khan and Naphtali Rishe
Remote Sens. 2026, 18(5), 842; https://doi.org/10.3390/rs18050842 - 9 Mar 2026
Viewed by 304
Abstract
Accurate and reliable vehicle detection, tracking, and counting across different surveillance platforms are fundamental requirements for developing smart Traffic Management Systems (TMS) and promoting sustainable urban mobility. Recent advances in both ground-level surveillance and remote sensing using deep learning have opened new opportunities for extracting detailed vehicular information from high-resolution aerial and surveillance video data. This work presents a unified, real-time vehicle analysis framework that integrates lightweight deep learning–based detection, robust multi-object tracking, and trajectory-driven counting within a single modular pipeline. The proposed framework employs a "You Only Look Once" detector, YOLOv10-S, as the detection backbone and enhances its robustness through supervision-level knowledge distillation without introducing any architectural modifications. Temporal consistency is enforced using an observation-centric multi-object tracking algorithm (OC-SORT), enabling stable identity preservation under camera motion and dense traffic conditions. Vehicle counting is performed using a trajectory-based virtual gate strategy, reducing duplicate counts and improving counting reliability. Comprehensive experiments conducted on the UA-DETRAC and VisDrone benchmarks show that the proposed framework effectively balances detection performance, tracking robustness, counting accuracy, and real-time efficiency in both ground-based and aerial surveillance settings. Furthermore, cross-dataset evaluations under direct train–test transfer highlight the inherent challenges of domain shift while showing that knowledge distillation consistently improves robustness in detection, tracking identity consistency, and vehicle counting. Overall, this framework enables effective real-world traffic monitoring by adopting a scalable and practical system design, where reliability is prioritized over architectural complexity. Full article
(This article belongs to the Section Urban Remote Sensing)
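The trajectory-based virtual-gate counter can be sketched with a side-of-line test: a vehicle is counted once when consecutive track points fall on opposite sides of the gate. This simplified version ignores the gate's finite extent; the gate and tracks below are synthetic.

```python
def side(p, a, b):
    """Sign of the cross product: which side of gate a->b point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def count_crossings(tracks, gate_a, gate_b):
    counted = set()
    for tid, pts in tracks.items():
        for p, q in zip(pts, pts[1:]):
            if tid not in counted and side(p, gate_a, gate_b) * side(q, gate_a, gate_b) < 0:
                counted.add(tid)   # count each identity at most once
    return len(counted)

tracks = {1: [(0, 5), (2, 5), (4, 5)],    # drives parallel to the gate
          2: [(1, 9), (1, 6), (1, 2)]}    # crosses the gate
print(count_crossings(tracks, gate_a=(0, 4), gate_b=(5, 4)))  # -> 1
```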
10 pages, 2909 KB  
Proceeding Paper
Sea Turtle Recognition with Multiple Data Augmentation Methods Suitable for Marine Scenarios
by Yi-Chieh Hung, Jhih-Ya Chan, Wei-Cheng Lien, Yan-Tsung Peng and Li-Shu Chen
Eng. Proc. 2026, 128(1), 11; https://doi.org/10.3390/engproc2026128011 - 9 Mar 2026
Viewed by 195
Abstract
The sea turtle is an indicator organism used in marine conservation to assess the health of ecosystems in various marine regions. In the past, researchers had to review an 8 h underwater video every day to monitor and count sea turtle appearances. However, since sea turtles often appear for only short periods, traditional manual searching and counting require significant labor and time to accurately record when they appear. To address this issue, we adopted the You Only Look Once (YOLO) model for object detection, using real underwater videos captured from three different areas in the Taiwan Keelung City Chaojing Bay Aquatic Plants and Animals Conservation Area for training and testing. To overcome limitations such as underwater blur, sediment interference, occlusion by other fish, and distant targets that are difficult to identify, we applied data augmentation techniques, including scaling, rotation, and depth blur, together with labeled data of different fish species to improve generalization. Experimental results show that this method achieves 99.4% accuracy in sea turtle detection. After 60 days of deployment across the three areas, the model reduced search time by over 99%, significantly improving efficiency and reducing workload. Full article
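The augmentations named (scaling, rotation, depth blur) translate directly to OpenCV calls; the parameters below are illustrative and the frame is synthetic, since the actual training imagery is the conservation-area footage.

```python
import cv2
import numpy as np

frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)

scaled = cv2.resize(frame, None, fx=0.5, fy=0.5)        # mimic distant target
M = cv2.getRotationMatrix2D((320, 240), angle=15, scale=1.0)
rotated = cv2.warpAffine(frame, M, (640, 480))          # mimic camera roll
blurred = cv2.GaussianBlur(frame, (15, 15), sigmaX=5)   # "depth blur" proxy
```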
20 pages, 1396 KB  
Article
A Cascaded Framework for Vehicle Detection in Low-Resolution Traffic Surveillance Videos
by Tao Yu and Laura Sevilla-Lara
Electronics 2026, 15(5), 1119; https://doi.org/10.3390/electronics15051119 - 8 Mar 2026
Viewed by 273
Abstract
Traffic surveillance cameras, as core sensing devices in smart cities, are crucial for traffic management, violation detection, and autonomous driving. However, due to deployment constraints and hardware limitations, the videos they capture often suffer from low resolution and noise, leading to missed and false detections in traditional object detection algorithms trained on high-resolution data. To address this issue, this study proposes a cascaded collaborative framework that integrates video super-resolution (VSR) and object detection for robust perception in low-quality traffic surveillance scenarios. First, a transformer-based VSR model with masked intra- and inter-frame attention (MIA-VSR) is employed to reconstruct temporally coherent high-resolution video sequences from degraded inputs. A domain-specific super-resolved dataset is subsequently constructed to train a lightweight one-stage detector (You Only Look One-level Feature, YOLOF) for efficient vehicle localisation. Extensive experiments on public datasets (REDS, Vimeo90k, UA-DETRAC) demonstrate that the proposed framework achieved a 56.89 mAP@0.5 on low-resolution UA-DETRAC, outperforming both direct low-resolution inference (39.17 mAP@0.5) and conventional fine-tuning strategies (45.70 mAP@0.5) by 17.72 and 11.19 points, respectively. These findings indicate that super-resolution-driven data reconstruction provides an effective pathway for mitigating feature degradation in low-quality surveillance environments, offering both theoretical insight and practical value for intelligent transportation perception systems. Full article
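Structurally, the framework is a two-stage cascade: reconstruct a high-resolution frame, then detect on it. The sketch below captures only that control flow; both callables are toy stand-ins for MIA-VSR and YOLOF.

```python
from typing import Callable, Iterable

def cascade(frames: Iterable, vsr: Callable, detector: Callable):
    """Super-resolve each low-resolution frame, then run detection on it."""
    for lr_frame in frames:
        hr_frame = vsr(lr_frame)     # MIA-VSR reconstruction step
        yield detector(hr_frame)     # YOLOF vehicle localisation step

# Toy stand-ins so the sketch runs end to end
results = list(cascade(range(3), vsr=lambda f: f * 4, detector=lambda f: [f]))
print(results)
```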
15 pages, 2830 KB  
Article
Development of a Deep-Learning Model for Automated Detection and Quantification of Bleeding in Unilateral Biportal Endoscopic Spine Surgery
by Takaki Yoshimizu, Daisuke Sakai, Daiki Morita, Meng-Huang Wu, Teruaki Miyake, Sanshiro Saito, Tetsutaro Mizuno, Ushio Nosaka, Keisuke Ishii, Mizuki Watanabe and Kanji Sasaki
J. Clin. Med. 2026, 15(5), 1934; https://doi.org/10.3390/jcm15051934 - 4 Mar 2026
Viewed by 272
Abstract
Objectives: To develop and validate a deep-learning model capable of detecting and quantifying intraoperative bleeding to objectively evaluate visual field impairment in unilateral biportal endoscopic spine surgery (UBE). Methods: Overall, 223,568 still images were extracted from 20 UBE videos and used to train a U-Net++ segmentation model based on the red masks generated using hue, saturation, and value (HSV) thresholding. The model was fine-tuned using 350 manually annotated images that differentiated clinically relevant bleeding (red masks) from non-bleeding red regions (zero masks). The model performance was evaluated against 180 ground-truth images annotated by three spine surgeons, which were extracted from videos that were separate from those used for training and fine-tuning. Dice and intersection-over-union (IoU) scores were calculated, and correlation analyses were performed based on inter-annotator agreement. Results: The HSV-based model reproduced the red regions with high fidelity; however, it showed limited agreement with the ground-truth bleeding regions (median Dice = 0.57, IoU = 0.40). The fine-tuned model improved substantially. For image-wise binary classification of bleeding presence, the model achieved an accuracy of 86%, with a sensitivity of 93% and a specificity of 60%. For pixel-level segmentation performance, the model achieved a median Dice score of 0.79 and a median IoU of 0.65 on ground-truth-positive images. Dice performance exceeded 0.80 in cases with strong inter-surgeon ground-truth concordance (≥0.80) and substantial bleeding area (>20%). Conclusions: This deep-learning model can accurately detect clinically meaningful intraoperative bleeding in UBE and quantify visual field impairments in still images and surgical videos. Future applications include the evaluation of hemostatic techniques, postoperative image-based assessment of surgical quality, and real-time intraoperative bleeding alerts to support surgical decision-making. Full article
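The Dice and intersection-over-union (IoU) scores used for evaluation are standard overlap metrics on binary masks; a minimal version with synthetic masks:

```python
import numpy as np

def dice_iou(pred: np.ndarray, gt: np.ndarray, eps=1e-9):
    """Dice and IoU between two boolean segmentation masks."""
    inter = np.logical_and(pred, gt).sum()
    dice = 2 * inter / (pred.sum() + gt.sum() + eps)
    iou = inter / (np.logical_or(pred, gt).sum() + eps)
    return dice, iou

pred = np.zeros((64, 64), bool); pred[10:40, 10:40] = True
gt = np.zeros((64, 64), bool);   gt[15:45, 15:45] = True
print(dice_iou(pred, gt))
```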