Search Results (2,454)

Search Parameters:
Keywords = video features

13 pages, 444 KiB  
Brief Report
Swiping Disrupts Switching: Preliminary Evidence for Reduced Cue-Based Preparation Following Short-Form Video Exposure
by Wanying Luo, Xinran Zhao, Bingshan Jiang, Qiang Fu and Juan’er Zheng
Behav. Sci. 2025, 15(8), 1070; https://doi.org/10.3390/bs15081070 - 6 Aug 2025
Abstract
The rapid rise of short-form video platforms such as TikTok and Instagram Reels has transformed digital engagement by promoting fragmented, high-tempo swiping behaviors and intense sensory stimulation. While these platforms dominate daily use, their impact on higher-order cognition remains underexplored. This study provides preliminary behavioral experimental evidence that even brief exposure to short-form video environments may be associated with reduced cue-based task preparation, a specific subcomponent of proactive cognitive flexibility. In a randomized between-subjects design, participants (N = 72) viewed either 30 min of TikTok-style content, a neutral documentary, or no video (passive control), followed by a task-switching paradigm with manipulated cue–target intervals (CTIs). As expected, the documentary and control groups exhibited significant preparation benefits at longer CTIs, reflected in reduced switching costs—consistent with effective anticipatory task-set updating. In contrast, the short-video group failed to leverage the extended preparation time, indicating a selective disruption of goal-driven processing. Notably, performance at short CTIs did not differ across groups, reinforcing the interpretation that reactive control remained intact while proactive preparation was selectively impaired. These findings link habitual “swiping” to disrupted task-switching efficiency, a phenomenon summarized as swiping disrupts switching, and suggest that short-form video exposure may temporarily bias attentional regulation toward stimulus-driven reactivity, thereby undermining anticipatory cognitive control. Given the widespread use of short-form video platforms, especially among young adults, these results underscore the need to better understand how media design features interact with cognitive control systems. Full article
(This article belongs to the Section Cognition)
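The reported preparation benefit is the reduction in switch cost (switch-trial RT minus repeat-trial RT) at the longer cue–target interval. A minimal sketch of that computation on hypothetical trial-level reaction times; the column names and values below are illustrative, not the study's data:

```python
import pandas as pd

# Hypothetical trial-level data: one row per trial, reaction time (rt) in ms.
trials = pd.DataFrame({
    "group": ["short_video"] * 4 + ["documentary"] * 4,
    "cti":   ["short", "long"] * 4,                    # cue–target interval condition
    "trial": ["switch", "switch", "repeat", "repeat"] * 2,
    "rt":    [820, 810, 700, 705, 815, 730, 698, 690],
})

# Mean RT per group x CTI x trial type; switch cost = switch RT - repeat RT.
mean_rt = trials.groupby(["group", "cti", "trial"])["rt"].mean().unstack("trial")
mean_rt["switch_cost"] = mean_rt["switch"] - mean_rt["repeat"]

# Preparation benefit = how much the switch cost shrinks from short to long CTI.
cost = mean_rt["switch_cost"].unstack("cti")
print(mean_rt)
print(cost["short"] - cost["long"])
```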

25 pages, 6821 KiB  
Article
Hierarchical Text-Guided Refinement Network for Multimodal Sentiment Analysis
by Yue Su and Xuying Zhao
Entropy 2025, 27(8), 834; https://doi.org/10.3390/e27080834 - 6 Aug 2025
Abstract
Multimodal sentiment analysis (MSA) benefits from integrating diverse modalities (e.g., text, video, and audio). However, challenges remain in effectively aligning non-text features and mitigating redundant information, which may limit potential performance improvements. To address these challenges, we propose a Hierarchical Text-Guided Refinement Network (HTRN), a novel framework that refines and aligns non-text modalities using hierarchical textual representations. We introduce Shuffle-Insert Fusion (SIF) and the Text-Guided Alignment Layer (TAL) to enhance crossmodal interactions and suppress irrelevant signals. In SIF, empty tokens are inserted at fixed intervals in unimodal feature sequences, disrupting local correlations and promoting more generalized representations with improved feature diversity. The TAL guides the refinement of audio and visual representations by leveraging textual semantics and dynamically adjusting their contributions through learnable gating factors, ensuring that non-text modalities remain semantically coherent while retaining essential crossmodal interactions. Experiments demonstrate that the HTRN achieves state-of-the-art performance with accuracies of 86.3% (Acc-2) on CMU-MOSI, 86.7% (Acc-2) on CMU-MOSEI, and 80.3% (Acc-2) on CH-SIMS, outperforming existing methods by 0.8–3.45%. Ablation studies validate the contributions of SIF and the TAL, showing 1.9–2.1% performance gains over baselines. By integrating these components, the HTRN establishes a robust multimodal representation learning framework. Full article
(This article belongs to the Section Information Theory, Probability and Statistics)
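Shuffle-Insert Fusion is described as inserting empty tokens at fixed intervals into a unimodal feature sequence to break up local correlations. A small PyTorch sketch of that insertion step; the interval, learnable-token initialization, and tensor shapes are illustrative assumptions, not the paper's settings:

```python
import torch
import torch.nn as nn

class InsertEmptyTokens(nn.Module):
    """Insert a learnable 'empty' token every `interval` positions of a sequence."""
    def __init__(self, dim: int, interval: int = 4):
        super().__init__()
        self.interval = interval
        self.empty = nn.Parameter(torch.zeros(1, 1, dim))  # learnable empty token

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) unimodal features (e.g., audio or video)
        b, _, d = x.shape
        out = []
        for chunk in x.split(self.interval, dim=1):
            out.append(chunk)
            out.append(self.empty.expand(b, 1, d))  # break up local correlations
        return torch.cat(out, dim=1)

feats = torch.randn(2, 16, 64)                 # toy visual feature sequence
print(InsertEmptyTokens(64)(feats).shape)      # (2, 20, 64)
```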

29 pages, 1483 KiB  
Article
Empowering Independence for Visually Impaired Museum Visitors Through Enhanced Accessibility
by Theresa Zaher Nasser, Tsvi Kuflik and Alexandra Danial-Saad
Sensors 2025, 25(15), 4811; https://doi.org/10.3390/s25154811 - 5 Aug 2025
Abstract
Museums serve as essential cultural centers, yet their mostly visual exhibits restrict access for blind and partially sighted (BPS) individuals. While recent technological advances have started to bridge this gap, many accessibility solutions focus mainly on basic inclusion rather than promoting independent exploration. This research addresses this limitation by creating features that enable visitors’ independence through customizable interaction patterns and self-paced exploration. It improved upon existing interactive tangible user interfaces (ITUIs) by enhancing their audio content and adding more flexible user control options. A mixed-methods approach evaluated the ITUI’s usability, ability to be used independently, and user satisfaction. Quantitative data were gathered using ITUI-specific satisfaction, usability, comparison, and general preference scales, while insights were obtained through notes taken during a think-aloud protocol as participants interacted with the ITUIs, direct observation, and analysis of video recordings of the experiment. The results showed a strong preference for a Pushbutton-based ITUI, which scored highest in usability (M = 87.5), perceived independence (72%), and user control (76%). Participants stressed the importance of tactile interaction, clear feedback, and customizable audio features like volume and playback speed. These findings underscore the vital role of user control and precise feedback in designing accessible museum experiences. Full article

10 pages, 1055 KiB  
Article
Artificial Intelligence and Hysteroscopy: A Multicentric Study on Automated Classification of Pleomorphic Lesions
by Miguel Mascarenhas, Carla Peixoto, Ricardo Freire, Joao Cavaco Gomes, Pedro Cardoso, Inês Castro, Miguel Martins, Francisco Mendes, Joana Mota, Maria João Almeida, Fabiana Silva, Luis Gutierres, Bruno Mendes, João Ferreira, Teresa Mascarenhas and Rosa Zulmira
Cancers 2025, 17(15), 2559; https://doi.org/10.3390/cancers17152559 - 3 Aug 2025
Viewed by 202
Abstract
Background/Objectives: The integration of artificial intelligence (AI) in medical imaging is rapidly advancing, yet its application in gynecology remains limited. This proof-of-concept study presents the development and validation of a convolutional neural network (CNN) designed to automatically detect and classify endometrial polyps. Methods: A multicenter dataset (n = 3) comprising 65 hysteroscopies was used, yielding 33,239 frames and 37,512 annotated objects. Still frames were extracted from full-length videos and annotated for the presence of histologically confirmed polyps. A YOLOv1-based object detection model was used with a 70–20–10 split for training, validation, and testing. Primary performance metrics included recall, precision, and mean average precision at an intersection over union (IoU) ≥ 0.50 (mAP50). Frame-level classification metrics were also computed to evaluate clinical applicability. Results: The model achieved a recall of 0.96 and precision of 0.95 for polyp detection, with a mAP50 of 0.98. At the frame level, mean recall was 0.75, precision 0.98, and F1 score 0.82, confirming high detection and classification performance. Conclusions: This study presents a CNN trained on multicenter, real-world data that detects and classifies polyps simultaneously with high diagnostic and localization performance, supported by explainable AI features that enhance its clinical integration and technological readiness. Although currently limited to binary classification, this study demonstrates the feasibility and potential of AI to reduce diagnostic subjectivity and inter-observer variability in hysteroscopy. Future work will focus on expanding the model’s capabilities to classify a broader range of endometrial pathologies, enhance generalizability, and validate performance in real-time clinical settings. Full article
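The detection metrics above count a prediction as correct when its intersection over union (IoU) with a ground-truth box reaches 0.50, the threshold behind mAP50. A self-contained sketch of that criterion; the box format and coordinates are illustrative:

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle.
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union > 0 else 0.0

pred, truth = (120, 80, 260, 220), (110, 90, 250, 230)
print(iou(pred, truth), iou(pred, truth) >= 0.50)  # counts as a true positive at mAP50
```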

24 pages, 23817 KiB  
Article
Dual-Path Adversarial Denoising Network Based on UNet
by Jinchi Yu, Yu Zhou, Mingchen Sun and Dadong Wang
Sensors 2025, 25(15), 4751; https://doi.org/10.3390/s25154751 - 1 Aug 2025
Viewed by 234
Abstract
Digital image quality is crucial for reliable analysis in applications such as medical imaging, satellite remote sensing, and video surveillance. However, traditional denoising methods struggle to balance noise removal with detail preservation and lack adaptability to various types of noise. We propose a novel three-module architecture for image denoising, comprising a generator, a dual-path-UNet-based denoiser, and a discriminator. The generator creates synthetic noise patterns to augment training data, while the dual-path-UNet denoiser uses multiple receptive field modules to preserve fine details and dense feature fusion to maintain global structural integrity. The discriminator provides adversarial feedback to enhance denoising performance. This dual-path adversarial training mechanism addresses the limitations of traditional methods by simultaneously capturing both local details and global structures. Experiments on the SIDD, DND, and PolyU datasets demonstrate superior performance. We compare our architecture with the latest state-of-the-art GAN variants through comprehensive qualitative and quantitative evaluations. These results confirm the effectiveness of noise removal with minimal loss of critical image details. The proposed architecture enhances image denoising capabilities in complex noise scenarios, providing a robust solution for applications that require high image fidelity. By enhancing adaptability to various types of noise while maintaining structural integrity, this method provides a versatile tool for image processing tasks that require preserving detail. Full article
(This article belongs to the Section Sensing and Imaging)
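The adversarial feedback loop pairs a denoiser trained with a reconstruction loss against a discriminator that judges denoised outputs versus clean images. A toy PyTorch sketch of one such training step; the tiny stand-in networks, noise model, and loss weights are placeholders for illustration, not the paper's dual-path UNet or generator:

```python
import torch
import torch.nn as nn

# Tiny stand-ins for the paper's dual-path UNet denoiser and discriminator.
denoiser = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 3, 3, padding=1))
discriminator = nn.Sequential(nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
                              nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))

opt_den = torch.optim.Adam(denoiser.parameters(), lr=1e-4)
opt_disc = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

clean = torch.rand(4, 3, 64, 64)
noisy = clean + 0.1 * torch.randn_like(clean)   # synthetic noise stands in for the generator module

# Discriminator step: clean images are "real", denoised outputs are "fake".
fake = denoiser(noisy).detach()
d_loss = bce(discriminator(clean), torch.ones(4, 1)) + bce(discriminator(fake), torch.zeros(4, 1))
opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()

# Denoiser step: reconstruction loss plus adversarial feedback from the discriminator.
denoised = denoiser(noisy)
g_loss = l1(denoised, clean) + 0.01 * bce(discriminator(denoised), torch.ones(4, 1))
opt_den.zero_grad(); g_loss.backward(); opt_den.step()
print(float(d_loss), float(g_loss))
```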

28 pages, 5699 KiB  
Article
Multi-Modal Excavator Activity Recognition Using Two-Stream CNN-LSTM with RGB and Point Cloud Inputs
by Hyuk Soo Cho, Kamran Latif, Abubakar Sharafat and Jongwon Seo
Appl. Sci. 2025, 15(15), 8505; https://doi.org/10.3390/app15158505 - 31 Jul 2025
Viewed by 148
Abstract
Recently, deep learning algorithms have been increasingly applied in construction for activity recognition, particularly for excavators, to automate processes and enhance safety and productivity through continuous monitoring of earthmoving activities. These deep learning algorithms analyze construction videos to classify excavator activities for earthmoving purposes. However, previous studies have solely focused on single-source external videos, which limits the activity recognition capabilities of the deep learning algorithm. This paper introduces a novel multi-modal deep learning-based methodology for recognizing excavator activities, utilizing multi-stream input data. It processes point clouds and RGB images using the two-stream long short-term memory convolutional neural network (CNN-LSTM) method to extract spatiotemporal features, enabling the recognition of excavator activities. A comprehensive dataset comprising 495,000 video frames of synchronized RGB and point cloud data was collected across multiple construction sites under varying conditions. The dataset encompasses five key excavator activities: Approach, Digging, Dumping, Idle, and Leveling. To assess the effectiveness of the proposed method, the performance of the two-stream CNN-LSTM architecture is compared with that of single-stream CNN-LSTM models on the same RGB and point cloud datasets, separately. The results demonstrate that the proposed multi-stream approach achieved an accuracy of 94.67%, outperforming existing state-of-the-art single-stream models, which achieved 90.67% accuracy for the RGB-based model and 92.00% for the point cloud-based model. These findings underscore the potential of the proposed activity recognition method, making it highly effective for automatic real-time monitoring of excavator activities, thereby laying the groundwork for future integration into digital twin systems for proactive maintenance and intelligent equipment management. Full article
(This article belongs to the Special Issue AI-Based Machinery Health Monitoring)
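The two-stream idea runs per-frame CNN features from each modality through its own LSTM and fuses the resulting clip embeddings for classification. A compact PyTorch sketch under simplifying assumptions (toy layer sizes, and the point cloud treated as a single-channel depth-style image, which may differ from the paper's encoding):

```python
import torch
import torch.nn as nn

class StreamEncoder(nn.Module):
    """Per-frame CNN features followed by an LSTM over the frame sequence."""
    def __init__(self, in_ch: int, hidden: int = 64):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.lstm = nn.LSTM(16, hidden, batch_first=True)

    def forward(self, clips):                  # clips: (batch, time, C, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.lstm(feats)
        return h[-1]                           # last hidden state: (batch, hidden)

class TwoStreamCNNLSTM(nn.Module):
    def __init__(self, n_classes: int = 5):    # Approach, Digging, Dumping, Idle, Leveling
        super().__init__()
        self.rgb = StreamEncoder(in_ch=3)
        self.pcd = StreamEncoder(in_ch=1)      # point cloud as a depth-style image (assumption)
        self.head = nn.Linear(128, n_classes)

    def forward(self, rgb_clip, pcd_clip):
        fused = torch.cat([self.rgb(rgb_clip), self.pcd(pcd_clip)], dim=1)
        return self.head(fused)

model = TwoStreamCNNLSTM()
logits = model(torch.rand(2, 8, 3, 64, 64), torch.rand(2, 8, 1, 64, 64))
print(logits.shape)  # (2, 5)
```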

19 pages, 3130 KiB  
Article
Deep Learning-Based Instance Segmentation of Galloping High-Speed Railway Overhead Contact System Conductors in Video Images
by Xiaotong Yao, Huayu Yuan, Shanpeng Zhao, Wei Tian, Dongzhao Han, Xiaoping Li, Feng Wang and Sihua Wang
Sensors 2025, 25(15), 4714; https://doi.org/10.3390/s25154714 - 30 Jul 2025
Viewed by 234
Abstract
The conductors of high-speed railway OCSs (Overhead Contact Systems) are susceptible to conductor galloping due to the impact of natural elements such as strong winds, rain, and snow, resulting in conductor fatigue damage and significantly compromising train operational safety. Consequently, monitoring the galloping status of conductors is crucial, and instance segmentation techniques, by delineating the pixel-level contours of each conductor, can significantly aid in the identification and study of galloping phenomena. This work expands upon the YOLO11-seg model and introduces an instance segmentation approach for galloping video and image sensor data of OCS conductors. The algorithm, designed for the stripe-like distribution of OCS conductors in the data, employs four-direction Sobel filters to extract edge features in horizontal, vertical, and diagonal orientations. These features are subsequently integrated with the original convolutional branch to form the FDSE (Four Direction Sobel Enhancement) module. It integrates the ECA (Efficient Channel Attention) mechanism for the adaptive augmentation of conductor characteristics and utilizes the FL (Focal Loss) function to mitigate the class-imbalance issue between positive and negative samples, hence enhancing the model’s sensitivity to conductors. Consequently, segmentation outcomes from neighboring frames are utilized, and mask-difference analysis is performed to autonomously detect conductor galloping locations, emphasizing their contours for the clear depiction of galloping characteristics. Experimental results demonstrate that the enhanced YOLO11-seg model achieves 85.38% precision, 77.30% recall, 84.25% AP@0.5, 81.14% F1-score, and a real-time processing speed of 44.78 FPS. When combined with the galloping visualization module, it can issue real-time alerts of conductor galloping anomalies, providing robust technical support for railway OCS safety monitoring. Full article
(This article belongs to the Section Industrial Sensors)
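The FDSE module applies fixed Sobel kernels in horizontal, vertical, and the two diagonal directions before concatenation with the ordinary convolution branch. A PyTorch sketch of those four fixed filters as a depthwise convolution; the diagonal kernel values follow one common convention and may not match the paper's exact definition:

```python
import torch
import torch.nn as nn

# Four-direction Sobel kernels: horizontal, vertical, and the two diagonals.
SOBEL = torch.tensor([
    [[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],     # horizontal edges
    [[-1., -2., -1.], [0., 0., 0.], [1., 2., 1.]],     # vertical edges
    [[0., 1., 2.], [-1., 0., 1.], [-2., -1., 0.]],     # 45° diagonal
    [[-2., -1., 0.], [-1., 0., 1.], [0., 1., 2.]],     # 135° diagonal
])

class FourDirectionSobel(nn.Module):
    """Apply the four fixed Sobel kernels to every input channel (depthwise)."""
    def __init__(self, channels: int):
        super().__init__()
        weight = SOBEL.repeat(channels, 1, 1).unsqueeze(1)   # (4*channels, 1, 3, 3)
        self.conv = nn.Conv2d(channels, 4 * channels, 3, padding=1,
                              groups=channels, bias=False)
        self.conv.weight = nn.Parameter(weight, requires_grad=False)

    def forward(self, x):
        return self.conv(x)  # edge maps, to be concatenated with the ordinary conv branch

edges = FourDirectionSobel(channels=16)(torch.rand(1, 16, 128, 128))
print(edges.shape)  # (1, 64, 128, 128)
```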

20 pages, 1536 KiB  
Article
Graph Convolution-Based Decoupling and Consistency-Driven Fusion for Multimodal Emotion Recognition
by Yingmin Deng, Chenyu Li, Yu Gu, He Zhang, Linsong Liu, Haixiang Lin, Shuang Wang and Hanlin Mo
Electronics 2025, 14(15), 3047; https://doi.org/10.3390/electronics14153047 - 30 Jul 2025
Viewed by 236
Abstract
Multimodal emotion recognition (MER) is essential for understanding human emotions from diverse sources such as speech, text, and video. However, modality heterogeneity and inconsistent expression pose challenges for effective feature fusion. To address this, we propose a novel MER framework combining a Dynamic Weighted Graph Convolutional Network (DW-GCN) for feature disentanglement and a Cross-Attention Consistency-Gated Fusion (CACG-Fusion) module for robust integration. DW-GCN models complex inter-modal relationships, enabling the extraction of both common and private features. The CACG-Fusion module subsequently enhances classification performance through dynamic alignment of cross-modal cues, employing attention-based coordination and consistency-preserving gating mechanisms to optimize feature integration. Experiments on the CMU-MOSI and CMU-MOSEI datasets demonstrate that our method achieves state-of-the-art performance, significantly improving the ACC7, ACC2, and F1 scores. Full article
(This article belongs to the Section Computer Science & Engineering)
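The CACG-Fusion step can be read as cross-attention from one modality over another, followed by a learnable gate that decides how much of the attended signal to keep. A minimal PyTorch sketch of that pattern; dimensions, head count, and the gating form are illustrative assumptions rather than the authors' module:

```python
import torch
import torch.nn as nn

class CrossAttentionGatedFusion(nn.Module):
    """Fuse two modality sequences with cross-attention and a learnable gate."""
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, text, other):
        # Text queries attend over the other modality (audio or video) sequence.
        attended, _ = self.attn(query=text, key=other, value=other)
        g = self.gate(torch.cat([text, attended], dim=-1))   # per-position gate in [0, 1]
        return g * attended + (1 - g) * text                 # consistency-weighted mix

fusion = CrossAttentionGatedFusion()
text, video = torch.rand(2, 20, 64), torch.rand(2, 50, 64)
print(fusion(text, video).shape)  # (2, 20, 64)
```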

30 pages, 37977 KiB  
Article
Text-Guided Visual Representation Optimization for Sensor-Acquired Video Temporal Grounding
by Yun Tian, Xiaobo Guo, Jinsong Wang and Xinyue Liang
Sensors 2025, 25(15), 4704; https://doi.org/10.3390/s25154704 - 30 Jul 2025
Viewed by 266
Abstract
Video temporal grounding (VTG) aims to localize a semantically relevant temporal segment within an untrimmed video based on a natural language query. The task continues to face challenges arising from cross-modal semantic misalignment, which is largely attributed to redundant visual content in sensor-acquired video streams, linguistic ambiguity, and discrepancies in modality-specific representations. Most existing approaches rely on intra-modal feature modeling, processing video and text independently throughout the representation learning stage. However, this isolation undermines semantic alignment by neglecting the potential of cross-modal interactions. In practice, a natural language query typically corresponds to spatiotemporal content in video signals collected through camera-based sensing systems, encompassing a particular sequence of frames and its associated salient subregions. We propose a text-guided visual representation optimization framework tailored to enhance semantic interpretation over video signals captured by visual sensors. This framework leverages textual information to focus on spatiotemporal video content, thereby narrowing the cross-modal gap. Built upon the unified cross-modal embedding space provided by CLIP, our model leverages video data from sensing devices to structure representations and introduces two dedicated modules to semantically refine visual representations across spatial and temporal dimensions. First, we design a Spatial Visual Representation Optimization (SVRO) module to learn spatial information within intra-frames. It selects salient patches related to the text, capturing more fine-grained visual details. Second, we introduce a Temporal Visual Representation Optimization (TVRO) module to learn temporal relations from inter-frames. Temporal triplet loss is employed in TVRO to enhance attention on text-relevant frames and capture clip semantics. Additionally, a self-supervised contrastive loss is introduced at the clip–text level to improve inter-clip discrimination by maximizing semantic variance during training. Experiments on Charades-STA, ActivityNet Captions, and TACoS, widely used benchmark datasets, demonstrate that our method outperforms state-of-the-art methods across multiple metrics. Full article
(This article belongs to the Section Sensing and Imaging)
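The temporal triplet loss in TVRO pulls text-relevant frames closer to the query embedding than text-irrelevant ones. A short sketch of such a loss using cosine similarity; the margin and feature shapes are illustrative, and the paper's exact formulation may differ:

```python
import torch
import torch.nn.functional as F

def temporal_triplet_loss(text, pos_frames, neg_frames, margin: float = 0.2):
    """Push text-relevant frames closer to the query than text-irrelevant ones.

    text: (batch, dim); pos_frames / neg_frames: (batch, dim) pooled frame features.
    """
    sim_pos = F.cosine_similarity(text, pos_frames, dim=-1)
    sim_neg = F.cosine_similarity(text, neg_frames, dim=-1)
    return F.relu(margin + sim_neg - sim_pos).mean()

text = F.normalize(torch.rand(4, 512), dim=-1)   # e.g., CLIP text embeddings
pos = F.normalize(torch.rand(4, 512), dim=-1)    # frames inside the target moment
neg = F.normalize(torch.rand(4, 512), dim=-1)    # frames outside it
print(float(temporal_triplet_loss(text, pos, neg)))
```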

15 pages, 4592 KiB  
Article
SSAM_YOLOv5: YOLOv5 Enhancement for Real-Time Detection of Small Road Signs
by Fatima Qanouni, Hakim El Massari, Noreddine Gherabi and Maria El-Badaoui
Digital 2025, 5(3), 30; https://doi.org/10.3390/digital5030030 - 29 Jul 2025
Viewed by 382
Abstract
Many traffic-sign detection systems are available to assist drivers under challenging conditions such as small and distant signs, multiple signs on the road, and sign-like objects. Real-time object detection is an indispensable aspect of these systems, with detection speed and efficiency being critical parameters. To improve both, we propose SSAM_YOLOv5, a comprehensive methodology for feature extraction and small-road-sign detection. The method is based on a modified version of YOLOv5s. First, we introduced attention modules into the backbone to focus on the regions of interest within video frames; second, we replaced the activation function with the SwishT_C activation function to enhance feature extraction and balance inference speed, precision, and mean average precision (mAP@50). Compared to the YOLOv5 baseline, the proposed improvements achieved increases of 1.4% and 1.9% in mAP@50 on the Tiny LISA and GTSDB datasets, respectively, confirming their effectiveness. Full article
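The activation swap replaces YOLOv5's default activation with SwishT_C. The sketch below only shows how such a swap slots into a Conv–BN–activation block, using the standard Swish form x * sigmoid(beta * x) as a stand-in, since SwishT_C's exact definition is given in the paper rather than here:

```python
import torch
import torch.nn as nn

class SwishLike(nn.Module):
    """Stand-in activation: x * sigmoid(beta * x) with a learnable beta.

    The paper's SwishT_C variant differs in its exact form; this only illustrates
    how an activation swap slots into a YOLO-style conv block.
    """
    def __init__(self):
        super().__init__()
        self.beta = nn.Parameter(torch.ones(1))

    def forward(self, x):
        return x * torch.sigmoid(self.beta * x)

class ConvBlock(nn.Module):
    """Conv + BN + activation, the basic unit the activation swap targets."""
    def __init__(self, c_in, c_out, act: nn.Module):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = act

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

block = ConvBlock(3, 16, act=SwishLike())
print(block(torch.rand(1, 3, 64, 64)).shape)  # (1, 16, 64, 64)
```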

21 pages, 9651 KiB  
Article
Self-Supervised Visual Tracking via Image Synthesis and Domain Adversarial Learning
by Gu Geng, Sida Zhou, Jianing Tang, Xinming Zhang, Qiao Liu and Di Yuan
Sensors 2025, 25(15), 4621; https://doi.org/10.3390/s25154621 - 25 Jul 2025
Viewed by 209
Abstract
With the widespread use of sensors in applications such as autonomous driving and intelligent security, stable and efficient target tracking from diverse sensor data has become increasingly important. Self-supervised visual tracking has attracted increasing attention due to its potential to eliminate reliance on costly manual annotations; however, existing methods often train on incomplete object representations, resulting in inaccurate localization during inference. In addition, current methods typically struggle when applied to deep networks. To address these limitations, we propose a novel self-supervised tracking framework based on image synthesis and domain adversarial learning. We first construct a large-scale database of real-world target objects, then synthesize training video pairs by randomly inserting these targets into background frames while applying geometric and appearance transformations to simulate realistic variations. To reduce domain shift introduced by synthetic content, we incorporate a domain classification branch after feature extraction and adopt domain adversarial training to encourage feature alignment between real and synthetic domains. Experimental results on five standard tracking benchmarks demonstrate that our method significantly enhances tracking accuracy compared to existing self-supervised approaches without introducing any additional labeling cost. The proposed framework not only ensures complete target coverage during training but also shows strong scalability to deeper network architectures, offering a practical and effective solution for real-world tracking applications. Full article
(This article belongs to the Special Issue AI-Based Computer Vision Sensors & Systems)
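Domain adversarial feature alignment of this kind is commonly implemented with a gradient reversal layer feeding a real-versus-synthetic domain classifier. A minimal PyTorch sketch under that assumption (the paper's branch may be structured differently):

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; reverses (and scales) gradients on backward."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lamb * grad_out, None

class DomainClassifier(nn.Module):
    """Predicts real vs. synthetic from tracker features, behind gradient reversal."""
    def __init__(self, dim: int = 256, lamb: float = 1.0):
        super().__init__()
        self.lamb = lamb
        self.head = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, feats):
        return self.head(GradReverse.apply(feats, self.lamb))

feats = torch.rand(8, 256, requires_grad=True)    # backbone features
domain_labels = torch.randint(0, 2, (8,))         # 0 = real frame, 1 = synthetic frame
loss = nn.CrossEntropyLoss()(DomainClassifier()(feats), domain_labels)
loss.backward()   # gradients reaching `feats` are reversed, encouraging domain-invariant features
print(feats.grad.shape)
```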

30 pages, 3451 KiB  
Article
Integrating Google Maps and Smooth Street View Videos for Route Planning
by Federica Massimi, Antonio Tedeschi, Kalapraveen Bagadi and Francesco Benedetto
J. Imaging 2025, 11(8), 251; https://doi.org/10.3390/jimaging11080251 - 25 Jul 2025
Viewed by 375
Abstract
This research addresses the long-standing dependence on printed maps for navigation and highlights the limitations of existing digital services like Google Street View and Google Street View Player in providing comprehensive solutions for route analysis and understanding. The absence of a systematic approach to route analysis, issues related to insufficient street view images, and the lack of proper image mapping for desired roads remain unaddressed by current applications, which are predominantly client-based. In response, we propose an innovative automatic system designed to generate videos depicting road routes between two geographic locations. The system calculates and presents the route conventionally, emphasizing the path on a two-dimensional representation, and in a multimedia format. A prototype is developed based on a cloud-based client–server architecture, featuring three core modules: frames acquisition, frames analysis and elaboration, and the persistence of metadata information and computed videos. The tests, encompassing both real-world and synthetic scenarios, have produced promising results, showcasing the efficiency of our system. By providing users with a real and immersive understanding of requested routes, our approach fills a crucial gap in existing navigation solutions. This research contributes to the advancement of route planning technologies, offering a comprehensive and user-friendly system that leverages cloud computing and multimedia visualization for an enhanced navigation experience. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
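The frames-acquisition module implies sampling street-level imagery at regular intervals along the computed route. A small sketch of one way to resample a route polyline into evenly spaced waypoints using the haversine distance; the coordinates, spacing, and the interpolation itself are illustrative assumptions, not the system's implementation:

```python
import math

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points."""
    (lat1, lon1), (lat2, lon2) = p, q
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371000 * math.asin(math.sqrt(a))

def resample(route, step_m=10.0):
    """Return points spaced roughly `step_m` metres apart along a polyline."""
    points, carry = [route[0]], 0.0
    for p, q in zip(route, route[1:]):
        seg = haversine_m(p, q)
        d = step_m - carry
        while d < seg:
            f = d / seg
            points.append((p[0] + f * (q[0] - p[0]), p[1] + f * (q[1] - p[1])))
            d += step_m
        carry = seg - (d - step_m)
    return points

# Toy route (coordinates made up for illustration).
route = [(41.9028, 12.4964), (41.9040, 12.4980), (41.9055, 12.4991)]
waypoints = resample(route)
print(len(waypoints), waypoints[:3])
```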

15 pages, 4180 KiB  
Article
Quantitative and Correlation Analysis of Pear Leaf Dynamics Under Wind Field Disturbances
by Yunfei Wang, Xiang Dong, Weidong Jia, Mingxiong Ou, Shiqun Dai, Zhenlei Zhang and Ruohan Shi
Agriculture 2025, 15(15), 1597; https://doi.org/10.3390/agriculture15151597 - 24 Jul 2025
Viewed by 257
Abstract
In wind-assisted orchard spraying operations, the dynamic response of leaves—manifested through changes in their posture—critically influences droplet deposition on both sides of the leaf surface and the penetration depth into the canopy. These factors are pivotal in determining spray coverage and the spatial distribution of pesticide efficacy. However, current research lacks comprehensive quantification and correlation analysis of the temporal response characteristics of leaves under wind disturbances. To address this gap, a systematic analytical framework was proposed, integrating real-time leaf segmentation and tracking, geometric feature quantification, and statistical correlation modeling. High-frame-rate videos of fluttering leaves were acquired under controlled wind conditions, and background segmentation was performed using principal component analysis (PCA) followed by clustering in the reduced feature space. A fine-tuned Segment Anything Model 2 (SAM2-FT) was employed to extract dynamic leaf masks and enable frame-by-frame tracking. Based on the extracted masks, time series of leaf area and inclination angle were constructed. Subsequently, regression analysis, cross-correlation functions, and Granger causality tests were applied to investigate cooperative responses and potential driving relationships among leaves. Results showed that the SAM2-FT model significantly outperformed the YOLO series in segmentation accuracy, achieving a precision of 98.7% and recall of 97.48%. Leaf area exhibited strong linear coupling and directional causality, while angular responses showed weaker correlations but demonstrated localized synchronization. This study offers a methodological foundation for quantifying temporal dynamics in wind–leaf systems and provides theoretical insights for the adaptive control and optimization of intelligent spraying strategies. Full article
(This article belongs to the Section Agricultural Technology)
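The correlation analysis pairs per-leaf time series of area and inclination. A NumPy sketch of one such pairwise step, estimating the lag at which the cross-correlation of two area series peaks; the synthetic signals stand in for the extracted leaf-area series:

```python
import numpy as np

def best_lag(x, y, max_lag=50):
    """Lag (in frames) at which the cross-correlation of two series peaks."""
    lags = np.arange(-max_lag, max_lag + 1)
    corr = [np.corrcoef(x[max(0, -l):len(x) - max(0, l)],
                        y[max(0, l):len(y) - max(0, -l)])[0, 1] for l in lags]
    k = int(np.argmax(corr))
    return int(lags[k]), float(corr[k])

# Synthetic "leaf area" series: leaf B follows leaf A with a 12-frame delay plus noise.
rng = np.random.default_rng(0)
t = np.arange(1000)
leaf_a = np.sin(2 * np.pi * t / 60) + 0.1 * rng.standard_normal(t.size)
leaf_b = np.roll(leaf_a, 12) + 0.1 * rng.standard_normal(t.size)
print(best_lag(leaf_a, leaf_b))   # expected lag close to +12 frames, correlation near 1
```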

23 pages, 13739 KiB  
Article
Traffic Accident Rescue Action Recognition Method Based on Real-Time UAV Video
by Bo Yang, Jianan Lu, Tao Liu, Bixing Zhang, Chen Geng, Yan Tian and Siyu Zhang
Drones 2025, 9(8), 519; https://doi.org/10.3390/drones9080519 - 24 Jul 2025
Viewed by 427
Abstract
Low-altitude drones, which are unimpeded by traffic congestion or urban terrain, have become a critical asset in emergency rescue missions. To address the current lack of emergency rescue data, UAV aerial videos were collected to create an experimental dataset for action classification and localization annotation. A total of 5082 keyframes were labeled with 1–5 targets each, and 14,412 instances of data were prepared (including flight altitude and camera angles) for action classification and position annotation. To mitigate the challenges posed by high-resolution drone footage with excessive redundant information, we propose the SlowFast-Traffic (SF-T) framework, a spatio-temporal sequence-based algorithm for recognizing traffic accident rescue actions. For more efficient extraction of target–background correlation features, we introduce the Actor-Centric Relation Network (ACRN) module, which employs temporal max pooling to enhance the time-dimensional features of static backgrounds, significantly reducing redundancy-induced interference. Additionally, smaller ROI feature map outputs are adopted to boost computational speed. To tackle class imbalance in incident samples, we integrate a Class-Balanced Focal Loss (CB-Focal Loss) function, effectively resolving rare-action recognition in specific rescue scenarios. We replace the original Faster R-CNN with YOLOX-s to improve the target detection rate. On our proposed dataset, the SF-T model achieves a mean average precision (mAP) of 83.9%, which is 8.5% higher than that of the standard SlowFast architecture while maintaining a processing speed of 34.9 tasks/s. Both accuracy-related metrics and computational efficiency are substantially improved. The proposed method demonstrates strong robustness and real-time analysis capabilities for modern traffic rescue action recognition. Full article
(This article belongs to the Special Issue Cooperative Perception for Modern Transportation)
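CB-Focal Loss combines focal loss with the class-balanced weighting (1 - beta) / (1 - beta^n_c) of Cui et al. A compact PyTorch sketch under that standard formulation; beta, gamma, and the class counts are illustrative:

```python
import torch
import torch.nn.functional as F

def cb_focal_loss(logits, targets, samples_per_class, beta=0.999, gamma=2.0):
    """Class-balanced focal loss: focal loss with per-class weights (1-beta)/(1-beta^n_c)."""
    n = torch.as_tensor(samples_per_class, dtype=torch.float32)
    weights = (1.0 - beta) / (1.0 - beta ** n)
    weights = weights / weights.sum() * len(n)       # normalize to keep the loss scale
    ce = F.cross_entropy(logits, targets, reduction="none")
    p_t = torch.exp(-ce)                             # probability of the true class
    focal = (1 - p_t) ** gamma * ce
    return (weights[targets] * focal).mean()

# Toy batch: 6 clips, 3 action classes with very imbalanced training counts.
logits = torch.randn(6, 3)
targets = torch.tensor([0, 0, 1, 2, 2, 2])
print(float(cb_focal_loss(logits, targets, samples_per_class=[5000, 300, 40])))
```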

27 pages, 705 KiB  
Article
A Novel Wavelet Transform and Deep Learning-Based Algorithm for Low-Latency Internet Traffic Classification
by Ramazan Enisoglu and Veselin Rakocevic
Algorithms 2025, 18(8), 457; https://doi.org/10.3390/a18080457 - 23 Jul 2025
Viewed by 345
Abstract
Accurate and real-time classification of low-latency Internet traffic is critical for applications such as video conferencing, online gaming, financial trading, and autonomous systems, where millisecond-level delays can degrade user experience. Existing methods for low-latency traffic classification, reliant on raw temporal features or static statistical analyses, fail to capture dynamic frequency patterns inherent to real-time applications. These limitations hinder accurate resource allocation in heterogeneous networks. This paper proposes a novel framework integrating wavelet transform (WT) and artificial neural networks (ANNs) to address this gap. Unlike prior works, we systematically apply WT to commonly used temporal features—such as throughput, slope, ratio, and moving averages—transforming them into frequency-domain representations. This approach reveals hidden multi-scale patterns in low-latency traffic, akin to structured noise in signal processing, which traditional time-domain analyses often overlook. These wavelet-enhanced features train a multilayer perceptron (MLP) ANN, enabling dual-domain (time–frequency) analysis. We evaluate our approach on a dataset comprising FTP, video streaming, and low-latency traffic, including mixed scenarios with up to four concurrent traffic types. Experiments demonstrate 99.56% accuracy in distinguishing low-latency traffic (e.g., video conferencing) from FTP and streaming, outperforming k-NN, CNNs, and LSTMs, and 74.2–92.8% accuracy in mixed-traffic scenarios. Notably, our method eliminates reliance on deep packet inspection (DPI), offering ISPs a privacy-preserving and scalable way to prioritize time-sensitive flows. By bridging signal processing and deep learning, this work advances efficient bandwidth allocation and improves quality of service in heterogeneous network environments. Full article
(This article belongs to the Section Algorithms for Multidisciplinary Applications)
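The pipeline transforms windowed temporal features such as throughput into wavelet sub-band representations and feeds them to an MLP. A sketch of that idea with PyWavelets and scikit-learn; the wavelet, window length, energy features, and synthetic traffic windows are illustrative choices, not the paper's configuration:

```python
import numpy as np
import pywt
from sklearn.neural_network import MLPClassifier

def wavelet_features(throughput, wavelet="db4", level=3):
    """Energy of each wavelet sub-band of a throughput window (one feature vector)."""
    coeffs = pywt.wavedec(throughput, wavelet, level=level)
    return np.array([np.sum(c ** 2) / len(c) for c in coeffs])

rng = np.random.default_rng(1)

def synthetic_window(kind, n=256):
    t = np.arange(n)
    if kind == "low_latency":     # small, fast-varying rates (e.g., conferencing)
        return 0.2 + 0.05 * np.sin(2 * np.pi * t / 8) + 0.02 * rng.standard_normal(n)
    return 1.0 + 0.3 * np.sin(2 * np.pi * t / 64) + 0.05 * rng.standard_normal(n)  # bulk FTP-like

X = np.stack([wavelet_features(synthetic_window(k))
              for k in ["low_latency", "bulk"] * 100])
y = np.array([0, 1] * 100)

clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0).fit(X, y)
print(clf.score(X, y))
```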
