Search Results (80)

Search Parameters:
Keywords = full-frame video

13 pages, 918 KB  
Article
Self-Supervised Spatio-Temporal Network for Classifying Lung Tumor in EBUS Videos
by Ching-Kai Lin, Chin-Wen Chen, Hung-Chih Tu, Hung-Jen Fan and Yun-Chien Cheng
Diagnostics 2025, 15(24), 3184; https://doi.org/10.3390/diagnostics15243184 - 13 Dec 2025
Viewed by 176
Abstract
Background: Endobronchial ultrasound-guided transbronchial biopsy (EBUS-TBB) is a valuable technique for diagnosing peripheral pulmonary lesions (PPLs). Although computer-aided diagnostic (CAD) systems have been explored for EBUS interpretation, most rely on manually selected 2D static frames and overlook temporal dynamics that may provide important cues for differentiating benign from malignant lesions. This study aimed to develop an artificial intelligence model that incorporates temporal modeling to analyze EBUS videos and improve lesion classification. Methods: We retrospectively collected EBUS videos from patients undergoing EBUS-TBB between November 2019 and January 2022. A dual-path 3D convolutional network (SlowFast) was employed for spatiotemporal feature extraction, and contrastive learning (SwAV) was integrated to enhance model generalizability on clinical data. Results: A total of 465 patients with corresponding EBUS videos were included. On the validation set, the SlowFast + SwAV_Frame model achieved an AUC of 0.857, accuracy of 82.26%, sensitivity of 93.18%, specificity of 55.56%, and F1-score of 88.17%, outperforming pulmonologists (accuracy 70.97%, sensitivity 77.27%, specificity 55.56%, F1-score 79.07%). On the test set, the model achieved an AUC of 0.823, accuracy of 76.92%, sensitivity of 84.85%, specificity of 63.16%, and F1-score of 82.35%. The proposed model also demonstrated superior performance compared with conventional 2D architectures. Conclusions: This study introduces the first CAD framework for real-time malignancy classification from full-length EBUS videos, which reduces reliance on manual image selection and improves diagnostic efficiency. In addition, given its higher accuracy compared with pulmonologists’ assessments, the framework shows strong potential for clinical applicability. Full article
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
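
The dual-pathway idea behind SlowFast can be pictured as two 3D-convolutional branches reading the same clip at different temporal rates. The sketch below is a minimal PyTorch illustration under assumed layer sizes and a simple concatenation fusion; it is not the authors' model and omits the SwAV contrastive pre-training.

```python
import torch
import torch.nn as nn

class TinySlowFast(nn.Module):
    """Minimal two-pathway 3D CNN: a slow branch on temporally subsampled
    frames and a fast branch on the full frame rate (illustrative only)."""
    def __init__(self, n_classes=2, alpha=4):
        super().__init__()
        self.alpha = alpha  # temporal subsampling factor for the slow path
        self.slow = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=(1, 7, 7), stride=(1, 2, 2), padding=(0, 3, 3)),
            nn.ReLU(), nn.AdaptiveAvgPool3d(1))
        self.fast = nn.Sequential(
            nn.Conv3d(3, 8, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3)),
            nn.ReLU(), nn.AdaptiveAvgPool3d(1))
        self.head = nn.Linear(32 + 8, n_classes)  # benign vs. malignant

    def forward(self, clip):                        # clip: (B, 3, T, H, W)
        slow = self.slow(clip[:, :, ::self.alpha])  # every alpha-th frame
        fast = self.fast(clip)                      # all frames
        feat = torch.cat([slow.flatten(1), fast.flatten(1)], dim=1)
        return self.head(feat)

logits = TinySlowFast()(torch.randn(1, 3, 32, 224, 224))  # one 32-frame EBUS clip
```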

25 pages, 3453 KB  
Article
High-Frame-Rate Camera-Based Vibration Analysis for Health Monitoring of Industrial Robots Across Multiple Postures
by Tuniyazi Abudoureheman, Hayato Otsubo, Feiyue Wang, Kohei Shimasaki and Idaku Ishii
Appl. Sci. 2025, 15(23), 12771; https://doi.org/10.3390/app152312771 - 2 Dec 2025
Viewed by 287
Abstract
Accurate vibration measurement is crucial for maintaining the performance, reliability, and safety of automated manufacturing environments. Abnormal vibrations caused by faults in gears or bearings can degrade positional accuracy, reduce productivity, and, over time, significantly impair production efficiency and product quality. Such vibrations may also disrupt supply chains, cause financial losses, and pose safety risks to workers through collisions, falling objects, or other operational hazards. Conventional vibration measurement techniques, such as wired accelerometers and strain gauges, are typically limited to a few discrete measurement points. Achieving multi-point measurements requires numerous sensors, which increases installation complexity, wiring constraints, and setup time, making the process both time-consuming and costly. The integration of high-frame-rate (HFR) cameras with Digital Image Correlation (DIC) enables non-contact, multi-point, full-field vibration measurement of robot manipulators, effectively addressing these limitations. In this study, HFR cameras were employed to perform non-contact, full-field vibration measurements of industrial robots. The HFR camera recorded the robot’s vibrations at 1000 frames per second (fps), and the resulting video was decomposed into individual frames according to the frame rate. Each frame, with a resolution of 1920 × 1080 pixels, was divided into 128 × 128 pixel blocks with a 64-pixel stride, yielding 435 sub-images. This setup effectively simulates the operation of 435 virtual vibration sensors. By applying mask processing to these sub-images, eight key points representing critical robot components were selected for multi-point DIC displacement measurements, enabling effective assessment of vibration distribution and real-time vibration visualization across the entire manipulator. This approach allows simultaneous capture of displacements across all robot components without the need for physical sensors. The transfer function is defined in the frequency domain as the ratio between the output displacement of each robot component and the input excitation applied by the shaker mounted on the end-effector. The frequency–domain transfer functions were computed for multiple robot components, enabling accurate and full-field vibration analysis during operation. Full article
(This article belongs to the Special Issue Innovative Approaches to Non-Destructive Evaluation)
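
As a quick check on the block geometry described above, the following NumPy sketch tiles a 1920 × 1080 frame into 128 × 128 blocks with a 64-pixel stride and recovers the 435 sub-images ("virtual vibration sensors") stated in the abstract; the zero-filled frame is a stand-in for real HFR video data.

```python
import numpy as np

frame = np.zeros((1080, 1920), dtype=np.uint8)   # stand-in for one grayscale HFR frame
block, stride = 128, 64

blocks = [
    frame[y:y + block, x:x + block]
    for y in range(0, frame.shape[0] - block + 1, stride)
    for x in range(0, frame.shape[1] - block + 1, stride)
]
print(len(blocks))   # 435 sub-images, i.e., 435 virtual vibration sensors
```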

27 pages, 12490 KB  
Article
Fast CU Division Algorithm for Different Occupancy Types of CUs in Geometric Videos
by Nana Li, Tiantian Zhang, Jinchao Zhao and Qiuwen Zhang
Electronics 2025, 14(20), 4124; https://doi.org/10.3390/electronics14204124 - 21 Oct 2025
Viewed by 366
Abstract
Video-based point cloud compression (V-PCC) is a 3D point cloud compression standard that first projects the point cloud from 3D space onto 2D space, thereby generating geometric and attribute videos, and then encodes the geometric and attribute videos using high-efficiency video coding (HEVC). In the whole coding process, the coding of geometric videos is extremely time-consuming, mainly because the division of geometric video coding units has high computational complexity. In order to effectively reduce the coding complexity of geometric videos in video-based point cloud compression, we propose a fast segmentation algorithm based on the occupancy type of coding units. First, the CUs are divided into three categories—unoccupied, partially occupied, and fully occupied—based on the occupancy graph. For unoccupied CUs, the segmentation is terminated immediately; for partially occupied CUs, a geometric visual perception factor is designed based on their spatial depth variation characteristics, thus realizing early depth range skipping based on visual sensitivity; and, for fully occupied CUs, a lightweight fully connected network is used to make the fast segmentation decision. The experimental results show that, under the full intra-frame configuration, this algorithm significantly reduces the coding time complexity while almost maintaining the coding quality; i.e., the BD rate of D1 and D2 only increases by an average of 0.11% and 0.28% under the total coding rate, where the geometric video coding time saving reaches up to 58.71% and the overall V-PCC coding time saving reaches up to 53.96%. Full article
(This article belongs to the Section Computer Science & Engineering)
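
The occupancy-driven branching can be illustrated with a small helper that inspects the occupancy map under each CU. This is a hedged sketch of the idea only; the downstream decisions (early termination, perception factor, fully connected network) are named in comments but not implemented here.

```python
import numpy as np

def cu_occupancy_class(occupancy_map, x, y, size):
    """Classify a CU as unoccupied, partially occupied, or fully occupied
    from the binary occupancy map generated alongside the geometry video."""
    filled = occupancy_map[y:y + size, x:x + size].mean()
    if filled == 0.0:
        return "unoccupied"   # terminate splitting immediately
    if filled == 1.0:
        return "full"         # defer to the lightweight fully connected network
    return "partial"          # apply the geometric visual-perception factor

occ = np.zeros((256, 256), dtype=np.uint8)
occ[32:96, 32:96] = 1                         # a toy occupied patch
print(cu_occupancy_class(occ, 0, 0, 64))      # partial
print(cu_occupancy_class(occ, 128, 128, 64))  # unoccupied
```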

40 pages, 4388 KB  
Article
Optimized Implementation of YOLOv3-Tiny for Real-Time Image and Video Recognition on FPGA
by Riccardo Calì, Laura Falaschetti and Giorgio Biagetti
Electronics 2025, 14(20), 3993; https://doi.org/10.3390/electronics14203993 - 12 Oct 2025
Viewed by 1946
Abstract
In recent years, the demand for efficient neural networks in embedded contexts has grown, driven by the need for real-time inference with limited resources. While GPUs offer high performance, their size, power consumption, and cost often make them unsuitable for constrained or large-scale applications. FPGAs have therefore emerged as a promising alternative, combining reconfigurability, parallelism, and increasingly favorable cost–performance ratios. They are especially relevant in domains such as robotics, IoT, and autonomous drones, where rapid sensor fusion and low power consumption are critical. This work presents the full implementation of a neural network on a low-cost FPGA, targeting real-time image and video recognition for drone applications. The workflow included training and quantizing a YOLOv3-Tiny model with Brevitas and PyTorch, converting it into hardware logic using the FINN framework, and optimizing the hardware design to maximize use of the reprogrammable silicon area and inference time. A custom driver was also developed to allow the device to operate as a TPU. The resulting accelerator, deployed on a Xilinx Zynq-7020, could recognize 208 frames per second (FPS) when running at a 200 MHz clock frequency, while consuming only 2.55 W. Compared to Google’s Coral Edge TPU, the system offers similar inference speed with greater flexibility, and outperforms other FPGA-based approaches in the literature by a factor of three to seven in terms of FPS/W. Full article
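
The energy-efficiency figure behind the FPS/W comparison follows directly from the reported numbers (208 FPS at 2.55 W):

```python
fps, power_w = 208, 2.55
print(f"{fps / power_w:.1f} FPS/W")   # roughly 81.6 FPS/W for the Zynq-7020 accelerator
```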

12 pages, 15620 KB  
Protocol
A Simple Method for Imaging and Quantifying Respiratory Cilia Motility in Mouse Models
by Richard Francis
Methods Protoc. 2025, 8(5), 113; https://doi.org/10.3390/mps8050113 - 1 Oct 2025
Cited by 1 | Viewed by 878
Abstract
A straightforward ex vivo approach has been developed and refined to enable high-resolution imaging and quantitative assessment of motile cilia function in mouse airway epithelial tissue, allowing critical insights into cilia motility and cilia generated flow using different mouse models or following different sample treatments. In this method, freshly excised mouse trachea is cut longitudinally through the trachealis muscle which is then sandwiched between glass coverslips within a thin silicon gasket. By orienting the tissue along its longitudinal axis, the natural curling of the trachealis muscle helps maintain the sample in a configuration optimal for imaging along the full tracheal length. High-speed video microscopy, utilizing differential interference contrast (DIC) optics and a fast digital camera capturing at >200 frames per second is then used to record ciliary motion. This enables detailed measurement of both cilia beat frequency (CBF) and waveform characteristics. The application of 1 µm microspheres to the bathing media during imaging allows for additional analysis of fluid flow generated by ciliary activity. The entire procedure typically takes around 40 min to complete per animal: ~30 min for tissue harvest and sample mounting, then ~10 min for imaging samples and acquiring data. Full article
(This article belongs to the Section Biomedical Sciences and Physiology)
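
Cilia beat frequency is commonly read off as the dominant spectral peak of an ROI intensity trace from the high-speed recording. The NumPy sketch below illustrates that step on a synthetic 200 fps signal; the 12 Hz test frequency and noise level are arbitrary assumptions, not values from the protocol.

```python
import numpy as np

fps = 200.0                        # camera frame rate (>200 fps in this protocol)
t = np.arange(0, 2.0, 1.0 / fps)   # a 2 s recording
trace = np.sin(2 * np.pi * 12.0 * t) + 0.3 * np.random.randn(t.size)  # synthetic ROI intensity

spectrum = np.abs(np.fft.rfft(trace - trace.mean()))
freqs = np.fft.rfftfreq(trace.size, d=1.0 / fps)
cbf = freqs[spectrum.argmax()]
print(f"estimated CBF: {cbf:.1f} Hz")   # ~12 Hz for the synthetic trace
```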

22 pages, 1799 KB  
Article
A Novel Hybrid Deep Learning–Probabilistic Framework for Real-Time Crash Detection from Monocular Traffic Video
by Reşat Buğra Erkartal and Atınç Yılmaz
Appl. Sci. 2025, 15(19), 10523; https://doi.org/10.3390/app151910523 - 29 Sep 2025
Viewed by 3233
Abstract
The rapid evolution of autonomous vehicle technologies has amplified the need for crash detection that operates robustly under complex traffic conditions with minimal latency. We propose a hybrid temporal hierarchy that augments a Region-based Convolutional Neural Network (R-CNN) with an adaptive time-variant Kalman filter (with total-variation prior), a Hidden Markov Model (HMM) for state stabilization, and a lightweight Artificial Neural Network (ANN) for learned temporal refinement, enabling real-time crash detection from monocular video. Evaluated on simulated traffic in CARLA and real-world driving in Istanbul, the full temporal stack achieves the best precision–recall balance, yielding 83.47% F1 offline and 82.57% in real time (corresponding to 94.5% and 91.2% detection accuracy, respectively). Ablations are consistent and interpretable: removing the HMM reduces F1 by 1.85–2.16 percentage points (pp), whereas removing the ANN has a larger impact of 2.94–4.58 pp, indicating that the ANN provides the largest marginal gains—especially under real-time constraints. The transition from offline to real time incurs a modest overall loss (−0.90 pp F1), driven more by recall than precision. Compared to strong single-frame baselines, YOLOv10 attains 82.16% F1 and a real-time Transformer detector reaches 82.41% F1, while our full temporal stack remains slightly ahead in real time and offers a more favorable precision–recall trade-off. Notably, integrating the ANN into the HMM-based pipeline improves accuracy by 2.2%, while the time-variant Kalman configuration reduces detection lag by approximately 0.5 s—an improvement that directly addresses the human reaction time gap. Under identical conditions, the best RCNN-based configuration yields AP@0.50 ≈ 0.79 with an end-to-end latency of 119 ± 21 ms per frame (~8–9 FPS). Overall, coupling deep learning with probabilistic reasoning yields additive temporal benefits and advances deployable, camera-only crash detection that is cost-efficient and scalable for intelligent transportation systems. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
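
The temporal stack can be pictured as per-frame detector scores smoothed before a crash/no-crash decision. The sketch below uses a one-dimensional Kalman filter on a synthetic score sequence as a simplified stand-in; it is not the paper's adaptive time-variant filter, HMM, or ANN refinement, and the noise parameters are assumptions.

```python
import numpy as np

def scalar_kalman(scores, q=1e-3, r=1e-1):
    """Smooth a noisy per-frame crash score with a 1-D Kalman filter."""
    x, p, out = scores[0], 1.0, []
    for z in scores:
        p = p + q               # predict
        k = p / (p + r)         # Kalman gain
        x = x + k * (z - x)     # update with the new frame score
        p = (1 - k) * p
        out.append(x)
    return np.array(out)

# Synthetic score trace: low scores, then a simulated crash from frame 60 on.
raw = np.clip(np.random.rand(100) * 0.3 + np.r_[np.zeros(60), np.ones(40)] * 0.6, 0, 1)
smoothed = scalar_kalman(raw)
print(int((smoothed > 0.5).sum()), "frames flagged as crash")
```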

31 pages, 19915 KB  
Article
Tracking-Based Denoising: A Trilateral Filter-Based Denoiser for Real-World Surveillance Video in Extreme Low-Light Conditions
by He Jiang, Peilin Wu, Zhou Zheng, Hao Gu, Fudi Yi, Wen Cui and Chen Lv
Sensors 2025, 25(17), 5567; https://doi.org/10.3390/s25175567 - 6 Sep 2025
Viewed by 1287
Abstract
Video denoising in extremely low-light surveillance scenarios is a challenging task in computer vision, as it suffers from harsh noise and insufficient signal to reconstruct fine details. The denoising algorithm for these scenarios encounters challenges such as the lack of ground truth, and the noise distribution in the real world is far more complex than in a normal scene. Consequently, recent state-of-the-art (SOTA) methods like VRT and Turtle for video denoising perform poorly in this low-light environment. Additionally, some methods rely on raw video data, which is difficult to obtain from surveillance systems. In this paper, a denoising method is proposed based on the trilateral filter, which aims to denoise real-world low-light surveillance videos. Our trilateral filter is a weighted filter, allocating reasonable weights to different inputs to produce an appropriate output. Our idea is inspired by an experimental finding: noise on stationary objects can be easily suppressed by averaging adjacent frames. This led us to believe that if we can track moving objects accurately and filter along their trajectories, the noise may be effectively removed. Our proposed method involves four main steps. First, coarse motion vectors are obtained by bilateral search. Second, an amplitude-phase filter is used to judge and correct erroneous vectors. Third, these vectors are refined by a full search in a small area for greater accuracy. Finally, the trilateral filter is applied along the trajectory to denoise the noisy frame. Extensive experiments have demonstrated that our method achieves superior performance in terms of visual effects and quantitative tests. Full article
(This article belongs to the Section Sensing and Imaging)
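
The core of the tracking-based denoiser, averaging each patch along its motion trajectory with weights that decay with temporal distance and intensity difference, can be sketched as below. The weighting terms, parameters, and synthetic patches are illustrative assumptions rather than the paper's exact trilateral filter.

```python
import numpy as np

def trajectory_filter(patches, sigma_i=10.0, sigma_t=2.0):
    """Weighted temporal average of patches gathered along a tracked
    trajectory; patches[k] is the co-located patch in frame t-k."""
    ref = patches[0].astype(np.float32)
    num = np.zeros_like(ref)
    den = np.zeros_like(ref)
    for k, p in enumerate(patches):
        p = p.astype(np.float32)
        w_t = np.exp(-(k ** 2) / (2 * sigma_t ** 2))         # temporal weight
        w_i = np.exp(-((p - ref) ** 2) / (2 * sigma_i ** 2))  # intensity weight
        num += w_t * w_i * p
        den += w_t * w_i
    return num / den

noisy = [np.full((8, 8), 50.0) + np.random.randn(8, 8) * 5 for _ in range(5)]
print(round(float(trajectory_filter(noisy).std()), 2))  # below the per-frame noise std of 5
```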

15 pages, 2479 KB  
Article
Inter- and Intraobserver Variability in Bowel Preparation Scoring for Colon Capsule Endoscopy: Impact of AI-Assisted Assessment Feasibility Study
by Ian Io Lei, Daniel R. Gaya, Alexander Robertson, Benedicte Schelde-Olesen, Alice Mapiye, Anirudh Bhandare, Bei Bei Lui, Chander Shekhar, Ursula Valentiner, Pere Gilabert, Pablo Laiz, Santi Segui, Nicholas Parsons, Cristiana Huhulea, Hagen Wenzek, Elizabeth White, Anastasios Koulaouzidis and Ramesh P. Arasaradnam
Cancers 2025, 17(17), 2840; https://doi.org/10.3390/cancers17172840 - 29 Aug 2025
Viewed by 957
Abstract
Background: Colon capsule endoscopy (CCE) has seen increased adoption since the COVID-19 pandemic, offering a non-invasive alternative for lower gastrointestinal investigations. However, inadequate bowel preparation remains a key limitation, often leading to higher conversion rates to colonoscopy. Manual assessment of bowel cleanliness is inherently subjective and marked by high interobserver variability. Recent advances in artificial intelligence (AI) have enabled automated cleansing scores that not only standardise assessment and reduce variability but also align with the emerging semi-automated AI reading workflow, which highlights only clinically significant frames. As full video review becomes less routine, reliable, and consistent, cleansing evaluation is essential, positioning bowel preparation AI as a critical enabler of diagnostic accuracy and scalable CCE deployment. Objective: This CESCAIL sub-study aimed to (1) evaluate interobserver agreement in CCE bowel cleansing assessment using two established scoring systems, and (2) determine the impact of AI-assisted scoring, specifically a TransUNet-based segmentation model with a custom Patch Loss function, on both interobserver and intraobserver agreement compared to manual assessment. Methods: As part of the CESCAIL study, twenty-five CCE videos were randomly selected from 673 participants. Nine readers with varying CCE experience scored bowel cleanliness using the Leighton–Rex and CC-CLEAR scales. After a minimum 8-week washout, the same readers reassessed the videos using AI-assisted CC-CLEAR scores. Interobserver variability was evaluated using bootstrapped intraclass correlation coefficients (ICC) and Fleiss’ Kappa; intraobserver variability was assessed with weighted Cohen’s Kappa, paired t-tests, and Two One-Sided Tests (TOSTs). Results: Leighton–Rex showed poor to fair agreement (Fleiss = 0.14; ICC = 0.55), while CC-CLEAR demonstrated fair to excellent agreement (Fleiss = 0.27; ICC = 0.90). AI-assisted CC-CLEAR achieved only moderate agreement overall (Fleiss = 0.27; ICC = 0.69), with weaker performance among less experienced readers (Fleiss = 0.15; ICC = 0.56). Intraobserver agreement was excellent (ICC > 0.75) for experienced readers but variable in others (ICC 0.03–0.80). AI-assisted scores were significantly lower than manual reads by 1.46 points (p < 0.001), potentially increasing conversion to colonoscopy. Conclusions: AI-assisted scoring did not improve interobserver agreement and may even reduce consistency amongst less experienced readers. The maintained agreement observed in experienced readers highlights its current value in experienced hands only. Further refinement, including spatial analysis integration, is needed for robust overall AI implementation in CCE. Full article
(This article belongs to the Section Methods and Technologies Development)
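
The agreement statistics reported here are standard; as a minimal illustration, quadratic-weighted Cohen's kappa for one reader's paired reads can be computed with scikit-learn. The score vectors below are invented for illustration and do not reflect study data.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical cleansing scores for one reader: manual read vs. AI-assisted
# re-read after the washout period (values are made up).
manual = [8, 7, 9, 5, 6, 8, 4, 7, 9, 6]
ai_assisted = [7, 6, 9, 4, 5, 7, 3, 6, 8, 5]

kappa = cohen_kappa_score(manual, ai_assisted, weights="quadratic")
print(f"weighted Cohen's kappa: {kappa:.2f}")
```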

21 pages, 1192 KB  
Article
Video Stabilization Algorithm Based on View Boundary Synthesis
by Wenchao Shan, Hejing Zhao, Xin Li, Qian Huang, Chuanxu Jiang, Yiming Wang, Ziqi Chen and Yao Tong
Symmetry 2025, 17(8), 1351; https://doi.org/10.3390/sym17081351 - 19 Aug 2025
Viewed by 1182
Abstract
Video stabilization is a critical technology for enhancing visual content quality in dynamic shooting scenarios, especially with the widespread adoption of mobile photography devices and Unmanned Aerial Vehicle (UAV) platforms. While traditional digital stabilization algorithms can improve frame stability by modeling global motion trajectories, they often suffer from excessive cropping or boundary distortion, leading to a significant loss of valid image regions. To address this persistent challenge, we propose the View Out-boundary Synthesis Algorithm (VOSA), a symmetry-aware spatio-temporal consistency framework. By leveraging rotational and translational symmetry principles in motion dynamics, VOSA realizes optical flow field extrapolation through an encoder–decoder architecture and an iterative boundary extension strategy. Experimental results demonstrate that VOSA enhances conventional stabilization by increasing content retention by 6.3% while maintaining a 0.943 distortion score, outperforming mainstream methods in dynamic environments. The symmetry-informed design resolves stability–content conflicts, establishing a new paradigm for full-frame stabilization. Full article
(This article belongs to the Special Issue Symmetry/Asymmetry in Image Processing and Computer Vision)

10 pages, 1055 KB  
Article
Artificial Intelligence and Hysteroscopy: A Multicentric Study on Automated Classification of Pleomorphic Lesions
by Miguel Mascarenhas, Carla Peixoto, Ricardo Freire, Joao Cavaco Gomes, Pedro Cardoso, Inês Castro, Miguel Martins, Francisco Mendes, Joana Mota, Maria João Almeida, Fabiana Silva, Luis Gutierres, Bruno Mendes, João Ferreira, Teresa Mascarenhas and Rosa Zulmira
Cancers 2025, 17(15), 2559; https://doi.org/10.3390/cancers17152559 - 3 Aug 2025
Viewed by 1006
Abstract
Background/Objectives: The integration of artificial intelligence (AI) in medical imaging is rapidly advancing, yet its application in gynecologic use remains limited. This proof-of-concept study presents the development and validation of a convolutional neural network (CNN) designed to automatically detect and classify endometrial polyps. Methods: A multicenter dataset (n = 3) comprising 65 hysteroscopies was used, yielding 33,239 frames and 37,512 annotated objects. Still frames were extracted from full-length videos and annotated for the presence of histologically confirmed polyps. A YOLOv1-based object detection model was used with a 70–20–10 split for training, validation, and testing. Primary performance metrics included recall, precision, and mean average precision at an intersection over union (IoU) ≥ 0.50 (mAP50). Frame-level classification metrics were also computed to evaluate clinical applicability. Results: The model achieved a recall of 0.96 and precision of 0.95 for polyp detection, with a mAP50 of 0.98. At the frame level, mean recall was 0.75, precision 0.98, and F1 score 0.82, confirming high detection and classification performance. Conclusions: This study presents a CNN trained on multicenter, real-world data that detects and classifies polyps simultaneously with high diagnostic and localization performance, supported by explainable AI features that enhance its clinical integration and technological readiness. Although currently limited to binary classification, this study demonstrates the feasibility and potential of AI to reduce diagnostic subjectivity and inter-observer variability in hysteroscopy. Future work will focus on expanding the model’s capabilities to classify a broader range of endometrial pathologies, enhance generalizability, and validate performance in real-time clinical settings. Full article
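
The localization metrics hinge on the IoU ≥ 0.50 criterion behind mAP50; a minimal IoU helper with invented box coordinates is sketched below.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

pred, gt = (100, 120, 220, 240), (110, 130, 230, 250)   # hypothetical polyp boxes
print(iou(pred, gt) >= 0.50)   # True: counts as a correct detection at mAP50
```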

20 pages, 4569 KB  
Article
Lightweight Vision Transformer for Frame-Level Ergonomic Posture Classification in Industrial Workflows
by Luca Cruciata, Salvatore Contino, Marianna Ciccarelli, Roberto Pirrone, Leonardo Mostarda, Alessandra Papetti and Marco Piangerelli
Sensors 2025, 25(15), 4750; https://doi.org/10.3390/s25154750 - 1 Aug 2025
Cited by 2 | Viewed by 1300
Abstract
Work-related musculoskeletal disorders (WMSDs) are a leading concern in industrial ergonomics, often stemming from sustained non-neutral postures and repetitive tasks. This paper presents a vision-based framework for real-time, frame-level ergonomic risk classification using a lightweight Vision Transformer (ViT). The proposed system operates directly on raw RGB images without requiring skeleton reconstruction, joint angle estimation, or image segmentation. A single ViT model simultaneously classifies eight anatomical regions, enabling efficient multi-label posture assessment. Training is supervised using a multimodal dataset acquired from synchronized RGB video and full-body inertial motion capture, with ergonomic risk labels derived from RULA scores computed on joint kinematics. The system is validated on realistic, simulated industrial tasks that include common challenges such as occlusion and posture variability. Experimental results show that the ViT model achieves state-of-the-art performance, with F1-scores exceeding 0.99 and AUC values above 0.996 across all regions. Compared to previous CNN-based system, the proposed model improves classification accuracy and generalizability while reducing complexity and enabling real-time inference on edge devices. These findings demonstrate the model’s potential for unobtrusive, scalable ergonomic risk monitoring in real-world manufacturing environments. Full article
(This article belongs to the Special Issue Secure and Decentralised IoT Systems)
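
The single-model, multi-region output can be pictured as a multi-label head on top of a ViT backbone. The sketch below uses torchvision's vit_b_16 purely as a stand-in backbone with one output per anatomical region; the paper's lightweight architecture and RULA-derived risk encoding are not reproduced, and the binary per-region labels are an assumption.

```python
import torch
import torch.nn as nn
from torchvision.models import vit_b_16

# Stand-in backbone with an 8-way multi-label head (one logit per region).
model = vit_b_16(weights=None)
model.heads.head = nn.Linear(model.heads.head.in_features, 8)

frames = torch.randn(4, 3, 224, 224)                 # a batch of raw RGB frames
region_risk = torch.randint(0, 2, (4, 8)).float()    # hypothetical per-region risk flags

logits = model(frames)                               # (4, 8)
loss = nn.BCEWithLogitsLoss()(logits, region_risk)
loss.backward()
```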

23 pages, 6358 KB  
Article
Optimization of Sorghum Spike Recognition Algorithm and Yield Estimation
by Mengyao Han, Jian Gao, Cuiqing Wu, Qingliang Cui, Xiangyang Yuan and Shujin Qiu
Agronomy 2025, 15(7), 1526; https://doi.org/10.3390/agronomy15071526 - 23 Jun 2025
Viewed by 675
Abstract
In the natural field environment, the high planting density of sorghum and severe occlusion among spikes substantially increases the difficulty of sorghum spike recognition, resulting in frequent false positives and false negatives. The target detection model suitable for this environment requires high computational power, and it is difficult to realize real-time detection of sorghum spikes on mobile devices. This study proposes a detection-tracking scheme based on improved YOLOv8s-GOLD-LSKA with optimized DeepSort, aiming to enhance yield estimation accuracy in complex agricultural field scenarios. By integrating the GOLD module’s dual-branch multi-scale feature fusion and the LSKA attention mechanism, a lightweight detection model is developed. The improved DeepSort algorithm enhances tracking robustness in occlusion scenarios by optimizing the confidence threshold filtering (0.46), frame-skipping count, and cascading matching strategy (n = 3, max_age = 40). Combined with the five-point sampling method, the average dry weight of sorghum spikes (0.12 kg) was used to enable rapid yield estimation. The results demonstrate that the improved model achieved a mAP of 85.86% (a 6.63% increase over the original YOLOv8), an F1 score of 81.19%, and a model size reduced to 7.48 MB, with a detection speed of 0.0168 s per frame. The optimized tracking system attained a MOTA of 67.96% and ran at 42 FPS. Image- and video-based yield estimation accuracies reached 89–96% and 75–93%, respectively, with single-frame latency as low as 0.047 s. By optimizing the full detection–tracking–yield pipeline, this solution overcomes challenges in small object missed detections, ID switches under occlusion, and real-time processing in complex scenarios. Its lightweight, high-efficiency design is well suited for deployment on UAVs and mobile terminals, providing robust technical support for intelligent sorghum monitoring and precision agriculture management, and thereby playing a crucial role in driving agricultural digital transformation. Full article
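
Once the tracker supplies a unique spike count, the yield step reduces to simple arithmetic; the count below is hypothetical, while the 0.12 kg average dry weight is the value quoted in the abstract.

```python
avg_spike_dry_weight_kg = 0.12   # average spike dry weight from five-point sampling
spike_count = 812                # hypothetical number of unique DeepSort track IDs
print(f"estimated yield: {spike_count * avg_spike_dry_weight_kg:.1f} kg")
```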

22 pages, 20735 KB  
Article
High-Throughput ORB Feature Extraction on Zynq SoC for Real-Time Structure-from-Motion Pipelines
by Panteleimon Stamatakis and John Vourvoulakis
J. Imaging 2025, 11(6), 178; https://doi.org/10.3390/jimaging11060178 - 28 May 2025
Viewed by 1410
Abstract
This paper presents a real-time system for feature detection and description, the first stage in a structure-from-motion (SfM) pipeline. The proposed system leverages an optimized version of the ORB algorithm (oriented FAST and rotated BRIEF) implemented on the Digilent Zybo Z7020 FPGA board equipped with the Xilinx Zynq-7000 SoC. The system accepts real-time video input (60 fps, 1920 × 1080 resolution, 24-bit color) via HDMI or a camera module. In order to support high frame rates for full-HD images, a double-data-rate pipeline scheme was adopted for Harris functions. Gray-scale video with features identified in red is exported through a separate HDMI port. Feature descriptors are calculated inside the FPGA by Zynq’s programmable logic and verified using Xilinx’s ILA IP block on a connected computer running Vivado. The implemented system achieves a latency of 192.7 microseconds, which is suitable for real-time applications. The proposed architecture is evaluated in terms of repeatability, matching retention and matching accuracy in several image transformations. It meets satisfactory accuracy and performance considering that there are slight changes between successive frames. This work paves the way for future research on the implementation of the remaining stages of a real-time SfM pipeline on the proposed hardware platform. Full article
(This article belongs to the Special Issue Recent Techniques in Image Feature Extraction)
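
As a software reference point for the stage being accelerated, ORB detection and description on a single full-HD grayscale frame looks as follows in OpenCV; this baseline is not the FPGA pipeline itself, and the random frame is a stand-in for camera input.

```python
import cv2
import numpy as np

frame = np.random.randint(0, 256, (1080, 1920), dtype=np.uint8)  # stand-in full-HD frame
orb = cv2.ORB_create(nfeatures=1000)            # oriented FAST + rotated BRIEF
keypoints, descriptors = orb.detectAndCompute(frame, None)
print(len(keypoints), None if descriptors is None else descriptors.shape)
```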

21 pages, 4777 KB  
Article
Harnessing Semantic and Trajectory Analysis for Real-Time Pedestrian Panic Detection in Crowded Micro-Road Networks
by Rongyong Zhao, Lingchen Han, Yuxin Cai, Bingyu Wei, Arifur Rahman, Cuiling Li and Yunlong Ma
Appl. Sci. 2025, 15(10), 5394; https://doi.org/10.3390/app15105394 - 12 May 2025
Viewed by 984
Abstract
Pedestrian panic behavior is a primary cause of overcrowding and stampede accidents in public micro-road network areas with high pedestrian density. However, reliably detecting such behaviors remains challenging due to their inherent complexity, variability, and stochastic nature. Current detection models often rely on single-modality features, which limits their effectiveness in complex and dynamic crowd scenarios. To overcome these limitations, this study proposes a contour-driven multimodal framework that first employs a CNN (CDNet) to estimate density maps and, by analyzing steep contour gradients, automatically delineates a candidate panic zone. Within these potential panic zones, pedestrian trajectories are analyzed through LSTM networks to capture irregular movements, such as counterflow and nonlinear wandering behaviors. Concurrently, semantic recognition based on Transformer models is utilized to identify verbal distress cues extracted through Baidu AI’s real-time speech-to-text conversion. The three embeddings are fused through a lightweight attention-enhanced MLP, enabling end-to-end inference at 40 FPS on a single GPU. To evaluate branch robustness under streaming conditions, the UCF Crowd dataset (150 videos without panic labels) is processed frame-by-frame at 25 FPS solely for density assessment, whereas full panic detection is validated on 30 real Itaewon-Stampede videos and 160 SUMO/Unity simulated emergencies that include explicit panic annotations. The proposed system achieves 91.7% accuracy and 88.2% F1 on the Itaewon set, outperforming all single- or dual-modality baselines and offering a deployable solution for proactive crowd safety monitoring in transport hubs, festivals, and other high-risk venues. Full article
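
The fusion stage, three modality embeddings weighted by a lightweight attention mechanism before an MLP classifier, can be sketched as follows; the embedding sizes, attention form, and two-class output are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class AttentionFusionMLP(nn.Module):
    """Fuse density, trajectory, and semantic embeddings with learned
    per-modality attention weights, then classify panic / no panic."""
    def __init__(self, dims=(64, 64, 64), hidden=128):
        super().__init__()
        self.att = nn.ModuleList(nn.Linear(d, 1) for d in dims)
        self.mlp = nn.Sequential(nn.Linear(sum(dims), hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2))

    def forward(self, embeddings):   # [density, trajectory, semantic], each (B, d_i)
        scores = torch.cat([a(e) for a, e in zip(self.att, embeddings)], dim=1)
        w = torch.softmax(scores, dim=1)                       # per-modality weights
        fused = torch.cat([w[:, i:i + 1] * e for i, e in enumerate(embeddings)], dim=1)
        return self.mlp(fused)

density, trajectory, semantic = (torch.randn(8, 64) for _ in range(3))
logits = AttentionFusionMLP()([density, trajectory, semantic])   # (8, 2)
```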

20 pages, 3071 KB  
Article
A Keyframe Extraction Method for Assembly Line Operation Videos Based on Optical Flow Estimation and ORB Features
by Xiaoyu Gao, Hua Xiang, Tongxi Wang, Wei Zhan, Mengxue Xie, Lingxuan Zhang and Muyu Lin
Sensors 2025, 25(9), 2677; https://doi.org/10.3390/s25092677 - 23 Apr 2025
Viewed by 2024
Abstract
In modern manufacturing, cameras are widely used to record the full workflow of assembly line workers, enabling video-based operational analysis and management. However, these recordings are often excessively long, leading to high storage demands and inefficient processing. Existing keyframe extraction methods typically apply uniform strategies across all frames, which are ineffective in detecting subtle movements. To address this, we propose a keyframe extraction method tailored for assembly line videos, combining optical flow estimation with ORB-based visual features. Our approach adapts extraction strategies to actions with different motion amplitudes. Each video frame is first encoded into a feature vector using the ORB algorithm and a bag-of-visual-words model. Optical flow is then calculated using the DIS algorithm, allowing frames to be categorized by motion intensity. Adjacent frames within the same category are grouped, and the appropriate number of clusters, k, is determined based on the group’s characteristics. Keyframes are finally selected via k-means++ clustering within each group. The experimental results show that our method achieves a recall rate of 85.2%, with over 90% recall for actions involving minimal movement. Moreover, the method processes an average of 274 frames per second. These results highlight the method’s effectiveness in identifying subtle actions, reducing redundant content, and delivering high accuracy with efficient performance. Full article
(This article belongs to the Section Sensing and Imaging)
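
Two of the building blocks named above, DIS optical flow for motion grouping and k-means++ clustering of bag-of-visual-words vectors for keyframe selection, can be sketched with OpenCV and scikit-learn. The BoVW vectors, vocabulary size, and cluster count below are placeholders; the grouping rule and ORB vocabulary construction are omitted.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

dis = cv2.DISOpticalFlow_create(cv2.DISOPTICAL_FLOW_PRESET_FAST)

def motion_magnitude(prev_gray, gray):
    """Mean dense-flow magnitude between consecutive frames (DIS algorithm)."""
    flow = dis.calc(prev_gray, gray, None)
    return float(np.linalg.norm(flow, axis=2).mean())

prev = np.random.randint(0, 256, (480, 640), dtype=np.uint8)   # stand-in frames
curr = np.roll(prev, 3, axis=1)
print(round(motion_magnitude(prev, curr), 2))

# Hypothetical ORB bag-of-visual-words vectors for one motion-intensity group.
bovw = np.random.rand(120, 256).astype(np.float32)
k = 3                                                          # clusters for this group
km = KMeans(n_clusters=k, init="k-means++", n_init=10).fit(bovw)
keyframes = [int(np.argmin(np.linalg.norm(bovw - c, axis=1))) for c in km.cluster_centers_]
print(sorted(keyframes))                                       # selected keyframe indices
```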
