Search Results (155)

Search Parameters:
Keywords = YOLO-D

13 pages, 769 KiB  
Article
A Novel You Only Listen Once (YOLO) Deep Learning Model for Automatic Prominent Bowel Sounds Detection: Feasibility Study in Healthy Subjects
by Rohan Kalahasty, Gayathri Yerrapragada, Jieun Lee, Keerthy Gopalakrishnan, Avneet Kaur, Pratyusha Muddaloor, Divyanshi Sood, Charmy Parikh, Jay Gohri, Gianeshwaree Alias Rachna Panjwani, Naghmeh Asadimanesh, Rabiah Aslam Ansari, Swetha Rapolu, Poonguzhali Elangovan, Shiva Sankari Karuppiah, Vijaya M. Dasari, Scott A. Helgeson, Venkata S. Akshintala and Shivaram P. Arunachalam
Sensors 2025, 25(15), 4735; https://doi.org/10.3390/s25154735 - 31 Jul 2025
Abstract
Accurate diagnosis of gastrointestinal (GI) diseases typically requires invasive procedures or imaging studies that pose the risk of various post-procedural complications or involve radiation exposure. Bowel sounds (BSs), though typically described during a GI-focused physical exam, are highly inaccurate and variable, with low clinical value in diagnosis. Interpretation of the acoustic characteristics of BSs, i.e., using a phonoenterogram (PEG), may aid in diagnosing various GI conditions non-invasively. Use of artificial intelligence (AI) and improvements in computational analysis can enhance the use of PEGs in different GI diseases and lead to a non-invasive, cost-effective diagnostic modality that has not been explored before. The purpose of this work was to develop an automated AI model, You Only Listen Once (YOLO), to detect prominent bowel sounds that can enable real-time analysis for future GI disease detection and diagnosis. A total of 110 two-minute PEGs sampled at 44.1 kHz were recorded using the Eko DUO® stethoscope from eight healthy volunteers at two locations, namely, the left upper quadrant (LUQ) and right lower quadrant (RLQ), after IRB approval. The datasets were annotated by trained physicians, who categorized BSs as prominent or obscure using version 1.7 of Label Studio Software®. Each BS recording was split into 375 ms segments with 200 ms overlap for real-time BS detection. Each segment was binned based on whether it contained a prominent BS, resulting in a dataset of 36,149 non-prominent segments and 6435 prominent segments. The dataset was divided into training, validation, and test sets (60/20/20% split). A 1D-CNN-augmented transformer was trained to classify these segments from Mel-frequency cepstral coefficient inputs. The developed AI model achieved an area under the receiver operating characteristic (ROC) curve of 0.92, accuracy of 86.6%, precision of 86.85%, and recall of 86.08%. These creditable performance metrics signify the YOLO model's capability to classify prominent bowel sounds that can be further analyzed for various GI diseases. This proof-of-concept study in healthy volunteers demonstrates that automated BS detection can pave the way for more intuitive and efficient AI-PEG devices that can be trained and utilized to diagnose various GI conditions. To ensure the robustness and generalizability of these findings, further investigations encompassing a broader cohort, inclusive of both healthy and disease states, are needed.
(This article belongs to the Special Issue Biomedical Signals, Images and Healthcare Data Analysis: 2nd Edition)
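The preprocessing pipeline the abstract describes (375 ms windows with 200 ms overlap, each reduced to Mel-frequency cepstral coefficients) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the use of librosa and the coefficient count (n_mfcc=13) are assumptions.

```python
import librosa
import numpy as np

SR = 44_100                        # sampling rate reported in the abstract
WIN = int(0.375 * SR)              # 375 ms analysis window
HOP = int((0.375 - 0.200) * SR)    # 175 ms step -> 200 ms overlap

def segment_and_mfcc(path, n_mfcc=13):
    """Split one phonoenterogram into overlapping segments and compute MFCCs."""
    audio, _ = librosa.load(path, sr=SR, mono=True)
    features = []
    for start in range(0, len(audio) - WIN + 1, HOP):
        seg = audio[start:start + WIN]
        # One MFCC matrix per 375 ms segment, the classifier's input.
        features.append(librosa.feature.mfcc(y=seg, sr=SR, n_mfcc=n_mfcc))
    return np.stack(features)
```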

22 pages, 6487 KiB  
Article
An RGB-D Vision-Guided Robotic Depalletizing System for Irregular Camshafts with Transformer-Based Instance Segmentation and Flexible Magnetic Gripper
by Runxi Wu and Ping Yang
Actuators 2025, 14(8), 370; https://doi.org/10.3390/act14080370 - 24 Jul 2025
Viewed by 257
Abstract
Accurate segmentation of densely stacked and weakly textured objects remains a core challenge in robotic depalletizing for industrial applications. To address this, we propose MaskNet, an instance segmentation network tailored for RGB-D input, designed to enhance recognition performance under occlusion and low-texture conditions. Built upon a Vision Transformer backbone, MaskNet adopts a dual-branch architecture for the RGB and depth modalities and integrates multi-modal features using an attention-based fusion module. Spatial and channel attention mechanisms further refine feature representation and improve instance-level discrimination. The segmentation outputs are used in conjunction with regional depth to optimize the grasping sequence. Experimental evaluations on camshaft depalletizing tasks demonstrate that MaskNet achieves a precision of 0.980, a recall of 0.971, and an F1-score of 0.975, outperforming a YOLO11-based baseline. In a real-world scenario, with a self-designed flexible magnetic gripper, the system keeps the maximum grasping error within 9.85 mm and achieves a 98% task success rate across multiple camshaft types. These results validate the effectiveness of MaskNet in enabling fine-grained perception for robotic manipulation in cluttered, real-world scenarios.
(This article belongs to the Section Actuators for Robotics)
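Because MaskNet's exact fusion design is not spelled out here, the attention-based combination of the RGB and depth branches can only be illustrated speculatively; the gating module below is a minimal sketch with assumed names and dimensions.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Hypothetical gate that blends RGB and depth features per channel."""
    def __init__(self, dim=256):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * dim, dim, kernel_size=1),  # predict per-channel weights
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat, depth_feat):
        # rgb_feat, depth_feat: (B, dim, H, W) from the two backbone branches.
        w = self.gate(torch.cat([rgb_feat, depth_feat], dim=1))
        return w * rgb_feat + (1.0 - w) * depth_feat

fused = AttentionFusion()(torch.randn(1, 256, 32, 32), torch.randn(1, 256, 32, 32))
```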

20 pages, 3688 KiB  
Article
Intelligent Fruit Localization and Grasping Method Based on YOLO VX Model and 3D Vision
by Zhimin Mei, Yifan Li, Rongbo Zhu and Shucai Wang
Agriculture 2025, 15(14), 1508; https://doi.org/10.3390/agriculture15141508 - 13 Jul 2025
Viewed by 493
Abstract
Recent years have seen significant interest among agricultural researchers in using robotics and machine vision to enhance intelligent orchard harvesting efficiency. This study proposes an improved hybrid framework integrating YOLO VX deep learning, 3D object recognition, and SLAM-based navigation for harvesting ripe fruits in greenhouse environments, achieving servo control of robotic arms with flexible end-effectors. The method comprises three key components. First, a fruit sample database containing varying maturity levels and morphological features is established and interfaced with an optimized YOLO VX model for target fruit identification. Second, a 3D camera acquires the target fruit's spatial position and orientation data in real time, and these data are stored in the collaborative robot's microcontroller. Finally, employing binocular calibration and triangulation, the SLAM navigation module guides the robotic arm to the designated picking location via unobstructed target positioning. Comprehensive comparative experiments between the improved YOLO v12n model and earlier versions were conducted to validate its performance. The results demonstrate that the optimized model surpasses traditional recognition and harvesting methods, offering faster target fruit identification response (minimum 30.9 ms) and significantly higher accuracy (91.14%).
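Any RGB-D localization stage like the one described ultimately converts a detected pixel and its depth reading into a 3D position through the pinhole camera model. A minimal illustration (the intrinsic parameters are placeholders, not values from the paper):

```python
def pixel_to_camera_xyz(u, v, depth_m, fx=610.0, fy=610.0, cx=320.0, cy=240.0):
    """Back-project pixel (u, v) with depth (metres) into camera coordinates."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)

# e.g. a detection centred at pixel (350, 200) at 0.85 m depth:
print(pixel_to_camera_xyz(350, 200, 0.85))
```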

30 pages, 25636 KiB  
Article
Cluster-Based Flight Path Construction for Drone-Assisted Pear Pollination Using RGB-D Image Processing
by Arata Kuwahara, Tomotaka Kimura, Sota Okubo, Rion Yoshioka, Keita Endo, Hiroyuki Shimizu, Tomohito Shimada, Chisa Suzuki, Yoshihiro Takemura and Takefumi Hiraguri
Drones 2025, 9(7), 475; https://doi.org/10.3390/drones9070475 - 4 Jul 2025
Viewed by 333
Abstract
This paper proposes a cluster-based flight path construction method for automated drone-assisted pear pollination systems in orchard environments. The approach uses RGB-D (Red-Green-Blue-Depth) sensing through an observation drone equipped with RGB and depth cameras to detect blooming pear flowers. Flower detection is performed using a YOLO (You Only Look Once)-based object detection algorithm, and three-dimensional flower positions are estimated by integrating depth information with the drone's position and orientation data in the east-north-up coordinate system. To enhance pollination efficiency, the method applies the OPTICS (Ordering Points To Identify the Clustering Structure) algorithm to group detected flowers into spatial-proximity clusters that correspond to branch-level distributions. The cluster centroids are then used to construct a collision-free flight path, with offset vectors ensuring safe navigation and appropriate nozzle orientation for effective pollen spraying. Field experiments conducted using RTK-GNSS-based flight control confirmed the accuracy and stability of the generated flight trajectories. The drone hovered in front of each flower cluster and performed uniform spraying along the planned path. The method achieved a fruit set rate of 62.1%, exceeding the 53.6% of natural pollination and comparable to the 61.9% of manual pollination. These results demonstrate the effectiveness and practicality of the method for real-world deployment in pear orchards.
(This article belongs to the Special Issue UAS in Smart Agriculture: 2nd Edition)
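The clustering step is standard enough to sketch: group estimated 3D flower positions with OPTICS and take cluster centroids as waypoint candidates. A minimal scikit-learn version (min_samples and max_eps are illustrative, not the paper's settings):

```python
import numpy as np
from sklearn.cluster import OPTICS

flowers_enu = np.random.rand(200, 3) * [10.0, 10.0, 2.0]  # stand-in ENU positions (m)
labels = OPTICS(min_samples=5, max_eps=0.5).fit_predict(flowers_enu)

# Centroid of each cluster becomes a flight-path waypoint; label -1 is noise.
waypoints = [flowers_enu[labels == k].mean(axis=0)
             for k in sorted(set(labels)) if k != -1]
```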

18 pages, 17685 KiB  
Article
Real-Time Object Detection Model for Electric Power Operation Violation Identification
by Xiaoliang Qian, Longxiang Luo, Yang Li, Li Zeng, Zhiwu Chen, Wei Wang and Wei Deng
Information 2025, 16(7), 569; https://doi.org/10.3390/info16070569 - 3 Jul 2025
Viewed by 252
Abstract
The You Only Look Once (YOLO) object detection model has been widely applied to electric power operation violation identification, owing to its balanced performance in detection accuracy and inference speed. However, it still faces the following challenges: (1) insufficient detection capability for irregularly shaped objects; (2) objects with low object-background contrast are easily missed; (3) improving detection accuracy while maintaining computational efficiency is difficult. To address these challenges, a novel real-time object detection model, EAP-YOLO, is proposed in this paper, introducing three key innovations. For the first challenge, an edge perception cross-stage partial fusion with two convolutions (EPC2f) module combining edge convolutions with depthwise separable convolutions is proposed, which enhances the feature representation of irregularly shaped objects with only a slight increase in parameters. For the second challenge, an adaptive combination of local and global features module is proposed to enhance the discriminative ability of features while maintaining computational efficiency: the local and global features are extracted via 1D convolutions and adaptively combined using learnable weights. For the third challenge, a parameter-sharing scheme for the multi-scale detection heads is proposed to reduce the parameter count and improve the interaction between heads. An ablation study on the Ali Tianchi competition dataset validates the effectiveness of the three innovations and their combination. EAP-YOLO achieves an mAP@0.5 of 93.4% and an mAP@0.5–0.95 of 70.3% on this dataset, outperforming 12 other object detection models while satisfying the real-time requirement.
(This article belongs to the Special Issue Computer Vision for Security Applications, 2nd Edition)
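The adaptive local/global combination is described only at a high level, so the module below is a speculative sketch: two 1D convolutions of different widths run over a pooled channel descriptor, and a learnable scalar mixes them before channels are re-weighted.

```python
import torch
import torch.nn as nn

class AdaptiveLocalGlobal(nn.Module):
    """Hypothetical rendering of the local/global 1D-convolution module."""
    def __init__(self):
        super().__init__()
        self.local = nn.Conv1d(1, 1, kernel_size=3, padding=1)  # narrow context
        self.glob = nn.Conv1d(1, 1, kernel_size=7, padding=3)   # wider context
        self.alpha = nn.Parameter(torch.tensor(0.5))            # learnable mix

    def forward(self, x):                        # x: (B, C, H, W)
        s = x.mean(dim=(2, 3)).unsqueeze(1)      # (B, 1, C) channel descriptor
        a = torch.sigmoid(self.alpha)
        w = torch.sigmoid(a * self.local(s) + (1 - a) * self.glob(s))
        return x * w.transpose(1, 2).unsqueeze(-1)  # re-weight channels

y = AdaptiveLocalGlobal()(torch.randn(2, 64, 16, 16))
```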

27 pages, 13245 KiB  
Article
LHRF-YOLO: A Lightweight Model with Hybrid Receptive Field for Forest Fire Detection
by Yifan Ma, Weifeng Shan, Yanwei Sui, Mengyu Wang and Maofa Wang
Forests 2025, 16(7), 1095; https://doi.org/10.3390/f16071095 - 2 Jul 2025
Viewed by 346
Abstract
Timely and accurate detection of forest fires is crucial for protecting forest ecosystems. However, traditional monitoring methods face significant challenges in effectively detecting forest fires, primarily due to the dynamic spread of flames and smoke, irregular morphologies, and the semi-transparent nature of smoke, which make it extremely difficult to extract key visual features. Additionally, deploying detection systems on edge devices with limited computational resources remains challenging. To address these issues, this paper proposes a lightweight hybrid receptive field model (LHRF-YOLO), which leverages deep learning to overcome the shortcomings of traditional monitoring methods for fire detection on edge devices. First, a hybrid receptive field extraction module is designed by integrating a 2D selective scan mechanism with a residual multi-branch structure, significantly enhancing the model's contextual understanding of the entire image scene while maintaining low computational complexity. Second, a dynamic enhanced downsampling module is proposed, which employs feature reorganization and channel-wise dynamic weighting strategies to minimize the loss of critical details, such as fine smoke textures, while reducing image resolution. Furthermore, a scale-weighted fusion module is introduced to optimize multi-scale feature fusion through adaptive weight allocation, addressing the information dilution and imbalance caused by traditional fusion methods. Finally, the Mish activation function replaces the SiLU activation function to improve the model's ability to capture flame edges and faint smoke textures. Experimental results on the self-constructed Fire-Smoke dataset demonstrate that LHRF-YOLO achieves significant model compression while further improving accuracy compared to the baseline YOLOv11: the parameter count is reduced to only 2.25 M (a 12.8% reduction), computational complexity drops to 5.4 GFLOPs (a 14.3% decrease), and mAP50 increases to 87.6%, surpassing the baseline. LHRF-YOLO also exhibits leading generalization performance on the cross-scenario M4SFWD dataset. The proposed method balances performance and resource efficiency, providing a feasible solution for real-time, efficient fire detection on resource-constrained edge devices.
(This article belongs to the Special Issue Forest Fires Prediction and Detection—2nd Edition)
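Adaptive weight allocation for multi-scale fusion typically reduces to learnable, normalized per-scale weights; the sketch below shows that idea (it is not LHRF-YOLO's actual module, whose internals are not given here).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleWeightedFusion(nn.Module):
    """Blend multi-scale features with learnable softmax-normalized weights."""
    def __init__(self, n_scales=3):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_scales))  # one weight per scale

    def forward(self, feats):                        # list of (B, C, Hi, Wi)
        h, w = feats[0].shape[-2:]
        weights = torch.softmax(self.w, dim=0)
        out = torch.zeros_like(feats[0])
        for wi, f in zip(weights, feats):
            out = out + wi * F.interpolate(f, size=(h, w), mode="nearest")
        return out
```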

17 pages, 7199 KiB  
Article
YED-Net: Yoga Exercise Dynamics Monitoring with YOLOv11-ECA-Enhanced Detection and DeepSORT Tracking
by Youyu Zhou, Shu Dong, Hao Sheng and Wei Ke
Appl. Sci. 2025, 15(13), 7354; https://doi.org/10.3390/app15137354 - 30 Jun 2025
Viewed by 354
Abstract
Against the backdrop of the deep integration of national fitness and sports science, this study addresses the lack of standardized movement assessment in yoga training by proposing an intelligent analysis system that integrates an improved YOLOv11-ECA detector with the DeepSORT tracking algorithm. A dynamic adaptive anchor mechanism and an Efficient Channel Attention (ECA) module are introduced, while the depthwise separable convolution in the C3k2 module is optimized with a kernel size of 2. Furthermore, a Parallel Spatial Attention (PSA) mechanism is incorporated to enhance multi-target feature discrimination. These enhancements enable the model to achieve a high detection accuracy of 98.6% mAP@0.5 while maintaining low computational complexity (2.35 M parameters, 3.11 GFLOPs). Evaluated on the SND Sun Salutation Yoga Dataset released in 2024, the improved model achieves a real-time processing speed of 85.79 frames per second (FPS) on an RTX 3060 platform, with an 18% reduction in computational cost compared to the baseline. Notably, it achieves a 0.9% improvement in AP@0.5 for small targets (<20 px). By integrating the Mars-smallCNN feature extraction network with a Kalman filtering-based trajectory prediction module, the system attains 58.3% Multiple Object Tracking Accuracy (MOTA) and 62.1% Identity F1 Score (IDF1) in dense multi-object scenarios, an improvement of approximately 9.8 percentage points over the conventional YOLO+DeepSORT method. Ablation studies confirm that the ECA module, implemented via lightweight 1D convolution, improves channel attention modeling efficiency by 23% compared to the original SE module and reduces the false detection rate by a factor of 1.2 under complex backgrounds. This study presents a complete "detection–tracking–assessment" pipeline for intelligent sports training. Future work aims to integrate 3D pose estimation to develop a closed-loop biomechanical analysis system, advancing sports science toward intelligent decision-making paradigms.
(This article belongs to the Special Issue Advances in Image Recognition and Processing Technologies)
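ECA itself is a published module: channel attention computed by a single 1D convolution across the globally pooled channel descriptor, with no dimensionality reduction. A standard PyTorch rendering (the kernel size is the common default, not necessarily this paper's choice):

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention via lightweight 1D convolution."""
    def __init__(self, k_size=3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size,
                              padding=k_size // 2, bias=False)

    def forward(self, x):                              # x: (B, C, H, W)
        y = self.pool(x)                               # (B, C, 1, 1)
        y = self.conv(y.squeeze(-1).transpose(1, 2))   # 1D conv across channels
        y = torch.sigmoid(y.transpose(1, 2).unsqueeze(-1))
        return x * y                                   # re-scale each channel
```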

22 pages, 3981 KiB  
Article
Individual Recognition of a Group Beef Cattle Based on Improved YOLO v5
by Ziruo Li, Yadan Zhang, Xi Kang, Tianci Mao, Yanbin Li and Gang Liu
Agriculture 2025, 15(13), 1391; https://doi.org/10.3390/agriculture15131391 - 28 Jun 2025
Cited by 1 | Viewed by 359
Abstract
Deep learning-based individual recognition of beef cattle has improved the efficiency and effectiveness of individual recognition, providing technical support for modern large-scale farms. However, group cattle recognition still faces issues such as over-reliance on back patterns, low recognition accuracy when adjacent cattle have similar patterns, and difficulty deploying models on edge devices. In this study, we propose a model based on an improved YOLO v5. Specifically, a Simple, Parameter-Free (SimAM) attention module is connected with the residual network and a Multidimensional Collaborative Attention (MCA) mechanism to obtain the MCA-SimAM-Resnet (MRS-ATT) module, enhancing the model's feature extraction and expression capabilities. The LMPDIoU loss function is then used to improve the localization accuracy of bounding boxes during target detection. Finally, structural pruning is applied to achieve a lightweight version of the improved YOLO v5. On 211 test images, the improved YOLO v5 model achieved an individual recognition precision (P) of 93.2%, recall (R) of 94.6%, mean Average Precision (mAP) of 94.5%, FLOPs of 7.84, 13.22 M parameters, and an average inference speed of 0.0746 s. The improved YOLO v5 model can accurately and quickly identify individuals within groups of cattle and, with its small parameter count, is easy to deploy on edge devices, thereby accelerating the development of intelligent cattle farming.
(This article belongs to the Special Issue Computer Vision Analysis Applied to Farm Animals)
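SimAM is likewise a published, parameter-free attention: each activation is weighted by an energy term derived from its deviation from the channel mean. A compact sketch of the standard formulation (the λ value is the usual default, not necessarily this paper's):

```python
import torch

def simam(x, e_lambda=1e-4):
    """Parameter-free SimAM attention for a (B, C, H, W) feature map."""
    n = x.shape[2] * x.shape[3] - 1
    d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)   # squared deviation
    v = d.sum(dim=(2, 3), keepdim=True) / n             # channel variance
    e_inv = d / (4 * (v + e_lambda)) + 0.5              # inverse energy
    return x * torch.sigmoid(e_inv)
```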

26 pages, 14660 KiB  
Article
Succulent-YOLO: Smart UAV-Assisted Succulent Farmland Monitoring with CLIP-Based YOLOv10 and Mamba Computer Vision
by Hui Li, Fan Zhao, Feng Xue, Jiaqi Wang, Yongying Liu, Yijia Chen, Qingyang Wu, Jianghan Tao, Guocheng Zhang, Dianhan Xi, Jundong Chen and Hill Hiroki Kobayashi
Remote Sens. 2025, 17(13), 2219; https://doi.org/10.3390/rs17132219 - 28 Jun 2025
Viewed by 527
Abstract
Recent advances in unmanned aerial vehicle (UAV) technology combined with deep learning techniques have greatly improved agricultural monitoring. However, accurately processing low-resolution images remains challenging for precision cultivation of succulents. To address this issue, this study proposes a novel method that combines cutting-edge super-resolution reconstruction (SRR) techniques with object detection and applies the combined model in a unified drone framework to achieve large-scale, reliable monitoring of succulent plants. Specifically, we introduce MambaIR, an innovative SRR method leveraging selective state-space models, which significantly improves the quality of UAV-captured low-resolution imagery (achieving a PSNR of 23.83 dB and an SSIM of 79.60%) and surpasses current state-of-the-art approaches. Additionally, we develop Succulent-YOLO, a customized target detection model optimized for succulent image classification, achieving a mean average precision (mAP@50) of 87.8% on high-resolution images. The integrated use of MambaIR and Succulent-YOLO achieves an mAP@50 of 85.1% when tested on super-resolution-enhanced images, closely approaching the performance on the original high-resolution images. Through extensive experimentation supported by Grad-CAM visualization, our method effectively captures critical features of succulents, identifying the best trade-off between resolution enhancement and computational demands. By overcoming the limitations associated with low-resolution UAV imagery in agricultural monitoring, this solution provides an effective, scalable approach for evaluating succulent plant growth, and addressing image-quality issues facilitates informed decision-making while reducing technical barriers. Ultimately, this study provides a robust foundation for expanding the practical use of UAVs and artificial intelligence in precision agriculture, promoting sustainable farming practices through advanced remote sensing technologies.
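For readers unfamiliar with the reported reconstruction metric, PSNR is simple to compute; a minimal NumPy version for 8-bit images (illustrative, not the paper's evaluation code):

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio between two same-shaped uint8 images."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(max_val ** 2 / mse)
```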

26 pages, 5558 KiB  
Article
ZoomHead: A Flexible and Lightweight Detection Head Structure Design for Slender Cracks
by Hua Li, Fan Yang, Junzhou Huo, Qiang Gao, Shusen Deng and Chang Guo
Sensors 2025, 25(13), 3990; https://doi.org/10.3390/s25133990 - 26 Jun 2025
Viewed by 419
Abstract
Detecting metal surface crack defects is of great significance for the safe operation of industrial equipment. However, most existing mainstream deep object detection models suffer from complex structures, large parameter sizes, and high training costs, which hinder their deployment and application at frontline construction sites. This paper therefore optimizes the existing YOLO series head structure and proposes ZoomHead, a lightweight detection head structure with lower computational complexity and stronger detection performance. First, the GroupNorm2d module replaces the BatchNorm2d module to stabilize the model's feature distribution and accelerate training. Second, Detail Enhanced Convolution (DEConv) replaces traditional convolution kernels, and shared convolution is adopted to reduce redundant structures, enhancing the ability to capture details and improving detection performance for small objects. Next, the Zoom scale factor is introduced to proportionally scale the convolution kernels in the regression branch, minimizing redundant computation. Finally, using the YOLOv10 and YOLO11 series models as baselines, ZoomHead was used to entirely replace the head structure of the baseline models, and a series of performance comparison experiments were conducted on a rail surface crack dataset and the NEU surface defect database. The results show that integrating ZoomHead effectively improved detection accuracy, reduced the number of parameters and computations, and increased the FPS, achieving a good balance between detection accuracy and speed. In comparative experiments against SOTA models, adding ZoomHead yielded the smallest parameter count and the highest FPS while maintaining the same mAP as the SOTA model, indicating that the proposed ZoomHead structure offers better comprehensive detection performance.
(This article belongs to the Special Issue Convolutional Neural Network Technology for 3D Imaging and Sensing)
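The normalization swap the abstract leads with is easy to picture: a head convolution block using GroupNorm, whose statistics do not depend on batch size, in place of BatchNorm2d. A hedged sketch (layer sizes and group count are illustrative):

```python
import torch.nn as nn

def head_block(c_in, c_out, groups=16):
    """Conv block normalized with GroupNorm; c_out must be divisible by groups."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1, bias=False),
        nn.GroupNorm(groups, c_out),   # batch-size-independent statistics
        nn.SiLU(),
    )

block = head_block(256, 256)
```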

21 pages, 12722 KiB  
Article
PC3D-YOLO: An Enhanced Multi-Scale Network for Crack Detection in Precast Concrete Components
by Zichun Kang, Kedi Gu, Andrew Yin Hu, Haonan Du, Qingyang Gu, Yang Jiang and Wenxia Gan
Buildings 2025, 15(13), 2225; https://doi.org/10.3390/buildings15132225 - 25 Jun 2025
Viewed by 450
Abstract
Crack detection in precast concrete components aims to achieve precise extraction of crack features within complex image backgrounds. Current computer vision-based methods typically conduct limited local searches at a single scale, constraining the model's capacity for feature extraction and fusion in information-rich environments. To address these limitations, we propose PC3D-YOLO, an enhanced framework derived from YOLOv11 that strengthens long-range dependency modeling through multi-scale feature integration, offering a novel approach for crack detection in precast concrete structures. Our methodology involves three key innovations: (1) the Multi-Dilation Spatial-Channel Fusion with Shuffling (MSFS) module, employing dilated convolutions and channel shuffling to enable global feature fusion, replaces the C3K2 bottleneck module to enhance long-distance dependency capture; (2) the AIFI_M2SA module replaces the conventional SPPF to mitigate its restricted receptive field and information loss, incorporating multi-scale attention for improved near-far contextual integration; (3) a redesigned neck network (MSCD-Net) preserves rich contextual information across all feature scales. Experimental results demonstrate that, on the self-developed dataset, the proposed algorithm achieves a recall of 78.8%, an AP@50 of 86.3%, and an AP@50-95 of 65.6%, outperforming the YOLOv11 algorithm. Evaluations on the CRACKS_MANISHA and DECA datasets further confirm the proposed model's strong generalization capability across different data domains.
(This article belongs to the Section Building Materials, and Repair & Renovation)
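Channel shuffling, one half of the MSFS recipe, is a standard operation (popularized by ShuffleNet) and reduces to a reshape-transpose-reshape; the sketch below is generic, not PC3D-YOLO's code.

```python
import torch

def channel_shuffle(x, groups):
    """Interleave channels across groups so grouped convs can mix information."""
    b, c, h, w = x.shape
    return (x.view(b, groups, c // groups, h, w)
             .transpose(1, 2)
             .reshape(b, c, h, w))

y = channel_shuffle(torch.randn(2, 64, 16, 16), groups=4)
```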

19 pages, 26591 KiB  
Article
Hand Washing Gesture Recognition Using Synthetic Dataset
by Rüstem Özakar and Eyüp Gedikli
J. Imaging 2025, 11(7), 208; https://doi.org/10.3390/jimaging11070208 - 22 Jun 2025
Cited by 1 | Viewed by 462
Abstract
Hand hygiene is paramount for public health, especially in critical sectors like healthcare and the food industry. Ensuring compliance with recommended hand washing gestures is vital, necessitating autonomous evaluation systems leveraging machine learning techniques. However, the scarcity of comprehensive datasets poses a significant challenge. This study addresses this issue by presenting an open synthetic hand washing dataset, created using 3D computer-generated imagery, comprising 96,000 frames (equivalent to 64 min of footage) and encompassing eight gestures performed by four characters in four diverse environments. The synthetic dataset includes RGB images, depth/isolated depth images, and hand mask images. Using this dataset, four neural network models, Inception-V3, Yolo-8n, Yolo-8n segmentation and PointNet, were trained for gesture classification. The models were subsequently evaluated on a large real-world hand washing dataset, demonstrating classification accuracies of 56.9% for Inception-V3, 76.3% for Yolo-8n and 79.3% for Yolo-8n segmentation. These findings underscore the effectiveness of synthetic data in training machine learning models for hand washing gesture recognition.
(This article belongs to the Section Computer Vision and Pattern Recognition)

38 pages, 3698 KiB  
Review
Enhancing Autonomous Truck Navigation in Underground Mines: A Review of 3D Object Detection Systems, Challenges, and Future Trends
by Ellen Essien and Samuel Frimpong
Drones 2025, 9(6), 433; https://doi.org/10.3390/drones9060433 - 14 Jun 2025
Viewed by 1058
Abstract
Integrating autonomous haulage systems into underground mining has revolutionized safety and operational efficiency. However, deploying 3D detection systems for autonomous truck navigation in such environments faces persistent challenges due to dust, occlusion, complex terrain, and low visibility, which affect reliability and real-time processing. While existing reviews have discussed object detection techniques and sensor-based systems, providing valuable insights into their applications, few have addressed the unique underground challenges that affect 3D detection models. This review synthesizes current advancements in 3D object detection models for underground autonomous truck navigation, assessing deep learning algorithms, fusion techniques, multi-modal sensor suites, and the limited datasets available for underground detection systems. The study uses systematic database searches with selection criteria based on relevance to underground perception. The findings show that mid-level fusion of different sensor suites enhances robust detection. Though YOLO (You Only Look Once)-based detection models provide superior real-time performance, challenges persist in small object detection, computational trade-offs, and data scarcity. The paper concludes by identifying research gaps and proposing future directions for a more scalable and resilient underground perception system. The main novelty lies in its review of 3D detection systems for autonomous trucks in underground settings.

17 pages, 4803 KiB  
Article
Deep Learning-Enhanced Electronic Packaging Defect Detection via Fused Thermal Simulation and Infrared Thermography
by Zijian Peng and Hu He
Appl. Sci. 2025, 15(12), 6592; https://doi.org/10.3390/app15126592 - 11 Jun 2025
Viewed by 528
Abstract
Advancements in semiconductor packaging toward higher integration and interconnect density have increased the risk of structural defects—such as missing solder balls, pad delamination, and bridging—that can disrupt thermal conduction paths, leading to localized overheating and potential chip failure. To address the limitations of traditional non-destructive testing methods in detecting micron-scale defects, this study introduces a multimodal detection approach combining finite-element thermal simulation, infrared thermography, and the YOLO11 deep learning network. A comprehensive 3D finite-element model of a ball grid array (BGA) package was developed to analyze the impact of typical defects on both steady-state and transient thermal distributions, providing a solid physical foundation for modeling defect-induced thermal characteristics. An infrared thermal imaging platform was established to capture real thermal images, which were compared with simulation results to verify physical consistency. An integrated dataset of simulated and infrared images was constructed to enhance the robustness of the detection model. Leveraging the YOLO11 network's capabilities in end-to-end training, small-object detection, and rapid inference, the system achieved accurate and rapid localization of defect regions. Experimental results show a mean average precision (mAP) of 99.5% at an intersection over union (IoU) threshold of 0.5 and an inference speed of 556 frames per second on the simulation dataset. Training with the hybrid dataset improved detection accuracy on real images from 41.7% to 91.7%, significantly outperforming models trained on a single data source. Furthermore, the maximum temperature discrepancy between simulation and experimental measurements was less than 5%, validating the reliability of the proposed method. This research offers a high-precision, real-time solution for semiconductor packaging defect detection, with substantial potential for industrial application.
(This article belongs to the Special Issue Microelectronic Engineering: Devices, Materials, and Technologies)
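The reported jump on real images (41.7% to 91.7%) comes from training on the mixed simulated + infrared set; in PyTorch, combining two datasets is a one-liner. The tensors below are stand-ins for the actual image datasets:

```python
import torch
from torch.utils.data import TensorDataset, ConcatDataset, DataLoader

# Placeholder datasets standing in for simulated and infrared images.
simulated_ds = TensorDataset(torch.randn(100, 3, 64, 64), torch.zeros(100))
infrared_ds = TensorDataset(torch.randn(40, 3, 64, 64), torch.ones(40))

hybrid = ConcatDataset([simulated_ds, infrared_ds])   # the "hybrid dataset"
loader = DataLoader(hybrid, batch_size=16, shuffle=True)
```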

23 pages, 52584 KiB  
Article
DMSF-YOLO: A Dynamic Multi-Scale Fusion Method for Maize Tassel Detection in UAV Low-Altitude Remote Sensing Images
by Dongbin Liu, Jiandong Fang and Yudong Zhao
Agriculture 2025, 15(12), 1259; https://doi.org/10.3390/agriculture15121259 - 11 Jun 2025
Viewed by 1282
Abstract
Maize tassels are critical phenotypic organs, and their quantity is essential for determining tasseling stages, estimating yield potential, monitoring growth status, and supporting crop breeding programs. However, tassel identification in complex field environments presents significant challenges due to occlusion, variable lighting conditions, multi-scale target complexities, and the asynchronous, irregular growth patterns characteristic of maize tassels. In response, this paper presents a DMSF-YOLO model for maize tassel detection. At the front of the network's backbone, conventional convolutions are replaced with conditional parameter convolutions (CondConv) to enhance feature extraction capabilities. A novel DMSF-P2 network architecture is designed, including a multi-scale fusion module (SSFF-D), a scale-splicing module (TFE), and a small object detection layer (P2), which further enhances the model's feature fusion capabilities. By integrating a dynamic detection head (Dyhead), superior recognition accuracy for maize tassels across various scales is achieved. Additionally, the Wise-IoU loss function is used to improve localization precision and strengthen the model's adaptability. Experimental results on our self-built maize tassel detection dataset show that the proposed DMSF-YOLO model is markedly superior to the baseline YOLOv8n model, with precision (P), recall (R), mAP50, and mAP50:95 increasing by 0.5%, 3.4%, 2.4%, and 3.9%, respectively. This approach enables accurate and reliable maize tassel detection in complex field environments, providing effective technical support for precision field management of maize crops.
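CondConv is a published idea: per-example routing weights mix several expert kernels before a single convolution is applied. A minimal (unoptimized) sketch, not the DMSF-YOLO implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CondConv(nn.Module):
    """Conditionally parameterized convolution with sigmoid routing."""
    def __init__(self, c_in, c_out, k=3, n_experts=4):
        super().__init__()
        self.experts = nn.Parameter(torch.randn(n_experts, c_out, c_in, k, k) * 0.02)
        self.route = nn.Linear(c_in, n_experts)
        self.k = k

    def forward(self, x):                                   # x: (B, C, H, W)
        r = torch.sigmoid(self.route(x.mean(dim=(2, 3))))   # (B, E) routing
        outs = []
        for i in range(x.size(0)):          # mix expert kernels per example
            w = (r[i].view(-1, 1, 1, 1, 1) * self.experts).sum(dim=0)
            outs.append(F.conv2d(x[i:i + 1], w, padding=self.k // 2))
        return torch.cat(outs, dim=0)

y = CondConv(16, 32)(torch.randn(2, 16, 8, 8))
```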
