Search Results (504)

Search Parameters:
Keywords = high-speed vision

30 pages, 2099 KiB  
Article
SABE-YOLO: Structure-Aware and Boundary-Enhanced YOLO for Weld Seam Instance Segmentation
by Rui Wen, Wu Xie, Yong Fan and Lanlan Shen
J. Imaging 2025, 11(8), 262; https://doi.org/10.3390/jimaging11080262 - 6 Aug 2025
Abstract
Accurate weld seam recognition is essential in automated welding systems, as it directly affects path planning and welding quality. With the rapid advancement of industrial vision, weld seam instance segmentation has emerged as a prominent research focus in both academia and industry. However, existing approaches still face significant challenges in boundary perception and structural representation. Due to the inherently elongated shapes, complex geometries, and blurred edges of weld seams, current segmentation models often struggle to maintain high accuracy in practical applications. To address this issue, a novel structure-aware and boundary-enhanced YOLO (SABE-YOLO) is proposed for weld seam instance segmentation. First, a Structure-Aware Fusion Module (SAFM) is designed to enhance structural feature representation through strip pooling attention and element-wise multiplicative fusion, targeting the difficulty in extracting elongated and complex features. Second, a C2f-based Boundary-Enhanced Aggregation Module (C2f-BEAM) is constructed to improve edge feature sensitivity by integrating multi-scale boundary detail extraction, feature aggregation, and attention mechanisms. Finally, the inner minimum point distance-based intersection over union (Inner-MPDIoU) is introduced to improve localization accuracy for weld seam regions. Experimental results on the self-built weld seam image dataset show that SABE-YOLO outperforms YOLOv8n-Seg by 3 percentage points in the AP(50–95) metric, reaching 46.3%. Meanwhile, it maintains a low computational cost (18.3 GFLOPs) and a small number of parameters (6.6M), while achieving an inference speed of 127 FPS, demonstrating a favorable trade-off between segmentation accuracy and computational efficiency. The proposed method provides an effective solution for high-precision visual perception of complex weld seam structures and demonstrates strong potential for industrial application. Full article
(This article belongs to the Section Image and Video Processing)
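To illustrate the strip pooling attention idea mentioned in the abstract, the following PyTorch sketch pools a feature map along its height and width separately and fuses the resulting attention map multiplicatively with the input. It is a minimal illustration, not the authors' SAFM implementation; the module name, kernel sizes, and tensor shapes are assumptions.

```python
import torch
import torch.nn as nn

class StripPoolingAttention(nn.Module):
    """Minimal strip-pooling attention sketch: pool along H and W separately,
    expand back, and fuse multiplicatively with the input feature map."""
    def __init__(self, channels: int):
        super().__init__()
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # keep H, collapse W
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # keep W, collapse H
        self.conv_h = nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0))
        self.conv_w = nn.Conv2d(channels, channels, kernel_size=(1, 3), padding=(0, 1))
        self.fuse = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, x):
        _, _, h, w = x.shape
        sh = self.conv_h(self.pool_h(x)).expand(-1, -1, h, w)  # vertical strip context
        sw = self.conv_w(self.pool_w(x)).expand(-1, -1, h, w)  # horizontal strip context
        attn = self.fuse(sh + sw)            # attention map from both strips
        return x * attn                      # element-wise multiplicative fusion

x = torch.randn(1, 64, 80, 80)
print(StripPoolingAttention(64)(x).shape)    # torch.Size([1, 64, 80, 80])
```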
25 pages, 2518 KiB  
Article
An Efficient Semantic Segmentation Framework with Attention-Driven Context Enhancement and Dynamic Fusion for Autonomous Driving
by Jia Tian, Peizeng Xin, Xinlu Bai, Zhiguo Xiao and Nianfeng Li
Appl. Sci. 2025, 15(15), 8373; https://doi.org/10.3390/app15158373 - 28 Jul 2025
Viewed by 349
Abstract
In recent years, a growing number of real-time semantic segmentation networks have been developed to improve segmentation accuracy. However, these advancements often come at the cost of increased computational complexity, which limits their inference efficiency, particularly in scenarios such as autonomous driving, where strict real-time performance is essential. Achieving an effective balance between speed and accuracy has thus become a central challenge in this field. To address this issue, we present a lightweight semantic segmentation model tailored for the perception requirements of autonomous vehicles. The architecture follows an encoder–decoder paradigm, which not only preserves the capability for deep feature extraction but also facilitates multi-scale information integration. The encoder leverages a high-efficiency backbone, while the decoder introduces a dynamic fusion mechanism designed to enhance information interaction between different feature branches. Recognizing the limitations of convolutional networks in modeling long-range dependencies and capturing global semantic context, the model incorporates an attention-based feature extraction component. This is further augmented by positional encoding, enabling better awareness of spatial structures and local details. The dynamic fusion mechanism employs an adaptive weighting strategy, adjusting the contribution of each feature channel to reduce redundancy and improve representation quality. To validate the effectiveness of the proposed network, experiments were conducted on a single RTX 3090 GPU. The Dynamic Real-time Integrated Vision Encoder–Segmenter Network (DriveSegNet) achieved a mean Intersection over Union (mIoU) of 76.9% and an inference speed of 70.5 FPS on the Cityscapes test dataset, 74.6% mIoU and 139.8 FPS on the CamVid test dataset, and 35.8% mIoU with 108.4 FPS on the ADE20K dataset. The experimental results demonstrate that the proposed method achieves an excellent balance between inference speed, segmentation accuracy, and model size. Full article
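The adaptive weighting in the dynamic fusion mechanism can be read as a channel-wise gate that decides, per channel, how much each feature branch contributes. The snippet below is a minimal PyTorch illustration under that reading of the abstract, not DriveSegNet's actual fusion module; the gating structure and tensor sizes are assumptions.

```python
import torch
import torch.nn as nn

class DynamicFusion(nn.Module):
    """Fuse two decoder branches with adaptive per-channel weights
    (global average pooling -> 1x1 conv -> sigmoid gate)."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                    # per-channel global context
            nn.Conv2d(2 * channels, channels, 1),       # mix context of both branches
            nn.Sigmoid(),
        )

    def forward(self, a, b):
        w = self.gate(torch.cat([a, b], dim=1))         # weights in [0, 1], one per channel
        return w * a + (1.0 - w) * b                    # convex, channel-wise combination

a, b = torch.randn(1, 128, 64, 128), torch.randn(1, 128, 64, 128)
print(DynamicFusion(128)(a, b).shape)                   # torch.Size([1, 128, 64, 128])
```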

21 pages, 4863 KiB  
Article
Detection Model for Cotton Picker Fire Recognition Based on Lightweight Improved YOLOv11
by Zhai Shi, Fangwei Wu, Changjie Han, Dongdong Song and Yi Wu
Agriculture 2025, 15(15), 1608; https://doi.org/10.3390/agriculture15151608 - 25 Jul 2025
Viewed by 284
Abstract
In response to the limited research on fire detection in cotton pickers and the issue of low detection accuracy in visual inspection, this paper proposes a computer vision-based detection method. The method is optimized according to the structural characteristics of cotton pickers, and a lightweight improved YOLOv11 algorithm is designed for cotton fire detection in cotton pickers. The backbone of the model is replaced with the MobileNetV2 network to achieve effective model lightweighting. In addition, the convolutional layers in the original C3k2 block are optimized using partial convolutions to reduce computational redundancy and improve inference efficiency. Furthermore, a visual attention mechanism named CBAM-ECA (Convolutional Block Attention Module-Efficient Channel Attention) is designed to suit the complex working conditions of cotton pickers. This mechanism aims to enhance the model’s feature extraction capability under challenging environmental conditions, thereby improving overall detection accuracy. To further improve localization performance and accelerate convergence, the loss function is also modified. These improvements enable the model to achieve higher precision in fire detection while ensuring fast and accurate localization. Experimental results demonstrate that the improved model reduces the number of parameters by 38%, increases the frame processing speed (FPS) by 13.2%, and decreases the computational complexity (GFLOPs) by 42.8%, compared to the original model. The detection accuracy for flaming combustion, smoldering combustion, and overall detection is improved by 1.4%, 3%, and 1.9%, respectively, with an increase of 2.4% in mAP (mean average precision). Compared to other models—YOLOv3-tiny, YOLOv5, YOLOv8, and YOLOv10—the proposed method achieves higher detection accuracy by 5.9%, 7%, 5.9%, and 5.3%, respectively, and shows improvements in mAP by 5.4%, 5%, 4.8%, and 6.3%. The improved detection algorithm maintains high accuracy while achieving faster inference speed and fewer model parameters. These improvements lay a solid foundation for fire prevention and suppression in cotton collection boxes on cotton pickers. Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
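The ECA half of the CBAM-ECA mechanism refers to Efficient Channel Attention, which reweights channels using a global average pool followed by a small 1-D convolution. A minimal PyTorch sketch of that building block follows; it is not the paper's combined CBAM-ECA module, and the kernel size is illustrative.

```python
import torch
import torch.nn as nn

class ECAAttention(nn.Module):
    """Efficient Channel Attention: global average pooling followed by a
    lightweight 1-D convolution across channels, then a sigmoid gate."""
    def __init__(self, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        # x: (B, C, H, W) -> channel descriptor (B, 1, C)
        y = x.mean(dim=(2, 3)).unsqueeze(1)
        y = torch.sigmoid(self.conv(y)).squeeze(1)      # (B, C) channel weights
        return x * y[:, :, None, None]                  # rescale each channel

x = torch.randn(2, 256, 20, 20)
print(ECAAttention()(x).shape)                          # torch.Size([2, 256, 20, 20])
```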

18 pages, 3102 KiB  
Article
A Multicomponent Face Verification and Identification System
by Athanasios Douklias, Ioannis Zorzos, Evangelos Maltezos, Vasilis Nousis, Spyridon Nektarios Bolierakis, Lazaros Karagiannidis, Eleftherios Ouzounoglou and Angelos Amditis
Appl. Sci. 2025, 15(15), 8161; https://doi.org/10.3390/app15158161 - 22 Jul 2025
Viewed by 245
Abstract
Face recognition is a biometric technology based on the identification or verification of facial features. Automatic face recognition is an active research field in computer vision and artificial intelligence (AI) that is fundamental for a variety of real-time applications. This research proposes the design and implementation of a face verification and identification system with a flexible, modular, secure, and scalable architecture. The proposed system incorporates several types of components: (i) portable capabilities (a mobile application and mixed reality [MR] glasses), (ii) enhanced monitoring and visualization via a user-friendly Web-based user interface (UI), and (iii) information sharing with other external systems via middleware. The experiments showed that these interconnected and complementary components delivered robust, real-time face identification and verification. Furthermore, to identify a model with high accuracy, robustness, and speed for face identification and verification, a comprehensive evaluation of multiple pre-trained face recognition models (FaceNet, ArcFace, Dlib, and MobileNetV2) was performed on a curated version of the ID vs. Spot dataset. Among the models evaluated, FaceNet emerged as the preferable choice for real-time tasks due to its balance between accuracy and inference speed for both identification and verification, achieving an AUC of 0.99, Rank-1 of 91.8%, Rank-5 of 95.8%, FNR of 2%, FAR of 0.1%, accuracy of 98.6%, and an inference speed of 52 ms. Full article
(This article belongs to the Special Issue Application of Artificial Intelligence in Image Processing)
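Verification and Rank-1 identification with embedding models such as FaceNet reduce to comparing embedding vectors. The sketch below shows the standard cosine-similarity formulation on synthetic vectors; the threshold of 0.6 and the 512-dimensional embeddings are assumptions, not values from the paper.

```python
import numpy as np

def verify(emb_probe: np.ndarray, emb_reference: np.ndarray, threshold: float = 0.6) -> bool:
    """Accept the identity claim if the cosine similarity between the two
    L2-normalised face embeddings exceeds the operating threshold."""
    a = emb_probe / np.linalg.norm(emb_probe)
    b = emb_reference / np.linalg.norm(emb_reference)
    return float(a @ b) >= threshold

def identify(emb_probe: np.ndarray, gallery: dict) -> str:
    """Rank-1 identification: return the gallery identity with the highest similarity."""
    p = emb_probe / np.linalg.norm(emb_probe)
    scores = {name: float(p @ (e / np.linalg.norm(e))) for name, e in gallery.items()}
    return max(scores, key=scores.get)

rng = np.random.default_rng(0)
gallery = {"alice": rng.normal(size=512), "bob": rng.normal(size=512)}
probe = gallery["alice"] + 0.05 * rng.normal(size=512)   # noisy probe of a known identity
print(identify(probe, gallery), verify(probe, gallery["alice"]))
```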

20 pages, 5404 KiB  
Article
Flying Steel Detection in Wire Rod Production Based on Improved You Only Look Once v8
by Yifan Lu, Fei Zhang, Xiaozhan Li, Jian Zhang, Xiong Xiao, Lijun Wang and Xiaofei Xiang
Processes 2025, 13(7), 2297; https://doi.org/10.3390/pr13072297 - 18 Jul 2025
Viewed by 432
Abstract
In the process of high-speed wire rod production, flying steel accidents may occur for various reasons. Current detection methods rely on hardware such as sensors, which is complicated to debug and limits both the real-time performance and accuracy of detection. Therefore, this paper proposes a flying steel detection method based on an improved You Only Look Once v8 (YOLOv8), which achieves high-precision, machine-vision-based flying steel detection from monitoring video of the production site. Firstly, Omni-dimensional Dynamic Convolution (ODConv) is added to the backbone network to improve feature extraction from the input image. Then, a lightweight C2f-PCCA_RVB module is proposed and integrated into the neck network to give the neck a lightweight design. Finally, the Efficient Multi-Scale Attention (EMA) module is added to the neck network to fuse contextual information at different scales and further improve feature extraction. The experimental results show that the mean average precision (mAP@0.5) of the improved YOLOv8-based flying steel detection method is 99.1%, and the latency is reduced to 2.5 ms, enabling real-time, accurate detection of flying steel. Full article

29 pages, 4633 KiB  
Article
Failure Detection of Laser Welding Seam for Electric Automotive Brake Joints Based on Image Feature Extraction
by Diqing Fan, Chenjiang Yu, Ling Sha, Haifeng Zhang and Xintian Liu
Machines 2025, 13(7), 616; https://doi.org/10.3390/machines13070616 - 17 Jul 2025
Viewed by 261
Abstract
As a key component in the hydraulic brake system of automobiles, the brake joint directly affects the braking performance and driving safety of the vehicle. Therefore, improving the quality of brake joints is crucial. During processing, owing to the complexity of the welding material and welding process, the weld seam is prone to defects such as cracks, pores, undercutting, and incomplete fusion, which can weaken the joint and even lead to product failure. Traditional weld seam inspection methods include destructive and non-destructive testing; however, destructive testing has high costs and long cycles, and non-destructive methods such as radiographic and ultrasonic testing suffer from high consumable costs, slow detection speed, or high demands on operator experience. In response to these challenges, this article proposes a defect detection and classification method for laser welding seams of automotive brake joints based on machine vision inspection, exploiting the high efficiency, low cost, flexibility, and automation advantages of machine vision to improve the accuracy of detection and failure analysis. This article first analyzes the common types of weld defects in laser welding of automotive brake joints, including craters (pits), holes, and undercutting, and explores the causes and characteristics of these defects. Then, an image processing algorithm suitable for laser welding of automotive brake joints was studied, including pre-processing steps such as image smoothing, image enhancement, threshold segmentation, and morphological processing, to extract the feature parameters of weld defects. On this basis, a weld defect detection and classification system based on a cascade classifier and the AdaBoost algorithm was designed, and efficient recognition and classification of weld defects were achieved by training the cascade classifier. The results show that the system can accurately identify and distinguish pit, hole, and undercutting defects in welds, with an average classification accuracy of over 90%. The detection and recognition rate of pit defects reaches 100%, and the detection accuracy of undercutting defects is 92.6%. The overall missed detection rate is less than 3%, with both the missed detection rate and false detection rate for pit defects being 0%. The average detection time for each image is 0.24 s, meeting the real-time requirements of industrial automation. Compared with infrared and ultrasonic detection methods, the proposed machine-vision-based detection system has significant advantages in detection speed, surface defect recognition accuracy, and industrial adaptability. This provides an efficient and accurate solution for laser welding defect detection of automotive brake joints. Full article
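The pre-processing chain described in the abstract (smoothing, enhancement, threshold segmentation, morphological processing, feature extraction) can be sketched with standard OpenCV calls. The snippet below is an illustrative pipeline on a synthetic image, not the authors' system; the filter and kernel sizes are assumptions, and the cascade/AdaBoost classification stage is omitted.

```python
import cv2
import numpy as np

def extract_defect_candidates(gray: np.ndarray):
    """Smoothing, contrast enhancement, Otsu segmentation, and morphological
    cleanup, followed by connected-component feature extraction per blob."""
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)                          # smoothing
    enhanced = cv2.equalizeHist(blurred)                                 # contrast enhancement
    _, binary = cv2.threshold(enhanced, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)   # segmentation
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    cleaned = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)           # remove speckle
    n, _, stats, centroids = cv2.connectedComponentsWithStats(cleaned)
    # skip label 0 (background); keep area, bounding box, centroid as simple features
    return [{"area": int(stats[i, cv2.CC_STAT_AREA]),
             "bbox": tuple(int(v) for v in stats[i, :4]),
             "centroid": tuple(float(c) for c in centroids[i])}
            for i in range(1, n)]

image = np.full((120, 160), 200, dtype=np.uint8)
cv2.circle(image, (80, 60), 6, 40, -1)                                   # synthetic dark pit
print(extract_defect_candidates(image))
```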

22 pages, 6645 KiB  
Article
Visual Detection on Aircraft Wing Icing Process Using a Lightweight Deep Learning Model
by Yang Yan, Chao Tang, Jirong Huang, Zhixiong Cen and Zonghong Xie
Aerospace 2025, 12(7), 627; https://doi.org/10.3390/aerospace12070627 - 12 Jul 2025
Viewed by 209
Abstract
Aircraft wing icing significantly threatens aviation safety, causing substantial losses to the aviation industry each year. High transparency and blurred edges of icing areas in wing images pose challenges to wing icing detection by machine vision. To address these challenges, this study proposes a detection model, Wing Icing Detection DeeplabV3+ (WID-DeeplabV3+), for efficient and precise aircraft wing leading edge icing detection under natural lighting conditions. WID-DeeplabV3+ adopts the lightweight MobileNetV3 as its backbone network to enhance the extraction of edge features in icing areas. Ghost Convolution and Atrous Spatial Pyramid Pooling modules are incorporated to reduce model parameters and computational complexity. The model is optimized using the transfer learning method, where pre-trained weights are utilized to accelerate convergence and enhance performance. Experimental results show WID-DeepLabV3+ segments the icing edge at 1920 × 1080 within 0.03 s. The model achieves the accuracy of 97.15%, an IOU of 94.16%, a precision of 97%, and a recall of 96.96%, representing respective improvements of 1.83%, 3.55%, 1.79%, and 2.04% over DeeplabV3+. The number of parameters and computational complexity are reduced by 92% and 76%, respectively. With high accuracy, superior IOU, and fast inference speed, WID-DeeplabV3+ provides an effective solution for wing-icing detection. Full article
(This article belongs to the Section Aeronautics)
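Transfer learning from pre-trained segmentation weights, as used for WID-DeeplabV3+, can be started from the stock torchvision DeepLabV3 + MobileNetV3 model by swapping the classifier head for a two-class ice/no-ice task and timing a forward pass. This is only a baseline starting point under that assumption; it does not include the paper's Ghost Convolution or ASPP modifications, and downloading the pre-trained weights requires a recent torchvision and network access.

```python
import time
import torch
from torchvision.models.segmentation import deeplabv3_mobilenet_v3_large

# Start from pre-trained weights and replace the final classifier layer
# for a binary ice / no-ice segmentation task (2 classes).
model = deeplabv3_mobilenet_v3_large(weights="DEFAULT")
model.classifier[4] = torch.nn.Conv2d(256, 2, kernel_size=1)
model.eval()

# Rough single-image latency check on a 1920x1080 frame (CPU here; a GPU is faster).
frame = torch.randn(1, 3, 1080, 1920)
with torch.no_grad():
    start = time.perf_counter()
    mask = model(frame)["out"].argmax(dim=1)
print(mask.shape, f"{time.perf_counter() - start:.3f} s")
```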

42 pages, 5041 KiB  
Article
Autonomous Waste Classification Using Multi-Agent Systems and Blockchain: A Low-Cost Intelligent Approach
by Sergio García González, David Cruz García, Rubén Herrero Pérez, Arturo Álvarez Sanchez and Gabriel Villarrubia González
Sensors 2025, 25(14), 4364; https://doi.org/10.3390/s25144364 - 12 Jul 2025
Viewed by 395
Abstract
The increase in garbage generated in modern societies demands the implementation of a more sustainable model as well as new methods for efficient waste management. This article describes the development and implementation of a prototype of a smart bin that automatically sorts waste using a multi-agent system and blockchain integration. The proposed system has sensors that identify the type of waste (organic, plastic, paper, etc.) and uses collaborative intelligent agents to make instant sorting decisions. Blockchain has been implemented as a technology for the immutable and transparent control of waste registration, favoring traceability during the classification process, providing sustainability to the process, and making the audit of data in smart urban environments transparent. For the computer vision algorithm, three versions of YOLO (YOLOv8, YOLOv11, and YOLOv12) were used and evaluated with respect to their performance in automatic detection and classification of waste. The YOLOv12 version was selected due to its overall performance, which is superior to others with mAP@50 values of 86.2%, an overall accuracy of 84.6%, and an average F1 score of 80.1%. Latency was kept below 9 ms per image with YOLOv12, ensuring smooth and lag-free processing, even for utilitarian embedded systems. This allows for efficient deployment in near-real-time applications where speed and immediate response are crucial. These results confirm the viability of the system in both accuracy and computational efficiency. This work provides an innovative solution in the field of ambient intelligence, characterized by low equipment cost and high scalability, laying the foundations for the development of smart waste management infrastructures in sustainable cities. Full article
(This article belongs to the Special Issue Sensing and AI: Advancements in Robotics and Autonomous Systems)
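The traceability role of the blockchain can be illustrated with a toy append-only ledger in which each classification record stores the hash of the previous one, so later tampering is detectable. This in-process sketch is only an analogy for the immutability property; it is not the distributed blockchain platform the system actually uses, and the record fields are invented.

```python
import hashlib
import json
import time

class ClassificationLedger:
    """Append-only, hash-chained log of sorting decisions: each record stores
    the hash of the previous one, so any later tampering breaks the chain."""
    def __init__(self):
        self.records = [{"index": 0, "payload": "genesis", "prev_hash": "0" * 64}]
        self.records[0]["hash"] = self._digest(self.records[0])

    @staticmethod
    def _digest(record: dict) -> str:
        body = {k: v for k, v in record.items() if k != "hash"}
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def append(self, waste_class: str, confidence: float) -> dict:
        rec = {"index": len(self.records), "timestamp": time.time(),
               "payload": {"class": waste_class, "confidence": confidence},
               "prev_hash": self.records[-1]["hash"]}
        rec["hash"] = self._digest(rec)
        self.records.append(rec)
        return rec

    def verify(self) -> bool:
        return all(r["hash"] == self._digest(r) and
                   r["prev_hash"] == self.records[i - 1]["hash"]
                   for i, r in enumerate(self.records) if i > 0)

ledger = ClassificationLedger()
ledger.append("plastic", 0.91)
ledger.append("organic", 0.87)
print(ledger.verify())   # True while the chain is intact
```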

17 pages, 2032 KiB  
Article
Measurement Techniques for Highly Dynamic and Weak Space Targets Using Event Cameras
by Haonan Liu, Ting Sun, Ye Tian, Siyao Wu, Fei Xing, Haijun Wang, Xi Wang, Zongyu Zhang, Kang Yang and Guoteng Ren
Sensors 2025, 25(14), 4366; https://doi.org/10.3390/s25144366 - 12 Jul 2025
Viewed by 358
Abstract
Star sensors, as the most precise attitude measurement devices currently available, play a crucial role in spacecraft attitude estimation. However, traditional frame-based cameras tend to suffer from target blur and loss under high-dynamic maneuvers, which severely limit the applicability of conventional star sensors in complex space environments. In contrast, event cameras—drawing inspiration from biological vision—can capture brightness changes at ultrahigh speeds and output a series of asynchronous events, thereby demonstrating enormous potential for space detection applications. Based on this, this paper proposes an event data extraction method for weak, high-dynamic space targets to enhance the performance of event cameras in detecting space targets under high-dynamic maneuvers. In the target denoising phase, we fully consider the characteristics of space targets’ motion trajectories and optimize a classical spatiotemporal correlation filter, thereby significantly improving the signal-to-noise ratio for weak targets. During the target extraction stage, we introduce the DBSCAN clustering algorithm to achieve the subpixel-level extraction of target centroids. Moreover, to address issues of target trajectory distortion and data discontinuity in certain ultrahigh-dynamic scenarios, we construct a camera motion model based on real-time motion data from an inertial measurement unit (IMU) and utilize it to effectively compensate for and correct the target’s trajectory. Finally, a ground-based simulation system is established to validate the applicability and superior performance of the proposed method in real-world scenarios. Full article
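The sub-pixel centroid extraction step built on DBSCAN can be sketched directly with scikit-learn: cluster the denoised event coordinates and average each cluster. The example below uses synthetic event data; the eps and min_samples values are assumptions, not the paper's parameters.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def extract_centroids(events_xy: np.ndarray, eps: float = 2.0, min_samples: int = 5):
    """Cluster denoised event coordinates with DBSCAN and return the mean (x, y)
    of each cluster as a sub-pixel centroid estimate; noise (label -1) is dropped."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(events_xy)
    return [events_xy[labels == k].mean(axis=0) for k in set(labels) if k != -1]

rng = np.random.default_rng(1)
target = rng.normal(loc=(120.3, 64.7), scale=0.8, size=(200, 2))   # events from one target
noise = rng.uniform(0, 256, size=(30, 2))                          # residual background noise
print(extract_centroids(np.vstack([target, noise])))               # ~[120.3, 64.7]
```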

30 pages, 4582 KiB  
Review
Review on Rail Damage Detection Technologies for High-Speed Trains
by Yu Wang, Bingrong Miao, Ying Zhang, Zhong Huang and Songyuan Xu
Appl. Sci. 2025, 15(14), 7725; https://doi.org/10.3390/app15147725 - 10 Jul 2025
Viewed by 595
Abstract
From the perspective of the intelligent operation and maintenance of high-speed train tracks, this paper examines the recent research status of rail damage detection technology, summarizes the damage detection methods used for high-speed rail, and compares and analyzes different detection technologies and their application results. The analysis shows that rail damage detection research mainly focuses on non-destructive testing technologies and methods, as well as testing platforms and equipment. Detection platforms and equipment include new eddy current instruments, integrated track recording vehicles, laser rangefinders, thermal sensors, laser vision systems, LiDAR, new ultrasonic detectors, rail inspection vehicles, rail inspection robots, on-board laser rail detection systems, track recorders, self-propelled trolleys, etc. The main detection methods include electromagnetic detection, optical detection, ultrasonic guided wave detection, acoustic emission detection, radiographic detection, eddy current detection, and vibration detection. In recent years, the most widely studied and applied approaches have been LiDAR-based, ultrasonic, eddy current, and optical detection. The most important optical method is machine vision detection. Ultrasonic detection can reveal internal damage in the rail, while LiDAR can detect debris around the rail and on its surface, but both the equipment and its application are very costly. In the future, high-speed railway rail damage detection must first comply with the relevant damage standards. For rail geometric parameters, the domestic standard (TB 10754-2018) requires a gauge deviation of ±1 mm, a track alignment deviation of 0.3 mm/10 m, and a height deviation of 0.5 mm/10 m, with some indicators stricter than the European standard EN 13848. In terms of damage detection, domestic flaw detection vehicles have achieved millimeter-level accuracy for crack detection in rail heads, rail webs, and other parts, with a damage detection rate of over 85%; drone-based inspection systems identify track components with 93.6% accuracy and potential safety hazards with an 81.8% identification rate. A gap with international standards remains: standards such as EN 13848 impose stricter requirements on testing cycles and data storage, especially regarding quantified damage detection requirements, real-time damage data, and safety, which will be key research and development directions in the future. Full article

13 pages, 1697 KiB  
Article
A Real-Time Vision-Based Adaptive Follow Treadmill for Animal Gait Analysis
by Guanghui Li, Salif Komi, Jakob Fleng Sorensen and Rune W. Berg
Sensors 2025, 25(14), 4289; https://doi.org/10.3390/s25144289 - 9 Jul 2025
Viewed by 454
Abstract
Treadmills are a convenient tool to study animal gait and behavior. Traditional animal treadmill designs often entail preset speeds and therefore have reduced adaptability to animals’ dynamic behavior, thus restricting the experimental scope. Fortunately, advancements in computer vision and automation allow these limitations to be circumvented. Here, we introduce a series of real-time adaptive treadmill systems utilizing both marker-based visual fiducial systems (colored blocks or AprilTags) and marker-free (pre-trained models) tracking methods powered by advanced computer vision to track experimental animals. We demonstrate their real-time object recognition capabilities in specific tasks by conducting practical tests and highlight the performance of the marker-free method using an object detection machine learning algorithm (FOMO MobileNetV2 network), which shows high robustness and accuracy in detecting a moving rat compared to the marker-based method. Combining this computer vision system with treadmill control overcomes the limitations of traditional treadmills by enabling the adjustment of belt speed and direction based on animal movement. Full article
(This article belongs to the Special Issue Object Detection and Recognition Based on Deep Learning)
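Closing the loop from tracked animal position to belt speed can be as simple as a proportional update toward a setpoint position. The sketch below illustrates that idea with normalized units; the gain, setpoint, and speed limits are invented and this is not the authors' controller.

```python
def belt_speed_command(animal_x: float, belt_speed: float,
                       setpoint_x: float = 0.5, gain: float = 0.8,
                       max_speed: float = 1.5) -> float:
    """Proportional update of belt speed from the tracked animal position
    (normalised 0..1 along the belt, 0 = rear). If the animal drifts forward
    of the setpoint the belt speeds up; if it drifts backward the belt slows."""
    error = animal_x - setpoint_x
    new_speed = belt_speed + gain * error
    return max(0.0, min(max_speed, new_speed))       # clamp to the treadmill's range

speed = 0.5
for x in [0.55, 0.62, 0.58, 0.50, 0.45]:             # positions from the vision tracker
    speed = belt_speed_command(x, speed)
    print(round(speed, 3))
```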

14 pages, 29613 KiB  
Article
Unsupervised Insulator Defect Detection Method Based on Masked Autoencoder
by Yanying Song and Wei Xiong
Sensors 2025, 25(14), 4271; https://doi.org/10.3390/s25144271 - 9 Jul 2025
Viewed by 317
Abstract
With the rapid expansion of high-speed rail infrastructure, maintaining the structural integrity of insulators is critical to operational safety. However, conventional defect detection techniques typically rely on extensive labeled datasets, struggle with class imbalance, and often fail to capture large-scale structural anomalies. In this paper, we present an unsupervised insulator defect detection framework based on a masked autoencoder (MAE) architecture. Built upon a vision transformer (ViT), the model employs an asymmetric encoder-decoder structure and leverages a high-ratio random masking scheme during training to facilitate robust representation learning. At inference, a dual-pass interval masking strategy enhances defect localization accuracy. Benchmark experiments across multiple datasets demonstrate that our method delivers competitive image- and pixel-level performance while significantly reducing computational overhead compared to existing ViT-based approaches. By enabling high-precision defect detection through image reconstruction without requiring manual annotations, this approach offers a scalable and efficient solution for real-time industrial inspection under limited supervision. Full article
(This article belongs to the Section Fault Diagnosis & Sensors)
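The high-ratio random masking used during MAE training, and the idea of scoring anomalies by reconstruction error on masked patches, can be sketched without the transformer itself. In the snippet below the "reconstruction" is just a placeholder array; in the real method it would come from the ViT-based decoder, and the patch size and masking ratio are standard MAE defaults rather than the paper's exact settings.

```python
import numpy as np

def mask_patches(image: np.ndarray, patch: int = 16, mask_ratio: float = 0.75, seed: int = 0):
    """Split an image into non-overlapping patches and zero out a random subset
    (high masking ratio, as in MAE training); return the masked image and mask coords."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    coords = [(i, j) for i in range(0, h, patch) for j in range(0, w, patch)]
    masked_idx = rng.choice(len(coords), size=int(mask_ratio * len(coords)), replace=False)
    out = image.copy()
    for k in masked_idx:
        i, j = coords[k]
        out[i:i + patch, j:j + patch] = 0
    return out, [coords[k] for k in masked_idx]

def anomaly_score(original, reconstruction, masked_coords, patch: int = 16) -> float:
    """Score = mean squared reconstruction error over the masked patches only;
    structures the model has never seen reconstruct poorly and score high."""
    errs = [np.mean((original[i:i + patch, j:j + patch].astype(float) -
                     reconstruction[i:i + patch, j:j + patch].astype(float)) ** 2)
            for i, j in masked_coords]
    return float(np.mean(errs))

img = np.random.default_rng(2).integers(0, 255, size=(224, 224), dtype=np.uint8)
masked, coords = mask_patches(img)
# Using the masked image as a stand-in "reconstruction" just to exercise the scoring.
print(masked.shape, anomaly_score(img, masked, coords))
```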

13 pages, 3291 KiB  
Technical Note
Semi-Automated Training of AI Vision Models
by Mathew G. Pelletier, John D. Wanjura and Greg A. Holt
AgriEngineering 2025, 7(7), 225; https://doi.org/10.3390/agriengineering7070225 - 8 Jul 2025
Viewed by 321
Abstract
The adoption of AI vision models in specialized industries is often hindered by the substantial requirement for extensive, manually annotated image datasets. Even when employing transfer learning, robust model development typically necessitates tens of thousands of such images, a process that is time-consuming, costly, and demands consistent expert annotation. This technical note introduces a semi-automated method to significantly reduce this annotation burden. The proposed approach utilizes two general-purpose vision-transformer-to-caption (GP-ViTC) models to generate descriptive text from images. These captions are then processed by a custom-developed semantic classifier (SC), which requires only minimal training to predict the correct image class. This GP-ViTC + SC system demonstrated exemplary classification rates in test cases and can subsequently be used to automatically annotate large image datasets. While the inference speed of the GP-ViTC models is not suited for real-time applications (approximately 10 s per image), this method substantially lessens the labor and expertise required for dataset creation, thereby facilitating the development of new, high-speed, custom AI vision models for niche applications. This work details the approach and its successful application, offering a cost-effective pathway for generating tailored image training sets. Full article
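The semantic classifier over generated captions can be approximated by a standard text pipeline: TF-IDF features plus logistic regression trained on a handful of labelled captions. The example below uses invented cotton-contamination captions purely for illustration; the actual captions, classes, and classifier in the technical note may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A handful of labelled caption examples stand in for the small training set the
# semantic classifier needs; in practice the captions come from the
# general-purpose vision-to-caption models.
captions = [
    "a pile of white cotton with small dark plastic fragments on top",
    "clean white cotton fibers filling the frame",
    "cotton lint with a piece of green twine tangled in it",
    "uniform white cotton with no visible contamination",
]
labels = ["contaminated", "clean", "contaminated", "clean"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
clf.fit(captions, labels)

new_caption = "white cotton with a strip of black plastic near the edge"
print(clf.predict([new_caption])[0])       # likely: contaminated
```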

22 pages, 6123 KiB  
Article
Real-Time Proprioceptive Sensing Enhanced Switching Model Predictive Control for Quadruped Robot Under Uncertain Environment
by Sanket Lokhande, Yajie Bao, Peng Cheng, Dan Shen, Genshe Chen and Hao Xu
Electronics 2025, 14(13), 2681; https://doi.org/10.3390/electronics14132681 - 2 Jul 2025
Viewed by 511
Abstract
Quadruped robots have shown significant potential in disaster relief applications, where they have to navigate complex terrains for search and rescue or reconnaissance operations. However, their deployment is hindered by limited adaptability in highly uncertain environments, especially when relying solely on vision-based sensors like cameras or LiDAR, which are susceptible to occlusions, poor lighting, and environmental interference. To address these limitations, this paper proposes a novel sensor-enhanced hierarchical switching model predictive control (MPC) framework that integrates proprioceptive sensing with a bi-level hybrid dynamic model. Unlike existing methods that either rely on handcrafted controllers or deep learning-based control pipelines, our approach introduces three core innovations: (1) a situation-aware, bi-level hybrid dynamic modeling strategy that hierarchically combines single-body rigid dynamics with distributed multi-body dynamics for modeling agility and scalability; (2) a three-layer hybrid control framework, including a terrain-aware switching MPC layer, a distributed torque controller, and a fast PD control loop for enhanced robustness during contact transitions; and (3) a multi-IMU-based proprioceptive feedback mechanism for terrain classification and adaptive gait control under sensor-occluded or GPS-denied environments. Together, these components form a unified and computationally efficient control scheme that addresses practical challenges such as limited onboard processing, unstructured terrain, and environmental uncertainty. A series of experimental results demonstrate that the proposed method outperforms existing vision- and learning-based controllers in terms of stability, adaptability, and control efficiency during high-speed locomotion over irregular terrain. Full article
(This article belongs to the Special Issue Smart Robotics and Autonomous Systems)
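The proprioceptive terrain-classification-plus-switching idea can be caricatured as follows: estimate terrain roughness from the variance of vertical IMU acceleration over a short window and select a controller configuration accordingly. The threshold and gain sets below are invented, and the real system switches between full MPC formulations rather than simple gains.

```python
import numpy as np

def classify_terrain(imu_accel_z: np.ndarray, rough_threshold: float = 1.5) -> str:
    """Crude terrain classifier: high variance of vertical acceleration over a
    short window indicates rough terrain, low variance indicates flat ground."""
    return "rough" if np.var(imu_accel_z) > rough_threshold else "flat"

# Hypothetical configurations the switching control layer would toggle between.
CONTROLLER_CONFIG = {"flat": {"horizon": 20, "kp": 60.0},
                     "rough": {"horizon": 10, "kp": 120.0}}

rng = np.random.default_rng(3)
flat_window = 9.81 + 0.2 * rng.normal(size=100)      # quiet vertical acceleration
rough_window = 9.81 + 2.5 * rng.normal(size=100)     # large vibration on irregular ground
for window in (flat_window, rough_window):
    mode = classify_terrain(window)
    print(mode, CONTROLLER_CONFIG[mode])
```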

27 pages, 21013 KiB  
Article
Improved YOLO-Goose-Based Method for Individual Identification of Lion-Head Geese and Egg Matching: Methods and Experimental Study
by Hengyuan Zhang, Zhenlong Wu, Tiemin Zhang, Canhuan Lu, Zhaohui Zhang, Jianzhou Ye, Jikang Yang, Degui Yang and Cheng Fang
Agriculture 2025, 15(13), 1345; https://doi.org/10.3390/agriculture15131345 - 23 Jun 2025
Viewed by 599
Abstract
As a crucial characteristic waterfowl breed, the egg-laying performance of Lion-Headed Geese serves as a core indicator for precision breeding. Under large-scale flat rearing and selection practices, high phenotypic similarity among individuals within the same pedigree coupled with traditional manual observation and existing automation systems relying on fixed nesting boxes or RFID tags has posed challenges in achieving accurate goose–egg matching in dynamic environments, leading to inefficient individual selection. To address this, this study proposes YOLO-Goose, an improved YOLOv8s-based method, which designs five high-contrast neck rings (DoubleBar, Circle, Dot, Fence, Cylindrical) as individual identifiers. The method constructs a lightweight model with a small-object detection layer, integrates the GhostNet backbone to reduce parameter count by 67.2%, and employs the GIoU loss function to optimize neck ring localization accuracy. Experimental results show that the model achieves an F1 score of 93.8% and mAP50 of 96.4% on the self-built dataset, representing increases of 10.1% and 5% compared to the original YOLOv8s, with a 27.1% reduction in computational load. The dynamic matching algorithm, incorporating spatiotemporal trajectories and egg positional data, achieves a 95% matching rate, a 94.7% matching accuracy, and a 5.3% mismatching rate. Through lightweight deployment using TensorRT, the inference speed is enhanced by 1.4 times compared to PyTorch-1.12.1, with detection results uploaded to a cloud database in real time. This solution overcomes the technical bottleneck of individual selection in flat rearing environments, providing an innovative computer-vision-based approach for precision breeding of pedigree Lion-Headed Geese and offering significant engineering value for advancing intelligent waterfowl breeding. Full article
(This article belongs to the Special Issue Computer Vision Analysis Applied to Farm Animals)
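The GIoU loss used to refine neck-ring localization is defined as IoU minus the fraction of the smallest enclosing box not covered by the union (the training loss is typically 1 − GIoU). A self-contained numeric sketch:

```python
def giou(box_a, box_b):
    """Generalised IoU for axis-aligned boxes given as (x1, y1, x2, y2):
    IoU minus the fraction of the smallest enclosing box not covered by the union."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    # smallest axis-aligned box enclosing both inputs
    c_area = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    iou = inter / union
    return iou - (c_area - union) / c_area

print(giou((0, 0, 10, 10), (5, 5, 15, 15)))   # ~ -0.079 (IoU 0.143 minus enclosure penalty)
```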
