Search Results (1,551)

Search Parameters:
Keywords = spatial–channel attention

23 pages, 24448 KB  
Article
YOLO-SCA: A Lightweight Potato Bud Eye Detection Method Based on the Improved YOLOv5s Algorithm
by Qing Zhao, Ping Zhao, Xiaojian Wang, Qingbing Xu, Siyao Liu and Tianqi Ma
Agriculture 2025, 15(19), 2066; https://doi.org/10.3390/agriculture15192066 - 1 Oct 2025
Abstract
Bud eye identification is a critical step in the intelligent seed cutting process for potatoes. This study focuses on the challenges of low detection accuracy and excessive weight memory in potato bud eye detection models. It proposes an improved potato bud eye detection method based on YOLOv5s, referred to as the YOLO-SCA model, which synergistically optimizes three main modules. The improved model introduces the ShuffleNetV2 module to reconstruct the backbone network; its channel shuffling mechanism reduces the model's weight memory and computational load while enhancing bud eye features. Additionally, the CBAM attention mechanism is embedded at specific layers, using dual-path feature weighting (channel and spatial) to enhance sensitivity to key bud eye features in complex contexts. Then, the Alpha-IoU function replaces the CIoU function as the bounding box regression loss function; its single-parameter control mechanism and adaptive gradient amplification significantly improve the accuracy of bud eye positioning and strengthen the model's anti-interference ability. Finally, we prune the network based on channel evaluation after sparse training, accurately removing redundant channels, significantly reducing computation and weight memory, and achieving real-time performance. This study addresses how potato bud eye detection models can achieve high-precision real-time detection under limited computational and storage resources. The improved YOLO-SCA model has a size of 3.6 MB (35.3% of the original model), 1.7 M parameters (25% of the original model), and an average accuracy of 95.3%, a 12.5% improvement over the original model. This study provides theoretical support for the development of potato bud eye recognition technology and intelligent cutting equipment.
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
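The CBAM module cited in this abstract is the published Convolutional Block Attention Module (Woo et al., 2018), so the dual-path (channel then spatial) weighting can be sketched concretely. A minimal PyTorch version; the reduction ratio and the 7×7 spatial kernel are common defaults, not values taken from this paper:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Shared MLP applied to both average- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        return x * torch.sigmoid(avg + mx).view(b, c, 1, 1)

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        # Stack channel-wise average and max maps, then learn a spatial mask.
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(s))

class CBAM(nn.Module):
    """Channel attention followed by spatial attention."""
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))
```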

19 pages, 7270 KB  
Article
A Fast Rotation Detection Network with Parallel Interleaved Convolutional Kernels
by Leilei Deng, Lifeng Sun and Hua Li
Symmetry 2025, 17(10), 1621; https://doi.org/10.3390/sym17101621 - 1 Oct 2025
Abstract
In recent years, convolutional neural network-based object detectors have achieved extensive applications in remote sensing (RS) image interpretation. While multi-scale feature modeling optimization remains a persistent research focus, existing methods frequently overlook the symmetrical balance between feature granularity and morphological diversity, particularly when handling high-aspect-ratio RS targets with anisotropic geometries. This oversight leads to suboptimal feature representations characterized by spatial sparsity and directional bias. To address this challenge, we propose the Parallel Interleaved Convolutional Kernel Network (PICK-Net), a rotation-aware detection framework that embodies symmetry principles through dual-path feature modulation and geometrically balanced operator design. The core innovation lies in the synergistic integration of cascaded dynamic sparse sampling and symmetrically decoupled feature modulation, enabling adaptive morphological modeling of RS targets. Specifically, the Parallel Interleaved Convolution (PIC) module establishes symmetric computation patterns through mirrored kernel arrangements, effectively reducing computational redundancy while preserving directional completeness through rotational symmetry-enhanced receptive field optimization. Complementing this, the Global Complementary Attention Mechanism (GCAM) introduces bidirectional symmetry in feature recalibration, decoupling channel-wise and spatial-wise adaptations through orthogonal attention pathways that maintain equilibrium in gradient propagation. Extensive experiments on the RSOD and NWPU-VHR-10 datasets demonstrate superior performance, achieving 92.2% and 84.90% mAP, respectively, and outperforming state-of-the-art methods including EfficientNet and YOLOv8. With only 12.5 M parameters, the framework achieves symmetrical optimization of accuracy-efficiency trade-offs. Ablation studies confirm that the symmetric interaction between PIC and GCAM enhances detection performance by 2.75%, particularly excelling in scenarios requiring geometric symmetry preservation, such as dense target clusters and extreme scale variations. Cross-domain validation on agricultural pest datasets further verifies its rotational symmetry generalization capability, demonstrating 84.90% accuracy in fine-grained orientation-sensitive detection tasks.
(This article belongs to the Section Computer)
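The abstract does not spell out the PIC module's internals, so the following is a speculative sketch of one plausible reading: mirrored 1×k and k×1 strip kernels run in parallel and their outputs are interleaved channel-wise, so both orientations feed every downstream channel group. All names and sizes here are assumptions, not the paper's design:

```python
import torch
import torch.nn as nn

class ParallelInterleavedConv(nn.Module):
    """Hypothetical PIC-style block: two mirrored strip-convolution branches
    whose outputs are interleaved along the channel axis (channels even)."""
    def __init__(self, channels, k=7):
        super().__init__()
        half = channels // 2
        self.horizontal = nn.Conv2d(channels, half, (1, k), padding=(0, k // 2))
        self.vertical = nn.Conv2d(channels, half, (k, 1), padding=(k // 2, 0))

    def forward(self, x):
        h, v = self.horizontal(x), self.vertical(x)
        # Interleave the two branches: h0, v0, h1, v1, ...
        out = torch.stack([h, v], dim=2)   # (B, C/2, 2, H, W)
        return out.flatten(1, 2)           # (B, C, H, W)
```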

32 pages, 9638 KB  
Article
MSSA: A Multi-Scale Semantic-Aware Method for Remote Sensing Image–Text Retrieval
by Yun Liao, Zongxiao Hu, Fangwei Jin, Junhui Liu, Nan Chen, Jiayi Lv and Qing Duan
Remote Sens. 2025, 17(19), 3341; https://doi.org/10.3390/rs17193341 - 30 Sep 2025
Abstract
In recent years, the convenience and potential for information extraction offered by Remote Sensing Image–Text Retrieval (RSITR) have made it a significant focus of research in remote sensing (RS) knowledge services. Current mainstream methods for RSITR generally align fused image features at multiple scales with textual features, primarily focusing on the local information of RS images while neglecting potential semantic information. This results in insufficient alignment in the cross-modal semantic space. To overcome this limitation, we propose a Multi-Scale Semantic-Aware Remote Sensing Image–Text Retrieval method (MSSA). This method introduces Progressive Spatial Channel Joint Attention (PSCJA), which enhances the expressive capability of multi-scale image features through Window-Region-Global Progressive Attention (WRGPA) and Segmented Channel Attention (SCA). Additionally, the Image-Guided Text Attention (IGTA) mechanism dynamically adjusts textual attention weights based on visual context. Furthermore, the Cross-Modal Semantic Extraction Module (CMSE) incorporates learnable semantic tokens at each scale, enabling attention interaction between multi-scale features of different modalities and capturing hierarchical semantic associations. This multi-scale semantic-guided retrieval method ensures cross-modal semantic consistency, significantly improving the accuracy of cross-modal retrieval in RS. MSSA demonstrates superior retrieval accuracy in experiments across three baseline datasets, achieving new state-of-the-art performance.
(This article belongs to the Section Remote Sensing Image Processing)
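Image-guided text attention of the kind described here, where visual context re-weights textual tokens, is commonly built as cross-attention. A hedged sketch; the dimensions and the residual/norm arrangement are assumptions, not MSSA's exact design:

```python
import torch
import torch.nn as nn

class ImageGuidedTextAttention(nn.Module):
    """IGTA-style block (names/dimensions assumed): text tokens attend over
    image features, so visual context re-weights the text representation."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_tokens, image_tokens):
        # text_tokens: (B, Lt, D); image_tokens: (B, Li, D)
        attended, _ = self.cross(query=text_tokens, key=image_tokens, value=image_tokens)
        return self.norm(text_tokens + attended)
```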

20 pages, 2545 KB  
Article
LG-UNet Based Segmentation and Survival Prediction of Nasopharyngeal Carcinoma Using Multimodal MRI Imaging
by Yuhao Yang, Junhao Wen, Tianyi Wu, Jinrang Dong, Yunfei Xia and Yu Zhang
Bioengineering 2025, 12(10), 1051; https://doi.org/10.3390/bioengineering12101051 - 29 Sep 2025
Abstract
Image segmentation and survival prediction for nasopharyngeal carcinoma (NPC) are crucial for clinical diagnosis and treatment decisions. This study presents an improved 3D-UNet-based model for NPC GTV segmentation, referred to as LG-UNet. The encoder introduces deep strip convolution and channel attention mechanisms to enhance feature extraction while avoiding spatial feature loss and anisotropic constraints. The decoder incorporates Dynamic Large Convolutional Kernel (DLCK) and Global Feature Fusion (GFF) modules to capture multi-scale features and integrate global contextual information, enabling precise segmentation of the tumor GTV in NPC MRI images. Risk prediction is performed on the segmented multi-modal MRI images using the Lung-Net model, with the output risk factors combined with clinical data in a Cox model to predict metastatic probabilities for NPC lesions. Experimental results on 442 NPC MRI scans from Sun Yat-sen University Cancer Center showed a DSC of 0.8223, an accuracy of 0.8235, a recall of 0.8297, and an HD95 of 1.6807 mm. Compared to the baseline model, the DSC improved by 7.73%, accuracy increased by 4.52%, and recall improved by 3.40%. The combined model's risk prediction achieved a C-index of 0.756, with a 5-year AUC of 0.789. This model can serve as an auxiliary tool for clinical decision-making in NPC.
(This article belongs to the Section Biosignal Processing)
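Deep strip convolutions for volumetric data typically factor a large 3-D kernel into orthogonal 1-D depthwise kernels, which keeps parameters low and limits anisotropy effects. An illustrative sketch of that general idea; the kernel size and fusion step are assumptions, not LG-UNet's published layout:

```python
import torch.nn as nn

class StripConv3D(nn.Module):
    """Three orthogonal depthwise 1-D kernels approximate a large 3-D
    receptive field over (depth, height, width); sizes are assumed."""
    def __init__(self, channels, k=7):
        super().__init__()
        p = k // 2
        self.d = nn.Conv3d(channels, channels, (k, 1, 1), padding=(p, 0, 0), groups=channels)
        self.h = nn.Conv3d(channels, channels, (1, k, 1), padding=(0, p, 0), groups=channels)
        self.w = nn.Conv3d(channels, channels, (1, 1, k), padding=(0, 0, p), groups=channels)
        self.fuse = nn.Conv3d(channels, channels, 1)  # pointwise channel mixing

    def forward(self, x):
        return self.fuse(self.d(x) + self.h(x) + self.w(x))
```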

19 pages, 13644 KB  
Article
Rock Surface Crack Recognition Based on Improved Mask R-CNN with CBAM and BiFPN
by Yu Hu, Naifu Deng, Fan Ye, Qinglong Zhang and Yuchen Yan
Buildings 2025, 15(19), 3516; https://doi.org/10.3390/buildings15193516 - 29 Sep 2025
Abstract
To address the challenges of multi-scale distribution, low contrast, and background interference in rock crack identification, this paper proposes an improved Mask R-CNN model (CBAM-BiFPN-Mask R-CNN) that integrates the convolutional block attention mechanism (CBAM) module and the bidirectional feature pyramid network (BiFPN) module. A dataset of 1028 rock surface crack images was constructed, and the model's robustness was improved through data augmentation strategies that dynamically combine Gaussian blurring, noise overlay, and color adjustment. The model embeds the CBAM module after the residual blocks of the ResNet50 backbone, strengthening the crack-related feature response through channel attention and focusing on the spatial distribution of cracks through spatial attention; at the same time, it replaces the traditional FPN with BiFPN, realizing adaptive fusion of cross-scale features through learnable weights and optimizing multi-scale crack feature extraction. Experimental results show that the improved model significantly improves crack recognition in complex rock mass scenarios: the mAP, precision, and recall are improved by 8.36%, 9.1%, and 12.7%, respectively, compared with the baseline model. This research provides an effective solution for rock crack detection in complex geological environments, especially for missed detections of small cracks and cracks against complex backgrounds.
(This article belongs to the Special Issue Recent Scientific Developments in Structural Damage Identification)
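BiFPN's "adaptive fusion of cross-scale features through learnable weights" is the published fast normalized fusion from EfficientDet, so it can be shown directly. A minimal sketch of the fusion step (the epsilon value is the commonly used default):

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """BiFPN-style fast normalized fusion: each input feature map gets a
    learnable non-negative weight, normalized to sum to one."""
    def __init__(self, num_inputs, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, feats):
        # feats: list of same-shape tensors from different pyramid levels
        w = torch.relu(self.w)
        w = w / (w.sum() + self.eps)
        return sum(wi * f for wi, f in zip(w, feats))
```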

20 pages, 1860 KB  
Article
An Improved YOLOv11n Model Based on Wavelet Convolution for Object Detection in Soccer Scenes
by Yue Wu, Lanxin Geng, Xinqi Guo, Chao Wu and Gui Yu
Symmetry 2025, 17(10), 1612; https://doi.org/10.3390/sym17101612 - 28 Sep 2025
Abstract
Object detection in soccer scenes serves as a fundamental task for soccer video analysis and target tracking. This paper proposes WCC-YOLO, a symmetry-enhanced object detection framework based on YOLOv11n. Our approach integrates symmetry principles at multiple levels: (1) The novel C3k2-WTConv module synergistically combines conventional convolution with wavelet decomposition, leveraging the orthogonal symmetry of Haar wavelet quadrature mirror filters (QMFs) to achieve balanced frequency-domain decomposition and enhance multi-scale feature representation. (2) The Channel Prior Convolutional Attention (CPCA) mechanism incorporates symmetrical operations—using average-max pooling pairs in channel attention and multi-scale convolutional kernels in spatial attention—to automatically learn to prioritize semantically salient regions through channel-wise feature recalibration, thereby enabling balanced feature representation. Coupled with InnerShape-IoU for refined bounding box regression, WCC-YOLO achieves a 4.5% improvement in mAP@0.5:0.95 and a 5.7% gain in mAP@0.5 compared to the baseline YOLOv11n while simultaneously reducing the number of parameters and maintaining near-identical inference latency (δ < 0.1 ms). This work demonstrates the value of explicit symmetry-aware modeling for sports analytics.
(This article belongs to the Section Computer)
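The Haar decomposition underlying WTConv-style modules can be implemented as a fixed depthwise convolution with stride 2, one filter per sub-band. A sketch of the one-level transform; the 0.5 filter normalization is an assumption about scaling, not taken from this paper:

```python
import torch
import torch.nn.functional as F

def haar_dwt(x):
    """One-level Haar wavelet decomposition as a fixed depthwise conv.
    Input (B, C, H, W) -> output (B, 4C, H/2, W/2): per input channel,
    a low-pass band (LL) plus three detail bands (LH, HL, HH)."""
    c = x.shape[1]
    ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
    lh = torch.tensor([[0.5, 0.5], [-0.5, -0.5]])
    hl = torch.tensor([[0.5, -0.5], [0.5, -0.5]])
    hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
    bank = torch.stack([ll, lh, hl, hh]).unsqueeze(1)     # (4, 1, 2, 2)
    bank = bank.repeat(c, 1, 1, 1).to(x.device, x.dtype)  # (4C, 1, 2, 2)
    return F.conv2d(x, bank, stride=2, groups=c)
```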

21 pages, 9052 KB  
Article
SAM–Attention Synergistic Enhancement: SAR Image Object Detection Method Based on Visual Large Model
by Yirong Yuan, Jie Yang, Lei Shi and Lingli Zhao
Remote Sens. 2025, 17(19), 3311; https://doi.org/10.3390/rs17193311 - 26 Sep 2025
Abstract
The object detection model for synthetic aperture radar (SAR) images needs to have strong generalization ability and more stable detection performance due to the complex scattering mechanism, high sensitivity of the orientation angle, and susceptibility to speckle noise. Visual large models possess strong generalization capabilities for natural image processing, but their application to SAR imagery remains relatively rare. This paper attempts to introduce a visual large model into the SAR object detection task, aiming to alleviate the problems of weak cross-domain generalization and poor adaptability to few-shot samples caused by the characteristics of SAR images in existing models. The proposed model comprises an image encoder, an attention module, and a detection decoder. The image encoder leverages the pre-trained Segment Anything Model (SAM) for effective feature extraction from SAR images. An Adaptive Channel Interactive Attention (ACIA) module is introduced to suppress SAR speckle noise. Further, a Dynamic Tandem Attention (DTA) mechanism is proposed in the decoder to integrate scale perception, spatial focusing, and task adaptation, while decoupling classification from detection for improved accuracy. Leveraging the strong representational and few-shot adaptation capabilities of large pre-trained models, this study evaluates their cross-domain and few-shot detection performance on SAR imagery. For cross-domain detection, the model was trained on AIR-SARShip-1.0 and tested on SSDD, achieving an mAP50 of 0.54. For few-shot detection on SAR-AIRcraft-1.0, using only 10% of the training samples, the model reached an mAP50 of 0.503.
(This article belongs to the Special Issue Big Data Era: AI Technology for SAR and PolSAR Image)
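Reusing SAM's pre-trained image encoder as a frozen feature extractor, as this abstract describes, looks roughly like the following with Meta's segment-anything package. The checkpoint filename is the standard ViT-B release, and the downstream use of the features is a placeholder:

```python
import torch
from segment_anything import sam_model_registry  # Meta's SAM package

# Load SAM and keep only its ViT image encoder, frozen.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
encoder = sam.image_encoder.eval()
for p in encoder.parameters():
    p.requires_grad_(False)

with torch.no_grad():
    sar_batch = torch.randn(1, 3, 1024, 1024)  # SAM expects 1024x1024, 3-channel input
    feats = encoder(sar_batch)                 # (1, 256, 64, 64) embedding
# `feats` would then pass through the paper's attention modules and decoder.
```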

19 pages, 5381 KB  
Article
Context_Driven Emotion Recognition: Integrating Multi_Cue Fusion and Attention Mechanisms for Enhanced Accuracy on the NCAER_S Dataset
by Merieme Elkorchi, Boutaina Hdioud, Rachid Oulad Haj Thami and Safae Merzouk
Information 2025, 16(10), 834; https://doi.org/10.3390/info16100834 - 26 Sep 2025
Abstract
In recent years, most conventional emotion recognition approaches have concentrated primarily on facial cues, often overlooking complementary sources of information such as body posture and contextual background. This limitation reduces their effectiveness in complex, real-world environments. In this work, we present a multi-branch emotion recognition framework that separately processes facial, bodily, and contextual information using three dedicated neural networks. To better capture contextual cues, we intentionally mask the face and body of the main subject within the scene, prompting the model to explore alternative visual elements that may convey emotional states. To further enhance the quality of the extracted features, we integrate both channel and spatial attention mechanisms into the network architecture. Evaluated on the challenging NCAER-S dataset, our model achieves an accuracy of 56.42%, surpassing the state-of-the-art GLAMOUR-Net. These results highlight the effectiveness of combining multi-cue representation and attention-guided feature extraction for robust emotion recognition in unconstrained settings. The findings also highlight the importance of accurate emotion recognition for human–computer interaction, where affect detection enables systems to adapt to users and deliver more effective experiences.
(This article belongs to the Special Issue Multimodal Human-Computer Interaction)
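The masking strategy described here, hiding the main subject so the context branch must read the surrounding scene, is simple to illustrate. A sketch assuming a person detector supplies the subject's bounding box; the function and argument names are hypothetical:

```python
import torch

def mask_subject(image, box):
    """Zero out the main subject so a context branch cannot rely on it.
    image: (C, H, W) tensor; box: (x1, y1, x2, y2) from a person detector."""
    masked = image.clone()
    x1, y1, x2, y2 = box
    masked[:, y1:y2, x1:x2] = 0.0
    return masked
```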

18 pages, 3547 KB  
Article
Single-Image High Dynamic Range Reconstruction via Improved HDRUNet with Attention and Multi-Component Loss
by Liang Gao, Xiaoyun Tong and Laixian Zhang
Appl. Sci. 2025, 15(19), 10431; https://doi.org/10.3390/app151910431 - 25 Sep 2025
Abstract
High dynamic range (HDR) imaging aims to overcome the limited dynamic range of traditional imaging systems and achieve effective restoration of the brightness and color of the real world. In recent years, single-image HDR (SI-HDR) reconstruction has become a research hotspot due to its simple acquisition process and applicability to dynamic scenes. This paper proposes an improved SI-HDR reconstruction method based on HDRUNet, which systematically integrates channel and spatial attention mechanisms with brightness-expansion and color-enhancement branches, and constructs an adaptive multi-component loss function. This effectively enhances detail restoration in extreme exposure areas and improves overall color expressiveness. Experiments on public datasets such as NTIRE 2021, VDS, and HDR-Eye show that the proposed method outperforms mainstream SI-HDR methods on the PSNR, SSIM, and VDP evaluation metrics. It performs particularly well in complex scenarios, demonstrating greater robustness and generalization ability.
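A multi-component loss of the kind described typically sums weighted reconstruction and color terms. A hedged sketch: the tanh-compressed L1 follows the original HDRUNet's Tanh_L1 objective, while the cosine-similarity color term and the weights are assumptions, not this paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def multi_component_loss(pred, target, w_rec=1.0, w_color=0.1):
    """Illustrative multi-component objective for SI-HDR (weights assumed):
    a tanh-compressed L1 reconstruction term plus a per-pixel color term
    that penalizes hue/saturation drift via cosine similarity over RGB."""
    rec = F.l1_loss(torch.tanh(pred), torch.tanh(target))
    cos = F.cosine_similarity(pred, target, dim=1)  # (B, H, W)
    color = (1.0 - cos).mean()
    return w_rec * rec + w_color * color
```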

26 pages, 6191 KB  
Article
HLAE-Net: A Hierarchical Lightweight Attention-Enhanced Strategy for Remote Sensing Scene Image Classification
by Mingyuan Yang, Cuiping Shi, Kangning Tan, Haocheng Wu, Shenghan Wang and Liguo Wang
Remote Sens. 2025, 17(19), 3279; https://doi.org/10.3390/rs17193279 - 24 Sep 2025
Abstract
Remote sensing scene image classification has extensive applications in fields such as land use monitoring and environmental assessment. However, traditional methodologies based on convolutional neural networks (CNNs) face considerable challenges caused by uneven image quality, imbalanced sample distribution, intra-class similarities, and limited computing resources. To address these issues, this study proposes a hierarchical lightweight attention-enhanced network (HLAE-Net), which employs a hierarchical feature collaborative extraction (HFCE) strategy. By considering the differences in resolution and receptive field, as well as the varying effectiveness of attention mechanisms across network layers, the network uses different attention modules to progressively extract features from the images. This approach forms a complementary and enhanced feature chain among the layers, creating an efficient collaboration between the various attention modules. In addition, an improved lightweight attention module group is proposed, including a lightweight dual coordinate spatial attention module (DCSAM), which captures spatial and channel information, as well as a lightweight multiscale spatial and channel attention module. These improved modules are incorporated into the featured average sampling (FAS) bottleneck and basic bottlenecks. Experiments were conducted on four public standard datasets, and the results show that the proposed model outperforms several mainstream models from recent years in overall accuracy (OA), and it is particularly competitive at small training ratios. While keeping the parameter scale in check, it combines good classification ability with computational efficiency, providing a strong solution for scene image classification.
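The hierarchical strategy, assigning different attention modules to different depths so their strengths complement one another, can be illustrated abstractly. The stage split and module choices below are placeholders, not HLAE-Net's actual assignment:

```python
import torch.nn as nn

def attach_stage_attention(stages, spatial_attn, channel_attn):
    """Wrap each backbone stage with a depth-appropriate attention module:
    shallow, high-resolution stages get spatial attention; deep, semantically
    rich stages get channel attention. `spatial_attn`/`channel_attn` are
    factories returning attention modules (placeholder split rule)."""
    wrapped = []
    for i, stage in enumerate(stages):
        attn = spatial_attn() if i < len(stages) // 2 else channel_attn()
        wrapped.append(nn.Sequential(stage, attn))
    return nn.ModuleList(wrapped)
```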

15 pages, 1685 KB  
Article
Ultra-High Resolution 9.4T Brain MRI Segmentation via a Newly Engineered Multi-Scale Residual Nested U-Net with Gated Attention
by Aryan Kalluvila, Jay B. Patel and Jason M. Johnson
Bioengineering 2025, 12(10), 1014; https://doi.org/10.3390/bioengineering12101014 - 24 Sep 2025
Abstract
The 9.4T MRI is the highest-resolution brain scanner available on the public market. It offers submillimeter brain imaging with exceptional anatomical detail, making it one of the most powerful tools for detecting subtle structural changes associated with neurological conditions. Current segmentation models are optimized for lower-field MRI (1.5T–3T) and struggle to perform well on 9.4T data. In this study, we present GA-MS-UNet++, the world's first deep learning-based model specifically designed for 9.4T brain MRI segmentation. Our model integrates multi-scale residual blocks, gated skip connections, and spatial channel attention mechanisms to improve both local and global feature extraction. The model was trained and evaluated on 12 patients in the UltraCortex 9.4T dataset and benchmarked against four leading segmentation models (Attention U-Net, Nested U-Net, VDSR, and R2UNet). GA-MS-UNet++ achieved state-of-the-art performance across both evaluation sets. When tested against manual, radiologist-reviewed ground truth masks, the model achieved a Dice score of 0.93. On a separate test set using SynthSeg-generated masks as the ground truth, the Dice score was 0.89. Across both evaluations, the model achieved an overall accuracy of 97.29%, precision of 90.02%, and recall of 94.00%. Statistical validation using the Wilcoxon signed-rank test (p < 1 × 10⁻⁵) and Kruskal–Wallis test (H = 26,281.98, p < 1 × 10⁻⁵) confirmed the significance of these results. Qualitative comparisons also showed near-exact alignment with ground truth masks, particularly in areas such as the ventricles and gray–white matter interfaces. Volumetric validation further demonstrated a high correlation (R² = 0.90) between predicted and ground truth brain volumes. Despite the limited annotated data, GA-MS-UNet++ maintained strong performance and has the potential for clinical use. This algorithm represents the first publicly available segmentation model for 9.4T imaging, providing a powerful tool for high-resolution brain segmentation and driving progress in automated neuroimaging analysis.
(This article belongs to the Special Issue New Sights of Machine Learning and Digital Models in Biomedicine)
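Gated skip connections in U-Net variants are usually additive attention gates. A sketch in the Attention U-Net style, shown in 2-D for brevity with channel sizes as assumptions; this is a generic form of the idea, not necessarily this model's exact gate:

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate on a skip connection: a decoder gating signal
    decides which skip features to pass through (skip and gate assumed
    spatially aligned here for brevity)."""
    def __init__(self, skip_ch, gate_ch, inter_ch):
        super().__init__()
        self.theta = nn.Conv2d(skip_ch, inter_ch, 1)  # projects skip features
        self.phi = nn.Conv2d(gate_ch, inter_ch, 1)    # projects gating signal
        self.psi = nn.Conv2d(inter_ch, 1, 1)          # scalar attention map

    def forward(self, skip, gate):
        a = torch.sigmoid(self.psi(torch.relu(self.theta(skip) + self.phi(gate))))
        return skip * a
```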

21 pages, 7458 KB  
Article
Dynamic and Lightweight Detection of Strawberry Diseases Using Enhanced YOLOv10
by Huilong Jin, Xiangrong Ji and Wanming Liu
Electronics 2025, 14(19), 3768; https://doi.org/10.3390/electronics14193768 - 24 Sep 2025
Abstract
Strawberry cultivation faces significant challenges from pests and diseases, which are difficult to detect due to complex natural backgrounds and the high visual similarity between targets and their surroundings. This study proposes an advanced and lightweight detection algorithm, YOLO10-SC, based on the YOLOv10 model, to address these challenges. The algorithm integrates the convolutional block attention module (CBAM) to enhance feature representation by focusing on critical disease-related information while suppressing irrelevant data. Additionally, the Spatial and Channel Reconstruction Convolution (SCConv) module is incorporated into the C2f module to improve the model's ability to distinguish subtle differences among various pest and disease types. The introduction of DySample, an ultra-lightweight dynamic upsampler, further enhances feature boundary smoothness and detail preservation, ensuring efficient upsampling with minimal computational resources. Experimental results demonstrate that YOLO10-SC outperforms the original YOLOv10 and other mainstream algorithms in precision, recall, mAP50, F1 score, and FPS while reducing model parameters, GFLOPs, and size. These improvements significantly enhance detection accuracy and efficiency, making the model well-suited for real-time applications in natural agricultural environments. The proposed algorithm offers a robust solution for strawberry pest and disease detection, contributing to the advancement of smart agriculture.
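DySample is a published point-sampling upsampler: a small conv predicts per-pixel sampling offsets and the feature map is resampled at those positions. A simplified sketch of the mechanism; the offset range and initialization details are assumptions rather than the paper's configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicUpsample(nn.Module):
    """DySample-style upsampler: predict offsets, then grid_sample the input
    at offset positions on a dense 2x (or sx) grid."""
    def __init__(self, channels, scale=2, offset_range=0.25):
        super().__init__()
        self.scale = scale
        self.range = offset_range
        self.offset = nn.Conv2d(channels, 2 * scale * scale, 1)

    def forward(self, x):
        b, _, h, w = x.shape
        s = self.scale
        # (B, 2, sH, sW) offsets, bounded to a small normalized range.
        off = F.pixel_shuffle(self.offset(x), s).tanh() * self.range
        ys = torch.linspace(-1, 1, h * s, device=x.device)
        xs = torch.linspace(-1, 1, w * s, device=x.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        grid = torch.stack([gx, gy]).unsqueeze(0).expand(b, -1, -1, -1)
        grid = (grid + off).permute(0, 2, 3, 1)  # (B, sH, sW, 2), (x, y) order
        return F.grid_sample(x, grid, align_corners=True)
```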

24 pages, 5998 KB  
Article
Dynamic Anomaly Detection Method for Pumping Units Based on Multi-Scale Feature Enhancement and Low-Light Optimization
by Kun Tan, Shuting Wang, Yaming Mao, Shunyi Wang and Guoqing Han
Processes 2025, 13(10), 3038; https://doi.org/10.3390/pr13103038 - 23 Sep 2025
Abstract
Abnormal shutdown detection in oilfield pumping units presents significant challenges, including degraded image quality under low-light conditions, difficulty in detecting small or obscured targets, and limited capabilities for dynamic state perception. Previous approaches, such as traditional visual inspection and conventional image processing, often struggle with these limitations. To address these challenges, this study proposes an intelligent method integrating multi-scale feature enhancement and low-light image optimization. Specifically, a lightweight low-light enhancement framework is developed based on the Zero-DCE algorithm, improving the deep curve estimation network (DCE-Net) and non-reference loss functions through training on oilfield multi-exposure datasets. This significantly enhances brightness and detail retention in complex lighting conditions. The DAFE-Net detection model incorporates a four-level feature pyramid (P3–P6), channel-spatial attention mechanisms (CBAM), and Focal-EIoU loss to improve localization of small/occluded targets. Inter-frame difference algorithms further analyze motion states for robust "pump-off" determination. Experimental results on 5000 annotated images show the DAFE-Net achieves 93.9% mAP@50%, 96.5% recall, and 35 ms inference time, outperforming YOLOv11 and Faster R-CNN. Field tests confirm 93.9% accuracy under extreme conditions (e.g., strong illumination fluctuations and dust occlusion), demonstrating the method's effectiveness in enabling intelligent monitoring across seven operational areas in the Changqing Oilfield while offering a scalable solution for real-time dynamic anomaly detection in industrial equipment monitoring.
(This article belongs to the Section Energy Systems)
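The inter-frame difference check for "pump-off" determination can be illustrated directly with OpenCV. In this sketch the thresholds and window logic are assumptions; the paper's exact decision rule is not given in the abstract:

```python
import cv2
import numpy as np

def is_pump_stopped(frames, diff_thresh=25, motion_ratio=0.002):
    """If the fraction of changed pixels stays tiny across every consecutive
    frame pair in the window, judge the pumping unit to be stopped.
    frames: list of BGR images covering a short time window."""
    for prev, curr in zip(frames, frames[1:]):
        g1 = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
        g2 = cv2.cvtColor(curr, cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(g1, g2)
        _, moving = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)
        if np.count_nonzero(moving) / moving.size > motion_ratio:
            return False  # visible motion -> still pumping
    return True
```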

24 pages, 4296 KB  
Article
VST-YOLOv8: A Trustworthy and Secure Defect Detection Framework for Industrial Gaskets
by Lei Liang and Junming Chen
Electronics 2025, 14(19), 3760; https://doi.org/10.3390/electronics14193760 - 23 Sep 2025
Abstract
The surface quality of industrial gaskets directly impacts sealing performance, operational reliability, and market competitiveness. Inadequate or unreliable defect detection in silicone gaskets can lead to frequent maintenance, undetected faults, and security risks in downstream systems. This paper presents VST-YOLOv8, a trustworthy and secure defect detection framework built upon an enhanced YOLOv8 architecture. To address the limitations of C2F feature extraction in the traditional YOLOv8 backbone, we integrate the lightweight Mobile Vision Transformer v2 (ViT v2) to improve global feature representation while maintaining interpretability. For real-time industrial deployment, we incorporate the Gating-Structured Convolution (GSConv) module, which adaptively adjusts convolution kernels to emphasize features of different shapes, ensuring stable detection under varying production conditions. A Slim-neck structure reduces parameter count and computational complexity without sacrificing accuracy, contributing to robustness against performance degradation. Additionally, the Triplet Attention mechanism combines channel, spatial, and fine-grained attention to enhance feature discrimination, improving reliability in challenging visual environments. Experimental results show that VST-YOLOv8 achieves higher accuracy and recall compared to the baseline YOLOv8, while maintaining low latency suitable for edge deployment. When integrated with secure industrial control systems, the proposed framework supports authenticated, tamper-resistant detection pipelines, ensuring both operational efficiency and data integrity in real-world production. These contributions strengthen trust in AI-driven quality inspection, making the system suitable for safety-critical manufacturing processes.
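Triplet Attention is a published module (Misra et al.), so the channel/spatial combination it provides can be sketched concretely. A compact PyTorch version; the 7×7 gate kernel is the commonly used default:

```python
import torch
import torch.nn as nn

class ZPool(nn.Module):
    def forward(self, x):
        # Concatenate max- and mean-pooled maps along the channel axis.
        return torch.cat([x.amax(dim=1, keepdim=True), x.mean(dim=1, keepdim=True)], dim=1)

class TripletAttention(nn.Module):
    """Three branches capture C-H, C-W, and H-W interactions by rotating the
    tensor, applying a Z-pool + conv gate, rotating back, and averaging."""
    def __init__(self, k=7):
        super().__init__()
        self.gates = nn.ModuleList(
            nn.Sequential(ZPool(), nn.Conv2d(2, 1, k, padding=k // 2), nn.Sigmoid())
            for _ in range(3)
        )

    def forward(self, x):
        xh = x.permute(0, 2, 1, 3)  # swap C and H
        xw = x.permute(0, 3, 2, 1)  # swap C and W
        yh = (xh * self.gates[0](xh)).permute(0, 2, 1, 3)
        yw = (xw * self.gates[1](xw)).permute(0, 3, 2, 1)
        ys = x * self.gates[2](x)   # plain H-W (spatial) branch
        return (yh + yw + ys) / 3.0
```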

17 pages, 2608 KB  
Article
Improved UNet Recognition Model for Multiple Strawberry Pests Based on Small Samples
by Shengyi Zhao, Jizhan Liu, Tianzheng Hua and Yong Jiang
Agronomy 2025, 15(10), 2252; https://doi.org/10.3390/agronomy15102252 - 23 Sep 2025
Abstract
Intelligent pest detection has become a critical challenge in precision agriculture. Addressing the challenge of distinguishing between aphids, thrips, whiteflies, beet armyworms, Spodoptera frugiperda, and spider mites during strawberry growth, this study establishes a small-sample multi-pest dataset for strawberries through field photography, open-source sharing, and web scraping. This study introduces a channel–space parallel attention mechanism (PCSA) into the UNet architecture. The improved UNet model accentuates pest color and morphology through channel-based attention and emphasizes spatial localization with coordinate-based attention, allowing for the comprehensive integration of global and local pixel information. Subsequently, comparative analysis of several color spaces identified HSV as optimal for pest recognition, with the "UNet + PCSA + HSV" approach achieving state-of-the-art results (IoU, 84.8%; recall, 89.9%; precision, 91.8%).
(This article belongs to the Collection AI, Sensors and Robotics for Smart Agriculture)
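The HSV finding is easy to operationalize: convert each image to HSV before feeding the network. A minimal OpenCV preprocessing sketch (assuming BGR input, OpenCV's default load order):

```python
import cv2

def to_hsv(image_bgr):
    """Convert a BGR image to HSV, the color space this study found most
    discriminative for pest recognition, before model inference."""
    return cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
```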
