Search Results (159)

Search Parameters:
Keywords = receptive field block module

20 pages, 18416 KiB  
Article
Swin-FSNet: A Frequency-Aware and Spatially Enhanced Network for Unpaved Road Extraction from UAV Remote Sensing Imagery
by Jiwu Guan, Qingzhan Zhao, Wenzhong Tian, Xinxin Yao, Jingyang Li and Wei Li
Remote Sens. 2025, 17(14), 2520; https://doi.org/10.3390/rs17142520 - 20 Jul 2025
Viewed by 235
Abstract
The efficient recognition of unpaved roads from remote sensing (RS) images holds significant value for tasks such as emergency response and route planning in outdoor environments. However, unpaved roads often face challenges such as blurred boundaries, low contrast, complex shapes, and a lack of publicly available datasets. To address these issues, this paper proposes a novel architecture, Swin-FSNet, which combines frequency analysis and spatial enhancement techniques to optimize feature extraction. The architecture consists of two core modules: the Wavelet-Based Feature Decomposer (WBFD) module and the Hybrid Dynamic Snake Block (HyDS-B) module. The WBFD module enhances boundary detection by capturing directional gradient changes at the road edges and extracting high-frequency features, effectively addressing boundary blurring and low contrast. The HyDS-B module, by adaptively adjusting the receptive field, performs spatial modeling for complex-shaped roads, significantly improving adaptability to narrow road curvatures. In this study, the southern mountainous area of Shihezi, Xinjiang, was selected as the study area, and the unpaved road dataset was constructed using high-resolution UAV images. Experimental results on the SHZ unpaved road dataset and the widely used DeepGlobe dataset show that Swin-FSNet performs well in segmentation accuracy and road structure preservation, with a road IoU of 81.76% and 71.97%, respectively. The experiments validate the excellent performance and robustness of Swin-FSNet in extracting unpaved roads from high-resolution RS images.
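The abstract does not detail the WBFD internals, but a single-level 2D Haar decomposition illustrates the kind of high-frequency (edge) sub-bands a wavelet-based decomposer can feed to a boundary-enhancement path; the function below is a minimal sketch, not the paper's module.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar transform: returns (LL, LH, HL, HH) sub-bands.

    LL carries the low-frequency content; the remaining bands carry the
    directional high-frequency (edge) responses that a WBFD-style module
    would pass to its boundary-enhancement path.
    """
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    ll = (a + b + c + d) / 4.0   # local average (approximation)
    lh = (a - b + c - d) / 4.0   # column differences (vertical edges)
    hl = (a + b - c - d) / 4.0   # row differences (horizontal edges)
    hh = (a - b - c + d) / 4.0   # diagonal detail
    return ll, lh, hl, hh

# A vertical step edge: the response lands in the column-difference band.
img = np.zeros((4, 4))
img[:, 1:] = 1.0
ll, lh, hl, hh = haar_dwt2(img)
```

On this toy edge image the `lh` band is nonzero exactly where the boundary sits, while `hl` and `hh` stay zero, which is the separation of boundary energy the module exploits.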

21 pages, 4936 KiB  
Article
A Lightweight Pavement Defect Detection Algorithm Integrating Perception Enhancement and Feature Optimization
by Xiang Zhang, Xiaopeng Wang and Zhuorang Yang
Sensors 2025, 25(14), 4443; https://doi.org/10.3390/s25144443 - 17 Jul 2025
Viewed by 182
Abstract
To address the current issue of large computations and the difficulty in balancing model complexity and detection accuracy in pavement defect detection models, a lightweight pavement defect detection algorithm, PGS-YOLO, is proposed based on YOLOv8, which integrates perception enhancement and feature optimization. The algorithm first designs the Receptive-Field Convolutional Block Attention Module Convolution (RFCBAMConv) and the Receptive-Field Convolutional Block Attention Module C2f-RFCBAM, based on which we construct an efficient Perception Enhanced Feature Extraction Network (PEFNet) that enhances multi-scale feature extraction capability by dynamically adjusting the receptive field. Secondly, the dynamic upsampling module DySample is introduced into the efficient feature pyramid, constructing a new feature fusion pyramid (Generalized Dynamic Sampling Feature Pyramid Network, GDSFPN) to optimize the multi-scale feature fusion effect. In addition, a shared detail-enhanced convolution lightweight detection head (SDCLD) was designed, which significantly reduces the model’s parameters and computation while improving localization and classification performance. Finally, Wise-IoU was introduced to optimize the training performance and detection accuracy of the model. Experimental results show that PGS-YOLO increases mAP50 by 2.8% and 2.9% on the complete GRDDC2022 dataset and the Chinese subset, respectively, outperforming the other detection models. The number of parameters and computations are reduced by 10.3% and 9.9%, respectively, compared to the YOLOv8n model, with an average frame rate of 69 frames per second, offering good real-time performance. In addition, on the CRACK500 dataset, PGS-YOLO improved mAP50 by 2.3%, achieving a better balance between model complexity and detection accuracy.
(This article belongs to the Topic Applied Computing and Machine Intelligence (ACMI))
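Wise-IoU, like the other IoU-family regression losses mentioned above, starts from the plain overlap ratio between predicted and ground-truth boxes; the sketch below computes that base quantity, and deliberately omits Wise-IoU's additional distance-based focusing terms, which the abstract does not specify.

```python
def box_iou(b1, b2):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Regression losses such as Wise-IoU build on this overlap ratio by
    adding weighting/focusing terms; those extras are omitted here.
    """
    ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    area2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    return inter / (area1 + area2 - inter)

print(box_iou((0, 0, 2, 2), (1, 0, 3, 2)))  # 0.333... (overlap 2, union 6)
```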

24 pages, 2440 KiB  
Article
A Novel Dynamic Context Branch Attention Network for Detecting Small Objects in Remote Sensing Images
by Huazhong Jin, Yizhuo Song, Ting Bai, Kaimin Sun and Yepei Chen
Remote Sens. 2025, 17(14), 2415; https://doi.org/10.3390/rs17142415 - 12 Jul 2025
Viewed by 216
Abstract
Detecting small objects in remote sensing images is challenging due to their size, which results in limited distinctive features. This limitation necessitates the effective use of contextual information for accurate identification. Many existing methods often struggle because they do not dynamically adjust the contextual scope based on the specific characteristics of each target. To address this issue and improve the detection performance of small objects (typically defined as objects with a bounding box area of less than 1024 pixels), we propose a novel backbone network called the Dynamic Context Branch Attention Network (DCBANet). We present the Dynamic Context Scale-Aware (DCSA) Block, which utilizes a multi-branch architecture to generate features with diverse receptive fields. Within each branch, a Context Adaptive Selection Module (CASM) dynamically weights information, allowing the model to focus on the most relevant context. To further enhance performance, we introduce an Efficient Branch Attention (EBA) module that adaptively reweights the parallel branches, prioritizing the most discriminative ones. Finally, to ensure computational efficiency, we design a Dual-Gated Feedforward Network (DGFFN), a lightweight yet powerful replacement for standard FFNs. Extensive experiments conducted on four public remote sensing datasets demonstrate that the DCBANet achieves impressive mAP@0.5 scores of 80.79% on DOTA, 89.17% on NWPU VHR-10, 80.27% on SIMD, and a remarkable 42.4% mAP@0.5:0.95 on the specialized small object benchmark AI-TOD. These results surpass RetinaNet, YOLOF, FCOS, Faster R-CNN, Dynamic R-CNN, SKNet, and Cascade R-CNN, highlighting its effectiveness in detecting small objects in remote sensing images. However, there remains potential for further improvement in multi-scale and weak target detection. Future work will integrate local and global context to enhance multi-scale object detection performance.
(This article belongs to the Special Issue High-Resolution Remote Sensing Image Processing and Applications)
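The branch-reweighting idea behind the EBA module can be sketched as softmax gating over per-branch global descriptors. The real module learns its scoring; the stand-in below is parameter-free (mean activation as the score) and is only meant to show the mechanism.

```python
import numpy as np

def reweight_branches(branch_feats):
    """Adaptively reweight parallel receptive-field branches.

    branch_feats: list of (C,) global feature vectors, one per branch.
    Score each branch (here: its mean activation, a stand-in for a
    learned scorer), softmax the scores, and return the weights plus
    the weighted sum of branch features.
    """
    feats = np.stack(branch_feats)              # (B, C)
    scores = feats.mean(axis=1)                 # one scalar per branch
    w = np.exp(scores - scores.max())
    w = w / w.sum()                             # softmax over branches
    return w, (w[:, None] * feats).sum(axis=0)  # (B,), (C,)

# The more active branch receives the larger weight.
w, fused = reweight_branches([np.ones(4), np.zeros(4)])
```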

21 pages, 9172 KiB  
Article
Spike-Driven Channel-Temporal Attention Network with Multi-Scale Convolution for Energy-Efficient Bearing Fault Detection
by JinGyo Lim and Seong-Eun Kim
Appl. Sci. 2025, 15(13), 7622; https://doi.org/10.3390/app15137622 - 7 Jul 2025
Viewed by 241
Abstract
Real-time bearing fault diagnosis necessitates highly accurate, computationally efficient, and energy-conserving models suitable for deployment on resource-constrained edge devices. To address these demanding requirements, we propose the Spike Convolutional Attention Network (SpikeCAN), a novel spike-driven neural architecture tailored explicitly for real-time industrial diagnostics. SpikeCAN utilizes the inherent sparsity and event-driven processing capabilities of spiking neural networks (SNNs), significantly minimizing both computational load and power consumption. The SpikeCAN integrates a multi-dilated receptive field (MDRF) block and a convolution-based spike attention module. The MDRF module effectively captures extensive temporal dependencies from signals across various scales. Simultaneously, the spike-based attention mechanism dynamically extracts spatial-temporal patterns, substantially improving diagnostic accuracy and reliability. We validate SpikeCAN on two public bearing fault datasets: the Case Western Reserve University (CWRU) and the Society for Machinery Failure Prevention Technology (MFPT). The proposed model achieves 99.86% accuracy on the four-class CWRU dataset through five-fold cross-validation and 99.88% accuracy with a conventional 70:30 train–test random split. For the more challenging ten-class classification task on the same dataset, it achieves 97.80% accuracy under five-fold cross-validation. Furthermore, SpikeCAN attains a state-of-the-art accuracy of 96.31% on the fifteen-class MFPT dataset, surpassing existing benchmarks. These findings underscore a significant advancement in fault diagnosis technology, demonstrating the considerable practical potential of spike-driven neural networks in real-time, energy-efficient industrial diagnostic applications.
(This article belongs to the Section Computing and Artificial Intelligence)
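The event-driven sparsity that SNNs such as SpikeCAN exploit comes from their neuron model. A leaky integrate-and-fire (LIF) neuron, the standard SNN unit, is sketched below; the leak and threshold values are illustrative, not the paper's.

```python
def lif_spikes(inputs, tau=0.5, v_th=1.0):
    """Leaky integrate-and-fire neuron (illustrative parameters).

    The membrane potential decays by factor tau each step, integrates
    the input current, emits a spike when it crosses v_th, and then
    resets. Downstream layers only do work when a spike (1) occurs,
    which is the source of an SNN's energy efficiency.
    """
    v, spikes = 0.0, []
    for x in inputs:
        v = tau * v + x
        if v >= v_th:
            spikes.append(1)
            v = 0.0          # hard reset after firing
        else:
            spikes.append(0)
    return spikes

print(lif_spikes([0.6, 0.6, 0.1, 0.9]))  # [0, 0, 0, 1]
```

Note that sub-threshold inputs leak away, so only a sustained or strong input drives a spike; most time steps produce no event at all.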

21 pages, 4010 KiB  
Article
PCES-YOLO: High-Precision PCB Detection via Pre-Convolution Receptive Field Enhancement and Geometry-Perception Feature Fusion
by Heqi Yang, Junming Dong, Cancan Wang, Zhida Lian and Hui Chang
Appl. Sci. 2025, 15(13), 7588; https://doi.org/10.3390/app15137588 - 7 Jul 2025
Viewed by 311
Abstract
Printed circuit board (PCB) defect detection faces challenges like small target feature loss and severe background interference. To address these issues, this paper proposes PCES-YOLO, an enhanced YOLOv11-based model. First, a newly developed Pre-convolution Receptive Field Enhancement (PRFE) module replaces C3k in the C3k2 module. The ConvNeXtBlock with inverted bottleneck is introduced in the P4 layer, greatly improving small-target feature capture and semantic understanding. The second key innovation lies in the creation of the Efficient Feature Fusion and Aggregation Network (EFAN), which integrates a lightweight Spatial-Channel Decoupled Downsampling (SCDown) module and three innovative fusion pathways. This achieves substantial parameter reduction while effectively integrating shallow detail features with deep semantic features, preserving critical defect information across different feature levels. Finally, the Shape-IoU loss function is incorporated, focusing on bounding box shape and scale for more accurate regression and enhanced defect localization precision. Experiments on the enhanced Peking University PCB defect dataset show that PCES-YOLO achieves a mAP50 of 97.3% and a mAP50–95 of 77.2%. Compared to YOLOv11n, it shows improvements of 3.6% in mAP50 and 15.2% in mAP50–95. When compared to YOLOv11s, it increases mAP50 by 1.0% and mAP50–95 by 5.6% while also significantly reducing the model parameters. The performance of PCES-YOLO is also evaluated against mainstream object detection algorithms, including Faster R-CNN, SSD, YOLOv8n, etc. These results indicate that PCES-YOLO outperforms these algorithms in terms of detection accuracy and efficiency, making it a promising high-precision and efficient solution for PCB defect detection in industrial settings.
(This article belongs to the Section Computing and Artificial Intelligence)
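The inverted bottleneck in a ConvNeXt-style block (narrow depthwise convolution, then expand and project with 1x1 convolutions) buys a large spatial kernel at roughly the parameter cost of a plain 3x3 convolution. The arithmetic below assumes ConvNeXt's default 4x expansion and a bias-free count, since the abstract gives no figures.

```python
def convnext_block_params(c, k=7, expand=4):
    """Parameter count of a ConvNeXt-style inverted bottleneck (bias-free):
    depthwise k x k conv, then 1x1 expand to expand*c channels, then 1x1
    project back to c. The 4x expansion is the ConvNeXt default, assumed here.
    """
    depthwise = c * k * k
    pw_expand = c * (expand * c)
    pw_project = (expand * c) * c
    return depthwise + pw_expand + pw_project

def plain_conv_params(c, k=3):
    """A standard k x k convolution with c input and c output channels."""
    return c * c * k * k

c = 64
print(convnext_block_params(c), plain_conv_params(c))  # 35904 36864
```

For 64 channels the inverted bottleneck with a 7x7 spatial kernel is actually slightly cheaper than one plain 3x3 convolution, while seeing a much larger neighborhood.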

19 pages, 51503 KiB  
Article
LSANet: Lightweight Super Resolution via Large Separable Kernel Attention for Edge Remote Sensing
by Tingting Yong and Xiaofang Liu
Appl. Sci. 2025, 15(13), 7497; https://doi.org/10.3390/app15137497 - 3 Jul 2025
Viewed by 286
Abstract
In recent years, remote sensing imagery has become indispensable for applications such as environmental monitoring, land use classification, and urban planning. However, the physical constraints of satellite imaging systems frequently limit the spatial resolution of these images, impeding the extraction of fine-grained information critical to downstream tasks. Super-resolution (SR) techniques thus emerge as a pivotal solution to enhance the spatial fidelity of remote sensing images via computational approaches. While deep learning-based SR methods have advanced reconstruction accuracy, their high computational complexity and large parameter counts restrict practical deployment in real-world remote sensing scenarios—particularly on edge or low-power devices. To address this gap, we propose LSANet, a lightweight SR network customized for remote sensing imagery. The core of LSANet is the large separable kernel attention mechanism, which efficiently expands the receptive field while retaining low computational overhead. By integrating this mechanism into an enhanced residual feature distillation module, the network captures long-range dependencies more effectively than traditional shallow residual blocks. Additionally, a residual feature enhancement module, leveraging contrast-aware channel attention and hierarchical skip connections, strengthens the extraction and integration of multi-level discriminative features. This design preserves fine textures and ensures smooth information propagation across the network. Extensive experiments on public datasets such as UC Merced Land Use and NWPU-RESISC45 demonstrate LSANet’s competitive or superior performance compared to state-of-the-art methods. On the UC Merced Land Use dataset, LSANet achieves a PSNR of 34.33, outperforming the best baseline, HSENet (PSNR 34.23), by 0.1 dB. For SSIM, LSANet reaches 0.9328, closely matching HSENet’s 0.9332 while striking a good balance between the two metrics. On the NWPU-RESISC45 dataset, LSANet attains a PSNR of 35.02, marking a significant improvement over prior methods, and an SSIM of 0.9305, maintaining strong competitiveness. These results, combined with the notable reduction in parameters and floating-point operations, highlight the superiority of LSANet in remote sensing image super-resolution tasks.
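The economy of a large *separable* kernel is that a k x k filter which factors as an outer product can be applied as one horizontal and one vertical 1D pass, costing 2k weights instead of k². The sketch below verifies that equivalence numerically; it illustrates the decomposition idea behind large-separable-kernel attention, not LSANet's exact layers.

```python
import numpy as np

def conv1d_valid(x, k, axis):
    """'valid' correlation of a 2D array with a 1D kernel along one axis."""
    n = len(k)
    if axis == 0:
        return sum(k[i] * x[i:x.shape[0] - n + 1 + i, :] for i in range(n))
    return sum(k[i] * x[:, i:x.shape[1] - n + 1 + i] for i in range(n))

# A separable 5x5 kernel (outer product of two 1D kernels) applied as
# two 1D passes: 10 weights instead of 25.
kv = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16
kh = kv.copy()
x = np.random.default_rng(0).random((9, 9))
separable = conv1d_valid(conv1d_valid(x, kv, 0), kh, 1)

# Same result computed with the full 2D kernel.
k2d = np.outer(kv, kh)
full = sum(k2d[i, j] * x[i:i + 5, j:j + 5] for i in range(5) for j in range(5))
assert np.allclose(separable, full)
```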

16 pages, 3335 KiB  
Article
An Improved DeepSORT-Based Model for Multi-Target Tracking of Underwater Fish
by Shengnan Liu, Jiapeng Zhang, Haojun Zheng, Cheng Qian and Shijing Liu
J. Mar. Sci. Eng. 2025, 13(7), 1256; https://doi.org/10.3390/jmse13071256 - 28 Jun 2025
Viewed by 446
Abstract
Precise identification and quantification of fish movement states are of significant importance for conducting fish behavior research and guiding aquaculture production, with object tracking serving as a key technical approach for achieving behavioral quantification. The traditional DeepSORT algorithm has been widely applied to object tracking tasks; however, in practical aquaculture environments, high-density cultured fish exhibit visual characteristics such as similar textural features and frequent occlusions, leading to high misidentification rates and frequent ID switching during the tracking process. This study proposes an underwater fish object tracking method based on the improved DeepSORT algorithm, utilizing ResNet as the backbone network, embedding Deformable Convolutional Networks v2 to enhance adaptive receptive field capabilities, introducing a Triplet Loss to improve discrimination ability among similar fish, and integrating a Convolutional Block Attention Module to enhance key feature learning. Finally, by combining the aforementioned improvement modules, the ReID feature extraction network was redesigned and optimized. Experimental results demonstrate that the improved algorithm significantly enhances tracking performance under frequent occlusion conditions, with the MOTA metric improving from 64.26% to 66.93% and the IDF1 metric improving from 53.73% to 63.70% compared to the baseline algorithm, providing more reliable technical support for underwater fish behavior analysis.
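The triplet loss mentioned above is standard in ReID training: it pulls embeddings of the same identity together and pushes different identities at least a margin apart. The margin value and the toy embeddings below are illustrative.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Triplet loss on ReID embeddings (margin value illustrative):
    max(0, d(anchor, positive) - d(anchor, negative) + margin).
    Same-identity pairs are pulled together; different identities are
    pushed at least `margin` apart in embedding space.
    """
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # same fish, slightly different crop
n = np.array([1.0, 0.0])   # a different but similar-looking fish
print(triplet_loss(a, p, n))  # 0.0: already separated by more than the margin
```

When the negative sits closer than the positive, the loss is positive and gradients push the embeddings apart, which is exactly what improves discrimination among visually similar fish.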

20 pages, 67212 KiB  
Article
KPV-UNet: KAN PP-VSSA UNet for Remote Image Segmentation
by Shuiping Zhang, Qiang Rao, Lei Wang, Tang Tang and Chen Chen
Electronics 2025, 14(13), 2534; https://doi.org/10.3390/electronics14132534 - 23 Jun 2025
Viewed by 418
Abstract
Semantic segmentation of remote sensing images is a key technology for land cover interpretation and target identification. Although convolutional neural networks (CNNs) have achieved remarkable success in this field, their inherent limitation of local receptive fields restricts their ability to model long-range dependencies and global contextual information. As a result, CNN-based methods often struggle to capture the comprehensive spatial context necessary for accurate segmentation in complex remote sensing scenes, leading to issues such as the misclassification of small objects and blurred or imprecise object boundaries. To address these problems, this paper proposes a new hybrid architecture called KPV-UNet, which integrates the Kolmogorov–Arnold Network (KAN) and the Pyramid Pooling Visual State Space Attention (PP-VSSA) block. KPV-UNet introduces a deep feature refinement module based on KAN and incorporates PP-VSSA to enable scalable long-range modeling. This design effectively captures global dependencies and abundant localized semantic content extracted from complex feature spaces, overcoming CNNs’ limitations in modeling long-range dependencies and global context in large-scale complex scenes. In addition, we designed an Auxiliary Local Monitoring (ALM) block that significantly enhances KPV-UNet’s perception of local content. Experimental results demonstrate that KPV-UNet outperforms state-of-the-art methods on the Vaihingen, LoveDA Urban, and WHDLD datasets, achieving mIoU scores of 84.03%, 51.27%, and 62.87%, respectively. The proposed method not only improves segmentation accuracy but also produces clearer and more connected object boundaries in visual results.
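The "pyramid pooling" in PP-VSSA refers to summarizing a feature map at several grid resolutions so that both coarse scene context and finer regional context are available. A minimal sketch (bin sizes illustrative, not the paper's configuration):

```python
import numpy as np

def pyramid_pool(feat, bins=(1, 2, 4)):
    """Pyramid pooling: average-pool a 2D feature map into several grid
    sizes and concatenate the cell means into one flat context vector.
    bins=(1, 2, 4) is illustrative; the 1x1 bin is global context, the
    finer grids keep coarse spatial layout.
    """
    h, w = feat.shape
    out = []
    for b in bins:
        for i in range(b):
            for j in range(b):
                cell = feat[i * h // b:(i + 1) * h // b,
                            j * w // b:(j + 1) * w // b]
                out.append(cell.mean())
    return np.array(out)

v = pyramid_pool(np.arange(16.0).reshape(4, 4))
print(v.shape)  # (21,) = 1 + 4 + 16 cells
```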

24 pages, 6594 KiB  
Article
GAT-Enhanced YOLOv8_L with Dilated Encoder for Multi-Scale Space Object Detection
by Haifeng Zhang, Han Ai, Donglin Xue, Zeyu He, Haoran Zhu, Delian Liu, Jianzhong Cao and Chao Mei
Remote Sens. 2025, 17(13), 2119; https://doi.org/10.3390/rs17132119 - 20 Jun 2025
Viewed by 445
Abstract
The problem of inadequate object detection accuracy in complex remote sensing scenarios has been identified as a primary concern. Traditional YOLO-series algorithms encounter challenges such as poor robustness in small object detection and significant interference from complex backgrounds. In this paper, a multi-scale feature fusion framework based on an improved version of YOLOv8_L is proposed. The combination of a graph attention network (GAT) and Dilated Encoder network significantly improves the algorithm detection and recognition performance for space remote sensing objects. It mainly includes abandoning the original Feature Pyramid Network (FPN) structure, proposing an adaptive fusion strategy based on multi-level features of backbone network, enhancing the expression ability of multi-scale objects through upsampling and feature stacking, and reconstructing the FPN. The local features extracted by convolutional neural networks are mapped to graph-structured data, and the nodal attention mechanism of GAT is used to capture the global topological association of space objects, which makes up for the deficiency of the convolutional operation in weight allocation and realizes GAT integration. The Dilated Encoder network is introduced to cover different-scale targets by differentiating receptive fields, and the feature weight allocation is optimized by combining it with a Convolutional Block Attention Module (CBAM). According to the characteristics of space missions, an annotated dataset containing 8000 satellite and space station images is constructed, covering a variety of lighting, attitude and scale scenes, and providing benchmark support for model training and verification. Experimental results on the space object dataset reveal that the enhanced algorithm achieves a mean average precision (mAP) of 97.2%, representing a 2.1% improvement over the original YOLOv8_L. Comparative experiments with six other models demonstrate that the proposed algorithm outperforms its counterparts. Ablation studies further validate the synergistic effect between the graph attention network (GAT) and the Dilated Encoder. The results indicate that the model maintains a high detection accuracy under challenging conditions, including strong light interference, multi-scale variations, and low-light environments.
(This article belongs to the Special Issue Remote Sensing Image Thorough Analysis by Advanced Machine Learning)
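The nodal attention the abstract describes follows the original GAT formulation: each node's output is a softmax-weighted sum over its neighbors' projected features. The single-head sketch below assumes that standard form (LeakyReLU slope 0.2, self-loops in the adjacency); the paper's exact configuration is not given.

```python
import numpy as np

def gat_attention(h, adj, W, a):
    """Single-head GAT layer (original formulation, output nonlinearity
    omitted). h: (N, F) node features, adj: (N, N) 0/1 adjacency with
    self-loops, W: (F, F2) projection, a: (2*F2,) attention vector.
    """
    z = h @ W                                        # project node features
    f2, n = z.shape[1], len(h)
    # Raw attention logits e_ij = a . [z_i || z_j] for every node pair.
    e = np.array([[a[:f2] @ z[i] + a[f2:] @ z[j]
                   for j in range(n)] for i in range(n)])
    e = np.where(e > 0, e, 0.2 * e)                  # LeakyReLU, slope 0.2
    e = np.where(adj > 0, e, -np.inf)                # mask non-neighbors
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha = alpha / alpha.sum(axis=1, keepdims=True) # softmax per node
    return alpha @ z                                 # attention-weighted mix

rng = np.random.default_rng(1)
h = rng.random((3, 4)); W = rng.random((4, 4)); a = rng.random(8)
adj = np.ones((3, 3))                                # fully connected toy graph
out = gat_attention(h, adj, W, a)
```

The learned per-edge weights are what the abstract contrasts with convolution's fixed spatial weight allocation.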

19 pages, 6772 KiB  
Article
A Cross-Mamba Interaction Network for UAV-to-Satellite Geolocalization
by Lingyun Tian, Qiang Shen, Yang Gao, Simiao Wang, Yunan Liu and Zilong Deng
Drones 2025, 9(6), 427; https://doi.org/10.3390/drones9060427 - 12 Jun 2025
Viewed by 946
Abstract
The geolocalization of unmanned aerial vehicles (UAVs) in satellite-denied environments has emerged as a key research focus. Recent advancements in this area have been largely driven by learning-based frameworks that utilize convolutional neural networks (CNNs) and Transformers. However, both CNNs and Transformers face challenges in capturing global feature dependencies due to their restricted receptive fields. Inspired by state-space models (SSMs), which have demonstrated efficacy in modeling long sequences, we propose a pure Mamba-based method called the Cross-Mamba Interaction Network (CMIN) for UAV geolocalization. CMIN consists of three key components: feature extraction, information interaction, and feature fusion. It leverages Mamba’s strengths in global information modeling to effectively capture feature correlations between UAV and satellite images over a larger receptive field. For feature extraction, we design a Siamese Feature Extraction Module (SFEM) based on two basic vision Mamba blocks, enabling the model to capture the correlation between UAV and satellite image features. In terms of information interaction, we introduce a Local Cross-Attention Module (LCAM) to fuse cross-Mamba features, providing a solution for feature matching via deep learning. By aggregating features from various layers of SFEMs, we generate heatmaps for the satellite image that help determine the UAV’s geographical coordinates. Additionally, we propose a Center Masking strategy for data augmentation, which promotes the model’s ability to learn richer contextual information from UAV images. Experimental results on benchmark datasets show that our method achieves state-of-the-art performance. Ablation studies further validate the effectiveness of each component of CMIN.
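The "larger receptive field" claim rests on the linear state-space recurrence underlying Mamba-style blocks: a decaying hidden state lets every output summarize the entire past. The sketch uses a scalar state for clarity; real Mamba blocks use vector states and make A, B, C input-dependent.

```python
def ssm_scan(x, A=0.9, B=1.0, C=0.5):
    """Linear state-space recurrence (scalar state for clarity):
        h_t = A * h_{t-1} + B * x_t,    y_t = C * h_t
    Each y_t depends, with geometric decay, on every earlier input,
    giving an effectively unbounded receptive field at O(1) state cost.
    Mamba additionally makes A, B, C functions of the input.
    """
    h, ys = 0.0, []
    for xt in x:
        h = A * h + B * xt
        ys.append(C * h)
    return ys

print(ssm_scan([1.0, 0.0, 0.0]))  # impulse response decays geometrically by A
```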

17 pages, 9400 KiB  
Article
MRCA-UNet: A Multiscale Recombined Channel Attention U-Net Model for Medical Image Segmentation
by Lei Liu, Xiang Li, Shuai Wang, Jun Wang and Silas N. Melo
Symmetry 2025, 17(6), 892; https://doi.org/10.3390/sym17060892 - 6 Jun 2025
Viewed by 495
Abstract
Deep learning techniques play a crucial role in medical image segmentation for diagnostic purposes, with traditional convolutional neural networks (CNNs) and emerging transformers having achieved satisfactory results. CNN-based methods focus on extracting the local features of an image, which are beneficial for handling image details and textural features. However, the receptive fields of CNNs are relatively small, resulting in poor performance when processing images with long-range dependencies. Conversely, transformer-based methods are effective in handling global information; however, they suffer from significant computational complexity arising from the building of long-range dependencies. Additionally, they lack the ability to perceive image details and adopt channel features. These problems can result in unclear image segmentation and blurred boundaries. Accordingly, in this study, a multiscale recombined channel attention (MRCA) module is proposed, which can simultaneously extract both global and local features and has the capability of exploring channel features during feature fusion. Specifically, the proposed MRCA first employs multibranch extraction of image features and performs operations such as blocking, shifting, and aggregating the image at different scales. This step enables the model to recognize multiscale information locally and globally. Feature selection is then performed to enhance the predictive capability of the model. Finally, features from different branches are connected and recombined across channels to complete the feature fusion. Benefiting from fully exploring the channel features, an MRCA-based U-Net (MRCA-UNet) framework is proposed for medical image segmentation. Experiments conducted on the Synapse multi-organ segmentation (Synapse) dataset and the International Skin Imaging Collaboration (ISIC-2018) dataset demonstrate the competitive segmentation performance of the proposed MRCA-UNet, achieving an average Dice Similarity Coefficient (DSC) of 81.61% and a Hausdorff Distance (HD) of 23.36 on Synapse and an Accuracy of 95.94% on ISIC-2018.
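The Dice Similarity Coefficient reported above is the standard overlap metric for segmentation masks, 2|A ∩ B| / (|A| + |B|); a minimal implementation:

```python
import numpy as np

def dice(pred, target, eps=1e-6):
    """Dice Similarity Coefficient between two binary masks:
    2 * |pred AND target| / (|pred| + |target|), with a small eps to
    keep the empty-mask case well defined.
    """
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

p = np.array([[1, 1], [0, 0]])
t = np.array([[1, 0], [0, 0]])
print(dice(p, t))  # ~0.6667: 2 * 1 overlap / (2 + 1) foreground pixels
```

Unlike plain pixel accuracy, Dice ignores the (usually dominant) background, which is why it is the headline metric for multi-organ benchmarks such as Synapse.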

14 pages, 3919 KiB  
Article
PCB Electronic Component Soldering Defect Detection Using YOLO11 Improved by Retention Block and Neck Structure
by Youzhi Xu, Hao Wu, Yulong Liu and Xing Zhang
Sensors 2025, 25(11), 3550; https://doi.org/10.3390/s25113550 - 4 Jun 2025
Viewed by 649
Abstract
Printed circuit board (PCB) assembly, based on surface-mount electronic component soldering, is one of the most important electronic assembly processes, and its defect detection is an important part of industrial production. Traditional two-stage detection models have large parameter counts and long runtimes, while single-stage detectors run faster but need higher detection accuracy. To address this, we modified the YOLO11n model. First, we used the Retention Block (RetBlock) to improve the C3K2 module in the backbone, creating the RetC3K2 module, which overcomes the original module’s purely convolutional, local receptive field. Second, the neck of the original network is fused with a Multi-Branch Auxiliary Feature Pyramid Network (MAFPN) structure into a multi-branch auxiliary neck, which enhances the model’s ability to fuse multi-scale features and conveys diverse gradient information to the output layer. The improved YOLO11n model raises mAP50 by 0.023 (2.5%) and mAP75 by 0.026 (2.8%) compared with the original network, significantly improving detection precision and demonstrating the superiority of our proposed approach.
(This article belongs to the Section Electronic Sensors)
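The abstract does not spell out RetBlock's formulation; retention as introduced in RetNet replaces softmax attention with a causal, exponentially decayed score mask, and the sketch below assumes that form (the decay gamma is a per-head constant in RetNet; 0.9 here is illustrative).

```python
import numpy as np

def retention(Q, K, V, gamma=0.9):
    """Parallel form of retention: scores Q K^T are masked causally and
    decayed by gamma**(n - m), so position n attends to earlier position
    m with geometrically shrinking weight. No softmax is applied, which
    is what distinguishes retention from standard attention.
    """
    n = Q.shape[0]
    idx = np.arange(n)
    D = np.where(idx[:, None] >= idx[None, :],
                 gamma ** (idx[:, None] - idx[None, :]), 0.0)
    return (Q @ K.T * D) @ V

rng = np.random.default_rng(2)
Q = rng.random((4, 3)); K = rng.random((4, 3)); V = rng.random((4, 3))
out = retention(Q, K, V)
```

The decay mask gives each position a wide but graded receptive field over the sequence, which is the non-local behavior the RetC3K2 module adds to a purely convolutional block.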

16 pages, 6068 KiB  
Article
MD-GAN: Multi-Scale Diversity GAN for Large Masks Inpainting
by Shibin Wang, Xuening Guo and Wenjie Guo
Electronics 2025, 14(11), 2218; https://doi.org/10.3390/electronics14112218 - 29 May 2025
Viewed by 285
Abstract
Image inpainting approaches have made considerable progress with the assistance of generative adversarial networks (GANs) recently. However, current inpainting methods struggle with large masks and often produce structurally implausible results. We find that the main reason is the lack of an effective receptive field in the inpainting network. To alleviate this issue, we propose a new two-stage inpainting model called MD-GAN, which is a multi-scale diverse GAN. We inject dense combinations of dilated convolutions in multiple scales of the inpainting network to obtain more effective receptive fields. In fact, the result of inpainting large masks is generally not uniquely deterministic. To this end, we newly propose the multi-scale probabilistic diverse module, which achieves diverse content generation by spatial-adaptive normalization. Meanwhile, the convolutional block attention module is introduced to improve the ability to extract complex features. Perceptual diversity loss is added to enhance diversity. Extensive experiments on benchmark datasets including CelebA-HQ, Places2 and Paris Street View demonstrate that our approach is able to effectively inpaint diverse and structurally reasonable images.
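Why dilated convolutions grow the receptive field cheaply follows from simple arithmetic: each stride-1 layer adds (k - 1) * dilation to the field. The dilation schedule below is illustrative; the abstract does not state MD-GAN's exact rates.

```python
def stacked_rf(kernel_sizes, dilations):
    """Receptive field of a stack of stride-1 dilated convolutions:
    starting from a single pixel, each layer with kernel k and dilation d
    adds (k - 1) * d to the field.
    """
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf

# Three 3x3 layers: plain vs. dilation rates 1, 2, 4 (rates illustrative).
print(stacked_rf([3, 3, 3], [1, 1, 1]))  # 7
print(stacked_rf([3, 3, 3], [1, 2, 4]))  # 15
```

Doubling the dilation per layer roughly doubles the field with no extra parameters, which is what lets an inpainting network "see" far beyond a large masked region.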
23 pages, 6510 KiB  
Article
MAMNet: Lightweight Multi-Attention Collaborative Network for Fine-Grained Cropland Extraction from Gaofen-2 Remote Sensing Imagery
by Jiayong Wu, Xue Ding, Jinliang Wang and Jiya Pan
Agriculture 2025, 15(11), 1152; https://doi.org/10.3390/agriculture15111152 - 27 May 2025
Viewed by 355
Abstract
To address the high computational complexity and boundary feature loss encountered when extracting farmland information from high-resolution remote sensing images, this study proposes an innovative CNN–Transformer hybrid network, MAMNet. This framework integrates a lightweight encoder, a global–local Transformer decoder, and a bidirectional attention architecture to achieve efficient and accurate farmland extraction. First, we reconstruct the ResNet-18 backbone using depthwise separable convolutions, reducing computational complexity while preserving feature representation capability. Second, the global–local Transformer block (GLTB) decoder uses multi-head self-attention to dynamically fuse multi-scale features across layers, effectively restoring the topological structure of fragmented farmland boundaries. Third, we propose a novel bidirectional attention architecture: the Detail Improvement Module (DIM) uses channel attention to transfer semantic features to geometric features, while the Context Enhancement Module (CEM) uses spatial attention to achieve dynamic geometric–semantic fusion, quantitatively distinguishing farmland textures from mixed ground cover. A positional attention mechanism (PAM) enhances the continuity of linear features by strengthening spatial correlations in skip connections. By cascading a front-end feature module (FEM) to expand the receptive field and combining an adaptive feature reconstruction head (FRH), the method improves information integrity in fragmented areas. Evaluation on a 2022 high-resolution Gaofen-2 image dataset from Chenggong District, Kunming City, shows that MAMNet achieves an mIoU of 86.68% (improvements of 1.66% and 2.44% over UNetFormer and BANet, respectively) and an F1-score of 92.86% with only 12 million parameters. This method provides new technical insights for plot-level farmland monitoring in precision agriculture. Full article
(This article belongs to the Section Digital Agriculture)
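The lightweight-encoder claim rests on a standard parameter count: a regular k×k convolution costs k²·C_in·C_out weights, while the depthwise + pointwise factorization costs k²·C_in + C_in·C_out. A sketch of that comparison (channel counts are illustrative, biases ignored):

```python
# Weight counts (biases ignored) for one convolutional layer.
def standard_conv_params(c_in, c_out, k):
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    # k x k depthwise conv (one filter per input channel)
    # followed by a 1x1 pointwise conv.
    return k * k * c_in + c_in * c_out

std = standard_conv_params(256, 256, 3)        # 589,824 weights
dws = depthwise_separable_params(256, 256, 3)  # 67,840 weights
ratio = std / dws                              # ~8.7x fewer weights
```

For a typical 3×3 layer with 256 channels, the factorization cuts the weight count by roughly a factor of nine, which is how the reconstructed ResNet-18 stays near 12 million parameters.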
23 pages, 11186 KiB  
Article
MixRformer: Dual-Branch Network for Underwater Image Enhancement in Wavelet Domain
by Jie Li, Lei Zhao, Heng Li, Xiaojun Xue and Hui Liu
Sensors 2025, 25(11), 3302; https://doi.org/10.3390/s25113302 - 24 May 2025
Viewed by 376
Abstract
This paper proposes MixRformer, an underwater image enhancement model that combines the wavelet transform with a hybrid architecture. To address the insufficient global modeling of existing CNN models, the weak local feature extraction of Transformers, and high computational complexity, multi-resolution feature decomposition is performed via the discrete wavelet transform and its inverse (DWT/IWT), in which low-frequency components retain structure and texture while high-frequency components capture detail features. An innovative dual-branch feature capture module (DFCB) is designed as follows: (1) the surface information extraction block combines convolution and positional encoding to enhance local modeling; (2) the rectangular-window gated Transformer expands the receptive field through a convolutional gating mechanism to achieve efficient global relationship modeling. Experiments show that the model outperforms mainstream methods in color restoration and detail enhancement while improving computational efficiency. Full article
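The sub-band split the abstract describes can be sketched with a single-level 2D Haar transform: each 2×2 block yields one low-frequency (LL) coefficient plus three high-frequency detail coefficients (LH, HL, HH), halving resolution in every sub-band. A minimal pure-Python sketch, assuming an orthonormal Haar kernel with a 1/2 scaling (the abstract does not specify MixRformer's exact wavelet):

```python
# Single-level 2D Haar DWT on a 2D list with even dimensions.
# Returns four half-resolution sub-bands: (LL, LH, HL, HH).
def haar_dwt2(x):
    h, w = len(x), len(x[0])
    LL, LH, HL, HH = ([[0.0] * (w // 2) for _ in range(h // 2)]
                      for _ in range(4))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            LL[i // 2][j // 2] = (a + b + c + d) / 2  # low-frequency average
            LH[i // 2][j // 2] = (a - b + c - d) / 2  # horizontal detail
            HL[i // 2][j // 2] = (a + b - c - d) / 2  # vertical detail
            HH[i // 2][j // 2] = (a - b - c + d) / 2  # diagonal detail
    return LL, LH, HL, HH
```

The LL band feeds the structure/texture branch while the three detail bands carry edges, matching the low-/high-frequency division described above; the transform is invertible, so the IWT can reassemble the enhanced sub-bands losslessly.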