Search Results (51)

Search Parameters: Keywords = fine-grained feature perception

22 pages, 6487 KiB  
Article
An RGB-D Vision-Guided Robotic Depalletizing System for Irregular Camshafts with Transformer-Based Instance Segmentation and Flexible Magnetic Gripper
by Runxi Wu and Ping Yang
Actuators 2025, 14(8), 370; https://doi.org/10.3390/act14080370 - 24 Jul 2025
Viewed by 269
Abstract
Accurate segmentation of densely stacked and weakly textured objects remains a core challenge in robotic depalletizing for industrial applications. To address this, we propose MaskNet, an instance segmentation network tailored for RGB-D input, designed to enhance recognition performance under occlusion and low-texture conditions. Built upon a Vision Transformer backbone, MaskNet adopts a dual-branch architecture for RGB and depth modalities and integrates multi-modal features using an attention-based fusion module. Further, spatial and channel attention mechanisms are employed to refine feature representation and improve instance-level discrimination. The segmentation outputs are used in conjunction with regional depth to optimize the grasping sequence. Experimental evaluations on camshaft depalletizing tasks demonstrate that MaskNet achieves a precision of 0.980, a recall of 0.971, and an F1-score of 0.975, outperforming a YOLO11-based baseline. In an actual scenario, with a self-designed flexible magnetic gripper, the system maintains a maximum grasping error of 9.85 mm and a 98% task success rate across multiple camshaft types. These results validate the effectiveness of MaskNet in enabling fine-grained perception for robotic manipulation in cluttered, real-world scenarios. Full article
(This article belongs to the Section Actuators for Robotics)
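The abstract does not spell out how the attention-based fusion of the RGB and depth branches is implemented. Below is a minimal PyTorch sketch of one common way such a module can be realized; the class name, gating design, and channel sizes are illustrative assumptions, not the paper's actual MaskNet code.

```python
import torch
import torch.nn as nn

class RGBDAttentionFusion(nn.Module):
    """Hypothetical attention-based fusion of RGB and depth feature maps.

    Both inputs are assumed to be (B, C, H, W) tensors from the two
    backbone branches; the module predicts per-channel weights for each
    modality and blends them. Illustrative sketch only.
    """
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        stacked = torch.cat([rgb_feat, depth_feat], dim=1)   # (B, 2C, H, W)
        weights = self.gate(stacked)                         # (B, 2C, 1, 1)
        w_rgb, w_depth = weights.chunk(2, dim=1)             # (B, C, 1, 1) each
        return w_rgb * rgb_feat + w_depth * depth_feat       # (B, C, H, W)

# Example usage with dummy feature maps
fusion = RGBDAttentionFusion(channels=256)
fused = fusion(torch.randn(2, 256, 32, 32), torch.randn(2, 256, 32, 32))
print(fused.shape)  # torch.Size([2, 256, 32, 32])
```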

22 pages, 5154 KiB  
Article
BCS_YOLO: Research on Corn Leaf Disease and Pest Detection Based on YOLOv11n
by Shengnan Hao, Erjian Gao, Zhanlin Ji and Ivan Ganchev
Appl. Sci. 2025, 15(15), 8231; https://doi.org/10.3390/app15158231 - 24 Jul 2025
Viewed by 226
Abstract
Frequent corn leaf diseases and pests pose serious threats to agricultural production. Traditional manual detection methods suffer from significant limitations in both performance and efficiency. To address this, the present paper proposes a novel biotic condition screening (BCS) model for the detection of corn leaf diseases and pests, called BCS_YOLO, based on the You Only Look Once version 11n (YOLOv11n). The proposed model enables accurate detection and classification of various corn leaf pathologies and pest infestations under challenging agricultural field conditions. It achieves this through three newly designed modules: a Self-Perception Coordinated Global Attention (SPCGA) module, a High/Low-Frequency Feature Enhancement (HLFFE) module, and a Local Attention Enhancement (LAE) module. The SPCGA module improves the model’s ability to perceive fine-grained targets by fusing multiple attention mechanisms. The HLFFE module adopts a frequency domain separation strategy to strengthen edge delineation and structural detail representation in affected areas. The LAE module effectively improves the model’s ability to discriminate between targets and backgrounds through local importance calculation and intensity adjustment mechanisms. Experiments show that BCS_YOLO achieves 78.4%, 73.7%, 76.0%, and 82.0% in precision, recall, F1 score, and mAP@50, respectively, representing corresponding improvements of 3.0%, 3.3%, 3.2%, and 4.6% compared to the baseline model (YOLOv11n), while also outperforming mainstream object detection models. In summary, the proposed BCS_YOLO model provides a practical and scalable solution for efficient detection of corn leaf diseases and pests in complex smart-agriculture scenarios, demonstrating significant theoretical and application value. Full article
(This article belongs to the Special Issue Innovations in Artificial Neural Network Applications)
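The HLFFE module's frequency-domain separation is only summarized above. The sketch below shows one simple way such a split is often approximated, treating average pooling as a low-pass filter and the residual as the high-frequency band; the module name, projections, and pooling size are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrequencySplitEnhance(nn.Module):
    """Illustrative high/low-frequency feature split.

    Average pooling acts as a low-pass filter; the residual carries
    high-frequency (edge/detail) content. Each band is re-weighted by a
    learnable 1x1 convolution. Assumed structure only.
    """
    def __init__(self, channels: int, pool_size: int = 3):
        super().__init__()
        self.pool_size = pool_size
        self.low_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.high_proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        low = F.avg_pool2d(x, self.pool_size, stride=1,
                           padding=self.pool_size // 2)   # low-pass component
        high = x - low                                     # high-pass residual
        return self.low_proj(low) + self.high_proj(high) + x

feats = torch.randn(1, 64, 80, 80)
print(FrequencySplitEnhance(64)(feats).shape)  # torch.Size([1, 64, 80, 80])
```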

22 pages, 4611 KiB  
Article
MMC-YOLO: A Lightweight Model for Real-Time Detection of Geometric Symmetry-Breaking Defects in Wind Turbine Blades
by Caiye Liu, Chao Zhang, Xinyu Ge, Xunmeng An and Nan Xue
Symmetry 2025, 17(8), 1183; https://doi.org/10.3390/sym17081183 - 24 Jul 2025
Viewed by 296
Abstract
Performance degradation of wind turbine blades often stems from geometric asymmetry induced by damage. Existing methods for assessing damage face challenges in balancing accuracy and efficiency due to their limited ability to capture fine-grained geometric asymmetries associated with multi-scale damage under complex background interference. To address this, based on the high-speed detection model YOLOv10-N, this paper proposes a novel detection model named MMC-YOLO. First, the Multi-Scale Perception Gated Convolution (MSGConv) Module was designed, which constructs a full-scale receptive field through multi-branch fusion and channel rearrangement to enhance the extraction of geometric asymmetry features. Second, the Multi-Scale Enhanced Feature Pyramid Network (MSEFPN) was developed, integrating dynamic path aggregation and an SENetv2 attention mechanism to suppress background interference and amplify damage response. Finally, the Channel-Compensated Filtering (CCF) module was constructed to preserve critical channel information using a dynamic buffering mechanism. Evaluated on a dataset of 4818 wind turbine blade damage images, MMC-YOLO achieves an 82.4% mAP [0.5:0.95], representing a 4.4% improvement over the baseline YOLOv10-N model, and a 91.1% recall rate, an 8.7% increase, while maintaining a lightweight parameter count of 4.2 million. This framework significantly enhances geometric asymmetry defect detection accuracy while ensuring real-time performance, meeting engineering requirements for high efficiency and precision. Full article
(This article belongs to the Special Issue Symmetry and Its Applications in Image Processing)
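The MSEFPN described above relies on SENetv2-style channel attention. For background, the sketch below shows the classic squeeze-and-excitation block on which it builds; SENetv2's multi-branch squeeze stage and the paper's exact hyperparameters are omitted, so treat this only as an illustration of the channel re-weighting idea.

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Classic squeeze-and-excitation channel attention (illustration only)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Squeeze: global average pool; excite: per-channel gate in [0, 1]
        scale = self.fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)
        return x * scale

x = torch.randn(2, 128, 40, 40)
print(SqueezeExcite(128)(x).shape)  # torch.Size([2, 128, 40, 40])
```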

17 pages, 1927 KiB  
Article
ConvTransNet-S: A CNN-Transformer Hybrid Disease Recognition Model for Complex Field Environments
by Shangyun Jia, Guanping Wang, Hongling Li, Yan Liu, Linrong Shi and Sen Yang
Plants 2025, 14(15), 2252; https://doi.org/10.3390/plants14152252 - 22 Jul 2025
Viewed by 341
Abstract
To address the challenges of low recognition accuracy and substantial model complexity in crop disease identification models operating in complex field environments, this study proposes a novel hybrid model named ConvTransNet-S, which integrates Convolutional Neural Networks (CNNs) and transformers for crop disease identification tasks. Unlike existing hybrid approaches, ConvTransNet-S uniquely introduces three key innovations: First, a Local Perception Unit (LPU) and Lightweight Multi-Head Self-Attention (LMHSA) modules were introduced to synergistically enhance the extraction of fine-grained plant disease details and model global dependency relationships, respectively. Second, an Inverted Residual Feed-Forward Network (IRFFN) was employed to optimize the feature propagation path, thereby enhancing the model’s robustness against interference such as lighting variations and leaf occlusions. This novel combination of an LPU, LMHSA, and an IRFFN achieves a dynamic equilibrium between local texture perception and global context modeling, effectively resolving the trade-offs inherent in standalone CNNs or transformers. Finally, through a phased architecture design, efficient fusion of multi-scale disease features is achieved, which enhances feature discriminability while reducing model complexity. The experimental results indicated that ConvTransNet-S achieved a recognition accuracy of 98.85% on the PlantVillage public dataset. This model operates with only 25.14 million parameters, a computational load of 3.762 GFLOPs, and an inference time of 7.56 ms. Testing on a self-built in-field complex scene dataset comprising 10,441 images revealed that ConvTransNet-S achieved an accuracy of 88.53%, which represents improvements of 14.22%, 2.75%, and 0.34% over EfficientNetV2, Vision Transformer, and Swin Transformer, respectively. Furthermore, the ConvTransNet-S model achieved up to 14.22% higher disease recognition accuracy under complex background conditions while reducing the parameter count by 46.8%. This confirms that its unique multi-scale feature mechanism can effectively distinguish disease from background features, providing a novel technical approach for disease diagnosis in complex agricultural scenarios and demonstrating significant application value for intelligent agricultural management. Full article
(This article belongs to the Section Plant Modeling)
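Inverted residual feed-forward networks of the kind the IRFFN refers to typically follow an expand / depthwise / project pattern with a skip connection. The PyTorch sketch below illustrates that pattern; the expansion ratio, activation, and normalization are assumed and may differ from the paper's configuration.

```python
import torch
import torch.nn as nn

class InvertedResidualFFN(nn.Module):
    """Sketch of an inverted-residual feed-forward block (IRFFN-style)."""
    def __init__(self, channels: int, expansion: int = 4):
        super().__init__()
        hidden = channels * expansion
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),                            # expand
            nn.GELU(),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1, groups=hidden),    # depthwise
            nn.GELU(),
            nn.Conv2d(hidden, channels, kernel_size=1),                            # project
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.block(x)  # residual connection eases gradient flow

print(InvertedResidualFFN(96)(torch.randn(1, 96, 56, 56)).shape)  # torch.Size([1, 96, 56, 56])
```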

19 pages, 4037 KiB  
Article
YOLO-MFD: Object Detection for Multi-Scenario Fires
by Fuchuan Mo, Shen Liu, Sitong Wu, Ruiyuan Chen and Tiecheng Song
Information 2025, 16(7), 620; https://doi.org/10.3390/info16070620 - 21 Jul 2025
Viewed by 245
Abstract
Fire is a disaster caused by combustion that is uncontrolled in both time and space, and it occurs in diverse, complex scenarios where timely and effective detection is crucial. However, existing fire detection methods are often challenged by the deformation of smoke and flames, resulting in missed detections. It is difficult to accurately extract fire features in complex backgrounds, and there are also significant difficulties in detecting small targets, such as small flames. To address these issues, this paper proposes a YOLO-Multi-scenario Fire Detector (YOLO-MFD) for multi-scenario fire detection. Firstly, to resolve missed detection caused by deformation of smoke and flames, a Scale Adaptive Perception Module (SAPM) is proposed. Secondly, aiming at the suppression of significant fire features by complex backgrounds, a Feature Adaptive Weighting Module (FAWM) is introduced to enhance the feature representation of fire. Finally, considering the difficulty in detecting small flames, a fine-grained Small Object Feature Extraction Module (SOFEM) is developed. Additionally, given the scarcity of multi-scenario fire datasets, this paper constructs a Multi-scenario Fire Dataset (MFDB). Experimental results on MFDB demonstrate that the proposed YOLO-MFD achieves a good balance between effectiveness and efficiency, delivering effective fire detection performance across various scenarios. Full article

21 pages, 12122 KiB  
Article
RA3T: An Innovative Region-Aligned 3D Transformer for Self-Supervised Sim-to-Real Adaptation in Low-Altitude UAV Vision
by Xingrao Ma, Jie Xie, Di Shao, Aiting Yao and Chengzu Dong
Electronics 2025, 14(14), 2797; https://doi.org/10.3390/electronics14142797 - 11 Jul 2025
Viewed by 285
Abstract
Low-altitude unmanned aerial vehicle (UAV) vision is critically hindered by the Sim-to-Real Gap, where models trained exclusively on simulation data degrade under real-world variations in lighting, texture, and weather. To address this problem, we propose RA3T (Region-Aligned 3D Transformer), a novel self-supervised framework that enables robust Sim-to-Real adaptation. Specifically, we first develop a dual-branch strategy for self-supervised feature learning, integrating Masked Autoencoders and contrastive learning. This approach extracts domain-invariant representations from unlabeled simulated imagery to enhance robustness against occlusion while reducing annotation dependency. Leveraging these learned features, we then introduce a 3D Transformer fusion module that unifies multi-view RGB and LiDAR point clouds through cross-modal attention. By explicitly modeling spatial layouts and height differentials, this component significantly improves recognition of small and occluded targets in complex low-altitude environments. To address persistent fine-grained domain shifts, we finally design region-level adversarial calibration that deploys local discriminators on partitioned feature maps. This mechanism directly aligns texture, shadow, and illumination discrepancies which challenge conventional global alignment methods. Extensive experiments on UAV benchmarks VisDrone and DOTA demonstrate the effectiveness of RA3T. The framework achieves +5.1% mAP on VisDrone and +7.4% mAP on DOTA over the 2D adversarial baseline, particularly on small objects and sparse occlusions, while maintaining real-time performance of 17 FPS at 1024 × 1024 resolution on an RTX 4080 GPU. Visual analysis confirms that the synergistic integration of 3D geometric encoding and local adversarial alignment effectively mitigates domain gaps caused by uneven illumination and perspective variations, establishing an efficient pathway for simulation-to-reality UAV perception. Full article
(This article belongs to the Special Issue Innovative Technologies and Services for Unmanned Aerial Vehicles)
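One straightforward reading of the 3D Transformer fusion module is cross-modal attention in which RGB tokens query LiDAR point-feature tokens. The sketch below illustrates that mechanism with standard PyTorch attention; token counts, embedding size, and the residual/normalization layout are assumptions rather than the authors' design.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Illustrative cross-modal attention between RGB and LiDAR tokens."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rgb_tokens: torch.Tensor, lidar_tokens: torch.Tensor) -> torch.Tensor:
        # rgb_tokens: (B, N_rgb, D), lidar_tokens: (B, N_pts, D)
        fused, _ = self.attn(query=rgb_tokens, key=lidar_tokens, value=lidar_tokens)
        return self.norm(rgb_tokens + fused)  # residual + norm

rgb = torch.randn(2, 196, 256)
lidar = torch.randn(2, 1024, 256)
print(CrossModalAttention()(rgb, lidar).shape)  # torch.Size([2, 196, 256])
```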

23 pages, 10392 KiB  
Article
Dual-Branch Luminance–Chrominance Attention Network for Hydraulic Concrete Image Enhancement
by Zhangjun Peng, Li Li, Chuanhao Chang, Rong Tang, Guoqiang Zheng, Mingfei Wan, Juanping Jiang, Shuai Zhou, Zhenggang Tian and Zhigui Liu
Appl. Sci. 2025, 15(14), 7762; https://doi.org/10.3390/app15147762 - 10 Jul 2025
Viewed by 253
Abstract
Hydraulic concrete is a critical infrastructure material, with its surface condition playing a vital role in quality assessments for water conservancy and hydropower projects. However, images taken in complex hydraulic environments often suffer from degraded quality due to low lighting, shadows, and noise, making it difficult to distinguish defects from the background and thereby hindering accurate defect detection and damage evaluation. In this study, following systematic analyses of hydraulic concrete color space characteristics, we propose a Dual-Branch Luminance–Chrominance Attention Network (DBLCANet-HCIE) specifically designed for low-light hydraulic concrete image enhancement. Inspired by human visual perception, the network simultaneously improves global contrast and preserves fine-grained defect textures, which are essential for structural analysis. The proposed architecture consists of a Luminance Adjustment Branch (LAB) and a Chroma Restoration Branch (CRB). The LAB incorporates a Luminance-Aware Hybrid Attention Block (LAHAB) to capture both the global luminance distribution and local texture details, enabling adaptive illumination correction through comprehensive scene understanding. The CRB integrates a Channel Denoiser Block (CDB) for channel-specific noise suppression and a Frequency-Domain Detail Enhancement Block (FDDEB) to refine chrominance information and enhance subtle defect textures. A feature fusion block is designed to fuse and learn the features of the outputs from the two branches, resulting in images with enhanced luminance, reduced noise, and preserved surface anomalies. To validate the proposed approach, we construct a dedicated low-light hydraulic concrete image dataset (LLHCID). Extensive experiments conducted on both LOLv1 and LLHCID benchmarks demonstrate that the proposed method significantly enhances the visual interpretability of hydraulic concrete surfaces while effectively addressing low-light degradation challenges. Full article
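A luminance/chrominance dual-branch design presupposes a color-space split of the input. The helper below shows a standard BT.601-based separation of an RGB batch into a luminance plane and two chroma planes, which is one plausible front end for such a network; the exact transform and normalization used in the paper are not stated in the abstract.

```python
import torch

def split_luma_chroma(rgb: torch.Tensor):
    """Split an RGB batch (B, 3, H, W), values in [0, 1], into luminance
    and chrominance planes using BT.601 coefficients (illustration only)."""
    r, g, b = rgb[:, 0:1], rgb[:, 1:2], rgb[:, 2:3]
    y = 0.299 * r + 0.587 * g + 0.114 * b     # luminance branch input
    cb = 0.564 * (b - y)                      # blue-difference chroma
    cr = 0.713 * (r - y)                      # red-difference chroma
    return y, torch.cat([cb, cr], dim=1)      # (B, 1, H, W), (B, 2, H, W)

luma, chroma = split_luma_chroma(torch.rand(4, 3, 256, 256))
print(luma.shape, chroma.shape)
```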

28 pages, 8102 KiB  
Article
Multi-Neighborhood Sparse Feature Selection for Semantic Segmentation of LiDAR Point Clouds
by Rui Zhang, Guanlong Huang, Fengpu Bao and Xin Guo
Remote Sens. 2025, 17(13), 2288; https://doi.org/10.3390/rs17132288 - 3 Jul 2025
Viewed by 341
Abstract
LiDAR point clouds, as direct carriers of 3D spatial information, comprehensively record the geometric features and spatial topological relationships of object surfaces, providing intelligent systems with rich 3D scene representation capability. However, current point cloud semantic segmentation methods primarily extract features through operations such as convolution and pooling, yet fail to adequately consider sparse features that significantly influence the final results of point cloud-based scene perception, resulting in insufficient feature representation capability. To address these problems, a sparse feature dynamic graph convolutional neural network, abbreviated as SFDGNet, is constructed in this paper for LiDAR point clouds of complex scenes. In the context of this paper, sparse features refer to feature representations in which only a small number of activation units or channels exhibit significant responses during the forward pass of the model. First, a sparse feature regularization method was used to encourage the network to learn a sparsified feature weight matrix. Next, a split edge convolution module, abbreviated as SEConv, was designed to extract the local features of the point cloud from multiple neighborhoods by dividing the input feature channels, and to effectively learn sparse features to avoid feature redundancy. Finally, a multi-neighborhood feature fusion strategy was developed that uses an attention mechanism to fuse the local features of different neighborhoods and obtain global features with fine-grained information. Using the S3DIS and ScanNet v2 datasets, we evaluated the feasibility and effectiveness of SFDGNet by comparing it with six typical semantic segmentation models. Compared with the benchmark model DGCNN, SFDGNet improved overall accuracy (OA), mean accuracy (mAcc), mean intersection over union (mIoU), and sparsity by 1.8%, 3.7%, 3.5%, and 85.5% on the S3DIS dataset, respectively. The mIoU on the ScanNet v2 validation set, mIoU on the test set, and sparsity were improved by 3.2%, 7.0%, and 54.5%, respectively. Full article
(This article belongs to the Special Issue Remote Sensing for 2D/3D Mapping)
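The sparse feature regularization mentioned above can be illustrated with an L1 penalty on intermediate activations added to the task loss. The snippet below is a toy sketch of that idea; the penalty weight, the tensor being regularized, and the loss setup are all assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

def sparsity_penalty(features: torch.Tensor, weight: float = 1e-4) -> torch.Tensor:
    """L1 penalty on intermediate activations; encourages sparse responses."""
    return weight * features.abs().mean()

# Toy usage: per-point segmentation loss plus a sparsity term on a hidden feature map
logits = torch.randn(2, 13, 4096, requires_grad=True)        # (B, classes, points)
labels = torch.randint(0, 13, (2, 4096))
intermediate = torch.randn(2, 64, 4096, requires_grad=True)  # some hidden feature map
loss = nn.CrossEntropyLoss()(logits, labels) + sparsity_penalty(intermediate)
loss.backward()
print(float(loss))
```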

15 pages, 4490 KiB  
Technical Note
A Category–Pose Jointly Guided ISAR Image Key Part Recognition Network for Space Targets
by Qi Yang, Hongqiang Wang, Lei Fan and Shuangxun Li
Remote Sens. 2025, 17(13), 2218; https://doi.org/10.3390/rs17132218 - 27 Jun 2025
Viewed by 252
Abstract
It is a crucial interpretation task in space target perception to identify key parts of space targets through the inverse synthetic aperture radar (ISAR) imaging. Due to the significant variations in the categories and poses of space targets, conventional methods that directly predict identification results exhibit limited accuracy. Hence, we make the first attempt to propose a key part recognition network based on ISAR images, which incorporates the knowledge of space target categories and poses. Specifically, we propose a fine-grained category training paradigm that defines the same functional parts of different space targets as distinct categories. Correspondingly, additional classification heads are employed to predict category and pose, and the predictions are then integrated with ISAR image semantic features through a designed category–pose guidance module to achieve high-precision recognition guided by category and pose knowledge. Qualitative and quantitative evaluations on two types of simulated targets and one type of measured target demonstrate that the proposed method reduces the complexity of the key part recognition task and significantly improves recognition accuracy compared to the existing mainstream methods. Full article

17 pages, 4478 KiB  
Article
A Study on Generating Maritime Image Captions Based on Transformer Dual Information Flow
by Zhenqiang Zhao, Helong Shen, Meng Wang and Yufei Wang
J. Mar. Sci. Eng. 2025, 13(7), 1204; https://doi.org/10.3390/jmse13071204 - 21 Jun 2025
Viewed by 244
Abstract
The environmental perception capability of intelligent ships is essential for enhancing maritime navigation safety and advancing shipping intelligence. Image caption generation technology plays a pivotal role in this context by converting visual information into structured semantic descriptions. However, existing general purpose models often struggle to perform effectively in complex maritime environments due to limitations in visual feature extraction and semantic modeling. To address these challenges, this study proposes a transformer dual-stream information (TDSI) model. The proposed model uses a Swin-transformer to extract grid features and combines them with fine-grained scene semantics obtained via SegFormer. A dual-encoder structure independently encodes the grid and segmentation features, which are subsequently fused through a feature fusion module for implicit integration. A decoder with a cross-attention mechanism is then employed to generate descriptive captions for maritime images. Extensive experiments were conducted using the constructed maritime semantic segmentation and maritime image captioning datasets. The results demonstrate that the proposed TDSI model outperforms existing mainstream methods in terms of several evaluation metrics, including BLEU, METEOR, ROUGE, and CIDEr. These findings confirm the effectiveness of the TDSI model in enhancing image captioning performance in maritime environments. Full article
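The dual-encoder fusion and cross-attention decoding are described only at a high level. The toy sketch below assumes the Swin grid features and SegFormer segmentation features arrive as pre-extracted token sequences, fuses them by simple concatenation, and lets a standard Transformer decoder layer cross-attend to them; shapes, vocabulary size, and the fusion rule are illustrative assumptions, not the TDSI architecture.

```python
import torch
import torch.nn as nn

class CaptionCrossAttentionDecoder(nn.Module):
    """Toy caption-decoding step over fused visual tokens (sketch only)."""
    def __init__(self, dim=512, heads=8, vocab_size=10000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.decoder = nn.TransformerDecoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.vocab_head = nn.Linear(dim, vocab_size)

    def forward(self, tokens, grid_feats, seg_feats):
        memory = torch.cat([grid_feats, seg_feats], dim=1)   # fuse the two visual streams
        hidden = self.decoder(self.embed(tokens), memory)    # cross-attend to visual tokens
        return self.vocab_head(hidden)                       # next-token logits

model = CaptionCrossAttentionDecoder()
logits = model(torch.randint(0, 10000, (2, 12)),             # partial caption token ids
               torch.randn(2, 49, 512), torch.randn(2, 49, 512))
print(logits.shape)  # torch.Size([2, 12, 10000])
```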

17 pages, 5115 KiB  
Article
PerNN: A Deep Learning-Based Recommendation Algorithm for Personalized Customization
by Yang Zhang, Xiaoping Lu, Yating Zhao and Zhenfa Yang
Electronics 2025, 14(12), 2451; https://doi.org/10.3390/electronics14122451 - 16 Jun 2025
Viewed by 377
Abstract
In the context of the Internet, the personalization and diversification of customer demands present a significant challenge for research on the identification, combination, and utilization of personalized demand feature elements. A key difficulty lies in achieving real-time perception, processing, and recognition of customer needs to dynamically identify and understand personalized customer intent. To address these limitations, we propose a Personalized customization-based Neural Network (PerNN), designed to enhance the performance and accuracy of recommendation systems in large-scale and complex information environments. The PerNN model introduces a Personalized Features Layer (PF), which effectively integrates multi-dimensional information, including historical interaction data, social network relationships, and users’ temporal behavior patterns, to generate fine-grained, personalized user feature representations. This approach significantly improves the model’s ability to predict user preferences. Extensive experiments conducted on public datasets demonstrate that the PerNN model consistently outperforms existing methods, particularly regarding the accuracy and response speed of personalized recommendations. The results validate the effectiveness and superiority of the proposed model in managing complex recommendation tasks, offering a novel and efficient solution for personalized customization scenarios. Full article
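The Personalized Features Layer is described only at a high level. The sketch below assumes each signal source (interaction history, social relations, temporal behavior) already arrives as a fixed-size vector and fuses them with a small MLP; the dimensions and fusion choice are illustrative assumptions, not the PF layer defined in the paper.

```python
import torch
import torch.nn as nn

class PersonalizedFeaturesLayer(nn.Module):
    """Illustrative fusion of multi-source user signals into one vector."""
    def __init__(self, hist_dim=64, social_dim=32, temporal_dim=16, out_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hist_dim + social_dim + temporal_dim, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, out_dim),
        )

    def forward(self, hist, social, temporal):
        # Concatenate the three assumed per-user feature vectors and project
        return self.mlp(torch.cat([hist, social, temporal], dim=-1))

pf = PersonalizedFeaturesLayer()
user_vec = pf(torch.randn(8, 64), torch.randn(8, 32), torch.randn(8, 16))
print(user_vec.shape)  # torch.Size([8, 64])
```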

24 pages, 1562 KiB  
Article
A Novel Framework for Enhancing Decision-Making in Autonomous Cyber Defense Through Graph Embedding
by Zhen Wang, Yongjie Wang, Xinli Xiong, Qiankun Ren and Jun Huang
Entropy 2025, 27(6), 622; https://doi.org/10.3390/e27060622 - 11 Jun 2025
Cited by 1 | Viewed by 566
Abstract
Faced with the challenges posed by sophisticated cyber attacks and the dynamic characteristics of cyberspace, autonomous cyber defense (ACD) technology has shown its effectiveness. However, traditional decision-making methods for ACD are unable to effectively characterize the network topology and internode dependencies, which makes it difficult for defenders to identify key nodes and critical attack paths. Therefore, this paper proposes an enhanced decision-making method combining graph embedding with reinforcement learning algorithms. By constructing a game model for cyber confrontations, this paper models the important elements of the network topology for decision-making, which guide the defender to dynamically optimize its strategy based on topology awareness. We improve reinforcement learning with the Node2vec algorithm to characterize network information for the defender: node attributes and network structural features are embedded into low-dimensional vectors instead of traditional one-hot encodings, which addresses the perceptual bottleneck in high-dimensional sparse environments. Meanwhile, the algorithm training environment Cyberwheel is extended with new fine-grained defense mechanisms to enhance the utility and portability of ACD. In experiments, our graph embedding-based decision-making method is compared with traditional perception methods. The results verify the superior performance of our approach in the strategy selection of defensive decision-making. Diverse parameters of the graph representation model Node2vec are also analyzed and compared to assess their impact on embedding effectiveness for ACD decision-making. Full article
(This article belongs to the Section Information Theory, Probability and Statistics)
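The Node2vec component replaces one-hot node identifiers with learned low-dimensional embeddings. The sketch below shows a simplified DeepWalk-style version (uniform walks, i.e., Node2vec with p = q = 1) over a toy topology using networkx and gensim; the graph, walk parameters, and embedding size are assumptions, and the biased walk sampling and Cyberwheel integration are omitted.

```python
import random
import networkx as nx
from gensim.models import Word2Vec

def uniform_random_walks(graph, num_walks=10, walk_length=20):
    """Generate uniform random walks (Node2vec with p = q = 1)."""
    walks = []
    for _ in range(num_walks):
        for start in graph.nodes():
            walk = [start]
            while len(walk) < walk_length:
                neighbors = list(graph.neighbors(walk[-1]))
                if not neighbors:
                    break
                walk.append(random.choice(neighbors))
            walks.append([str(n) for n in walk])
    return walks

# Toy network topology standing in for the defended network
topology = nx.barabasi_albert_graph(50, 2)
walks = uniform_random_walks(topology)
model = Word2Vec(walks, vector_size=32, window=5, min_count=1, sg=1, workers=2)
node_embedding = model.wv[str(0)]   # 32-dim vector replacing a one-hot node id
print(node_embedding.shape)         # (32,)
```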

25 pages, 11680 KiB  
Article
ETAFHrNet: A Transformer-Based Multi-Scale Network for Asymmetric Pavement Crack Segmentation
by Chao Tan, Jiaqi Liu, Zhedong Zhao, Rufei Liu, Peng Tan, Aishu Yao, Shoudao Pan and Jingyi Dong
Appl. Sci. 2025, 15(11), 6183; https://doi.org/10.3390/app15116183 - 30 May 2025
Viewed by 642
Abstract
Accurate segmentation of pavement cracks from high-resolution remote sensing imagery plays a crucial role in automated road condition assessment and infrastructure maintenance. However, crack structures often exhibit asymmetry, irregular morphology, and multi-scale variations, posing significant challenges to conventional CNN-based methods in real-world environments. Specifically, the proposed ETAFHrNet focuses on two predominant pavement-distress morphologies—linear cracks (transverse and longitudinal) and alligator cracks—and has been empirically validated on their intersections and branching patterns over both asphalt and concrete road surfaces. In this work, we present ETAFHrNet, a novel attention-guided segmentation network designed to address the limitations of traditional architectures in detecting fine-grained and asymmetric patterns. ETAFHrNet integrates Transformer-based global attention and multi-scale hybrid feature fusion, enhancing both contextual perception and detail sensitivity. The network introduces two key modules: the Efficient Hybrid Attention Transformer (EHAT), which captures long-range dependencies, and the Cross-Scale Hybrid Attention Module (CSHAM), which adaptively fuses features across spatial resolutions. To support model training and benchmarking, we also propose QD-Crack, a high-resolution, pixel-level annotated dataset collected from real-world road inspection scenarios. Experimental results show that ETAFHrNet significantly outperforms existing methods—including U-Net, DeepLabv3+, and HRNet—in both segmentation accuracy and generalization ability. These findings demonstrate the effectiveness of interpretable, multi-scale attention architectures in complex object detection and image classification tasks, making our approach relevant for broader applications, such as autonomous driving, remote sensing, and smart infrastructure systems. Full article
(This article belongs to the Special Issue Object Detection and Image Classification)

21 pages, 3538 KiB  
Article
MFFP-Net: Building Segmentation in Remote Sensing Images via Multi-Scale Feature Fusion and Foreground Perception Enhancement
by Huajie Xu, Qiukai Huang, Haikun Liao, Ganxiao Nong and Wei Wei
Remote Sens. 2025, 17(11), 1875; https://doi.org/10.3390/rs17111875 - 28 May 2025
Viewed by 529
Abstract
The accurate segmentation of small target buildings in high-resolution remote sensing images remains challenging due to two critical issues: (1) small target buildings often occupy few pixels in complex backgrounds, leading to frequent background confusion, and (2) significant intra-class variance complicates feature representation compared to conventional semantic segmentation tasks. To address these challenges, we propose a novel Multi-Scale Feature Fusion and Foreground Perception Enhancement Network (MFFP-Net). This framework introduces three key innovations: (1) a Multi-Scale Feature Fusion (MFF) module that hierarchically aggregates shallow features through cross-level connections to enhance fine-grained detail preservation, (2) a Foreground Perception Enhancement (FPE) module that establishes pixel-wise affinity relationships within foreground regions to mitigate intra-class variance effects, and (3) a Dual-Path Attention (DPA) mechanism combining parallel global and local attention pathways to jointly capture structural details and long-range contextual dependencies. Experimental results demonstrate that the IoU of the proposed method achieves improvements of 0.44%, 0.98% and 0.61% compared to mainstream state-of-the-art methods on the WHU Building, Massachusetts Building, and Inria Aerial Image Labeling datasets, respectively, validating its effectiveness in handling small targets and intra-class variance while maintaining robustness in complex scenarios. Full article
(This article belongs to the Section AI Remote Sensing)
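The Dual-Path Attention mechanism combines a global and a local pathway. The sketch below pairs pooled channel attention (global context) with a convolutional spatial mask (local detail) and sums the two re-weighted outputs; this is only one plausible instantiation, as the abstract does not give the DPA internals.

```python
import torch
import torch.nn as nn

class DualPathAttention(nn.Module):
    """Sketch of a parallel global/local attention pair (illustration only)."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.global_path = nn.Sequential(          # channel attention from pooled context
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )
        self.local_path = nn.Sequential(            # spatial mask from local structure
            nn.Conv2d(channels, 1, kernel_size=7, padding=3), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.global_path(x) + x * self.local_path(x)

x = torch.randn(2, 64, 128, 128)
print(DualPathAttention(64)(x).shape)  # torch.Size([2, 64, 128, 128])
```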

19 pages, 8750 KiB  
Article
FP-Deeplab: A Novel Face Parsing Network for Fine-Grained Boundary Detection and Semantic Understanding
by Borui Zeng, Can Shu, Ziqi Liao, Jingru Yu, Zhiyu Liu and Xiaoyan Chen
Appl. Sci. 2025, 15(11), 6016; https://doi.org/10.3390/app15116016 - 27 May 2025
Viewed by 402
Abstract
Facial semantic segmentation, as a critical technology in high-level visual understanding, plays an important role in applications such as facial editing, augmented reality, and identity recognition. However, due to the complexity of facial structures, ambiguous boundaries, and inconsistent scales of facial components, traditional methods still suffer from significant limitations in detail preservation and contextual modeling. To address these challenges, this paper proposes a facial parsing network based on the Deeplabv3+ framework, named FP-Deeplab, which aims to improve segmentation performance and generalization capability through structurally enhanced modules. Specifically, two key modules are designed: (1) the Context-Channel Refine Feature Enhancement (CCR-FE) module, which integrates multi-scale contextual strip convolutions and Cross-Axis Attention and introduces a channel attention mechanism to strengthen the modeling of long-range spatial dependencies and enhances the perception and representation of boundary regions; (2) the Self-Modulation Attention Feature Integration with Regularization (SimFA) module, which combines local detail modeling and a parameter-free channel attention modulation mechanism to achieve fine-grained reconstruction and enhancement of semantic features, effectively mitigating boundary blur and information loss during the upsampling stage. The experimental results on two public facial segmentation datasets, CelebAMask-HQ and HELEN, demonstrate that FP-Deeplab improves the baseline model by 3.8% in Mean IoU and 2.3% in the overall F1-score on the HELEN dataset, and it achieves a Mean F1-score of 84.8% on the CelebAMask-HQ dataset. Furthermore, the proposed method shows superior accuracy and robustness in multiple key component categories, especially in long-tailed regions, validating its effectiveness. Full article
