Search Results (93)

Search Parameters:
Keywords = cascaded U-Net

33 pages, 2435 KB  
Article
Multi-Task Learning for Ocean-Front Detection and Evolutionary Trend Recognition
by Qi He, Anqi Huang, Lijia Geng, Wei Zhao and Yanling Du
Remote Sens. 2025, 17(23), 3862; https://doi.org/10.3390/rs17233862 - 28 Nov 2025
Viewed by 368
Abstract
Ocean fronts are central to upper-ocean dynamics and ecosystem processes, yet recognizing their evolutionary trends from satellite data remains challenging. We present a 3D U-Net-based multi-task framework that jointly performs ocean-front detection (OFD) and ocean-front evolutionary trend recognition (OFETR) from sea surface temperature gradient heatmaps. Instead of cascading OFD and OFETR in separate stages that pass OFD outputs downstream and can amplify upstream errors, the proposed model shares 3D spatiotemporal features and is trained end-to-end. We construct the Zhejiang–Fujian Coastal Front Mask (ZFCFM) and Evolutionary Trend (ZFCFET) datasets from ESA SST CCI L4 products for 2002–2021 and use them to evaluate the framework against 2D CNN baselines and traditional methods. Multi-task learning improves OFETR compared with single-task training while keeping OFD performance comparable, and the unified design reduces parameter count and daily computational cost. The model outputs daily point-level trend labels aligned with the dataset’s temporal resolution, indicating that end-to-end multi-task learning can mitigate error propagation and provide temporally resolved estimates.
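
The shared-encoder, two-head design described in this abstract can be sketched briefly. The PyTorch code below is a minimal illustrative sketch, not the authors' implementation: the channel widths, depth, and the assumption of three trend classes are my own.

```python
import torch
import torch.nn as nn

class ConvBlock3D(nn.Module):
    """Two 3D convolutions with BatchNorm and ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, 3, padding=1), nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, 3, padding=1), nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.block(x)

class MultiTaskUNet3D(nn.Module):
    """Shared 3D U-Net-style trunk with two heads: OFD (front mask)
    and OFETR (per-pixel trend labels). Hypothetical sizes throughout."""
    def __init__(self, in_ch=1, n_trends=3):
        super().__init__()
        self.enc1 = ConvBlock3D(in_ch, 32)
        self.enc2 = ConvBlock3D(32, 64)
        self.pool = nn.MaxPool3d(2)
        self.up = nn.ConvTranspose3d(64, 32, 2, stride=2)
        self.dec1 = ConvBlock3D(64, 32)
        self.det_head = nn.Conv3d(32, 1, 1)           # OFD: front / no-front logits
        self.trend_head = nn.Conv3d(32, n_trends, 1)  # OFETR: e.g. grow / stable / decay
    def forward(self, x):  # x: (B, C, T, H, W) SST-gradient heatmaps
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))
        return self.det_head(d1), self.trend_head(d1)
```

Training such a model end-to-end would sum a detection loss (e.g. BCE on the mask head) and a trend loss (cross-entropy on the trend head), so both tasks shape the shared spatiotemporal encoder instead of passing errors down a cascade.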

17 pages, 2779 KB  
Article
Image Restoration Based on Semantic Prior Aware Hierarchical Network and Multi-Scale Fusion Generator
by Yapei Feng, Yuxiang Tang and Hua Zhong
Technologies 2025, 13(11), 521; https://doi.org/10.3390/technologies13110521 - 13 Nov 2025
Viewed by 569
Abstract
As a fundamental low-level vision task, image restoration plays a pivotal role in reconstructing authentic visual information from corrupted inputs, directly impacting the performance of downstream high-level vision systems. Current approaches frequently exhibit two critical limitations: (1) progressive texture degradation and blurring during iterative refinement, particularly in irregular damage patterns, and (2) structural incoherence when handling cross-domain artifacts. To address these challenges, we present a semantic-aware hierarchical network (SAHN) that synergistically integrates multi-scale semantic guidance with structural consistency constraints. First, we construct a dual-stream feature extractor: built on a modified U-Net backbone with dilated residual blocks, this skip-connected encoder–decoder module simultaneously captures hierarchical semantic contexts and fine-grained texture details. Second, we propose a semantic prior mapper that establishes spatial–semantic correspondences between damaged areas and multi-scale features using predefined semantic prototypes and adaptive attention pooling. Additionally, we construct a multi-scale fusion generator that employs cascaded association blocks with structural similarity constraints. This unit progressively aggregates features from different semantic levels using deformable convolution kernels, effectively bridging the gap between global structure and local texture reconstruction. Compared to existing methods, our algorithm attains the highest overall PSNR of 34.99 with the best visual authenticity (the lowest FID of 11.56). Comprehensive evaluations on three datasets demonstrate its leading performance in restoring visual realism.

17 pages, 1571 KB  
Article
Anatomically Guided Cascaded U-Net Ensemble for Coronary Artery Calcification Segmentation in Cardiac CT
by Omar Alirr and Tarek Khalifa
Bioengineering 2025, 12(11), 1243; https://doi.org/10.3390/bioengineering12111243 - 13 Nov 2025
Viewed by 760
Abstract
Accurate segmentation of coronary artery calcifications (CAC) from cardiac CT is challenged by class imbalance, small lesion size, and anatomical ambiguity. We present an anatomically guided, cascaded framework that couples heart and vessel priors with a heterogeneous U-Net ensemble for robust, vessel-aware CAC segmentation. First, a ResU-Net trained on MM-WHS isolates the heart region of interest (ROI). Second, a ResU-Net trained on ASOCA—using Frangi vesselness enhancement—segments the coronary arteries, yielding vessel masks that constrain downstream lesion detection. Third, calcifications are segmented within the vessel-constrained ROI using an ensemble of U-Net variants (baseline U-Net, Residual U-Net, Attention U-Net, UNet++). At inference, a rank-based selective fusion strategy prioritizes predictions with strong morphological consistency and vessel conformity, suppressing false positives. On the Stanford COCA gated dataset, the proposed ensemble outperforms individual models (Dice 84.25%, sensitivity 87.10%, specificity 98.00%), with ablations demonstrating additional gains when vessel priors are integrated into selective fusion (Dice 85.50%, sensitivity 88.53%). Results confirm that combining dataset-specific anatomical priors with selective ensembling improves boundary sharpness, small-lesion detectability, and anatomical plausibility, supporting reliable CAC segmentation in clinical imaging workflows.
(This article belongs to the Section Biosignal Processing)
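
The rank-based selective fusion step lends itself to a short sketch. The snippet below is a hypothetical reading of that idea, assuming per-model probability maps and a binary vessel-prior mask; the paper's ranking also weighs morphological consistency, which is omitted here for brevity.

```python
import numpy as np

def vessel_conformity(prob_map, vessel_mask, thr=0.5):
    """Fraction of predicted calcification pixels that fall inside the
    binary vessel prior (a crude stand-in for vessel conformity)."""
    pred = prob_map > thr
    vessel = vessel_mask.astype(bool)
    return float((pred & vessel).sum() / pred.sum()) if pred.any() else 0.0

def selective_fusion(prob_maps, vessel_mask, top_k=2, thr=0.5):
    """Rank each ensemble member's probability map by vessel conformity
    and average only the top-k maps before thresholding."""
    scores = [vessel_conformity(p, vessel_mask, thr) for p in prob_maps]
    keep = np.argsort(scores)[::-1][:top_k]
    fused = np.mean([prob_maps[i] for i in keep], axis=0)
    return (fused > thr).astype(np.uint8)
```

The design point is that fusion is selective: members whose predictions stray outside the anatomical prior are simply excluded from the average rather than down-weighted.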

20 pages, 55265 KB  
Article
Learning Precise Mask Representation for Siamese Visual Tracking
by Peng Yang, Fen Hu, Qinghui Wang and Lei Dou
Sensors 2025, 25(18), 5743; https://doi.org/10.3390/s25185743 - 15 Sep 2025
Viewed by 877
Abstract
Siamese network trackers are a prominent paradigm in visual object tracking due to efficient similarity learning. However, most Siamese trackers are restricted to the bounding box tracking format, which often fails to accurately describe the appearance of non-rigid targets with complex deformations. Additionally, since the bounding box frequently includes excessive background pixels, trackers are sensitive to similar distractors. To address these issues, we propose a novel segmentation-assisted model that learns binary mask representations of targets. This model is generic and can be seamlessly integrated into various Siamese frameworks, enabling pixel-wise segmentation tracking instead of the suboptimal bounding box tracking. Specifically, our model features two core components: (i) a multi-stage precise mask representation module composed of cascaded U-Net decoders, designed to predict segmentation masks of targets, and (ii) a saliency localization head based on the Euclidean model, which extracts spatial position constraints to boost the decoder’s discriminative capability. Extensive experiments on five tracking benchmarks demonstrate that our method effectively improves the performance of both anchor-based and anchor-free Siamese trackers. Notably, on GOT-10k, our method increases the AO scores of the baseline trackers SiamRPN++ (anchor-based) and SiamBAN (anchor-free) by 5.2% and 7.5%, respectively, while maintaining speeds exceeding 60 FPS.
(This article belongs to the Special Issue Deep Learning Technology and Image Sensing: 2nd Edition)

19 pages, 2838 KB  
Article
Cascaded Spatial and Depth Attention UNet for Hippocampus Segmentation
by Zi-Zheng Wei, Bich-Thuy Vu, Maisam Abbas and Ran-Zan Wang
J. Imaging 2025, 11(9), 311; https://doi.org/10.3390/jimaging11090311 - 11 Sep 2025
Viewed by 929
Abstract
This study introduces a novel enhancement to the UNet architecture, termed Cascaded Spatial and Depth Attention U-Net (CSDA-UNet), tailored specifically for precise hippocampus segmentation in T1-weighted brain MRI scans. The proposed architecture integrates two key attention mechanisms: a Spatial Attention (SA) module, which refines spatial feature representations by producing attention maps from the deepest convolutional layer and modulating the matching object features, and an Inter-Slice Attention (ISA) module, which enhances volumetric uniformity by integrating related information from adjacent slices, thereby reinforcing the model’s capacity to capture inter-slice dependencies. The CSDA-UNet is assessed on hippocampal segmentation data derived from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and Decathlon, two benchmarks widely employed in neuroimaging research. Across multiple quantitative metrics, the proposed model outperforms state-of-the-art methods, achieving a Dice coefficient of 0.9512 and an IoU score of 0.9345 on ADNI, and Dice scores of 0.9907/0.8963 (train/validation) and IoU scores of 0.9816/0.8132 (train/validation) on the Decathlon dataset. These improvements underscore the efficacy of the proposed dual-attention framework in accurately delineating small, asymmetrical structures such as the hippocampus, while maintaining computational efficiency suitable for clinical deployment.
(This article belongs to the Section Medical Imaging)
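
A rough sense of the two attention mechanisms can be given in code. The modules below are illustrative sketches under assumed tensor shapes, not the published architecture: the 1x1 attention projection and the (3,1,1) slice-mixing kernel are my choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttention(nn.Module):
    """Builds an attention map from the deepest encoder features and
    rescales a shallower feature map with it (illustrative sketch)."""
    def __init__(self, deep_ch):
        super().__init__()
        self.conv = nn.Conv2d(deep_ch, 1, kernel_size=1)
    def forward(self, deep_feat, skip_feat):
        attn = torch.sigmoid(self.conv(deep_feat))              # (B, 1, h, w)
        attn = F.interpolate(attn, size=skip_feat.shape[-2:],
                             mode="bilinear", align_corners=False)
        return skip_feat * attn                                 # modulated skip features

class InterSliceAttention(nn.Module):
    """Mixes each slice's features with its neighbours along the slice
    axis via a gated residual, approximating the ISA idea."""
    def __init__(self, ch):
        super().__init__()
        self.mix = nn.Conv3d(ch, ch, kernel_size=(3, 1, 1), padding=(1, 0, 0))
    def forward(self, x):  # x: (B, C, S, H, W), S = adjacent slices
        return x + torch.sigmoid(self.mix(x)) * x
```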

17 pages, 3666 KB  
Article
Efficient Retinal Vessel Segmentation with 78K Parameters
by Zhigao Zeng, Jiakai Liu, Xianming Huang, Kaixi Luo, Xinpan Yuan and Yanhui Zhu
J. Imaging 2025, 11(9), 306; https://doi.org/10.3390/jimaging11090306 - 8 Sep 2025
Cited by 1 | Viewed by 1116
Abstract
Retinal vessel segmentation is critical for early diagnosis of diabetic retinopathy, yet existing deep models often compromise accuracy for complexity. We propose DSAE-Net, a lightweight dual-stage network that addresses this challenge by (1) introducing a Parameterized Cascaded W-shaped Architecture enabling progressive feature refinement with only 1% of the parameters of a standard U-Net; (2) designing a novel Skeleton Distance Loss (SDL) that overcomes boundary loss limitations by leveraging vessel skeletons to handle severe class imbalance; (3) developing a Cross-modal Fusion Attention (CMFA) module combining group convolutions and dynamic weighting to effectively expand receptive fields; and (4) proposing Coordinate Attention Gates (CAGs) to optimize skip connections via directional feature reweighting. Evaluated extensively on DRIVE, CHASE_DB1, HRF, and STARE datasets, DSAE-Net significantly reduces computational complexity while outperforming state-of-the-art lightweight models in segmentation accuracy. Its efficiency and robustness make DSAE-Net particularly suitable for real-time diagnostics in resource-constrained clinical settings.
(This article belongs to the Section Image and Video Processing)
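
The Skeleton Distance Loss invites a small sketch. The version below is one plausible reading, assuming a per-pixel BCE weighted by distance to the ground-truth vessel skeleton so that thin centrelines dominate the loss; the paper's exact formulation may differ, and the decay scale sigma is an assumption.

```python
import numpy as np
import torch
import torch.nn.functional as F
from scipy.ndimage import distance_transform_edt
from skimage.morphology import skeletonize

def skeleton_distance_weights(gt_mask, sigma=4.0):
    """Weights that peak on the vessel skeleton and decay with distance
    from it, emphasising thin structures over bulk foreground."""
    skel = skeletonize(gt_mask.astype(bool))
    dist = distance_transform_edt(~skel)   # distance to nearest skeleton pixel
    return np.exp(-dist / sigma).astype(np.float32)

def skeleton_distance_loss(logits, gt_mask, sigma=4.0):
    """Skeleton-weighted BCE -- a hypothetical stand-in for SDL."""
    w = torch.from_numpy(skeleton_distance_weights(gt_mask, sigma))
    target = torch.from_numpy(gt_mask.astype(np.float32))
    return F.binary_cross_entropy_with_logits(logits, target, weight=w)
```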

22 pages, 1243 KB  
Article
ProCo-NET: Progressive Strip Convolution and Frequency-Optimized Framework for Scale-Gradient-Aware Semantic Segmentation in Off-Road Scenes
by Zihang Liu, Donglin Jing and Chenxiang Ji
Symmetry 2025, 17(9), 1428; https://doi.org/10.3390/sym17091428 - 2 Sep 2025
Viewed by 760
Abstract
In off-road scenes, segmentation targets exhibit significant scale progression due to perspective depth effects from oblique viewing angles, meaning that the size of the same target undergoes continuous, boundary-less progressive changes along a specific direction. This asymmetric variation disrupts the geometric symmetry of targets, causing traditional segmentation networks to face three key challenges: (1) inefficient capture of continuous-scale features, where pyramid structures and multi-scale kernels struggle to balance computational efficiency with sufficient coverage of progressive scales; (2) degraded intra-class feature consistency, where local scale differences within targets induce semantic ambiguity; and (3) loss of high-frequency boundary information, where feature sampling operations exacerbate the blurring of progressive boundaries. To address these issues, this paper proposes the ProCo-NET framework for systematic optimization. Firstly, a Progressive Strip Convolution Group (PSCG) is designed to construct multi-level receptive field expansion through orthogonally oriented strip convolution cascading (employing symmetric processing in horizontal/vertical directions) integrated with self-attention mechanisms, enhancing perception capability for asymmetric continuous-scale variations. Secondly, an Offset-Frequency Cooperative Module (OFCM) is developed wherein a learnable offset generator dynamically adjusts sampling point distributions to enhance intra-class consistency, while a dual-channel frequency domain filter performs adaptive high-pass filtering to sharpen target boundaries. These components synergistically solve feature consistency degradation and boundary ambiguity under asymmetric changes. Experiments show that this framework significantly improves the segmentation accuracy and boundary clarity of multi-scale targets in off-road scene segmentation tasks: it achieves 71.22% MIoU on the standard RUGD dataset (0.84% higher than the existing optimal method) and 83.05% MIoU on the Freiburg_Forest dataset. In particular, segmentation accuracy on key obstacle categories improves to 52.04% (2.7% higher than the second-best model). This framework effectively compensates for the impact of asymmetric deformation through a symmetric computing mechanism.
(This article belongs to the Section Computer)
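
The orthogonal strip-convolution cascade at the heart of PSCG can be outlined in a few lines. This sketch uses kernel lengths and a residual cascade of my own choosing and omits the self-attention integration the abstract mentions.

```python
import torch
import torch.nn as nn

class StripConv(nn.Module):
    """An orthogonal strip-convolution pair: 1xk horizontal, then kx1 vertical."""
    def __init__(self, ch, k):
        super().__init__()
        self.h = nn.Conv2d(ch, ch, (1, k), padding=(0, k // 2))
        self.v = nn.Conv2d(ch, ch, (k, 1), padding=(k // 2, 0))
    def forward(self, x):
        return self.v(self.h(x))

class ProgressiveStripGroup(nn.Module):
    """Cascade of strip convolutions with growing kernel lengths, so the
    receptive field expands progressively along both axes (assumed sizes)."""
    def __init__(self, ch, ks=(7, 11, 21)):
        super().__init__()
        self.stages = nn.ModuleList(StripConv(ch, k) for k in ks)
    def forward(self, x):
        out = x
        for stage in self.stages:
            out = out + stage(out)  # residual cascade keeps earlier scales available
        return out
```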

21 pages, 6925 KB  
Article
U2-LFOR: A Two-Stage U2 Network for Light-Field Occlusion Removal
by Mostafa Farouk Senussi, Mahmoud Abdalla, Mahmoud SalahEldin Kasem, Mohamed Mahmoud and Hyun-Soo Kang
Mathematics 2025, 13(17), 2748; https://doi.org/10.3390/math13172748 - 26 Aug 2025
Cited by 1 | Viewed by 877
Abstract
Light-field (LF) imaging transforms occlusion removal by using multiview data to reconstruct hidden regions, overcoming the limitations of single-view methods. However, this advanced capability often comes at the cost of increased computational complexity. To overcome this, we propose the U2-LFOR network, an end-to-end neural network designed to remove occlusions in LF images without compromising performance, addressing the inherent complexity of LF imaging while ensuring practical applicability. The architecture employs Residual Atrous Spatial Pyramid Pooling (ResASPP) at the feature extractor to expand the receptive field, capture localized multiscale features, and enable deep feature learning with efficient aggregation. A two-stage U2-Net structure enhances hierarchical feature learning while maintaining a compact design, ensuring accurate context recovery. A dedicated refinement module, using two cascaded residual blocks (ResBlock), restores fine details to the occluded regions. Experimental results demonstrate competitive performance on both synthetic and real-world LF datasets, with an average Peak Signal-to-Noise Ratio (PSNR) of 29.27 dB and Structural Similarity Index Measure (SSIM) of 0.875, two widely used measures of reconstruction fidelity and perceptual quality, confirming its effectiveness in accurate occlusion removal.

22 pages, 3744 KB  
Article
Improved DeepLabV3+ for UAV-Based Highway Lane Line Segmentation
by Yueze Wang, Dudu Guo, Yang Wang, Hongbo Shuai, Zhuzhou Li and Jin Ran
Sustainability 2025, 17(16), 7317; https://doi.org/10.3390/su17167317 - 13 Aug 2025
Cited by 3 | Viewed by 1061
Abstract
Sustainable highway infrastructure maintenance critically depends on precise lane line detection, yet conventional inspection approaches remain resource-depleting, carbon-intensive, and hazardous to personnel. To mitigate these constraints and address the low accuracy and high parameter counts of existing models, this study proposes a highway lane line segmentation method for unmanned aerial vehicle (UAV) imagery based on an improved DeepLabV3+ model that resolves multi-scale lane line segmentation challenges. MobileNetV2 is used as the backbone network to significantly reduce the number of model parameters. The Squeeze-and-Excitation (SE) attention mechanism is integrated to enhance feature extraction, particularly at lane line edges. A Feature Pyramid Network (FPN) is incorporated to improve multi-scale lane line feature extraction. We introduce a novel Waterfall Atrous Spatial Pyramid Pooling (WASPP) module, utilizing cascaded atrous convolutions with strategic dilation rate adjustments to progressively expand the receptive field and aggregate contextual information across scales. The improved model outperforms the original DeepLabV3+ by 5.04% mIoU (85.30% vs. 80.26%) and 3.35% F1-Score (91.74% vs. 88.39%) while cutting parameters by 85% (8.03 M vs. 54.8 M) and reducing training time by 2 h 50 min, thereby improving lane line segmentation accuracy, reducing the number of parameters, and lowering the carbon footprint.
(This article belongs to the Section Sustainable Transportation)
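
The waterfall arrangement of atrous convolutions can be contrasted with parallel ASPP in a short sketch. The dilation rates and channel widths below are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class WASPP(nn.Module):
    """Waterfall Atrous Spatial Pyramid Pooling sketch: unlike parallel
    ASPP, each atrous branch consumes the previous branch's output, and
    all intermediate outputs are concatenated and fused with a 1x1 conv."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3,
                      padding=r, dilation=r)
            for i, r in enumerate(rates)
        )
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, 1)
    def forward(self, x):
        outs, y = [], x
        for branch in self.branches:
            y = branch(y)     # each stage sees an already-enlarged receptive field
            outs.append(y)
        return self.fuse(torch.cat(outs, dim=1))
```

Because the branches are chained, the effective receptive field grows multiplicatively through the cascade, which is how a waterfall design covers large context with fewer parameters than widening a parallel pyramid.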

13 pages, 1574 KB  
Article
Multi-Stage Cascaded Deep Learning-Based Model for Acute Aortic Syndrome Detection: A Multisite Validation Study
by Joseph Chang, Kuan-Jung Lee, Ti-Hao Wang and Chung-Ming Chen
J. Clin. Med. 2025, 14(13), 4797; https://doi.org/10.3390/jcm14134797 - 7 Jul 2025
Cited by 1 | Viewed by 1333
Abstract
Background: Acute Aortic Syndrome (AAS), encompassing aortic dissection (AD), intramural hematoma (IMH), and penetrating atherosclerotic ulcer (PAU), presents diagnostic challenges due to its varied manifestations and the critical need for rapid assessment. Methods: We developed a multi-stage deep learning model trained on chest computed tomography angiography (CTA) scans. The model utilizes a U-Net architecture for aortic segmentation, followed by a cascaded classification approach for detecting AD and IMH, and a multiscale CNN for identifying PAU. External validation was conducted on 260 anonymized CTA scans from 14 U.S. clinical sites, encompassing data from four different CT manufacturers. Performance metrics, including sensitivity, specificity, and area under the receiver operating characteristic curve (AUC), were calculated with 95% confidence intervals (CIs) using Wilson’s method. Model performance was compared against predefined benchmarks. Results: The model achieved a sensitivity of 0.94 (95% CI: 0.88–0.97), specificity of 0.93 (95% CI: 0.89–0.97), and an AUC of 0.96 (95% CI: 0.94–0.98) for overall AAS detection, with p-values < 0.001 when compared to the 0.80 benchmark. Subgroup analyses demonstrated consistent performance across different patient demographics, CT manufacturers, slice thicknesses, and anatomical locations. Conclusions: This deep learning model effectively detects the full spectrum of AAS across diverse populations and imaging platforms, suggesting its potential utility in clinical settings to enable faster triage and expedite patient management.
(This article belongs to the Section Nuclear Medicine & Radiology)
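
The staged design, segmentation first and classification second, can be sketched at the pipeline level. Everything below (the network handles, the thresholding, and the max-fusion rule) is a hypothetical illustration rather than the published system.

```python
import torch

def detect_aas(cta_volume, seg_net, ad_imh_net, pau_net, threshold=0.5):
    """Three-stage cascade sketch: (1) a U-Net segments the aorta,
    (2) a cascaded classifier screens the masked aorta for AD/IMH,
    (3) a multiscale CNN checks for PAU. All nets are placeholders
    assumed to output probabilities."""
    with torch.no_grad():
        aorta_mask = (seg_net(cta_volume) > threshold).float()  # stage 1
        roi = cta_volume * aorta_mask                           # restrict to the aorta
        p_ad_imh = ad_imh_net(roi)                              # stage 2
        p_pau = pau_net(roi)                                    # stage 3
        p_aas = torch.maximum(p_ad_imh, p_pau)                  # any-finding fusion rule
    return {"ad_imh": p_ad_imh, "pau": p_pau, "aas": p_aas}
```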

21 pages, 3582 KB  
Article
A Cascade of Encoder–Decoder with Atrous Convolution and Ensemble Deep Convolutional Neural Networks for Tuberculosis Detection
by Noppadol Maneerat, Athasart Narkthewan and Kazuhiko Hamamoto
Appl. Sci. 2025, 15(13), 7300; https://doi.org/10.3390/app15137300 - 28 Jun 2025
Cited by 1 | Viewed by 800
Abstract
Tuberculosis (TB) is the most serious worldwide infectious disease and the leading cause of death among people with HIV. Early diagnosis and prompt treatment can curb the rising number of TB deaths, and analysis of chest X-rays is a cost-effective method. We describe a deep learning-based cascade algorithm for detecting TB in chest X-rays. First, the lung regions were separated from other anatomical structures by an encoder–decoder network with atrous separable convolution (DeepLabv3+ with an XceptionNet backbone, DLabv3+X) and then cropped by a bounding box. Using the cropped lung images, we trained several pre-trained Deep Convolutional Neural Networks (DCNNs) with hyperparameters optimized by a Bayesian algorithm. Different combinations of trained DCNNs were compared, and the combination with the maximum accuracy was retained as the winning combination. The ensemble classifier was designed to predict the presence of TB by fusing DCNNs from the winning combination via weighted averaging. Our lung segmentation was evaluated on three publicly available datasets, providing better Intersection over Union (IoU) values: 95.1% for Montgomery County (MC), 92.8% for Shenzhen (SZ), and 96.1% for the JSRT dataset. For TB prediction, our ensemble classifier produced a better accuracy of 92.7% on the MC dataset and a comparable accuracy of 95.5% on the SZ dataset. Finally, occlusion sensitivity and gradient-weighted class activation maps (Grad-CAM) were generated to indicate the most influential regions for the prediction of TB and to localize TB manifestations.
(This article belongs to the Special Issue Advances in Deep Learning and Intelligent Computing)
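
Fusing the winning combination by weighted averaging is simple enough to sketch directly. The weights and model names in the usage line are illustrative; in practice they would be chosen on a validation set.

```python
import numpy as np

def weighted_ensemble(prob_list, weights):
    """Fuse per-model TB probabilities by weighted averaging, as in a
    winning-combination ensemble. Weights are normalised to sum to 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    probs = np.stack(prob_list, axis=0)     # (n_models, n_samples)
    return np.tensordot(w, probs, axes=1)   # (n_samples,) fused probabilities

# Hypothetical usage with three member models:
# fused = weighted_ensemble([p_resnet, p_densenet, p_xception], [0.4, 0.35, 0.25])
```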

23 pages, 4215 KB  
Article
Drought Stress Grading Model for Apple Rootstock Softwood Cuttings Based on the CU-ICA-Net
by Xu Wang, Pengfei Wang, Jianping Li, Hongjie Liu and Xin Yang
Agronomy 2025, 15(7), 1508; https://doi.org/10.3390/agronomy15071508 - 21 Jun 2025
Viewed by 703
Abstract
In order to maintain adequate hydration of apple rootstock softwood cuttings during the initial stage of cutting, a drought stress grading model based on machine vision was designed. The model builds on U-Net (U-shaped Neural Network) and uses the petiole morphology of the cuttings as the basis for classifying drought stress levels. The resulting CU-ICA-Net model, obtained by augmenting U-Net with an ICA (Improved Coordinate Attention) module built from a cascaded structure and dynamic convolution, reaches an average prediction accuracy of 93.37% for the three parts of the cuttings: leaf, stem, and petiole. The R2 values of the prediction results for the petiole curvature k and the angle α between the petiole and the stem are 0.8109 and 0.8123, respectively. The dataset used for model training consists of 1200 RGB images of cuttings under different grades of drought stress, with a training-to-test ratio of 1:0.7. A humidification test was carried out using an automatic humidification system equipped with this model; the MIoU (Mean Intersection over Union) value is 0.913, and the FPS (Frames Per Second) value is 31.90. The test results show that the improved U-Net model performs well, providing a method for the design of an automatic humidification control system for industrialized cutting propagation of apple rootstocks.
(This article belongs to the Section Precision and Digital Agriculture)

18 pages, 2325 KB  
Article
Enhanced Rail Surface Defect Segmentation Using Polarization Imaging and Dual-Stream Feature Fusion
by Yucheng Pan, Jiasi Chen, Peiwen Wu, Hongsheng Zhong, Zihao Deng and Daozong Sun
Sensors 2025, 25(11), 3546; https://doi.org/10.3390/s25113546 - 4 Jun 2025
Viewed by 1308
Abstract
Rail surface defects pose significant risks to the operational efficiency and safety of industrial equipment. Traditional visual defect detection methods typically rely on high-quality RGB images; however, they struggle in low-light conditions due to small, low-contrast defects that blend into complex backgrounds. Therefore, this paper proposes a novel defect segmentation method leveraging a dual-stream feature fusion network that combines polarization images with DeepLabV3+. The approach utilizes the pruned MobileNetV3 as the backbone network, incorporating a coordinate attention mechanism for feature extraction. This reduces the number of model parameters and enhances computational efficiency. The dual-stream module implements cascade and addition strategies to effectively merge shallow and deep features from both the original and polarization images. This enhances the detection of low-contrast defects in complex backgrounds. Furthermore, the Convolutional Block Attention Module (CBAM) is integrated into the decoding stage to refine feature fusion and mitigate the issue of missing small-target defects. Experimental results demonstrate that the enhanced DeepLabV3+ model outperforms existing models such as U-Net, PSPNet, and the original DeepLabV3+ in terms of MIoU and MPA metrics, achieving 73.00% and 80.59%, respectively. The comprehensive detection accuracy reaches 97.82%, meeting the demanding requirements for effective rail surface defect detection.
(This article belongs to the Section Industrial Sensors)

23 pages, 6510 KB  
Article
MAMNet: Lightweight Multi-Attention Collaborative Network for Fine-Grained Cropland Extraction from Gaofen-2 Remote Sensing Imagery
by Jiayong Wu, Xue Ding, Jinliang Wang and Jiya Pan
Agriculture 2025, 15(11), 1152; https://doi.org/10.3390/agriculture15111152 - 27 May 2025
Viewed by 939
Abstract
To address the issues of high computational complexity and boundary feature loss encountered when extracting farmland information from high-resolution remote sensing images, this study proposes an innovative CNN–Transformer hybrid network, MAMNet. This framework integrates a lightweight encoder, a global–local Transformer decoder, and a bidirectional attention architecture to achieve efficient and accurate farmland information extraction. First, we reconstruct the ResNet-18 backbone network using depthwise separable convolutions, reducing computational complexity while preserving feature representation capabilities. Second, the global–local Transformer block (GLTB) decoder uses multi-head self-attention mechanisms to dynamically fuse multi-scale features across layers, effectively restoring the topological structure of fragmented farmland boundaries. Third, we propose a novel bidirectional attention architecture: the Detail Improvement Module (DIM) uses channel attention to transfer semantic features to geometric features, while the Context Enhancement Module (CEM) utilizes spatial attention to achieve dynamic geometric–semantic fusion, quantitatively distinguishing farmland textures from mixed ground cover. The positional attention mechanism (PAM) enhances the continuity of linear features by strengthening spatial correlations in skip connections. By cascading a front-end feature module (FEM) to expand the receptive field and combining an adaptive feature reconstruction head (FRH), the method improves information integrity in fragmented areas. Evaluation results on the 2022 Gaofen-2 high-resolution image dataset from Chenggong District, Kunming City, demonstrate that MAMNet achieves an mIoU of 86.68% (improvements of 1.66% and 2.44% over UNetFormer and BANet, respectively) and an F1-Score of 92.86% with only 12 million parameters. This method provides new technical insights for plot-level farmland monitoring in precision agriculture.
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)

24 pages, 3819 KB  
Article
SF-UNet: An Adaptive Cross-Level Residual Cascade for Forest Hyperspectral Image Classification Algorithm by Fusing SpectralFormer and U-Net
by Xinggui Xu, Xuyang Li, Xiangsuo Fan, Qi Li, Hong Li and Haotian Yu
Forests 2025, 16(5), 858; https://doi.org/10.3390/f16050858 - 20 May 2025
Cited by 1 | Viewed by 772
Abstract
Traditional deep learning algorithms struggle to effectively utilize local spectral information in forest hyperspectral (HS) images and to adequately capture subtle feature differences, often causing model confusion and misclassification. To tackle these issues, we present SF-UNet, a novel pixel-level classification network for forest HS images that integrates the strengths of SpectralFormer and U-Net. First, the HGSE module generates semicomponent spectral nesting, strengthening connections among local information elements via spectral embedding. Next, the CAM within SpectralFormer serves as an auxiliary U-Net encoder, enabling cross-level skip connections and cascading through interlayer soft residuals and enhancing feature representation via cross-regional adaptive learning. Finally, the U-Net decoder performs pixel-level classification. Experiments on forest Sentinel-2 data show that SF-UNet outperforms mainstream frameworks: while Vision Transformer reaches 88.29% classification accuracy, SF-UNet achieves 95.28%, a 6.99% improvement. Moreover, SF-UNet excels in land cover change analysis using multi-temporal Sentinel-2 images, accurately capturing subtle land use changes and maintaining classification consistency across seasons and years. These results highlight SF-UNet’s effectiveness in forest remote sensing image classification and its potential value for deep learning-based forest HS remote sensing image classification research.
