
Search Results (7)

Search Parameters:
Keywords = cross-shaped windows self-attention

17 pages, 2484 KB  
Article
A Polyp Segmentation Algorithm Based on Local Enhancement and Attention Mechanism
by Lanxi Fan and Yu Jiang
Mathematics 2025, 13(12), 1925; https://doi.org/10.3390/math13121925 - 9 Jun 2025
Viewed by 898
Abstract
Accurate polyp segmentation plays a vital role in the early detection and prevention of colorectal cancer. However, the diverse shapes, blurred boundaries, and varying sizes of polyps present significant challenges for automatic segmentation. Existing methods often struggle with effective local feature extraction and with modeling long-range dependencies. To overcome these limitations, this paper proposes PolypFormer, which incorporates a local information enhancement module (LIEM) that uses multi-kernel self-selective attention to better capture texture features, alongside dense channel attention for more effective feature fusion. Furthermore, a novel cross-shaped window self-attention mechanism is introduced and integrated into the Transformer architecture to enhance the semantic understanding of polyp regions. Experimental results on five datasets show that the proposed method performs well in polyp segmentation: on the Kvasir-SEG dataset, mDice and mIoU reach 0.920 and 0.886, respectively.
(This article belongs to the Special Issue Modern Methods and Applications Related to Integrable Systems)
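The cross-shaped window idea recurring throughout these results restricts each query's attention to the union of a horizontal and a vertical stripe through its position, rather than the full token grid. A toy sketch of the resulting receptive set, assuming a stripe width `sw` (this is an illustration of the general CSWin-style design, not the authors' code; real implementations typically split attention heads between the two stripe orientations):

```python
def cross_shaped_window(h, w, i, j, sw):
    """Positions a query at (i, j) attends to when attention is split
    between a horizontal stripe of height `sw` and a vertical stripe
    of width `sw` (union over the two stripe orientations)."""
    row_band = i // sw * sw          # top row of the horizontal stripe
    col_band = j // sw * sw          # left column of the vertical stripe
    horiz = {(r, c) for r in range(row_band, min(row_band + sw, h))
                    for c in range(w)}
    vert = {(r, c) for r in range(h)
                   for c in range(col_band, min(col_band + sw, w))}
    return horiz | vert
```

For an 8×8 grid with `sw = 2`, each query attends to 28 positions instead of 64, which is where the computational saving over full self-attention comes from.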

24 pages, 7057 KB  
Article
Construction and Enhancement of a Rural Road Instance Segmentation Dataset Based on an Improved StyleGAN2-ADA
by Zhixin Yao, Renna Xi, Taihong Zhang, Yunjie Zhao, Yongqiang Tian and Wenjing Hou
Sensors 2025, 25(8), 2477; https://doi.org/10.3390/s25082477 - 15 Apr 2025
Cited by 2 | Viewed by 682
Abstract
With the advancement of agricultural automation, the demand for road recognition and understanding in autonomous driving systems for agricultural machinery has significantly increased. To address the scarcity of instance segmentation data for rural roads and unstructured rural scenes, particularly the lack of support for high-resolution and fine-grained classification, a 20-class instance segmentation dataset was constructed, comprising 10,062 independently annotated instances. An improved StyleGAN2-ADA data augmentation method was proposed to generate higher-quality image data. This method incorporates a decoupled mapping network (DMN) to reduce the coupling of latent codes in W-space and combines the strengths of convolutional networks and Transformers through a convolutional coupling transfer block (CCTB). The cross-shaped window self-attention mechanism at the core of the CCTB enhances the network's ability to capture complex contextual information and spatial layouts. Ablation experiments comparing the improved and original StyleGAN2-ADA networks show significant gains, with the inception score (IS) increasing from 42.38 to 77.31 and the Fréchet inception distance (FID) decreasing from 25.09 to 12.42, indicating a notable improvement in the quality and authenticity of the generated data. To verify the effect of data augmentation on model performance, Mask R-CNN, SOLOv2, YOLOv8n, and OneFormer were evaluated on both the original and the augmented dataset; the performance gap between the two further confirms the effectiveness of the improved module.
(This article belongs to the Section Sensing and Imaging)
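The FID figures quoted above measure the Fréchet distance between Gaussian fits of real and generated image features (lower is better). The full metric requires a matrix square root of the feature covariances; under a diagonal-covariance simplification (an assumption made here purely for illustration) the trace term collapses and the distance becomes:

```python
import math

def fid_diagonal(mu1, var1, mu2, var2):
    """Fréchet distance between two Gaussians with diagonal covariances.
    This is a simplification of the full FID, which uses the full
    covariance of Inception features."""
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    # With diagonal covariances, Tr(S1 + S2 - 2 (S1 S2)^(1/2))
    # reduces to the sum of (sigma1 - sigma2)^2 per dimension.
    cov_term = sum((math.sqrt(v1) - math.sqrt(v2)) ** 2
                   for v1, v2 in zip(var1, var2))
    return mean_term + cov_term
```

Identical distributions give a distance of zero; any gap in means or spreads raises it, which is why a drop from 25.09 to 12.42 indicates generated features moving closer to the real distribution.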

20 pages, 2388 KB  
Article
The Spectrum Difference Enhanced Network for Hyperspectral Anomaly Detection
by Shaohua Liu, Huibo Guo, Shiwen Gao and Wuxia Zhang
Remote Sens. 2024, 16(23), 4518; https://doi.org/10.3390/rs16234518 - 2 Dec 2024
Cited by 1 | Viewed by 1805
Abstract
Most deep learning-based hyperspectral anomaly detection (HAD) methods focus on modeling or reconstructing the hyperspectral background to obtain residual maps from the original hyperspectral images. However, these methods typically pay too little attention to spectral similarity in complex environments, resulting in inadequate distinction between background and anomalies. Moreover, anomalies and background regions that are in fact different objects are sometimes recognized as objects with the same spectrum. To address these issues, this paper proposes a Spectrum Difference Enhanced Network (SDENet) for HAD, which employs variational mapping and a Transformer to amplify spectral differences. The proposed network follows an encoder–decoder structure comprising a CSWin-Transformer encoder, a Variational Mapping Module (VMModule), and a CSWin-Transformer decoder. First, the CSWin-Transformer encoder and decoder supplement image information by extracting deep semantic features, where a cross-shaped window self-attention mechanism provides strong modeling capability at minimal computational cost. Second, to enhance the spectral difference between anomalies and background, a randomly sampling VMModule is presented for feature-space transformation. Finally, all fully connected mapping operations are replaced with convolutional layers to reduce the model's parameters and computational load. The effectiveness of SDENet is verified on three datasets, and experimental results show that it achieves better detection accuracy and lower model complexity than existing methods.
(This article belongs to the Special Issue Artificial Intelligence Remote Sensing for Earth Observation)
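The VMModule's random sampling in feature space is in the spirit of the reparameterization trick used by variational methods: sample z = μ + σ·ε with ε ~ N(0, 1), so the stochastic step remains trainable with respect to μ and log σ². A generic sketch (the function name and shapes are illustrative, not the paper's API):

```python
import math
import random

def variational_map(mu, log_var, rng=random):
    """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, 1).
    In a real network this keeps sampling differentiable w.r.t. mu
    and log_var; here it is plain Python for illustration."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]
```

As the log-variance goes to negative infinity, the sample collapses onto the mean, so the module interpolates between deterministic mapping and noisy resampling of the feature space.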

18 pages, 3376 KB  
Article
Road Extraction Method of Remote Sensing Image Based on Deformable Attention Transformer
by Ling Zhao, Jianing Zhang, Xiujun Meng, Wenming Zhou, Zhenshi Zhang and Chengli Peng
Symmetry 2024, 16(4), 468; https://doi.org/10.3390/sym16040468 - 12 Apr 2024
Cited by 5 | Viewed by 2028
Abstract
Road extraction is a typical task in the semantic segmentation of remote sensing images, and one of the most effective techniques for this task in recent years is the vision transformer. However, roads typically exhibit uneven scales and low signal-to-noise ratios, which can be understood as asymmetry between the road and background categories and asymmetry between the transverse and longitudinal shape of the road. Existing vision transformer models, with their fixed sliding-window mechanisms, cannot adapt to the uneven scales of roads. Additionally, self-attention based on fully connected mechanisms over long sequences may suffer from attention deviation under heavy noise, making it unsuitable for low signal-to-noise road segmentation and yielding incomplete, fragmented results. In this paper, we propose a road extraction method based on deformable self-attention, termed DOCswin-Trans (Deformable and Overlapped Cross-Window Transformer), to solve these problems. On the one hand, we develop a DOC-Transformer block to address the scale-imbalance issue: its overlapped-window strategy preserves as much of the road's overall contextual semantic information as possible. On the other hand, we propose a deformable window strategy that adaptively resamples input vectors, automatically directing attention to road-relevant foreground areas and thereby addressing the low signal-to-noise ratio problem. We evaluate the proposed method on two popular road extraction datasets (DeepGlobe and Massachusetts). The experimental results demonstrate that it outperforms baseline methods, with IoU improvements ranging from 0.63% to 5.01% on DeepGlobe and from 0.50% to 6.24% on Massachusetts.
(This article belongs to the Section Computer)
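The overlapped-window strategy can be pictured in one dimension: windows of size `win` start every `stride` tokens, so when `stride < win` neighbouring windows share `win - stride` tokens and context is not cut off at window borders. A minimal sketch under those assumed parameter names:

```python
def overlapped_windows(length, win, stride):
    """1-D analogue of an overlapped window partition: windows of size
    `win` start every `stride` positions, so consecutive windows
    share `win - stride` tokens when stride < win."""
    starts = range(0, max(length - win, 0) + 1, stride)
    return [list(range(s, s + win)) for s in starts]
```

With `win=4, stride=2`, a thin road crossing a window boundary still appears whole in the overlapping neighbour, which is what lets long structures keep their context.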

21 pages, 2802 KB  
Article
Two-Level Spatio-Temporal Feature Fused Two-Stream Network for Micro-Expression Recognition
by Zebiao Wang, Mingyu Yang, Qingbin Jiao, Liang Xu, Bing Han, Yuhang Li and Xin Tan
Sensors 2024, 24(5), 1574; https://doi.org/10.3390/s24051574 - 29 Feb 2024
Cited by 28 | Viewed by 3048
Abstract
Micro-expressions, which are spontaneous and difficult to suppress, reveal a person’s true emotions. They are characterized by short duration and low intensity, making micro-expression recognition a challenging task in affective computing. In recent years, deep learning-based feature extraction and fusion techniques have been widely used for micro-expression recognition, and Vision Transformer-based methods in particular have gained popularity. However, the Vision Transformer architectures used in micro-expression recognition involve a significant amount of wasted computation. Additionally, in the traditional two-stream architecture, although separate streams are combined through late fusion, only the output features from the deepest level of the network are used for classification, which limits the network’s ability to capture subtle details for lack of fine-grained information. To address these issues, we propose a new two-stream architecture with two-level spatio-temporal feature fusion. It includes a spatial encoder (a modified ResNet) for learning facial texture features, a temporal encoder (Swin Transformer) for learning facial muscle motion features, a feature fusion algorithm that integrates multi-level spatio-temporal features, a classification head, and a weighted-average operator for temporal aggregation. The two-stream design extracts richer features than a single-stream one, leading to improved performance. The shifted-window scheme of the Swin Transformer restricts self-attention computation to non-overlapping local windows while allowing cross-window connections, significantly improving performance and reducing computation compared with the Vision Transformer; the modified ResNet is also computationally lighter. Our feature fusion algorithm exploits the matching output feature shapes at each stage of the two streams, enabling effective fusion of multi-level spatio-temporal features and improving both the F1 score and the UAR by approximately 4%. Comprehensive evaluations on three widely used spontaneous micro-expression datasets (SMIC-HS, CASME II, and SAMM) consistently demonstrate the superiority of our approach over comparative methods. Notably, our approach achieves a UAR exceeding 0.905 on CASME II, making it one of the few published micro-expression recognition frameworks to reach such performance.
(This article belongs to the Section Sensing and Imaging)
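The shifted-window scheme mentioned above alternates a plain window partition with one in which the token grid is cyclically shifted before being cut into non-overlapping windows, so tokens that sat on a window border in one layer share a window in the next. A toy sketch of the shifted partition (grid and window sizes are illustrative):

```python
def shifted_partition(h, w, win, shift):
    """Swin-style shifted windows: cyclically shift an h x w token grid
    by `shift`, then cut it into non-overlapping `win` x `win` windows.
    Returns a dict mapping window index -> original token coordinates."""
    windows = {}
    for i in range(h):
        for j in range(w):
            si, sj = (i + shift) % h, (j + shift) % w   # cyclic shift
            windows.setdefault((si // win, sj // win), []).append((i, j))
    return windows
```

Because the shift is cyclic, the windows stay the same size and attention cost stays linear in the number of tokens, yet information can flow across the old window boundaries.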

16 pages, 4480 KB  
Article
Win-Former: Window-Based Transformer for Maize Plant Point Cloud Semantic Segmentation
by Yu Sun, Xindong Guo and Hua Yang
Agronomy 2023, 13(11), 2723; https://doi.org/10.3390/agronomy13112723 - 29 Oct 2023
Cited by 10 | Viewed by 2058
Abstract
Semantic segmentation of plant point clouds is essential for high-throughput phenotyping systems, yet existing methods still struggle to balance efficiency and performance. Recently, the Transformer architecture has revolutionized computer vision and shows potential for processing 3D point clouds, but applying it to semantic segmentation of 3D plant point clouds remains a challenge. To this end, we propose a novel window-based Transformer (Win-Former) network for maize 3D organ segmentation. First, we pre-processed the Pheno4D maize point cloud dataset for training. The maize points were then projected onto a sphere surface, and a window partition mechanism was proposed to construct windows across which points are distributed evenly. After that, we employed local self-attention within windows to compute the relationships between points. To strengthen the connection between windows, we introduced a Cross-Window self-attention (C-SA) module that gathers cross-window features by moving entire windows along the sphere. The results demonstrate that Win-Former outperforms well-known networks, obtaining 83.45% mIoU on maize organ segmentation with the lowest latency of 31 s. Extensive experiments on ShapeNet to evaluate stability and robustness show that our model also achieves competitive results on part segmentation tasks. Thus, Win-Former segments maize point clouds effectively and efficiently and provides technical support for automated plant phenotyping analysis.
(This article belongs to the Special Issue Computer Vision and Deep Learning Technology in Agriculture)
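The sphere-surface window partition can be approximated by normalizing each point onto the unit sphere and bucketing it by latitude and longitude; moving the window grid along the sphere then corresponds to shifting these bins. A simplified stand-in for the paper's mechanism (the bin counts `n_lat`, `n_lon` are illustrative assumptions):

```python
import math

def sphere_window(point, n_lat, n_lon):
    """Project a 3-D point onto the unit sphere and assign it to a
    (latitude, longitude) window bin."""
    x, y, z = point
    r = math.sqrt(x * x + y * y + z * z) or 1.0
    x, y, z = x / r, y / r, z / r                  # unit-sphere projection
    lat = math.acos(max(-1.0, min(1.0, z)))        # polar angle in [0, pi]
    lon = math.atan2(y, x) % (2 * math.pi)         # azimuth in [0, 2*pi)
    return (min(int(lat / math.pi * n_lat), n_lat - 1),
            min(int(lon / (2 * math.pi) * n_lon), n_lon - 1))
```

Points in the same bin attend to one another locally; a C-SA-style step would then rotate the bin grid so neighbouring bins exchange features.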

25 pages, 9268 KB  
Article
UATNet: U-Shape Attention-Based Transformer Net for Meteorological Satellite Cloud Recognition
by Zhanjie Wang, Jianghua Zhao, Ran Zhang, Zheng Li, Qinghui Lin and Xuezhi Wang
Remote Sens. 2022, 14(1), 104; https://doi.org/10.3390/rs14010104 - 26 Dec 2021
Cited by 40 | Viewed by 5070
Abstract
Cloud recognition is a basic task in ground meteorological observation. Accurately identifying cloud types from long-time-series satellite cloud images is of great significance for improving the reliability and accuracy of weather forecasting. However, unlike ground-based cloud images, which cover a small observation range and are easy to work with, satellite cloud images cover a wider cloud area and contain more surface features. Hence, it is difficult for traditional deep learning methods to effectively extract the structural shape, area, contour, hue, shadow, and texture of clouds. To analyze regional cloud-type characteristics effectively, we construct a meteorological satellite cloud image dataset for the China region, named CRMSCD, which consists of nine cloud types plus clear sky (cloudless). In this paper, we propose a novel neural network model, UATNet, which realizes pixel-level classification of meteorological satellite cloud images. Our model efficiently integrates the spatial and multi-channel information of clouds. Specifically, several Transformer blocks with modified self-attention computation (Swin Transformer blocks) and patch merging operations build a hierarchical Transformer, and spatial displacement is introduced to construct long-distance cross-window connections. In addition, we introduce Channel Cross fusion with Transformer (CCT) to guide multi-scale channel fusion and design an Attention-based Squeeze and Excitation (ASE) module to effectively connect the fused multi-scale channel information to the decoder features. The experimental results demonstrate that the proposed model achieves 82.33% PA, 67.79% MPA, 54.51% MIoU, and 70.96% FWIoU on CRMSCD. Compared with existing models, our method produces more precise segmentation, demonstrating its superiority on meteorological satellite cloud recognition tasks.
(This article belongs to the Special Issue Deep Learning-Based Cloud Detection for Remote Sensing Images)
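The squeeze-and-excitation pattern underlying the ASE module pools each channel to a scalar, passes the pooled vector through a small two-layer bottleneck, and rescales the channels with sigmoid gates. A generic SE sketch in plain Python (the weights `w1`, `w2` stand in for learned parameters; this is the standard SE recipe, not the paper's ASE variant):

```python
import math

def squeeze_excite(channels, w1, w2):
    """channels: list of flattened feature maps, one list per channel.
    w1: reduction weights (hidden x channels); w2: expansion weights
    (channels x hidden). Returns the channel-rescaled features."""
    pooled = [sum(ch) / len(ch) for ch in channels]                  # squeeze
    hidden = [max(0.0, sum(p * w for p, w in zip(pooled, row)))      # ReLU
              for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(h * w for h, w in zip(hidden, row))))
             for row in w2]                                          # sigmoid
    return [[v * g for v in ch] for ch, g in zip(channels, gates)]
```

Each channel is scaled by a gate in (0, 1), so informative channels can be emphasized and uninformative ones suppressed before the features reach the decoder.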
