
Search Results (7)

Search Parameters:
Keywords = cross-shaped windows self-attention

17 pages, 2484 KB  
Article
A Polyp Segmentation Algorithm Based on Local Enhancement and Attention Mechanism
by Lanxi Fan and Yu Jiang
Mathematics 2025, 13(12), 1925; https://doi.org/10.3390/math13121925 - 9 Jun 2025
Viewed by 898
Abstract
Accurate polyp segmentation plays a vital role in the early detection and prevention of colorectal cancer. However, the diverse shapes, blurred boundaries, and varying sizes of polyps present significant challenges for automatic segmentation. Existing methods often struggle with effective local feature extraction and with modeling long-range dependencies. To overcome these limitations, this paper proposes PolypFormer, which incorporates a local information enhancement module (LIEM) that uses multi-kernel self-selective attention to better capture texture features, alongside dense channel attention for more effective feature fusion. Furthermore, a novel cross-shaped window self-attention mechanism is introduced and integrated into the Transformer architecture to enhance the semantic understanding of polyp regions. Experimental results on five datasets show that the proposed method performs well in polyp segmentation: on the Kvasir-SEG dataset, mDice and mIoU reach 0.920 and 0.886, respectively.
(This article belongs to the Special Issue Modern Methods and Applications Related to Integrable Systems)
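The cross-shaped window idea recurring throughout these results restricts each query's attention to the union of a horizontal and a vertical stripe through its position, rather than the full token grid. A toy sketch of the resulting receptive set, assuming a stripe width `sw` (this is an illustration of the general CSWin-style design, not the authors' code; real implementations typically split attention heads between the two stripe orientations):

```python
def cross_shaped_window(h, w, i, j, sw):
    """Positions a query at (i, j) attends to when attention is split
    between a horizontal stripe of height `sw` and a vertical stripe
    of width `sw` (union over the two stripe orientations)."""
    row_band = i // sw * sw          # top row of the horizontal stripe
    col_band = j // sw * sw          # left column of the vertical stripe
    horiz = {(r, c) for r in range(row_band, min(row_band + sw, h))
                    for c in range(w)}
    vert = {(r, c) for r in range(h)
                   for c in range(col_band, min(col_band + sw, w))}
    return horiz | vert
```

For an 8×8 grid with `sw = 2`, each query attends to 28 positions instead of 64, which is where the computational saving over full self-attention comes from.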

24 pages, 7057 KB  
Article
Construction and Enhancement of a Rural Road Instance Segmentation Dataset Based on an Improved StyleGAN2-ADA
by Zhixin Yao, Renna Xi, Taihong Zhang, Yunjie Zhao, Yongqiang Tian and Wenjing Hou
Sensors 2025, 25(8), 2477; https://doi.org/10.3390/s25082477 - 15 Apr 2025
Cited by 2 | Viewed by 682
Abstract
With the advancement of agricultural automation, the demand for road recognition and understanding in autonomous driving systems for agricultural machinery has significantly increased. To address the scarcity of instance segmentation data for rural roads and unstructured rural scenes, particularly the lack of support for high-resolution and fine-grained classification, a 20-class instance segmentation dataset was constructed, comprising 10,062 independently annotated instances. An improved StyleGAN2-ADA data augmentation method was proposed to generate higher-quality image data. This method incorporates a decoupled mapping network (DMN) to reduce the coupling of latent codes in W-space and combines the strengths of convolutional networks and Transformers through a convolutional coupling transfer block (CCTB). The cross-shaped window self-attention mechanism at the core of the CCTB enhances the network's ability to capture complex contextual information and spatial layouts. Ablation experiments comparing the improved and original StyleGAN2-ADA networks show significant gains, with the inception score (IS) increasing from 42.38 to 77.31 and the Fréchet inception distance (FID) decreasing from 25.09 to 12.42, indicating a notable improvement in the quality and authenticity of the generated data. To verify the effect of data augmentation on model performance, Mask R-CNN, SOLOv2, YOLOv8n, and OneFormer were evaluated on both the original and the augmented dataset; the performance gap between the two further confirms the effectiveness of the improved module.
(This article belongs to the Section Sensing and Imaging)
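The FID figures quoted above measure the Fréchet distance between Gaussian fits of real and generated image features (lower is better). The full metric requires a matrix square root of the feature covariances; under a diagonal-covariance simplification (an assumption made here purely for illustration) the trace term collapses and the distance becomes:

```python
import math

def fid_diagonal(mu1, var1, mu2, var2):
    """Fréchet distance between two Gaussians with diagonal covariances.
    This is a simplification of the full FID, which uses the full
    covariance of Inception features."""
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    # With diagonal covariances, Tr(S1 + S2 - 2 (S1 S2)^(1/2))
    # reduces to the sum of (sigma1 - sigma2)^2 per dimension.
    cov_term = sum((math.sqrt(v1) - math.sqrt(v2)) ** 2
                   for v1, v2 in zip(var1, var2))
    return mean_term + cov_term
```

Identical distributions give a distance of zero; any gap in means or spreads raises it, which is why a drop from 25.09 to 12.42 indicates generated features moving closer to the real distribution.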

20 pages, 2388 KB  
Article
The Spectrum Difference Enhanced Network for Hyperspectral Anomaly Detection
by Shaohua Liu, Huibo Guo, Shiwen Gao and Wuxia Zhang
Remote Sens. 2024, 16(23), 4518; https://doi.org/10.3390/rs16234518 - 2 Dec 2024
Cited by 1 | Viewed by 1805
Abstract
Most deep learning-based hyperspectral anomaly detection (HAD) methods focus on modeling or reconstructing the hyperspectral background to obtain residual maps from the original hyperspectral images. However, these methods typically pay too little attention to spectral similarity in complex environments, resulting in inadequate distinction between background and anomalies. Moreover, anomalies and background regions that are in fact different objects are sometimes recognized as objects with the same spectrum. To address these issues, this paper proposes a Spectrum Difference Enhanced Network (SDENet) for HAD, which employs variational mapping and a Transformer to amplify spectral differences. The proposed network follows an encoder–decoder structure comprising a CSWin-Transformer encoder, a Variational Mapping Module (VMModule), and a CSWin-Transformer decoder. First, the CSWin-Transformer encoder and decoder supplement image information by extracting deep semantic features, where a cross-shaped window self-attention mechanism provides strong modeling capability at minimal computational cost. Second, to enhance the spectral difference between anomalies and background, a randomly sampling VMModule is presented for feature-space transformation. Finally, all fully connected mapping operations are replaced with convolutional layers to reduce the model's parameters and computational load. The effectiveness of SDENet is verified on three datasets, and experimental results show that it achieves better detection accuracy and lower model complexity than existing methods.
(This article belongs to the Special Issue Artificial Intelligence Remote Sensing for Earth Observation)
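The VMModule's random sampling in feature space is in the spirit of the reparameterization trick used by variational methods: sample z = μ + σ·ε with ε ~ N(0, 1), so the stochastic step remains trainable with respect to μ and log σ². A generic sketch (the function name and shapes are illustrative, not the paper's API):

```python
import math
import random

def variational_map(mu, log_var, rng=random):
    """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, 1).
    In a real network this keeps sampling differentiable w.r.t. mu
    and log_var; here it is plain Python for illustration."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]
```

As the log-variance goes to negative infinity, the sample collapses onto the mean, so the module interpolates between deterministic mapping and noisy resampling of the feature space.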

18 pages, 3376 KB  
Article
Road Extraction Method of Remote Sensing Image Based on Deformable Attention Transformer
by Ling Zhao, Jianing Zhang, Xiujun Meng, Wenming Zhou, Zhenshi Zhang and Chengli Peng
Symmetry 2024, 16(4), 468; https://doi.org/10.3390/sym16040468 - 12 Apr 2024
Cited by 5 | Viewed by 2028
Abstract
Road extraction is a typical task in the semantic segmentation of remote sensing images, and one of the most effective techniques for this task in recent years is the vision transformer. However, roads typically exhibit uneven scales and low signal-to-noise ratios, which can be understood as asymmetry between the road and background categories and asymmetry between the transverse and longitudinal shape of the road. Existing vision transformer models, with their fixed sliding-window mechanisms, cannot adapt to the uneven scales of roads. Additionally, self-attention based on fully connected mechanisms over long sequences may suffer from attention deviation under heavy noise, making it unsuitable for low signal-to-noise road segmentation and yielding incomplete, fragmented results. In this paper, we propose a road extraction method based on deformable self-attention, termed DOCswin-Trans (Deformable and Overlapped Cross-Window Transformer), to solve these problems. On the one hand, we develop a DOC-Transformer block to address the scale-imbalance issue: its overlapped-window strategy preserves as much of the road's overall contextual semantic information as possible. On the other hand, we propose a deformable window strategy that adaptively resamples input vectors, automatically directing attention to road-relevant foreground areas and thereby addressing the low signal-to-noise ratio problem. We evaluate the proposed method on two popular road extraction datasets (DeepGlobe and Massachusetts). The experimental results demonstrate that it outperforms baseline methods, with IoU improvements ranging from 0.63% to 5.01% on DeepGlobe and from 0.50% to 6.24% on Massachusetts.
(This article belongs to the Section Computer)
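The overlapped-window strategy can be pictured in one dimension: windows of size `win` start every `stride` tokens, so when `stride < win` neighbouring windows share `win - stride` tokens and context is not cut off at window borders. A minimal sketch under those assumed parameter names:

```python
def overlapped_windows(length, win, stride):
    """1-D analogue of an overlapped window partition: windows of size
    `win` start every `stride` positions, so consecutive windows
    share `win - stride` tokens when stride < win."""
    starts = range(0, max(length - win, 0) + 1, stride)
    return [list(range(s, s + win)) for s in starts]
```

With `win=4, stride=2`, a thin road crossing a window boundary still appears whole in the overlapping neighbour, which is what lets long structures keep their context.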

21 pages, 2802 KB  
Article
Two-Level Spatio-Temporal Feature Fused Two-Stream Network for Micro-Expression Recognition
by Zebiao Wang, Mingyu Yang, Qingbin Jiao, Liang Xu, Bing Han, Yuhang Li and Xin Tan
Sensors 2024, 24(5), 1574; https://doi.org/10.3390/s24051574 - 29 Feb 2024
Cited by 28 | Viewed by 3048
Abstract
Micro-expressions, which are spontaneous and difficult to suppress, reveal a person’s true emotions. They are characterized by short duration and low intensity, making micro-expression recognition a challenging task in affective computing. In recent years, deep learning-based feature extraction and fusion techniques have been widely used for micro-expression recognition, and Vision Transformer-based methods in particular have gained popularity. However, the Vision Transformer architectures used in micro-expression recognition involve a significant amount of wasted computation. Additionally, in the traditional two-stream architecture, although separate streams are combined through late fusion, only the output features from the deepest level of the network are used for classification, which limits the network’s ability to capture subtle details for lack of fine-grained information. To address these issues, we propose a new two-stream architecture with two-level spatio-temporal feature fusion. It includes a spatial encoder (a modified ResNet) for learning facial texture features, a temporal encoder (Swin Transformer) for learning facial muscle motion features, a feature fusion algorithm that integrates multi-level spatio-temporal features, a classification head, and a weighted-average operator for temporal aggregation. The two-stream design extracts richer features than a single-stream one, leading to improved performance. The shifted-window scheme of the Swin Transformer restricts self-attention computation to non-overlapping local windows while allowing cross-window connections, significantly improving performance and reducing computation compared with the Vision Transformer; the modified ResNet is also computationally lighter. Our feature fusion algorithm exploits the matching output feature shapes at each stage of the two streams, enabling effective fusion of multi-level spatio-temporal features and improving both the F1 score and the UAR by approximately 4%. Comprehensive evaluations on three widely used spontaneous micro-expression datasets (SMIC-HS, CASME II, and SAMM) consistently demonstrate the superiority of our approach over comparative methods. Notably, our approach achieves a UAR exceeding 0.905 on CASME II, making it one of the few published micro-expression recognition frameworks to reach such performance.
(This article belongs to the Section Sensing and Imaging)
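The shifted-window scheme mentioned above alternates a plain window partition with one in which the token grid is cyclically shifted before being cut into non-overlapping windows, so tokens that sat on a window border in one layer share a window in the next. A toy sketch of the shifted partition (grid and window sizes are illustrative):

```python
def shifted_partition(h, w, win, shift):
    """Swin-style shifted windows: cyclically shift an h x w token grid
    by `shift`, then cut it into non-overlapping `win` x `win` windows.
    Returns a dict mapping window index -> original token coordinates."""
    windows = {}
    for i in range(h):
        for j in range(w):
            si, sj = (i + shift) % h, (j + shift) % w   # cyclic shift
            windows.setdefault((si // win, sj // win), []).append((i, j))
    return windows
```

Because the shift is cyclic, the windows stay the same size and attention cost stays linear in the number of tokens, yet information can flow across the old window boundaries.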

16 pages, 4480 KB  
Article
Win-Former: Window-Based Transformer for Maize Plant Point Cloud Semantic Segmentation
by Yu Sun, Xindong Guo and Hua Yang
Agronomy 2023, 13(11), 2723; https://doi.org/10.3390/agronomy13112723 - 29 Oct 2023
Cited by 10 | Viewed by 2058
Abstract
Semantic segmentation of plant point clouds is essential for high-throughput phenotyping systems, yet existing methods still struggle to balance efficiency and performance. Recently, the Transformer architecture has revolutionized computer vision and shows potential for processing 3D point clouds, but applying it to semantic segmentation of 3D plant point clouds remains a challenge. To this end, we propose a novel window-based Transformer (Win-Former) network for maize 3D organ segmentation. First, we pre-processed the Pheno4D maize point cloud dataset for training. The maize points were then projected onto a sphere surface, and a window partition mechanism was proposed to construct windows across which points are distributed evenly. After that, we employed local self-attention within windows to compute the relationships between points. To strengthen the connection between windows, we introduced a Cross-Window self-attention (C-SA) module that gathers cross-window features by moving entire windows along the sphere. The results demonstrate that Win-Former outperforms well-known networks, obtaining 83.45% mIoU on maize organ segmentation with the lowest latency of 31 s. Extensive experiments on ShapeNet to evaluate stability and robustness show that our model also achieves competitive results on part segmentation tasks. Thus, Win-Former segments maize point clouds effectively and efficiently and provides technical support for automated plant phenotyping analysis.
(This article belongs to the Special Issue Computer Vision and Deep Learning Technology in Agriculture)
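The sphere-surface window partition can be approximated by normalizing each point onto the unit sphere and bucketing it by latitude and longitude; moving the window grid along the sphere then corresponds to shifting these bins. A simplified stand-in for the paper's mechanism (the bin counts `n_lat`, `n_lon` are illustrative assumptions):

```python
import math

def sphere_window(point, n_lat, n_lon):
    """Project a 3-D point onto the unit sphere and assign it to a
    (latitude, longitude) window bin."""
    x, y, z = point
    r = math.sqrt(x * x + y * y + z * z) or 1.0
    x, y, z = x / r, y / r, z / r                  # unit-sphere projection
    lat = math.acos(max(-1.0, min(1.0, z)))        # polar angle in [0, pi]
    lon = math.atan2(y, x) % (2 * math.pi)         # azimuth in [0, 2*pi)
    return (min(int(lat / math.pi * n_lat), n_lat - 1),
            min(int(lon / (2 * math.pi) * n_lon), n_lon - 1))
```

Points in the same bin attend to one another locally; a C-SA-style step would then rotate the bin grid so neighbouring bins exchange features.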

25 pages, 9268 KB  
Article
UATNet: U-Shape Attention-Based Transformer Net for Meteorological Satellite Cloud Recognition
by Zhanjie Wang, Jianghua Zhao, Ran Zhang, Zheng Li, Qinghui Lin and Xuezhi Wang
Remote Sens. 2022, 14(1), 104; https://doi.org/10.3390/rs14010104 - 26 Dec 2021
Cited by 40 | Viewed by 5070
Abstract
Cloud recognition is a basic task in ground meteorological observation. Accurately identifying cloud types from long-time-series satellite cloud images is of great significance for improving the reliability and accuracy of weather forecasting. However, unlike ground-based cloud images, which cover a small observation range and are easy to work with, satellite cloud images cover a wider cloud area and contain more surface features. Hence, it is difficult for traditional deep learning methods to effectively extract the structural shape, area, contour, hue, shadow, and texture of clouds. To analyze regional cloud-type characteristics effectively, we construct a meteorological satellite cloud image dataset for the China region, named CRMSCD, which consists of nine cloud types plus clear sky (cloudless). In this paper, we propose a novel neural network model, UATNet, which realizes pixel-level classification of meteorological satellite cloud images. Our model efficiently integrates the spatial and multi-channel information of clouds. Specifically, several Transformer blocks with modified self-attention computation (Swin Transformer blocks) and patch merging operations build a hierarchical Transformer, and spatial displacement is introduced to construct long-distance cross-window connections. In addition, we introduce Channel Cross fusion with Transformer (CCT) to guide multi-scale channel fusion and design an Attention-based Squeeze and Excitation (ASE) module to effectively connect the fused multi-scale channel information to the decoder features. The experimental results demonstrate that the proposed model achieves 82.33% PA, 67.79% MPA, 54.51% MIoU, and 70.96% FWIoU on CRMSCD. Compared with existing models, our method produces more precise segmentation, demonstrating its superiority on meteorological satellite cloud recognition tasks.
(This article belongs to the Special Issue Deep Learning-Based Cloud Detection for Remote Sensing Images)
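The squeeze-and-excitation pattern underlying the ASE module pools each channel to a scalar, passes the pooled vector through a small two-layer bottleneck, and rescales the channels with sigmoid gates. A generic SE sketch in plain Python (the weights `w1`, `w2` stand in for learned parameters; this is the standard SE recipe, not the paper's ASE variant):

```python
import math

def squeeze_excite(channels, w1, w2):
    """channels: list of flattened feature maps, one list per channel.
    w1: reduction weights (hidden x channels); w2: expansion weights
    (channels x hidden). Returns the channel-rescaled features."""
    pooled = [sum(ch) / len(ch) for ch in channels]                  # squeeze
    hidden = [max(0.0, sum(p * w for p, w in zip(pooled, row)))      # ReLU
              for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(h * w for h, w in zip(hidden, row))))
             for row in w2]                                          # sigmoid
    return [[v * g for v in ch] for ch, g in zip(channels, gates)]
```

Each channel is scaled by a gate in (0, 1), so informative channels can be emphasized and uninformative ones suppressed before the features reach the decoder.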
