Search Results (113)

Search Parameters:
Keywords = channel-wise and spatial attention

28 pages, 19790 KiB  
Article
HSF-DETR: A Special Vehicle Detection Algorithm Based on Hypergraph Spatial Features and Bipolar Attention
by Kaipeng Wang, Guanglin He and Xinmin Li
Sensors 2025, 25(14), 4381; https://doi.org/10.3390/s25144381 - 13 Jul 2025
Abstract
Special vehicle detection in intelligent surveillance, emergency rescue, and reconnaissance faces significant challenges in accuracy and robustness under complex environments, necessitating advanced detection algorithms for critical applications. This paper proposes HSF-DETR (Hypergraph Spatial Feature DETR), integrating four innovative modules: a Cascaded Spatial Feature Network (CSFNet) backbone with Cross-Efficient Convolutional Gating (CECG) for enhanced long-range detection through hybrid state-space modeling; a Hypergraph-Enhanced Spatial Feature Modulation (HyperSFM) network utilizing hypergraph structures for high-order feature correlations and adaptive multi-scale fusion; a Dual-Domain Feature Encoder (DDFE) combining Bipolar Efficient Attention (BEA) and Frequency-Enhanced Feed-Forward Network (FEFFN) for precise feature weight allocation; and a Spatial-Channel Fusion Upsampling Block (SCFUB) improving feature fidelity through depth-wise separable convolution and channel shift mixing. Experiments conducted on a self-built special vehicle dataset containing 2388 images demonstrate that HSF-DETR achieves mAP50 and mAP50-95 of 96.6% and 70.6%, respectively, representing improvements of 3.1% and 4.6% over baseline RT-DETR while maintaining computational efficiency at 59.7 GFLOPs and 18.07 M parameters. Cross-domain validation on VisDrone2019 and BDD100K datasets confirms the method’s generalization capability and robustness across diverse scenarios, establishing HSF-DETR as an effective solution for special vehicle detection in complex environments. Full article
(This article belongs to the Section Sensing and Imaging)

19 pages, 14033 KiB  
Article
SCCA-YOLO: Spatial Channel Fusion and Context-Aware YOLO for Lunar Crater Detection
by Jiahao Tang, Boyuan Gu, Tianyou Li and Ying-Bo Lu
Remote Sens. 2025, 17(14), 2380; https://doi.org/10.3390/rs17142380 - 10 Jul 2025
Abstract
Lunar crater detection plays a crucial role in geological analysis and the advancement of lunar exploration. Accurate identification of craters is also essential for constructing high-resolution topographic maps and supporting mission planning in future lunar exploration efforts. However, lunar craters often suffer from insufficient feature representation due to their small size and blurred boundaries. In addition, the visual similarity between craters and surrounding terrain further exacerbates background confusion. These challenges significantly hinder detection performance in remote sensing imagery and underscore the necessity of enhancing both local feature representation and global semantic reasoning. In this paper, we propose a novel Spatial Channel Fusion and Context-Aware YOLO (SCCA-YOLO) model built upon the YOLO11 framework. Specifically, the Context-Aware Module (CAM) employs a multi-branch dilated convolutional structure to enhance feature richness and expand the local receptive field, thereby strengthening the feature extraction capability. The Joint Spatial and Channel Fusion Module (SCFM) is utilized to fuse spatial and channel information to model the global relationships between craters and the background, effectively suppressing background noise and reinforcing feature discrimination. In addition, the improved Channel Attention Concatenation (CAC) strategy adaptively learns channel-wise importance weights during feature concatenation, further optimizing multi-scale semantic feature fusion and enhancing the model’s sensitivity to critical crater features. The proposed method is validated on a self-constructed Chang’e 6 dataset, covering the landing site and its surrounding areas. Experimental results demonstrate that our model achieves an mAP0.5 of 96.5% and an mAP0.5:0.95 of 81.5%, outperforming other mainstream detection models including the YOLO family of algorithms. These findings highlight the potential of SCCA-YOLO for high-precision lunar crater detection and provide valuable insights into future lunar surface analysis. Full article

21 pages, 5895 KiB  
Article
Improved YOLO-Based Pulmonary Nodule Detection with Spatial-SE Attention and an Aspect Ratio Penalty
by Xinhang Song, Haoran Xie, Tianding Gao, Nuo Cheng and Jianping Gou
Sensors 2025, 25(14), 4245; https://doi.org/10.3390/s25144245 - 8 Jul 2025
Abstract
The accurate identification of pulmonary nodules is critical for the early diagnosis of lung diseases; however, this task remains challenging due to inadequate feature representation and limited localization sensitivity. Current methodologies often utilize channel attention mechanisms and intersection over union (IoU)-based loss functions. Yet, they frequently overlook spatial context and struggle to capture subtle variations in aspect ratios, which hinders their ability to detect small objects. In this study, we introduce an improved YOLOv11 framework that addresses these limitations through two primary components: a spatial squeeze-and-excitation (SSE) module that concurrently models channel-wise and spatial attention to enhance the discriminative features pertinent to nodules, and an explicit aspect ratio penalty IoU (EAPIoU) loss that imposes a direct penalty on the squared differences in aspect ratios to refine the bounding box regression process. Comprehensive experiments conducted on the LUNA16, LungCT, and Node21 datasets reveal that our approach achieves superior precision, recall, and mean average precision (mAP) across various IoU thresholds, surpassing previous state-of-the-art methods while maintaining computational efficiency. Specifically, the proposed SSE module achieves a precision of 0.781 on LUNA16, while the EAPIoU loss boosts mAP@50 to 92.4% on LungCT, outperforming mainstream attention mechanisms and IoU-based loss functions. These findings underscore the effectiveness of integrating spatially aware attention mechanisms with aspect ratio-sensitive loss functions for robust nodule detection. Full article
(This article belongs to the Section Biomedical Sensors)
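The abstract above states only that EAPIoU penalizes "the squared differences in aspect ratios"; the exact formulation is not given here. A minimal sketch, assuming a simple `1 - IoU + λ·(Δ aspect ratio)²` combination (the function names and the λ weighting are illustrative, not the paper's):

```python
def iou(a, b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def eap_iou_loss(pred, target, lam=1.0, eps=1e-9):
    """Hypothetical EAPIoU-style loss: 1 - IoU plus a squared
    aspect-ratio difference penalty weighted by lam."""
    ar = lambda r: (r[2] - r[0]) / (r[3] - r[1] + eps)
    return (1.0 - iou(pred, target)) + lam * (ar(pred) - ar(target)) ** 2
```

Unlike plain IoU loss, two boxes with identical overlap but different shapes receive different losses, which is the sensitivity to aspect ratio the abstract motivates.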

20 pages, 7167 KiB  
Article
FM-Net: Frequency-Aware Masked-Attention Network for Infrared Small Target Detection
by Yongxian Liu, Zaiping Lin, Boyang Li, Ting Liu and Wei An
Remote Sens. 2025, 17(13), 2264; https://doi.org/10.3390/rs17132264 - 1 Jul 2025
Abstract
Infrared small target detection (IRSTD) aims to locate and separate targets from complex backgrounds. The challenges in IRSTD primarily come from extremely sparse target features and strong background clutter interference. However, existing methods typically perform discrimination directly on the features extracted by deep networks, neglecting the distinct characteristics of weak and small targets in the frequency domain, thereby limiting the improvement of detection capability. In this paper, we propose a frequency-aware masked-attention network (FM-Net) that leverages multi-scale frequency clues to assist in representing global context and suppressing noise interference. Specifically, we design the wavelet residual block (WRB) to extract multi-scale spatial and frequency features, which introduces a wavelet pyramid as the intermediate layer of the residual block. Then, to perceive global information on the long-range skip connections, a frequency-modulation masked-attention module (FMM) is used to interact with multi-layer features from the encoder. FMM contains two crucial elements: (a) a mask attention (MA) mechanism for injecting broad contextual features efficiently to promote full-level semantic correlation and focus on salient regions, and (b) a channel-wise frequency modulation module (CFM) for enhancing the most informative frequency components and suppressing useless ones. Extensive experiments on three benchmark datasets (SIRST, NUDT-SIRST, and IRSTD-1k) demonstrate that FM-Net achieves superior detection performance. Full article
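The wavelet decomposition underlying a block like WRB can be illustrated with a single-level 2D Haar transform (the abstract does not name the wavelet family FM-Net uses; Haar is just the simplest choice). The low-pass band keeps the smooth background while the detail bands carry the high-frequency content where small targets tend to stand out:

```python
import numpy as np

def haar_dwt2(x):
    """Single-level orthonormal 2D Haar transform of an even-sized
    image, returning (LL, LH, HL, HH) sub-bands at half resolution."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # low-frequency approximation
    lh = (a + b - c - d) / 2.0   # detail sub-bands
    hl = (a - b + c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    return ll, lh, hl, hh
```

Because the transform is orthonormal, the sub-bands preserve the input's energy, and a perfectly flat image produces zero detail coefficients.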

18 pages, 3974 KiB  
Article
LKD-YOLOv8: A Lightweight Knowledge Distillation-Based Method for Infrared Object Detection
by Xiancheng Cao, Yueli Hu and Haikun Zhang
Sensors 2025, 25(13), 4054; https://doi.org/10.3390/s25134054 - 29 Jun 2025
Abstract
Currently, infrared object detection is utilized in a broad spectrum of fields, including military applications, security, and aerospace. Nonetheless, the limited computational power of edge devices presents a considerable challenge in achieving an optimal balance between accuracy and computational efficiency in infrared object detection. In order to enhance the accuracy of infrared target detection and strengthen the implementation of robust models on edge platforms for rapid real-time inference, this paper presents LKD-YOLOv8, an innovative infrared object detection method that integrates YOLOv8 architecture with masked generative distillation (MGD), further augmented by the lightweight convolution design and attention mechanism for improved feature adaptability. Linear deformable convolution (LDConv) strengthens spatial feature extraction by dynamically adjusting kernel offsets, while coordinate attention (CA) refines feature alignment through channel-wise interaction. We employ a large-scale model (YOLOv8s) as the teacher to impart knowledge and supervise the training of a compact student model (YOLOv8n). Experiments show that LKD-YOLOv8 achieves a 1.18% mAP@0.5:0.95 improvement over baseline methods while reducing the parameter size by 7.9%. Our approach effectively balances accuracy and efficiency, rendering it applicable for resource-constrained edge devices in infrared scenarios. Full article
(This article belongs to the Section Sensing and Imaging)
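LKD-YOLOv8 distills at the feature level via MGD, which is not reproduced here; as a hedged illustration of the general teacher-supervises-student idea, this is the classic temperature-scaled logit distillation term (Hinton-style), not the paper's MGD loss:

```python
import numpy as np

def softmax(z, t=1.0):
    z = np.asarray(z, dtype=float) / t
    e = np.exp(z - z.max())           # subtract max for stability
    return e / e.sum()

def distill_kl(student_logits, teacher_logits, t=4.0):
    """KL(teacher || student) on temperature-softened distributions.
    The t*t factor keeps gradient magnitudes comparable across
    temperatures."""
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    return float(np.sum(p * (np.log(p) - np.log(q))) * t * t)
```

The term is zero exactly when the student reproduces the teacher's softened distribution and positive otherwise, so minimizing it pulls the compact model toward the larger one.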

13 pages, 778 KiB  
Article
Tunnel Lining Recognition and Thickness Estimation via Optical Image to Radar Image Transfer Learning
by Chuan Li, Tong Pu, Nianbiao Cai, Xi Yang, Hao Liu and Lulu Wang
Appl. Sci. 2025, 15(13), 7306; https://doi.org/10.3390/app15137306 - 28 Jun 2025
Abstract
The secondary lining of a tunnel is a critical load-bearing component, whose stability and structural integrity are essential for ensuring the overall safety of the tunnel. However, identifying lining structures and estimating their thickness using ground-penetrating radar (GPR) remain challenging due to several inherent limitations. First, the limited electromagnetic contrast between the primary and secondary linings results in weak interface reflections in GPR imagery, thereby hindering accurate delineation. Second, construction errors such as over-excavation or under-excavation often lead to complex interface geometries, further complicating the interpretation of GPR signals. To address these challenges, we propose an enhanced YOLOv8-seg network capable of performing pixel-level segmentation on GPR images to accurately delineate secondary lining regions and estimate their thickness. The model integrates a convolutional block attention module (CBAM) to refine feature extraction by emphasizing critical characteristics of the two interface layers through channel-wise and spatial attention mechanisms. The model is first pretrained on the COCO dataset and subsequently fine-tuned via transfer learning using a hybrid GPR dataset comprising real-world measurements and numerically simulated data based on forward modeling. Finally, the model is validated on real-world GPR measurements acquired from the Longhai tunnel. Experimental results demonstrate that the proposed method reliably identifies secondary tunnel linings and accurately estimates their average thickness. Full article
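CBAM, as used above, applies channel attention followed by spatial attention. A minimal NumPy sketch of that data flow, with fixed sums standing in for CBAM's learned shared MLP and 7×7 convolution (so this shows the mechanism's shape, not trained behavior):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x):
    """x: (C, H, W). Per-channel gates from global avg- and max-pooled
    descriptors (a plain sum replaces CBAM's shared MLP)."""
    w = sigmoid(x.mean(axis=(1, 2)) + x.max(axis=(1, 2)))
    return x * w[:, None, None]

def spatial_attention(x):
    """Per-pixel gates from channel-wise avg/max maps (a plain sum
    replaces CBAM's 7x7 convolution)."""
    w = sigmoid(x.mean(axis=0) + x.max(axis=0))
    return x * w[None, :, :]

def cbam(x):
    # CBAM order: refine channels first, then spatial locations
    return spatial_attention(channel_attention(x))
```

Both gates lie in (0, 1), so the module re-weights features rather than creating new ones; the learned versions decide which channels and pixels to emphasize.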

20 pages, 2599 KiB  
Article
Efficient Smoke Segmentation Using Multiscale Convolutions and Multiview Attention Mechanisms
by Xuesong Liu and Emmett J. Ientilucci
Electronics 2025, 14(13), 2593; https://doi.org/10.3390/electronics14132593 - 27 Jun 2025
Abstract
Efficient segmentation of smoke plumes is crucial for environmental monitoring and industrial safety. Existing models often face high computational demands and limited adaptability to diverse smoke appearances. To address these issues, we propose SmokeNet, a deep learning architecture integrating multiscale convolutions, multiview linear attention, and layer-specific loss functions. Specifically, multiscale convolutions capture diverse smoke shapes by employing varying kernel sizes optimized for different plume orientations. Subsequently, multiview linear attention emphasizes spatial and channel-wise features relevant to smoke segmentation tasks. Additionally, layer-specific loss functions promote consistent feature refinement across network layers, facilitating accurate and robust segmentation. SmokeNet achieves a segmentation accuracy of 72.74% mean Intersection over Union (mIoU) on our newly introduced quarry blast smoke dataset and maintains comparable performance on three benchmark smoke datasets, reaching up to 76.45% mIoU on the Smoke100k dataset. With a computational complexity of only 0.34 M parameters and 0.07 Giga Floating Point Operations (GFLOPs), SmokeNet is suitable for real-time applications. Evaluations conducted across these datasets demonstrate SmokeNet’s effectiveness and versatility in handling complex real-world scenarios. Full article
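The abstract does not detail SmokeNet's "multiview linear attention"; as a hedged sketch of the efficiency property such modules rely on, here is standard softmax-free linear attention, which computes `phi(Q)(phi(K)^T V)` so the cost grows linearly in the number of positions rather than quadratically:

```python
import numpy as np

def linear_attention(q, k, v, eps=1e-6):
    """O(n) attention. phi(x) = elu(x) + 1 keeps the feature map
    positive, so each output row is (up to eps) a convex combination
    of the value rows."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    qp, kp = phi(q), phi(k)
    kv = kp.T @ v                      # (d, d_v): cost independent of n
    z = qp @ kp.sum(axis=0)            # per-query normalizer, shape (n,)
    return (qp @ kv) / (z[:, None] + eps)
```

The small `kv` summary is what lets the computational complexity stay low, consistent with the 0.07 GFLOPs budget the abstract reports for the whole model.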

27 pages, 5780 KiB  
Article
Utilizing GCN-Based Deep Learning for Road Extraction from Remote Sensing Images
by Yu Jiang, Jiasen Zhao, Wei Luo, Bincheng Guo, Zhulin An and Yongjun Xu
Sensors 2025, 25(13), 3915; https://doi.org/10.3390/s25133915 - 23 Jun 2025
Abstract
The technology of road extraction serves as a crucial foundation for urban intelligent renewal and green sustainable development. Its outcomes can optimize transportation network planning, reduce resource waste, and enhance urban resilience. Deep learning-based approaches have demonstrated outstanding performance in road extraction, particularly excelling in complex scenarios. However, extracting roads from remote sensing data remains challenging due to several factors that limit accuracy: (1) Roads often share similar visual features with the background, such as rooftops and parking lots, leading to ambiguous inter-class distinctions; (2) Roads in complex environments, such as those occluded by shadows or trees, are difficult to detect. To address these issues, this paper proposes an improved model based on Graph Convolutional Networks (GCNs), named FR-SGCN (Hierarchical Depth-wise Separable Graph Convolutional Network Incorporating Graph Reasoning and Attention Mechanisms). The model is designed to enhance the precision and robustness of road extraction through intelligent techniques, thereby supporting precise planning of green infrastructure. First, high-dimensional features are extracted using ResNeXt, whose grouped convolution structure balances parameter efficiency and feature representation capability, significantly enhancing the expressiveness of the data. These high-dimensional features are then segmented, and enhanced channel and spatial features are obtained via attention mechanisms, effectively mitigating background interference and intra-class ambiguity. Subsequently, a hybrid adjacency matrix construction method is proposed, based on gradient operators and graph reasoning. This method integrates similarity and gradient information and employs graph convolution to capture the global contextual relationships among features. To validate the effectiveness of FR-SGCN, we conducted comparative experiments using 12 different methods on both a self-built dataset and a public dataset. The proposed model achieved the highest F1 score on both datasets. Visualization results from the experiments demonstrate that the model effectively extracts occluded roads and reduces the risk of redundant construction caused by data errors during urban renewal. This provides reliable technical support for smart cities and sustainable development. Full article
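The hybrid adjacency construction is FR-SGCN's contribution and is not reproduced here, but the graph-convolution step it feeds can be sketched with the standard symmetric-normalized propagation rule (Kipf and Welling style), shown under the assumption of a small dense adjacency matrix:

```python
import numpy as np

def gcn_layer(adj, x, w):
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W).
    adj: (n, n) adjacency, x: (n, f) node features, w: (f, f_out)."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ x @ w, 0.0)        # ReLU activation
```

Each node's output mixes its own features with its neighbors', which is how graph convolution captures the global contextual relationships the abstract describes.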

17 pages, 35033 KiB  
Article
A Multi-Branch Attention Fusion Method for Semantic Segmentation of Remote Sensing Images
by Kaibo Li, Zhenping Qiang, Hong Lin and Xiaorui Wang
Remote Sens. 2025, 17(11), 1898; https://doi.org/10.3390/rs17111898 - 30 May 2025
Abstract
In recent years, advancements in remote sensing image observation technology have significantly enriched the surface feature information captured in remote sensing images, posing greater challenges for semantic information extraction from remote sensing imagery. While convolutional neural networks (CNNs) excel at understanding relationships between adjacent image regions, processing multidimensional data requires reliance on attention mechanisms. However, due to the inherent complexity of remote sensing images, most attention mechanisms designed for natural images underperform when applied to remote sensing data. To address these challenges in remote sensing image semantic segmentation, we propose a highly generalizable multi-branch attention fusion method based on shallow and deep features. This approach applies pixel-wise, spatial, and channel attention mechanisms to feature maps fused with shallow and deep features, thereby enhancing the network’s semantic information extraction capability. Through evaluations on the Cityscapes, LoveDA, and WHDLD datasets, we validate the performance of our method in processing remote sensing data. The results demonstrate consistent improvements in segmentation accuracy across most categories, highlighting its strong generalization capability. Specifically, compared to baseline methods, our approach achieves average mIoU improvements of 0.42% and 0.54% on the WHDLD and LoveDA datasets, respectively, significantly enhancing network performance in complex remote sensing scenarios. Full article

19 pages, 7025 KiB  
Article
CDWMamba: Cloud Detection with Wavelet-Enhanced Mamba for Optical Satellite Imagery
by Shiyao Meng, Wei Gong, Siwei Li, Ge Song, Jie Yang and Yu Ding
Remote Sens. 2025, 17(11), 1874; https://doi.org/10.3390/rs17111874 - 28 May 2025
Abstract
Accurate cloud detection is a critical preprocessing step in remote sensing applications, as cloud and cloud shadow contamination can significantly degrade the quality of optical satellite imagery. In this paper, we propose CDWMamba, a novel dual-domain neural network that integrates the Mamba-based state space model with discrete wavelet transform (DWT) for effective cloud detection. CDWMamba adopts a four-direction Mamba module to capture long-range dependencies, while the wavelet decomposition enables multi-scale global context modeling in the frequency domain. To further enhance fine-grained spatial features, we incorporate a multi-scale depth-wise separable convolution (MDC) module for spatial detail refinement. Additionally, a spectral–spatial bottleneck (SSN) with channel-wise attention is introduced to promote inter-band information interaction across multi-spectral inputs. We evaluate our method on two benchmark datasets, L8 Biome and S2_CMC, covering diverse land cover types and environmental conditions. Experimental results demonstrate that CDWMamba achieves state-of-the-art performance across multiple metrics, significantly outperforming deep-learning-based baselines in terms of overall accuracy, mIoU, precision, and recall. Moreover, the model exhibits satisfactory performance under challenging conditions such as snow/ice and shrubland surfaces. These results verify the effectiveness of combining a state space model, frequency-domain representation, and spectral–spatial attention for cloud detection in multi-spectral remote sensing imagery. Full article

17 pages, 11290 KiB  
Article
Learning to Utilize Multi-Scale Feature Information for Crisp Power Line Detection
by Kai Li, Min Liu, Feiran Wang, Xinyang Guo, Geng Han, Xiangnan Bai and Changsong Liu
Electronics 2025, 14(11), 2175; https://doi.org/10.3390/electronics14112175 - 27 May 2025
Abstract
Power line detection (PLD) is a crucial task in the electric power industry where accurate PLD forms the foundation for achieving automated inspections. However, recent top-performing power line detection methods tend to generate thick and noisy edge lines, adding to the difficulties of subsequent tasks. In this work, we propose a multi-scale feature-based PLD method named LUM-Net to allow for the detection of power lines in a crisp and precise way. The algorithm utilizes EfficientNetV1 as the backbone network, ensuring effective feature extraction across various scales. We developed a Coordinated Convolutional Block Attention Module (CoCBAM) to focus on critical features by emphasizing both channel-wise and spatial information, thereby refining the power lines and reducing noise. Furthermore, we constructed the Bi-Large Kernel Convolutional Block (BiLKB) as the decoder, leveraging large kernel convolutions and spatial selection mechanisms to capture more contextual information, supplemented by auxiliary small kernels to refine the extracted feature information. By integrating these advanced components into a top-down dense connection mechanism, our method achieves effective, multi-scale information interaction, significantly improving the overall performance. The experimental results show that our method can predict crisp power line maps and achieve state-of-the-art performance on the PLDU dataset (ODS = 0.969) and PLDM dataset (ODS = 0.943). Full article

26 pages, 14974 KiB  
Article
HFEF2-YOLO: Hierarchical Dynamic Attention for High-Precision Multi-Scale Small Target Detection in Complex Remote Sensing
by Yao Lu, Biyun Zhang, Chunmin Zhang, Yifan He and Yanqiang Wang
Remote Sens. 2025, 17(10), 1789; https://doi.org/10.3390/rs17101789 - 20 May 2025
Abstract
Deep learning-based methods for real-time small target detection are critical for applications such as traffic monitoring, land management, and marine transportation. However, achieving high-precision detection of small objects against complex backgrounds remains challenging due to insufficient feature representation and background interference. Existing methods often struggle to balance multi-scale feature enhancement and computational efficiency, particularly in scenarios with low target-to-background contrast. To address this challenge, this study proposes an efficient detection method called hierarchical feature enhancement and feature fusion YOLO (HFEF2-YOLO), which is based on hierarchical dynamic attention. Firstly, a Hierarchical Filtering Feature Pyramid Network (HF-FPN) is introduced, which employs a dynamic gating mechanism to achieve differentiated screening and fusion of cross-scale features. This design addresses the feature redundancy caused by fixed fusion strategies in conventional FPN architectures, preserving edge details of tiny targets. Secondly, we propose a Dynamic Spatial–Spectral Attention Module (DSAM), which adaptively fuses channel-wise and spatial–dimensional responses through learnable weight allocation, generating dedicated spatial modulation factors for individual channels and significantly enhancing the saliency representation of dim small targets. Extensive experiments on four benchmark datasets (VEDAI, AI-TOD, DOTA, NWPU VHR-10) demonstrate the superiority of HFEF2-YOLO; the proposed method can reach an accuracy of 0.761, 0.621, 0.737, and 0.969 (in terms of mAP@0.5), outperforming state-of-the-art methods by 3.5–8.1%. Furthermore, a lightweight version (L-HFEF2-YOLO) is developed via dynamic convolution, reducing parameters by 42% while maintaining >95% accuracy, demonstrating real-time applicability on edge devices. Robustness tests under simulated degradation (e.g., noise, blur) validate its practicality for satellite-based tasks. Full article
(This article belongs to the Section Remote Sensing Image Processing)
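HF-FPN's dynamic gating replaces fixed addition of cross-scale features with learned screening. The abstract gives no formula, so this is only a minimal per-channel gate sketch (the gate logits would come from a small learned network, which is assumed, not shown):

```python
import numpy as np

def gated_fusion(shallow, deep, gate_logit):
    """Blend two same-shape (C, H, W) feature maps with a per-channel
    gate. gate_logit: (C,) raw scores; sigmoid maps them to mixing
    weights in (0, 1)."""
    g = 1.0 / (1.0 + np.exp(-np.asarray(gate_logit)))
    return g[:, None, None] * shallow + (1.0 - g)[:, None, None] * deep
```

A strongly positive logit passes the shallow (edge-detail) features through, a strongly negative one keeps the deep semantics, and intermediate values blend the two, which is the differentiated screening behavior the abstract describes.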

19 pages, 14292 KiB  
Article
GCAFlow: Multi-Scale Flow-Based Model with Global Context-Aware Channel Attention for Industrial Anomaly Detection
by Lin Liao, Congde Lu, Yujie Gao, Hao Yu and Biao Cai
Sensors 2025, 25(10), 3205; https://doi.org/10.3390/s25103205 - 20 May 2025
Abstract
In anomaly detection tasks, labeled defect data are often scarce. Unsupervised learning leverages only normal samples during training, making it particularly suitable for anomaly detection tasks. Among unsupervised methods, normalizing flow models have shown distinct advantages. They allow precise modeling of data distributions and enable direct computation of sample log-likelihoods. Recent work has largely focused on feature fusion strategies. However, most of the flow-based methods emphasize spatial information while neglecting the critical role of channel-wise features. To address this limitation, we propose GCAFlow, a novel flow-based model enhanced with a global context-aware channel attention mechanism. In addition, we design a hierarchical convolutional subnetwork to improve the probabilistic modeling capacity of the flow-based framework. This subnetwork supports more accurate estimation of data likelihoods and enhances anomaly detection performance. We evaluate GCAFlow on three benchmark anomaly detection datasets, and the results demonstrate that it consistently outperforms existing flow-based models in both accuracy and robustness. In particular, on the VisA dataset, GCAFlow achieves an image-level AUROC of 98.2% and a pixel-level AUROC of 99.0%. Full article
(This article belongs to the Section Industrial Sensors)
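The "direct computation of sample log-likelihoods" that normalizing flows offer comes from invertible transforms with tractable log-determinants. GCAFlow's attention-enhanced subnetwork is not reproduced here; as a hedged sketch of the underlying machinery, a single RealNVP-style affine coupling step (with a plain matrix standing in for the learned scale/shift subnetwork):

```python
import numpy as np

def coupling_forward(x, weights):
    """Affine coupling: transform half of x conditioned on the other
    half. Returns (y, log_det) for the change-of-variables formula.
    `weights` stands in for the learned subnetwork."""
    x1, x2 = np.split(x, 2)
    s = np.tanh(weights @ x1)          # bounded log-scales
    t = weights @ x1                   # shift (same stand-in net)
    y2 = x2 * np.exp(s) + t
    return np.concatenate([x1, y2]), float(s.sum())

def coupling_inverse(y, weights):
    """Exact inverse: x1 passes through, so s and t can be recomputed."""
    y1, y2 = np.split(y, 2)
    s = np.tanh(weights @ y1)
    t = weights @ y1
    return np.concatenate([y1, (y2 - t) * np.exp(-s)])
```

Because `x1` is left untouched, the Jacobian is triangular and its log-determinant is just `s.sum()`, and anomaly scores follow directly from the resulting exact log-likelihoods.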

16 pages, 3751 KiB  
Article
Improved Face Image Super-Resolution Model Based on Generative Adversarial Network
by Qingyu Liu, Yeguo Sun, Lei Chen and Lei Liu
J. Imaging 2025, 11(5), 163; https://doi.org/10.3390/jimaging11050163 - 19 May 2025
Abstract
Image super-resolution (SR) models based on the generative adversarial network (GAN) face challenges such as unnatural facial detail restoration and local blurring. This paper proposes an improved GAN-based model to address these issues. First, a Multi-scale Hybrid Attention Residual Block (MHARB) is designed, which dynamically enhances feature representation in critical face regions through dual-branch convolution and channel-spatial attention. Second, an Edge-guided Enhancement Block (EEB) is introduced, generating adaptive detail residuals by combining edge masks and channel attention to accurately recover high-frequency textures. Furthermore, a multi-scale discriminator with a weighted sub-discriminator loss is developed to balance global structural and local detail generation quality. Additionally, a phase-wise training strategy with dynamic adjustment of learning rate (Lr) and loss function weights is implemented to improve the realism of super-resolved face images. Experiments on the CelebA-HQ dataset demonstrate that the proposed model achieves a PSNR of 23.35 dB, an SSIM of 0.7424, and an LPIPS of 24.86, outperforming classical models and delivering superior visual quality in high-frequency regions. Notably, this model also surpasses the SwinIR model (PSNR: 23.28 dB → 23.35 dB, SSIM: 0.7340 → 0.7424, and LPIPS: 30.48 → 24.86), validating the effectiveness of the improved model and the training strategy in preserving facial details. Full article
(This article belongs to the Section AI in Imaging)
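Of the three metrics reported above, PSNR is simple enough to show directly (SSIM and LPIPS involve windowed statistics and a learned network, and are omitted):

```python
import numpy as np

def psnr(ref, img, peak=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak].
    Higher is better; identical images give infinite PSNR."""
    mse = np.mean((np.asarray(ref, float) - np.asarray(img, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

Because PSNR is a pure pixel-wise error measure, papers like this one pair it with SSIM and LPIPS, which better track perceived structural and textural quality.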

23 pages, 6938 KiB  
Article
A Hybrid Attention Framework Integrating Channel–Spatial Refinement and Frequency Spectral Analysis for Remote Sensing Smoke Recognition
by Guangtao Cheng, Lisha Yang, Zhihao Yu, Xiaobo Li and Guanghui Fu
Fire 2025, 8(5), 197; https://doi.org/10.3390/fire8050197 - 14 May 2025
Abstract
In recent years, accelerated global climate change has precipitated an increased frequency of wildfire events, with their devastating impacts on ecological systems and human populations becoming increasingly significant. Satellite remote sensing technology, leveraging its extensive spatial coverage and real-time monitoring capabilities, has emerged as a pivotal approach for wildfire early warning and comprehensive disaster assessment. To effectively detect subtle smoke signatures while minimizing background interference in remote sensing imagery, this paper introduces a novel dual-branch attention framework (CSFAttention) that synergistically integrates channel–spatial refinement with frequency spectral analysis to aggregate smoke features in remote sensing images. The channel–spatial branch implements an innovative triple-pooling strategy (incorporating average, maximum, and standard deviation pooling) across both channel and spatial dimensions to generate complementary descriptors that enhance distinct statistical properties of smoke representations. Concurrently, the frequency branch explicitly enhances high-frequency edge patterns, which are critical for distinguishing subtle textural variations characteristic of smoke plumes. The outputs from these complementary branches are fused through element-wise summation, yielding a refined feature representation that optimizes channel dependencies, spatial saliency, and spectral discriminability. The CSFAttention module is strategically integrated into the bottleneck structures of the ResNet architecture, forming a specialized deep network specifically designed for robust smoke recognition. Experimental validation on the USTC_SmokeRS dataset demonstrates that the proposed CSFResNet achieves recognition accuracy of 96.84%, surpassing existing deep networks for RS smoke recognition. Full article
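The triple-pooling strategy in CSFAttention's channel–spatial branch is easy to sketch for the channel dimension: each channel is summarized by average, maximum, and standard-deviation pooling (the networks that consume the descriptor are omitted here):

```python
import numpy as np

def triple_pool(x):
    """x: (C, H, W) feature map -> (C, 3) descriptor of per-channel
    average, max, and standard-deviation pooling."""
    return np.stack([x.mean(axis=(1, 2)),
                     x.max(axis=(1, 2)),
                     x.std(axis=(1, 2))], axis=1)
```

The standard-deviation column adds a contrast statistic that average and max pooling alone miss, which is plausibly useful for separating diffuse smoke texture from flat sky or ground.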
