Search Results (113)

Search Parameters:
Keywords = channel-wise and spatial attention

28 pages, 19790 KiB  
Article
HSF-DETR: A Special Vehicle Detection Algorithm Based on Hypergraph Spatial Features and Bipolar Attention
by Kaipeng Wang, Guanglin He and Xinmin Li
Sensors 2025, 25(14), 4381; https://doi.org/10.3390/s25144381 - 13 Jul 2025
Abstract
Special vehicle detection in intelligent surveillance, emergency rescue, and reconnaissance faces significant challenges in accuracy and robustness under complex environments, necessitating advanced detection algorithms for critical applications. This paper proposes HSF-DETR (Hypergraph Spatial Feature DETR), integrating four innovative modules: a Cascaded Spatial Feature Network (CSFNet) backbone with Cross-Efficient Convolutional Gating (CECG) for enhanced long-range detection through hybrid state-space modeling; a Hypergraph-Enhanced Spatial Feature Modulation (HyperSFM) network utilizing hypergraph structures for high-order feature correlations and adaptive multi-scale fusion; a Dual-Domain Feature Encoder (DDFE) combining Bipolar Efficient Attention (BEA) and Frequency-Enhanced Feed-Forward Network (FEFFN) for precise feature weight allocation; and a Spatial-Channel Fusion Upsampling Block (SCFUB) improving feature fidelity through depth-wise separable convolution and channel shift mixing. Experiments conducted on a self-built special vehicle dataset containing 2388 images demonstrate that HSF-DETR achieves mAP50 and mAP50-95 of 96.6% and 70.6%, respectively, representing improvements of 3.1% and 4.6% over baseline RT-DETR while maintaining computational efficiency at 59.7 GFLOPs and 18.07 M parameters. Cross-domain validation on VisDrone2019 and BDD100K datasets confirms the method’s generalization capability and robustness across diverse scenarios, establishing HSF-DETR as an effective solution for special vehicle detection in complex environments. Full article
(This article belongs to the Section Sensing and Imaging)

19 pages, 14033 KiB  
Article
SCCA-YOLO: Spatial Channel Fusion and Context-Aware YOLO for Lunar Crater Detection
by Jiahao Tang, Boyuan Gu, Tianyou Li and Ying-Bo Lu
Remote Sens. 2025, 17(14), 2380; https://doi.org/10.3390/rs17142380 - 10 Jul 2025
Abstract
Lunar crater detection plays a crucial role in geological analysis and the advancement of lunar exploration. Accurate identification of craters is also essential for constructing high-resolution topographic maps and supporting mission planning in future lunar exploration efforts. However, lunar craters often suffer from insufficient feature representation due to their small size and blurred boundaries. In addition, the visual similarity between craters and surrounding terrain further exacerbates background confusion. These challenges significantly hinder detection performance in remote sensing imagery and underscore the necessity of enhancing both local feature representation and global semantic reasoning. In this paper, we propose a novel Spatial Channel Fusion and Context-Aware YOLO (SCCA-YOLO) model built upon the YOLO11 framework. Specifically, the Context-Aware Module (CAM) employs a multi-branch dilated convolutional structure to enhance feature richness and expand the local receptive field, thereby strengthening the feature extraction capability. The Joint Spatial and Channel Fusion Module (SCFM) is utilized to fuse spatial and channel information to model the global relationships between craters and the background, effectively suppressing background noise and reinforcing feature discrimination. In addition, the improved Channel Attention Concatenation (CAC) strategy adaptively learns channel-wise importance weights during feature concatenation, further optimizing multi-scale semantic feature fusion and enhancing the model’s sensitivity to critical crater features. The proposed method is validated on a self-constructed Chang’e 6 dataset, covering the landing site and its surrounding areas. Experimental results demonstrate that our model achieves an mAP0.5 of 96.5% and an mAP0.5:0.95 of 81.5%, outperforming other mainstream detection models including the YOLO family of algorithms. These findings highlight the potential of SCCA-YOLO for high-precision lunar crater detection and provide valuable insights into future lunar surface analysis. Full article

21 pages, 5895 KiB  
Article
Improved YOLO-Based Pulmonary Nodule Detection with Spatial-SE Attention and an Aspect Ratio Penalty
by Xinhang Song, Haoran Xie, Tianding Gao, Nuo Cheng and Jianping Gou
Sensors 2025, 25(14), 4245; https://doi.org/10.3390/s25144245 - 8 Jul 2025
Abstract
The accurate identification of pulmonary nodules is critical for the early diagnosis of lung diseases; however, this task remains challenging due to inadequate feature representation and limited localization sensitivity. Current methodologies often utilize channel attention mechanisms and intersection over union (IoU)-based loss functions. Yet, they frequently overlook spatial context and struggle to capture subtle variations in aspect ratios, which hinders their ability to detect small objects. In this study, we introduce an improved YOLOv11 framework that addresses these limitations through two primary components: a spatial squeeze-and-excitation (SSE) module that concurrently models channel-wise and spatial attention to enhance the discriminative features pertinent to nodules, and an explicit aspect ratio penalty IoU (EAPIoU) loss that imposes a direct penalty on the squared differences in aspect ratios to refine the bounding box regression process. Comprehensive experiments conducted on the LUNA16, LungCT, and Node21 datasets reveal that our approach achieves superior precision, recall, and mean average precision (mAP) across various IoU thresholds, surpassing previous state-of-the-art methods while maintaining computational efficiency. Specifically, the proposed SSE module achieves a precision of 0.781 on LUNA16, while the EAPIoU loss boosts mAP@50 to 92.4% on LungCT, outperforming mainstream attention mechanisms and IoU-based loss functions. These findings underscore the effectiveness of integrating spatially aware attention mechanisms with aspect ratio-sensitive loss functions for robust nodule detection. Full article
(This article belongs to the Section Biomedical Sensors)
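The abstract above states only that EAPIoU penalizes "the squared differences in aspect ratios"; the exact formulation is not given here. A minimal sketch, assuming a simple `1 - IoU + λ·(Δ aspect ratio)²` combination (the function names and the λ weighting are illustrative, not the paper's):

```python
def iou(a, b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def eap_iou_loss(pred, target, lam=1.0, eps=1e-9):
    """Hypothetical EAPIoU-style loss: 1 - IoU plus a squared
    aspect-ratio difference penalty weighted by lam."""
    ar = lambda r: (r[2] - r[0]) / (r[3] - r[1] + eps)
    return (1.0 - iou(pred, target)) + lam * (ar(pred) - ar(target)) ** 2
```

Unlike plain IoU loss, two boxes with identical overlap but different shapes receive different losses, which is the sensitivity to aspect ratio the abstract motivates.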

20 pages, 7167 KiB  
Article
FM-Net: Frequency-Aware Masked-Attention Network for Infrared Small Target Detection
by Yongxian Liu, Zaiping Lin, Boyang Li, Ting Liu and Wei An
Remote Sens. 2025, 17(13), 2264; https://doi.org/10.3390/rs17132264 - 1 Jul 2025
Abstract
Infrared small target detection (IRSTD) aims to locate and separate targets from complex backgrounds. The challenges in IRSTD primarily come from extremely sparse target features and strong background clutter interference. However, existing methods typically perform discrimination directly on the features extracted by deep networks, neglecting the distinct characteristics of weak and small targets in the frequency domain, thereby limiting the improvement of detection capability. In this paper, we propose a frequency-aware masked-attention network (FM-Net) that leverages multi-scale frequency clues to assist in representing global context and suppressing noise interference. Specifically, we design the wavelet residual block (WRB) to extract multi-scale spatial and frequency features, which introduces a wavelet pyramid as the intermediate layer of the residual block. Then, to perceive global information on the long-range skip connections, a frequency-modulation masked-attention module (FMM) is used to interact with multi-layer features from the encoder. FMM contains two crucial elements: (a) a mask attention (MA) mechanism for injecting broad contextual features efficiently to promote full-level semantic correlation and focus on salient regions, and (b) a channel-wise frequency modulation module (CFM) for enhancing the most informative frequency components and suppressing useless ones. Extensive experiments on three benchmark datasets (SIRST, NUDT-SIRST, and IRSTD-1k) demonstrate that FM-Net achieves superior detection performance. Full article
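The wavelet decomposition underlying a block like WRB can be illustrated with a single-level 2D Haar transform (the abstract does not name the wavelet family FM-Net uses; Haar is just the simplest choice). The low-pass band keeps the smooth background while the detail bands carry the high-frequency content where small targets tend to stand out:

```python
import numpy as np

def haar_dwt2(x):
    """Single-level orthonormal 2D Haar transform of an even-sized
    image, returning (LL, LH, HL, HH) sub-bands at half resolution."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # low-frequency approximation
    lh = (a + b - c - d) / 2.0   # detail sub-bands
    hl = (a - b + c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    return ll, lh, hl, hh
```

Because the transform is orthonormal, the sub-bands preserve the input's energy, and a perfectly flat image produces zero detail coefficients.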

18 pages, 3974 KiB  
Article
LKD-YOLOv8: A Lightweight Knowledge Distillation-Based Method for Infrared Object Detection
by Xiancheng Cao, Yueli Hu and Haikun Zhang
Sensors 2025, 25(13), 4054; https://doi.org/10.3390/s25134054 - 29 Jun 2025
Abstract
Currently, infrared object detection is utilized in a broad spectrum of fields, including military applications, security, and aerospace. Nonetheless, the limited computational power of edge devices presents a considerable challenge in achieving an optimal balance between accuracy and computational efficiency in infrared object detection. In order to enhance the accuracy of infrared target detection and strengthen the implementation of robust models on edge platforms for rapid real-time inference, this paper presents LKD-YOLOv8, an innovative infrared object detection method that integrates YOLOv8 architecture with masked generative distillation (MGD), further augmented by the lightweight convolution design and attention mechanism for improved feature adaptability. Linear deformable convolution (LDConv) strengthens spatial feature extraction by dynamically adjusting kernel offsets, while coordinate attention (CA) refines feature alignment through channel-wise interaction. We employ a large-scale model (YOLOv8s) as the teacher to impart knowledge and supervise the training of a compact student model (YOLOv8n). Experiments show that LKD-YOLOv8 achieves a 1.18% mAP@0.5:0.95 improvement over baseline methods while reducing the parameter size by 7.9%. Our approach effectively balances accuracy and efficiency, rendering it applicable for resource-constrained edge devices in infrared scenarios. Full article
(This article belongs to the Section Sensing and Imaging)
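LKD-YOLOv8 distills at the feature level via MGD, which is not reproduced here; as a hedged illustration of the general teacher-supervises-student idea, this is the classic temperature-scaled logit distillation term (Hinton-style), not the paper's MGD loss:

```python
import numpy as np

def softmax(z, t=1.0):
    z = np.asarray(z, dtype=float) / t
    e = np.exp(z - z.max())           # subtract max for stability
    return e / e.sum()

def distill_kl(student_logits, teacher_logits, t=4.0):
    """KL(teacher || student) on temperature-softened distributions.
    The t*t factor keeps gradient magnitudes comparable across
    temperatures."""
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    return float(np.sum(p * (np.log(p) - np.log(q))) * t * t)
```

The term is zero exactly when the student reproduces the teacher's softened distribution and positive otherwise, so minimizing it pulls the compact model toward the larger one.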

13 pages, 778 KiB  
Article
Tunnel Lining Recognition and Thickness Estimation via Optical Image to Radar Image Transfer Learning
by Chuan Li, Tong Pu, Nianbiao Cai, Xi Yang, Hao Liu and Lulu Wang
Appl. Sci. 2025, 15(13), 7306; https://doi.org/10.3390/app15137306 - 28 Jun 2025
Abstract
The secondary lining of a tunnel is a critical load-bearing component, whose stability and structural integrity are essential for ensuring the overall safety of the tunnel. However, identifying lining structures and estimating their thickness using ground-penetrating radar (GPR) remain challenging due to several inherent limitations. First, the limited electromagnetic contrast between the primary and secondary linings results in weak interface reflections in GPR imagery, thereby hindering accurate delineation. Second, construction errors such as over-excavation or under-excavation often lead to complex interface geometries, further complicating the interpretation of GPR signals. To address these challenges, we propose an enhanced YOLOv8-seg network capable of performing pixel-level segmentation on GPR images to accurately delineate secondary lining regions and estimate their thickness. The model integrates a convolutional block attention module (CBAM) to refine feature extraction by emphasizing critical characteristics of the two interface layers through channel-wise and spatial attention mechanisms. The model is first pretrained on the COCO dataset and subsequently fine-tuned via transfer learning using a hybrid GPR dataset comprising real-world measurements and numerically simulated data based on forward modeling. Finally, the model is validated on real-world GPR measurements acquired from the Longhai tunnel. Experimental results demonstrate that the proposed method reliably identifies secondary tunnel linings and accurately estimates their average thickness. Full article
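CBAM, as used above, applies channel attention followed by spatial attention. A minimal NumPy sketch of that data flow, with fixed sums standing in for CBAM's learned shared MLP and 7×7 convolution (so this shows the mechanism's shape, not trained behavior):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x):
    """x: (C, H, W). Per-channel gates from global avg- and max-pooled
    descriptors (a plain sum replaces CBAM's shared MLP)."""
    w = sigmoid(x.mean(axis=(1, 2)) + x.max(axis=(1, 2)))
    return x * w[:, None, None]

def spatial_attention(x):
    """Per-pixel gates from channel-wise avg/max maps (a plain sum
    replaces CBAM's 7x7 convolution)."""
    w = sigmoid(x.mean(axis=0) + x.max(axis=0))
    return x * w[None, :, :]

def cbam(x):
    # CBAM order: refine channels first, then spatial locations
    return spatial_attention(channel_attention(x))
```

Both gates lie in (0, 1), so the module re-weights features rather than creating new ones; the learned versions decide which channels and pixels to emphasize.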

20 pages, 2599 KiB  
Article
Efficient Smoke Segmentation Using Multiscale Convolutions and Multiview Attention Mechanisms
by Xuesong Liu and Emmett J. Ientilucci
Electronics 2025, 14(13), 2593; https://doi.org/10.3390/electronics14132593 - 27 Jun 2025
Abstract
Efficient segmentation of smoke plumes is crucial for environmental monitoring and industrial safety. Existing models often face high computational demands and limited adaptability to diverse smoke appearances. To address these issues, we propose SmokeNet, a deep learning architecture integrating multiscale convolutions, multiview linear attention, and layer-specific loss functions. Specifically, multiscale convolutions capture diverse smoke shapes by employing varying kernel sizes optimized for different plume orientations. Subsequently, multiview linear attention emphasizes spatial and channel-wise features relevant to smoke segmentation tasks. Additionally, layer-specific loss functions promote consistent feature refinement across network layers, facilitating accurate and robust segmentation. SmokeNet achieves a segmentation accuracy of 72.74% mean Intersection over Union (mIoU) on our newly introduced quarry blast smoke dataset and maintains comparable performance on three benchmark smoke datasets, reaching up to 76.45% mIoU on the Smoke100k dataset. With a computational complexity of only 0.34 M parameters and 0.07 Giga Floating Point Operations (GFLOPs), SmokeNet is suitable for real-time applications. Evaluations conducted across these datasets demonstrate SmokeNet’s effectiveness and versatility in handling complex real-world scenarios. Full article
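The abstract does not detail SmokeNet's "multiview linear attention"; as a hedged sketch of the efficiency property such modules rely on, here is standard softmax-free linear attention, which computes `phi(Q)(phi(K)^T V)` so the cost grows linearly in the number of positions rather than quadratically:

```python
import numpy as np

def linear_attention(q, k, v, eps=1e-6):
    """O(n) attention. phi(x) = elu(x) + 1 keeps the feature map
    positive, so each output row is (up to eps) a convex combination
    of the value rows."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    qp, kp = phi(q), phi(k)
    kv = kp.T @ v                      # (d, d_v): cost independent of n
    z = qp @ kp.sum(axis=0)            # per-query normalizer, shape (n,)
    return (qp @ kv) / (z[:, None] + eps)
```

The small `kv` summary is what lets the computational complexity stay low, consistent with the 0.07 GFLOPs budget the abstract reports for the whole model.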

27 pages, 5780 KiB  
Article
Utilizing GCN-Based Deep Learning for Road Extraction from Remote Sensing Images
by Yu Jiang, Jiasen Zhao, Wei Luo, Bincheng Guo, Zhulin An and Yongjun Xu
Sensors 2025, 25(13), 3915; https://doi.org/10.3390/s25133915 - 23 Jun 2025
Abstract
The technology of road extraction serves as a crucial foundation for urban intelligent renewal and green sustainable development. Its outcomes can optimize transportation network planning, reduce resource waste, and enhance urban resilience. Deep learning-based approaches have demonstrated outstanding performance in road extraction, particularly excelling in complex scenarios. However, extracting roads from remote sensing data remains challenging due to several factors that limit accuracy: (1) Roads often share similar visual features with the background, such as rooftops and parking lots, leading to ambiguous inter-class distinctions; (2) Roads in complex environments, such as those occluded by shadows or trees, are difficult to detect. To address these issues, this paper proposes an improved model based on Graph Convolutional Networks (GCNs), named FR-SGCN (Hierarchical Depth-wise Separable Graph Convolutional Network Incorporating Graph Reasoning and Attention Mechanisms). The model is designed to enhance the precision and robustness of road extraction through intelligent techniques, thereby supporting precise planning of green infrastructure. First, high-dimensional features are extracted using ResNeXt, whose grouped convolution structure balances parameter efficiency and feature representation capability, significantly enhancing the expressiveness of the data. These high-dimensional features are then segmented, and enhanced channel and spatial features are obtained via attention mechanisms, effectively mitigating background interference and intra-class ambiguity. Subsequently, a hybrid adjacency matrix construction method is proposed, based on gradient operators and graph reasoning. This method integrates similarity and gradient information and employs graph convolution to capture the global contextual relationships among features. To validate the effectiveness of FR-SGCN, we conducted comparative experiments using 12 different methods on both a self-built dataset and a public dataset. The proposed model achieved the highest F1 score on both datasets. Visualization results from the experiments demonstrate that the model effectively extracts occluded roads and reduces the risk of redundant construction caused by data errors during urban renewal. This provides reliable technical support for smart cities and sustainable development. Full article
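The hybrid adjacency construction is FR-SGCN's contribution and is not reproduced here, but the graph-convolution step it feeds can be sketched with the standard symmetric-normalized propagation rule (Kipf and Welling style), shown under the assumption of a small dense adjacency matrix:

```python
import numpy as np

def gcn_layer(adj, x, w):
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W).
    adj: (n, n) adjacency, x: (n, f) node features, w: (f, f_out)."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ x @ w, 0.0)        # ReLU activation
```

Each node's output mixes its own features with its neighbors', which is how graph convolution captures the global contextual relationships the abstract describes.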

17 pages, 35033 KiB  
Article
A Multi-Branch Attention Fusion Method for Semantic Segmentation of Remote Sensing Images
by Kaibo Li, Zhenping Qiang, Hong Lin and Xiaorui Wang
Remote Sens. 2025, 17(11), 1898; https://doi.org/10.3390/rs17111898 - 30 May 2025
Abstract
In recent years, advancements in remote sensing image observation technology have significantly enriched the surface feature information captured in remote sensing images, posing greater challenges for semantic information extraction from remote sensing imagery. While convolutional neural networks (CNNs) excel at understanding relationships between adjacent image regions, processing multidimensional data requires reliance on attention mechanisms. However, due to the inherent complexity of remote sensing images, most attention mechanisms designed for natural images underperform when applied to remote sensing data. To address these challenges in remote sensing image semantic segmentation, we propose a highly generalizable multi-branch attention fusion method based on shallow and deep features. This approach applies pixel-wise, spatial, and channel attention mechanisms to feature maps fused with shallow and deep features, thereby enhancing the network’s semantic information extraction capability. Through evaluations on the Cityscapes, LoveDA, and WHDLD datasets, we validate the performance of our method in processing remote sensing data. The results demonstrate consistent improvements in segmentation accuracy across most categories, highlighting its strong generalization capability. Specifically, compared to baseline methods, our approach achieves average mIoU improvements of 0.42% and 0.54% on the WHDLD and LoveDA datasets, respectively, significantly enhancing network performance in complex remote sensing scenarios. Full article

19 pages, 7025 KiB  
Article
CDWMamba: Cloud Detection with Wavelet-Enhanced Mamba for Optical Satellite Imagery
by Shiyao Meng, Wei Gong, Siwei Li, Ge Song, Jie Yang and Yu Ding
Remote Sens. 2025, 17(11), 1874; https://doi.org/10.3390/rs17111874 - 28 May 2025
Abstract
Accurate cloud detection is a critical preprocessing step in remote sensing applications, as cloud and cloud shadow contamination can significantly degrade the quality of optical satellite imagery. In this paper, we propose CDWMamba, a novel dual-domain neural network that integrates the Mamba-based state space model with discrete wavelet transform (DWT) for effective cloud detection. CDWMamba adopts a four-direction Mamba module to capture long-range dependencies, while the wavelet decomposition enables multi-scale global context modeling in the frequency domain. To further enhance fine-grained spatial features, we incorporate a multi-scale depth-wise separable convolution (MDC) module for spatial detail refinement. Additionally, a spectral–spatial bottleneck (SSN) with channel-wise attention is introduced to promote inter-band information interaction across multi-spectral inputs. We evaluate our method on two benchmark datasets, L8 Biome and S2_CMC, covering diverse land cover types and environmental conditions. Experimental results demonstrate that CDWMamba achieves state-of-the-art performance across multiple metrics, significantly outperforming deep-learning-based baselines in terms of overall accuracy, mIoU, precision, and recall. Moreover, the model exhibits satisfactory performance under challenging conditions such as snow/ice and shrubland surfaces. These results verify the effectiveness of combining a state space model, frequency-domain representation, and spectral–spatial attention for cloud detection in multi-spectral remote sensing imagery. Full article

17 pages, 11290 KiB  
Article
Learning to Utilize Multi-Scale Feature Information for Crisp Power Line Detection
by Kai Li, Min Liu, Feiran Wang, Xinyang Guo, Geng Han, Xiangnan Bai and Changsong Liu
Electronics 2025, 14(11), 2175; https://doi.org/10.3390/electronics14112175 - 27 May 2025
Abstract
Power line detection (PLD) is a crucial task in the electric power industry where accurate PLD forms the foundation for achieving automated inspections. However, recent top-performing power line detection methods tend to generate thick and noisy edge lines, adding to the difficulties of subsequent tasks. In this work, we propose a multi-scale feature-based PLD method named LUM-Net to allow for the detection of power lines in a crisp and precise way. The algorithm utilizes EfficientNetV1 as the backbone network, ensuring effective feature extraction across various scales. We developed a Coordinated Convolutional Block Attention Module (CoCBAM) to focus on critical features by emphasizing both channel-wise and spatial information, thereby refining the power lines and reducing noise. Furthermore, we constructed the Bi-Large Kernel Convolutional Block (BiLKB) as the decoder, leveraging large kernel convolutions and spatial selection mechanisms to capture more contextual information, supplemented by auxiliary small kernels to refine the extracted feature information. By integrating these advanced components into a top-down dense connection mechanism, our method achieves effective, multi-scale information interaction, significantly improving the overall performance. The experimental results show that our method can predict crisp power line maps and achieve state-of-the-art performance on the PLDU dataset (ODS = 0.969) and PLDM dataset (ODS = 0.943). Full article

26 pages, 14974 KiB  
Article
HFEF2-YOLO: Hierarchical Dynamic Attention for High-Precision Multi-Scale Small Target Detection in Complex Remote Sensing
by Yao Lu, Biyun Zhang, Chunmin Zhang, Yifan He and Yanqiang Wang
Remote Sens. 2025, 17(10), 1789; https://doi.org/10.3390/rs17101789 - 20 May 2025
Abstract
Deep learning-based methods for real-time small target detection are critical for applications such as traffic monitoring, land management, and marine transportation. However, achieving high-precision detection of small objects against complex backgrounds remains challenging due to insufficient feature representation and background interference. Existing methods often struggle to balance multi-scale feature enhancement and computational efficiency, particularly in scenarios with low target-to-background contrast. To address this challenge, this study proposes an efficient detection method called hierarchical feature enhancement and feature fusion YOLO (HFEF2-YOLO), which is based on hierarchical dynamic attention. Firstly, a Hierarchical Filtering Feature Pyramid Network (HF-FPN) is introduced, which employs a dynamic gating mechanism to achieve differentiated screening and fusion of cross-scale features. This design addresses the feature redundancy caused by fixed fusion strategies in conventional FPN architectures, preserving edge details of tiny targets. Secondly, we propose a Dynamic Spatial–Spectral Attention Module (DSAM), which adaptively fuses channel-wise and spatial–dimensional responses through learnable weight allocation, generating dedicated spatial modulation factors for individual channels and significantly enhancing the saliency representation of dim small targets. Extensive experiments on four benchmark datasets (VEDAI, AI-TOD, DOTA, NWPU VHR-10) demonstrate the superiority of HFEF2-YOLO; the proposed method can reach an accuracy of 0.761, 0.621, 0.737, and 0.969 (in terms of mAP@0.5), outperforming state-of-the-art methods by 3.5–8.1%. Furthermore, a lightweight version (L-HFEF2-YOLO) is developed via dynamic convolution, reducing parameters by 42% while maintaining >95% accuracy, demonstrating real-time applicability on edge devices. Robustness tests under simulated degradation (e.g., noise, blur) validate its practicality for satellite-based tasks. Full article
(This article belongs to the Section Remote Sensing Image Processing)
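HF-FPN's dynamic gating replaces fixed addition of cross-scale features with learned screening. The abstract gives no formula, so this is only a minimal per-channel gate sketch (the gate logits would come from a small learned network, which is assumed, not shown):

```python
import numpy as np

def gated_fusion(shallow, deep, gate_logit):
    """Blend two same-shape (C, H, W) feature maps with a per-channel
    gate. gate_logit: (C,) raw scores; sigmoid maps them to mixing
    weights in (0, 1)."""
    g = 1.0 / (1.0 + np.exp(-np.asarray(gate_logit)))
    return g[:, None, None] * shallow + (1.0 - g)[:, None, None] * deep
```

A strongly positive logit passes the shallow (edge-detail) features through, a strongly negative one keeps the deep semantics, and intermediate values blend the two, which is the differentiated screening behavior the abstract describes.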

19 pages, 14292 KiB  
Article
GCAFlow: Multi-Scale Flow-Based Model with Global Context-Aware Channel Attention for Industrial Anomaly Detection
by Lin Liao, Congde Lu, Yujie Gao, Hao Yu and Biao Cai
Sensors 2025, 25(10), 3205; https://doi.org/10.3390/s25103205 - 20 May 2025
Abstract
In anomaly detection tasks, labeled defect data are often scarce. Unsupervised learning leverages only normal samples during training, making it particularly suitable for anomaly detection tasks. Among unsupervised methods, normalizing flow models have shown distinct advantages. They allow precise modeling of data distributions and enable direct computation of sample log-likelihoods. Recent work has largely focused on feature fusion strategies. However, most of the flow-based methods emphasize spatial information while neglecting the critical role of channel-wise features. To address this limitation, we propose GCAFlow, a novel flow-based model enhanced with a global context-aware channel attention mechanism. In addition, we design a hierarchical convolutional subnetwork to improve the probabilistic modeling capacity of the flow-based framework. This subnetwork supports more accurate estimation of data likelihoods and enhances anomaly detection performance. We evaluate GCAFlow on three benchmark anomaly detection datasets, and the results demonstrate that it consistently outperforms existing flow-based models in both accuracy and robustness. In particular, on the VisA dataset, GCAFlow achieves an image-level AUROC of 98.2% and a pixel-level AUROC of 99.0%. Full article
(This article belongs to the Section Industrial Sensors)
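The "direct computation of sample log-likelihoods" that normalizing flows offer comes from invertible transforms with tractable log-determinants. GCAFlow's attention-enhanced subnetwork is not reproduced here; as a hedged sketch of the underlying machinery, a single RealNVP-style affine coupling step (with a plain matrix standing in for the learned scale/shift subnetwork):

```python
import numpy as np

def coupling_forward(x, weights):
    """Affine coupling: transform half of x conditioned on the other
    half. Returns (y, log_det) for the change-of-variables formula.
    `weights` stands in for the learned subnetwork."""
    x1, x2 = np.split(x, 2)
    s = np.tanh(weights @ x1)          # bounded log-scales
    t = weights @ x1                   # shift (same stand-in net)
    y2 = x2 * np.exp(s) + t
    return np.concatenate([x1, y2]), float(s.sum())

def coupling_inverse(y, weights):
    """Exact inverse: x1 passes through, so s and t can be recomputed."""
    y1, y2 = np.split(y, 2)
    s = np.tanh(weights @ y1)
    t = weights @ y1
    return np.concatenate([y1, (y2 - t) * np.exp(-s)])
```

Because `x1` is left untouched, the Jacobian is triangular and its log-determinant is just `s.sum()`, and anomaly scores follow directly from the resulting exact log-likelihoods.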

16 pages, 3751 KiB  
Article
Improved Face Image Super-Resolution Model Based on Generative Adversarial Network
by Qingyu Liu, Yeguo Sun, Lei Chen and Lei Liu
J. Imaging 2025, 11(5), 163; https://doi.org/10.3390/jimaging11050163 - 19 May 2025
Abstract
Image super-resolution (SR) models based on the generative adversarial network (GAN) face challenges such as unnatural facial detail restoration and local blurring. This paper proposes an improved GAN-based model to address these issues. First, a Multi-scale Hybrid Attention Residual Block (MHARB) is designed, which dynamically enhances feature representation in critical face regions through dual-branch convolution and channel-spatial attention. Second, an Edge-guided Enhancement Block (EEB) is introduced, generating adaptive detail residuals by combining edge masks and channel attention to accurately recover high-frequency textures. Furthermore, a multi-scale discriminator with a weighted sub-discriminator loss is developed to balance global structural and local detail generation quality. Additionally, a phase-wise training strategy with dynamic adjustment of learning rate (Lr) and loss function weights is implemented to improve the realism of super-resolved face images. Experiments on the CelebA-HQ dataset demonstrate that the proposed model achieves a PSNR of 23.35 dB, an SSIM of 0.7424, and an LPIPS of 24.86, outperforming classical models and delivering superior visual quality in high-frequency regions. Notably, this model also surpasses the SwinIR model (PSNR: 23.28 dB → 23.35 dB, SSIM: 0.7340 → 0.7424, and LPIPS: 30.48 → 24.86), validating the effectiveness of the improved model and the training strategy in preserving facial details. Full article
(This article belongs to the Section AI in Imaging)
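Of the three metrics reported above, PSNR is simple enough to show directly (SSIM and LPIPS involve windowed statistics and a learned network, and are omitted):

```python
import numpy as np

def psnr(ref, img, peak=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak].
    Higher is better; identical images give infinite PSNR."""
    mse = np.mean((np.asarray(ref, float) - np.asarray(img, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

Because PSNR is a pure pixel-wise error measure, papers like this one pair it with SSIM and LPIPS, which better track perceived structural and textural quality.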

23 pages, 6938 KiB  
Article
A Hybrid Attention Framework Integrating Channel–Spatial Refinement and Frequency Spectral Analysis for Remote Sensing Smoke Recognition
by Guangtao Cheng, Lisha Yang, Zhihao Yu, Xiaobo Li and Guanghui Fu
Fire 2025, 8(5), 197; https://doi.org/10.3390/fire8050197 - 14 May 2025
Abstract
In recent years, accelerated global climate change has precipitated an increased frequency of wildfire events, with their devastating impacts on ecological systems and human populations becoming increasingly significant. Satellite remote sensing technology, leveraging its extensive spatial coverage and real-time monitoring capabilities, has emerged as a pivotal approach for wildfire early warning and comprehensive disaster assessment. To effectively detect subtle smoke signatures while minimizing background interference in remote sensing imagery, this paper introduces a novel dual-branch attention framework (CSFAttention) that synergistically integrates channel–spatial refinement with frequency spectral analysis to aggregate smoke features in remote sensing images. The channel–spatial branch implements an innovative triple-pooling strategy (incorporating average, maximum, and standard deviation pooling) across both channel and spatial dimensions to generate complementary descriptors that enhance distinct statistical properties of smoke representations. Concurrently, the frequency branch explicitly enhances high-frequency edge patterns, which are critical for distinguishing subtle textural variations characteristic of smoke plumes. The outputs from these complementary branches are fused through element-wise summation, yielding a refined feature representation that optimizes channel dependencies, spatial saliency, and spectral discriminability. The CSFAttention module is strategically integrated into the bottleneck structures of the ResNet architecture, forming a specialized deep network specifically designed for robust smoke recognition. Experimental validation on the USTC_SmokeRS dataset demonstrates that the proposed CSFResNet achieves recognition accuracy of 96.84%, surpassing existing deep networks for RS smoke recognition. Full article
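The triple-pooling strategy in CSFAttention's channel–spatial branch is easy to sketch for the channel dimension: each channel is summarized by average, maximum, and standard-deviation pooling (the networks that consume the descriptor are omitted here):

```python
import numpy as np

def triple_pool(x):
    """x: (C, H, W) feature map -> (C, 3) descriptor of per-channel
    average, max, and standard-deviation pooling."""
    return np.stack([x.mean(axis=(1, 2)),
                     x.max(axis=(1, 2)),
                     x.std(axis=(1, 2))], axis=1)
```

The standard-deviation column adds a contrast statistic that average and max pooling alone miss, which is plausibly useful for separating diffuse smoke texture from flat sky or ground.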
