Search Results (179)

Search Parameters:
Keywords = multi-channel feature pyramid networks

22 pages, 4169 KiB  
Article
Multi-Scale Differentiated Network with Spatial–Spectral Co-Operative Attention for Hyperspectral Image Denoising
by Xueli Chang, Xiaodong Wang, Xiaoyu Huang, Meng Yan and Luxiao Cheng
Appl. Sci. 2025, 15(15), 8648; https://doi.org/10.3390/app15158648 (registering DOI) - 5 Aug 2025
Abstract
Hyperspectral image (HSI) denoising is a crucial step in image preprocessing as its effectiveness has a direct impact on the accuracy of subsequent tasks such as land cover classification, target recognition, and change detection. However, existing methods suffer from limitations in effectively integrating multi-scale features and adaptively modeling complex noise distributions, making it difficult to construct effective spatial–spectral joint representations. This often leads to issues like detail loss and spectral distortion, especially when dealing with complex mixed noise. To address these challenges, this paper proposes a multi-scale differentiated denoising network based on spatial–spectral cooperative attention (MDSSANet). The network first constructs a multi-scale image pyramid using three downsampling operations and independently models the features at each scale to better capture noise characteristics at different levels. Additionally, a spatial–spectral cooperative attention module (SSCA) and a differentiated multi-scale feature fusion module (DMF) are introduced. The SSCA module effectively captures cross-spectral dependencies and spatial feature interactions through parallel spectral channel and spatial attention mechanisms. The DMF module adopts a multi-branch parallel structure with differentiated processing to dynamically fuse multi-scale spatial–spectral features and incorporates a cross-scale feature compensation strategy to improve feature representation and mitigate information loss. The experimental results show that the proposed method outperforms state-of-the-art methods across several public datasets, exhibiting greater robustness and superior visual performance in tasks such as handling complex noise and recovering small targets. Full article
(This article belongs to the Special Issue Remote Sensing Image Processing and Application, 2nd Edition)
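The multi-scale image pyramid described in the abstract above (three downsampling operations, with each scale modeled independently) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the downsampling operator, so 2x2 average pooling is assumed, and `downsample_2x` and `build_pyramid` are hypothetical helper names.

```python
def downsample_2x(image):
    """Halve height and width by averaging each 2x2 block.

    `image` is a list of rows of floats; an odd trailing row or column is dropped.
    Average pooling is an assumption made for this sketch.
    """
    h, w = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * i][2 * j] + image[2 * i][2 * j + 1]
             + image[2 * i + 1][2 * j] + image[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(w)
        ]
        for i in range(h)
    ]


def build_pyramid(image, num_downsamples=3):
    """Return [full resolution, 1/2, 1/4, 1/8] for the default of three downsamplings."""
    levels = [image]
    for _ in range(num_downsamples):
        levels.append(downsample_2x(levels[-1]))
    return levels


# One synthetic 16x16 spectral band stands in for a hyperspectral slice.
band = [[float(r * 16 + c) for c in range(16)] for r in range(16)]
pyramid = build_pyramid(band)
sizes = [(len(level), len(level[0])) for level in pyramid]
print(sizes)
```

Each level would then be passed to its own feature extractor, which is the "differentiated" per-scale modeling the abstract refers to.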

16 pages, 4587 KiB  
Article
FAMNet: A Lightweight Stereo Matching Network for Real-Time Depth Estimation in Autonomous Driving
by Jingyuan Zhang, Qiang Tong, Na Yan and Xiulei Liu
Symmetry 2025, 17(8), 1214; https://doi.org/10.3390/sym17081214 - 1 Aug 2025
Viewed by 212
Abstract
Accurate and efficient stereo matching is fundamental to real-time depth estimation from symmetric stereo cameras in autonomous driving systems. However, existing high-accuracy stereo matching networks typically rely on computationally expensive 3D convolutions, which limit their practicality in real-world environments. In contrast, real-time methods often sacrifice accuracy or generalization capability. To address these challenges, we propose FAMNet (Fusion Attention Multi-Scale Network), a lightweight and generalizable stereo matching framework tailored for real-time depth estimation in autonomous driving applications. FAMNet consists of two novel modules: Fusion Attention-based Cost Volume (FACV) and Multi-scale Attention Aggregation (MAA). FACV constructs a compact yet expressive cost volume by integrating multi-scale correlation, attention-guided feature fusion, and channel reweighting, thereby reducing reliance on heavy 3D convolutions. MAA further enhances disparity estimation by fusing multi-scale contextual cues through pyramid-based aggregation and dual-path attention mechanisms. Extensive experiments on the KITTI 2012 and KITTI 2015 benchmarks demonstrate that FAMNet achieves a favorable trade-off between accuracy, efficiency, and generalization. On KITTI 2015, with the incorporation of FACV and MAA, the prediction accuracy of the baseline model is improved by 37% and 38%, respectively, and a total improvement of 42% is achieved by our final model. These results highlight FAMNet’s potential for practical deployment in resource-constrained autonomous driving systems requiring real-time and reliable depth perception. Full article

26 pages, 62045 KiB  
Article
CML-RTDETR: A Lightweight Wheat Head Detection and Counting Algorithm Based on the Improved RT-DETR
by Yue Fang, Chenbo Yang, Chengyong Zhu, Hao Jiang, Jingmin Tu and Jie Li
Electronics 2025, 14(15), 3051; https://doi.org/10.3390/electronics14153051 - 30 Jul 2025
Viewed by 157
Abstract
Wheat is an important grain crop, and spike counting is crucial for predicting yield. However, in complex farmland environments, wheat ears vary greatly in scale, their color is highly similar to the background, and they often overlap, all of which makes wheat ear detection challenging. At the same time, the increasing demand for high accuracy and fast response in wheat spike detection calls for lightweight models with reduced hardware costs. Therefore, this study proposes a lightweight wheat ear detection model, CML-RTDETR, for efficient and accurate detection of wheat ears in real, complex farmland environments. First, the lightweight network CSPDarknet is introduced as the backbone of CML-RTDETR to improve feature extraction efficiency. In addition, the FM module is introduced to modify the bottleneck layer in the C2f component, realizing hybrid feature extraction by splicing spatial-domain and frequency-domain features, which strengthens feature extraction for wheat ears in complex scenes. Second, to improve the model’s detection capability for targets of different scales, a multi-scale feature enhancement pyramid (MFEP) is designed, consisting of GHSDConv for efficiently obtaining low-level detail information and CSPDWOK for constructing a multi-scale semantic fusion structure. Finally, channel pruning based on Layer-Adaptive Magnitude Pruning (LAMP) scoring is performed to reduce model parameters and runtime memory. Experimental results on the GWHD2021 dataset show that the AP50 of CML-RTDETR reaches 90.5%, an improvement of 1.2% over the baseline RTDETR-R18 model. Meanwhile, the parameters and GFLOPs decrease to 11.03 M and 37.8 G, reductions of 42% and 34%, respectively. The real-time frame rate reaches 73 fps, so the model achieves both a substantial parameter reduction and a speed improvement. Full article
(This article belongs to the Section Artificial Intelligence)

34 pages, 4388 KiB  
Article
IRSD-Net: An Adaptive Infrared Ship Detection Network for Small Targets in Complex Maritime Environments
by Yitong Sun and Jie Lian
Remote Sens. 2025, 17(15), 2643; https://doi.org/10.3390/rs17152643 - 30 Jul 2025
Viewed by 340
Abstract
Infrared ship detection plays a vital role in maritime surveillance systems. As a critical remote sensing application, it enables maritime surveillance across diverse geographic scales and operational conditions while offering robust all-weather operation and resilience to environmental interference. However, infrared imagery in complex maritime environments presents significant challenges, including low contrast, background clutter, and difficulties in detecting small-scale or distant targets. To address these issues, we propose an Infrared Ship Detection Network (IRSD-Net), a lightweight and efficient detection network built upon the YOLOv11n framework and specially designed for infrared maritime imagery. IRSD-Net incorporates a Hierarchical Multi-Kernel Convolution Network (HMKCNet), which employs parallel multi-kernel convolutions and channel division to enhance multi-scale feature extraction while reducing redundancy and memory usage. To further improve cross-scale fusion, we design the Dynamic Cross-Scale Feature Pyramid Network (DCSFPN), a bidirectional architecture that combines up- and downsampling to integrate low-level detail with high-level semantics. Additionally, we introduce Wise-PIoU, a novel loss function that improves bounding box regression by enforcing geometric alignment and adaptively weighting gradients based on alignment quality. Experimental results demonstrate that IRSD-Net achieves 92.5% mAP50 on the ISDD dataset, outperforming YOLOv6n and YOLOv11n by 3.2% and 1.7%, respectively. With a throughput of 714.3 FPS, IRSD-Net delivers high-accuracy, real-time performance suitable for practical maritime monitoring systems. Full article

27 pages, 13439 KiB  
Article
Swin-ReshoUnet: A Seismic Profile Signal Reconstruction Method Integrating Hierarchical Convolution, ORCA Attention, and Residual Channel Attention Mechanism
by Jie Rao, Mingju Chen, Xiaofei Song, Chen Xie, Xueyang Duan, Xiao Hu, Senyuan Li and Xingyue Zhang
Appl. Sci. 2025, 15(15), 8332; https://doi.org/10.3390/app15158332 - 26 Jul 2025
Viewed by 168
Abstract
This study proposes a Swin-ReshoUnet architecture with a three-level enhancement mechanism to address inefficiencies in multi-scale feature extraction and gradient degradation in deep networks for high-precision seismic exploration. The encoder uses a hierarchical convolution module to build a multi-scale feature pyramid, enhancing cross-scale geological signal representation. The decoder replaces traditional self-attention with ORCA attention to enable global context modeling with lower computational cost. Skip connections integrate a residual channel attention module, mitigating gradient degradation via dual-pooling feature fusion and activation optimization, forming a full-link optimization from low-level feature enhancement to high-level semantic integration. Simulated and real dataset experiments show that at decimation ratios of 0.1–0.5, the method significantly outperforms SwinUnet, TransUnet, etc., in reconstruction performance. Residual signals and F-K spectra verify high-fidelity reconstruction. Despite increased difficulty with higher sparsity, it maintains optimal performance with notable margins, demonstrating strong robustness. The proposed hierarchical feature enhancement and cross-scale attention strategies offer an efficient seismic profile signal reconstruction solution and show generality for migration to complex visual tasks, advancing geophysics-computer vision interdisciplinary innovation. Full article

22 pages, 4611 KiB  
Article
MMC-YOLO: A Lightweight Model for Real-Time Detection of Geometric Symmetry-Breaking Defects in Wind Turbine Blades
by Caiye Liu, Chao Zhang, Xinyu Ge, Xunmeng An and Nan Xue
Symmetry 2025, 17(8), 1183; https://doi.org/10.3390/sym17081183 - 24 Jul 2025
Viewed by 322
Abstract
Performance degradation of wind turbine blades often stems from geometric asymmetry induced by damage. Existing methods for assessing damage face challenges in balancing accuracy and efficiency due to their limited ability to capture fine-grained geometric asymmetries associated with multi-scale damage under complex background interference. To address this, based on the high-speed detection model YOLOv10-N, this paper proposes a novel detection model named MMC-YOLO. First, the Multi-Scale Perception Gated Convolution (MSGConv) Module was designed, which constructs a full-scale receptive field through multi-branch fusion and channel rearrangement to enhance the extraction of geometric asymmetry features. Second, the Multi-Scale Enhanced Feature Pyramid Network (MSEFPN) was developed, integrating dynamic path aggregation and an SENetv2 attention mechanism to suppress background interference and amplify damage response. Finally, the Channel-Compensated Filtering (CCF) module was constructed to preserve critical channel information using a dynamic buffering mechanism. Evaluated on a dataset of 4818 wind turbine blade damage images, MMC-YOLO achieves an 82.4% mAP [0.5:0.95], representing a 4.4% improvement over the baseline YOLOv10-N model, and a 91.1% recall rate, an 8.7% increase, while maintaining a lightweight parameter count of 4.2 million. This framework significantly enhances geometric asymmetry defect detection accuracy while ensuring real-time performance, meeting engineering requirements for high efficiency and precision. Full article
(This article belongs to the Special Issue Symmetry and Its Applications in Image Processing)

22 pages, 2420 KiB  
Article
BiEHFFNet: A Water Body Detection Network for SAR Images Based on Bi-Encoder and Hybrid Feature Fusion
by Bin Han, Xin Huang and Feng Xue
Mathematics 2025, 13(15), 2347; https://doi.org/10.3390/math13152347 - 23 Jul 2025
Viewed by 193
Abstract
Water body detection in synthetic aperture radar (SAR) imagery plays a critical role in applications such as disaster response, water resource management, and environmental monitoring. However, it remains challenging due to complex background interference in SAR images. To address this issue, a bi-encoder and hybrid feature fusion network (BiEHFFNet) is proposed for achieving accurate water body detection. First, a bi-encoder structure based on ResNet and Swin Transformer is used to jointly extract local spatial details and global contextual information, enhancing feature representation in complex scenarios. Additionally, the convolutional block attention module (CBAM) is employed to suppress irrelevant information in the output features of each ResNet stage. Second, a cross-attention-based hybrid feature fusion (CABHFF) module is designed to interactively integrate local and global features through cross-attention, followed by channel attention to achieve effective hybrid feature fusion, thus improving the model’s ability to capture water structures. Third, a multi-scale content-aware upsampling (MSCAU) module is designed by integrating atrous spatial pyramid pooling (ASPP) with the Content-Aware ReAssembly of FEatures (CARAFE), aiming to enhance multi-scale contextual learning while alleviating feature distortion caused by upsampling. Finally, a composite loss function combining Dice loss and Active Contour loss is used to provide stronger boundary supervision. Experiments conducted on the ALOS PALSAR dataset demonstrate that the proposed BiEHFFNet outperforms existing methods across multiple evaluation metrics, achieving more accurate water body detection. Full article
(This article belongs to the Special Issue Advanced Mathematical Methods in Remote Sensing)

17 pages, 2893 KiB  
Article
Insulator Defect Detection Based on Improved YOLO11n Algorithm Under Complex Environmental Conditions
by Shoutian Dong, Yiqi Qin, Benrui Li, Qi Zhang and Yu Zhao
Electronics 2025, 14(14), 2898; https://doi.org/10.3390/electronics14142898 - 20 Jul 2025
Viewed by 379
Abstract
Detecting defects in transmission line insulators is crucial to prevent power grid failures as power systems continue to expand. This study introduces YOLO11n-SSA, an enhanced insulator defect detection method that addresses the challenges of effectively identifying flaws in complex environments. First, this study incorporates the StarNet network into the backbone of the model. By stacking multiple layers of star operations, the model reduces both parameter count and model size, improving its adaptability to real-time object detection tasks. Second, the SOPN feature pyramid network is introduced into the neck of the model. By optimizing the multi-scale feature fusion of the richer information obtained after expanding the channel dimension, the detection efficiency for low-resolution images and small objects is improved. Then, the ADown module is adopted to improve the backbone and neck of the model. It reduces the parameter count and significantly lowers the computational cost by implementing downsampling operations between different layers of the feature map, thereby enhancing the practicality of the model. Meanwhile, by introducing the NWD to improve the evaluation index of the loss function, the detection model’s capability in assessing the similarities among various small-object defects is enhanced. Experimental results were obtained using an expanded dataset based on a public dataset, incorporating three types of insulator defects under complex environmental conditions. The results demonstrate that the YOLO11n-SSA algorithm achieved an mAP@0.5 of 91.9%, an mAP@0.5:0.95 of 70.7%, a precision of 95%, and a recall of 87.5%, representing improvements of 3.9%, 5.5%, 2%, and 5.7%, respectively, when compared to the original YOLO11n method. The detection time per image is 0.0134 s. Compared to other mainstream algorithms, the YOLO11n-SSA algorithm demonstrates superior detection accuracy and real-time performance. Full article
(This article belongs to the Section Artificial Intelligence)
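The abstract above does not expand "NWD". Assuming it refers to the Normalized Gaussian Wasserstein Distance commonly used for small-object detection, the similarity between two boxes can be sketched as below. Each box is modeled as a 2D Gaussian, and the constant `c` is a dataset-dependent normalizer; the value used here is an illustrative assumption, as is the function name `nwd`.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Gaussian Wasserstein Distance between two boxes.

    Each box is (cx, cy, w, h). The box is modeled as a Gaussian with mean
    (cx, cy) and standard deviations (w/2, h/2); the 2-Wasserstein distance
    between two such Gaussians reduces to a Euclidean distance over
    (cx, cy, w/2, h/2). `c` is an assumed, dataset-dependent constant.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    w2 = math.sqrt(
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2.0) ** 2
        + ((ha - hb) / 2.0) ** 2
    )
    return math.exp(-w2 / c)


small_defect = (50.0, 50.0, 6.0, 6.0)
shifted = (58.0, 50.0, 6.0, 6.0)  # no overlap with small_defect, so IoU would be 0
print(nwd(small_defect, small_defect), nwd(small_defect, shifted))
```

Unlike IoU, the score stays positive and varies smoothly for non-overlapping boxes, which is why this kind of measure is attractive for very small objects.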

20 pages, 33417 KiB  
Article
Enhancing UAV Object Detection in Low-Light Conditions with ELS-YOLO: A Lightweight Model Based on Improved YOLOv11
by Tianhang Weng and Xiaopeng Niu
Sensors 2025, 25(14), 4463; https://doi.org/10.3390/s25144463 - 17 Jul 2025
Viewed by 562
Abstract
Drone-view object detection models operating under low-light conditions face several challenges, such as object scale variations, high image noise, and limited computational resources. Existing models often struggle to balance accuracy and lightweight architecture. This paper introduces ELS-YOLO, a lightweight object detection model tailored for low-light environments, built upon the YOLOv11s framework. ELS-YOLO features a re-parameterized backbone (ER-HGNetV2) with integrated Re-parameterized Convolution and Efficient Channel Attention mechanisms, a Lightweight Feature Selection Pyramid Network (LFSPN) for multi-scale object detection, and a Shared Convolution Separate Batch Normalization Head (SCSHead) to reduce computational complexity. Layer-Adaptive Magnitude-Based Pruning (LAMP) is employed to compress the model size. Experiments on the ExDark and DroneVehicle datasets demonstrate that ELS-YOLO achieves high detection accuracy with a compact model. Here, we show that ELS-YOLO attains a mAP@0.5 of 74.3% and 68.7% on the ExDark and DroneVehicle datasets, respectively, while maintaining real-time inference capability. Full article
(This article belongs to the Special Issue Vision Sensors for Object Detection and Tracking)

15 pages, 1142 KiB  
Technical Note
Terrain and Atmosphere Classification Framework on Satellite Data Through Attentional Feature Fusion Network
by Antoni Jaszcz and Dawid Połap
Remote Sens. 2025, 17(14), 2477; https://doi.org/10.3390/rs17142477 - 17 Jul 2025
Viewed by 228
Abstract
Surface, terrain, or even atmosphere analysis using images or their fragments is important due to the possibilities of further processing. In particular, attention is necessary for satellite and/or drone images. Analyzing image elements by classifying the given classes is important for obtaining information about space for autonomous systems, identifying landscape elements, or monitoring and maintaining the infrastructure and environment. Hence, in this paper, we propose a neural classifier architecture that analyzes different features by the parallel processing of information in the network and combines them with a feature fusion mechanism. The neural architecture model takes into account different types of features by extracting them by focusing on spatial, local patterns and multi-scale representation. In addition, the classifier is guided by an attention mechanism for focusing more on different channels, spatial information, and even feature pyramid mechanisms. Atrous convolutional operators were also used in such an architecture as better context feature extractors. The proposed classifier architecture is the main element of the modeled framework for satellite data analysis, which is based on the possibility of training depending on the client’s desire. The proposed methodology was evaluated on three publicly available classification datasets for remote sensing: satellite images, Visual Terrain Recognition, and USTC SmokeRS, where the proposed model achieved accuracy scores of 97.8%, 100.0%, and 92.4%, respectively. The obtained results indicate the effectiveness of the proposed attention mechanisms across different remote sensing challenges. Full article

20 pages, 1935 KiB  
Article
Residual Attention Network with Atrous Spatial Pyramid Pooling for Soil Element Estimation in LUCAS Hyperspectral Data
by Yun Deng, Yuchen Cao, Shouxue Chen and Xiaohui Cheng
Appl. Sci. 2025, 15(13), 7457; https://doi.org/10.3390/app15137457 - 3 Jul 2025
Viewed by 305
Abstract
Visible and near-infrared (Vis–NIR) spectroscopy enables the rapid prediction of soil properties but faces three limitations with conventional machine learning: information loss and overfitting from high-dimensional spectral features; inadequate modeling of nonlinear soil–spectra relationships; and failure to integrate multi-scale spatial features. To address these challenges, we propose ReSE-AP Net, a multi-scale attention residual network with spatial pyramid pooling. Built on convolutional residual blocks, the model incorporates a squeeze-and-excitation channel attention mechanism to recalibrate feature weights and an atrous spatial pyramid pooling (ASPP) module to extract multi-resolution spectral features. This architecture synergistically represents weak absorption peaks (400–1000 nm) and broad spectral bands (1000–2500 nm), overcoming single-scale modeling limitations. Validation on the LUCAS2009 dataset demonstrated that ReSE-AP Net outperformed conventional machine learning by improving the R2 by 2.8–36.5% and reducing the RMSE by 14.2–69.2%. Compared with existing deep learning methods, it increased the R2 by 0.4–25.5% for clay, silt, sand, organic carbon, calcium carbonate, and phosphorus predictions, and decreased the RMSE by 0.7–39.0%. Our contributions include statistical analysis of LUCAS2009 spectra, identification of conventional method limitations, development of the ReSE-AP Net model, ablation studies, and comprehensive comparisons with alternative approaches. Full article
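The idea behind the atrous spatial pyramid pooling (ASPP) module mentioned above, applied to a 1D spectrum, can be sketched as parallel dilated convolutions with different rates. This is an illustration only: the dilation rates, the averaging kernel, and the helper names are assumptions, and a real ASPP module would use learned kernels and concatenate the branch outputs.

```python
def dilated_conv1d(signal, kernel, rate):
    """'Valid' 1D cross-correlation whose taps are spaced `rate` samples apart."""
    span = (len(kernel) - 1) * rate  # distance between the first and last tap
    return [
        sum(k * signal[i + j * rate] for j, k in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


def receptive_field(kernel_size, rate):
    """Number of input samples covered by one output sample."""
    return (kernel_size - 1) * rate + 1


# Synthetic reflectance values stand in for a Vis-NIR spectrum.
spectrum = [float(i % 5) for i in range(40)]
kernel = [1.0 / 3.0] * 3  # fixed averaging kernel; learned in a real network

# Parallel branches: the same kernel size sees narrow and broad spectral context.
branches = {rate: dilated_conv1d(spectrum, kernel, rate) for rate in (1, 2, 4)}
for rate, out in branches.items():
    print(rate, receptive_field(len(kernel), rate), len(out))
```

Small rates respond to narrow features such as weak absorption peaks, while larger rates cover broad bands without adding parameters.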

24 pages, 2149 KiB  
Article
STA-3D: Combining Spatiotemporal Attention and 3D Convolutional Networks for Robust Deepfake Detection
by Jingbo Wang, Jun Lei, Shuohao Li and Jun Zhang
Symmetry 2025, 17(7), 1037; https://doi.org/10.3390/sym17071037 - 1 Jul 2025
Viewed by 551
Abstract
Recent advancements in deep learning have driven the rapid proliferation of deepfake generation techniques, raising substantial concerns over digital security and trustworthiness. Most current detection methods primarily focus on spatial or frequency domain features but show limited effectiveness when dealing with compressed videos and cross-dataset scenarios. Observing that mainstream generation methods use frame-by-frame synthesis without adequate temporal consistency constraints, we introduce the Spatiotemporal Attention 3D Network (STA-3D), a novel framework that combines a lightweight spatiotemporal attention module with a 3D convolutional architecture to improve detection robustness. The proposed attention module adopts a symmetric multi-branch architecture, where each branch follows a nearly identical processing pipeline to separately model temporal-channel, temporal-spatial, and intra-spatial correlations. Our framework additionally implements Spatial Pyramid Pooling (SPP) layers along the temporal axis, enabling adaptive modeling regardless of input video length. Furthermore, we mitigate the inherent asymmetry in the quantity of authentic and forged samples by replacing standard cross entropy with focal loss for training. This integration facilitates the simultaneous exploitation of inter-frame temporal discontinuities and intra-frame spatial artifacts, achieving competitive performance across various benchmark datasets under different compression conditions: for the intra-dataset setting on FF++, it improves the average accuracy by 1.09 percentage points compared to existing SOTA, with a more significant gain of 1.63 percentage points under the most challenging C40 compression level (particularly for NeuralTextures, achieving an improvement of 4.05 percentage points); while for the intra-dataset setting, AUC is enhanced by 0.24 percentage points on the DFDC-P dataset. Full article
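The replacement of standard cross entropy with focal loss, used above to handle the imbalance between authentic and forged samples, can be sketched for the binary case as follows. The `gamma` and `alpha` values are commonly used defaults and are assumptions here; the abstract does not state the settings used.

```python
import math


def binary_cross_entropy(p, y):
    """Cross entropy for one sample; p is the predicted probability of class 1, 0 < p < 1."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one sample: -alpha_t * (1 - p_t)^gamma * log(p_t).

    The (1 - p_t)^gamma factor shrinks the loss of well-classified samples,
    so hard or minority-class samples dominate the gradient.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy positive (p = 0.9) is down-weighted far more than a hard one (p = 0.1).
for p in (0.9, 0.1):
    print(p, binary_cross_entropy(p, 1), focal_loss(p, 1))
```

Setting `gamma=0` and `alpha=1` for a positive sample recovers plain cross entropy, which makes the relationship between the two losses easy to check.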

20 pages, 7167 KiB  
Article
FM-Net: Frequency-Aware Masked-Attention Network for Infrared Small Target Detection
by Yongxian Liu, Zaiping Lin, Boyang Li, Ting Liu and Wei An
Remote Sens. 2025, 17(13), 2264; https://doi.org/10.3390/rs17132264 - 1 Jul 2025
Viewed by 375
Abstract
Infrared small target detection (IRSTD) aims to locate and separate targets from complex backgrounds. The challenges in IRSTD primarily come from extremely sparse target features and strong background clutter interference. However, existing methods typically perform discrimination directly on the features extracted by deep networks, neglecting the distinct characteristics of weak and small targets in the frequency domain, thereby limiting the improvement of detection capability. In this paper, we propose a frequency-aware masked-attention network (FM-Net) that leverages multi-scale frequency clues to assist in representing global context and suppressing noise interference. Specifically, we design the wavelet residual block (WRB) to extract multi-scale spatial and frequency features, which introduces a wavelet pyramid as the intermediate layer of the residual block. Then, to perceive global information on the long-range skip connections, a frequency-modulation masked-attention module (FMM) is used to interact with multi-layer features from the encoder. FMM contains two crucial elements: (a) a mask attention (MA) mechanism for efficiently injecting broad contextual features to promote full-level semantic correlation and focus on salient regions, and (b) a channel-wise frequency modulation module (CFM) for enhancing the most informative frequency components and suppressing useless ones. Extensive experiments on three benchmark datasets (SIRST, NUDT-SIRST, and IRSTD-1k) demonstrate that FM-Net achieves superior detection performance. Full article

21 pages, 8541 KiB  
Article
Infrared Ship Detection in Complex Nearshore Scenes Based on Improved YOLOv5s
by Xiuwen Liu, Mingchen Liu and Yong Yin
Sensors 2025, 25(13), 3979; https://doi.org/10.3390/s25133979 - 26 Jun 2025
Viewed by 305
Abstract
Ensuring navigational safety in nearshore waters is essential for the sustainable development of the shipping economy. Accurate ship identification and classification are central to this objective, underscoring the critical importance of ship detection technology. However, compared with the open sea, dense vessel distributions and complex backgrounds in nearshore areas substantially limit detection efficacy. Infrared vision sensors offer distinct advantages over visible light by enabling reliable target detection in all weather conditions. This study therefore proposes CGSE-YOLOv5s, an enhanced YOLOv5s-based algorithm specifically designed for complex infrared nearshore scenarios. Three key improvements are introduced: (1) Contrast Limited Adaptive Histogram Equalization integrated with Gaussian filtering enhances target edge sharpness; (2) replacement of the feature pyramid network’s C3 module with a Swin Transformer-based C3STR module reduces multi-scale false detections; and (3) an Efficient Channel Attention mechanism amplifies critical target features. Experimental results demonstrate that CGSE-YOLOv5s achieves a mean average precision (mAP@0.5) of 94.8%, outperforming YOLOv5s by 1.3% and surpassing other detection algorithms. Full article
(This article belongs to the Section Optical Sensors)

20 pages, 4391 KiB  
Article
GDS-YOLOv7: A High-Performance Model for Water-Surface Obstacle Detection Using Optimized Receptive Field and Attention Mechanisms
by Xu Yang, Lei Huang, Fuyang Ke, Chao Liu, Ruixue Yang and Shicheng Xie
ISPRS Int. J. Geo-Inf. 2025, 14(7), 238; https://doi.org/10.3390/ijgi14070238 - 23 Jun 2025
Viewed by 322
Abstract
Unmanned ships, equipped with self-navigation and image processing capabilities, are progressively expanding their applications in fields such as mining, fisheries, and marine environments. Along with this development, issues concerning waterborne traffic safety are gradually emerging. To address the challenges of navigation and obstacle detection on the water’s surface, this paper presents GDS-YOLOv7, an enhanced obstacle-detection framework for aquatic environments, architecturally evolved from YOLOv7. The proposed system implements three key innovations: (1) architectural optimization through replacement of the Spatial Pyramid Pooling Cross Stage Partial Connections (SPPCSPC) module with GhostSPPCSPC for expanded receptive field representation; (2) integration of a parameter-free attention mechanism (SimAM) with refined pooling configurations to boost multi-scale detection sensitivity; and (3) strategic deployment of depthwise separable convolutions (DSC) to reduce computational complexity while maintaining detection fidelity. Furthermore, we develop a Spatial–Channel Synergetic Attention (SCSA) mechanism to counteract feature degradation in convolutional operations, embedding this module within the Extended Efficient Layer Aggregation Network (E-ELAN) to enhance contextual awareness. Experimental results reveal the model’s superiority over baseline YOLOv7, achieving gains of 4.9% in mean average precision@0.5 (mAP@0.5), 4.3% in precision (P), and 6.9% in recall (R), alongside a 22.8% reduction in Giga Floating-point Operations Per Second (GFLOPS). Full article