Search Results (234)

Search Parameters:
Keywords = cluttered scene

27 pages, 4829 KB  
Article
Dual RANSAC with Rescue Midpoint Multi-Trend Vanishing Point Detection
by Nada Said, Bilal Nakhal, Ali El-Zaart and Lama Affara
J. Imaging 2026, 12(4), 172; https://doi.org/10.3390/jimaging12040172 - 16 Apr 2026
Viewed by 256
Abstract
Vanishing point detection is a fundamental step in computer vision that enables 3D scene understanding and autonomous navigation. Classical techniques face significant challenges in heavily cluttered scenes and in images containing multiple perspective cues, leading to poor or unreliable vanishing point estimates. We present a Dual RANSAC with Rescue Midpoint-based Multi-Trend Vanishing Point Detection framework, which targets the simultaneous detection and fine-tuning of multiple, globally consistent vanishing points. The proposed framework introduces a novel Midpoint-based Multi-Trend Random Sample Consensus formulation that operates on line segment midpoints to infer dominant directional groups, thereby eliminating noisy or unstable midpoints and stabilizing subsequent vanishing point inference. The main novelty lies in using line segment midpoints to model orientation variation as a linear regression in the midpoint–orientation space, which reduces sensitivity to endpoint instability. Candidate vanishing points are prioritized through inlier-based confidence ranking and subsequently optimized via an MSAC-based arbiter to resolve hypothesis conflicts and minimize geometric error. We evaluate our work against state-of-the-art techniques such as J-Linkage and Conditional Sample Consensus on two challenging public datasets, the York Urban Dataset and the Toulouse Vanishing Point Dataset. The results show that the proposed framework achieves a recall of up to 95% and an image success rate of almost 84%, outperforming both J-Linkage and Conditional Sample Consensus, especially under tighter angular thresholds. This demonstrates the framework's enhanced stability and localization accuracy.
(This article belongs to the Section Computer Vision and Pattern Recognition)
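
The midpoint-trend idea, fitting orientation as a linear function of midpoint position and keeping only consistent segments, can be illustrated with a plain RANSAC fit. The sketch below is a hypothetical simplification of the paper's formulation: it reduces each midpoint to a single coordinate m and omits the dual/rescue stages and the MSAC arbiter.

import numpy as np

def ransac_midpoint_trend(mids, thetas, n_iter=500, tol_deg=2.0, seed=0):
    # mids: (N,) 1-D midpoint coordinate; thetas: (N,) segment orientations
    # in radians. Fit theta ~ a*m + b by RANSAC and return the dominant
    # trend with its inlier mask; outliers are the unstable midpoints.
    rng = np.random.default_rng(seed)
    best_model, best_inliers = (0.0, 0.0), np.zeros(len(mids), dtype=bool)
    for _ in range(n_iter):
        i, j = rng.choice(len(mids), size=2, replace=False)
        if np.isclose(mids[i], mids[j]):
            continue  # degenerate sample, cannot define a slope
        a = (thetas[j] - thetas[i]) / (mids[j] - mids[i])
        b = thetas[i] - a * mids[i]
        inliers = np.abs(thetas - (a * mids + b)) < np.deg2rad(tol_deg)
        if inliers.sum() > best_inliers.sum():
            best_model, best_inliers = (a, b), inliers
    return best_model, best_inliers

Segments flagged as outliers here correspond to the "noisy or unstable midpoints" the abstract describes eliminating before vanishing point inference.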

19 pages, 1748 KB  
Article
BiASTM-CL: Bidirectional Adaptive Spatiotemporal Modeling with Contrastive Learning for Few-Shot Action Recognition
by Jing Huang and Zijian Zhao
Electronics 2026, 15(8), 1637; https://doi.org/10.3390/electronics15081637 - 14 Apr 2026
Viewed by 246
Abstract
In few-shot action recognition (FSAR), limited annotated data and large scene variations make it difficult for models to learn stable spatial semantics and reliable temporal dynamics. As a result, spatiotemporal representations tend to be weak, and models often fail to focus on discriminative motion regions or capture frame-to-frame changes accurately. Furthermore, the insufficient fusion of local details and global context renders the learned features more susceptible to background noise and scene bias. These issues become more pronounced when background clutter is severe or when different action classes share locally similar segments, leading to unreliable support–query matching and shifted similarity distributions, which ultimately result in class confusion. To address these challenges, we propose a bidirectional adaptive spatiotemporal modeling method integrated with contrastive learning for FSAR. The method constructs attention-guided bidirectional differencing features to model inter-frame variations with semantic alignment, while suppressing background noise. It introduces a local–global interactive channel attention module to strengthen both local and global dynamic representations, and integrates dynamic distance adjustment with hard negative mining during tuple-level matching. This combination imposes contrastive constraints that enhance intra-class compactness and inter-class separability, thereby mitigating interference from cross-class similar segments. Experiments under the standard 5-way 1-shot/5-shot protocol demonstrate consistent improvements across multiple datasets, and the proposed method achieves the best performance under the 5-shot setting while remaining competitive under the 1-shot setting.
(This article belongs to the Section Artificial Intelligence)
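
Tuple-level matching with hard negative mining can be sketched as a multi-positive InfoNCE-style objective in which only the most similar negatives contribute. This is a generic stand-in for the paper's contrastive constraints, not its exact formulation; tau and k are assumed hyperparameters.

import torch
import torch.nn.functional as F

def contrastive_loss_hard_neg(query, supports, labels, target, tau=0.1, k=4):
    # query: (D,); supports: (N, D); labels: (N,) class ids; target: the
    # query's class. Assumes at least one positive support is present.
    sims = F.cosine_similarity(query.unsqueeze(0), supports, dim=1) / tau
    pos = sims[labels == target]
    neg = sims[labels != target]
    neg = neg.topk(min(k, neg.numel())).values   # keep hardest negatives only
    logits = torch.cat([pos, neg])
    log_prob = logits - torch.logsumexp(logits, dim=0)
    return -log_prob[: pos.numel()].mean()       # multi-positive InfoNCE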

22 pages, 25208 KB  
Article
HFI-Former: High-Frequency Interaction Transformer for Robust Scene Text Detection
by Yubing Gao, Quanli Gao, Lianhe Shao, Xihan Wang and Lufang Liu
Information 2026, 17(4), 365; https://doi.org/10.3390/info17040365 - 13 Apr 2026
Viewed by 206
Abstract
Scene text detection aims to accurately localize text instances in images captured under complex environments. Its performance depends heavily on precise text boundary delineation and reliable semantic discrimination from cluttered backgrounds. However, existing methods still struggle in such complex scenes. Repeated downsampling gradually biases features toward low-frequency components, thereby weakening edge details and local structures that are critical to text morphology. Additionally, semantic information and local details are often modeled independently. This lack of coordination makes high-frequency responses vulnerable to background noise. To address these issues, we propose HFI-Former, a Transformer-based model designed for high-frequency enhancement and feature interaction. The framework consists of multi-scale feature extraction, frequency-enhanced representation, semantic-guided feature interaction, and deformable Transformer encoding. Frequency-domain enhancement is introduced to preserve high-frequency structural features degraded by repeated downsampling. Semantic-aware feature interaction further injects global context to regulate multi-scale feature fusion. Experiments on CTW1500, Total-Text, and ICDAR2015 demonstrate competitive boundary localization accuracy and strong overall detection performance in complex scenes.
(This article belongs to the Section Information Applications)
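
Frequency-domain enhancement of feature maps is commonly realized as a spectral reweighting. The sketch below is a generic illustration rather than HFI-Former's actual module; radius and gain are assumed parameters.

import torch

def high_freq_boost(feat, radius=0.1, gain=1.5):
    # feat: (B, C, H, W). Reweight the 2-D spectrum so that components
    # outside a low-frequency disc are amplified by `gain`.
    B, C, H, W = feat.shape
    spec = torch.fft.fftshift(torch.fft.fft2(feat), dim=(-2, -1))
    yy, xx = torch.meshgrid(
        torch.linspace(-0.5, 0.5, H, device=feat.device),
        torch.linspace(-0.5, 0.5, W, device=feat.device),
        indexing="ij")
    hp = ((yy ** 2 + xx ** 2).sqrt() > radius).float()
    spec = spec * (1.0 + (gain - 1.0) * hp)      # boost high frequencies only
    return torch.fft.ifft2(torch.fft.ifftshift(spec, dim=(-2, -1))).real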

15 pages, 58473 KB  
Article
Aw-DuNet: Adaptive-Weight Deep Unfolding Network for High Precision Infrared Weak Target Segmentation
by Xu Yang, Aoxiang Li, Hancui Zhang, Long Wu, Zhen Yang, Yong Zhang and Jianlong Zhang
Appl. Sci. 2026, 16(8), 3767; https://doi.org/10.3390/app16083767 - 12 Apr 2026
Viewed by 178
Abstract
Deep learning (DL) methods have achieved promising performance in infrared weak target segmentation. However, their interpretability and robustness against cluttered backgrounds and noise remain limited. We propose an adaptive-weight deep unfolding network (Aw-DuNet) that unfolds alternating direction method of multipliers (ADMM) iterations for adaptive sparse–low-rank decomposition into multi-stage interpretable modules for end-to-end training. An adaptive weight matrix is jointly estimated from a local structural-difference matrix and a sparse-enhancement matrix, thereby strengthening target–background separation while preserving fine target details. To suppress background clutter, we design a dual-path complementary attention (DCA) mechanism for the low-rank background reconstruction module (LBRM), which improves low-rank background modeling by jointly leveraging spatial and channel attention. By extracting local details and global context in parallel, DCA enhances weak-target responses and mitigates interference from complex backgrounds. We also build a real-scene infrared dataset with 632 images for out-of-domain evaluation; the model is tested on it without fine-tuning after training on public datasets to assess practical robustness. Experiments on multiple public datasets validate the effectiveness and generalization of Aw-DuNet.
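
For reference, a classical (un-unfolded) ADMM iteration for the sparse–low-rank decomposition D = L + S alternates singular-value thresholding for the low-rank background with soft thresholding for the sparse targets; the paper's learned adaptive weights refine this fixed-threshold baseline.

import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_lowrank_admm(D, lam=None, mu=1.0, n_iter=50):
    # Decompose matrix D into low-rank background L and sparse targets S
    # (robust-PCA style) via plain ADMM with dual variable Y.
    if lam is None:
        lam = 1.0 / np.sqrt(max(D.shape))
    L, S, Y = np.zeros_like(D), np.zeros_like(D), np.zeros_like(D)
    for _ in range(n_iter):
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = (U * soft(sig, 1.0 / mu)) @ Vt      # singular-value thresholding
        S = soft(D - L + Y / mu, lam / mu)      # soft-threshold sparse part
        Y = Y + mu * (D - L - S)                # dual ascent on the constraint
    return L, S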

16 pages, 2602 KB  
Article
A Feature-Enhanced Network for Vegetable Disease Detection in Complex Environments
by Xuewei Wang and Jun Liu
Plants 2026, 15(8), 1182; https://doi.org/10.3390/plants15081182 - 11 Apr 2026
Viewed by 413
Abstract
Accurate vegetable disease detection in complex cultivation environments remains challenging because early lesions are often small, low-contrast, and easily confounded by cluttered backgrounds. To address this issue, we propose VDD-Net, a feature-enhanced detection network based on YOLOv10 for robust vegetable disease detection in protected agriculture. The proposed framework integrates three modules: a receptive field enhancement (RFE) module to improve local perception of small lesions, an adaptive channel fusion (ACF) module to strengthen multi-scale feature aggregation and suppress background interference, and a global context attention (GCA) module to capture long-range dependencies and improve contextual discrimination. Experiments on a custom vegetable disease dataset showed that VDD-Net achieved an mAP@0.5 of 95.2% with only 7.78 M parameters. To further evaluate robustness, zero-shot cross-domain testing was conducted on the PlantDoc dataset, where VDD-Net achieved an mAP@0.5 of 76.5%, outperforming the baseline and showing improved generalization to natural scenes. In addition, after TensorRT optimization and FP16 quantization, the model maintained real-time inference on edge platforms, reaching 89.3 FPS on Jetson AGX Orin and 24.2 FPS on Jetson Nano. These results indicate that VDD-Net provides a practical balance among detection accuracy, cross-domain robustness, and deployment efficiency for intelligent disease monitoring in modern agriculture.
(This article belongs to the Special Issue Combined Stresses on Plants: From Mechanisms to Adaptations)
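
As a rough, hypothetical stand-in for the GCA module (the paper's exact design is not reproduced here), a squeeze-and-excitation style gate shows one common way globally pooled context can reweight channel responses:

import torch.nn as nn

class GlobalContextAttention(nn.Module):
    # Hypothetical stand-in: gate channels with globally pooled context.
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                      # x: (B, C, H, W)
        gate = self.fc(x.mean(dim=(2, 3)))     # pool to (B, C), gate per channel
        return x * gate[:, :, None, None]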

27 pages, 6579 KB  
Article
EF-YOLO: Detecting Small Targets in Early-Stage Agricultural Fires via UAV-Based Remote Sensing
by Jun Tao, Zhihan Wang, Jianqiu Wu, Yunqin Li, Tomohiro Fukuda and Jiaxin Zhang
Remote Sens. 2026, 18(8), 1119; https://doi.org/10.3390/rs18081119 - 9 Apr 2026
Cited by 1 | Viewed by 318
Abstract
Early detection of agricultural fires with Unmanned Aerial Vehicles (UAVs) is important for environmental safety, yet it remains difficult because ignition cues are extremely small, smoke patterns vary widely, and farmland scenes often contain strong background interference such as specular reflections. Model development is further constrained by the scarcity of data from the early ignition stage. To address these challenges, we propose a joint data and model optimization framework. We first build a hybrid dataset through an ROI-guided synthesis pipeline, in which latent diffusion models are used to insert high-fidelity, carefully screened fire samples into real farmland backgrounds. We then introduce EF-YOLO, a detector designed for high sensitivity to small targets. The network uses SPD-Conv to reduce feature loss during spatial downsampling and includes a high-resolution P2 head to improve the detection of minute objects. To reduce background clutter, a Dual-Path Frequency–Spatial Enhancement (DP-FSE) module serves as a lightweight statistical surrogate that extracts global contextual cues and local salient features in parallel, thereby suppressing high-frequency noise. Experimental results show that EF-YOLO achieves an APS of 40.2% on sub-pixel targets, exceeding the YOLOv8s baseline by 15.4 percentage points. With a recall of 88.7% and a real-time inference speed of 78 FPS, the proposed framework offers a strong balance between detection performance and efficiency, making it well suited for edge-deployed agricultural fire early-warning systems.
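
SPD-Conv replaces strided downsampling with a lossless space-to-depth rearrangement followed by a non-strided convolution, so fine ignition cues are not discarded when resolution is halved. A minimal PyTorch sketch (the output width c_out is an assumed parameter):

import torch.nn as nn
import torch.nn.functional as F

class SPDConv(nn.Module):
    # Space-to-depth followed by a non-strided convolution: resolution is
    # halved without discarding pixels, unlike a stride-2 convolution.
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Conv2d(4 * c_in, c_out, kernel_size=3, padding=1)

    def forward(self, x):                      # (B, C, H, W) -> (B, c_out, H/2, W/2)
        x = F.pixel_unshuffle(x, downscale_factor=2)   # (B, 4C, H/2, W/2)
        return self.conv(x)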

15 pages, 3194 KB  
Article
Detection of Microplastics in Coastal Environments Based on Semantic Segmentation
by Javier Lorenzo-Navarro, José Salas-Cáceres, Modesto Castrillón-Santana, May Gómez and Alicia Herrera
Microplastics 2026, 5(2), 66; https://doi.org/10.3390/microplastics5020066 - 3 Apr 2026
Viewed by 339
Abstract
Microplastics represent an emerging threat to aquatic ecosystems, human health, and coastal aesthetics, with increasing concern about their accumulation on beaches due to ocean currents, wave action, and accidental spills. Despite their environmental impact, current methods for detecting and quantifying microplastics remain largely manual, time-consuming, and spatially limited. In this study, we propose a deep learning-based approach for the semantic segmentation of microplastics on sandy beaches, enabling pixel-level localization of small particles under real-world conditions. Twelve segmentation models were evaluated, including U-Net and its variants (Attention U-Net, ResUNet), as well as state-of-the-art architectures such as LinkNet, PAN, PSPNet, and YOLOv11 with segmentation heads. Models were trained and tested on augmented data patches, and their performance was assessed using Intersection over Union (IoU) and Dice coefficient metrics. LinkNet achieved the best performance with a Dice coefficient of 80% and an IoU of 72.6% on the test set, showing superior capability in segmenting microplastics even in the presence of visual clutter such as debris or sand variation. Qualitative results support the quantitative findings, highlighting the robustness of the model in complex scenes.
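
The two reported metrics are standard overlap measures between a predicted and a ground-truth mask; for reference:

import numpy as np

def dice_iou(pred, gt, eps=1e-7):
    # pred, gt: boolean masks of identical shape.
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum() + eps)
    iou = inter / (union + eps)
    return dice, iou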

24 pages, 3448 KB  
Article
Gaussian-Guided Stage-Aware Deformable FPN with Coarse-to-Fine Unit-Circle Resolver for Oriented SAR Ship Detection
by Liangjie Meng, Qingle Guo, Danxia Li, Jinrong He and Zhixin Li
Remote Sens. 2026, 18(7), 1019; https://doi.org/10.3390/rs18071019 - 29 Mar 2026
Viewed by 314
Abstract
Synthetic Aperture Radar (SAR) enables all-weather maritime surveillance, yet oriented bounding box (OBB) detection of ships remains challenging in complex scenes. Strong sea clutter and dense harbor scatterers often mask the slender characteristics of ships as well as the weak responses of small ships. Meanwhile, the periodicity of angle parameterization introduces regression discontinuities, and near-symmetric, bright-scatterer-dominated signatures further cause heading ambiguity, undermining the stability of orientation prediction. Moreover, in most detectors, multi-scale feature fusion and angle estimation lack explicit coordination, and rotated-box localization performance is often jointly affected by feature degradation and unstable orientation prediction. To this end, we propose a unified framework that simultaneously strengthens multi-scale representations and stabilizes orientation modeling. Specifically, we design a Gaussian-Guided Stage-Aware Deformable Feature Pyramid Network (GSDFPN) and a Coarse-to-Fine Unit-Circle Resolver (CF-UCR). GSDFPN enhances multi-scale fusion with two plug-in components: (i) a Gaussian-guided High-level Semantic Refinement Module (GHSRM) that suppresses clutter-dominated semantics while strengthening ship-responsive cues, and (ii) a Stage-aware Deformable Fusion Module (SDFM) for low-level features, which disentangles channels into a geometry-preserving spatial stream and a clutter-resistant semantic stream, and couples them via deformable interaction with bidirectional cross-stream gating to better capture the inherent slender characteristics of ships and localize small ships. For orientation, CF-UCR decomposes angle prediction into direction-cluster classification and intra-cluster residual regression on the unit circle, effectively mitigating periodicity-induced discontinuities and stabilizing rotated-box estimation. On SSDD+ and RSDD, our method achieves AP/AP50/AP75 of 0.5390/0.9345/0.4529 and 0.4895/0.9210/0.4712, respectively, while reaching APs75/APm75/APl75 of 0.5614/0.8300/0.8392 and 0.4986/0.8163/0.8934, evidencing strong rotated-box localization across target scales in complex maritime scenes.
(This article belongs to the Topic Computer Vision and Image Processing, 3rd Edition)
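
The coarse-to-fine unit-circle idea (a direction-cluster label plus a small intra-cluster residual expressed as cos/sin) can be sketched as an encode/decode pair. This is a hypothetical rendering consistent with the abstract; the cluster count and the π angle period are assumed.

import numpy as np

def encode_angle(theta, n_clusters=8, period=np.pi):
    # Coarse cluster label plus a fine residual on the unit circle, which
    # sidesteps the wrap-around discontinuity of raw angle regression.
    width = period / n_clusters
    theta = np.mod(theta, period)
    cluster = int(theta // width)
    resid = theta - (cluster + 0.5) * width        # small, in (-width/2, width/2)
    return cluster, np.cos(resid), np.sin(resid)

def decode_angle(cluster, cos_r, sin_r, n_clusters=8, period=np.pi):
    width = period / n_clusters
    resid = np.arctan2(sin_r, cos_r)
    return np.mod((cluster + 0.5) * width + resid, period)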

24 pages, 5620 KB  
Article
AviaTAD-LGH: A Multi-Task Spatio-Temporal Action Detector with Lightweight Gradient Harmonization for Real-Time Avian Behavior Monitoring
by Zihui Xie, Haifang Jian, Wenhui Yang, Mengdi Fu, Wanting Peng, Markus Peter Eichhorn, Ramiro Daniel Crego, Xin Ning, Jun Du and Hongchang Wang
Sensors 2026, 26(7), 2088; https://doi.org/10.3390/s26072088 - 27 Mar 2026
Viewed by 522
Abstract
Fine-grained spatio-temporal action detection in continuous, unconstrained field videos remains a formidable challenge due to severe background clutter, high inter-class similarity, and the scarcity of domain-specific benchmarks. To address these limitations, we first introduce a large-scale Wintering-Crane Benchmark, providing dense, individual-level bounding box annotations for six complex behaviors across diverse habitat scenes. Leveraging this data, we propose AviaTAD-LGH, a real-time multi-task framework that incorporates auxiliary motion supervision into a dual-pathway 3D backbone to enhance feature discriminability. A critical bottleneck in such multi-task settings is the negative transfer caused by conflicting optimization objectives. To resolve this, we present Lightweight Gradient Harmonization (LGH), a plug-and-play optimization strategy that dynamically modulates task weights based on the cosine similarity of gradient directions. This mechanism effectively aligns optimization trajectories without introducing inference latency. Extensive experiments demonstrate that AviaTAD-LGH achieves a state-of-the-art mAP of 68.60%, surpassing strong public baselines by 7.44% and improving upon the single-task baseline by 2.80%, with significant gains observed on ambiguous dynamic classes. The proposed pipeline enables efficient, scalable ecological monitoring suitable for edge deployment.
(This article belongs to the Special Issue Advanced Sensing Systems for Biological Monitoring)
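
Modulating task weights by the cosine similarity of gradient directions can be sketched as follows; this is a generic illustration of the idea, not the paper's LGH implementation, and it assumes an explicit list of shared parameters.

import torch
import torch.nn.functional as F

def harmonized_loss(main_loss, aux_losses, shared_params):
    # Weight each auxiliary loss by the cosine similarity between its
    # gradient and the main-task gradient over the shared parameters;
    # conflicting tasks (negative similarity) are down-weighted to zero.
    flat = lambda gs: torch.cat([g.reshape(-1) for g in gs])
    g_main = flat(torch.autograd.grad(main_loss, shared_params,
                                      retain_graph=True))
    total = main_loss
    for aux in aux_losses:
        g_aux = flat(torch.autograd.grad(aux, shared_params,
                                         retain_graph=True))
        w = F.cosine_similarity(g_main, g_aux, dim=0).clamp(min=0.0)
        total = total + w * aux
    return total

Because the weights are computed from gradients during training only, this kind of scheme adds no cost at inference time, matching the latency claim in the abstract.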

28 pages, 43592 KB  
Article
TreeSpecViT: Fine-Grained Tree Species Classification from UAV RGB Imagery for Campus-Scale Human–Vegetation Coupling Analysis
by Yinghui Yuan, Yunfeng Yang, Zhulin Chen and Sheng Xu
Remote Sens. 2026, 18(6), 928; https://doi.org/10.3390/rs18060928 - 18 Mar 2026
Viewed by 364
Abstract
On university campuses, trees and green spaces shape how students and staff move and use outdoor spaces. To support planning, tree species information is needed at the level of individual trees. Tree species classification from UAV RGB imagery remains difficult in complex campus scenes because roads, buildings, shadows, and subtle inter-species differences degrade recognition. To address background interference, the loss of subtle fine-grained cues before tokenization, and insufficient local structure modeling in lightweight transformer-based classification, we propose TreeSpecViT for tree species classification. It uses a MobileViT backbone and a Background Suppression Module (BSM) to reduce clutter from non-canopy regions. A Fine-Grained Feature Guidance (FGF) module is inserted before the unfold operation to enhance canopy details and guide tokenization toward key regions. 1×1 convolutional neck layers align channels, and a Global and Local Fusion (GLF) module jointly models overall crown semantics and local textures for species recognition. From the predicted masks and species labels, we build an individual-tree digital archive. The archive stores per-tree geometric attributes and can be linked with grids of campus activity intensity to analyze how activity patterns relate to vegetation structure. TreeSpecViT achieves an accuracy of 87.88% (+6.06%) and an F1 score of 76.48% (+5.08%) on the SZUTreeDataset. On our self-constructed NJFUDataset, it reaches 76.30% (+5.10%) in accuracy and 70.10% (+7.20%) in F1. These results surpass mainstream models. Ablation experiments show that the modules jointly reduce background clutter and enhance canopy features. Overall, TreeSpecViT supports campus-scale analyses that link human activity intensity to vegetation patterns and provides a practical basis for planning and adjusting campus green spaces.
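
A background-suppression gate of the kind the BSM describes can be sketched as a learned soft mask that damps non-canopy responses. This is hypothetical (including the residual leak factor), not the paper's module.

import torch.nn as nn

class BackgroundSuppression(nn.Module):
    # Hypothetical sketch: predict a soft canopy mask from the features
    # and damp non-canopy responses; the 0.1 leak factor is assumed.
    def __init__(self, channels):
        super().__init__()
        self.mask_head = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, x):                      # x: (B, C, H, W)
        m = self.mask_head(x)                  # soft foreground mask (B, 1, H, W)
        return x * m + 0.1 * x * (1.0 - m)     # suppress, don't erase, background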

30 pages, 26587 KB  
Article
Research on Synthetic Data Methods and Detection Models for Micro-Cracks
by Yaotong Jiang, Tianmiao Wang, Xuanhe Chen and Jianhong Liang
Sensors 2026, 26(6), 1883; https://doi.org/10.3390/s26061883 - 17 Mar 2026
Viewed by 375
Abstract
Micro-crack detection on concrete surfaces is challenging because labeled micro-crack data are scarce, crack cues are extremely weak (often only a few pixels wide), and complex backgrounds (e.g., non-uniform illumination, shadows, and stains) degrade feature extraction; this study aims to improve both data availability and detection robustness for practical inspection. A Poisson image editing-based synthesis strategy is developed to generate visually coherent micro-crack samples via gradient-domain blending, and a Complex-Scene-Tolerant YOLO (CST-YOLO) detector is proposed on top of YOLOv10, following a “lighting decoupling–global perception–micro-feature enhancement” design. CST-YOLO integrates a Lighting-Adaptive Preprocessing Module (LAPM) to suppress illumination/shadow perturbations, a Spatial–Channel Sparse Transformer (SCS-Former) to model long-range crack topology efficiently, and a Small Object Focus Block (SOFB) to enhance micro-scale cues under cluttered backgrounds. Experiments are conducted on a 650-image dataset (200 real and 450 synthesized) with a 7:2:1 split, in which synthesized samples are used only for training and the validation/test sets contain only real images. CST-YOLO achieves 0.990 mAP@0.5 and 0.926 mAP@0.5:0.95 at 139 FPS, and ablation results indicate complementary contributions from LAPM, SCS-Former, and SOFB. These results support the effectiveness of combining realistic synthesis and architecture-level robustness for real-time micro-crack detection in complex scenes.
(This article belongs to the Section Fault Diagnosis & Sensors)
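
OpenCV exposes Poisson (gradient-domain) image editing directly, which is one way to realize the described blending; the file names and paste location below are hypothetical.

import cv2
import numpy as np

# Gradient-domain (Poisson) blending of a crack patch into a clean
# concrete background; inputs and output paths are placeholders.
bg = cv2.imread("concrete_bg.png")
crack = cv2.imread("crack_patch.png")
mask = 255 * np.ones(crack.shape[:2], dtype=np.uint8)   # blend whole patch
center = (bg.shape[1] // 2, bg.shape[0] // 2)           # (x, y) in background
blended = cv2.seamlessClone(crack, bg, mask, center, cv2.NORMAL_CLONE)
cv2.imwrite("synthetic_microcrack.png", blended)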

28 pages, 9208 KB  
Article
Knowledge-Aided Multichannel SAR Clutter Suppression Algorithm in Complex Scenes
by Yun Zhang, Niezipeng Kang, Zuzhen Huang, Qinglong Hua and Hang Ren
Remote Sens. 2026, 18(6), 879; https://doi.org/10.3390/rs18060879 - 12 Mar 2026
Viewed by 262
Abstract
Multichannel synthetic aperture radar (SAR) achieves high-resolution imaging while significantly enhancing the spatial degrees of freedom of the SAR system. As SAR hardware performance continues to improve, observed scenes are transitioning from simple to complex, and the increasingly complex clutter components they introduce make clutter suppression more challenging. Traditional multichannel clutter suppression algorithms usually assume that the observed scene, as a whole, satisfies the independent and identically distributed (IID) condition; in actual complex scenes, this assumption often proves difficult to uphold. Achieving effective clutter suppression in complex scenes therefore remains a challenge for SAR. In this paper, we propose a knowledge-aided (KA) multichannel SAR clutter suppression algorithm for complex scenes. First, the single-channel image is processed at the superpixel level, and a superpixel fusion algorithm is proposed that adaptively realizes a refined classification of the complex scene. Then, a two-step clutter suppression method is proposed, combining multi-strategy clutter suppression preprocessing with sparse Bayesian residual clutter suppression. This approach not only provides effective classification information for complex scenes but also achieves more efficient clutter suppression based on that information. Finally, the clutter suppression performance of the algorithm in complex scenes is validated with measured data.
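
Superpixel-level processing of a single-channel image is commonly bootstrapped with SLIC. The sketch below (hypothetical file name and parameters) shows only the partitioning step and a simple per-superpixel statistic, not the paper's fusion algorithm.

import numpy as np
from skimage.segmentation import slic

# Partition a single-channel SAR amplitude image into superpixels, then
# compute a per-superpixel mean amplitude as a crude region descriptor.
img = np.abs(np.load("sar_channel0.npy"))                # amplitude image
labels = slic(img, n_segments=400, compactness=0.1, channel_axis=None)
means = {k: img[labels == k].mean() for k in np.unique(labels)}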

24 pages, 5693 KB  
Article
From Geometric Alignment to Scale Balance: Directional Strip Convolution and Efficient Scale Fusion for Remote Sensing Ship Detection
by Jing Sun, Guoyou Shi, Yaxin Yang and Xiaolian Cheng
Remote Sens. 2026, 18(6), 873; https://doi.org/10.3390/rs18060873 - 12 Mar 2026
Viewed by 406
Abstract
Optical remote sensing ship detection faces significant challenges in realistic maritime scenes due to strong background clutter (e.g., docks, shorelines, wake streaks), extreme scale variation, and the elongated geometry of ships with diverse orientations. These factors frequently lead to geometric misalignment, unstable localization, and false alarms, particularly in congested ports and complex sea states. To enhance robustness under clutter while retaining the set prediction efficiency of DETR, we propose the Directional Efficient Network (DENet), a structure-aware enhancement built upon RT-DETR. DENet introduces two complementary components. First, Directional Strip Convolution (DSConv) replaces the standard 3×3 convolution for spatial mixing. By predicting offsets conditioned on input features, DSConv performs strip aggregation that aligns with slender hull structures, thereby suppressing interference from line-shaped background patterns. Second, Efficient Scale Fusion (ESF) augments the Hybrid Encoder as an additive residual correction. It combines multiple receptive field branches with lightweight differential compensation to balance low-frequency context and high-frequency structural transitions, ensuring stable multi-scale fusion in cluttered scenes. Extensive experiments demonstrate the effectiveness of DENet. On ShipRSImageNet, APval improves from 58.8% to 63.2% and AP50val increases from 68.5% to 73.6%. Consistent gains are also observed on NWPU VHR-10, where APval reaches 63.0% and AP50val reaches 94.6%, alongside improvements on the Infrared Ship Database and VisDrone2019-DET, validating the method’s generalization capabilities.
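
Without the learned offsets, the strip-aggregation idea reduces to parallel 1×k and k×1 convolutions, which already match slender hull geometry better than a square window. A static sketch (strip length k assumed; the paper's offset prediction is omitted):

import torch.nn as nn

class StripConv(nn.Module):
    # Static strip convolution: horizontal and vertical 1xk aggregation
    # suited to line-like structures, without DSConv's learned offsets.
    def __init__(self, channels, k=9):
        super().__init__()
        self.h = nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2))
        self.v = nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0))

    def forward(self, x):
        return self.h(x) + self.v(x)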

25 pages, 4978 KB  
Article
Full Polarimetric Scattering Matrix Estimation with Single-Channel Echoes via Time-Varying Polarization Modulation
by Yan Chen, Zhanling Wang, Zhuang Wang and Yongzhen Li
Remote Sens. 2026, 18(6), 870; https://doi.org/10.3390/rs18060870 - 11 Mar 2026
Viewed by 277
Abstract
Polarimetric information is essential for scattering interpretation and target characterization in synthetic aperture radar (SAR) remote sensing, yet many resource-constrained platforms (e.g., small satellites and unmanned aerial vehicles (UAVs)) operate with limited polarization modes or even a single radio frequency (RF) chain, which limits full polarimetric scattering acquisition. To address this limitation, this paper proposes a single-channel framework for estimating the full polarization scattering matrix (PSM) enabled by time-varying polarization modulation. The transmit/receive polarization states are steered along predefined trajectories on the Poincaré sphere to generate time-varying polarization tags that are encoded into the received echoes through the target’s polarization-varying response. A compact observation model is then derived to relate the single-channel echoes, the known polarization tags, and the unknown PSM; based on this model, the PSM is estimated via a least squares formulation with a low-rank approximation. Simulation results demonstrate the robust reconstruction of the full polarimetric scattering matrix under diverse modulation trajectories. For arbitrarily chosen random point targets, when the signal-to-noise ratio (SNR) exceeds −20 dB, the polarimetric similarity coefficient approaches 1, and the estimation errors of Pauli power components converge toward zero. Furthermore, the method’s reliability is validated on distributed vegetation clutter. Quantitative metrics demonstrate near-perfect statistical consistency, with polarimetric entropy and alpha angle errors within 0.14%. Overall, the proposed approach provides a practical pathway to enhance the availability of full polarimetric scattering information under limited-observation conditions, confirming its feasibility for downstream analysis in complex natural scenes while maintaining a single radio frequency (RF) chain architecture augmented by a polarization modulator.
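
The least-squares step can be sketched directly from an observation model of the form y_t = r_t^H S t_t: each pulse contributes one linear equation in vec(S). The code below is a minimal illustration; the paper's tag design and low-rank refinement are omitted.

import numpy as np

def estimate_psm(y, tx_states, rx_states):
    # y: (T,) complex single-channel echoes; tx_states/rx_states: lists of
    # 2-element complex Jones vectors (the known polarization tags).
    # Per pulse: y[t] = rx^H @ S @ tx, one linear equation in vec(S).
    rows = [np.kron(t, np.conj(r)) for t, r in zip(tx_states, rx_states)]
    A = np.stack(rows)                         # (T, 4) observation matrix
    s, *_ = np.linalg.lstsq(A, y, rcond=None)
    return s.reshape(2, 2, order="F")          # column-major vec -> 2x2 PSM

With four or more well-conditioned tags the system is overdetermined, which is what makes the least-squares estimate robust to noise.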

27 pages, 15115 KB  
Article
An Object Tracking Algorithm Based on Multi-Scale Attention and Adaptive Fusion
by Deyu Zhang, Haiyang Li and Yanhui Lv
Appl. Sci. 2026, 16(6), 2646; https://doi.org/10.3390/app16062646 - 10 Mar 2026
Viewed by 316
Abstract
Single-object tracking in complex scenes faces challenges such as drastic target scale variation and strong background interference. To address these issues, an object tracking algorithm based on multi-scale attention and adaptive fusion is proposed. The method integrates a multi-scale attention module and an adaptive gated fusion module, enabling the adaptive mining of key features and dynamic adjustment of fusion weights across multi-level features. This effectively highlights target regions, suppresses redundant information, and enhances the model’s discriminative capability and robustness under complex backgrounds and occlusion. Experiments are conducted on the OTB100 and UAV123 datasets. Results show that, compared with the baseline model, the proposed algorithm improves the success rate and precision by 1.9% and 3.3%, respectively, on OTB100, and by 2.9% and 3.5%, respectively, on UAV123. Moreover, it achieves superior performance when facing typical challenging attributes such as occlusion, scale variation, and background clutter. In summary, the proposed algorithm enhances both tracking accuracy and robustness, offering a viable approach for object tracking under complex conditions.
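
Adaptive gated fusion between two feature levels can be sketched as a learned, spatially varying convex combination; this is a generic illustration, not the paper's exact module.

import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    # A 1x1 conv predicts a per-pixel gate from both inputs; the gate then
    # mixes the two levels, dynamically adjusting their fusion weights.
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, shallow, deep):          # both (B, C, H, W)
        g = self.gate(torch.cat([shallow, deep], dim=1))
        return g * shallow + (1 - g) * deep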
