Search Results (1,627)

Search Parameters:
Keywords = remote sensing object detection

22 pages, 11365 KB  
Article
Addressing Dense Small-Object Detection in Remote Sensing: An Open-Vocabulary Object Detection Framework
by Menghan Ju, Yingchao Feng, Wenhui Diao and Chunbo Liu
Remote Sens. 2026, 18(6), 851; https://doi.org/10.3390/rs18060851 - 10 Mar 2026
Viewed by 166
Abstract
Remote sensing open-vocabulary object detection focuses on identifying and localizing unseen categories within remote sensing imagery. However, constrained by characteristics inherent to remote sensing scenarios, such as dense target distribution, complex background interference, and drastic scale variations, existing methods are prone to background noise interference when extracting features from dense, small-target regions, which weakens semantic representation and reduces localization accuracy. We therefore propose RS-DINO to address these challenges. First, to prevent small-object features from being obscured by the background, the feature extraction module incorporates a multi-scale large-kernel attention mechanism, which expands the receptive field while enhancing local detail modelling and markedly improves the feature representation of minute targets. Second, a cross-modal feature fusion module employing bidirectional cross-attention achieves deep alignment between image and textual features, after which a language-guided query selection mechanism improves detection accuracy through hybrid query strategies. Finally, to enhance the spatial sensitivity and channel adaptability of the fused features, the multimodal decoder integrates a convolutional gated feedforward network, significantly boosting the model’s robustness in dense, multi-scale scenes. Experiments on DIOR, DOTA v2.0, and NWPU-VHR10 demonstrate substantial gains, with fine-tuned RS-DINO surpassing existing methods by 3.5%, 3.7%, and 4.0% in accuracy, respectively. Full article
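The abstract names bidirectional cross-attention but does not spell it out; as a rough, dependency-free orientation, here is plain scaled dot-product cross-attention over toy image-patch and text-token embeddings (a sketch, not RS-DINO's implementation):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        out.append([sum(wj * v[i] for wj, v in zip(w, values))
                    for i in range(len(values[0]))])
    return out

# Toy image-patch and text-token embeddings (2-d for readability).
img = [[1.0, 0.0], [0.0, 1.0]]
txt = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Bidirectional: image queries attend to text, and text queries to image.
img_enriched = cross_attention(img, txt, txt)
txt_enriched = cross_attention(txt, img, img)
```

Each output row is a convex combination of the other modality's features, which is what lets text semantics flow into image queries and vice versa.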

20 pages, 21647 KB  
Article
Spatial Orthogonal and Boundary-Aware Network for Rotated and Elongated-Target Detection
by Yong Liu, Zhengbiao Jing, Yinghong Chang and Donglin Jing
Algorithms 2026, 19(3), 206; https://doi.org/10.3390/a19030206 - 9 Mar 2026
Viewed by 82
Abstract
In recent years, the refinement of bounding box representations has emerged as a major research focus in remote sensing. Nevertheless, mainstream detection algorithms typically ignore the disruptive impacts induced by the diverse morphologies and arbitrary orientations of high-aspect-ratio aerial objects throughout model training, thereby giving rise to several critical technical challenges: (1) Anisotropic information distribution: Target features are highly concentrated in one spatial dimension but sparse in the other, with significant feature differences across bounding box parameters, breaking the symmetry of feature distribution. (2) Missing high-quality positive samples: IoU-based assignment strategies fail to adequately capture the symmetric structural characteristics of elongated targets, resulting in incomplete coverage of critical features. (3) Loss function gradient instability: Small deviations in large-aspect-ratio bounding boxes cause drastic loss value fluctuations, as the asymmetric gradient changes hinder stable optimization directions during training. To address the challenges, we propose a Spatial Orthogonal and Boundary-Aware Network (SOBA-Net) for rotated and elongated target detection, leveraging symmetry-aware designs to enhance feature representation. Specifically, spatial staggered convolutions are constructed to fuse local and directional contextual features, effectively modeling long-range symmetric information across multiple spatial scales and reducing background noise interference. Secondly, the designed Symmetric-Constrained Label Assignment (SC-LA) introduces an IoU-weighted function, ensuring high-quality samples with symmetric structural features are classified as positive samples. Ultimately, the designed Gradient Dynamic Equilibrium Loss Function mitigates the problem of unstable gradients associated with high-aspect-ratio objects by enforcing symmetrical gradient regulation across samples with negligible localization deviations. 
Comprehensive evaluations across three representative remote sensing benchmarks—DOTA, UCAS-AOD, and HRSC2016—corroborate the superiority of the symmetry-aware enhancement schemes, which combine straightforward implementation with efficient inference deployment. Full article
(This article belongs to the Special Issue Advances in Deep Learning-Based Data Analysis)
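The SC-LA idea of relaxing IoU-based assignment for elongated targets can be sketched with axis-aligned boxes (rotated IoU needs polygon clipping, and the aspect-ratio weighting below is an assumption for illustration, not the paper's function):

```python
def iou(a, b):
    """Axis-aligned IoU; boxes are (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def assign_positives(anchors, gt, base_thr=0.5):
    """Hypothetical IoU-weighted assignment: relax the positive threshold
    for high-aspect-ratio targets, whose best anchor IoU is structurally
    lower than for compact objects."""
    w, h = gt[2] - gt[0], gt[3] - gt[1]
    aspect = max(w, h) / max(min(w, h), 1e-6)
    thr = base_thr / (1.0 + 0.1 * min(aspect, 5.0))  # assumed weighting
    return [i for i, a in enumerate(anchors) if iou(a, gt) >= thr]
```

For an elongated ground truth, anchors that would miss a fixed 0.5 cut can still become positives, which is the sampling behaviour the abstract argues for.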

27 pages, 2940 KB  
Article
A Unified Framework for Vehicle Detection, Tracking, and Counting Across Ground and Aerial Views Using Knowledge Distillation with YOLOv10-S
by Md Rezaul Karim Khan and Naphtali Rishe
Remote Sens. 2026, 18(5), 842; https://doi.org/10.3390/rs18050842 - 9 Mar 2026
Viewed by 219
Abstract
Accurate and reliable vehicle detection, tracking, and counting across different surveillance platforms are fundamental requirements for developing smart Traffic Management Systems (TMS) and promoting sustainable urban mobility. Recent advances in both ground-level surveillance and remote sensing using deep learning have opened new opportunities for extracting detailed vehicular information from high-resolution aerial and surveillance video data. This work presents a unified, real-time vehicle analysis framework that integrates lightweight deep learning–based detection, robust multi-object tracking, and trajectory-driven counting within a single modular pipeline. The proposed framework employs the “You Only Look Once” detector YOLOv10-S as its detection backbone and enhances its robustness through supervision-level knowledge distillation without introducing any architectural modifications. Temporal consistency is enforced using an observation-centric multi-object tracking algorithm (OC-SORT), enabling stable identity preservation under camera motion and dense traffic conditions. Vehicle counting is performed using a trajectory-based virtual gate strategy, reducing duplicate counts and improving counting reliability. Comprehensive experiments conducted on the UA-DETRAC and VisDrone benchmarks show that the proposed framework effectively balances detection performance, tracking robustness, counting accuracy, and real-time efficiency in both ground-based and aerial surveillance settings. Furthermore, cross-dataset evaluations under direct train–test transfer highlight the inherent challenges of domain shift while showing that knowledge distillation consistently improves robustness in detection, tracking identity consistency, and vehicle counting. Overall, this framework enables effective real-world traffic monitoring by adopting a scalable and practical system design, where reliability is prioritized over architectural complexity. Full article
(This article belongs to the Section Urban Remote Sensing)
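The trajectory-based virtual gate strategy can be illustrated in a few lines: a tracked identity is counted at most once, when its trajectory changes sides of a gate segment. The crossing rule below is an assumption; the paper's exact gating logic may differ.

```python
def side(p, a, b):
    """Signed side of point p relative to the directed gate line a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def count_crossings(tracks, gate_a, gate_b):
    """Count each track identity at most once, on its first side change,
    which is what suppresses duplicate counts from jittery detections."""
    count = 0
    for traj in tracks.values():
        signs = [side(p, gate_a, gate_b) for p in traj]
        for s0, s1 in zip(signs, signs[1:]):
            if s0 * s1 < 0:   # sign change => the track crossed the gate
                count += 1
                break         # one count per vehicle identity
    return count

tracks = {
    1: [(0, 0), (0, 2), (0, 4)],           # drives through the gate
    2: [(1, 0), (1, 1), (1, 2)],           # stays on one side
    3: [(2, 0), (2, 4), (2, 2), (2, 4)],   # jitters back over the gate
}
crossed = count_crossings(tracks, (-5, 3), (5, 3))  # gate: the line y = 3
```

Track 3 oscillates across the gate but is still counted once, which is the duplicate-suppression property the abstract claims for the strategy.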

23 pages, 14232 KB  
Article
A Dual-Branch Perception Network for High-Precision Oriented Object Detection in Remote Sensing
by Qi Wang and Wei Sun
Remote Sens. 2026, 18(5), 839; https://doi.org/10.3390/rs18050839 - 9 Mar 2026
Viewed by 185
Abstract
With the rapid evolution of remote sensing earth observation technology, high-resolution object detection is crucial in military and civilian domains but faces challenges from expansive views and complex backgrounds. Small objects are particularly challenging due to their low pixel coverage, poor textures, and susceptibility to drastic illumination changes and background clutter. To address these problems, this paper proposes MDCA-YOLO for oriented object detection. A Dual-Branch Perception Module (DBPM) is designed utilizing a synergistic mechanism of large-kernel and strip convolutions to establish long-range dependencies, accurately capturing geometric features of tiny objects even in the absence of local details; Multi-Adaptive Selection Fusion (MASF) is proposed to address cross-scale feature loss by adaptively enhancing feature response while suppressing background noise; furthermore, a reconstructed decoupled detection head, CoordAttOBB, significantly improves angle regression accuracy while reducing complexity. Experimental results on the DIOR-R dataset show MDCA-YOLO surpasses YOLO11s, improving mAP50 and mAP50:95 by 2.5% and 2.7%, respectively, effectively proving the algorithm’s superiority in remote sensing tasks. Full article
(This article belongs to the Topic Computer Vision and Image Processing, 3rd Edition)
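Strip convolutions earn their efficiency by factoring a k × k context window into two 1-D passes, one horizontal and one vertical. A minimal mean-filter sketch (illustrative only; DBPM uses learned kernels):

```python
def strip_conv_1d(row, k):
    """'Strip' (1 x k) mean filter along one axis, zero-padded."""
    r = k // 2
    padded = [0.0] * r + row + [0.0] * r
    return [sum(padded[i:i + k]) / k for i in range(len(row))]

def strip_conv_2d(img, k):
    """Horizontal then vertical strip passes: k x k box context for the
    cost of two 1-D convolutions -- the efficiency argument behind strip
    convolutions for long, thin targets."""
    h, w = len(img), len(img[0])
    horiz = [strip_conv_1d(row, k) for row in img]
    cols = [strip_conv_1d([horiz[r][c] for r in range(h)], k)
            for c in range(w)]
    return [[cols[c][r] for c in range(w)] for r in range(h)]
```

On a constant image the interior response equals the input, confirming the two 1-D passes compose into a full 2-D box window.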

26 pages, 13700 KB  
Article
DG-Net: Few-Shot Remote Sensing Detection with Dynamic Dual-Stream Collaboration and Generative Meta-Learning
by Shanliang Liu, Xinnan Shao, Yan Dong, Qihang He and Chunlei Li
Symmetry 2026, 18(3), 461; https://doi.org/10.3390/sym18030461 - 7 Mar 2026
Viewed by 147
Abstract
Existing research has demonstrated that meta-learning methods hold considerable promise in addressing the challenges posed by few-shot object detection. However, remote sensing scenarios present two major challenges. The sparse features of small objects provide insufficient support information for query enhancement, and significant morphological variations caused by lighting and viewpoint differences hinder intra-class consistency capture via direct alignment in few-shot learning. To address these challenges, we propose a generative meta-learning detection framework. The framework first introduces a Dynamic Relation Dual-Stream Network to achieve dynamic support-query feature alignment through joint modeling of evolutionary and relational features, thereby enhancing representation in few-shot conditions. Second, an Optimal Transport-based Generative Meta-Learner is developed to mitigate feature distribution bias via generative augmentation in latent space. Additionally, an Orthogonal Frequency Decomposition Head is incorporated to adaptively separate query features into low-frequency contour and high-frequency detail components, effectively suppressing background noise interference. Experiments on multiple remote sensing datasets demonstrate that the proposed method achieves consistent performance gains over leading baseline methods in various few-shot settings. Its effectiveness is further validated across different backbone networks, highlighting strong generalization in few-shot remote sensing object detection. Full article
(This article belongs to the Special Issue Symmetry/Asymmetry in Evolutionary Computation and Machine Learning)
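The abstract does not detail the optimal-transport formulation; entropic OT solved with Sinkhorn iterations is the standard lightweight choice, sketched here on toy support/query feature distributions (an assumption, not the paper's solver):

```python
import math

def sinkhorn(a, b, cost, eps=0.1, iters=200):
    """Entropic optimal transport via Sinkhorn scaling: returns a plan
    whose row/column sums match the source/target distributions."""
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    n, m = len(a), len(b)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

a = [0.5, 0.5]                    # toy support-feature distribution
b = [0.5, 0.5]                    # toy generated-feature distribution
cost = [[0.0, 1.0], [1.0, 0.0]]   # cheap to match like with like
plan = sinkhorn(a, b, cost)
```

The resulting plan concentrates mass on the cheap (diagonal) pairings, which is the sense in which OT aligns a generated distribution with the support distribution.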

28 pages, 48517 KB  
Article
DDF-DETR: A Multi-Scale Spatial Context Method for Field Cotton Seedling Detection
by Feng Xu, Huade Zhou, Yinyi Pan, Yi Lu and Luan Dong
Agriculture 2026, 16(5), 615; https://doi.org/10.3390/agriculture16050615 - 7 Mar 2026
Viewed by 280
Abstract
Accurate assessment of cotton emergence rates is essential for precision agriculture management, and unmanned aerial vehicle (UAV) imagery provides a scalable means for field-level monitoring. However, cotton seedling detection from UAV images faces persistent challenges: individual seedlings appear as small targets with diverse morphologies across varying flight altitudes; strong plastic film reflections, weeds, and soil cracks introduce substantial background interference; and “missing seedling” targets, which manifest as negative space features, exhibit high similarity to background noise. Existing CNN–Transformer hybrid detection architectures are limited by fixed convolutional receptive fields that cannot adapt to multi-scale target variations, attention mechanisms that lack explicit directional geometric modeling, and interpolation-based upsampling that attenuates high-frequency edge details of small targets. To address these issues, this paper proposes DDF-DETR (Dynamic-Direction-Frequency Detection Transformer), a multi-scale spatial context detection method based on RT-DETR. The method incorporates three components: a Dynamic Gated Mixer Block (DGMB) for adaptive multi-scale feature extraction with background noise suppression, a Direction-Aware Adaptive Transformer Encoder (DAATE) for directional geometric feature modeling at linear computational complexity, and a Frequency-Aware Sub-pixel Upsampling Network (FASN) for high-frequency detail recovery in the feature pyramid. On the self-constructed Xinjiang cotton field dataset, DDF-DETR achieves 83.72% mAP@0.5 and 63.46% mAP@0.5:0.95, representing improvements of 2.38% and 5.28% over the baseline RT-DETR-R18, while reducing the parameter count by 30.6% and computational cost to 42.8 GFLOPs. Generalization experiments on the VisDrone2019 and TinyPerson datasets further validate the robustness of the proposed method for small target detection across different scenarios. Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
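Sub-pixel upsampling, the operation a "Frequency-Aware Sub-pixel Upsampling Network" presumably builds on (our reading; the abstract does not give FASN's layout), rearranges r² channels into an r×-larger map instead of interpolating, which is why it can carry high-frequency detail:

```python
def pixel_shuffle(feat, r):
    """Rearrange an (r*r, H, W) feature into an (H*r, W*r) map: each
    group of r*r channels supplies the r x r sub-pixels of one cell."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    assert c == r * r
    out = [[0.0] * (w * r) for _ in range(h * r)]
    for ch in range(c):
        dy, dx = divmod(ch, r)       # which sub-pixel this channel fills
        for y in range(h):
            for x in range(w):
                out[y * r + dy][x * r + dx] = feat[ch][y][x]
    return out
```

Because every output value is copied, not interpolated, no high-frequency content is smoothed away, unlike the bilinear upsampling the paper criticizes.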

32 pages, 7690 KB  
Article
FSSC-Net: A Frequency–Spatial Self-Calibrated Network for Task-Adaptive Remote Sensing Image Understanding
by Hao Yuan and Bin Zhang
Remote Sens. 2026, 18(5), 824; https://doi.org/10.3390/rs18050824 - 6 Mar 2026
Viewed by 329
Abstract
Although recent studies have achieved remarkable progress in remote sensing image understanding by fusing spatial- and frequency-domain features to leverage their complementary strengths, they still face two key limitations: frequency modeling remains rigid due to static constraints, limiting adaptability, and spatial–frequency fusion often suffers from poor generalization and instability across tasks and network depths. Our experiments reveal that the relative importance of low- and high-frequency components varies dynamically across feature hierarchies and training stages, indicating that frequency information is inherently task-dependent and stage-aware. Motivated by these observations, we propose the Frequency–Spatial Self-Calibrated Network (FSSC-Net), a task-driven framework for adaptive frequency modeling and collaborative spatial–frequency fusion. FSSC-Net incorporates a lightweight, plug-and-play self-calibrated frequency modeling mechanism, comprising a Dynamic Frequency Selection Module and a Task-Guided Calibration Fusion Module. This mechanism adaptively modulates frequency responses via soft masks, enabling dynamic extraction of task-relevant low- and high-frequency components and effective alignment between spatial- and frequency-domain features. Moreover, we present a systematic analysis of frequency importance across tasks and training stages, providing quantitative evidence for the necessity of task-calibrated frequency modeling. Extensive experiments on various benchmarks demonstrate that FSSC-Net consistently outperforms state-of-the-art methods, exhibiting strong task adaptability and robust cross-task generalization. Full article
(This article belongs to the Section Remote Sensing Image Processing)
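The soft low/high-frequency split with gated recombination can be illustrated without an FFT by using a moving-average low-pass as a stand-in for the paper's frequency-domain soft masks (our simplification); the key property, that the gated components still reconstruct the input when both gates are 1, holds either way:

```python
def low_pass(x, k=3):
    """Moving-average low-pass over a 1-D signal (edge windows shrink)."""
    r = k // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - r):i + r + 1]
        out.append(sum(window) / len(window))
    return out

def soft_calibrate(x, g_low, g_high):
    """Split x into low- and high-frequency parts, then reweight each by
    a task-driven gate; with g_low = g_high = 1 the input is recovered."""
    low = low_pass(x)
    high = [xi - li for xi, li in zip(x, low)]
    return [g_low * l + g_high * h for l, h in zip(low, high)]
```

Varying `g_low`/`g_high` per task and stage is the mechanism the abstract calls dynamic frequency selection; the gates here are scalars for clarity.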

26 pages, 2634 KB  
Systematic Review
A Systematic Review of Terrestrial Laser Scanning (TLS) Applications in Sediment Management
by Md. Emon Sardar, Muhammad Arifur Rahman, Md. Rasheduzzaman, Md. Shamsuzzoha, Abul Kalam Azad, Ayesha Akter, Kamrunnahar Ishana, Ahmed Parvez, Md. Anwarul Abedin, Mohammad Kabirul Islam, Md. Sagirul Islam Majumder, Mehedi Ahmed Ansary and Rajib Shaw
NDT 2026, 4(1), 10; https://doi.org/10.3390/ndt4010010 - 6 Mar 2026
Viewed by 262
Abstract
Sediment management is defined as the strategic monitoring and control of erosion, transport, and deposition processes to maintain environmental and infrastructural stability. Terrestrial laser scanning (TLS) has emerged as a critical high-precision technology for monitoring sediment dynamics, erosion processes, and geomorphic change detection across diverse environments, including riverine, coastal, watershed, and infrastructure-related landscapes. The field of TLS has seen significant advancements in recent years, including improvements in data accuracy, enhanced operational performance, artificial intelligence (AI) and machine learning-based processing, and integration with other remote sensing tools such as unmanned aerial vehicles (UAVs) and satellite light detection and ranging (LiDAR); this study focuses on these developments, which have further extended the application prospects of TLS technology. Despite these advancements, there remains a crucial need to systematically map global research trends and identify the effectiveness, limitations, and knowledge gaps of TLS in sediment management. The methodological advantages and challenges of TLS applications provide insights into its developing role in enhancing sediment monitoring and environmental resilience. The objective of this study is to synthesize the current state of sediment management by conducting a systematic review of 108 peer-reviewed research papers retrieved from academic databases, including Google Scholar, ResearchGate, ScienceDirect, Scopus, and Web of Science, from 28 countries, published between 2000 and 2025. Following the PRISMA 2020 guidelines, the study evaluates the effectiveness of TLS methodologies in comparison to conventional techniques and management procedures, and examines their capacity to enhance measurement accuracy, reduce error margins, and improve structural guidelines, particularly by advancing TLS technology through the integration of AI and machine learning (ML) algorithms. The findings indicate that TLS and Iterative Closest Point (ICP) techniques can enhance the analysis of 3D models of dam deformation, ensuring improved structural monitoring and safety. The findings offer insights into the evolving role of TLS in sediment monitoring, emphasizing its potential for enhancing environmental management and climate resilience strategies. Furthermore, this review identifies future research directions to optimize TLS applications in sediment management through interdisciplinary approaches. Full article
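A staple of TLS-based geomorphic change detection, though not spelled out in this abstract, is the DEM of Difference with a level-of-detection threshold. A minimal sketch of that workflow step (variable names and the 0.02 m threshold are illustrative):

```python
def dem_of_difference(dem_before, dem_after, lod=0.02):
    """Per-cell elevation change (m) between two gridded TLS surveys:
    positive = deposition, negative = erosion; changes below the level
    of detection (lod) are treated as measurement noise."""
    return [[(b - a) if abs(b - a) >= lod else 0.0
             for a, b in zip(row0, row1)]
            for row0, row1 in zip(dem_before, dem_after)]

def net_volume(dod, cell_area):
    """Net volumetric change (m^3) over the grid."""
    return sum(v for row in dod for v in row) * cell_area
```

Summing the thresholded differences by cell area yields the net erosion or deposition volume that sediment budgets are built from.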

26 pages, 9001 KB  
Article
PSiam-HDSFNet: A Pseudo-Siamese Hybrid Dilation Spiral Feature Network for Flood Inundation Change Detection Based on Heterogeneous Remote Sensing Imagery
by Yichuang Luo, Xunqiang Gong, Yuanxin Ye, Pengyuan Lv, Shuting Yang, Ailong Ma and Yanfei Zhong
Remote Sens. 2026, 18(5), 788; https://doi.org/10.3390/rs18050788 - 4 Mar 2026
Viewed by 168
Abstract
Flood change detection from remote sensing data can be used to identify post-disaster flooded areas, providing decision support for emergency rescue and post-disaster reconstruction. Although the combination of SAR and optical images effectively addresses obscuration by clouds and rain, the inherent difference in their imaging mechanisms poses a challenge to improving the accuracy of flood area change detection. Furthermore, existing flood inundation change detection methods based on heterogeneous remote sensing imagery struggle to distinguish small ground objects within the background from the actual inundated regions. Therefore, a pseudo-Siamese hybrid dilation spiral feature network (PSiam-HDSFNet) is proposed in this paper. Firstly, the feature extraction pipeline progressively processes optical and SAR images through five-layer Enhanced Deep Residual Blocks and five-layer Residual Dense Blocks, respectively. A Hybrid Dilated Pyramid (HDP) module based on a sawtooth wave-like dilated coefficient is designed to enhance multi-scale semantics of deep features in order to selectively reinforce semantic features in flood areas and weaken the noise semantics from small ground objects. Then, a Spiral Feature Pyramid (SFP) module is designed to make the deep features of SAR and optical images more consistent in spatial structure and numerical distribution patterns, so that the features of flood areas become more prominent while the noise semantics from small ground objects are further suppressed. After that, the Galerkin-type attention with linear complexity is introduced to the decoder, rapidly reconstructing the abstract semantic information of floods into interpretable flood features. Finally, the Align OPT-SAR (AlignOS) method is designed to align SAR and optical image features, enabling subsequent flood area detection. Seven metrics are adopted in the comparison between PSiam-HDSFNet and the other 14 methods. 
The results indicate that PSiam-HDSFNet improves change detection accuracy by extracting and processing deep features of the two image types without image domain translation; its F1 scores improve by 7.704%, 7.664%, 4.353%, and 1.111% over the second-best method across the four flood-coverage detection tasks. Full article
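The "sawtooth wave-like" dilation schedule follows the hybrid dilated convolution recipe: repeat a rising cycle of rates so stacked layers widen the receptive field without gridding artifacts. The HDP module's exact coefficients are not given; the cycle below is illustrative:

```python
def sawtooth_dilations(cycle, n_layers):
    """Repeat a rising dilation cycle (e.g. 1, 2, 5) across layers, the
    sawtooth pattern that avoids the gridding holes a constant large
    dilation would leave."""
    return [cycle[i % len(cycle)] for i in range(n_layers)]

def receptive_field(kernel, dilations):
    """Receptive field of stacked dilated convolutions at stride 1."""
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d
    return rf
```

Three 3-wide layers with rates 1, 2, 5 already see 17 input positions, which is how such a pyramid reinforces broad flood-area semantics while small-object noise stays sub-receptive-field.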

29 pages, 2468 KB  
Article
From Structural Degradation to Semantic Misalignment: A Unified Frequency-Aware Compensation Framework for Remote Sensing Object Detection
by Hao Yuan, Bin Zhang, Yachuan Wang and Qianyao Qiang
Remote Sens. 2026, 18(5), 777; https://doi.org/10.3390/rs18050777 - 4 Mar 2026
Viewed by 226
Abstract
Remote sensing object detection within multi-scale frameworks remains challenging, largely due to structural degradation and semantic misalignment introduced during cross-scale semantic enhancement. As feature hierarchies deepen, high-frequency details for small-object localization decay, while nonlinear transformations and receptive field asymmetry cause cross-scale semantic and spatial offsets. While existing feature pyramid-based approaches improve detection performance through multi-scale fusion or semantic aggregation, they fail to fundamentally address the cumulative information degradation arising from hierarchical feature extraction. To this end, we propose CFBA-FPN, a unified shallow–deep cross-scale feature compensation framework that explicitly models both frequency discrepancies and semantic offsets across scales. Specifically, shallow features are exploited as structural and spatial anchors to inject lost high-frequency information into deeper layers, effectively mitigating structural degradation. Meanwhile, a cross-scale collaborative semantic alignment strategy is introduced to correct semantic inconsistencies and spatial misalignments among multi-scale features. Building upon these designs, a cascaded gated fusion mechanism is developed to adaptively balance shallow structural compensation and deep semantic representation, thereby suppressing background noise and enhancing small-object responses. Extensive experiments on the AI-TOD, VisDrone, and DIOR benchmarks demonstrate that CFBA-FPN consistently improves localization accuracy and recognition capability, validating its effectiveness and generalization ability in remote sensing object detection. Full article
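Injecting a shallow layer's high-frequency residual into an upsampled deep feature can be sketched in 1-D: the residual is what a downsample-upsample round trip destroys. The scalar gate stands in for the cascaded gated fusion, which is adaptive in the paper; everything here is a toy analogue, not CFBA-FPN itself.

```python
def downsample2(x):
    """Average-pool a 1-D feature by 2 (toy stand-in for a pyramid level)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]

def upsample2(x):
    """Nearest-neighbour upsample by 2."""
    return [v for v in x for _ in range(2)]

def compensate(shallow, deep_up, gate=0.5):
    """Add the shallow layer's high-frequency residual (what pooling
    removed) into the upsampled deep feature, weighted by a gate."""
    blur = upsample2(downsample2(shallow))
    high = [s - b for s, b in zip(shallow, blur)]
    return [d + gate * h for d, h in zip(deep_up, high)]
```

With gate 1 the deep feature inherits exactly the oscillation the pyramid lost, the structural detail small-object localization depends on.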

17 pages, 2985 KB  
Article
Automated BRDF Measurement for Aerospace Materials and 1D-CNN-Based Estimation of Mixed-Material Composition
by Depu Yao, Yulai Sun, Limin He, Heng Wu, Guanyu Lin, Jianing Wang and Zihui Zhang
Sensors 2026, 26(5), 1560; https://doi.org/10.3390/s26051560 - 2 Mar 2026
Viewed by 177
Abstract
With the growing global emphasis on space resources, the significance of space detection and surveillance technologies has escalated. Currently, space-based optical surveillance stands as the primary means for acquiring information on space objects. However, constrained by the diffraction limits of space telescopes, distant space objects are typically imaged as point sources. The resulting lack of sufficient spatial resolution renders traditional image-based recognition algorithms ineffective. In contrast, the Bidirectional Reflectance Distribution Function (BRDF) fully characterizes surface light scattering properties through four-dimensional features, significantly outperforming traditional two-dimensional spectral techniques in material identification. Consequently, leveraging BRDF signatures at varying phase angles has emerged as an effective approach for Space Object Identification. In this study, we developed an automated BRDF measurement system to characterize various typical aerospace materials and investigated the BRDF properties of mixed-material surfaces. A material composition ratio prediction model was constructed based on a One-Dimensional Convolutional Neural Network (1D-CNN). This model effectively extracts key features, including local slope variations and global waveform characteristics, from the BRDF curves. Experimental results demonstrate that the model achieves a maximum relative percentage error of 6.21%, implying a prediction accuracy for mixed-material composition ratios consistently exceeding 93.79%. Compared to image classification methods based on remote sensing imagery, the proposed approach offers higher computational efficiency, significantly reduced model complexity and computational cost, and enhanced robustness. This work provides essential data support for material identification by space-based telescopes and establishes an algorithmic and experimental foundation for intelligent space situational awareness systems. Full article
(This article belongs to the Section Optical Sensors)
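The "local slope variation" features the 1D-CNN extracts from BRDF curves can be mimicked with a single hand-set derivative kernel over a toy curve (illustrative only; the trained network learns its own filters):

```python
def conv1d(signal, kernel):
    """'Valid' 1-D convolution in the correlation form CNN layers use."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(xs):
    return [max(0.0, x) for x in xs]

# A difference kernel responds to local slope changes in the BRDF curve,
# the kind of feature the abstract says the 1D-CNN picks up.
brdf = [0.1, 0.1, 0.2, 0.6, 0.9, 0.9]   # toy specular-lobe rise
slope = relu(conv1d(brdf, [-1.0, 1.0]))
```

The strongest activation lands on the steepest part of the lobe, which is exactly the shape cue that separates one material mixture's curve from another's.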

20 pages, 5981 KB  
Article
YOLO11-MSCAM UAV Remote Sensing-Based Detection of Illegal Rare-Earth Mining with Multi-Scale Convolution and Attention Module
by Hengkai Li, Yingming Cai, Shengdong Nie and Kunming Liu
Remote Sens. 2026, 18(5), 738; https://doi.org/10.3390/rs18050738 - 28 Feb 2026
Viewed by 207
Abstract
Ion-adsorption rare-earth mining in southern China often leaves small, fragmented disturbances in rugged, forested terrain, making UAV-based enforcement challenging due to confusion with bare ground, canopy gaps, and shadows. We propose YOLO11-MSCAM, an enhanced YOLO11m detector in which the original SPPF at the backbone–neck junction is replaced by a Multi-Scale Convolution–Attention Module that cascades channel attention, spatial attention, and multi-scale residual convolutions to enhance context aggregation and suppress background clutter. We build a field-acquired UAV dataset, SIMA (0.05 m GSD; September–November 2023), generating 1630 non-overlapping 640 × 640 orthomosaic tiles split into 1320/147/163 for training/validation/testing; five-lens raw images (nadir + oblique) are additionally used as auxiliary training samples and for post-detection verification. On the test set, YOLO11-MSCAM achieves mAP@0.5 = 83.24%, mAP@0.5:0.95 = 58.29%, and F1 = 79.92%, outperforming YOLO11m and other detectors (YOLOv5m/6m/8m/9m/10m and Faster R-CNN with ResNet-50). With 19.67 M parameters, 67.34 GFLOPs@640, and 45.86 FPS, it supports tile-based batch screening to prioritize suspicious sites for field checks and evidence collection. Full article
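Channel attention in the squeeze-and-excite style is a plausible reading of MSCAM's first stage (the abstract does not give the module's exact form): pool each channel to one statistic, gate the channel by a learned function of it.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(feat, w):
    """Squeeze (global average pool per channel), excite (sigmoid gate),
    rescale. The per-channel weights `w` stand in for the module's
    learned layers -- an assumption for illustration."""
    gated = []
    for ch, wc in zip(feat, w):
        pooled = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        g = sigmoid(wc * pooled)
        gated.append([[v * g for v in row] for row in ch])
    return gated
```

Channels whose pooled response the gate deems uninformative (bare ground, shadow textures) are scaled down, which is the clutter-suppression role the abstract assigns to the attention cascade.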

22 pages, 87137 KB  
Article
FLD-Net for Floating Litter Detection in UAV Remote Sensing
by Xingyue Wang, Bin Zhou, Xia Ye, Lidong Wang and Zhen Wang
Remote Sens. 2026, 18(5), 736; https://doi.org/10.3390/rs18050736 - 28 Feb 2026
Viewed by 116
Abstract
Unmanned Aerial Vehicles provide a cost-effective solution for water environment monitoring, yet detecting floating litter remains challenging due to small target scales, complex geometries, and severe surface interferences. To bridge the data deficiency in this domain, this study introduces UAV-Flow, a multi-scenario benchmark dataset wherein small-scale targets constitute 78.9%. Building upon this foundation, we propose the Floating Litter Detection Network (FLD-Net), a lightweight, real-time detection framework tailored for edge deployment. Adopting a progressive optimization paradigm, FLD-Net integrates three cascaded enhancement modules to achieve holistic performance gains across feature extraction, cross-scale fusion, and noise suppression. Specifically, the Deformation Feature Extraction Module (DFEM) enhances backbone adaptability to small targets and non-rigid deformations; the Dynamic Cross-scale Fusion Network (DCFN) facilitates efficient cross-scale semantic fusion via content-aware upsampling and an asymmetric topology; and the Dual-domain Anti-noise Attention (DANA) mechanism achieves discriminative decoupling between target semantics and structural noise through spatial-channel interaction. Experimental results on UAV-Flow demonstrate that FLD-Net achieves an mAP50 of 80.47%. Compared to the YOLOv11s baseline, it improves Recall and mAP50 by 11.66% and 8.51%, respectively, with only 9.9 M parameters. Furthermore, deployment on the NVIDIA Jetson Xavier NX yields an inference latency of 14 ms and an energy efficiency of 4.80 FPS/W, confirming the system’s robustness and viability for automated pollution monitoring. Full article
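The detectors above are ranked by mAP50, i.e., mean average precision at an intersection-over-union (IoU) threshold of 0.5. A minimal IoU computation for axis-aligned boxes — the standard definition, not code from the paper:

```python
def iou(a, b):
    """Intersection-over-union of axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A prediction counts as a true positive at mAP50 when IoU >= 0.5.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.333...
```

The 0.5 threshold is forgiving of loose localization, which matters for the small, non-rigid floating-litter targets the abstract describes; the stricter mAP@0.5:0.95 averages over thresholds from 0.5 to 0.95.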
29 pages, 1954 KB  
Review
A Review on Bathymetric Inversion Research Based on Deep Learning Models and Remote Sensing Images
by Delong Liu, Yufeng Shi and Hong Fang
Remote Sens. 2026, 18(5), 720; https://doi.org/10.3390/rs18050720 - 27 Feb 2026
Viewed by 443
Abstract
High-precision inversion of shallow-water depth is crucial for marine resource development, ecological protection, and national defense security. Traditional acoustic detection, LiDAR, and empirical models are limited by high cost, low efficiency, or dependence on water quality, and struggle to meet the growing demand for shallow-water bathymetry. With rapid advances in remote sensing, computer science, and artificial intelligence, bathymetric inversion based on remote sensing images and deep learning models has become a research hotspot. In this study, journal articles and conference papers published between January 2021 and September 2025 were searched in the Web of Science (WOS) and Google Scholar databases using keywords such as “remote sensing image”, “bathymetry”, and “deep learning model”. A total of 309 relevant studies were retrieved; after screening and quality control, 132 core studies were selected for this review. These studies were classified by deep learning model, including CNN, U-Net, MLP, and RNN. The review analyzes and summarizes the characteristics of different deep learning models in bathymetric inversion, together with their data sources, inversion accuracy, and limitations, and discusses future development trends in light of the latest research results. Full article
(This article belongs to the Special Issue Artificial Intelligence and Big Data for Oceanography (2nd Edition))
36 pages, 12324 KB  
Article
Volumetric Path Planning and Visualization for ROV-Based Forward-Looking Sonar Scanning of 3D Water Areas
by Yu-Cheng Chou and Wei-Shan Chang
J. Mar. Sci. Eng. 2026, 14(5), 452; https://doi.org/10.3390/jmse14050452 - 27 Feb 2026
Viewed by 165
Abstract
Remotely operated vehicles (ROVs) equipped with multibeam forward-looking sonar are widely used for underwater object search in environments where visibility is limited. Ensuring complete three-dimensional (3D) scan coverage within a bounded mission duration remains a challenging planning problem due to sonar beam geometry and vehicle motion constraints. This study presents a deterministic, geometry-driven framework for volumetric path planning of ROV-based forward-looking sonar scanning in predefined circular and rectangular underwater volumes. The proposed approach constructs layered planar scan trajectories by explicitly incorporating sonar detection range, horizontal and vertical beamwidths, and scan volume geometry. Mission duration is analytically estimated from path length and vehicle kinematic parameters, enabling systematic comparison among multiple planning strategies. To support qualitative interpretation of scan effectiveness, a distance-based target position certainty metric is introduced and combined with the active sonar equation to estimate likely target locations within the scanned volume. Simulation results under idealized sensing and motion assumptions demonstrate that the corrected zigzag pattern for rectangular scan areas, as well as the corrected zigzag-II and corrected arithmetic spiral-III patterns for circular scan areas, achieve complete volumetric coverage with bounded mission duration and consistent localization performance. The proposed framework provides a transparent analytical baseline for evaluating volumetric scan path planning strategies for forward-looking sonar–equipped ROVs. Full article
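The certainty metric above is combined with the active sonar equation. In its standard monostatic form — conventional symbols, not necessarily the paper's exact formulation — the received echo level is EL = SL − 2·TL + TS, with all terms in dB:

```python
def echo_level(sl_db, tl_db, ts_db):
    """Monostatic active sonar equation: EL = SL - 2*TL + TS.

    sl_db: source level, tl_db: one-way transmission loss (doubled for the
    out-and-back path), ts_db: target strength; all terms in dB."""
    return sl_db - 2 * tl_db + ts_db

# e.g. SL = 210 dB, one-way TL = 60 dB, TS = -15 dB
print(echo_level(210, 60, -15))  # 75
```

A target is considered detectable when EL minus the background (noise or reverberation) level exceeds the detection threshold; since TL grows with range, this naturally yields the distance-dependent certainty behavior the abstract describes.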
(This article belongs to the Section Ocean Engineering)