MDPI - Publisher of Open Access Journals

19 pages, 1354 KB

Open Access Article

LSCA-RCNN: Large-Kernel Spatial Residual and Cascade Attention Network for Voxel-Based 3D Object Detection

by Yuyang Liu, Zhanyuan Jiang, Min Mao, Kun Zhang, Yu Xu, Mingchen Zhu and Xianjun Wu

Sensors 2026, 26(13), 4089; https://doi.org/10.3390/s26134089 (registering DOI) - 27 Jun 2026

Viewed by 204

LiDAR-based 3D object detection remains challenging due to sparse and irregular point cloud distributions, which degrade detection accuracy for small and occluded objects. To address these issues, this paper proposes a novel two-stage voxel-based 3D detector, LSCA-RCNN. First, spatial residual blocks (SRBs) and large-kernel spatial-wise convolutions are integrated into the 3D backbone to suppress feature degradation and to expand the receptive fields for stable multi-scale feature learning. Second, a ConvNeXt-based 2D backbone with spatial attention is constructed to enhance discriminative feature representation of small objects. Third, a cascaded detection head embedded with fine-grained grouped convolutions and cross-stage cross-attention is designed to achieve progressive bounding box refinement and to improve localization precision. Extensive evaluations on the KITTI dataset with the R40 metric show that the proposed method achieves consistent performance improvements over the baseline. In the moderate setting, LSCA-RCNN increases the 3D AP by 2.12%, 7.66%, and 5.43% for cars, pedestrians, and cyclists, respectively, while achieving gains of 1.62%, 5.05%, and 7.05% under the hard setting. These results validate the effectiveness and robustness of the proposed LSCA-RCNN for complex and challenging autonomous driving detection tasks.

(This article belongs to the Topic 3D Computer Vision and Smart Building and City, 4th Edition)
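
The LSCA-RCNN abstract credits large-kernel convolutions with expanding the receptive field. As a rough illustration only (not the authors' code), the standard receptive-field recurrence below compares stacks of small and large kernels along one axis. It assumes dense, undilated convolutions, and the 7-wide kernel is a hypothetical size; submanifold sparse 3D convolutions in a voxel backbone only fire at occupied voxels, so for them these numbers are an upper bound.

```python
from typing import List, Tuple

def receptive_field(layers: List[Tuple[int, int]]) -> int:
    """Receptive field, in input cells along one axis, of a stack of conv layers.

    layers: (kernel_size, stride) pairs applied in order. Uses the recurrence
    rf += (k - 1) * jump, where jump is the product of the strides seen so far.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

# Four stride-1 layers: 3-wide kernels versus hypothetical 7-wide kernels.
print(receptive_field([(3, 1)] * 4))  # 9
print(receptive_field([(7, 1)] * 4))  # 25
# Downsampling amplifies every later kernel: strides 1, 2, 1, 2, 1.
print(receptive_field([(3, 1), (3, 2), (3, 1), (3, 2), (3, 1)]))  # 21
```

At the same depth, the large-kernel stack covers 25 cells per axis instead of 9; the abstract does not state the actual kernel sizes used.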

18 pages, 30352 KB

Open Access Article

An Intelligent Building Recognition Method in Remote Sensing Images Based on Cascade R-CNN

by Mingguang Diao, Changyuan Shen, Jikang Jiang, Wenji Li and Zheng Lian

Appl. Sci. 2026, 16(12), 6277; https://doi.org/10.3390/app16126277 - 22 Jun 2026

Viewed by 137

Abstract

Building recognition and detection in remote sensing images are of great significance for urban planning, spatial database updating, and the construction of urban geographic information systems. For remote sensing images with complex background information, variations in the size of building objects make automatic building detection and recognition challenging, thereby affecting the recognition accuracy of deep learning models. At the same time, the lack of a standardized workflow for converting detection results into vector data formats makes it difficult to directly transform building detection results into usable GIS-compatible vector data. Based on the Cascade R-CNN model, an intelligent building recognition model for remote sensing images and a vectorization workflow for the recognition results are proposed. To address the issue of building recognition accuracy in remote sensing images, an intelligent building recognition model comprising ResNet101, a Feature Pyramid Network (FPN), a Region Proposal Network (RPN), and a cascade detector is proposed, which enhances the recognition precision and localization capability of building objects in multi-scale remote sensing images. To address the efficiency issue of vectorizing detection results, a procedural conversion method for building detection results in remote sensing images is proposed, which converts raster recognition results into GIS-compatible vector files through data verification, information extraction, boundary construction, polygon generation, and format conversion. Experiments show that the intelligent recognition model achieves a recall of 0.958, a miss rate of 0.042, a precision of 0.963, and an F1-score of 0.960. In addition, mAP@0.5, mAP@0.5:0.95, and mean IoU reach 0.954, 0.793, and 0.742, respectively, indicating good performance in building detection and localization. 
Compared with manual vectorization, the automated workflow reduces the processing time for 57 raster files from 25.4 min to 3.1 min, corresponding to an 87.8% reduction in processing time. These results indicate that the proposed method improves building recognition accuracy while enhancing the efficiency of converting recognition results into GIS vector data, showing application potential for urban spatial information extraction.

(This article belongs to the Topic Artificial Intelligence, Remote Sensing and Digital Twin Driving Innovation in Sustainable Natural Resources and Ecology)
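
The vectorization workflow described in this abstract (data verification, boundary construction, polygon generation, format conversion) can be illustrated with a small standard-library sketch. It is not the authors' pipeline: it assumes detections are axis-aligned pixel boxes and that the raster carries a GDAL-style six-term geotransform, and the function names are hypothetical. Coordinates stay in the raster's projected CRS, so strict RFC 7946 GeoJSON would additionally require reprojection to WGS 84.

```python
import json
from typing import Dict, List, Tuple

GeoTransform = Tuple[float, float, float, float, float, float]

def pixel_to_map(col: float, row: float, gt: GeoTransform) -> Tuple[float, float]:
    """Map a pixel corner to georeferenced x/y with a GDAL-style geotransform."""
    x0, dx, rx, y0, ry, dy = gt
    return (x0 + col * dx + row * rx, y0 + col * ry + row * dy)

def boxes_to_geojson(boxes: List[Tuple[float, float, float, float, float]],
                     gt: GeoTransform) -> Dict:
    """Turn (xmin, ymin, xmax, ymax, score) pixel boxes into a FeatureCollection."""
    features = []
    for xmin, ymin, xmax, ymax, score in boxes:
        if xmax <= xmin or ymax <= ymin:
            continue  # verification step: skip degenerate boxes
        # This corner order yields a counter-clockwise ring for north-up rasters (dy < 0).
        corners = [(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin)]
        ring = [list(pixel_to_map(c, r, gt)) for c, r in corners]
        ring.append(ring[0])  # polygon rings must be closed
        features.append({
            "type": "Feature",
            "geometry": {"type": "Polygon", "coordinates": [ring]},
            "properties": {"class": "building", "score": score},
        })
    return {"type": "FeatureCollection", "features": features}

# Hypothetical north-up raster: origin (500000, 4000000), 0.5 m pixels.
gt = (500000.0, 0.5, 0.0, 4000000.0, 0.0, -0.5)
fc = boxes_to_geojson([(10, 20, 30, 60, 0.97), (5, 5, 5, 9, 0.40)], gt)
geojson_text = json.dumps(fc, indent=2)  # ready to write to a .geojson file
print(len(fc["features"]))                                 # 1
print(fc["features"][0]["geometry"]["coordinates"][0][0])  # [500005.0, 3999990.0]
```

Building footprints from a segmentation mask would need contour tracing instead of box corners; the box case keeps the geometry step easy to follow.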

20 pages, 5981 KB

Open Access Article

YOLO11-MSCAM UAV Remote Sensing-Based Detection of Illegal Rare-Earth Mining with Multi-Scale Convolution and Attention Module

by Hengkai Li, Yingming Cai, Shengdong Nie and Kunming Liu

Remote Sens. 2026, 18(5), 738; https://doi.org/10.3390/rs18050738 - 28 Feb 2026

Cited by 1 | Viewed by 696

Abstract

Ion-adsorption rare-earth mining in southern China often leaves small, fragmented disturbances in rugged, forested terrain, making UAV-based enforcement challenging due to confusion with bare ground, canopy gaps, and shadows. We propose YOLO11-MSCAM, an enhanced YOLO11m detector in which the original SPPF at the backbone–neck junction is replaced by a Multi-Scale Convolution–Attention Module that cascades channel attention, spatial attention, and multi-scale residual convolutions to enhance context aggregation and suppress background clutter. We build a field-acquired UAV dataset, SIMA (0.05 m GSD; September–November 2023), generating 1630 non-overlapping 640 × 640 orthomosaic tiles split into 1320/147/163 for training/validation/testing; five-lens raw images (nadir + oblique) are additionally used as auxiliary training samples and for post-detection verification. On the test set, YOLO11-MSCAM achieves mAP@0.5 = 83.24%, mAP@0.5:0.95 = 58.29%, and F1 = 79.92%, outperforming YOLO11m and other detectors (YOLOv5m/6m/8m/9m/10m and Faster R-CNN with ResNet-50). With 19.67 M parameters, 67.34 GFLOPs@640, and 45.86 FPS, it supports tile-based batch screening to prioritize suspicious sites for field checks and evidence collection.
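
The SIMA preparation step (non-overlapping 640 × 640 orthomosaic tiles split 1320/147/163) can be sketched as follows. The mosaic size and the seeded random split are assumptions for illustration; the abstract does not say how tiles were assigned to splits, and neighbouring tiles from one site landing in different splits can make test scores optimistic.

```python
import random
from typing import List, Sequence, Tuple

def tile_windows(width: int, height: int, tile: int = 640) -> List[Tuple[int, int]]:
    """Top-left corners of non-overlapping full tiles; partial edge tiles are dropped."""
    return [(x, y)
            for y in range(0, height - tile + 1, tile)
            for x in range(0, width - tile + 1, tile)]

def split_tiles(tiles: Sequence, n_train: int, n_val: int, seed: int = 0):
    """Shuffle once with a fixed seed, then cut into train/val/test lists."""
    shuffled = list(tiles)
    random.Random(seed).shuffle(shuffled)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

windows = tile_windows(2000, 1300)  # hypothetical mosaic: 3 columns x 2 rows
print(len(windows))                 # 6
train, val, test = split_tiles(range(1630), 1320, 147)
print(len(train), len(val), len(test))  # 1320 147 163
```

Grouping tiles by survey site before splitting is a common way to reduce the spatial leakage mentioned above.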

17 pages, 5035 KB

Open Access Article

An Improved Cascade R-CNN-Based Fastener Detection Method for Coating Workshop Inspection

by Jiaqi Liu, Shanhui Liu, Yuhong Chen, Jiawen Zhao and Jiahao Fu

Coatings 2026, 16(1), 37; https://doi.org/10.3390/coatings16010037 - 30 Dec 2025

Viewed by 577

Abstract

To address the challenges of small fastener targets, complex backgrounds, and the low efficiency of traditional manual inspection in coating workshop scenarios, this paper proposes an improved Cascade R-CNN-based fastener detection method. A VOC-format dataset was constructed covering three target categories (Marking-painted fastener, Fastener, and Fallen off); it represents typical inspection scenarios of coating equipment under diverse operating conditions and enhances the adaptability of the model. Within the Cascade R-CNN framework, three improvements were introduced: the Convolutional Block Attention Module (CBAM) was integrated into the ResNet-101 backbone to enhance feature representation of small objects; anchor scales were reduced to better align with the actual size distribution of fasteners; and Soft-NMS was adopted in place of conventional NMS to effectively reduce missed detections in overlapping regions. Experimental results demonstrate that the proposed method achieves a mean Average Precision (mAP) of 96.60% on the self-constructed dataset, with both Precision and Recall exceeding 95%, significantly outperforming Faster R-CNN and the original Cascade R-CNN. The method enables accurate detection and missing-state recognition of fasteners in complex backgrounds and small-object scenarios, providing reliable technical support for the automation and intelligence of printing equipment inspection.

(This article belongs to the Special Issue Intelligent Monitoring, Control and Manufacturing in Coating Technologies)
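
Soft-NMS, which the fastener paper adopts in place of conventional NMS, keeps overlapping boxes but decays their scores by overlap. Below is a minimal sketch of the Gaussian Soft-NMS variant; the abstract does not say which variant or parameters the authors used, so `sigma=0.5` and the score floor are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def soft_nms(boxes: Sequence[Box], scores: Sequence[float],
             sigma: float = 0.5, score_thresh: float = 0.001) -> List[Tuple[int, float]]:
    """Gaussian Soft-NMS: rescore neighbours by exp(-iou^2 / sigma) instead of deleting them.

    Returns (box_index, rescored_score) pairs in selection order.
    """
    remaining = {i: s for i, s in enumerate(scores)}
    keep = []
    while remaining:
        best = max(remaining, key=remaining.get)
        best_score = remaining.pop(best)
        if best_score < score_thresh:
            break  # every other remaining score is lower still
        keep.append((best, best_score))
        for i in remaining:  # updating values (not keys) while iterating is safe
            remaining[i] *= math.exp(-(iou(boxes[best], boxes[i]) ** 2) / sigma)
    return keep

boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (50, 50, 60, 60)]
keep = soft_nms(boxes, [0.9, 0.8, 0.7])
print([(i, round(s, 3)) for i, s in keep])  # [(0, 0.9), (2, 0.7), (1, 0.21)]
```

Hard NMS at an IoU threshold of 0.5 would discard box 1 (IoU of about 0.82 with box 0); here it survives with a reduced score, which is the behaviour that helps when adjacent fasteners genuinely overlap.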

18 pages, 3847 KB

Open Access Article

Research on the Detection of Ocean Internal Waves Based on the Improved Faster R-CNN in SAR Images

by Gaoyuan Shen, Zhi Zeng, Hao Huang, Zhifan Jiao and Jun Song

J. Mar. Sci. Eng. 2026, 14(1), 23; https://doi.org/10.3390/jmse14010023 - 23 Dec 2025

Cited by 1 | Viewed by 1197

Abstract

Ocean internal waves occur in stably stratified seawater and play a crucial role in energy cascade, material transport, and military activities. However, the complex and irregular spatial patterns of internal waves pose significant challenges for accurate detection in SAR images when using conventional convolutional neural networks, which often lack adaptability to geometric variations. To address this problem, this paper proposes a refined Faster R-CNN detection framework, termed “rFaster R-CNN”, and adopts a transfer learning strategy to enhance model generalization and robustness. In the feature extraction stage, a backbone network called “ResNet50_CDCN” that integrates the CBAM attention mechanism and DCNv2 deformable convolution is constructed to enhance the feature expression ability of key regions in the images. Experimental results show that on the internal wave dataset constructed in this paper, this network improves the detection accuracy by approximately 3% compared to the original ResNet50 network. At the region proposal stage, this paper further adds two small-scale anchors and combines the ROI Align and FPN modules, effectively enhancing the spatial hierarchical information and semantic expression ability of ocean internal waves. Compared with classical object detection algorithms such as SSD, YOLO, and RetinaNet, the proposed “rFaster R-CNN” achieves superior detection performance, showing significant improvements in both accuracy and robustness.

(This article belongs to the Special Issue Artificial Intelligence and Its Application in Ocean Engineering)
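
The rFaster R-CNN abstract adds two small-scale anchors at the region proposal stage. A minimal sketch of anchor generation is shown below; the base scales 128/256/512 with height-to-width ratios 0.5/1/2 are the original Faster R-CNN defaults, while the two extra scales (32 and 64) are hypothetical because the abstract does not give the values. With FPN, each pyramid level usually receives its own scale, which this flat version ignores.

```python
import math
from typing import List, Sequence, Tuple

Anchor = Tuple[float, float, float, float]  # (x1, y1, x2, y2), centred on (0, 0)

def make_anchors(scales: Sequence[float], ratios: Sequence[float]) -> List[Anchor]:
    """Zero-centred anchors for every scale/aspect-ratio pair.

    scale is the square root of the anchor area in pixels; ratio is height / width,
    so each anchor keeps area scale**2 while changing shape.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors

ratios = [0.5, 1.0, 2.0]
base = make_anchors([128, 256, 512], ratios)
# Hypothetical extension with two extra small scales for thin, small wave signatures.
extended = make_anchors([32, 64, 128, 256, 512], ratios)
print(len(base), len(extended))  # 9 15
print(extended[1])               # (-16.0, -16.0, 16.0, 16.0)
```

At each feature-map position the RPN scores every anchor in this list, so adding scales raises the number of proposals per location as well as coverage of small targets.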

24 pages, 3622 KB

Open Access Article

Simple and Affordable Vision-Based Detection of Seedling Deficiencies to Relieve Labor Shortages in Small-Scale Cruciferous Nurseries

by Po-Jui Su, Tse-Min Chen and Jung-Jeng Su

Agriculture 2025, 15(21), 2227; https://doi.org/10.3390/agriculture15212227 - 25 Oct 2025

Cited by 1 | Viewed by 982

Abstract

Labor shortages in seedling nurseries, particularly in manual inspection and replanting, hinder operational efficiency despite advancements in automation. This study aims to develop a cost-effective, GPU-free machine vision system to automate the detection of deficient seedlings in plug trays, specifically for small-scale nursery operations. The proposed Deficiency Detection and Replanting Positioning (DDRP) machine integrates low-cost components including an Intel RealSense Depth Camera D435, Raspberry Pi 4B, stepper motors, and a programmable logic controller (PLC). It utilizes OpenCV’s Haar cascade algorithm, HSV color space conversion, and Otsu thresholding to enable real-time image processing without GPU acceleration. Under controlled laboratory conditions, the DDRP-Machine achieved high detection accuracy (96.0–98.7%) and precision rates (82.14–83.78%). Benchmarking against deep-learning models such as YOLOv5x and Mask R-CNN showed comparable performance, while requiring only one-third to one-fifth of the cost and avoiding complex infrastructure. The Batch Detection (BD) mode significantly reduced processing time compared to Continuous Detection (CD), enhancing real-time applicability. The DDRP-Machine demonstrates strong potential to improve seedling inspection efficiency and reduce labor dependency in nursery operations. Its modular design and minimal hardware requirements make it a practical and scalable solution for resource-limited environments. 
This study offers a viable pathway for small-scale farms to adopt intelligent automation without the financial burden of high-end AI systems. Future enhancements, including adaptive lighting and self-learning capabilities, will further improve field robustness and broaden its applicability across diverse nursery conditions.

(This article belongs to the Topic Digital Agriculture, Smart Farming and Crop Monitoring)
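
The DDRP machine relies on HSV conversion and Otsu thresholding. In OpenCV, Otsu's method is available through `cv2.threshold` with the `cv2.THRESH_OTSU` flag; the standard-library sketch below shows what that step computes. The pixel colours and the use of the saturation channel are illustrative assumptions, not values or choices reported in the paper.

```python
import colorsys
from typing import List

def otsu_threshold(values: List[int]) -> int:
    """Otsu's method on 8-bit values: choose t that maximises between-class variance.

    Values <= t form class 0 and values > t form class 1.
    """
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0, sum0 = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2  # unnormalised; argmax is unchanged
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

# Hypothetical plug-tray pixels: brownish substrate versus green seedling leaves.
soil = [(120, 100, 80)] * 60
seedling = [(60, 160, 40)] * 40
saturation = [round(colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1] * 255)
              for r, g, b in soil + seedling]
t = otsu_threshold(saturation)
mask = [s > t for s in saturation]
print(sum(mask))  # 40 pixels classified as seedling
```

A cell whose mask fraction falls below some cut-off would then be flagged as deficient; that cut-off is not given in the abstract.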

25 pages, 17236 KB

Open Access Article

Hierarchical Deep Learning Model for Identifying Similar Targets in UAV Imagery

by Dmytro Borovyk, Oleksander Barmak, Pavlo Radiuk and Iurii Krak

Drones 2025, 9(11), 743; https://doi.org/10.3390/drones9110743 - 25 Oct 2025

Cited by 5 | Viewed by 1893

Abstract

Accurate object detection in UAV imagery is critical for situational awareness, yet conventional deep learning models often struggle to distinguish between visually similar targets. To address this challenge, this study introduces a hierarchical deep learning architecture that decomposes the multi-class detection task into a structured, multi-level classification cascade. Our approach combines a high-recall Faster R-CNN for initial object proposal, specialized YOLO models for granular feature extraction, and a dedicated FT-Transformer for fine-grained classification. Experimental evaluation on a complex dataset demonstrated the effectiveness of this strategy. The hierarchical model achieved an aggregate F1-score of 93.9%, representing a 1.41% improvement over the 92.46% F1-score from a traditional, non-hierarchical baseline model. These results indicate that a modular, coarse-to-fine cascade can effectively reduce inter-class ambiguity, offering a scalable approach to improving object recognition in complex UAV-based monitoring environments. This work contributes a promising approach to developing more accurate and reliable situational awareness systems.

(This article belongs to the Special Issue Advances in Deep Learning for Drones and Its Applications: 2nd Edition)
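
The coarse-to-fine cascade in this abstract can be sketched as a routing function: a high-recall proposer supplies boxes, a coarse stage assigns a group, and a group-specific model resolves the final class. In the paper these stages are Faster R-CNN, YOLO models, and an FT-Transformer; here they are replaced by hypothetical toy rules on box width so that the control flow runs on its own.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def hierarchical_classify(proposals: List[Box],
                          coarse: Callable[[Box], str],
                          fine: Dict[str, Callable[[Box], str]]) -> List[Tuple[Box, str]]:
    """Route each proposal through a coarse classifier, then a group-specific one.

    Groups without a fine-grained model keep the coarse label.
    """
    results = []
    for box in proposals:
        group = coarse(box)
        refine = fine.get(group)
        results.append((box, refine(box) if refine else group))
    return results

def coarse_rule(box: Box) -> str:  # hypothetical stand-in for the coarse model
    return "vehicle" if box[2] - box[0] > 20 else "person"

def vehicle_rule(box: Box) -> str:  # hypothetical stand-in for a fine-grained model
    return "truck" if box[2] - box[0] > 40 else "car"

proposals = [(0, 0, 50, 20), (0, 0, 30, 10), (0, 0, 10, 25)]
for box, label in hierarchical_classify(proposals, coarse_rule, {"vehicle": vehicle_rule}):
    print(box, label)  # truck, car, person
```

Because a coarse-stage mistake routes a box to the wrong specialist, one reason to tune the early stages for recall is that later stages can only refine what they receive.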

19 pages, 2933 KB

Open Access Article

Image-Based Detection of Chinese Bayberry (Myrica rubra) Maturity Using Cascaded Instance Segmentation and Multi-Feature Regression

by Hao Zheng, Li Sun, Yue Wang, Han Yang and Shuwen Zhang

Horticulturae 2025, 11(10), 1166; https://doi.org/10.3390/horticulturae11101166 - 1 Oct 2025

Viewed by 1101

Abstract

The accurate assessment of Chinese bayberry (Myrica rubra) maturity is critical for intelligent harvesting. This study proposes a novel cascaded framework combining instance segmentation and multi-feature regression for accurate maturity detection. First, a lightweight SOLOv2-Light network is employed to segment each fruit individually, which significantly reduces computational costs with only a marginal drop in accuracy. Then, a multi-feature extraction network is developed to fuse deep semantic, color (LAB space), and multi-scale texture features, enhanced by a channel attention mechanism for adaptive weighting. The maturity ground truth is defined using the a*/b* ratio measured by a colorimeter, which correlates strongly with anthocyanin accumulation and visual ripeness. Experimental results demonstrated that the proposed method achieves a mask mAP of 0.788 on the instance segmentation task, outperforming Mask R-CNN and YOLACT. For maturity prediction, a mean absolute error of 3.946% is attained, which is a significant improvement over the baseline. When the data are discretized into three maturity categories, the overall accuracy reaches 95.51%, surpassing YOLOX-s and Faster R-CNN by a considerable margin while reducing processing time by approximately 46%. The modular design facilitates easy adaptation to new varieties. This research provides a robust and efficient solution for in-field bayberry maturity detection, offering substantial value for the development of automated harvesting systems.

(This article belongs to the Topic Intelligent Agriculture: Perception Technologies and Agricultural Equipment for Crop Production Processes)
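
The bayberry maturity label is the a*/b* ratio measured by a colorimeter. When only RGB images are available, a common approximation is to convert sRGB to CIELAB; the sketch below uses the standard sRGB (D65) formulas. The three-class cut points are hypothetical, since the abstract does not report them, and camera-derived Lab values depend on white balance and lighting, so they will not match colorimeter readings without calibration.

```python
from typing import Tuple

def srgb_to_lab(r: int, g: int, b: int) -> Tuple[float, float, float]:
    """Convert 8-bit sRGB to CIELAB (L*, a*, b*) with a D65 reference white."""
    def linearize(c: float) -> float:
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl
    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 white point

    def f(t: float) -> float:
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

def a_over_b(lab: Tuple[float, float, float]) -> float:
    _, a, b = lab
    return a / b  # assumes b* > 0, as for yellow-to-red fruit colours

def maturity_class(ratio: float, cuts: Tuple[float, float] = (0.5, 2.0)) -> str:
    """Hypothetical cut points; the paper's category boundaries are not given."""
    if ratio < cuts[0]:
        return "immature"
    return "semi-mature" if ratio < cuts[1] else "mature"

print(srgb_to_lab(255, 255, 255))  # approximately (100, 0, 0)
print(maturity_class(a_over_b(srgb_to_lab(255, 0, 0))))
```

Averaging the ratio over a fruit's segmentation mask, rather than using a single pixel, would be the natural way to combine this with the instance masks described above.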

24 pages, 4112 KB

Open Access Article

Enhancing Breast Lesion Detection in Mammograms via Transfer Learning

by Beibit Abdikenov, Dimash Rakishev, Yerzhan Orazayev and Tomiris Zhaksylyk

J. Imaging 2025, 11(9), 314; https://doi.org/10.3390/jimaging11090314 - 13 Sep 2025

Cited by 4 | Viewed by 2932

Abstract

Early detection of breast cancer via mammography enhances patient survival rates, prompting this study to assess object detection models (Cascade R-CNN, YOLOv12 in its S, L, and X variants, RTMDet-X, and RT-DETR-X) for detecting masses and calcifications across four public datasets (INbreast, CBIS-DDSM, VinDr-Mammo, and EMBED). The evaluation employs a standardized preprocessing approach (CLAHE, cropping) and augmentation (rotations, scaling), with transfer learning tested by training on combined datasets (e.g., INbreast + CBIS-DDSM) and validating on held-out sets (e.g., VinDr-Mammo). Performance is measured using precision, recall, mean Average Precision at IoU 0.5 (mAP50), and F1-score. YOLOv12-L excels in mass detection with an mAP50 of 0.963 and F1-score up to 0.917 on INbreast, while RTMDet-X achieves an mAP50 of 0.697 on combined datasets with transfer learning. Preprocessing improves mAP50 by up to 0.209, and transfer learning elevates INbreast performance to an mAP50 of 0.995, though it incurs 5–11% drops on CBIS-DDSM (0.566 to 0.447) and VinDr-Mammo (0.59 to 0.5) due to domain shifts. EMBED yields a low mAP50 of 0.306 due to label inconsistencies, and calcification detection remains weak (mAP50 < 0.116), highlighting the value of high-capacity models, preprocessing, and augmentation for mass detection while identifying calcification detection and domain adaptation as key areas for future investigation.

(This article belongs to the Section Medical Imaging)
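
The mAP50 figures in this abstract count a predicted lesion as a true positive when its IoU with a not-yet-matched ground-truth box is at least 0.5. The sketch below shows only that matching step and the resulting precision, recall, and F1 at one score cut-off; full mAP50 additionally integrates precision over the recall curve, and evaluators such as the COCO and VOC tools differ in details (crowd regions, interpolation) that are not modelled here. The boxes are made-up examples.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def match_at_iou(preds: List[Tuple[Box, float]], gts: List[Box],
                 thr: float = 0.5) -> Tuple[int, int, int]:
    """Greedy matching in descending score order; each ground truth matches at most once.

    Returns (true positives, false positives, false negatives).
    """
    matched = set()
    tp = fp = 0
    for box, _ in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, thr
        for j, gt in enumerate(gts):
            if j not in matched and iou(box, gt) >= best_iou:
                best_j, best_iou = j, iou(box, gt)
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
        else:
            fp += 1
    return tp, fp, len(gts) - len(matched)

def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r, (2 * p * r / (p + r) if p + r else 0.0)

gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [((1, 1, 11, 11), 0.9), ((20, 20, 30, 30), 0.8), ((50, 50, 60, 60), 0.3)]
tp, fp, fn = match_at_iou(preds, gts)
print(tp, fp, fn, prf(tp, fp, fn))  # 2 1 0, precision 2/3, recall 1.0, F1 0.8
```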

26 pages, 7402 KB

Open Access Article

Hybrid Architecture for Tight Sandstone: Automated Mineral Identification and Quantitative Petrology

by Lanfang Dong, Chenxu Sun, Xiaolu Yu, Xinming Zhang, Menglian Chen and Mingyang Xu

Minerals 2025, 15(9), 962; https://doi.org/10.3390/min15090962 - 11 Sep 2025

Cited by 1 | Viewed by 1008

Abstract

This study proposes an integrated computer vision system for automated petrological analysis of tight sandstone micro-structures. The system combines the zero-shot Segment Anything Model (SAM), Mask R-CNN (Region-Based Convolutional Neural Network) instance segmentation, and an improved MetaFormer architecture with a Cascaded Group Attention (CGA) mechanism, together with a parameter analysis module, to form a hybrid deep learning system. This enables end-to-end mineral identification and multi-scale structural quantification of granulometric properties, grain contact relationships, and pore networks. The system is validated on proprietary tight sandstone datasets, SMISD (Sandstone Microscopic Image Segmentation Dataset) and SMIRD (Sandstone Microscopic Image Recognition Dataset). It achieves 92.1% mIoU segmentation accuracy and 90.7% mineral recognition accuracy while reducing processing time from more than 30 min to less than 2 min per sample. The system provides standardized reservoir characterization through automated generation of quantitative reports (Excel), analytical images (JPG), and structured data (JSON), demonstrating production-ready efficiency for tight sandstone evaluation.

(This article belongs to the Section Mineral Exploration Methods and Applications)
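
The 92.1% mIoU reported for the sandstone system is the mean, over classes, of the intersection over union between predicted and reference label maps. A minimal sketch on flattened label arrays follows; the class indices are hypothetical (e.g., 0 = pore, 1 = quartz, 2 = feldspar), and the abstract does not state whether background is included or whether IoU is averaged per image or over the whole dataset.

```python
from typing import Sequence

def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over classes that appear in the prediction or the ground truth.

    pred, target: equal-length flat sequences of integer class labels.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:  # a mismatch enlarges the union of both classes involved
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0

target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, 3))  # per-class IoU 1/3, 2/3, 1/2, so mIoU 0.5
```

For real masks the same counts are usually accumulated with a confusion matrix over all pixels of the evaluation set.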

15 pages, 2951 KB

Open Access Article

Fusing Residual and Cascade Attention Mechanisms in Voxel–RCNN for 3D Object Detection

by You Lu, Yuwei Zhang, Xiangsuo Fan, Dengsheng Cai and Rui Gong

Sensors 2025, 25(17), 5497; https://doi.org/10.3390/s25175497 - 4 Sep 2025

Cited by 1 | Viewed by 1778

Abstract

In this paper, a high-precision 3D object detector, Voxel–RCNN, is adopted as the baseline detector, and an improved detector named RCAVoxel-RCNN is proposed. To address various issues present in current mainstream 3D point cloud voxelisation methods, such as the suboptimal performance of Region Proposal Networks (RPNs) in generating candidate regions and the inadequate detection of small-scale objects caused by overly deep convolutional layers in both 3D and 2D backbone networks, this paper proposes a Cascade Attention Network (CAN). The CAN is designed to progressively refine and enhance the proposed regions, thereby producing more accurate detection results. Furthermore, a 3D Residual Network is introduced, which improves the representation of small objects by reducing the number of convolutional layers while incorporating residual connections. In the Bird’s-Eye View (BEV) feature extraction network, a Residual Attention Network (RAN) is developed. This follows a similar approach to the aforementioned 3D backbone network, leveraging the spatial awareness capabilities of the BEV. Additionally, the Squeeze-and-Excitation (SE) attention mechanism is incorporated to assign dynamic weights to features, allowing the network to focus more effectively on informative features. Experimental results on the KITTI validation set demonstrate the effectiveness of the proposed method, with detection accuracy for cars, pedestrians, and bicycles improving by 3.34%, 10.75%, and 4.61%, respectively, under the KITTI hard level. The primary evaluation metric is the 3D Average Precision (AP) computed over 40 recall positions (R40), with Intersection over Union (IoU) thresholds of 0.7 for cars and 0.5 for both pedestrians and bicycles.

(This article belongs to the Section Communications)
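
The AP|R40 metric used in this and the LSCA-RCNN evaluation averages interpolated precision at 40 equally spaced recall levels, 1/40, 2/40, ..., 1 (recall 0 is excluded). A minimal sketch follows, assuming the precision-recall points have already been computed for one class, difficulty level, and IoU threshold (0.7 for cars, 0.5 for the smaller classes); the official KITTI devkit derives those points from score thresholds, which is not reproduced here, and the example curve is made up.

```python
from typing import Sequence

def ap_r40(recalls: Sequence[float], precisions: Sequence[float]) -> float:
    """KITTI-style AP|R40: mean interpolated precision at recall k/40, k = 1..40.

    Interpolated precision at r is the maximum precision over PR points with
    recall >= r, or 0 when the curve never reaches r.
    """
    total = 0.0
    for k in range(1, 41):
        r = k / 40
        candidates = [p for rc, p in zip(recalls, precisions) if rc >= r]
        total += max(candidates) if candidates else 0.0
    return total / 40

# Hypothetical PR curve that tops out at 75% recall.
print(round(ap_r40([0.25, 0.5, 0.75], [1.0, 0.8, 0.6]), 4))  # 0.6
```

Here the ten recall levels above 0.75 contribute zero, which is why a detector that misses hard, occluded objects loses AP even when its precision is high.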

37 pages, 10467 KB

Open Access Article

Cascaded Hierarchical Attention with Adaptive Fusion for Visual Grounding in Remote Sensing

by Huming Zhu, Tianqi Gao, Zhixian Li, Zhipeng Chen, Qiuming Li, Kongmiao Miao, Biao Hou and Licheng Jiao

Remote Sens. 2025, 17(17), 2930; https://doi.org/10.3390/rs17172930 - 23 Aug 2025

Cited by 2 | Viewed by 1906

Abstract

Visual grounding for remote sensing (RSVG) is the task of localizing the referred object in remote sensing (RS) images by parsing free-form language descriptions. However, RSVG faces the challenge of low detection accuracy due to unbalanced multi-scale grounding capabilities, where large objects have more prominent grounding accuracy than small objects. Based on Faster R-CNN, we propose Faster R-CNN in Visual Grounding for Remote Sensing (FR-RSVG), a two-stage method for grounding RS objects. Building on this foundation, to enhance the ability to ground multi-scale objects, we propose Faster R-CNN with Adaptive Vision-Language Fusion (FR-AVLF), which introduces a layered Adaptive Vision-Language Fusion (AVLF) module. Specifically, this method can adaptively fuse deep or shallow visual features according to the input text (e.g., location-related or object characteristic descriptions), thereby optimizing semantic feature representation and improving grounding accuracy for objects of different scales. Given that RSVG is essentially an expanded form of RS object detection, and considering the knowledge the model acquired in prior RS object detection tasks, we propose Faster R-CNN with Adaptive Vision-Language Fusion Pretrained (FR-AVLF_PRE). To further enhance model performance, we propose Faster R-CNN with Cascaded Hierarchical Attention Grounding and Multi-Level Adaptive Vision-Language Fusion Pretrained (FR-CHAGAVLF_PRE), which introduces a cascaded hierarchical attention grounding mechanism, employs a more advanced language encoder, and improves upon AVLF by proposing Multi-Level AVLF, significantly improving localization accuracy in complex scenarios. Extensive experiments on the DIOR-RSVG dataset demonstrate that our model surpasses most existing advanced models. 
To validate the generalization capability of our model, we conducted zero-shot inference experiments on the categories that DIOR-RSVG shares with the Complex Description DIOR-RSVG (DIOR-RSVG-C) and OPT-RSVG datasets, achieving performance superior to most existing models.

(This article belongs to the Section AI Remote Sensing)
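
The AVLF idea of weighting shallow versus deep visual features according to the query text can be illustrated with a generic softmax-gated fusion. This is a hypothetical sketch, not the paper's module: the gate logits would normally be predicted from the text embedding by a learned layer, whereas here they are passed in directly, and real features are spatial maps rather than short vectors.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def adaptive_fuse(levels: List[List[float]], gate_logits: List[float]) -> List[float]:
    """Weighted sum of same-length feature vectors from several backbone levels.

    gate_logits: one logit per level (hypothetically produced from the text query);
    softmax makes the level weights non-negative and sum to 1.
    """
    weights = softmax(gate_logits)
    dim = len(levels[0])
    return [sum(w * lvl[d] for w, lvl in zip(weights, levels)) for d in range(dim)]

shallow = [1.0, 0.0]  # hypothetical fine-detail feature
deep = [0.0, 1.0]     # hypothetical semantic feature
print(adaptive_fuse([shallow, deep], [0.0, 0.0]))   # [0.5, 0.5]
print(adaptive_fuse([shallow, deep], [3.0, -3.0]))  # weighted toward the shallow level
```

A location-heavy query ("the small vehicle at the lower left") might push weight toward shallow, high-resolution features, while a category description might favour deep ones; how AVLF actually makes this choice is not specified in the abstract.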

24 pages, 2440 KB

Open Access Article

A Novel Dynamic Context Branch Attention Network for Detecting Small Objects in Remote Sensing Images

by Huazhong Jin, Yizhuo Song, Ting Bai, Kaimin Sun and Yepei Chen

Remote Sens. 2025, 17(14), 2415; https://doi.org/10.3390/rs17142415 - 12 Jul 2025

Cited by 2 | Viewed by 1325

Abstract

Detecting small objects in remote sensing images is challenging due to their size, which results in limited distinctive features. This limitation necessitates the effective use of contextual information for accurate identification. Many existing methods struggle because they do not dynamically adjust the contextual scope based on the specific characteristics of each target. To address this issue and improve the detection performance of small objects (typically defined as objects with a bounding box area of less than 1024 pixels), we propose a novel backbone network called the Dynamic Context Branch Attention Network (DCBANet). We present the Dynamic Context Scale-Aware (DCSA) Block, which utilizes a multi-branch architecture to generate features with diverse receptive fields. Within each branch, a Context Adaptive Selection Module (CASM) dynamically weights information, allowing the model to focus on the most relevant context. To further enhance performance, we introduce an Efficient Branch Attention (EBA) module that adaptively reweights the parallel branches, prioritizing the most discriminative ones. Finally, to ensure computational efficiency, we design a Dual-Gated Feedforward Network (DGFFN), a lightweight yet powerful replacement for standard FFNs. Extensive experiments conducted on four public remote sensing datasets demonstrate that the DCBANet achieves impressive mAP@0.5 scores of 80.79% on DOTA, 89.17% on NWPU VHR-10, 80.27% on SIMD, and a remarkable 42.4% mAP@0.5:0.95 on the specialized small object benchmark AI-TOD. These results surpass RetinaNet, YOLOF, FCOS, Faster R-CNN, Dynamic R-CNN, SKNet, and Cascade R-CNN, highlighting its effectiveness in detecting small objects in remote sensing images. However, there remains potential for further improvement in multi-scale and weak target detection. Future work will integrate local and global context to enhance multi-scale object detection performance.

(This article belongs to the Special Issue High-Resolution Remote Sensing Image Processing and Applications)
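
The 1024-pixel cut-off quoted in this abstract is 32 × 32, the COCO convention for small objects; COCO treats areas up to 96 × 96 as medium. A minimal helper for bucketing boxes is sketched below. It assumes axis-aligned box area, whereas COCO's own size buckets use the annotation's area field, and oriented boxes (as in DOTA) would need their rotated area instead.

```python
from typing import Tuple

def size_bucket(box: Tuple[float, float, float, float],
                small_max: float = 32 ** 2, medium_max: float = 96 ** 2) -> str:
    """Bucket an (x1, y1, x2, y2) box by pixel area; 'small' means area < 1024."""
    area = (box[2] - box[0]) * (box[3] - box[1])
    if area < small_max:
        return "small"
    if area <= medium_max:
        return "medium"
    return "large"

print([size_bucket(b) for b in [(0, 0, 20, 30), (0, 0, 32, 32), (0, 0, 100, 100)]])
# ['small', 'medium', 'large']
```

Reporting AP separately per bucket is how a small-object gain like the one claimed for DCBANet is usually isolated from overall mAP.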

26 pages, 5939 KB

Open Access Article

Multi-Resolution UAV Remote Sensing for Anthropogenic Debris Detection in Complex River Environments

by Peaceibisia Jack, Trent Biggs, Daniel Sousa, Lloyd Coulter, Sarah Hutmacher and Hilary McMillan

Remote Sens. 2025, 17(13), 2172; https://doi.org/10.3390/rs17132172 - 25 Jun 2025

Cited by 2 | Viewed by 1977

Abstract


Anthropogenic debris in urban floodplains poses significant environmental and ecological risks, with an estimated 4 to 12 million metric tons entering oceans annually via riverine transport. While remote sensing and artificial intelligence (AI) offer promising tools for automated debris detection, most existing datasets focus on marine environments with homogeneous backgrounds, leaving a critical gap for complex terrestrial floodplains. This study introduces the San Diego River Debris Dataset, a multi-resolution UAV imagery collection with ground reference data, designed to support automated detection of anthropogenic debris in urban floodplains. The dataset includes manually annotated debris objects captured under diverse environmental conditions with two UAV platforms (DJI Matrice 300 and DJI Mini 2) at spatial resolutions ranging from 0.4 to 4.4 cm. We benchmarked five deep learning architectures (RetinaNet, SSD, Faster R-CNN, DetReg, and Cascade R-CNN) to assess detection accuracy across varying image resolutions and environmental settings. Cascade R-CNN achieved the highest accuracy (0.93) at 0.4 cm resolution, with accuracy declining rapidly at resolutions coarser than 1 cm and 3.3 cm. Spatial analysis revealed that 51% of the debris was concentrated within unsheltered encampments, which occupied only 2.6% of the study area. Validation confirmed a strong correlation between predicted debris extent and field measurements, supporting the dataset's operational reliability. This openly available dataset fills a gap in environmental monitoring resources and provides guidance for future research on, and deployment of, UAV-based debris detection systems in urban floodplains.
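Two quantities in this abstract reward a quick calculation: how many pixels a debris item spans at each ground sampling distance, and how concentrated debris is inside encampments (51% of items in 2.6% of the area). The sketch below is illustrative only; the 20 cm object size is a hypothetical value, and the helper names `pixels_across` and `density_enrichment` are not from the study.

```python
def pixels_across(object_size_cm: float, gsd_cm: float) -> float:
    """Approximate pixels an object spans along one axis at a given ground sampling distance."""
    if gsd_cm <= 0:
        raise ValueError("gsd_cm must be positive")
    return object_size_cm / gsd_cm


def density_enrichment(share_of_items: float, share_of_area: float) -> float:
    """Ratio of item density inside a zone to item density outside it."""
    inside = share_of_items / share_of_area
    outside = (1.0 - share_of_items) / (1.0 - share_of_area)
    return inside / outside


if __name__ == "__main__":
    # Hypothetical 20 cm debris item at the dataset's resolution extremes and the reported breakpoints.
    for gsd in (0.4, 1.0, 3.3, 4.4):
        print(f"GSD {gsd} cm -> {pixels_across(20.0, gsd):.1f} px across")
    # Encampment figures from the abstract: 51% of debris in 2.6% of the study area.
    print(f"density ratio inside/outside: {density_enrichment(0.51, 0.026):.1f}")
```

With these inputs, the hypothetical 20 cm item spans 50 px at 0.4 cm but fewer than 5 px at 4.4 cm, which is consistent with the reported accuracy drop at coarse resolutions. The encampment debris density comes out at roughly 39 times the density elsewhere in the study area.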

(This article belongs to the Section AI Remote Sensing)


30 pages, 8985 KB

Open Access Article

Dynamic Cascade Detector for Storage Tanks and Ships in Optical Remote Sensing Images

by Tong Wang, Bingxin Liu and Peng Chen

Remote Sens. 2025, 17(11), 1882; https://doi.org/10.3390/rs17111882 - 28 May 2025

Viewed by 1222

Abstract


Region-based Convolutional Neural Network (R-CNN) detectors have played a crucial role in object detection in remote sensing images because of their strong detection capabilities. Some studies have shown that different stages should use different Intersection over Union (IoU) thresholds to distinguish positive and negative samples, because each stage has a different IoU distribution. However, these studies have overlooked the fact that the IoU distribution at each stage changes continuously during training. The IoU threshold at each stage should therefore also be adjusted continuously to follow these changes. We observed that the IoU distribution at each stage closely resembles a skewed Gaussian distribution. In this paper, we introduce a novel dynamic IoU threshold method based on the Cascade RCNN architecture, called the Dynamic Cascade detector, which draws on the skewed Gaussian distribution. We tested the effectiveness of this method by detecting horizontal storage tanks and rotated ships in optical remote sensing images. Our experiments demonstrated that this technique significantly improves detection results as evaluated by the COCO metric. In addition, the threshold range of the last stage affects the other stages, so the threshold range of one stage may change significantly when the number of stages changes. Furthermore, the threshold does not necessarily increase during training and may decrease when the IoU distribution resembles a negatively skewed distribution.
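The ingredients of this idea are IoU between proposals and ground truth, positive/negative assignment against a per-stage threshold, and the skewness of the IoU distribution. Standard Cascade R-CNN fixes its stage thresholds (commonly 0.5, 0.6, 0.7). The sketch below illustrates the ingredients of the dynamic variant. The threshold rule in `dynamic_threshold` (a clipped quantile of the current IoU distribution) is a hypothetical stand-in, not the authors' skewed-Gaussian formula, which the abstract does not give. The parameters `q`, `lo`, and `hi` are likewise assumptions.

```python
import numpy as np


def pairwise_iou(boxes: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """IoU between N proposals and M ground-truth boxes, both as (x1, y1, x2, y2)."""
    x1 = np.maximum(boxes[:, None, 0], gt[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], gt[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], gt[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], gt[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    union = area_b[:, None] + area_g[None, :] - inter
    return inter / np.maximum(union, 1e-9)


def sample_skewness(x: np.ndarray) -> float:
    """Moment-based (Fisher-Pearson) skewness; negative values indicate a long left tail."""
    centered = x - x.mean()
    std = centered.std()
    if std == 0:
        return 0.0
    return float(np.mean(centered ** 3) / std ** 3)


def dynamic_threshold(max_ious: np.ndarray, q: float = 0.75,
                      lo: float = 0.5, hi: float = 0.75) -> float:
    """HYPOTHETICAL rule: a quantile of the current IoU distribution, clipped to [lo, hi]."""
    return float(np.clip(np.quantile(max_ious, q), lo, hi))


def assign_labels(max_ious: np.ndarray, threshold: float) -> np.ndarray:
    """1 for positive samples (IoU >= threshold), 0 for negatives."""
    return (max_ious >= threshold).astype(np.int64)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = np.array([[10.0, 10.0, 50.0, 50.0]])
    # Synthetic proposals: jittered copies of one ground-truth box.
    proposals = gt + rng.normal(0.0, 4.0, size=(200, 4))
    # Keep boxes valid (x2 > x1, y2 > y1).
    proposals[:, 2:] = np.maximum(proposals[:, 2:], proposals[:, :2] + 1.0)
    max_ious = pairwise_iou(proposals, gt).max(axis=1)
    thr = dynamic_threshold(max_ious)
    labels = assign_labels(max_ious, thr)
    print(f"skewness={sample_skewness(max_ious):.3f} threshold={thr:.3f} positives={labels.sum()}")
```

Recomputing the threshold from each stage's current IoU distribution, rather than fixing it once, is the property the abstract argues for. Because the rule depends on the distribution's shape, the threshold can also move down between iterations, in line with the observation that it need not increase monotonically.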
