MDPI - Publisher of Open Access Journals

21 pages, 5917 KiB

Open Access Article

VML-UNet: Fusing Vision Mamba and Lightweight Attention Mechanism for Skin Lesion Segmentation

by Tang Tang, Haihui Wang, Qiang Rao, Ke Zuo and Wen Gan

Electronics 2025, 14(14), 2866; https://doi.org/10.3390/electronics14142866 (registering DOI) - 17 Jul 2025

Deep learning has advanced medical image segmentation, yet existing methods struggle with complex anatomical structures. Mainstream models, such as CNN, Transformer, and hybrid architectures, face challenges including insufficient information representation and redundant complexity, which limit their clinical deployment. Developing efficient and lightweight networks is crucial for accurate lesion localization and optimized clinical workflows. We propose the VML-UNet, a lightweight segmentation network with core innovations including the CPMamba module and the multi-scale local supervision module (MLSM). The CPMamba module integrates the visual state space (VSS) block and a channel prior attention mechanism to enable efficient modeling of spatial relationships with linear computational complexity through dynamic channel-space weight allocation, while preserving channel feature integrity. The MLSM enhances local feature perception and reduces the inference burden. Comparative experiments were conducted on three public datasets, including ISIC2017, ISIC2018, and PH2, with ablation experiments performed on ISIC2017. VML-UNet achieves 0.53 M parameters, 2.18 MB memory usage, and 1.24 GFLOPs time complexity, with its performance on the datasets outperforming comparative networks, validating its effectiveness. This study provides valuable references for developing lightweight, high-performance skin lesion segmentation networks, advancing the field of skin lesion segmentation. Full article

(This article belongs to the Section Bioelectronics)

19 pages, 1521 KiB

Open Access Article

SAGEFusionNet: An Auxiliary Supervised Graph Neural Network for Brain Age Prediction as a Neurodegenerative Biomarker

by Suraj Kumar, Suman Hazarika and Cota Navin Gupta

Brain Sci. 2025, 15(7), 752; https://doi.org/10.3390/brainsci15070752 - 15 Jul 2025

Viewed by 77

Abstract

Background: The ability of Graph Neural Networks (GNNs) to analyse brain structural patterns in various kinds of neurodegenerative diseases, including Parkinson’s disease (PD), has drawn a lot of interest recently. One emerging technique in this field is brain age prediction, which estimates biological age to identify ageing patterns that may serve as biomarkers for such disorders. However, a significant problem with most of the GNNs is their depth, which can lead to issues like oversmoothing and diminishing gradients. Methods: In this study, we propose SAGEFusionNet, a GNN architecture specifically designed to enhance brain age prediction and assess PD-related brain ageing patterns using T1-weighted structural MRI (sMRI). SAGEFusionNet learns important ROIs for brain age prediction by incorporating ROI-aware pooling at every layer to overcome the above challenges. Additionally, it incorporates multi-layer feature fusion to capture multi-scale structural information across the network hierarchy and auxiliary supervision to enhance gradient flow and feature learning at multiple depths. The dataset utilised in this study was sourced from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. It included a total of 580 T1-weighted sMRI scans from healthy individuals. The brain sMRI scans were parcellated into 56 regions of interest (ROIs) using the LPBA40 brain atlas in CAT12. The anatomical graph was constructed based on grey matter (GM) volume features. This graph served as input to the GNN models, along with GM and white matter (WM) volume as node features. All models were trained using 5-fold cross-validation to predict brain age and subsequently tested for performance evaluation. Results: The proposed framework achieved a mean absolute error (MAE) of 4.24 ± 0.38 years and a mean Pearson’s Correlation Coefficient (PCC) of 0.72 ± 0.03 during cross-validation. We also used 215 PD patient scans from the Parkinson’s Progression Markers Initiative (PPMI) database to assess the model’s performance and validate it. The initial findings revealed that out of 215 individuals with Parkinson’s disease, 213 showed higher and 2 showed lower predicted brain ages than their actual ages, with a mean MAE of 13.36 years (95% confidence interval: 12.51–14.28). Conclusions: These results suggest that brain age prediction using the proposed method may provide important insights into neurodegenerative diseases. Full article
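The two metrics reported in this abstract, MAE and Pearson's correlation coefficient, can be illustrated with a minimal sketch. The ages below are made-up values rather than data from the study, and the function names are hypothetical.

```python
import math

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted ages."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def pearson_correlation(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

# Hypothetical chronological and predicted ages (years), for illustration only.
chronological = [60.0, 65.0, 70.0, 75.0]
predicted = [62.0, 64.0, 73.0, 79.0]

mae = mean_absolute_error(chronological, predicted)
pcc = pearson_correlation(chronological, predicted)

# Brain age gap: positive when the predicted age exceeds the actual age.
gaps = [p - t for t, p in zip(chronological, predicted)]
n_higher = sum(1 for g in gaps if g > 0)
n_lower = sum(1 for g in gaps if g < 0)
```

Counting positive and negative gaps in this way mirrors the "213 higher, 2 lower" summary given for the PD cohort.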

(This article belongs to the Section Neurorehabilitation)

21 pages, 3826 KiB

Open Access Article

UAV-OVD: Open-Vocabulary Object Detection in UAV Imagery via Multi-Level Text-Guided Decoding

by Lijie Tao, Guoting Wei, Zhuo Wang, Zhaoshuai Qi, Ying Li and Haokui Zhang

Drones 2025, 9(7), 495; https://doi.org/10.3390/drones9070495 - 14 Jul 2025

Viewed by 157

Abstract

Object detection in drone-captured imagery has attracted significant attention due to its wide range of real-world applications, including surveillance, disaster response, and environmental monitoring. Although most existing methods are developed under closed-set assumptions and some recent studies have begun to explore open-vocabulary or open-world detection, the application of such detectors to UAV imagery remains limited and underexplored. In this paper, we address this limitation by exploring the relationship between images and textual semantics to extend object detection in UAV imagery to an open-vocabulary setting. We propose a novel and efficient detector named Unmanned Aerial Vehicle Open-Vocabulary Detector (UAV-OVD), specifically designed for drone-captured scenes. To facilitate open-vocabulary object detection, we propose improvements from three complementary perspectives. First, at the training level, we design a region–text contrastive loss to replace conventional classification loss, allowing the model to align visual regions with textual descriptions beyond fixed category sets. Second, at the structural level, we introduce a multi-level text-guided fusion decoder that integrates visual features across multiple spatial scales under language guidance, thereby improving overall detection performance and enhancing the representation and perception of small objects. Finally, from the data perspective, we enrich the original dataset with synonym-augmented category labels, enabling more flexible and semantically expressive supervision. Experiments conducted on two widely used benchmark datasets demonstrate that our approach achieves significant improvements in both mAP and Recall. For instance, for Zero-Shot Detection on xView, UAV-OVD achieves 9.9 mAP and 67.3 Recall, 1.1 and 25.6 higher than those of YOLO-World, respectively. In terms of speed, UAV-OVD achieves 53.8 FPS, nearly twice as fast as YOLO-World and five times faster than DetrReg, demonstrating its strong potential for real-time open-vocabulary detection in UAV imagery. Full article
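The idea of replacing a fixed-category classification loss with a region–text contrastive loss can be sketched as a softmax cross-entropy over similarities between each region embedding and every text embedding. This is a generic illustration under assumed conventions (cosine similarity, a temperature of 0.1, toy 2-D embeddings), not the loss defined in the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def region_text_contrastive_loss(region_embs, text_embs, targets, temperature=0.1):
    """Mean cross-entropy of region-to-text similarity logits.

    targets[i] is the index of the text embedding that matches region i.
    The temperature value is an assumption chosen for illustration.
    """
    total = 0.0
    for region, target in zip(region_embs, targets):
        logits = [cosine(region, text) / temperature for text in text_embs]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[target]
    return total / len(region_embs)

# Toy embeddings: two category prompts and two region features.
text_embs = [[1.0, 0.0], [0.0, 1.0]]
region_embs = [[0.9, 0.1], [0.2, 0.8]]

loss_aligned = region_text_contrastive_loss(region_embs, text_embs, [0, 1])
loss_swapped = region_text_contrastive_loss(region_embs, text_embs, [1, 0])
```

Because the categories enter only through their text embeddings, new category names can be scored at inference time by adding rows to `text_embs`, which is what makes this formulation suitable for an open vocabulary.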

(This article belongs to the Special Issue Applications of UVs in Digital Photogrammetry and Image Processing)

21 pages, 24495 KiB

Open Access Article

UAMS: An Unsupervised Anomaly Detection Method Integrating MSAA and SSPCAB

by Zhe Li, Wenhui Chen and Weijie Wang

Symmetry 2025, 17(7), 1119; https://doi.org/10.3390/sym17071119 - 12 Jul 2025

Viewed by 181

Abstract

Anomaly detection methods play a crucial role in automated quality control within modern manufacturing systems. In this context, unsupervised methods are increasingly favored due to their independence from large-scale labeled datasets. However, existing methods present limited multi-scale feature extraction ability and may fail to effectively capture subtle anomalies. To address these challenges, we propose UAMS, a pyramid-structured normalization flow framework that leverages the symmetry in feature recombination to harmonize multi-scale interactions. The proposed framework integrates a Multi-Scale Attention Aggregation (MSAA) module for cross-scale dynamic fusion, as well as a Self-Supervised Predictive Convolutional Attention Block (SSPCAB) for spatial channel attention and masked prediction learning. Experiments on the MVTecAD dataset show that UAMS largely outperforms state-of-the-art unsupervised methods, in terms of detection and localization accuracy, while maintaining high inference efficiency. For example, when comparing UAMS against the baseline model on the carpet category, the AUROC is improved from 90.8% to 94.5%, and AUPRO is improved from 91.0% to 92.9%. These findings validate the potential of the proposed method for use in real industrial inspection scenarios. Full article
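The AUROC values quoted above measure how reliably anomaly scores rank defective samples above normal ones. A minimal pairwise sketch with made-up scores (not the authors' evaluation code) is:

```python
def auroc(normal_scores, anomaly_scores):
    """Probability that a random anomaly scores higher than a random normal sample.

    Ties count as half a correct ranking.
    """
    correct = 0.0
    for a in anomaly_scores:
        for n in normal_scores:
            if a > n:
                correct += 1.0
            elif a == n:
                correct += 0.5
    return correct / (len(anomaly_scores) * len(normal_scores))

# Hypothetical image-level anomaly scores.
normal = [0.1, 0.4]
anomalous = [0.35, 0.8]

score = auroc(normal, anomalous)  # three of the four pairs are ranked correctly
perfect = auroc([0.1, 0.2], [0.7, 0.9])
```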

(This article belongs to the Section Computer)

22 pages, 1661 KiB

Open Access Article

UniText: A Unified Framework for Chinese Text Detection, Recognition, and Restoration in Ancient Document and Inscription Images

by Lu Shen, Zewei Wu, Xiaoyuan Huang, Boliang Zhang, Su-Kit Tang, Jorge Henriques and Silvia Mirri

Appl. Sci. 2025, 15(14), 7662; https://doi.org/10.3390/app15147662 - 8 Jul 2025

Viewed by 260

Abstract

Processing ancient text images presents significant challenges due to severe visual degradation, missing glyph structures, and various types of noise caused by aging. These issues are particularly prominent in Chinese historical documents and stone inscriptions, where diverse writing styles, multi-angle capturing, uneven lighting, and low contrast further hinder the performance of traditional OCR techniques. In this paper, we propose a unified neural framework, UniText, for the detection, recognition, and glyph restoration of Chinese characters in images of historical documents and inscriptions. UniText operates at the character level and processes full-page inputs, making it robust to multi-scale, multi-oriented, and noise-corrupted text. The model adopts a multi-task architecture that integrates spatial localization, semantic recognition, and visual restoration through stroke-aware supervision and multi-scale feature aggregation. Experimental results on our curated dataset of ancient Chinese texts demonstrate that UniText achieves a competitive performance in detection and recognition while producing visually faithful restorations under challenging conditions. This work provides a technically scalable and generalizable framework for image-based document analysis, with potential applications in historical document processing, digital archiving, and broader tasks in text image understanding. Full article

(This article belongs to the Special Issue Advances in Artificial Intelligence for Image Processing: Insights and Applications)

46 pages, 5911 KiB

Open Access Article

Leveraging Prior Knowledge in Semi-Supervised Learning for Precise Target Recognition

by Guohao Xie, Zhe Chen, Yaan Li, Mingsong Chen, Feng Chen, Yuxin Zhang, Hongyan Jiang and Hongbing Qiu

Remote Sens. 2025, 17(14), 2338; https://doi.org/10.3390/rs17142338 - 8 Jul 2025

Viewed by 263

Abstract

Underwater acoustic target recognition (UATR) is challenged by complex marine noise, scarce labeled data, and inadequate multi-scale feature extraction in conventional methods. This study proposes DART-MT, a semi-supervised framework that integrates a Dual Attention Parallel Residual Network Transformer with a mean teacher paradigm, enhanced by domain-specific prior knowledge. The architecture employs a Convolutional Block Attention Module (CBAM) for localized feature refinement, a lightweight New Transformer Encoder for global context modeling, and a novel TriFusion Block to synergize spectral–temporal–spatial features through parallel multi-branch fusion, addressing the limitations of single-modality extraction. Leveraging the mean teacher framework, DART-MT optimizes consistency regularization to exploit unlabeled data, effectively mitigating class imbalance and annotation scarcity. Evaluations on the DeepShip and ShipsEar datasets demonstrate state-of-the-art accuracy: with 10% labeled data, DART-MT achieves 96.20% (DeepShip) and 94.86% (ShipsEar), surpassing baseline models by 7.2–9.8% in low-data regimes, while reaching 98.80% (DeepShip) and 98.85% (ShipsEar) with 90% labeled data. Under varying noise conditions (−20 dB to 20 dB), the model maintained a robust performance (F1-score: 92.4–97.1%) with 40% lower variance than its competitors, and ablation studies validated each module’s contribution (TriFusion Block alone improved accuracy by 6.9%). This research advances UATR by (1) resolving multi-scale feature fusion bottlenecks, (2) demonstrating the efficacy of semi-supervised learning in marine acoustics, and (3) providing an open-source implementation for reproducibility. In future work, we will extend cross-domain adaptation to diverse oceanic environments. Full article
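The mean teacher paradigm mentioned above rests on two generic ingredients: the teacher's weights are an exponential moving average (EMA) of the student's, and a consistency loss penalises disagreement between the two models on unlabeled inputs. The sketch below shows both in isolation; the decay value, the squared-error consistency term, and the flat weight lists are assumptions for illustration, not details of DART-MT.

```python
def ema_update(teacher_weights, student_weights, decay=0.99):
    """Move each teacher weight a small step towards the student weight."""
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_weights, student_weights)]

def consistency_loss(student_probs, teacher_probs):
    """Mean squared difference between student and teacher class probabilities."""
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / len(student_probs)

# Toy weights and predictions for one unlabeled sample.
teacher = ema_update([1.0, 0.0], [0.0, 1.0], decay=0.9)
loss_same = consistency_loss([0.7, 0.3], [0.7, 0.3])
loss_diff = consistency_loss([0.9, 0.1], [0.6, 0.4])
```

Only the student receives gradient updates; the teacher changes solely through `ema_update`, which is what makes its predictions a stable target for unlabeled data.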

(This article belongs to the Special Issue Remote Sensing Target Recognition and Detection: Theory and Applications (Second Edition))

15 pages, 1770 KiB

Open Access Article

PSHNet: Hybrid Supervision and Feature Enhancement for Accurate Infrared Small-Target Detection

by Weicong Chen, Chenghong Zhang and Yuan Liu

Appl. Sci. 2025, 15(14), 7629; https://doi.org/10.3390/app15147629 - 8 Jul 2025

Viewed by 127

Abstract

Detecting small targets in infrared imagery remains highly challenging due to sub-pixel target sizes, low signal-to-noise ratios, and complex background clutter. This paper proposes PSHNet, a hybrid deep-learning framework that combines dense spatial heatmap supervision with geometry-aware regression for accurate infrared small-target detection. The network generates position–scale heatmaps to guide coarse localization, which are further refined through sub-pixel offset and size regression. A Complete IoU (CIoU) loss is introduced as a geometric regularization term to improve alignment between predicted and ground-truth bounding boxes. To better preserve fine spatial details essential for identifying small thermal signatures, an Enhanced Low-level Feature Module (ELFM) is incorporated using multi-scale dilated convolutions and channel attention. Experiments on the NUDT-SIRST and IRSTD-1k datasets demonstrate that PSHNet outperforms existing methods in IoU, detection probability, and false alarm rate, achieving IoU improvement and robust performance under low-SNR conditions. Full article
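The Complete IoU loss referred to above is commonly written as 1 − IoU + ρ²/c² + αv, where ρ is the distance between box centres, c is the diagonal of the smallest enclosing box, and v measures aspect-ratio mismatch. A minimal sketch of that standard formulation follows; the box format (x1, y1, x2, y2) and the small `eps` stabiliser are assumptions, and how PSHNet weights the term is not stated in the abstract.

```python
import math

def ciou_loss(box_pred, box_gt, eps=1e-9):
    """CIoU loss for two boxes given as (x1, y1, x2, y2) with x2 > x1, y2 > y1."""
    px1, py1, px2, py2 = box_pred
    gx1, gy1, gx2, gy2 = box_gt

    # Intersection over union.
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    area_p = (px2 - px1) * (py2 - py1)
    area_g = (gx2 - gx1) * (gy2 - gy1)
    iou = inter / (area_p + area_g - inter + eps)

    # Squared centre distance, normalised by the enclosing box diagonal.
    rho2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 \
         + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2
    c2 = (max(px2, gx2) - min(px1, gx1)) ** 2 \
       + (max(py2, gy2) - min(py1, gy1)) ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (
        math.atan((gx2 - gx1) / (gy2 - gy1)) - math.atan((px2 - px1) / (py2 - py1))
    ) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v

loss_same = ciou_loss((0, 0, 2, 2), (0, 0, 2, 2))   # identical boxes
loss_far = ciou_loss((0, 0, 1, 1), (2, 2, 3, 3))    # disjoint boxes
```

Unlike plain IoU, the centre-distance term still provides a training signal when the boxes do not overlap at all, which matters for targets only a few pixels wide.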

21 pages, 5977 KiB

Open Access Article

A Two-Stage Machine Learning Approach for Calving Detection in Rangeland Cattle

by Yuxi Wang, Andrés Perea, Huiping Cao, Mehmet Bakir and Santiago Utsumi

Agriculture 2025, 15(13), 1434; https://doi.org/10.3390/agriculture15131434 - 3 Jul 2025

Viewed by 324

Abstract

Monitoring parturient cattle during calving is crucial for reducing cow and calf mortality, enhancing reproductive and production performance, and minimizing labor costs. Traditional monitoring methods include direct animal inspection or the use of specialized sensors. These methods can be effective, but impractical in large-scale ranching operations due to time, cost, and logistical constraints. To address this challenge, a network of low-power and long-range IoT sensors combining the Global Navigation Satellite System (GNSS) and tri-axial accelerometers was deployed to monitor in real-time 15 parturient Brangus cows on a 700-hectare pasture at the Chihuahuan Desert Rangeland Research Center (CDRRC). A two-stage machine learning approach was tested. In the first stage, a fully connected autoencoder with time encoding was used for unsupervised detection of anomalous behavior. In the second stage, a Random Forest classifier was applied to distinguish calving events from other detected anomalies. A 5-fold cross-validation, using 12 cows for training and 3 cows for testing, was applied at each iteration. While 100% of the calving events were successfully detected by the autoencoder, the Random Forest model failed to classify the calving events of two cows and misidentified the onset of calving for a third cow by 46 h. The proposed framework demonstrates the value of combining unsupervised and supervised machine learning techniques for detecting calving events in rangeland cattle under extensive management conditions. The real-time application of the proposed AI-driven monitoring system has the potential to enhance animal welfare and productivity, improve operational efficiency, and reduce labor demands in large-scale ranching. Future advancements in multi-sensor platforms and model refinements could further boost detection accuracy, making this approach increasingly adaptable across diverse management systems, herd structures, and environmental conditions. Full article
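The evaluation described above (5 folds, 12 cows for training and 3 for testing in each) is consistent with splitting by animal rather than by individual sensor reading. A minimal sketch of such a grouped split is shown below; the cow identifiers and the assignment of cows to folds are hypothetical.

```python
def grouped_folds(group_ids, n_folds):
    """Split unique group IDs into n_folds (train, test) pairs without overlap."""
    groups = sorted(set(group_ids))
    folds = []
    for k in range(n_folds):
        test = groups[k::n_folds]              # every n_folds-th group goes to fold k
        train = [g for g in groups if g not in test]
        folds.append((train, test))
    return folds

# Fifteen hypothetical cow IDs, each contributing many sensor records.
cow_ids = [f"cow_{i:02d}" for i in range(1, 16)]
folds = grouped_folds(cow_ids, n_folds=5)

tested = [cow for _, test in folds for cow in test]
```

Keeping all records from one animal on the same side of the split prevents the classifier from being tested on behaviour it has already seen from that cow.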

(This article belongs to the Special Issue Modeling of Livestock Breeding Environment and Animal Behavior)

17 pages, 1609 KiB

Open Access Article

Parallel Multi-Scale Semantic-Depth Interactive Fusion Network for Depth Estimation

by Chenchen Fu, Sujunjie Sun, Ning Wei, Vincent Chau, Xueyong Xu and Weiwei Wu

J. Imaging 2025, 11(7), 218; https://doi.org/10.3390/jimaging11070218 - 1 Jul 2025

Viewed by 273

Abstract

Self-supervised depth estimation from monocular image sequences provides depth information without costly sensors like LiDAR, offering significant value for autonomous driving. Although self-supervised algorithms can reduce the dependence on labeled data, the performance is still affected by scene occlusions, lighting differences, and sparse textures. Existing methods do not consider the enhancement and interaction fusion of features. In this paper, we propose a novel parallel multi-scale semantic-depth interactive fusion network. First, we adopt a multi-stage feature attention network for feature extraction, and a parallel semantic-depth interactive fusion module is introduced to refine edges. We also employ a metric loss based on semantic edges to take full advantage of semantic geometric information. Our network is trained and evaluated on the KITTI dataset. The experimental results show that the method achieves satisfactory performance compared with existing methods. Full article

19 pages, 7851 KiB

Open Access Article

Ship Plate Detection Algorithm Based on Improved RT-DETR

by Lei Zhang and Liuyi Huang

J. Mar. Sci. Eng. 2025, 13(7), 1277; https://doi.org/10.3390/jmse13071277 - 30 Jun 2025

Viewed by 288

Abstract

To address the challenges in ship plate detection under complex maritime scenarios—such as small target size, extreme aspect ratios, dense arrangements, and multi-angle rotations—this paper proposes a multi-module collaborative detection algorithm, RT-DETR-HPA, based on an enhanced RT-DETR framework. The proposed model integrates three core components: an improved High-Frequency Enhanced Residual Block (HFERB) embedded in the backbone to strengthen multi-scale high-frequency feature fusion, with deformable convolution added to handle occlusion and deformation; a Pinwheel-shaped Convolution (PConv) module employing multi-directional convolution kernels to achieve rotation-adaptive local detail extraction and accurately capture plate edges and character features; and an Adaptive Sparse Self-Attention (ASSA) mechanism incorporated into the encoder to automatically focus on key regions while suppressing complex background interference, thereby enhancing feature discriminability. Comparative experiments conducted on a self-constructed dataset of 20,000 ship plate images show that, compared to the original RT-DETR, RT-DETR-HPA achieves a 3.36% improvement in mAP@50 (up to 97.12%), a 3.23% increase in recall (reaching 94.88%), and maintains real-time detection speed at 40.1 FPS. Compared with mainstream object detection models such as the YOLO series and Faster R-CNN, RT-DETR-HPA demonstrates significant advantages in high-precision localization, adaptability to complex scenarios, and real-time performance. It effectively reduces missed and false detections caused by low resolution, poor lighting, and dense occlusion, providing a robust and high-accuracy solution for intelligent ship supervision. Future work will focus on lightweight model design and dynamic resolution adaptation to enhance its applicability on mobile maritime surveillance platforms. Full article

(This article belongs to the Section Ocean Engineering)

24 pages, 2802 KiB

Open Access Article

MSDCA: A Multi-Scale Dual-Branch Network with Enhanced Cross-Attention for Hyperspectral Image Classification

by Ning Jiang, Shengling Geng, Yuhui Zheng and Le Sun

Remote Sens. 2025, 17(13), 2198; https://doi.org/10.3390/rs17132198 - 26 Jun 2025

Viewed by 324

Abstract

The high dimensionality of hyperspectral data, coupled with limited labeled samples and complex scene structures, makes spatial–spectral feature learning particularly challenging. To address these limitations, we propose a dual-branch deep learning framework named MSDCA, which performs spatial–spectral joint modeling under limited supervision. First, a multiscale 3D spatial–spectral feature extraction module (3D-SSF) employs parallel 3D convolutional branches with diverse kernel sizes and dilation rates, enabling hierarchical modeling of spatial–spectral representations from large-scale patches and effectively capturing both fine-grained textures and global context. Second, a multi-branch directional feature module (MBDFM) enhances the network’s sensitivity to directional patterns and long-range spatial relationships. It achieves this by applying axis-aware depthwise separable convolutions along both horizontal and vertical axes, thereby significantly improving the representation of spatial features. Finally, the enhanced cross-attention Transformer encoder (ECATE) integrates a dual-branch fusion strategy, where a cross-attention stream learns semantic dependencies across multi-scale tokens, and a residual path ensures the preservation of structural integrity. The fused features are further refined through lightweight channel and spatial attention modules. This adaptive alignment process enhances the discriminative power of heterogeneous spatial–spectral features. The experimental results on three widely used benchmark datasets demonstrate that the proposed method consistently outperforms state-of-the-art approaches in terms of classification accuracy and robustness. Notably, the framework is particularly effective for small-sample classes and complex boundary regions, while maintaining high computational efficiency. Full article

28 pages, 11793 KiB

Open Access Article

Unsupervised Multimodal UAV Image Registration via Style Transfer and Cascade Network

by Xiaoye Bi, Rongkai Qie, Chengyang Tao, Zhaoxiang Zhang and Yuelei Xu

Remote Sens. 2025, 17(13), 2160; https://doi.org/10.3390/rs17132160 - 24 Jun 2025

Cited by 1 | Viewed by 317

Abstract

Cross-modal image registration for unmanned aerial vehicle (UAV) platforms presents significant challenges due to large-scale deformations, distinct imaging mechanisms, and pronounced modality discrepancies. This paper proposes a novel multi-scale cascaded registration network based on style transfer that achieves superior performance: up to 67% reduction in mean squared error (from 0.0106 to 0.0068), 9.27% enhancement in normalized cross-correlation, 26% improvement in local normalized cross-correlation, and 8% increase in mutual information compared to state-of-the-art methods. The architecture integrates a cross-modal style transfer network (CSTNet) that transforms visible images into pseudo-infrared representations to unify modality characteristics, and a multi-scale cascaded registration network (MCRNet) that performs progressive spatial alignment across multiple resolution scales using diffeomorphic deformation modeling to ensure smooth and invertible transformations. A self-supervised learning paradigm based on image reconstruction eliminates reliance on manually annotated data while maintaining registration accuracy through synthetic deformation generation. Extensive experiments on the LLVIP dataset demonstrate the method’s robustness under challenging conditions involving large-scale transformations, with ablation studies confirming that style transfer contributes 28% MSE improvement and diffeomorphic registration prevents 10.6% performance degradation. The proposed approach provides a robust solution for cross-modal image registration in dynamic UAV environments, offering significant implications for downstream applications such as target detection, tracking, and surveillance. Full article
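Two of the registration metrics listed above, mean squared error and normalized cross-correlation, respond differently to intensity changes, which is part of why both are reported for cross-modal data. The sketch below uses flattened toy images and the global (not local) form of NCC; it is an illustration, not the evaluation code used in the paper.

```python
import math

def mse(img_a, img_b):
    """Mean squared error between two flattened images."""
    return sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)

def ncc(img_a, img_b):
    """Global normalized cross-correlation between two flattened images."""
    mean_a = sum(img_a) / len(img_a)
    mean_b = sum(img_b) / len(img_b)
    num = sum((a - mean_a) * (b - mean_b) for a, b in zip(img_a, img_b))
    den = math.sqrt(sum((a - mean_a) ** 2 for a in img_a)
                    * sum((b - mean_b) ** 2 for b in img_b))
    return num / den

# A toy 2x2 image, flattened, and a copy with different gain and offset.
fixed = [0.1, 0.4, 0.6, 0.9]
moving = [2.0 * p + 3.0 for p in fixed]

mse_value = mse(fixed, moving)
ncc_value = ncc(fixed, moving)
```

Here the two images are perfectly aligned, so NCC is 1 even though the MSE is large, because NCC is invariant to a linear change in brightness and contrast.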

(This article belongs to the Special Issue Advances in Deep Learning Approaches: UAV Data Analysis)

22 pages, 143709 KiB

Open Access Article

Boundary-Aware Camouflaged Object Detection via Spatial-Frequency Domain Supervision

by Penglin Wang, Yaochi Zhao and Zhuhua Hu

Electronics 2025, 14(13), 2541; https://doi.org/10.3390/electronics14132541 - 23 Jun 2025

Viewed by 286

Abstract

Camouflaged object detection (COD) aims to detect objects that seamlessly integrate with their surrounding environment and are thereby intractable to distinguish from the background. Existing approaches face difficulties in dynamically adapting to scenarios where the foreground closely resembles the background. Additionally, these methods primarily rely on single-domain boundary supervision while overlooking multi-dimensional constraints, leading to indistinct object boundaries. Inspired by the hawk’s visual predation mechanism, namely, global perception and local refinement, we design an innovative two-stage boundary-aware network, namely, SFNet, which relies on supervision in the spatial-frequency domains. In detail, to simulate the global perception mechanism, we design a multi-scale dynamic attention module to capture contextual relationships between camouflaged objects and surroundings and to enhance key feature representation. In the local refinement stage, we introduce a dual-domain boundary supervision mechanism that jointly optimizes boundaries in frequency and spatial domains, along with an adaptive gated boundary guided module to maintain global semantic consistency. Extensive experiments on four camouflaged object detection datasets demonstrate that SFNet surpasses state-of-the-art methods by 4.1%, with lower computational overhead and memory costs. Full article

(This article belongs to the Section Artificial Intelligence)

25 pages, 34278 KiB

Open Access Article

Complementary Local–Global Optimization for Few-Shot Object Detection in Remote Sensing

by Yutong Zhang, Xin Lyu, Xin Li, Siqi Zhou, Yiwei Fang, Chenlong Ding, Shengkai Gao and Jiale Chen

Remote Sens. 2025, 17(13), 2136; https://doi.org/10.3390/rs17132136 - 21 Jun 2025

Viewed by 502

Abstract

Few-shot object detection (FSOD) in remote sensing remains challenging due to the scarcity of annotated samples and the complex background environments in aerial images. Existing methods often struggle to capture fine-grained local features or suffer from bias during global adaptation to novel categories, leading to misclassification as background. To address these issues, we propose a framework that simultaneously enhances local feature learning and global feature adaptation. Specifically, we design an Extensible Local Feature Aggregator Module (ELFAM) that reconstructs object structures via multi-scale recursive attention aggregation. We further introduce a Self-Guided Novel Adaptation (SGNA) module that employs a teacher-student collaborative strategy to generate high-quality pseudo-labels, thereby refining the semantic feature distribution of novel categories. In addition, a Teacher-Guided Dual-Branch Head (TG-DH) is developed to supervise both classification and regression using pseudo-labels generated by the teacher model, further stabilizing and enhancing the semantic features of novel classes. Extensive experiments on the DIOR and iSAID datasets demonstrate that our method achieves superior performance compared to existing state-of-the-art FSOD approaches and validate the effectiveness of all proposed components. Full article

(This article belongs to the Special Issue Efficient Object Detection Based on Remote Sensing Images)

22 pages, 9695 KiB

Open AccessArticle

DAENet: A Deep Attention-Enhanced Network for Cropland Extraction in Complex Terrain from High-Resolution Satellite Imagery

by Yushen Wang, Mingchao Yang, Tianxiang Zhang, Shasha Hu and Qingwei Zhuang

Agriculture 2025, 15(12), 1318; https://doi.org/10.3390/agriculture15121318 - 19 Jun 2025

Viewed by 353

Abstract

Prompt and precise cropland mapping is indispensable for safeguarding food security, enhancing land resource utilization, and advancing sustainable agricultural practices. Conventional approaches face difficulties in complex terrain marked by fragmented plots, pronounced elevation differences, and non-uniform field borders. To address these challenges, we propose DAENet, a novel deep learning framework designed for accurate cropland extraction from high-resolution GaoFen-1 (GF-1) satellite imagery. DAENet employs a Geometric-Optimized and Boundary-Restrained (GOBR) Block, which combines channel attention, multi-scale spatial attention, and boundary supervision mechanisms to mitigate challenges arising from disjointed cropland parcels, topography-cast shadows, and indistinct edges. We conducted comparative experiments against eight mainstream semantic segmentation models. The results demonstrate that DAENet achieves superior performance, with an Intersection over Union (IoU) of 0.9636, representing a 4% improvement over the best-performing baseline, and an F1-score of 0.9811, marking a 2% increase. Ablation analysis further validated the indispensable contribution of the GOBR modules in improving segmentation precision. Using our approach, we extracted 25,556.98 hectares of cropland within the study area, encompassing a total of 67,850 individual blocks. Additionally, the proposed method exhibits robust generalization across varying spatial resolutions, underscoring its effectiveness as a high-accuracy solution for agricultural monitoring and sustainable land management in complex terrain. Full article

(This article belongs to the Section Digital Agriculture)

Search Results (403)
