Search Results (96)

Search Parameters:
Keywords = satellite images annotation

28 pages, 4950 KiB  
Article
A Method for Auto Generating a Remote Sensing Building Detection Sample Dataset Based on OpenStreetMap and Bing Maps
by Jiawei Gu, Chen Ji, Houlin Chen, Xiangtian Zheng, Liangbao Jiao and Liang Cheng
Remote Sens. 2025, 17(14), 2534; https://doi.org/10.3390/rs17142534 - 21 Jul 2025
Viewed by 334
Abstract
In remote sensing building detection tasks, data acquisition remains a critical bottleneck that limits both model performance and large-scale deployment. Due to the high cost of manual annotation, limited geographic coverage, and constraints of image acquisition conditions, obtaining large-scale, high-quality labeled datasets remains a significant challenge. To address this issue, this study proposes an automatic semantic labeling framework for remote sensing imagery. The framework leverages geospatial vector data provided by OpenStreetMap, precisely aligns it with high-resolution satellite imagery from Bing Maps through projection transformation, and incorporates a quality-aware sample filtering strategy to automatically generate accurate annotations for building detection. The resulting dataset comprises 36,647 samples, covering buildings in both urban and suburban areas across multiple cities. To evaluate its effectiveness, we selected three publicly available datasets (WHU, INRIA, and DZU) and conducted three types of experiments using four representative object detection models: SSD, Faster R-CNN, DETR, and YOLOv11s. The experiments include benchmark performance evaluation, input perturbation robustness testing, and cross-dataset generalization analysis. Results show that our dataset achieved a mean average precision of up to 93.2% at an intersection-over-union threshold of 0.5 (mAP@0.5), with a precision of 89.4% and a recall of 90.6%, outperforming the open-source benchmarks across all four models. Furthermore, when simulating real-world noise in satellite image acquisition, such as motion blur and brightness variation, our dataset maintained a mean average precision of 90.4% under the most severe perturbation, indicating strong robustness. In addition, it demonstrated superior cross-dataset stability compared to the benchmarks. Finally, comparative experiments conducted on public test areas further validated the effectiveness and reliability of the proposed annotation framework.
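The alignment step described above is not released as code here; a minimal sketch of the underlying projection transformation, mapping WGS84 building corners from OSM onto the global Web Mercator pixel grid that Bing Maps tiles use (the zoom level, tile-size constant, and sample footprint are illustrative assumptions), might look like this:

```python
import math

TILE_SIZE = 256  # Web Mercator tiles are 256 px on a side

def lonlat_to_pixel(lon_deg: float, lat_deg: float, zoom: int):
    """Project WGS84 lon/lat to global pixel coordinates in the Web Mercator
    (EPSG:3857) tile scheme at a given zoom level."""
    lat = math.radians(lat_deg)
    n = TILE_SIZE * (2 ** zoom)  # world size in pixels at this zoom
    x = (lon_deg + 180.0) / 360.0 * n
    y = (1.0 - math.log(math.tan(lat) + 1.0 / math.cos(lat)) / math.pi) / 2.0 * n
    return x, y

# Hypothetical OSM building footprint (lon/lat vertices), expressed relative
# to an image patch whose top-left corner is a known geographic point.
footprint = [(118.796, 32.058), (118.797, 32.058), (118.797, 32.057)]
ox, oy = lonlat_to_pixel(118.79, 32.06, zoom=18)  # patch origin
polygon_px = [(px - ox, py - oy)
              for px, py in (lonlat_to_pixel(lon, lat, 18) for lon, lat in footprint)]
```

The polygon in pixel coordinates can then be rasterized into a label mask and passed through the quality-aware filtering the abstract mentions.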

23 pages, 16886 KiB  
Article
SAVL: Scene-Adaptive UAV Visual Localization Using Sparse Feature Extraction and Incremental Descriptor Mapping
by Ganchao Liu, Zhengxi Li, Qiang Gao and Yuan Yuan
Remote Sens. 2025, 17(14), 2408; https://doi.org/10.3390/rs17142408 - 12 Jul 2025
Viewed by 413
Abstract
In recent years, the use of UAVs has become widespread. Long-distance UAV flight requires obtaining precise geographic coordinates. Global Navigation Satellite Systems (GNSS) are the most common positioning solution, but their signals are susceptible to interference from obstacles and complex electromagnetic environments. In such cases, vision-based technology can serve as an alternative to ensure the self-positioning capability of UAVs. Therefore, a scene-adaptive UAV visual localization framework (SAVL) is proposed. In the proposed framework, UAV images are mapped to satellite images with geographic coordinates through pixel-level matching to locate the UAV. First, to tackle the challenge of inaccurate localization resulting from sparse terrain features, this work proposes a novel feature extraction network grounded in a general visual model, leveraging the robust zero-shot generalization capability of the pre-trained model to extract sparse features from UAV and satellite imagery. Second, to overcome weak generalization in unknown scenarios, a descriptor incremental mapping module is designed, which reduces multi-source image differences at the semantic level through UAV-to-satellite image descriptor mapping and constructs a confidence-based incremental strategy to adapt dynamically to the scene. Finally, owing to the lack of annotated public datasets, a scene-rich UAV dataset (RealUAV) was constructed to study UAV visual localization in real-world environments. To evaluate the localization performance of the proposed framework, several related methods were compared and analyzed in detail. The results on the dataset indicate that the proposed method achieves excellent positioning accuracy, with an average error of only 8.71 m.
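The SAVL pipeline itself is not reproduced here; as a hedged illustration of only its final geolocation step, one could fit a homography to the pixel-level matches and map the UAV image centre into the georeferenced satellite frame (the nadir-camera assumption, the `sat_pixel_to_geo` callable, and all names below are illustrative):

```python
import numpy as np
import cv2

def locate_uav(uav_pts, sat_pts, uav_shape, sat_pixel_to_geo):
    """Estimate the UAV position from matched UAV/satellite keypoints.

    uav_pts, sat_pts -- (N, 2) matched keypoints, N >= 4
    uav_shape        -- (height, width) of the UAV image
    sat_pixel_to_geo -- callable mapping satellite (col, row) -> (lon, lat)
    """
    H, _ = cv2.findHomography(np.float32(uav_pts), np.float32(sat_pts),
                              cv2.RANSAC, 5.0)
    h, w = uav_shape
    centre = np.float32([[[w / 2.0, h / 2.0]]])
    # Assume a nadir-pointing camera: the image centre is the ground point.
    col, row = cv2.perspectiveTransform(centre, H)[0, 0]
    return sat_pixel_to_geo(col, row)
```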

24 pages, 6594 KiB  
Article
GAT-Enhanced YOLOv8_L with Dilated Encoder for Multi-Scale Space Object Detection
by Haifeng Zhang, Han Ai, Donglin Xue, Zeyu He, Haoran Zhu, Delian Liu, Jianzhong Cao and Chao Mei
Remote Sens. 2025, 17(13), 2119; https://doi.org/10.3390/rs17132119 - 20 Jun 2025
Viewed by 478
Abstract
Inadequate object detection accuracy in complex remote sensing scenarios remains a primary concern. Traditional YOLO-series algorithms encounter challenges such as poor robustness in small object detection and significant interference from complex backgrounds. In this paper, a multi-scale feature fusion framework based on an improved version of YOLOv8_L is proposed. The combination of a graph attention network (GAT) and a Dilated Encoder network significantly improves detection and recognition performance for space remote sensing objects. The framework discards the original Feature Pyramid Network (FPN) structure and reconstructs it around an adaptive fusion strategy based on multi-level backbone features, enhancing the representation of multi-scale objects through upsampling and feature stacking. Local features extracted by convolutional neural networks are mapped to graph-structured data, and the node attention mechanism of the GAT is used to capture the global topological associations of space objects, compensating for the convolution operation's limitations in weight allocation. The Dilated Encoder network is introduced to cover targets of different scales through differentiated receptive fields, and feature weight allocation is optimized in combination with a Convolutional Block Attention Module (CBAM). Reflecting the characteristics of space missions, an annotated dataset containing 8000 satellite and space station images was constructed, covering a variety of lighting, attitude, and scale conditions and providing benchmark support for model training and verification. Experimental results on the space object dataset reveal that the enhanced algorithm achieves a mean average precision (mAP) of 97.2%, representing a 2.1% improvement over the original YOLOv8_L. Comparative experiments with six other models demonstrate that the proposed algorithm outperforms its counterparts. Ablation studies further validate the synergistic effect between the GAT and the Dilated Encoder. The results indicate that the model maintains high detection accuracy under challenging conditions, including strong light interference, multi-scale variations, and low-light environments.
(This article belongs to the Special Issue Remote Sensing Image Thorough Analysis by Advanced Machine Learning)
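For readers unfamiliar with the node attention the abstract refers to, a minimal single-head GAT layer (the standard formulation, not the paper's exact YOLOv8_L integration) can be sketched in PyTorch as:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Single-head graph attention layer in the standard GAT formulation."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (N, in_dim) node features, e.g., flattened CNN feature-map cells;
        # adj: (N, N) binary adjacency, expected to include self-loops.
        z = self.W(h)
        n = z.size(0)
        pairs = torch.cat([z.unsqueeze(1).expand(n, n, -1),
                           z.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1), 0.2)   # raw attention scores
        e = e.masked_fill(adj == 0, float("-inf"))         # neighbours only
        alpha = torch.softmax(e, dim=-1)                   # per-node weights
        return alpha @ z                                   # weighted aggregation
```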

26 pages, 7349 KiB  
Article
Enhancing DeepLabv3+ Convolutional Neural Network Model for Precise Apple Orchard Identification Using GF-6 Remote Sensing Images and PIE-Engine Cloud Platform
by Guining Gao, Zhihan Chen, Yicheng Wei, Xicun Zhu and Xinyang Yu
Remote Sens. 2025, 17(11), 1923; https://doi.org/10.3390/rs17111923 - 31 May 2025
Viewed by 520
Abstract
Utilizing remote sensing models to monitor apple orchards facilitates the industrialization of agriculture and the sustainable development of rural land resources. This study enhanced the DeepLabv3+ model to achieve superior performance in apple orchard identification by incorporating ResNet backbones, optimizing the algorithm, and adjusting hyperparameter configurations on the PIE-Engine cloud platform. GF-6 PMS images were used as the data source, and Qixia City was selected as the case study area. The results indicate that the apple orchard identification accuracies of the proposed DeepLabv3+_34, DeepLabv3+_50, and DeepLabv3+_101 reached 91.17%, 92.55%, and 94.37%, respectively. DeepLabv3+_101 demonstrated superior identification performance compared with ResU-Net and LinkNet, with an average accuracy improvement of over 3%. The area of apple orchards identified by the DeepLabv3+_101 model was 629.32 km², accounting for 31.20% of Qixia City's total area; orchards were mainly located in the western part of the study area. The innovation of this research lies in combining image annotation and object-oriented methods during training, improving annotation efficiency and accuracy. Additionally, the enhanced DeepLabv3+ model built on GF-6 satellite images and the PIE-Engine cloud platform exhibited superior feature expression compared with conventional machine learning classification and recognition algorithms.
(This article belongs to the Special Issue Remote Sensing Image Classification: Theory and Application)
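As a point of reference only (the paper's ResNet-34 variant and its further modifications are not part of stock torchvision), the ResNet-backed DeepLabv3 family it builds on can be instantiated like this:

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet101

# deeplabv3_resnet50 is also available; a ResNet-34 variant, as used in the
# paper's DeepLabv3+_34, would have to be assembled manually.
model = deeplabv3_resnet101(weights=None, num_classes=2)  # orchard vs. background

x = torch.randn(1, 3, 512, 512)   # one RGB patch (size is illustrative)
out = model(x)["out"]             # (1, 2, 512, 512) per-pixel class logits
```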

16 pages, 3645 KiB  
Article
A Global Coseismic InSAR Dataset for Deep Learning: Automated Construction from Sentinel-1 Observations (2015–2024)
by Xu Liu, Zhenjie Wang, Yingfeng Zhang, Xinjian Shan and Ziwei Liu
Remote Sens. 2025, 17(11), 1832; https://doi.org/10.3390/rs17111832 - 23 May 2025
Viewed by 839
Abstract
Interferometric synthetic aperture radar (InSAR) technology has been widely employed in the rapid monitoring of earthquakes and associated geological hazards. With the continued advancement of InSAR technology, the growing volume of satellite-acquired data has opened new avenues for applying deep learning (DL) techniques to the analysis of earthquake-induced surface deformation. Although DL holds great promise for processing InSAR data, its progress has been significantly constrained by the absence of large-scale, accurately annotated datasets of earthquake-induced deformation. To address this limitation, we propose an automated method for constructing deep learning training datasets by integrating the Global Centroid Moment Tensor (GCMT) earthquake catalog with Sentinel-1 InSAR observations. This approach reduces the inefficiencies and manual labor typically involved in InSAR data preparation, significantly improving the automation of dataset construction for coseismic deformation studies. Using this method, we developed and publicly released a large-scale training dataset of coseismic InSAR samples. The dataset contains 353 Sentinel-1 interferograms corresponding to 62 global earthquakes that occurred between 2015 and 2024. Following standardized preprocessing and data augmentation (DA), a large number of image samples were generated for model training. Multidimensional analyses of the dataset confirmed its high quality and strong representativeness, making it a valuable asset for deep learning research on coseismic deformation. The dataset construction process follows a standardized and reproducible workflow, ensuring objectivity and consistency throughout data generation. As additional coseismic InSAR observations become available, the dataset can be continuously expanded into a comprehensive, high-quality, and diverse training resource, providing a solid foundation for advancing deep learning applications in InSAR-based coseismic deformation analysis.
(This article belongs to the Special Issue Artificial Intelligence and Remote Sensing for Geohazards)
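The released pipeline is not reproduced here; a heavily simplified sketch of the catalog-driven pairing logic, with hypothetical catalog rows standing in for real GCMT queries and `query_s1_dates` standing in for a Sentinel-1 archive search, might read:

```python
from datetime import date, timedelta

# Hypothetical catalog rows: (event_id, date, magnitude)
catalog = [
    ("C202301010000A", date(2023, 1, 1), 6.8),
    ("C202302150000A", date(2023, 2, 15), 5.1),
]

MIN_MAGNITUDE = 5.5               # skip events unlikely to show deformation
PAIR_WINDOW = timedelta(days=24)  # ~two Sentinel-1 repeat cycles (assumption)

def coseismic_pair(event_day, acquisitions):
    """Pick the tightest pre-/post-event acquisition dates bracketing an event."""
    pre = [d for d in acquisitions if event_day - PAIR_WINDOW <= d < event_day]
    post = [d for d in acquisitions if event_day < d <= event_day + PAIR_WINDOW]
    return (max(pre), min(post)) if pre and post else None

for event_id, day, mag in catalog:
    if mag < MIN_MAGNITUDE:
        continue
    # pair = coseismic_pair(day, query_s1_dates(event_id))  # archive search
```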

26 pages, 20953 KiB  
Article
Optimization-Based Downscaling of Satellite-Derived Isotropic Broadband Albedo to High Resolution
by Niko Lukač, Domen Mongus and Marko Bizjak
Remote Sens. 2025, 17(8), 1366; https://doi.org/10.3390/rs17081366 - 11 Apr 2025
Viewed by 372
Abstract
In this paper, a novel method for estimating high-resolution isotropic broadband albedo is proposed, in which satellite-derived albedo is downscaled using an optimization approach. First, broadband albedo is calculated from the lower-resolution multispectral satellite image using standard narrow-to-broadband (NTB) conversion, where surfaces are treated as Lambertian with isotropic reflectance. A high-resolution true orthophoto of the same location is segmented with the deep learning-based Segment Anything Model (SAM), and the resulting segments are refined with a classified digital surface model (cDSM) to exclude small transient objects. The remaining segments are then grouped using K-means clustering over the orthophoto's visible (VIS) and near-infrared (NIR) bands, so that each group represents surfaces with similar materials and underlying reflectance properties. Next, the Differential Evolution (DE) optimization algorithm, driven by two novel objective functions, approximates albedo values for these segments so that their spatial aggregate matches the coarse-resolution satellite albedo. Extensive experiments with different DE parameters were carried out over a 0.75 km² urban area in Maribor, Slovenia, where Sentinel-2 Level-2A NTB-derived albedo was downscaled to 1 m spatial resolution. In the spatiospectral analysis, the proposed method achieved absolute differences of 0.09 per VIS band and below 0.18 per NIR band in comparison to the lower-resolution NTB-derived albedo. Moreover, the proposed method achieved a root mean square error (RMSE) of 0.0179 and a mean absolute percentage error (MAPE) of 4.0299% against ground truth broadband albedo annotations of characteristic materials in the given urban area, outperforming the Enhanced Super-Resolution Generative Adversarial Network (ESRGAN), which achieved an RMSE of 0.0285 and an MAPE of 9.2778%, and the Blind Super-Resolution Generative Adversarial Network (BSRGAN), which achieved an RMSE of 0.0341 and an MAPE of 12.3104%.
(This article belongs to the Section AI Remote Sensing)
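A toy version of the optimization step, with one coarse pixel covering three segments and a single quadratic mismatch term standing in for the paper's two objective functions, can be written with SciPy's differential evolution:

```python
import numpy as np
from scipy.optimize import differential_evolution

area_fraction = np.array([0.5, 0.3, 0.2])  # share of the pixel per segment
coarse_albedo = 0.23                       # NTB-derived albedo of the pixel

def objective(albedos: np.ndarray) -> float:
    # Penalize mismatch between the area-weighted segment albedos and the
    # coarse satellite value (a stand-in for the paper's objectives).
    return (np.dot(area_fraction, albedos) - coarse_albedo) ** 2

result = differential_evolution(objective, bounds=[(0.0, 1.0)] * 3, seed=42)
print(result.x)  # per-segment albedos whose aggregate matches the pixel
```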

28 pages, 3815 KiB  
Article
Collaborative Static-Dynamic Teaching: A Semi-Supervised Framework for Stripe-like Space Target Detection
by Zijian Zhu, Ali Zia, Xuesong Li, Bingbing Dan, Yuebo Ma, Hongfeng Long, Kaili Lu, Enhai Liu and Rujin Zhao
Remote Sens. 2025, 17(8), 1341; https://doi.org/10.3390/rs17081341 - 9 Apr 2025
Cited by 1 | Viewed by 483
Abstract
Stripe-like space target detection (SSTD) plays a crucial role in advancing space situational awareness, enabling missions such as satellite navigation and debris monitoring. Existing unsupervised methods often falter in low signal-to-noise ratio (SNR) conditions, while fully supervised approaches require extensive and labor-intensive pixel-level annotations. To address these limitations, this paper introduces MRSA-Net, a novel encoder-decoder network specifically designed for SSTD. MRSA-Net incorporates multi-receptive-field processing and multi-level feature fusion to effectively extract features of variable, low-SNR stripe-like targets. Building upon this, we propose Collaborative Static-Dynamic Teaching (CSDT), a semi-supervised learning architecture that reduces reliance on labeled data by leveraging both static and dynamic teacher models. The framework exploits the straight-line prior of stripe-like targets and presents an innovative Adaptive Pseudo-Labeling (APL) strategy that dynamically selects high-quality pseudo-labels to enhance the student model's learning process. Extensive experiments on AstroStripeSet and other real-world datasets demonstrate that the CSDT framework achieves state-of-the-art performance in SSTD. Using just 1/16 of the labeled data, CSDT outperforms the second-best Interactive Self-Training Mean Teacher (ISMT) method by 2.64% in mean Intersection over Union (mIoU) and 4.5% in detection rate (Pd), while exhibiting strong generalization in unseen scenarios. This work marks the first application of semi-supervised learning techniques to SSTD, offering a flexible and scalable solution for challenging space imaging tasks.
(This article belongs to the Section AI Remote Sensing)
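The APL strategy is described only at a high level; one plausible reading, offered purely as a sketch, is to keep a pseudo-label only when the teacher is confident and the predicted stripe honours the straight-line prior (all thresholds below are assumptions):

```python
import numpy as np

def accept_pseudo_label(prob_map: np.ndarray,
                        conf_thresh: float = 0.9,
                        max_line_rmse: float = 1.5) -> bool:
    """Keep a pseudo-label if confident pixels lie close to a straight line."""
    ys, xs = np.nonzero(prob_map > conf_thresh)
    if len(xs) < 10:                       # too few confident pixels
        return False
    # Fit y = a*x + b; a near-vertical stripe would need the axes swapped.
    a, b = np.polyfit(xs, ys, deg=1)
    rmse = np.sqrt(np.mean((ys - (a * xs + b)) ** 2))
    return rmse <= max_line_rmse
```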

18 pages, 5506 KiB  
Article
Optimizing Satellite Imagery Datasets for Enhanced Land/Water Segmentation
by Marco Scarpetta, Luisa De Palma, Attilio Di Nisio, Maurizio Spadavecchia, Paolo Affuso and Nicola Giaquinto
Sensors 2025, 25(6), 1793; https://doi.org/10.3390/s25061793 - 13 Mar 2025
Viewed by 824
Abstract
This paper presents an automated procedure for optimizing datasets used in land/water segmentation tasks with deep learning models. The proposed method employs the Normalized Difference Water Index (NDWI) with a variable threshold to automatically assess the quality of annotations associated with multispectral satellite images. By systematically identifying and excluding low-quality samples, the method enhances dataset quality and improves model performance. Experimental results on two publicly available datasets, SWED and SNOWED, demonstrate that deep learning models trained on optimized datasets outperform those trained on the baseline datasets, achieving significant improvements in segmentation accuracy, with up to a 10% increase in mean intersection over union despite a reduced dataset size. The presented methodology is therefore a promising, scalable solution for improving the quality of datasets for environmental monitoring and other remote sensing applications.
(This article belongs to the Special Issue Deep Learning for Intelligent Systems: Challenges and Opportunities)
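The NDWI check is simple to state precisely; a sketch of the quality score (the band layout and the use of IoU as the agreement metric are assumptions here):

```python
import numpy as np

def ndwi(green: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """Normalized Difference Water Index: (G - NIR) / (G + NIR)."""
    return (green - nir) / (green + nir + 1e-8)

def annotation_quality(green, nir, label_mask, threshold=0.0):
    """Score an annotation by its IoU against the NDWI water mask; the
    threshold is the quantity varied in the paper's procedure."""
    water = ndwi(green, nir) > threshold
    inter = np.logical_and(water, label_mask).sum()
    union = np.logical_or(water, label_mask).sum()
    return inter / union if union else 0.0

# Samples scoring below a cutoff would be excluded from the training set.
```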

37 pages, 14442 KiB  
Article
Domain Adaptation and Fine-Tuning of a Deep Learning Segmentation Model of Small Agricultural Burn Area Detection Using High-Resolution Sentinel-2 Observations: A Case Study of Punjab, India
by Anamika Anand, Ryoichi Imasu, Surendra K. Dhaka and Prabir K. Patra
Remote Sens. 2025, 17(6), 974; https://doi.org/10.3390/rs17060974 - 10 Mar 2025
Cited by 1 | Viewed by 1675
Abstract
High-resolution Sentinel-2 imagery combined with a deep learning (DL) segmentation model offers a promising approach for accurate mapping of small and fragmented agricultural burn areas. Initially, the model was trained using ICNF burn area data from Portugal to capture large fire and burn area delineation, achieving moderate accuracy. Subsequent fine-tuning using annotated data from Punjab improved the model's ability to detect small burn patches, demonstrating higher accuracy than the baseline Normalized Burn Ratio (NBR) index method. On-ground validation using buffer zone analysis and crop field images confirmed the effectiveness of the DL approach. Challenges such as cloud interference, temporal gaps in satellite data, and limited reference data for training persist, but this study underscores the methodological advancements and potential of DL models applied to small burn area detection in agricultural settings. The model achieved an overall accuracy of 98.7%, a macro-F1 score of 97.6%, an IoU of 0.54, and a Dice coefficient of 0.64, demonstrating its capability for detailed burn area delineation. The model can capture burn areas smaller than 250 m², though it is at present less effective at representing the full extent of fires. Overall, the outcomes demonstrate the model's ability to generalize to a new domain despite regional differences among the research areas.
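For reference, the NBR baseline mentioned above is a plain band ratio; the Sentinel-2 band choice below is the common convention, stated here as an assumption:

```python
import numpy as np

def nbr(nir: np.ndarray, swir: np.ndarray) -> np.ndarray:
    """Normalized Burn Ratio: (NIR - SWIR) / (NIR + SWIR).
    For Sentinel-2, NIR is typically band 8 and SWIR band 12."""
    return (nir - swir) / (nir + swir + 1e-8)

def dnbr(nir_pre, swir_pre, nir_post, swir_post):
    """Differenced NBR: pre-fire minus post-fire; higher values suggest burn."""
    return nbr(nir_pre, swir_pre) - nbr(nir_post, swir_post)

# Thresholding dNBR gives the index-based burn mask the DL model is compared to.
```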

16 pages, 86590 KiB  
Article
Automated Detection of Araucaria angustifolia (Bertol.) Kuntze in Urban Areas Using Google Earth Images and YOLOv7x
by Mauro Alessandro Karasinski, Ramon de Sousa Leite, Emmanoella Costa Guaraná, Evandro Orfanó Figueiredo, Eben North Broadbent, Carlos Alberto Silva, Erica Kerolaine Mendonça dos Santos, Carlos Roberto Sanquetta and Ana Paula Dalla Corte
Remote Sens. 2025, 17(5), 809; https://doi.org/10.3390/rs17050809 - 25 Feb 2025
Viewed by 1174
Abstract
This study addresses the urgent need for effective methods to monitor and conserve Araucaria angustifolia, a critically endangered species of immense ecological and cultural significance in southern Brazil. Using high-resolution satellite images from Google Earth, we apply the YOLOv7x deep learning model to detect this species in two distinct urban contexts in Curitiba, Paraná: isolated trees across the urban landscape and A. angustifolia individuals within forest remnants. Data augmentation techniques, including image rotation, hue and saturation adjustments, and mosaic augmentation, were employed to increase the model's accuracy and robustness. Through 5-fold cross-validation, the model achieved a mean Average Precision (AP) of 90.79% and an F1-score of 88.68%. Results show higher detection accuracy in forest remnants, where the homogeneous background of natural landscapes facilitated the identification of trees, than in urban areas, where complex visual elements such as building shadows presented challenges. To reduce false positives, especially misclassifications involving palm species, additional annotations were introduced, significantly enhancing performance in urban environments. These findings highlight the potential of integrating remote sensing with deep learning to automate large-scale forest inventories. Furthermore, the study underscores the broader applicability of the YOLOv7x model for urban forestry planning, offering a cost-effective solution for biodiversity monitoring. The integration of predictive data with urban forest maps reveals a spatial correlation between A. angustifolia density and the presence of forest fragments, suggesting that the preservation of these areas is vital for the species' sustainability. The model's scalability also opens the door to future applications in ecological monitoring across larger urban areas. As urban environments continue to expand, understanding and conserving key species like A. angustifolia is critical for enhancing biodiversity and resilience and for addressing climate change.
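The augmentations named above map onto off-the-shelf transforms; a hedged sketch with the albumentations library (parameter values are illustrative, and mosaic augmentation is normally applied inside the YOLO training loop rather than per image):

```python
import albumentations as A

# Rotation plus hue/saturation shifts; bounding boxes stay consistent
# via bbox_params. Values are illustrative, not the paper's settings.
transform = A.Compose(
    [
        A.Rotate(limit=15, p=0.5),
        A.HueSaturationValue(hue_shift_limit=10, sat_shift_limit=20,
                             val_shift_limit=0, p=0.5),
    ],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)
# augmented = transform(image=image, bboxes=bboxes, class_labels=labels)
```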

35 pages, 19516 KiB  
Article
DoubleNet: A Method for Generating Navigation Lines of Unstructured Soil Roads in a Vineyard Based on CNN and Transformer
by Xuezhi Cui, Licheng Zhu, Bo Zhao, Ruixue Wang, Zhenhao Han, Kunlei Lu, Xuguang Feng, Jipeng Ni and Xiaoyi Cui
Agronomy 2025, 15(3), 544; https://doi.org/10.3390/agronomy15030544 - 23 Feb 2025
Viewed by 664
Abstract
Navigating unstructured vineyard roads under weak satellite signals presents significant challenges for robotic systems. This research introduces DoubleNet, a deep-learning model designed to generate navigation lines for such conditions. To improve its ability to extract image features, DoubleNet incorporates several key innovations: a unique multi-head self-attention mechanism (Fused-MHSA), a modified activation function (SA-GELU), and a specialized operation block (DNBLK). Building on these components, DoubleNet is structured as an encoder-decoder network with two parallel subnetworks, one dedicated to processing 2D feature maps and the other to 1D tensors, which interact through two feature fusion networks operating in both the encoder and decoder stages to yield a more integrated feature extraction process. Additionally, we utilized a specially annotated dataset of images fusing RGB and mask channels, with five navigation points marked per image to enhance the accuracy of point localization. As a result of these innovations, DoubleNet achieves a percentage of correct keypoints (PCK) of 95.75% and operates at 71.16 FPS on our dataset, outperforming several well-known keypoint detection algorithms on combined performance. DoubleNet demonstrates strong potential as a competitive solution for generating effective navigation routes for robots operating on unstructured vineyard roads.
(This article belongs to the Special Issue Advanced Machine Learning in Agriculture)
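The headline PCK metric is easy to state exactly (the fixed pixel-distance threshold convention is an assumption; some variants normalize by image or object size):

```python
import numpy as np

def pck(pred: np.ndarray, gt: np.ndarray, threshold: float) -> float:
    """Percentage of Correct Keypoints: a prediction counts as correct when
    it lies within `threshold` pixels of its ground-truth point.

    pred, gt -- (N, K, 2) arrays of K navigation points over N images
    """
    dist = np.linalg.norm(pred - gt, axis=-1)  # (N, K) Euclidean distances
    return float((dist <= threshold).mean())
```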

23 pages, 10921 KiB  
Article
A Weakly Supervised and Self-Supervised Learning Approach for Semantic Segmentation of Land Cover in Satellite Images with National Forest Inventory Data
by Daniel Moraes, Manuel L. Campagnolo and Mário Caetano
Remote Sens. 2025, 17(4), 711; https://doi.org/10.3390/rs17040711 - 19 Feb 2025
Cited by 1 | Viewed by 1219
Abstract
National Forest Inventories (NFIs) provide valuable land cover (LC) information but often lack spatial continuity and an adequate update frequency. Satellite-based remote sensing offers a viable alternative, employing machine learning to extract thematic data. State-of-the-art methods such as convolutional neural networks rely on fully annotated images at the pixel level, which are difficult to obtain. Although reference LC datasets have been widely used to derive annotations, NFIs consist of point-based data, providing only sparse annotations. Weakly supervised and self-supervised learning approaches help address this issue by reducing dependence on fully annotated images and leveraging unlabeled data, but their potential for large-scale LC mapping needs further investigation. This study explored the use of NFI data with deep learning through weakly supervised and self-supervised methods. Using Sentinel-2 images and the Portuguese NFI, which covers LC types beyond forest, as sparse labels, we performed weakly supervised semantic segmentation with a convolutional neural network to create an updated and spatially continuous national LC map. Additionally, we investigated the potential of self-supervised learning by pretraining a masked autoencoder on 65,000 Sentinel-2 image chips and then fine-tuning the model with NFI-derived sparse labels. The weakly supervised baseline achieved a validation accuracy of 69.60%, surpassing Random Forest (67.90%). The self-supervised model achieved 71.29%, performing on par with the baseline while using half the training data. The results demonstrate that integrating both learning approaches enables successful countrywide LC mapping with limited training data.
(This article belongs to the Section Earth Observation Data)
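The key trick with point-based NFI labels is that the segmentation loss is evaluated only at labelled pixels; in PyTorch that is a single argument (the ignore value and tensor shapes below are assumptions):

```python
import torch
import torch.nn as nn

IGNORE = -1  # pixels without an NFI point carry this label

# logits: (B, C, H, W) network output; sparse_labels: (B, H, W) long tensor,
# equal to IGNORE everywhere except pixels holding an NFI plot's LC class.
criterion = nn.CrossEntropyLoss(ignore_index=IGNORE)

def sparse_supervision_loss(logits: torch.Tensor,
                            sparse_labels: torch.Tensor) -> torch.Tensor:
    """Cross-entropy over the sparsely annotated pixels only; every other
    pixel is skipped, which is what makes point labels usable."""
    return criterion(logits, sparse_labels)
```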

20 pages, 3955 KiB  
Article
Deep Learning Extraction of Tidal Creeks in the Yellow River Delta Using GF-2 Imagery
by Bojie Chen, Qianran Zhang, Na Yang, Xiukun Wang, Xiaobo Zhang, Yilan Chen and Shengli Wang
Remote Sens. 2025, 17(4), 676; https://doi.org/10.3390/rs17040676 - 16 Feb 2025
Viewed by 956
Abstract
Tidal creeks are vital geomorphological features of tidal flats, and their spatial and temporal variations contribute significantly to the preservation of ecological diversity and the spatial evolution of coastal wetlands. Traditional methods such as manual annotation and machine learning remain common for tidal creek extraction, but they are slow and inefficient. With increasing data volumes, accurately analyzing tidal creeks over large spatial and temporal scales has become a significant challenge. This study proposes a residual U-Net model that utilizes full-dimensional dynamic convolution to segment tidal creeks in the Yellow River Delta, employing Gaofen-2 satellite images with a resolution of 4 m. The model replaces the traditional convolutions in the residual blocks of the encoder with Omni-dimensional Dynamic Convolution (ODConv), mitigating the loss of fine details and improving segmentation of small targets. Adding coordinate attention (CA) to the Atrous Spatial Pyramid Pooling (ASPP) module improves target classification and localization in remote sensing images. Including the Dice coefficient in the focal loss function improves the model's gradient and tackles class imbalance within the dataset. The results indicate that the model attains an F1 score and kappa coefficient exceeding 80% for both mud and salt marsh regions. Comparisons with several semantic segmentation models on the mud marsh tidal creek dataset show that ODU-Net significantly enhances tidal creek segmentation, resolves class imbalance issues, and delivers superior extraction accuracy and stability.
(This article belongs to the Special Issue Remote Sensing of Coastal, Wetland, and Intertidal Zones)
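The paper's exact weighting of the Dice term is not given; a generic binary dice-focal combination, a common remedy for the class imbalance of thin structures such as tidal creeks, looks like this:

```python
import torch
import torch.nn.functional as F

def dice_focal_loss(logits: torch.Tensor, target: torch.Tensor,
                    alpha: float = 0.25, gamma: float = 2.0,
                    dice_weight: float = 1.0) -> torch.Tensor:
    """Binary focal loss plus a soft Dice term (weights are illustrative)."""
    prob = torch.sigmoid(logits)
    # Focal term: down-weight easy, well-classified pixels.
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    p_t = prob * target + (1 - prob) * (1 - target)
    a_t = alpha * target + (1 - alpha) * (1 - target)
    focal = (a_t * (1 - p_t) ** gamma * bce).mean()
    # Soft Dice term: directly rewards overlap with the creek mask.
    inter = (prob * target).sum()
    dice = 1 - (2 * inter + 1.0) / (prob.sum() + target.sum() + 1.0)
    return focal + dice_weight * dice
```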

25 pages, 7982 KiB  
Article
Aerial Imagery Redefined: Next-Generation Approach to Object Classification
by Eran Dahan, Itzhak Aviv and Tzvi Diskin
Information 2025, 16(2), 134; https://doi.org/10.3390/info16020134 - 11 Feb 2025
Cited by 1 | Viewed by 1092
Abstract
Identifying and classifying objects in aerial images are two significant and complex problems in computer vision. Fine-grained classification of objects in overhead images has become widespread in various real-world applications due to recent advancements in high-resolution satellite and airborne imaging systems. The task is challenging, particularly in low-resource cases, because of the minor differences between classes and the significant differences within each class that the fine-grained setting entails. We introduce Classification of Objects for Fine-Grained Analysis (COFGA), a recently developed dataset for accurately categorizing objects in high-resolution aerial images. The COFGA dataset comprises 2104 images and 14,256 annotated objects across 37 distinct labels, offering superior spatial information compared to other publicly available datasets. The MAFAT Challenge is a task that utilizes COFGA to improve fine-grained classification methods. The baseline model achieved a mAP of 0.6, whereas the best-performing model achieved a score of 0.6271 by utilizing state-of-the-art ensemble and preprocessing techniques. We offer solutions to the difficulties of analyzing aerial images, particularly when annotated data are scarce and classes are imbalanced. The findings provide valuable insights into the detailed categorization of objects and have practical applications in urban planning, environmental assessment, and agricultural management. We discuss the constraints and potential future endeavors, specifically emphasizing the possibility of integrating supplementary modalities and contextual information into aerial imagery analysis.
(This article belongs to the Special Issue Online Registration and Anomaly Detection of Cyber Security Events)

16 pages, 9121 KiB  
Technical Note
A Benchmark Dataset for Aircraft Detection in Optical Remote Sensing Imagery
by Jianming Hu, Xiyang Zhi, Bingxian Zhang, Tianjun Shi, Qi Cui and Xiaogang Sun
Remote Sens. 2024, 16(24), 4699; https://doi.org/10.3390/rs16244699 - 17 Dec 2024
Viewed by 1983
Abstract
Existing aircraft detection datasets rarely consider both the diversity of target features and the complexity of environmental factors, which has become an important limitation on the effectiveness and reliability of aircraft detection algorithms. Although a large amount of research has been devoted to few-sample-driven aircraft detection technology, most algorithms still struggle to resolve the missed detections and false alarms caused by numerous environmental interferences in bird's-eye optical remote sensing scenes. To further aircraft detection research, we have established a new dataset, Aircraft Detection in Complex Optical Scene (ADCOS), sourced from various platforms including Google Earth, Microsoft Map, Worldview-3, Pleiades, Ikonos, Orbview-3, and Jilin-1 satellites. It integrates 3903 meticulously chosen images of over 400 well-known airports worldwide, containing 33,831 annotated instances in the oriented bounding box (OBB) format. Notably, the dataset encompasses a wide range of target characteristics, including multi-scale, multi-direction, multi-type, multi-state, and densely arranged targets, along with complex target-background relationships such as cluttered backgrounds, low contrast, shadows, and occlusion. Furthermore, we evaluated nine representative detection algorithms on the ADCOS dataset, establishing a performance benchmark for subsequent algorithm optimization. The dataset will soon be available on GitHub.
(This article belongs to the Section Earth Observation Data)
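The OBB annotation format reduces to a centre, a size, and an angle; converting it to the four corner points used for visualization or rotated-IoU computation is a few lines (the counter-clockwise angle convention is an assumption):

```python
import numpy as np

def obb_corners(cx, cy, w, h, angle_deg):
    """Four corner points of an oriented bounding box given centre (cx, cy),
    width/height, and rotation angle in degrees (counter-clockwise)."""
    theta = np.radians(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    half = np.array([[-w, -h], [w, -h], [w, h], [-w, h]]) / 2.0
    return half @ rot.T + np.array([cx, cy])
```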
