Deep Learning and Computer Vision in Remote Sensing-II

A special issue of Remote Sensing (ISSN 2072-4292). This special issue belongs to the section "AI Remote Sensing".

Deadline for manuscript submissions: closed (31 July 2023) | Viewed by 29382

Printed Edition Available!
A printed edition of this Special Issue is available here.

Special Issue Editors


Pouya Jafarzadeh
Guest Editor Assistant
Department of Computing, University of Turku, Turku, Finland
Interests: machine learning; deep learning; computer vision; data analysis; pose estimation

Special Issue Information

Dear Colleagues,

Computer Vision (CV) has seen a massive rise in popularity in the remote sensing field over the last few years. This success is largely due to the effectiveness of deep learning (DL) algorithms. However, remote sensing data acquisition and annotation, as well as information extraction from massive remote sensing data, remain challenging.

We are pleased to announce this Part II Special Issue, which follows on from Part I and focuses on deep learning and computer vision methods for remote sensing. This Special Issue gives researchers the opportunity to explore the abovementioned challenges. We seek collaborative contributions from experts in academia and industry in the deep learning, computer vision, data science, and remote sensing fields. Major topics of interest include, but are not limited to, the following:

  • Satellite image processing and analysis based on deep learning;
  • Deep learning for object detection, image classification, and semantic and instance segmentation;
  • Deep learning for remote sensing scene understanding and classification;
  • Transfer learning, deep reinforcement learning for remote sensing;
  • Supervised and unsupervised representation learning for remote sensing environments;
  • Applications.

Dr. Fahimeh Farahnakian
Prof. Dr. Jukka Heikkonen
Guest Editors
Pouya Jafarzadeh
Guest Editor Assistant

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Remote Sensing is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (15 papers)


Research

17 pages, 6782 KiB  
Article
MeViT: A Medium-Resolution Vision Transformer for Semantic Segmentation on Landsat Satellite Imagery for Agriculture in Thailand
by Teerapong Panboonyuen, Chaiyut Charoenphon and Chalermchon Satirapod
Remote Sens. 2023, 15(21), 5124; https://doi.org/10.3390/rs15215124 - 26 Oct 2023
Viewed by 1446
Abstract
Semantic segmentation is a fundamental task in remote sensing image analysis that aims to classify each pixel in an image into land use and land cover (LULC) classes. In this paper, we propose MeViT (Medium-Resolution Vision Transformer) for Landsat satellite imagery covering the main economic crops in Thailand: (i) para rubber, (ii) corn, and (iii) pineapple. MeViT enhances vision transformers (ViTs), one of the modern deep learning architectures for computer vision, to learn semantically rich and spatially precise multi-scale representations by integrating medium-resolution multi-branch architectures with ViTs. We revised the mixed-scale convolutional feedforward network (MixCFN) by incorporating multiple depth-wise convolution paths to extract multi-scale local information while balancing the model’s performance and efficiency. To evaluate the effectiveness of our proposed method, we conduct extensive experiments on the publicly available dataset of Thailand scenes and compare the results with several state-of-the-art deep learning methods. The evaluation metrics are precision, recall, F1 score, and mean intersection over union (IoU). Among the compared models, MeViT achieves the best performance on all metrics, with a precision of 92.22%, a recall of 94.69%, an F1 score of 93.44%, and a mean IoU of 83.63%. These results demonstrate the effectiveness of our proposed approach in accurately segmenting Thai Landsat-8 data. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing-II)
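
To make the multi-scale feedforward idea concrete, the following is a minimal PyTorch sketch of a mixed-scale convolutional feedforward block in the spirit of the revised MixCFN described above; the module name, kernel sizes, and expansion ratio are illustrative assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

class MixCFNSketch(nn.Module):
    """Simplified mixed-scale convolutional feedforward block (illustrative only).

    Channels are expanded by a 1x1 conv, split across depth-wise conv branches with
    different kernel sizes (multi-scale local context), then fused by a 1x1 conv.
    """
    def __init__(self, dim: int, expansion: int = 4, kernel_sizes=(3, 5)):
        super().__init__()
        hidden = dim * expansion
        branch = hidden // len(kernel_sizes)
        self.expand = nn.Conv2d(dim, hidden, kernel_size=1)
        self.branches = nn.ModuleList([
            nn.Conv2d(branch, branch, kernel_size=k, padding=k // 2, groups=branch)
            for k in kernel_sizes  # depth-wise convolutions at several scales
        ])
        self.act = nn.GELU()
        self.project = nn.Conv2d(hidden, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.act(self.expand(x))
        chunks = torch.chunk(h, len(self.branches), dim=1)
        h = torch.cat([b(c) for b, c in zip(self.branches, chunks)], dim=1)
        return x + self.project(self.act(h))  # residual connection

# Example: a 64-channel medium-resolution feature map
y = MixCFNSketch(dim=64)(torch.randn(1, 64, 32, 32))
print(y.shape)  # torch.Size([1, 64, 32, 32])
```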

20 pages, 7890 KiB  
Article
Generalizing Spacecraft Recognition via Diversifying Few-Shot Datasets in a Joint Trained Likelihood
by Xi Yang, Dechen Kong, Ren Lin and Dong Yang
Remote Sens. 2023, 15(17), 4321; https://doi.org/10.3390/rs15174321 - 01 Sep 2023
Viewed by 691
Abstract
With the exploration of outer space, the number of space targets has increased dramatically, while the pressures of space situational awareness have also increased. Among them, spacecraft recognition is the foundation and a critical step in space situational awareness. However, unlike natural images that can be easily captured using low-cost devices, space targets can suffer from motion blurring, overexposure, and excessive dragging at the time of capture, which greatly affects the quality of the images and reduces the number of effective images. To this end, specialized or sufficiently versatile techniques are required, with dataset diversity playing a key role in enabling algorithms to categorize previously unseen spacecraft and perform multiple tasks. In this paper, we propose a joint dataset formulation to increase diversity. Our approach involves reformulating two local processes to condition the Conditional Neural Adaptive Processes, which results in global feature resampling schemes to adapt a pre-trained embedding function to be task-specific. Specifically, we employ variational resampling to category-wise auxiliary features, adding a generative constraint to amortize task-specific parameters. We also develop a neural process variational inference to encode representation, using grid density for conditioning. Our evaluation of the BUAA dataset shows promising results, with no-training performance close to a specifically designed learner and an accuracy rate of 98.2% on unseen categories during the joint training session. Further experiments on the Meta-dataset benchmark demonstrate at least a 4.6% out-of-distribution improvement compared to the baseline conditional models. Both dataset evaluations indicate the effectiveness of exploiting dataset diversity in few-shot feature adaptation. Our proposal offers a versatile solution for tasks across domains. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing-II)
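
The category-wise variational resampling can be pictured, in heavily simplified form, as fitting a per-class distribution to support-set embeddings and drawing synthetic features from it to diversify a few-shot dataset. The NumPy sketch below assumes a diagonal Gaussian per class; the dimensions and sampling scheme are placeholders, not the paper's conditional-neural-process formulation.

```python
import numpy as np

def resample_class_features(features, labels, per_class=20, rng=np.random.default_rng(0)):
    """Toy category-wise feature resampling (illustrative, not the paper's exact scheme).

    For each class, fit a diagonal Gaussian to its embedding vectors and draw
    synthetic features from it, increasing the diversity of a few-shot support set.
    """
    new_feats, new_labels = [], []
    for c in np.unique(labels):
        class_feats = features[labels == c]
        mu = class_feats.mean(axis=0)
        sigma = class_feats.std(axis=0) + 1e-6            # avoid degenerate spread
        new_feats.append(rng.normal(mu, sigma, size=(per_class, features.shape[1])))
        new_labels.append(np.full(per_class, c))
    return np.concatenate(new_feats), np.concatenate(new_labels)

# 5-way, 5-shot support set with 128-d embeddings
feats = np.random.randn(25, 128)
labels = np.repeat(np.arange(5), 5)
aug_feats, aug_labels = resample_class_features(feats, labels)
print(aug_feats.shape, aug_labels.shape)  # (100, 128) (100,)
```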

21 pages, 8239 KiB  
Article
Sparse Signal Models for Data Augmentation in Deep Learning ATR
by Tushar Agarwal, Nithin Sugavanam and Emre Ertin
Remote Sens. 2023, 15(16), 4109; https://doi.org/10.3390/rs15164109 - 21 Aug 2023
Viewed by 887
Abstract
Automatic target recognition (ATR) algorithms are used to classify a given synthetic aperture radar (SAR) image into one of the known target classes by using the information gleaned from a set of training images that are available for each class. Recently, deep learning methods have been shown to achieve state-of-the-art classification accuracy if abundant training data are available, especially if they are sampled uniformly over the classes and in their poses. In this paper, we consider the ATR problem when a limited set of training images are available. We propose a data-augmentation approach to incorporate SAR domain knowledge and improve the generalization power of a data-intensive learning algorithm, such as a convolutional neural network (CNN). The proposed data-augmentation method employs a physics-inspired limited-persistence sparse modeling approach, which capitalizes on the commonly observed characteristics of wide-angle synthetic aperture radar (SAR) imagery. Specifically, we fit over-parametrized models of scattering to limited training data, and use the estimated models to synthesize new images at poses and sub-pixel translations that are not available in the given data in order to augment the limited training data. We exploit the sparsity of the scattering centers in the spatial domain and the smoothly varying structure of the scattering coefficients in the azimuthal domain to solve the ill-posed problem of the over-parametrized model fitting. The experimental results show that, for the training on the data-starved regions, the proposed method provides significant gains in the resulting ATR algorithm’s generalization performance. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing-II)
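
The over-parametrized sparse model fitting at the heart of this augmentation scheme is, in simplified form, a sparse-recovery problem. Below is a minimal ISTA (iterative shrinkage-thresholding) sketch for fitting y ≈ Ax with an L1 penalty; the random dictionary A, penalty weight, and iteration count are placeholders, and the azimuthal smoothness constraint used in the paper is omitted.

```python
import numpy as np

def ista(A, y, lam, n_iter=500):
    """Minimal ISTA solver for min_x 0.5*||y - Ax||^2 + lam*||x||_1 (illustrative)."""
    L = np.linalg.norm(A, 2) ** 2                # Lipschitz constant of the smooth term
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)                 # gradient of the data-fit term
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding
    return x

# Toy example: a few "scattering centers" observed through a random linear operator
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 300))
x_true = np.zeros(300)
x_true[[10, 50, 200]] = [1.5, -2.0, 0.8]
y = A @ x_true + 0.01 * rng.standard_normal(100)

lam = 0.1 * np.max(np.abs(A.T @ y))              # common heuristic for the penalty weight
x_hat = ista(A, y, lam)
print(np.round(x_hat[[10, 50, 200]], 2))         # estimates near the true amplitudes
```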

20 pages, 5882 KiB  
Article
We Need to Communicate: Communicating Attention Network for Semantic Segmentation of High-Resolution Remote Sensing Images
by Xichen Meng, Liqun Zhu, Yilong Han and Hanchao Zhang
Remote Sens. 2023, 15(14), 3619; https://doi.org/10.3390/rs15143619 - 20 Jul 2023
Viewed by 894
Abstract
Traditional models that employ CNNs as encoders do not sufficiently combine high-level and low-level features. High-level features are rich in semantic information but lack spatial detail, while low-level features are the opposite. Therefore, the integrated utilization of multi-level features and the bridging of the gap between them are crucial to improving the accuracy of semantic segmentation. To address this issue, we present communicating mutual attention (CMA) and communicating self-attention (CSA) modules to enhance the interaction and fusion of feature maps at different levels. On the one hand, CMA aggregates the global context information of high-level features into low-level features and embeds the spatial detail localization characteristics of low-level features in high-level features. On the other hand, the CSA module is deployed to integrate the spatially detailed representation of low-level features into the attention map of high-level features. We have experimented with the communicating attention network (CANet), a U-Net-like network composed of multiple CMA and CSA modules, on the ISPRS Vaihingen and Potsdam datasets, obtaining mean F1-scores of 89.61% and 92.60%, respectively. The results demonstrate that CANet delivers superior performance in the semantic segmentation of remote sensing images. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing-II)
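
A heavily simplified picture of the cross-level interaction idea: queries from the low-level (spatially detailed) map attend to keys and values from the high-level (semantically rich) map. The PyTorch sketch below is generic single-head cross-attention, not the exact CMA/CSA design; the channel count and residual form are assumptions.

```python
import torch
import torch.nn as nn

class CrossLevelAttention(nn.Module):
    """Simplified mutual-attention sketch between two feature levels (not the exact CMA).

    Queries come from the low-level map; keys/values from the high-level map, so the
    detailed features can pull in global semantic context.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Conv2d(dim, dim, 1)
        self.k = nn.Conv2d(dim, dim, 1)
        self.v = nn.Conv2d(dim, dim, 1)
        self.scale = dim ** -0.5

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        b, c, h, w = low.shape
        q = self.q(low).flatten(2).transpose(1, 2)     # (B, HW_low, C)
        k = self.k(high).flatten(2)                    # (B, C, HW_high)
        v = self.v(high).flatten(2).transpose(1, 2)    # (B, HW_high, C)
        attn = torch.softmax(q @ k * self.scale, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return low + out  # residual: semantically enriched low-level features

low = torch.randn(1, 64, 64, 64)    # detailed but semantically weak
high = torch.randn(1, 64, 16, 16)   # coarse but semantically rich
print(CrossLevelAttention(64)(low, high).shape)  # torch.Size([1, 64, 64, 64])
```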

21 pages, 7206 KiB  
Article
TranSDet: Toward Effective Transfer Learning for Small-Object Detection
by Xinkai Xu, Hailan Zhang, Yan Ma, Kang Liu, Hong Bao and Xu Qian
Remote Sens. 2023, 15(14), 3525; https://doi.org/10.3390/rs15143525 - 12 Jul 2023
Cited by 5 | Viewed by 2000
Abstract
Small-object detection is a challenging task in computer vision due to the limited training samples and low-quality images. Transfer learning, which transfers the knowledge learned from a large dataset to a small dataset, is a popular method for improving performance on limited data. However, we empirically find that due to the dataset discrepancy, directly transferring the model trained on a general object dataset to small-object datasets obtains inferior performance. In this paper, we propose TranSDet, a novel approach for effective transfer learning for small-object detection. Our method adapts a model trained on a general dataset to a small-object-friendly model by augmenting the training images with diverse smaller resolutions. A dynamic resolution adaptation scheme is employed to ensure consistent performance on various sizes of objects using meta-learning. Additionally, the proposed method introduces two network components, an FPN with shifted feature aggregation and an anchor relation module, which are compatible with transfer learning and effectively improve small-object detection performance. Extensive experiments on the TT100K, BUUISE-MO-Lite, and COCO datasets demonstrate that TranSDet achieves significant improvements compared to existing methods. For example, on the TT100K dataset, TranSDet outperforms the state-of-the-art method by 8.0% in terms of the mean average precision (mAP) for small-object detection. On the BUUISE-MO-Lite dataset, TranSDet improves the detection accuracy of RetinaNet and YOLOv3 by 32.2% and 12.8%, respectively. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing-II)
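
The resolution-diversifying augmentation can be illustrated with a toy transform that randomly shrinks a training image and its boxes, so that objects from a general dataset start to resemble the small objects of the target dataset. The scale set and box format below are assumptions; the paper's meta-learned dynamic resolution adaptation is not reproduced here.

```python
import random
import torch
import torch.nn.functional as F

def random_downscale(image: torch.Tensor, boxes: torch.Tensor,
                     scales=(0.25, 0.5, 0.75, 1.0)):
    """Toy multi-resolution augmentation (illustrative of the idea, not TranSDet itself).

    Randomly shrinks an image (and its boxes) so large objects become small ones.
    """
    s = random.choice(scales)
    if s == 1.0:
        return image, boxes
    image = F.interpolate(image.unsqueeze(0), scale_factor=s,
                          mode="bilinear", align_corners=False).squeeze(0)
    return image, boxes * s  # box coordinates scale with the image

img = torch.rand(3, 640, 640)
boxes = torch.tensor([[100.0, 120.0, 300.0, 360.0]])  # (x1, y1, x2, y2)
small_img, small_boxes = random_downscale(img, boxes)
print(small_img.shape, small_boxes)
```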

48 pages, 6997 KiB  
Article
AiTLAS: Artificial Intelligence Toolbox for Earth Observation
by Ivica Dimitrovski, Ivan Kitanovski, Panče Panov, Ana Kostovska, Nikola Simidjievski and Dragi Kocev
Remote Sens. 2023, 15(9), 2343; https://doi.org/10.3390/rs15092343 - 28 Apr 2023
Cited by 3 | Viewed by 3365
Abstract
We propose AiTLAS—an open-source, state-of-the-art toolbox for exploratory and predictive analysis of satellite imagery. It implements a range of deep-learning architectures and models tailored for the EO tasks illustrated in this case. The versatility and applicability of the toolbox are showcased in a variety of EO tasks, including image scene classification, semantic image segmentation, object detection, and crop type prediction. These use cases demonstrate the potential of the toolbox to support the complete data analysis pipeline starting from data preparation and understanding, through learning novel models or fine-tuning existing ones, using models for making predictions on unseen images, and up to analysis and understanding of the predictions and the predictive performance yielded by the models. AiTLAS brings the AI and EO communities together by facilitating the use of EO data in the AI community and accelerating the uptake of (advanced) machine-learning methods and approaches by EO experts. It achieves this by providing: (1) user-friendly, accessible, and interoperable resources for data analysis through easily configurable and readily usable pipelines; (2) standardized, verifiable, and reusable data handling, wrangling, and pre-processing approaches for constructing AI-ready data; (3) modular and configurable modeling approaches and (pre-trained) models; and (4) standardized and reproducible benchmark protocols including data and models. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing-II)

28 pages, 12729 KiB  
Article
Dynamic High-Resolution Network for Semantic Segmentation in Remote-Sensing Images
by Shichen Guo, Qi Yang, Shiming Xiang, Pengfei Wang and Xuezhi Wang
Remote Sens. 2023, 15(9), 2293; https://doi.org/10.3390/rs15092293 - 26 Apr 2023
Cited by 1 | Viewed by 1783
Abstract
Semantic segmentation of remote-sensing (RS) images is one of the most fundamental tasks in the understanding of a remote-sensing scene. However, high-resolution RS images contain plentiful detailed information about ground objects, which scatter everywhere spatially and have variable sizes, styles, and visual appearances. Due to the high similarity between classes and diversity within classes, it is challenging to obtain satisfactory and accurate semantic segmentation results. This paper proposes a Dynamic High-Resolution Network (DyHRNet) to solve this problem. Our proposed network takes HRNet as a super-architecture, aiming to leverage the important connections and channels by further investigating the parallel streams at different resolution representations of the original HRNet. The learning task is conducted under the framework of a neural architecture search (NAS) and channel-wise attention module. Specifically, the Accelerated Proximal Gradient (APG) algorithm is introduced to iteratively solve the sparse regularization subproblem from the perspective of neural architecture search. In this way, valuable connections are selected for cross-resolution feature fusion. In addition, a channel-wise attention module is designed to weight the channel contributions for feature aggregation. Finally, DyHRNet fully realizes the dynamic advantages of data adaptability by combining the APG algorithm and channel-wise attention module simultaneously. Compared with nine classical or state-of-the-art models (FCN, UNet, PSPNet, DeepLabV3+, OCRNet, SETR, SegFormer, HRNet+FCN, and HRNet+OCR), DyHRNet has shown high performance on three public challenging RS image datasets (Vaihingen, Potsdam, and LoveDA). Furthermore, the visual segmentation results, the learned structures, the iteration process analysis, and the ablation study all demonstrate the effectiveness of our proposed model. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing-II)
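
The sparse-regularization subproblem solved with APG can be sketched as an accelerated proximal gradient step with soft-thresholding applied to the cross-resolution connection weights. The L1 proximal operator, learning rate, and momentum schedule below are illustrative stand-ins for the paper's exact formulation.

```python
import numpy as np

def apg_step(alpha, alpha_prev, grad, lr=0.01, lam=2.0, t=1):
    """One accelerated proximal gradient step on connection weights (illustrative).

    Nesterov extrapolation, a gradient step on the task loss, then soft-thresholding,
    which drives unimportant cross-resolution connections toward exactly zero.
    In practice `grad` would be evaluated at the extrapolated point z.
    """
    momentum = (t - 1) / (t + 2)
    z = alpha + momentum * (alpha - alpha_prev)                # extrapolation
    z = z - lr * grad                                          # gradient step
    return np.sign(z) * np.maximum(np.abs(z) - lr * lam, 0.0)  # prox of the L1 penalty

alpha_prev = np.zeros(8)                       # 8 hypothetical cross-resolution links
alpha = np.full(8, 0.05)
grad = np.random.default_rng(0).standard_normal(8)
print(apg_step(alpha, alpha_prev, grad, t=3))  # every weight shrinks by lr*lam; weak links reach zero over iterations
```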

19 pages, 20975 KiB  
Article
A Multi-Feature Fusion and Attention Network for Multi-Scale Object Detection in Remote Sensing Images
by Yong Cheng, Wei Wang, Wenjie Zhang, Ling Yang, Jun Wang, Huan Ni, Tingzhao Guan, Jiaxin He, Yakang Gu and Ngoc Nguyen Tran
Remote Sens. 2023, 15(8), 2096; https://doi.org/10.3390/rs15082096 - 16 Apr 2023
Cited by 3 | Viewed by 2235
Abstract
Accurate multi-scale object detection in remote sensing images poses a challenge due to the complexity of transferring deep features to shallow features among multi-scale objects. Therefore, this study developed a multi-feature fusion and attention network (MFANet) based on YOLOX. By reparameterizing the backbone, fusing multi-branch convolution and attention mechanisms, and optimizing the loss function, the MFANet strengthened the feature extraction of objects at different sizes and increased the detection accuracy. The ablation experiment was carried out on the NWPU VHR-10 dataset. Our results showed that the overall performance of the improved network was around 2.94% higher than the average performance of every single module. Based on the comparison experiments, the improved MFANet demonstrated a high mean average precision of 98.78% for 9 classes of objects in the NWPU VHR-10 10-class detection dataset and 94.91% for 11 classes in the DIOR 20-class detection dataset. Overall, MFANet achieved an mAP of 96.63% and 87.88% acting on the NWPU VHR-10 and DIOR datasets, respectively. This method can promote the development of multi-scale object detection in remote sensing images and has the potential to serve and expand intelligent system research in related fields such as object tracking, semantic segmentation, and scene understanding. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing-II)
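
Backbone reparameterization of the kind mentioned above is commonly done by folding parallel branches into a single kernel after training. The sketch below merges a 3x3 and a 1x1 convolution branch (RepVGG-style) and checks the equivalence numerically; it is a generic example under that assumption, not MFANet's specific branch layout.

```python
import torch
import torch.nn.functional as F

def merge_branches(w3, b3, w1, b1):
    """Fold a parallel 1x1 branch into a 3x3 kernel (structural reparameterization sketch).

    After training, the two branches collapse into a single 3x3 convolution, keeping
    the multi-branch accuracy at single-branch inference cost.
    """
    w1_padded = F.pad(w1, [1, 1, 1, 1])   # place the 1x1 kernel at the 3x3 center
    return w3 + w1_padded, b3 + b1

# Verify the merged kernel equals the sum of the two branch outputs
x = torch.randn(1, 8, 16, 16)
w3, b3 = torch.randn(8, 8, 3, 3), torch.randn(8)
w1, b1 = torch.randn(8, 8, 1, 1), torch.randn(8)
y_two_branch = F.conv2d(x, w3, b3, padding=1) + F.conv2d(x, w1, b1)
w_m, b_m = merge_branches(w3, b3, w1, b1)
y_merged = F.conv2d(x, w_m, b_m, padding=1)
print(torch.allclose(y_two_branch, y_merged, atol=1e-5))  # True
```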

27 pages, 24384 KiB  
Article
Learning Sparse Geometric Features for Building Segmentation from Low-Resolution Remote-Sensing Images
by Zeping Liu and Hong Tang
Remote Sens. 2023, 15(7), 1741; https://doi.org/10.3390/rs15071741 - 23 Mar 2023
Cited by 2 | Viewed by 1434
Abstract
High-resolution remote-sensing imagery has proven useful for building extraction. Unfortunately, due to the high acquisition costs and infrequent availability of high-resolution imagery, low-resolution images are more practical for large-scale mapping or change tracking of buildings. However, extracting buildings from low-resolution images is a challenging task. Compared with high-resolution images, low-resolution images pose two critical challenges for building segmentation: the fuzzy boundary details of buildings and the lack of local textures. In this study, we propose a sparse geometric feature attention network (SGFANet) based on multi-level feature fusion to address these issues. To counter the fuzzy-boundary effect, SGFANet enhances the representative boundary features by calculating the point-wise affinity of selected feature points in a top-down manner. To counter the lack of local textures, we convert the top-down propagation from local to non-local by introducing the grounding transformer, which harvests the global attention of the input image. SGFANet outperforms competing baselines on remote-sensing images collected worldwide from multiple sensors at 4 and 10 m resolution, improving the IoU by at least 0.66%. Notably, our method is robust and generalizable, which makes it useful for extending the accessibility and scalability of building dynamic tracking across developing areas (e.g., the Xiong’an New Area in China) using low-resolution images. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing-II)
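
The point-wise affinity idea can be pictured as selecting the highest-scoring (e.g., boundary-like) feature points and letting them aggregate one another through a small attention map. The toy sketch below assumes a hypothetical per-pixel score tensor and plain dot-product affinity; the actual top-down, multi-level scheme in SGFANet is more involved.

```python
import torch

def refine_selected_points(feat: torch.Tensor, score: torch.Tensor, k: int = 64):
    """Toy point-wise affinity refinement (illustrative of the idea only).

    Pick the k highest-scoring locations (e.g., likely boundary pixels), compute their
    pairwise affinity, and let each selected point aggregate the others.
    """
    b, c, h, w = feat.shape
    flat = feat.flatten(2)                                    # (B, C, HW)
    idx = score.flatten(1).topk(k, dim=1).indices             # (B, k)
    gather_idx = idx.unsqueeze(1).expand(-1, c, -1)           # (B, C, k)
    pts = torch.gather(flat, 2, gather_idx)                   # selected point features
    affinity = torch.softmax(pts.transpose(1, 2) @ pts / c ** 0.5, dim=-1)  # (B, k, k)
    refined = pts @ affinity.transpose(1, 2)                  # (B, C, k)
    out = flat.scatter(2, gather_idx, refined)                # write refined points back
    return out.view(b, c, h, w)

feat = torch.randn(2, 32, 64, 64)
score = torch.rand(2, 1, 64, 64)   # hypothetical boundary-ness score
print(refine_selected_points(feat, score).shape)  # torch.Size([2, 32, 64, 64])
```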

24 pages, 10646 KiB  
Article
A Lightweight and High-Accuracy Deep Learning Method for Grassland Grazing Livestock Detection Using UAV Imagery
by Yuhang Wang, Lingling Ma, Qi Wang, Ning Wang, Dongliang Wang, Xinhong Wang, Qingchuan Zheng, Xiaoxin Hou and Guangzhou Ouyang
Remote Sens. 2023, 15(6), 1593; https://doi.org/10.3390/rs15061593 - 15 Mar 2023
Cited by 4 | Viewed by 1832
Abstract
Unregulated livestock breeding and grazing can degrade grasslands and damage the ecological environment. The combination of remote sensing and artificial intelligence techniques is a more convenient and powerful means of acquiring livestock information over a large area than traditional manual ground investigation. As a mainstream remote sensing platform, unmanned aerial vehicles (UAVs) can obtain high-resolution optical images to detect grazing livestock in grassland. However, grazing livestock objects in UAV images usually occupy very few pixels and tend to gather together, which makes them difficult to detect and count automatically. This paper proposes the GLDM (grazing livestock detection model), a lightweight and high-accuracy deep-learning model for detecting grazing livestock in UAV images. The enhanced CSPDarknet (ECSP) and weighted aggregate feature re-extraction pyramid (WAFR) modules are constructed to improve performance based on the YOLOX-nano network scheme. The dataset of different grazing livestock (12,901 instances) for deep learning was made from UAV images of the Hadatu Pasture in Hulunbuir, Inner Mongolia, China. The results show that the proposed method achieves higher comprehensive detection precision than mainstream object detection models and has an advantage in model size. The mAP of the proposed method is 86.47%, with only 5.7 M model parameters, and the average recall and average precision are both above 85%. The counting accuracy of grazing livestock in the testing dataset, when converted to a unified sheep unit, reached 99%. The scale applicability of the model is also discussed, and the GLDM performs well with image resolutions varying from 2.5 to 10 cm. The proposed GLDM is thus well suited to detecting grassland grazing livestock in UAV images, combining remote sensing, AI, and grassland ecological applications with broad application prospects. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing-II)

22 pages, 11268 KiB  
Article
Complex-Valued U-Net with Capsule Embedded for Semantic Segmentation of PolSAR Image
by Lingjuan Yu, Qiqi Shao, Yuting Guo, Xiaochun Xie, Miaomiao Liang and Wen Hong
Remote Sens. 2023, 15(5), 1371; https://doi.org/10.3390/rs15051371 - 28 Feb 2023
Cited by 1 | Viewed by 1695
Abstract
In recent years, semantic segmentation with pixel-level classification has become one of the research focuses in the field of polarimetric synthetic aperture radar (PolSAR) image interpretation. A fully convolutional network (FCN) can achieve end-to-end semantic segmentation, which provides a basic framework for subsequent improved networks. As a classic FCN-based network, U-Net has been applied to the semantic segmentation of remote sensing images. Although good segmentation results have been obtained, scalar neurons make it difficult for the network to capture multiple properties of the entities in an image. The vector neurons used in the capsule network can effectively solve this problem. In this paper, we propose a complex-valued (CV) U-Net with an embedded CV capsule network for semantic segmentation of PolSAR images. The structure of the CV U-Net is lightweight to match the small amount of PolSAR data, and the embedded CV capsule network is designed to extract more abundant features of the PolSAR image than the CV U-Net alone. Furthermore, CV dynamic routing is proposed to realize the connection between capsules in two adjacent layers. Experiments on two airborne datasets and one Gaofen-3 dataset show that the proposed network is capable of distinguishing different types of land cover with similar scattering mechanisms and of extracting complex boundaries between adjacent land covers. The network achieves better segmentation performance than other state-of-the-art networks, especially when the training set is small. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing-II)
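
Complex-valued convolution, the basic building block of such a network, is typically implemented with two real-valued convolutions via the expansion (Wr + iWi)(xr + ixi). The sketch below shows this generic construction in PyTorch; channel counts and the input layout are assumptions, and capsule routing is not shown.

```python
import torch
import torch.nn as nn

class ComplexConv2d(nn.Module):
    """Complex-valued 2D convolution built from two real convolutions (generic sketch).

    For inputs x = xr + i*xi and weights W = Wr + i*Wi:
        W * x = (Wr*xr - Wi*xi) + i*(Wr*xi + Wi*xr)
    """
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.conv_r = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.conv_i = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)

    def forward(self, xr: torch.Tensor, xi: torch.Tensor):
        yr = self.conv_r(xr) - self.conv_i(xi)
        yi = self.conv_r(xi) + self.conv_i(xr)
        return yr, yi

# A PolSAR-like input: real and imaginary channels of the scattering representation
xr, xi = torch.randn(1, 6, 128, 128), torch.randn(1, 6, 128, 128)
yr, yi = ComplexConv2d(6, 16)(xr, xi)
print(yr.shape, yi.shape)  # torch.Size([1, 16, 128, 128]) each
```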

23 pages, 14857 KiB  
Article
Deep Network Architectures as Feature Extractors for Multi-Label Classification of Remote Sensing Images
by Marjan Stoimchev, Dragi Kocev and Sašo Džeroski
Remote Sens. 2023, 15(2), 538; https://doi.org/10.3390/rs15020538 - 16 Jan 2023
Cited by 7 | Viewed by 3490
Abstract
Data in the form of images are now generated at an unprecedented rate. A case in point is remote sensing images (RSI), now available in large-scale RSI archives, which have attracted a considerable amount of research on image classification within the remote sensing community. The basic task of single-target multi-class image classification considers the case where each image is assigned exactly one label from a predefined finite set of class labels. Recently, however, image annotations have become increasingly complex, with images labeled with several labels (instead of just one). In other words, the goal is to assign multiple semantic categories to an image, based on its high-level context. The corresponding machine learning task is called multi-label classification (MLC). The classification of RSI is currently predominantly addressed by deep neural network (DNN) approaches, especially convolutional neural networks (CNNs), which can be utilized as feature extractors as well as end-to-end methods. After focusing on single-target classification for a long period, the community has recently developed DNNs that address the task of MLC. On the other hand, trees and tree ensembles for MLC have a long tradition and are the best-performing class of MLC methods, but they need predefined feature representations to operate on. In this work, we explore different strategies for model training based on the transfer learning paradigm, in which we utilize different families of (pre-trained) CNN architectures, such as VGG, EfficientNet, and ResNet. The architectures are trained in an end-to-end manner and used in two different modes of operation, namely, as standalone models that directly perform the MLC task and as feature extractors. In the latter case, the learned representations are used with tree ensemble methods for MLC, such as random forests and extremely randomized trees. We conduct an extensive experimental analysis of the methods over several publicly available RSI datasets and evaluate their effectiveness in terms of standard MLC measures. Of these, ranking-based evaluation measures are most relevant, especially ranking loss. The results show that, for addressing the RSI-MLC task, it is favorable to use lightweight network architectures, such as EfficientNet-B2, which is the best-performing end-to-end approach, as well as a feature extractor. Furthermore, on datasets with a limited number of images, using traditional tree ensembles for MLC can yield better performance than end-to-end deep approaches. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing-II)
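
The "feature extractor plus tree ensemble" mode of operation can be sketched in a few lines: embeddings from a pre-trained CNN are fed to a random forest that handles multi-label targets. The model choice (ResNet-18), the torchvision >= 0.13 weights API, and the dummy data below are assumptions for illustration only, not the paper's exact setup.

```python
import numpy as np
import torch
import torchvision.models as models
from sklearn.ensemble import RandomForestClassifier

# Pre-trained CNN used as a frozen feature extractor (ResNet-18 is an arbitrary stand-in)
backbone = models.resnet18(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Identity()          # drop the classification head
backbone.eval()

@torch.no_grad()
def extract_features(images: torch.Tensor) -> np.ndarray:
    """Return one embedding vector per image, shape (B, 512)."""
    return backbone(images).numpy()

# Dummy multi-label data: 32 images, 5 binary labels each
images = torch.randn(32, 3, 224, 224)
labels = np.random.randint(0, 2, size=(32, 5))

X = extract_features(images)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, labels)                      # scikit-learn handles multi-label targets
print(forest.predict(X[:2]))               # two rows of 5 binary predictions
```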

27 pages, 59847 KiB  
Article
A Method of Fusing Probability-Form Knowledge into Object Detection in Remote Sensing Images
by Kunlong Zheng, Yifan Dong, Wei Xu, Yun Su and Pingping Huang
Remote Sens. 2022, 14(23), 6103; https://doi.org/10.3390/rs14236103 - 01 Dec 2022
Cited by 2 | Viewed by 1443
Abstract
In recent years, dramatic progress in object detection in remote sensing images has been made due to the rapid development of convolutional neural networks (CNNs). However, most existing methods solely pay attention to training a suitable network model to extract more powerful features in order to solve the problem of false detections and missed detections caused by background complexity, various scales, and the appearance of the object. To open up new paths, we consider embedding knowledge into geospatial object detection. As a result, we put forward a method of digitizing knowledge and embedding knowledge into detection. Specifically, we first analyze the training set and then transform the probability into a knowledge factor according to an analysis using an improved version of the method used in existing work. With a knowledge matrix consisting of knowledge factors, the Knowledge Inference Module (KIM) optimizes the classification in which the residual structure is introduced to avoid performance degradation. Extensive experiments are conducted on two public remote sensing image data sets, namely DOTA and DIOR. The experimental results prove that the proposed method is able to reduce some false detections and missed detections and obtains a higher mean average precision (mAP) performance than the baseline method. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing-II)
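
A toy version of turning training-set statistics into a knowledge matrix and applying it as a residual correction to class scores is shown below; the co-occurrence probabilities and the additive correction are simplifications of the paper's knowledge factors and Knowledge Inference Module, introduced here only to illustrate the general idea.

```python
import numpy as np

def cooccurrence_knowledge(labels_per_image, n_classes):
    """Toy knowledge matrix from label co-occurrence probabilities (illustrative)."""
    counts = np.zeros((n_classes, n_classes))
    for labels in labels_per_image:
        for a in labels:
            for b in labels:
                if a != b:
                    counts[a, b] += 1
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

def knowledge_refine(scores, K, alpha=0.3):
    """Residual correction: original scores plus knowledge-propagated scores."""
    return scores + alpha * scores @ K

# Example: classes 0 and 1 (say, ship and harbor) often co-occur in training scenes
K = cooccurrence_knowledge([[0, 1], [0, 1], [2]], n_classes=3)
scores = np.array([[0.9, 0.2, 0.1]])   # per-class confidence for one detection context
print(knowledge_refine(scores, K))     # class 1 gets boosted by class 0's evidence
```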

24 pages, 9830 KiB  
Article
DSANet: A Deep Supervision-Based Simple Attention Network for Efficient Semantic Segmentation in Remote Sensing Imagery
by Wenxu Shi, Qingyan Meng, Linlin Zhang, Maofan Zhao, Chen Su and Tamás Jancsó
Remote Sens. 2022, 14(21), 5399; https://doi.org/10.3390/rs14215399 - 27 Oct 2022
Cited by 7 | Viewed by 2098
Abstract
Semantic segmentation for remote sensing images (RSIs) plays an important role in many applications, such as urban planning, environmental protection, agricultural valuation, and military reconnaissance. With the boom in remote sensing technology, numerous RSIs are generated; this is difficult for current complex networks to handle. Efficient networks are the key to solving this challenge. Many previous works aimed at designing lightweight networks or utilizing pruning and knowledge distillation methods to obtain efficient networks, but these methods inevitably reduce the ability of the resulting models to characterize spatial and semantic features. We propose an effective deep supervision-based simple attention network (DSANet) with spatial and semantic enhancement losses to handle these problems. In the network, (1) a lightweight architecture is used as the backbone; (2) deep supervision modules with improved multiscale spatial detail (MSD) and hierarchical semantic enhancement (HSE) losses synergistically strengthen the obtained feature representations; and (3) a simple embedding attention module (EAM) with linear complexity performs long-range relationship modeling. Experiments conducted on two public RSI datasets (the ISPRS Potsdam dataset and Vaihingen dataset) exhibit the substantial advantages of the proposed approach. Our method achieves 79.19% mean intersection over union (mIoU) on the ISPRS Potsdam test set and 72.26% mIoU on the Vaihingen test set with speeds of 470.07 FPS on 512 × 512 images and 5.46 FPS on 6000 × 6000 images using an RTX 3090 GPU. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing-II)
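
Attention with linear complexity, as in the embedding attention module described above, is commonly obtained by normalizing queries and keys separately and computing KᵀV first, so the N x N attention map over all pixels is never formed. The sketch below follows that generic "efficient attention" factorization; it is not necessarily the exact EAM design.

```python
import torch

def linear_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Attention with O(N) cost in the number of tokens N (efficient-attention style).

    Instead of forming softmax(QK^T)V over N x N, normalize Q and K separately and
    compute K^T V first, which costs O(N*C^2) with C << N.
    """
    q = torch.softmax(q, dim=-1)        # (B, N, C), normalized over channels
    k = torch.softmax(k, dim=1)         # (B, N, C), normalized over tokens
    context = k.transpose(1, 2) @ v     # (B, C, C): cheap global summary
    return q @ context                  # (B, N, C)

# 256*256 = 65,536 tokens would be prohibitive for a dense N x N attention map
b, n, c = 1, 256 * 256, 32
q, k, v = (torch.randn(b, n, c) for _ in range(3))
print(linear_attention(q, k, v).shape)  # torch.Size([1, 65536, 32])
```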

20 pages, 35130 KiB  
Article
Deep 1D Landmark Representation Learning for Space Target Pose Estimation
by Shengli Liu, Xiaowen Zhu, Zewei Cao and Gang Wang
Remote Sens. 2022, 14(16), 4035; https://doi.org/10.3390/rs14164035 - 18 Aug 2022
Viewed by 1435
Abstract
Monocular vision-based pose estimation for known uncooperative space targets plays an increasingly important role in on-orbit operations. The existing state-of-the-art methods of space target pose estimation build the 2D-3D correspondences to recover the space target pose, where space target landmark regression is a key component of the methods. The 2D heatmap representation is the dominant descriptor in landmark regression. However, its quantization error grows dramatically under low-resolution input conditions, and extra post-processing is usually needed to compute the accurate 2D pixel coordinates of landmarks from heatmaps. To overcome the aforementioned problems, we propose a novel 1D landmark representation that encodes the horizontal and vertical pixel coordinates of a landmark as two independent 1D vectors. Furthermore, we also propose a space target landmark regression network to regress the locations of landmarks in the image using 1D landmark representations. Comprehensive experiments conducted on the SPEED dataset show that the proposed 1D landmark representation helps the proposed space target landmark regression network outperform existing state-of-the-art methods at various input resolutions, especially at low resolutions. Based on the 2D landmarks predicted by the proposed space target landmark regression network, the error of space target pose estimation is also smaller than existing state-of-the-art methods under all input resolution conditions. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing-II)
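
The 1D landmark representation can be illustrated by encoding each pixel coordinate as a 1D vector with a Gaussian peak and decoding it with a soft-argmax; because decoding takes an expectation rather than an argmax over a coarse 2D grid, sub-pixel positions survive. The vector lengths and Gaussian width below are placeholders, and the paper's network regresses such vectors rather than constructing them analytically.

```python
import numpy as np

def encode_1d(coord: float, length: int, sigma: float = 2.0) -> np.ndarray:
    """Encode one pixel coordinate as a 1D Gaussian-peaked vector (illustrative)."""
    grid = np.arange(length)
    vec = np.exp(-((grid - coord) ** 2) / (2 * sigma ** 2))
    return vec / vec.sum()

def decode_1d(vec: np.ndarray) -> float:
    """Soft-argmax decoding: expectation of the coordinate under the 1D distribution."""
    grid = np.arange(len(vec))
    return float((vec * grid).sum() / vec.sum())

# A landmark at (x, y) = (123.4, 56.7) in a 256 x 192 image becomes two 1D vectors
vx, vy = encode_1d(123.4, 256), encode_1d(56.7, 192)
print(decode_1d(vx), decode_1d(vy))  # ~123.4 and ~56.7, without 2D heatmap quantization
```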
