Deep Learning and Computer Vision in Remote Sensing

A special issue of Remote Sensing (ISSN 2072-4292). This special issue belongs to the section "AI Remote Sensing".

Deadline for manuscript submissions: closed (31 May 2022) | Viewed by 101483

Printed Edition Available!
A printed edition of this Special Issue is available here.

Special Issue Editors


Guest Editor Assistant
Department of Computing, University of Turku, Turku, Finland
Interests: machine learning; deep learning; computer vision; data analysis; pose estimation

Special Issue Information

Dear Colleagues,

In the last few years, the field of computer vision has made huge progress in remote sensing. This success and progress is mostly due to the effectiveness of deep learning (DL) algorithms. In addition, the remote sensing community has shifted its attention to DL, and DL algorithms have achieved significant success in many image analysis tasks. However, for remote sensing, a number of challenges from difficult data acquisition and annotation have not been fully solved yet.

The aim of this Special Issue is to provide an opportunity to explore these challenges in remote sensing using computer vision, deep learning, and artificial intelligence. Its scope is interdisciplinary, and it seeks collaborative contributions from academic and industrial experts in deep learning, computer vision, data science, and remote sensing. Major topics of interest, by no means exclusive, are as follows:

  • Deep learning and computer vision for RS problems
  • Deep learning for RS image understanding, such as object detection, image classification, and semantic and instance segmentation
  • Deep learning for RS scene understanding and classification
  • Satellite image processing and analysis
  • Transfer learning and machine learning for RS
  • Applications

Dr. Fahimeh Farahnakian
Prof. Dr. Jukka Heikkonen
Guest Editors
Mr. Pouya Jafarzadeh
Guest Editor Assistant

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Remote Sensing is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (26 papers)

Research

18 pages, 5567 KiB  
Article
Convolutional Neural Network Algorithms for Semantic Segmentation of Volcanic Ash Plumes Using Visible Camera Imagery
by José Francisco Guerrero Tello, Mauro Coltelli, Maria Marsella, Angela Celauro and José Antonio Palenzuela Baena
Remote Sens. 2022, 14(18), 4477; https://doi.org/10.3390/rs14184477 - 8 Sep 2022
Cited by 12 | Viewed by 2322
Abstract
In the last decade, video surveillance cameras have experienced a great technological advance, making capturing and processing of digital images and videos more reliable in many fields of application. Hence, video-camera-based systems appear as one of the techniques most widely used in the world for monitoring volcanoes, providing a low cost and handy tool in emergency phases, although the processing of large data volumes from continuous acquisition still represents a challenge. To make these systems more effective in cases of emergency, each pixel of the acquired images must be assigned to class labels to categorise them and to locate and segment the observable eruptive activity. This paper is focused on the detection and segmentation of volcanic ash plumes using convolutional neural networks. Two well-established architectures, the segNet and the U-Net, have been used for the processing of in situ images to validate their usability in the field of volcanology. The dataset fed into the two CNN models was acquired from in situ visible video cameras from a ground-based network (Etna_NETVIS) located on Mount Etna (Italy) during the eruptive episode of 24th December 2018, when 560 images were captured from three different stations: CATANIA-CUAD, BRONTE, and Mt. CAGLIATO. In the preprocessing phase, data labelling for computer vision was used, adding one meaningful and informative label to provide eruptive context and the appropriate input for the training of the machine-learning neural network. Methods presented in this work offer a generalised toolset for volcano monitoring to detect, segment, and track ash plume emissions. The automatic detection of plumes helps to significantly reduce the storage of useless data, starting to register and save eruptive events at the time of unrest when a volcano leaves the rest status, and the semantic segmentation allows volcanic plumes to be tracked automatically and allows geometric parameters to be calculated. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing)

18 pages, 16059 KiB  
Article
Mutual Guidance Meets Supervised Contrastive Learning: Vehicle Detection in Remote Sensing Images
by Hoàng-Ân Lê, Heng Zhang, Minh-Tan Pham and Sébastien Lefèvre
Remote Sens. 2022, 14(15), 3689; https://doi.org/10.3390/rs14153689 - 1 Aug 2022
Cited by 4 | Viewed by 2081
Abstract
Vehicle detection is an important but challenging problem in Earth observation due to the intricately small sizes and varied appearances of the objects of interest. In this paper, we use these issues to our advantage by considering them as results of latent image augmentation. In particular, we propose using supervised contrastive loss in combination with a mutual guidance matching process to help learn stronger object representations and tackle the misalignment of localization and classification in object detection. Extensive experiments are performed to understand the combination of the two strategies and show the benefits for vehicle detection on aerial and satellite images, achieving performance on par with state-of-the-art methods designed for small and very small object detection. As the proposed method is domain-agnostic, it might also be used for visual representation learning in generic computer vision problems.

29 pages, 9921 KiB  
Article
Detection of River Plastic Using UAV Sensor Data and Deep Learning
by Nisha Maharjan, Hiroyuki Miyazaki, Bipun Man Pati, Matthew N. Dailey, Sangam Shrestha and Tai Nakamura
Remote Sens. 2022, 14(13), 3049; https://doi.org/10.3390/rs14133049 - 25 Jun 2022
Cited by 24 | Viewed by 6753
Abstract
Plastic pollution is a critical global issue. Increases in plastic consumption have triggered increased production, which in turn has led to increased plastic disposal. In situ observation of plastic litter is tedious and cumbersome, especially in rural areas and around transboundary rivers. We therefore propose automatic mapping of plastic in rivers using unmanned aerial vehicles (UAVs) and deep learning (DL) models that require modest compute resources. We evaluate the method at two different sites: the Houay Mak Hiao River, a tributary of the Mekong River in Vientiane, Laos, and Khlong Nueng canal in Talad Thai, Khlong Luang, Pathum Thani, Thailand. Detection models in the You Only Look Once (YOLO) family are evaluated in terms of runtime resources and mean average Precision (mAP) at an Intersection over Union (IoU) threshold of 0.5. YOLOv5s is found to be the most effective model, with low computational cost and a very high mAP of 0.81 without transfer learning for the Houay Mak Hiao dataset. The performance of all models is improved by transfer learning from Talad Thai to Houay Mak Hiao. Pre-trained YOLOv4 with transfer learning obtains the overall highest accuracy, with a 3.0% increase in mAP to 0.83, compared to the marginal increase of 2% in mAP for pre-trained YOLOv5s. YOLOv3, when trained from scratch, shows the greatest benefit from transfer learning, with an increase in mAP from 0.59 to 0.81 after transfer learning from Talad Thai to Houay Mak Hiao. The pre-trained YOLOv5s model using the Houay Mak Hiao dataset is found to provide the best tradeoff between accuracy and computational complexity, requiring model resources yet providing reliable plastic detection with or without transfer learning. Various stakeholders in the effort to monitor and reduce plastic waste in our waterways can utilize the resulting deep learning approach irrespective of location. Full article
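
The abstract above reports mAP at an IoU threshold of 0.5. As a minimal sketch of what that threshold means, the snippet below computes the Intersection over Union of two axis-aligned boxes and tests whether a predicted box would count as a match. The `(x1, y1, x2, y2)` box format and the function names are assumptions made for illustration; they are not taken from the paper or from any YOLO implementation.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold=0.5):
    """Hypothetical helper: a prediction matches when its IoU reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold
```

A full mAP computation would additionally rank detections by confidence and average precision over classes; only the matching criterion is shown here.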

23 pages, 13756 KiB  
Article
Point RCNN: An Angle-Free Framework for Rotated Object Detection
by Qiang Zhou and Chaohui Yu
Remote Sens. 2022, 14(11), 2605; https://doi.org/10.3390/rs14112605 - 29 May 2022
Cited by 17 | Viewed by 4861
Abstract
Rotated object detection in aerial images is still challenging due to arbitrary orientations, large scale and aspect ratio variations, and extreme density of objects. Existing state-of-the-art rotated object detection methods mainly rely on angle-based detectors. However, angle-based detectors can easily suffer from a long-standing boundary problem. To tackle this problem, we propose a purely angle-free framework for rotated object detection, called Point RCNN. Point RCNN is a two-stage detector including both PointRPN and PointReg, which are angle-free. Given an input aerial image, the backbone-FPN first extracts hierarchical features; the PointRPN module then generates accurate rotated regions of interest (RRoIs) by converting the learned representative points of each rotated object using the MinAreaRect function of OpenCV. Motivated by RepPoints, we designed a coarse-to-fine process to regress and refine the representative points for more accurate RRoIs. Next, based on the learned RRoIs of PointRPN, the PointReg module learns to regress and refine the corner points of each RRoI to perform more accurate rotated object detection. Finally, the rotated bounding box of each rotated object can be obtained from the four learned corner points. In addition, aerial images are often severely unbalanced in categories, and existing rotated object detection methods largely ignore this problem. To tackle the severely unbalanced dataset problem, we propose a balanced dataset strategy. We experimentally verified that re-sampling the images of the rare categories can stabilize the training procedure and further improve the detection performance. Specifically, the performance was improved from 80.37 mAP to 80.71 mAP in DOTA-v1.0. Without unnecessary elaboration, our Point RCNN method achieved new state-of-the-art detection performance on multiple large-scale aerial image datasets, including DOTA-v1.0, DOTA-v1.5, HRSC2016, and UCAS-AOD. Specifically, in DOTA-v1.0, our Point RCNN achieved better detection performance of 80.71 mAP. In DOTA-v1.5, Point RCNN achieved 79.31 mAP, which significantly improved the performance by 2.86 mAP (from ReDet's 76.45 to our 79.31). In HRSC2016 and UCAS-AOD, our Point RCNN achieved higher performance of 90.53 mAP and 90.04 mAP, respectively.
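
To illustrate why angle-based box parameters can be awkward regression targets, the sketch below shows one common reading of the boundary problem: two very different parameter tuples, with width and height swapped and the angle shifted by 90 degrees, describe exactly the same rotated box, whereas the corner points are identical. This is an illustrative assumption, not the Point RCNN implementation, and the `(cx, cy, w, h, angle)` convention and function names are hypothetical.

```python
import math


def box_corners(cx, cy, w, h, angle_deg):
    """Corner points of a rotated box given centre, size and rotation angle."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    corners = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        # Rotate the offset about the centre, then translate.
        corners.append((cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a))
    return corners


def same_box(corners_a, corners_b, tol=1e-9):
    """True if both corner sets describe the same box, ignoring corner order."""
    def covered(src, dst):
        return all(min(math.dist(p, q) for q in dst) < tol for p in src)
    return covered(corners_a, corners_b) and covered(corners_b, corners_a)


# Same physical box, very different (w, h, angle) parameters.
box_1 = box_corners(10.0, 20.0, 4.0, 2.0, -89.0)
box_2 = box_corners(10.0, 20.0, 2.0, 4.0, 1.0)
print(same_box(box_1, box_2))
```

A loss computed directly on `(w, h, angle)` would treat these two tuples as far apart, which is one motivation for regressing points instead.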

22 pages, 1584 KiB  
Article
LSNet: Learned Sampling Network for 3D Object Detection from Point Clouds
by Mingming Wang, Qingkui Chen and Zhibing Fu
Remote Sens. 2022, 14(7), 1539; https://doi.org/10.3390/rs14071539 - 23 Mar 2022
Cited by 11 | Viewed by 2802
Abstract
The 3D object detection of LiDAR point cloud data has generated widespread discussion and implementation in recent years. In this paper, we concentrate on exploring the sampling method of point-based 3D object detection in autonomous driving scenarios, a process which attempts to reduce expenditure by reaching sufficient accuracy using fewer selected points. FPS (farthest point sampling), the most used sampling method, works poorly in small sampling size cases, and, limited by the massive points, some newly proposed sampling methods using deep learning are not suitable for autonomous driving scenarios. To address these issues, we propose the learned sampling network (LSNet), a single-stage 3D object detection network containing an LS module that can sample important points through deep learning. This advanced approach can sample points with a task-specific focus while also being differentiable. Additionally, the LS module is streamlined for computational efficiency and transferability to replace more primitive sampling methods in other point-based networks. To reduce the issue of the high repetition rates of sampled points, a sampling loss algorithm was developed. The LS module was validated with the KITTI dataset and outperformed the other sampling methods, such as FPS and F-FPS (FPS based on feature distance). Finally, LSNet achieves acceptable accuracy with only 128 sampled points and shows promising results when the number of sampled points is small, yielding up to a 60% improvement against competing methods with eight sampled points.
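
For readers unfamiliar with the baseline named above, here is a minimal sketch of farthest point sampling (FPS). It is the classical greedy procedure, not the learned sampling module proposed in the paper; the function name, the fixed start index and the tuple-based point format are assumptions made for illustration.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Return indices of k points chosen greedily so that they are far apart."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [start]
    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        # Pick the point that is farthest from everything selected so far.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected
```

Because the selection depends only on geometry, FPS cannot favour task-relevant points, which is the limitation a learned sampler is meant to address.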

21 pages, 20011 KiB  
Article
Oriented Object Detection in Remote Sensing Images with Anchor-Free Oriented Region Proposal Network
by Jianxiang Li, Yan Tian, Yiping Xu and Zili Zhang
Remote Sens. 2022, 14(5), 1246; https://doi.org/10.3390/rs14051246 - 3 Mar 2022
Cited by 11 | Viewed by 4121
Abstract
Oriented object detection is a fundamental and challenging task in remote sensing image analysis that has recently drawn much attention. Currently, mainstream oriented object detectors are based on densely placed predefined anchors. However, the high number of anchors aggravates the positive and negative sample imbalance problem, which may lead to duplicate detections or missed detections. To address the problem, this paper proposes a novel anchor-free two-stage oriented object detector. We propose the Anchor-Free Oriented Region Proposal Network (AFO-RPN) to generate high-quality oriented proposals without enormous predefined anchors. To deal with rotation problems, we also propose a new representation of an oriented box based on a polar coordinate system. To solve the severe appearance ambiguity problems faced by anchor-free methods, we use a Criss-Cross Attention Feature Pyramid Network (CCA-FPN) to exploit the contextual information of each pixel and its neighbors in order to enhance the feature representation. Extensive experiments on three public remote sensing benchmarks—DOTA, DIOR-R, and HRSC2016—demonstrate that our method can achieve very promising detection performance, with a mean average precision (mAP) of 80.68%, 67.15%, and 90.45%, respectively, on the benchmarks. Full article

19 pages, 3814 KiB  
Article
Multiview Image Matching of Optical Satellite and UAV Based on a Joint Description Neural Network
by Chuan Xu, Chang Liu, Hongli Li, Zhiwei Ye, Haigang Sui and Wei Yang
Remote Sens. 2022, 14(4), 838; https://doi.org/10.3390/rs14040838 - 10 Feb 2022
Cited by 7 | Viewed by 4572
Abstract
Matching aerial and satellite optical images with large dip angles is a core technology and is essential for target positioning and dynamic monitoring in sensitive areas. However, due to the long distances and large dip angle observations of the aerial platform, there are significant perspective, radiation, and scale differences between heterologous space-sky images, which seriously affect the accuracy and robustness of feature matching. In this paper, a multiview satellite and unmanned aerial vehicle (UAV) image matching method based on deep learning is proposed to solve this problem. The main innovation of this approach is to propose a joint descriptor consisting of soft descriptions and hard descriptions. Hard descriptions are used as the main description to ensure matching accuracy. Soft descriptions are used not only as auxiliary descriptions but also for the process of network training. Experiments on several problems show that the proposed method ensures matching efficiency and achieves better matching accuracy for multiview satellite and UAV images than other traditional methods. In addition, the matching accuracy of our method in optical satellite and UAV images is within 3 pixels, and can nearly reach 2 pixels, which meets the requirements of relevant UAV missions. Full article

21 pages, 9042 KiB  
Article
Logging Trail Segmentation via a Novel U-Net Convolutional Neural Network and High-Density Laser Scanning Data
by Omid Abdi, Jori Uusitalo and Veli-Pekka Kivinen
Remote Sens. 2022, 14(2), 349; https://doi.org/10.3390/rs14020349 - 13 Jan 2022
Cited by 6 | Viewed by 3094
Abstract
Logging trails are one of the main components of modern forestry. However, spotting the accurate locations of old logging trails through common approaches is challenging and time consuming. This study was established to develop an approach, using cutting-edge deep-learning convolutional neural networks and high-density laser scanning data, to detect logging trails in different stages of commercial thinning, in Southern Finland. We constructed a U-Net architecture, consisting of encoder and decoder paths with several convolutional layers, pooling and non-linear operations. The canopy height model (CHM), digital surface model (DSM), and digital elevation models (DEMs) were derived from the laser scanning data and were used as image datasets for training the model. The labeled dataset for the logging trails was generated from different references as well. Three forest areas were selected to test the efficiency of the algorithm that was developed for detecting logging trails. We designed 21 routes, including 390 samples of the logging trails and non-logging trails, covering all logging trails inside the stands. The results indicated that the trained U-Net using DSM (k = 0.846 and IoU = 0.867) shows superior performance over the trained model using CHM (k = 0.734 and IoU = 0.782), DEMavg (k = 0.542 and IoU = 0.667), and DEMmin (k = 0.136 and IoU = 0.155) in distinguishing logging trails from non-logging trails. Although the efficiency of the developed approach in young and mature stands that had undergone the commercial thinning is approximately perfect, it needs to be improved in old stands that have not received the second or third commercial thinning. Full article
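
The abstract above reports k and IoU for each input dataset. Assuming that k denotes Cohen's kappa, which the abstract does not state explicitly, both scores can be derived from a binary confusion matrix as sketched below. The function name and the example counts are hypothetical and are not results from the paper.

```python
def kappa_and_iou(tp, fp, fn, tn):
    """Cohen's kappa and IoU of the positive class from binary confusion counts."""
    n = tp + fp + fn + tn
    if n == 0 or tp + fp + fn == 0:
        raise ValueError("confusion matrix must contain positive samples")
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column marginals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (observed - expected) / (1.0 - expected)
    iou = tp / (tp + fp + fn)
    return kappa, iou


# Hypothetical counts for trail versus non-trail samples.
kappa, iou = kappa_and_iou(tp=40, fp=10, fn=10, tn=40)
```

Kappa corrects overall agreement for chance, while IoU ignores true negatives, so the two can rank models differently when classes are imbalanced.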

27 pages, 19359 KiB  
Article
BiFDANet: Unsupervised Bidirectional Domain Adaptation for Semantic Segmentation of Remote Sensing Images
by Yuxiang Cai, Yingchun Yang, Qiyi Zheng, Zhengwei Shen, Yongheng Shang, Jianwei Yin and Zhongtian Shi
Remote Sens. 2022, 14(1), 190; https://doi.org/10.3390/rs14010190 - 1 Jan 2022
Cited by 23 | Viewed by 3787
Abstract
When segmenting massive amounts of remote sensing images collected from different satellites or geographic locations (cities), pre-trained deep learning models cannot always output satisfactory predictions. To deal with this issue, domain adaptation has been widely utilized to enhance the generalization abilities of segmentation models. Most existing domain adaptation methods, which are based on image-to-image translation, first transfer the source images to pseudo-target images and then adapt the classifier from the source domain to the target domain. However, these unidirectional methods suffer from two limitations: (1) they do not consider the inverse procedure and cannot fully take advantage of the information from the other domain, which is also beneficial, as confirmed by our experiments; (2) they may fail in cases where transferring the source images to pseudo-target images is difficult. In this paper, in order to solve these problems, we propose a novel framework, BiFDANet, for unsupervised bidirectional domain adaptation in the semantic segmentation of remote sensing images. It optimizes the segmentation models in two opposite directions. In the source-to-target direction, BiFDANet learns to transfer the source images to pseudo-target images and adapts the classifier to the target domain. In the opposite direction, BiFDANet transfers the target images to pseudo-source images and optimizes the source classifier. At the test stage, we make the best of the source classifier and the target classifier, which complement each other, by combining them with a simple linear combination method, further improving the performance of BiFDANet. Furthermore, we propose a new bidirectional semantic consistency loss for BiFDANet to maintain semantic consistency during the bidirectional image-to-image translation process. The experiments on two datasets, including satellite images and aerial images, demonstrate the superiority of our method over existing unidirectional methods.
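
The test-stage step described above, combining the source and target classifiers linearly, might look like the following sketch for a single pixel. The abstract does not give the combination weights or say whether probabilities or logits are blended, so the equal default weight, the use of class probabilities and the function names are all assumptions.

```python
def combine_predictions(source_probs, target_probs, weight=0.5):
    """Blend per-class probabilities from two classifiers for one pixel."""
    if len(source_probs) != len(target_probs):
        raise ValueError("both classifiers must score the same classes")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    # weight scales the source classifier; the remainder goes to the target one.
    return [weight * s + (1.0 - weight) * t
            for s, t in zip(source_probs, target_probs)]


def predict_label(source_probs, target_probs, weight=0.5):
    """Index of the class with the highest blended probability."""
    blended = combine_predictions(source_probs, target_probs, weight)
    return max(range(len(blended)), key=blended.__getitem__)
```

In practice the same blend would be applied to every pixel of the two probability maps, and the weight could be tuned on held-out data.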

19 pages, 9134 KiB  
Article
Semantic Segmentation and Analysis on Sensitive Parameters of Forest Fire Smoke Using Smoke-Unet and Landsat-8 Imagery
by Zewei Wang, Pengfei Yang, Haotian Liang, Change Zheng, Jiyan Yin, Ye Tian and Wenbin Cui
Remote Sens. 2022, 14(1), 45; https://doi.org/10.3390/rs14010045 - 23 Dec 2021
Cited by 45 | Viewed by 5024
Abstract
Forest fire is a ubiquitous disaster with a long-term impact on the local climate and the ecological balance, and fire products based on remote sensing satellite data have developed rapidly. However, early forest fire smoke in remote sensing images is small in area and easily confused with clouds and fog, which makes it difficult to identify. Redundant frequency bands and remote sensing indices in satellite data also interfere with wildfire smoke detection, reducing both detection accuracy and detection efficiency. To solve these problems, this study analyzed the sensitivity of the remote sensing satellite data and remote sensing indices used for wildfire detection. First, a high-resolution remote sensing multispectral image dataset of forest fire smoke, covering different years, seasons, regions and land cover, was established. Then Smoke-Unet, a smoke segmentation network model based on an improved Unet combined with the attention mechanism and residual block, was proposed. Furthermore, in order to reduce data redundancy and improve the recognition accuracy of the algorithm, experiments were conducted which showed that the RGB, SWIR2 and AOD bands are sensitive to smoke recognition in Landsat-8 images. The experimental results show that the smoke pixel accuracy rate of the proposed Smoke-Unet is 3.1% higher than that of Unet, and that it can effectively segment the smoke pixels in remote sensing images. Using the RGB, SWIR2 and AOD bands, the proposed method can help to segment smoke with high-sensitivity bands and remote sensing indices and provide an early alarm of forest fire smoke.

21 pages, 6797 KiB  
Article
Pyramid Information Distillation Attention Network for Super-Resolution Reconstruction of Remote Sensing Images
by Bo Huang, Zhiming Guo, Liaoni Wu, Boyong He, Xianjiang Li and Yuxing Lin
Remote Sens. 2021, 13(24), 5143; https://doi.org/10.3390/rs13245143 - 17 Dec 2021
Cited by 5 | Viewed by 2644
Abstract
Image super-resolution (SR) technology aims to recover high-resolution images from low-resolution originals, and it is of great significance for the high-quality interpretation of remote sensing images. However, most present SR-reconstruction approaches suffer from network training difficulties and the challenge of increasing computational complexity with increasing numbers of network layers. This indicates that these approaches are not suitable for application scenarios with limited computing resources. Furthermore, the complex spatial distributions and rich details of remote sensing images increase the difficulty of their reconstruction. In this paper, we propose the pyramid information distillation attention network (PIDAN) to solve these issues. Specifically, we propose the pyramid information distillation attention block (PIDAB), which has been developed as a building block in the PIDAN. The key components of the PIDAB are the pyramid information distillation (PID) module and the hybrid attention mechanism (HAM) module. Firstly, the PID module uses feature distillation with parallel multi-receptive field convolutions to extract short- and long-path feature information, which allows the network to obtain more non-redundant image features. Then, the HAM module enhances the sensitivity of the network to high-frequency image information. Extensive validation experiments show that when compared with other advanced CNN-based approaches, the PIDAN achieves a better balance between image SR performance and model size. Full article

17 pages, 7765 KiB  
Article
Deep Learning Triplet Ordinal Relation Preserving Binary Code for Remote Sensing Image Retrieval Task
by Zhen Wang, Nannan Wu, Xiaohan Yang, Bingqi Yan and Pingping Liu
Remote Sens. 2021, 13(23), 4786; https://doi.org/10.3390/rs13234786 - 26 Nov 2021
Cited by 4 | Viewed by 2157
Abstract
As satellite observation technology rapidly develops, the number of remote sensing (RS) images dramatically increases, and this leads RS image retrieval tasks to be more challenging in terms of speed and accuracy. Recently, an increasing number of researchers have turned their attention to this issue, as well as hashing algorithms, which map real-valued data onto a low-dimensional Hamming space and have been widely utilized to respond quickly to large-scale RS image search tasks. However, most existing hashing algorithms only emphasize preserving point-wise or pair-wise similarity, which may lead to an inferior approximate nearest neighbor (ANN) search result. To fix this problem, we propose a novel triplet ordinal cross entropy hashing (TOCEH). In TOCEH, to enhance the ability of preserving the ranking orders in different spaces, we establish a tensor graph representing the Euclidean triplet ordinal relationship among RS images and minimize the cross entropy between the probability distribution of the established Euclidean similarity graph and that of the Hamming triplet ordinal relation with the given binary code. During the training process, to avoid the non-deterministic polynomial (NP) hard problem, we utilize a continuous function instead of the discrete encoding process. Furthermore, we design a quantization objective function based on the principle of preserving triplet ordinal relation to minimize the loss caused by the continuous relaxation procedure. The comparative RS image retrieval experiments are conducted on three publicly available datasets, including UC Merced Land Use Dataset (UCMD), SAT-4 and SAT-6. The experimental results show that the proposed TOCEH algorithm outperforms many existing hashing algorithms in RS image retrieval tasks. Full article
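
To make the idea of a triplet ordinal relation concrete, the sketch below checks whether the ordering "item i is closer to item j than to item k" is the same in the real-valued feature space and in the Hamming space of the binary codes. It only illustrates the property that a hashing method of this kind tries to preserve; it is not the TOCEH training objective, and the integer code format and function names are assumptions.

```python
import math


def hamming(code_a, code_b):
    """Hamming distance between two binary codes stored as Python integers."""
    return bin(code_a ^ code_b).count("1")


def triplet_order_preserved(features, codes, i, j, k):
    """True if 'i is closer to j than to k' agrees in both spaces."""
    euclid_closer = math.dist(features[i], features[j]) < math.dist(features[i], features[k])
    hamming_closer = hamming(codes[i], codes[j]) < hamming(codes[i], codes[k])
    return euclid_closer == hamming_closer


# Hypothetical 2-D features and 4-bit codes for three images.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
good_codes = [0b0000, 0b0001, 0b1111]
bad_codes = [0b0000, 0b1111, 0b0001]
```

With `good_codes` the ordering of the triplet (0, 1, 2) is preserved, while `bad_codes` reverses it; counting preserved triplets over a dataset gives a simple diagnostic of ranking quality.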
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing)

19 pages, 7032 KiB  
Article
An Improved Swin Transformer-Based Model for Remote Sensing Object Detection and Instance Segmentation
by Xiangkai Xu, Zhejun Feng, Changqing Cao, Mengyuan Li, Jin Wu, Zengyan Wu, Yajie Shang and Shubing Ye
Remote Sens. 2021, 13(23), 4779; https://doi.org/10.3390/rs13234779 - 25 Nov 2021
Cited by 81 | Viewed by 10715
Abstract
Remote sensing image object detection and instance segmentation are widely valued research fields. A convolutional neural network (CNN) has shown defects in the object detection of remote sensing images. In recent years, the number of studies on transformer-based models increased, and these studies achieved good results. However, transformers still suffer from poor small object detection and unsatisfactory edge detail segmentation. In order to solve these problems, we improved the Swin transformer based on the advantages of transformers and CNNs, and designed a local perception Swin transformer (LPSW) backbone to enhance the local perception of the network and to improve the detection accuracy of small-scale objects. We also designed a spatial attention interleaved execution cascade (SAIEC) network framework, which helped to strengthen the segmentation accuracy of the network. Due to the lack of remote sensing mask datasets, the MRS-1800 remote sensing mask dataset was created. Finally, we combined the proposed backbone with the new network framework and conducted experiments on this MRS-1800 dataset. Compared with the Swin transformer, the proposed model improved the mask AP by 1.7%, mask APS by 3.6%, AP by 1.1% and APS by 4.6%, demonstrating its effectiveness and feasibility. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing)

29 pages, 66893 KiB  
Article
A Dense Encoder–Decoder Network with Feedback Connections for Pan-Sharpening
by Weisheng Li, Minghao Xiang and Xuesong Liang
Remote Sens. 2021, 13(22), 4505; https://doi.org/10.3390/rs13224505 - 9 Nov 2021
Cited by 1 | Viewed by 2241
Abstract
To meet the need for multispectral images having high spatial resolution in practical applications, we propose a dense encoder–decoder network with feedback connections for pan-sharpening. Our network consists of four parts. The first part consists of two identical subnetworks, one each to extract features from PAN and MS images, respectively. The second part is an efficient feature-extraction block. We hope that the network can focus on features at different scales, so we propose innovative multiscale feature-extraction blocks that fully extract effective features from networks of various depths and widths by using three multiscale feature-extraction blocks and two long-jump connections. The third part is the feature fusion and recovery network. We are inspired by the work on U-Net network improvements to propose a brand new encoder network structure with dense connections that improves network performance through effective connections to encoders and decoders at different scales. The fourth part is a continuous feedback connection operation with overfeedback to refine shallow features, which enables the network to obtain better reconstruction capabilities earlier. To demonstrate the effectiveness of our method, we performed several experiments. Experiments on various satellite datasets show that the proposed method outperforms existing methods. Our results show significant improvements over those from other models in terms of the multiple-target index values used to measure the spectral quality and spatial details of the generated images. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing)

18 pages, 3695 KiB  
Article
DisasterGAN: Generative Adversarial Networks for Remote Sensing Disaster Image Generation
by Xue Rui, Yang Cao, Xin Yuan, Yu Kang and Weiguo Song
Remote Sens. 2021, 13(21), 4284; https://doi.org/10.3390/rs13214284 - 25 Oct 2021
Cited by 17 | Viewed by 3795
Abstract
Rapid progress on disaster detection and assessment has been achieved with the development of deep-learning techniques and the wide applications of remote sensing images. However, it is still a great challenge to train an accurate and robust disaster detection network due to the class imbalance of existing data sets and the lack of training data. This paper aims at synthesizing disaster remote sensing images with multiple disaster types and different building damage with generative adversarial networks (GANs), making up for the shortcomings of the existing data sets. However, existing models are inefficient in multi-disaster image translation due to the diversity of disaster and inevitably change building-irrelevant regions caused by directly operating on the whole image. Thus, we propose two models: disaster translation GAN can generate disaster images for multiple disaster types using only a single model, which uses an attribute to represent disaster types and a reconstruction process to further ensure the effect of the generator; damaged building generation GAN is a mask-guided image generation model, which can only alter the attribute-specific region while keeping the attribute-irrelevant region unchanged. Qualitative and quantitative experiments demonstrate the validity of the proposed methods. Further experimental results on the damaged building assessment model show the effectiveness of the proposed models and the superiority compared with other data augmentation methods. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing)

19 pages, 8950 KiB  
Article
SGA-Net: Self-Constructing Graph Attention Neural Network for Semantic Segmentation of Remote Sensing Images
by Wenjie Zi, Wei Xiong, Hao Chen, Jun Li and Ning Jing
Remote Sens. 2021, 13(21), 4201; https://doi.org/10.3390/rs13214201 - 20 Oct 2021
Cited by 19 | Viewed by 3340
Abstract
Semantic segmentation of remote sensing images is a critical and challenging task. Graph neural networks, which can capture global contextual representations, can exploit long-range pixel dependency, thereby improving semantic segmentation performance. In this paper, a novel self-constructing graph attention neural network is proposed for this purpose. Firstly, ResNet50 was employed as the backbone of a feature extraction network to acquire feature maps of remote sensing images. Secondly, pixel-wise dependency graphs were constructed from the feature maps of images, and a graph attention network was designed to extract the correlations between pixels of the remote sensing images. Thirdly, the channel linear attention mechanism obtained the channel dependency of images, further improving the prediction of semantic segmentation. Lastly, we conducted comprehensive experiments and found that the proposed model consistently outperformed state-of-the-art methods on two widely used remote sensing image datasets. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing)

21 pages, 31076 KiB  
Article
SSSGAN: Satellite Style and Structure Generative Adversarial Networks
by Javier Marín and Sergio Escalera
Remote Sens. 2021, 13(19), 3984; https://doi.org/10.3390/rs13193984 - 5 Oct 2021
Cited by 6 | Viewed by 4409
Abstract
This work presents Satellite Style and Structure Generative Adversarial Network (SSSGAN), a generative model of high-resolution satellite imagery to support image segmentation. Based on spatially adaptive denormalization modules (SPADE) that modulate the activations with respect to segmentation map structure, in addition to global descriptor vectors that capture the semantic information in a vector with respect to OpenStreetMap (OSM) classes, this model is able to produce consistent aerial imagery. By decoupling the generation of aerial images into a structure map and a carefully defined style vector, we were able to improve the realism and geodiversity of the synthesis with respect to the state-of-the-art baseline. Therefore, the proposed model allows us to control the generation not only with respect to the desired structure, but also with respect to a geographic area. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing)

19 pages, 5394 KiB  
Article
Fast and High-Quality 3-D Terahertz Super-Resolution Imaging Using Lightweight SR-CNN
by Lei Fan, Yang Zeng, Qi Yang, Hongqiang Wang and Bin Deng
Remote Sens. 2021, 13(19), 3800; https://doi.org/10.3390/rs13193800 - 22 Sep 2021
Cited by 9 | Viewed by 1992
Abstract
High-quality three-dimensional (3-D) radar imaging is one of the challenging problems in radar imaging enhancement. The existing sparsity regularizations are limited by their heavy computational burden and time-consuming iterative operations. Compared with the conventional sparsity regularizations, the super-resolution (SR) imaging methods based on convolutional neural networks (CNNs) can shorten imaging time and achieve higher accuracy. However, they are confined to 2-D space, and model training with small datasets is not adequately considered. To solve these problems, a fast and high-quality 3-D terahertz radar imaging method based on a lightweight super-resolution CNN (SR-CNN) is proposed in this paper. First, an original 3-D radar echo model is presented and the expected SR model is derived from the given imaging geometry. Second, the SR imaging method based on the lightweight SR-CNN is proposed to improve the image quality and shorten the imaging time. Furthermore, the resolution characteristics of spectrum estimation, sparsity regularization and SR-CNN are analyzed using the point spread function (PSF). Finally, electromagnetic computation simulations are carried out to validate the effectiveness of the proposed method in terms of image quality. The robustness against noise and the stability under small datasets are demonstrated by ablation experiments. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing)

19 pages, 6943 KiB  
Article
Predicting Arbitrary-Oriented Objects as Points in Remote Sensing Images
by Jian Wang, Le Yang and Fan Li
Remote Sens. 2021, 13(18), 3731; https://doi.org/10.3390/rs13183731 - 17 Sep 2021
Cited by 12 | Viewed by 2746
Abstract
To detect rotated objects in remote sensing images, researchers have proposed a series of arbitrary-oriented object detection methods, which place multiple anchors with different angles, scales, and aspect ratios on the images. However, a major difference between remote sensing images and natural images is the small probability of overlap between objects in the same category, so the anchor-based design can introduce much redundancy during the detection process. In this paper, we convert the detection problem to a center point prediction problem, where the pre-defined anchors can be discarded. By directly predicting the center point, orientation, and corresponding height and width of the object, our methods can simplify the design of the model and reduce the computations related to anchors. In order to further fuse the multi-level features and get accurate object centers, a deformable feature pyramid network is proposed, to detect objects under complex backgrounds and various orientations of rotated objects. Experiments and analysis on two remote sensing datasets, DOTA and HRSC2016, demonstrate the effectiveness of our approach. Our best model, equipped with Deformable-FPN, achieved 74.75% mAP on DOTA and 96.59% on HRSC2016 with a single-stage model, single-scale training, and testing. By detecting arbitrarily oriented objects from their centers, the proposed model performs competitively against oriented anchor-based methods. Full article
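To make the anchor-free representation in the abstract above concrete, the following Python sketch converts a predicted center point, width, height, and orientation into the four corners of an oriented bounding box. It is a generic geometric illustration and not the authors' decoding code; the function name `obb_corners` and the angle convention (radians, counter-clockwise in a standard x-y frame) are assumptions.

```python
import math


def obb_corners(cx, cy, width, height, angle_rad):
    """Return the four corners of a box centered at (cx, cy) and rotated by angle_rad."""
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    half_w, half_h = width / 2.0, height / 2.0
    corners = []
    # Corner offsets in the box's own (unrotated) frame.
    for dx, dy in ((-half_w, -half_h), (half_w, -half_h),
                   (half_w, half_h), (-half_w, half_h)):
        # Rotate the offset, then translate it to the predicted center.
        corners.append((cx + dx * cos_a - dy * sin_a,
                        cy + dx * sin_a + dy * cos_a))
    return corners


# Hypothetical prediction: center (10, 5), size 4 x 2, no rotation.
print(obb_corners(10.0, 5.0, 4.0, 2.0, 0.0))
```

With this parameterization, one prediction per object location is enough to recover the full rotated box, which is why pre-defined anchors at several angles and aspect ratios are no longer needed.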
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing)

26 pages, 14627 KiB  
Article
Learning Rotated Inscribed Ellipse for Oriented Object Detection in Remote Sensing Images
by Xu He, Shiping Ma, Linyuan He, Le Ru and Chen Wang
Remote Sens. 2021, 13(18), 3622; https://doi.org/10.3390/rs13183622 - 10 Sep 2021
Cited by 8 | Viewed by 2618
Abstract
Oriented object detection in remote sensing images (RSIs) is a significant yet challenging Earth Vision task, as the objects in RSIs usually emerge with complicated backgrounds, arbitrary orientations, multi-scale distributions, and dramatic aspect ratio variations. Existing oriented object detectors are mostly inherited from the anchor-based paradigm. However, the prominent performance of high-precision and real-time detection with anchor-based detectors is overshadowed by the design limitations of tediously rotated anchors. By using the simplicity and efficiency of keypoint-based detection, in this work, we extend a keypoint-based detector to the task of oriented object detection in RSIs. Specifically, we first simplify the oriented bounding box (OBB) as a center-based rotated inscribed ellipse (RIE), and then employ six parameters to represent the RIE inside each OBB: the center point position of the RIE, the offsets of the long half axis, the length of the short half axis, and an orientation label. In addition, to resolve the influence of complex backgrounds and large-scale variations, a high-resolution gated aggregation network (HRGANet) is designed to identify the targets of interest from complex backgrounds and fuse multi-scale features by using a gated aggregation model (GAM). Furthermore, by analyzing the influence of eccentricity on orientation error, eccentricity-wise orientation loss (ewoLoss) is proposed to assign the penalties on the orientation loss based on the eccentricity of the RIE, which effectively improves the accuracy of the detection of oriented objects with a large aspect ratio. Extensive experimental results on the DOTA and HRSC2016 datasets demonstrate the effectiveness of the proposed method. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing)

20 pages, 7717 KiB  
Article
Split-Attention Networks with Self-Calibrated Convolution for Moon Impact Crater Detection from Multi-Source Data
by Yutong Jia, Gang Wan, Lei Liu, Jue Wang, Yitian Wu, Naiyang Xue, Ying Wang and Rixin Yang
Remote Sens. 2021, 13(16), 3193; https://doi.org/10.3390/rs13163193 - 12 Aug 2021
Cited by 11 | Viewed by 2695
Abstract
Impact craters are the most prominent features on the surface of the Moon, Mars, and Mercury. They play an essential role in constructing lunar bases, the dating of Mars and Mercury, and the surface exploration of other celestial bodies. The traditional crater detection algorithms (CDAs) are mainly based on manual interpretation combined with classical image processing techniques. The traditional CDAs are, however, inefficient for detecting smaller or overlapping impact craters. In this paper, we propose a Split-Attention Networks with Self-Calibrated Convolution (SCNeSt) architecture, in which channel-wise attention with multi-path representation and self-calibrated convolutions can generate richer and more discriminative feature representations. The algorithm first extracts the crater feature model under the well-known target detection R-FCN network framework. The trained models are then applied to detecting the impact craters on Mercury and Mars using the transfer learning method. In the lunar impact crater detection experiment, we managed to extract a total of 157,389 impact craters with diameters between 0.6 and 860 km. Our proposed model outperforms the ResNet, ResNeXt, ScNet, and ResNeSt models in terms of recall rate and accuracy, and is more efficient than the other residual network models. Without training on Mars and Mercury remote sensing data, our model can also identify craters of different scales and demonstrates outstanding robustness and transferability. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing)

23 pages, 2063 KiB  
Article
Variational Generative Adversarial Network with Crossed Spatial and Spectral Interactions for Hyperspectral Image Classification
by Zhongwei Li, Xue Zhu, Ziqi Xin, Fangming Guo, Xingshuai Cui and Leiquan Wang
Remote Sens. 2021, 13(16), 3131; https://doi.org/10.3390/rs13163131 - 7 Aug 2021
Cited by 7 | Viewed by 2449
Abstract
Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) have been widely used in hyperspectral image classification (HSIC) tasks. However, the HSI virtual samples generated by VAEs are often ambiguous, and GANs are prone to mode collapse, which ultimately leads to poor generalization ability. Moreover, most of these models only consider the extraction of spectral or spatial features. They fail to combine the two branches interactively and ignore the correlation between them. Consequently, the variational generative adversarial network with crossed spatial and spectral interactions (CSSVGAN) is proposed in this paper, which includes a dual-branch variational Encoder to map spectral and spatial information to different latent spaces, a crossed interactive Generator to improve the quality of generated virtual samples, and a Discriminator stacked with a classifier to enhance the classification performance. Combining these three subnetworks, the proposed CSSVGAN achieves excellent classification by ensuring sample diversity and letting spectral and spatial features interact in a crossed manner. The superior experimental results on three datasets verify the effectiveness of this method. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing)

22 pages, 18646 KiB  
Article
An Attention-Guided Multilayer Feature Aggregation Network for Remote Sensing Image Scene Classification
by Ming Li, Lin Lei, Yuqi Tang, Yuli Sun and Gangyao Kuang
Remote Sens. 2021, 13(16), 3113; https://doi.org/10.3390/rs13163113 - 6 Aug 2021
Cited by 12 | Viewed by 2454
Abstract
Remote sensing image scene classification (RSISC) has broad application prospects, but related challenges still exist and urgently need to be addressed. One of the most important challenges is how to learn a strong discriminative scene representation. Recently, convolutional neural networks (CNNs) have shown great potential in RSISC due to their powerful feature learning ability; however, their performance may be restricted by the complexity of remote sensing images, such as spatial layout, varying scales, complex backgrounds, category diversity, etc. In this paper, we propose an attention-guided multilayer feature aggregation network (AGMFA-Net) that attempts to improve the scene classification performance by effectively aggregating features from different layers. Specifically, to reduce the discrepancies between different layers, we employed the channel–spatial attention on multiple high-level convolutional feature maps to capture more accurately semantic regions that correspond to the content of the given scene. Then, we utilized the learned semantic regions as guidance to aggregate the valuable information from multilayer convolutional features, so as to achieve stronger scene features for classification. Experimental results on three remote sensing scene datasets indicated that our approach achieved competitive classification performance in comparison to the baselines and other state-of-the-art methods. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing)

21 pages, 4560 KiB  
Article
Learning the Incremental Warp for 3D Vehicle Tracking in LiDAR Point Clouds
by Shengjing Tian, Xiuping Liu, Meng Liu, Yuhao Bian, Junbin Gao and Baocai Yin
Remote Sens. 2021, 13(14), 2770; https://doi.org/10.3390/rs13142770 - 14 Jul 2021
Cited by 3 | Viewed by 2529
Abstract
Object tracking from LiDAR point clouds, which are always incomplete, sparse, and unstructured, plays a crucial role in urban navigation. Some existing methods utilize a learned similarity network for locating the target, immensely limiting the advancements in tracking accuracy. In this study, we leveraged a powerful target discriminator and an accurate state estimator to robustly track target objects in challenging point cloud scenarios. Considering the complex nature of estimating the state, we extended the traditional Lucas and Kanade (LK) algorithm to 3D point cloud tracking. Specifically, we propose a state estimation subnetwork that aims to learn the incremental warp for updating the coarse target state. Moreover, to obtain a coarse state, we present a simple yet efficient discrimination subnetwork. It can project 3D shapes into a more discriminatory latent space by integrating the global feature into each point-wise feature. Experiments on KITTI and PandaSet datasets showed that compared with the most advanced of other methods, our proposed method can achieve significant improvements—in particular, up to 13.68% on KITTI. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing)

20 pages, 3997 KiB  
Article
Improved YOLO Network for Free-Angle Remote Sensing Target Detection
by Yuhao Qing, Wenyi Liu, Liuyan Feng and Wanjia Gao
Remote Sens. 2021, 13(11), 2171; https://doi.org/10.3390/rs13112171 - 1 Jun 2021
Cited by 51 | Viewed by 6624
Abstract
Despite significant progress in object detection tasks, remote sensing image target detection is still challenging owing to complex backgrounds, large differences in target sizes, and uneven distribution of rotating objects. In this study, we consider model accuracy, inference speed, and detection of objects at any angle. We also propose a RepVGG-YOLO network using an improved RepVGG model as the backbone feature extraction network, which performs the initial feature extraction from the input image and considers network training accuracy and inference speed. We use an improved feature pyramid network (FPN) and path aggregation network (PANet) to reprocess feature output by the backbone network. The FPN and PANet module integrates feature maps of different layers, combines context information on multiple scales, accumulates multiple features, and strengthens feature information extraction. Finally, to maximize the detection accuracy of objects of all sizes, we use four target detection scales at the network output to enhance feature extraction from small remote sensing target pixels. To solve the angle problem of any object, we improved the loss function for classification using circular smooth label technology, turning the angle regression problem into a classification problem, and increasing the detection accuracy of objects at any angle. We conducted experiments on two public datasets, DOTA and HRSC2016. Our results show the proposed method performs better than previous methods. Full article
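The idea of turning angle regression into classification with a circular smooth label can be sketched as follows. This minimal Python example is an illustration under stated assumptions, not the authors' implementation: it assumes 180 one-degree bins over [0, 180), a Gaussian window, and a hypothetical width `sigma=6.0`; the function name `circular_smooth_label` is also hypothetical.

```python
import math


def circular_smooth_label(angle_deg, num_bins=180, sigma=6.0):
    """Encode an angle in [0, 180) as a soft label vector with wrap-around smoothing."""
    bin_width = 180.0 / num_bins
    target = int(round(angle_deg / bin_width)) % num_bins
    label = []
    for i in range(num_bins):
        diff = abs(i - target)
        # Circular distance: the last bin is adjacent to the first bin.
        circ = min(diff, num_bins - diff)
        label.append(math.exp(-(circ ** 2) / (2.0 * sigma ** 2)))
    return label


label = circular_smooth_label(179.0)
# Bins 178 and 0 are both one step away from bin 179, so they get equal weight.
print(label[178] == label[0])
```

Because the window wraps around, a prediction of 0 degrees for a 179-degree target is penalized as a near miss rather than as a maximal error, which is the boundary problem that plain angle regression suffers from.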
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing)

Other

14 pages, 1257 KiB  
Technical Note
NDFTC: A New Detection Framework of Tropical Cyclones from Meteorological Satellite Images with Deep Transfer Learning
by Shanchen Pang, Pengfei Xie, Danya Xu, Fan Meng, Xixi Tao, Bowen Li, Ying Li and Tao Song
Remote Sens. 2021, 13(9), 1860; https://doi.org/10.3390/rs13091860 - 10 May 2021
Cited by 29 | Viewed by 3661
Abstract
Accurate detection of tropical cyclones (TCs) is important to prevent and mitigate natural disasters associated with TCs. Deep transfer learning methods have advantages in detection tasks, because they can further improve the stability and accuracy of the detection model. Therefore, on the basis of deep transfer learning, we propose a new detection framework of tropical cyclones (NDFTC) from meteorological satellite images by combining the deep convolutional generative adversarial networks (DCGAN) and You Only Look Once (YOLO) v3 model. The algorithm process of NDFTC consists of three major steps: data augmentation, a pre-training phase, and transfer learning. First, to improve the utilization of finite data, DCGAN is used as the data augmentation method to generate simulated TC images. Second, to extract the salient characteristics of TCs, the generated images obtained from DCGAN are fed into the detection model YOLOv3 in the pre-training phase. Third, based on the network-based deep transfer learning method, we train the detection model with real images of TCs, with its initial weights transferred from the YOLOv3 trained with generated images. Training with real images helps to extract universal characteristics of TCs, and using transferred weights as initial weights can improve the stability and accuracy of the model. The experimental results show that the NDFTC performs better, with an accuracy (ACC) of 97.78% and average precision (AP) of 81.39%, than the YOLOv3, which has an ACC of 93.96% and AP of 80.64%. Full article
(This article belongs to the Special Issue Deep Learning and Computer Vision in Remote Sensing)
