Article

An Effective Res-Progressive Growing Generative Adversarial Network-Based Cross-Platform Super-Resolution Reconstruction Method for Drone and Satellite Images

1 College of Information and Electrical Engineering, Shenyang Agricultural University, Shenyang 110866, China
2 National Digital Agriculture Regional Innovation Center (Northeast), Shenyang 110866, China
3 High-Resolution Earth Observation System, Liaoning Forest and Grass Resources and Environment Remote Sensing Research and Application Center, Shenyang 110866, China
* Authors to whom correspondence should be addressed.
Drones 2024, 8(9), 452; https://doi.org/10.3390/drones8090452
Submission received: 9 July 2024 / Revised: 27 August 2024 / Accepted: 31 August 2024 / Published: 1 September 2024
(This article belongs to the Section Drones in Agriculture and Forestry)

Abstract
In recent years, accurate field monitoring has been a research hotspot in the domains of aerial and satellite remote sensing. In view of this, this study proposes, for the first time, an innovative cross-platform super-resolution reconstruction method for remote sensing images, aiming to give medium-resolution satellites field-level detection capability through a super-resolution reconstruction technique. The progressive growing generative adversarial network (PGGAN) model, which has excellent high-resolution generation and style transfer capabilities, is combined with a deep residual network, forming the Res-PGGAN model for cross-platform super-resolution reconstruction. The Res-PGGAN architecture is similar to that of the PGGAN but includes a deep residual module. The proposed Res-PGGAN model has two main benefits. First, the residual module facilitates the training of deep networks, as well as the extraction of deep features. Second, the PGGAN structure performs well in cross-platform sensor style transfer, allowing cross-platform high-magnification super-resolution tasks to be performed effectively. A large pre-training dataset and real data are used to train the Res-PGGAN to improve the resolution of Sentinel-2’s 10 m resolution satellite images to 0.625 m. Three evaluation metrics, namely the structural similarity index metric (SSIM), the peak signal-to-noise ratio (PSNR), and the universal quality index (UQI), are used to evaluate the high-magnification images obtained by the proposed method. The generated images are also compared with those obtained by the traditional bicubic method and two deep learning super-resolution reconstruction methods: the enhanced super-resolution generative adversarial network (ESRGAN) and the PGGAN. The results indicate that the proposed method outperforms all the comparison methods and achieves acceptable performance regarding all three metrics (SSIM/PSNR/UQI: 0.9726/44.7971/0.0417), proving the feasibility of cross-platform super-resolution image recovery.

1. Introduction

With the development of remote sensing technology and hardware improvements, monitoring field crops using various types of sensing devices carried by drones and satellites has become the mainstream method [1,2,3,4]. Satellite remote sensing has the advantage of wide-range imaging and has been widely used in many fields, including crop inversion, monitoring, and yield estimation [5,6,7]. However, the spatial resolution of multispectral satellites is often low, and high-spatial-resolution multispectral images are costly to acquire and limited in swath width, making accurate monitoring extremely challenging. The rapid development of drones in recent years has made them an important tool in the field of smart agriculture. As an essential part of agricultural remote sensing, drones can carry a variety of sensors, such as visible light, near-infrared, thermal infrared, and LiDAR sensors, which can provide a large amount of near-ground remote sensing data on farmland. Applying drones to the basic condition monitoring of arable land has the advantages of simple maintenance, fast data acquisition, and a short operation cycle and, compared with satellite remote sensing, can achieve fine monitoring at the field level [4,8,9,10,11,12]. However, their detection range is usually limited by the battery capacity of the vehicle and the flight altitude, so information on crop growth cannot be obtained over a large monitoring area at low cost. In the field of computer vision, the super-resolution reconstruction of an image is a technique used to upgrade a low-resolution image to a high-resolution image [13]. With the rapid development of this field, models that use generative adversarial networks as the backbone have achieved unprecedentedly good results in super-resolution reconstruction tasks. To address the problems that drones are unable to perform large-scale detection and satellite remote sensing is unable to perform fine monitoring at the field scale, this study proposes a cross-platform super-resolution reconstruction method for remotely sensed images based on progressive growing generative adversarial networks.
Super-resolution methods were initially based on interpolation and evolved through image degradation modeling toward learning-based methods [14]. The learning-based methods mainly learn the correspondence between high- and low-resolution image pairs using certain algorithms and then reconstruct high-resolution images from low-resolution images through relationship mapping; they include sample-based [15,16], local embedding-based [17,18,19], sparse coding-based [20], and deep learning-based methods [21,22,23]. In recent years, there has been significant progress in super-resolution imaging methods based on deep learning. The main idea is to feed data directly to the neural network input so that the network can learn the end-to-end mapping relationship between image pairs. In 2014, Chao Dong et al. [24] applied deep learning to single-frame image super-resolution reconstruction and proposed a deep learning-based super-resolution reconstruction framework named the super-resolution convolutional neural network (SRCNN). The SRCNN was trained as a three-layer convolutional network to obtain an end-to-end mapping model, accomplishing low-to-high-resolution image reconstruction. In 2017, Ledig et al. [25] proposed photo-realistic single-image super-resolution using a generative adversarial network (SRGAN) and applied generative adversarial methods to image super-resolution reconstruction for the first time. The SRGAN method has a perceptual loss function, which combines the adversarial and content losses, and uses a discriminative network to identify whether an image is a natural high-definition image or a reconstructed image. The SRGAN also achieved, for the first time, the generation of natural, clear, zoomed-in images with a quadruple zoom factor. In 2018, Wang et al. [26] optimized the SRGAN network and proposed the enhanced super-resolution generative adversarial network (ESRGAN) model, which used residual-in-residual dense blocks (RRDBs) as the network building blocks. The ESRGAN reduced the artifacts of the SRGAN-generated images and achieved good four-fold super-resolution results. The success of the Transformer model [27,28,29] in the field of natural language processing has attracted great research attention from computer vision researchers. The TTSR is a typical Transformer model built on an attention mechanism [29]; this model adopts the RefSR technique and employs the attention mechanism to transfer high-resolution texture features to alleviate the problem of reconstructed image blurring in SISR reconstruction. However, Transformer-based super-resolution networks typically have high computational complexity and require large amounts of GPU memory, so they cannot achieve several-fold resolution upgrades under regular hardware conditions. There have also been a number of studies on high-factor pixel upgrading beyond the typical upscaling factors of 2× or 4×. Dahl et al. [30] proposed a fully probabilistic pixel recursive network for upsampling very coarse images with a resolution of 8 × 8. The RFB-ESRGAN is based on the ESRGAN and applies receptive field blocks (RFBs) to extract multi-scale information and enhance feature distinguishability for 16-fold super-resolution. Chan et al. [31] achieved good results in high-magnification super-resolution image processing by using a pre-trained StyleGAN model for style transfer between the encoder and decoder.
The progressive growing generative adversarial network (PGGAN) model, as a predecessor of the StyleGAN model, focuses more on high-resolution image generation and can be trained incrementally to generate high-quality face images with a resolution of 1024 × 1024 pixels from images with a resolution of 4 × 4 pixels; it also has a good style transfer capability. In recent years, many high-resolution generative networks have been combined with denoising probabilistic models, but the obtained results were similar to those of the PGGAN model [32].
Inspired by deep residual learning [33] and the PGGAN model [34], this study proposes the Res-PGGAN model, which takes advantage of both deep residual learning and the PGGAN architecture. The difference between the proposed Res-PGGAN and the original PGGAN network is threefold. First, residual units are added to the proposed model and used as pre-module blocks. Second, a deep residual-block-like structure is used at each resolution level. Third, the structure of the residual module is redesigned to make it suitable for high-magnification resolution reconstruction.

2. Methods

(1) PGGAN
For the generative adversarial network (GAN) model, it is challenging to map a latent code directly to 1024 × 1024-resolution samples and generate high-resolution images. During image generation, the discriminator can easily recognize “fake” images, so the generator is difficult to train. To overcome this shortcoming, in 2018, Tero Karras et al. [34] proposed the PGGAN model, which is trained layer by layer. The PGGAN allows for generating high-quality, high-resolution images through progressive generation. The PGGAN continues to deepen the network as training proceeds, and the network structure is constantly adjusted. The main advantage of this approach is that most of the iterations of the PGGAN are performed at lower resolutions, which increases the training speed by a factor of 2–6 compared to that of traditional GANs. The PGGAN first creates the main part of an image by learning the basic features that can be shown even in a low-resolution image, and then it learns more details as the resolution increases and training progresses. Because the previously added layers have already been trained at each stage, the PGGAN focuses on the newly added layers, so increasing the resolution does not make training more difficult. Therefore, apart from being simple and fast, training on low-resolution images also helps in training the higher levels, making the overall training process faster. Each unit of the PGGAN consists of a 3 × 3 convolutional layer and an LReLU layer. Between the training levels, the feature maps are resized (upsampled in the generator and pooled in the discriminator), upgrading the resolution in a level-by-level manner. Further, a minibatch standard deviation layer is added to the discriminator output, allowing the generative network to obtain diverse gradient directions, which is one of the reasons for its good style transfer capability.
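For illustration, the two building blocks described above (the 3 × 3 convolution + LReLU unit and the minibatch standard deviation layer) can be sketched in TensorFlow/Keras as follows. This is a minimal sketch rather than the authors' implementation; the layer width, the NHWC tensor layout, and the 16 × 16 example resolution are assumptions.

```python
import tensorflow as tf

def conv_unit(x, filters):
    """One PGGAN unit: a 3x3 convolution followed by LeakyReLU."""
    x = tf.keras.layers.Conv2D(filters, 3, padding="same")(x)
    return tf.keras.layers.LeakyReLU(0.2)(x)

class MinibatchStdDev(tf.keras.layers.Layer):
    """Appends the batch-wide feature standard deviation (reduced to a
    single scalar) as one extra constant channel of the feature map."""
    def call(self, x):
        std = tf.math.reduce_std(x, axis=0, keepdims=True)   # per-feature std over the batch
        mean_std = tf.reduce_mean(std, keepdims=True)         # collapse to one scalar
        shape = tf.shape(x)
        tiled = tf.tile(mean_std, [shape[0], shape[1], shape[2], 1])
        return tf.concat([x, tiled], axis=-1)

# Example: one discriminator level at an assumed 16x16 resolution
inputs = tf.keras.Input(shape=(16, 16, 64))
h = conv_unit(inputs, 64)
h = MinibatchStdDev()(h)
h = tf.keras.layers.AveragePooling2D()(h)   # move to the next (coarser) level
```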
(2) Res-PGGAN
The proposed Res-PGGAN is a deep generative adversarial network for cross-platform super-resolution reconstruction, which combines the advantages of the PGGAN and residual neural networks. This combination provides three benefits: (1) gradient vanishing on small samples is avoided; (2) the residual blocks optimize the network training process at high magnification; and (3) deep image features can be extracted even from patches containing only a few pixels. As a result, the method achieves improved results in high-magnification super-resolution reconstruction.
The Res-PGGAN structure is shown in Figure 1. Like most GAN models, the Res-PGGAN consists of two parts: the generator and the discriminator. The generator first extracts deep features from images with a resolution of 8 × 8 and then trains level by level to generate 128 × 128 images, which corresponds to 16-fold super-resolution reconstruction. A structure consisting of 16 residual blocks with pixel normalization (PN) layers is added to the generator to enhance the extraction and representation of deep features. The PN layer is defined by Equation (1), where x represents the tensor and the ε value is set to 1 × 10−8. A three-unit structure similar to that of the residual blocks is used in the resolution layers, where each block consists of a convolutional layer, an LReLU layer, and a PN layer, which aids the deep transfer of network parameters. Accordingly, each resolution level of the discriminator is decoded in a level-by-level manner, and each level consists of three units containing a convolutional layer and an LReLU layer. Further, the resolution levels are connected through downsampling. After the last level of the decoding path, the loss value is computed using 1 × 1 convolution and dense mapping.
$\mathrm{PN}(x) = \dfrac{x}{\left( \dfrac{x_1^2 + x_2^2 + \cdots + x_n^2}{n} + \varepsilon \right)^{1/2}}$ (1)
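For illustration, a minimal TensorFlow/Keras sketch of the PN layer of Equation (1) and of the Conv + LReLU + PN residual unit described above is given below; it is not the authors' code, and the default layer width of 64 is an assumption.

```python
import tensorflow as tf

class PixelNorm(tf.keras.layers.Layer):
    """Pixel normalization as in Equation (1): each pixel's feature vector is
    divided by the square root of its mean squared activation plus epsilon."""
    def __init__(self, epsilon=1e-8, **kwargs):
        super().__init__(**kwargs)
        self.epsilon = epsilon

    def call(self, x):
        mean_sq = tf.reduce_mean(tf.square(x), axis=-1, keepdims=True)
        return x / tf.sqrt(mean_sq + self.epsilon)

def residual_unit(x, filters=64):
    """Conv -> LReLU -> PixelNorm with an identity skip connection.
    Assumes the input already has `filters` channels so the shapes match."""
    y = tf.keras.layers.Conv2D(filters, 3, padding="same")(x)
    y = tf.keras.layers.LeakyReLU(0.2)(y)
    y = PixelNorm()(y)
    return tf.keras.layers.Add()([x, y])
```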

3. Experiment Details

3.1. Datasets

3.1.1. Real Dataset

In this study, the Liaozhong Kalima Rice Experiment Station of Shenyang Agricultural University was selected as the super-resolution reconstruction test area. The station is located in Kalima Village, Chengjiao Street, Liaozhong District, Shenyang City, at a latitude of 41°12′ N and a longitude of 122°23′ E. It lies in the lower basin of the Liaohe River, on the alluvial plain of the Liaohe and Hunhe Rivers, which has fertile soil, four distinct seasons, abundant water resources, and sufficient sunshine. The area belongs to a northern temperate, semi-humid, continental monsoon climate zone, with an annual average temperature of 8 °C, rainfall of 640 mm, and a frost-free period of 171 days. It is therefore suitable for growing a variety of crops. Several rice varieties were planted in the fields, as shown in Figure 2; the imaged appearance of these varieties differs greatly. A true 16× super-resolution reconstruction dataset was constructed through the geo-alignment of images of the Liaozhong Kalima Rice Station acquired using a DJI M300 RTK drone [35,36] during the rice growing season with 10 m resolution images collected by the Sentinel-2 satellite during the same period (13 September 2023), as shown in Figure 3 and Figure 4. The specific parameters of the UAV and satellite are shown in Table 1 and Table 2.
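The pairing step can be sketched as follows, assuming the drone orthomosaic and the Sentinel-2 scene have already been co-registered in a common coordinate system; the use of rasterio, the file handling, and the 80 m tile size (8 Sentinel-2 pixels at 10 m versus 128 drone pixels at 0.625 m) are illustrative assumptions rather than the authors' exact pipeline.

```python
import rasterio
from rasterio.windows import from_bounds
from rasterio.enums import Resampling

TILE_M = 80.0  # 8 Sentinel-2 pixels at 10 m <-> 128 drone pixels at 0.625 m

def extract_pair(s2_path, uav_path, left, bottom):
    """Cut one co-registered LR/HR training pair covering the same
    80 m x 80 m ground footprint from the two rasters."""
    right, top = left + TILE_M, bottom + TILE_M
    with rasterio.open(s2_path) as s2, rasterio.open(uav_path) as uav:
        lr = s2.read(window=from_bounds(left, bottom, right, top, transform=s2.transform),
                     out_shape=(s2.count, 8, 8), resampling=Resampling.bilinear)
        hr = uav.read(window=from_bounds(left, bottom, right, top, transform=uav.transform),
                      out_shape=(uav.count, 128, 128), resampling=Resampling.bilinear)
    return lr, hr
```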

3.1.2. Pre-Training Dataset

In the process of high-magnification super-resolution reconstruction, valuable training samples are very limited due to the large difference in swath width between the drone and satellite images. A GAN learns to generate data whose distribution resembles the real data distribution through competition between the generator and the discriminator, which is problematic when there are few real samples. This is because, with a small amount of data, the GAN cannot reach a satisfactory Nash equilibrium and easily falls into mode collapse. In actual experiments, an image cannot be reconstructed from the data distribution of an ultra-small sample, and small samples are also prone to cause overfitting of the GAN network. Therefore, this study adopts the idea of pre-training (i.e., transfer learning [37]). Through large-scale pre-training, the generative network can learn more about the data and their distribution, which alleviates the problems caused by very small samples. In this study, the 10 m resolution satellite image was first bicubically [38] downsampled to 160 m resolution to construct the pre-training dataset. More than 39,000 pre-training sample pairs were collected in an area with complex land cover, covering a wide range of features, including paddy fields, roads, houses, rivers, and wetlands. Figure 5 shows the selection and sampling details of the sample points.
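As an illustration of this degradation step, the sketch below bicubically downsamples a 10 m patch by a factor of 16 to form one LR/HR pre-training pair; the use of OpenCV and the 128 × 128 patch size are assumptions, not the authors' implementation.

```python
import cv2
import numpy as np

def make_pretraining_pair(patch_10m: np.ndarray, scale: int = 16):
    """Bicubically degrade a 10 m patch by `scale` (10 m -> 160 m); the
    original patch is the HR target and the degraded one the LR input."""
    h, w = patch_10m.shape[:2]
    lr = cv2.resize(patch_10m, (w // scale, h // scale), interpolation=cv2.INTER_CUBIC)
    return lr, patch_10m

# Example: a 128 x 128 crop of the 10 m mosaic yields an 8 x 8 / 128 x 128 pair
hr_patch = np.random.randint(0, 255, (128, 128, 3), dtype=np.uint8)  # placeholder patch
lr_patch, hr_patch = make_pretraining_pair(hr_patch)
```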

3.2. Environments and Parameters

The dataset used for model pre-training included 39,000 training sample pairs, and the real dataset had 1186 sample pairs, each consisting of a low-resolution image cropped to 8 × 8 pixels and a corresponding high-resolution image cropped to 128 × 128 pixels for 16-fold super-resolution training. In principle, the proposed network can work with image pairs cropped to other sizes.
The proposed generative adversarial network model was implemented using TensorFlow [39], and the Res-PGGAN model was trained with a mini-batch size of four on an NVIDIA Quadro P5000 GPU (16 GB of memory) with CUDA version 12.0. The learning rate was set to 0.0002; 400 training epochs were conducted on the pre-training dataset, and 800 epochs were conducted on the real dataset. Training was continued until the generated images achieved the best quality and could not be further improved [34]. The experimental environment and specific parameters are shown in Table 3 and Table 4.
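The optimizer settings reported above can be summarized in the following sketch; the separate generator and discriminator optimizers and the Adam beta values follow common PGGAN practice and are assumptions, not values stated in this paper.

```python
import tensorflow as tf

BATCH_SIZE = 4
LEARNING_RATE = 2e-4          # as reported above
PRETRAIN_EPOCHS = 400
FINETUNE_EPOCHS = 800

# Separate Adam optimizers for the generator and the discriminator
# (beta values are assumptions, not taken from the paper).
generator_optimizer = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE, beta_1=0.0, beta_2=0.99)
discriminator_optimizer = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE, beta_1=0.0, beta_2=0.99)
```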

3.3. Evaluation Metrics

To evaluate the effectiveness of the super-resolution reconstruction quantitatively, this study calculated the peak signal-to-noise ratio (PSNR) [40,41] and the structural similarity index metric (SSIM) [41,42] of the non-overlapping portions of the images in the dataset, as well as the universal quality index (UQI) [43]. These three metrics are commonly used to evaluate image reconstruction quality. The metrics were averaged across all bands to obtain the final values. The larger the PSNR value (in dB), the higher the quality of the reconstructed HR images. The PSNR is calculated as follows:
$\mathrm{PSNR} = 10 \times \log_{10}\!\left(\dfrac{255^2}{\mathrm{MSE}}\right)$
The SSIM is one of the main criteria for evaluating the structural correlation between pixels in an image and has become a mainstream image quality evaluation criterion in recent years. The SSIM is calculated as follows:
$\mathrm{SSIM}(x, y) = \dfrac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$
The UQI is a metric proposed by Wang et al. [43] to evaluate image quality. It combines correlation loss, luminance distortion, and contrast distortion between images and, like the PSNR and SSIM metrics, is commonly used to evaluate image quality. The UQI is calculated as follows:
$\mathrm{UQI} = \dfrac{4 \sigma_{xy}\, \bar{x}\, \bar{y}}{(\sigma_x^2 + \sigma_y^2)\left[(\bar{x})^2 + (\bar{y})^2\right]}$
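For reference, the three metrics can be computed as in the following sketch, which assumes band-last uint8 arrays and scikit-image ≥ 0.19; the uqi helper transcribes the formula above and is an illustrative implementation, not the authors' code.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def uqi(x: np.ndarray, y: np.ndarray) -> float:
    """Universal quality index for a single band, following the formula above."""
    x, y = x.astype(np.float64), y.astype(np.float64)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return 4.0 * cov * mx * my / ((vx + vy) * (mx ** 2 + my ** 2))

def evaluate(hr: np.ndarray, sr: np.ndarray):
    """PSNR, SSIM, and band-averaged UQI for band-last uint8 arrays."""
    psnr = peak_signal_noise_ratio(hr, sr, data_range=255)
    ssim = structural_similarity(hr, sr, channel_axis=-1, data_range=255)
    u = float(np.mean([uqi(hr[..., b], sr[..., b]) for b in range(hr.shape[-1])]))
    return psnr, ssim, u
```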

4. Results and Discussions

The overall reconstruction effect is shown in Figure 6. The results indicated that the proposed method performed well in resolution upgrading, detail restoration, and style transfer. Figure 7 shows the results of a small-area comparison of the traditional bicubic-based method, the deep learning-based ESRGAN and PGGAN, and the proposed Res-PGGAN. The qualitative analysis of the comparison images indicated that the three deep learning-based reconstruction methods performed significantly better than the traditional bicubic-based method, and that the high-magnification super-resolution images generated by the Res-PGGAN method were visually the closest to the original images among all the methods. Further, the proposed method could achieve good resolution reconstruction and style transfer for cross-platform heterogeneous sensors. In addition, in cross-platform super-resolution reconstruction, the planting details of the fields and the field paths that could not be distinguished in the Sentinel-2 images were suitably recovered by the proposed method. The planting greenhouses were recovered through super-resolution reconstruction by the Res-PGGAN method, whereas they were almost impossible to distinguish in the results of the bicubic, ESRGAN, and PGGAN methods. Multiple comparison images show that the proposed method provided clearer results than the other two deep learning-based methods. Specifically, the addition of the PN in the proposed method avoided extreme weights and improved the stability of the GAN model. Moreover, by combining the deep residual blocks with the PN, the complex features of different image elements could be propagated through the network, allowing the proposed method to generate super-resolution images with fewer artifacts. As a result, the edges of the buildings were more prominent, and the details were better reconstructed. In particular, in the complex field area, as shown in the third row of Figure 6, the proposed method could effectively restore the cultivation pattern of the fields. The proposed method also achieved better results in mixed areas of fields, roads, and buildings compared to the other methods. However, although the proposed method outperformed the other two deep learning-based methods regarding several metrics, its performance in restoring complex field planting details could be further improved.
The quantitative evaluation results of the comparison algorithms and the proposed algorithm are shown in Table 5, where it can be seen that the proposed algorithm achieved the best results among all the algorithms regarding all three evaluation metrics: the SSIM, the PSNR, and the UQI. To avoid over-generalization, the final values were derived from 10 super-resolution reconstruction runs. Compared with the Res-PGGAN using BatchNorm, the Res-PGGAN with the PN layer shows a certain improvement in all indicators. The proposed Res-PGGAN improved the SSIM, PSNR, and UQI metrics by 0.0062 and 0.0039, 0.9266 and 0.4742, and 0.0151 and 0.0005 compared to the ESRGAN and PGGAN methods, respectively; among these, the improvements in the SSIM and UQI indicators are more significant. The quantitative comparison further proves that the proposed method achieved the best result in 16-fold cross-platform image reconstruction among all the tested methods.

5. Conclusions

UAV image acquisition is usually limited by flight altitude, wind, and battery power, while medium-resolution satellite images cover a wide range but, with existing hardware, cannot support high-precision field observation. This study proposes an innovative super-resolution reconstruction method for cross-platform remote sensing images, which can achieve high-magnification resolution reconstruction of cross-platform heterogeneous remote sensing images obtained from drones and satellites. This is a topic that has not been discussed by previous researchers. In addition, a generative adversarial network named the Res-PGGAN, which can implement cross-platform super-resolution reconstruction, is designed. This network combines the advantages of residual learning with the high-resolution generation and style transfer capabilities of the PGGAN. Moreover, information propagation through the generator is facilitated by deep residual units, enabling the generative network to obtain the deep features of images. The proposed Res-PGGAN network is verified on different datasets. Qualitative and quantitative analyses show that the proposed method is significantly superior to the ESRGAN and PGGAN methods and achieves a better balance between complexity and training time than Transformer-based methods. Cross-platform, cross-sensor resolution reconstruction of remote sensing images is performed, and this reconstruction, together with model training on more samples, can allow medium-resolution satellites to detect field crops accurately, as shown in the experimental areas we selected. The images reconstructed at high magnification can better distinguish the rice growth trends of different fields.
Current national farming models encourage large-scale mechanized farming, and farmland plots are often very large. Through cross-platform super-resolution reconstruction that combines the advantages of UAV remote sensing and satellite remote sensing, researchers can obtain images of large study areas with high spatiotemporal resolution, especially in agricultural applications. When UAV images are stitched into large-scale mosaics, the stitching algorithm may fail to find sufficient matching features, so some images cannot be accurately stitched and gaps appear; these gaps can be filled by the super-resolution reconstruction of satellite images. The proposed method can also, to a certain extent, reduce the cost of continuously acquiring low-altitude drone images over large areas. However, wider collaborative applications between UAVs and satellites should be considered in future work. Although the super-resolution reconstruction method can restore the original information of the planting area to the greatest extent, some blurred areas remain in the reconstructions; resolving them depends on the further development of super-resolution algorithms, such as generative diffusion models, which have recently yielded surprising results in high-magnification resolution upgrades.

Author Contributions

Conceptualization, H.H. and Z.F.; methodology, H.H.; software, H.H.; validation, H.H. and Z.F.; investigation, H.H.; resources, W.D.; data curation, H.H. and Z.G.; writing—original draft preparation, H.H.; writing—review and editing, H.H.; visualization, H.H.; supervision, W.D. and Z.F.; project administration, Z.F. and T.X.; funding acquisition, T.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Applied Basic Research Program of Liaoning Province (2023JH2/101300120).

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to their continued use for other research projects.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Weiss, M.; Jacob, F.; Duveiller, G. Remote Sensing for Agricultural Applications: A Meta-Review. Remote Sens. Environ. 2020, 236, 111402. [Google Scholar] [CrossRef]
  2. Deng, L.; Mao, Z.; Li, X.; Hu, Z.; Duan, F.; Yan, Y. UAV-Based Multispectral Remote Sensing for Precision Agriculture: A Comparison between Different Cameras. ISPRS J. Photogramm. Remote Sens. 2018, 146, 124–136. [Google Scholar] [CrossRef]
  3. Chiu, M.S.; Wang, J. Evaluation of Machine Learning Regression Techniques for Estimating Winter Wheat Biomass Using Biophysical, Biochemical, and UAV Multispectral Data. Drones 2024, 8, 287. [Google Scholar] [CrossRef]
  4. Tanaka, T.S.T.; Wang, S.; Jørgensen, J.R.; Gentili, M.; Vidal, A.Z.; Mortensen, A.K.; Acharya, B.S.; Beck, B.D.; Gislum, R. Review of Crop Phenotyping in Field Plot Experiments Using UAV-Mounted Sensors and Algorithms. Drones 2024, 8, 212. [Google Scholar] [CrossRef]
  5. Segarra, J.; Buchaillot, M.L.; Araus, J.L.; Kefauver, S.C. Remote Sensing for Precision Agriculture: Sentinel-2 Improved Features and Applications. Agronomy 2020, 10, 641. [Google Scholar] [CrossRef]
  6. Nakalembe, C.; Becker-Reshef, I.; Bonifacio, R.; Hu, G.; Humber, M.L.; Justice, C.J.; Keniston, J.; Mwangi, K.; Rembold, F.; Shukla, S.; et al. A Review of Satellite-Based Global Agricultural Monitoring Systems Available for Africa. Glob. Food Secur. 2021, 29, 100543. [Google Scholar] [CrossRef]
  7. Pettorelli, N. Satellite Remote Sensing to Support Agriculture and Forestry. In Satellite Remote Sensing and the Management of Natural Resources; Pettorelli, N., Ed.; Oxford University Press: Oxford, UK, 2019; ISBN 978-0-19-871726-3. [Google Scholar]
  8. Blekanov, I.; Molin, A.; Zhang, D.; Mitrofanov, E.; Mitrofanova, O.; Li, Y. Monitoring of Grain Crops Nitrogen Status from Uav Multispectral Images Coupled with Deep Learning Approaches. Comput. Electron. Agric. 2023, 212, 108047. [Google Scholar] [CrossRef]
  9. Inoue, Y. Satellite- and Drone-Based Remote Sensing of Crops and Soils for Smart Farming—A Review. Soil Sci. Plant Nutr. 2020, 66, 798–810. [Google Scholar] [CrossRef]
  10. Benami, E.; Jin, Z.; Carter, M.R.; Ghosh, A.; Hijmans, R.J.; Hobbs, A.; Kenduiywo, B.; Lobell, D.B. Uniting Remote Sensing, Crop Modelling and Economics for Agricultural Risk Management. Nat. Rev. Earth Environ. 2021, 2, 140–159. [Google Scholar] [CrossRef]
  11. Ali, A.M.; Abouelghar, M.; Belal, A.A.; Saleh, N.; Yones, M.; Selim, A.I.; Amin, M.E.S.; Elwesemy, A.; Kucher, D.E.; Maginan, S.; et al. Crop Yield Prediction Using Multi Sensors Remote Sensing (Review Article). Egypt. J. Remote Sens. Space Sci. 2022, 25, 711–716. [Google Scholar] [CrossRef]
  12. Sun, Z.; Wang, X.; Wang, Z.; Yang, L.; Xie, Y.; Huang, Y. UAVs as Remote Sensing Platforms in Plant Ecology: Review of Applications and Challenges. J. Plant Ecol. 2021, 14, 1003–1023. [Google Scholar] [CrossRef]
  13. Yue, L.; Shen, H.; Li, J.; Yuan, Q.; Zhang, H.; Zhang, L. Image Super-Resolution: The Techniques, Applications, and Future. Signal Process. 2016, 128, 389–408. [Google Scholar] [CrossRef]
  14. Park, S.C.; Park, M.K.; Kang, M.G. Super-Resolution Image Reconstruction: A Technical Overview. IEEE Signal Process. Mag. 2003, 20, 21–36. [Google Scholar] [CrossRef]
  15. Jia, S.; Han, B.; Kutz, J.N. Example-Based Super-Resolution Fluorescence Microscopy. Sci. Rep. 2018, 8, 5700. [Google Scholar] [CrossRef] [PubMed]
  16. Freeman, W.T.; Jones, T.R.; Pasztor, E.C. Example-Based Super-Resolution. IEEE Comput. Graph. Appl. 2002, 22, 56–65. [Google Scholar] [CrossRef]
  17. Chang, H.; Yeung, D.-Y.; Xiong, Y. Super-Resolution through Neighbor Embedding. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, 27 June–2 July 2004; Volume 1, p. I. [Google Scholar]
  18. Chan, T.-M.; Zhang, J.; Pu, J.; Huang, H. Neighbor Embedding Based Super-Resolution Algorithm through Edge Detection and Feature Selection. Pattern Recognit. Lett. 2009, 30, 494–502. [Google Scholar] [CrossRef]
  19. Pan, L.; Peng, G.; Yan, W.; Zheng, H. Single Image Super Resolution Based on Multiscale Local Similarity and Neighbor Embedding. Neurocomputing 2016, 207, 250–263. [Google Scholar] [CrossRef]
  20. Zhao, J.; Chen, C.; Zhou, Z.; Cao, F. Single Image Super-Resolution Based on Adaptive Convolutional Sparse Coding and Convolutional Neural Networks. J. Vis. Commun. Image Represent. 2019, 58, 651–661. [Google Scholar] [CrossRef]
  21. Ha, V.K.; Ren, J.-C.; Xu, X.-Y.; Zhao, S.; Xie, G.; Masero, V.; Hussain, A. Deep Learning Based Single Image Super-Resolution: A Survey. Int. J. Autom. Comput. 2019, 16, 413–426. [Google Scholar] [CrossRef]
  22. Yang, J.; Wright, J.; Huang, T.S.; Ma, Y. Image Super-Resolution Via Sparse Representation. IEEE Trans. Image Process. 2010, 19, 2861–2873. [Google Scholar] [CrossRef]
  23. Wang, Z.; Chen, J.; Hoi, S.C.H. Deep Learning for Image Super-Resolution: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 3365–3387. [Google Scholar] [CrossRef]
  24. Dong, C.; Loy, C.C.; He, K.; Tang, X. Image Super-Resolution Using Deep Convolutional Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 295–307. [Google Scholar] [CrossRef] [PubMed]
  25. Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 105–114. [Google Scholar]
  26. Wang, X.; Yu, K.; Wu, S.; Gu, J.; Liu, Y.; Dong, C.; Qiao, Y.; Loy, C.C. ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. In Computer Vision—ECCV 2018 Workshops; Leal-Taixé, L., Roth, S., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2019; Volume 11133, pp. 63–79. ISBN 978-3-030-11020-8. [Google Scholar]
  27. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
  28. Yang, F.; Yang, H.; Fu, J.; Lu, H.; Guo, B. Learning Texture Transformer Network for Image Super-Resolution. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 5790–5799. [Google Scholar]
  29. Dahl, R.; Norouzi, M.; Shlens, J. Pixel Recursive Super Resolution. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar]
  30. Shang, T.; Dai, Q.; Zhu, S.; Yang, T.; Guo, Y. Perceptual Extreme Super Resolution Network with Receptive Field Block. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; pp. 1778–1787. [Google Scholar]
  31. Chan, K.C.K.; Wang, X.; Xu, X.; Gu, J.; Loy, C.C. GLEAN: Generative Latent Bank for Large-Factor Image Super-Resolution. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021. [Google Scholar]
  32. Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2020; Volume 33, pp. 6840–6851. [Google Scholar]
  33. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  34. Karras, T.; Aila, T.; Laine, S.; Lehtinen, J. Progressive Growing of GANs for Improved Quality, Stability, and Variation. arXiv 2018, arXiv:1710.10196v3. [Google Scholar]
  35. DJI M300 RTK. Available online: https://airborne.ed.ac.uk/airborne-research-and-innovation/unmanned-aircraft-systems-uas/unmanned-aircraft-systems-fleet/dji-m300-rtk (accessed on 28 February 2024).
  36. Matrice 300 RTK—Industrial Grade Mapping Inspection Drones—DJI Enterprise. Available online: https://enterprise.dji.com/matrice-300 (accessed on 4 March 2024).
  37. Pan, S.J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359. [Google Scholar] [CrossRef]
  38. Hall, C.A. Natural Cubic and Bicubic Spline Interpolation. SIAM J. Numer. Anal. 1973, 10, 1055–1060. [Google Scholar] [CrossRef]
  39. Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv 2016, arXiv:1603.04467. [Google Scholar]
  40. Tanchenko, A. Visual-PSNR Measure of Image Quality. J. Vis. Commun. Image Represent. 2014, 25, 874–878. [Google Scholar] [CrossRef]
  41. Horé, A.; Ziou, D. Image Quality Metrics: PSNR vs. SSIM. In Proceedings of the 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey, 23–26 August 2010; pp. 2366–2369. [Google Scholar]
  42. Bakurov, I.; Buzzelli, M.; Schettini, R.; Castelli, M.; Vanneschi, L. Structural Similarity Index (SSIM) Revisited: A Data-Driven Approach. Expert Syst. Appl. 2022, 189, 116087. [Google Scholar] [CrossRef]
  43. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef]
Figure 1. The overall structure of the Res-PGGAN.
Figure 2. A schematic of the study area.
Figure 3. A part of the Sentinel-2 and drone images selected to construct the real dataset.
Figure 4. The training image pairs of the Sentinel-2 and drone images.
Figure 5. Preparation of pre-trained datasets and selection of sample points.
Figure 6. Comparison of the rice station images: (a) original Sentinel-2 image; (b) PGGAN-reconstructed image; (c) Res-PGGAN-reconstructed image; and (d) original drone image.
Figure 7. Comparison of details reconstructed by different methods (from left to right: original Sentinel-2 image, bicubic reconstruction result, ESRGAN reconstruction result, PGGAN reconstruction result, Res-PGGAN reconstruction result, and original drone image).
Table 1. The specific parameters of the DJI M300 RTK.

Drone Parameters | Indicators
Size | Dimensions (unfolded, excluding propellers): 810 × 670 × 430 mm (L × W × H)
Maximum take-off weight | 9 kg
RTK position accuracy | In RTK FIX: 1 cm + 1 ppm (horizontal); 1.5 cm + 1 ppm (vertical)
Maximum flight altitude | 5000 m
Maximum flight time | 55 min
GNSS | GPS + GLONASS + BeiDou + Galileo
Operating ambient temperature | −20 °C to 50 °C
Maximum signal distance | NCC/FCC: 15 km; CE/MIC: 8 km; SRRC: 8 km
GSD/flight altitude (in this work) | 0.0163 cm/50.83 m
Table 2. The specific parameters of Sentinel-2.

Satellite Parameters | Indicators
Date of launch | 23 June 2015
Revisiting period | 5 days (Sentinel-2A & B)
Image resolution | Bands 2, 3, 4, 8: 10 m; Bands 5, 6, 7, 8a, 11, 12: 20 m; Bands 1, 9, 10: 60 m
Swath/field of view | 290 km/20.6°
Altitude | Sun-synchronous orbit (786 km)
Table 3. Experiment environment for this study.

Item | Configuration | Item | Version
OS | WSL2 Ubuntu 20.04 LTS | CUDA | 12.0
CPU | Intel Xeon Bronze 3204 | CuDNN | 8.2.4
GPU | NVIDIA Quadro P5000 | TensorFlow | 2.8.0
Python | 3.9.0 | OTB | 7.2
Table 4. The parameter settings for training the Res-PGGAN.

Parameter | Value | Parameter | Value
vgg type | vgg19 | batch size | 4
vgg weight | 0.0003 | adam | 0.0002
L1 weight | 200 | L2 weight | 0.0
LR scale | 0.0001 | depth | 64
HR scale | 0.0001 | epoch | 50
Table 5. The evaluation metrics of the reconstruction results of the four methods.

Method | SSIM | PSNR | UQI
Bicubic | 0.8990 | 41.1225 | −0.0086
ESRGAN | 0.9664 | 43.8705 | 0.0266
PGGAN | 0.9687 | 44.3229 | 0.0412
Res-PGGAN-BN | 0.9710 | 44.7174 | 0.0375
Res-PGGAN-PN | 0.9726 (±0.004) | 44.7971 (±0.003) | 0.0417 (±0.0003)
