Article

SRT: A Spectral Reconstruction Network for GF-1 PMS Data Based on Transformer and ResNet

Kai Mu, Ziyuan Zhang, Yurong Qian, Suhong Liu, Mengting Sun and Ranran Qi
1 The School of Software, Xinjiang University, Urumqi 830091, China
2 The Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi 830091, China
3 The Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830091, China
4 The Department of Geography, Beijing Normal University, Beijing 100875, China
5 The School of Geography and Remote Sensing Science, Xinjiang University, Urumqi 830049, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Remote Sens. 2022, 14(13), 3163; https://doi.org/10.3390/rs14133163
Submission received: 5 June 2022 / Revised: 24 June 2022 / Accepted: 28 June 2022 / Published: 1 July 2022

Abstract: The time needed to acquire remote sensing data was halved after the GaoFen-6 (GF-6) and GaoFen-1 (GF-1) satellites began operating jointly. Meanwhile, GF-6 added four bands, including the "red-edge" bands that effectively reflect the unique spectral characteristics of crops. However, GF-1 data do not contain these bands, which greatly limits their use in crop-related joint monitoring. In this paper, we propose a spectral reconstruction network (SRT) based on Transformer and ResNet to reconstruct the missing bands of GF-1. SRT is composed of three modules: (1) the Transformer feature extraction module (TFEM), which fully extracts the correlation features between spectra; (2) the residual dense module (RDM), which reconstructs local features and avoids the vanishing gradient problem; and (3) the residual global construction module (RGM), which reconstructs global features and preserves texture details. Compared with competing methods such as AWAN, HRNet, HSCNN-D, and M2H-Net, the proposed method achieved higher accuracy, with a mean relative absolute error (MRAE) and root mean squared error (RMSE) of 0.022 and 0.009, respectively. It also achieved the best accuracy in supervised classification based on the support vector machine (SVM) and spectral angle mapper (SAM).

1. Introduction

GF-6, successfully launched in 2018 as China's first medium/high-resolution agricultural observation satellite, operates jointly with GF-1, China's first high-resolution Earth observation satellite, launched in 2013. Their joint operation not only reduces the revisit time for remote sensing data acquisition from four days to two, but also significantly improves the ability to monitor agriculture, forestry, grassland, and other resources, providing remote sensing data support for agricultural and rural development, ecological civilization construction [1], and other major needs. GF-6 also carries a domestically produced 8-band CMOS detector and adds red-edge bands that effectively reflect the unique spectral characteristics of crops [2,3].
However, GF-1 was launched earlier with a different mission orientation, so it provides only four multispectral bands. As Table 1 shows, compared with GF-6, GF-1 lacks four bands (purple, yellow, red-edge I, and red-edge II), which greatly constrains joint crop-related monitoring. We therefore seek a spectral reconstruction method to recover the four missing bands.
In recent years, spectral reconstruction has mainly focused on recovering hyperspectral images from RGB or multispectral data. Earlier researchers adopted sparse dictionary methods [4,5,6,7,8,9]. With the development of deep learning, and owing to its excellent feature extraction and reconstruction capabilities, more and more researchers have adopted deep learning methods, gradually replacing the traditional sparse dictionary approach [10,11,12,13,14,15,16].
In addition, it should be pointed out that most studies on spectral reconstruction focus on images with three visible bands (red, green, and blue), whereas remote sensing images usually contain at least four bands (red, green, blue, and near-infrared). Ignoring the essential NIR band as an input leaves the original information underused. Some studies of remote sensing spectral reconstruction do consider this problem [15,16], but few address large-scale, highly complex scenarios such as satellite remote sensing; most have been carried out over relatively small areas [15]. Moreover, most deep learning methods designed for ground-level images rely heavily on up-sampling, down-sampling, and non-local attention structures. Because remote sensing images are large in scale and contain numerous, complex ground objects, these structures struggle to perform well in the spectral reconstruction of remote sensing images [16].
To better adapt to the spectral reconstruction of remote sensing images, we propose a spectral reconstruction network (SRT) tailored to GF-1 panchromatic and multispectral sensor (PMS) data, based on Transformer and ResNet. The network includes the TFEM, the RDM, and the RGM. The first module extracts the correlation characteristics between spectra. The second module reconstructs these features nonlinearly at the local level and avoids the vanishing gradient problem. The third module, mainly used for the global reconstruction of these features, prevents the loss of texture details. The main contributions of this article are summarized as follows:
  • We propose a spectral reconstruction network. The network trains on GF-6 wide field view (WFV) images to reconstruct the four lacking bands of GF-1 PMS images, which significantly increases the classification capability of GF-1.
  • We produce a large-scale dataset that covers a wide area and is rich in land types. It largely provides the ground-object information required for spectral reconstruction.
  • To evaluate the generalization ability of our model, we compare it with other models in terms of image similarity and classification accuracy, and find that our model achieves the best results.
The remaining part of this article is organized as follows: Section 2 describes the related works of spectral reconstruction methods. We present the network of SRT in Section 3. Section 4 presents our results, including the dataset description, the experimental part, and its analysis. Section 5 is the conclusion.

2. Related Works

Due to the limitations of hardware resources (bandwidth and sensors), researchers have had to make trade-offs among the temporal, spatial, and spectral dimensions of remote sensing images. To address the problem of low spectral dimensionality, researchers initially used principal component analysis (PCA) [17,18], Wiener estimation (WEN) [19], and the pseudo-inverse (PI) [20,21] to construct a spectral mapping matrix. In recent years, spectral reconstruction methods have divided into two branches: prior-driven and data-driven methods.
The first type is mainly based on sparse dictionary learning, which aims to extract the most important spectral mapping features. It can represent as much knowledge as possible with as few resources as possible, and this representation has the added benefit of being computationally fast. For example, Arad and Ben-Shahar [4] were the first to apply an overcomplete dictionary to recover hyperspectral images from RGB. Aeschbacher et al. [5] used the A+ algorithm to improve Arad's sparse dictionary approach; A+ directly constructs the mapping from RGB to hyperspectral data at local anchor points, which significantly improves the running speed. However, the sparse dictionary method only considers the sparsity of spectral information and does not exploit local linearity, so the reconstruction is inaccurate and the reconstructed image suffers from metamerism [22]. Li et al. [7] proposed a locally linear embedded sparse coding method to improve the representation ability of sparse coding: it selects only the locally best samples and introduces texture information into the reconstruction, reducing metamerism. Geng et al. [8] proposed a spectral reconstruction method that preserves contextual information. Gao et al. [9] performed spectral enhancement of multispectral images by jointly learning low-rank dictionary pairs from overlapping regions.
The second type is based on deep learning. Thanks to its powerful generalization ability, a large number of excellent deep learning models have gradually replaced the first type of method. Compared with dictionary-based approaches, deep learning usually requires enormous amounts of data and considerable training time; however, as computing power has grown, deep learning has become much more effective and is used by more and more researchers. Xiong et al. [10] proposed a deep learning framework for recovering spectral information from spectrally undersampled images. Koundinya et al. [12] compared 2D and 3D kernel-based CNNs for spectral reconstruction. Alvarez-Gila et al. [11] posed spectral reconstruction as an image-to-image mapping problem and proposed a generative adversarial network for spatial context-aware spectral image reconstruction. In the first NTIRE spectral reconstruction challenge in 2018 [23], the entries of Shi et al. [13] ranked first (HSCNN-D) and second (HSCNN-R) on both the "Clean" and "Real World" tracks. The main difference between the two networks is that the former fuses features by concatenation (a series scheme) while the latter uses addition; the series scheme learns the mapping relationship between spectra very well. Considering shallow and deep feature extraction separately, Li et al. [24] proposed an adaptive weighted attention network (AWAN), which ranked first on the "Clean" track. Zhao et al. [14] proposed a hierarchical regression network (HRNet) that won first place on the "Real World" track; it is a four-level multi-scale structure that uses down-sampling and up-sampling to extract spectral features. For remote sensing images, Deng et al. [15] proposed M2H-Net, a network better suited to the multiple bands and complex scenes of remote sensing data. Li and Gu [16] proposed a progressive spatial–spectral joint network for hyperspectral image reconstruction.

3. Proposed Method

3.1. SRT Architecture

Figure 1 shows the architecture of SRT. During training, the model takes the red, green, blue, and nir bands of GF-6 WFV as input, and the remaining purple, yellow, red-edge I, and red-edge II bands are used as labels. The overall structure includes the TFEM, the RDM, the RGM, convolution operations, and other related operations.
The whole SRT is an end-to-end structure that can be divided into three parts (a code sketch of the pipeline follows this list):
  • The TFEM extracts the correlation between spectra through a self-attention mechanism.
  • The RDM fully learns and reconstructs these features at the local level and prevents vanishing gradients during training.
  • The RGM reconstructs the global features. Because the model is ultimately applied to GF-1 PMS (8 m) images, whose spatial resolution is double that of the GF-6 WFV (16 m) training images, this module prevents the loss of texture details during training and inference.
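As a concrete illustration of this three-part pipeline, the sketch below chains the modules in PyTorch-style code; the shallow head/tail convolutions, the 64-channel width, and the module placeholders are illustrative assumptions, not the authors' exact PaddlePaddle implementation.

```python
import torch
import torch.nn as nn

class SRTSketch(nn.Module):
    """Illustrative skeleton: 4 input bands (R, G, B, NIR) -> 4 reconstructed bands."""
    def __init__(self, tfem=None, rdm=None, rgm=None, in_bands=4, out_bands=4, feats=64):
        super().__init__()
        self.head = nn.Conv2d(in_bands, feats, 3, padding=1)   # shallow feature extraction (assumed)
        self.tfem = tfem if tfem is not None else nn.Identity()  # Transformer feature extraction module
        self.rdm = rdm if rdm is not None else nn.Identity()     # residual dense module (local reconstruction)
        self.rgm = rgm if rgm is not None else nn.Identity()     # residual global module (global reconstruction)
        self.tail = nn.Conv2d(feats, out_bands, 3, padding=1)  # map features to the 4 missing bands

    def forward(self, x):
        f = self.head(x)
        f = self.tfem(f)        # inter-band correlation via self-attention
        f = self.rdm(f)         # local feature reconstruction with residual/dense connections
        f = self.rgm(f)         # global reconstruction, preserving texture details
        return self.tail(f)

# Example: reconstruct the four missing bands from a batch of 4-band patches
y = SRTSketch()(torch.randn(2, 4, 128, 128))   # -> (2, 4, 128, 128)
```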

3.2. TFEM

Google first proposed the Transformer architecture in June 2017 [25], and its impact on the whole natural language processing (NLP) field has been tremendous: in just four years, Transformer became the dominant model in NLP [26]. Since 2020, it has also begun to shine in computer vision (CV): image classification (ViT [27], DeiT [28]), object detection (DETR [29], Deformable DETR [30]), semantic segmentation (SETR [31], MedT [32]), image generation (GANsformer [33]), and so on. He et al. [34] showed that masked autoencoders (MAE) are scalable self-supervised learners for CV, and Transformer shone in vision once again. Inspired by these developments, we use Transformer as the feature extraction backbone of SRT to fully extract the correlated features between spectra with the help of its attention mechanism. The architecture of the TFEM is shown in Figure 2.
Following ViT [27], we divide the remote sensing images into multiple small patches and serialize each patch through a linear projection of flattened patches, turning a vision problem into an NLP-style sequence problem. The module adds learnable position embedding parameters to retain the spatial location of the input patches. The Transformer encoder then extracts spectral features from the input sequences with its multi-head attention mechanism. In our experiment, since the Transformer is used only for feature extraction, we remove the learnable classification embedding used in ViT and replace the MLP head with a transposed convolution (ConvTranspose) so that the model maps back to the required dimensions.
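The following is a minimal sketch of such a patch-based Transformer feature extractor, written with standard PyTorch layers; the patch size, embedding width, depth, and head count are assumed values, and the strided convolution is simply one common way to express the "linear projection of flattened patches".

```python
import torch
import torch.nn as nn

class TFEM(nn.Module):
    """ViT-style feature extractor sketch: patchify, add learnable position
    embeddings, run a Transformer encoder, then map the tokens back to a
    feature map with a transposed convolution (no class token, no MLP head)."""
    def __init__(self, in_ch=64, dim=256, patch=8, img=128, depth=4, heads=8):
        super().__init__()
        self.patch = patch
        n_patches = (img // patch) ** 2
        # "linear projection of flattened patches" expressed as a strided conv
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))  # learnable position embedding
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # ConvTranspose replaces the ViT MLP head and restores the spatial size
        self.unembed = nn.ConvTranspose2d(dim, in_ch, kernel_size=patch, stride=patch)

    def forward(self, x):
        b, _, h, w = x.shape
        tokens = self.embed(x).flatten(2).transpose(1, 2)   # (B, N, dim)
        tokens = self.encoder(tokens + self.pos)             # self-attention over patches
        grid = tokens.transpose(1, 2).reshape(b, -1, h // self.patch, w // self.patch)
        return self.unembed(grid)                             # (B, in_ch, H, W)
```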

3.3. RDM

He et al. [35] proposed a residual learning framework (ResNet) to ease the training of networks that are substantially deeper than those used previously. Building on ResNet, DenseNet [36] connects each layer to all previous layers; it is a network framework that enriches the line of CNN architectures running from LeNet [37] to the present. Connecting all layers ensures maximum exchange of spectral information flow in the network. DenseNet also requires fewer parameters for the same performance or the same number of layers, because each layer has a direct connection to all previous layers and does not have to relearn features that have already been learned.
The RDM contains four residual dense blocks, as shown in Figure 3, and a long skip connection is added across the module to prevent the vanishing gradient problem in the network. Combining residual and dense connections allows the spectral reconstruction model to alleviate vanishing gradients during training and to produce more accurate results.
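A minimal sketch of a residual dense block and the surrounding RDM wrapper is shown below in PyTorch idiom; the growth rate, number of inner convolutions, and channel counts are assumptions, since the paper specifies only the four blocks and the long skip connection.

```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    """One residual dense block: each conv sees the concatenation of all
    previous feature maps, and a local residual is added at the end."""
    def __init__(self, channels=64, growth=32, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(channels + i * growth, growth, 3, padding=1),
                nn.LeakyReLU(0.2, inplace=True)))
        # 1x1 conv fuses the densely connected features back to `channels`
        self.fuse = nn.Conv2d(channels + n_layers * growth, channels, 1)

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))  # dense connections
        return x + self.fuse(torch.cat(feats, dim=1))      # local residual

class RDM(nn.Module):
    """Four residual dense blocks with a long skip connection."""
    def __init__(self, channels=64):
        super().__init__()
        self.blocks = nn.Sequential(*[ResidualDenseBlock(channels) for _ in range(4)])

    def forward(self, x):
        return x + self.blocks(x)  # long skip connection
```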

3.4. RGM

The RGM draws on SE-ResNet [38] and HRNet [14] and is shown in Figure 4. Average pooling biases the features toward the overall characteristics of the image and prevents the loss of too much high-dimensional information. The final convolution layer maps the channel number, and the global residual preserves spatial details across images of different spatial resolutions.
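The sketch below shows an SE-style global block of this kind: global average pooling produces channel weights, a final convolution remaps the channels, and a global residual preserves spatial detail. The reduction ratio and channel count are assumptions; this illustrates the idea rather than the exact RGM.

```python
import torch
import torch.nn as nn

class RGM(nn.Module):
    """Global reconstruction sketch in the spirit of SE-ResNet."""
    def __init__(self, channels=64, reduction=16):
        super().__init__()
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # global average pooling -> (B, C, 1, 1)
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid())
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)  # channel mapping

    def forward(self, x):
        w = self.se(x)                 # per-channel weights from global statistics
        return x + self.conv(x * w)    # global residual keeps texture details
```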

3.5. Loss Function

We use the mean relative absolute error (MRAE, Equation (1)) as the loss function because the reflectance of the same ground object varies greatly across bands. MRAE replaces the absolute difference used in the mean squared error (MSE, Equation (2)) with a relative absolute error, so the error is adaptively scaled for each band. This effectively reduces the large errors caused by differing reflectance and reflects the accuracy of the reconstruction network more intuitively. On the validation set, we measure model quality with the peak signal-to-noise ratio (PSNR [39], Equation (3)) and save the best model.
$$\mathrm{MRAE}=\frac{1}{n}\sum_{i=1}^{n}\frac{\left|P_{gt}^{i}-P_{rec}^{i}\right|}{P_{gt}^{i}} \quad (1)$$
$$\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}\left(P_{gt}^{i}-P_{rec}^{i}\right)^{2} \quad (2)$$
where $P_{gt}^{i}$ is the gray-scale value of the ith pixel in the reference image, $P_{rec}^{i}$ is the reconstructed gray-scale value of the ith pixel, and $n$ is the number of pixels in the image.
$$\mathrm{PSNR}=20\cdot\log_{10}\!\left(\frac{MAX_{I}}{\sqrt{\mathrm{MSE}}}\right) \quad (3)$$
where $MAX_{I}$ is the maximum gray-scale value. All data in this experiment are normalized, so $MAX_{I}$ is 1.
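For reference, the three metrics above translate directly into NumPy as follows (the small eps guard against division by zero is an added safeguard, not part of the paper's formulas):

```python
import numpy as np

def mrae(gt, rec, eps=1e-8):
    """Mean relative absolute error (Equation (1))."""
    return np.mean(np.abs(gt - rec) / (gt + eps))

def mse(gt, rec):
    """Mean squared error (Equation (2))."""
    return np.mean((gt - rec) ** 2)

def psnr(gt, rec, max_i=1.0):
    """Peak signal-to-noise ratio (Equation (3)); data are normalized, so MAX_I = 1."""
    return 20.0 * np.log10(max_i / np.sqrt(mse(gt, rec)))
```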

3.6. Network Training and Parameter Settings

The parameters of the Transformer encoder are kept at their default settings, and the network hyperparameters are set according to Table 2. Every convolution kernel in the network is 3 × 3. We use the Adam optimizer.
The computer configuration in this study is as follows: the CPU is an Intel(R) Xeon(R) Gold 6148, the GPU is a Tesla V100 with 16 GB of memory, and the RAM is 16 GB. Paddle 2.2 was chosen as the development framework.
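Combining Table 2 with the loss described in Section 3.5, a training loop might look like the sketch below. It is written with the equivalent PyTorch calls for illustration (the authors used PaddlePaddle 2.2), and `model`, `train_loader`, and `mrae_loss` are assumed to be defined elsewhere.

```python
import torch

# Illustrative training setup matching Table 2.
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2000, gamma=0.1)

for epoch in range(200):                       # 200 epochs
    for rgbn, target in train_loader:          # batches of 32 patches, 4 input bands
        optimizer.zero_grad()
        loss = mrae_loss(model(rgbn), target)  # MRAE loss (Equation (1))
        loss.backward()
        optimizer.step()
        scheduler.step()                       # decay by 0.1 every 2000 steps
```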

4. Experiments

The experiments evaluate the quality of the spectral reconstruction in terms of accuracy and classification. AWAN, HRNet, HSCNN-D, and the remote sensing reconstruction network M2H-Net are the four strong methods selected for comparison with our models SRT and SRT*; the first three are champion methods of the NTIRE spectral reconstruction challenges. SRT* removes the RGM from SRT to test the effect of that module.

4.1. Dataset Description

We use image scenes from GF-1 PMS and GF-6 WFV. The data acquisition for the study areas is shown in Figure 5.
We select nine GF-6 WFV images to form the dataset: six for training and three for testing. The dataset covers a wide range of land types and provides sufficient feature information for the spectral reconstruction of GF-1 PMS. We randomly divide the training images into 13,500 overlapping patches of 128 × 128 pixels, using 90% of them for training and the rest for validation. The test images are divided into 2000 overlapping patches of 128 × 128 pixels.
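A simple way to produce such overlapping patches is sketched below; the stride (and therefore the exact amount of overlap) and the dummy image size are assumptions, since the paper states only the patch size, the total patch counts, and the 90/10 split.

```python
import random
import numpy as np

def extract_patches(image, patch=128, stride=64, n_max=None, seed=0):
    """Cut an (H, W, C) array into overlapping patch x patch tiles."""
    h, w, _ = image.shape
    tiles = [image[i:i + patch, j:j + patch]
             for i in range(0, h - patch + 1, stride)
             for j in range(0, w - patch + 1, stride)]
    random.Random(seed).shuffle(tiles)
    return tiles[:n_max] if n_max else tiles

# Dummy 8-band scene stands in for a GF-6 WFV image; 90% of the patches go to
# training and the remaining 10% to validation, as described above.
tiles = extract_patches(np.zeros((2048, 2048, 8), dtype=np.float32), n_max=2250)
split = int(0.9 * len(tiles))
train_patches, val_patches = tiles[:split], tiles[split:]
```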
The image shown in Area1 is of the Songhua River, located in Yilan, Heilongjiang. It is a cropped GF-6 WFV test image that contains abundant information on water, vegetation, trees, and so on. Its size is 2275 × 2174 pixels.
Area2, imaged by GF-1 on 11 April 2016, is located in Tengzhou, Shandong, and contains ample information on buildings, vegetation, and roads. Its image size is 2500 × 2322 pixels.
Area3, imaged by GF-1 on 21 June 2018, is located in Nenjiang, Heilongjiang, and contains rich vegetation, bare land, and trees. Its image size is 3254 × 3145 pixels.
The preprocessing of the GF-1 PMS and GF-6 WFV images includes radiometric correction and atmospheric correction in ENVI 5.3. The parameters for these corrections are obtained from the China Centre for Resources Satellite Data and Application [40].
Table 3 lists the number of training and testing pixels used for classification in the three areas. Each area is manually annotated into six classes in ENVI 5.3 software (Exelis Inc., Boulder, CO, USA) to test the classification ability of the reconstructed images, as shown in Figure 6.

4.2. Evaluation Metrics

We use five indicators to evaluate the different methods, including RMSE, MRAE (Equation (1)), PSNR (Equation (3)), spectral angle mapper (SAM [41]), and structural similarity (SSIM [42]). The formulas of RMSE, SAM, and SSIM are given as follows:
$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_{gt}^{i}-P_{rec}^{i}\right)^{2}}$$
$$\mathrm{SAM}=\cos^{-1}\!\left(\frac{P_{rec}^{T}P_{gt}}{\left(P_{rec}^{T}P_{rec}\right)^{1/2}\left(P_{gt}^{T}P_{gt}\right)^{1/2}}\right)$$
where $P_{gt}^{i}$ is the gray-scale value of the ith pixel in the reference image, $P_{rec}^{i}$ is the reconstructed gray-scale value of the ith pixel, and $n$ is the number of pixels in the image.
$$\mathrm{SSIM}(gt,rec)=\frac{\left(2\mu_{gt}\mu_{rec}+C_{1}\right)\left(2\sigma_{gt\,rec}+C_{2}\right)}{\left(\mu_{gt}^{2}+\mu_{rec}^{2}+C_{1}\right)\left(\sigma_{gt}^{2}+\sigma_{rec}^{2}+C_{2}\right)}$$
where $\mu_{gt}$ is the mean of the reference image, $\mu_{rec}$ is the mean of the reconstructed image, $\sigma_{gt\,rec}$ is the covariance of the reference and reconstructed images, $\sigma_{gt}$ and $\sigma_{rec}$ are their standard deviations, and $C_{1}=(k_{1}L)^{2}$ and $C_{2}=(k_{2}L)^{2}$ are constants used to maintain stability; $L$ is the dynamic range of the pixel values, $k_{1}$ is set to 0.01, and $k_{2}$ to 0.03.
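For reference, straightforward NumPy versions of RMSE, SAM, and a simplified (single-window) SSIM are given below; standard SSIM implementations compute the statistics over local windows, so this global form is only an approximation for illustration.

```python
import numpy as np

def rmse(gt, rec):
    """Root mean squared error over all pixels."""
    return np.sqrt(np.mean((gt - rec) ** 2))

def sam(gt, rec, eps=1e-8):
    """Mean spectral angle (radians); gt, rec have shape (n_pixels, n_bands)."""
    dot = np.sum(gt * rec, axis=1)
    norm = np.linalg.norm(gt, axis=1) * np.linalg.norm(rec, axis=1)
    return np.mean(np.arccos(np.clip(dot / (norm + eps), -1.0, 1.0)))

def ssim(gt, rec, k1=0.01, k2=0.03, L=1.0):
    """Global SSIM over normalized images (single-window simplification)."""
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mu_g, mu_r = gt.mean(), rec.mean()
    var_g, var_r = gt.var(), rec.var()
    cov = np.mean((gt - mu_g) * (rec - mu_r))
    return ((2 * mu_g * mu_r + c1) * (2 * cov + c2)) / \
           ((mu_g ** 2 + mu_r ** 2 + c1) * (var_g + var_r + c2))
```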
Classification is an essential application of remote sensing images, and we use SVM and SAM classification to test the classification performance of the images. SVM solves linear and non-linear classification problems well, needs only a few support vectors to determine the decision surface, and is not sensitive to the number of samples or the spectral dimensionality. SAM measures the similarity between spectra by treating both spectra as vectors and calculating the spectral angle between them, so it is sensitive to the samples and the spectral dimensionality.
For the GF-1 PMS test images, the four generated bands have no reference data, so the above indicators cannot be computed and only classification accuracy is evaluated. The assessment steps are as follows: first, the original image is input to the model after radiometric calibration and atmospheric correction; then, the outputs are classified with the SVM and SAM methods; finally, the overall accuracy (OA), kappa coefficient (Kappa), and per-class accuracy of all methods are compared.
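As an illustration of this evaluation protocol, the sketch below trains a pixel-wise SVM and reports OA and Kappa; the paper performs the SVM and SAM classification in ENVI 5.3, so scikit-learn is used here only as a stand-in.

```python
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, cohen_kappa_score

def evaluate_svm(X_train, y_train, X_test, y_test):
    """Pixel-wise SVM evaluation.
    X_*: (n_pixels, n_bands) spectra from the 8-band reconstructed image;
    y_*: manually annotated class labels for those pixels."""
    clf = SVC(kernel="rbf").fit(X_train, y_train)
    pred = clf.predict(X_test)
    oa = accuracy_score(y_test, pred)          # overall accuracy (OA)
    kappa = cohen_kappa_score(y_test, pred)    # Kappa coefficient
    return oa, kappa
```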

4.3. Similarity-Based Evaluation

Table 4 shows the accuracy assessment of the reconstructed GF-6 WFV images on the dataset. Overall, the PSNR and SSIM of the four bands are all high, no less than 38.92 and 0.970, respectively. Similarly, MRAE, SAM, and RMSE are all relatively low, indicating that the overall accuracy of the reconstruction is high.
Among the six methods, the results of AWAN, HSCNN-D, and M2HNet are similar, while HRNet, SRT, and SRT* are much better than the other three in PSNR, MRAE, and SAM. SRT outperforms HRNet on the dataset, demonstrating that our TFEM outperforms the multi-scale feature extraction of HRNet. In addition, SRT*, which lacks the RGM, is slightly worse than SRT on some indicators but still holds an advantage over the other methods.
The scatter plots in Figure 7 show that the inference results for bands 5 and 6 have larger scatter regions than those for bands 7 and 8, indicating a weaker correlation in their reconstruction. This is also reflected in the PSNR metric in Table 4: the larger the PSNR, the smaller the scatter region and the stronger the correlation between the predicted band and the original one. The PSNR of band 7 in Table 4 is the highest, and the scatter region of band 7 in Figure 7 is the smallest, so the reconstruction accuracy of band 7 is the best. The scatter plots also show that the reconstruction accuracy differs from band to band; compared with RMSE, using MRAE as the loss function avoids letting the band with the largest errors dominate the training.

4.4. Classification-Based Evaluation

For the GF-6 WFV images, we compute confusion matrices from the classification results of the original image and the reconstructed ones. Table 5 shows the evaluation results of the SVM classification. Among the reconstruction methods, both the OA and Kappa coefficients of SRT are the highest, 3.3% and 4.2% higher than those of AWAN, respectively. For the vegetation class, the SRT result is 6.3% higher than that of M2HNet. In Figure 8, we can see that the water classification result of M2HNet differs significantly from the reference image.
Table 6 shows the evaluation results of the SAM classification, and the SRT results are still the best: its OA and Kappa coefficients differ from those of the original image classification by only 0.5% and 0.24%. This indicates that the spectral reconstruction capability of SRT is the best among the compared methods.
For the GF-1 PMS images, the classification results of the reconstructed images should exceed those of the original GF-1 data (8 m spatial resolution, four bands). Table 7 shows the accuracy metrics for SVM classification in Area2. Most methods improve the classification metrics, with SRT improving OA and Kappa by 2.1% and 4.3%, respectively. Except for the tree and road classes, the classification accuracy of SRT is higher than that of the original GF-1 PMS for all classes.
Table 8 shows the evaluation results for the SAM classification, where all methods still score higher than the original image and SRT remains the best. Additionally, the results in Figure 9 show that the accuracy of SVM is higher than that of SAM, especially for urban scenes.
Table 9 shows the classification accuracy for Area3. Compared with the GF-1 image classification results, SRT improves OA and Kappa by 2.41% and 2.0%, respectively, and most class accuracies are better than before. Except for water and bare land, the classification accuracy of SRT is higher than that of the other methods for all classes. As shown in Table 10, SRT also remains the highest under SAM classification. However, the SAM classification accuracy of all methods in Area3 is much lower than that of SVM: the original image's OA and Kappa coefficients under SAM are lower than under SVM by as much as 8.8% and 16.7%, respectively. Figure 10 likewise shows the difference between the SVM and SAM results. SAM does not classify the built-up area well; it misclassifies a small part of the bare land as water and labels other patches of bare land as tree. This large difference likely results from the low spectral dimensionality: because the SAM method is more sensitive to the spectrum, its classification accuracy is lower.
Table 4, Table 5, Table 6, Table 7, Table 8, Table 9 and Table 10 show that both SRT and SRT* outperform the other methods in overall accuracy, indicating that the TFEM has a significant advantage in spectral feature extraction. The SRT results remain the best under both SVM and SAM. Comparing SRT and SRT* shows that SRT needs the RGM to avoid losing detail during GF-1 PMS image inference. In addition, with the same samples, the SAM classification results are lower than those of SVM; we attribute this mainly to the small number of image bands compared with hyperspectral images, which cannot bring out the full performance of SAM.
Our method has a robust spectral reconstruction capability, and the reconstructed bands can improve the classification capability of GF-1 PMS images.

4.5. Comparison of Computational Cost

Table 11 shows the parameters, GFLOPs (giga floating-point operations), and running time of all tested methods on an input image of 4 × 128 × 128 pixels. Comparing SRT and SRT*, the RGM adds only 0.08 M parameters, while GFLOPs and running time increase by 1.21 and 0.02 s, respectively. In addition, SRT has more parameters than only HSCNN-D and fewer than the other three methods. Although HSCNN-D has few parameters, its running time is very long, far above the 0.27 s of SRT, mainly because its serial structure deepens the network and makes the forward pass much slower.
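A rough way to reproduce the parameter and timing columns of such a comparison is sketched below; the GFLOPs column would require a dedicated FLOP-counting tool, which is not shown, and the timing here is single-run wall-clock time rather than a careful benchmark.

```python
import time
import torch

def profile(model, shape=(1, 4, 128, 128)):
    """Parameter count (in millions) and forward-pass time on a 4 x 128 x 128 input."""
    n_params = sum(p.numel() for p in model.parameters()) / 1e6
    x = torch.randn(shape)
    start = time.time()
    with torch.no_grad():
        model(x)
    return n_params, time.time() - start
```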

5. Conclusions

This article proposes a Transformer- and ResNet-based network (SRT) to reconstruct the missing bands of GF-1 PMS images using knowledge learned from GF-6 WFV. SRT consists of three parts: the TFEM, the RDM, and the RGM. The TFEM learns the correlation between spectra through the attention mechanism; the RDM reconstructs the relevant features locally; and the RGM reconstructs them globally.
To ensure the model's generalization, we build a wide-ranging, land-type-rich band-mapping dataset and evaluate accuracy in terms of both similarity and classification. Meanwhile, to verify whether the knowledge learned from GF-6 WFV images can be applied to GF-1 PMS images with a different spatial resolution, we follow the approach of Deng [15] and Li [16]: we expect the reconstructed bands to improve the classification ability of the original image and test this on the Area2 (mainly urban) and Area3 (mainly farmland) GF-1 PMS images. The results show that, compared with other spectral reconstruction methods, SRT performs well on both the testing set and the classification accuracy of Area1, Area2, and Area3. The classification accuracy of the reconstructed 8-band images is significantly higher than that of the original 4-band GF-1 PMS images.
In future work, our method can still be expanded and improved in the following respects: (1) The structure of the model needs further improvement; although SRT has fewer parameters, its inference time increases slightly. (2) Can it be extended to other satellites, such as GaoFen-2 and GaoFen-4?

Author Contributions

Conceptualization, K.M. and Z.Z.; methodology, K.M., Y.Q. and S.L.; software, K.M. and Z.Z.; validation, Z.Z., M.S. and K.M.; formal analysis, K.M. and Z.Z.; writing—original draft, K.M.; writing—review and editing, Z.Z., Y.Q. and R.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (61966035), the National Science Foundation of China under Grant (U1803261), the Xinjiang Uygur Autonomous Region Innovation Team (XJEDU2017T002), and the Autonomous Region Graduate Innovation Project (XJ2019G069, XJ2021G062 and XJ2020G074).

Data Availability Statement

The data were all downloaded from the China Centre for Resources Satellite Data and Application [40]. The data information is listed in Table A1.

Acknowledgments

The authors would like to thank all of the reviewers for their valuable contributions to our work.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Data acquisition for the study areas.
Application | Satellite | Sensor | Acquisition Date | Location
Train | GF-6 | WFV | 10 October 2018 | 85.8 E, 44.6 N
Train | GF-6 | WFV | 4 September 2018 | 100.5 E, 31.3 N
Train | GF-6 | WFV | 11 October 2018 | 102.5 E, 40.2 N
Train | GF-6 | WFV | 5 October 2018 | 110.1 E, 26.9 N
Train | GF-6 | WFV | 29 October 2018 | 114.8 E, 31.3 N
Train | GF-6 | WFV | 18 September 2018 | 118.6 E, 42.4 N
Test | GF-6 | WFV | 1 October 2018 | 88.8 E, 40.2 N
Test | GF-6 | WFV | 17 October 2018 | 114.9 E, 38.0 N
Test | GF-6 | WFV | 16 September 2018 | 129.9 E, 46.8 N
Area1 | GF-6 | WFV | 16 September 2018 | 129.9 E, 46.8 N
Area2 | GF-1 | PMS1 | 4 November 2016 | 125.3 E, 48.8 N
Area3 | GF-1 | PMS2 | 21 June 2018 | 117.2 E, 35.2 N

References

  1. Wu, Z.; Zhang, J.; Deng, F.; Zhang, S.; Zhang, D.; Xun, L.; Javed, T.; Liu, G.; Liu, D.; Ji, M. Fusion of GF and MODIS Data for Regional-Scale Grassland Community Classification with EVI2 Time-Series and Phenological Features. Remote Sens. 2021, 13, 835. [Google Scholar] [CrossRef]
  2. Jiang, X.; Fang, S.; Huang, X.; Liu, Y.; Guo, L. Rice Mapping and Growth Monitoring Based on Time Series GF-6 Images and Red-Edge Bands. Remote Sens. 2021, 13, 579. [Google Scholar] [CrossRef]
  3. Kang, Y.; Hu, X.; Meng, Q.; Zou, Y.; Zhang, L.; Liu, M.; Zhao, M. Land Cover and Crop Classification Based on Red Edge Indices Features of GF-6 WFV Time Series Data. Remote Sens. 2021, 13, 4522. [Google Scholar] [CrossRef]
  4. Arad, B.; Ben-Shahar, O. Sparse recovery of hyperspectral signal from natural RGB images. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 19–34. [Google Scholar]
  5. Aeschbacher, J.; Wu, J.; Timofte, R. In defense of shallow learned spectral reconstruction from RGB images. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy, 22–29 October 2017; pp. 471–479. [Google Scholar]
  6. Fu, Y.; Zheng, Y.; Zhang, L.; Huang, H. Spectral Reflectance Recovery From a Single RGB Image. IEEE Trans. Comput. Imaging 2018, 4, 382–394. [Google Scholar] [CrossRef]
  7. Li, Y.; Wang, C.; Zhao, J. Locally Linear Embedded Sparse Coding for Spectral Reconstruction From RGB Images. IEEE Signal Process. Lett. 2018, 25, 363–367. [Google Scholar] [CrossRef]
  8. Geng, Y.; Mei, S.; Tian, J.; Zhang, Y.; Du, Q. Spatial Constrained Hyperspectral Reconstruction from RGB Inputs Using Dictionary Representation. In Proceedings of the IGARSS 2019–2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 3169–3172. [Google Scholar] [CrossRef]
  9. Gao, L.; Hong, D.; Yao, J.; Zhang, B.; Gamba, P.; Chanussot, J. Spectral superresolution of multispectral imagery with joint sparse and low-rank learning. IEEE Trans. Geosci. Remote Sens. 2020, 59, 2269–2280. [Google Scholar] [CrossRef]
  10. Xiong, Z.; Shi, Z.; Li, H.; Wang, L.; Liu, D.; Wu, F. Hscnn: Cnn-based hyperspectral image recovery from spectrally undersampled projections. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy, 22–29 October 2017; pp. 518–525. [Google Scholar]
  11. Alvarez-Gila, A.; Van De Weijer, J.; Garrote, E. Adversarial networks for spatial context-aware spectral image reconstruction from rgb. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy, 22–29 October 2017; pp. 480–490. [Google Scholar]
  12. Koundinya, S.; Sharma, H.; Sharma, M.; Upadhyay, A.; Manekar, R.; Mukhopadhyay, R.; Karmakar, A.; Chaudhury, S. 2D-3D CNN based architectures for spectral reconstruction from RGB images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–23 June 2018; pp. 844–851. [Google Scholar]
  13. Shi, Z.; Chen, C.; Xiong, Z.; Liu, D.; Wu, F. Hscnn+: Advanced cnn-based hyperspectral recovery from rgb images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–23 June 2018; pp. 939–947. [Google Scholar]
  14. Zhao, Y.; Po, L.M.; Yan, Q.; Liu, W.; Lin, T. Hierarchical regression network for spectral reconstruction from RGB images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 422–423. [Google Scholar]
  15. Deng, L.; Sun, J.; Chen, Y.; Lu, H.; Duan, F.; Zhu, L.; Fan, T. M2H-Net: A Reconstruction Method For Hyperspectral Remotely Sensed Imagery. ISPRS J. Photogramm. Remote Sens. 2021, 173, 323–348. [Google Scholar] [CrossRef]
  16. Li, T.; Gu, Y. Progressive Spatial–Spectral Joint Network for Hyperspectral Image Reconstruction. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–14. [Google Scholar] [CrossRef]
  17. Zhang, X.; Xu, H. Reconstructing spectral reflectance by dividing spectral space and extending the principal components in principal component analysis. J. Opt. Soc. Am. A 2008, 25, 371–378. [Google Scholar] [CrossRef] [PubMed]
  18. Liu, X.; Liu, L. Improving chlorophyll fluorescence retrieval using reflectance reconstruction based on principal components analysis. IEEE Geosci. Remote Sens. Lett. 2015, 12, 1645–1649. [Google Scholar]
  19. Haneishi, H.; Hasegawa, T.; Hosoi, A.; Yokoyama, Y.; Tsumura, N.; Miyake, Y. System design for accurately estimating the spectral reflectance of art paintings. Appl. Opt. 2000, 39, 6621–6632. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  20. Imai, F.H.; Berns, R.S. Spectral estimation using trichromatic digital cameras. In Proceedings of the International Symposium on Multispectral Imaging and Color Reproduction for Digital Archives, Chiba, Japan, 21–22 October 1999; Volume 42, pp. 1–8. [Google Scholar]
  21. Cheung, V.; Westland, S.; Li, C.; Hardeberg, J.; Connah, D. Characterization of trichromatic color cameras by using a new multispectral imaging technique. JOSA A 2005, 22, 1231–1240. [Google Scholar] [CrossRef] [PubMed]
  22. Zhang, J.; Su, R.; Ren, W.; Fu, Q.; Nie, Y. Learnable Reconstruction Methods from RGB Images to Hyperspectral Imaging: A Survey. arXiv 2021, arXiv:2106.15944. [Google Scholar]
  23. Arad, B.; Ben-Shahar, O.; Timofte, R.; Gool, L.V.; Yang, M.H. NTIRE 2018 Challenge on Spectral Reconstruction from RGB Images. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  24. Li, J.; Wu, C.; Song, R.; Li, Y.; Liu, F. Adaptive weighted attention network with camera spectral sensitivity prior for spectral reconstruction from RGB images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 462–463. [Google Scholar]
  25. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  26. Lin, T.; Wang, Y.; Liu, X.; Qiu, X. A survey of transformers. arXiv 2021, arXiv:2106.04554. [Google Scholar]
  27. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  28. Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers & distillation through attention. In Proceedings of the International Conference on Machine Learning (PMLR), Virtual Event, 13–14 August 2021; pp. 10347–10357. [Google Scholar]
  29. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Seattle, WA, USA, 14–19 June 2020; pp. 213–229. [Google Scholar]
  30. Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable detr: Deformable transformers for end-to-end object detection. arXiv 2020, arXiv:2010.04159. [Google Scholar]
  31. Zheng, S.; Lu, J.; Zhao, H.; Zhu, X.; Luo, Z.; Wang, Y.; Fu, Y.; Feng, J.; Xiang, T.; Torr, P.H.; et al. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 6881–6890. [Google Scholar]
  32. Valanarasu, J.M.J.; Oza, P.; Hacihaliloglu, I.; Patel, V.M. Medical transformer: Gated axial-attention for medical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Strasbourg, France, 27 September–1 October 2021; pp. 36–46. [Google Scholar]
  33. Arad Hudson, D.; Zitnick, L. Compositional Transformers for Scene Generation. In Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Online, 6–12 December 2020; pp. 9506–9520. [Google Scholar]
  34. He, K.; Chen, X.; Xie, S.; Li, Y.; Dollár, P.; Girshick, R. Masked autoencoders are scalable vision learners. arXiv 2021, arXiv:2111.06377. [Google Scholar]
  35. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  36. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Venice, Italy, 22–29 October 2017; pp. 4700–4708. [Google Scholar]
  37. LeCun, Y.; Boser, B.; Denker, J.; Henderson, D.; Howard, R.; Hubbard, W.; Jackel, L. Handwritten digit recognition with a back-propagation network. In Proceedings of the Advances in Neural Information Processing Systems 2, Denver, CO, USA, 27–30 November 1989. [Google Scholar]
  38. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  39. Huynh-Thu, Q.; Ghanbari, M. Scope of validity of PSNR in image/video quality assessment. Electron. Lett. 2008, 44, 800–801. [Google Scholar] [CrossRef]
  40. CRESDA. China Centre For Resources Satellite Data and Application. 2021. Available online: http://www.cresda.com/CN/index.shtml (accessed on 2 June 2022).
  41. Yuhas, R.H.; Goetz, A.F.; Boardman, J.W. Discrimination among semi-arid landscape endmembers using the spectral angle mapper (SAM) algorithm. In Summaries of the Third Annual JPL Airborne Geoscience Workshop. Volume 1: AVIRIS Workshop; Jet Propulsion Laboratory: La Cañada Flintridge, CA, USA, 1992. [Google Scholar]
  42. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Figure 1. Network architecture of our SRT.
Figure 2. The architecture of TFEM, where the purple blocks are obtained vectors from each linear projection of the flattened patch, and the red blocks are the learnable positional encodings of the corresponding vectors.
Figure 3. The architecture of residual dense block, where LRelu is leakyRelu.
Figure 4. The architecture of RGM.
Figure 5. Image regions used for training and testing from China Center For Resources Satellite Data and Application website [40].
Figure 6. Distribution of the selected sample objects in Area1–3.
Figure 7. The scatter plot shows the predicted bands of each method compared with the original bands.
Figure 8. Result of SVM and SAM classification on Area1.
Figure 9. Result of SVM and SAM classification on Area2.
Figure 10. Result of SVM classification on Area3.
Table 1. Band specification of the GF-1 PMS and GF-6 WFV images.
GF-1 PMS
Band | Wavelength (nm) | Spatial Resolution (m)
Blue | 450∼520 | 8
Green | 520∼590 | 8
Red | 630∼690 | 8
Nir | 730∼890 | 8
Pan | 450∼900 | 2
GF-6 WFV
Band | Wavelength (nm) | Spatial Resolution (m)
Blue | 450∼520 | 16
Green | 520∼590 | 16
Red | 630∼690 | 16
Nir | 730∼890 | 16
Red edge 1 | 690∼730 | 16
Red edge 2 | 730∼770 | 16
Purple | 400∼450 | 16
Yellow | 590∼640 | 16
Table 2. Hyper-parameter settings of the SRT network.
Parameter Name | Parameter Setting
Batch size | 32
Initial learning rate | 0.01
Optimizer | Adam
Decay rate | 0.1
Learning rate decay steps | 2000 steps
Epochs | 200
Activation functions | ReLU, Sigmoid, Leaky-ReLU
Table 3. Details of the ground truth in Area1–3.
 Area1Area2Area3
TrainTestTrainTestTrainTest
Water638124513992266119213487
Build7383923008200512191598
Bare land101994306287013971491
Plant243112734169278072048551
Tree1037183611691112765183131
Road8714615921062300259
Table 4. Quantitative assessment of different spectral reconstruction methods for the dataset. The best results are shown in bold.
Evaluation | Band | AWAN | HSCNN-D | HRNet | M2HNet | SRT | SRT*
PSNR | band 5 | 40.06 | 40.10 | 43.92 | 40.80 | 45.23 | 44.91
PSNR | band 6 | 39.37 | 39.47 | 43.00 | 38.92 | 44.00 | 43.39
PSNR | band 7 | 42.51 | 41.71 | 46.56 | 42.50 | 48.29 | 47.81
PSNR | band 8 | 41.23 | 42.10 | 44.71 | 40.95 | 45.87 | 45.63
PSNR | avg | 40.79 | 40.85 | 44.55 | 40.79 | 45.85 | 45.43
SSIM | band 5 | 0.980 | 0.972 | 0.991 | 0.979 | 0.991 | 0.991
SSIM | band 6 | 0.970 | 0.983 | 0.992 | 0.976 | 0.991 | 0.990
SSIM | band 7 | 0.980 | 0.985 | 0.990 | 0.973 | 0.992 | 0.992
SSIM | band 8 | 0.970 | 0.981 | 0.990 | 0.978 | 0.992 | 0.992
SSIM | avg | 0.975 | 0.980 | 0.991 | 0.976 | 0.992 | 0.991
MRAE | band 5 | 0.038 | 0.041 | 0.024 | 0.038 | 0.021 | 0.024
MRAE | band 6 | 0.026 | 0.024 | 0.019 | 0.037 | 0.020 | 0.020
MRAE | band 7 | 0.031 | 0.032 | 0.022 | 0.036 | 0.019 | 0.025
MRAE | band 8 | 0.032 | 0.029 | 0.021 | 0.038 | 0.020 | 0.022
MRAE | avg | 0.032 | 0.032 | 0.022 | 0.037 | 0.020 | 0.023
SAM | band 5 | 1.66 | 1.59 | 1.17 | 1.68 | 1.06 | 1.07
SAM | band 6 | 1.21 | 1.23 | 0.87 | 1.36 | 0.81 | 0.84
SAM | band 7 | 1.42 | 1.43 | 0.88 | 1.51 | 0.80 | 0.81
SAM | band 8 | 1.66 | 1.50 | 1.08 | 1.72 | 1.00 | 1.02
SAM | avg | 1.49 | 1.44 | 1.00 | 1.57 | 0.92 | 0.93
RMSE | band 5 | 0.010 | 0.012 | 0.008 | 0.016 | 0.007 | 0.010
RMSE | band 6 | 0.015 | 0.021 | 0.007 | 0.011 | 0.009 | 0.014
RMSE | band 7 | 0.009 | 0.014 | 0.009 | 0.004 | 0.006 | 0.008
RMSE | band 8 | 0.010 | 0.013 | 0.007 | 0.016 | 0.007 | 0.010
RMSE | avg | 0.011 | 0.015 | 0.008 | 0.012 | 0.008 | 0.011
Table 5. Accuracy of classification result of Area1 with SVM. The best results are shown in bold.
SVM | AWAN | HSCNN-D | HRNet | M2HNet | SRT | SRT* | GF-6
OA | 0.8909 | 0.9015 | 0.9030 | 0.9038 | 0.9237 | 0.9118 | 0.9291
Kappa | 0.7900 | 0.8071 | 0.8106 | 0.8133 | 0.8321 | 0.8215 | 0.8357
Water | 0.9560 | 0.8272 | 0.9733 | 0.9786 | 0.8324 | 0.9813 | 0.9847
Build | 0.9217 | 0.8918 | 0.9849 | 0.9295 | 0.9777 | 0.9894 | 0.9817
Bare Land | 0.5719 | 0.7455 | 0.5855 | 0.5804 | 0.5035 | 0.6646 | 0.6654
Vegetation | 0.8803 | 0.8836 | 0.8836 | 0.8506 | 0.9140 | 0.8871 | 0.9262
Tree | 0.8812 | 0.8196 | 0.8909 | 0.8983 | 0.8657 | 0.9001 | 0.9200
Road | 0.5756 | 0.5857 | 0.5823 | 0.5785 | 0.4357 | 0.5872 | 0.5768
Table 6. Accuracy of classification result of Area1 with SAM. The best results are shown in bold.
SAM | AWAN | HSCNN-D | HRNet | M2HNet | SRT | SRT* | GF-6
OA | 0.8743 | 0.8709 | 0.8782 | 0.8701 | 0.8939 | 0.8854 | 0.8992
Kappa | 0.7435 | 0.7409 | 0.7513 | 0.7401 | 0.8012 | 0.7821 | 0.8036
Water | 0.8970 | 0.8683 | 0.8324 | 0.8498 | 0.8823 | 0.8849 | 0.8500
Build | 0.5148 | 0.6386 | 0.5077 | 0.5339 | 0.4636 | 0.4851 | 0.4715
Bare Land | 0.7121 | 0.7490 | 0.5035 | 0.6606 | 0.5539 | 0.5746 | 0.8823
Vegetation | 0.9518 | 0.9429 | 0.9140 | 0.9492 | 0.9202 | 0.9411 | 0.9278
Tree | 0.8799 | 0.8774 | 0.9157 | 0.8862 | 0.9258 | 0.9028 | 0.9331
Road | 0.5703 | 0.6183 | 0.4357 | 0.6449 | 0.4196 | 0.6977 | 0.6976
Table 7. Accuracy of classification result of Area2 with SVM. The best results are shown in bold.
SVM | AWAN | HSCNN-D | HRNet | M2HNet | SRT | SRT* | GF-1
OA | 0.8662 | 0.8749 | 0.8839 | 0.8788 | 0.8862 | 0.8853 | 0.8648
Kappa | 0.8309 | 0.8359 | 0.8507 | 0.8399 | 0.8655 | 0.8548 | 0.8223
Water | 0.9754 | 0.9808 | 0.9854 | 0.9880 | 0.9881 | 0.9844 | 0.9775
Build | 0.7527 | 0.7638 | 0.8299 | 0.8045 | 0.7590 | 0.7514 | 0.7519
Bare Land | 0.8378 | 0.8879 | 0.8492 | 0.8600 | 0.9398 | 0.9395 | 0.8527
Vegetation | 0.9614 | 0.9505 | 0.9535 | 0.9555 | 0.9543 | 0.9573 | 0.9531
Tree | 0.8279 | 0.7876 | 0.8358 | 0.8227 | 0.7940 | 0.7907 | 0.8370
Road | 0.6650 | 0.6784 | 0.6947 | 0.6547 | 0.6458 | 0.6558 | 0.6264
Table 8. Accuracy of classification result of Area2 with SAM. The best results are shown in bold.
SAM | AWAN | HSCNN-D | HRNet | M2HNet | SRT | SRT* | GF-1
OA | 0.8046 | 0.7954 | 0.8047 | 0.8058 | 0.8196 | 0.8066 | 0.7923
Kappa | 0.7920 | 0.7836 | 0.7921 | 0.7947 | 0.8048 | 0.7956 | 0.7822
Water | 0.9973 | 0.9972 | 0.9907 | 0.9972 | 0.9997 | 0.9990 | 0.9988
Build | 0.9087 | 0.9104 | 0.9087 | 0.9381 | 0.8822 | 0.9140 | 0.9015
Bare Land | 0.4623 | 0.4403 | 0.4625 | 0.4288 | 0.4812 | 0.4611 | 0.4233
Vegetation | 0.8059 | 0.7872 | 0.8059 | 0.7959 | 0.8418 | 0.8062 | 0.7775
Tree | 0.8610 | 0.8726 | 0.8610 | 0.9264 | 0.9429 | 0.8737 | 0.9160
Road | 0.9874 | 0.9886 | 0.9874 | 0.9875 | 0.9779 | 0.9852 | 0.9776
Table 9. Accuracy of classification result of Area3 with SVM. The best results are shown in bold.
SVM | AWAN | HSCNN-D | HRNet | M2HNet | SRT | SRT* | GF-1
OA | 0.9303 | 0.9212 | 0.9327 | 0.9322 | 0.9487 | 0.9406 | 0.9246
Kappa | 0.9258 | 0.9127 | 0.9302 | 0.9190 | 0.9357 | 0.9348 | 0.9157
Water | 0.9910 | 0.9854 | 0.9897 | 0.9888 | 0.9880 | 0.9853 | 0.9931
Build | 0.6304 | 0.6299 | 0.5710 | 0.5933 | 0.6650 | 0.5944 | 0.4950
Bare Land | 0.9047 | 0.9347 | 0.9545 | 0.9564 | 0.9248 | 0.9525 | 0.9149
Vegetation | 0.9571 | 0.9358 | 0.9624 | 0.9608 | 0.9798 | 0.9725 | 0.9691
Tree | 0.9520 | 0.9492 | 0.9598 | 0.9503 | 0.9728 | 0.9725 | 0.9472
Road | 0.9620 | 0.9535 | 0.9638 | 0.9613 | 0.9894 | 0.9682 | 0.9660
Table 10. Accuracy of classification result of Area3 with SAM. The best results are shown in bold.
SAM | AWAN | HSCNN-D | HRNet | M2HNet | SRT | SRT* | GF-1
OA | 0.8298 | 0.8344 | 0.8439 | 0.8301 | 0.8490 | 0.8438 | 0.8367
Kappa | 0.7397 | 0.7445 | 0.7563 | 0.7420 | 0.7599 | 0.7509 | 0.7484
Water | 0.8583 | 0.8583 | 0.8714 | 0.8485 | 0.8856 | 0.8892 | 0.8574
Build | 0.3822 | 0.4195 | 0.4016 | 0.4119 | 0.4345 | 0.4338 | 0.3973
Bare Land | 0.8599 | 0.8617 | 0.8804 | 0.8596 | 0.8800 | 0.8779 | 0.8770
Vegetation | 0.6271 | 0.6101 | 0.6786 | 0.6732 | 0.6758 | 0.6205 | 0.6083
Tree | 0.9309 | 0.9289 | 0.9396 | 0.9237 | 0.9375 | 0.9276 | 0.9363
Road | 0.6356 | 0.6634 | 0.6574 | 0.6634 | 0.6647 | 0.6634 | 0.6436
Table 11. The complexity comparison of different models.
Model | AWAN | HSCNN-D | HRNet | M2HNet | SRT | SRT*
Params (M) | 21.58 | 4.62 | 32.04 | 22.73 | 17.62 | 17.54
GFLOPs | 352.85 | 75.64 | 40.89 | 245.86 | 121.66 | 120.45
Time (s) | 0.21 | 1.21 | 0.41 | 0.24 | 0.27 | 0.25


