Peer-Review Record

Visualizing Near Infrared Hyperspectral Images with Generative Adversarial Networks

Remote Sens. 2020, 12(23), 3848; https://doi.org/10.3390/rs12233848
by Rongxin Tang 1,2, Hualin Liu 1 and Jingbo Wei 1,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 20 October 2020 / Revised: 19 November 2020 / Accepted: 20 November 2020 / Published: 24 November 2020
(This article belongs to the Section Remote Sensing Image Processing)

Round 1

Reviewer 1 Report

This paper presents an end-to-end deep neural network for generating an RGB image with enhanced visibility from hyperspectral images. The novelty lies in utilizing a deep convolutional generative adversarial network, which is an exciting idea in computer science. However, the following points need to be clarified to improve the readability.

  1. There are several approaches to enhancing remotely sensed images (i.e., haze/cloud removal). Thus, it is advisable to amend the introduction to include references to such methods. The following papers can be cases in point.
    1. Ngo, D.; Lee, S.; Kang, B. Robust Single-Image Haze Removal Using Optimal Transmission Map and Adaptive Atmospheric Light. Remote Sens. 2020, 12, 2233.
    2. Jiang, H.; Lu, N.; Yao, L. A High-Fidelity Haze Removal Method Based on HOT for Visible Remote Sensing Images. Remote Sens. 2016, 8, 844.
    3. Jiang, H.; Lu, N. Multi-Scale Residual Convolutional Neural Network for Haze Removal of Remote Sensing Images. Remote Sens. 2018, 10, 945.
  2. In Section 2.1, because the W character denotes the image's width, it is suggested to use another symbol (e.g., the Greek omega symbol) to denote the weight of the network layer.
  3. In Section 2.2, it appears that the description is somewhat limited. Therefore, please provide a detailed description of the employed U-Net architecture. For example, the following points should be clarified in the revised manuscript.
    1. A deeper network usually results in better performance, but it sometimes blurs the final image. Hence, please explain the reason for setting the depth of the network to 14.
    2. Please explain the advantages of utilizing the concatenation.
  4. It is suggested to rearrange the layers in Figure 2 in a horizontal direction to reduce the blank spaces.
  5. The content loss function in Equation (4) is not an L1-norm. Please revise the loss function and fix the relevant code as well.
  6. In Section 3, it is advisable to include the training results of the employed network. Since training a deep convolutional generative adversarial network is quite complicated, the training results should be presented and discussed in greater detail.
  7. The utilization of deep neural networks usually brings about impressive quantitative evaluation scores. However, the results in Section 4.2 demonstrated that the SSIM scores are solely around 0.5 to 0.7. Thus, please check the evaluation thoroughly since it appears that the experiment was not conducted correctly.
  8. Interpretation of the quantitative evaluation results is essential in a research article. Please provide more discussion regarding the results presented in Tables 2-6.
  9. The manuscript contains several grammatical errors; hence, please proofread it thoroughly.

Author Response

(1) Thank you for your advice. These papers have been cited as references [28-30].

(2) Thank you for your advice. The notation for weights has been replaced with \omega.

(3) Thank you for your advice. An explanation of the concatenation has been added in the second paragraph of subsection 2.2, as listed below.

Because they pass through fewer convolutions, low-level features retain higher resolution and preserve position and detail information, but they are noisy and carry little semantic content. On the contrary, high-level features carry stronger semantic information, but fine details are no longer perceivable. Concatenation is therefore used to combine low-level and high-level features to improve model performance.
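As an illustration, a minimal PyTorch-style sketch of such a concatenation step is given below; the channel counts and layer choices are illustrative only and are not the exact layers of the 14-layer network in the paper.

```python
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    """Decoder step that concatenates a low-level (skip) feature map
    with the upsampled high-level feature map before convolving."""
    def __init__(self, high_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(high_ch, high_ch, kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(high_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, high, skip):
        high = self.up(high)                # restore spatial resolution
        x = torch.cat([skip, high], dim=1)  # channel-wise concatenation
        return self.conv(x)

# Example: fuse a 64-channel skip map with a 128-channel deep map.
block = UpBlock(high_ch=128, skip_ch=64, out_ch=64)
out = block(torch.randn(1, 128, 16, 16), torch.randn(1, 64, 32, 32))
print(out.shape)  # torch.Size([1, 64, 32, 32])
```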

(4) Thank you for your advice. The figure has been redrawn to reduce space.

(5) We are sorry for this mistake. The equation has been revised. The code has also been checked; it uses the \ell_1 norm as described.
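For illustration, a minimal sketch of the \ell_1 (mean absolute error) content loss is shown below; this is a simplified, framework-level example rather than the exact code used in the paper.

```python
import torch

def l1_content_loss(generated, target):
    """Mean absolute error, i.e. the ell_1 norm averaged over all elements."""
    return torch.mean(torch.abs(generated - target))

# Equivalent to torch.nn.L1Loss()(generated, target).
fake = torch.rand(4, 3, 64, 64)   # a batch of generated RGB images
real = torch.rand(4, 3, 64, 64)   # the corresponding reference RGB images
print(l1_content_loss(fake, real).item())
```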

(6) Thank you for your advice. A new subsection 7.1 has been added to the discussion section, where the loss curves are presented to document the training details.

(7) Thank you for your concern. We have checked the SSIM calculation and confirmed that the scores are correct. The unsatisfactory scores may come from the fact that the visualization of infrared images is still very challenging, and our method is only a first step toward solving this problem.

Synthesizing a multispectral image from the corresponding infrared image is an issue first raised in this paper. For the same ground target, the content presented by an infrared image and by a visible-light multispectral image is quite different. For example, due to the lack of red bands, vegetation cannot be accurately identified in infrared images. Inaccurate classification makes it impossible to restore accurate colors during reconstruction, which in turn confuses the structural information. For comparison, the SSIM values in Tables 6-9 are higher than those in Tables 2-5 by more than 0.2 when the visible-light bands are added to the input of the network. The different SSIM values indirectly prove that the calculation works well.
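For reference, the per-image scores can be reproduced independently of our evaluation code, for example with scikit-image as sketched below (this assumes 8-bit RGB arrays and a recent scikit-image release that provides the channel_axis argument).

```python
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def rgb_scores(pred, ref):
    """Return (SSIM, PSNR) for two uint8 RGB images of the same size."""
    ssim = structural_similarity(pred, ref, channel_axis=-1, data_range=255)
    psnr = peak_signal_noise_ratio(ref, pred, data_range=255)
    return ssim, psnr

# Dummy data only; in practice pred is the synthesized RGB image and
# ref is the co-registered true-color reference.
pred = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
ref = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
print(rgb_scores(pred, ref))
```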

(8) Thank you for your advice. Two new paragraphs have been added to subsection 4.2 to address the results in Tables 2-6.

The visualization results of images 1 and 2 are compared in Tables 1 and 2, respectively. In these tables, the proposed HVCNN method is far superior to the other algorithms on all indicators. Owing to the spectral inconsistency between the input image and the output image, neither the dimensionality-reduction methods nor the band-selection methods can predict the color of the target image effectively. Among the competing algorithms, DSEBS has the best color consistency, while the two DHV methods have the worst color performance, but none of their colors are easily understandable. In contrast, our method can synthesize roughly acceptable colors, as Q4 illustrates. At the same time, PSNR and CC also confirm that the data fidelity of the new method is better than that of the other methods. The SSIM results show that our method produces an easily recognizable structure.

Tables 3 and 4 present the visualization results of image 3 and image 4, respectively, whose content is quite different from that of the first two images. In these tables, the scores of all algorithms improve because the scenes are simple and free of urban areas. In terms of structural information, the competing algorithms are very close to HVCNN. However, the advantages of our method in producing better structure and color are still obvious.

(9) We are sorry for the writing mistakes that we made. We have thoroughly revised the whole paper to correct grammatical and wording mistakes as far as possible.

Reviewer 2 Report

This paper presents a deep learning method to synthesize RGB images from hyperspectral images, trying to approach true color.

The first part of the paper is really good, both the structure and the contents of the first sections (Introduction, State of the art, and Methodology). However, the paper needs to be reviewed, since the Results and Discussion sections are a little sloppy, especially the latter, where the authors seem to include the results of some additional experiments not previously described, making it difficult to understand their objectives. In addition, a deeper discussion of the results included in the Results section is missing.

Some specific aspects to be improved in the Discussion section are:

* The results for the TG-1 data are not described in enough detail. The information has to be improved or the section removed.
* The same applies to subsection 5.2. Why are there no visual results in this section? Given the improvement reflected in the quality measures, the visual improvement should also be shown.
* Subsection 5.3 —> What the authors consider an advantage is, from my point of view, a limitation of the method. The method does not learn the difference between seasons; it only learns the colors that are shown during the training phase.
* Some figures are neither referenced in the text nor described, and the captions are not as descriptive as they should be (Figures 8 and 9).

Finally, some typos have to be corrected. Just to mention some of them :

Abstract —> hypserspectral
L 65 —> Movivated
L 180 —> denition

Author Response

(1) Thank you for your advice. The TG-1 experiment does not match the topic exactly, as only 19 bands are available for visualization, but it serves as a proof of the generality of the proposed method. Moreover, when someone uses data with a similar number of bands, the method in this article can serve as a reference.

To better describe the experiment, this subsection has been rewritten with more information about TIANGONG-1 and the experimental scheme. The results are not extended because the structural details are poor owing to the insufficient data source.

(2) Thank you for your advice. A new section has been added to present the experimental results when all Hyperion bands are used for visualization. Four figures and four tables are included and compared with the results obtained from hyperspectral images with the visible-light bands removed.

(3) Thank you for your advice. We understand that the statement in the paper was not rigorous. Therefore, this paragraph has been rewritten as listed below.

Different color styles result from different seasons and locations. For a given location, our model implicitly learns the different color styles from the training image pairs. The network can then output the appropriate style according to the style of the input data, provided that training images from the corresponding moment are available. This is accomplished when the hyperspectral image and the multispectral image in each training pair are taken at the same moment. In addition, it is also possible to map hyperspectral images of different seasons to the same season, which can be achieved by fixing the capture time of the multispectral images in all training pairs. The latter facilitates comparisons to quickly discover new information on the ground. However, no matter which scheme is adopted, a large amount of training data is required. The feasibility of our method has been proven on limited data, and its feasibility on large-scale data can be expected. Fully proving this, however, requires a great deal of data collection and experimentation, which we leave to future work.

(4) The original Figures 8 and 9 have been replaced with Figures 8-12 in the revised manuscript. These figures are now referenced in the text, and the captions of all figures have been enriched.

(5) We are sorry for the writing mistakes that we made. We have thoroughly revised the whole paper to correct grammatical and wording mistakes as far as possible.

Reviewer 3 Report

If possible, please provide a link where the images can be downloaded

Author Response

The Landsat-8 and Hyperion images were downloaded from https://earthexplorer.usgs.gov/ for free.

The TIANGONG-1 data is not publicly available.

Registration was performed with ENVI after the images were downloaded.

Rotation is used to increase the number of available pixels.
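As an illustration only (the exact preprocessing script is not reproduced here), rotation-based enlargement of the training set can be sketched as follows:

```python
import numpy as np

def rotation_augment(patch):
    """Yield the original patch plus its 90-, 180-, and 270-degree rotations,
    enlarging the usable set of training samples fourfold."""
    for k in range(4):
        yield np.rot90(patch, k, axes=(0, 1))  # rotate in the spatial plane

# Example: one (height, width, bands) hyperspectral patch becomes four samples.
patch = np.random.rand(64, 64, 30)
samples = list(rotation_augment(patch))
print(len(samples), samples[1].shape)  # 4 (64, 64, 30)
```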

Round 2

Reviewer 1 Report

First of all, I would like to appreciate the efforts that the authors have put into revising the manuscript.

The authors have addressed most of my comments, and the revised manuscript has been significantly improved.

However, some minor mistakes still persist. For example, in Section 2.4, it is advisable to use L_{con} instead of L_{MAE} in Eq. (4) for consistency. Thus, I highly recommend that the authors use a proofreading service to polish the paper before publication.

Author Response

We greatly appreciate the many valuable suggestions you have given us to improve our work. Following your advice, we made a thorough check to find and correct mistakes as far as possible. Many changes have been made in the latest version, which are highlighted in the supplementary file.

Reviewer 2 Report

I consider that the paper can be published in the current form.

Author Response

We greatly appreciate the many valuable suggestions you have given us to improve our work. To polish the language and avoid minor mistakes, we made as thorough a check as we could. Many changes have been made in the latest version, which are highlighted in the supplementary file.
