Peer-Review Record

Multi-Scale Feature Fusion with Attention Mechanism Based on CGAN Network for Infrared Image Colorization

Appl. Sci. 2023, 13(8), 4686; https://doi.org/10.3390/app13084686
by Yibo Ai 1,2, Xiaoxi Liu 1, Haoyang Zhai 1, Jie Li 3, Shuangli Liu 4, Huilong An 3 and Weidong Zhang 1,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 12 March 2023 / Revised: 29 March 2023 / Accepted: 3 April 2023 / Published: 7 April 2023

Round 1

Reviewer 1 Report

Review of the manuscript entitled "Multi-scale feature fusion with attention mechanism based on CGAN network for infrared image colorization" (ID: applsci-2308224).

  • This manuscript improves the CGAN to address the problems of color leakage and unclear semantics in near-infrared images. Briefly, the generator of the CGAN is improved, a multi-scale feature fusion module is added to the U-Net, and an attention mechanism module is added to the discriminator.

However, this paper is not well presented. I recommend the acceptance of this manuscript subject to a major revision.

The language of the manuscript is not simple enough, and many passages are written redundantly. There are some errors in the manuscript; some are mentioned in the attached PDF file. Please go through such errors throughout the manuscript and correct them. Meanwhile, the language of the manuscript should be checked and improved by a native English editor.

Some major problems are listed below.

- First of all, the current form of the manuscript does not adequately highlight the content of this study. The introduction section needs to be revised.

- Abstract: There is no quantitative information regarding the evaluation of the proposed method.

- Page 2, Line 80-97: The authors are suggested to pay attention to recent fusion methods, which are also related to the manuscript topic. A comprehensive literature review will reflect the authors' rich knowledge reserves.

[1] J. Liu, X. Fan, J. Jiang, R. Liu and Z. Luo, "Learning a Deep Multi-Scale Feature Ensemble and an Edge-Attention Guidance for Image Fusion," in IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 1, pp. 105-119, Jan. 2022, doi: 10.1109/TCSVT.2021.3056725.

[2] Wang, J., Yu, L. & Tian, S. MsRAN: a multi-scale residual attention network for multi-model image fusion. Med Biol Eng Comput 60, 3615–3634 (2022). https://doi.org/10.1007/s11517-022-02690-1.

[3] J. Li, H. Huo, C. Li, R. Wang and Q. Feng, "AttentionFGAN: Infrared and Visible Image Fusion Using Attention-Based Generative Adversarial Networks," in IEEE Transactions on Multimedia, vol. 23, pp. 1383-1396, 2021, doi: 10.1109/TMM.2020.2997127.

- Most importantly, what is new in this study in comparison with the mentioned references? Please discuss it. The major innovation of this study is not clear. No specific information is available with respect to state-of-the-art models. I would suggest adding more detail in the introduction.

- Page 3, Line 98-100: "With its unique network structure and training mechanism, generative adversarial network (GAN) have been applied by many scholars in the field of coloring infrared images". The sentence requires some references.

- Page 3, Line 114-116: "It is demonstrated through experiments that the method in this paper simultaneously improves the problems of color leakage and unclear semantics of infrared image colorization". The purpose of this sentence is to describe the results, and should not be placed in the introduction.

- Section 2.1, Line 123-130; Line 134-143; Line 147-155; Equation (1) and Section 2.2, Line 159-172; Equation (2): Sentences (paragraphs) and equations need references. References should also be added in other sections.

- The quality of some figures (also the text font inside the figures) should be improved, for example Figure 7 and Figure 10.

- Results section: More detailed quantitative analysis is needed, and the method should be compared with at least two other related methods.

 

- The manuscript needs a discussion section. In the discussion section, previous studies must be discussed with citations. Also, discuss how this study improves on previous studies. Are there any differences between the findings of this study and previous studies? Did the method used in this study perform better than previously widely used methods, and why? Are there any implications and limitations of this study? What should be done in future work?

Comments for author File: Comments.pdf

Author Response

Dear Reviewer,

 

Thank you for taking the time to review our work.

We are grateful for your valuable comments and suggestions, which have helped improve the quality of the paper (manuscript ID: applsci-2308224). We have studied your comments carefully and have made modifications and corrections which we hope meet with your approval.

We have responded to the comments point by point below, and the detailed revisions in the manuscript have been made using the "Track Changes" function in Microsoft Word.

Thank you for your attention to our manuscript; we appreciate your detailed and professional advice once again.

Yours sincerely,

Yibo Ai, Email: ybai@ustb.edu.cn

National Center for Materials Service Safety, University of Science and Technology Beijing, Beijing 100083, China

Corresponding author:

Weidong Zhang, Email: zwd@ustb.edu.cn

 

 

The language of the manuscript is not simple enough, and many passages are written redundantly. There are some errors in the manuscript; some are mentioned in the attached PDF file. Please go through such errors throughout the manuscript and correct them. Meanwhile, the language of the manuscript should be checked and improved by a native English editor.

Response: Thank you for your comments on our manuscript and your careful corrections. Following your instructions, we have revised the whole manuscript carefully and tried to eliminate grammatical errors. We have modified the manuscript accordingly, and the detailed corrections are listed point by point below:

 

1. First of all, the current form of the manuscript does not adequately highlight the content of this study. The introduction section needs to be revised.

Response: Thank you for your feedback. We recognize that the introduction section of our original manuscript did not sufficiently highlight the novelty and significance of our work. As a result, we have strengthened the introduction section to emphasize the innovation of our research.

 

Building upon the previous methods, this paper proposes an infrared image coloring algorithm based on a CGAN with multi-scale feature fusion and an attention mechanism. CGAN is an extension of GAN [25] that controls the image generation process by adding conditional constraints, resulting in better image quality and more detailed output. In this work, we improve the generator architecture of the CGAN by incorporating a multi-scale convolution module with three types of convolutions into the U-Net network to fuse features at different scales, enhancing the network's feature extraction ability, improving its learning speed and semantic understanding, and addressing issues such as color leakage and blurring during the coloring process. We add an attention module to the discriminator, containing both channel attention and spatial attention, to filter the feature layers from a channel perspective and select important regions on the feature map. This allows the network to focus on useful features and ignore unnecessary ones, while also improving the discriminator's effectiveness and efficiency. By combining the improvements to the generator and discriminator, a new network with multi-scale feature fusion and an attention module is obtained. Finally, we tested our proposed method on a near-infrared image dataset that combines the advantages of both infrared and visible images, preserving more details, edges, and texture features from the visible-light images while retaining the benefits of the infrared images.
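As a rough illustration of the channel- and spatial-attention gating described above, the following is a simplified numpy sketch with placeholder weights; it is a hypothetical stand-in, not the authors' actual discriminator module (which uses learned convolutions rather than the fixed gating shown here):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Gate each channel of feat (C, H, W) with a squeeze-and-excite style score."""
    s = feat.mean(axis=(1, 2))                     # squeeze: per-channel average, (C,)
    g = sigmoid(w2 @ np.maximum(w1 @ s, 0.0))      # excite: bottleneck MLP + sigmoid, (C,)
    return feat * g[:, None, None]

def spatial_attention(feat):
    """Gate each spatial location using channel-wise average and max pooling."""
    g = sigmoid(feat.mean(axis=0) + feat.max(axis=0))  # (H, W); stand-in for a learned conv
    return feat * g[None, :, :]

# Example: apply channel attention followed by spatial attention to a random map
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((2, 8))   # hypothetical bottleneck: 8 channels -> 2
w2 = rng.standard_normal((8, 2))
out = spatial_attention(channel_attention(feat, w1, w2))
```

Because both gates lie in (0, 1), the module rescales (never amplifies) features, letting the network suppress unimportant channels and locations.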

 

2. Abstract: There is no quantitative information regarding the evaluation of the proposed method.

Response: Thank you for your advice; we have added the following statement to the abstract:

Experimental results show that our proposed method achieved a peak signal-to-noise ratio (PSNR) of 16.5342 dB and a structural similarity index (SSIM) of 0.6385 on an RGB-NIR (Red, Green, Blue-Near Infrared) testing dataset, representing a 5% and 13% improvement over the original CGAN network, respectively.
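For reference, the quoted PSNR figure follows the standard definition below; this is a minimal numpy sketch of the metric, not the authors' evaluation code (SSIM involves local luminance, contrast, and structure terms and is omitted for brevity):

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB between a reference and a test image."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")                    # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Example: an all-zero image vs. an all-255 image is the worst case, giving 0 dB
a = np.zeros((16, 16), dtype=np.uint8)
b = np.full((16, 16), 255, dtype=np.uint8)
```

Higher PSNR means the colorized output is closer to the ground-truth visible image, which is why the 5% gain over the baseline CGAN is meaningful.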

 

3. Page 2, Line 80-97: The authors are suggested to pay attention to recent fusion methods, which are also related to the manuscript topic. A comprehensive literature review will reflect the authors' rich knowledge reserves.

 

[1] J. Liu, X. Fan, J. Jiang, R. Liu and Z. Luo, "Learning a Deep Multi-Scale Feature Ensemble and an Edge-Attention Guidance for Image Fusion," in IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 1, pp. 105-119, Jan. 2022, doi: 10.1109/TCSVT.2021.3056725.

 

[2] Wang, J., Yu, L. & Tian, S. MsRAN: a multi-scale residual attention network for multi-model image fusion. Med Biol Eng Comput 60, 3615–3634 (2022). https://doi.org/10.1007/s11517-022-02690-1.

 

[3] J. Li, H. Huo, C. Li, R. Wang and Q. Feng, "AttentionFGAN: Infrared and Visible Image Fusion Using Attention-Based Generative Adversarial Networks," in IEEE Transactions on Multimedia, vol. 23, pp. 1383-1396, 2021, doi: 10.1109/TMM.2020.2997127.

Response: Thank you very much; we have accepted your advice and made the modification. We have added two of these references related to infrared images in the introduction section and discussed the methods of all three papers in the new discussion section.

 

4. Most importantly, what is new in this study in comparison with the mentioned references? Please discuss it. The major innovation of this study is not clear. No specific information is available with respect to state-of-the-art models. I would suggest adding more detail in the introduction.

Response: Thank you for your suggestion; we have added the references you recommended and revised the introduction accordingly. The revised statement of the innovations is shown in full in our first response.

 

5. Page 3, Line 98-100: "With its unique network structure and training mechanism, generative adversarial network (GAN) have been applied by many scholars in the field of coloring infrared images". The sentence requires some references.

Response: This is good advice, and we have added references in the appropriate places.

 

6. Page 3, Line 114-116: "It is demonstrated through experiments that the method in this paper simultaneously improves the problems of color leakage and unclear semantics of infrared image colorization". The purpose of this sentence is to describe the results, and should not be placed in the introduction.

Response: Thank you for your suggestion; we have removed this sentence.

 

7. Section 2.1, Line 123-130; Line 134-143; Line 147-155; Equation (1) and Section 2.2, Line 159-172; Equation (2): Sentences (paragraphs) and equations need references. References should also be added in other sections.

Response: Thank you for your reminder; we have added references before these paragraphs and at the corresponding equations.

 

8. The quality of some figures (also the text font inside the figures) should be improved, for example Figure 7 and Figure 10.

Response: Thank you for your reminder; we have improved the corresponding figures.

 

9. Results section: More detailed quantitative analysis is needed, and the method should be compared with at least two other related methods.

Response: Thank you for your good advice. We have added more analysis and description in the quantitative analysis part, and we have added a comparison with other related methods in Section 3.1 of the manuscript.

 

10. The manuscript needs a discussion section. In the discussion section, previous studies must be discussed with citations. Also, discuss how this study improves on previous studies. Are there any differences between the findings of this study and previous studies? Did the method used in this study perform better than previously widely used methods, and why? Are there any implications and limitations of this study? What should be done in future work?

Response: Thank you very much for your thoughtful suggestion; we have added a discussion section covering the points you mentioned.

Discussion

In this paper, we propose a method to improve the infrared image colorization problem by using a CGAN network that integrates multi-scale features into the generator and adds an attention mechanism to the discriminator. Prior literature has also attempted to improve networks using multi-scale feature modules and attention mechanisms [22,23,31]. For example, reference [31] proposed a multi-scale residual attention network (MsRAN), while reference [22] integrated multi-scale attention mechanisms into the generator and discriminator of a GAN to fuse infrared and visible light images (AttentionFGAN), and reference [23] proposed a deep network that concatenates feature learning modules and fusion learning mechanisms for infrared and visible light image fusion.
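To illustrate the multi-scale idea running through these works, the following is a simplified numpy sketch: it fuses pooled views of a feature map at several scales, a deliberately crude stand-in for the learned parallel convolutions used in the actual modules:

```python
import numpy as np

def downsample(feat, k):
    """Block-average pooling by factor k on a (C, H, W) map (H, W divisible by k)."""
    c, h, w = feat.shape
    return feat.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))

def upsample(feat, k):
    """Nearest-neighbour upsampling by factor k back to the original resolution."""
    return feat.repeat(k, axis=1).repeat(k, axis=2)

def multi_scale_fuse(feat):
    """Concatenate the original map with its 2x- and 4x-scale views along channels."""
    return np.concatenate(
        [feat, upsample(downsample(feat, 2), 2), upsample(downsample(feat, 4), 4)],
        axis=0,
    )

feat = np.arange(1 * 8 * 8, dtype=np.float64).reshape(1, 8, 8)
fused = multi_scale_fuse(feat)   # shape (3, 8, 8): fine, medium, coarse views
```

The fused tensor gives downstream layers simultaneous access to fine detail and coarse context, which is the property the multi-scale modules in the cited methods exploit.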

The main direction of improvement for infrared image colorization is to focus the network's attention on the most important areas of the image and retain more texture information. We also use multi-scale feature modules and attention mechanism modules, but unlike previous studies, we choose to improve the CGAN network. The CGAN network is more suitable than the GAN network for tasks that require generating images from specified conditional information. Infrared image colorization requires generating color images, and images generated solely by a GAN generator may suffer from unnatural colors, blurriness, and distortion. The CGAN generator can generate corresponding color images based on the input infrared images; the infrared images, serving as conditional information, help the generator produce more accurate color images and thus resolve the above issues. We then add the multi-scale feature module and the attention mechanism to the generator and discriminator separately, rather than adding both modules to the generator or the discriminator simultaneously. Our goal is to exploit the adversarial game underlying GAN networks, allowing the generator and discriminator to compete with and reinforce each other through different mechanisms.

We selected a dataset that includes many images with texture details, such as buildings with tightly arranged windows and dense trees. However, because we focus on making the edges of the subject objects finer and clearer to solve the color leakage and edge blur problems, there may be some deviation in the background color of the sky in some images. In future work, we will consider preprocessing the images before inputting them into the CGAN network to enhance the image quality and color restoration ability. After generating color images, post-processing such as denoising, smoothing, and contrast enhancement can also be applied to improve the quality and realism of the output images. The quality and quantity of the dataset are also crucial to the effectiveness of infrared image colorization; therefore, future research can collect more high-quality infrared image datasets and conduct more in-depth studies based on them.

Author Response File: Author Response.docx

Reviewer 2 Report

The review report

The manuscript title: Multi-scale feature fusion with attention mechanism based on CGAN network for infrared image colorization.

In this paper, the authors present an infrared image colorization algorithm based on multiscale feature fusion and the conditional generative adversarial network (CGAN) heightened attention mechanism.

The authors have used an appropriate number of recent references relevant to the topic of their article. The manuscript is appropriately prepared, the conclusions presented are consistent with the evidence and arguments, and they address the main question raised.

Based on the above, I propose to accept the article for publication after taking into account the following points:

1. All keywords must start with a capital letter.

2. Mathematical equations and formulas must be rewritten to match the font type and size used in the text.

3. The accuracy of Figures 4, 5 and 10 must be increased.

4. The conclusion should be short so that it contains all the results without elaborating on the explanation.



Author Response

Dear Reviewer,

 

Thank you for taking the time to review our work.

We are grateful for your valuable comments and suggestions, which have helped improve the quality of the paper (manuscript ID: applsci-2308224). We have studied your comments carefully and have made modifications and corrections which we hope meet with your approval.

We have responded to the comments point by point below, and the detailed revisions in the manuscript have been made using the "Track Changes" function in Microsoft Word.

Thank you for your attention to our manuscript; we appreciate your detailed and professional advice once again.

Yours sincerely,

Yibo Ai, Email: ybai@ustb.edu.cn

National Center for Materials Service Safety, University of Science and Technology Beijing, Beijing 100083, China

Corresponding author:

Weidong Zhang, Email: zwd@ustb.edu.cn

 

 

Thank you for your comments on our manuscript and your careful corrections. Following your instructions, we have revised the whole manuscript carefully and tried to eliminate grammatical errors. We have modified the manuscript accordingly, and the detailed corrections are listed point by point below:

 

1. All keywords must start with a capital letter.

Response: Thank you very much; we have accepted your advice and made the modification.

 

2. Mathematical equations and formulas must be rewritten to match the font type and size used in the text.

Response: Thank you for your suggestion. We have re-edited all the formulas using a formula editor.

 

3. The accuracy of Figures 4, 5 and 10 must be increased.

Response: Thank you for your reminder; we have improved the corresponding figures.

 

4. The conclusion should be short so that it contains all the results without elaborating on the explanation.

Response: Thank you for your suggestion. We have removed some redundant sentences from the conclusion section and revised the language.

Author Response File: Author Response.docx

Reviewer 3 Report

Comments of this reviewer on the manuscript Applsci-2308224 are as follows:

  • In general, to the best knowledge of this reviewer, the Authors provided readers with a very interesting study on infrared image colorization using an attention mechanism and multi-scale features. However, there are a few minor details that should be fixed.

  • The novelty that is described in the Introduction should be highlighted in the Abstract.

  • Keywords should be listed in alphabetical order and followed by the associated abbreviations in parentheses (if any). For instance: Generative Adversarial Network (GAN), etc.

  • All the abbreviations used in the manuscript should be defined at their first appearance.

  • Figure 1 should be removed from the Introduction and replaced by an appropriate text (description).

  • Phrases such as “The generator generates generated data…” should be avoided.

  • The Authors stated the following: “From the above comparison, although the network with the multi-scale feature fusion module still cannot fully restore the color of the real image, it obviously improves the problem of color leakage that exists in other methods and makes the contour boundary of the object clearer.” This is obvious from Figure 11. Could appropriate filtering or additional application of artificial intelligence improve the colorization?

  • The Authors also stated the following: “The experiments show that both in terms of subjective human perception and objective evaluation metrics, the method in this paper effectively improves the color leakage and semantic uncertainty of the original network for infrared coloring.” The improvement achieved by colorization cannot be considered effective. Accordingly, in this sentence, it should be said “…the method used significantly improves…” instead of “…the method in this paper effectively improves…”.

Author Response

Dear Reviewer,

 

Thank you for taking the time to review our work.

We are grateful for your valuable comments and suggestions, which have helped improve the quality of the paper (manuscript ID: applsci-2308224). We have studied your comments carefully and have made modifications and corrections which we hope meet with your approval.

We have responded to the comments point by point below, and the detailed revisions in the manuscript have been made using the "Track Changes" function in Microsoft Word.

Thank you for your attention to our manuscript; we appreciate your detailed and professional advice once again.

Yours sincerely,

Yibo Ai, Email: ybai@ustb.edu.cn

National Center for Materials Service Safety, University of Science and Technology Beijing, Beijing 100083, China

Corresponding author:

Weidong Zhang, Email: zwd@ustb.edu.cn

 

 

Thank you for your comments on our manuscript and your careful corrections. Following your instructions, we have revised the whole manuscript carefully and tried to eliminate grammatical errors. We have modified the manuscript accordingly, and the detailed corrections are listed point by point below:

 

1. The novelty that is described in the Introduction should be highlighted in the Abstract.

Response: Thank you for your suggestion. We have revised the abstract accordingly.

Abstract: This paper proposes a colorization algorithm for infrared images based on a Conditional Generative Adversarial Network (CGAN) with multi-scale feature fusion and attention mechanisms, aiming to address issues such as color leakage and unclear semantics in existing infrared image coloring methods. Firstly, we improve the generator of the CGAN network by incorporating a multi-scale feature extraction module into the U-Net architecture to fuse features from different scales, thereby enhancing the network's ability to extract features and improving its semantic understanding, which alleviates color leakage and blurriness during colorization. Secondly, we enhance the discriminator of the CGAN network by introducing an attention mechanism module, which includes channel attention and spatial attention modules, to better distinguish between real and generated images, thereby improving the semantic clarity of the resulting infrared images. Finally, we jointly improve the generator and discriminator of the CGAN network by incorporating both the multi-scale feature fusion module and the attention mechanism module. We tested our method on a dataset containing both infrared and near-infrared images, which retains more detailed features while also preserving the advantages of existing infrared images. Experimental results show that our proposed method achieved a peak signal-to-noise ratio (PSNR) of 16.5342 dB and a structural similarity index (SSIM) of 0.6385 on an RGB-NIR (Red, Green, Blue-Near Infrared) testing dataset, representing a 5% and 13% improvement over the original CGAN network, respectively. These results demonstrate the effectiveness of our proposed algorithm in addressing the issues of color leakage and unclear semantics in the original network. The proposed method is applicable not only to infrared image colorization but also to the colorization of remote sensing and CT images.

 

2. Keywords should be listed in alphabetical order and followed by the associated abbreviations in parentheses (if any). For instance: Generative Adversarial Network (GAN), etc.

Response: Thank you for your reminder; we have modified the keywords.

 

3. All the abbreviations used in the manuscript should be defined at their first appearance.

Response: Thank you very much; we have modified it.

 

4. Figure 1 should be removed from the Introduction and replaced by an appropriate text (description).

Response: We have accepted your suggestion to remove Figure 1 and added a textual description in the introduction.

 

5. Phrases such as “The generator generates generated data…” should be avoided.

Response: Thank you for the reminder. We have carefully checked and revised the language.

 

6. The Authors stated the following: “From the above comparison, although the network with the multi-scale feature fusion module still cannot fully restore the color of the real image, it obviously improves the problem of color leakage that exists in other methods and makes the contour boundary of the object clearer.” This is obvious from Figure 11. Could appropriate filtering or additional application of artificial intelligence improve the colorization?

Response: This is a good question; thank you very much. Using other methods such as appropriate filtering or additional artificial intelligence to improve the colorization is a good suggestion for improving the result. We have done such work as you suggest, but in our research we divided it into two steps. The first step is colorization, and the second is image quality improvement, which has been published as “No-Reference Image Quality Assessment Based on Image Multi-Scale Contour Prediction, Appl. Sci. 2022, 12, 2833. https://doi.org/10.3390/app12062833”. The results show that your suggestion is a feasible way to improve the colorization. We have also added a new discussion section to the article, in which we discuss the issue you raised.

 

7. The Authors also stated the following: “The experiments show that both in terms of subjective human perception and objective evaluation metrics, the method in this paper effectively improves the color leakage and semantic uncertainty of the original network for infrared coloring.” The improvement achieved by colorization cannot be considered effective. Accordingly, in this sentence, it should be said “…the method used significantly improves…” instead of “…the method in this paper effectively improves…”.

Response: Thank you very much for your thoughtful suggestion. It was our oversight during the language check. We have revised the sentence as per your suggestion.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

I would like to thank the authors for their scientific effort in revising the paper.

Reviewer 2 Report

The authors have completed the required revisions, so I suggest accepting the article for publication.
