Article
Peer-Review Record

SFPFusion: An Improved Vision Transformer Combining Super Feature Attention and Wavelet-Guided Pooling for Infrared and Visible Images Fusion

Sensors 2023, 23(18), 7870; https://doi.org/10.3390/s23187870
by Hui Li *,†, Yongbiao Xiao, Chunyang Cheng and Xiaoning Song
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 10 August 2023 / Revised: 8 September 2023 / Accepted: 11 September 2023 / Published: 13 September 2023
(This article belongs to the Special Issue AI-Driven Sensing for Image Processing and Recognition)

Round 1

Reviewer 1 Report

The proposed approach is novel in both contribution and methodology, but revisions to the technical details are needed before publication, and the paper's organization can be improved. The following comments address these technical details.

1. How did you select the tradeoff value between Lpixel and Lssim in Eq. 12? Did you use the same tradeoff value in all experiments?

2. Did you implement all of the compared methods in Table 3 yourselves? If not, add the related references to the table; if so, discuss the initial parameters of the compared methods.

3. In some applications, fusion must be performed in real time, so please briefly discuss the runtime of the proposed approach.

4. What is the meaning of the sentence "As mentioned in Section 3.1.2, the extracted detail information is sparse and the effect is significant" in Section 4.3.5? Please explain it more clearly.

5. The proposed approach could be widely used in image retrieval systems as a preprocessing step. For example, the paper titled "Innovative local texture descriptor in joint of human-based color features for content-based image retrieval" is closely related. Cite this paper and briefly discuss potential future work.

6. Add some examples of the fused output images.

7. Which kind of pooling did you use in the Super Feature Pooling Transformer module, and why?

Please review the whole text for possible English grammar mistakes and typing errors.
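For context on comment 1, the tradeoff in question weights a pixel-level term against a structural-similarity term. Below is a minimal sketch of such a combined loss, assuming the common form L = L_ssim + λ·L_pixel with a simplified global (single-window) SSIM; the names `fusion_loss` and `ssim_global`, the value of λ, and the exact form of the paper's Eq. 12 are assumptions for illustration, not taken from the manuscript:

```python
import numpy as np

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Simplified global (single-window) SSIM for images scaled to [0, 1].
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def fusion_loss(fused, source, lam=10.0):
    # L = L_ssim + lam * L_pixel, where L_ssim = 1 - SSIM and
    # L_pixel is the mean absolute pixel difference; lam is the
    # tradeoff value the reviewer asks about.
    l_ssim = 1.0 - ssim_global(fused, source)
    l_pixel = np.abs(fused - source).mean()
    return l_ssim + lam * l_pixel

img = np.random.default_rng(0).random((32, 32))
assert abs(fusion_loss(img, img)) < 1e-9  # identical images: near-zero loss
```

In practice the tradeoff weight is usually fixed once by a grid search on a validation split and then kept constant across all experiments, which is presumably what the reviewer wants confirmed.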

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors propose a novel vision-transformer-based fusion network that combines wavelet-guided pooling with a super feature attention mechanism to obtain effective global features while enhancing detail information. The topic of the manuscript is interesting and the paper is well organized. However, similar works have been published in this field in recent years that the authors do not mention, so the novelty and contribution relative to them are unclear. In addition, the quality of the writing is poor. Therefore, I cannot recommend acceptance.

1) The manuscript does not mention the following closely related works, which makes its novelty questionable.

- D. Rao, T. Xu, and X.-J. Wu, "TGFuse: An infrared and visible image fusion approach based on transformer and generative adversarial network," IEEE Transactions on Image Processing, 2023.

- T. Yao, Y. Pan, Y. Li, C.-W. Ngo, and T. Mei, "Wave-ViT: Unifying wavelet and transformers for visual representation learning," in European Conference on Computer Vision, 2022: Springer, pp. 328-345.

2) Please avoid using terms like "new" and "novel" in the title.

3) Provide references for the metrics used.

4) The manuscript contains several writing errors; only some of them, from the abstract, are listed below. The wording throughout the manuscript should be copy-edited by a professional editor.

- Infrared → The infrared

- which preserves → that preserves

- networks(CNN) → networks (CNNs)

- obtains → obtain

- to extract → for extracting

- Transformer based → transformer-based

- task and other down-stream → tasks and other downstream

- into fusion → to the fusion

- termed as SFPFusion → termed SFPFusion

- state-of-the-arts → state-of-the-art

5) Please check the spacing between characters throughout the article; for example: Transformer[42] → Transformer [42], …

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Most of the comments have been addressed by the authors in the revised version, which is better than the original submission in terms of paper organization and technical details.

Reviewer 2 Report

I have no further comment.
