Peer-Review Record

Object Detection Network Based on Feature Fusion and Attention Mechanism

Future Internet 2019, 11(1), 9;
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Received: 9 November 2018 / Revised: 20 December 2018 / Accepted: 25 December 2018 / Published: 2 January 2019
(This article belongs to the Special Issue Innovative Topologies and Algorithms for Neural Networks)

Round 1

Reviewer 1 Report

This paper presents an object detection network with feature fusion and an attention mechanism to obtain better CNN features and better performance on small objects. It is quite meaningful for improving deep learning efficiency because it offers another possibility for feature extraction and integration. The content of each section and the comparison process appear persuasive. However, the abstract and Section 3.4 should be written in clear, easy-to-understand text so that the research can be verified by others. The structure of the article and the research methods are narrated very clearly, but it is less certain whether some detailed parts of the research results (e.g., the bottom-up top-down structure and the sampling strategies for subsampling) can be reproduced. Overall, the professionalism and creativity of this article are worthy of recognition. It would be a good article for publication after minor revision. The following concerns need to be addressed:


Question & Comment1:

The section on previous research is very complete, and you have already presented a constructive revision that enhances features of interest and weakens background interference. I think it would be interesting to explain the "visual attention mechanism" in more detail, or to add some comparison information to the discussion section. This would help readers quickly understand the contribution of your research.


Question & Comment2:

This study does not carefully explore whether different initial window sizes for feature extraction, or the arrangement of the convolution, ReLU, and pooling layers, affect the results. A CNN has certain gray-box characteristics, so the uncertainty of the research should be mentioned.


Question & Comment3:

I am not sure whether your method can be applied directly to CCD video, without correction and noise removal, and still show promising results. This matters because unoptimized video is very commonly used directly in fast assessment for video-based behavior recognition. I suggest defining the applicable materials (e.g., image quality, frame size, and spatial resolution) in more detail.


Question & Comment4:

In your research, you mentioned that the attention mechanism with a weight mask can enhance the features of interest and weaken the irrelevant features in the CNN feature map. I am very curious whether the "feed-forward sweep and top-down feedback" will be affected by the density of objects, the overlapping of objects, and the image distortion of objects.
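To make the weight-mask idea in this comment concrete, here is a minimal sketch. It is not the authors' implementation: it assumes a single-channel 2-D feature map and a mask obtained by passing hypothetical per-position logits through a sigmoid, and the names `apply_attention_mask`, `features`, and `logits` are illustrative only.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def apply_attention_mask(feature_map, mask_logits):
    """Reweight a 2-D feature map element-wise by a sigmoid mask.

    Positions with large positive logits keep most of their activation;
    positions with large negative logits are suppressed toward zero.
    """
    same_rows = len(feature_map) == len(mask_logits)
    same_cols = all(len(f) == len(m) for f, m in zip(feature_map, mask_logits))
    if not (same_rows and same_cols):
        raise ValueError("feature map and mask must have the same shape")
    return [
        [f * sigmoid(m) for f, m in zip(f_row, m_row)]
        for f_row, m_row in zip(feature_map, mask_logits)
    ]


# Hypothetical values chosen only for illustration.
features = [[1.0, 2.0], [3.0, 4.0]]
logits = [[4.0, -4.0], [0.0, 4.0]]
weighted = apply_attention_mask(features, logits)
```

In this toy setting, dense or overlapping objects would correspond to neighbouring high-weight regions in the mask that merge into one another, which is one way to picture the concern raised in this question.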


Question & Comment5:

Feature convolution and pooling can solve the problem of knowledge adaptation across multiple scene datasets, and they reduce the network parameters and the spatiotemporal size of the image representations. However, image distortion and shadow effects seem to be a problem that cannot be ignored, with great influence on feature recognition and image classification. If the training samples taken from the original images vary strongly (e.g., light changes, shape changes, image distortion), is it possible to correct this problem automatically within the research process?


Question & Comment6:

Pixel-based classification and object-based classification are the two main basic methods applied in VHR or HD imagery classification. Is it possible to apply object-based segmentation or detection in the first stage of your workflow? I think it would help to speed up testing.


Question & Comment7:

The research process and the formula for updating and checking classification objects are very creative and seem reasonable. However, I suggest adding further explanation of the effect of forward and reverse verification, the commission error and omission error, and the decision-making accuracy.
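For readers unfamiliar with the terms, commission and omission error are commonly computed from detection counts as sketched below. This is a generic illustration using the usual definitions (commission error as false positives over all reported detections, omission error as false negatives over all true objects); the function name and the counts are hypothetical and do not come from the paper.

```python
def commission_omission(tp, fp, fn):
    """Return (commission_error, omission_error) from detection counts.

    tp: correctly detected objects (true positives)
    fp: detections with no matching object (false positives)
    fn: objects that were missed (false negatives)
    """
    detected = tp + fp  # everything the detector reported
    actual = tp + fn    # everything that was really there
    commission = fp / detected if detected else 0.0
    omission = fn / actual if actual else 0.0
    return commission, omission


# Hypothetical counts chosen only for illustration.
commission, omission = commission_omission(tp=80, fp=20, fn=10)
```

Commission error is the complement of precision and omission error is the complement of recall, so reporting both alongside overall accuracy shows whether a detector errs by over-detecting or by missing objects.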


Author Response


Author Response File: Author Response.pdf

Reviewer 2 Report

The paper proposes a deep architecture for object detection, combining attention modules and feature fusion.

The language of the paper is very poor and the paper contains many typing errors, so some sentences are difficult to understand. Acronyms are not defined, nor are some symbols in the formulas. The method proposed by the authors is not well described and their choices are not justified; it is therefore hard to evaluate the scientific soundness of the method. However, the results seem to be good enough, even if more examples are needed to confirm this. I suggest a thorough revision of the paper, adding more results to give evidence of the method's suitability. The state of the art could also be improved by adding more details and comments. The text in the figures is too small and the captions are too short. A detailed description of each block would help, along with a final summary of the proposed algorithms.

Author Response


Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The paper shows some improvements, but some of my previous concerns still hold:

-characters are still too small in most figures

-spaces, fonts and the use of capital letters must be carefully revised

-figure captions are still too short and do not make the meaning of each block understandable

-formula formatting is very poor

Author Response


Author Response File: Author Response.pdf
