Peer-Review Record

FDCNet: Frontend-Backend Fusion Dilated Network Through Channel-Attention Mechanism

Appl. Sci. 2019, 9(17), 3466; https://doi.org/10.3390/app9173466
by Yuqian Zhang, Guohui Li *, Jun Lei and Jiayu He
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 2 August 2019 / Revised: 17 August 2019 / Accepted: 19 August 2019 / Published: 22 August 2019
(This article belongs to the Section Computing and Artificial Intelligence)

Round 1

Reviewer 1 Report

The authors proposed a frontend-backend fusion dilated network (FDCNet) model for crowd counting. They combined the front-end feature map with the back-end feature map to fuse features of various scales through a channel-attention block, and used dilated layers to obtain a high-quality density map. The authors adopted an SSIM-based loss function to compare the local correlation between the estimated density map and the ground truth. The model was verified on four common datasets and compared with four other models.
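The density-map formulation summarised above can be illustrated with a minimal sketch. It assumes the common convention of placing one normalised Gaussian at each annotated head, so that summing the map gives the crowd count. The review does not state how FDCNet's ground truth is generated; the function name, the (row, col) annotation format, and the fixed sigma below are hypothetical choices.

```python
import math

def gaussian_density_map(heads, height, width, sigma=4.0):
    """Build a density map whose sum equals the number of annotated heads.

    heads: list of (row, col) head positions (hypothetical annotation format).
    sigma: fixed Gaussian spread in pixels (an assumed value, not the paper's).
    """
    density = [[0.0] * width for _ in range(height)]
    for r0, c0 in heads:
        # Evaluate an unnormalised Gaussian centred on this head.
        kernel = [[math.exp(-((r - r0) ** 2 + (c - c0) ** 2) / (2 * sigma ** 2))
                   for c in range(width)] for r in range(height)]
        # Renormalise inside the image so each head contributes exactly 1,
        # even when part of the Gaussian falls outside the borders.
        total = sum(map(sum, kernel))
        for r in range(height):
            for c in range(width):
                density[r][c] += kernel[r][c] / total
    return density

heads = [(5, 5), (10, 20), (30, 28), (0, 0)]
dmap = gaussian_density_map(heads, height=32, width=32)
print(round(sum(map(sum, dmap)), 6))  # 4.0, i.e. the count of annotated heads
```

A counting network trained against such maps predicts a map of the same kind, and the estimated count is simply the sum of the predicted map.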

This paper is well written. The experiments are properly conducted and the results are well explained. The results are reasonable and convincing. The references are adequate and up to date.

However, there are many grammatical errors. I suggest the authors consult a native English speaker to improve the paper.

In Section 2 ("Related work"), the numbering of the subsections is incorrect.

Line 213: Conv1_2, Conv2_2,... are not clearly defined.
Line 218: The terms concatenate and fusion are not clearly defined.
Line 241: "SE" is not specified.
Line 258: not a formal equation
Line 316: the 2-norm formula is not formal.
Line 344: the summation index from t to M is incorrect.
Line 405: "column" is incorrect.
Line 503: "Patents" information?
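Two of the points above (the undefined "concatenate"/"fusion" terms and the unexplained "SE") can be made concrete. Assuming "SE" refers to the widely used squeeze-and-excitation channel-attention block, and assuming "concatenate" means stacking feature maps along the channel axis, a pure-Python sketch of concatenating a front-end and a back-end feature map and then reweighting the channels might look like this. The helper names and weights are hypothetical, biases are omitted, and this is not the paper's exact block.

```python
import math

def concat_channels(front, back):
    """Channel-wise concatenation: C1 + C2 channels of the same H x W size."""
    assert len(front[0]) == len(back[0]) and len(front[0][0]) == len(back[0][0])
    return front + back

def se_block(features, w1, w2):
    """Squeeze-and-excitation style channel attention (sketch, no biases).

    features: list of C channels, each an H x W list of lists.
    w1: (C/r) x C weights of the reduction layer.
    w2: C x (C/r) weights of the expansion layer.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in features]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives per-channel weights in (0, 1).
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    scale = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Recalibrate: multiply every value of channel k by its weight scale[k].
    return [[[v * s for v in line] for line in ch]
            for ch, s in zip(features, scale)]

front = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [1.0, 0.0]]]  # 2 channels, 2x2
back = [[[2.0, 2.0], [2.0, 2.0]], [[4.0, 0.0], [0.0, 4.0]]]   # 2 channels, 2x2
fused = concat_channels(front, back)                          # 4 channels
w1 = [[0.5, 0.0, -0.5, 0.1], [0.0, 0.2, 0.3, 0.0]]            # 4 -> 2
w2 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5], [0.3, -0.3]]       # 2 -> 4
out = se_block(fused, w1, w2)
print(len(out), len(out[0]), len(out[0][0]))  # 4 2 2
```

"Fusion" could equally denote element-wise addition of equally shaped maps; concatenation is shown here only because it is the reading the reviewer's wording suggests.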

I recommend the paper be accepted with minor revision.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Generally, the paper is properly written; it includes all the necessary parts (introduction, overview, description, experiments), and the authors' contribution is clear. The obtained results show the importance of the solution; on the other hand, the source code is missing, and the authors should include a link to it in the paper. There are also several issues, listed below:


Introduction: "For example, two 3×3 convolution kernels are equivalent to one 5×5 convolution kernel and three 3×3 convolution kernels are equivalent to a 7×7 convolution kernels" is misleading. You should emphasize that you mean, e.g., two succeeding 3×3 kernels, not two 3×3 kernels from the same scale (layer/level).
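The reviewer's point is that the equivalence concerns the receptive field of stacked (succeeding) layers. Assuming stride-1 convolutions, each additional k×k layer widens the receptive field by k - 1 pixels, and a dilation d enlarges a single kernel's span to k + (k - 1)(d - 1). The helper name below is hypothetical.

```python
def stacked_receptive_field(num_layers, kernel=3, dilation=1):
    """Receptive field of num_layers stride-1 convolutions applied in sequence."""
    effective = kernel + (kernel - 1) * (dilation - 1)  # span of one (dilated) kernel
    return 1 + num_layers * (effective - 1)

print(stacked_receptive_field(2))              # 5: two succeeding 3x3 layers
print(stacked_receptive_field(3))              # 7: three succeeding 3x3 layers
print(stacked_receptive_field(1, dilation=2))  # 5: one 3x3 kernel, dilation 2
# Weights per input/output channel pair: 2 * 3 * 3 = 18 versus 5 * 5 = 25.
```

Two 3×3 kernels placed side by side in the same layer would each still see only a 3×3 patch, which is why the distinction matters.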

The introduction is clearly written, and the contributions are described well. The section "related work" is expressive enough, so I do not have critical remarks here.

Section 3:
The frontend-backend fusion is not a new enough idea to warrant such a long description as the one given in Section 3.1. This part of the architecture is the same as in the well-known U-Net.

Figure 2: you should split the image into two separate images; there is no need to have them concatenated.

Formula 6: it does not seem to be correct. Please check the signs.

Your SSIM definition is confusing. First, Formula 12 defines it as an index for two images (density maps), but Formula 13 defines it as a function of one variable. Furthermore, is the variable t supposed to be a particular pixel? That contradicts the claim that SSIM is computed over the whole image, not a single pixel. Please make this clearer.
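One common way to reconcile the two readings is to compute SSIM locally in a sliding window (so that t indexes a window position rather than a single pixel) and then average the local values into one score per image pair. The sketch below rests on stated assumptions: a uniform 3×3 window instead of the usual Gaussian window, constants c1 = (0.01)^2 and c2 = (0.03)^2 for an assumed dynamic range of 1, and a loss of 1 minus the mean SSIM. Whether this matches Formulas 12 and 13 of the paper cannot be determined from the review.

```python
def _window_stats(x, y):
    """Means, variances and covariance of two equally sized windows."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) * (a - mx) for a in x) / n
    vy = sum((b - my) * (b - my) for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return mx, my, vx, vy, cov

def ssim_map(est, gt, win=3, c1=1e-4, c2=9e-4):
    """Local SSIM at every valid position of a uniform win x win window."""
    h, w = len(est), len(est[0])
    out = []
    for r in range(h - win + 1):
        row = []
        for c in range(w - win + 1):
            x = [est[i][j] for i in range(r, r + win) for j in range(c, c + win)]
            y = [gt[i][j] for i in range(r, r + win) for j in range(c, c + win)]
            mx, my, vx, vy, cov = _window_stats(x, y)
            row.append(((2 * mx * my + c1) * (2 * cov + c2)) /
                       ((mx * mx + my * my + c1) * (vx + vy + c2)))
        out.append(row)
    return out

def ssim_loss(est, gt, **kw):
    """Image-level loss: 1 minus the mean of the local SSIM map."""
    m = ssim_map(est, gt, **kw)
    return 1.0 - sum(map(sum, m)) / (len(m) * len(m[0]))

gt = [[0.0, 0.1, 0.2, 0.1, 0.0],
      [0.1, 0.3, 0.5, 0.3, 0.1],
      [0.2, 0.5, 0.9, 0.5, 0.2],
      [0.1, 0.3, 0.5, 0.3, 0.1],
      [0.0, 0.1, 0.2, 0.1, 0.0]]
est = [row[:] for row in gt]
est[2][2] += 0.5                # perturb the centre of the estimate
print(ssim_loss(gt, gt))        # 0.0 for identical maps
print(ssim_loss(est, gt))       # positive once the maps differ
```

Under this reading, each local value is a function of one window position, while the score that enters the loss is a single number for the whole image pair.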

The "Experiments" section includes a dummy sentence from the template: "Authors should discuss the results and how they can be interpreted in perspective of previous studies and of the working hypotheses. The findings and their implications should be discussed in the broadest context possible. Future research directions may also be highlighted". Delete it.

Section 4.1: what is the motivation for using Adam with a learning rate of 1e-6, when it is usually used with 1e-3 or 1e-4? That seems very low and may also explain why you used 2000 epochs. Note that such a low learning rate may not be able to reach the best possible loss.
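The concern can be illustrated on a toy problem. Because Adam divides the gradient by its running root-mean-square, each update moves a parameter by roughly the learning rate, so the distance covered in a fixed number of steps scales with it. The sketch below minimises the hypothetical loss f(w) = (w - 3)^2 from w = 0 for 1000 steps with the default beta1 = 0.9 and beta2 = 0.999; it is not the paper's training setup.

```python
import math

def adam_minimise(lr, steps, w=0.0, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimise the toy loss f(w) = (w - 3)^2 with Adam and return the final w."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = 2.0 * (w - 3.0)                   # gradient of the toy loss
        m = beta1 * m + (1 - beta1) * g       # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g   # second-moment estimate
        m_hat = m / (1 - beta1 ** t)          # bias corrections
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

for lr in (1e-3, 1e-4, 1e-6):
    print(lr, adam_minimise(lr, steps=1000))
```

With these settings w moves on the order of 1 for lr = 1e-3 but only on the order of 1e-3 for lr = 1e-6, so reaching the optimum at w = 3 would take roughly a thousand times more steps.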

Figure 3: in the description, you use the word "column" instead of "row".

The experiments are OK.

Author Response

Thank you for reviewing our paper and putting forward valuable comments. Please see the attachment for a point-by-point response to the comments.

Author Response File: Author Response.pdf

Reviewer 3 Report

Generally, I miss a deeper discussion and conclusion, which are fundamental parts of an academic article. Please extend them.

The authors mention "density map" as an expected output several times in the abstract and introduction, but there are no map outputs in the article. The partial images in Figure 3 cannot be considered a "map"; they should be called a "data preview", as they do not meet any map conventions (title, legend, values, etc.).

Furthermore, I miss a discussion of this preview: what do the colours/intervals mean, how were the colours/intervals calculated, and are the values relative or absolute, i.e., can we compare values across images or not?
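The relative-versus-absolute question matters because the normalisation applied before colouring determines whether colours can be compared across images. A minimal sketch with hypothetical helper names and toy values: per-image scaling gives every map's peak the same colour regardless of crowd size, whereas a shared scale preserves the differences between images.

```python
def normalise_per_image(dmap):
    """Relative display: each map is scaled by its own maximum."""
    peak = max(map(max, dmap)) or 1.0  # guard against an all-zero map
    return [[v / peak for v in row] for row in dmap]

def normalise_shared(dmaps):
    """Absolute display: all maps share one scale, so colours are comparable."""
    peak = max(max(map(max, d)) for d in dmaps) or 1.0
    return [[[v / peak for v in row] for row in d] for d in dmaps]

sparse = [[0.0, 0.1], [0.1, 0.0]]  # toy map of a sparse scene
dense = [[0.4, 0.8], [0.8, 0.4]]   # toy map of a dense scene
print(normalise_per_image(sparse), normalise_per_image(dense))  # both peak at 1.0
print(normalise_shared([sparse, dense]))  # sparse peaks at 0.125, dense at 1.0
```

A caption stating which of the two scalings was used, plus a colour bar with values, would answer the reviewer's question directly.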

Figure 3: check and correct "column" vs. "row".

Author Response

Thank you for reviewing our paper and putting forward valuable comments. Please see the attachment for a point-by-point response to the comments.

Author Response File: Author Response.pdf
