Article
Peer-Review Record

DLMFCOS: Efficient Dual-Path Lightweight Module for Fully Convolutional Object Detection

Appl. Sci. 2023, 13(3), 1841; https://doi.org/10.3390/app13031841
by Beomyeon Hwang 1, Sanghun Lee 2,* and Hyunho Han 3
Reviewer 1:
Reviewer 2:
Reviewer 3: Anonymous
Reviewer 4:
Submission received: 27 December 2022 / Revised: 24 January 2023 / Accepted: 30 January 2023 / Published: 31 January 2023
(This article belongs to the Special Issue Learning-Based Object and Pattern Recognition)

Round 1

Reviewer 1 Report

To address the low accuracy and high computational cost of traditional fully convolutional neural networks, this study proposes a novel lightweight-module fully convolutional one-stage detector. In the proposed method, feature loss is minimised by extracting spatial and channel information in parallel and by implementing a bottom-up feature pyramid network that improves the detection of low-level information. Finally, the performance of the proposed method was validated experimentally, with satisfactory results. Overall, the topic of this study is interesting, and the manuscript is well organised and written. Detailed comments are provided below.

1. The main innovations and contributions should be clearly stated in the abstract and introduction.

2. Please broaden and update the literature review on the fundamentals and applications of CNNs, e.g., "Vision-based concrete crack detection using a hybrid framework considering noise effect" and "Torsional capacity evaluation of RC beams using an improved bird swarm algorithm optimised 2D convolutional neural network".

3. How did the authors set the DLMFCOS hyperparameters to achieve optimal prediction performance?

4. More details about the separation of training and test data should be added.

5. How robust is the proposed method against noise effects?

6. More future research should be included in the conclusion.

Author Response

Dear Reviewer

First of all, thank you for reviewing our paper despite its shortcomings.

Thank you for your comments on the paper; my responses to each follow.

Q1. The main innovations and contributions should be well clarified in abstract and introduction.

A1. The abstract and introduction have been revised to state the contributions and main innovations of this paper more clearly.

Q2. Please broaden and update literature review on fundamental and applications of CNN.
E.g. Vision-based concrete crack detection using a hybrid framework considering noise effect. Torsional capacity evaluation of RC beams using an improved bird swarm algorithm optimised 2D convolutional neural network.

A2. While revising the introduction, we reviewed the recommended literature and related work and updated the text accordingly.

Q3. How did the authors set the DLMFCOS hyperparameters to achieve optimal prediction performance?

A3. I added the parameter information that was missing from the paper (epochs and learning-rate scheduler hyperparameters) to Section 4.1.

Q4. More details about the training and test data separation should be added.

A4. I added details about the training and test datasets.

Q5. How about the robustness of the proposed method against noise effects?

A5. In my opinion, the detection results on the PASCAL VOC and MS COCO datasets show that detection performance improves in some cases with overlapping objects, so the proposed method is thought to be more robust against noise than FCOS.

Q6. More future research should be included in conclusion part.

A6. Future work on multiple backbones, anchor-based models, and applications to various industries has been added to the conclusion.

Thanks again for reviewing the paper.

Best regards,

The authors

Author Response File: Author Response.pdf

Reviewer 2 Report

1. The related work section includes only outdated architectures. You can find SOTA methods at least here, https://paperswithcode.com/sota/real-time-object-detection-on-coco, to add to that section.

2. Line 39: it is not clear whether the DLM module architecture is proposed here or taken from elsewhere. In the latter case, please describe its history.

3. Please increase the size of Fig. 1.

4. I cannot find any difference between Fig. 3 and Fig. 4.

5. Fig. 5: it seems that (C) (the circled C in the upper centre) should denote concatenation instead of convolution.

6. The notation of Eq. 1 is not clear, especially "!", which has no explanation. Also, (x) usually denotes element-wise (or other) multiplication, and concatenation is usually written (U), where (.) stands for a circle.

7. Fig. 6: the dilated convolution should show its dilation rate.

8. Fig. 7: please explain the (S) and (x) notation.

9. Eq. 9: please state the condition for each case.

10. Line 219: "Equations ??"

11. Eq. 17: \rho needs to be defined.

12. Tab. 4: please compare your results with the SOTA here: https://paperswithcode.com/sota/real-time-object-detection-on-coco

or https://paperswithcode.com/task/object-detection

You may do so at least theoretically, to suggest the benefits of using your approach.

13. Please also add a recommendation for using your approach compared with the SOTA one.

Author Response

Dear Reviewer

First of all, thank you for reviewing our paper despite its shortcomings.

Thank you for your comments on the paper; my responses to each follow.

Q1. The related work section includes only outdated architectures. You can find SOTA methods at least here, https://paperswithcode.com/sota/real-time-object-detection-on-coco, to add to that section.

A1. As the reviewer suggested, I selected SOTA models evaluated on the COCO minival split used for testing in this paper and compared our method with them.

Q2. Line 39: it is not clear whether the DLM module architecture is proposed here or taken from elsewhere. In the latter case, please describe its history.

A2. The DLM (dual-path lightweight module) is proposed in this paper. I have revised the text to avoid confusion.

Q3. Please increase the size of Fig. 1.

A3. I increased the size of Figure 1 as requested.

Q4. I cannot find any difference between Fig. 3 and Fig. 4.

A4. As you pointed out, Figure 4 showed the wrong image; we have corrected it.

Q5. Fig. 5: it seems that (C) (the circled C in the upper centre) should denote concatenation instead of convolution.

A5. As you pointed out, the caption of Figure 5 was incorrect; we have corrected it.

Q6. The notation of Eq. 1 is not clear, especially "!", which has no explanation. Also, (x) usually denotes element-wise (or other) multiplication, and concatenation is usually written (U), where (.) stands for a circle.

A6. As you pointed out, this was a LaTeX error introduced while writing the paper; it has been corrected and the notation is now explained. The symbol "!" stands for W_1, a 1\times 1 convolution.

Q7. Fig. 6: the dilated convolution should show its dilation rate.

A7. As suggested, we added the kernel size and dilation rate of the dilated depthwise convolution to the caption.
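
The kernel size and dilation rate mentioned in this reply together determine how wide a span a dilated convolution covers. The sketch below uses the standard relation k_eff = k + (k - 1)(d - 1); the function names and the example values are hypothetical, since the actual kernel size and dilation rate used in Fig. 6 are not stated in this record.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span (in input pixels) covered by a dilated kernel along one axis."""
    if kernel_size < 1 or dilation < 1:
        raise ValueError("kernel_size and dilation must both be >= 1")
    # A dilation of d inserts (d - 1) gaps between each pair of adjacent taps.
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def same_padding(kernel_size: int, dilation: int) -> int:
    """Padding per side that keeps the spatial size at stride 1 (odd kernels)."""
    return (effective_kernel_size(kernel_size, dilation) - 1) // 2
```

For example, a 3×3 kernel with a dilation rate of 2 covers the same 5×5 span as a dense 5×5 kernel while keeping only nine weights per channel.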

Q8. Fig. 7: please explain the (S) and (x) notation.

A8. As suggested, explanations of the missing symbols (S) and (x) have been added.

Q9. Eq. 9: please state the condition for each case.

A9. As suggested, we have added the conditions for Eq. 9.

Q10. Line 219: "Equations ??"

A10. The incorrect reference in line 219 has been fixed as you noted.

Q11. Eq. 17: \rho needs to be defined.

A11. As suggested, I changed p to \rho in the formula.

Q12. Tab. 4: please compare your results with the SOTA here: https://paperswithcode.com/sota/real-time-object-detection-on-coco or https://paperswithcode.com/task/object-detection
You may do so at least theoretically, to suggest the benefits of using your approach.

A12. Based on your comments, we have added SOTA models evaluated on the MS COCO dataset to Table 4.

Q13. Please also add a recommendation for using your approach compared with the SOTA one.

A13. When you mention a comparison with SOTA, are you referring to the detection results? Further comments would be appreciated.

Thanks again for reviewing the paper.

Best regards,

The authors

Author Response File: Author Response.pdf

Reviewer 3 Report

The authors have proposed a DLMFCOS object detection network for deep-learning object detection. The manuscript is not acceptable due to the issues mentioned below:

1) Authors should clearly describe what the major contribution of this work is compared to the following works:

- H. Shi, Q. Zhou, Y. Ni, X. Wu, and L. J. Latecki, "DPNET: Dual-Path Network for Efficient Object Detection with Lightweight Self-Attention," arXiv preprint arXiv:2111.00500, 2021.

- Q. Zhou, H. Shi, W. Xiang, B. Kang, X. Wu, and L. J. Latecki, "DPNet: Dual-Path Network for Real-time Object Detection with Lightweight Attention," arXiv preprint arXiv:2209.13933, 2022.

In addition to the above, why is the following related article not mentioned?

- J. Pan, H. Sun, Z. Song, and J. Han, "Dual-Resolution Dual-Path Convolutional Neural Networks for Fast Object Detection," Sensors, vol. 19, no. 14, p. 3111, 2019.

2) It should be logically justified and clearly explained why the addition of a DLM leads to improved detection accuracy.

3) Which performance aspect(s) might be lost as a result of the cross-entropy improvement in Eq. (8)?

4) What does “!_1” mean in Eq. (1)?

5) H × W must be defined.

6) Please provide references for non-original formulas.

7) What values can p take in Eq. (8)? In other words, specify its domain.

8) What is “Centerness”?

9) Eq. (17) is incomprehensible.

10) “The number of parameters (params) was used to measure the amount of computation.” This needs more explanation.

11) “… our DLM FCOS object detection network improved average accuracy by at least 1.5%, with approximately 10% fewer computations. This method makes a great contribution to the field because it finally achieves a good balance between accuracy and efficiency.” First, how is this a balance when accuracy is improved and computation is reduced at the same time?! Second, please avoid giving undue credit to your own work (such as "great contribution") in a scientific article.

12) Some writing comments:

- Check the spacing between characters; for example, lines 29, 32, 168, 214 and ….

- Please pay attention to the correct and appropriate use of a full stop or comma at the end of equations.

- Please pay attention to the start of new paragraphs; for example, line 173 and ...

- Correct lines 95, 198 and 219.

Author Response

Dear Reviewer

First of all, thank you for reviewing our paper despite its shortcomings.

Thank you for your comments on the paper; my responses to each follow.

Q1. Authors should clearly describe what the major contribution of this work is compared to the following works:

- H. Shi, Q. Zhou, Y. Ni, X. Wu, and L. J. Latecki, "DPNET: Dual-Path Network for Efficient Object Detection with Lightweight Self-Attention," arXiv preprint arXiv:2111.00500, 2021.

- Q. Zhou, H. Shi, W. Xiang, B. Kang, X. Wu, and L. J. Latecki, "DPNet: Dual-Path Network for Real-time Object Detection with Lightweight Attention," arXiv preprint arXiv:2209.13933, 2022.

In addition to the above, why is the following related article not mentioned?

- J. Pan, H. Sun, Z. Song, and J. Han, "Dual-Resolution Dual-Path Convolutional Neural Networks for Fast Object Detection," Sensors, vol. 19, no. 14, p. 3111, 2019.

A1. The statement of the main contribution has been revised as requested. The papers listed were not mentioned because the DLM was developed as an improvement on the previously proposed LNFCOS and HISFCOS methods. I had not come across DPNET and DPNet until the reviewer mentioned them, but I have added a short mention of them in the introduction.

Q2. It should be logically justified and clearly explained why the addition of a DLM leads to improved detection accuracy.

A2. The DLM minimizes the amount of computation while minimizing the loss of feature information (low-level information such as edges, as well as semantic information); this explanation has been added.

Q3. Cross-entropy improvement due to Eq. (8) will lead to possible loss of what performance(s)?

A3. With cross-entropy, it is difficult for the network to distinguish between relatively easy and difficult samples. This problem can be mitigated by using a focal loss, which down-weights easy samples.
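
To make the contrast in this reply concrete, the following minimal sketch compares binary cross-entropy with a focal loss for a single prediction. It assumes the commonly used focal-loss form -alpha_t (1 - p_t)^gamma log(p_t) with alpha = 0.25 and gamma = 2.0; these defaults and the function names are assumptions for illustration, and Eq. (8) of the manuscript is not reproduced in this record, so it may differ in detail.

```python
import math


def cross_entropy(p: float, y: int) -> float:
    """Binary cross-entropy for predicted probability p and label y in {0, 1}."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)


def focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Focal loss: cross-entropy scaled by alpha_t * (1 - p_t) ** gamma."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Well-classified samples (p_t close to 1) get a factor close to 0.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

Under these assumed settings, an easy positive with p = 0.9 keeps 0.25% of its cross-entropy value, while a hard positive with p = 0.1 keeps about 20%, so hard samples dominate the total loss.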

Q4. What does “!_1” mean in Eq. (1)?

A4. The LaTeX formula error has been corrected (!_1 = \omega_1, a 1\times 1 convolution).

Q5. H × W must be defined.

A5. The definition of H\times W (H: height, W: width of the input resolution) was added to the paper.

Q6. Please provide references for non-original formulas.

A6. To give an intuitive understanding of the loss function in the paper, the formulas are written out directly. I think this will be helpful when reading the paper.

Q7. What values can p take in Eq. (8)? In other words, specify its domain.

A7. In Equation 8, p is the value predicted by the network. This has been added to the paper.

Q8. What is “Centerness”?

A8. Centerness is a method proposed in the FCOS paper to indicate the center of an object.
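
As a brief illustration of this reply, the sketch below computes the centerness target as defined in the FCOS paper: for a location inside a ground-truth box with distances l, t, r, b to the left, top, right, and bottom sides, centerness is sqrt(min(l, r)/max(l, r) × min(t, b)/max(t, b)). The function name and the handling of degenerate boxes are assumptions made here; whether DLMFCOS uses the definition unchanged is not stated in this record.

```python
import math


def centerness(l: float, t: float, r: float, b: float) -> float:
    """FCOS-style centerness of a location from its distances to the box sides.

    Returns 1.0 at the exact box center and decays towards 0.0 at the border.
    """
    if min(l, t, r, b) < 0:
        raise ValueError("the location must lie inside the box")
    if max(l, r) == 0 or max(t, b) == 0:
        return 0.0  # degenerate box with zero width or height (assumed handling)
    horizontal = min(l, r) / max(l, r)
    vertical = min(t, b) / max(t, b)
    return math.sqrt(horizontal * vertical)
```

In FCOS this score is predicted by a separate branch and multiplied with the classification score at inference, which suppresses low-quality boxes predicted far from an object's center.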

Q9. Eq. (17) is incomprehensible.

A9. As noted in the comments, the error in Eq. 17 has been corrected.

Q10. “The number of parameters (params) was used to measure the amount of computation.” This needs more explanation.

A10. As you commented, it is difficult to express the amount of computation using the number of parameters. Therefore, FLOPs were added to that part to describe the computational cost.
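
The distinction drawn in this reply can be shown with a small calculation. The sketch below counts the weights of a 2D convolution and its multiply-accumulate operations (MACs), which are commonly reported as, or doubled to obtain, FLOPs. The function names and layer sizes are hypothetical and are not taken from the manuscript.

```python
def conv_params(c_in: int, c_out: int, k: int, groups: int = 1, bias: bool = False) -> int:
    """Number of learnable parameters in a k x k convolution."""
    weights = (c_in // groups) * c_out * k * k
    return weights + (c_out if bias else 0)


def conv_macs(c_in: int, c_out: int, k: int, h_out: int, w_out: int, groups: int = 1) -> int:
    """Multiply-accumulate operations: every weight is applied at every output pixel."""
    return (c_in // groups) * c_out * k * k * h_out * w_out


# Same layer, two output resolutions: parameters stay fixed, computation does not.
params = conv_params(64, 128, 3)            # 73,728 weights
macs_small = conv_macs(64, 128, 3, 32, 32)  # 75,497,472 MACs
macs_large = conv_macs(64, 128, 3, 64, 64)  # 301,989,888 MACs
```

The parameter count is independent of the feature-map size, whereas the operation count scales with it, which is why the two metrics can rank models differently.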

Q11 “our DLM FCOS object detection network improved average accuracy by at least 1.5%,
with approximately 10% fewer computations. This method makes a great contribution to the field because it finally achieves a good balance between accuracy and efficiency.”.

- First, how is this a balance in which both accuracy is improved and calculations are reduced?!

- Second, please avoid giving extra credits to your work (such as "great contribution") in a scientific   article.

A11. As the reviewer stated, the amount of computation was reduced and the accuracy improved compared with the baseline FCOS. We therefore described this as achieving a balance between computation and accuracy. In addition, the wording raised in the second point has been deleted.

Q12) Some writing comments:

Observe the space between the characters; for example, lines 29, 32, 168, 214 and ….

Please pay attention to the correct and appropriate use of a dot or comma at the end of the equations.

Please pay attention to the beginning of the new paragraph; For example, line 173 and ...

Correct lines 95, 198 and 219.

A12. As per your comment, we have checked and corrected the relevant parts.

Thanks again for reviewing the paper.

Best regards,

The authors

 

Author Response File: Author Response.pdf

Reviewer 4 Report

The target detection ideas proposed by the authors, such as the dual-channel attention mechanism, lightweight network, and multi-scale features, are essentially conventional ideas in target detection technology. The paper's summary of its innovations is not very accurate.

Specific suggestions are as follows:

1) In the introduction, the citation and review of existing technologies are not comprehensive; it is suggested that the latest technologies in this field be added.

2) In the conclusion, the author mentioned that "In future studies, it will be necessary to test different backbones networks.".

3) In the mathematical formulas, the physical meaning of some variables is not explained; it is suggested that this be added, e.g., for Eq. 13 and Eq. 14.

4) In Section 4.1, Implementation Details, the description is too brief, e.g., regarding the setting of experimental parameters and the selection of network layers. The discussion is also relatively short.

5) In the comparison experiment, only two evaluation indicators were used, which is not very objective. It is suggested that the typical indicators of target detection be added to the comparison.

Author Response

Dear Reviewer

First of all, thank you for reviewing our paper despite its shortcomings.

Thank you for your comments on the paper; my responses to each follow.

Q1 In the introduction, the quotation and review of existing technologies are not comprehensive, and it is suggested to supplement the latest technologies in this field. 

A1. As requested, the contents of the introduction and summary have been corrected and supplemented.

Q2 In the conclusion, the author mentioned that "In future studies, it will be necessary to test different backbones networks."

A2. The text has been revised to state that future research should test various backbone networks (starting with ViT and other backbones that outperform ResNet) and, furthermore, apply the method to various industrial fields.

Q3 In the mathematical formula, some variables do not explain the physical meaning, and it is suggested to supplement them, such as eq.13, eq.14, etc.

A3. Based on your comment, explanations have been added or corrected for expressions such as Equation 13, where the symbols were not sufficiently explained. However, Equation 14 is left as it is because it simply represents the sum of the loss functions.

Q4 4.1. In Implementation Details, the description of details is too simple, such as the setting of experimental parameters, the selection of network layers, etc. At the same time, the discussion part is relatively small

A4 Based on your opinion, we have supplemented the explanation and contents of the relevant part.

Q5 In the comparison experiment, only two evaluation indicators were used for evaluation, which is not very objective. It is suggested to supplement the typical indicators of target detection for comparison test.

A5. As suggested, a per-class comparison on PASCAL VOC was added to the ablation study.

Thanks again for reviewing the paper.

Best regards,

The authors

 

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

see line 219

 

Author Response

Dear Reviewer

First of all, thank you for reviewing our paper despite its shortcomings.

The reviewer's comments have been confirmed and the contents have been corrected.

Thanks again for reviewing the paper.

Best regards,

The authors

Author Response File: Author Response.pdf

Reviewer 3 Report

The manuscript is not acceptable. Authors' responses to comments were not convincing/complete:

1) Repeated comment [due to not receiving a convincing answer]:

Authors should clearly describe what the major contribution of this work is compared to the following works:

- H. Shi, Q. Zhou, Y. Ni, X. Wu, and L. J. Latecki, "DPNET: Dual-Path Network for Efficient Object Detection with Lightweight Self-Attention," arXiv preprint arXiv:2111.00500, 2021.

- Q. Zhou, H. Shi, W. Xiang, B. Kang, X. Wu, and L. J. Latecki, "DPNet: Dual-Path Network for Real-time Object Detection with Lightweight Attention," arXiv preprint arXiv:2209.13933, 2022.

- J. Pan, H. Sun, Z. Song, and J. Han, "Dual-Resolution Dual-Path Convolutional Neural Networks for Fast Object Detection," Sensors, vol. 19, no. 14, p. 3111, 2019.

2) Repeated comment [in this regard, a convincing technical justification with full description should be added to the manuscript]:

It should be logically justified and clearly explained why the addition of a DLM leads to improved detection accuracy.

3) Repeated comment [some formulas need a reference (for example, Eqs. (15)-(18))]:

Please provide references for non-original formulas.

4) Repeated comment [is it real/integer/complex/...?]:

What values can p take in Eq. (8)? In other words, specify its domain.

5) Repeated comment [explanation along with relevant reference should be added to the manuscript]:

What is “Centerness”?

6) Unfixed duplicate writing/grammar points:

- The end of Eq. (18) needs a comma and not a dot. Also, a new paragraph should not start after that (similarly for line 174).

- The sentences in lines 96 and 199 have grammatical/writing problems.

- The equation (15-18) shows precision, recall, AP, and mAP. → The equations (15-18) show precision, recall, AP, and mAP

Author Response

Dear reviewer

Thank you again for reviewing the paper.

Our answers to your comments follow.

Q.1) Repeated comment [due to not receiving a convincing answer]:
Authors should clearly describe what the major contribution of this work is compared to the following works:

H. Shi, Q. Zhou, Y. Ni, X. Wu, and L. J. Latecki, "DPNET: Dual-Path Network for Efficient Object Detection with Lightweight Self-Attention," arXiv preprint arXiv:2111.00500, 2021.

Q. Zhou, H. Shi, W. Xiang, B. Kang, X. Wu, and L. J. Latecki, "DPNet: Dual-Path Network for Real-time Object Detection with Lightweight Attention," arXiv preprint arXiv:2209.13933, 2022.

J. Pan, H. Sun, Z. Song, and J. Han, "Dual-Resolution Dual-Path Convolutional Neural Networks for Fast Object Detection," Sensors, vol. 19, no. 14, p. 3111, 2019.

A.1) I read the papers suggested by the reviewer. All three are based on the single shot multibox detector (SSD), an anchor-based one-stage network.

The method proposed in each paper is as follows.

DualNet (2019) is a lightweight network that proposes a dual-path structure of low and high resolution, using the features of two CNNs (MobileNetv2-SSD and MobileNetv2) at different input resolutions (300 px and 600 px).

DPNet (2021) and DPNet (2022), papers by the same authors, present models for performance-limited devices such as edge devices. They start from the same purpose as DualNet (2019).

DPNet (2021) splits a single backbone network into two paths at 1/8 of the original resolution and processes high-resolution and low-resolution features separately. When extracting features, the features are split into two paths, as in CSP, and computed using depthwise convolutions with low computational complexity, and the spatial and channel features of the input feature map are emphasized with a softmax-based LCAM.

DPNet (2022) has a structure similar to DPNet (2021). Both methods use a backbone and an attention mechanism that operate on two different resolutions (low and high).

The proposed method is similar but takes a different approach. Therefore, a mention of these techniques was added to the part of the introduction that discusses lightweight techniques.

Q.2) Repeated comment [in this regard, a convincing technical justification with full description should be added to the manuscript]:

It should be logically justified and clearly explained why the addition of a DLM leads to improved detection accuracy.

A.2) Compared with FCOS, which consists only of standard convolutions, adding the dual-path lightweight module (DLM) extracts features using methods optimized separately for spatial and channel information. Spatial information is handled by the LSM and channel information by the CA, which extracts the important channel information from the feature maps.


Q.3) Repeated comment [some formulas need a reference (for example, Eqs. (15)-(18))]: Please provide references for non-original formulas.

A.3) A reference was added for formulas (15)-(18); all four are covered by a single paper.
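
Elsewhere in this record, Eqs. (15)-(18) are described as precision, recall, AP, and mAP. The sketch below shows one common way to compute these quantities. It assumes all-point interpolation of the precision-recall curve; the manuscript's exact AP definition (for example, the PASCAL VOC or COCO variant) is not reproduced here, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(hits: Sequence[bool], num_ground_truth: int) -> float:
    """Area under the precision-recall curve for one class.

    `hits` flags each detection as a true positive (True) or a false
    positive (False), with detections sorted by descending confidence.
    """
    if num_ground_truth == 0:
        return 0.0
    precisions: List[float] = []
    recalls: List[float] = []
    tp = fp = 0
    for hit in hits:
        if hit:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)
    # Replace each precision by the maximum precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class: Sequence[float]) -> float:
    """mAP: the unweighted mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class) if ap_per_class else 0.0
```

Matching detections to ground-truth boxes (for example, by an IoU threshold) is assumed to have been done beforehand and is outside the scope of this sketch.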

Q.4) Repeated comment [is it real/integer/complex/...?]:
What values can p take in Eq. (8)? In other words, specify its domain.
A.4) The predicted value p has the domain p \in \mathbb{R}^{H\times W \times C}, where H is the height, W is the width, and C is the number of classes.

Q.5) Repeated comment [explanation along with relevant reference should be added to the manuscript]: What is “Centerness”?
A.5) An explanation of centerness loss has been added.

Q.6) Unfixed duplicate writing/grammar points:

- The end of Eq. (18) needs a comma and not a dot. Also, a new paragraph should not start after that (similarly for line 174).

- The sentences in lines 96 and 199 have grammatical/writing problems.

- The equation (15-18) shows precision, recall, AP, and mAP. → The equations (15-18) show precision, recall, AP, and mAP

A.6) I checked the content and corrected the grammar.

Author Response File: Author Response.pdf

Reviewer 4 Report

The authors have addressed all my comments. One suggestion: use the present tense instead of the past tense in the summary.

Author Response

Dear Reviewer

First of all, thank you for reviewing our paper despite its shortcomings.

The reviewer's comments have been confirmed and the contents have been corrected.

Thanks again for reviewing the paper.

Best regards,

The authors

Author Response File: Author Response.pdf
