NGIoU Loss: Generalized Intersection over Union Loss Based on a New Bounding Box Regression
Round 1
Reviewer 1 Report
- In this paper, a new IoU loss function is introduced and tested with three different object detection frameworks. The results indicate a slight improvement over the baseline detectors.
Major:
- The weakness: the paper lacks a review of similar and possibly state-of-the-art approaches, e.g., Softer-NMS [Yihui He et al.], DIoU, etc.
- Also, a previously published review article that could provide useful information concerning the loss function was not addressed: An Updated IoU Loss Function for Bounding Box Regression.
- More discussion is required.
Minor
- Check the citation format. A space is required before the citation, e.g., network structure [1-4].
- Check the citation format. Only the family name should be used; e.g., page 3, line 78: Hamid Rezatofighi et al. should be replaced by Rezatofighi et al. [XXX].
- Check the spelling, e.g., "as see" vs. "as seen".
Author Response
Point 1: The weakness: the paper lacks a review of similar and possibly state-of-the-art approaches, e.g., Softer-NMS [Yihui He et al.], DIoU, etc.
Response 1: A review of similar and state-of-the-art methods has been added to the Related Work section.
Point 2: Also, a previously published review article that could provide useful information concerning the loss function was not addressed: An Updated IoU Loss Function for Bounding Box Regression.
Response 2: We read the article carefully and added a reference to it (Reference 31).
Point 3: More discussion is required.
Response 3: Experiments comparing different loss functions on YOLOv5 have been added.
Point 4: Check the citation format. A space is required before the citation, e.g., network structure [1-4]. Only the family name should be used; e.g., page 3, line 78: Hamid Rezatofighi et al. should be replaced by Rezatofighi et al. [XXX].
Response 4: The citation format has been corrected.
Point 5: Check the spelling, e.g., "as see" vs. "as seen".
Response 5: The spelling has been checked and corrected.
Reviewer 2 Report
- The abstract can include the numerical results obtained.
- The motivations and novelty of this work have to be discussed in more detail.
- A Related Works section can be added to summarize some recent works, such as the following:
Catalysis of neural activation functions: Adaptive feed-forward training for big data applications, Computer Vision and Recognition Systems: Research Innovations and Trends
- The results obtained have to be compared with the recent state of the art.
- Analyse the computational complexity of the proposed approach.
- The authors should enhance the results section by presenting an analysis that includes the authors' inferences on the results obtained.
- What are the threats to validity of the proposed approach?
Author Response
Point 1: The abstract can include the numerical results obtained.
Response 1: The abstract has been revised accordingly.
Point 2: The motivations and novelty of this work have to be discussed in more detail.
Response 2: The DIoU loss involves only the overlapping area and the center distance between the ground-truth box and the predicted box, ignoring the aspect ratios of the two boxes. CIoU describes the aspect ratio as a relative value, which introduces some ambiguity and does not consider the balance of hard samples.
NGIoU adds a new bounding box to the formulation, so the loss remains effective even when the overlap ratio of the two boxes stays unchanged, and the coefficient of the penalty term for the overlapping area is adjusted appropriately.
More details have been added in Section 1.
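As background for the comparison above (a minimal sketch of the standard GIoU and DIoU losses for axis-aligned boxes, not the authors' NGIoU code; function names are illustrative):

```python
def iou_terms(box_a, box_b):
    """Intersection, union, and enclosing-box area for two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    # Smallest axis-aligned box enclosing both input boxes
    ex1, ey1 = min(box_a[0], box_b[0]), min(box_a[1], box_b[1])
    ex2, ey2 = max(box_a[2], box_b[2]), max(box_a[3], box_b[3])
    enclose = (ex2 - ex1) * (ey2 - ey1)
    return inter, union, enclose

def giou_loss(box_a, box_b):
    """GIoU loss: 1 - IoU + (enclosing area not covered by the union) / enclosing area."""
    inter, union, enclose = iou_terms(box_a, box_b)
    return 1.0 - inter / union + (enclose - union) / enclose

def diou_loss(box_a, box_b):
    """DIoU loss: 1 - IoU + squared center distance / squared enclosing-box diagonal."""
    inter, union, _ = iou_terms(box_a, box_b)
    dx = (box_a[0] + box_a[2]) / 2 - (box_b[0] + box_b[2]) / 2
    dy = (box_a[1] + box_a[3]) / 2 - (box_b[1] + box_b[3]) / 2
    ex1, ey1 = min(box_a[0], box_b[0]), min(box_a[1], box_b[1])
    ex2, ey2 = max(box_a[2], box_b[2]), max(box_a[3], box_b[3])
    diag2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    return 1.0 - inter / union + (dx * dx + dy * dy) / diag2
```

Both losses vanish for identical boxes; DIoU's center-distance term is what the response contrasts with aspect-ratio terms (CIoU) and the paper's additional bounding box (NGIoU).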
Point 3: A Related Works section can be added to summarize some recent works, such as the following:
Catalysis of neural activation functions: Adaptive feed-forward training for big data applications,
Response 3: A Related Work section has been added.
Point 4: The results obtained have to be compared with the recent state of the art.
Response 4: Experiments comparing different loss functions on YOLOv5 have been added.
Point 5: Analyse the computational complexity of the proposed approach.
Response 5: Network training with the NGIoU loss function is 49 minutes shorter than with the GIoU loss function, and slightly shorter than with the CIoU loss function.
More details have been added in Section 4 (YOLOv5 Algorithm Based on PASCAL VOC 2007).
Point 6: The authors should enhance the results section by presenting an analysis that includes the authors' inferences on the results obtained.
Response 6: The discussion in the results section has been enhanced.
Point 7: What are the threats to validity of the proposed approach?
Response 7: As can be seen from Table 2, the YOLOv5 model with the NGIoU loss function did not achieve the highest AP when predicting objects in the boat, bottle, sheep, and some other categories. Although the overall AP improved, the improvement from the NGIoU loss function has certain limitations.
Reviewer 3 Report
If possible, add references from 2021-2022.
In Section 2.2, cite the reference number for Hamid Rezatofighi et al.
If possible, at the end of Section 1, discuss what is missing in past work, what is new in this paper, the motivation of this paper, and how it bridges the research gap.
YOLOv4 is a one-stage detector; how can this handle multiple boxes at the same time?
Any plan to incorporate YOLOv5?
From Table 2, when you compare Smooth vs. IoU for AP there is only a little improvement; do you think it makes any significant impact on the overall result with respect to the ground truth?
You are only focusing on 2D images; in most cases, accuracy depends on the object size and how many objects you want to target in the image. Did you take this factor into account?
Author Response
Point 1: If possible, add references from 2021-2022.
Response 1: References from 2021-2022 have been added.
Point 2: In Section 2.2, cite the reference number for Hamid Rezatofighi et al.
Response 2: The citation format has been corrected.
Point 3: If possible, at the end of Section 1, discuss what is missing in past work, what is new in this paper, the motivation of this paper, and how it bridges the research gap.
Response 3: This has been added at the end of Section 1.
Point 4: YOLOv4 is a one-stage detector; how can this handle multiple boxes at the same time?
Any plan to incorporate YOLOv5?
Response 4: In the feature-utilization stage, YOLOv4 extracts three feature layers for object detection, and each feature layer uses three prior boxes of different scales. In the output layer, YOLOv4 obtains the predictions of the three feature layers, decodes them to obtain the center of each prediction box, and then computes the width and height of the prediction box by combining the prior box with the predicted h and w. The final prediction also requires score ranking and non-maximum suppression.
In addition, experiments with different loss functions on YOLOv5 have been added.
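The decoding step described above can be sketched as a standard YOLO-style decode (illustrative only, not the authors' implementation; the function and parameter names are assumptions):

```python
import math

def decode_box(tx, ty, tw, th, cell_x, cell_y, prior_w, prior_h, stride):
    """Decode one raw prediction (tx, ty, tw, th) for one prior box at grid cell
    (cell_x, cell_y) into an absolute (center x, center y, width, height) in pixels."""
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    # Sigmoid keeps the predicted center inside its responsible grid cell;
    # stride is how many input-image pixels one grid cell covers
    bx = (cell_x + sigmoid(tx)) * stride
    by = (cell_y + sigmoid(ty)) * stride
    # Exponential scales the prior (anchor) box to the predicted size
    bw = prior_w * math.exp(tw)
    bh = prior_h * math.exp(th)
    return bx, by, bw, bh
```

Score ranking and non-maximum suppression are then applied to the decoded boxes, as the response notes.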
Point 5: From Table 2, when you compare Smooth vs. IoU for AP there is only a little improvement; do you think it makes any significant impact on the overall result with respect to the ground truth?
Response 5: Although the AP value of IoU is only slightly higher than that of Smooth, Figure 10 shows that IoU has a significant performance advantage over Smooth in multi-object detection.
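For reference, the Smooth (Smooth L1) baseline compared against IoU here is commonly defined per coordinate as follows (a standard definition, not the paper's code; `beta` is the usual threshold parameter):

```python
def smooth_l1(x, beta=1.0):
    """Smooth L1 loss for one coordinate error x: quadratic for |x| < beta,
    linear (with matched value and slope at the boundary) beyond it."""
    ax = abs(x)
    return 0.5 * x * x / beta if ax < beta else ax - 0.5 * beta
```

Unlike IoU-based losses, it treats each coordinate independently, which is one reason IoU-family losses can behave better when boxes overlap in complex multi-object scenes.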
Point 6: You are only focusing on 2D images; in most cases, accuracy depends on the object size and how many objects you want to target in the image. Did you take this factor into account?
Response 6: In general, accuracy depends on whether an object is judged to belong to a certain class and on the predicted score. The object detection model only recognizes the object categories present in the dataset. The NGIoU loss function helps the model identify small or overlapping objects, thereby increasing the number of objects that can be located in the image.
Round 2
Reviewer 2 Report
All the comments are addressed.