NGIoU Loss: Generalized Intersection over Union Loss Based on a New Bounding Box Regression
Round 1
Reviewer 1 Report
- In this paper, a new IoU loss function is introduced and tested with three different object detection frameworks. The results indicate a slight improvement over the baseline detectors.
Major:
- The weakness: the paper lacks a review of similar and possibly state-of-the-art approaches, e.g., Softer-NMS [Yihui He et al.], DIoU, etc.
- Also, a previously published review article that could provide useful information concerning the loss function was not addressed: An Updated IoU Loss Function for Bounding Box Regression.
- More discussion is required.
Minor
- Check the citation format. A space is required before the citation, e.g., network structure [1-4].
- Check the citation format. Only the family name should be used; e.g., page 3, line 78: Hamid Rezatofighi et al. should be replaced by Rezatofighi et al. [XXX].
- Check the spelling, e.g., "as see" vs. "as seen".
Author Response
Point 1: The weakness: the paper lacks a review of similar and possibly state-of-the-art approaches, e.g., Softer-NMS [Yihui He et al.], DIoU, etc.
Response 1: A review of similar and state-of-the-art methods has been added to the Related Work section.
Point 2: Also, a previously published review article that could provide useful information concerning the loss function was not addressed: An Updated IoU Loss Function for Bounding Box Regression.
Response 2: We read the article carefully and added a reference to it (Reference 31).
Point 3: More discussion is required.
Response 3: Experiments comparing different loss functions on YOLOv5 have been added.
Point 4: Check the citation format. A space is required before the citation, e.g., network structure [1-4]. Only the family name should be used; e.g., page 3, line 78: Hamid Rezatofighi et al. should be replaced by Rezatofighi et al. [XXX].
Response 4: The citation format has been corrected.
Point 5: Check the spelling, e.g., "as see" vs. "as seen".
Response 5: The spelling has been checked and corrected.
Reviewer 2 Report
- The abstract can include the numerical results obtained.
- The motivations and novelty of this work have to be discussed in more detail.
- A Related Works section can be added to summarize some recent works, such as the following:
Catalysis of neural activation functions: Adaptive feed-forward training for big data applications, Computer Vision and Recognition Systems: Research Innovations and Trends
- The results obtained have to be compared with the recent state of the art.
- Analyse the computational complexity of the proposed approach.
- The authors should enhance the results section by presenting an analysis that includes the authors' inferences on the results obtained.
- What are the threats to validity of the proposed approach?
Author Response
Point 1: The abstract can include the numerical results obtained.
Response 1: The abstract has been revised accordingly.
Point 2: The motivations and novelty of this work have to be discussed in more detail.
Response 2: The DIoU loss involves only the overlapping area and the center distance between the ground-truth box and the predicted box, ignoring the aspect ratios of the two boxes. CIoU describes the aspect ratio as a relative value, which introduces some ambiguity and does not consider the balance of hard samples.
NGIoU adds a new bounding box to the formulation, so the loss remains effective even when the overlap ratio of the two boxes stays unchanged, and the coefficient of the penalty term for the overlapping area is adjusted appropriately.
More details have been added in Section 1.
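As background for the comparison above (a minimal sketch of the standard GIoU and DIoU losses for axis-aligned boxes, not the authors' NGIoU code; function names are illustrative):

```python
def iou_terms(box_a, box_b):
    """Intersection, union, and enclosing-box area for two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    # Smallest axis-aligned box enclosing both input boxes
    ex1, ey1 = min(box_a[0], box_b[0]), min(box_a[1], box_b[1])
    ex2, ey2 = max(box_a[2], box_b[2]), max(box_a[3], box_b[3])
    enclose = (ex2 - ex1) * (ey2 - ey1)
    return inter, union, enclose

def giou_loss(box_a, box_b):
    """GIoU loss: 1 - IoU + (enclosing area not covered by the union) / enclosing area."""
    inter, union, enclose = iou_terms(box_a, box_b)
    return 1.0 - inter / union + (enclose - union) / enclose

def diou_loss(box_a, box_b):
    """DIoU loss: 1 - IoU + squared center distance / squared enclosing-box diagonal."""
    inter, union, _ = iou_terms(box_a, box_b)
    dx = (box_a[0] + box_a[2]) / 2 - (box_b[0] + box_b[2]) / 2
    dy = (box_a[1] + box_a[3]) / 2 - (box_b[1] + box_b[3]) / 2
    ex1, ey1 = min(box_a[0], box_b[0]), min(box_a[1], box_b[1])
    ex2, ey2 = max(box_a[2], box_b[2]), max(box_a[3], box_b[3])
    diag2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    return 1.0 - inter / union + (dx * dx + dy * dy) / diag2
```

Both losses vanish for identical boxes; DIoU's center-distance term is what the response contrasts with aspect-ratio terms (CIoU) and the paper's additional bounding box (NGIoU).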
Point 3: A Related Works section can be added to summarize some recent works, such as the following:
Catalysis of neural activation functions: Adaptive feed-forward training for big data applications,
Response 3: A Related Work section has been added.
Point 4: The results obtained have to be compared with the recent state of the art.
Response 4: Experiments comparing different loss functions on YOLOv5 have been added.
Point 5: Analyse the computational complexity of the proposed approach.
Response 5: Network training with the NGIoU loss function is 49 minutes shorter than with the GIoU loss function, and slightly shorter than with the CIoU loss function.
More details have been added in Section 4 (YOLOv5 Algorithm Based on PASCAL VOC 2007).
Point 6: The authors should enhance the results section by presenting an analysis that includes the authors' inferences on the results obtained.
Response 6: The discussion in the results section has been enhanced.
Point 7: What are the threats to validity of the proposed approach?
Response 7: As can be seen from Table 2, the YOLOv5 model with the NGIoU loss function did not achieve the highest AP when predicting objects in the boat, bottle, sheep, and some other categories. Although the overall AP improved, the improvement from the NGIoU loss function has certain limitations.
Reviewer 3 Report
If possible, add references from 2021-2022.
In Section 2.2, cite the reference number for Hamid Rezatofighi et al.
If possible, at the end of Section 1, discuss what is missing in past work, what is new in this paper, the motivation of this paper, and how it bridges the research gap.
YOLOv4 is a one-stage detector; how can this handle multiple boxes at the same time?
Any plan to incorporate YOLOv5?
From Table 2, when you compare Smooth vs. IoU for AP there is only a little improvement; do you think it makes any significant impact on the overall result with respect to the ground truth?
You are only focusing on 2D images; in most cases, accuracy depends on the object size and how many objects you want to target in the image. Did you take this factor into account?
Author Response
Point 1: If possible, add references from 2021-2022.
Response 1: References from 2021-2022 have been added.
Point 2: In Section 2.2, cite the reference number for Hamid Rezatofighi et al.
Response 2: The citation format has been corrected.
Point 3: If possible, at the end of Section 1, discuss what is missing in past work, what is new in this paper, the motivation of this paper, and how it bridges the research gap.
Response 3: This has been added at the end of Section 1.
Point 4: YOLOv4 is a one-stage detector; how can this handle multiple boxes at the same time?
Any plan to incorporate YOLOv5?
Response 4: In the feature-utilization stage, YOLOv4 extracts three feature layers for object detection, and each feature layer uses three prior boxes of different scales. In the output layer, YOLOv4 obtains the predictions of the three feature layers, decodes them to obtain the center of each prediction box, and then computes the width and height of the prediction box by combining the prior box with the predicted h and w. The final prediction also requires score ranking and non-maximum suppression.
In addition, experiments with different loss functions on YOLOv5 have been added.
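The decoding step described above can be sketched as a standard YOLO-style decode (illustrative only, not the authors' implementation; the function and parameter names are assumptions):

```python
import math

def decode_box(tx, ty, tw, th, cell_x, cell_y, prior_w, prior_h, stride):
    """Decode one raw prediction (tx, ty, tw, th) for one prior box at grid cell
    (cell_x, cell_y) into an absolute (center x, center y, width, height) in pixels."""
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    # Sigmoid keeps the predicted center inside its responsible grid cell;
    # stride is how many input-image pixels one grid cell covers
    bx = (cell_x + sigmoid(tx)) * stride
    by = (cell_y + sigmoid(ty)) * stride
    # Exponential scales the prior (anchor) box to the predicted size
    bw = prior_w * math.exp(tw)
    bh = prior_h * math.exp(th)
    return bx, by, bw, bh
```

Score ranking and non-maximum suppression are then applied to the decoded boxes, as the response notes.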
Point 5: From Table 2, when you compare Smooth vs. IoU for AP there is only a little improvement; do you think it makes any significant impact on the overall result with respect to the ground truth?
Response 5: Although the AP value of IoU is only slightly higher than that of Smooth, Figure 10 shows that IoU has a significant performance advantage over Smooth in multi-object detection.
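For reference, the Smooth (Smooth L1) baseline compared against IoU here is commonly defined per coordinate as follows (a standard definition, not the paper's code; `beta` is the usual threshold parameter):

```python
def smooth_l1(x, beta=1.0):
    """Smooth L1 loss for one coordinate error x: quadratic for |x| < beta,
    linear (with matched value and slope at the boundary) beyond it."""
    ax = abs(x)
    return 0.5 * x * x / beta if ax < beta else ax - 0.5 * beta
```

Unlike IoU-based losses, it treats each coordinate independently, which is one reason IoU-family losses can behave better when boxes overlap in complex multi-object scenes.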
Point 6: You are only focusing on 2D images; in most cases, accuracy depends on the object size and how many objects you want to target in the image. Did you take this factor into account?
Response 6: In general, accuracy depends on whether an object is judged to belong to a certain class and on the predicted score. The object detection model only recognizes the object categories present in the dataset. The NGIoU loss function helps the model identify small or overlapping objects, thereby increasing the number of objects that can be located in the image.
Round 2
Reviewer 2 Report
All the comments are addressed.