On the Application of DiffusionDet to Automatic Car Damage Detection and Classification via High-Performance Computing
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This manuscript introduces an enhanced iteration of the car damage claim management system. The overarching framework of the system has been previously disclosed in earlier publications. The principal advancement highlighted in this study is the implementation of a more extensive and intricate deep learning model, facilitated by high-performance computing (HPC) resources. This model is constructed upon a diffusion architecture and a Swin transformer backbone. While there is a noticeable improvement in performance attributable to the increased model size, the originality of the model does not meet the standards required by this journal. It would be more appropriately published as a technical report.
Author Response
Please find the authors' reply to the reviewer in the attached letter.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
This paper presents an enhanced version of Insoore AI for automatic car damage detection and classification in the insurance claim management process. Leveraging the DiffusionDet architecture with a Swin Transformer backbone and the high-performance computing (HPC) resources of the Leonardo HPC system's Booster module, the study aims to overcome the limitations of previous methods. The authors first recap the previous Insoore AI pipeline, which used the Faster R-CNN architecture and faced challenges in handling complex damages. Then, they introduce DiffusionDet, a generative AI-based object detection framework. It formulates object detection as a denoising process, with forward and reverse diffusion steps, and has components like an image encoder and a detection decoder. The enhanced Insoore AI pipeline works by gathering vehicle images, detecting damage, segmenting car parts, mapping damage to parts, calculating the relative damage area, classifying severity, and deciding on repair or replacement. HPC resources are crucial for training GenAI-based deep learning models. Benchmarking shows that the Leonardo Booster setup offers significant training speed improvements compared to a standard configuration. The experiments use the same dataset as the previous work, with annotations for four damage classes. Performance metrics such as AP, AP50, and AP75 are used for evaluation. The results demonstrate a substantial improvement in performance, especially in AP50, which increased from 30.45 to 38.87. Future work will focus on further architectural optimizations and alternative feature extraction techniques.
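For readers unfamiliar with the metrics named in this report: under the usual COCO-style convention, AP50 and AP75 are average precision computed with an intersection-over-union (IoU) matching threshold of 0.50 and 0.75 respectively. The following is a minimal sketch of that matching test only, not the authors' evaluation code. The function names and the example boxes are hypothetical and chosen purely for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlapping rectangle (zero if disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold):
    """A prediction counts as a hit when its IoU reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold


# Hypothetical boxes: the prediction is shifted 2 px down from the ground truth.
ground_truth = (0.0, 0.0, 10.0, 10.0)
prediction = (0.0, 2.0, 10.0, 12.0)

print(is_match(prediction, ground_truth, 0.50))  # counted under the AP50 threshold
print(is_match(prediction, ground_truth, 0.75))  # not counted under the AP75 threshold
```

The same prediction can therefore contribute to AP50 while being rejected at the stricter AP75 threshold, which is one reason the two figures can differ widely.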
Author Response
Please find the authors' reply to the reviewer in the attached letter.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
This manuscript describes an improvement to the authors' previously published tool for automatic car damage recognition and localization. The improvement is due to the use of the LEONARDO HPC system, and the results obtained are reasonable. However, the reviewer would like to suggest the following points for improving this manuscript.
1. Abstract
Reference numbers such as [1] and [2] are usually not included in the abstract, because the abstract is often extracted and used on its own. Please remove the reference numbers.
2. Figure 1:
This figure is not referred to in the manuscript. Although it shows only the basics of the diffusion model, please refer to it in the text and describe it using the symbols that appear in the figure.
3. Figure 2:
Please explain this figure in much more detail. For example, state whether, in each pair, the left image is the original and the right image is the recognition result. What do the shapes and colors of the recognized areas mean? Are the recognition results satisfactory?
4. Figure 3:
The explanation of this figure is insufficient. The details may have been described in the previous paper dealing with Insoore AI, but the manuscript should be self-contained.
5. p.8, line 4 from the bottom:
The reviewer does not understand the statement "Approximately 120,000 images were collected." Where are these images used? Are they used to obtain the results shown in Figure 4? Please describe the relationship between "the test set includes 540 annotations extracted from 326 images" and these 120,000 images.
6. line 3, in Appendix I:
There may be a typographical error: "segmentation mapping, All experiments".
7. Absolute evaluation of precision:
As described in "7. Conclusions and Future Work," a 27.65% improvement has been attained. However, the reviewer wonders whether the new results are satisfactory in absolute terms. Please describe how good the improved results are in practical use.
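As a quick check of the figure quoted in point 7: the 27.65% value is consistent with the relative change in AP50 reported elsewhere in this review record (30.45 to 38.87). The sketch below assumes, without confirmation from the manuscript, that this is how the percentage was derived. The function name is hypothetical.

```python
def relative_improvement(old, new):
    """Relative change from old to new, expressed as a percentage of old."""
    return (new - old) / old * 100.0


# AP50 values as quoted in the Reviewer 2 report of this record.
ap50_before = 30.45
ap50_after = 38.87

gain = relative_improvement(ap50_before, ap50_after)
print(f"{gain:.2f}%")  # prints "27.65%"
```

Note that this is a relative gain. The absolute gain is 8.42 AP50 points, and the resulting AP50 of 38.87 is the figure that bears on the reviewer's question about practical adequacy.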
Author Response
Please find the authors' reply to the reviewer in the attached letter.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
I maintain that the originality of the manuscript is constrained. Both the diffusion model and the Swin Transformer are resource-intensive regarding time and computational power. The authors have merely implemented these two techniques in the context of car damage detection and classification. This application represents an adaptation rather than a novel contribution to the field. Consequently, the work would be more appropriately classified as a technical report rather than a research paper.
Author Response
We thank the reviewer for the suggestions and insights. Please find attached our reply to the reviewer.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The paper can be accepted.
Author Response
We thank you very much for the positive evaluation of our revised manuscript.