Article
Peer-Review Record

TCE-YOLOv5: Lightweight Automatic Driving Object Detection Algorithm Based on YOLOv5

by Han Wang 1, Zhenwei Yang 2,*, Qiaoshou Liu 2, Qiang Zhang 1 and Honggang Wang 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Reviewer 4:
Appl. Sci. 2025, 15(11), 6018; https://doi.org/10.3390/app15116018
Submission received: 10 January 2025 / Revised: 27 April 2025 / Accepted: 30 April 2025 / Published: 27 May 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This manuscript (applsci-3445975) proposes a lightweight object detection algorithm based on YOLOv5 to optimise autonomous driving by reducing network parameters while maintaining accuracy. By modifying the C3 module with Bottleneck convolution grouping, integrating Res2Net for multi-scale feature extraction, and introducing the EIOU loss function for precise box overlap measurement, the model reduces the number of parameters by 20%, computational cost by 21%, and improves mAP0.5 by 1.0%. After TensorRT optimisation, it achieves an inference speed of 61 FPS on the Jetson Xavier NX, representing a 15% increase over the original YOLOv5. In its current form, the manuscript is not suitable for publication. The abstract is unclear and lacks a structured presentation of the authors' work. The introduction fails to provide a current and exploratory overview of the topic, missing essential elements to contextualise the reader. Moreover, it does not present a hypothesis to be tested. The materials and methods section is insufficiently informative, lacking key details necessary for proper characterisation and reproducibility of the study. The results and discussion sections are written in a way that does not adequately reflect the data. Figures, tables, and captions are simplistic and do not comprehensively describe the elements they present. Furthermore, the manuscript lacks an appropriate discussion supported by relevant references.

Comments on the Quality of English Language

The grammar, spelling, and verbosity need to be checked by a native speaker.

Author Response

Reviewer # 1’s, comment 1): 

This manuscript (applsci-3445975) proposes a lightweight object detection algorithm based on YOLOv5 to optimise autonomous driving by reducing network parameters while maintaining accuracy. By modifying the C3 module with Bottleneck convolution grouping, integrating Res2Net for multi-scale feature extraction, and introducing the EIOU loss function for precise box overlap measurement, the model reduces the number of parameters by 20%, computational cost by 21%, and improves mAP0.5 by 1.0%. After TensorRT optimisation, it achieves an inference speed of 61 FPS on the Jetson Xavier NX, representing a 15% increase over the original YOLOv5. In its current form, the manuscript is not suitable for publication. The abstract is unclear and lacks a structured presentation of the authors' work. The introduction fails to provide a current and exploratory overview of the topic, missing essential elements to contextualise the reader. Moreover, it does not present a hypothesis to be tested. The materials and methods section is insufficiently informative, lacking key details necessary for proper characterisation and reproducibility of the study. The results and discussion sections are written in a way that does not adequately reflect the data. Figures, tables, and captions are simplistic and do not comprehensively describe the elements they present. Furthermore, the manuscript lacks an appropriate discussion supported by relevant references.

Authors response:

Thank you very much for your comments, which will further improve our article. After carefully considering your comments and those of the other reviewers, we have revised our manuscript, especially Sec. 1 Introduction and Sec. 2 Related Work. We have restructured these two parts and added some new references. In Sec. 1 Introduction, we mainly introduce the important role of object detection and the development of artificial intelligence and its application to object detection. We have added an introduction to the latest versions of YOLO (YOLOv8 [19] and YOLOv10 [20]). In Sec. 2 Related Work, we focus on the application of deep neural networks, especially the YOLO series, in autonomous driving; by comparing and analyzing these references, we explain why we chose YOLOv5 to modify for autonomous driving. In Sec. 3.4 C3Res2Net, we give more details about modifying the Bottleneck. In Sec. 4.4 RESULT ANALYSIS, we conducted a quantitative analysis of replacing the C3 module with C3Res2Net in Table 7. We carefully checked our manuscript against your comments and those of the other reviewers, and then made corresponding modifications to the tables, figures, and text. We have provided a diff file for all the changes, and the changes in it are highlighted.

 

Reviewer 2 Report

Comments and Suggestions for Authors

- What is the significant contribution of replacing the C3 module in the neck with the Res2Net module, relative to other published works where these modules are used in similar situations?
- Complement with current publications where the proposed topic is addressed, with current versions of YOLO.
- Why use YOLOv5, if there is a current version of YOLO, and current works where the topic is similar, and with performances that are superior or similar to those presented?
- Table 1 presents a performance analysis of different versions of YOLO, but the results are not tested under the same conditions, since YOLOv5 shows 7 parameters and the most current version in that table is YOLOv8. So the results tend to justify YOLOv5 with higher performance. This represents a substantial part of section 4.4. RESULT ANALYSIS.

Author Response

Reviewer # 2’s, comment 1): What is the significant contribution of replacing the C3 module in the neck with the Res2Net module, relative to other published works where these modules are used in similar situations?

Authors response: Thank you for your comment. The significant contribution of replacing the C3 module in the neck with the Res2Net module is the enhanced extraction of multi-scale features, which is analyzed in 3.4 C3Res2Net. In 4.4 RESULT ANALYSIS, we conducted a quantitative analysis, as shown in Table 7 (resubmission version), where it can be observed that after the replacement the accuracies for small, medium, and large objects all improved. We have provided a diff file for all the changes, and the changes in it are highlighted.
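
For readers unfamiliar with the Res2Net idea, the sketch below illustrates how splitting the channels into groups and filtering them hierarchically yields multi-scale receptive fields within a single block. It is a minimal PyTorch-style illustration only; the class name, the number of scales, and the SiLU activation are our assumptions and do not reproduce the exact C3Res2Net module of the manuscript.

```python
import torch
import torch.nn as nn

class Res2Block(nn.Module):
    """Illustrative Res2Net-style block: input channels are split into `scales`
    groups and filtered hierarchically, so each group sees a progressively
    larger receptive field. Not the manuscript's exact C3Res2Net definition."""
    def __init__(self, channels: int, scales: int = 4):
        super().__init__()
        assert channels % scales == 0, "channels must be divisible by scales"
        self.scales = scales
        width = channels // scales
        # one 3x3 conv per split except the first, which passes through unchanged
        self.convs = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(width, width, 3, padding=1, bias=False),
                nn.BatchNorm2d(width),
                nn.SiLU(),
            )
            for _ in range(scales - 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        splits = torch.chunk(x, self.scales, dim=1)
        out = [splits[0]]                      # identity split
        prev = None
        for i, conv in enumerate(self.convs):
            y = splits[i + 1] if prev is None else splits[i + 1] + prev
            prev = conv(y)                     # receptive field grows with i
            out.append(prev)
        return torch.cat(out, dim=1)           # multi-scale features re-fused

if __name__ == "__main__":
    x = torch.randn(1, 64, 80, 80)
    print(Res2Block(64)(x).shape)              # torch.Size([1, 64, 80, 80])
```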

Reviewer # 2’s, comment 2): Complement with current publications where the proposed topic is addressed, with current versions of YOLO.

Authors response: Thank you for your comment. In the Introduction, we have included introductions to YOLOv7 ([19] in the resubmission version), YOLOv8 ([20] in the resubmission version), and YOLOv10 ([21] in the resubmission version). In Related Work, we have incorporated discussions and analyses of other relevant literature that improves upon the YOLO series. Finally, we have clarified why YOLOv8 is not suitable for our autonomous driving scenarios and tasks. We have provided a diff file for all the changes, and the changes in it are highlighted.

Reviewer # 2’s, comment 3): Why use YOLOv5, if there is a current version of YOLO, and current works where the topic is similar, and with performances that are superior or similar to those presented?

Authors response: Thank you for your comment. To further elaborate on why we use YOLOv5, we have incorporated YOLOv8 ([20] in the resubmission version) into the Introduction and conducted a comparative analysis in Related Work, clarifying that our choice of YOLOv5 is due to its excellent balance between accuracy and speed, which enables real-time detection across various hardware platforms. We have provided a diff file for all the changes, and the changes in it are highlighted.

Reviewer # 2’s, comment 4): Table 1 presents a performance analysis of different versions of YOLO, but the results are not tested under the same conditions, since YOLOv5 shows 7 parameters and the most current version in that table is YOLOv8. So the results tend to justify YOLOv5 with higher performance. This represents a substantial part of section 4.4. RESULT ANALYSIS.

Authors response: Thank you for your comment. All our tests were conducted under the same conditions; in particular, for the tests based on the Jetson Xavier NX, all models were accelerated with TensorRT. According to the test results (Table 2 to Table 5), compared to the original YOLOv5, the TCE-YOLOv5 proposed in this manuscript reduces the parameter count from 7 M to 5.7 M and the FLOPs from 16 G to 12.6 G. These two metrics are the lowest among all compared versions. Although its performance metrics are not the highest, they are basically on par with the original YOLOv5, and some are even improved. Therefore, it is the most balanced model and the most suitable for deployment in resource-limited in-vehicle environments.
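
As an aside, parameter and FLOP figures of the kind quoted above can be reproduced with a few lines of PyTorch. The snippet below is an illustrative sketch that uses the public YOLOv5s hub model as a stand-in; it is not the exact measurement script used for TCE-YOLOv5, and the thop profiler mentioned in the comments is assumed to be installed.

```python
import torch

# Illustrative parameter/FLOP check; the hub model is a stand-in for the
# networks compared in the tables, not the TCE-YOLOv5 weights themselves.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s',
                       pretrained=False, autoshape=False)
n_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {n_params / 1e6:.2f} M")   # YOLOv5s is roughly 7 M

# FLOPs can be estimated with a profiler such as thop (assumed installed):
#   from thop import profile
#   macs, _ = profile(model, inputs=(torch.randn(1, 3, 640, 640),))
#   print(f"FLOPs: {2 * macs / 1e9:.1f} G")    # FLOPs are roughly 2 x MACs
```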

Reviewer 3 Report

Comments and Suggestions for Authors

The paper proposes TCE-YOLOv5, a lightweight object detection algorithm optimized for autonomous driving by reducing network parameters and computational complexity. It enhances YOLOv5 by introducing T-C3 and replacing the C3 module in the neck with Res2Net. However, some corrections and improvements should be considered as follows:

  1. The Introduction section mainly discusses YOLOv5 and its lower versions, without mentioning more recent advancements in object detection models.
  2. Some training parameters, such as batch size, number of epochs, etc. are missing and should be reported.
  3. The abbreviated parameters in tables, such as ‘P’  and ‘R’ , should be defined in the table captions or explained in the text.
  4. The novelties and contributions of the proposed algorithm should be more clearly explained.
  5. The text in Figures 8 and 9 is difficult to read and should be improved for better clarity.
  6. Several figures are not described or referenced in the text. All figures should be mentioned and properly explained in the main text. For example, Figures 8 and 9 should be discussed explicitly, and labels should indicate which plot belongs to which algorithm. Additionally, training plots for all compared algorithms and datasets could be included for better analysis.
  7. The Conclusion section repeats the results without deeper insights into why TCE-YOLOv5 is a significant improvement. The impact of its modifications should be better justified.
  8. In Figure 3, the text states: "Fig. 3(a) is the original Bottleneck, and Fig. 3(b) …", but the figure does not explicitly label (a) and (b). This should be corrected.
  9. Equations that are not originally derived by the authors should include proper citations.
Comments on the Quality of English Language
  1. The paper should be revised to correct typos and grammatical errors, such as: "The algorithm presented in this paper are related to YOLOv5", which should be corrected to "The algorithm presented in this paper is related to YOLOv5."

Author Response

Reviewer # 3’s, comment 1): The Introduction section mainly discusses YOLOv5 and its lower versions, without mentioning more recent advancements in object detection models.

Authors response: Thank you for your comment. In the Introduction, we have included introductions to YOLOv7 ([19] in the resubmission version), YOLOv8 ([20] in the resubmission version), and YOLOv10 ([21] in the resubmission version). In Related Work, we have incorporated discussions and analyses of other relevant literature that improves upon the YOLO series. Finally, we have clarified why YOLOv8 is not suitable for our autonomous driving scenarios and tasks. We have provided a diff file for all the changes, and the changes in it are highlighted.

Reviewer # 3’s, comment 2): Some training parameters, such as batch size, number of epochs, etc. are missing and should be reported.

Authors response: We sincerely apologize for not providing sufficient information. We have added "The batch size is set to 32, and the number of epochs is 150." in the 4.1. EXPERIMENT SETTINGS section. We have provided a diff file for all the changes, and the changes in it are highlighted.

Reviewer # 3’s, comment 3): The abbreviated parameters in tables, such as ‘P’  and ‘R’ , should be defined in the table captions or explained in the text.

Authors response: We sincerely apologize for not providing sufficient information. In 4.3 EVALUATION METRICS, we have added "P (Precision), R (Recall), and..." as an explanation.

Reviewer # 3’s, comment 4): The novelties and contributions of the proposed algorithm should be more clearly explained.

Authors response: Thank you for your comment. The innovations and contributions of this paper are summarized at the end of the Related Work section. The primary contribution of this manuscript is the adoption of T-C3 to reduce model parameters and computational load. In the neck part, C3Res2Net is utilized to facilitate deep interactions between features. The introduction of EIOU provides a more accurate measurement of the overlap between the predicted bounding box and the ground-truth box. To better acquaint readers with the rationale behind choosing YOLOv5, we have included discussions of YOLOv8 ([20] in the resubmission version) and newer versions in both the Introduction and Related Work sections, analyzing and summarizing the reasons for selecting YOLOv5 in the field of autonomous driving. Furthermore, in Section 3 Method, we provide detailed explanations of the improvements made to YOLOv5 in this manuscript. We have provided a diff file for all the changes, and the changes in it are highlighted.
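
To make the first contribution more concrete, the sketch below shows the general idea of a grouped Bottleneck: applying the 3x3 convolution with a group count greater than one divides its parameters and computation roughly by that count. This is an illustrative sketch with hypothetical names, not the exact T-C3 definition in the manuscript.

```python
import torch
import torch.nn as nn

def conv_bn_act(c_in, c_out, k, groups=1):
    """Conv + BN + SiLU, the basic YOLOv5-style building block."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, padding=k // 2, groups=groups, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(),
    )

class GroupedBottleneck(nn.Module):
    """Illustrative grouped Bottleneck in the spirit of T-C3: the 3x3 conv is
    split into `groups` groups, dividing its parameters and FLOPs by roughly
    the group count. Channel count must be divisible by `groups`."""
    def __init__(self, channels: int, groups: int = 2, shortcut: bool = True):
        super().__init__()
        self.cv1 = conv_bn_act(channels, channels, 1)
        self.cv2 = conv_bn_act(channels, channels, 3, groups=groups)
        self.add = shortcut

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.cv2(self.cv1(x))
        return x + y if self.add else y

def count_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

if __name__ == "__main__":
    dense = GroupedBottleneck(128, groups=1)
    grouped = GroupedBottleneck(128, groups=4)
    print(count_params(dense), count_params(grouped))  # 3x3 conv shrinks ~4x
```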

Reviewer # 3’s, comment 5): The text in Figures 8 and 9 is difficult to read and should be improved for better clarity.

Authors response: We apologize for this. During the revision process we discussed and researched this point. The core theme of this manuscript is model light-weighting. While loss graphs are helpful for understanding the training process, they are not directly related to the theme of light-weighting: light-weighting primarily focuses on model compression and acceleration in the inference process, whereas loss graphs mainly reflect the fit on the training set. After careful consideration, we have decided to remove the loss graphs.

Reviewer # 3’s, comment 6): Several figures are not described or referenced in the text. All figures should be mentioned and properly explained in the main text. For example, Figures 8 and 9 should be discussed explicitly, and labels should indicate which plot belongs to which algorithm. Additionally, training plots for all compared algorithms and datasets could be included for better analysis.

Authors response: We apologize for this. During the revision process we discussed and researched this point. The core theme of this manuscript is model light-weighting. While loss graphs are helpful for understanding the training process, they are not directly related to the theme of light-weighting: light-weighting primarily focuses on model compression and acceleration, whereas loss graphs mainly reflect the fit on the training set. After careful consideration, we have decided to remove the loss graphs.

Reviewer # 3’s, comment 7): The Conclusion section repeats the results without deeper insights into why TCE-YOLOv5 is a significant improvement. The impact of its modifications should be better justified.

Authors response: Thank you for your comment. We have revised the Conclusion section to provide a detailed description of each improvement and its impact. The experimental results have also been analyzed to verify the effectiveness of the improvements. Finally, we have also outlined our future work, which includes further enhancing the model's accuracy and increasing the number of real-world testing scenarios.

Reviewer # 3’s, comment 8): In Figure 3, the text states: "Fig. 3(a) is the original Bottleneck, and Fig. 3(b) …", but the figure does not explicitly label (a) and (b). This should be corrected.

Authors response: We apologize for such mistakes. We have annotated Figure 3 accordingly.

Reviewer # 3’s, comment 9): Equations that are not originally derived by the authors should include proper citations.

Authors response: Thank you for your comment. We previously cited the formulas for GIOU and DIOU, and we have now also included a citation for the CIOU formula. As for the other uncited formulas, such as Eq. (8), Eq. (9), Eq. (10), and Eq. (11), we consider them common formulas within the field and therefore did not provide citations.
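
For completeness, the EIOU loss as commonly stated in the literature (our equation numbering in the manuscript may differ) is:

```latex
\mathcal{L}_{EIOU} \;=\; 1 - IoU
  \;+\; \frac{\rho^{2}\!\left(\mathbf{b}, \mathbf{b}^{gt}\right)}{c^{2}}
  \;+\; \frac{\rho^{2}\!\left(w, w^{gt}\right)}{C_{w}^{2}}
  \;+\; \frac{\rho^{2}\!\left(h, h^{gt}\right)}{C_{h}^{2}}
```

where ρ(·,·) is the Euclidean distance, b and b^gt are the centers of the predicted and ground-truth boxes, c is the diagonal length of the smallest box enclosing both, and C_w, C_h are that enclosing box's width and height. The last two terms penalize width and height differences directly, which is the property that distinguishes EIOU from CIOU.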

Reviewer # 3’s, comment 10): The paper should be revised to correct typos and grammatical errors, such as: "The algorithm presented in this paper are related to YOLOv5", which should be corrected to "The algorithm presented in this paper is related to YOLOv5."

Authors response: We apologize for such mistakes. We have changed "The algorithm presented in this paper are related to YOLOv5" in section 3.2 to "The algorithm presented in this paper is related to YOLOv5."

Reviewer 4 Report

Comments and Suggestions for Authors

A lightweight object detection algorithm based on YOLOv5 is proposed to solve the problem of excessive network parameters in automatic driving scenarios. The work is well-structured.

Some suggestions:

Creating a list of abbreviations at the end would facilitate reading.

The text mentions that "innovative optimization strategies" have been implemented. Are these strategies detailed in the article? Can the reader clearly understand what these optimizations are?

The article mentions "feature extraction at different scales" with Res2Net. Are there quantitative comparisons showing the impact of this modification?

Are the presented results sufficient to prove the superiority of the model? 

Are there tests in different autonomous driving scenarios, or were the experiments limited to the CCTDSB2021 and KITTI datasets?

The statement "we significantly reduced the parameter count"—is this quantified? What is the percentage reduction or absolute number of parameters?

The article mentions a 15% increase in inference speed on the Jetson Xavier NX platform. Is this improvement solely due to model optimization, or are other factors involved?

The conclusion reinforces the advantages of TCE-YOLOv5 but does not mention possible limitations or future research directions. Including this could provide a more balanced closing.

Author Response

Reviewer # 4’s, comment 1): Creating a list of abbreviations at the end would facilitate reading.

Authors response: Thank you for your suggestion. A list of abbreviations does indeed aid reading. We have added a table of abbreviations (Table 1 in the resubmission version) at the beginning of Sec. 3 METHOD.

Reviewer # 4’s, comment 2): The text mentions that "innovative optimization strategies" have been implemented. Are these strategies detailed in the article? Can the reader clearly understand what these optimizations are?

Authors response: Thank you for your comment. The innovations and contributions of this paper are summarized at the end of the Related Work section. The primary contribution of this manuscript is the adoption of T-C3 to reduce model parameters and computational load. In the neck part, C3Res2Net is utilized to facilitate deep interactions between features. The introduction of EIOU provides a more accurate measurement of the overlap between the predicted bounding box and the ground-truth box. To better acquaint readers with the rationale behind choosing YOLOv5, we have included discussions of YOLOv8 ([20] in the resubmission version) and newer versions in both the Introduction and Related Work sections, analyzing and summarizing the reasons for selecting YOLOv5 in the field of autonomous driving. Furthermore, in Sec. 3 Method, we provide detailed explanations of the improvements made to YOLOv5 in this manuscript. We have provided a diff file for all the changes, and the changes in it are highlighted.

Reviewer # 4’s, comment 3): The article mentions "feature extraction at different scales" with Res2Net. Are there quantitative comparisons showing the impact of this modification?

Authors response: Thank you for your comment. To illustrate Res2Net's ability to extract features at different scales, we conducted a quantitative analysis in 4.4 RESULT ANALYSIS, comparing the detection accuracy of YOLOv5 with the C3Res2Net substitution against the original YOLOv5 for small, medium, and large targets. The analysis is presented in Table 7 (resubmission version). It can be observed that the accuracy for small, medium, and large targets has improved to some extent after the substitution. We have provided a diff file for all the changes, and the changes in it are highlighted.
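
The size-wise accuracies in Table 7 follow the usual COCO convention of reporting AP separately for small, medium, and large objects. A minimal sketch of how such numbers can be obtained with pycocotools is given below, assuming the annotations and predictions have been converted to COCO format; the file names are placeholders, not our actual evaluation paths.

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Illustrative size-wise AP evaluation (AP_small / AP_medium / AP_large).
coco_gt = COCO("instances_val.json")            # ground-truth annotations (placeholder)
coco_dt = coco_gt.loadRes("detections.json")    # model predictions (placeholder)

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()   # prints AP, AP50, AP75 and AP for small/medium/large objects
```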

Reviewer # 4’s, comment 4): Are the presented results sufficient to prove the superiority of the model? Are there tests in different autonomous driving scenarios, or were the experiments limited to the CCTDSB2021 and KITTI datasets?

Authors response: Thank you for your comment. We believe that the presented results demonstrate the superiority of our model. In this preliminary work, our tests were limited to the CCTDSB2021 and KITTI datasets, but these cover a wide range of scenarios. In future work, we will conduct tests in real-world scenarios. Moreover, the State Key Laboratory of Intelligent Vehicle Safety Technology has the capability to provide sufficient real-world testing scenarios.

Reviewer # 4’s, comment 5): The statement "we significantly reduced the parameter count"—is this quantified? What is the percentage reduction or absolute number of parameters?

Authors response: Thank you for your comment. This is quantified in 4.4 RESULT ANALYSIS: there is an 18.5% reduction in the number of parameters compared to the original YOLOv5. However, we agree that the wording was inappropriate, as an 18.5% reduction in parameters cannot be described as "significant". Therefore, we have revised it to "we reduced the parameter count" in the Conclusions. We have provided a diff file for all the changes, and the changes in it are highlighted.
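
Using the rounded figures quoted earlier in this response letter (7 M parameters for the original YOLOv5 versus 5.7 M for TCE-YOLOv5), the reduction works out as:

```latex
\frac{7.0\,\mathrm{M} - 5.7\,\mathrm{M}}{7.0\,\mathrm{M}} \approx 0.186 \quad (\text{about } 18.6\%)
```

The small difference from the 18.5% quoted above presumably comes from rounding the parameter counts to 7 M and 5.7 M.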

Reviewer # 4’s, comment 6): The article mentions a 15% increase in inference speed on the Jetson Xavier NX platform. Is this improvement solely due to model optimization, or are other factors involved?

Authors response: Thank you for your comment. To provide a clearer description of the experimental environment, we have added the following to the end of Section 4.1 EXPERIMENT SETTINGS: "All models were trained on the same PC. Similarly, when deployed on the Jetson Xavier NX, all models were quantized using the TensorRT method." Therefore, the increase in inference speed is attributed solely to model optimization. We have provided a diff file for all the changes, and the changes in it are highlighted.
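
For context, the typical deployment path described here (train on a PC, then build a TensorRT engine on the Jetson) can be sketched as follows. The weight source, file names, opset version, and the FP16 flag are illustrative assumptions, not the exact commands used in our experiments.

```python
import torch

# Illustrative export path: PyTorch model -> ONNX on the training PC.
# The hub model stands in for the trained network; file names are placeholders.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s',
                       pretrained=True, autoshape=False).eval()
dummy = torch.zeros(1, 3, 640, 640)
torch.onnx.export(model, dummy, "model.onnx", opset_version=12,
                  input_names=["images"], output_names=["output"])

# On the Jetson Xavier NX, the ONNX file is then converted to a TensorRT engine,
# for example with the trtexec tool bundled with TensorRT:
#   trtexec --onnx=model.onnx --saveEngine=model.engine --fp16
```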

Reviewer # 4’s, comment 7): The conclusion reinforces the advantages of TCE-YOLOv5 but does not mention possible limitations or future research directions. Including this could provide a more balanced closing.

Authors response: Thank you for your comment. TCE-YOLOv5 is mainly designed for autonomous driving, where it provides accurate, real-time object detection. However, when applied to complex scenarios outside resource-limited edge-computing settings such as autonomous driving, its performance may drop due to differences in environmental conditions and object types. In such scenarios, a newer YOLO version, such as YOLOv8 or YOLOv11, may be a better choice.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I thank the authors for addressing my previous comments. However, the revisions made are still insufficient to make the manuscript suitable for publication. Important elements are still lacking, such as conclusions, appropriate statistical analyses, and other aspects I had previously noted. Although the authors responded to these points, they were not fully incorporated into the manuscript.

Author Response

Reviewer # 1’s, comment 1): I thank the authors for addressing my previous comments. However, the revisions made are still insufficient to make the manuscript suitable for publication. Important elements are still lacking, such as conclusions, appropriate statistical analyses, and other aspects I had previously noted. Although the authors responded to these points, they were not fully incorporated into the manuscript.

Authors response: Thank you for your comment. In response to your suggestions, we have made every effort to revise the manuscript. To further clarify the rationale for replacing the C3 modules with T-C3 in the backbone network, we added "Parameters count and computational load of the network model are mainly concentrated in backbone network." in Section 3.3, and added "Parameters count is reduced by about 38% and the amount of computation cost is reduced by about 44%." in Section 3.4. To further illustrate replacing C3 with C3Res2Net in the neck network, we added "However, to align with the improvements made..." in Section 3.5. To verify the validity of replacing C3 with C3Res2Net, we added Table 2 (resubmission version) and the sentence "C3 and C3Res2Net were compared and the results are shown in Table 2..." in Section 3.5; the results are analyzed and explained there. To further explain why EIOU is superior to CIOU, we added "EIOU explicitly incorporates the differences in width and height..." in Section 3.6. We summarized Table 5 (resubmission version) and Table 6 (resubmission version) to further illustrate the performance of TCE-YOLOv5 on edge devices relative to the other two models, and in Section 4.4 we added "Based on the analysis of Table 5 and Table 6, it can be concluded that...". We further analyzed Fig. 8 and added "Meanwhile, it can be seen that the detection accuracy of TCE-YOLOv5..." in Section 4.4 to confirm the conclusion of Table 2 (Table 7 in the old version). In the Conclusion, we deleted the redundancy and focused on the improvements to the modules and the analysis of the results, so as to make the structure clearer. We have provided a diff file for all the changes, and the changes in it are highlighted. Thank you again, and we hope you can accept our revised manuscript.

Reviewer 2 Report

Comments and Suggestions for Authors

.

Author Response

Thank you for your valuable comments to improve the manuscript.

Reviewer 3 Report

Comments and Suggestions for Authors

I have reviewed the authors' responses to the comments. However, the revised parts are not highlighted in the paper. How can I identify the changes compared to the previous version?

Additionally, the similarity (plagiarism) percentage of the manuscript is high and should be reduced to below 20-25%.

Author Response

Reviewer # 3’s, comment 1): I have reviewed the authors' responses to the comments. However, the revised parts are not highlighted in the paper. How can I identify the changes compared to the previous version?

Authors response: Thank you for your comment. Last time we submitted a total of three files: MDPI_template.zip, new.pdf, and the Cover Letter. We highlighted the changes in the diff file and placed it in MDPI_template.zip. This time we have not only placed the diff file in MDPI_template.zip but also submitted it as a Supplementary File, so you can find it in both places.

 

Reviewer # 3’s, comment 2): Additionally, the similarity (plagiarism) percentage of the manuscript is high and should be reduced to below 20-25%.

Authors response: Thank you for your comment. We have ensured that the manuscript is original and that all references are cited. We have had the revised manuscript checked by an authoritative institution, and the similarity percentage is below 20%.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

The authors answered the questions satisfactorily.

Author Response

Thank you for your valuable comments to improve the manuscript.

Round 3

Reviewer 1 Report

Comments and Suggestions for Authors

I thank the authors for responding to my comments once again. However, the revised version is still not adequate for publication. The figure legends remain overly simplistic, and the data still lack appropriate statistical treatment. Additionally, the “related works” section has not been revised or refined. I also reiterate my question: where are the models, so that other researchers may use or replicate them?

Author Response

Reviewer # 1’s, comment 1):

I thank the authors for responding to my comments once again. However, the revised version is still not adequate for publication. The figure legends remain overly simplistic, and the data still lack appropriate statistical treatment. Additionally, the “related works” section has not been revised or refined. I also reiterate my question: where are the models, so that other researchers may use or replicate them?

Authors response:

Thank you for your comment. We have improved the Related Work section, analyzed the advantages and disadvantages of the methods in the references, and added the relevant literature on traffic object detection. None of those methods achieves a good balance between accuracy and lightweight design. We have included the model and code for the PC platform in the compressed package, but we did not include the code for the Jetson Xavier NX platform because that code cannot be used directly: anyone who wants to use it on the Jetson Xavier NX must regenerate it with the TensorRT tool based on the configuration of the platform. All changes are highlighted in the diff.pdf file.

 

We have carefully checked the figures and found that, apart from the overexposure in Fig. 7 (which has now been replaced), there are no problems with the other figures. We believe that all the figure legends are now adequate. Could you please point out which figure legends remain overly simplistic in the manuscript? Thank you again.

Reviewer 3 Report

Comments and Suggestions for Authors

The authors have successfully addressed the comments from the previous version, making significant improvements to the paper. Therefore, the paper can be accepted in its current form.

Author Response

Thank you for your comment. Your suggestions have made the manuscript better.
