An Enhanced YOLOv8n-Based Method for Fire Detection in Complex Scenarios

Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Reviewer Comments
Summary: Urban and forest fires are occurring with increasing frequency, likely driven by climate change, and can result in extensive property destruction, substantial economic losses, and severe environmental degradation. In such scenarios, the deployment of a reliable automated fire detection system becomes crucial, as it can alert authorities during the early stages of a fire and help avert large-scale catastrophes. To address this critical challenge, the authors propose a machine learning-based approach for fire detection. For improvements, see comments below.
Comment 1: The introduction section requires significant improvement to establish a clear motivation for the study. Specifically, it should include a more comprehensive review of prior research on fire detection systems, with proper citations to relevant literature. It is important to highlight the limitations and challenges faced by existing approaches—such as delays in detection, false positives due to smoke-like artifacts, limited generalizability across environments, or sensitivity to lighting and weather conditions.
Furthermore, the introduction should clarify how the proposed study addresses these gaps. For example, after presenting the limitations of earlier works, it would be helpful to discuss whether more recent studies have attempted to overcome these issues and, if so, whether those efforts were successful or fell short. This comparative discussion would provide a solid foundation for justifying the need for the current study.
Overall, a more structured and critical literature review detailing both the shortcomings of prior methods and how the current approach offers improvements will significantly strengthen the introduction and better contextualize the contribution of this work.
Comment 2: The authors have proposed a multi-step approach to defend against alpha-channel-based adversarial attacks within the context of object detection. However, it remains unclear what advantage the inclusion of the alpha channel offers in the model's input for this particular task. In many computer vision applications, including object detection, models are typically trained and configured to accept standard RGB inputs. If the input pipeline is explicitly restricted to RGB channels, the threat of alpha-channel-based adversarial manipulation is effectively eliminated, thereby simplifying the model architecture and reducing computational overhead.
Given this, it is important for the authors to clarify the rationale for incorporating the alpha channel into the object detection model. Specifically, 1) What information does the alpha channel provide that is essential or beneficial to object detection in this context? 2) Are there datasets used in training or inference that include transparency data relevant to detection tasks? 3) Could the model not be configured to discard or ignore the alpha channel entirely, thereby avoiding the need for specialized defenses?
A clear explanation of why alpha-channel handling is necessary—rather than simply enforcing RGB-only input—would help justify the proposed defense framework and better contextualize its practical relevance.
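To make the reviewer's alternative concrete, the sketch below shows one way an input pipeline could enforce RGB-only input, so that any payload hidden in a PNG alpha channel never reaches the detector. This is an illustrative sketch under the assumption that images arrive as NumPy arrays in H x W x C layout; the function name `strip_alpha` is hypothetical, not from the manuscript.

```python
import numpy as np

def strip_alpha(img: np.ndarray) -> np.ndarray:
    """Drop the alpha channel from an H x W x 4 RGBA array, keeping RGB.

    Restricting the pipeline to three channels removes the input surface
    that alpha-channel adversarial payloads rely on, with no specialized
    defense module required.
    """
    if img.ndim == 3 and img.shape[2] == 4:
        return img[:, :, :3]
    return img  # already RGB (or grayscale); pass through unchanged
```

Enforcing this at load time, before any resizing or normalization, keeps the rest of the detection model unchanged.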
Comment 3: The manuscript lacks a presentation of the model’s performance specifically for fire detection, which is essential given the stated objective of the study. To strengthen the contribution, the authors should include inference results from their model on a representative test set. These results should be accompanied by ground truth annotations to allow for visual and quantitative evaluation of detection accuracy.
In addition, the authors are encouraged to evaluate the model on real-world fire imagery and publicly available fire detection datasets (e.g., FIRESENSE, FireNet, or Foggia's dataset). This will help assess the model’s generalizability and robustness in diverse environments.
Furthermore, a comparative analysis with existing fire detection models is necessary. This should include both qualitative and quantitative comparisons—highlighting where the proposed model performs better and where it may fall short. Metrics such as precision, recall, F1-score, and inference time should be reported to enable fair benchmarking.
A critical discussion of both the strengths and weaknesses of the model in relation to other state-of-the-art methods will provide a more balanced and transparent assessment of its practical utility.
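For the benchmarking the reviewer requests, the standard metrics follow directly from per-image true positive, false positive, and false negative counts. The helper below is a minimal sketch of those formulas (precision, recall, F1); the function name and the counting convention are assumptions for illustration, not taken from the manuscript.

```python
def detection_metrics(tp: int, fp: int, fn: int):
    """Precision, recall, and F1-score from detection counts.

    tp: detections matching a ground-truth box (e.g. IoU >= 0.5)
    fp: detections with no matching ground truth
    fn: ground-truth boxes with no matching detection
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Reporting these alongside inference time on the same hardware would allow the fair comparison with existing fire detectors that the reviewer asks for.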
Comment 4: The conclusion section is currently underdeveloped and requires substantial revision. Once the earlier comments—particularly those related to performance evaluation, comparison with existing methods, and the justification for model design choices—are addressed, the authors will be in a stronger position to draw meaningful conclusions. The revised conclusion should clearly summarize the key findings, highlight the significance of the proposed approach in the context of existing work, and discuss potential real-world applications. Additionally, outlining limitations and suggesting future research directions would further strengthen this section.
Comments for author File: Comments.pdf
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
- Currently, the research is only based on visual computing. Have you considered integrating multi-modal data such as infrared thermal imaging? What is the expected improvement in fire localization accuracy under complex environments?
- The proposed Alpha channel defense mechanism is mainly designed for fire scenarios. Has its applicability in other visual tasks (such as industrial defect detection and medical imaging) been verified? Are there differences in the characteristics of the Alpha channel between different image formats (e.g., JPEG and PNG)?
- The paper only mentions that injecting 10% attack images leads to a decline in model performance. Have you tested the effects of different attack intensities (such as 5% and 20%) on the model? What are the impacts?
- The improved YOLOv8n integrates BiFormer, Agent Attention, and CCC modules. Compared with the original YOLOv8n, what are the changes in model parameters and inference speed? Has the real-time performance on edge devices (such as embedded cameras) been tested? Are there lightweight optimization strategies?
- Have you considered the long-term operational stability of the model in real-world environments? Can this model be applied to fire detection in scenarios other than forest fires (such as industrial production)? If so, is there a calibration mechanism for adapting to environmental changes?
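The attack-intensity sweep requested above (e.g. 5%, 10%, 20% injected attack images) can be set up by replacing a controlled fraction of the clean set with attack samples. The sketch below is a hypothetical harness for that experiment; the function name and file-list representation are illustrative assumptions, not details from the paper.

```python
import random

def inject_attacks(clean_paths, attack_paths, rate, seed=0):
    """Build a poisoned set by replacing a fraction `rate` of clean
    images with attack images (rate = 0.05, 0.10, 0.20, ...).

    If there are fewer attack images than requested, the set shrinks
    accordingly; a fixed seed keeps the sweep reproducible.
    """
    rng = random.Random(seed)
    n_attack = int(round(rate * len(clean_paths)))
    kept_clean = rng.sample(clean_paths, len(clean_paths) - n_attack)
    injected = rng.sample(attack_paths, min(n_attack, len(attack_paths)))
    return kept_clean + injected
```

Evaluating the model on each resulting set would show how performance degrades as attack intensity rises, directly answering the reviewer's question.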
The structure of the English sentences needs further improvement.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The authors have satisfactorily addressed the comments regarding the introduction and conclusion sections. However, they have not provided sufficient evidence to justify the advantage or necessity of incorporating the alpha channel for fire detection. To establish its value, the authors should compare the model’s performance on images with the alpha channel against its performance on the same images without the alpha channel. If the results demonstrate a clear improvement with the alpha channel, it would provide adequate justification for including the alpha defense module. Otherwise, this module should be removed, and the paper resubmitted without the alpha channel component.
Comments for author File: Comments.pdf
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The paper has been revised according to the reviewer's comments. It is suggested to further summarize the introduction section, making it more organized and structured.
Author Response
Comment 1: The paper has been revised according to the reviewer's comments. It is suggested to further summarize the introduction section, making it more organized and structured.
Response 1: Thank you very much for your invaluable and expert review. We have meticulously revised the Introduction word-for-word and further condensed its content to ensure a clear, logical flow and a rigorous structure. Should you notice any remaining points for improvement, we would be most grateful for your guidance and will gladly address them straightaway.
Round 3
Reviewer 1 Report
Comments and Suggestions for Authors
The revised manuscript contributes to the field of visual computing for fire detection, supporting fire monitoring and early warning in complex visual scenarios. The manuscript may be accepted.
Author Response
Comment 1: The revised manuscript contributes to the field of visual computing for fire detection, supporting fire monitoring and early warning in complex visual scenarios. The manuscript may be accepted.
Response 1:We are sincerely grateful for your positive appraisal of our manuscript and would like to express our heartfelt thanks on behalf of all the authors.