Improving Fire and Smoke Detection with You Only Look Once 11 and Multi-Scale Convolutional Attention
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This paper presents an enhanced fire and smoke detection system using the latest YOLO11 model. The authors integrate a Multi-Scale Convolutional Attention (MSCA) mechanism into YOLO11 to form YOLO11s-MSCA, addressing challenges like object scale variability and environmental complexity. Experiments on the D-Fire and Fire and Smoke datasets show that the improved model boosts detection accuracy by up to 2.8%, while maintaining high speed and low computational cost.
Although the idea is valid, there are several issues, listed below, that the authors need to address:
1. The authors need to compare the limitations of existing approaches more explicitly and in more detail, to better position the novel contributions of their work.
2. When describing previous works, mention their advantages and disadvantages.
3. To further validate the robustness and generalization of the proposed model, consider evaluating its performance on an additional benchmark dataset beyond D-Fire and the Fire and Smoke Dataset. One more dataset is enough, and you can find the most suitable one in the following suggested papers.
- Boroujeni, S. P. H., Mehrabi, N., Afghah, F., McGrath, C. P., Bhatkar, D., Biradar, M. A., & Razi, A. (2025). Fire and Smoke Datasets in 20 Years: An In-depth Review. arXiv preprint arXiv:2503.14552.
- Alkhammash, E. H. (2025). Multi-Classification Using YOLOv11 and Hybrid YOLO11n-MobileNet Models: A Fire Classes Case Study. Fire, 8(1), 17.
- Zhao, C., Zhao, L., Zhang, K., Ren, Y., Chen, H., & Sheng, Y. (2025). Smoke and Fire-You Only Look Once: A Lightweight Deep Learning Model for Video Smoke and Flame Detection in Natural Scenes. Fire, 8(3), 104.
4. To enhance the comprehensiveness of your paper, it is highly recommended that you include more recent and cutting-edge research from 2023 to 2025, especially 2025. Consider the previously mentioned papers as well as other new papers.
5. Ensure that all references are correctly cited in the paper. Some references, such as [17] and [19], are cited incorrectly and not properly formatted. Please correct them.
6. Provide a more detailed comparison of YOLO11’s novel modules (e.g., C3k2, C2PSA) with prior YOLO versions to highlight their specific impact on fire/smoke detection.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
This paper proposes a fire and smoke detection method based on YOLO11 and Multi-Scale Convolutional Attention (MSCA), and conducts experimental verification on the D-Fire dataset. Compared with other YOLO models, the results show improvements in detection accuracy, speed, and the balance between accuracy and speed. However, there is still much room for improvement in the method description, experimental design, and result analysis:
- Polish and proofread the full text of the manuscript carefully to ensure that there are no grammatical and spelling errors.
- The manuscript only mentions the C3k2 and C2PSA modules, and its theoretical analysis of them is insufficient. Although it compares YOLO11 with other YOLO models, it does not discuss the advantages of its architectural improvements over those models.
- Should the manuscript discuss the latest developments in the field of fire detection more fully? For example, should the authors also introduce the detection advantages that VIT-Fire has shown in complex backgrounds in recent years?
- Does the manuscript lack a comparative analysis of lightweight model optimization methods? For the attention-mechanism optimization in YOLO11s-MSCA, should the authors also compare other YOLO models under the same type of optimization? It is suggested to supplement a more in-depth literature review and compare the architectural differences with other recent optimized models.
- The specific way the MSCA module is integrated into YOLO11 is not specified, such as the number of layers replaced or added, the parameter initialization strategy, and the training optimization method (pages 7-8). Should the authors explain how it is combined with the FPN/PANet structure of YOLO11?
- The manuscript does not specify the preprocessing of the D-Fire dataset. Moreover, page 5 mentions that "the dataset also contains 9,838 images without fire or smoke" as negative samples, but lacks specific screening criteria. Should the classes and types of these samples also be explained?
- Why is part (a) of Figure 1 a UAV platform? Is this an error? The detection examples in Figures 8-10 have low resolution and insufficient contrast between the bounding boxes and the background, which makes the improvement difficult to see. It is recommended to carefully correct the figures throughout the manuscript.
- No ablation experiment was designed to verify the independent contribution of MSCA. Although YOLO models without MSCA were compared, the influence of other factors, such as optimization of the training strategy, on the experimental results cannot be ruled out.
In the conclusion, the manuscript does not analyze the limitations of the YOLO11+MSCA structure. Are there problems such as insufficient feature fusion?
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The paper proposes effective improvements for fire detection tasks, with a relatively comprehensive experimental design. However, further optimization is needed in terms of method details, experimental rigor, and depth of result analysis. It is recommended to supplement key experiments and theoretical explanations to enhance the technical contribution and reproducibility of the paper.
- Some sentences contain grammatical errors (e.g., in the Abstract, "demonstrate strong robust-ness" should be merged into "robustness").
- The literature review's description of the YOLO series evolution is too general and does not clearly compare the core improvements between YOLO11 and YOLOv8/v9/v10 (e.g., the specific functions of the C3k2 and C2PSA modules).
- It is recommended to supplement YOLO11’s key innovations and cite its official technical documentation or benchmark results to strengthen the explanation of the model’s advantages.
- There is a lack of ablation experiments to verify the independent contribution of the MSCA module (e.g., whether adding only MSCA outperforms other attention mechanisms).
- Some of the latest deep learning methods should be introduced.
- Details on the integration of the MSCA module are insufficient. The manuscript does not clearly state its specific position within the YOLO11 network (e.g., which stage of the neck) or its parameter settings (e.g., convolution kernel size, number of branches).
- The dataset description is not detailed enough. Differences in acquisition scenarios and annotation standards between D-Fire and the Fire and Smoke Dataset are not clearly explained, which may affect the comparability of results.
- Hyperparameter settings (e.g., learning rate, data augmentation strategies) are too briefly described; more detailed information should be added.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
1. Regarding dataset evaluation, I highly recommend including a detailed discussion in the manuscript of the challenges and limitations encountered when evaluating your model on additional fire and smoke datasets. These challenges may pertain either to the characteristics of the datasets themselves (e.g., resolution, annotation quality, environmental conditions) or to the limitations of your model in adapting to them. Addressing these issues in the paper, or at least explaining them, is essential to ensure a fair and transparent comparison and to better illustrate your model's generalization capacity.
2. One of the suggested papers provided during the review process appears to be missing from the manuscript. Please include the missing paper and utilize it to highlight and compare a subset of fire and smoke datasets. Discuss the key differences, strengths, and limitations of these datasets, and relate them to your proposed method. This will enhance the comprehensiveness of your literature review and demonstrate how your work fits within the broader research landscape.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
This manuscript first compares YOLO11 with classical YOLO models, demonstrating its advantages in detection accuracy and speed. It then proposes a fire and smoke detection method based on YOLO11 and a multi-scale convolutional attention (MSCA) mechanism, constructing the YOLO11s-MSCA model. The model shows relatively outstanding performance on the D-Fire dataset, with overall detection accuracy improved by 2.6% and smoke recognition accuracy improved by 2.8%. Additionally, the major-revision comments have been well addressed, with an updated literature review and additional comparative experiments included.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
Thank you very much for the revision. I hope the authors can provide public links to the dataset and methodology in the main text.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf