YOLO-MFD: Object Detection for Multi-Scenario Fires
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This paper presents YOLO-MFD for fire scene detection. The paper is well organised. Some improvements are needed.
- In the literature, there are several existing methods for fire detection based on YOLO or YOLO Lite versions of the algorithm. Some of these algorithms are implemented on embedded devices. The literature review should include these algorithms.
- The improvements made by existing YOLO-based methods should be discussed.
- The authors should provide a detailed breakdown of the Multi-scenario Fire Dataset (MFDB), including the distribution of fire scenarios (e.g., forest, grassland, building), image resolutions, and annotation protocols, to ensure the reproducibility of the dataset.
- Are the algorithm’s parameters (learning rate, batch size, training epochs) optimised for YOLO-MFD?
- Why was YOLOv7 chosen as the baseline for this study? How does it compare with the latest YOLO algorithms?
- Computational complexity should be discussed.
- How was the small-target subset of MFDB defined? The small-target class should be defined.
- Is it possible to include a statistical analysis of the reported mAP improvements?
- How does the proposed algorithm perform under low light, heavy fog, or occlusion?
- The authors should present implementation details of the spatio-temporal channel attention mechanism in SAPM and discuss its computational overhead compared to standard attention mechanisms.
- Why are different convolutional kernel sizes (1x1, 3x3, 5x5) chosen for FAWM?
- Are there any limitations of YOLO-MFD?
Author Response
Comment 1: In the literature, there are several existing methods for fire detection based on YOLO or YOLO Lite versions of the algorithm. Some of these algorithms are implemented on embedded devices. The literature review should include these algorithms.
Response 1: We thank the reviewer for the valuable comment. In the revised manuscript, we have added and briefly introduced three recent works on fire detection based on YOLO or YOLO Lite versions of the algorithm (Page 2, Section 1, Paragraph 3, Line 7).
Comment 2: The improvements made by existing YOLO-based methods should be discussed.
Response 2: Thanks for the suggestion. After reviewing the existing YOLO-based methods, we have added a discussion of their improvements (Page 2, Section 1, Paragraph 3, Line 16).
Comment 3: The authors should provide a detailed breakdown of the Multi-scenario Fire Dataset (MFDB), including the distribution of fire scenarios (e.g., forest, grassland, building), image resolutions, and annotation protocols, to ensure the reproducibility of the dataset.
Response 3: We appreciate your suggestions. We have compiled the distribution of fire scenarios and image resolutions in the MFDB, as shown in Figure 1. Additionally, we have included the annotation protocols.
Comment 4: Are the algorithm’s parameters (learning rate, batch size, training epochs) optimised for YOLO-MFD?
Response 4: Thank you very much for the valuable feedback. To verify the impact of the algorithm's parameters, we conducted experiments with different learning rates, as shown in Figure 2, and have rephrased the description of the experimental environment.
Comment 5: Why was YOLOv7 chosen as the baseline for this study? How does it compare with the latest YOLO algorithms?
Response 5: We are very grateful for your constructive suggestions. When the baseline was selected, YOLOv7 demonstrated superior performance among the YOLO series. As you suggested, comparing YOLO-MFD with the latest YOLO algorithms is essential to highlight its advancements. Therefore, we have added comparison experiments with YOLOv11 and YOLOv12 in Table 1.
Comment 6: Computational complexity should be discussed.
Response 6: Thank you for your valuable suggestions. Because YOLO is a detection method known for its real-time performance, it is essential to analyze the complexity of YOLO-MFD. Therefore, we have re-analyzed it by combining the FPS with the characteristics of each method, together with the additional experiments from Comment 5, as presented on Page 13, Section 4.4.3, Paragraph 1.
Comment 7: How was the small-target subset of MFDB defined? The small class should be defined.
Response 7: Thanks for the professional feedback, which has enabled us to provide a more comprehensive description of the MFDB dataset. The small-object subset is determined by calculating the size of every fire target in each image after all fire scenes have been annotated. If a target occupies less than 10% of the entire image, it is classified as a small object. If such small objects account for more than 60% of the fire targets in an image, we assign the image to the small-object fire data subset.
Comment 8: Is it possible to include a statistical analysis of the reported mAP improvements?
Response 8: Thank you for your comments. We have rephrased the description of the mAP improvements and added experiments with the latest YOLO algorithms, as recommended in Comment 5, on Page 13, Section 4.4.3, Paragraph 1.
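The subset rule in Response 7 can be sketched as follows. This is a minimal sketch, assuming that a target's "size proportion" means its bounding-box area divided by the image area; the `Box` type and the function names are hypothetical and are not taken from the authors' code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Box:
    """Hypothetical bounding box of one annotated fire target, in pixels."""
    w: float
    h: float


def is_small_target(box: Box, img_w: int, img_h: int,
                    area_ratio: float = 0.10) -> bool:
    # Assumption: "size proportion" = box area / image area.
    return (box.w * box.h) / (img_w * img_h) < area_ratio


def is_small_object_image(boxes: Sequence[Box], img_w: int, img_h: int,
                          area_ratio: float = 0.10,
                          min_fraction: float = 0.60) -> bool:
    # An image joins the small-object subset when MORE than 60% of its
    # fire targets are small; images without targets are excluded.
    if not boxes:
        return False
    n_small = sum(is_small_target(b, img_w, img_h, area_ratio) for b in boxes)
    return n_small / len(boxes) > min_fraction
```

For a 640×640 image, a 100×100 box covers about 2.4% of the image and counts as small, while a 300×300 box (about 22%) does not; an image with two small targets and one large one (2/3 small) would therefore fall into the subset.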
Comment 9: How does the proposed algorithm perform under low light, heavy fog, or occlusion?
Response 9: We sincerely thank you for your detailed review. These scenarios are crucial for verifying the model's generalization. To this end, we collected images of these scenarios from public sources such as Google and YouTube and conducted the generalization experiments shown in Figure 5.
Comment 10: The authors should present implementation details of the spatio-temporal channel attention mechanism in SAPM and discuss its computational overhead compared to standard attention mechanisms.
Response 10: We are grateful for your thoughtful comments and suggestions, which have significantly contributed to enhancing the quality of our manuscript. To this end, we compared our approach with standard attention mechanisms to better analyze the applicability of SAPM for fire detection. For clarity, we have created Table 3 and included additional discussion and analysis in the manuscript.
Comment 11: Why are different convolutional kernel sizes (1x1, 3x3, 5x5) chosen for FAWM?
Response 11: We would like to thank you for your constructive remarks. There is significant scale variation between targets in fire scenarios. For example, smoke typically presents a large-scale, low-texture distribution, while flames are more localized, with high-frequency edge features. To better capture these multi-scale features, we introduce convolution kernels of different sizes (1×1, 3×3, 5×5) in FAWM. The 1×1 convolution enhances the nonlinear expressive capability across channels while keeping the parameter count small. The 3×3 convolution captures standard local structural information, whereas the 5×5 convolution helps model the contextual features of large-scale, blurry targets such as smoke. This combination of multi-scale convolutions improves FAWM's sensitivity to different fire elements and its ability to represent them, giving it stronger feature extraction and discrimination in complex scenarios and thereby improving the accuracy and robustness of fire detection.
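To illustrate the multi-scale idea in Response 11, the sketch below runs parallel 1×1, 3×3, and 5×5 convolution branches over the same feature map with zero padding of k // 2, so every branch keeps the spatial size and the outputs can be fused. It is a hypothetical NumPy illustration, not the FAWM implementation: the random weights, the ReLU activation, and fusion by channel concatenation are all assumptions.

```python
import numpy as np


def conv2d_same(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Naive stride-1 cross-correlation with zero padding.

    x: (C_in, H, W) feature map; w: (C_out, C_in, k, k) with odd k.
    Padding by k // 2 keeps the output at (C_out, H, W).
    """
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero-pad spatial dims only
    _, h, wd = x.shape
    out = np.zeros((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k) neighborhood
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


def multi_scale_block(x: np.ndarray, c_out: int,
                      rng: np.random.Generator) -> np.ndarray:
    """Hypothetical multi-branch block: 1x1, 3x3, 5x5 branches, concatenated."""
    branches = []
    for k in (1, 3, 5):
        w = rng.standard_normal((c_out, x.shape[0], k, k)) * 0.1  # random weights
        branches.append(np.maximum(conv2d_same(x, w), 0.0))       # ReLU
    return np.concatenate(branches, axis=0)  # (3 * c_out, H, W)
```

Because each branch pads by k // 2, the 5×5 branch aggregates a 5×5 neighborhood per output pixel (suited to diffuse smoke), the 3×3 branch a local 3×3 window, and the 1×1 branch only mixes channels at each pixel, which mirrors the division of roles described above.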
Comment 12: Are there any limitations of YOLO-MFD?
Response 12: Our sincere gratitude goes to you for your careful review and helpful suggestions. YOLO-MFD can meet the detection requirements of various fire scenarios, but its design places too much emphasis on detection accuracy. For example, to capture multi-scale objects, the SAPM module uses computationally expensive deformable convolutions. Compared with state-of-the-art attention mechanisms, this design is overly complex, which limits the overall efficiency of the model. Therefore, pruning YOLO-MFD will be a key focus of our future work.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The submitted manuscript proposes an object detection method for multi-scenario fires, called YOLO-MFD. The topic has been addressed in the literature before; here, the authors resolve missed detections caused by the deformation of smoke and flames.
A short description of paper’s structure should be added at the end of the introduction.
Figure 7 is intended to provide a visual comparison of fire images across multiple scenarios. However, despite careful examination, I can see only two distinct scenarios (among the eight images). To improve clarity, the authors should explicitly state what distinguishes each image or visually highlight the relevant areas where differences occur. This would help readers better understand the approach. The same applies to Figure 8.
In the tables comparing the proposed approach with other methods, it is recommended that the proposed approach be highlighted (e.g., written in bold with a colored row background).
Regarding the comparison on small targets, additional explanation is needed, especially since the proposed YOLO-MFD outperforms the other methods in mAP and Smoke_mAP50, while its Fire_mAP50 results are less promising.
The conclusions are currently vague and lack specificity. They should be extended to more clearly summarize the key findings, highlight the contributions of the work, and discuss the practical implications and possible limitations. A more focused conclusion would strengthen the overall impact and clarity of the paper.
Author Response
Comment 1: A short description of paper’s structure should be added at the end of the introduction.
Response 1: We would like to express our gratitude to you for your time and effort in providing detailed and constructive feedback on the manuscript. Following your instructions, we have added a short description of the paper's structure at the end of the introduction.
Comment 2: Figure 7 is intended to provide a visual comparison of fire images across multiple scenarios. However, despite careful examination, I can see only two distinct scenarios (among the eight images). To improve clarity, the authors should explicitly state what distinguishes each image or visually highlight the relevant areas where differences occur. This would help readers better understand the approach. The same applies to Figure 8.
Response 2: We are grateful to you for pointing out areas where clarification was needed, which has improved the manuscript. We have revised Figures 7 and 8 to ensure that the details are clear.
Comment 3: In the tables comparing the proposed approach with other methods, it is recommended that the proposed approach be highlighted (e.g., written in bold with a colored row background).
Response 3: We would like to express our gratitude to you for your feedback. We have bolded the data of the proposed approach in each table.
Comment 4: Regarding the comparison on small targets, additional explanation is needed, especially since the proposed YOLO-MFD outperforms the other methods in mAP and Smoke_mAP50, while its Fire_mAP50 results are less promising.
Response 4: We would like to express our gratitude to you for your feedback. Smoke is inherently more difficult to detect in visual data due to its weak visual features, unstable morphology, and tendency to blend with the background. Typically appearing grayish and semi-transparent, smoke exhibits low contrast against its surroundings, with blurred and irregular boundaries, and lacks distinctive color and texture cues. These characteristics significantly hinder accurate detection in images.
Comment 5: The conclusions are currently vague and lack specificity. They should be extended to more clearly summarize the key findings, highlight the contributions of the work, and discuss the practical implications and possible limitations. A more focused conclusion would strengthen the overall impact and clarity of the paper.
Response 5: We are grateful to you for your thoughtful suggestions, which helped refine the theoretical aspects of the manuscript.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
I appreciate the efforts of the authors to improve the paper's quality. Although the changes have not been highlighted, I tracked them one comment after the other. Next time, please highlight all introduced changes. I have no other comments; the paper can be accepted for publication.