YOLOv7scb: A Small-Target Object Detection Method for Fire Smoke Inspection
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The paper presents an improved YOLOv7 model tailored for fire and smoke detection, focusing in particular on small targets. It incorporates multiple components, such as Space-to-Depth Convolutions (SPD-Conv), BiFPN, and a modified Focal-CIoU loss, as well as transfer learning, to improve feature extraction and handle imbalanced datasets effectively.
Here are the main concerns:
1. Is there any ablation analysis of the loss function in Eq. 1? Do its terms need different weights?
2. The dataset is limited; can the method be applied to known datasets, such as FLAME1 or FLAME2? At the least, the authors should discuss these works in their related work section.
[REF1] Aerial imagery pile burn detection using deep learning: The flame dataset
[REF2] Flame 2: Fire detection and modeling: Aerial multi-spectral image dataset
3. Can the authors compare against some recent non-YOLO methods?
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 2 Report
Comments and Suggestions for Authors
1. Avoiding "we" and "our" could improve the quality of the work.
2. A pictorial representation of the algorithm, including any GitLab work (if available), would improve readability. For example:
Including adequate images related to your work could improve readability. In general, readers may be visual, auditory, kinesthetic, logical-mathematical, or reading-and-writing learners, so targeting these kinds of learners could reach a wider audience.
Section 2.2, YOLOv7 (manuscript lines 181-188):
where FIoU (A, B) denotes the intersection and merge ratio, i.e., the ratio of the area of overlap (A ∩ B) between the labeled box A and the predicted box B to the area of union (A ∩ B) between them; ρ(A, B) denotes the euclidean distance between the centroids of both labeling box A and the predicted box B; C denotes the diagonal distance of the smallest closure region that can contain both labeling box A and the predicted box B; α is a positive trade-off parameter table; ν is used to measure the degree of approximation of the width-to-height ratio of the labeled box and the predicted box, with a smaller value of ν indicates that the width-to-height ratio is closer to the labeled box; wA and wB are the widths of the
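The quantities named in the quoted passage (the IoU term, the centroid distance ρ, the enclosing-box diagonal C, the trade-off parameter α, and the aspect-ratio term ν) match the widely used CIoU formulation. A minimal sketch of that standard formulation is given below as an illustration. It assumes boxes in (x1, y1, x2, y2) corner format and is not taken from the manuscript. The function names `ciou` and `ciou_loss` and the `eps` stabiliser are hypothetical choices made for this example.

```python
import math


def ciou(box_a, box_b, eps=1e-9):
    """Standard CIoU between two boxes given as (x1, y1, x2, y2).

    Assumption: this follows the commonly cited CIoU definition,
    CIoU = IoU - rho^2 / C^2 - alpha * v, not the manuscript's own code.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1

    # Overlap area (A ∩ B) and union area (A ∪ B).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = wa * ha + wb * hb - inter
    iou = inter / (union + eps)

    # Squared Euclidean distance between the two box centroids (rho^2).
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
         + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both boxes (C^2).
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2

    # Aspect-ratio consistency term v and its trade-off weight alpha.
    v = (4 / math.pi ** 2) * (
        math.atan(wa / (ha + eps)) - math.atan(wb / (hb + eps))
    ) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return iou - rho2 / (c2 + eps) - alpha * v


def ciou_loss(box_a, box_b):
    """CIoU loss: zero for a perfect match, larger for worse predictions."""
    return 1.0 - ciou(box_a, box_b)
```

For two identical boxes the loss is approximately zero. For disjoint boxes the IoU term vanishes and the centroid-distance penalty still gives a non-zero signal, which is the property that distinguishes CIoU from plain IoU.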
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 3 Report
Comments and Suggestions for Authors
- In this paper, the authors propose an improved version of the YOLOv7 deep learning model, called YOLOv7scb, for fire and smoke detection on small targets.
The dataset used in this study was obtained by normalizing a public fire dataset from Roboflow to the Microsoft COCO format.
- The proposed improved model was compared with 6 versions of the YOLO model, namely YOLOv3, YOLOv5, YOLOv7, YOLOv8, YOLOv10 and YOLOv11.
- The approach incorporates two key improvements to the YOLOv7 framework:
* The use of space-to-depth convolution (SPD-Conv) and C3 modules; in addition, the weighted bidirectional feature pyramid network (BiFPN) is integrated into the feature extraction network.
* The replacement of the conventional complete intersection over union (CIoU) loss function with Focal-CIoU, which reduces the degrees of freedom in the loss function and improves the robustness of the model.
- This work is relevant for inspecting fire and smoke on small targets with better performance on several metrics.
- Overall, the article seems technically sound.
- Recommendations:
* Justify the choice of YOLO models for comparison.
* Clarify the technical details of the proposed improvements (hyperparameters).
* The results discussion section needs more detail.
* The contribution of your work relative to related work is not well discussed.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 4 Report
Comments and Suggestions for Authors
1. A description of the organization of the manuscript is missing.
2. Authors should discuss the gaps in the existing literature and how they address these gaps at the end of the Literature Review section.
3. The following articles can be reviewed:
https://doi.org/10.3390/drones8090483
http://dx.doi.org/10.3934/math.2024526
4. Table 1 caption is wrong.
5. Did the authors attempt to optimize the hyperparameters of the YOLOv7scb model, e.g., learning rate, batch size, patience or regularization terms?
6. According to Table 3, Datasets A and B use different numbers of epochs. Why?
7. Authors should add the model size, number of parameters, and training time per epoch or total training time.
8. The authors should discuss the limitations of the proposed work.
9. The data preprocessing and augmentation techniques should be explained in detail.
10. Authors should compare the results with related works.
11. Performance metrics (precision, recall and mAP) for CIoU and Focal-CIoU loss functions should be added.
12. Authors should discuss why Dataset A demonstrated poor results.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Round 2
Reviewer 4 Report
Comments and Suggestions for Authors
Thanks for addressing all my concerns.