Neural Network Implementation for Fire Detection in Critical Infrastructures: A Comparative Analysis on Embedded Edge Devices
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This manuscript discusses the performance comparison of the YOLOv5n model implemented on different embedded edge devices in the context of fire detection for critical infrastructure. The topic is clear, and the results offer guidance for selecting hardware in edge computing. This study is informative, but the following questions should be addressed carefully:
- The introduction section fails to cite key literature when reviewing the traditional sensor-based monitoring systems in the field of fire monitoring and the existing technologies such as cloud-based AI. Relevant literature should be supplemented to enhance the reliability of the academic argument.
- The introduction section needs to point out the shortcomings of the current literature and the contributions of this paper. Have there been any similar hardware comparison studies in the existing literature, especially for battery-powered embedded fire monitoring scenarios? Or are there any limitations of the existing works (such as focusing only on a single platform, not quantifying energy consumption, etc.)? It is suggested that some additional information be provided to highlight the significance of the comparative work presented in this article.
- When introducing the first usage of certain abbreviations, the full terms should be provided, such as FPGA, etc.
- The test conditions for disconnecting the fan of Kria KV260 do not match the actual deployment scenarios. It is recommended that the impact of this restriction condition on the applicable scope of the conclusion be supplemented in the discussion section.
- This paper's experiment only selected YOLOv5n for testing and did not involve other models (such as s/m/l/x). It is recommended to analyze the influence of model size on the performance of different hardware platforms in the discussion section and cite relevant literature to support.
Author Response
We sincerely thank you for the constructive and insightful comments you have provided, which have helped us improve the quality of our manuscript. All raised concerns have been carefully addressed, and the corresponding revisions are highlighted below for ease of review. We believe the paper is now stronger and more comprehensive thanks to your valuable feedback.
Comments 1: The introduction section fails to cite key literature when reviewing the traditional sensor-based monitoring systems in the field of fire monitoring and the existing technologies such as cloud-based AI. Relevant literature should be supplemented to enhance the reliability of the academic argument.
Response 1: We have expanded the Introduction with relevant literature, including Castro-Correa et al. (2022), which presents a wireless sensor network using gas, flame, and temperature sensors with cloud-based transmission, and Lema et al., who also evaluate different AI models on three different platforms. Additionally, Reviewer 2 recommended that we include the paper by Laganà et al., which adds a contrast with cloud computing systems. This strengthens the background and clarifies the shift toward AI-based computer vision methods at the edge. We have also added further references on benchmarking of embedded systems at the edge to the bibliography. [Lines: 28-38]
Comments 2: The introduction section needs to point out the shortcomings of the current literature and the contributions of this paper. Have there been any similar hardware comparison studies in the existing literature, especially for battery-powered embedded fire monitoring scenarios? Or are there any limitations of the existing works (such as focusing only on a single platform, not quantifying energy consumption, etc.)? It is suggested that some additional information be provided to highlight the significance of the comparative work presented in this article.
Response 2: The Introduction now explicitly states the lack of comparative benchmarks for edge AI hardware in fire detection scenarios, especially under battery-powered constraints. We emphasize our contribution as a cross-platform, energy-focused analysis addressing this gap. [Lines: 61-74]
Comments 3: When introducing the first usage of certain abbreviations, the full terms should be provided, such as FPGA, etc.
Response 3: All abbreviations, including FPGA, NPU, GPU, SoC, and AI, are now defined at first mention for improved clarity.
Comments 4: The test conditions for disconnecting the fan of Kria KV260 do not match the actual deployment scenarios. It is recommended that the impact of this restriction condition on the applicable scope of the conclusion be supplemented in the discussion section.
Response 4: Although the manuscript already described how the tests were performed and what the power draw would be with the OEM fan connected, this point is now discussed explicitly in the Results and Discussion sections. The fan was disconnected to ensure comparable power measurements across devices. We acknowledge its real-world presence (~0.63 W) and discuss possible passive cooling strategies. [Lines: 518-530; 658-662]
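As a rough, back-of-the-envelope illustration (not part of the measured results, and assuming a purely hypothetical battery capacity), the ~0.63 W fan draw can be added back to the reported figures to gauge its impact on a battery budget:

```python
# Back-of-the-envelope estimate of the OEM fan's contribution to a daily
# energy budget. The 0.63 W figure is taken from the response above; the
# battery capacity below is an arbitrary example value, not from the paper.
FAN_POWER_W = 0.63
HOURS_PER_DAY = 24

extra_wh_per_day = FAN_POWER_W * HOURS_PER_DAY        # ~15.1 Wh/day
battery_capacity_wh = 100.0                           # hypothetical battery
fan_share = extra_wh_per_day / battery_capacity_wh    # ~15% of capacity

print(f"Fan adds {extra_wh_per_day:.1f} Wh/day "
      f"({fan_share:.0%} of a {battery_capacity_wh:.0f} Wh battery)")
```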
Comments 5: This paper's experiment only selected YOLOv5n for testing and did not involve other models (such as s/m/l/x). It is recommended to analyze the influence of model size on the performance of different hardware platforms in the discussion section and cite relevant literature to support.
Response 5: This has been clarified in the Discussion section. Initially, we also tested the YOLOv5s variant, but the first results showed roughly twice the inference time with only a 2.4% mAP gain for our trained model, so we focused on the YOLOv5n version instead. [Lines: 675-683]
Reviewer 2 Report
Comments and Suggestions for Authors
The article proposes a concrete and well-documented comparative analysis between three embedded platforms (i.MX93, Kria KV260, Jetson Orin Nano) for the implementation of YOLOv5n on low-power fire detection systems. The context is relevant, especially for real-time applications and critical infrastructures. The approach to measuring energy consumption is extremely detailed, using laboratory instruments (Agilent N6705B) and a reproducible experimental protocol. However, the paper presents some aspects that need further exploration. Specifically, the model was tested only on the D-Fire dataset. The robustness of the system under real conditions, with environmental variables or multimodal datasets, is not evaluated. Moreover, there is a lack of statistical significance tests on the results, especially concerning the loss of precision after quantization and conversion. The comparison with alternative techniques to YOLOv5 would be appreciated, especially with respect to other lightweight models (e.g., MobileNet, EfficientDet). The paper does not discuss end-to-end latency (including preprocessing and event transmission), which is critical in real-world contexts. The bibliography is not exhaustive, as the paper does not cite recent work on monitoring or the use of soft computing technologies in similar fields. In order to improve the scientific contribution of the paper, the authors should include in Section 2.2—Hardware Accelerators for CNNs—studies that address the challenges related to edge computing, temperature, and security in embedded devices. A paper that should be included in the section and bibliography is the following: doi: 10.3390/app15052439, because it is useful in highlighting the physiological and safety implications of hardware platforms. Likewise, the authors should include alternative methods (soft computing) to CNN in the bibliography, reinforcing the reflection on the architectural choices made. I am not asking for a new model to be implemented, but only for the following scientific work to be included in the Discussion Section and bibliography, doi:10.2478/jee-2025-0007.
To this end, the authors should address the following questions:
1) How would the system behave with real data coming from cameras in active tunnels? Have you tested simulated scenarios or generated a custom dataset?
2) Why wasn't the preprocessing and postprocessing time also considered in the latency calculation? It is relevant in a real-time application.
3) Did the authors compare YOLOv5n with other edge-optimized models (e.g., MobileNetV3, NanoDet)? If not, why this exclusion?
4) How is the maintenance and updating of the model managed in edge scenarios, especially for less performant devices?
5) Quantization caused a significant loss in mAP@50:95. Have you considered model-aware quantization techniques (e.g., QAT)?
6) Is the conversion and quantization pipeline fully automated, or does it require manual intervention (e.g., for layer removal)?
7) Can the authors clarify why TensorFlow 2.12 is preferred over more recent versions? Have the authors systematically verified the incompatibility?
8) Did the authors consider robustness metrics (e.g., on partially occluded or noisy images)?
9) How can the study be generalized to other types of critical events in infrastructures (not just fires)?
Author Response
Comments 1: The article proposes a concrete and well-documented comparative analysis between three embedded platforms (i.MX93, Kria KV260, Jetson Orin Nano) for the implementation of YOLOv5n on low-power fire detection systems. The context is relevant, especially for real-time applications and critical infrastructures. The approach to measuring energy consumption is extremely detailed, using laboratory instruments (Agilent N6705B) and a reproducible experimental protocol. However, the paper presents some aspects that need further exploration. Specifically, the model was tested only on the D-Fire dataset. The robustness of the system under real conditions, with environmental variables or multimodal datasets, is not evaluated. Moreover, there is a lack of statistical significance tests on the results, especially concerning the loss of precision after quantization and conversion. The comparison with alternative techniques to YOLOv5 would be appreciated, especially with respect to other lightweight models (e.g., MobileNet, EfficientDet). The paper does not discuss end-to-end latency (including preprocessing and event transmission), which is critical in real-world contexts. The bibliography is not exhaustive, as the paper does not cite recent work on monitoring or the use of soft computing technologies in similar fields. In order to improve the scientific contribution of the paper, the authors should include in Section 2.2— Hardware Accelerators for CNNs—studies that address the challenges related to edge computing, temperature, and security in embedded devices. A paper that should be included in the section and bibliography is the following: doi: 10.3390/app15052439, because it is useful in highlighting the physiological and safety implications of hardware platforms. Likewise, the authors should include alternative methods (soft computing) to CNN in the bibliography, reinforcing the reflection on the architectural choices made. I am not asking for a new model to be implemented, but only for the following scientific work to be included in the Discussion Section and bibliography, doi:10.2478/jee-2025-0007.
Response 1: We sincerely appreciate your constructive and insightful feedback. Your comments on robustness, latency, model comparisons, and the bibliography were very helpful. All major points have been addressed, and your suggested reference, together with twelve additional ones, has been included. Regarding your second suggested reference (doi:10.2478/jee-2025-0007), we acknowledge its relevance but did not include it in order to maintain the focus on CNN-based approaches. We consider it a valuable direction for future work.
Below is our detailed response to each point raised.
Comments 2: How would the system behave with real data coming from cameras in active tunnels? Have you tested simulated scenarios or generated a custom dataset?
Response 2: Currently, only the D-Fire dataset was used. A real-world deployment is planned with a public infrastructure manager in the Basque Country, including tunnels, and this is noted in the Discussion. Lines: [684-695]
Comments 3: Why wasn't the preprocessing and postprocessing time also considered in the latency calculation? It is relevant in a real-time application.
Response 3: Only core inference time was measured to maintain benchmarking consistency. We have clarified this in Section 4.2, while acknowledging the relevance of preprocessing in operational scenarios. Lines: [483-488]
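For clarity, the distinction between core inference latency and end-to-end latency can be illustrated with a minimal timing sketch. The pipeline stage functions below are hypothetical stand-ins, not the code used in the paper:

```python
import time

# Hypothetical stand-ins for the platform-specific pipeline stages; in a real
# deployment these would be the resize/normalize step, the accelerator call,
# and the NMS/decoding step respectively.
def preprocess(frame):      return frame
def run_inference(tensor):  return tensor
def postprocess(raw):       return raw

def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    t0 = time.perf_counter()
    out = fn(*args)
    return out, time.perf_counter() - t0

frame = object()                                  # dummy input frame
tensor, t_pre = timed(preprocess, frame)
raw_out, t_infer = timed(run_inference, tensor)   # "core inference time" reported in the paper
dets, t_post = timed(postprocess, raw_out)

t_end_to_end = t_pre + t_infer + t_post           # operational (end-to-end) latency
print(f"inference: {t_infer*1e3:.2f} ms, end-to-end: {t_end_to_end*1e3:.2f} ms")
```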
Comments 4: Did the authors compare YOLOv5n with other edge-optimized models (e.g., MobileNetV3, NanoDet)? If not, why this exclusion?
Response 4: We wanted to evaluate the three platforms under conditions as similar as possible, so the goal was to isolate the hardware impact with a consistent model. We have included this justification in Section 3.1. We also mention in the Discussion that YOLOv5n was selected for its cross-platform compatibility and balanced performance, and specifically the nano version because the small version yielded only a 2.4% mAP improvement for our use case. Lines: [209-219; 669-677]
Comments 5: How is the maintenance and updating of the model managed in edge scenarios, especially for less performant devices?
Response 5: While outside this paper’s scope, we recognize its importance and added a note in the Conclusions regarding future work on update mechanisms (e.g., SWUpdate) aligned with cybersecurity regulations. Lines: [723-727]
Comments 6: Quantization caused a significant loss in mAP@50:95. Have you considered model-aware quantization techniques (e.g., QAT)?
Response 6: Section 3.4 now explicitly mentions that only post-training quantization (PTQ) was used. QAT is acknowledged as a potential future improvement to reduce accuracy loss. Lines: [315-320]
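For context, the sketch below shows what post-training integer quantization with the TensorFlow Lite converter typically looks like. The SavedModel path, input resolution, and calibration generator are placeholders; the paper's actual export settings may differ:

```python
import numpy as np
import tensorflow as tf

def representative_dataset():
    # Placeholder calibration generator: in practice this would yield a few
    # hundred preprocessed images from the training data (e.g., D-Fire).
    for _ in range(100):
        yield [np.random.rand(1, 640, 640, 3).astype(np.float32)]

# "saved_model_dir" is a placeholder path to an exported YOLOv5n SavedModel.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("yolov5n_int8.tflite", "wb") as f:
    f.write(tflite_model)
```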
Comments 7: Is the conversion and quantization pipeline fully automated, or does it require manual intervention (e.g., for layer removal)?
Response 7: The pipeline is mostly automated for the i.MX93 and Orin Nano. For the Kria KV260, manual adjustment is needed due to unsupported operations in Vitis AI. This is explained in Section 3.1. Lines: [238-242]
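As a generic illustration of the kind of manual check involved (not the exact Vitis AI workflow), the exported graph can be scanned for operator types before compilation so that unsupported layers can be identified and handled by hand:

```python
from collections import Counter
import onnx

# "yolov5n.onnx" is a placeholder for the exported model; the set of operators
# actually supported by the target DPU depends on the Vitis AI version and is
# taken from its documentation, not hard-coded here.
model = onnx.load("yolov5n.onnx")
op_counts = Counter(node.op_type for node in model.graph.node)

for op, count in sorted(op_counts.items()):
    print(f"{op:20s} x{count}")
# Operators flagged as unsupported would then be moved to CPU post-processing
# or removed/replaced manually before quantization and compilation.
```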
Comments 8: Can the authors clarify why TensorFlow 2.12 is preferred over more recent versions? Have the authors systematically verified the incompatibility?
Response 8: Section 3.4.2 now explains that newer versions (e.g., 2.15) introduced bugs affecting detection. TF 2.12 was empirically selected for stable results. A supporting source has been added. Lines: [344-350]
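A simple runtime guard of the kind that could document this version constraint (hypothetical, not taken from the paper's code) is:

```python
import tensorflow as tf

# The export/quantization pipeline was validated with TensorFlow 2.12; newer
# releases (e.g., 2.15) were observed to produce incorrect detections, so the
# version is checked explicitly before conversion.
if not tf.__version__.startswith("2.12"):
    raise RuntimeError(
        f"Expected TensorFlow 2.12.x, found {tf.__version__}; "
        "results were only validated with 2.12."
    )
```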
Comments 9: Did the authors consider robustness metrics (e.g., on partially occluded or noisy images)?
Response 9: Robustness was not evaluated in this work, but we recognize its importance and cite it as a direction for future studies in the Conclusions. Lines: [727-729]
Comments 10: How can the study be generalized to other types of critical events in infrastructures (not just fires)?
Response 10: The Discussion and Conclusions now clarify that the proposed benchmarking method can be extended to other infrastructure threats, such as intrusion or structural anomalies. Lines: [729-733]
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
I thank the authors for their replies to the comments. I have no further comments.