Peer-Review Record

Parameter Efficient Asymmetric Feature Pyramid for Early Wildfire Detection

by Xiaohui Cheng 1,2, Jialong Bian 1, Yanping Kang 1,2,*, Xiaolan Xie 1,2, Yun Deng 1,2, Qiu Lu 1,2, Jian Tang 3, Yuanyuan Shi 3 and Junyu Zhao 3
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Appl. Sci. 2025, 15(22), 12086; https://doi.org/10.3390/app152212086
Submission received: 13 October 2025 / Revised: 5 November 2025 / Accepted: 11 November 2025 / Published: 14 November 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Line 40. “Vision-based detection and deep learning” should not be contrasted with “remote sensing”.

It would be helpful to add references to international studies and increase the number of journal publications in the introduction.

Lines 107-119 – in my opinion, this would be better suited for the discussion rather than the introduction.

Lines 124-125 – Could the authors provide a brief description of the datasets used? Perhaps not all readers are familiar with them.

Author Response

Comments 1:
Line 40. “Vision-based detection and deep learning” should not be contrasted with “remote sensing”.
Response 1:
Thank you for the suggestion. We agree with this comment. Therefore, we revised the sentence to avoid any implication of contrast between vision-based methods and remote sensing.
Updated text (Page 2, Introduction, lines 39–42):
“Accordingly, data‑driven, vision‑based deep learning methods have become a mainstream approach for automated early wildfire detection on imagery acquired by satellites, UAVs, and ground cameras.”
This clarifies that remote‑sensing and ground systems denote the image‑acquisition platforms, while our work concerns the vision‑based analysis performed on their imagery.

Comments 2:
It would be helpful to add references to international studies and increase the number of journal publications in the introduction.
Response 2:
Thank you for the suggestion. We expanded the Introduction beyond YOLO‑focused and regional works by adding international journal publications that cover (i) UAV–IoT system deployments for early detection, (ii) AI+IoT surveys for practical implementation, (iii) satellite‑based deep‑learning approaches (detection/mapping/prediction and datasets), and (iv) airborne optical/thermal sensing for field monitoring. We inserted one‑sentence additions at the corresponding paragraphs (highlighted), as detailed below.
After “Jianwei Li et al. [14] … at small cost for real time edge deployment.”:
Ramadan et al. [15] design an AI‑powered UAV–IoT system for early wildfire prevention and detection, demonstrating end‑to‑end operation with low‑power nodes, long‑range communication, and field‑deployable workflows.
Immediately following the above sentence:
Giannakidou et al. [16] provide a comprehensive survey of AI‑ and IoT‑enabled wildfire prevention, detection and restoration, summarizing system architectures, communication stacks, energy/latency constraints, and field deployments.
After “… strong augmentation with multiscale training.”:
Ghali and Akhloufi [28] synthesize deep‑learning approaches for wildland fires using satellite remote‑sensing data, covering detection, mapping, and prediction together with commonly used datasets and evaluation practices.
After “Leo Ramos et al. [23] … infrared multispectral sensing … learning.”:
Allison et al. [25] review airborne optical and thermal remote sensing for wildfire detection and monitoring, outlining sensor modalities, manned/UAV platforms, and operational constraints relevant to field deployment.
New references added:
Ramadan, M.N.A.; Basmaji, T.; Gad, A.; Hamdan, H.; Akgün, B.T.; Ali, M.A.H.; Alkhedher, M.; Ghazal, M. Towards Early Forest Fire Detection and Prevention Using AI Powered Drones and the IoT. Internet of Things 2024, 27, 101248. https://doi.org/10.1016/j.iot.2024.101248.
Giannakidou, S.; Radoglou‑Grammatikis, P.; Lagkas, T.; Argyriou, V.; Goudos, S.; Markakis, E.K.; Sarigiannidis, P. Leveraging the Power of Internet of Things and Artificial Intelligence in Forest Fire Prevention, Detection, and Restoration: A Comprehensive Survey. Internet of Things 2024, 26, 101171. https://doi.org/10.1016/j.iot.2024.101171.
Allison, R.S.; Johnston, J.M.; Craig, G.; Jennings, S. Airborne Optical and Thermal Remote Sensing for Wildfire Detection and Monitoring. Sensors 2016, 16(8), 1310. https://doi.org/10.3390/s16081310.
Ghali, R.; Akhloufi, M.A. Deep Learning Approaches for Wildland Fires Using Satellite Remote Sensing Data: Detection, Mapping, and Prediction. Fire 2023, 6(5), 192. https://doi.org/10.3390/fire6050192.

Comments 3:
Lines 107–119 – in my opinion, this would be better suited for the discussion rather than the introduction.
Response 3:
Thank you for the suggestion. We agree and have moved the entire paragraph from the end of the Introduction to the Discussion section, with minimal copy‑editing and no technical change.
Updated text:
Removed from the Introduction (previously lines 107–119 in the initial submission).
Added to the Discussion as the opening paragraph under “4. Discussion” (Page 18, lines 622–634):
“Accordingly, for forest fire detection, we revisit the information flow in FPN from the perspective of evaluation metrics and introduce node selection optimization to better balance accuracy, latency, and cost. Within the RetinaNet framework, we replace Smooth L1 with CIoU to enforce consistent modeling of overlap, center distance, and aspect ratio, thereby stabilizing regression for small objects. We further exploit the differences between flames and smoke in semantic strength and shape stability, where flames exhibit stronger semantics and relatively stable shapes, whereas smoke has low contrast, fuzzy boundaries, and pronounced deformation. We break the symmetric treatment across FPN levels and, after feature fusion, apply lightweight level‑specific enhancement to different levels, forming an asymmetric feature pyramid. This design prevents over‑concentration on a single scale, reduces computation, and improves generalization. It maintains stable attention and reduces false positives under strong backgrounds such as water glare, sunset, and clouds, aiming for both low false alarm rates and high recall.”
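For readers unfamiliar with the CIoU regression term mentioned in the relocated paragraph, the following minimal PyTorch sketch illustrates the loss for axis-aligned boxes; the function name, (x1, y1, x2, y2) box format, and mean reduction are illustrative assumptions rather than the authors' exact implementation.

```python
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    """Complete IoU (CIoU) loss for axis-aligned boxes in (x1, y1, x2, y2) format.
    Sketch only: box format and mean reduction are assumptions."""
    # Intersection area
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)

    # Union and IoU
    w_p, h_p = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    w_t, h_t = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    union = w_p * h_p + w_t * h_t - inter + eps
    iou = inter / union

    # Squared center distance over squared diagonal of the enclosing box
    cx_p, cy_p = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx_t, cy_t = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    rho2 = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (torch.atan(w_t / (h_t + eps)) - torch.atan(w_p / (h_p + eps))) ** 2
    with torch.no_grad():
        alpha = v / (1 - iou + v + eps)

    return (1 - iou + rho2 / c2 + alpha * v).mean()
```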

Comments 4: 
Lines 124-125 – Could the authors provide a brief description of the datasets used? Perhaps not all readers are familiar with them.
Response 4:
Thank you for the suggestion. We added a concise “Brief dataset description” in Section 2.1.1 (Page 3, lines 130–137), summarizing scene/landscape types, illumination conditions, common confounders, target patterns (flame/smoke; tiny to extended; multi‑target frames), hard negatives, annotation format, and de‑duplication with perceptual hashing, together with the final size (5,665 unique images) and the unified processing/evaluation protocol. The added text is:
“Brief dataset description. The composite corpus contains outdoor scenes from public wildfire/smoke repositories on Roboflow and covers diverse landscapes (dense/sparse forests, mountains, lakeshores, grassland, and the urban–rural interface) and illumination conditions (midday, overcast, dusk/sunset). Frequent confounders include water‑surface glare, low sun, clouds and haze. Images include both flame and smoke patterns, from tiny distant spots to extended plumes, with many multi‑target frames; images without visible targets are retained as hard negatives. Annotations are axis‑aligned bounding boxes in a unified label scheme (COCO format). After the initial merge, we apply a strict deduplication pipeline based on perceptual hashing [31], generating content‑aware hash signatures for each image to identify duplicates and near duplicates, thereby eliminating overlap and leakage across data sources. After cleaning, we obtain 5,665 unique images that cover diverse terrains, illumination, and weather conditions, providing a stable, reproducible, and representative basis for subsequent experiments. Meanwhile, we unify and standardize the annotation scheme to ensure consistent class definitions and clear annotation boundaries, and we adopt consistent data processing and evaluation protocols in both training and evaluation to avoid biases introduced by dataset differences.”
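For illustration, the perceptual-hashing deduplication described above could be implemented along the lines of the minimal sketch below, which uses the third-party Pillow and ImageHash libraries; the hash type, file pattern, and Hamming-distance threshold are assumptions, not the paper's exact settings.

```python
from pathlib import Path
from PIL import Image
import imagehash  # third-party: pip install ImageHash

def deduplicate(image_dir, max_hamming=4):
    """Drop exact and near duplicates via perceptual hashing.
    Hash size and threshold are illustrative, not the paper's exact settings."""
    kept, kept_hashes = [], []
    for path in sorted(Path(image_dir).rglob("*.jpg")):
        h = imagehash.phash(Image.open(path))  # content-aware hash signature
        # A small Hamming distance to an already kept image marks a (near-)duplicate.
        if all(h - other > max_hamming for other in kept_hashes):
            kept.append(path)
            kept_hashes.append(h)
    return kept

unique_images = deduplicate("data/merged")  # hypothetical directory
print(len(unique_images), "unique images retained")
```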

Reviewer 2 Report

Comments and Suggestions for Authors

The topic of the article is very useful; early detection of fire can significantly help reduce the damage caused by fires. After reviewing the manuscript, I conclude that it basically meets the requirements expected of scientific articles and contains contributions; however, I have a few suggestions that I think could increase the value of the article. My suggestions:


The objective of the manuscript is stated in the first sentence of the abstract, but I suggest that this be stated clearly and declared at the end of the introduction.

In the introduction, the authors only partially review the literature related to the topic, linking works together rather than providing a broader overview or critique of the relevant literature. There is a huge body of literature on remote sensing and fire detection, of which only the YOLO-related works are linked together, and even that coverage is not comprehensive. I suggest expanding this.

I also have another observation regarding the literature. The cited literature is distinctly Asian-dominated, which, to those familiar with the topic, reflects a clearly unbalanced analysis from a professional perspective and is also inelegant on the authors' part. The latter, of course, cannot be a requirement, but the first point stands. I note that the cited literature is otherwise all relevant.

The manuscript does not make clear how the results can be used in practice. Although this is touched upon in Chapter 4.3, the treatment is completely insufficient. The methodology, the studies, and the results of the manuscript are acceptable, and I even agree with the findings myself, but the manuscript was submitted to Applied Sciences, which means that applicability must also be demonstrated. I accept that the details will be worked out in future work, but a more detailed explanation of how the results are to be implemented in practice, their usability, and the effectiveness of developments based on them is needed here.

Due to the many acronyms and abbreviations, I suggest that a summary list of them be provided at the end of the article.

In Figure 7, the text on the pictures (a, b) is not readable at normal size; I suggest that this be made clear with explanatory text in the caption.

In addition to the above, I recommend publishing the manuscript.

Author Response

Comments 1:
Please clearly restate the objective at the end of the Introduction.
Response 1:
Thank you for the suggestion. We added two closing sentences at the end of the Introduction to explicitly state the objective and scope.
Updated text (Page 3, Introduction, lines 120–124):
“In this context, we pursue a deployable, parameter‑efficient detector that maintains high recall while reducing false alarms under glare/sunset and small‑object conditions. We therefore adopt a lightweight post‑fusion refinement within RetinaNet+FPN to balance detection accuracy, computational efficiency, and end‑to‑end latency for early wildfire monitoring.”

Comments 2:
In the introduction, the authors only partially review the literature; there is a huge literature on remote sensing and fire detection, of which only YOLO‑related works are linked together. I suggest expanding this.
Response 2:
Thank you for the valuable suggestion. We expanded the Introduction beyond YOLO‑focused and regional works by adding international journal publications and concise one‑sentence additions in the relevant paragraphs. Specifically, we (i) added a UAV–IoT system deployment study to represent end‑to‑end early detection under field constraints, (ii) added a comprehensive AI+IoT survey to summarize practical architectures and constraints for deployment, (iii) added an airborne optical/thermal review to reflect sensing modalities and operational considerations, and (iv) added a satellite‑based deep‑learning overview to broaden the sensing and task coverage (detection, mapping, prediction, datasets). These sentences are highlighted in the manuscript and placed as follows:
After “Jianwei Li et al. [14] … at small cost for real time edge deployment.”: Ramadan et al. [15] design an AI‑powered UAV–IoT system for early wildfire prevention and detection, demonstrating end‑to‑end operation with low‑power nodes, long‑range communication, and field‑deployable workflows. Immediately following: Giannakidou et al. [16] provide a comprehensive survey of AI‑ and IoT‑enabled wildfire prevention, detection and restoration, summarizing system architectures, communication stacks, energy/latency constraints, and field deployments.
After “Leo Ramos et al. [23] … infrared multispectral sensing … learning.”: Allison et al. [25] review airborne optical and thermal remote sensing for wildfire detection and monitoring, outlining sensor modalities, manned/UAV platforms, and operational constraints relevant to field deployment.
After “… strong augmentation with multiscale training.”: Ghali and Akhloufi [28] synthesize deep‑learning approaches for wildland fires using satellite remote‑sensing data, covering detection, mapping, and prediction together with commonly used datasets and evaluation practices.
These additions balance the perspective, increase the number of journal publications, and clarify our positioning within international research.

Comments 3:
Please provide more details on how the results can be used in practice (applicability, usability, and effectiveness).
Response 3:
Thank you for the suggestion. We added Section 4.1 “Practical applicability and deployment” before the former Section 4.1 (now 4.2). The new section details threshold/NMS tuning, the end‑to‑end timing pipeline (including preprocessing and NMS), resource and latency budgets for edge deployment, minimal‑change integration with standard RetinaNet heads, and site onboarding without re‑training plus simple temporal filtering for video. This clarifies how the method is used in practice and why it meets real‑time monitoring needs.

Comments 4:
Due to the many acronyms and abbreviations, I suggest that a summary list of them be provided at the end of the article.
Response 4:
Thank you for the suggestion. We restored the template’s “Abbreviations” section at the end of the manuscript and added an alphabetically ordered list of the acronyms used in the paper (e.g., APs, BiFPN, CIoU, DPPM, DyFPN, EMA, FAM, FPN, FPS, GFLOPs, GN, IoU, LEB, MCCL, NMS, SiLU, SPPF, UAV) to improve readability.

Comments 5:
In Figure 7, the text on the pictures (a, b) is not readable at normal size. I suggest this be made clear with explanatory text in the caption.
Response 5:
Thank you for the suggestion. We revised the caption of Figure 7 to explicitly explain how to read panels (a) and (b): the scenes contain long‑range small targets, so per‑box labels can appear small at the journal layout size; zoomed‑in crops (~×2.5) are provided beneath each frame to display the detections clearly, and the colored boxes indicate the detections. We also included a high‑resolution version of Figure 7 in the submission files for precise reading.

Reviewer 3 Report

Comments and Suggestions for Authors

This paper offers a solid contribution to efficient neck design for early wildfire detection. The move from a heavy attention block (DAEM) to a minimal post-fusion recalibration (LEB) is well argued and backed by controlled ablations. The unified end-to-end protocol and the qualitative evidence on challenging scenes (tiny distant targets, multiple small objects, water-glare backgrounds) are convincing. I recommend minor–moderate revisions mainly around reproducibility, domain coverage, and fine-grained efficiency reporting.

1) Main strengths

  • Clear methodological goal: lightweight post-fusion recalibration on P3–P5 to improve the accuracy–cost trade-off.
  • Clean ablation (v1–v5) exposing the “complexity trap” of DAEM and motivating LEB.
  • End-to-end timing that includes letterboxing and NMS—important for real deployments.
  • Persuasive qualitative results: fewer water-glare false positives and better recovery of distant tiny fires.

2) Major revisions requested (technical and verifiable)

  1. Efficiency metrics. Alongside η (mAP@0.5/params), report mAP@[0.5:0.95]/params in the main paper (not only in the supplement). This complements detection with localization quality and reduces potential bias.
  2. Compute accounting. Provide an explicit FLOPs/parameter breakdown per component (DW-3×3, PW-1×1, GroupNorm, activations, upsampling). Clarify whether GN/SiLU are excluded from FLOPs, and give an estimate if so.
  3. Domain generalization. Add results by scene and source (synthetic vs. real; water/forest/mountain/urban; lighting). A small table with mAP@0.5, mAP@[0.5:0.95], recall and FPR per domain will help assess robustness to domain shift.
  4. Edge hardware. Report latency, FPS, memory footprint, and ideally power on Jetson Orin/Xavier or ARM CPU using the same resolution (640) and identical pipeline. This is key for field deployment.
  5. Statistical variability. Provide mean±SD or 95% CI over ≥3 runs with different seeds for mAP@[0.5:0.95], recall and FPS. The ablation is clear; stability needs quantification.
  6. Error analysis. Include a confusion-style breakdown per difficult scenario and a short section on critical false negatives (very small, distant fires) versus glare-induced false positives. A few failure cases with commentary would be useful.
  7. LEB design choices. Justify GN group size and the choice of SiLU. Consider, at least in an appendix, placing LEB on P6–P7 under <3% GFLOPs overhead to test recall on ultra-small objects.
  8. Lightweight neck baselines. Add comparisons with PAN-Lite/BiFPN-Lite at matched compute budgets to strengthen the claim that the proposed post-fusion block is competitive.

3) Minor edits and presentation

  • Config clarity. Specify RetinaNet anchor sizes/ratios, full schedule (epochs, warm-up), EMA if used, data augmentation rules, and the exact NMS/confidence thresholds for the reported numbers.
  • Grad-CAM. State normalization and target layer; if possible, include a small fidelity check or a case where baseline vs. v5 activations drive different decisions.
  • Figure 6. Consider adding error bars or interval marks if variability is reported.
  • Language. English is fine; a light copy-edit to harmonize verb tense and acronyms would suffice.

 4) Reproducibility checklist

  • Public PyTorch code and training/inference configs.
  • Seeds and splits with anti-leak verification (published hashes/lists).
  • An end-to-end timing script including preprocessing and NMS.
  • v5 checkpoint and logs for the three reported runs.

Author Response

Comments 1:
Efficiency metrics. Alongside η (mAP@0.5/params), report mAP@[0.5:0.95]/params in the main paper (not only in the supplement).
Response 1:
Thank you for the suggestion. We have added the localization‑aware efficiency metric to the main paper.
(1) Abstract: we inserted the sentence “In addition, the localization‑aware efficiency, defined as η@[0.5:0.95]=mAP@[0.5:0.95]/Params(M), reaches ≈1.21 and is the highest under the unified protocol.”
(2) Table 3: we added a new column η@[0.5:0.95] and completed all entries (e.g., YOLOX‑x = 0.49). In the table notes, we now explicitly define both metrics: “Parameter efficiency is defined as η = mAP@0.5 / Params(M). The localization‑aware efficiency is defined as η@[0.5:0.95] = mAP@[0.5:0.95] / Params(M).”
(3) Section 3.3 (first sentence after Table 3): we revised it to “As shown in Table 3, AsymmetricFPNv5 attains the highest parameter efficiency η=2.34 and also the highest localization‑aware efficiency η@[0.5:0.95]=1.21.”
Under the unified protocol, v5 leads on both detection‑oriented efficiency (η@0.5) and localization‑aware efficiency (η@[0.5:0.95]) while keeping near‑baseline compute and the reported end‑to‑end FPS.
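For clarity, both efficiency ratios follow directly from the definitions quoted above; the minimal sketch below reproduces them, with hypothetical placeholder inputs rather than the paper's reported numbers.

```python
def parameter_efficiency(map_50, map_5095, params_m):
    """Efficiency ratios as defined in the Table 3 notes:
    eta = mAP@0.5 / Params(M); eta@[0.5:0.95] = mAP@[0.5:0.95] / Params(M)."""
    return map_50 / params_m, map_5095 / params_m

# Hypothetical inputs for illustration only (not the paper's numbers):
eta, eta_loc = parameter_efficiency(map_50=85.0, map_5095=44.0, params_m=36.4)
print(f"eta = {eta:.2f}, eta@[0.5:0.95] = {eta_loc:.2f}")
```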

Comments 2:
Compute accounting. Provide an explicit FLOPs/parameter breakdown per component (DW‑3×3, PW‑1×1, GroupNorm, activations, upsampling). Clarify whether GN/SiLU are excluded from FLOPs, and give an estimate if so.
Response 2:
Thank you for the suggestion. We clarified compute accounting in the main text. In Section 2.2.3 (after Eq. (6)) we added that FLOPs/parameters are reported for convolutional terms, with bilinear upsampling counted separately; GroupNorm and activations (SiLU) are excluded and their runtime overhead at 640×640 is estimated to be <1–2% per image. We also state that, on P3–P5 under the unified setting, PW‑1×1 contributes ~85–92% and DW‑3×3 ~6–12% of per‑level FLOPs, consistent with Eqs. (5) and (6). In Section 2.3.2 we added a sentence that the same convention is applied consistently across all variants to enable budget‑matched comparisons.
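As a rough illustration of this accounting convention (convolutional MACs only, with GroupNorm and SiLU excluded), the sketch below estimates the depthwise 3×3 and pointwise 1×1 terms per pyramid level; the 640×640 input and 256-channel width are assumptions, not necessarily the paper's exact configuration.

```python
def dwpw_macs(h, w, c_in, c_out, k=3):
    """MAC counts for a depthwise k x k followed by a pointwise 1 x 1 convolution.
    GroupNorm and SiLU are excluded, matching the stated accounting convention."""
    dw = h * w * c_in * k * k   # depthwise 3x3: one k*k filter per input channel
    pw = h * w * c_in * c_out   # pointwise 1x1: full channel mixing
    return dw, pw

# Illustrative pyramid setting (640x640 input, 256 channels on P3-P5); values are assumptions.
for level, stride in (("P3", 8), ("P4", 16), ("P5", 32)):
    h = w = 640 // stride
    dw, pw = dwpw_macs(h, w, c_in=256, c_out=256)
    print(f"{level}: DW={dw / 1e6:.1f} MMACs, PW={pw / 1e6:.1f} MMACs")
```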

Comments 3:
Domain generalization. Add results by scene and source (synthetic vs. real; water/forest/mountain/urban; lighting). A small table with mAP@0.5, mAP@[0.5:0.95], recall and FPR per domain will help assess robustness to domain shift.
Response 3:
Thank you for the helpful suggestion. We agree that domain‑wise reporting is valuable and plan to support it with scripts and templates. In this version we did not add a full table because some per‑domain sample sizes are small and scene metadata are incomplete, which risks over‑interpretation. We added a note in the Limitations stating that we will release scripts for domain‑wise evaluation (e.g., source: synthetic/real; scene: glare/non‑glare; lighting) together with templates for site‑specific grouping upon acceptance. Section 3.2 already contrasts representative glare and distant‑tiny‑target cases under the unified thresholds, and the ordering in the overall results remains consistent.

Comments 4:
Edge hardware. Report latency, FPS, memory footprint, and ideally power on Jetson Orin/Xavier or ARM CPU using the same resolution (640) and identical pipeline.
Response 4:
Thank you for highlighting the deployment aspect. We acknowledge the importance of embedded measurements and will run the identical 640×640 end‑to‑end pipeline (including preprocessing and NMS) on Jetson Orin/Xavier and ARM CPU, reporting latency/FPS, peak memory, and power in the companion repository. We will also provide a hardware‑agnostic timing script and environment (Docker/conda) upon acceptance so that these measurements can be reproduced on embedded devices.
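A hardware-agnostic end-to-end timing script of the kind described above might look like the following sketch, which times preprocessing, the forward pass, and NMS together; the model interface, simplified resize, and frame format are placeholders rather than the actual released script.

```python
import time
import numpy as np
import torch
from torchvision.ops import nms

def time_end_to_end(model, frames, score_thr=0.05, iou_thr=0.5, warmup=10):
    """End-to-end latency per frame: preprocessing + forward + NMS.
    The model is assumed to return raw boxes and scores; this is a placeholder interface."""
    model.eval()
    times = []
    with torch.no_grad():
        for i, frame in enumerate(frames):          # frames: HxWx3 uint8 numpy arrays
            start = time.perf_counter()
            # Preprocessing: normalize and resize to the unified 640x640 input (simplified).
            img = torch.from_numpy(frame).permute(2, 0, 1).float().unsqueeze(0) / 255.0
            img = torch.nn.functional.interpolate(img, size=(640, 640), mode="bilinear")
            boxes, scores = model(img)
            keep = scores > score_thr
            kept = nms(boxes[keep], scores[keep], iou_thr)  # kept indices (unused in this sketch)
            if torch.cuda.is_available():
                torch.cuda.synchronize()            # include GPU work in the timed region
            if i >= warmup:                         # discard warm-up iterations
                times.append(time.perf_counter() - start)
    return 1.0 / np.mean(times)                     # end-to-end FPS

# fps = time_end_to_end(model, frames)  # 'model' and 'frames' are hypothetical inputs
```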

Comments 5:
Statistical variability. Provide mean±SD or 95% CI over ≥3 runs with different seeds for mAP@[0.5:0.95], recall and FPS.
Response 5:
Thank you for the request. We acknowledge the importance of reporting variability. In this revision we keep the unified protocol and point estimates in the main paper, and we will provide scripts to compute 95% confidence intervals via image‑level bootstrap on the test set and to run ≥3 seeds, together with per‑run logs and the v5 checkpoint, in the companion repository upon acceptance. This will allow readers to reproduce mean±SD/CI for mAP@[0.5:0.95], recall and FPS under identical thresholds and timing.
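The image-level bootstrap mentioned above can be sketched as follows; the per-image metric contributions, resample count, and seed are illustrative assumptions.

```python
import numpy as np

def bootstrap_ci(per_image_scores, n_boot=1000, alpha=0.05, seed=0):
    """95% CI for a test-set metric by resampling images with replacement.
    'per_image_scores' holds precomputed per-image metric contributions (an assumption)."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(per_image_scores)
    stats = [rng.choice(scores, size=len(scores), replace=True).mean() for _ in range(n_boot)]
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return scores.mean(), (lo, hi)

# Example with synthetic placeholder values (not real results):
mean, (lo, hi) = bootstrap_ci(np.random.default_rng(1).uniform(0.3, 0.6, size=500))
print(f"metric = {mean:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```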

Comments 6:
Error analysis. Include a confusion‑style breakdown per difficult scenario and a short section on critical false negatives (very small, distant fires) versus glare‑induced false positives.
Response 6:
Thank you for the suggestion. We added a short paragraph at the end of Section 3.2 describing the two dominant failure modes under the unified thresholds: ultra‑small distant fires (potential FN) and water‑surface sun‑glare (FP). We point to the corresponding qualitative evidence and attention differences (Fig. 7a–b, 7e–f; Fig. 8) and summarize remaining residual errors (thin semi‑transparent plumes and sensor‑limit tiny targets). Scripts for per‑scenario confusion counting (e.g., glare‑FP vs. tiny‑FN) will be provided in the companion repository upon acceptance.

Comments 7:
LEB design choices. Justify GN group size and the choice of SiLU. Consider, at least in an appendix, placing LEB on P6–P7 under <3% GFLOPs overhead to test recall on ultra‑small objects.
Response 7:
Thank you for the constructive request. We clarified the design decisions in Section 2.2.3: GN uses 32 groups by default on P3–P5 (min(32, C) when C<32), and SiLU is adopted for smooth gradients and stable convergence under small‑batch GN with negligible runtime overhead. Based on Eqs. (5) and (6), we estimate that adding one extra LEB on P6 and P7 would increase per‑image FLOPs by <3% at 640×640; because our dataset contains relatively few ultra‑small targets at those strides and we aim to preserve latency, we keep P6/P7 unchanged in this version. We will provide an optional code path to enable P6/P7 placement in a follow‑up release.
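To make the stated composition concrete, a minimal PyTorch sketch of a lightweight enhancement block with the described components (DW-3×3, PW-1×1, GroupNorm with min(32, C) groups, SiLU) is given below; the class name, layer order, and residual connection are assumptions, not the authors' released code.

```python
import torch.nn as nn

class LEBSketch(nn.Module):
    """Lightweight post-fusion enhancement block: DW-3x3 -> PW-1x1 -> GN -> SiLU.
    Layer order and the residual connection are illustrative assumptions."""
    def __init__(self, channels):
        super().__init__()
        groups = min(32, channels)  # GN group rule stated in the response
        self.dw = nn.Conv2d(channels, channels, 3, padding=1, groups=channels, bias=False)
        self.pw = nn.Conv2d(channels, channels, 1, bias=False)
        self.norm = nn.GroupNorm(groups, channels)
        self.act = nn.SiLU(inplace=True)

    def forward(self, x):
        # Residual refinement of one fused pyramid level
        return x + self.act(self.norm(self.pw(self.dw(x))))

# Applied per fused level, e.g. one block each for P3, P4, P5:
# p3, p4, p5 = leb3(p3), leb4(p4), leb5(p5)
```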

Comments 8:
Lightweight neck baselines. Add comparisons with PAN‑Lite/BiFPN‑Lite at matched compute budgets to strengthen the claim that the proposed post‑fusion block is competitive.
Response 8:
Thank you for this request. We added a note in the Discussion clarifying how our post‑fusion refinement differs from PAN‑Lite/BiFPN‑Lite and why it minimizes integration cost while achieving the highest η and η@[0.5:0.95] at near‑baseline compute under the unified protocol. A budget‑matched reproduction of PAN‑Lite/BiFPN‑Lite within the same protocol is planned; upon acceptance we will release neck‑swapping code paths and report the results in the companion repository.

Comments 9:
Config clarity. Specify RetinaNet anchor sizes/ratios, full schedule (epochs, warm‑up), EMA if used, data augmentation rules, and the exact NMS/confidence thresholds for the reported numbers.
Response 9:
Thank you. We consolidated the training/inference settings in Sections 2.3/2.3.2. Anchors (sizes/ratios/scales), the full schedule (100 epochs; 500‑iter warm‑up; cosine annealing; AdamW), augmentation (flip 0.5; scale jitter [0.8,1.2]), and the unified thresholds (score_threshold=0.05; NMS IoU=0.5) are explicitly stated, and we now add that EMA is not used.
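For convenience, the settings quoted in this response can be collected into a single configuration sketch like the one below; the dictionary keys are illustrative, and the anchor values are deliberately left as a pointer to the manuscript rather than invented here.

```python
# Condensed training/inference settings quoted in Response 9; anchor values are
# left as a placeholder because they are specified in the manuscript itself.
CONFIG = {
    "input_size": 640,
    "anchors": "see Sections 2.3/2.3.2 (sizes/ratios/scales as stated in the manuscript)",
    "optimizer": "AdamW",
    "epochs": 100,
    "warmup_iters": 500,
    "lr_schedule": "cosine annealing",
    "ema": False,  # EMA is not used
    "augmentation": {"horizontal_flip": 0.5, "scale_jitter": [0.8, 1.2]},
    "inference": {"score_threshold": 0.05, "nms_iou": 0.5},
}
```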

Comments 10:
Grad‑CAM. State normalization and target layer; if possible, include a small fidelity check or a case where baseline vs. v5 activations drive different decisions.
Response 10:
Thank you. We added a sentence after Figure 8 specifying that Grad‑CAM targets the last convolution in P5 with per‑map min–max normalization before overlay, and we note a simple fidelity check: masking the top‑activated regions flips the baseline decision in the glare scene (Fig. 8b) but not v5 (Fig. 8c), consistent with the intended post‑fusion refinement.
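A generic Grad-CAM routine consistent with the stated settings (gradients at a chosen convolutional layer, per-map min-max normalization before overlay) is sketched below; the hooks, model interface, and score selection are assumptions, not the authors' exact code.

```python
import torch.nn.functional as F

def grad_cam(model, image, target_layer, score_index):
    """Generic Grad-CAM with per-map min-max normalization.
    'model', 'target_layer', and the scalar score selection are placeholders."""
    feats, grads = [], []
    fwd = target_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    bwd = target_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))

    score = model(image)[score_index]  # scalar detection score to explain (assumed interface)
    model.zero_grad()
    score.backward()
    fwd.remove()
    bwd.remove()

    weights = grads[0].mean(dim=(2, 3), keepdim=True)              # channel-wise pooled gradients
    cam = F.relu((weights * feats[0]).sum(dim=1, keepdim=True))    # weighted activation map
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)      # per-map min-max normalization
```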

Comments 11:
Figure 6. Consider adding error bars or interval marks if variability is reported.
Response 11:
Thank you for the suggestion. We keep Figure 6 as point estimates and note in the caption that the relative ordering is consistent across repeated runs under the unified protocol. We will release image‑level bootstrap scripts and provide 95% confidence intervals in the companion repository.

Comments 12:
Language. English is fine; a light copy‑edit to harmonize verb tense and acronyms would suffice.
Response 12:
We performed a light copy‑edit to harmonize verb tenses and acronyms across the paper.

Comments 13–16:
Reproducibility checklist: public PyTorch code/configs; seeds and splits with anti‑leak verification; an end‑to‑end timing script including preprocessing and NMS; v5 checkpoint and logs for three runs.
Response 13–16:
Thank you. The dataset sources are public and cited. The exact seeds and dataset splits with anti‑leak file lists/hashes, training/inference configurations, the end‑to‑end timing script (including preprocessing and NMS), and the v5 checkpoint with logs for the three reported runs are available from the corresponding author upon reasonable request. Subject to project and license constraints, we intend to deposit these materials in a public repository after acceptance.
