Peer-Review Record

Active Fire Mapping on Brazilian Pantanal Based on Deep Learning and CBERS 04A Imagery

Remote Sens. 2022, 14(3), 688; https://doi.org/10.3390/rs14030688
by Leandro Higa 1, José Marcato Junior 1, Thiago Rodrigues 2, Pedro Zamboni 1, Rodrigo Silva 1, Laisa Almeida 1, Veraldo Liesenberg 3,*, Fábio Roque 1, Renata Libonati 4, Wesley Nunes Gonçalves 1,5 and Jonathan Silva 1,5
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 13 December 2021 / Revised: 16 January 2022 / Accepted: 21 January 2022 / Published: 31 January 2022

Round 1

Reviewer 1 Report

The paper examines the detection of active fires in the Brazilian Pantanal using imagery from the CBERS 4A platform, which has been assembled for this project. Fire sources are provided as point labels, which are afterwards extended to bounding boxes of a standardized size. As object detection networks, the authors examine two basic networks (FRCNN, RetinaNet) and several more advanced techniques such as VFNET, SABL, PAA and ATSS. The authors divide the data into 5 folds, each containing training, validation and test data, and evaluate 5 different standardized box sizes. All in all, this leads to 150 train-test experiments. Instead of using mAP-based evaluation, which strongly depends on finding exact label box boundaries, the authors measure the distance between box centers and apply a threshold to determine hits. Based on these hits, precision, recall and F1 score are determined. Matching between label and predicted location is done by nearest-neighbor matching. Results indicate that VFNET performs best on the new dataset.
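The figure of 150 runs is consistent with crossing the six detectors with five box sizes and five folds (6 × 5 × 5). A minimal Python sketch that enumerates such a grid follows; the concrete box sizes are an assumption, since the record only mentions 10 and 50 pixels explicitly.

```python
from itertools import product

# Detectors named in the review (FRCNN = Faster R-CNN).
DETECTORS = ["Faster R-CNN", "RetinaNet", "VFNet", "SABL", "PAA", "ATSS"]
# Hypothetical box sizes in pixels; only 10 and 50 appear in the record.
BOX_SIZES_PX = [10, 20, 30, 40, 50]
N_FOLDS = 5


def experiment_grid():
    """Return one (detector, box_size, fold) tuple per train-test run."""
    return list(product(DETECTORS, BOX_SIZES_PX, range(N_FOLDS)))


runs = experiment_grid()
print(len(runs))  # 6 detectors x 5 box sizes x 5 folds
```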

In general, I would consider the research question of the paper to be whether CBERS 4A could be used to improve the data quality of the MODIS-based state of the art. Though Figure 1 shows that manual annotation seems to find multiple fire sources on the new imagery which were not annotated in the MODIS benchmark, there is no qualitative analysis of how many annotations were detected so far and how many of them could be discovered by the proposed CNN-based approaches. The authors should directly compare detection rates here to stress the improvement of including a CBERS 4A-based system.

Another problem with the current form of the paper is the handling of the point labels. It is acceptable, but not very elegant, to simply fit standard-sized bounding boxes around the fire sources. As smoke plumes may vary in size and shape, the imprecise size and placement of the labeling boxes might have negative effects on box-based detection methods.
This might be the reason for the better performance of VFNET, as the varifocal loss seems to cope better with this noisy type of annotation. By the way, there are some recent works handling noisy data and point annotations. For example, https://arxiv.org/abs/2104.07434 works with point labels. Additional work on label correction and noisy annotations should also be checked for applicability here.
While these approaches would add technical depth to the methodological part, a larger problem is the matching of annotations and detections. As multiple annotations are allowed, increasing the radius improves the results, since there is only a positive impact on the TPs, whereas TN, FP and FN are not negatively impacted. Thus, a more detailed discussion of why a 20-pixel diameter is still precise enough would be necessary. From the imagery, it is somewhat unclear whether a 20-pixel offset is tolerable if the object (the annotated fire source) is only 2-3 pixels in size. In general, it also seems that the source is often in the center of the smoke plume, e.g., under windy conditions. Furthermore, allowing multiple matches between detections does not allow distinguishing between fires. For example, with a 20-pixel diameter, fires which might be 2 km apart from each other might be considered as the same detection. Using a Hungarian method to find a cost-minimal 1-to-1 matching (as is done in the FCOS detector) would allow evaluating the dedicated detection of all ground-truth fires.

To summarize, the paper addresses an important and interesting research question but needs some refinement in the presented results:
1. The evaluation method generating F1 scores should be redefined to make the results more meaningful (use 1-to-1 matching). The distance threshold should be described in relation to the object size to provide a better justification. In Figure 4, the annotations occlude the central objects, but it seems that multiple distinguishable fires could be contained within the same circle.
2. A qualitative comparison between the MODIS benchmark and the newly proposed process based on the CBERS 4A data should be provided.

Minor: The caption of figure 6 should be fixed as it is currently occluded.

Author Response

General comment: Many thanks for your constructive comments. We addressed all your comments as follows. Please find the answer to each raised point below.

 

The paper examines the detection of active fires in the Brazilian Pantanal using imagery from the CBERS 4A platform, which has been assembled for this project. Fire sources are provided as point labels, which are afterwards extended to bounding boxes of a standardized size. As object detection networks, the authors examine two basic networks (FRCNN, RetinaNet) and several more advanced techniques such as VFNET, SABL, PAA and ATSS. The authors divide the data into 5 folds, each containing training, validation and test data, and evaluate 5 different standardized box sizes. All in all, this leads to 150 train-test experiments. Instead of using mAP-based evaluation, which strongly depends on finding exact label box boundaries, the authors measure the distance between box centers and apply a threshold to determine hits. Based on these hits, precision, recall and F1 score are determined. Matching between label and predicted location is done by nearest-neighbor matching. Results indicate that VFNET performs best on the new dataset.

In general, I would consider the research question of the paper to be whether CBERS 4A could be used to improve the data quality of the MODIS-based state of the art. Though Figure 1 shows that manual annotation seems to find multiple fire sources on the new imagery which were not annotated in the MODIS benchmark, there is no qualitative analysis of how many annotations were detected so far and how many of them could be discovered by the proposed CNN-based approaches. The authors should directly compare detection rates here to stress the improvement of including a CBERS 4A-based system.

Answer: We agree, and to address this issue we added a new section (Section 3.3, lines 416-432). In this section, we ran inference with VFNET on the image from Figure 1 and conducted a quantitative and qualitative analysis. For more details, please see Section 3.3 (lines 416-432).

 

Another problem with the current form of the paper is the handling of the point labels. It is acceptable, but not very elegant, to simply fit standard-sized bounding boxes around the fire sources. As smoke plumes may vary in size and shape, the imprecise size and placement of the labeling boxes might have negative effects on box-based detection methods. This might be the reason for the better performance of VFNET, as the varifocal loss seems to cope better with this noisy type of annotation. By the way, there are some recent works handling noisy data and point annotations. For example, https://arxiv.org/abs/2104.07434 works with point labels. Additional work on label correction and noisy annotations should also be checked for applicability here.

 

Answer: In the first version, the relation between smoke plumes and active fire was not clear. We therefore added an explanation in Section 2.2 (lines 173-177). Our annotation is placed exactly at the cone apex, as the smoke of an active fire follows a cone-shaped pattern. Thus, the initial pattern of smoke from fire activity lies at the center of the bounding box. In this sense, the only difference between a bounding box of size 10 and one of size 50 is that the bigger one covers a wider body of the smoke plume. However, both bounding boxes contain the same initial portion of the object of interest (the cone apex). The advantage of bigger bounding boxes is that the models can find more information about the smoke pattern. However, bigger boxes can also introduce much more irrelevant information (for example, ground, rivers, and vegetation), regardless of whether the box is centered on an object or not. We hope that this explanation, together with the added text, addresses your concern. The suggested article is very interesting in the context of weakly semi-supervised detection with points, and we may investigate this kind of approach in future research.
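The argument that boxes of size 10 and 50 share the same center (the cone apex) and differ only in how much surrounding context they include can be illustrated with a small sketch. This is an assumption about how a point label might be expanded into a fixed-size box, not the authors' code; the clipping to image bounds and the function names are hypothetical.

```python
def point_to_box(x, y, size, img_w, img_h):
    """Square box of side `size` (pixels) centred on a point label (x, y).

    The box is clipped to the image, so boxes for points near the border
    end up smaller and no longer centred on the point.
    Returns (x_min, y_min, x_max, y_max).
    """
    half = size / 2.0
    return (
        max(0.0, x - half),
        max(0.0, y - half),
        min(float(img_w), x + half),
        min(float(img_h), y + half),
    )


def box_center(box):
    """Centre (x, y) of an (x_min, y_min, x_max, y_max) box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


apex = (120, 80)  # hypothetical cone-apex annotation
small = point_to_box(*apex, 10, 1024, 1024)
large = point_to_box(*apex, 50, 1024, 1024)
print(small, large)
print(box_center(small) == box_center(large))  # same centre, more context
```

Under this sketch the larger box adds only surrounding pixels (smoke body, but also ground, rivers and vegetation) around an unchanged center.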

 

While these approaches would add technical depth to the methodological part, a larger problem is the matching of annotations and detections. As multiple annotations are allowed, increasing the radius improves the results, since there is only a positive impact on the TPs, whereas TN, FP and FN are not negatively impacted. Thus, a more detailed discussion of why a 20-pixel diameter is still precise enough would be necessary. From the imagery, it is somewhat unclear whether a 20-pixel offset is tolerable if the object (the annotated fire source) is only 2-3 pixels in size. In general, it also seems that the source is often in the center of the smoke plume, e.g., under windy conditions. Furthermore, allowing multiple matches between detections does not allow distinguishing between fires. For example, with a 20-pixel diameter, fires which might be 2 km apart from each other might be considered as the same detection. Using a Hungarian method to find a cost-minimal 1-to-1 matching (as is done in the FCOS detector) would allow evaluating the dedicated detection of all ground-truth fires.

 

Answer: Thank you for your comment. The text in the first version focused mainly on a 20-pixel threshold. However, Figure 6 presents the results for 10 (550 meters), 15 (825 meters), and 20 pixels (1100 meters). As expected, the F1 score decreased for 10 and 15 pixels compared with 20 pixels. In the revised version, we present this discussion at the beginning of Section 3.1 (lines 312-314). In addition, we added a practical justification for these distance thresholds (lines 314-317). Even 20 pixels is tolerable in the Pantanal (the study area) because its flat terrain makes it easy for firefighters to spot the fire.
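The pixel thresholds and the ground distances quoted above are related by a ground sampling distance of 55 m per pixel (550 m / 10 px). The sketch below reproduces the three conversions and checks the reviewer's two-fires scenario; treating the threshold as a radius around each annotation is an assumption, not something the record states.

```python
# Ground sampling distance implied by "10 pixels (550 meters)".
GSD_M_PER_PX = 550 / 10  # 55.0 m per pixel


def px_to_m(pixels, gsd=GSD_M_PER_PX):
    """Convert a pixel distance threshold to metres on the ground."""
    return pixels * gsd


for px in (10, 15, 20):
    print(f"{px} px -> {px_to_m(px):.0f} m")

# Reviewer's scenario: two fires 2 km apart with one prediction midway.
# If the 20 px threshold acts as a radius (assumption), that single
# prediction lies within the threshold of both fires.
gap_m = 2000
print(gap_m / 2 <= px_to_m(20))
```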

In this application, it is more useful to consider predictions that are close enough to any ground truth than to find a 1-to-1 matching between predictions and their respective annotations. In this sense, our metric is more relaxed than the Hungarian method, because it counts as TPs some predictions that are close enough to a ground truth but would otherwise be FPs. Also, using only the Hungarian method to assign each prediction to a ground truth, without a distance threshold, could assign very distant objects to annotations to which they are not spatially related. Based on Figure 5, we added a discussion to the article to justify our choice.

 

In this example, we have four ground truths (highlighted as red circles) and four predictions (yellow circles). The pairwise distances between predictions and ground truths can be represented as a distance matrix. We compare 1-to-1 Hungarian matching with our metric to obtain the TP, FP and FN values used to calculate precision, recall and F1 score. For the Hungarian method, we consider only distances below a threshold value (20 pixels in our example). According to our metric, three predictions are TPs (one at the top of the figure and two at the bottom) and one prediction is an FP (the lower yellow circle), because its distance to every ground truth is above the threshold; every ground truth has a prediction within the threshold, so there are no FNs, which results in an F1 score of approximately 0.86 (precision 0.75, recall 1). With the Hungarian method, we have the same numbers of TPs and FPs, but one ground truth becomes an FN, since no unassigned prediction is left to match it, which gives an F1 score of 0.75. Hungarian matching thus reduces the F1 score from approximately 0.86 to 0.75. For our application, all critical regions with smoke plumes in this image were identified by our method. Even if our method had found only two predictions (one at the top and the other at the bottom), the results would still be relevant for the application, since these predictions are close enough to each annotation. It is therefore more useful to have a flexible metric that rewards predictions close enough (according to a threshold distance) to representative active-fire smoke plumes than a more elaborate method that requires an exact match for each annotation. Another challenge in the annotation task is that it is impossible to annotate each instance of smoke from fire activity individually (with bounding boxes), since instances can occur very close to one another. We therefore evaluate our results under this assumption with our distance-threshold metric.
We added this explanation in Section 2.5 (lines 289-305).
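The comparison between the two matching rules can be reproduced in a short Python sketch. The relaxed rule follows the description in this record: a prediction is a TP if any ground truth lies within the threshold, and a ground truth is an FN only if no prediction lies within it. The 1-to-1 rule finds the cost-minimal assignment that the Hungarian method would return, here by exhaustive search because the example is tiny (for real scenes, `scipy.optimize.linear_sum_assignment` solves the same problem efficiently). The coordinates are hypothetical and were chosen only to reproduce the counts described for Figure 5.

```python
from itertools import permutations
from math import dist

BIG = 1e9  # cost for pairs farther apart than the threshold


def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def relaxed_counts(preds, gts, thr):
    """Many-to-one matching: one prediction may cover several ground truths."""
    tp = sum(any(dist(p, g) <= thr for g in gts) for p in preds)
    fn = sum(not any(dist(p, g) <= thr for p in preds) for g in gts)
    return tp, len(preds) - tp, fn


def one_to_one_counts(preds, gts, thr):
    """Cost-minimal 1-to-1 assignment; only pairs within thr count as TPs."""
    def cost(i, j):
        d = dist(preds[i], gts[j])
        return d if d <= thr else BIG

    k = min(len(preds), len(gts))
    if len(preds) >= len(gts):
        candidates = (list(zip(perm, range(len(gts))))
                      for perm in permutations(range(len(preds)), k))
    else:
        candidates = (list(zip(range(len(preds)), perm))
                      for perm in permutations(range(len(gts)), k))
    best = min(candidates, key=lambda pairs: sum(cost(i, j) for i, j in pairs),
               default=[])
    tp = sum(dist(preds[i], gts[j]) <= thr for i, j in best)
    return tp, len(preds) - tp, len(gts) - tp


# Hypothetical layout: two ground truths share one nearby prediction,
# and one prediction is far from every ground truth.
gts = [(0, 0), (100, 100), (112, 100), (200, 0)]
preds = [(5, 0), (106, 100), (203, 0), (300, 300)]
thr = 20

relaxed = relaxed_counts(preds, gts, thr)
strict = one_to_one_counts(preds, gts, thr)
print("relaxed (TP, FP, FN):", relaxed, "F1 = %.3f" % prf(*relaxed)[2])
print("1-to-1  (TP, FP, FN):", strict, "F1 = %.3f" % prf(*strict)[2])
```

The only difference between the two rules in this layout is the shared prediction: the relaxed rule lets it satisfy both nearby ground truths, while the 1-to-1 rule leaves one of them as an FN, which lowers recall from 1 to 0.75.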

 

To summarize, the paper addresses an important and interesting research question but needs some refinement in the presented results:

  1. The evaluation method generating F1 scores should be redefined to make the results more meaningful (use 1-to-1 matching). The distance threshold should be described in relation to the object size to provide a better justification. In Figure 4, the annotations occlude the central objects, but it seems that multiple distinguishable fires could be contained within the same circle.

 

Answer: Thank you for your comment. In the text, we now compare our metric with Hungarian 1-to-1 matching to justify our metric (Section 2.5, lines 289-305). We hope that the previous explanation justifies our choice of thresholds.

 

  2. A qualitative comparison between the MODIS benchmark and the newly proposed process based on the CBERS 4A data should be provided.

 

Answer: We provide a qualitative analysis of how the models trained on CBERS 4A imagery predict smoke from fire activity in comparison with BD Queimadas data. Please see Section 3.3 (lines 416-432).

 

Minor: The caption of figure 6 should be fixed as it is currently occluded.

Answer: Many thanks for your detailed revision.

Reviewer 2 Report

The authors present a test of using state-of-the-art image detection techniques to detect active fires through smoke plume detection in Brazil. Several object detection techniques based on CNNs are compared and validated against human detection. The authors found promising results which warrant further development. The main issues are as follows. First, with such technical descriptions of object detection, the text can be hard to follow; a diagram would greatly help. Second, the paper tends to conflate smoke with fire. A brief discussion of this issue is warranted to place the use of smoke plumes to detect fires relative to other methods, such as the use of thermal sensors.

Line 10: just list the general technique, with one specific example.

Line 17: detecting smoke vs fire activity? The paper needs to better emphasize what is being measured here. Smoke is being used as a proxy for active fires. How much fire activity is being missed compared with using thermal signatures?

Line 77: Meaning unclear relative to small fires.

Line 187: The description of the approach actually used by the authors versus the literature is hard to sort through. A diagram of the workflow is needed.

Line 374: Would be interesting to compare with thermal signatures.

Author Response

General comment: Many thanks for your constructive comments. We addressed all your comments as follows. The answer to each point raised is presented below.

 

The authors present a test of using state-of-the-art image detection techniques to detect active fires through smoke plume detection in Brazil. Several object detection techniques based on CNNs are compared and validated against human detection. The authors found promising results which warrant further development. The main issues are as follows. First, with such technical descriptions of object detection, the text can be hard to follow; a diagram would greatly help. Second, the paper tends to conflate smoke with fire. A brief discussion of this issue is warranted to place the use of smoke plumes to detect fires relative to other methods, such as the use of thermal sensors.

 

Line 10: just list the general technique, with one specific example.

Answer: Thank you for your observation. We opted to remove all the method names because the main idea concerns the proposed approach, which does not depend on a specific method. This suggestion also helped reduce the abstract length, which was necessary.

 

Line 17: detecting smoke vs fire activity? The paper needs to better emphasize what is being measured here. Smoke is being used as a proxy for active fires. How much fire activity is being missed compared with using thermal signatures?

Answer: The reviewer is right. Here, smoke plumes are used as a proxy for active fire detection. This approach follows previous works in which smoke plumes were used for the accuracy assessment of active fire detections (ref1, ref2, ref3). This point was clarified in the text to avoid misunderstanding (lines 172-177).

ref1 - Abuelgasim, A.; Fraser, R. Day and night-time active fire detection over North America using NOAA-16 AVHRR data. IEEE International Geoscience and Remote Sensing Symposium. IEEE, 2002, Vol. 3, pp. 1489–1491.

ref2 - Christopher, S.A.; Wang, M.; Barbieri, K.; Welch, R.M.; Yang, S.K. Satellite remote sensing of fires, smoke and regional radiative energy budgets. IGARSS’97. 1997 IEEE International Geoscience and Remote Sensing Symposium Proceedings. Remote Sensing-A Scientific Vision for Sustainable Development. IEEE, 1997, Vol. 4, pp. 1923–1925.

ref3 - Giglio, L.; Descloitres, J.; Justice, C.O.; Kaufman, Y.J. An enhanced contextual fire detection algorithm for MODIS. Remote Sensing of Environment 2003, 87, 273–282.



Line 77: Meaning unclear relative to small fires.

Answer: The reviewer is right again. The small fires that we considered are those smaller than the MODIS spatial resolution of one square kilometer. Text updated (lines 78-79).

 

 

Line 187: The description of the approach actually used by the authors versus the literature is hard to sort through. A diagram of the workflow is needed.

 

Answer: The CNNs used in our experiments are the same as those proposed in the literature by their original authors. These implementations were taken from the MMDetection library and are publicly available. However, to better explain our work process, we provide a figure (Figure 4, between lines 256-257) that outlines the main process of training the models.



Line 374: Would be interesting to compare with thermal signatures.

Answer: We agree, and to address this issue we added a new section (Section 3.3, lines 417-432). In this section, we ran inference with VFNET on the image from Figure 1 and conducted a quantitative and qualitative analysis. For more details, please see Section 3.3.




Round 2

Reviewer 1 Report

Thank you for addressing the named shortcomings. From my point of view, the requested clarifications were provided and the paper is ready for publication.

I would suggest reading over the changed parts of the paper again, as there seem to be some typos left.

For example, line 427: Quantitatively it is  possible... VFNet. Therefore,...

 
