Article
Peer-Review Record

Delineation of Wetland Areas in South Norway from Sentinel-2 Imagery and LiDAR Using TensorFlow, U-Net, and Google Earth Engine

Remote Sens. 2023, 15(5), 1203; https://doi.org/10.3390/rs15051203
by Vegar Bakkestuen 1,*, Zander Venter 1, Alexandra Jarna Ganerød 2,3 and Erik Framstad 1
Reviewer 1:
Reviewer 2:
Submission received: 22 January 2023 / Revised: 17 February 2023 / Accepted: 18 February 2023 / Published: 22 February 2023

Round 1

Reviewer 1 Report

This very well written manuscript reports the results from a reasonably robust approach that will have appeal for a number of readers, as deep learning and Google Earth Engine are certainly hot topics at present. The introduction, study site description and methods are largely sound, although the methods may well prove to be esoteric, that is, focused on the region under study (S Norway). As such, I think this should be reflected in the title.

The main weaknesses are in the results, discussion and conclusion. Greater detail is needed when reporting the results. More crucially, a wide range of claims are made. Stick to the evidence that you have provided; everything else is conjecture that shouldn't be reported in a scientific study. As far as I can tell this approach produced a good accuracy score demonstrating a quality product that improves upon existing maps for South Norway. Much more evidence is needed to demonstrate the additional claims.

Specific points:

The focus is clearly on Norway and this should perhaps be reflected in the title? Or at least the type of wetland biome being mapped, i.e. not tropical.

Line 68: Lidar is not radar. Radar is an active system employing the microwave region of the spectrum, acting very differently to an optical based active system such as lidar. Please clarify

Line 77: You introduce the notion that few studies have used deep learning to map wetlands using S1 and S2, but I would also establish why DL should be used. There are plenty of examples where ML (or even less sophisticated) classifiers have achieved high classification accuracy scores and are perhaps more efficient to train than a DL approach. This is done, to an extent, in the paragraph starting on 96, but it would be more thorough to quote the accuracy scores that ML approaches have achieved, thereby giving you a benchmark for assessing your DL approach, i.e. what was the relative improvement offered by the DL approach?

Line 133 Is it possible to have a figure showing what these different land cover types look like? Many readers will not be familiar with this landscape.

Line 170: the verb “landed on” is too colloquial. Additionally, it is not clear what this ‘trial and error” involves: visually examining these different bands/indices against prior knowledge of wetland land cover types?

At some point the authors should discuss why radar imagery was not considered? There are many published examples using radar to map wetlands and flood water features.

Personal preference perhaps but fig 2 doesn’t look very professional. Would the authors consider redrawing with some software like app.diagrams.net?

 

A robust approach to ground truth data collection is offered, fully described and justified. Equally, the classification approach is well described in a manner that allows readers to repeat the methods selected.

Figure 4: It is a bit unprofessional at present and not that informative. I think it is important to have examples such as these but the information is not really coming across. For instance, in 4b, it is not clear where the blue hatched area is; in 4c, it is not clear what shades are supposed to represent a wetland feature. The authors may consider an additional zoom-in (for each) to demonstrate the fine-scale information that the figures display.

Fig 1. Missing key and scale or map grid

Accuracy assessment results: given that you have a range of land cover classes, I would expect to see a full confusion matrix with producers/users accuracy for each

It is important to report the variable importance. A high number of independent variables have been used, including lidar derived metrics that are not widely available. Therefore, it is important to see whether these variables are important. It also might reveal the processes that underpin the distribution of wetlands in southern Norway.

Most wetland mapping procedures have used terrain-based metrics, in many instances, to mask out areas where wetland presence is unlikely. For instance, setting a threshold on height above nearest drainage network (or even slope angle) can be informative to a degree. I presume that there is a reason for not using these commonly used metrics – please can the authors discuss this point.

Lines 423-424: Important: this reads like an important shortcoming in the method, i.e. the models overfit the training data and are not transferable. This affects the scalability of the approach, does it not? Extending on this point, lines 432-444 read too much like conjecture. There is no evidence presented that fewer training data would be equally good. Rather, it sounds like you found the opposite. Moreover, “probably mean that methods can be transferred…” – I do not think you are in a position to make this statement.

Line 450 and others. You mentioned S1 quite often but you have not used it in your study.

Line 458: You really need to report the variable importance. At the moment, there is no evidence that the RE band offered added benefit.

Conclusions. But there isn’t evidence to say that DL is better than other supervised classification approaches. There are very good examples using object based classification with ML achieving high accuracy scores and transferability over wide regions.

Line 480 spatially explicit and consistent over time and space? What do you mean? How was the “consistency over time and space” defined and evaluated?

Line 481 you have not provided evidence that it is transferable. In fact, perhaps the opposite.

Line 482. This is purely conjecture. A CE analysis was not done.

Line 484: continually updateable but you use airborne lidar?

Author Response

Point-by-point response to reviewer 1

10 February 2023

Dear reviewer. Thank you for your valuable comments and input. Below is a point-by-point response to the comments, including the actions we have taken to incorporate them.

General comments:

This very well written manuscript reports the results from a reasonably robust approach that will have appeal for a number of readers, as deep learning and Google Earth Engine are certainly hot topics at present. The introduction, study site description and methods are largely sound, although the methods may well prove to be esoteric, that is, focused on the region under study (S Norway). As such, I think this should be reflected in the title.

Response: We agree and have included “South Norway” in the title

 

The main weaknesses are in the results, discussion and conclusion. Greater detail is needed when reporting the results. More crucially, a wide range of claims are made. Stick to the evidence that you have provided; everything else is conjecture that shouldn't be reported in a scientific study. As far as I can tell this approach produced a good accuracy score demonstrating a quality product that improves upon existing maps for South Norway. Much more evidence is needed to demonstrate the additional claims.

Response: We understand this criticism and have adjusted the discussion and conclusion to reduce the conjecture. A specific example is that we have toned down the claims that DL performs better than other classifiers:

In the conclusion we have added this:

 “5) it performed better than existing reference data validated by regional unseen ground truth data. Whether the kind of deep learning approach presented here, or other machine learning methods provide better, and more effective classifications must be explored in future studies”

 

Specific points:

The focus is clearly on Norway and this should perhaps be reflected in the title? Or at least the type of wetland biome being mapped, i.e. not tropical.

Response: We have added “South Norway” in the title.

 

Line 68: Lidar is not radar. Radar is an active system employing the microwave region of the spectrum, acting very differently to an optical based active system such as lidar. Please clarify

Response: Thanks for pointing this out. We have changed the phrase to “including optical, radar sensors and LiDAR”

 

Line 77: You introduce the notion that few studies have used deep learning to map wetlands using S1 and S2, but I would also establish why DL should be used. There are plenty of examples where ML (or even less sophisticated) classifiers have achieved high classification accuracy scores and are perhaps more efficient to train than a DL approach. This is done, to an extent, in the paragraph starting on 96, but it would be more thorough to quote the accuracy scores that ML approaches have achieved, thereby giving you a benchmark for assessing your DL approach, i.e. what was the relative improvement offered by the DL approach?

Response: We appreciate the critique and agree that our introduction can be improved in this regard. Therefore, we have added the following text:

“Few studies have applied DL models to wetland classification (Mahdianpari et al. 2020) and to the best of our knowledge none have done so using a fusion of Sentinel-1 and Sentinel-2 data. In a meta-analysis of more than 200 publications, Ma et al. (2019) found that the median accuracy for classifying land use and land cover using DL models was 91% and that there was no other tree- or kernel- based classifier which achieved a median accuracy over 90%. Therefore, although less sophisticated models are more efficient in terms of training data requirement and inference speeds, DL models ultimately achieve higher accuracies for land cover classification.”

 

Line 133 Is it possible to have a figure showing what these different land cover types look like? Many readers will not be familiar with this landscape.

Response: We have added a link to The Norwegian Biodiversity Center where all these land cover types are discussed and illustrated:

https://www.artsdatabanken.no/Pages/172028/Vaatmarkssystemer

 

Line 170: the verb “landed on” is too colloquial. Additionally, it is not clear what this ‘trial and error” involves: visually examining these different bands/indices against prior knowledge of wetland land cover types?

Response: We agree. We discuss the procedure in the discussion, see lines 467-477 in the revised manuscript.

We have changed the text here to:

“After trial and error, we selected 3 Sentinel-2 bands and 10 indices as well as the mean canopy height model from LiDAR (accessed from hoydedata.no), as a substitute for tree height, in our explanatory variables.”

 

At some point the authors should discuss why radar imagery was not considered? There are many published examples using radar to map wetlands and flood water features.

Response: We tried to use radar in our first models. In fact, we ran about 150 models, including and excluding different variables. However, we did not save each model due to the large amount of data and the cost of storing it in the cloud. By visually examining the models with radar included, we finally decided to leave it out of the final model. Radar is discussed in lines 515-523 of the revised manuscript.

 

Personal preference perhaps but fig 2 doesn’t look very professional. Would the authors consider redrawing with some software like app.diagrams.net? 

Response: Agree. We have made a new version. 

 

A robust approach to ground truth data collection is offered, fully described and justified. Equally, the classification approach is well described in a manner that allows readers to repeat the methods selected.

Response: OK

 

Figure 4: It is a bit unprofessional at present and not that informative. I think it is important to have examples such as these but the information is not really coming across. For instance, in 4b, it is not clear where the blue hatched area is; in 4c, it is not clear what shades are supposed to represent a wetland feature. The authors may consider an additional zoom-in (for each) to demonstrate the fine-scale information that the figures display.

Response: Agree. We have decided to drop Figure 4. It is difficult to get a good design. We believe that the links we already have in the manuscript cover all the information contained in this figure. See  

https://vegar.users.earthengine.app/view/deeplearningmodel1

https://vegar.users.earthengine.app/view/deeplearningmodel2

 

Fig 1. Missing key and scale or map grid

Response: We have made a new Figure 1 with these included.

 

Accuracy assessment results: given that you have a range of land cover classes, I would expect to see a full confusion matrix with producers/users accuracy for each

Response: Our classification was a binary prediction of wetland and non-wetland, with the target class being wetland (relevant to true/false positive metrics). Although the independent validation data set includes different wetland types, we have aggregated all the types into a single wetland class. We have specified this in the manuscript.

 

It is important to report the variable importance. A high number of independent variables have been used, including lidar derived metrics that are not widely available. Therefore, it is important to see whether these variables are important. It also might reveal the processes that underpin the distribution of wetlands in southern Norway.

Response: Thank you for this important comment. Unlike tree-based models, where variable importance is trivial to calculate, DL methods are computationally intensive and iterating through every combination of predictor variable to isolate variable importance is time consuming. Therefore, we were not able to derive variable importance scores due to time and funding constraints. We have added the following sentences to the discussion section to acknowledge this limitation in our study:

“The transfer value of our results is limited by the fact that we did not calculate variable importance scores for the satellite features that informed the DL model. Due to funding and time constraints, we were not able to calculate importance scores for the set of predictor variables. We identify this as an avenue for further research given that reducing the feature set can drastically improve processing and inference times in DL architectures (Wang et al. 2022).”

 

Most wetland mapping procedures have used terrain-based metrics, in many instances, to mask out areas where wetland presence is unlikely. For instance, setting a threshold on height above nearest drainage network (or even slope angle) can be informative to a degree. I presume that there is a reason for not using these commonly used metrics – please can the authors discuss this point.

Response: We tried to use slope and other terrain variables directly in the model in an early phase of the modelling, but this led to some unwanted areas being misclassified. However, we clearly see the advantage of filtering unwanted areas out of the finished model using, for example, terrain variables. We have therefore added the following text to the discussion:

“Our model has a balanced accuracy of 90.9% and, depending on further use of the model, users might explore additional filters and post-processing steps to remove some obvious misclassifications and unwanted areas. It would be possible to use slope to mask out steep areas with a low probability of being wetlands. It would also be possible to set a threshold on height above the nearest drainage network. Agricultural land is also mapped annually with a high degree of accuracy in Norway. This agricultural land layer could likewise be used to mask out misclassified wetland in these areas.”
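The post-processing idea in the passage above can be sketched as a simple raster mask. The arrays and thresholds below are illustrative placeholders, not values from the paper:

```python
import numpy as np

# Hypothetical 2x2 rasters standing in for full-resolution layers:
# model wetland probability, terrain slope (degrees), and height
# above nearest drainage network (HAND, metres).
prob  = np.array([[0.9, 0.8], [0.7, 0.2]])
slope = np.array([[5.0, 30.0], [10.0, 3.0]])
hand  = np.array([[1.0, 2.0], [60.0, 0.5]])

# Illustrative thresholds (assumptions, not from the study)
MAX_SLOPE_DEG, MAX_HAND_M, PROB_THRESHOLD = 15.0, 50.0, 0.5

# Keep a pixel as wetland only if the model is confident AND the
# terrain does not rule it out.
wetland = (prob >= PROB_THRESHOLD) & (slope <= MAX_SLOPE_DEG) & (hand <= MAX_HAND_M)
```

An agricultural land layer could be combined in the same way, as one more boolean mask.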

 

Lines 423-424: Important: this reads like an important shortcoming in the method, i.e. the models overfit the training data and are not transferable. This affects the scalability of the approach, does it not? Extending on this point, lines 432-444 read too much like conjecture. There is no evidence presented that fewer training data would be equally good. Rather, it sounds like you found the opposite. Moreover, “probably mean that methods can be transferred…” – I do not think you are in a position to make this statement.

Response: Thank you for this reflecting comment.

The main challenge in the first run was that the training data did not cover all relevant variation in the study area, but this was rectified in the next round with more training data. We do not perceive this as 'overfitting' of the model. Although the model might not be transferable to other regions, we believe that our workflow and results provide insight that is 'transferable' to other cases. How 'scalable' the method is depends on how variable different wetlands are at different geographical scales. The more variable they are at a fine scale, the closer together the training data must lie (unless this fine-scale variation is repeated at a coarser scale).

The text about augmentation (current line 479) also does not claim that fewer training data can produce equally good results; it simply discusses the potential benefit of other methodological approaches such as augmentation.

We agree that we are not in a position to state that our model approach can be transferred. So we have changed the text:

“Deep learning methods have also been used to classify other LULC classes with some success (Ball et al. 2017, Hoeser et al. 2020, Kattenborn et al. 2021).”

 

Line 450 and others. You mentioned S1 quite often but you have not used it in your study.

Response: We have removed some statements about S1. And we discuss why we ended up not using S1 in lines 515-523 in the revised manuscript.

 

Line 458 You really need to report the variable importance. At the moment, there is not evidence that the RE band offered added benefit.

Response: Agree. We have removed this sentence. For feature importance see our response to the similar question earlier.

 

Conclusions. But there isn’t evidence to say that DL is better than other supervised classification approaches. There are very good examples using object based classification with ML achieving high accuracy scores and transferability over wide regions.

Response: Agree. We no longer claim evidence that DL is better. And we have added this to the conclusion:

“Whether the kind of deep learning approach presented here or other machine learning methods provide better and more effective classifications must be explored in future studies.”

 

Line 480 spatially explicit and consistent over time and space? What do you mean? How was the “consistency over time and space” defined and evaluated?

Response:

Here we were comparing remote sensing to manual mapping methods and referring to the consistency of satellite-derived spectral responses over large areas and over many years. We have elaborated on this point in the Conclusion to make it clearer:

“In our experience, classification of ecosystems and land cover classes based on satellite and repeated airborne remote sensing imagery offers some significant advantages over in-situ and manual reference mapping: 1) it covers large areas and multiple years in a consistent and comparable manner– in that sense it is objective compared to manual mapping which is performed by different individuals over different parts of the country.”

 

Line 481 you have not provided evidence that it is transferable. In fact, perhaps the opposite.

Response: Agree. So far there is no proof that our model is transferable. However, we do discuss generalizable lessons learnt during the generation of our wetland map which may benefit the remote sensing community. Nevertheless, we have been careful not to speculate beyond the evidence in our results.

Line 482. This is purely conjecture. A CE analysis was not done.

Response: Agree. We have not done a CE analysis and we have removed this point.

 

Line 484: continually updateable but you use airborne lidar?

Response: Agreed. Continuous updates must be based on satellites and not on LiDAR.

We have added the following to the conclusion:

‘In our experience, classification of ecosystems and land cover classes based on satellite and repeated airborne remote sensing imagery offers some significant advantages over in-situ and manual reference mapping’

‘but note that LiDAR is not continuously updated and updates must not rely on this source’

 

Reviewer 2 Report

The authors developed a deep learning based approach for delineation of wetland areas. The method sounds feasible, and the results have been summarized in detail. Here are some concerns, most of which are about the methodology.

 - In lines 170-176, the reviewer found that the authors used 14-channel data as the input (3 bands, 10 indices and the LiDAR data). But the reviewer wants to know why they were selected and how each channel contributes to the model.

 - The title is "Delineation of wetland areas from Sentinel-2 imagery using TensorFlow, U-Net and Google Earth Engine". Actually, besides Sentinel-2 imagery, the authors also employed LiDAR data. Was the contribution of LiDAR data incremental? If so, please report the results of model without LiDAR data as the comparison. If not, please revise the title to stress the role of LiDAR data.  

 - Authors in Introduction section claimed that ''to distinguish wetland from other land use and land cover types (LULC), another type of deep learning method called semantic segmentation is needed''; in Method section, authors utilized binary accuracy as evaluation. But for most semantic segmentation models, they are evaluated by mIOU (mean pixel intersection-over-union) instead of binary accuracy. Authors must revise it.

 - Some technical parameters are missing, e.g., learning rate, batch size, the number of layers in U-Net, the number of channels of each convolutional layer, to name but a few. Without them, the reviewer is afraid that the readers cannot reproduce this work. 

 - There are also several typos. For example, line 109, "UNET" -> "U-Net". A more careful re-review of the text should be made.

Author Response

Point-by-point response to reviewer 2

10 February 2023

Dear reviewer. Thank you for valuable comments and input. Below is a point-by-point response to the comments, including the actions we have taken to incorporate them.

Comments:

The authors developed a deep learning based approach for delineation of wetland areas. The method sounds feasible, and the results have been summarized in detail. Here are some concerns, most of which are about the methodology.

 - In lines 170-176, the reviewer found that the authors used 14-channel data as the input (3 bands, 10 indices and the LiDAR data). But the reviewer wants to know why they were selected and how each channel contributes to the model.

Response: Unlike tree-based models, where variable importance is trivial to calculate, DL methods are computationally intensive and iterating through every combination of predictor variable to isolate variable importance is time consuming. Therefore, we were not able to derive variable importance scores due to time and funding constraints. We have added the following sentences to the discussion section to acknowledge this limitation in our study:

“The transfer value of our results is limited by the fact that we did not calculate variable importance scores for the satellite features that informed the DL model. Due to funding and time constraints, we were not able to calculate importance scores for the set of predictor variables. We identify this as an avenue for further research given that reducing the feature set can drastically improve processing and inference times in DL architectures (Wang et al. 2022).”

We ran 153 models with different channels, but we did not keep track of them in a systematic way. By adding new channels and removing others, we eventually got an overview, through visual inspection, of which channels contributed to better classifications. We are aware of a method called SHAP which tries to approximate a variable importance assessment in DL models, but we have not had the time or funding to test it.
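As a cheaper alternative to SHAP, permutation importance can approximate how much each input channel matters: permute one channel at a time and measure the drop in accuracy. A toy sketch under assumed data, where the `model` function is a simple stand-in for a trained network, not the authors' U-Net:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained network: only channel 0 drives the prediction.
def model(X):
    return (X[:, 0] > 0).astype(int)

X = rng.normal(size=(1000, 3))   # 1000 pixels, 3 input channels
y = model(X)                     # labels generated by the stand-in "truth"

baseline = (model(X) == y).mean()            # accuracy on intact inputs
importance = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])     # destroy channel j only
    importance.append(baseline - (model(Xp) == y).mean())
# importance[0] is large; importance[1] and importance[2] are ~0
```

A full DL model only needs one forward pass per permuted channel, which is far cheaper than retraining on every channel subset.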

 

 - The title is "Delineation of wetland areas from Sentinel-2 imagery using TensorFlow, U-Net and Google Earth Engine". Actually, besides Sentinel-2 imagery, the authors also employed LiDAR data. Was the contribution of LiDAR data incremental? If so, please report the results of model without LiDAR data as the comparison. If not, please revise the title to stress the role of LiDAR data.  

Response: Agree. We have changed the title to include LiDAR as well.

 

 - Authors in Introduction section claimed that ''to distinguish wetland from other land use and land cover types (LULC), another type of deep learning method called semantic segmentation is needed''; in Method section, authors utilized binary accuracy as evaluation. But for most semantic segmentation models, they are evaluated by mIOU (mean pixel intersection-over-union) instead of binary accuracy. Authors must revise it.

Response: Thanks for this good advice. We now use and report mIoU as the accuracy evaluation.
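For reference, mean IoU can be computed directly from a confusion matrix. The sketch below applies it to the binary error matrix reported later in this record (Table 2 in the Round 2 response); the resulting mIoU value is our own computation, not a figure from the paper:

```python
import numpy as np

def mean_iou(cm):
    """Mean intersection-over-union from a confusion matrix
    (rows = prediction, columns = reference)."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=1) - tp          # predicted as the class but wrong
    fn = cm.sum(axis=0) - tp          # missed members of the class
    return (tp / (tp + fp + fn)).mean()

# Binary wetland / non-wetland counts from Table 2 in this record
cm = np.array([[491, 351],
               [56, 4068]])
miou = mean_iou(cm)   # ~0.73; our computation, not a value from the paper
```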

 

 - Some technical parameters are missing, e.g., learning rate, batch size, the number of layers in U-Net, the number of channels of each convolutional layer, to name but a few. Without them, the reviewer is afraid that the readers cannot reproduce this work.

Response: Thanks for advising. We have added the following text to the methods.

“The U-Net model is based on the architecture of a TensorFlow workflow made by the Google Earth Engine team, which can be inspected in the TensorFlow example workflows section of the Google Earth Engine documentation. We used a learning rate of 0.1, a batch size of 16, 50 epochs, and 500 steps per epoch. The U-Net consists of five encoder and five decoder convolutional layers, consisting of 32, 64, 128, 256, and 512 channels respectively, plus one center layer with 1024 filters.”
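The quoted hyperparameters imply a fairly standard U-Net. The sketch below is our reconstruction under those assumptions; kernel sizes, activations, the 256-pixel patch size, and the optimizer are guesses, and the authors' actual code follows the Google Earth Engine TensorFlow example workflow:

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    """Two 3x3 ReLU convolutions (an assumption, not confirmed by the paper)."""
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(input_shape=(256, 256, 14),            # 14 channels: 3 bands + 10 indices + CHM
               encoder_filters=(32, 64, 128, 256, 512),
               center_filters=1024):
    """U-Net matching the stated level/channel counts; other details are guesses."""
    inputs = tf.keras.Input(shape=input_shape)
    x, skips = inputs, []
    for f in encoder_filters:                          # five encoder levels
        x = conv_block(x, f)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)
    x = conv_block(x, center_filters)                  # center layer with 1024 filters
    for f, skip in zip(reversed(encoder_filters), reversed(skips)):
        x = layers.Conv2DTranspose(f, 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skip])            # skip connection
        x = conv_block(x, f)                           # five decoder levels
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)  # binary wetland mask
    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),  # optimizer assumed
        loss="binary_crossentropy",
        metrics=[tf.keras.metrics.BinaryIoU(target_class_ids=[0, 1], threshold=0.5)],
    )
    return model
```

With these settings, training would run as `model.fit(dataset, epochs=50, steps_per_epoch=500)` on 16-patch batches.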

 

 - There are also several typos. For example, line 109, "UNET" -> "U-Net". A more careful re-review of the text should be made.

Response: We have corrected this typo and some others in the manuscript.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

The authors have made significant edits that adequately address the various comments and queries raised in the first review round. 

I still think that you should present your classification accuracy results, per class, as a table: this is standard practice in RS. This can often help unpick where the approach is working well and where improvements could be made.

A number of changes have been made to the discussion that are much more reflective of the scope of the work completed. Use of slope and HAND to mask classification outputs: You should refer to studies that do this rather than leaving it as a loose statement

line 520: "it was an extensive process.." just clarify what "it" is - manual mapping of wetlands? collection of training data? currently it is not clear.

Conclusions: “in our experience”, i.e. a conclusion you have come to as a research team over many years? Or, the evidence presented in this paper demonstrates an efficient and accurate means of mapping wetlands in Norway, presenting improvements over ground-based approaches that are conventionally employed for providing inventories of wetland habitats.

Just double check your formatting post editing: there may be some double line breaks between paragraphs

Author Response

Point-by-point response to reviewer 1

17 February 2023

Dear reviewer. Thank you for the previous and valuable comments that improved our manuscript a lot. Below is a point-by-point response to your latest comments, including the actions we have taken to incorporate them.

 

Reviewer 1

 

The authors have made significant edits that adequately address the various comments and queries raised in the first review round.

 

I still think that you should present your classification accuracy results, per class, as a table: this is standard practice in RS. This can often help unpick where the approach is working well and where improvements could be made.

 

Response: We have added the following table to the manuscript:

“Table 2. Estimated error matrix for the final classification with estimates for user's accuracy (UA) and producer's accuracy (PA).

                              Reference
                    Wetland   Non-wetland   Total   UA (%)
  Prediction
    Wetland             491           351     842     58.3
    Non-wetland          56          4068    4124     98.6
    Total               547          4419    4966
    PA (%)             89.8          92.1
”

We emphasize that our classification covers only two classes, open wetlands versus all other land cover categories.
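The user's and producer's accuracies in Table 2 follow directly from the counts, and the mean of the two producer's accuracies reproduces the balanced accuracy of 90.9% quoted in the discussion:

```python
import numpy as np

# Table 2: rows = prediction (wetland, non-wetland),
#          columns = reference (wetland, non-wetland)
cm = np.array([[491, 351],
               [56, 4068]])

ua = np.diag(cm) / cm.sum(axis=1) * 100  # user's accuracy per predicted class
pa = np.diag(cm) / cm.sum(axis=0) * 100  # producer's accuracy per reference class
balanced = pa.mean()                     # balanced accuracy = mean per-class recall
# ua ≈ [58.3, 98.6], pa ≈ [89.8, 92.1], balanced ≈ 90.9
```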

 

A number of changes have been made to the discussion that are much more reflective of the scope of the work completed. Use of slope and HAND to mask classification outputs: You should refer to studies that do this rather than leaving it as a loose statement

 

Response: Thank you for pointing this out. We have added two references to support this:

 

Halabisky, M.; Babcock, C.; Moskal, L.M. Harnessing the Temporal Dimension to Improve Object-Based Image Analysis Classification of Wetlands. Remote Sens. 2018, 10, 1467. https://doi.org/10.3390/rs10091467

Muro, J.; Varea, A.; Strauch, A.; Guelmami, A.; Fitoka, E.; Thonfeld, F.; Diekkrüger, B.; Waske, B. Multitemporal optical and radar metrics for wetland mapping at national level in Albania. Heliyon 2020, 6(8), e04496. https://doi.org/10.1016/j.heliyon.2020.e04496

 

 

line 520: "it was an extensive process.." just clarify what "it" is - manual mapping of wetlands? collection of training data? currently it is not clear.

 

Response: We agree. We have added more information here:

 

“It was an extensive process and task to create a satisfactory map of wetlands in southern Norway. This was largely due to the lack of satisfactory annotated data, and most of the work was digitizing wall-to-wall wetland polygons in the rectangles used for the collection of image patches.”

 

Conclusions: “in our experience”, i.e. a conclusion you have come to as a research team over many years? Or, the evidence presented in this paper demonstrates an efficient and accurate means of mapping wetlands in Norway, presenting improvements over ground-based approaches that are conventionally employed for providing inventories of wetland habitats.

 

Response: Good comment. We think it is both. We have changed the phrase to:

 

“In our experience and by evidence presented in this paper, classification of ecosystems and land cover classes based on satellite and repeated airborne remote sensing imagery offers some significant advantages over in-situ and manual reference mapping:”

 

Just double check your formatting post editing: there may be some double line breaks between paragraphs

 

Response: We have tried to remove double line breaks.

 

Author Response File: Author Response.docx

Reviewer 2 Report

All problems have been addressed.

Author Response

Dear reviewer. Thank you for the previous and valuable comments that improved our manuscript a lot. 
