Article
Peer-Review Record

From Clusters to Communities: Enhancing Wetland Vegetation Mapping Using Unsupervised and Supervised Synergy

Remote Sens. 2025, 17(13), 2279; https://doi.org/10.3390/rs17132279
by Li Wen 1,*, Shawn Ryan 1, Megan Powell 1,2 and Joanne E. Ling 1
Submission received: 5 June 2025 / Revised: 25 June 2025 / Accepted: 30 June 2025 / Published: 3 July 2025
(This article belongs to the Section Environmental Remote Sensing)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper presents a framework for mapping floodplain wetland vegetation communities through the integration of multi-temporal Sentinel-1 and Sentinel-2 imagery and various hydrological variables derived from LiDAR and SRTM digital elevation models. More specifically, the authors compared the relative contributions of the different predictor sets, analyzing the results across three classification levels (formation, functional group, and plant community type). As might be expected, the authors found that the model performance improved across all three of the classification levels with the inclusion of more predictor variables. Additionally, the authors found that the models performed best at the coarsest thematic resolution, with diminishing accuracy as the thematic resolution increased. Nonetheless, the full model consistently had high accuracy (>90%) at all three levels of classification, while the model that relied on topographic variables alone consistently showed poor performance.

Overall, the article is well written, the methodology is sound, and I believe it makes a valuable contribution to the literature of high resolution vegetation mapping of wetland communities. Overall, I have very few comments and even those are quite minor.

Specific Comments

Under Material and Method starting at line 168, the authors write “This study was specifically designed to address these challenges…” Personally, given the introduction and other materials presented, I found this unnecessary and redundant. We already know the purpose of the study and it doesn’t need to be included again in the material and methods section.

While this may just be the proof I have, the maps in figure 5 are all very difficult to read. Not only are the maps small and the text in the chart unreadable (again, this could just be the proof I have), the numbering in the legend doesn’t seem to make sense. I understand that numbers were likely used due to space constraints, but when I refer back to table one (which is inconvenient), I do not see associated numbers that match the legend. Am I supposed to assume that with each column (formation, functional group, and PCT) the numbering restarts at 1 for each row, so, for instance, Formation: Riverine Forest is 1, Riverine Woodland is 2 etc. and for Functional Group Riverine Forest is 1 and Riverine Forest/woodland is 2, etc. and for PCT River Red Gum-sedge open forest is 1 etc.? I assume that is the case, but it is really not intuitive.

As a complete aside, now that this methodology is established, would it be more efficient to just run the model at the highest thematic resolution (PCT) even though its accuracy is not the best because there is a many-to-one relation for PCT to Functional Group and a many-to-one Functional Group to Formation relationship? In other words, if you can accurately predict the PCT, then you intrinsically know the functional group and the formation for each. I don’t know if this could be included in the discussion, but it is a thought. The authors have proved the efficacy of the method and show that, even though the accuracy for the PCT level is not as high as the accuracy of the formation level, it is still high.

Lastly, in the first sentence of the discussion (lines 473-475), the authors write that the method substantially reduces the need for costly field-based sampling. I somewhat disagree. The field sampling is going to remain a very important aspect of any such studies, whether it is for the purpose of ground truthing or if it is just to acquire information upon which to base the classes.

Author Response

This paper presents a framework for mapping floodplain wetland vegetation communities through the integration of multi-temporal Sentinel-1 and Sentinel-2 imagery and various hydrological variables derived from LiDAR and SRTM digital elevation models. More specifically, the authors compared the relative contributions of the different predictor sets, analyzing the results across three classification levels (formation, functional group, and plant community type). As might be expected, the authors found that the model performance improved across all three of the classification levels with the inclusion of more predictor variables. Additionally, the authors found that the models performed best at the coarsest thematic resolution, with diminishing accuracy as the thematic resolution increased. Nonetheless, the full model consistently had high accuracy (>90%) at all three levels of classification, while the model that relied on topographic variables alone consistently showed poor performance.

Overall, the article is well written, the methodology is sound, and I believe it makes a valuable contribution to the literature of high resolution vegetation mapping of wetland communities. Overall, I have very few comments and even those are quite minor.

Response: Thank you for your positive feedback. We have carefully addressed all your comments. Please find our point-by-point responses below.

Specific Comments

Under Material and Method starting at line 168, the authors write “This study was specifically designed to address these challenges…” Personally, given the introduction and other materials presented, I found this unnecessary and redundant. We already know the purpose of the study and it doesn’t need to be included again in the material and methods section.

Response: Thank you for your insightful feedback. We understand your concern regarding the redundancy in the Material and Methods section. To address this, we have revised the paragraph to focus on the methodology without reiterating the study's purpose. The updated paragraph is as follows:

To develop a classification framework for detailed vegetation mapping in the Great Cumbung Swamp, we integrated multi-source remote sensing data with advanced machine learning techniques. We combined Sentinel-2 optical imagery, Sentinel-1 radar data, and hydro-morphological predictors from LiDAR and SRTM DEMs to capture vegetation structure and moisture regimes. An unsupervised clustering approach was used to create an efficient training dataset, enhancing the model's ability to distinguish between similar vegetation types.

While this may just be the proof I have, the maps in figure 5 are all very difficult to read. Not only are the maps small and the text in the chart unreadable (again, this could just be the proof I have), the numbering in the legend doesn’t seem to make sense. I understand that numbers were likely used due to space constraints, but when I refer back to table one (which is inconvenient), I do not see associated numbers that match the legend. Am I supposed to assume that with each column (formation, functional group, and PCT) the numbering restarts at 1 for each row, so, for instance, Formation: Riverine Forest is 1, Riverine Woodland is 2 etc. and for Functional Group Riverine Forest is 1 and Riverine Forest/woodland is 2, etc. and for PCT River Red Gum-sedge open forest is 1 etc.? I assume that is the case, but it is really not intuitive.

Response: Thank you for your constructive comments. We added columns of “Map ID” in Table 1.

As a complete aside, now that this methodology is established, would it be more efficient to just run the model at the highest thematic resolution (PCT) even though its accuracy is not the best because there is a many-to-one relation for PCT to Functional Group and a many-to-one Functional Group to Formation relationship? In other words, if you can accurately predict the PCT, then you intrinsically know the functional group and the formation for each. I don’t know if this could be included in the discussion, but it is a thought. The authors have proved the efficacy of the method and show that, even though the accuracy for the PCT level is not as high as the accuracy of the formation level, it is still high.

Response: We appreciate your constructive comment. We added a couple of paragraphs in the Discussion Section 4.3:

“An important practical consideration is whether classification at the highest thematic resolution—the plant community type (PCT) level—offers a more efficient workflow, even though its accuracy is lower than for broader categories such as functional groups or formations. Given the nested, hierarchical structure of the classification system, accurate prediction at the PCT level intrinsically provides the corresponding functional group and formation information. This approach would allow for all three classification levels to be derived in a single processing step, reducing computational effort and simplifying model implementation.

While classification accuracy typically declines as thematic detail increases, the PCT-level accuracy achieved in this study remains high, exceeding many comparable multi-class wetland mapping efforts. Future work could formally test this hierarchical workflow by comparing derived functional group and formation maps from PCT predictions against independently generated models at each level, providing practical insights into trade-offs between accuracy, efficiency, and thematic resolution”.
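For illustration only (this is not the authors' code), the roll-up discussed above can be sketched in Python as two many-to-one lookups. The class names "River Red Gum-sedge open forest", "Riverine Forest", "Riverine Forest/woodland" and "Riverine Woodland" are taken from the reviewer's comment; the placeholder PCT names and the specific pairings between levels are assumptions made for the example, not the study's actual Table 1.

```python
# Hypothetical lookup tables. The pairings below are illustrative assumptions;
# the real PCT -> functional group -> formation mapping is defined in Table 1
# of the paper and is not reproduced here.
PCT_TO_GROUP = {
    "River Red Gum-sedge open forest": "Riverine Forest",
    "Placeholder woodland PCT A": "Riverine Forest/woodland",
    "Placeholder woodland PCT B": "Riverine Forest/woodland",
}
GROUP_TO_FORMATION = {
    "Riverine Forest": "Riverine Forest",
    "Riverine Forest/woodland": "Riverine Woodland",
}


def roll_up(pct_predictions):
    """Return (PCT, functional group, formation) for each predicted PCT label."""
    rows = []
    for pct in pct_predictions:
        group = PCT_TO_GROUP[pct]              # many-to-one: PCT -> functional group
        formation = GROUP_TO_FORMATION[group]  # many-to-one: group -> formation
        rows.append((pct, group, formation))
    return rows


if __name__ == "__main__":
    predicted = ["River Red Gum-sedge open forest", "Placeholder woodland PCT B"]
    for row in roll_up(predicted):
        print(row)
```

Because each lookup is many-to-one, two PCTs that are confused with each other inside the same functional group still roll up to the correct broader class, which is one reason derived coarse-level accuracy could exceed PCT-level accuracy.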

Lastly, in the first sentence of the discussion (lines 473-475), the authors write that the method substantially reduces the need for costly field-based sampling. I somewhat disagree. The field sampling is going to remain a very important aspect of any such studies, whether it is for the purpose of ground truthing or if it is just to acquire information upon which to base the classes.

Response: This is a fair and constructive comment. We revised the sentence to avoid over-claiming as:

“This study presents a robust and scalable framework for inland floodplain vegetation mapping that achieves high thematic resolution and classification accuracy, while reducing reliance on extensive field-based sampling for model training. Nonetheless, field data remain essential for defining classification schemes, guiding sample selection, and providing independent validation”.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

General Comments:

This manuscript presents an innovative and well-structured framework for mapping fine-scale wetland vegetation in inland floodplains using multi-source remote sensing data. The authors combine unsupervised K-means clustering with supervised Random Forest classification, using Sentinel-1 SAR, Sentinel-2 optical time series, and hydro-morphological variables derived from LiDAR and SRTM. The approach is applied to the Great Cumbung Swamp in Australia, producing highly accurate vegetation maps at multiple ecological levels.

However, there are several aspects that would benefit from clarification or further elaboration, particularly regarding model explainability, reproducibility, and limitations. I recommend major revisions before acceptance.

 

Specific Comments

  1. Abstract

(1)Please consider reporting the exact classification accuracy at the most detailed level (e.g., 93% for PCTs) to highlight the strength of the results.

(2)You may also briefly list key evaluation metrics (e.g., Kappa, MCC) to convey the rigor of model validation.

  2. Introduction

(1)The introduction is comprehensive but could be improved by adding a short review of similar cluster-guided or semi-supervised sampling strategies in ecological remote sensing.

(2)Consider moving the detailed ecological description of the Great Cumbung Swamp (lines 77–88) to the Study Area section for better flow.

  3. Materials and Methods

(1)Please elaborate on the GAMM fusion process between LiDAR and SRTM DEMs. How was the model validated?

(2)Justify the choice of K = 30 in the K-means clustering. Was it empirically determined or optimized using validation metrics?

(3)A spatial map of the selected training samples (or clusters) would help demonstrate ecological representativeness.

(4)Consider providing code or pseudocode in the supplementary material or via GitHub for better transparency.

  4. Results

(1)Figure 3: The violin plots are effective, but please include a table summarizing the number and types of predictors used in each model variant.

(2)Figure 4: Include class names on the bar chart to enhance interpretability.

(3)Consider adding a confusion matrix for the full model at the PCT level (at least for the top 10 classes) to assess specific misclassifications.

(4)Include a ranking of variable importance from the Random Forest model to enhance model interpretability.

  5. Discussion

(1)Please elaborate on the transferability of the proposed framework to other wetland or floodplain systems with different climatic or geomorphological settings.

(2)Include a brief discussion on potential limitations, such as seasonal data gaps, the subjectivity of cluster labeling, or challenges in regions with fewer reference data.

(3)When comparing with deep learning approaches (e.g., U-Net), provide more direct comparisons regarding training data requirements, computational cost, or boundary accuracy.

 

Comments for author File: Comments.pdf

Comments on the Quality of English Language

Language expression needs to be improved.

Author Response

This manuscript presents an innovative and well-structured framework for mapping fine-scale wetland vegetation in inland floodplains using multi-source remote sensing data. The authors combine unsupervised K-means clustering with supervised Random Forest classification, using Sentinel-1 SAR, Sentinel-2 optical time series, and hydro-morphological variables derived from LiDAR and SRTM. The approach is applied to the Great Cumbung Swamp in Australia, producing highly accurate vegetation maps at multiple ecological levels.

However, there are several aspects that would benefit from clarification or further elaboration, particularly regarding model explainability, reproducibility, and limitations. I recommend major revisions before acceptance.

Response: Thank you for your supportive feedback. We have carefully addressed all your comments and suggestions. Please find our point-by-point responses below.

Specific Comments

Abstract

(1)Please consider reporting the exact classification accuracy at the most detailed level (e.g., 93% for PCTs) to highlight the strength of the results.

(2)You may also briefly list key evaluation metrics (e.g., Kappa, MCC) to convey the rigor of model validation.

Response: Thank you for your insightful comments. We revised the Abstract as:

“High thematic resolution vegetation mapping is critical for understanding ecosystem structure, guiding conservation, and supporting effective wetland management, yet it remains challenging in large, heterogeneous floodplain landscapes. This study presents a robust framework for detailed vegetation classification in inland wetlands, integrating unsupervised clustering, expert-guided sample selection, and Random Forest modelling using multi-temporal Sentinel-1, Sentinel-2, and hydro-morphological data. Applied to the Great Cumbung Swamp in the lower Lachlan River floodplain, Australia, the approach produced vegetation maps at three hierarchical levels: formations (9 classes), functional groups (14 classes), and plant community types (PCTs; 23 classes). At the PCT level, the model achieved an overall accuracy of 93.2%, a Kappa coefficient of 0.91, and a Matthews Correlation Coefficient (MCC) of 0.89, with accuracies exceeding 95% at broader classification levels. The results demonstrate that, when supported by targeted sample selection and multi-source data integration, high thematic resolution wetland vegetation mapping can be achieved with minimal field data collection. The hierarchical structure also enables broader vegetation categories to be efficiently derived from PCT outputs, providing a practical and scalable pathway for wetland monitoring and conservation planning”.
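For readers unfamiliar with the three metrics quoted in the revised abstract, the following Python sketch shows how overall accuracy, Cohen's Kappa, and the multiclass Matthews Correlation Coefficient can all be computed from a single confusion matrix. It uses the standard textbook definitions and a made-up 3-class matrix; it is not the authors' evaluation code, and the numbers are not from the study.

```python
import math


def classification_metrics(cm):
    """Return (overall accuracy, Cohen's kappa, multiclass MCC).

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n_classes))
    true_counts = [sum(cm[k]) for k in range(n_classes)]  # row sums
    pred_counts = [sum(cm[i][k] for i in range(n_classes)) for k in range(n_classes)]  # column sums

    accuracy = correct / total

    # Cohen's kappa: agreement beyond what the class marginals give by chance.
    cross = sum(t * p for t, p in zip(true_counts, pred_counts))
    chance = cross / total**2
    kappa = (accuracy - chance) / (1 - chance) if chance < 1 else 0.0

    # Multiclass MCC (Gorodkin's generalisation of the binary coefficient).
    denom = math.sqrt(
        (total**2 - sum(p * p for p in pred_counts))
        * (total**2 - sum(t * t for t in true_counts))
    )
    mcc = (correct * total - cross) / denom if denom > 0 else 0.0
    return accuracy, kappa, mcc


if __name__ == "__main__":
    # Hypothetical 3-class confusion matrix, for illustration only.
    toy = [[48, 2, 0],
           [3, 45, 2],
           [0, 4, 46]]
    oa, kappa, mcc = classification_metrics(toy)
    print(f"OA={oa:.3f}  Kappa={kappa:.3f}  MCC={mcc:.3f}")
```

Kappa and MCC both discount agreement that the class frequencies alone would produce, which is why they are typically lower than overall accuracy when classes are imbalanced.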

Introduction

(1)The introduction is comprehensive but could be improved by adding a short review of similar cluster-guided or semi-supervised sampling strategies in ecological remote sensing.

Response: Thanks for your insightful comment. We added a few sentences in the Introduction as:

“Recent studies in ecological remote sensing have highlighted the potential of cluster-guided and semi-supervised sampling strategies to improve data efficiency and model performance. Cluster-guided approaches, such as K-means clustering (MacQueen, 1967), group similar data points based on spectral or environmental characteristics, enabling targeted, representative sampling across heterogeneous landscapes (Hastie et al., 2009). Semi-supervised strategies further enhance this by leveraging both labeled and unlabeled data, reducing the need for extensive field data while capturing complex ecological patterns (Zhu, 2005; Chapelle et al., 2006). These methodologies have proven effective for improving vegetation mapping, biodiversity assessment, and habitat monitoring in complex ecosystems (Dronova et al., 2011)”.
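The general idea of cluster-guided sampling can be sketched in a few lines of Python: cluster the unlabeled feature vectors with K-means, then take the points nearest each centroid as candidates for expert labeling. This is a toy sketch under stated assumptions, not the study's workflow: the two-feature "pixels" are invented, the deterministic farthest-point initialization is chosen here only to make the example reproducible, and k = 2 stands in for the K = 30 used in the manuscript.

```python
import math


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means on tuples, with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(tuple(sum(axis) / len(members) for axis in zip(*members))
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


def select_samples(points, labels, centroids, per_cluster=1):
    """Indices of the points closest to each centroid (candidates for labeling)."""
    selected = {}
    for j, c in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == j]
        members.sort(key=lambda i: math.dist(points[i], c))
        selected[j] = members[:per_cluster]
    return selected


# Hypothetical two-feature pixels forming two well-separated groups.
pixels = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
labels, centroids = kmeans(pixels, k=2)
picked = select_samples(pixels, labels, centroids, per_cluster=1)
print(labels, picked)
```

Sampling near centroids favors "typical" members of each cluster; a real workflow might also sample near cluster boundaries, where class confusion is more likely.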

(2)Consider moving the detailed ecological description of the Great Cumbung Swamp (lines 77–88) to the Study Area section for better flow.

Response: Thank you for your valuable feedback. We agree that moving the detailed ecological description of the Great Cumbung Swamp to the Study Area section would improve the flow of the manuscript. We have made this adjustment to ensure a more logical and coherent structure.

Materials and Methods

(1)Please elaborate on the GAMM fusion process between LiDAR and SRTM DEMs. How was the model validated?

Response: Thank you for this valuable suggestion. We have now elaborated on the GAMM-based fusion process between the LiDAR-derived DEM and the hydrologically enforced SRTM DEM in the Methods section of the revised manuscript.

In this study, we applied a Generalized Additive Mixed Model (GAMM) to predict high-resolution (5 m) LiDAR-derived elevation using the SRTM DEM and land cover (LC) information as predictors. The GAMM was implemented using the “mgcv” and “caret” packages in R, following a repeated cross-validation procedure to ensure robust model validation. Specifically, the model was structured as follows:

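library(caret)  # train(), trainControl()
library(mgcv)   # GAM back end used by caret's method = "gam"

set.seed(111)   # reproducible fold assignment

# Predict LiDAR elevation from the hydrologically enforced SRTM DEM and land cover,
# evaluated with repeated 20-fold cross-validation (5 repeats).
cv <- train(DEM_LiDAR ~ SRTM_DEM_H + LC, data = model.data,
            method = "gam", family = "gaussian",
            trControl = trainControl(method = "repeatedcv", number = 20, repeats = 5),
            tuneGrid = data.frame(method = "GCV.Cp", select = FALSE))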

Here, DEM_LiDAR represents the target high-resolution elevation, SRTM_DEM_H is the hydrologically enforced SRTM DEM (1-second resolution), and LC is the categorical land cover factor derived from Dynamic World land cover. The model used a repeated 20-fold cross-validation approach with five repeats to evaluate predictive performance and prevent overfitting.

The final GAMM achieved a high coefficient of determination (R²) of 0.8876, indicating strong agreement between predicted and actual LiDAR elevations. This suggests that the fusion process effectively leveraged the broad coverage of SRTM DEM and the spatial information from land cover to extend elevation estimates across the study area, particularly in locations where LiDAR data were unavailable.

This fused DEM was subsequently used as a hydro-morphological predictor in the vegetation classification workflow, providing high-resolution elevation information essential for distinguishing wetland vegetation communities.

(2)Justify the choice of K = 30 in the K-means clustering. Was it empirically determined or optimized using validation metrics?

Response: The number of clusters was determined empirically; in the text we wrote “informed by expert knowledge of vegetation community diversity in the region”.

(3)A spatial map of the selected training samples (or clusters) would help demonstrate ecological representativeness.

Response: Thanks for your helpful feedback. The samples (at formation level) are shown in Figure 1, the map of the study site. Plotting the samples at PCT level (23 classes) would be cluttered and hard to distinguish.

(4)Consider providing code or pseudocode in the supplementary material or via GitHub for better transparency.

Response: Thanks for your suggestion – the R code was added as S3.

Results

(1)Figure 3: The violin plots are effective, but please include a table summarizing the number and types of predictors used in each model variant.

Response: Thank you for this constructive suggestion. We agree that clearly summarizing the predictors used in each model variant is important for transparency. In this study, we intentionally designed Figure 3 to both illustrate model performance and variations of cross-validation results. The x-axis labels and figure caption explicitly describe the combinations of predictor groups (e.g., Sentinel-1, Sentinel-2, hydro-morphological variables), providing a clear summary alongside the model performance distributions.

To avoid redundancy and maintain manuscript brevity, we have opted not to include a separate summary table listing the predictors, as this information is already conveyed in Figure 3 and described in the Methods section. We believe this integrated presentation is sufficient for readers to interpret the model configurations and their comparative performance.

However, we would be happy to provide a supplementary table summarizing the predictors if the editor feels this would further benefit readers.

(2)Figure 4: Include class names on the bar chart to enhance interpretability.

Response: Thank you for this helpful suggestion. We agree that including class names enhances interpretability. In the current version of Figure 4, the class names corresponding to each Plant Community Type (PCT) are already included on the x-axis to provide clear context for each bar. We will review the figure layout to ensure the class names are legible and visually clear. If necessary, we can adjust the formatting (e.g., font size or label orientation) to further improve readability in the final version.

(3)Consider adding a confusion matrix for the full model at the PCT level (at least for the top 10 classes) to assess specific misclassifications.

Response: Thank you for your suggestion. We added the testing confusion matrices as S4.

(4)Include a ranking of variable importance from the Random Forest model to enhance model interpretability.

Response: Thank you for your suggestion. We added the variable importance ranking as S5.

Discussion

(1)Please elaborate on the transferability of the proposed framework to other wetland or floodplain systems with different climatic or geomorphological settings.

Response: Thank you for your valuable suggestion. We added a paragraph in Discussion 4.3. as:

Although this study focused on the Great Cumbung Swamp, the proposed framework is designed to be broadly transferable to other wetland or floodplain systems. The integration of multi-temporal satellite data, hydro-morphological predictors, and cluster-guided sample selection does not rely on site-specific conditions, making the approach adaptable to landscapes with varying geomorphology, vegetation complexity, or climate regimes. However, transferability is likely to depend on the availability and quality of input data, particularly high-resolution terrain information and appropriately calibrated remote sensing imagery. In more topographically complex or hydrologically distinct environments, adjustments to predictor selection or model tuning may be required to account for local ecological drivers. Future research applying this framework across contrasting wetland types—such as arid-zone floodplains, coastal marshes, or peatlands—would help to further assess its generalizability and refine best practices for scalable wetland vegetation mapping [63].

(2)Include a brief discussion on potential limitations, such as seasonal data gaps, the subjectivity of cluster labeling, or challenges in regions with fewer reference data.

Response: Thank you for your valuable suggestion. See above. We also added information in Discussion 4.4.

(3)When comparing with deep learning approaches (e.g., U-Net), provide more direct comparisons regarding training data requirements, computational cost, or boundary accuracy.

Response: Thank you for your insightful comment. We added a paragraph to Discussion 4.2 as:

While deep learning approaches such as U-Net and other convolutional neural networks (CNNs) have shown great promise for vegetation classification and boundary delineation, they typically require large, well-labeled training datasets and substantial computational resources [64]. This can present significant challenges for wetland mapping in remote or data-scarce regions. Although deep learning models may offer enhanced boundary precision under optimal conditions, our framework provides a more accessible and scalable alternative, particularly suited to heterogeneous landscapes where field data collection is logistically or financially constrained. Moreover, the cluster-guided sample selection used in this study addresses a key limitation of many machine learning and deep learning applications by improving training data quality while minimizing ground survey demands. Future research could explore the integration of object-based segmentation with deep learning to combine the strengths of both approaches, enhancing boundary accuracy while maintaining efficiency in training data requirements.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

This article proposes an innovative wetland vegetation mapping framework that achieves high-precision vegetation classification in complex wetland ecosystems by integrating multi-source data and unsupervised clustering guided training sample selection methods. The research methods are scientific and reasonable, and the results have important application value, which has a positive promoting effect on wetland protection and sustainable management. The writing of the paper is clear, logically coherent, and provides detailed descriptions of data and methods, which can provide useful references for researchers in related fields. Therefore, this manuscript has great potential for publication in the journal Remote Sensing. However, the following issues need to be considered before publication:

1. The current abstract content is too simple, lacking information on the research gap, key technologies used, main results supported by data, and the conclusions of this study.

The article mentions the importance of wetland ecosystems and the threats they face, but does not elaborate on the specific manifestations of these threats in the study area. For example, has the Lachlan River basin, where the Great Cumbung Swamp is located, been significantly affected by agricultural development, water resource exploitation, or climate change? How are these impacts reflected in the changes of wetland vegetation?

The author mentioned that 'Recent advances in remote sensing and ecological modeling have substantially improved the potential for large-scale wetland vegetation mapping', but did not provide a detailed list of the specific content of these advances. Although some technological developments are mentioned later in the article, there is a lack of systematic summary. It is suggested to add a brief section in the introduction, specifically summarizing the technological progress in wetland vegetation mapping in recent years, including but not limited to the emergence of high-resolution satellite platforms, the application of time series analysis, the development of machine learning algorithms, etc., and pointing out how these advances have promoted the development of wetland vegetation mapping. At the same time, the correlation between these advances and the research method can be compared.

I think it would be better to move the content from lines 90-106 to the Discussion section.

In the "Classification Accuracy" section, the author presents the performance comparison results of different model configurations. Why is the performance of the Full Model better than other models? Which specific variables or combinations of variables contribute the most to performance improvement?

In the "Performance in the Context of Recent Studies" section, I think the current discussion is somewhat superficial. The author may consider adding a detailed comparative analysis between this research method and other similar research methods, highlighting the innovation and advantages of this study. For example, comparisons can be made from multiple aspects such as data sources, methodology, classification accuracy, and field data requirements, and the significance and impact of these differences on wetland vegetation mapping can be discussed.

In conclusion, I suggest major revisions before considering whether the manuscript meets the publication standards of the journal.

Author Response

See attached notes of revision.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The author has made improvements based on the revision suggestions.

Reviewer 3 Report

Comments and Suggestions for Authors

I carefully read the author's revised manuscript and response, and found that the author made a lot of detailed revisions. I think the current manuscript can be considered for publication. I also believe that this manuscript will arouse the interest of many scholars.
