Peer-Review Record

Advanced Machine Learning Methods as a Planning Strategy in the Capellanía Wetland

Sustainability 2025, 17(18), 8462; https://doi.org/10.3390/su17188462

by Oscar Armando Cáceres Tovar ¹, José Alejandro Cleves-Leguízamo ² and Gina Paola González Angarita ³,*

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Submission received: 5 August 2025 / Revised: 4 September 2025 / Accepted: 17 September 2025 / Published: 20 September 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Main results

Method innovation: By combining spectral indices (NDVI, NDWI, NDMI, MNDWI) with machine learning (random forest, cellular automata), the study successfully predicted vegetation cover change and the urban expansion trend of the Capellanía wetland with high model accuracy (R² = 0.96, MAE = 0.0286).
Scenario simulation: Using the MOLUSCE tool to generate 2030 forecast scenarios, the study found that vegetation in the wetland core area may consolidate, while the peripheral areas face significant losses due to urbanization (for example, the proportion of non-urbanized areas decreases from 68.33% to 34.67%).
Application value: The effectiveness of combining machine learning and remote sensing technology in wetland monitoring has been verified, providing tool support for ecological restoration and sustainable planning.
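
For readers unfamiliar with the indices named above, the following is a minimal sketch of their standard normalized-difference definitions. The band names (`green`, `red`, `nir`, `swir1`) and the sample reflectance values are illustrative assumptions, not values from the manuscript; for Landsat 8 OLI these bands usually correspond to bands 3, 4, 5, and 6.

```python
def normalized_difference(a: float, b: float) -> float:
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def spectral_indices(green: float, red: float, nir: float, swir1: float) -> dict:
    """Compute the four indices from surface reflectance values."""
    return {
        "NDVI": normalized_difference(nir, red),       # vegetation vigor
        "NDWI": normalized_difference(green, nir),     # open water (McFeeters form)
        "NDMI": normalized_difference(nir, swir1),     # vegetation moisture
        "MNDWI": normalized_difference(green, swir1),  # open water, less built-up noise
    }


# Hypothetical reflectances for illustration only.
vegetated = spectral_indices(green=0.08, red=0.05, nir=0.40, swir1=0.20)
open_water = spectral_indices(green=0.06, red=0.04, nir=0.02, swir1=0.01)

print(vegetated)
print(open_water)
```

In this sketch the hypothetical water pixel yields a negative NDVI and a positive MNDWI, while the vegetated pixel shows the opposite pattern. NDVI and the McFeeters NDWI both depend on the NIR band with opposite sign, so a strong negative correlation between them is expected.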

Shortcomings

Model assumption limitations: Relying on linear relationships (such as linear regression assumed in NDVI predictions) may overlook the nonlinear characteristics of wetland dynamics (such as sudden climate events or policy interventions).
Data dependency: The model requires at least two observations, which limits its application in data-scarce areas such as wetlands that lack long-term monitoring.
Doubts about regional universality: The study focuses on the Capellanía wetland in Bogotá, and the conclusions have not been validated for generalizability to other wetland types or geographical settings, such as coastal or alpine wetlands.

Logical structural defects

Chaotic distribution of chapters: The discussion (4. Discussion) and conclusion (5. Conclusions) sections overlap, with descriptions of NDVI changes and comparisons with other cases (such as Lima and Concepción) mentioned in both sections, weakening the independence and generalizability of the conclusion. The key information of methodology, such as MOLUSCE platform and random forest model, is not explained in separate chapters, but is scattered in discussions and conclusions, leading to a break in the logical chain.
Delayed analysis of model limitations: The limitations of the MOLUSCE model, such as relying on fixed rules and ignoring policy factors, were only mentioned at the end of the discussion and not explained in advance in the methodology or experimental design section, which may affect readers' judgment of the credibility of the results.
Insufficient connection between core issues and conclusions: The research objectives (such as "The potential impact of ALO road construction on wetlands") were only specifically discussed during the discussion, while the introduction only briefly mentioned "The threat of urban expansion to wetlands", resulting in a lack of coherence in logical progression.

Writing style issues

Redundancy and ambiguity in expression: Key terms are used repeatedly (for example, "NDVI classes 3 and 4" is described with the same sentence structure multiple times), with no simplified or alternative expressions. Some conclusions are vague, such as "the results support incorporating spatial modeling into decision-making", without specifying how to put this into practice, which weakens their practical value.
Insufficient explanation of data and terminology: The abbreviation "ALO" was not given its full name at first appearance, which hinders understanding for non-specialist readers. The data availability statement only vaguely mentions "encouraging data sharing" without specifying the public channels or access restrictions, which is insufficiently rigorous.
Format and language details: Paragraph numbers (such as 359-386, 387-428) are mixed into the main text, apparently a formatting error, which affects reading fluency. Some sentences have complex structures (such as nested constructions of the form "combine... and utilize..."), which makes them harder to understand.

Author Response

Reviewer 1	Observation	Correction	Page	Lines
1	Limitations of model assumptions: Relying on linear relationships (such as the linear regression assumed in NDVI predictions) may overlook nonlinear features of wetland dynamics (such as sudden climate events or policy interventions).	In the section 2.6.1 Although this method assumes an approximately linear relationship over the analyzed period (2013–2024), potentially underestimating non-linear behaviors associated with climatic events or anthropogenic interventions, its application is appropriate given the limited number of observations per polygon and the need to obtain continuous values to feed the main model. This approach has been employed in previous studies on wetland and landscape dynamics [35]. Integration with subsequent non-linear models (such as Random Forest) mitigates this limitation and has been validated in studies of landscape and wetland dynamics [36]	8	261-265
2	Data dependency: The model requires at least two observations to support it, which limits its application in data-poor areas, such as wetlands where long-term monitoring is lacking.	Section 2.2 includes the field survey and land-cover identification.	5	178
3	Doubts about regional universality: The study focuses on the Capellanía wetland in Bogotá, and the conclusion has not been validated for generalization to other wetland types or geographic environments, such as coastal or alpine wetlands.	The conclusion includes some limitations of the research and makes some recommendations for further research.	20	637-666
4	Chaotic distribution of chapters: The discussion (4. Discussion) and conclusion (5. Conclusions) sections overlap, and both describe NDVI changes and compare them with other cases (such as Lima and Concepción), which weakens the independence and generalizability of the conclusion. Key information about the methodology, such as the MOLUSCE platform and the random forest model, is not explained in separate chapters, but is scattered throughout the discussion and conclusion sections, making it difficult to understand the results.	The sections have been reorganized to improve clarity: Appropriate clarifications are made in the Discussion and Conclusions. The following sections are explained in the methodology. 2.6.2 Estimation Using Random Forest 2.6.4 Urban Expansion Simulation with MOLUSCE	542 8 9	542 and 638 268 305
5	Late analysis of model limitations: Limitations of the MOLUSCE model, such as its reliance on fixed rules and ignoring policy factors, were only mentioned at the end of the discussion and were not explained in advance in the methodology or experimental design sections, which may affect readers' judgments about the credibility of the results.	Some limitations of the Molusce model are explained in section 2.6.4	9	306
6	Insufficient connection between central themes and conclusions: Research objectives (such as "The potential impact of ALO road construction on wetlands") were only specifically addressed in the discussion, while the introduction only briefly mentioned "The threat of urban expansion to wetlands," resulting in a lack of coherence in logical progression.	The introduction was reinforced with a final paragraph explicitly linking the general threat of urbanization to the specific case of the Avenida Longitudinal de Occidente (ALO) and announcing the comparative scenarios (with/without ALO) to address the main objective of the study. Likewise, the conclusion was expanded with a final paragraph that synthesizes the progressive recovery of vegetation (2013–2032), highlights that the ALO would act as a catalyst for urbanization by reducing non-urbanized areas, and links these findings directly to the scenarios analyzed, reinforcing their implications for environmental and road planning in Bogotá.	16	513
7	Redundancy and ambiguity in expression: Key terms are used repeatedly (such as "NDVI classes 3 and 4" are described multiple times in the same sentence structure), without simplified or alternative expressions. Some conclusions are expressed vaguely, such as "the results support the incorporation of spatial modeling into decision-making," without specifying how it works, which reduces its feasibility.	The wording of the Results, Discussion, and Conclusions sections was revised to eliminate unnecessary repetition and improve flow. Synonyms and clearer language were used to refer to NDVI classes, avoiding the repeated use of "classes 3 and 4" and replacing it with terms such as "moderate and dense vegetation."	10, 17 20	366,542-637
8	Formatting Issues and Linguistic Details: Paragraph numbers (such as 359-386, 387-428) are interspersed within the main text, suggesting formatting errors that affect reading flow. Some sentences have complex structures (such as nested sentence structures that combine... and use...), making comprehension difficult.	The document's formatting was reviewed and corrected, eliminating paragraph numbers that appeared in the main text and could confuse readers. In addition, long sentences were simplified, breaking them into shorter, clearer phrases to improve readability. Punctuation and sentence structure were adjusted to facilitate understanding, especially in the Discussion and Conclusions, avoiding overly long or nested sentences.	The entire document
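
The linear-trend step discussed in rows 1 and 2 can be illustrated with an ordinary least-squares fit of NDVI against year for a single polygon. This is a sketch under stated assumptions: the function name, the sample values, and the clipping to the NDVI range are hypothetical and are not taken from the authors' scripts.

```python
def linear_trend_forecast(years, ndvi, target_year):
    """Fit NDVI = intercept + slope * year by least squares and extrapolate.

    Raises ValueError when fewer than two distinct dates are available,
    since a line cannot be fitted to a single observation.
    """
    if len(years) != len(ndvi):
        raise ValueError("years and ndvi must have the same length")
    if len(set(years)) < 2:
        raise ValueError("at least two observations at distinct dates are required")

    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(ndvi) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, ndvi))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    prediction = intercept + slope * target_year
    # Keep the extrapolated value inside the valid NDVI interval [-1, 1].
    return max(-1.0, min(1.0, prediction))


# Hypothetical polygon with three observations rising by 0.01 NDVI per year.
forecast_2030 = linear_trend_forecast([2013, 2018, 2024], [0.30, 0.35, 0.41], 2030)
print(round(forecast_2030, 3))  # 0.47
```

The explicit check for two distinct dates mirrors the data-dependency concern in row 2, and the straight-line form makes visible why abrupt climatic or policy-driven changes would not be captured by this step alone.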

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Dear all,

I read the article entitled "Advanced machine learning methods as a planning strategy in the Capellanía wetland" and have the following comments:

1- The article's importance is undeniable, as it addresses the conservation of urban wetlands—extremely vulnerable ecosystems of strategic importance for biodiversity, flood control, and ecosystem services—with a focus on Bogotá. The work is statistically rigorous, with solid performance metrics.

Despite these strengths, I believe there are still some areas that could improve the quality of the work. These are:

1- The work has little field validation—the study is heavily based on remote sensing data. The lack of field verification (ground truth) can introduce uncertainty into the classification. I suggest the authors seek in situ data sources to validate their findings, even if for short periods of time.

2- Limited spatial scale—the study focuses only on the Capellanía Wetland. Comparisons with other wetlands in Bogotá could broaden generalization and strengthen recommendations for the entire city. Would this be possible? If so, please do so. If not, justify its application only to this location. Why this location and not others/all?

3- Absent climate change scenarios – the simulations only consider urban growth and infrastructure, but do not include climatic variables (rainfall, drought, and temperatures over the years), which are crucial for the dynamics of wetlands. I believe that in Bogotá there is a meteorological station linked to some public sector that provides data. I suggest including graphs showing the local climatology.

4- Limited discussion – The discussion is too short, addresses few articles in the literature, and does not highlight the findings of the study. Please improve.

Author Response

Reviewer 2	Observation	Correction	Page	Lines
1	The work has little field validation; the study is based primarily on remote sensing data. The lack of field verification (ground truth) can introduce uncertainty into the classification. I suggest the authors seek in situ data sources to validate their findings, even if only for short periods.	Section 2.2 Field survey and land-cover identification is incorporated, validating land cover with in-situ observations. In April 2024, a field visit was made to the Capellanía wetland, where plant species and landscape elements were recorded and used to generate and update a KMZ file (dense vegetation, grasslands, herbaceous areas, and bodies of water). This information was compared with NDVI spectral values to adjust and support the classification.	5	178
2	Limited spatial scale: The study focuses solely on the Capellanía Wetland. Comparisons with other wetlands in Bogotá could broaden the generalizability and strengthen recommendations for the entire city. Would this be possible? If so, please do so. If not, justify its application to this location only. Why this location and not others/all?	Sections 1 and 2.1 justify the selection of the Capellanía Wetland in the Introduction and Study Area. It was explained that Capellanía is one of the most impacted urban wetlands in Bogotá, with a loss of nearly 88% of its area due to road infrastructure and urban expansion, and with additional risk due to the planned Western Longitudinal Avenue (ALO). These conditions make it representative and a priority for evaluating vegetation dynamics under anthropogenic pressure.	3 and 2	98 and 143
3	Lack of climate change scenarios: The simulations only consider urban growth and infrastructure, but don't include climate variables (rainfall, drought, and temperatures over time), which are crucial for wetland dynamics. I believe there is a weather station in Bogotá linked to a public sector that provides data. I suggest including graphics showing the local climate.	Section 2.5 and Table 3 explain why climatic variables are not integrated.	7 and 8	236 and 248
4	Limited Discussion: The discussion is too brief, addresses few articles in the literature, and does not highlight the study's findings. Please improve.	The Discussion section has been expanded to strengthen the interpretation of results and their connection to recent literature. Additional relevant studies from different contexts (Pakistan, Sindh, Lima, Concepción, Barranquilla, Cauca) have been incorporated, highlighting the usefulness of Random Forest and MOLUSCE in urban environments and comparing observed trends with other Latin American wetlands.	17	542

https://github.com/oscara-cacerest/Capellania-Analysis/

We sincerely thank the reviewers and editors for their valuable comments. In response to the comments:

English editing: The entire manuscript has been carefully revised to improve grammar, clarity, and overall readability. We made sure that the text communicates our research more effectively in English.
Figures: The resolution and quality of all figures have been reviewed and improved (300 dpi). Labels, legends, and captions have been translated into English and reformatted to ensure consistency and clarity.

The revised manuscript, which incorporates all the improvements mentioned above, is attached.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The manuscript analyzes the vegetation cover change and the urbanization processes affecting it in the Capellanía wetland in Bogotá using Landsat-8 satellite images, different spectral indices, Random Forest-based NDVI prediction, and the MOLUSCE cellular automaton model. Although the topic is timely and relevant, and the multi-method approach is valuable, the article has important shortcomings that must first be addressed in order for the results to be truly convincing and scientifically sound.

1 - In line 168, Table 2, NDVI values between −1…−0.1 are classified as Bare soil. However, in the case of urban wetlands, negative NDVI values are typically associated with water surfaces or surfaces with very low reflectance. This classification can easily lead to systematic misclassification, for example in situations where open water appears as soil. How do the authors see this?

2 - In one place in the methodology description (line 180), the NDWI/NDMI/MNDWI indices are listed as predictors, while in another place (line 193) the model was trained with the variables "NDMI and MNDMI". Please clarify this inconsistency.

3 - The pseudo-R²=1.00000 value reported for MOLUSCE (line 208) needs to be interpreted: is this an exclusively in-sample calibration indicator or is it truly the result of out-of-sample validation? Please clarify this and supplement the analysis with a confusion matrix.

4 - In line 249 (Table 3), there is an extreme correlation between NDVI and NDWI, r = −0.99. This is likely to be due to the nature of the sampling frame or to strong collinearity between the predictors. To clarify this, it is recommended that the authors supplement the analysis with a predictor elimination (ablation) test, or present feature importance or SHAP results to show the actual contribution of each variable to the model performance.

5 - Regarding the value MSE = 0.397 in line 314, isn't this considered particularly large on the NDVI scale?

6 - Line 372 shows MAE = 0.018, while elsewhere MAE = 0.0018 is shown. Please clarify which is the valid result.

7 - How were the training and test sets separated in space and time in order to avoid data leakage? For example, how was it ensured that neighboring pixels from the same date were not included in the training and testing sets at the same time?

8 - ALO is inconsistent: Avenue Longitudinal of the West (in Abbreviation) vs. Avenida Longitudinal de Occidente (in text)

8 - Abbreviations: please include MAE and MSE.

9 - Typos
line 122: sutdy
line 160: Remote Sensing et spectral correlation
line 209: sptial
line 368: Analyses de metrics

10 - References are not cited in the text starting from position 26

11 - I suggest supplementing the reference list with relevant literature related to the methodological section (e.g., the use of remote sensing indices, Random Forest validation strategies, the use of MOLUSCE or other cellular automata-based urban growth models). This would strengthen the scientific basis of the manuscript and create a closer connection between the chosen methods and previous research.

12 - I recommend that the authors make available the files necessary for the reproduction of the processing steps (e.g. the identifiers of the Landsat scenes used, the AOI/shape files, the processing scripts, and the MOLUSCE parameter settings).
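
Comments 5 and 6 turn on how error metrics relate to the NDVI scale. As a hedged illustration: NDVI ranges from -1 to 1, so an MSE of 0.397 implies an RMSE of about 0.63, close to a third of the full range. The sketch below computes that figure along with the usual definitions of MAE, MSE, and R²; the helper names and the sample observed and predicted values are hypothetical.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error (in squared NDVI units)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


NDVI_RANGE = 2.0  # NDVI spans [-1, 1]

# MSE value quoted in comment 5; RMSE puts it back on the NDVI scale.
implied_rmse = math.sqrt(0.397)
share_of_range = implied_rmse / NDVI_RANGE

# Hypothetical observed/predicted NDVI values, for illustration only.
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.25, 0.35, 0.65, 0.75]

print(round(implied_rmse, 3), round(share_of_range, 3))
print(mae(observed, predicted), mse(observed, predicted), r2(observed, predicted))
```

Because MSE is expressed in squared units, taking the square root before comparing against the NDVI range is what makes the magnitude interpretable. The same helpers make a tenfold discrepancy such as 0.018 versus 0.0018 easy to check against the underlying predictions.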

Author Response

Reviewer 3	Observation	Correction	Page	Lines
1	In line 168 of Table 2, NDVI values between -1 and -0.1 are classified as bare soil. However, in urban wetlands, negative NDVI values are often associated with water surfaces or surfaces with very low reflectance. This classification can easily lead to systematic misclassification, for example, in situations where open water appears as bare soil. How do the authors view this?	We appreciate the observation. Indeed, negative NDVI values may correspond to water bodies rather than bare soil. To avoid this confusion, we incorporated a water mask based on water-sensitive indices (MNDWI, NDWI, and NDMI), classifying water as an independent category. This ensures a precise separation between aquatic surfaces and non-vegetated soils throughout the analysis.	7	204
2	In the methodology description (line 180), the NDWI/NDMI/MNDWI indices are listed as predictors, while in another (line 193), the model was trained with the variables "NDMI and MNDMI." Please clarify this inconsistency.	NDVI was used as the dependent variable, while NDWI, NDMI, MNDWI, and the seasonal variables (sine and cosine of the month) were used as predictors. In the spatial validation section, the reference to “NDMI and MNDMI” was a typographical error and has been removed; the variables are now consistently listed throughout the methodological section.	8	256
3	The pseudo-R² value of 1.00000 reported for MOLUSCE (line 208) should be interpreted: Is this an indicator of calibration exclusively within the sample, or is it actually the result of out-of-sample validation? Please clarify this and complement the analysis with a confusion matrix.	It was clarified that the pseudo-R² = 1.00 corresponds to the internal calibration in MOLUSCE and not to an out-of-sample validation. Since the tool does not generate confusion matrices, transition matrices were incorporated as an equivalent resource (Tables 5a and 6a), following the methodology proposed by Muhammad et al. (2022, Land, 11(3), 419. https://doi.org/10.3390/land11030419).	9	293
4	In line 249 (Table 3), there is an extreme correlation between NDVI and NDWI, r = −0.99. This is likely due to the nature of the sampling frame or to strong collinearity between the predictors. To clarify this, it is recommended that the authors supplement the analysis with a predictor elimination (ablation) test, or present feature importance or SHAP results to show the true contribution of each variable to model performance.	The extreme correlation between NDVI and NDWI is due to their spectral nature and the use of common bands. To rule out biases, we applied a SHAP analysis, which revealed the differentiated importance of the predictors, keeping NDVI solely as the dependent variable and ensuring the consistency of the model.	12	403
5	Regarding the MSE value = 0.397 on line 314, is it not considered to be particularly large on the NDVI scale?	After the reclassification and methodological adjustment, the model metrics improved substantially, yielding R² = 0.991, RMSE = 0.0214, and MAE = 0.0127 on the NDVI scale. These values indicate a very low error and confirm the high predictive capacity of the Random Forest in the multitemporal analysis.	1 and 18	14-568
6	Regarding the MSE value = 0.397 on line 314, is it not considered to be particularly large on the NDVI scale?	After the reclassification and methodological adjustment, the model metrics improved substantially, yielding R² = 0.991, RMSE = 0.0214, and MAE = 0.0127 on the NDVI scale. These values indicate a very low error and confirm the high predictive capacity of the Random Forest in the multitemporal analysis.	18	568-569
7	How were the training and test sets separated in space and time to prevent data leakage? For example, how was it ensured that neighboring pixels from the same date were not simultaneously included in the training and test sets?	The dataset was split into 80% for training and 20% for spatial validation using GroupShuffleSplit, ensuring independence between polygons (fid) and preventing the simultaneous inclusion of neighboring pixels in both sets.	8	267
8	ALO is inconsistent: Avenue Longitudinal of the West (in the abbreviations) vs. Avenida Longitudinal de Occidente (in the text)	The requested adjustment was made to ensure consistency in the use of the term. In the first mention, only the abbreviation ALO is used, and in subsequent mentions, the full name followed by the abbreviation is employed: Avenida Longitudinal de Occidente (ALO).	1	21
9	Abbreviations: include MAE and MSE.	Included in the abbreviations.	18	528
10	Line 122: Study Line 160: Remote Sensing and Spectral Correlation Line 209: Spatial Line 368: Metric Analysis		21	674
11	References are not cited in the text from position 26 onwards.	All references in the article have been corrected.
12	I suggest supplementing the reference list with relevant literature related to the methodology section (e.g., the use of remote sensing indices, Random Forest validation strategies, the use of MOLUSCE, or other cellular automata-based urban growth models). This would strengthen the scientific basis of the manuscript and create a closer connection between the chosen methods and previous research.	Recent references were added to strengthen the methodological section: for the use of spectral indices in urban wetlands [31, 33], for the application of Random Forest in remote sensing and spatial validation strategies [17, 38, 48], and for the use of MOLUSCE and other cellular automata models in urban growth simulation [40–43]. These additions reinforce the justification of the chosen methods and their consistency with the previous literature.	21,22 and 23	687-801
13	In line 168 of Table 2, NDVI values between -1 and -0.1 are classified as bare soil. However, in urban wetlands, negative NDVI values are often associated with water surfaces or surfaces with very low reflectance. This classification can easily lead to systematic misclassification, for example, in situations where open water appears as bare soil. How do the authors view this?	Adjustments are made; see Table 2.	7	233
14	In the methodology description (line 180), the NDWI/NDMI/MNDWI indices are listed as predictors, while in another (line 193), the model was trained with the variables "NDMI and MNDMI." Please clarify this inconsistency.	Adjustments are made and information is presented in section 2.7.1	8	255
15	The pseudo-R² value of 1.00000 reported for MOLUSCE (line 208) should be interpreted: Is this an indicator of calibration exclusively within the sample, or is it actually the result of out-of-sample validation? Please clarify this and complement the analysis with a confusion matrix.	Adjustments are made to paragraphs 2.7.4	9	318
16	In line 249 (Table 3), there is an extreme correlation between NDVI and NDWI, r = −0.99. This is likely due to the nature of the sampling frame or to strong collinearity between the predictors. To clarify this, it is recommended that the authors supplement the analysis with a predictor elimination (ablation) test, or present feature importance or SHAP results to show the true contribution of each variable to model performance.	Adjustment is made	12	403
17	Regarding the MSE value = 0.397 on line 314, is it not considered to be particularly large on the NDVI scale?	Adjustment is made	1 and 18	14-568
18	Line 372 shows an MAE of 0.018, while elsewhere it shows an MAE of 0.0018. Please clarify which result is valid.	Adjustment is made	18	568-569
19	How were the training and test sets separated in space and time to prevent data leakage? For example, how was it ensured that neighboring pixels from the same date were not simultaneously included in the training and test sets?	In the model, a random partition of pixels was not performed; instead, a grouped split by polygon was applied using GroupShuffleSplit. In this way, all pixels from the same polygon were assigned exclusively to either the training or the test set, thus preventing spatial leakage, i.e., avoiding that neighboring pixels were mixed across both sets. Although a strict temporal blocking was not implemented —meaning that the same acquisition date could appear in different polygons across training and test— this strategy was consistent with the main objective of the study: to evaluate the model’s generalization capacity toward unseen polygons.	8	280
20	ALO is inconsistent: Avenue Longitudinal of the West (in the abbreviations) vs. Avenida Longitudinal de Occidente (in the text)	Adjustment is made	1	21
21	Abbreviations: include MAE and MSE.	Abbreviations are incorporated	21	685
22	Line 122: Study Line 160: Remote Sensing and Spectral Correlation Line 209: Spatial Line 368: Metric Analysis	Adjustments are made to the bibliographical	21,22 and 23	686-802
23	References are not cited in the text from position 26 onwards.
24	I suggest supplementing the reference list with relevant literature related to the methodology section (e.g., the use of remote sensing indices, Random Forest validation strategies, the use of MOLUSCE, or other cellular automata-based urban growth models). This would strengthen the scientific basis of the manuscript and create a closer connection between the chosen methods and previous research.	Adjustments are made to the bibliographical	21,22 and 23	686-802
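
The grouped split described in rows 7 and 19 can be sketched without external dependencies. The authors report using scikit-learn's GroupShuffleSplit; the function below is a simplified standard-library stand-in for the same idea (all samples sharing a polygon id land on one side of the split), not the authors' code. The record layout, the `fid` key, the sample values, and the seed are assumptions made for illustration.

```python
import random


def grouped_split(records, group_key="fid", test_fraction=0.2, seed=42):
    """Split records so that each group appears in train or test, never both.

    The fraction applies to the number of groups (polygons), not samples.
    """
    groups = sorted({record[group_key] for record in records})
    rng = random.Random(seed)
    rng.shuffle(groups)

    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])

    train = [r for r in records if r[group_key] not in test_groups]
    test = [r for r in records if r[group_key] in test_groups]
    return train, test


# Hypothetical samples: 10 polygons, each observed in three months.
records = [
    {"fid": fid, "month": month, "ndvi": 0.1 * fid}
    for fid in range(10)
    for month in (1, 6, 12)
]

train, test = grouped_split(records)
train_fids = {r["fid"] for r in train}
test_fids = {r["fid"] for r in test}

print(len(train), len(test))
print(train_fids.isdisjoint(test_fids))
```

In this sketch no polygon contributes samples to both sets, which addresses spatial leakage between neighboring pixels of the same polygon. The same month can still appear on both sides of the split, which corresponds to the absence of strict temporal blocking acknowledged in row 19.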

https://github.com/oscara-cacerest/Capellania-Analysis/

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

Dear all,

I believe that significant changes have been made to the article, addressing my concerns. Therefore, I recommend acceptance.

Reviewer 3 Report

Comments and Suggestions for Authors

The authors provided adequate explanations to my questions and made the suggested changes, I have no further comments.
