Open Access Communication

Peer-Review Record

Predicting Abiotic Soil Characteristics Using Sentinel-2 at Nature-Management-Relevant Spatial Scales and Extents

Remote Sens. 2024, 16(16), 3094; https://doi.org/10.3390/rs16163094

by Jesper Erenskjold Moeslund * and Christian Frølund Damgaard

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Xiaomei Li

Submission received: 21 May 2024 / Revised: 15 August 2024 / Accepted: 19 August 2024 / Published: 22 August 2024

(This article belongs to the Special Issue Local-Scale Remote Sensing for Biodiversity, Ecology and Conservation)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

It is interesting that the manuscript focuses on predicting abiotic soil characteristics using Sentinel-2. However, the models and their parameters are not described.

Suggest supplementing detailed methods and data as well as results.

L153 “in in a mixed linear model” has a repeated word.

L224 “In dry grasslands” should be “in dry grasslands”.

L226 “like we do here” should be “as we do here”.

L226 “We believe that having this longer” should be “We consider that this longer”.

L243 “like no other data source” should be “like other data source”.

L276 “feed it into models like ours” should be “feed it into our models”.

Comments on the Quality of English Language

Some of the syntax needs further improvement.

Author Response

It is interesting that the manuscript focuses on predicting abiotic soil characteristics using Sentinel-2.

Thanks a lot for your time reviewing our MS and for your kind words!

However, the models and their parameters are not described. Suggest supplementing detailed methods and data as well as results.

We have now improved the description of the models. Because we essentially used default settings as the starting point, there is a limit to how much more detail we can give. We believe the models are now well described, so that others can reproduce our work.

Now the modelling description reads: "To test if the Sentinel-2 data is linked to the abiotic environment in the plots (the EIVs were used as response variables), we used supervised learning AI-algorithms (see below) on a training data set consisting of 30,000 randomly selected vegetation plots of the total 58,071 plots covering the 32 different habitat types found in Figure S1. For validation, we predict model estimates in the remaining vegetation plots (n = 28,017) and report predictive power (R² and standard deviation) based on that in Table 2." (Lines 154-159)

And: "To make sure to get the best model, we let the function “Predict” in Wolfram Mathematica version 14 automatically select (default settings) which machine learning algorithm that provided the best predictions for each EIV and for satellite data only, habitats only and these two data sources combined respectively (Table 2)." (Lines 165-168)

Following the other reviewers' comments, we also added considerably more information about the parameters used. Please see the responses to Reviewers 2 and 3.

L153 “in in a mixed linear model” has a repeated word.

Fixed

L224 “In dry grasslands” should be “in dry grasslands”.

Fixed

L226 “like we do here” should be “as we do here”.

Fixed

L226 “We believe that having this longer” should be “We consider that this longer”.

We removed "having" as suggested but kept "believe", as "believe" reflects our position here better than "consider".

L243 “like no other data source” should be “like other data source”.

We do not understand this suggestion. It would change the meaning of the sentence to the opposite of what we mean. Therefore, we kept the wording.

L276 “feed it into models like ours” should be “feed it into our models”.

We do not understand this suggestion. We do not mean feeding it specifically into our models; rather, this is a recommendation for future work building on ours. To make this clearer, the sentence now reads: "feed it into, for example models like ours in the future would be clearly desirable" (Lines 299-300)

Reviewer 2 Report

Comments and Suggestions for Authors

(1) The choice of keywords does not match the topic of this study, so it is suggested to simplify and modify.

(2) The study utilized satellite data from August 16, 2016, while the vegetation data was collected between 2004 and 2015. Firstly, there is a discrepancy in the recording years of the satellite data and the vegetation sample data. Secondly, relying solely on one day of satellite data may result in overfitting the model to that specific day's information. Soil moisture, soil nutrients, and pH levels can significantly vary across different seasons. The spectral characteristics of plants under diverse seasonal and climatic conditions were not examined; hence, the predictive results of the model may not be applicable to other time periods. Due to limitations inherent in both the data and methods employed, it is likely that this model's applicability and generalizability are constrained when applied in different regions or under varying temporal conditions. To enhance its generalizability, it is advisable to incorporate multi-temporal data encompassing various seasons and years to capture dynamic changes in plant and soil characteristics.

(3) This study highlights that the accuracy of GIS and the uncertainty of EIV, along with other factors, can impact the model's precision. However, it fails to consider meteorological conditions (such as rainfall, temperature, etc.) and other environmental factors that influence soil properties. These unaccounted variables may significantly contribute to the imprecision observed in the predictive outcomes.

(4) This study discusses the utilization of multiple machine learning algorithms for automated model selection, which is a commonly employed technique to enhance model performance. However, it should be noted that different algorithms exhibit diverse approaches and adaptability towards data. The study lacks detailed guidelines for algorithm selection and implementation specifics, potentially resulting in a lack of transparency and interpretability during the process of model selection.

(5) This study suggests that incorporating habitat information enhances the model's predictive capacity, yet it lacks a comprehensive discussion on the generalizability and limitations of these findings in practical applications. While the model demonstrates satisfactory performance within the study area, its ability to predict and apply may be influenced by varying geographical and climatic conditions. Further validation and testing are necessary to ascertain the model's performance across diverse environmental contexts. Additionally, the study does not delve into specific practical application scenarios or potential challenges, such as nature management and ecological monitoring.

(6) The introduction of the manuscript does not fully introduce the research background and progress related to the research topic, and needs to further improve the discussion of the introduction.

(7) The discussion points in the discussion section are too scattered, and it is recommended to reorganize the discussion based on the thesis theme and research findings. Additionally, the format of the subheadings in the discussion chapter appears arbitrary, which does not align with the publication requirements of the journal. Therefore, it is necessary to revise them according to the prescribed format specified by the journal.

(8) The manuscript is deficient in a concluding chapter.

Comments on the Quality of English Language

Minor editing of English language required.

Author Response

(1) The choice of keywords does not match the topic of this study, so it is suggested to simplify and modify.

We are unsure which keywords do not match; we believe they are all relevant and capture the essence of our work. We have, however, deleted "habitat characteristics", as it overlaps somewhat with the title and with the keyword "habitat condition". We hope this is now acceptable.

We fully agree with this comment, and we had already described this issue in its own paragraph in the section on "limitations and uncertainties". We have now expanded this section to leave no doubt that this issue needs to be tested before we can be sure about the generality of the model. That said, our model is built on a very large dataset spanning many different habitats over a large geographic area. In addition, we do not agree that soil pH and fertility can vary significantly across seasons (soil moisture of course can!); these factors are likely to be rather constant and to change only slowly in nature areas. Combined with the fact that the floristic data come from many different days over many years, this leads us to believe that our models are probably not far from models trained on time-series data. Also, please recall that our aim here is not to develop finished, off-the-shelf models; it is to test whether this can be done and to gain a rough impression of the possible predictive power. This is the new paragraph after our revisions:

"Our models were built using satellite imagery collected in a single day. This means that first of all, we are missing information on how the spectral signal changes over the year, and secondly, that the spectral data does not necessarily match the recording year of the floristic data from the vegetation plots. Therefore, there is a risk that our models are overfitted to the data. For that reason, we recommend that future works should test these models on data from multiple days over some years before practical applications to make sure they are not overfitted to data from a certain day. This will pinpoint issues with generality and ensure that models are sufficiently general for applications under day-to-day and year-to-year varying phenology and long-term weather conditions (e.g., moist and dry summers) that are known to strongly influence soil characteristics." (Lines 268-277)

We do not understand this comment. This is a core part of the paragraph on time-series issues (see above), and this was also the case before our revised version given above. This is the old text: "... to make sure they are not overfitted to data from a certain day, and hence are sufficiently general for applications under day-to-day and year-to-year varying phenology and long-term weather conditions (e.g., moist and dry summers)." We even referred to this in the paragraph on GPS uncertainty that used to read: "Considered together with the fact that our satellite data only captures a glimpse of conditions in time (see above), this means...". To remove any doubt, we have now further emphasized this in the new revised paragraph mentioning these issues, and we believe this is now clear. Please see this new paragraph in our response to the comment just above this one.

We write the following in the methods: "To make sure to get the best model, we let the function “Predict” in Wolfram Mathematica version 14 automatically select which machine learning algorithm that provided the best predictions...". However, we failed to state that we used Mathematica's default settings for this. We have now added this information to make clear what we did and to ensure that others can reproduce it. It now reads: "To make sure to get the best model, we let the function “Predict” in Wolfram Mathematica version 14 automatically select (default settings) which machine learning algorithm that provided the best predictions..." (Lines 165-167)
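As a rough illustration of what automated algorithm selection means in general, the sketch below fits a few candidate models and keeps the one that scores best on held-out data. It is not the authors' workflow and does not reproduce Mathematica's "Predict": the candidate list, the single explanatory variable, the toy data and the held-out R² criterion are all assumptions made for this example, and how "Predict" ranks candidates internally is not described in this record.

```python
import random


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Ordinary least squares with one explanatory variable."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_nearest_neighbour(xs, ys):
    """1-nearest-neighbour: return the response of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


# Hypothetical candidate set; real tools typically consider many more algorithms.
CANDIDATES = {
    "mean": fit_mean,
    "linear": fit_linear,
    "nearest_neighbour": fit_nearest_neighbour,
}


def select_best(train, held_out):
    """Fit every candidate on `train`; keep the one with the highest held-out R²."""
    train_x, train_y = train
    held_x, held_y = held_out
    scores = {}
    for name, fit in CANDIDATES.items():
        model = fit(train_x, train_y)
        scores[name] = r_squared(held_y, [model(x) for x in held_x])
    best = max(scores, key=scores.get)
    return best, scores


# Toy data (assumption): a linear signal plus Gaussian noise.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 0.3) for x in xs]
best_name, scores = select_best((xs[:120], ys[:120]), (xs[120:], ys[120:]))
print(best_name, scores)
```

Recording the name of the selected algorithm and the scores of the alternatives, as `select_best` does, is one way to keep such a selection step transparent.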

We were not clear enough about the validation of our model. We used 30,000 plots for training and the remaining approximately 28,000 plots for validation, but we never wrote this anywhere. The R² values are calculated on the validation plots, so they do reflect the models' general predictive ability across many different plots with many different habitat types and over a large geographic area. We have now changed the text accordingly to make sure this is clear:

"To test if the Sentinel-2 data is linked to the abiotic environment in the plots (the EIVs were used as response variables), we used supervised learning AI-algorithms (see below) on a training data set consisting of 30,000 randomly selected vegetation plots of the total 58,071 plots covering the 32 different habitat types found in Figure S1. For validation, we predict model estimates in the remaining vegetation plots (n = 28,017) and report predictive power (R² and standard deviation) based on that in Table 2." (Lines 154-159)

We also changed the Figure 2 legend to highlight that it shows the predictions for the approximately 28,000 remaining plots: "Predicted vs. actual values for the average Ellenberg Indicator Values (EIV [21]) from the validation plots (n = 28,017)." (Lines 204-205)
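For readers unfamiliar with this kind of hold-out validation, a minimal sketch follows. It only illustrates the general idea (a random split of plots into training and validation sets, then R² computed on the validation set) and is not the authors' Mathematica code; the function names, the seed and the toy sizes are assumptions made for this example.

```python
import random


def split_plots(plot_ids, n_train, seed=0):
    """Randomly split plot IDs into a training set and a validation set."""
    plot_ids = list(plot_ids)
    if not 0 < n_train < len(plot_ids):
        raise ValueError("n_train must be between 1 and len(plot_ids) - 1")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(plot_ids)
    return plot_ids[:n_train], plot_ids[n_train:]


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Toy example (assumption): 10 plots, 6 for training, the remaining 4 held out.
train_ids, valid_ids = split_plots(range(10), n_train=6, seed=42)

# R² is then computed only on plots the model never saw during training.
perfect = r_squared([1, 2, 3], [1, 2, 3])    # predictions equal the actual values
mean_only = r_squared([1, 2, 3], [2, 2, 2])  # predicting the mean everywhere
```

Because the validation plots play no part in fitting, an R² computed this way measures predictive ability on unseen plots.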

Also, please recall that our study covers a very wide range of open habitats (see Supplementary Figure S1). This means that our model has already been exposed to a large number of different environmental, geographical and other conditions that can influence model performance.

Regarding the missing discussion of this, we have now added the following to the perspectives in the discussion, in the paragraph where we discuss habitat type and its influence on the models: "Without knowledge of habitat type, models like the ones we developed here will have poorer performance and their results will be less useful in practical applications." (Lines 300-302). This makes the implications for practical application of models like ours absolutely clear.

Finally, in this reviewer comment, "This study suggests that incorporating habitat information enhances the model's predictive capacity, yet it lacks a comprehensive discussion on the generalizability and limitations of these findings in practical applications.", we do not understand the "generalizability" part. Our study is based on data from a large number of different habitats precisely to avoid issues with generalizability due to different habitat types. We believe this comment stems from the fact that we were not clear about this in the main text, so we have now added a reference to Figure S1 in the methods (Line 157), and we also now mention this specifically in the analysis description: "To test if the Sentinel-2 data is linked to the abiotic environment in the plots (the EIVs were used as response variables), we used supervised learning AI-algorithms (see below) on a training data set consisting of 30,000 randomly selected vegetation plots of the total 58,071 plots covering the 32 different habitat types found in Figure S1." (Lines 154-157)

(6) The introduction of the manuscript does not fully introduce the research background and progress related to the research topic, and needs to further improve the discussion of the introduction.

We have now carefully checked the introduction to make sure that all essential background information is there, and we cannot identify any major information lacking. This research field is huge, and of course we could add more references and discuss more remote sensing techniques, analytical techniques etc., but we really do not feel that would improve the MS. In the author guidelines we are specifically asked to "briefly place the study in a broad context and highlight why it is important", and we believe this is exactly what we have done. We did, however, identify one place that needed a little more detail and have revised it:

We changed: "A large number of studies have attempted to use remote sensing techniques to predict ecological factors like plant species richness, plant phenology, plant traits and..."

To: "For example, we now have access to relatively fine-resolution (decameters) satellite borne spectral data at global scale [1,14]. Using these data, a large number of studies have attempted to predict ecological factors like plant species richness, plant phenology, plant traits and..." (Lines 42-45)

Normally, we would structure the discussion to follow the study questions, i.e., answering each study question one at a time. For this study, however, we did not feel this structure made sense, as we cannot answer the first study question without also touching upon the second, and vice versa. We therefore decided to structure the discussion according to the results, as you also suggest here: first focusing on pH/nutrients, as these two are so tightly linked that they cannot be discussed separately, and then on soil moisture. To us this makes good sense, and we cannot see that another structure would help the reader better, so we prefer to keep the structure of the discussion as it is.

Thanks a lot for the hint on formatting; we have now corrected this. The same problem occurred in the methods, where we have now also corrected it.

(8) The manuscript is deficient in a concluding chapter.

This is intentional. We do not feel the need for a conclusion, as our MS is neither long nor complex, and a conclusion is not mandatory according to the author guidelines.

Reviewer 3 Report

Comments and Suggestions for Authors

Comments for author File: Comments.pdf

Author Response

This paper is a very creative paper with fluent and clear writing. The research technology is innovative. I agree the paper to be published with minor revision. There are a few comments to be improved.

We sincerely thank this reviewer for her/his very kind words! We also wish to take the opportunity to express our gratitude for the effort spent on reviewing our MS!

1. In Materials and Methods, please add a table about the data, data source, and data spatial resolution used in the research, so that the data sources are easy to understand. I suggest adding a technical flow chart of the paper. In particular, I want to know whether the EIV data in the vegetation plots come from field survey or from satellite.

We are sorry that this was unclear and are thankful for the suggestion. We have now added a table (see the new Table 1) to give this overview. In the methods paragraph describing EIVs, we now also explain EIVs better and link directly to the EIV reference for readers who want to know where these values come from. We believe all details of the processing of plant data and the calculation of EIVs are now clear. The new sentence reads: "Most European plant species are assigned an indicator value for each of the soil-factors mentioned above, see [21]." (Lines 94-95)

2. Please give a clear explanation of the soil EIVs. Because pH is a very special indicator (below 7 refers to an acid environment, and over 7 refers to an alkaline environment), please explain how the pH is predicted.

Good point! We have made sure to explain the EIVs better, notably the soil reaction/pH. We added the following to the description of EIVs in the methods: "The soil moisture EIVs are in the interval 1 (very drought tolerant species) to 12 (submersed water plants). The soil nutrient indicator values are in the interval from 1 (indicating plants with low productivity typically adapted to growing in very nutrient poor sites) to 9 (plants that are highly productive typically found on very nutrient rich sites). The soil reaction indicator values are in the same interval with 1 indicating plants adapted to growing in strongly acidic soil (actual pH 3.5–4) and 9 indicating plants living almost exclusively on calcareous soils (actual pH 7.5–8) [21,26]." (Lines 97-104)
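To make the scales concrete, the sketch below averages species-level indicator values into a plot-level value and checks them against the intervals quoted above. It is an illustration only: the unweighted mean, the skipping of species without an assigned value, and the example numbers are assumptions, not a description of the authors' processing.

```python
# Scale bounds as quoted in the response: moisture 1 to 12, nutrients 1 to 9, reaction 1 to 9.
EIV_RANGES = {"moisture": (1, 12), "nutrients": (1, 9), "reaction": (1, 9)}


def plot_mean_eiv(species_values, indicator):
    """Unweighted mean indicator value over the species recorded in one plot.

    Species without an assigned value (None) are skipped (assumption).
    Returns None if no species in the plot has a value.
    """
    low, high = EIV_RANGES[indicator]
    values = [v for v in species_values if v is not None]
    for v in values:
        if not low <= v <= high:
            raise ValueError(f"{indicator} value {v} is outside {low}-{high}")
    if not values:
        return None
    return sum(values) / len(values)


# Hypothetical plot: reaction values for four species, one without an assigned value.
mean_reaction = plot_mean_eiv([2, 3, None, 7], "reaction")  # (2 + 3 + 7) / 3
print(mean_reaction)
```

Note that the result lives on the ordinal 1 to 9 reaction scale and is not itself a pH value; the quoted pH ranges only describe what the endpoints of the scale indicate.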

Why do you use the N/R indicator? What is the physical meaning of this soil indicator?

We think this makes more sense after explaining the EIVs better. We have also tried to make it clearer by adding this sentence: "The N/R ratio is thought to represent the adaptation of plants to grow in nutrient poor or rich sites independently from what the soil pH is [27]" (Lines 107-109). We believe the explanation of this parameter is now clear.
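As a small worked example of the ratio itself, the sketch below divides a nutrient indicator value (N) by a reaction indicator value (R). Whether the ratio is formed from plot means or per species is not stated in this record, so the use of plot means here is an assumption, as are the function name and the example values.

```python
def nutrient_ratio(mean_n, mean_r):
    """Ratio of a plot's mean nutrient EIV (N) to its mean reaction EIV (R).

    Both inputs are expected on the 1 to 9 scales, so R is never zero.
    """
    for name, value in (("N", mean_n), ("R", mean_r)):
        if not 1 <= value <= 9:
            raise ValueError(f"{name} = {value} is outside the 1-9 scale")
    return mean_n / mean_r


# Hypothetical plot means: identical N, different R.
ratio_low_r = nutrient_ratio(mean_n=4.5, mean_r=3.0)
ratio_high_r = nutrient_ratio(mean_n=4.5, mean_r=7.5)
print(ratio_low_r, ratio_high_r)
```

Since both scales run from 1 to 9, the ratio is bounded between 1/9 and 9.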

3. For Table 1, please list the independent (explanatory) variables, including the satellite spectral band variables and habitat.

We now explain this in the table legend: "“Satellite” denotes that all 9 bands (see main text) from the satellite data were used as explanatory variables. “Habitat” denotes that only habitat was used as a categorical explanatory variable" (Lines 194-196)
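The three sets of explanatory variables compared in the table (satellite only, habitat only, and both combined) could be assembled along the following lines. This is a sketch under stated assumptions: the one-hot encoding of habitat, the habitat names and the band values are hypothetical, and Mathematica may handle categorical variables differently.

```python
N_BANDS = 9  # the response states that all 9 bands were used


def build_features(bands, habitat, habitat_levels, mode):
    """Return the explanatory-variable vector for one plot.

    mode is "satellite" (band values only), "habitat" (one-hot encoded
    habitat only) or "combined" (bands followed by the one-hot habitat).
    """
    if len(bands) != N_BANDS:
        raise ValueError(f"expected {N_BANDS} band values, got {len(bands)}")
    if habitat not in habitat_levels:
        raise ValueError(f"unknown habitat type: {habitat!r}")
    # One-hot encoding (assumption): one 0/1 column per habitat type.
    one_hot = [1.0 if habitat == level else 0.0 for level in habitat_levels]
    if mode == "satellite":
        return list(bands)
    if mode == "habitat":
        return one_hot
    if mode == "combined":
        return list(bands) + one_hot
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical example: three habitat types (the study covers 32) and made-up band values.
levels = ["dry grassland", "heath", "bog"]
bands = [0.03, 0.05, 0.04, 0.10, 0.22, 0.27, 0.29, 0.18, 0.09]

satellite_only = build_features(bands, "heath", levels, "satellite")
habitat_only = build_features(bands, "heath", levels, "habitat")
combined = build_features(bands, "heath", levels, "combined")
```

With 32 habitat types, the combined vector in this encoding would have 9 + 32 = 41 entries per plot.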

Do you use the controlled variables in the results?

We do not understand this comment.

4. Figure 2 is vague, please use the clear figure.

The figure is indeed condensed. If the MS is published, the full size figure will be available to the audience where one can easily see the text etc.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The revised manuscript basically meets the requirements.

Comments on the Quality of English Language

Some of the syntax needs further improvement.

Author Response

Thanks a lot for your kind words and for taking on the task of re-reviewing; this is really appreciated!

We have now carefully read through the whole MS and made a few English edits in the first sections and a substantial number of edits in the discussion, where we found the explanations and wording to be more complicated. We believe the text now reads more easily and is publication-worthy.

Reviewer 2 Report

Comments and Suggestions for Authors

The authors have carefully revised the comments of the first review, and the quality of the manuscript has been greatly improved.

Comments on the Quality of English Language

Minor editing of English language required.

Author Response

We thank this reviewer for her/his kind words and for taking on the task of re-reviewing, really appreciated!
