Article
Peer-Review Record

Soil Organic Matter Prediction Model with Satellite Hyperspectral Image Based on Optimized Denoising Method

Remote Sens. 2021, 13(12), 2273; https://doi.org/10.3390/rs13122273
by Xiangtian Meng 1,2,3, Yilin Bao 3, Qiang Ye 3, Huanjun Liu 2,3, Xinle Zhang 1,3,*, Haitao Tang 3 and Xiaohan Zhang 3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 17 April 2021 / Revised: 4 June 2021 / Accepted: 8 June 2021 / Published: 10 June 2021

Round 1

Reviewer 1 Report

I have read the manuscript “Soil organic matter prediction model with satellite hyperspectral image based on optimized denoising method”, submitted to Remote Sensing, in order to provide a review.

In the manuscript, the authors use hyperspectral satellite images for mapping SOM in a 1481 km² area in China. They test fractional order derivatives, coupled with three different denoising methods on the satellite spectra, to investigate whether these techniques improve the SOM predictions. They develop a series of spectral indices based on the data and use the recursive feature elimination algorithm to select the most relevant variables for Random Forest models. The results show that 0.6-order derivatives coupled with discrete wavelet transform provide the most accurate SOM predictions.

I find that the methods and results presented in the manuscript are highly novel and relevant. The manuscript follows a clear, logical structure, and is generally easy to read. It also presents the contents in a thorough and concise manner. The language has only a few grammatical errors, which do not hinder the understanding of the presentation.

While the presentation is generally appropriate, I have a few points, which I would like the authors to clarify in the revised manuscript. Furthermore, a few of the figures require improvement in order to be fully acceptable. I have listed my specific comments following the next paragraph.

In conclusion, I recommend the submitted manuscript for publication in Remote Sensing, after a minor revision, wherein the authors address the issues that I have raised.

Specific comments:

Abstract: Please include the size and location of the study area in the abstract.

L12 – 14: It appears that some words are missing from this sentence, as it does not fully make sense to me. Please revise.

L20 – 21: The abstract mentions the recursive feature elimination algorithm, whereas the text in the manuscript mentions the optimal band combination algorithm (e.g. L117). Are these the same algorithm? If so, please revise for consistency throughout the manuscript. Also, given the number of times that they are mentioned, please consider using an abbreviation.

L22 – 23: The sentence “The results […] hyperspectral data” is redundant and can be deleted to shorten the abstract.

L27 – 30: The text “Moreover, the spatial […] predict SOM, and” is mostly superfluous and can be deleted to shorten the abstract.

L63: “the noise has no periodicity” – Please add a source to this statement.

L65: “leads to wrong conclusions” – Please add a source to this statement.

L97: “few studies” – Please provide references for these studies.

L101: “difference index” – This index receives the abbreviation DI at a later point in the manuscript (L271). I think it would be appropriate to provide the abbreviation here already.

L105: “RPD” – I am not familiar with this abbreviation, please provide an explanation.

L106: “Previous study” – Which study? Please provide a reference.

Figure 1a: I do not think most readers are familiar with the location of Heilongjiang Province. I think it would be more helpful to the readers if the inset showed the location of the Northern Songnen Plain within China. Please revise.

Figure 1c: Please use different symbols for the calibration and validation samples.

Figure 1c: Please provide a source and a detailed description for the soil map. When and how was the soil map produced? What is the scale of the map?

L153: Did these 30 x 30 m squares align with the pixels of the satellite images? This would be useful information.

Section 2.2.2: Did the authors make any efforts to extract only bare soil areas, and if so, how did they achieve this? I am curious whether the images still contain any vegetated areas, and I would like to see this information included in the manuscript.

L287: “K-fold cross validation” – How many folds did you use? And did you test the optimal number of variables only on the calibration dataset, or on the full dataset? Please provide this information.

L304: “an iterative method” – This is not sufficiently clear. Please provide additional details on the method.

Table 2: I think reporting SOM contents with two decimals is not fully justified, as most laboratory methods for measuring SOM are not this accurate. Dropping the second decimal would also improve the readability of the manuscript.

Figure 2: I think it is a poor choice to show R2, RMSE and RPIQ in the same plot. It makes it very difficult to read the figure. Please use a separate subplot for each accuracy metric. Ideally, the authors could use a dotplot, as in this example: https://stackoverflow.com/questions/29850782/ordering-faceted-dotplot

Figure 3: Please explain the abbreviations in the legend for the figure.

Figure 4: This is a very interesting and informative figure. However, the authors could improve it by adding labels to explain the columns and rows. This would make it easier to gain a quick overview of the figure.

Figure 6: It is difficult to compare the maps, as they use different minimum and maximum values.

Figure 8: I do not believe that the soil boundaries shown in the map are sufficiently precise for this kind of comparison, given the coarse scale of the map. In fact, the soil boundaries appear to have little or no effect on the patterns seen in the images, which more or less defeats the purpose of the figure.

Figure 8: It is my impression that field management has a much larger effect on the patterns in the images than the soil type. However, the authors do not mention this in the text. I would like to see the authors elaborate on this comparison.

Figure 8: The numbers on the scale bars in the subplots are too small to read. Alternatively, the authors could simply write how large the subplot areas are in the caption for the figure.

Author Response

Dear reviewer

We would like to express our sincere appreciation for your helpful comments. They were valuable for revising and improving our manuscript and provide important guidance for our research. We have addressed the points noted below.

Responses to Reviewer #1:

Specific comments:

Comment 1: Abstract: Please include the size and location of the study area in the abstract.

Answer: We have added the content in lines 14-15.

Comment 2: L12 – 14: It appears that some words are missing from this sentence, as it does not fully make sense to me. Please revise.

Answer: We have revised it.

Comment 3: L20 – 21: The abstract mentions the recursive feature elimination algorithm, whereas the text in the manuscript mentions the optimal band combination algorithm (e.g. L117). Are these the same algorithm? If so, please revise for consistency throughout the manuscript. Also, given the number of times that they are mentioned, please consider using an abbreviation.

Answer: The recursive feature elimination algorithm and the optimal band combination algorithm are two different algorithms. Recursive feature elimination is used to select, from a large number of input variables, a few that are highly correlated with SOM content for use in SOM prediction. The optimal band combination algorithm constructs a spectral index by calculating the correlation between combinations of different bands and SOM content, which helps to improve the prediction accuracy of soil properties. The optimal band combination algorithm is mentioned in the abstract (line 21), and the abbreviation of the recursive feature elimination algorithm is added in line 22.
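The band-pair search described in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the index formulas below are the commonly used two-band definitions of DI, RI, NDI and RDVI, and the names `pearson`, `INDICES` and `best_band_pair` are hypothetical.

```python
from itertools import permutations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(vx * vy)


# Assumed two-band index definitions (ri, rj are reflectances of bands i, j).
INDICES = {
    "DI": lambda ri, rj: ri - rj,
    "RI": lambda ri, rj: ri / rj,
    "NDI": lambda ri, rj: (ri - rj) / (ri + rj),
    "RDVI": lambda ri, rj: (ri - rj) / sqrt(ri + rj),
}


def best_band_pair(spectra, som, index_name):
    """Return (band_i, band_j, r) with the largest |r| between index and SOM.

    spectra: dict mapping a band label to a list of reflectances per sample.
    som: list of SOM contents in the same sample order.
    """
    index = INDICES[index_name]
    best = None
    # Try every ordered pair of distinct bands and keep the strongest one.
    for bi, bj in permutations(sorted(spectra), 2):
        values = [index(a, b) for a, b in zip(spectra[bi], spectra[bj])]
        r = pearson(values, som)
        if best is None or abs(r) > abs(best[2]):
            best = (bi, bj, r)
    return best
```

Calling `best_band_pair(spectra, som, "NDI")` on a dictionary of band reflectances returns the band pair whose NDI is most strongly correlated with SOM; the resulting indices would then be candidate inputs for a separate variable selection step such as recursive feature elimination.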

Comment 4: L22 – 23: The sentence “The results […] hyperspectral data” is redundant and can be deleted to shorten the abstract.

Answer: We have deleted it.

Comment 5: L27 – 30: The text “Moreover, the spatial […] predict SOM, and” is mostly superfluous and can be deleted to shorten the abstract.

Answer: We have deleted it.

Comment 6: L63: “the noise has no periodicity” – Please add a source to this statement.

Answer: We have added a source to this statement in line 65.

Comment 7: L65: “leads to wrong conclusions” – Please add a source to this statement.

Answer: We have added a source to this statement in line 67.

Comment 8: L97: “few studies” – Please provide references for these studies.

Answer: We have revised it.

Comment 9: L101: “difference index” – This index receives the abbreviation DI at a later point in the manuscript (L271). I think it would be appropriate to provide the abbreviation here already.

Answer: We have revised it.

Comment 10: L105: “RPD” – I am not familiar with this abbreviation, please provide an explanation.

Answer: RPD is the abbreviation of residual predictive deviation. RPD ≥ 2, 1.4 ≤ RPD < 2 and RPD < 1.4 correspond to excellent, good and poor predictions, respectively. We have added the explanation in lines 107-108.
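As a small worked illustration of these thresholds, RPD is commonly computed as the standard deviation of the observed values divided by the RMSE of the predictions. The sketch below assumes that definition and uses the sample standard deviation; the function names `rpd` and `rpd_class` are hypothetical and not taken from the manuscript.

```python
from math import sqrt
from statistics import stdev


def rpd(observed, predicted):
    """Observed standard deviation divided by the prediction RMSE."""
    n = len(observed)
    rmse = sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return stdev(observed) / rmse


def rpd_class(value):
    """Map an RPD value to the three classes quoted in the answer."""
    if value >= 2.0:
        return "excellent"
    if value >= 1.4:
        return "good"
    return "poor"
```

For instance, observed values 1 to 5 predicted with a constant absolute error of 0.5 give an RMSE of 0.5 and an RPD of about 3.16, which falls in the "excellent" class.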

Comment 11: L106: “Previous study” – Which study? Please provide a reference.

Answer: We have revised it in lines 108-109.

Comment 12: Figure 1a: I do not think most readers are familiar with the location of Heilongjiang Province. I think it would be more helpful to the readers if the inset showed the location of the Northern Songnen Plain within China. Please revise.

Answer: We have revised it.

Comment 13: Figure 1c: Please use different symbols for the calibration and validation samples.

Answer: We have revised it.

Comment 14: Figure 1c: Please provide a source and a detailed description for the soil map. When and how was the soil map produced? What is the scale of the map?

Answer: The soil map is the Second National Soil Survey map, at the great group level of the Genetic Soil Classification of China. The map was obtained from http://vdb3.soil.csdb.cn/. It was produced in the 1980s by the government by digging and analyzing soil profiles. The scale of the map is 1:1,000,000. We have added the content in lines 130-134.

Comment 15: L153: Did these 30 x 30 m squares align with the pixels of the satellite images? This would be useful information.

Answer: Yes, the 30 x 30 m squares are aligned with the pixels of the satellite images. We have added the content in lines 160-161.

Comment 16: Section 2.2.2: Did the authors make any efforts to extract only bare soil areas, and if so, how did they achieve this? I am curious whether the images still contain any vegetated areas, and I would like to see this information included in the manuscript.

Answer: The study area consists of bare soil. Firstly, the study area is the cultivated land of Mingshui county, so there are no buildings or water bodies in the study area. Secondly, the study area is dominated by annual vegetation, and the farmers in this area dispose of crop residues (such as straw) by the traditional method of burning, which leaves hardly any residue on the cultivated land. The cultivated land is plowed from the end of March to the end of April every year, after which the cultivated soil is directly exposed at the surface. There is neither a large area of vegetation nor snow cover, as April and May fall within the “bare soil period” [1]. The “bare soil period” is described in lines 143-149. To make it easier for readers to understand, we rewrote the sentence in lines 179-180.

[1] Yang, H.X.; Zhang, X.K.; Xu, M.Y.; Shao, S.; Wang, X.; Liu, W.Q.; Wu, D.Q.; Ma, Y.Y.; Bao, Y.L.; Zhang, X.L.; Liu, H.J. Hyper-temporal remote sensing data in bare soil period and terrain attributes for digital soil mapping in the Black soil regions of China. Catena. 2020, 184, 104259. DOI: 10.1016/j.catena.2019.104259.

Comment 17: L287: “K-fold cross validation” – How many folds did you use? And did you test the optimal number of variables only on the calibration dataset, or on the full dataset? Please provide this information.

Answer: We used 10-fold cross validation, and we used the full dataset to test the optimal number of variables. We have added the information in lines 294-295.
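For readers unfamiliar with the procedure, the splitting step of 10-fold cross validation can be sketched as follows. This is a generic illustration rather than the authors' code; the function name `k_fold_indices` and the fixed random seed are assumptions made for the example.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # shuffle once so folds are random
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test
```

Each sample appears in exactly one test fold, so a candidate number of variables can be scored by fitting the model on each `train` list, predicting the matching `test` list, and averaging the error over the ten folds.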

Comment 18: L304: “an iterative method” – This is not sufficiently clear. Please provide additional details on the method.

Answer: We have revised it.

Comment 19: Table 2: I think reporting SOM contents with two decimals is not fully justified, as most laboratory methods for measuring SOM are not this accurate. Dropping the second decimal would also improve the readability of the manuscript.

Answer: We predicted SOM content over a large region and, compared with similar studies, used more soil samples. If only one decimal were retained, the number of soil samples with the same SOM content would increase, making it difficult to reflect the differences in spectral reflectance between different SOM contents and to find the relationship between SOM content and spectral characteristics. Many SOM prediction studies have retained two decimals [2-4].

[2] Shi, Z., Ji, W., Viscarra Rossel, R.A., Chen, S. & Zhou, Y. 2015. Prediction of soil organic matter using a spatially constrained local partial least squares regression and the Chinese vis-NIR spectral library. European Journal of Soil Science, 66, 679-687.

[3] Wang, X.P.; Zhang, F.; Kung, H.T; Johnson, V.C. New methods for improving the remote sensing estimation of soil organic matter content (SOMC) in the Ebinur Lake Wetland National Nature Reserve (ELWNNR) in northwest China. Remote. Sens. Environ. 2018, 218, 104–118. DOI: 10.1016/j.rse.2018.09.020

[4] Kathrin, J., Sabine, C., Carsten, N. & Saskia F. 2019. A remote sensing adapted approach for soil organic carbon prediction based on the spectrally clustered LUCAS soil database. Geoderma, 353, 297-307.

Comment 20: Figure 2: I think it is a poor choice to show R2, RMSE and RPIQ in the same plot. It makes it very difficult to read the figure. Please use a separate subplot for each accuracy metric. Ideally, the authors could use a dotplot, as in this example: https://stackoverflow.com/questions/29850782/ordering-faceted-dotplot

Answer: We have revised it.

Comment 21: Figure 3: Please explain the abbreviations in the legend for the figure.

Answer: We have explained the abbreviations in the legend for the figure.

Comment 22: Figure 4: This is a very interesting and informative figure. However, the authors could improve it by adding labels to explain the columns and rows. This would make it easier to gain a quick overview of the figure.

Answer: We have revised it.

Comment 23: Figure 6: It is difficult to compare the maps, as they use different minimum and maximum values.

Answer: This figure shows the SOM content predicted using reflectance after the different denoising methods. The minimum and maximum values of the SOM content predicted from the reflectance of each denoising method should be compared with the lab-measured SOM content. Figure 6 is used not only to show the spatial distribution of SOM content, but also to select the spatial distribution map that is closest to the lab-measured SOM content, which helps to evaluate the different denoising methods.

Comment 24: Figure 8: I do not believe that the soil boundaries shown in the map are sufficiently precise for this kind of comparison, given the coarse scale of the map. In fact, the soil boundaries appear to have little or no effect on the patterns seen in the images, which more or less defeats the purpose of the figure.

Answer: The purpose of Figure 8 is not to compare differences between soil classes. Different soil classes have different reflectance characteristics. Therefore, using the soil map, we selected locations at the boundaries between soil classes, which are more representative. In this way, we can better compare the differences in image detail under the different denoising methods.

Comment 25: Figure 8: It is my impression that field management has a much larger effect on the patterns in the images than the soil type. However, the authors do not mention this in the text. I would like to see the authors elaborate on this comparison.

Answer: We have revised it.

Comment 26: Figure 8: The numbers on the scale bars in the subplots are too small to read. Alternatively, the authors could simply write how large the subplot areas are in the caption for the figure.

Answer: We have revised it.

Reviewer 2 Report

The authors presented an interesting work on the use of hyperspectral remote sensing to predict soil organic matter. The study is very well done and it certainly deserves publication.

I only have a limited number of remarks/suggestions/questions:

  • Line 389: I am not sure whether it is right to call them “optimal”. Perhaps “highest” is more correct. The same use of the word "optimal" is found in other parts of the document.
  • The caption of Figure 5 refers to a black solid line representing the regression line. In the version of the document I received, the regression line looks red.
  • An important conclusion of the study was that the highest prediction accuracy was achieved by 0.6-order DWT. According to Table 4, that prediction is based on spectral indices only. It would be interesting to see how those indices are composed (i.e. which bands are being used). Unless I missed it, it is not clear whether each index (DI, RI, NDI and RDVI) is only used once in the final model. Are they built with different bands? Were they checked for collinearity? In addition to these questions I suggest that the most accurate model found in the study be better described by telling which bands are concerned in the spectral indices included in the model.
  • Linked to the previous comment. In section 3.5 it is mentioned that "With OR, OR-SVD, OR-FT and OR-DWT, the input variables are composed of spectral indexes and wavelengths of approximately 1500 nm". Further in the text: "With 0.6-order, 0.6-order-SVD, 0.6-order-FT and 0.6-order-DWT, the input variables are composed of spectral indexes and wavelengths of approximately 500 nm." Two questions on this part of the text:
    • The first citation refers to "OR-..." options. I understood OR means "Original Reflectance", why is it written "the input variables are composed of spectral indexes ....." (line 397)?
    • Independently of the mathematical procedures adopted in the text, how can it be explained that in one option the wavelengths of approximately 1500 nm are sensitive to SOM, while in other options it is the wavelengths of approximately 500 nm that are sensitive to SOM? What could be the physical explanation for that? The simple question would be: Which region of the electromagnetic spectrum can tell us something about the soil organic matter?

Author Response

Dear reviewer

We would like to express our sincere appreciation for your helpful comments. They were valuable for revising and improving our manuscript and provide important guidance for our research. We have addressed the points noted below.

Responses to Reviewer #2:

Comment 1: Line 389: I am not sure whether it is right to call them “optimal”. Perhaps “highest” is more correct. The same use of the word "optimal" is found in other parts of the document.

Answer: We have revised it.

Comment 2: The caption of Figure 5 refers to a black solid line representing the regression line. In the version of the document I received, the regression line looks red.

Answer: We have revised it in line 435.

Comment 3: An important conclusion of the study was that the highest prediction accuracy was achieved by 0.6-order DWT. According to Table 4, that prediction is based on spectral indices only. It would be interesting to see how those indices are composed (i.e. which bands are being used). Unless I missed it, it is not clear whether each index (DI, RI, NDI and RDVI) is only used once in the final model. Are they built with different bands? Were they checked for collinearity? In addition to these questions I suggest that the most accurate model found in the study be better described by telling which bands are concerned in the spectral indices included in the model.

Answer: The optimal band combination algorithm constructs each spectral index by calculating the correlation between combinations of different bands and SOM content, and the indices are composed of different bands. The results of the optimal band combination are shown in Figure 4 and Table 3. We checked the collinearity, and it is very low. We have stated which bands are used in the spectral indices included in the most accurate model in Sections 3.4 and 4.3.

Comment 4: Linked to the previous comment. In section 3.5 it is mentioned that "With OR, OR-SVD, OR-FT and OR-DWT, the input variables are composed of spectral indexes and wavelengths of approximately 1500 nm". Further in the text: "With 0.6-order, 0.6-order-SVD, 0.6-order-FT and 0.6-order-DWT, the input variables are composed of spectral indexes and wavelengths of approximately 500 nm." Two questions on this part of the text:

The first citation refers to "OR-..." options. I understood OR means "Original Reflectance", why is it written "the input variables are composed of spectral indexes" (line 397)?

Answer: For example, in OR, the input variables are composed of R1485, R1511, RI, NDI, RDVI and MSR (Table 4). RI, NDI, RDVI and MSR are spectral indexes, while R1485 and R1511 are single bands around 1500 nm. Therefore, we wrote "the input variables are composed of spectral indexes and some single bands around 1500 nm". To make it easier for readers to understand, we rewrote this sentence.

Comment 5: Independently of the mathematical procedures adopted in the text, how can it be explained that in one option the wavelengths of approximately 1500 nm are sensitive to SOM, while in other options it is the wavelengths of approximately 500 nm that are sensitive to SOM? What could be the physical explanation for that? The simple question would be: Which region of the electromagnetic spectrum can tell us something about the soil organic matter?

Answer: In lab-measured hyperspectral reflectance, the SOM response band range is 550-700 nm [1]. The first-order derivative reflectance is positively correlated with SOM in the ranges 816-932 nm and 1039-1415 nm, but negatively correlated in the range 481-598 nm [2].

In these band ranges, the reflectance is highly correlated with SOM content, and the reflectance is inversely proportional to the SOM content. In OR and OR-SVD, the single bands selected in this manuscript were mainly around 1500 nm, but the selected spectral indexes were mainly constructed from bands around 500-600 nm (Table 3). The band selection is similar to previous findings [1]. At the same time, the recursive feature elimination algorithm was used to select the independent variables for predicting SOM content. The algorithm does not select independent variables based on the correlation between reflectance and SOM content, but on the impact of different combinations of independent variables on SOM prediction. This is why the single bands selected in this paper are not in the SOM response band range.

[1] Galvão, L.S. & Vitorello, I. 1998. Variability of laboratory measured soil lines of soils from southeastern Brazil. Remote Sensing of Environment, 63, 166-181.

[2] Lu, Y., Bai, Y., Yang, L. & Wang, H. 2007. Prediction and validation of soil organic matter content based on hyperspectrum. Scientia Agricultura Sinica, 40, 1989-1995 (in Chinese).
