Article
Peer-Review Record

Performance Evaluation of the Multiple Quantile Regression Model for Estimating Spatial Soil Moisture after Filtering Soil Moisture Outliers

Remote Sens. 2020, 12(10), 1678; https://doi.org/10.3390/rs12101678
by Chunggil Jung 1, Yonggwan Lee 2, Jiwan Lee 2 and Seongjoon Kim 3,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 27 April 2020 / Revised: 18 May 2020 / Accepted: 21 May 2020 / Published: 23 May 2020
(This article belongs to the Special Issue Remote Sensing for Streamflow Simulation)

Round 1

Reviewer 1 Report

The work applies the multiple quantile regression (MQR) model and compares the results with those obtained in a previous study by the same authors using multiple linear regression (MLR). In addition, the authors applied the isolation forest (IF) algorithm to the observed SM data to detect outliers.

My comments follow:

  • The general description is too thin in places and should be expanded (for example, the IF is poorly described). In addition, the symbols in the equations are not fully explained in the text.
  • Equations 5 and 6 should be inverted, since the score is the main equation. No description of c(n) is given, for example.
  • In Figure 4, are the numbers at the top of the figure the IDs of the stations?
  • The results are only a report of numbers, without any deeper discussion. For example, why were the MQR results for clay better than the others?
  • This sentence "Due to the nature of Korea, which features a monsoon climate, the correlation coefficient is high in spring because rainfall is concentrated in the summer, whereas relatively little rainfall falls in the spring, and the fluctuations caused by this were small." looks ambiguous; could you clarify it?
  • There was an improvement over the previous study, but the goodness-of-fit indexes are still poor in many cases. Did you try to consider other variables for the MQR? Could you suggest how to improve them?
  • Most of the references are dated, considering the topic; it is suggested that they be updated.

Author Response

The work applies the multiple quantile regression (MQR) model and compares the results with those obtained in a previous study by the same authors using multiple linear regression (MLR). In addition, the authors applied the isolation forest (IF) algorithm to the observed SM data to detect outliers.

My comments follow:

  • The general description is too thin in places and should be expanded (for example, the IF is poorly described). In addition, the symbols in the equations are not fully explained in the text.

: (Line 153-162) Corrected unnecessary sentences in the manuscript and added the description about an anomaly detection algorithm.

  • Equations 5 and 6 should be inverted, since the score is the main equation. No description of c(n) is given, for example.

: (Line 181-187) As you pointed out, the order of equation 5 and 6 has been changed, and the explanation of the equation has been added with the reference [42] as follows:

 

$$s(x, n) = 2^{-\frac{E(h(x))}{c(n)}} \qquad (5)$$

$$c(n) = 2H(n-1) - \frac{2(n-1)}{n} \qquad (6)$$

where $H(i)$, the harmonic number, can be estimated by $\ln(i) + 0.5772156649$ (Euler's constant), and $c(n)$ is the constant value used to normalize the average path length. $n$ is the number of nodes. $h(x)$ is the path length of sample $x$ from the root node. $E(h(x))$ is the average of $h(x)$ over a collection of iTrees. $s$ is the anomaly score used in the following evaluation: (a) if $s$ is very close to 1, the sample is clearly an anomaly; (b) if $s$ is much smaller than 0.5, the sample can safely be regarded as a normal point. For instance, when $s$ is 1, $E(h(x))$ is zero (0), which means that the path lengths in all trees end close to the root node. In this study, the IF was implemented with the sklearn.ensemble module in Python.

Chen, W.; Yun, Y.-H.; Wen, M.; Lu, H.; Zhang, Z.; Liang, Y. Representative subset selection and outlier detection via isolation forest. Anal. Methods 2016, 8, 7225–7231; DOI:10.1039/C6AY01574C
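
The anomaly score discussed in this response can be illustrated with a minimal sketch. It assumes the standard isolation forest formulas, s = 2^(-E(h(x)) / c(n)) with c(n) = 2H(n-1) - 2(n-1)/n, and treats n as the number of samples used to build each tree; the function names are hypothetical, and this is not the authors' code.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler's constant, as quoted in the response


def harmonic_estimate(i):
    """Approximate the harmonic number H(i) by ln(i) + Euler's constant."""
    return math.log(i) + EULER_GAMMA


def c(n):
    """Normalizing constant c(n); n is assumed to be the per-tree sample size."""
    if n < 2:
        raise ValueError("c(n) needs n >= 2 in this sketch")
    return 2.0 * harmonic_estimate(n - 1) - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length, n):
    """Anomaly score s = 2 ** (-E(h(x)) / c(n))."""
    return 2.0 ** (-mean_path_length / c(n))
```

Under these assumptions, a mean path length of zero gives a score of 1, a mean path length equal to c(n) gives exactly 0.5, and longer paths push the score below 0.5, which matches evaluation rules (a) and (b).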

  • In Figure 4, are the numbers at the top of the figure the IDs of the stations?

: (Figure 4) The numbers at the top of Figure 4 are the station numbers in Table 1. To clarify this, we added the text “Station No.” at the top.

  • The results are only a report of numbers, without any deeper discussion. For example, why were the MQR results for clay better than the others?

: (Line 320-337) We added a discussion of the results, with a table, to the manuscript.

  • This sentence "Due to the nature of Korea, which features a monsoon climate, the correlation coefficient is high in spring because rainfall is concentrated in the summer, whereas relatively little rainfall falls in the spring, and the fluctuations caused by this were small." looks ambiguous; could you clarify it?

: (Line 249-255) The sentence was revised as follows.

“The reason why R2 of clay was low in summer was considered to be due to the climatic characteristics of South Korea associated with monsoon season. Every year from June to July, there is a rainy season known as Jangma, which heavy rainfall is concentrated, and it may cause some places to flood. The uncertainty of the soil moisture variation pattern is largely due to the rainy season in the summer, and the predicted accuracy decreases accordingly. On the other hand, in the spring, there is relatively little rainfall, so the pattern of soil moisture change is monotonous and seems to have a high correlation.”

  • There was an improvement over the previous study, but the goodness-of-fit indexes are still poor in many cases. Did you try to consider other variables for the MQR? Could you suggest how to improve them?

: (Line 340-361) As you mentioned, we can expect to improve the model performance of the MQR model through the application of additional variables, such as albedo, brightness, greenness, etc., but we did not add any variables because it was judged to be inconsistent with the purpose of this study. The description of this has been added to the manuscript.

 

  • Most of the references are dated, considering the topic; it is suggested that they be updated.

: (Reference numbers 8 to 13 and 30 to 34) We have added some references to reflect recent research trends.


Reviewer 2 Report

Dear authors,

 

My compliments on a well-written and concise paper. The methodology and presentation of results is very good, and I only have some minor comments, see below.

Line 113: Please provide the full reference of this 'previous paper'.

Line 121: data were applied. Data has a plural form.

Line 148: Figure 2, it might be good to also include an elevation map, which could possibly link with SM regressions that are poorer over e.g. sloped terrain?

Line 160: I cannot distinguish the light green from dark green. Check whether the figure was made properly.

Eq 6: Please include the text 'where E(h(x)) is the average value of h(x) from a collection of iTrees'.

Line 225-226: shorten the expression of the values by using an equation.

 

Please address these comments.

 

Author Response

My compliments on a well-written and concise paper. The methodology and presentation of results is very good, and I only have some minor comments, see below.

Line 113: Please provide the full reference of this 'previous paper'.

: (Line 113) Reference added in “previous paper”.

Line 121: data were applied. Data has a plural form.

: (Line 121) The manuscript was revised.

Line 148: Figure 2, it might be good to also include an elevation map, which could possibly link with SM regressions that are poorer over e.g. sloped terrain?

: (Figure 2) The elevation is added. Also, the poorer results were explained using elevation in discussion.

 

Line 160: I cannot distinguish the light green from dark green. Check whether the figure was made properly.

: (Figure 3) Figure 3 was revised. Based on IF theory, if the score is much smaller than 0.5, the point can safely be regarded as normal. In addition, anomalous data have shorter paths than normal data, which distinguishes them from the rest of the data.

 

Eq 6: Please include the text 'where E(h(x)) is the average value of h(x) from a collection of iTrees'.

: (Line 181-187) The description was added. Also, the relationship between E(h(x)) and the score was described as follows.

“$h(x)$ is the path length of sample $x$, measured by the number of edges $x$ traverses in an iTree from the root node until the traversal is terminated at an external node. $E(h(x))$ is the average of $h(x)$ over a collection of iTrees. $s$ is the anomaly score used in the following evaluation: (a) if $s$ is very close to 1, the sample is clearly an anomaly; (b) if $s$ is much smaller than 0.5, the sample can safely be regarded as a normal point. For instance, when $s$ is 1, $E(h(x))$ is zero (0), which means that the path lengths in all trees end close to the root node. In this study, the IF was implemented with the sklearn.ensemble module in Python.”

 

Line 225-226: shorten the expression of the values by using an equation.

: (Line 244-246) The manuscript was revised as follows.

“As mentioned above, quantile regression was analyzed with a total of 19 quantiles of 0.05 intervals from 0.05 to 0.95.”
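
The quantile setup quoted above can be sketched in a few lines. The example builds the 19 quantile levels from 0.05 to 0.95 and defines the check (pinball) loss that quantile regression minimizes at each level; the function names are hypothetical, and the sketch does not reproduce the authors' MQR fitting procedure.

```python
def quantile_grid(start=0.05, stop=0.95, step=0.05):
    """Return evenly spaced quantile levels from start to stop, inclusive."""
    count = round((stop - start) / step) + 1
    # Rounding to two decimals removes floating-point drift such as 0.15000000000000002.
    return [round(start + k * step, 2) for k in range(count)]


def pinball_loss(observed, predicted, tau):
    """Mean check loss at quantile level tau (0 < tau < 1)."""
    total = 0.0
    for o, p in zip(observed, predicted):
        residual = o - p
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += max(tau * residual, (tau - 1.0) * residual)
    return total / len(observed)
```

Fitting one regression per level in `quantile_grid()` and comparing the losses is one way to see why an upper quantile penalizes under-prediction more heavily than over-prediction.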


Reviewer 3 Report

The authors evaluated the performance of multiple quantile regression analysis for estimating the spatial pattern of soil moisture by comparing the results of multiple linear regression using 58 stations’ data in South Korea. The study has clear objectives, and the experiment was carefully designed. The authors reviewed the existing literature thoroughly and described the methods well. The manuscript is generally well-written and easy to follow. I have only a few suggestions for improving the presentation quality of the manuscript.

 

  1. The manuscript could be organized more tightly. The authors described methods in the results section, and I suggest that they move the description of the methods to the methods section. See my remarks below.

 

  1. Some statistical analysis could be conducted for comparing the performance of MLR and MQR analyses. The authors currently rely on R2.

 

  1. Table and figure captions could be more descriptive. Some acronyms are not defined. It is also unclear how some specific stations were chosen. Are they representative stations, and if so, by what criteria? (i.e., stations shown in Figure 4 and Figure 5)

 

  1. While generally clearly written, there are a few redundant or missing words. See my comment below.

 

  1. Line 18, Remove “upon” after “To improve.”

 

  1. Lines 26-27. Change “the determinant coefficient” to “coefficient of determination.”

 

  1. Line 33. Improving both bias and variance? Perhaps “Reducing” might be a better word?

 

  1. Line 41. Change “use” to “using.”

 

  1. Line 103. Cite a reference for “the IDW technique.”

 

  1. Table 1. The authors can spell out the station names in the Station columns.

 

  1. Line 141. Change “Figure 2(b) show…” to “Figure 2(b) show.”

 

  1. Figure 2. Legend. What boundary is the base map referring to? I presume that it is provincial boundary.

 

  1. Lines 151 and 153. “IF is a new technique based on machine learning.” The same phrase is repeated twice.

 

  1. Line 167. Remove “the” in “the an outline score.”

 

  1. Lines 199-202. This section belongs to methods.

 

  1. Lines 218-223. This section belongs to methods.

 

  1. Table 2. Define acronyms (e.g., GT, Con., MDVI, LST).

 

  1. Lines 247 and 254. Change “4” to “four.”

Author Response

The authors evaluated the performance of multiple quantile regression analysis for estimating the spatial pattern of soil moisture by comparing the results of multiple linear regression using 58 stations’ data in South Korea. The study has clear objectives, and the experiment was carefully designed. The authors reviewed the existing literature thoroughly and described the methods well. The manuscript is generally well-written and easy to follow. I have only a few suggestions for improving the presentation quality of the manuscript. 

  1. The manuscript could be organized more tightly. The authors described methods in the results section, and I suggest that they move the description of the methods to the methods section. See my remarks below. 

: The manuscript was revised based on the points indicated below.

  1. Some statistical analysis could be conducted for comparing the performance of MLR and MQR analyses. The authors currently rely on R2. 

: (Line 268-282) To analyze the results of MLR and MQR, the index of agreement (IOA) was additionally calculated and added to Table 4. Explanations of RMSE and IOA were also added. With this modification, the results of the two models can be compared in more ways.
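
The two metrics named in this response can be sketched as follows. The response does not state which form of the IOA was used, so this example assumes Willmott's original index of agreement, d = 1 - Σ(O - P)² / Σ(|P - Ō| + |O - Ō|)²; the function names are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def index_of_agreement(observed, predicted):
    """Willmott's index of agreement (assumed form); 1.0 means perfect agreement."""
    mean_obs = sum(observed) / len(observed)
    numerator = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    denominator = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    # The denominator is zero only if every value equals the observed mean.
    return 1.0 - numerator / denominator
```

RMSE is reported in the units of soil moisture, while the IOA is dimensionless and bounded above by 1, so the two together give a complementary view to R2.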

 

  1. Table and figure captions could be more descriptive. Some acronyms are not defined. It is also unclear how some specific stations were chosen. Are they representative stations, and if so, by what criteria? (i.e., stations shown in Figure 4 and Figure 5)

: The representative stations in Figures 4 and 5 follow the previous paper by Jung et al. (2017). In that paper, the stations were selected by considering physical characteristics, namely field capacity (FC) and wilting point (WP), for each soil type. This explanation has also been added to the manuscript in Line 272 to 275.

“Jung, C.G.; Lee, Y.G.; Cho, Y.; Kim, S. A study of spatial soil moisture estimation using a multiple linear regression model and MODIS land surface temperature data corrected by conditional merging. Remote Sens. 2017, 9, 870; DOI:10.3390/rs9080870.”

  1. While generally clearly written, there are a few redundant or missing words. See my comment below. 

: The manuscript was revised based on the points indicated below.

 

  1. Line 18, Remove “upon” after “To improve.” 

: (Line 18) The manuscript was revised.

 

  1. Lines 26-27. Change “the determinant coefficient” to “coefficient of determination.” 

: (Line 26-27) The manuscript was revised.

 

  1. Line 33. Improving both bias and variance? Perhaps “Reducing” might be a better word? 

: (Line 34) The manuscript was revised.

 

  1. Line 41. Change “use” to “using.” 

: (Line 41) The manuscript was revised.

 

  1. Line 103. Cite a reference for “the IDW technique.” 

: (Line 103) Reference added in the IDW technique [40].

“Shepard, D. A two-dimensional interpolation function for irregularly-spaced data. In Proceedings of the 1968 23rd ACM National Conference; ACM Press: New York, NY, USA, 1968; pp. 517–524.”

 

  1. Table 1. The authors can spell out the station names in the Station columns. 

: (Table 1) Most of the SM stations used in this study were operated by RDA. These stations were not provided as their “name” but provided as installed addresses such as Gapyeong-eup, Gapyeong-gun, Gyeonggi-do and Yeongok-myeon, Gangneung-si, Gangwon-do. Therefore, providing such a full address name of total 58 stations was determined to be unnecessary in describing the purpose and content of the overall study, so the names of the stations were abbreviated.

 

  1. Line 141. Change “Figure 2(b) show…” to “Figure 2(b) show.” 

: (Line 143) It was modified as follows: Figure 2b show ….

 

  1. Figure 2. Legend. What boundary is the base map referring to? I presume that it is provincial boundary. 

: (Figure 2) The legend label “boundary” in the figure was changed to “Provincial boundary”.

  1. Lines 151 and 153. “IF is a new technique based on machine learning.” The same phrase is repeated twice. 

: (Line 153-162) Corrected unnecessary sentences in the manuscript and added the description about an anomaly detection algorithm.

  1. Line 167. Remove “the” in “the an outline score.” 

: (Line 175) The manuscript was revised.

  1. Lines 199-202. This section belongs to methods. 

: (Line 222-236, Figure 4, and Table 2) Corrected sentences that did not fit this section. In addition, Figure 4 was modified and Table 2 was added for additional analysis. We have also added the description of this figure and table.

  1. Lines 218-223. This section belongs to methods.

: (Line 213-216) The manuscript was revised.

  1. Table 2. Define acronyms (e.g., GT, Con., MDVI, LST). 

: (Table 3) Added description of abbreviations.

  1. Lines 247 and 254. Change “4” to “four.”

: (Line 276 and 285) The manuscript was revised.


Round 2

Reviewer 1 Report

The Authors greatly improved the quality of the manuscript, which is now in a considerably better form. 

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.

