Article
Peer-Review Record

Evaluation of Sediment Trapping Efficiency of Vegetative Filter Strips Using Machine Learning Models

Sustainability 2019, 11(24), 7212; https://doi.org/10.3390/su11247212
by Joo Hyun Bae 1, Jeongho Han 2, Dongjun Lee 2, Jae E Yang 3, Jonggun Kim 2, Kyoung Jae Lim 2, Jason C Neff 4 and Won Seok Jang 4,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 19 November 2019 / Revised: 4 December 2019 / Accepted: 8 December 2019 / Published: 16 December 2019

Round 1

Reviewer 1 Report

In this manuscript, the authors developed machine learning models to predict the sediment trapping efficiency of vegetative filter strips (VFS). The authors developed models using seven machine learning algorithms and compared them with VFSMOD-W results. The authors also validated the machine learning models using the data set reported by Barfield and colleagues. Overall, the seven machine learning models showed good performance (R2 > 0.8). The authors explained the methodology and results well. The topic of this manuscript, sediment erosion from agricultural areas, which could affect the sustainability of the land and nearby streams, fits the aims of this journal. Machine learning is a highly popular tool in the modeling field and can attract an audience. Even though this manuscript is well prepared, this reviewer noticed some parts that still need revision. Here are line-by-line comments.

 

Line 65-67; The statement needs more supporting references. This comment also applies to line 129, line 169-170, line 369-371, and line 375-377. The statements might look obvious, but additional references are needed to support them and help the audience understand them better.

Line 118; Add abbreviation for "Korean Soil Information System (KSIS)". All abbreviations should be spelled out before use (see line 148).

Line 141; For better readability, the same bracket type should not be nested. (.... Curve Number (CN))

Line 145; the authors tabulated input variables in Table 2, but only seven of them were used for the machine learning models because two variables (pf, vg) have a single value. This information should be clearly stated in the method section.

Line 171; the authors should state why the seven machine learning tools were selected (e.g., based on popularity, best available code, etc.)

Line 175-177; to improve the consistency of the manuscript, the authors should use either the abbreviations or the same full terms throughout. For example, there are Multi Layer Perceptron, Multilayer perceptron, and MLP throughout this manuscript, figures, and tables. Readers may be confused by those terms.

Line 438, 445, 456, 479, 496; there are error messages possibly due to Figure labels. Please check figure numbers throughout this manuscript since there are two Figure 3s.

Line 440; the authors need to explain better why a scatter plot is not suitable for continuous numerical data. Is this due to the distinct data formats (continuous vs. categorical)?

Line 456; show the p-value to support the statement that rv and sediment trapping efficiency show a strong positive relationship.

Line 497; according to its opening statement, line 497 is the first part of Section 3.4. This reviewer would recommend moving the opening statement (line 497) to line 479, or moving lines 479-496 to after line 503.

Line 519; please check r2 value for C value with 50.

Author Response

Reviewer1

Here are responses to the reviewer comments:

1) Line 65-67; The statement needs more supporting references. This comment also applies to line 129, line 169-170, line 369-371, and line 375-377. The statements might look obvious, but additional references are needed to support them and help the audience understand them better.

-> We added references to each sentence (lines 69-71, 138-140, 184-185, 386-388, and 392-394).

 

2) Line 118; Add abbreviation for "Korean Soil Information System (KSIS)". All abbreviations should be spelled out before use (see line 148).

-> We corrected it to “Data for soil type, slope, and drainage class for agricultural fields were obtained from Korean Soil Information System (KSIS).” (lines 126-127). We also reviewed all abbreviations throughout the manuscript.

 

3) Line 141; For better readability, the same bracket type should not be nested. (.... Curve Number (CN))

-> “runoff-related parameters (e.g., rainfall intensity, rainfall duration time and Soil Conservation Service (SCS) Curve Number (CN))” has been changed to “runoff-related parameters with examples of rainfall intensity, rainfall duration time and Soil Conservation Service (SCS) Curve Number (CN)” (lines 151-152).

 

4) Line 145; the authors tabulated input variables in Table 2, but only seven of them were used for the machine learning models because two variables (pf, vg) have a single value. This information should be clearly stated in the method section.

-> As you advised, we added an explanation in Section 2.2.2 that the two variables (pf, vg) were not used because each has a single value (lines 178-180).
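
The point made here, that an input with a single value cannot help a model distinguish scenarios, can be illustrated with a minimal sketch. The helper name `drop_constant_columns` and all numbers below are hypothetical and are not taken from the manuscript; only the variable names rv, pf and vg come from the review.

```python
from typing import Dict, List


def drop_constant_columns(table: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Return a copy of `table` without columns that hold one distinct value."""
    return {name: values for name, values in table.items() if len(set(values)) > 1}


# Hypothetical scenario table: the numbers are made up for illustration only.
scenarios = {
    "rv": [10.0, 20.0, 30.0],  # varies across scenarios, so it is kept
    "pf": [1.0, 1.0, 1.0],     # single value, carries no information
    "vg": [0.5, 0.5, 0.5],     # single value, carries no information
}

features = drop_constant_columns(scenarios)
print(sorted(features))
```

A filter of this kind only removes columns; it does not change the remaining inputs, so it can be applied before any of the learning methods.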

 

5) Line 171; the authors should state why the seven machine learning tools were selected (e.g., based on popularity, best available code, etc.)

-> We revised it as follows (lines 186-188).

“In order to build a model for estimating sediment trapping efficiency of VFSs, seven supervised learning methods were used, which have been widely used in recent years by data scientists.”

 

6) Line 175-177; to improve the consistency of the manuscript, the authors should use either the abbreviations or the same full terms throughout. For example, there are Multi Layer Perceptron, Multilayer perceptron, and MLP throughout this manuscript, figures, and tables. Readers may be confused by those terms.

-> We now consistently use ‘Multi Layer Perceptron’ as the full term and ‘MLP’ as the abbreviation.

 

7) Line 438, 445, 456, 479, 496; there are error messages possibly due to Figure labels. Please check figure numbers throughout this manuscript since there are two Figure 3s.

-> We have carefully looked into the figure labels and revised them.

 

8) Line 440; the authors need to explain better why a scatter plot is not suitable for continuous numerical data. Is this due to the distinct data formats (continuous vs. categorical)?

-> Although continuous data could be presented in scatterplots, we expect the format of this figure to improve readability by presenting the sediment trapping efficiency as a categorical legend. We revised the sentence accordingly (lines 457-458).

 

9) Line 456; show the p-value to support the statement that rv and sediment trapping efficiency show a strong positive relationship.

-> We added the sentence, “The sediment trapping efficiency (ste) had a statistically significant difference in the paired T-test results from each attribute (p < .001).”, on lines 487-489.

 

10) Line 497; according to its opening statement, line 497 is the first part of Section 3.4. This reviewer would recommend moving the opening statement (line 497) to line 479, or moving lines 479-496 to after line 503.

-> We have rearranged the sentences to fit the flow (lines 497-498).

 

11) Line 519; please check r2 value for C value with 50.

-> We corrected it (line 537).

Author Response File: Author Response.pdf

Reviewer 2 Report

Below are listed my comments about this paper, which deals with the modelling of filter strips through machine learning methods. 

1) Abstract. This section is not well approached, since it contains a lot of acronyms and detailed data, but lacks any context or background. It should be reformulated.

2) The introduction section is well structured, but the need for developing this study (e.g. the need for machine learning) is unclear. These methods still require a series of inputs to be applied, and they are also more demanding than other approaches in computational terms. The authors must provide better reasons to justify the need for this investigation.

3) I have serious concerns about the scale and level of detail of the simulations. There is no clue about catchment delineation or similar. Taking into account the physical singularity of filter strips, I think more details should be added about the characteristics of the study area (or areas) used for the simulations.

4) Lines 151-152. In line with the previous comment, how was the 60-min rainfall duration established? This is usually done from the longest flow path of catchments, but nothing is mentioned about this in the article.

5) Regarding the goodness-of-fit measures used. I am surprised the authors disregarded the Nash-Sutcliffe coefficient, which is so common in hydrological modelling. Also, the R2 coefficient has been found to be an unreliable indicator in comparison with the Adj. R2 or especially the Pred. R2.

6) To properly ensure the validity and robustness of their simulations, the authors should also conduct a residual analysis.

7) Lines 367, 374. Is RMSE expressed as a percent? I think it is not.

8) Lines 438-439, 445-446, 456, 479, 496. There are several errors with cross-references.

9) Figure 3. What is the point of these plots? What about the p-values of these correlations, in order to verify if the relationships are statistically significant?

10) Lines 509-510, 529-530, 549-551. The reasons provided by the authors in this section are worrying, since their only aim is to maximise R2 by making changes in the parameters, including more hidden layers, etc. This kind of practice can lead to overfitting issues, which in turn can endanger the reliability of the results achieved.

11) Most of the conclusions are a repetition of the abstract and the results achieved, while less attention is paid to the main findings derived from the research. Besides, citations should be avoided in this section. For instance, the optimization of the calibration of parameters through mathematical methods should be a must for future investigations in this line.

Author Response

Reviewer2

Here are responses to the reviewer comments:

1) Abstract. This section is not well approached, since it contains a lot of acronyms and detailed data, but lacks any context or background. It should be reformulated.

-> We updated the abstract based on your comments (lines 17-35).

 

2) The introduction section is well structured, but the need for developing this study (e.g. the need for machine learning) is unclear. These methods still require a series of inputs to be applied, and they are also more demanding than other approaches in computational terms. The authors must provide better reasons to justify the need for this investigation.

-> In addition to the need for machine learning described in lines 69-74, we added more description of the advantages of machine learning applications (lines 98-101).

 

3) I have serious concerns about the scale and level of detail of the simulations. There is no clue about catchment delineation or similar. Taking into account the physical singularity of filter strips, I think more details should be added about the characteristics of the study area (or areas) used for the simulations.

-> This study did not target any specific watershed or field but agricultural fields across South Korea as a whole. Thus, we investigated the general characteristics of agricultural fields in South Korea and set up scenarios by combining various conditions of agricultural fields and VFS. To clarify this, we have added Figure 1 and additional description to the manuscript (lines 116, 121-122, and 159-160).

 

4) Lines 151-152. In line with the previous comment, how was the 60-min rainfall duration established? This is usually done from the longest flow path of catchments, but nothing is mentioned about this in the article.

-> We set the 60-min rainfall duration with reference to previous studies (Otto et al., 2012; Han et al., 2012; Dillaha et al., 1988) that observed rainfall-runoff events at experimental plots, considering the rainfall duration and intensity that can give rise to runoff from agricultural fields. Otto et al. (2012) evaluated the effect of rainfall amount and duration on runoff generation using 17 events (rainfall duration varying from 30 min to 37 h 5 min) and indicated that runoff occurred from a 30-min rainfall duration. Han et al. (2012) measured 30 events (rainfall duration ranging from 30 min to 55 h), and runoff occurred from a 30-min rainfall duration. Dillaha et al. (1988) observed that runoff occurred with a 60-min rainfall duration in all plots. Thus, we set a 60-min rainfall duration with various rainfall intensities so that runoff from agricultural fields can be generated. We did not describe in detail how the design rainfall was determined because that explanation is beyond the scope of this study, which concentrates on developing machine learning models to evaluate the sediment trapping efficiency of vegetative filter strips.

 

References

Dillaha, T.A., Sherrard, J.H., Lee, D., Mostaghimi, S., Shanholtz, V.O., 1988. Evaluation of vegetative filter strips as a best management practice for feed lots. J. Water Pollut. Control Fed. 60(7), 1231-1238. https://www.jstor.org/stable/25043629.

 

Han, S., Xu, D., Wang, S., 2012. Runoff formation from experimental plot, field, to small catchment scales in agricultural North Huaihe River Plain, China. Hydrol. Earth Syst. Sci. 16, 3115-3125. doi:10.5194/hess-16-3115-2012.

 

Otto, S., Cardinali, A., Marotta, E., Paradisi, C., Zanin, G., 2012. Effect of vegetative filter strips on herbicide runoff under various types of rainfall. Chemosphere. 88(1), 113-119. doi:10.1016/j.chemosphere.2012.02.081.

 

5) Regarding the goodness-of-fit measures used. I am surprised the authors disregarded the Nash-Sutcliffe coefficient, which is so common in hydrological modelling. Also, the R2 coefficient has been found to be an unreliable indicator in comparison with the Adj. R2 or especially the Pred. R2.

-> Equation (8) calculates the Nash-Sutcliffe model efficiency coefficient (NSE). The score function of each regressor in Scikit-Learn, the Python machine learning library that we used to develop the new machine learning-based models, is the NSE. Thus, we have replaced R2 with NSE throughout the manuscript. Also, to improve the flow of the model validation section, lines 369-372 were revised.
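
For readers who want to see why the two names can refer to the same number: Scikit-Learn documents the regressor `score` as the coefficient of determination, 1 - u/v, where u is the residual sum of squares and v is the total sum of squares of the observations, and this is algebraically the same expression as the NSE. The sketch below is a plain-Python illustration of that expression, not the authors' code; the function name `nse` and the sample values are assumptions made for the example.

```python
from typing import Sequence


def nse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - sum((o - s)^2) / sum((o - mean(o))^2)."""
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("NSE is undefined when the observations are constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical trapping efficiencies (%), for illustration only.
observed = [60.0, 70.0, 80.0, 90.0]
perfect = nse(observed, observed)             # a perfect model
mean_only = nse(observed, [75.0] * 4)         # always predicting the observed mean
print(perfect, mean_only)
```

A perfect simulation scores 1, and a model that always returns the mean of the observations scores 0, which is the usual reference point when interpreting NSE values.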

 

6) To properly ensure the validity and robustness of their simulations, the authors should also conduct a residual analysis.

-> For a linear regression model, the residual scatterplot is analyzed to describe the variables. For a study using nonlinear models, as here, however, residual analysis may not be very meaningful for model validation. Using NSE, RMSE and MAPE for model validation is sufficient (Jimeno-Sáez et al., 2018; Shortridge et al., 2016).

 

References

Jimeno-Sáez, P., Senent-Aparicio, J., Pérez-Sánchez, J., Pulido-Velazquez, D., 2018. A comparison of SWAT and ANN models for daily runoff simulation in different climatic zones of peninsular Spain. Water 10(2), 192. doi:10.3390/w10020192.

 

Shortridge, J.E., Guikema, S.D., Zaitchik, B.F., 2016. Machine learning methods for empirical streamflow simulation: A comparison of model accuracy, interpretability, and uncertainty in seasonal watersheds. Hydrol. Earth Syst. Sci. 20, 2611-2628. doi:10.5194/hess-20-2611-2016.

 

7) Lines 367, 374. Is RMSE expressed as a percent? I think it is not.

-> RMSE is expressed in the same units as the outcome. Thus, RMSE is expressed in percent (%) because the sediment trapping efficiency is expressed in percent (%).
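
The unit argument can be checked with a small worked example. This is a sketch with made-up numbers, assuming the usual definition of RMSE as the square root of the mean squared error; the function name `rmse` is chosen for the example.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error, in the same units as the inputs."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical trapping efficiencies in percent; every error is 3 points.
observed_pct = [80.0, 90.0, 70.0, 60.0]
predicted_pct = [77.0, 93.0, 67.0, 63.0]
rmse_pct = rmse(observed_pct, predicted_pct)

# The same data expressed as fractions (0 to 1) give an RMSE 100 times smaller.
rmse_frac = rmse([v / 100 for v in observed_pct], [v / 100 for v in predicted_pct])
print(rmse_pct, rmse_frac)
```

Because the RMSE scales with the data, a value reported in percent here means percentage points of trapping efficiency, not a relative error.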

 

8) Lines 438-439, 445-446, 456, 479, 496. There are several errors with cross-references.

-> We revised all that you mentioned and checked all figure numbers.

 

9) Figure 3. What is the point of these plots? What about the p-values of these correlations, in order to verify if the relationships are statistically significant?

-> The plots show the histogram of each attribute and the relationship between the attributes considering the sediment trapping efficiency class (ste_c). The p-value was added to identify statistical significance in the additional sentence (lines 487-489).

 

10) Lines 509-510, 529-530, 549-551. The reasons provided by the authors in this section are worrying, since their only aim is to maximize R2 by making changes in the parameters, including more hidden layers, etc. This kind of practice can lead to overfitting issues, which in turn can endanger the reliability of the results achieved.

-> We evaluated the training and test data together while varying the hyperparameters in order to avoid the over-fitting problem. The optimal hyperparameter value was selected when the objective functions (i.e., NSE, RMSE and MAPE) showed good results for both training and testing (or validation). Thus, the results of this study are reliable and not affected by over-fitting.
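
The selection rule described in this response, comparing scores on training data and on held-out data for each hyperparameter value, can be sketched as follows. Everything in the example is an assumption made for illustration: the data are synthetic, a one-dimensional k-nearest-neighbour regressor stands in for the authors' models, and only NSE is used as the score.

```python
import random
from typing import Dict, List, Sequence, Tuple


def nse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency of simulated values against observations."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def knn_predict(train_x: List[float], train_y: List[float], x: float, k: int) -> float:
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


# Synthetic data: a linear trend plus Gaussian noise (fixed seed for repeatability).
rng = random.Random(0)
xs = [float(i) for i in range(40)]
ys = [2.0 * x + rng.gauss(0.0, 5.0) for x in xs]

# Hold out 10 of the 40 points for validation.
indices = list(range(40))
rng.shuffle(indices)
train_idx, val_idx = indices[:30], indices[30:]
train_x = [xs[i] for i in train_idx]
train_y = [ys[i] for i in train_idx]
val_x = [xs[i] for i in val_idx]
val_y = [ys[i] for i in val_idx]

# Score every candidate k on both subsets.
results: Dict[int, Tuple[float, float]] = {}
for k in (1, 3, 5, 9):
    train_score = nse(train_y, [knn_predict(train_x, train_y, x, k) for x in train_x])
    val_score = nse(val_y, [knn_predict(train_x, train_y, x, k) for x in val_x])
    results[k] = (train_score, val_score)

# Choose the k with the best score on held-out data, not on training data.
best_k = max(results, key=lambda k: results[k][1])
for k, (train_score, val_score) in results.items():
    print(k, round(train_score, 3), round(val_score, 3))
print("selected k:", best_k)
```

With k = 1 the training score is exactly 1 because each training point is its own nearest neighbour, which is why a training score alone cannot reveal over-fitting; the gap between the two columns is the quantity to watch.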

 

11) Most of the conclusions are a repetition of the abstract and the results achieved, while less attention is paid to the main findings derived from the research. Besides, citations should be avoided in this section. For instance, the optimization of the calibration of parameters through mathematical methods should be a must for future investigations in this line.

-> We revised and updated the conclusion section (lines 579-605).

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The authors are commended for their efforts to improve the manuscript. I am satisfied with the modifications made and the responses provided to my comments, so I believe that the manuscript is now ready for publication.
