# Evaluation of Sediment Trapping Efficiency of Vegetative Filter Strips Using Machine Learning Models

## Abstract

## 1. Introduction

## 2. Methodology

#### 2.1. Study Area

#### 2.2. Model Introduction and Data

#### 2.2.1. VFSMOD-W Model

^{−1/3}) and slope) and by sediment parameters (e.g., particle size diameter (cm), particle weight density (g/cm^{3}) and porosity of deposited sediment (%)). By considering these factors, the VFS simulation can estimate VFS efficiencies under multiple scenarios [3].

#### 2.2.2. VFSMOD-W Data Used for Machine Learning

#### 2.3. Machine Learning Applications

#### 2.3.1. Data Preprocessing in Machine Learning

#### 2.3.2. Decision Tree

#### 2.3.3. Multilayer Perceptron (MLP)

#### 2.3.4. K-Nearest Neighbors (KNN)

#### 2.3.5. Support Vector Machine (SVM)

#### 2.3.6. Ensemble Learning

#### Random Forest

The random vector $\mathsf{\theta}_{k}$ is independently generated. The tree is grown using the training set and $\mathsf{\theta}_{k}$, and the classifier $h(x, \mathsf{\theta}_{k})$ is generated. For instance, in bagging the random vector $\mathsf{\theta}$ is the set of counts in N boxes resulting from N darts thrown randomly at the boxes, where N is the number of examples in the training set. In random split selection, $\mathsf{\theta}$ consists of a number of independent random integers between 1 and K. After a large number of trees is generated, the trees vote for the most popular class. These procedures are called random forest [39].
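
The "darts and boxes" picture of bagging can be made concrete with a minimal sketch. It uses only the Python standard library; the helper names `bootstrap_counts` and `majority_vote` are hypothetical and are not taken from the article or from any library. For regression, as with `RandomForestRegressor`, the tree outputs are averaged rather than voted on.

```python
import random
from collections import Counter


def bootstrap_counts(n_examples, seed):
    """Throw n_examples 'darts' at n_examples 'boxes' and count the hits per box.

    Box i holds the number of times training example i is drawn into the
    bootstrap sample used to grow one tree (this plays the role of theta_k).
    """
    rng = random.Random(seed)
    hits = Counter(rng.randrange(n_examples) for _ in range(n_examples))
    return [hits.get(i, 0) for i in range(n_examples)]


def majority_vote(predictions):
    """Return the class label that received the most votes from the trees."""
    return Counter(predictions).most_common(1)[0][0]


# One random vector per tree: different seeds give independent draws.
counts = bootstrap_counts(n_examples=10, seed=0)

# Hypothetical votes from three trees for a single input.
vote = majority_vote(["high", "low", "high"])
```

The counts always sum to N, and some boxes typically stay empty, which is why each tree sees a slightly different training set.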

#### AdaBoost

#### Gradient Boosting

#### 2.4. Model Validation

## 3. Results and Discussion

#### 3.1. Development of Machine Learning Models

#### 3.2. Validation of Machine Learning Models

#### 3.3. Data Analysis to Develop Machine Learning Models

#### 3.4. Sensitivity Analysis of Model Hyperparameters

## 4. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## References

1. Lee, J.; Eom, J.S.; Kim, B.C.; Jang, W.S.; Ryu, J.C.; Kang, H.; Kim, K.S.; Lim, K.J. Water quality prediction at Mandae watershed using SWAT and water quality improvement with vegetated filter strip. J. Korean Soc. Agric. Eng. **2011**, 53, 37–45.
2. Schmitt, T.J.; Dosskey, M.G.; Hoagland, K.D. Filter strip performance and processes for different vegetation, widths, and contaminants. J. Environ. Qual. **1999**, 28, 1479–1489.
3. Muñoz-Carpena, R.; Parsons, J.E. VFSMOD-w Vegetative Filter Strips Modelling System–Model Documentation and User’s Manual Version 6; Press of University of Florida: Gainesville, FL, USA, 2014.
4. Golkowska, K.; Rugani, B.; Koster, D.; Van Oers, C. Environmental and economic assessment of biomass sourcing from extensively cultivated buffer strips along water bodies. Environ. Sci. Policy **2016**, 57, 31–39.
5. Keesstra, S.; Nunes, J.; Novara, A.; Finger, D.; Avelar, D.; Kalantari, Z.; Cerdà, A. The superior effect of nature based solutions in land management for enhancing ecosystem services. Sci. Total Environ. **2018**, 610, 997–1009.
6. Lambrechts, T.; François, S.; Lutts, S.; Muñoz-Carpena, R.; Bielders, C.L. Impact of plant growth and morphology and of sediment concentration on sediment retention efficiency of vegetative filter strips: Flume experiments and VFSMOD modeling. J. Hydrol. **2014**, 511, 800–810.
7. Park, Y.S.; Engel, B.; Shin, Y.; Choi, J.; Kim, N.W.; Kim, S.J.; Kong, D.S.; Lim, K.J. Development of Web GIS-based VFSMOD System with three modules for effective vegetative filter strip design. Water **2013**, 5, 1194–1210.
8. White, M.J.; Arnold, J.G. Development of a simplistic vegetative filter strip model for sediment and nutrient retention at the field scale. Hydrol. Process. **2009**, 23, 1602–1616.
9. Kamilaris, A.; Kartakoullis, A.; Prenafeta-Boldú, F.X. A review on the practice of big data analysis in agriculture. Comput. Electron. Agric. **2017**, 143, 23–37.
10. Aiken, V.C.F.; Dórea, J.R.R.; Acedo, J.S.; de Sousa, F.G.; Dias, F.G.; de Magalhães Rosa, G.J. Record linkage for farm-level data analytics: Comparison of deterministic, stochastic and machine learning methods. Comput. Electron. Agric. **2019**, 163, 104857.
11. Kim, J.; Park, Y.S.; Lee, S.; Shin, Y.; Lim, K.J.; Kim, K. Study of selection of regression equation for flow-conditions using machine-learning method: Focusing on Nakdonggang waterbody. J. Korean Soc. Agric. Eng. **2017**, 59, 97–107.
12. Partal, T.; Cigizoglu, H.K. Estimation and forecasting of daily suspended sediment data using wavelet–neural networks. J. Hydrol. **2008**, 358, 317–331.
13. Tiwari, M.K.; Chatterjee, C. Development of an accurate and reliable hourly flood forecasting model using wavelet-bootstrap-ANN (WBANN) hybrid approach. J. Hydrol. **2010**, 364, 458–470.
14. Adamowski, J.; Chan, H.F. A wavelet neural network conjunction model for groundwater level forecasting. J. Hydrol. **2011**, 407, 28–40.
15. Jothiprakash, V.; Magar, R.B. Multi-time-step ahead daily and hourly intermittent reservoir inflow prediction by artificial intelligent techniques using lumped and distributed data. J. Hydrol. **2012**, 450, 293–307.
16. Rajaee, T.; Nourani, V.; Zounemat-Kermani, M.; Kisi, O. River suspended sediment load prediction: Application of ANN and wavelet conjunction model. J. Hydrol. Eng. **2011**, 16, 613–627.
17. Thirumalaiah, K.; Deo, M.C. River stage forecasting using artificial neural networks. J. Hydrol. Eng. **1998**, 3, 26–31.
18. Dawson, C.W.; Wilby, R. An artificial neural network approach to rainfall-runoff modelling. Hydrol. Sci. J. **1998**, 43, 47–66.
19. Coulibaly, P.; Anctil, F. Real-time short-term natural water inflows forecasting using recurrent neural networks. In Proceedings of the IJCNN’99. International Joint Conference on Neural Networks. Proceedings (Cat. No. 99CH36339) (IEEE), Washington, DC, USA, 10–16 July 1999; Volume 6, pp. 3802–3805.
20. Adnan, R.; Ruslan, F.A.; Samad, A.M.; Zain, Z.M. Flood water level modelling and prediction using artificial neural network: Case study of Sungai Batu Pahat in Johor. In Proceedings of the 2012 IEEE Control and System Graduate Research Colloquium (ICSGRC), Shah Alam, Selangor, Malaysia, 16–17 July 2012.
21. Kim, M.; Baek, S.; Ligaray, M.; Pyo, J.; Park, M.; Cho, K.H. Comparative studies of different imputation methods for recovering streamflow observation. Water **2015**, 7, 6847–6860.
22. Lee, J.H.; Heo, J.H. Evaluation of estimation methods for rainfall erosivity based on annual precipitation in Korea. J. Hydrol. **2011**, 409, 30–48.
23. Korean Statistical Information Service (KOSIS). Available online: http://kosis.kr/index/index.do (accessed on 24 July 2019).
24. Korean Soil Information System (KSIS). Available online: http://soil.rda.go.kr/soil/index.jsp (accessed on 24 July 2019).
25. Korea Precipitation Frequency Data Server (KPFDS). Available online: http://www.k-idf.re.kr/ (accessed on 24 July 2019).
26. Choi, K.; Lee, S.; Jang, J. Vegetative filter strip (VFS) applications for runoff and pollution management in the Saemangeum area of Korea. Irrig. Drain. **2016**, 65, 168–174.
27. Müller, A.C.; Guido, S. Introduction to Machine Learning with Python: A Guide for Data Scientists; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2016.
28. Teng, C.-M. Correcting Noisy Data. In ICML; Citeseer: San Francisco, CA, USA, 1999; pp. 239–248.
29. Kotsiantis, S.B.; Kanellopoulos, D.; Pintelas, P.E. Data preprocessing for supervised learning. Int. J. Comput. Sci. **2006**, 1, 111–117.
30. Salzberg, S.L. C4.5: Programs for Machine Learning by J. Ross Quinlan. Morgan Kaufmann Publishers, Inc., 1993. Mach. Learn. **1994**, 16, 235–240.
31. Quinlan, J.R. Induction of Decision Trees. Mach. Learn. **1986**, 1, 81–106.
32. Bishop, C.M. Neural Networks for Pattern Recognition; Oxford University Press: Oxford, UK, 1995.
33. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning Internal Representations by Error Propagation; California Univ San Diego La Jolla Inst for Cognitive Science: San Diego, CA, USA, 1985.
34. Tian, J.; Morillo, C.; Azarian, M.H.; Pecht, M. Motor bearing fault detection using spectral kurtosis-based feature extraction coupled with K-Nearest Neighbor distance analysis. IEEE Trans. Ind. Electron. **2016**, 3, 1793–1803.
35. Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. **1995**, 20, 273–297.
36. Nawar, S.; Mouazen, A.M. Comparison between random forests, artificial neural networks and gradient boosted machines methods of on-line Vis-NIR spectroscopy measurements of soil total nitrogen and total carbon. Sensors **2017**, 17, 2428.
37. Bauer, E.; Kohavi, R. Empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Mach. Learn. **1999**, 36, 105–139.
38. Drucker, H.; Cortes, C. Boosting Decision Trees. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 1996; pp. 479–485.
39. Breiman, L. Random Forest. Mach. Learn. **2001**, 45, 5–32.
40. Panchal, G.; Ganatra, A.; Kosta, Y.P.; Panchal, D. Behaviour analysis of multilayer perceptrons with multiple hidden neurons and hidden layers. Int. J. Comput. Theory Eng. **2011**, 45, 5–32.
41. Viola, P.; Jones, M.J. Robust Real-Time Face Detection. Int. J. Comput. Vis. **2004**, 57, 37–154.
42. Natekin, A.; Knoll, A. Gradient boosting machines, a tutorial. Front. Neurorobot. **2013**, 7, 21.
43. Nash, J.E.; Sutcliffe, J.V. River flow forecasting through conceptual models part I: A discussion of principles. J. Hydrol. **1970**, 10, 282–290.
44. Barfield, B.J.; Blevins, R.L.; Fogle, A.W.; Madison, C.E.; Inamdar, S.; Carey, D.I.; Evangelou, V.P. Water quality impacts of natural filter strips in karst areas. Trans. Am. Soc. Agric. Eng. **1998**, 41, 371–381.

**Figure 2.** Comparison of sediment trapping efficiencies by (**a**) Decision Tree, (**b**) Multilayer Perceptron, (**c**) k-Nearest Neighbors, (**d**) Support Vector Machine, (**e**) Random Forest, (**f**) AdaBoost, (**g**) Gradient Boosting and VFSMOD-W model with test data. (**h**) Comparison of machine learning accuracy.

**Figure 5.** Heat map to analyze correlation coefficients of model attributes and sediment trapping efficiency.

**Figure 7.** (**a**) Feature importance in Decision Tree Classifier model and (**b**) Comparison of training and test accuracy as a function of min_samples_leaf in Decision Tree Regressor model.

**Figure 8.** Comparison of training and test accuracy as functions of (**a**) n_neighbors in KNeighborsRegressor model and (**b**) hidden_layer_sizes in MLPRegressor model.

**Figure 9.** Comparison of training and test accuracy as functions of (**a**) degree in SVR model and (**b**) n_estimators in random forest regressor model.

**Figure 10.** Comparison of training and test accuracy as functions of (**a**) n_estimators in AdaBoostRegressor model and (**b**) max_depth in Gradient Boosting Regressor model.

Field size (ha) | <0.1 | 0.1–0.2 | 0.2–0.3 | 0.3–0.5 | 0.5–0.7 | 0.7–1.0 | >1.0 |
---|---|---|---|---|---|---|---|
Area (%) | 12.8 | 26.3 | 12.3 | 19.1 | 9.0 | 7.9 | 12.6 |

Soil type | Loamy coarse sand | Loamy fine sand | Loamy sand | Fine sandy loam | Sandy loam | Loam | Silt loam | Silt clay loam | Clay loam | Others |
---|---|---|---|---|---|---|---|---|---|---|
Area (%) | 0.2 | 1.2 | 0.6 | 4.0 | 26.3 | 44.3 | 16.0 | 3.9 | 2.7 | 0.7 |

Slope (%) | 0–2 | 2–7 | 7–15 | 15–30 | 30–60 |
---|---|---|---|---|---|
Area (%) | 9.2 | 30.3 | 39.5 | 19.6 | 1.4 |

Drainage class | Excessively drained | Well drained | Moderately well drained | Somewhat poorly drained | Poorly drained | Very poorly drained |
---|---|---|---|---|---|---|
Area (%) | 5.2 | 86.3 | 8.6 | 0 | 0 | 0 |

**Table 2.** Description of inputs in the vegetative filter strip modeling system (VFSMOD-W) and machine learning models.

Model Parameter | Notation | Value | Number of Values | Total Number of Scenarios |
---|---|---|---|---|
Rainfall (mm/hour) | rf | 31, 57, 67 | 3 | 53,460 |
CN | cn | 60, 74, 86 | 3 | |
Soil texture | st | 0 (Sandy loam), 1 (Loam) | 2 | |
Slope (%) | sp | 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 | 10 | |
USLE P-factor | pf | 1 | 1 | |
USLE C-factor | cf | 0.1, 0.3, 0.5 | 3 | |
Ratio of VFS area to source area (%) | rv | 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 | 11 | |
Source area (ha) | sa | 0.05, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2 | 9 | |
Vegetation | vg | Turfgrass | 1 | |
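
The total in Table 2 is the product of the per-parameter counts (3 × 3 × 2 × 10 × 1 × 3 × 11 × 9 × 1 = 53,460). The following sketch enumerates that full factorial grid with the standard library. The dictionary keys reuse the notation column of the table; the variable names `parameter_values` and `scenarios` are assumptions made for illustration, and the article does not state how the scenario files were actually generated.

```python
from itertools import product

# Parameter values copied from Table 2, keyed by the table's notation.
parameter_values = {
    "rf": [31, 57, 67],                     # rainfall (mm/hour)
    "cn": [60, 74, 86],                     # curve number
    "st": [0, 1],                           # 0 = sandy loam, 1 = loam
    "sp": list(range(2, 21, 2)),            # slope (%): 2, 4, ..., 20
    "pf": [1],                              # USLE P-factor
    "cf": [0.1, 0.3, 0.5],                  # USLE C-factor
    "rv": [0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],               # VFS/source area (%)
    "sa": [0.05, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2],     # source area (ha)
    "vg": ["Turfgrass"],                    # vegetation
}

names = list(parameter_values)

# Every combination of one value per parameter is one scenario.
scenarios = [
    dict(zip(names, combo))
    for combo in product(*parameter_values.values())
]

print(len(scenarios))  # 53460
```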

Machine Learning Models | Module | Function | Notation |
---|---|---|---|
Decision Tree | tree | DecisionTreeRegressor | DT |
Multilayer Perceptron | neural_network | MLPRegressor | MLP |
K-Nearest Neighbors | neighbors | KNeighborsRegressor | KN |
Support Vector Machine | svm | SVR | SVM |
Random Forest (ensemble model) | ensemble | RandomForestRegressor | RF |
AdaBoost (ensemble model) | ensemble | AdaBoostRegressor | AB |
Gradient Boosting (ensemble model) | ensemble | GradientBoostingRegressor | GB |
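
The module and function names in the table correspond to scikit-learn regressors. As a hedged sketch, the seven models could be instantiated as follows. The few arguments shown are taken from the hyperparameter tables in this article; everything else is left at the library defaults of whichever scikit-learn version is installed, which may differ from the version the authors used. The dictionary name `models` is an assumption made for illustration.

```python
from sklearn.ensemble import (
    AdaBoostRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Keys follow the "Notation" column of the table above.
models = {
    "DT": DecisionTreeRegressor(min_samples_leaf=1, random_state=0),
    "MLP": MLPRegressor(hidden_layer_sizes=(50, 50, 50)),
    "KN": KNeighborsRegressor(n_neighbors=4, weights="distance"),
    "SVM": SVR(kernel="rbf", C=50),
    "RF": RandomForestRegressor(n_estimators=52),
    "AB": AdaBoostRegressor(n_estimators=126, loss="linear"),
    "GB": GradientBoostingRegressor(n_estimators=100, max_depth=10),
}

for notation, model in models.items():
    print(notation, type(model).__name__)
```

Each object exposes the same `fit(X, y)` and `predict(X)` interface, so the models can be trained and compared in a single loop over the dictionary.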

Hyperparameter | Value | Hyperparameter | Value |
---|---|---|---|
criterion | Entropy | splitter | best |
min_samples_leaf | 1 | min_samples_split | 2 |
min_impurity_decrease | 0 | random_state | 0 |

Hyperparameter | Value | Hyperparameter | Value |
---|---|---|---|
hidden_layer_sizes | (50, 50, 50) | activation | relu |
solver | adam | alpha | 0.0001 |
batch_size | auto | learning_rate | constant |
learning_rate_init | 0.001 | power_t | 0.5 |
max_iter | 200 | shuffle | TRUE |
momentum | 0.9 | tol | 1.00 × 10^{−4} |
beta_1 | 0.9 | nesterovs_momentum | TRUE |
epsilon | 1.00 × 10^{−8} | validation_fraction | 0.1 |
beta_2 | 0.999 | n_iter_no_change | 10 |

Hyperparameter | Value | Hyperparameter | Value |
---|---|---|---|
n_neighbors | 4 | weights | distance |
algorithm | auto | leaf_size | 30 |
p | 2 | metric | Minkowski |
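
To illustrate what the settings `n_neighbors = 4`, `weights = distance` and `p = 2` mean in practice, here is a minimal pure-Python sketch of distance-weighted nearest-neighbor regression with a Minkowski metric. The function name `knn_predict` and the toy data are hypothetical; this is not the scikit-learn implementation, only the same idea in a few lines.

```python
def knn_predict(train_x, train_y, query, n_neighbors=4, p=2):
    """Predict a value as the inverse-distance-weighted mean of the nearest targets."""
    pairs = []
    for x, y in zip(train_x, train_y):
        # Minkowski distance; p = 2 gives the Euclidean distance.
        dist = sum(abs(a - b) ** p for a, b in zip(x, query)) ** (1.0 / p)
        pairs.append((dist, y))

    # Keep the n_neighbors closest training points.
    pairs.sort(key=lambda pair: pair[0])
    nearest = pairs[:n_neighbors]

    # A query that coincides with training points takes their mean target,
    # which avoids dividing by a zero distance.
    exact = [y for dist, y in nearest if dist == 0.0]
    if exact:
        return sum(exact) / len(exact)

    weights = [1.0 / dist for dist, _ in nearest]
    weighted_sum = sum(w * y for w, (_, y) in zip(weights, nearest))
    return weighted_sum / sum(weights)


# Toy one-feature data set (illustrative values only).
train_x = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
train_y = [0.0, 10.0, 20.0, 30.0, 100.0]

pred_between = knn_predict(train_x, train_y, (1.5,))
pred_exact = knn_predict(train_x, train_y, (1.0,))
```

With distance weighting, closer scenarios dominate the prediction, so the far-away point at 10.0 never enters the four-neighbor average for a query near 1.5.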

Hyperparameter | Value | Hyperparameter | Value |
---|---|---|---|
kernel | rbf | degree | 3 |
coef0 | 0 | tol | 1e-3 |
C | 50 | epsilon | 0.1 |
shrinking | TRUE | cache_size | 200 |
max_iter | −1 | gamma | scale |

Hyperparameter | Value | Hyperparameter | Value |
---|---|---|---|
n_estimators | 52 | criterion | mse |
min_samples_split | 2 | min_samples_leaf | 1 |
min_weight_fraction_leaf | 0 | max_features | Auto |
min_impurity_decrease | 0 | bootstrap | TRUE |
verbose | 0 | | |

Hyperparameter | Value | Hyperparameter | Value |
---|---|---|---|
n_estimators | 126 | learning_rate | 1 |
loss | linear | | |

Hyperparameter | Value | Hyperparameter | Value |
---|---|---|---|
loss | ls | learning_rate | 0.1 |
n_estimators | 100 | subsample | 1 |
criterion | friedman_mse | min_samples_split | 2 |
min_samples_leaf | 1 | min_weight_fraction_leaf | 0 |
max_depth | 10 | min_impurity_decrease | 0 |
alpha | 0.9 | verbose | 0 |
presort | Auto | validation_fraction | 0.1 |
tol | 1e-4 | | |

Method | NSE | RMSE (%) | MAPE (%) |
---|---|---|---|
Decision Tree | 0.993 | 2.15 | 2.09 |
Multilayer perceptron | 1.000 | 0.37 | 0.53 |
k-Nearest Neighbors | 0.986 | 3.02 | 2.89 |
Support Vector Machine | 0.992 | 2.29 | 2.97 |
Random Forest | 0.998 | 1.30 | 1.18 |
AdaBoost | 0.831 | 10.78 | 21.81 |
Gradient Boosting | 0.999 | 0.89 | 0.78 |
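
The three statistics in the table can be computed with the standard definitions of the Nash–Sutcliffe efficiency (NSE), root mean square error (RMSE) and mean absolute percentage error (MAPE). The sketch below assumes that the VFSMOD-W trapping efficiencies serve as the reference values and the machine learning outputs as the predictions; the function names and the four sample values are hypothetical, and the article's own formulas are not reproduced in this text.

```python
import math


def nse(reference, predicted):
    """Nash-Sutcliffe efficiency: 1 minus error variance over reference variance."""
    mean_ref = sum(reference) / len(reference)
    error = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    spread = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - error / spread


def rmse(reference, predicted):
    """Root mean square error, in the units of the data (here %)."""
    squared = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    return math.sqrt(squared / len(reference))


def mape(reference, predicted):
    """Mean absolute percentage error (%); reference values must be non-zero."""
    ratios = sum(abs((r - p) / r) for r, p in zip(reference, predicted))
    return 100.0 * ratios / len(reference)


# Illustrative trapping efficiencies (%), not values from the study.
vfsmod = [50.0, 60.0, 80.0, 90.0]
ml_model = [52.0, 58.0, 81.0, 89.0]

nse_value = nse(vfsmod, ml_model)
rmse_value = rmse(vfsmod, ml_model)
mape_value = mape(vfsmod, ml_model)
```

An NSE of 1 means the predictions match the reference exactly, while RMSE and MAPE approach 0 as the fit improves, which is the pattern the table shows for the multilayer perceptron.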

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Bae, J.H.; Han, J.; Lee, D.; Yang, J.E.; Kim, J.; Lim, K.J.; Neff, J.C.; Jang, W.S. Evaluation of Sediment Trapping Efficiency of Vegetative Filter Strips Using Machine Learning Models. *Sustainability* **2019**, *11*, 7212.
https://doi.org/10.3390/su11247212
