# Multiple Linear Regression Models for Predicting Nonpoint-Source Pollutant Discharge from a Highland Agricultural Region


## Abstract


(R^{2}), root mean square error (RMSE), coefficient of variation of the root mean square error (CV(RMSE)), the ratio of the RMSE to the standard deviation of the observed data (RSR) and the Nash–Sutcliffe model efficiency (NSE). The performance of the MLR models of pollutant load, except for total nitrogen (TN), was good according to the RSR and satisfactory according to the NSE and R^{2}. In the EMC and load/area models, the performance for suspended solids (SS) and total phosphorus (TP) was good according to the RSR and satisfactory according to the NSE and R^{2}. The standardized coefficients of the models were analyzed to identify the influential explanatory variables. In the final performance evaluation, the results of jackknife validation indicate that the MLR models are robust.

## 1. Introduction

_{5}), turbidity, TSS, and antecedent dry days were the most influential independent variables for the bacteria concentrations at the monitoring sites. Several studies have utilized linear regression techniques to predict bacteria concentrations in rivers [16,17,18,19]. Furthermore, regression models have been widely used to predict and characterize rainfall and runoff characteristics and to determine the relationship between these two variables [20,21,22,23,24,25,26]. Process-based erosion prediction models have also been established to predict the intensity of soil erosion in a particular area [27,28,29,30].

## 2. Materials and Methods

#### 2.1. Study Area and Field Data

^{3}. The basin area is 2969.3 km^{2}; the forest occupies 86.4%; and the dry field, paddy field and residential areas occupy 4.4%, 1.58%, and 1.60% of the basin, respectively.

#### 2.2. Data Analysis

_{i} is the runoff flow at n number of time steps (∆t) and C_{i} is the concentration of a water quality measurement.
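The flow and concentration series described here are the inputs to an event mean concentration (EMC). The following is a minimal sketch, assuming the usual flow-weighted definition (total pollutant mass divided by total runoff volume over the event) and a uniform time step. The function name and the sample values are illustrative and are not taken from the monitoring data.

```python
def event_mean_concentration(flows, concentrations, dt=1.0):
    """Flow-weighted mean concentration over one storm event.

    flows          -- runoff flow at each time step
    concentrations -- pollutant concentration at each time step
    dt             -- length of one time step (assumed uniform)
    """
    if len(flows) != len(concentrations):
        raise ValueError("flows and concentrations must have equal length")
    total_volume = sum(q * dt for q in flows)
    if total_volume <= 0:
        raise ValueError("total runoff volume must be positive")
    total_mass = sum(q * c * dt for q, c in zip(flows, concentrations))
    return total_mass / total_volume


# Hypothetical hydrograph and pollutograph (not measured values)
q = [1.0, 3.0, 2.0]
c = [100.0, 300.0, 150.0]
emc = event_mean_concentration(q, c)
```

With a uniform time step, dt cancels, so the EMC depends only on the relative weighting of the flows.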

#### 2.3. MLR Model Building

a_{0} is the regression constant and a_{i} is the regression coefficient of the explanatory variable X_{i}. In type 1, the original variables are used to build the MLR model. In type 2, the dependent variables (pollutant load, EMC, and load/area) are natural log-transformed to reduce skewness. In type 3, all the explanatory and dependent variables are natural log-transformed. In type 4, the dependent variables and some of the explanatory variables are natural log-transformed. The fit of the four regression equations was evaluated by the coefficients of determination of the MLR models. The MLR models were examined in terms of their ability to predict the runoff pollutant discharge for each water quality variable (SS, COD, BOD, TN, and TP).
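The four model types differ only in which variables are log-transformed before an ordinary least-squares fit. The sketch below illustrates this with NumPy. The helper names `build_design` and `fit_ols`, the choice of which columns to transform in type 4, and the synthetic data are all assumptions made for illustration; this is not the authors' code.

```python
import numpy as np


def build_design(X, y, mlr_type, log_columns=None):
    """Return (X_t, y_t) transformed according to the four MLR types.

    type 1: original variables
    type 2: ln(y) only
    type 3: ln(y) and ln of every explanatory variable
    type 4: ln(y) and ln of the columns listed in log_columns
    """
    if mlr_type not in (1, 2, 3, 4):
        raise ValueError("mlr_type must be 1, 2, 3 or 4")
    X_t = np.asarray(X, dtype=float).copy()
    y_t = np.asarray(y, dtype=float).copy()
    if mlr_type >= 2:
        y_t = np.log(y_t)
    if mlr_type == 3:
        X_t = np.log(X_t)
    elif mlr_type == 4:
        cols = list(log_columns or [])
        X_t[:, cols] = np.log(X_t[:, cols])
    return X_t, y_t


def fit_ols(X, y):
    """Ordinary least squares; returns (a0, a) with a0 the regression constant."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]


# Synthetic data (hypothetical): ln(y) = 1.0 + 0.8*ln(area) - 0.2*slope
rng = np.random.default_rng(0)
area = rng.uniform(10.0, 100.0, size=30)
slope = rng.uniform(5.0, 25.0, size=30)
y = np.exp(1.0 + 0.8 * np.log(area) - 0.2 * slope)

# Type 4: log-transform the response and the area column, keep slope as is
X4, y4 = build_design(np.column_stack([area, slope]), y, mlr_type=4, log_columns=[0])
a0, a = fit_ols(X4, y4)
```

Because the synthetic response is generated exactly from the type 4 form, the fit recovers the generating coefficients; with real event data the four types would be compared by their R^{2} values.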

R^{2}, RMSE, CV(RMSE), RSR and the NSE.

O_{i} is the observed daily load, $\overline{O}$ is the mean of the observed daily load, p_{i} is the calculated daily load, and n is the number of data values. The R^{2} index describes the ability of the model to explain variability among the data. RSR incorporates the benefits of error index statistics and includes a scaling/normalization factor; the lower the RSR is, the better the model simulation performance. The performance ratings for stream flow proposed by Moriasi et al. [34] were ‘very good’ (0.00 ≤ RSR ≤ 0.50), ‘good’ (0.50 < RSR ≤ 0.60), or ‘satisfactory’ (0.60 < RSR ≤ 0.70). NSE is a normalized statistic that reflects the relative magnitude of the residual variance compared with the variance in the observed data (good (NSE > 0.7), satisfactory (0.4 < NSE ≤ 0.7) and unsatisfactory (NSE ≤ 0.4)) [30,34].
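The five indicators can be computed from paired observed and calculated values. The sketch below assumes the common textbook definitions: RMSE with n in the denominator, CV(RMSE) as RMSE divided by the observed mean, RSR as RMSE over the standard deviation of the observations, NSE as one minus the ratio of residual to total sum of squares, and R^{2} as the squared Pearson correlation between observed and calculated values. The function names are illustrative; the RSR rating bands are those quoted above.

```python
import math


def performance_metrics(observed, predicted):
    """Return R2, RMSE, CV(RMSE), RSR and NSE for paired series."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    spp = sum((p - mean_pred) ** 2 for p in predicted)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    rmse = math.sqrt(sse / n)
    return {
        "R2": cov ** 2 / (sst * spp),
        "RMSE": rmse,
        "CV(RMSE)": rmse / mean_obs,
        "RSR": math.sqrt(sse) / math.sqrt(sst),
        "NSE": 1.0 - sse / sst,
    }


def rate_rsr(rsr):
    """RSR rating bands quoted in the text."""
    if rsr <= 0.50:
        return "very good"
    if rsr <= 0.60:
        return "good"
    if rsr <= 0.70:
        return "satisfactory"
    return "unsatisfactory"


# Hypothetical observed and calculated values
m = performance_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

Under these definitions RSR and NSE share the same two sums of squares, so a model that rates well on one necessarily rates well on the other.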

## 3. Results and Discussion

#### 3.1. Correlation Analysis between Nonpoint Pollutant Discharge and Explanatory Variables

#### 3.2. MLR Analysis

The R^{2} values for SS, COD, BOD, TN, and TP in the type 1 MLR of pollutant load ranged from 0.275 to 0.447. The R^{2} values for SS, COD, BOD, TN, and TP in the type 1 MLR of EMC and load/area were also low, indicating poor performance of the regression models. The R^{2} values for SS, COD, BOD, TN, and TP in the type 2 MLR of pollutant load were 0.76, 0.67, 0.64, 0.65, and 0.80, respectively. The R^{2} values of the type 2 MLR were quite high, but most of the VIF values were larger than 5, with a few values greater than 10. The VIF thus indicated multicollinearity in the established models, and the type 2 MLR was not adopted. Although the R^{2} values of the type 2 MLR for load/area were acceptable, the VIF values were high, again indicating multicollinearity. VIF values and other statistics of the MLRs are presented only for the selected model. The results of the MLR model employing the type 4 equation are listed in Table 7, Table 8 and Table 9.

The R^{2} values for SS, COD, BOD, TN, and TP in the type 3 MLR of the pollutant load were also fairly high, and all VIF values were less than 5. Among the type 3 MLR models, the SS, TN, and TP models of EMC and the SS and TP models of load/area showed acceptable R^{2} values. The values of R^{2} for SS, COD, BOD, TN, and TP in the type 4 MLR of pollutant load were 0.74, 0.69, 0.69, 0.61, and 0.74, respectively. The R^{2} values of the type 4 MLR were slightly better than those of the type 3 MLR, and all VIF values were less than 5. Thus, we selected the type 4 equation as the MLR model to predict the runoff pollutant discharge in the study area. However, the COD and BOD models of EMC and the COD and TN models of load/area could not adequately explain the variance in the pollutant discharge.
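The multicollinearity screen used in this selection relies on the variance inflation factor (VIF). A minimal sketch of the standard definition follows: each explanatory variable is regressed on the others, and VIF_j = 1 / (1 − R_j^{2}). The function name and the demonstration matrix are hypothetical.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j on all other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)


# Hypothetical two-variable example with correlation r = 0.8,
# so both VIFs equal 1 / (1 - 0.8**2)
X_demo = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]], dtype=float)
vif_demo = variance_inflation_factors(X_demo)
```

A model would then be screened by comparing each value with the threshold of 5 used in the text.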

Based on the R^{2} value, 73.6% of the variability of the dependent variable ln(SS load) is explained by the four explanatory variables. The MLR models indicated in Table 7, Table 8 and Table 9 are statistically significant at p < 0.0001 except for the ln(COD EMC) model (p = 0.00019). The R^{2} values for SS, COD, BOD, TN, and TP in the type 4 MLR of pollutant load were fairly high (0.614 ≤ R^{2} ≤ 0.741), as indicated in Table 7. The performance evaluation by CV(RMSE) [35] shows that the SS model was the best and that the other models of the water quality variables were also acceptable. The RSR for SS, COD, BOD, and TP in the MLR models of pollutant load (Table 7) ranged from 0.509 to 0.559, and the performance of the MLR for these variables was good [34]. The RSR for TN was 0.622, and the performance of the TN model was satisfactory. The NSE values for the MLR models of pollutant load ranged from 0.61 to 0.74, and the MLR models of the pollutant load had good performance. As a special case, in linear regression forecasting models such as those in this study, NSE is equal to the coefficient of determination, R^{2} [36]. Overall, all the MLR models of the pollutant load had good prediction performance.
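If RSR is taken as the square root of the ratio of residual to total sum of squares and NSE as one minus that ratio (the usual definitions, assumed here), then RSR = sqrt(1 − NSE) for the same data. The short check below applies this to the R^{2} (equal to NSE) and RSR values reported in Table 7; the variable names are illustrative.

```python
import math

# (NSE, RSR) pairs as reported in Table 7 for the pollutant load models
table7 = {
    "SS": (0.736, 0.514),
    "COD": (0.687, 0.559),
    "BOD": (0.694, 0.553),
    "TN": (0.614, 0.622),
    "TP": (0.741, 0.509),
}

# Largest gap between the reported RSR and sqrt(1 - NSE); rounding both
# table entries to three decimals accounts for a gap of about 0.001.
max_gap = max(abs(rsr - math.sqrt(1.0 - nse)) for nse, rsr in table7.values())
```

The reported values agree with the identity to within rounding, which is why the RSR and NSE ratings of a given model move together.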

_{i} < 0.709) and rainfall depth (0.453 < β_{i} < 0.563) are important influential parameters for all the load predictions. In addition, % field has relatively small effects on the SS, BOD and TP models.
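Standardized coefficients such as these make explanatory variables with different units comparable. A minimal sketch of the usual conversion, β_i = a_i · s(X_i) / s(Y), is given below; for the type 4 models, X_i and Y would be the transformed variables. The function name and the sample data are hypothetical.

```python
import statistics


def standardized_coefficient(a_i, x_i, y):
    """beta_i = a_i * s(x_i) / s(y), using sample standard deviations."""
    return a_i * statistics.stdev(x_i) / statistics.stdev(y)


# Hypothetical one-variable check: y = 3 + 2x exactly, so beta must be 1
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0 + 2.0 * v for v in x]
beta = standardized_coefficient(2.0, x, y)
```

The sign of β_i follows the sign of a_i, so a negative regression coefficient always yields a negative standardized coefficient.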

R^{2} values were fairly high, as indicated in Table 8. The R^{2} value for the TN model of EMC was 0.539, and the TN model was acceptable [34]. The CV(RMSE) value of the BOD model was quite high, and the model was not acceptable. The RSR values for SS and TP in the MLR models of EMC (Table 8) were 0.587 and 0.581, respectively, and the performances of these models were good. The RSR for the TN model was 0.679, and the performance of the TN model was satisfactory. However, the RSR values for the COD and BOD models were high, and these models were unsatisfactory. The NSE values for the MLR models of the EMC show that the SS, TP, and TN models were satisfactory but that the COD and BOD models were not satisfactory. The VIF values for the EMC models were lower than 5, and the mean VIF values were not large. Overall, the MLR models for SS and TP have good prediction performance, and the TN model has acceptable performance.

_{i} < 0.426) is an important factor for the TP, TN, and COD models, and rainfall depth is important for the SS and BOD models. In the pollutant load model, rainfall depth is a very important parameter, whereas rainfall intensity is not an important explanatory variable. However, rainfall intensity is an influential parameter for the EMC of a storm event. From the Pearson correlation matrix between natural log-transformed stormwater runoff discharge and subbasin characteristics in Table 5, we can also see that EMCs are better correlated with rainfall intensity than with rainfall depth, and that pollutant loads are much better correlated with rainfall depth than with rainfall intensity. In agricultural areas such as the study area, the larger the rainfall intensity, the more nutrients are released from fertilizer and vegetation roots. The standardized regression coefficients of the mean slope for the SS and TP load models have a negative sign, and the mean slope has a large negative influence on the SS and TP EMC. Additionally, % field also has a negative impact on the SS and TP EMC.

R^{2} values were fairly high, as indicated in Table 9. The R^{2} value for the BOD model of load/area was 0.51; thus, the BOD model was acceptable. The RSR values for SS and TP in the MLR models of load/area (Table 9) were 0.55 and 0.57, respectively, and the performances of these models were good. The RSR for the BOD model was 0.70, and the performance of the BOD model was satisfactory. The NSE values in the MLR models of the load/area show that the SS, TP, and BOD models were satisfactory. The VIF values for the load/area models were less than 5, and the mean VIF values were not large. Overall, the MLR models of load/area for SS and TP have good performance, and the BOD model has acceptable performance.

_{i} < 0.634) is a highly influential parameter for all response variables in the load/area prediction. The β coefficients of the mean slope for the SS and TP load/area models are −0.79 and −0.49, respectively, and the absolute values of the coefficients are comparable to the coefficients of rainfall depth, indicating that the mean slope has a strong negative influence on the SS and TP load/area results.

#### 3.3. Jackknife Validation of the MLR Model

R^{2}, RSR and NSE (Table 10). The R^{2} values were calculated by the linear regression between observed and jackknife validation values, and RSR and NSE were also calculated. The R^{2} (Figure 3) and NSE values associated with the jackknife procedure were slightly lower than the results of the MLR models, whereas the RSR values were slightly higher than those of the MLR models. Therefore, the performance of the jackknife validation was slightly worse than that of the MLR models. Because the decline was small, the results of jackknife validation indicate that the MLR models are robust.
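The comparison above can be sketched in code, assuming that the jackknife procedure means leave-one-out refitting: each storm event is predicted by a model fitted to all the other events. The function names and the synthetic data are illustrative only.

```python
import numpy as np


def jackknife_predictions(X, y):
    """Leave-one-out predictions: refit without event i, then predict event i."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        preds[i] = A[i] @ coef
    return preds


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 1.0 - np.sum((observed - predicted) ** 2) / np.sum(
        (observed - observed.mean()) ** 2
    )


# Synthetic data for 70 hypothetical storm events with two predictors
rng = np.random.default_rng(1)
X = rng.normal(size=(70, 2))
y = 1.0 + 0.6 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=70)

A = np.column_stack([np.ones(len(y)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
nse_fit = nse(y, A @ coef)                       # in-sample efficiency
nse_jack = nse(y, jackknife_predictions(X, y))   # leave-one-out efficiency
```

For least-squares fits the leave-one-out residuals are never smaller in magnitude than the in-sample residuals, so a jackknife NSE slightly below the fitted NSE is the expected pattern; a large drop would suggest that a few events dominate the fit.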

## 4. Conclusions

The R^{2} values for SS, COD, BOD, TN, and TP in the type 4 MLR of pollutant load were quite high (the best among the four examined MLR types), and all VIF values were less than 5. Thus, the type 4 equation was chosen as the MLR model to predict the runoff pollutant discharge.

The R^{2} values for the five water quality variables in the MLR of pollutant load were fairly high (0.614 ≤ R^{2} ≤ 0.741), and the RSR values for SS, COD, BOD, and TP in the MLR models of pollutant load ranged from 0.509 to 0.559. Hence, the performance of the MLR for these variables was good [34]. The RSR for TN was 0.622, and the performance of the TN model was satisfactory. The NSE values for the MLR models of the pollutant load indicated good performance. Hence, most of the MLR models of the pollutant load have good prediction performance.

Based on the R^{2}, RSR and NSE values, the performance of the jackknife validation was slightly worse than that of the MLR models. Because the difference was small, the results of jackknife validation indicate that the MLR models are robust.

## Author Contributions

## Funding

## Conflicts of Interest

## References

1. Kim, H.J.; Lee, K.K. A comparison of the water environment policy of Europe and South Korea in response to climate change. Sustainability **2018**, 10, 384.
2. Cho, J.H.; Lee, J.H. Stormwater runoff characteristics and effective management of nonpoint source pollutants from a highland agricultural region in the Lake Soyang watershed. Water **2017**, 9, 784.
3. Kim, B.; Jung, S. Turbid storm runoffs in Lake Soyang and their environmental effect. J. Korean Soc. Environ. Eng. **2007**, 29, 1185–1190.
4. Valtanen, M.; Sillanpää, N.; Setälä, H. Key factors affecting urban runoff pollution under cold climatic conditions. J. Hydrol. **2015**, 529, 1578–1589.
5. Bian, G.D.; Du, J.K.; Song, M.M.; Xu, Y.P.; Xie, S.P.; Zheng, W.L.; Xu, C.Y. A procedure for quantifying runoff response to spatial and temporal changes of impervious surface in Qinhuai River basin of southeastern China. Catena **2017**, 157, 268–278.
6. Roman, D.C.; Vogel, R.M.; Schwarz, G.E. Regional regression models of watershed suspended-sediment discharge for the eastern United States. J. Hydrol. **2012**, 472–473, 53–62.
7. Tuset, J.; Vericat, D.; Batalla, R.J. Rainfall, runoff and sediment transport in a Mediterranean mountainous catchment. Sci. Total Environ. **2016**, 540, 114–132.
8. Buendia, C.; Herrero, A.; Sabater, S.; Batalla, R.J. An appraisal of the sediment yield in western Mediterranean river basins. Sci. Total Environ. **2016**, 572, 538–553.
9. Castiglioni, S.; Lombardi, L.; Toth, E.; Castellarin, A.; Montanari, A. Calibration of rainfall-runoff models in ungauged basins: A regional maximum likelihood approach. Adv. Water Resour. **2010**, 33, 1235–1242.
10. Tramblay, Y.; Saint-Hilaire, A.; Ouarda, T.B.M.J.; Moatar, F.; Hecht, B. Estimation of local extreme suspended sediment concentrations in California Rivers. Sci. Total Environ. **2010**, 408, 4221–4229.
11. Lombardi, L.; Toth, E.; Castellarin, A.; Montanari, A.; Brath, A. Calibration of a rainfall-runoff model at regional scale by optimising river discharge statistics: Performance analysis for the average/low flow regime. Phys. Chem. Earth **2012**, 42–44, 77–84.
12. Ali, M.; Seeger, M.; Sterk, G.; Moore, D. A unit stream power based sediment transport function for overland flow. Catena **2013**, 101, 197–204.
13. Heng, S.; Suetsugi, T. Comparison of regionalization approaches in parameterizing sediment rating curve in ungauged catchments for subsequent instantaneous sediment yield prediction. J. Hydrol. **2014**, 512, 240–253.
14. Zhao, J.; Vanmaercke, M.; Chen, L.; Govers, G. Vegetation cover and topography rather than human disturbance control gully density and sediment production on the Chinese Loess Plateau. Geomorphology **2016**, 274, 92–105.
15. Paule-Mercado, M.A.; Ventura, J.S.; Memon, S.A.; Jahng, D.; Kang, J.H.; Lee, C.H. Monitoring and predicting the fecal indicator bacteria concentrations from agricultural, mixed land use and urban stormwater runoff. Sci. Total Environ. **2016**, 550, 1171–1181.
16. Eleria, A.; Vogel, R.M. Predicting fecal coliform bacteria levels in the Charles River, Massachusetts, USA. J. Am. Water Resour. Assoc. **2005**, 41, 1195–1209.
17. David, M.M.; Haggard, B.E. Development of regression-based models to predict fecal bacteria numbers at select sites within the Illinois River Watershed, Arkansas and Oklahoma, USA. Water Air Soil Pollut. **2011**, 215, 525–547.
18. Motamarri, S.; Boccelli, D.L. Development of a neural-based forecasting tool to classify recreational water quality using fecal indicator organisms. Water Res. **2012**, 46, 4508–4520.
19. Herrig, I.M.; Böer, S.I.; Brennholt, N.; Manz, W. Development of multiple linear regression models as predictive tools for fecal indicator concentrations in a stretch of the lower Lahn River, Germany. Water Res. **2015**, 85, 148–157.
20. Khan, S.; Lau, S.-L.; Kayhanian, M.; Stenstrom, M.K. Oil and grease measurement in highway runoff—Sampling time and event mean concentrations. J. Environ. Eng. **2006**, 132, 415–422.
21. Kayhanian, M.; Suverkropp, C.; Ruby, A.; Tsay, K. Characterization and prediction of highway runoff constituent event mean concentration. J. Environ. Manag. **2007**, 85, 279–295.
22. Ha, S.J.; Stenstrom, M.K. Predictive modeling of storm-water runoff quantity and quality for a large urban watershed. J. Environ. Eng. **2008**, 134, 703–711.
23. Maniquiz, M.C.; Lee, S.; Kim, L.H. Multiple linear regression models of urban runoff pollutant load and event mean concentration considering rainfall variables. J. Environ. Sci. **2010**, 22, 946–952.
24. Madarang, K.J.; Kang, J.-H. Evaluation of accuracy of linear regression models in predicting urban stormwater discharge characteristics. J. Environ. Sci. **2014**, 26, 1313–1320.
25. Feng, X.; Cheng, W.; Fu, B.; Lü, Y. The role of climatic and anthropogenic stresses on long-term runoff reduction from the Loess Plateau, China. Sci. Total Environ. **2016**, 571, 688–698.
26. Hou, X.; Zhou, F.; Leip, A.; Fu, B.; Yang, H.; Chen, Y.; Gao, S.; Shang, Z.; Ma, L. Spatial patterns of nitrogen runoff from Chinese paddy fields. Agric. Ecosyst. Environ. **2016**, 231, 246–254.
27. Smith, R.E.; Goodrich, D.C.; Quinton, J.N. Dynamic, distributed simulation of watershed erosion—The KINEROS2 and EUROSEM models. Trans. ASAE **1995**, 50, 517–520.
28. De Roo, A.P.J.; Offermans, R.J.E.; Cremers, N.H.D.T. LISEM: A single-event, physically based hydrological and soil erosion model for drainage basins. II: Sensitivity analysis, validation and application. Hydrol. Process. **1996**, 10, 1119–1126.
29. Morgan, R.P.C.; Quinton, J.N.; Smith, R.E.; Govers, G.; Poesen, J.W.A.; Auerswald, K.; Chisci, G.; Torri, D.; Styczen, M.E.; Folly, A.J. The European soil erosion model (EUROSEM): Documentation and user guide. Earth Surf. Process. Landf. **1998**, 23, 527–544.
30. Wu, B.; Wang, Z.; Shen, N.; Wang, S. Modelling sediment transport capacity of rill flow for loess sediments on steep slopes. Catena **2016**, 147, 453–462.
31. Wonju Regional Environmental Office. Monitoring and Assessment for the Nonpoint Source Pollution Management Area of Mandae, Gaah and Jaun Region; Ministry of Environment: Wonju, Korea, 2016.
32. Cho, K.H.; Kang, J.H.; Ki, S.J.; Park, Y.; Cha, S.M.; Kim, J.H. Determination of the optimal parameters in regression models for the prediction of chlorophyll-a: A case study of the Yeongsan Reservoir, Korea. Sci. Total Environ. **2009**, 407, 2536–2545.
33. Gonzalez, R.A.; Noble, R.T. Comparisons of statistical models to predict fecal indicator bacteria concentrations enumerated by qPCR- and culture-based methods. Water Res. **2014**, 48, 296–305.
34. Moriasi, D.N.; Arnold, J.G.; Van Liew, M.W.; Bingner, R.L.; Harmel, R.D.; Veith, T.L. Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. Trans. ASABE **2007**, 50, 885–900.
35. Chong, A.; Lam, K.P.; Pozzi, M.; Yang, J. Bayesian calibration of building energy models with large datasets. Energy Build. **2017**, 154, 343–355.
36. Hwang, S.H.; Ham, D.H.; Kim, J.H. A new measure for assessing the efficiency of hydrological data-driven forecasting models. Hydrol. Sci. J. **2012**, 57, 1257–1274.

**Figure 2.** Box plots of nonpoint pollutant discharge in the Lake Soyang basin. The top (**a**) and bottom (**c**) of each box represent the third and first quartiles, the solid line inside the box is the second quartile (**b**), and the dotted line inside the box is the mean. One whisker stretches from the third quartile to the maximum, and the other whisker stretches from the first quartile to the minimum.

**Figure 3.** Comparison between the observed and predicted values during storm events based on (**a**) MLR models and (**b**) the results of jackknife validation.

Stream | Subbasin Area (ha) | Land Use: Forest (ha) | Land Use: Upland Field (ha) | Land Use: Paddy Field (ha) | Land Use: Others (ha) | Proportion of Agricultural Land (%) |
---|---|---|---|---|---|---|
Jungjohangcheon | 1022 | 860 | 150 | 0 | 12 | 14.6 |
Johangcheon | 4161 | 3556 | 489 | 0.3 | 117 | 12.0 |
Jauncheon | 13,641 | 11,703 | 1445 | 0.4 | 493 | 11.0 |
Mandaecheon | 6079 | 1261 | 1261 | 420 | 3137 | 27.7 |
Gaahcheon | 4732 | 1104 | 578 | 298 | 2752 | 18.5 |

**Table 2.** Distribution of the nonpoint pollutant discharge for the 70 rainfall events in the Lake Soyang basin.

Pollutant | Min. | 25th Percentile | 50th Percentile | 75th Percentile | Max. | Mean |
---|---|---|---|---|---|---|
SS load (kg) | 613 | 44,839 | 263,083 | 1,802,409 | 46,125,100 | 1,843,969 |
COD load (kg) | 186 | 4410 | 13,192 | 44,608 | 1,686,594 | 83,565 |
BOD load (kg) | 31 | 689 | 3477 | 15,982 | 456,773 | 20,233 |
TN load (kg) | 187 | 1852 | 6563 | 20,745 | 541,563 | 28,277 |
TP load (kg) | 1.8 | 111 | 480 | 1685 | 32,406 | 2010 |
SS (EMC) (mg/L) | 3.8 | 75.6 | 157 | 338 | 1437 | 266 |
COD (EMC) (mg/L) | 1.5 | 5.13 | 7.21 | 12.2 | 43.6 | 9.16 |
BOD (EMC) (mg/L) | 0.20 | 1.1 | 1.85 | 3.47 | 9.0 | 2.49 |
TN (EMC) (mg/L) | 0.67 | 1.89 | 3.48 | 7.63 | 11.4 | 4.75 |
TP (EMC) (mg/L) | 0.011 | 0.13 | 0.27 | 0.54 | 1.96 | 0.37 |
SS (load/area) (kg/ha) | 0.129 | 5.81 | 22.6 | 95.37 | 2118 | 130 |
COD (load/area) (kg/ha) | 0.0139 | 0.57 | 1.38 | 3.54 | 44.0 | 3.12 |
BOD (load/area) (kg/ha) | 0.0078 | 0.10 | 0.34 | 1.18 | 9.5 | 0.87 |
TN (load/area) (kg/ha) | 0.0164 | 0.21 | 0.69 | 1.93 | 20.4 | 1.54 |
TP (load/area) (kg/ha) | 0.00038 | 0.012 | 0.047 | 0.12 | 5.19 | 0.16 |

Variables | Description | Units |
---|---|---|
% field | Percentage of fields | % |
SA | Subbasin area | km^{2} |
Ndry | Number of preceding dry days | day |
Rint | Rainfall intensity | mm/h |
Slope | Mean slope of the subbasin | ° |
Rain | Rainfall depth | mm |
Dur | Rainfall duration | h |

**Table 4.** Pearson correlation matrix between stormwater runoff discharge and subbasin characteristics.

Variables | % field | SA | Rain | Dur | Ndry | Rint | Slope |
---|---|---|---|---|---|---|---|
SS (load) | −0.187 | 0.102 | 0.524 | 0.236 | −0.135 | 0.260 | 0.088 |
COD (load) | −0.211 | 0.563 | 0.317 | 0.297 | 0.015 | 0.022 | 0.239 |
BOD (load) | −0.217 | 0.458 | 0.352 | 0.353 | −0.080 | 0.011 | 0.213 |
TN (load) | −0.214 | 0.488 | 0.374 | 0.316 | −0.053 | 0.057 | 0.220 |
TP (load) | −0.205 | 0.417 | 0.514 | 0.356 | −0.150 | 0.160 | 0.178 |
SS (EMC) | 0.498 | −0.293 | 0.387 | 0.181 | −0.115 | 0.251 | −0.585 |
COD (EMC) | 0.125 | −0.199 | 0.166 | −0.098 | −0.099 | 0.397 | −0.150 |
BOD (EMC) | 0.120 | −0.317 | 0.187 | 0.095 | −0.049 | 0.147 | −0.227 |
TN (EMC) | 0.196 | −0.065 | 0.157 | −0.022 | 0.227 | 0.195 | −0.207 |
TP (EMC) | 0.355 | −0.366 | 0.223 | −0.115 | −0.216 | 0.404 | −0.391 |
SS (load/area) | 0.198 | −0.166 | 0.632 | 0.313 | −0.185 | 0.283 | −0.240 |
COD (load/area) | 0.102 | −0.095 | 0.599 | 0.367 | −0.175 | 0.224 | −0.108 |
BOD (load/area) | 0.077 | −0.147 | 0.652 | 0.476 | −0.212 | 0.172 | −0.124 |
TN (load/area) | 0.132 | −0.180 | 0.583 | 0.348 | −0.172 | 0.210 | −0.137 |
TP (load/area) | 0.108 | −0.114 | 0.441 | 0.190 | −0.148 | 0.212 | −0.106 |

**Table 5.** Pearson correlation matrix between natural log-transformed stormwater runoff discharge and subbasin characteristics.

Variables | ln(% field) | ln(SA) | ln(Rain) | ln(Dur) | ln(Ndry) | ln(Rint) | Slope |
---|---|---|---|---|---|---|---|
ln(SS(load)) | −0.12 | 0.42 | 0.58 | 0.44 | −0.12 | 0.29 | −0.16 |
ln(COD(load)) | −0.38 | 0.69 | 0.43 | 0.49 | −0.12 | 0.08 | 0.25 |
ln(BOD(load)) | −0.34 | 0.60 | 0.51 | 0.49 | −0.11 | 0.18 | 0.14 |
ln(TN(load)) | −0.32 | 0.61 | 0.47 | 0.48 | −0.12 | 0.14 | 0.17 |
ln(TP(load)) | −0.16 | 0.48 | 0.59 | 0.43 | −0.16 | 0.30 | −0.06 |
ln(SS(EMC)) | 0.40 | −0.37 | 0.47 | 0.03 | −0.11 | 0.48 | −0.63 |
ln(COD(EMC)) | 0.19 | −0.40 | 0.23 | −0.12 | −0.06 | 0.33 | −0.14 |
ln(BOD(EMC)) | 0.20 | −0.44 | 0.37 | −0.06 | −0.02 | 0.44 | −0.36 |
ln(TN(EMC)) | 0.33 | −0.49 | 0.23 | −0.13 | 0.09 | 0.35 | −0.38 |
ln(TP(EMC)) | 0.45 | −0.56 | 0.36 | −0.17 | −0.08 | 0.52 | −0.59 |
ln(SS(load/area)) | 0.34 | −0.30 | 0.64 | 0.28 | −0.16 | 0.47 | −0.54 |
ln(COD(load/area)) | 0.17 | −0.18 | 0.62 | 0.39 | −0.22 | 0.37 | −0.23 |
ln(BOD(load/area)) | 0.18 | −0.24 | 0.65 | 0.35 | −0.18 | 0.43 | −0.33 |
ln(TN(load/area)) | 0.30 | −0.37 | 0.59 | 0.29 | −0.19 | 0.41 | −0.36 |
ln(TP(load/area)) | 0.37 | −0.37 | 0.65 | 0.24 | −0.21 | 0.52 | −0.52 |

Runoff Discharge Type | MLR Type | SS | COD | BOD | TN | TP |
---|---|---|---|---|---|---|
Load | Type 1 | 0.275 | 0.425 | 0.340 | 0.386 | 0.447 |
Load | Type 2 | 0.764 | 0.672 | 0.641 | 0.654 | 0.801 |
Load | Type 3 | 0.720 | 0.687 | 0.688 | 0.614 | 0.689 |
Load | Type 4 | 0.736 | 0.687 | 0.694 | 0.614 | 0.741 |
EMC | Type 1 | 0.477 | 0.157 | 0.100 | 0.254 | 0.33 |
EMC | Type 2 | 0.646 | 0.123 | 0.273 | 0.321 | 0.584 |
EMC | Type 3 | 0.536 | 0.226 | 0.324 | 0.539 | 0.592 |
EMC | Type 4 | 0.655 | 0.226 | 0.324 | 0.539 | 0.662 |
Load/Area | Type 1 | 0.448 | 0.359 | 0.460 | 0.340 | 0.195 |
Load/Area | Type 2 | 0.734 | 0.503 | 0.526 | 0.497 | 0.686 |
Load/Area | Type 3 | 0.640 | 0.424 | 0.496 | 0.471 | 0.651 |
Load/Area | Type 4 | 0.695 | 0.427 | 0.509 | 0.471 | 0.675 |

Response Variable | Explanatory Variables | a_{0} | a_{i} | β_{i} | VIF | DW | R^{2} | RMSE | CV(RMSE) | RSR | NSE |
---|---|---|---|---|---|---|---|---|---|---|---|
ln(SS load) | Intercept | 12.50 | | | | 1.773 | 0.736 | 1.230 | 0.099 | 0.514 | 0.736 |
ln(SS load) | ln(% field) | | −2.31 | −0.40 | 4.017 | | | | | | |
ln(SS load) | ln(SA) | | 0.81 | 0.58 | 1.610 | | | | | | |
ln(SS load) | ln(Rain) | | 1.78 | 0.56 | 1.008 | | | | | | |
ln(SS load) | Slope | | −0.25 | −0.75 | 3.377 | | | | | | |
ln(COD load) | Intercept | 0.44 | | | | 1.746 | 0.687 | 1.135 | 0.119 | 0.559 | 0.687 |
ln(COD load) | ln(SA) | | 0.86 | 0.71 | 1.001 | | | | | | |
ln(COD load) | ln(Rain) | | 1.24 | 0.45 | 1.001 | | | | | | |
ln(BOD load) | Intercept | 4.92 | | | | 1.938 | 0.694 | 1.172 | 0.145 | 0.553 | 0.694 |
ln(BOD load) | ln(% field) | | −1.53 | −0.30 | 4.017 | | | | | | |
ln(BOD load) | ln(SA) | | 0.80 | 0.64 | 1.610 | | | | | | |
ln(BOD load) | ln(Rain) | | 1.44 | 0.51 | 1.008 | | | | | | |
ln(BOD load) | Slope | | −0.12 | −0.41 | 3.377 | | | | | | |
ln(TN load) | Intercept | 0.72 | | | | 1.873 | 0.614 | 1.121 | 0.127 | 0.622 | 0.614 |
ln(TN load) | ln(SA) | | 0.67 | 0.63 | 1.001 | | | | | | |
ln(TN load) | ln(Rain) | | 1.19 | 0.49 | 1.001 | | | | | | |
ln(TP load) | Intercept | 3.44 | | | | 1.539 | 0.741 | 1.047 | 0.174 | 0.509 | 0.741 |
ln(TP load) | ln(% field) | | −1.36 | −0.28 | 4.025 | | | | | | |
ln(TP load) | ln(SA) | | 0.77 | 0.64 | 1.620 | | | | | | |
ln(TP load) | ln(Rain) | | 1.50 | 0.56 | 1.039 | | | | | | |
ln(TP load) | ln(Ndry) | | −0.24 | −0.14 | 1.049 | | | | | | |
ln(TP load) | Slope (°) | | −0.17 | −0.60 | 3.423 | | | | | | |

a_{0} is the regression constant; a_{i} is the regression coefficient of the explanatory variable X_{i}; β_{i} is the standardized regression coefficient.

Response Variables | Explanatory Variables | a_{0} | a_{i} | β_{i} | VIF | DW | R^{2} | RMSE | CV(RMSE) | RSR | NSE |
---|---|---|---|---|---|---|---|---|---|---|---|
ln(SS EMC) | Intercept | 10.94 | | | | 1.654 | 0.655 | 0.875 | 0.180 | 0.587 | 0.655 |
ln(SS EMC) | ln(% field) | | −1.79 | −0.50 | 4.017 | | | | | | |
ln(SS EMC) | ln(SA) | | −0.17 | −0.19 | 1.610 | | | | | | |
ln(SS EMC) | ln(Rain) | | 0.83 | 0.42 | 1.008 | | | | | | |
ln(SS EMC) | Slope | | −0.19 | −0.93 | 3.377 | | | | | | |
ln(COD EMC) | Intercept | 2.46 | | | | 1.338 | 0.226 | 0.515 | 0.252 | 0.880 | 0.226 |
ln(COD EMC) | ln(SA) | | −0.12 | −0.35 | 1.054 | | | | | | |
ln(COD EMC) | ln(Rint) | | 0.22 | 0.26 | 1.054 | | | | | | |
ln(BOD EMC) | Intercept | −0.07 | | | | 1.129 | 0.324 | 0.723 | 1.223 | 0.822 | 0.324 |
ln(BOD EMC) | ln(SA) | | −0.23 | −0.43 | 1.001 | | | | | | |
ln(BOD EMC) | ln(Rain) | | 0.42 | 0.36 | 1.001 | | | | | | |
ln(TN EMC) | Intercept | 2.47 | | | | 0.843 | 0.539 | 0.504 | 0.384 | 0.679 | 0.539 |
ln(TN EMC) | ln(SA) | | −0.29 | −0.65 | 1.054 | | | | | | |
ln(TN EMC) | ln(Rint) | | 0.25 | 0.23 | 1.054 | | | | | | |
ln(TP EMC) | Intercept | 4.62 | | | | 0.919 | 0.662 | 0.723 | −0.481 | 0.581 | 0.662 |
ln(TP EMC) | ln(% field) | | −1.14 | −0.38 | 4.017 | | | | | | |
ln(TP EMC) | ln(SA) | | −0.25 | −0.34 | 1.693 | | | | | | |
ln(TP EMC) | ln(Ndry) | | −0.19 | −0.19 | 1.057 | | | | | | |
ln(TP EMC) | ln(Rint) | | 0.75 | 0.43 | 1.100 | | | | | | |
ln(TP EMC) | Slope | | −0.12 | −0.69 | 3.396 | | | | | | |

Response Variables | Explanatory Variables | a_{0} | a_{i} | β_{i} | VIF | DW | R^{2} | RMSE | CV(RMSE) | RSR | NSE |
---|---|---|---|---|---|---|---|---|---|---|---|
ln(SS load/area) | Intercept | 5.70 | | | | 1.728 | 0.695 | 1.246 | 0.408 | 0.552 | 0.695 |
ln(SS load/area) | ln(% field) | | −1.81 | −0.33 | 3.365 | | | | | | |
ln(SS load/area) | ln(Rain) | | 1.79 | 0.60 | 1.007 | | | | | | |
ln(SS load/area) | Slope | | −0.25 | −0.79 | 3.376 | | | | | | |
ln(COD load/area) | Intercept | −3.88 | | | | 1.779 | 0.427 | 1.123 | 4.710 | 0.757 | 0.427 |
ln(COD load/area) | ln(Rain) | | 1.22 | 0.61 | 1.003 | | | | | | |
ln(COD load/area) | Slope | | −0.04 | −0.19 | 1.003 | | | | | | |
ln(BOD load/area) | Intercept | −5.65 | | | | 1.927 | 0.509 | 1.206 | −0.995 | 0.701 | 0.509 |
ln(BOD load/area) | ln(Rain) | | 1.47 | 0.63 | 1.003 | | | | | | |
ln(BOD load/area) | Slope | | −0.07 | −0.29 | 1.003 | | | | | | |
ln(TN load/area) | Intercept | −3.88 | | | | 1.873 | 0.471 | 1.121 | −2.108 | 0.727 | 0.471 |
ln(TN load/area) | ln(SA) | | −0.33 | −0.36 | 1.001 | | | | | | |
ln(TN load/area) | ln(Rain) | | 1.19 | 0.58 | 1.001 | | | | | | |
ln(TP load/area) | Intercept | −6.57 | | | | 1.519 | 0.675 | 1.090 | −0.328 | 0.570 | 0.675 |
ln(TP load/area) | ln(Rain) | | 1.53 | 0.60 | 1.033 | | | | | | |
ln(TP load/area) | ln(Ndry) | | −0.25 | −0.15 | 1.036 | | | | | | |
ln(TP load/area) | Slope | | −0.13 | −0.49 | 1.011 | | | | | | |

**Table 10.** Three performance indicators for the stormwater runoff discharge values based on jackknife validation.

Response Variable (Jackknife) | R^{2} | RSR | NSE |
---|---|---|---|
ln(SS load) | 0.694 | 0.554 | 0.693 |
ln(COD load) | 0.630 | 0.611 | 0.627 |
ln(BOD load) | 0.607 | 0.630 | 0.603 |
ln(TN load) | 0.550 | 0.674 | 0.545 |
ln(TP load) | 0.609 | 0.629 | 0.605 |
ln(SS EMC) | 0.537 | 0.684 | 0.533 |
ln(COD EMC) | 0.155 | 0.924 | 0.147 |
ln(BOD EMC) | 0.211 | 0.894 | 0.202 |
ln(TN EMC) | 0.503 | 0.730 | 0.468 |
ln(TP EMC) | 0.601 | 0.633 | 0.599 |
ln(SS load/area) | 0.655 | 0.588 | 0.654 |
ln(COD load/area) | 0.305 | 0.845 | 0.287 |
ln(BOD load/area) | 0.478 | 0.723 | 0.477 |
ln(TN load/area) | 0.413 | 0.768 | 0.410 |
ln(TP load/area) | 0.602 | 0.632 | 0.600 |

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Cho, J.H.; Lee, J.H.
Multiple Linear Regression Models for Predicting Nonpoint-Source Pollutant Discharge from a Highland Agricultural Region. *Water* **2018**, *10*, 1156.
https://doi.org/10.3390/w10091156
