Modeling Multi-Event Non-Point Source Pollution in a Data-Scarce Catchment Using ANN and Entropy Analysis

Event-based runoff–pollutant relationships have been the key for water quality management, but the scarcity of measured data results in poor model performance, especially for multiple rainfall events. In this study, a new framework was proposed for event-based non-point source (NPS) prediction and evaluation. The artificial neural network (ANN) was used to extend the runoff–pollutant relationship from complete data events to other data-scarce events. The interpolation method was then used to solve the problem of tail deviation in the simulated pollutographs. In addition, the entropy method was utilized to train the ANN for comprehensive evaluations. A case study was performed in the Three Gorges Reservoir Region, China. Results showed that the ANN performed well in the NPS simulation, especially for light rainfall events, and the phosphorus predictions were always more accurate than the nitrogen predictions under scarce data conditions. In addition, peak pollutant data scarcity had a significant impact on the model performance. Furthermore, these traditional indicators would lead to certain information loss during the model evaluation, but the entropy weighting method could provide a more accurate model evaluation. These results would be valuable for monitoring schemes and the quantitation of event-based NPS pollution, especially in data-poor catchments.


Introduction
Non-point source (NPS) pollution has resulted in the deterioration of water bodies and has become a major environmental threat in most countries [1,2]. The quantification of the rainfall-runoff process and the resulting NPS pollutants is essential for developing mitigation strategies, which are the basis for watershed management [3]. Rainfall is the major driving force for NPS pollution; thus, rainfall-runoff-pollutant (R-R-P) relationships have become the focus of watershed research [4,5]. Many studies have addressed rainfall-runoff relationships, but few have involved the runoff-pollutant relationship, especially for the event-based estimation of NPS loads [6][7][8].
NPS processes can be described at time steps ranging from single events to the long term. Event-based NPS exports and the resulting changes in water quality provide detailed features of the NPS, which makes them more appropriate for the design of storm-based management practices [9]. Models are developed to construct the runoff-pollutant relationship, and discrepancies in the measured data collected under different rainfall patterns have a considerable influence on model construction. Identifying the correlation among the rainfall, runoff and pollutant load series for multiple rainfall events is therefore indispensable for NPS model construction. Although many models are well suited for offline water quality analyses, the Soil and Water Assessment Tool (SWAT) is more representative than other models [10].
Entropy 2017, 19, 265 2 of 18
However, owing to limited human resources, data scarcity has become one of the key barriers to establishing the R-R-P relationship, especially for event-based processes [11,12]. Thus, the application of watershed models such as SWAT for assessing NPS pollution is also limited by temporal resolution, which ranges from annual to sub-hourly averages. The SWAT model usually operates continuously at a daily time step, which ensures that the long-term impacts of NPS can be quantified. Sub-daily calculations of runoff, erosion, and sediment transport are also available in newer versions of SWAT through sub-daily rainfall input and the Green and Ampt method, though few studies have attempted that higher temporal resolution. In the future, we would develop other more appropriate models to solve this problem. Currently, acceptable rainfall and streamflow data sets are more readily available, especially because of the recent development of data centers and satellite observations. However, hourly or sub-hourly flow data for high-frequency time series are still limited, especially for event-based hydrological studies in data-poor regions [7]. Water quality records, which are based on periodic manual monitoring, are even scarcer. Therefore, data scarcity for NPS predictions is unavoidable in multiple-rainfall-event simulations. Typically, we collected samples during multiple rainfall events in the monitoring process but discarded some events from further analysis, especially light rainfall events for which only a few data points exist. This treatment of incomplete data results in a loss of information, especially for multiple rainfall events in data-scarce regions.
Currently, statistical models are widely used to estimate rainfall-runoff relationships because they are easy to apply and do not require a large number of detailed formulas and parameters [13]. For example, the unit hydrograph (UH), one of the best-known methods, is used to estimate the direct runoff hydrograph for a given rainfall duration. Statistical models are also used to simulate pollutant loads based on an established runoff-pollutant relationship. For example, Park and Engel [14] used Load Estimator (LOADEST) to predict pollutant concentration (or load) on days when flow data were measured, and the results showed that the absolute errors in the annual sediment load estimation decreased from 39.7% to 10.8%. Most of the findings demonstrate that the LOADEST model can provide more accurate results and may be useful for simulating runoff-pollutant processes [15][16][17]. However, the LOADEST model has strict requirements on the number of data points, which should include continuous flow data and dispersed water quality data, and its calibration process is relatively complicated.
Owing to the limited measured data, a black-box model may be a substitute for constructing the logical relationships between runoff and pollutant loads for multiple-event processes. The artificial neural network (ANN), with its characteristics of self-learning and adaptability, has become one of the most commonly used tools in environmental prediction, and it is also applicable to data-poor regions. This method simulates the imaginal thinking of the human brain, whose most prominent characteristics are the parallel processing of information and distributed storage. As an example, Melesse et al. [18] used the ANN to estimate suspended sediment loads for three major rivers. The results showed that daily predictions were better than weekly predictions. ANN models also have flexible structures that allow multi-input and multi-output modeling. This is particularly important in streamflow forecasting, where inflows at multiple locations are considered within a given catchment [19]. Though the application of ANN in the field of load prediction has proliferated in recent years, the impact of data scarcity on its prediction capabilities during different rainfall patterns still creates limitations [20].
Simulation evaluation is the most important step in setting up statistical models [21]. In traditional applications, model evaluation is usually performed using a single regression goodness-of-fit indicator, most commonly computed from the point-to-point pairs (a series of single data pairs) of the predicted and measured data. However, this might lead to the loss of specific information, resulting in doubtful simulation results. In this case, a joint evaluation should replace the traditional single indicator. With its high precision and objectivity, entropy quantifies the uncertainty of different criteria from different perspectives [22]. Compared with the traditional single indicators, it can combine different indicators to evaluate discrepancies comprehensively. For instance, Khosravi et al. [23] sought to map flooding susceptibility using different bivariate methods, including Shannon's entropy, the statistical index and the weighting factor. Yuan et al. [24] developed an entropy method that maximizes the weighted sum of the information entropy to allocate pollutant reductions among the seven main valleys in China. The entropy weighting method may be an efficient way to evaluate the simulation results and to balance their strengths and weaknesses. However, these studies pay little attention to event-based NPS predictions, especially for data-scarce catchments.
This study surveys the motivation for the methodology, examines the difficulties posed by data scarcity, and outlines the need to develop possible methods to cope with data scarcity in multiple rainfall events. The objectives of this work are: (1) to identify the impacts of different rainfall patterns on the model construction using a complete data series; (2) to simulate the scarce pollutant data in other data-scarce rainfall events; and (3) to test the application of the entropy weighting method for the evaluation of the ANN.

Materials and Methods
A prediction-evaluation framework is proposed for NPS prediction in data-scarce catchments; the flow chart of the methods is shown in Figure 1. The ANN is used to simulate the missing data points during multiple events, and the entropy weighting method is used as a comprehensive indicator to construct the model. As a necessary supplement, the interpolation method is used for tail correction during multiple rainfall events. Data-scarce rainfall events are events in which data, especially measured flow and water quality data, are absent for a given period of time because of human mistakes during the high-resolution monitoring process. Conversely, complete rainfall events are those with no gaps in the measured flow and water quality data. The traditional indicators are described in Section 2.2.

The Description of the ANN
The back propagation ANN is a multilayer feedforward network trained with the error back propagation algorithm [2,26], a supervised learning method based on the commonly used steepest descent method to minimize global errors [25]. It learns and stores a large number of input-output mapping relations and does not require the mathematical equations that describe these mappings to be specified before calculation. The ANN adjusts its weights and thresholds through back propagation to minimize the sum of the squared errors. As shown in Figure 2, the topological structure of the ANN consists of an input layer, a hidden layer and an output layer [27].
The learning mechanism of the ANN is shown in Figure 2, where $x_i$ is the input signal and $w_i$ is the weight coefficient. The outside input samples $x_1, x_2, \ldots, x_n$ are accepted into the input layer, and the network weight coefficients are adjusted during training. The discrete values, 0 and 1, are selected as the input sampling signals. The network output signals are compared with the expected output signals to generate the error signals, and the weight coefficients of the learning system are rectified through iterative adjustments until the errors fall within an acceptable range [28]. In this process, the expected output signals are regarded as the teacher signals, which are compared with the actual output, and the errors produced are applied to rectify the weight coefficients. When the actual output values and expected values are nearly the same, the process is concluded [26]. Finally, the results are produced through an equation of U based on the weight coefficients and are exported by the output layer. In the ANN training process, three prime criteria can be summarized: the error surface gradient converges rapidly, the mean squared error is below the preset error level, and the correlation coefficient of the training results is more than 0.9, which indicates good training results [29]. The ANN is judged by whether each of these indicators reaches the given standard.
In this study, multiple rainfall events are used as the input conditions. The rainfall events are assigned to either the training process or the simulation process based on their data conditions. To establish the black-box model, data from three complete rainfall events, covering light, moderate, and heavy rainfall patterns, are first fed into the input layer. The training results also indicate that the ANN is applicable to various rainfall patterns. In the simulation process, the flow data for all the rainfall events and the water quality information for the data-complete rainfall events of the same rainfall pattern are regarded as the input layer. The hidden layer contains the water quality data for the data-scarce rainfall events, which correspond to all the flow and water quality data in the input layer. To obtain the output layer, the training layer feeds the results back into the prediction interval. Finally, the output layer is simulated using the input data of the input layer.
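The training loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single scaled input (for example flow), a single output (for example pollutant load), one tanh hidden layer, and batch steepest descent. The function names `train_bp`, `predict`, and `pearson_r`, the learning rate, and the preset error level are hypothetical choices made only for illustration.

```python
import math
import random


def forward(params, x):
    """Return the hidden activations and the network output for one scalar input."""
    w1, b1, w2, b2 = params
    h = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    y = sum(w * hj for w, hj in zip(w2, h)) + b2
    return h, y


def predict(params, x):
    """Network output for one scalar input."""
    return forward(params, x)[1]


def pearson_r(a, b):
    """Correlation coefficient, used here as the 'more than 0.9' training criterion."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def train_bp(xs, ts, n_hidden=4, lr=0.1, max_epochs=10000, mse_goal=1e-3, seed=0):
    """Batch steepest descent on the mean squared error (assumed settings)."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # input -> hidden weights
    b1 = [0.0] * n_hidden                                    # hidden thresholds
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # hidden -> output weights
    b2 = 0.0                                                 # output threshold
    n = len(xs)
    mse = float("inf")
    for _ in range(max_epochs):
        g_w1 = [0.0] * n_hidden
        g_b1 = [0.0] * n_hidden
        g_w2 = [0.0] * n_hidden
        g_b2 = 0.0
        sq_err = 0.0
        for x, t in zip(xs, ts):
            h, y = forward((w1, b1, w2, b2), x)
            e = y - t                      # error signal against the teacher signal
            sq_err += e * e
            g_b2 += e
            for j in range(n_hidden):
                g_w2[j] += e * h[j]
                g_h = e * w2[j] * (1.0 - h[j] ** 2)  # error propagated back through tanh
                g_w1[j] += g_h * x
                g_b1[j] += g_h
        mse = sq_err / n
        if mse < mse_goal:                 # stop once the preset error level is reached
            break
        step = 2.0 * lr / n
        for j in range(n_hidden):
            w1[j] -= step * g_w1[j]
            b1[j] -= step * g_b1[j]
            w2[j] -= step * g_w2[j]
        b2 -= step * g_b2
    return (w1, b1, w2, b2), mse
```

After training, `pearson_r` applied to the measured and predicted series gives a quick check of the correlation criterion; the stopping test on `mse` corresponds to the preset error level.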

The Description of the Entropy Weighting Method
Three commonly used indicators, the mean relative error ($\bar{d}$), the standard deviation of the relative error ($S$), and the load deviation percentage (deviation), are selected to evaluate the simulation results [30,31]. Their formulas are defined in terms of the measured data $O_i$, the predicted data $P_i$, the total loads of the original conditions $O_i^{\mathrm{origin}}$, and the mean value of the measured data. Each of the three indicators represents the credibility of the measurements based on the discrepancy between the measured and simulated values. Lower indicator values indicate a better fit between the simulated and measured data, in which case the model is considered to have a satisfactory performance. However, any single indicator is limited because it loses part of the information. Therefore, these indicators are handled with the entropy weighting method for a more comprehensive assessment of the ANN. Based on the fundamental principles of information theory, information is a measurement of the degree of order of a given system, and entropy is a measurement of the degree of disorder [32]. The entropy weighting method is a mathematical method that considers the information provided by each factor [33]. The information entropy of an indicator decreases as the information it provides increases, and a smaller information entropy results in a higher weight for that indicator. As an objective and comprehensive method, the entropy weighting method considers the advantages of every indicator and makes a synthetic evaluation. The procedure is as follows. First, an $n \times m$ original data matrix $X = (x_{ij})$ is established according to the selected evaluation indicators, where $m$ is the number of evaluation indicators and each of the $n$ rows represents a different evaluation object.
Second, a positive matrix $Y = (y_{ij})$ is established by transforming $X$ so that all indicators follow the same trend. Matrix $Y$ is then normalized by taking the ratio of each element $y_{ij}$ to the sum of the elements in its column, which gives the elements $Z_{ij}$ of the normalized matrix. The entropy of each evaluation indicator is computed from $Z_{ij}$ with the normalizing constant $k = 1/\ln n$, where $Z_{ij}$ is the probability of the $j$-th element of the $i$-th evaluation unit. The entropy values of the evaluation indicators are then transformed into the weights $w_j$, which satisfy $0 \le w_j \le 1$ and $\sum_{j=1}^{m} w_j = 1$. Finally, the comprehensive weighting value is determined: the weight of each indicator is multiplied by the corresponding indicator value, and the products are summed. The result is $U$, the comprehensive evaluation function of the entropy weights for the evaluation indicators. This function reflects the comprehensive characteristics of the evaluation objective and avoids the limitations of the individual indicators [34].
The principle of the entropy weighting method is that the information for each evaluation unit is quantified and synthesized, while every factor is weighted to simplify the evaluation process [35]. Therefore, the weight values can be ascertained with the entropy weighting method, and we choose the deviation, $\bar{d}$, and $S$ as the evaluation indicators.
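The weighting steps can be sketched in a few lines of code. The sketch below follows the textbook form of the entropy weighting method, which is assumed here rather than taken from the paper's own equations: column-wise normalization, entropy $H_j = -k \sum_i Z_{ij} \ln Z_{ij}$ with $k = 1/\ln n$, and weights proportional to $1 - H_j$. The function names `entropy_weights` and `composite_scores` are hypothetical, and the input is assumed to be the already transformed positive matrix $Y$.

```python
import math


def entropy_weights(y):
    """Entropy weights for an n x m matrix of positive, same-trend indicator values.

    Assumed textbook form: Z_ij = y_ij / sum_i y_ij, H_j = -k * sum_i Z_ij ln Z_ij
    with k = 1 / ln n, and w_j = (1 - H_j) / sum_j (1 - H_j).
    """
    n, m = len(y), len(y[0])
    k = 1.0 / math.log(n)
    col_sums = [sum(row[j] for row in y) for j in range(m)]
    z = [[row[j] / col_sums[j] for j in range(m)] for row in y]
    h = [
        -k * sum(z[i][j] * math.log(z[i][j]) for i in range(n) if z[i][j] > 0)
        for j in range(m)
    ]
    d = [max(0.0, 1.0 - hj) for hj in h]   # information carried by each indicator
    total = sum(d)
    if total == 0.0:                       # all indicators uniform: fall back to equal weights
        w = [1.0 / m] * m
    else:
        w = [dj / total for dj in d]
    return z, h, w


def composite_scores(z, w):
    """Weighted sum of the normalized indicators for each evaluation object."""
    return [sum(wj * zij for wj, zij in zip(w, row)) for row in z]
```

An indicator that takes the same value for every evaluation object has an entropy of 1 and receives a weight of about zero, which matches the statement that a smaller information entropy leads to a higher weight.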

Method for Tail Correction
Statistical models produce tail deviation problems when data are scarce, as in this study. This problem is addressed through data interpolation for the tail deviation. Linear interpolation, a commonly used method, is used to obtain the missing values between the other data points. Two values of the function $f(x)$, at $x_0$ and $x_1$, are used to reduce the errors in the tail of the pollutographs. This approach is relatively straightforward and is widely used in mathematics and computer graphics. The error of the approximation is defined as

$$R_T = f(x) - \rho(x)$$

where $\rho$ represents the linear interpolation polynomial:

$$\rho(x) = f(x_0) + \frac{f(x_1) - f(x_0)}{x_1 - x_0}\,(x - x_0)$$

As a result of Rolle's theorem, if $f(x)$ has two continuous derivatives, the error is bounded by

$$|R_T| \le \frac{(x_1 - x_0)^2}{8} \max_{x_0 \le x \le x_1} \left| f''(x) \right| \tag{12}$$

As shown in Formula (12), the approximation error of the linear interpolation increases with the curvature of the function.
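A short sketch may make the tail correction concrete. It fills missing points of a pollutograph by linear interpolation between the nearest measured neighbours and evaluates the standard error bound $(x_1 - x_0)^2/8 \cdot \max|f''|$. The function names and the convention of marking missing values with `None` are assumptions for illustration only.

```python
def linear_interpolate(x0, f0, x1, f1, x):
    """Value at x of the straight line through (x0, f0) and (x1, f1)."""
    return f0 + (f1 - f0) / (x1 - x0) * (x - x0)


def interpolation_error_bound(x0, x1, max_abs_second_derivative):
    """Upper bound on the linear interpolation error on [x0, x1]."""
    return (x1 - x0) ** 2 / 8.0 * max_abs_second_derivative


def fill_gaps(times, values):
    """Fill None entries lying between two measured points.

    Leading or trailing None entries have no neighbour on one side
    and are left unchanged.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            filled[i] = linear_interpolate(
                times[a], values[a], times[b], values[b], times[i]
            )
    return filled
```

For $f(x) = x^2$ on $[0, 2]$, the interpolated value at $x = 1$ is 2 while the true value is 1, so the error of 1 equals the bound $(2 - 0)^2/8 \cdot 2 = 1$. This shows how the error grows with the curvature and with the spacing between measured points.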

Study Areas
As shown in Figure 3, the Zhangjiachong catchment, which is a representative area in the Three Gorges Reservoir Region (TGRR), is selected as a case study [36]. It covers a drainage area of 1.62 km², and the landscape is primarily mountainous, with an elevation between 148 m and 530 m above the Yellow Sea level. Agriculture and forests cover the majority of the total area. The main local crops are tea, corn, oilseed rape, and chestnuts [37]. The background values of nitrogen and phosphorus are high because fertilizer usage is relatively high, resulting in a high risk of nutrient loss into nearby streams [38].
The average annual temperature is approximately 18 °C, and the average annual precipitation is approximately 1439 mm, 80% of which occurs from May to August. Thus, soil erosion frequently occurs during wet seasons and results in an increase in the pollutant loads with increased runoff. We consider that the variation of rainfall might affect the model accuracy. Therefore, the rainfall patterns should be classified before any simulations. According to the existing rainfall data, rainfall patterns are divided into light, moderate, and heavy events. Based on our monitoring data, the majority of rainfall events in the Zhangjiachong catchment are moderate events, while heavy events are rare.

Field Monitoring and Data Record
In this study, field monitoring data were collected from 1 January 2013 to 31 December 2014, and the rainfall, streamflow and pollutant data during eight rainfall events were recorded. The data used in this study represent three complete rainfall events, which include light, moderate, and heavy rainfall (21 April 2014, 24 July 2014, and 5 August 2014), and five other data-scarce events (15 April 2014, 23 August 2014, 20 July 2014, 5 July 2013, and 28 August 2013). Data-scarce rainfall events are events in which data, especially measured flow and water quality data, are absent for a given period of time because of human mistakes during the high-resolution monitoring process. Conversely, complete rainfall events are those with no gaps in the measured flow and water quality data. The equations with explicit parameters are constructed through a training process with the complete data of the three complete rainfall events, and the constructed ANN is used to predict the missing NPS data in the other five data-scarce rainfall events. The output layer includes the pollutant load data for the five data-scarce rainfall events.
The weather station (Skye Lynx Standard) provided continuous records of climate data, and a float-operated sensor (WGZ-1) was located at the catchment outlet, where high-frequency sampling was recorded in approximately 15 min steps. Base flows were measured before the runoff started, and water samples were collected every 15 min in the first hour after runoff began and every 30 min over the following two hours. After water levels had stabilized, water samples were collected once every hour until the end of the event. All water samples were placed in pre-cleaned glass jars with aluminum foil liners along the lids and stored at −20 °C during transportation to the laboratory for processing and analysis. Specifically, the total nitrogen of NPS (NPS-TN) levels were measured via the alkaline persulfate oxidation-UV spectrophotometric method, with a detection range from 0.05 mg/L to 4.0 mg/L, while the total phosphorus of NPS (NPS-TP) levels in the samples were measured via the potassium persulfate oxidation-molybdenum blue colorimetric method. The main instrument was an ultraviolet spectrophotometer. Finally, the recorded rainfall, flow and pollutant levels were used for the following analysis.
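The sampling protocol can be written out as a small schedule generator. This is only a sketch under stated assumptions: times are counted in minutes from the start of runoff, the first sample is assumed to be taken at 15 min, and the switch to hourly sampling is assumed to happen after the third hour, whereas in the field it depended on when the water level stabilized. The function name `sampling_times` is hypothetical.

```python
def sampling_times(event_minutes):
    """Sample times in minutes after runoff begins (assumed schedule).

    Every 15 min during the first hour, every 30 min during the next
    two hours, then hourly until the end of the event.
    """
    times = []
    t = 0
    while True:
        if t < 60:
            step = 15
        elif t < 180:
            step = 30
        else:
            step = 60
        t += step
        if t > event_minutes:
            break
        times.append(t)
    return times
```

For a five-hour event this schedule yields ten samples, which illustrates how few points describe the recession limb and why a missed peak sample removes a large share of the event information.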
However, flow and water quality data were limited because flow measurement and sampling relied on manual collection. Rainfall levels were recorded to divide the rainfall into light, moderate, and heavy events. The rainfall levels for 21 April 2014, 24 July 2014, and 5 August 2014 were 1.308 mm/h, 3.000 mm/h, and 6.054 mm/h, respectively. The flow data were supplemented with unit hydrographs as the basis for the ANN. In addition, this catchment is dominated by agriculture, so fertilizer use results in deteriorated water quality. Therefore, the NPS-TN and NPS-TP are selected as the evaluation indicators. All the data for the three complete rainfall events and five typical data-scarce rainfall events are shown in Tables 1 and 2, respectively, including the rainfall intensity, flow data, and the pollutant concentrations of the NPS-TN and NPS-TP. As shown in Table 1, complete data are used as the input of the ANN and represent the impacts of the rainfall patterns on the model applicability. As shown in Table 2, data scarcity of the five random rainfall events is simulated, and the impacts of the data scarcity are quantified.

Training Results of the ANN Using the Complete Data
This section demonstrates the training process with data for the three complete rainfall events, illustrating the applicability of the ANN to different rainfall patterns. The training results for the ANN are as follows (the figure is shown in the Supplementary Materials): the error surface gradient rapidly converges to a flat surface for both the NPS-TN and NPS-TP. The mean squared error of the training results for the NPS-TP prediction reaches the $10^{-3}$, $10^{-2}$, and $10^{-1}$ orders of magnitude for the light, moderate, and heavy rainfall events, respectively. However, the mean squared error for the NPS-TN prediction reaches the 1.0, $10^{-3}$, and 1.0 orders of magnitude during the light, moderate, and heavy rainfall events, respectively, indicating that all the results fall within the range of permissible errors or rapidly reach a flat surface. The correlation coefficients are all greater than 0.9, indicating that all the training results are good. In this respect, the ANN is applicable to simulating the NPS for different rainfall patterns, and we extended the ANN to pollutant load simulations in the data-scarce rainfall events.
To better understand the simulation results, the entropy weighting method was used in the evaluation process. As shown in Table 3, the K values are all higher than 0.9, indicating that there is no obvious deviation between the simulated and measured values. Meanwhile, the K values for the NPS-TP are higher than those for the NPS-TN in every rainfall pattern, and the K value for light rainfall is higher than for the other rainfall patterns. Therefore, the NPS-TP simulation is more accurate than the NPS-TN simulation, and the simulation is best suited to light rainfall events for both the NPS-TP and NPS-TN. The flow exerts a different shear force in different rainfall patterns, and the soil particles and pollutants behave differently under different rainfall levels and intensities. It is possible that our monitoring scheme is more appropriate for light rainfall patterns in this experiment, and that the peak data could not be captured during heavy rainfall patterns [39]. The NPS-TN concentration peak and flow peak appear to be consistent. When either the flow peak or the load peak is missing, the effect is the same as both missing simultaneously, resulting in a poor simulation. However, the timing of the NPS-TP concentration peak and the flow peak is inconsistent across rainfall patterns. Xu et al. [40] introduced the support vector regression (SVR) model to develop a quantitative relationship between environmental factors and eutrophic indices, and compared it with the ANN. The results show that the correlation coefficients of the NPS-TP are greater than those of the NPS-TN, indicating that the model performs better for the NPS-TP than for the NPS-TN. This study verifies this conclusion with the ANN model.
Five typical data-scarce rainfall events were used to discuss the impact of different data-scarce patterns on the NPS predictions. As shown in Figure 4, compared with the NPS-TN, the NPS-TP training results show a faster gradient convergence rate and lower mean squared errors. The training values for the NPS-TN have an R² value of more than 0.9, and the mean squared errors are under the permissible values or reach the flat surface rapidly. However, one event (5 July 2013) was the only one whose gradient fell outside the preset value, and its training effect was the worst because this rainfall event lacks peak data.
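The entropy weighting step used in this evaluation can be sketched as follows. This is a minimal illustration of the standard entropy weight method, not a reproduction of the exact formulation behind the K values in Tables 3 and 4. The function names `entropy_weights` and `composite_scores`, and the assumption that every indicator has already been rescaled so that larger values mean a better fit, are ours.

```python
import math


def entropy_weights(matrix):
    """Return one weight per indicator (column) using the entropy weight method.

    `matrix` holds one row per rainfall event and one column per indicator.
    Assumption: all entries are non-negative and oriented so larger is better.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    if n_rows < 2:
        raise ValueError("at least two events are needed")
    k = 1.0 / math.log(n_rows)  # scales the entropy into [0, 1]
    divergences = []
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total <= 0:
            raise ValueError("each indicator needs a positive column sum")
        # Share of each event in indicator j; 0 * ln(0) is treated as 0.
        shares = [v / total for v in column]
        entropy = -k * sum(p * math.log(p) for p in shares if p > 0)
        # An indicator that barely varies carries little information.
        divergences.append(max(0.0, 1.0 - entropy))
    d_sum = sum(divergences)
    if d_sum == 0:
        # Every indicator is uniform, so fall back to equal weights.
        return [1.0 / n_cols] * n_cols
    return [d / d_sum for d in divergences]


def composite_scores(matrix, weights):
    """Weighted sum of the indicators for each event."""
    return [sum(w * v for w, v in zip(weights, row)) for row in matrix]


# Illustrative values only, not taken from the tables of this study.
indicators = [[0.9, 0.2], [0.9, 0.5], [0.9, 0.8]]
weights = entropy_weights(indicators)
scores = composite_scores(indicators, weights)
```

In this toy input the first indicator is identical for all events, so nearly all of the weight goes to the second indicator, which is the behaviour that lets the composite score discriminate between events.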
The entropy values in the five data-scarce rainfall patterns are shown in Table 4. Combined with the complete data events, it is apparent that the simulation for 5 July 2013 has a worse fit than those for the other rainfall events, which reflects the poor training effect when peak concentration data are scarce. The peak data are the key information and reflect the overall process of the rainfall events. However, the peak scarcity is unintentional and due to system errors. In addition, the training effect of the NPS-TP is better than that of the NPS-TN.
We further compared the evaluation results between the traditional methods and the entropy weighting method, which are shown in Table 4. As shown in the results, the rank order of the simulation results (high to low) depends on which traditional indicator is used. The application of a single indicator is limited by the indicator selection, so we cannot simply sum the indicators or select one of them. For instance, the event of 5 July 2013 showed the best deviation but the worst S, which represent the rainfall amount and the average rainfall, respectively. Therefore, choosing these traditional indicators would lead to information loss during the model evaluation. Conversely, the entropy weighting method considers the advantages and characteristics of each traditional indicator and assesses the simulation results comprehensively from different perspectives [34,35]. Thus, the K values are more accurate and easier to compare.
Owing to the limited water quality data, we randomly selected 30% of the measured values as verification points, and the simulated data points were compared with the selected data to test the accuracy of the ANN under data-scarce conditions. The evaluated results are shown in Table 5, and the most intuitive indicator is the mean percentage of the load deviation. As shown in Table 5, the training results for the runoff-pollutant load process are good across the different rainfall events, and each of the load deviations is small. The mean percentage load deviations of three events (15 April 2014, 23 August 2014, 28 August 2013) are higher than those of the other events. This is because the pollutant load data for these three rainfall events have peak loads near the flow peaks. It is apparent that the flow peak and these high values have major impacts on the training and predictive values of the ANN. In general, the simulation effects are good, which shows that this method is feasible for estimating scarce pollutant load data.
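The verification step described above can be sketched as follows. The helper names `split_verification` and `mean_percentage_deviation` are hypothetical, and the deviation is assumed here to be the mean of |simulated − measured| / measured in percent; the exact definition used for Table 5 may differ.

```python
import random


def split_verification(points, fraction=0.3, seed=0):
    """Randomly hold out `fraction` of the measured points for verification.

    The seed is only there to make the illustration repeatable.
    """
    rng = random.Random(seed)
    n_hold = max(1, round(len(points) * fraction))
    held = set(rng.sample(range(len(points)), n_hold))
    train = [p for i, p in enumerate(points) if i not in held]
    verify = [p for i, p in enumerate(points) if i in held]
    return train, verify


def mean_percentage_deviation(measured, simulated):
    """Mean of |simulated - measured| / measured, in percent.

    Assumed definition; measured values equal to zero are skipped.
    """
    ratios = [abs(s - m) / m for m, s in zip(measured, simulated) if m != 0]
    if not ratios:
        raise ValueError("no non-zero measured values to compare against")
    return 100.0 * sum(ratios) / len(ratios)


# Illustrative load values only, not data from this study.
measured_loads = [10.0, 20.0]
simulated_loads = [11.0, 18.0]
deviation = mean_percentage_deviation(measured_loads, simulated_loads)
```

Because the held-out points are removed before training, a peak value that lands in the verification set is unseen by the ANN, which is one reason events with loads near the flow peak can show larger deviations.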

Implication for NPS Studies of Multi-Events
Figure 5 compares the pollutographs of complete rainfall events with the pollutographs simulated by the ANN in the same rainfall pattern (data-scarce rainfall events in light and moderate rainfall patterns). Most of the pollutographs conform to the ordinary rules (pollutographs in complete data rainfall events), and the overall tendency is consistent with the hydrographs with complete data, indicating that the method is reliable. Moreover, the model performance is worse under conditions of missing peak data, which is consistent with the abovementioned conclusions. According to the comparisons, the pollutographs with measured points are more consistent with the ordinary rules than those whose tails have missing data. Meanwhile, tail scarcity often appears in actual monitoring because sampling is cut short to reduce manpower [41]. The tails of the pollutographs have stronger linear characteristics, so linear interpolation is used to amend the incomplete tails [42]. The pollutographs amended by linear interpolation are shown in Figure 6, indicating that the pollutographs with the tail correction agree more closely.
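The tail amendment can be illustrated with a short sketch. The function `fill_linear` is a hypothetical helper, and it assumes that the tail is bounded by a later measured value; missing values with no measured point after them are left unfilled rather than extrapolated.

```python
def fill_linear(times, values):
    """Fill None entries that lie between two measured values.

    `times` and `values` are equally long; `times` must be strictly increasing.
    Gaps before the first or after the last measured value are left as None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        t0, t1 = times[left], times[right]
        v0, v1 = values[left], values[right]
        for i in range(left + 1, right):
            # Position of the missing point between its two neighbours.
            frac = (times[i] - t0) / (t1 - t0)
            filled[i] = v0 + frac * (v1 - v0)
    return filled


# Illustrative pollutograph tail (hours, load); values are not from this study.
hours = [0, 1, 2, 3, 4]
loads = [8.0, 6.0, None, None, 3.0]
amended = fill_linear(hours, loads)
```

Linear filling is only reasonable on the receding limb, where the pollutograph is close to linear; applying it across a missing peak would flatten exactly the information that the model needs.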
During the monitoring process, emphasis is placed on the discrepancy in the monitoring mechanism under different rainfall conditions. Based on the abovementioned analysis, the NPS prediction performs best during light rainfall events and worst during heavy rainfall events. Therefore, the monitoring process for the NPS can be appropriately focused on heavy rainfall conditions. Researchers should pay more attention to monitoring time to avoid peak data scarcity, especially for the NPS-TN monitoring [43]. As already suggested, the peak concentration appeared after nearly five hours of runoff during light rainfall events and after nearly three hours of runoff during moderate events. Therefore, we recommend peak monitoring techniques; for example, with a programmable automatic sampler, the sampling intervals can be appropriately shortened around the peak lag times. Meanwhile, based on the pollutographs improved by this study, we can design the sampling scheme and avoid the risk periods in order to acquire complete water quality data. In addition, the entropy weighting method can be effectively used to evaluate the measured and simulated data [44], showing that it can be used to comprehensively assess the discrepancies more accurately and to easily compare the results, which can be generalized to other catchments.
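A sampling scheme of this kind can be sketched as a schedule that is denser around the expected peak lag. Only the approximate lag times (about five hours for light events and about three hours for moderate events) come from this study; the function name `sampling_times`, the event duration, the step sizes, and the window width are hypothetical values chosen for illustration.

```python
def sampling_times(duration_min, peak_lag_min,
                   base_step_min=120, peak_step_min=30, window_min=60):
    """Return sampling times in minutes since the start of runoff.

    A coarse grid covers the whole event, and a finer grid is added within
    `window_min` of the expected peak lag. All defaults are assumptions.
    """
    base = set(range(0, duration_min + 1, base_step_min))
    start = max(0, peak_lag_min - window_min)
    end = min(duration_min, peak_lag_min + window_min)
    dense = set(range(start, end + 1, peak_step_min))
    return sorted(base | dense)


# Light event: peak expected about 5 h (300 min) after runoff begins.
light_schedule = sampling_times(600, 300)
# Moderate event: peak expected about 3 h (180 min) after runoff begins.
moderate_schedule = sampling_times(600, 180)
```

Working in whole minutes keeps the schedule free of floating-point drift, and the same function can be reused once lag times for other rainfall patterns or catchments have been observed.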

Conclusions
In this study, a new framework is proposed for event-based NPS prediction and evaluation in data-scarce catchments. The results indicate that the proposed ANN performed well in the NPS simulation, especially for light rainfall events, and that the NPS-TP model was always more accurate than the NPS-TN model under scarce data conditions. In addition, the scarcity of the peak pollutant data has a significant impact on the model performance, so more attention should be given to the monitoring scheme of event-based NPS studies, especially to the NPS-TN monitoring and the lag time of the peak data. Compared with the traditional indicators, the entropy weighting method can provide a more accurate evaluation of the ANN by considering all of the information during model evaluation. These tools could be extended to other catchments to quantify event-based NPS pollution, especially in data-poor catchments.
However, more attention should be paid to the mechanism of the NPS during multiple rainfall events, because NPS pollution is not simply the consequence of the current rainfall event. Additionally, because of the computational burden, the errors and the related uncertainty of the model results were not explored, so more studies are suggested to test this new framework in more diverse regions. Meanwhile, data-driven black-box models are not good at long-term forecasting, nor are they suited to examining the effect of BMPs.

Figure 1. The methods presented in a flow chart.

Figure 2. The learning mechanism of the artificial neural network (ANN).

Figure 3. The location of the Zhangjiachong catchment.

3.2. Field Monitoring and Data Record
In this study, field monitoring data were collected from 1 January 2013 to 31 December 2014, and the rainfall, streamflow, and pollutant data during eight rainfall events were recorded. The data used in this study represent three complete rainfall events, which cover light, moderate, and heavy rainfall (21 April 2014, 24 July 2014, and 5 August 2014), and five other data-scarce events (15 April 2014, 23 August 2014, 20 July 2014, 5 July 2013, and 28 August 2013).

Figure 4. Training results for the loads in five data-scarce rainfall events: (a) the total nitrogen of non-point source (NPS-TN); and (b) the total phosphorus of non-point source (NPS-TP). Note: the pink line with dots indicates that the number of epochs is smaller.

Figure 5. The pollutographs for different rainfall patterns.

Figure 6. The tail amendment of the pollutographs for different rainfall patterns.

Table 1. Complete data for the three rainfall events.

Table 2. Five rainfall events with data scarcity.
Note: the units of flow are m³/s; the units of NPS-TN and NPS-TP are mg/L; "-" denotes that data are missing at this time.

Table 3. Evaluation of the simulation results of the pollutant loads for different rainfall patterns.

Table 4. Evaluation effect of the data scarcity on the models for the five data-scarce rainfall events.

Table 5. Evaluation of the predicted effects of the verification points.