Simulation of Pollution Load at Basin Scale Based on LSTM-BP Spatiotemporal Combination Model

: Accurate simulation of pollution load at basin scale is very important for controlling pollution. Although data-driven models are increasingly popular in water environment studies, they are not extensively utilized in the simulation of pollution load at basin scale. In this paper, we developed a data-driven model based on Long-Short Term Memory (LSTM)-Back Propagation (BP) spatiotemporal combination. The model comprises several time simulators based on LSTM and a spatial combiner based on BP. The time series of the daily pollution load in the Zhouhe River basin during the period from 2006 to 2017 were simulated using the developed model, the BP model, the LSTM model and the Soil and Water Assessment Tool (SWAT) model, independently. Results showed that the spatial correlation (i.e., Pearson’s correlation coefﬁcient is larger than 0.5) supports using a single model to simulate the pollution load at all sub-basins, rather than using independent models for each sub-basin. Comparison of the LSTM-BP spatiotemporal combination model with the BP, LSTM and SWAT models showed that the performance of the LSTM model is better than that of the BP model and the LSTM model can obtain comparable performance with the SWAT model in most cases, whereas the performance of the LSTM-BP spatiotemporal combination model is much better than that of the LSTM and SWAT models. Although the variation of the simulated pollution load with the LSTM-BP model is high under different hydrological periods and precipitation intensities, the LSTM-BP model can track the temporal variation trend of pollution load accurately (i.e., the RMSE is 6.27, NSE is 0.86 and BIAS is 19.46 for the NH 3 load and the RMSE is 20.27, NSE is 0.71 and BIAS 36.87 is for the TN load). The results of this study demonstrate the applicability of data-driven models, especially the LSTM-BP model, in the simulation of pollution load at basin scale.


Introduction
Accurate simulation of pollution load at basin scale is very important for controlling pollution. At present, the estimation of pollution load in a basin is mainly performed using physical models, such as the Hydrological Simulation Program-Fortran (HSPF) model [1][2][3], the Soil and Water Assessment Tool (SWAT) model [4][5][6] and the Agricultural Non-Point Source (AGNPS) model [7][8][9]. However, these models describe the hydrological process and the transport processes of pollutants with complex physical mechanisms. Thus, these models are usually sensitive to parameters, data and resolution [10]. Moreover, they require great effort in terms of model calibration because of the uncertainty in model structure and parameter estimation [11].
Data-driven models can learn complex associations between inputs and outputs instantly and work with high accuracy even without prior knowledge of the underlying processes [12]. However, despite their advantages, the application of data-driven models in simulating pollution load at basin scale is still scarce. Several data-driven models, such

Time-series Data
Meteorological data, including precipitation, minimum and maximum temperatures, relative humidity, wind speed and solar radiation, were monitored at weather stations in the Zhouhe River basin (Figure 1(a)). Flow rates (i.e., runoff) and concentrations of NH3 and TN were measured at the automatic water quantity monitoring stations at the outlet of the basin (i.e., Liangtan station, shown in Figure 1(a)). Pollution discharged from point sources, including industrial and urban sewage and atmospheric deposition, were monitored by the Sichuan Environmental Monitoring Center. The time range is from January 1, 2006 to December 31, 2017 and the monitoring frequency is days, as well as these time series data were measured in accordance with China's Surface Water Quality Measurement Code.
The values of monthly NH3 load, TN load and precipitation on the validation set are shown in Figure 2 and

Time-Series Data
Meteorological data, including precipitation, minimum and maximum temperatures, relative humidity, wind speed and solar radiation, were monitored at weather stations in the Zhouhe River basin (Figure 1a). Flow rates (i.e., runoff) and concentrations of NH 3 and TN were measured at the automatic water quantity monitoring stations at the outlet of the basin (i.e., Liangtan station, shown in Figure 1a). Pollution discharged from point sources, including industrial and urban sewage and atmospheric deposition, were monitored by the Sichuan Environmental Monitoring Center. The time range is from 1 January 2006 to 31 December 2017 and the monitoring frequency is days, as well as these time series data were measured in accordance with China's Surface Water Quality Measurement Code.
The values of monthly NH 3 load, TN load and precipitation on the validation set are shown in Figures 2 and 3, respectively. Since the impact of precipitation on runoff and pollution load in the basin is different under different precipitation intensities and precipitation intensity influences the amount of water that dilutes the pollutants, the precipitation was divided into six levels, including no rain, light rain (precipitation between 0 and 10 mm, inclusive), moderate rain (precipitation between 10 and 25 mm, inclusive), heavy rain (precipitation between 25 and 50 mm, inclusive), torrential rain (precipitation between 50 and 100 mm, inclusive) and severe torrential rain (precipitation greater than 100 mm). Moreover, pollution load shows significant differences in different months and hydrological periods [26] and these differences play a crucial role in achieving an accurate simulation. According to the total precipitation of each month, it was divided into flood season, dry season and flat season. In the Zhouhe River basin, the annual precipitation is concentrated in the flood season (during the period from June to October). The dry season is from December to March of the following year. There is less precipitation during the dry season and the pollution load is significantly reduced due to the reduced runoff. The flat season is from April to May and November. During the flat season, the pollutants accumulated on the soil surface during the dry season converge into the rivers through surface runoff due to the increase of air temperature and precipitation. The flow rates and pollutants concentrations varied in the dry, flat and flood seasons. precipitation intensity influences the amount of water that dilutes the pollutants, the precipitation was divided into six levels, including no rain, light rain (precipitation between 0 and 10 mm, inclusive), moderate rain (precipitation between 10 and 25 mm, inclusive), heavy rain (precipitation between 25 and 50 mm, inclusive), torrential rain (precipitation between 50 and 100 mm, inclusive) and severe torrential rain (precipitation greater than 100 mm). Moreover, pollution load shows significant differences in different months and hydrological periods [26] and these differences play a crucial role in achieving an accurate simulation. According to the total precipitation of each month, it was divided into flood season, dry season and flat season. In the Zhouhe River basin, the annual precipitation is concentrated in the flood season (during the period from June to October). The dry season is from December to March of the following year. There is less precipitation during the dry season and the pollution load is significantly reduced due to the reduced runoff. The flat season is from April to May and November. During the flat season, the pollutants accumulated on the soil surface during the dry season converge into the rivers through surface runoff due to the increase of air temperature and precipitation. The flow rates and pollutants concentrations varied in the dry, flat and flood seasons.

Spatial Data
The Digital Elevation Model (DEM) used the global land GTOPO30 data from the United States Geological Survey (USGS)-Earth Resources Observation and Science (EROS) data center. Land use information for 2018 was obtained from the Chinese Academy of Sciences (Figure 1(b)). According to the World Reference Base [27], the soil in the calculation area includes 11 types (Figure 1(c)). Soil physical properties (i.e., particle size distribution, bulk density, field capacity and hydraulic conductivity) were obtained from the soil database and field survey in the Zhouhe River basin. Soil nutrient (i.e., TN content) was obtained from local soil survey reports (Figure 1(d)). The basin was divided into five sub-basins (i.e., S1 to S5 as shown in Figure 1(a)). Basic information of the five sub-basins (S1-S5) is show in Table 1.    precipitation intensity influences the amount of water that dilutes the pollutants, the precipitation was divided into six levels, including no rain, light rain (precipitation between 0 and 10 mm, inclusive), moderate rain (precipitation between 10 and 25 mm, inclusive), heavy rain (precipitation between 25 and 50 mm, inclusive), torrential rain (precipitation between 50 and 100 mm, inclusive) and severe torrential rain (precipitation greater than 100 mm). Moreover, pollution load shows significant differences in different months and hydrological periods [26] and these differences play a crucial role in achieving an accurate simulation. According to the total precipitation of each month, it was divided into flood season, dry season and flat season. In the Zhouhe River basin, the annual precipitation is concentrated in the flood season (during the period from June to October). The dry season is from December to March of the following year. There is less precipitation during the dry season and the pollution load is significantly reduced due to the reduced runoff. The flat season is from April to May and November. During the flat season, the pollutants accumulated on the soil surface during the dry season converge into the rivers through surface runoff due to the increase of air temperature and precipitation. The flow rates and pollutants concentrations varied in the dry, flat and flood seasons.

Spatial Data
The Digital Elevation Model (DEM) used the global land GTOPO30 data from the United States Geological Survey (USGS)-Earth Resources Observation and Science (EROS) data center. Land use information for 2018 was obtained from the Chinese Academy of Sciences (Figure 1(b)). According to the World Reference Base [27], the soil in the calculation area includes 11 types (Figure 1(c)). Soil physical properties (i.e., particle size distribution, bulk density, field capacity and hydraulic conductivity) were obtained from the soil database and field survey in the Zhouhe River basin. Soil nutrient (i.e., TN content) was obtained from local soil survey reports (Figure 1(d)). The basin was divided into five sub-basins (i.e., S1 to S5 as shown in Figure 1(a)). Basic information of the five sub-basins (S1-S5) is show in Table 1.

Spatial Data
The Digital Elevation Model (DEM) used the global land GTOPO30 data from the United States Geological Survey (USGS)-Earth Resources Observation and Science (EROS) data center. Land use information for 2018 was obtained from the Chinese Academy of Sciences ( Figure 1b). According to the World Reference Base [27], the soil in the calculation area includes 11 types (Figure 1c). Soil physical properties (i.e., particle size distribution, bulk density, field capacity and hydraulic conductivity) were obtained from the soil database and field survey in the Zhouhe River basin. Soil nutrient (i.e., TN content) was obtained from local soil survey reports ( Figure 1d). The basin was divided into five subbasins (i.e., S1 to S5 as shown in Figure 1a). Basic information of the five sub-basins (S1-S5) is show in Table 1. Table 1. Basic information of the five sub-basins (S1-S5).

Data Preprocessing
To avoid data dependence on measurement units and to improve the convergence speed of the calibration, data normalization was conducted before entering the model [28]. The data (i.e., pollution load, meteorology, runoff, digital elevation, land use and soil type data) were normalized between 0 and 1 by linear normalization. The normalization formula is as follows: where x is the original data and min x and max x represent the minimum and maximum values of the variable of the original data x, respectively. The normalized value of pollution load simulated by the model was converted to the output layer's actual value by the following formula: For nominal features such as month, hydrological period and precipitation intensity that do not have sequence and cannot be compared in size, simple values could not be used instead, because the attribute value of such features affects the operation of the model weight matrix. One-hot encoding is used to convert these nominal features into binary codes [29]. For example, one-hot encoding of the hydrological period features included "001," "010" and "100" for the flood season, the dry season and the flat season, respectively.

Spatial Correlation Analysis
The basin was divided into five sub-basins ( Figure 1a). More information about the five sub-basins is shown in Table 1. Pearson's correlation coefficient was used to measure the correlation between two random variables: where x i and x j respectively represent the series of NH 3 or TN load at sub-basin i and sub-basin j; Cov(·) is covariance and σ(·) is standard deviation. The correlations in the pollution load at the sub-basins are shown in Table 2. Notably, except the correlation coefficients between S1 and S5, between S2 and S5 and between S3 and S5 which are small due to the long distance and no confluence between the two sub-basins, the correlation coefficient in the pollution load between sub-basins is larger than 0.5. The strong spatial correlation supports using a single model to simulate the pollution load at all sub-basins, rather than using independent models for each sub-basin, because the associated inputs from nearby sub-basins can improve simulation accuracy [22]. Using the excellent ability of the LSTM model to remember long-term dependencies in time-series forecasting and modeling problems [30], a time simulator based on LSTM was established for each sub-basin to extract the time-series features of historical data and the complicated nonlinear relationships between input features automatically ( Figure 4). For the LSTM model, at sub-basin i, the input layer is an input vector (x t , x t−1 , . . . , x t−n ) and x t includes meteorology, precipitation intensity, runoff, month, hydrological period and pollution load. The hidden layer is the output (i.e., pollution load simulated value h it at time t). At time t, the input of the LSTM memory unit includes the current moment input variable x t , the previous moment hidden layer state variable h t−1 and the previous moment memory unit state variable c t−1 . The model passes through the forgotten gate f t , the input gate i t and the output gate o t in turn. The output of LSTM memory unit includes the current moment output variable h t and the current moment memory unit state variable c t . The calculation formulas are as follows: where are adjustable parameter matrices or vectors (these parameters were automatically optimized by the reverse error propagation algorithm in the calibration), σ is the Sigmoid activation function and tanh is the hyperbolic tangent activation function. In the simulation of pollution load, the input gate controls the influence of the new measured value on the current simulated value, while the output gate controls the influence of a past trend. For example, when the pollution load changes slowly, the output gate tends to close, retaining the trend information; when the pollution load changes sharply, the input gate is opened to obtain new observations.

BP-Based Spatial Combinatory
Using the nonlinear expression ability of the BP model [31], a spatial combiner based on BP was constructed to describe the spatial relationships among the sub-basins auto-

BP-Based Spatial Combinatory
Using the nonlinear expression ability of the BP model [31], a spatial combiner based on BP was constructed to describe the spatial relationships among the sub-basins automatically, thus achieving the accurate simulation of pollution load at each sub-basin ( Figure 5). The output z j of the jth neuron in the hidden layer of the BP neural network is as follows: ure 5). The output of the jth neuron in the hidden layer of the BP neural network is as follows: The after the hidden layer mapping of BP neural network is directly used as the input of the BP model's output layer and the output layer performs nonlinear fitting. The output S at sub-basin k in the output layer is as follows: where ℎ * is a vector composed of the underlying surface characteristics (i.e., digital elevation, land use and soil type) of sub-basin i and the output ℎ of the LSTM model at time t; wij, wjk, bj and bk are adjustable parameter matrices or vectors (these parameters will be automatically optimized by the reverse error propagation algorithm in the calibration); m is the total number of sub-basins, n is the number of neurons in the hidden layer and (•) is the activation function of neurons (Sigmoid activation function was selected in this study).

LSTM-BP Model
Using the deep-learning framework Keras, a LSTM-BP spatiotemporal combination model for the simulation of pollution load at basin scale was constructed ( Figure 6). The model comprised several time simulators based on LSTM and a spatial combiner based on BP. Historical data of time-series of the meteorology, precipitation intensity, runoff, month, hydrological period and pollution load were selected as the LSTM model's input. A time simulator based on LSTM was used to simulate changes of the pollution load in each sub-basin. Spatial data, including digital elevation, land use and soil type and the hidden layer output of LSTM models, were treated as the BP model's input. Then, a BPbased spatial combiner was released to capture the spatial relationships among the sub- The z j after the hidden layer mapping of BP neural network is directly used as the input of the BP model's output layer and the output layer performs nonlinear fitting. The output S k at sub-basin k in the output layer is as follows: where h * it is a vector composed of the underlying surface characteristics (i.e., digital elevation, land use and soil type) of sub-basin i and the output h it of the LSTM model at time t; w ij , w jk , b j and b k are adjustable parameter matrices or vectors (these parameters will be automatically optimized by the reverse error propagation algorithm in the calibration); m is the total number of sub-basins, n is the number of neurons in the hidden layer and f (·) is the activation function of neurons (Sigmoid activation function was selected in this study).

LSTM-BP Model
Using the deep-learning framework Keras, a LSTM-BP spatiotemporal combination model for the simulation of pollution load at basin scale was constructed ( Figure 6). The model comprised several time simulators based on LSTM and a spatial combiner based on BP. Historical data of time-series of the meteorology, precipitation intensity, runoff, month, hydrological period and pollution load were selected as the LSTM model's input. A time simulator based on LSTM was used to simulate changes of the pollution load in each sub-basin. Spatial data, including digital elevation, land use and soil type and the hidden layer output of LSTM models, were treated as the BP model's input. Then, a BP-based spatial combiner was released to capture the spatial relationships among the sub-basins to obtain the simulated value of pollution load at each sub-basin at time t at the output layer. The calibration period is from 1 January 2006 to 31 December 2014 and the validation period is from 1 January 2015 to 31 December 2017. basins to obtain the simulated value of pollution load at each sub-basin at time t at output layer. The calibration period is from January 1, 2006 to December 31, 2014 and validation period is from January 1, 2015 to December 31, 2017.

Tuning for Hyper-parameters
The optimal hyper-parameters were determined through multiple experimen There were two hidden layers of the LSTM network (15 neurons in the first hidden lay and 10 neurons in the second hidden layer) and one hidden layer of the BP network neurons). A dropout layer was set on the hidden layer with a dropout rate of 0.3 to redu overfitting. The maximum number of iterations is 200. The learning rate is 0.001. T batch_size is 50. The window size is 30. Adam stochastic gradient descent algorithm used. Root Mean Square Error (RMSE) is used as the loss function.

BP Model
The BP model ( Figure 5) implemented in this study was similar to the propos LSTM-BP model ( Figure 6). A single BP model was used for all sub-basins. The BP mode input was time-series data including meteorology, precipitation intensity, runoff, mon hydrological period and pollution load at each sub-basin and spatial data including di tal elevation, land use and soil type at each sub-basin. The BP model's output was simulated value of pollution load at each sub-basin at time t. The BP model also contain one hidden layer (64 neurons). A dropout layer was set on the hidden layer with a dropo rate of 0.

Tuning for Hyper-Parameters
The optimal hyper-parameters were determined through multiple experiments. There were two hidden layers of the LSTM network (15 neurons in the first hidden layer and 10 neurons in the second hidden layer) and one hidden layer of the BP network (64 neurons). A dropout layer was set on the hidden layer with a dropout rate of 0.3 to reduce overfitting. The maximum number of iterations is 200. The learning rate is 0.001. The batch_size is 50. The window size is 30. Adam stochastic gradient descent algorithm is used. Root Mean Square Error (RMSE) is used as the loss function.

BP Model
The BP model ( Figure 5) implemented in this study was similar to the proposed LSTM-BP model ( Figure 6). A single BP model was used for all sub-basins. The BP model's input was time-series data including meteorology, precipitation intensity, runoff, month, hydrological period and pollution load at each sub-basin and spatial data including digital elevation, land use and soil type at each sub-basin. The BP model's output was the simulated value of pollution load at each sub-basin at time t. The BP model also contained one hidden layer (64 neurons). A dropout layer was set on the hidden layer with a dropout rate of 0.3. The maximum number of iterations is 200. The learning rate is 0.001. The batch_size is 50. Adam stochastic gradient descent algorithm is used. RMSE is used as the loss function. The calibration period is from 1 January 2006 to 31 December 2014 and the validation period is from 1 January 2015 to 31 December 2017.

LSTM Model
The LSTM model (Figure 4) implemented in this study was similar to the proposed LSTM-BP model ( Figure 6). Independent LSTM models were used for each sub-basin. The LSTM model's input was time-series data including meteorology, precipitation intensity, runoff, month, hydrological period and pollution load at its own sub-basin. The LSTM model's output was the simulated value of pollution load at its own sub-basin at time t. The LSTM model also contained two hidden layers (15 neurons in the first hidden layer and 10 neurons in the second hidden layer). A dropout layer was set on the hidden layer with a dropout rate of 0.3. The maximum number of iterations is 200. The learning rate is 0.001. The batch_size is 50. The window size is 30. Adam stochastic gradient descent algorithm is used. RMSE is used as the loss function. The calibration period is from 1 January 2006 to 31 December 2014 and the validation period is from 1 January 2015 to 31 December 2017.

SWAT Model
The SWAT model was a physical-based distributed model for simulating hydrological and pollution load in a basin. In the SWAT model, the Zhouhe River basin was divided into 128 sub-basins based on DEM data. The sub-basin was further divided into Hydrological Response Units (HRUs) based on land use and soil properties. The SWAT model first simulated the process at the HRU level and subsequently routed at the sub-basin level. The SWAT Calibration and Uncertainty Programs (SWAT-CUP) were utilized to calibrate SWAT parameters. The Sequential Uncertainty Fitting (version 2, SUFI-2) algorithm in SWAT-CUP was selected as the calibration algorithm due to its good performance in large basins. The SWAT model ran at daily time step. The calibration period is from 1 January 2006 to 31 December 2014 and the validation period is from 1 January 2015 to 31 December 2017.

Model Evaluation Indicators
The RMSE, Nash-Sutcliffe Efficiency (NSE) and Bias Ratio (BIAS) were used to evaluate the model's performance. The RMSE reflects the degree of difference between the measured and simulated pollution load; the NSE verifies the model's accuracy; the BIAS judges whether there is a systematic bias between the measured and simulated pollution load. The indicators are defined as follows: where O i and S i are the measured and simulated pollution load, respectively and n is the total number of samples. The model guide developed by Moriasi et al. [32] was used to evaluate model performance.

Comparison of Simulation Performance with Other Models
The evaluation indicators of the BP, LSTM, LSTM-BP and SWAT models on the validation set are shown in Table 3. For the NH 3 load, the BP, LSTM, LSTM-BP and SWAT models are "very good" for all sub-basins [32]. For the TN load, the BP model is "satisfactory," "good" or "very good" for all sub-basins [32], while the LSTM, LSTM-BP and SWAT models are "good" or "very good" for all sub-basins [32]. NH 3 enters the rivers mainly in the form of dissolved N in the surface runoff, while TN enters the rivers in the form of dissolved N, as well as solid N absorbed on the suspended mass in the surface runoff and dissolved N in the sub-surface outflow [33]. Thus, the BP, LSTM, LSTM-BP and SWAT models underestimated TN peak. All models simulated the NH 3 load better than the TN load. The LSTM-BP model had the largest NSE and lowest BIAS and RMSE, demonstrating that the performance of the LSTM-BP model is much better than that of the BP, LSTM and SWAT models. Moreover, the performance of the LSTM model is better than that of the BP model and the LSTM model can achieve comparable performance with the SWAT model in most cases. These results are in substantial agreement with those of Kratzert [25]. The study by Kratzert [25] only used data from their own sub-basin as LSTM model's input, neglecting the fact that the associated input from nearby sub-basins may improve simulation performance, while our study effectively extracted the spatiotemporal characteristics between the various data using the LSTM-BP model.

LSTM-BP Model's Performance under Different Hydrological Periods and Precipitation Intensities
The evaluation indicators of the LSTM-BP model in different hydrological periods on the validation set are shown in Table 4. Among the dry season, the flat season and the flood season, the accuracy of the LSTM-BP model was highest during the dry season. Runoff seldom occurred during the dry season due to less rainfall. The pollution load, as well as the overall fluctuation, were small. The LSTM-BP model captured complicated nonlinear relationships between various factors. With the temperature and the precipitation increased during the flat season, the pollutants which accumulated on the soil surface over a long time during the dry season converged into the rivers with runoff, the pollution load peaked and the overall fluctuation increased [34,35]. We surmised that the nitrogen fertilizer application also contributed to this increase [11]. Karlen et al. [36] reported that higher TN in water was generally found after fertilizer applications. The fertilizers in Sichuan province are usually applied in spring and contain a large amount of nitrogen. The simulation accuracy of the LSTM-BP model decreased. Although the pollution load and overall fluctuation significantly increased during the flood season, the simulation accuracy of the LSTM-BP model increased. Although the study by Baek et al. [11] analyzed the pollutant distribution under different precipitation intensities, it did not analyze the model's simulation performance, while our study analyzed the model's simulation performance under different precipitation intensities. The evaluation indicators of the LSTM-BP under different precipitation intensities on the validation set are shown in Table 5. Zheng et al. [37] found through tests that the increase of rainfall intensity would increase the amount of pollutants. The greater the intensity of precipitation, the greater the coefficient of runoff production, the greater the runoff, the stronger the erosion of pollutants and soil, the stronger the capacity of pollution production and transportation of pollutants, the more pollutants carried away, the greater the degree of pollution generated, that is, the more threat to the surrounding water pollution [38]. The simulation accuracy of the LSTM-BP model decreased with the increasing of the pollution load. As the precipitation intensity was small, the pollution load tended to remain stable and fluctuates less. The pollution load and the fluctuation increased in orders of magnitude below the conditions of heavy precipitation [39]. The LSTM-BP model better simulated the stable fluctuation of the pollution load [40]. The comparison of the measured and LSTM-BP-simulated NH 3 and TN load on the validation set are shown in Figures 7 and 8, respectively. Although the variation of the simulated pollution load with the LSTM-BP model was high under different hydrological periods (Table 4) and precipitation intensities (Table 5), the LSTM-BP model could track the temporal variation trend of the pollution load accurately. Since these temporal variations may result from pollutant transport characteristics [11], this result implies that the LSTM-BP model can better reflect the transport characteristics of pollutants. These temporal variation have been well simulated in previous studies. For example, Baek et al. [11] simulated the temporal variation of TN, TP and TOC concentration using the LSTM model with the NSE value of 0.987, 0.899 and 0.832, respectively.

Conclusions
This study developed a data-driven model based on LSTM-BP spatiotemporal combination for the simulation of pollution load at basin scale. The model comprised several

Conclusions
This study developed a data-driven model based on LSTM-BP spatiotemporal combination for the simulation of pollution load at basin scale. The model comprised several

Conclusions
This study developed a data-driven model based on LSTM-BP spatiotemporal combination for the simulation of pollution load at basin scale. The model comprised several time simulators based on LSTM and a spatial combiner based on BP. The model was applied in the Zhouhe River basin. Results showed that there is a high spatial correlation (i.e., Pearson's correlation coefficient is larger than 0.5) in the pollution load between nearby sub-basins. The strong spatial correlation supports using a single model to simulate the pollution load at all sub-basins, rather than independent models for each sub-basin. Moreover, the performance of the LSTM model is better than that of the BP model and the LSTM model can achieve comparable performance with the SWAT model in most cases, whereas the performance of the LSTM-BP model is much better than that of the LSTM and SWAT models. Besides, the BP, LSTM, LSTM-BP and SWAT models underestimate the TN peak, which leads to a better simulation of NH 3 load than TN load. Although the variation of the simulated pollution load with the LSTM-BP model is high under different hydrological periods and precipitation intensities, the LSTM-BP model can track the temporal variation trend of the pollution load accurately (i.e., the RMSE is 6.27, NSE is 0.86 and BIAS is 19.46 for the NH 3 load and the RMSE is 20.27, NSE is 0.71 and BIAS 36.87 is for the TN load). The results of this study demonstrate the applicability of data-driven models, especially the LSTM-BP model, in the simulation of pollution load at basin scale. Although the developed model showed the acceptable simulation performance, only NH 3 load and TN load were simulated in this study. However, most physical models can simulate a variety of pollution load (i.e., total phosphorus, chemical oxygen demand and dissolved oxygen). Therefore, a further study to data-driven models is recommended to simulate more pollutants. Besides, this study only considers the pollution load simulation for the next day t, so multi-scale pollution load simulation can be considered in the future study, such as the variation trend simulation of pollution load in the next seven days.