Comparison of Long Short-Term Memory and Weighted Regressions on Time, Discharge, and Season Models for Nitrate-N Load Estimation

Abstract: The long short-term memory (LSTM) model has been widely used for a broad range of applications entailing the estimation of variables in different fields to improve water quality management in rivers. The main objectives of this study are (1) to develop a novel LSTM-based model for the estimation of nitrate-N loads, which adversely affect water resources, and (2) to evaluate the performance of the model by comparing it with that of Monte Carlo sub-sampling and the weighted regressions on time, discharge, and season (WRTDS) model. We evaluated the model performance using various numbers of hidden layers, ranging from one to four, in the LSTM model to determine the appropriate number of hidden layers; furthermore, we applied the sampling frequencies of 6, 12, and 24 to assess their impact. Seven polluted river basins in the United States were used for analysis, and the relative root mean squared error (rRMSE) and the mean percentage error (MPE) metrics were applied for the validation of the model estimates. The proposed model achieved accurate nitrate-N load estimates using three to four hidden layers, and improved model performance was observed when the sampling frequency was increased. The differences among the results obtained using the LSTM model were examined based on a binning technique via a log-log plot of nitrate-N concentration against discharge. The binning analysis showed that the slope obtained from the average rates of discharge and low discharge values apparently influenced the estimates. Furthermore, box plot analyses of the statistical indices such as rRMSE and MPE demonstrate that the LSTM model seems to exhibit better performance than the WRTDS model. The results of the examination demonstrate that the LSTM model may be a good alternative with regard to estimating nitrate-N loads for the control of water quality constituents.


Introduction
Nitrate-nitrogen (nitrate-N; NO3-N) load estimation is crucial to water resource management because excessive nutrients degrade water quality, resulting in water problems in rivers, streams, and receiving water bodies, such as the Laurentian Great Lakes [1,2]. River basins in the Midwestern United States have encountered difficulties with regard to water resource management because of nutrient enrichment and water pollutants derived from crop production using fertilizers and pesticides [3]. Specifically, rivers and streams in the state of Ohio are affected by high levels of nutrients, and they have been monitored over a long period to estimate nitrate-N loads and thereby obtain an accurate assessment of water quality [4]. Notably, nonpoint [...] the LSTM model. For the validation of the model, we used jackknife resampling techniques based on statistical indices. The remainder of this paper is organized as follows. A description of the datasets used for the study is provided in Section 2. In Section 3, we describe the methodologies used to estimate and evaluate the nitrate-N load. Sections 4 and 5 present the results and a discussion of the work. Finally, in Section 6, the conclusions are summarized.

Data Set
In this study, we estimated the nitrate-N loads for seven river basins, namely, the Cuyahoga (CY), Grand (GD), Great Miami (GM), Maumee (MM), Muskingum (MS), Raisin (RS), and Vermilion (VM) basins, which were chosen to represent the river basins of the Midwest US. Each basin has features that can affect the analysis of the nitrate-N load estimation. The basins consist of urban, wooded, and agricultural areas; particularly, water flow over these areas causes an eventual high nitrate-N concentration that affects the Great Lakes.
CY has a basin area of 1843 km². Data for this basin are recorded by the United States Geological Survey (USGS) at station number 04208000. The rivers in this basin run through the city of Cleveland and are heavily influenced by its industrial pollution, which also feeds into Lake Erie. "Urban area" is the most significant land use type in this basin, accounting for 47% of land use. GD is a tributary of Lake Erie, and its basin has an area of 1758 km². The most significant land use type in this basin is "woodland," at 52%. Data for this basin are gathered at station number 04212100. The GM basin has been assigned the station number 03271601; this basin has an area of 6953 km² and is surrounded by the Miami Valley. Furthermore, GM serves as a tributary of the Ohio River. The principal land use type for this basin is agriculture, at 82%. The MM basin has been assigned the station number 04193500 with a stream record; this basin has an area of 16,427 km². MM flows from northeastern Indiana through northwestern Ohio to Lake Erie. The most significant land use type for this basin is also agriculture, at 81%. The station number for the MS basin is 03150000, and its basin area is 19,208 km². MS is a tributary of the Ohio River and flows southward through eastern Ohio. Agriculture is the most significant land use type in this basin, at 52%. RS and VM have station numbers 04176500 and 04199500, respectively. Their basin areas are 2755 and 697 km², respectively. RS is a river that flows into Lake Erie, and VM is a tributary of Lake Erie in northern Ohio. Agriculture is the most significant land use type in both basins, at 72% and 71%, respectively. Figure 1 shows the locations of the seven river basins analyzed in this study. Table 1 lists the seven river basins according to their outlet, the proportion of each land use type, and the data collection period for daily discharge and nitrate-N concentration data.
The daily discharge and nitrate-N concentration data were applied for load estimation. The daily discharge data were obtained from USGS [28], and the daily nitrate-N concentration data [4] were obtained from the Water Quality Laboratory of the National Center for Water Quality Research at Heidelberg University [29]. The variables used for load estimation were transformed for normality and standardized. The daily nitrate-N concentration was used to develop an appropriate model for estimating the nitrate-N load considering different resampling frequencies and river basins. The data periods used for the CY, GD, GM, MM, MS, RS, and VM basins were 36, 19, 22, 35, 24, 29, and 9 years, respectively. The average discharge for the seven river basins was 3059 m³/s, with a range of 501 m³/s to 8608 m³/s. Furthermore, the average nitrate-N concentration for the basins was 2.48 mg/L, with a range of 0.46 mg/L to 4.40 mg/L. Figure 2 shows the daily and annual rates of discharge for each river basin. This figure uses two different y-axis scales, depending on whether the basin is large or small. GM, MM, and MS are all large basins with a large amount of discharge, whereas CY, GD, RS, and VM are small basins with a small amount of discharge. Figure 3 presents the daily and annual nitrate-N concentrations for each basin. In the figure, the average nitrate-N concentrations for CY, GM, MM, and RS are higher than those of GD, MS, and VM.
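The relationship between discharge, concentration, and load can be illustrated with a short sketch. The unit conversion follows directly from the units quoted above (m³/s and mg/L). The preprocessing function is only an assumption: the text states that the variables were transformed for normality and standardized but does not name the transform, so a natural logarithm followed by z-scoring is used here purely for illustration. The function names are hypothetical.

```python
import numpy as np

SECONDS_PER_DAY = 86_400


def daily_load_kg(discharge_m3s, conc_mg_per_l):
    """Daily load in kg/day from discharge (m^3/s) and concentration (mg/L)."""
    q = np.asarray(discharge_m3s, dtype=float)
    c = np.asarray(conc_mg_per_l, dtype=float)
    # 1 mg/L equals 1 g/m^3, so q * c is in g/s; convert to kg/day.
    return q * c * SECONDS_PER_DAY / 1000.0


def log_standardize(x):
    """Assumed preprocessing: natural log, then zero mean and unit variance."""
    z = np.log(np.asarray(x, dtype=float))
    return (z - z.mean()) / z.std()


# Two example days with made-up values.
load = daily_load_kg([10.0, 50.0], [2.0, 4.0])
print(load)
```

Because load is the product of the two series, errors in the estimated concentration on high-discharge days dominate the load error, which is why the sampling design matters for the comparisons that follow.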

Methods
As mentioned previously, the primary objective of this research is to investigate the use of the LSTM model for obtaining accurate nitrate-N load estimates for river basins in the US. The LSTM model was set up based on the Monte Carlo sub-sampling approach using various sampling frequencies. The discharge and nitrate-N concentrations were used for load estimation by establishing a relationship between the two variables. The binning technique was used to examine this relationship, characterize the variables, and verify the results for nitrate-N load estimation. The results obtained from the LSTM model were compared with those of the WRTDS model to validate the performance of the proposed model. In the validation analysis, standard statistical indices were applied to the results.

LSTM Model Architecture
The LSTM network, which is a type of recurrent neural network (RNN), was used in the current study to improve load estimation. The LSTM network was developed to overcome the problem of vanishing gradients [21]. The LSTM model is characterized by a memory cell, $C_t$, which memorizes state information over time and permits gradients to flow over sequences. It contains three gates, namely an input gate, $i_t$, a forget gate, $f_t$, and an output gate, $o_t$, which control the flow of information into and out of the LSTM cell. The LSTM cell receives the input at the current time, $x_t$, and the hidden state, $h_{t-1}$, from the previous step, thereby maintaining state information. A diagram of an LSTM cell with the three gates is shown in Figure 4a.
With the input and the hidden state, the candidate cell state, $g_t$, is defined based on the tanh function, as shown in Equation (1):

$$g_t = \tanh(W_g x_t + U_g h_{t-1} + b_g) \tag{1}$$

where $W_g$, $U_g$, and $b_g$ indicate the input weights, recurrent weights, and bias, respectively. In the input gate, information that will be stored in the memory cell is identified using an element-wise sigmoid function, as shown in Equation (2). In the forget gate, information that should be eliminated from the cell is determined using the sigmoid function (Equation (3)). The output gate can also be expressed using this function, as in Equation (4):

$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \tag{2}$$

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \tag{3}$$

$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \tag{4}$$

where $\sigma(z) = 1/(1 + \exp(-z))$ is the element-wise sigmoid function, and each gate has its own weights and bias. The information in the memory cell is then updated based on the partial forgetting of the information maintained in the previous cell state, $C_{t-1}$. Based on the gate values, the memory cell can be denoted as

$$C_t = f_t * C_{t-1} + i_t * g_t \tag{5}$$

where * implies element-wise multiplication. The forget gate determines the extent to which the past information kept in $C_{t-1}$ will be forgotten. The value of the gate ranges from 0 to 1. If $f_t$ tends to 0, the past information will be forgotten, whereas if it tends to 1, the past information will be retained in the memory cell. Using the updated cell state, $C_t$, the hidden state, $h_t$, is updated to provide the output of the model, as shown in Equation (6):

$$h_t = o_t * \tanh(C_t) \tag{6}$$
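To make the cell update concrete, the following NumPy sketch performs one forward step of an LSTM cell in the standard formulation referred to by Equations (1) to (6): a tanh candidate state, three sigmoid gates, an element-wise cell update, and a gated hidden state. It is an illustration only, not the MATLAB network used in the study. The dictionary layout of the weights, the layer sizes, and the random initialization are assumptions introduced for the example.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function, 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step; W, U, b are dicts keyed by 'g', 'i', 'f', 'o'."""
    g_t = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])   # candidate state
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])   # input gate
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])   # forget gate
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])   # output gate
    c_t = f_t * c_prev + i_t * g_t                           # cell state update
    h_t = o_t * np.tanh(c_t)                                 # hidden state
    return h_t, c_t


# Hypothetical sizes: 2 inputs (e.g., discharge and time index), 4 hidden units.
rng = np.random.default_rng(0)
n_in, n_hidden = 2, 4
W = {k: rng.normal(scale=0.1, size=(n_hidden, n_in)) for k in "gifo"}
U = {k: rng.normal(scale=0.1, size=(n_hidden, n_hidden)) for k in "gifo"}
b = {k: np.zeros(n_hidden) for k in "gifo"}

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):       # run over a 5-step sequence
    h, c = lstm_cell_step(x_t, h, c, W, U, b)
print(h)
```

With all weights and biases set to zero, every gate equals 0.5 and the candidate state is 0, so the cell state is simply halved at each step; this is a convenient sanity check on the forget-gate behavior described above.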
The schematic description of the proposed method is presented in Figure 4b, which shows the estimation approaches, sampling frequencies, and assessment techniques used in obtaining load estimations. A brief description of the sampling frequencies and assessment techniques is provided in Sections 3.3 and 3.4, respectively.
Based on the LSTM cell, the LSTM network consists of a sequence input layer, LSTM hidden layers, a fully connected layer, and an output layer. The input layer feeds sequence data into the network. The LSTM hidden layers model the correlations between time steps of the sequence data. Stacking these layers yields more complex models that can address complex problems in pattern recognition, classification, and estimation. In the present study, networks with one to four hidden layers were analyzed to determine the appropriate number of hidden layers. The network ends with a fully connected layer and a regression output layer, which provides the estimate. To implement the LSTM network, the number of hidden units is set at 200 for each layer, the maximum number of epochs is set at 250, and the learning rate is set at 0.005. The value of 200 hidden units was selected after testing various numbers of units and retaining the number that yielded the lowest forecast error. The weights are optimized adaptively using the ADAM (adaptive moment estimation) optimizer. To estimate loads, the measured discharge and concentration are used to train the LSTM network. The load estimate is then obtained from the concentration estimated by the LSTM model as the output variable. Note that the LSTM network for this analysis can be designed and built using the Deep Network Designer toolbox in MATLAB.

WRTDS Model Architecture
Weighted regressions on time, discharge, and season (WRTDS) is an approach used to examine long-term water-quality data by evaluating trends and average nitrate-N concentrations [14]. This method requires daily flow data but not necessarily daily concentration data. Notably, daily concentration data are often not available in river water quality monitoring datasets. The model is used to obtain nitrate-N concentration estimates for each day in the data collection period. The WRTDS model consists of four components, three deterministic and one random, corresponding to the trend, discharge, season, and random variation. For each estimation point, the model prescreens all sampled data and selects the samples that are sufficiently close to the estimation point in three dimensions: time, season, and discharge [31]. Based on the WRTDS model with these components, the nitrate-N concentration can be estimated as follows:

$$\ln(c) = \beta_0 + \beta_1 t + \beta_2 \ln(Q) + \beta_3 \sin(2\pi t) + \beta_4 \cos(2\pi t) + \varepsilon$$

where $\beta_0, \ldots, \beta_4$ indicate the fitted coefficients, $c$ is the nitrate-N concentration, $Q$ indicates the discharge, $t$ is the time in the record period, and $\varepsilon$ is the unexplained variation. This equation is fitted by weighted regression, in which each observation is weighted according to its relevance to the estimation point. The weight corresponding to each observation is defined based on a three-dimensional distance metric between the observation point and the estimation point. The form of the weight function, the tricube function of Tukey [32], is as follows:

$$w = \begin{cases} \left(1 - (|d|/h)^3\right)^3 & \text{if } |d| \le h \\ 0 & \text{if } |d| > h \end{cases}$$

where $w$ indicates the weight, $d$ is the distance between the observation point and the estimation point, and $h$ is the half-window width. Detailed information regarding the processes and characteristics of the WRTDS method is available in a report by Hirsch et al. [14]. Hirsch and De Cicco [33] proposed the exploration and graphics for river trends (EGRET) R package, which includes the WRTDS algorithm.
The present study applied the EGRET R package to estimate nitrate-N concentrations and loads using the WRTDS method.
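The weighted-regression idea behind WRTDS can be sketched in a few lines of NumPy. This is not the EGRET implementation: it fits a single estimation point, multiplies tricube weights for the time, season, and log-discharge distances, and solves the weighted least-squares problem for the five coefficients. The function names and the half-window defaults (`h_time`, `h_season`, `h_logq`) are illustrative assumptions, and the sketch omits refinements of a full implementation such as widening the windows when too few samples receive nonzero weight and correcting for retransformation bias.

```python
import numpy as np


def tricube(d, h):
    """Tricube weight: (1 - (|d|/h)^3)^3 inside the half-window h, else 0."""
    u = np.abs(d) / h
    return np.where(u <= 1.0, (1.0 - u**3) ** 3, 0.0)


def design(t, ln_q):
    """Columns: intercept, time, ln(Q), sin(2*pi*t), cos(2*pi*t)."""
    return np.column_stack(
        [np.ones_like(t), t, ln_q, np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)]
    )


def wrtds_estimate_ln_c(t_obs, q_obs, c_obs, t0, q0,
                        h_time=7.0, h_season=0.5, h_logq=2.0):
    """Estimate ln(c) at (t0, q0); t in decimal years. Half-widths are assumed."""
    t_obs = np.asarray(t_obs, dtype=float)
    ln_q = np.log(np.asarray(q_obs, dtype=float))
    ln_c = np.log(np.asarray(c_obs, dtype=float))

    d_time = t_obs - t0
    frac = np.abs(d_time) % 1.0
    d_season = np.minimum(frac, 1.0 - frac)      # circular distance within a year
    d_logq = ln_q - np.log(q0)

    w = tricube(d_time, h_time) * tricube(d_season, h_season) * tricube(d_logq, h_logq)
    sw = np.sqrt(w)                              # weighted least squares via row scaling
    X = design(t_obs, ln_q)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], ln_c * sw, rcond=None)

    x0 = design(np.array([t0], dtype=float), np.array([np.log(q0)]))
    return float(x0[0] @ beta)


# Synthetic samples (made-up values) spanning 10 years.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 10.0, 400)
q = rng.uniform(1.0, 100.0, size=400)
c = np.exp(0.5 + 0.3 * np.log(q) + 0.2 * np.sin(2 * np.pi * t))
print(np.exp(wrtds_estimate_ln_c(t, q, c, t0=5.0, q0=10.0)))
```

Because the regression is refitted for every estimation point, the coefficients are free to drift over time, season, and discharge, which is the property that distinguishes WRTDS from a single global rating-curve regression.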

Sampling Frequency and Monte Carlo Simulation
To evaluate the LSTM and WRTDS models used in this study for nitrate-N load estimation, data sampling was performed at various frequencies. The accuracy of solute load estimation depends significantly on several factors, including the estimation approach, sampling frequency, and sampling routine [20,34]. Sampling frequencies of 6, 12, and 24 samples per year, which were used by Verma et al. [3], were employed for load estimation. Periodic sampling, which is the collection of data to represent a continuous daily concentration distribution using a sequence of seasonal and discrete values, was performed using these sampling frequencies. The aforementioned sampling frequencies correspond approximately to sampling intervals of 8, 4, and 2 weeks, respectively.
Based on the sampling frequencies, Monte Carlo simulation was performed to draw uniformly distributed random sampling days for the models [3]. When an 8-week sampling interval is used, a random day within the first 8 weeks and another random day within the next 8 weeks are selected as the sampling days on which load estimation will be based, and so on through the record. Based on the sampled data, load estimation is carried out by performing 500 iterations of each model selected for the present study. After all 500 iterations are conducted for the seven river basins, the evaluation criteria are computed by averaging the 500 results derived from the simulations. The Monte Carlo sampling method ensures that a broad range of data is used for the evaluation of the proposed models with regard to estimating nitrate-N loads.
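The sub-sampling scheme described above can be sketched as follows: the record is cut into consecutive blocks of one sampling interval, one day is drawn uniformly at random from each block, and the whole procedure is repeated and averaged. The estimator inside the loop is a placeholder (linear interpolation between sampled days) standing in for the LSTM or WRTDS model, and the synthetic series, function names, and iteration count in the demonstration are assumptions made for the example.

```python
import numpy as np


def subsample_days(n_days, interval_days, rng):
    """Pick one uniformly random day index inside each consecutive interval."""
    starts = np.arange(0, n_days, interval_days)
    ends = np.minimum(starts + interval_days, n_days)   # last block may be short
    return rng.integers(starts, ends)                   # upper bound is exclusive


def monte_carlo_mean_error(daily_values, interval_days, n_iter=500, seed=0):
    """Average RMSE over n_iter random sub-samplings of the daily record."""
    rng = np.random.default_rng(seed)
    values = np.asarray(daily_values, dtype=float)
    days = np.arange(values.size)
    errors = np.empty(n_iter)
    for k in range(n_iter):
        idx = subsample_days(values.size, interval_days, rng)
        # Placeholder estimator (assumption): linear interpolation between samples.
        estimate = np.interp(days, idx, values[idx])
        errors[k] = np.sqrt(np.mean((estimate - values) ** 2))
    return errors.mean()


# Synthetic two-year seasonal concentration record (made-up values).
day = np.arange(2 * 365)
synthetic = 2.5 + np.sin(2 * np.pi * day / 365.0)
for interval in (56, 28, 14):            # roughly 6, 12, and 24 samples per year
    print(interval, monte_carlo_mean_error(synthetic, interval, n_iter=50))
```

Drawing one day per block, rather than sampling days independently across the whole record, keeps the samples spread through the year while still randomizing which flow conditions are captured in each iteration.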

Evaluation Criteria
The performance of the proposed model was evaluated based on the accuracy of its daily nitrate-N load estimates for the seven river basins using a leave-one-out cross-validation (jackknife) approach. This jackknife validation method has been widely used to assess the performance of estimates derived from neural network models [35-38]. The process of jackknife validation involves removing one sample from the database as a test member and calibrating the network model using the remaining data as training members. The calibrated model is then assessed using the test member.
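The leave-one-out procedure can be sketched generically, as below. A straight-line fit is used as a stand-in for the network model, and the function names and example data are hypothetical; the point is only the loop structure, in which each sample is held out once, the model is calibrated on the rest, and the held-out sample is predicted.

```python
import numpy as np


def jackknife_predictions(x, y, fit_fn, predict_fn):
    """Leave-one-out predictions: each y[i] is predicted by a model fitted without it."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    preds = np.empty_like(y)
    for i in range(y.size):
        keep = np.arange(y.size) != i          # training members
        model = fit_fn(x[keep], y[keep])       # calibrate without sample i
        preds[i] = predict_fn(model, x[i])     # assess on the test member
    return preds


# Placeholder model (assumption): ordinary straight-line fit.
def fit_line(x, y):
    return np.polyfit(x, y, deg=1)


def predict_line(coef, x):
    return np.polyval(coef, x)


x_demo = np.arange(10.0)
y_demo = 2.0 * x_demo + 1.0 + np.array([0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 0.1, 0.0, -0.1])
print(jackknife_predictions(x_demo, y_demo, fit_line, predict_line))
```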
The models based on the LSTM and WRTDS approaches were validated using two measures, the relative root mean squared error (rRMSE) and the mean percentage error (MPE). These statistical indices are commonly used for the evaluation of estimates derived from models [1,3,4,36]. The two measures are computed from $n$, the total number of data points used for the analysis; $q_i$, the measured value for day $i$; and $\hat{q}_i$, the estimated value derived from the models for day $i$. The rRMSE can range from zero to large positive numbers, and the MPE can range from large negative to large positive numbers. The optimal value of both rRMSE and MPE is zero.
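As an illustration of how such indices behave, the sketch below implements one common convention for each measure. The normalizations are assumptions, not necessarily those used in this study: rRMSE is taken here as the root mean squared error divided by the mean measured value, and MPE as the mean of the signed relative errors expressed in percent, so that a positive MPE indicates overestimation.

```python
import numpy as np


def rrmse(measured, estimated):
    """Assumed form: RMSE of the estimates divided by the mean measured value."""
    q = np.asarray(measured, dtype=float)
    q_hat = np.asarray(estimated, dtype=float)
    return np.sqrt(np.mean((q_hat - q) ** 2)) / np.mean(q)


def mpe(measured, estimated):
    """Assumed form: mean signed relative error in percent (positive = overestimate)."""
    q = np.asarray(measured, dtype=float)
    q_hat = np.asarray(estimated, dtype=float)
    return 100.0 * np.mean((q_hat - q) / q)


q_obs = np.array([1.0, 2.0, 3.0, 4.0])      # made-up measured loads
q_est = 1.1 * q_obs                          # a uniform 10% overestimate
print(rrmse(q_obs, q_est), mpe(q_obs, q_est))
```

Under these conventions both measures are zero for a perfect estimate, rRMSE is never negative, and MPE keeps its sign, which matches the ranges stated above and explains why MPE is the index that reveals systematic over- or underestimation.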
To identify the differences in the patterns of the rRMSE and MPE metrics presented in Section 4, we investigated the characteristics of and relationships among the discharge and nitrate-N concentrations. For the analysis, a binning approach was used to identify the trend of the original data, such as the nitrate-N concentration, by grouping several continuous values into a smaller number of bins. In this approach, the values of nitrate-N concentrations that fall into a given interval are replaced by the value corresponding to that interval. This method has been applied in previous studies to determine and analyze hydrological and environmental phenomena [39,40].
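A minimal sketch of such a binning analysis is given below. It groups the nitrate-N concentrations into non-overlapping bins that are equally spaced in log space, replaces the concentrations in each bin by the bin midpoint, averages the discharge within each bin, and fits a straight line to the binned points on log-log axes. The number of bins, the equal-log-width binning rule, and the function name are assumptions made for illustration.

```python
import numpy as np


def binned_loglog_slope(discharge, conc, n_bins=10):
    """Slope of log10(concentration) against log10(mean discharge) over concentration bins."""
    q = np.asarray(discharge, dtype=float)
    c = np.asarray(conc, dtype=float)
    log_c = np.log10(c)

    # Non-overlapping concentration bins, equally spaced in log space (assumption).
    edges = np.linspace(log_c.min(), log_c.max(), n_bins + 1)
    which = np.clip(np.digitize(log_c, edges) - 1, 0, n_bins - 1)

    mean_q, bin_c = [], []
    for b in range(n_bins):
        mask = which == b
        if mask.any():                                   # skip empty bins
            mean_q.append(q[mask].mean())                # average discharge in the bin
            bin_c.append(10 ** (0.5 * (edges[b] + edges[b + 1])))  # bin midpoint value

    slope, _intercept = np.polyfit(np.log10(mean_q), np.log10(bin_c), deg=1)
    return slope


# Synthetic basin (made-up values) where concentration rises with discharge.
q_demo = np.logspace(0, 3, 3000)
c_demo = 0.05 * q_demo ** 0.5
print(binned_loglog_slope(q_demo, c_demo))
```

A positive slope from this fit indicates that concentration rises with discharge, and a negative slope indicates dilution at high flow, which is the distinction used later to interpret the differences among basins.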

Evaluation of LSTM Models for Load Estimation
In the present study, an LSTM model was developed to perform nitrate-N load estimation for the analysis of nutrient concentrations for water quality management. The model was evaluated using different sampling frequencies of 6, 12, and 24 per year for seven river basins in the US. These sampling frequencies were adopted for the investigation of nitrate-N load estimation based on a study conducted by Lee et al. [1].
Using the LSTM model, the nitrate-N load estimates were obtained with various numbers of hidden layers for the three sampling frequencies. Table 2 shows the rRMSE and MPE metrics, averaged over 500 iterations, that were used to evaluate the model estimates for each number of hidden layers. The blue font indicates the best performing model for each number of hidden layers. The results presented in Table 2 show that, according to the rRMSE and MPE criteria, the accuracy of the nitrate-N load estimates for the seven river basins under the various sampling frequencies appears to improve as the number of hidden layers increases. The LSTM models with three and four hidden layers perform better than those with one and two hidden layers. Thus, a model with a small number of hidden layers seems to have insufficient complexity to fully represent the system. With regard to the rRMSE metric, the model with four hidden layers showed the best performance under the three sampling frequencies for all seven river basins considered in this study. With regard to the MPE metric, the best performance was exhibited by the model with four hidden layers for the CY, GD, MM, RS, and VM basins under the three sampling frequencies. However, with regard to the same metric, the model with three hidden layers shows the best performance for the GM basin at sampling frequencies of 6 and 12 and for the MS basin at a sampling frequency of 6. Figure 5a shows the rRMSEs for the various numbers of hidden layers, which ranged from one to four, based on the three sampling frequencies for the seven basins. In particular, Figure 5a shows that the model with four hidden layers exhibits relatively good performance with regard to estimating the nitrate-N loads in CY, GD, and GM compared to the other models.
Furthermore, the models with three and four hidden layers exhibit a similar or slightly better performance with regard to the MM, MS, RS, and VM basins. Moreover, Figure 5a shows that the LSTM model with four hidden layers exhibits enhanced performance with regard to the rRMSE criterion for all the studied river basins. Improved nitrate-N load estimation accuracy is obtained for the CY, GD, GM, and MS basins when the sampling frequency increases. In contrast, a relatively poor performance is observed for the MM and RS basins, even with the use of high sampling frequencies such as 12 and 24. Relatively good performance is obtained for the VM basin when sampling frequencies of 6 and 12 are used. These results are discussed in Section 5 by examining the characteristics of discharge and nitrate-N concentrations.

Figure 5b shows the MPEs corresponding to the river basins analyzed in this study for the model with four hidden layers with sampling frequencies of 6, 12, and 24. Similar to the rRMSE criterion, the MPE metric also indicates that the model with four hidden layers exhibits better performance than the models with other numbers of layers, with regard to nitrate-N load estimation for the CY, GD, and GM basins. The models with three or four hidden layers demonstrate a similar or slightly better performance for the other river basins, namely the MM, MS, RS, and VM basins. Based on the results pertaining to the MPE criterion, we determined that enhanced performance is obtained for the CY, GD, GM, and MS basins when the sampling frequency of 24 is used, whereas the use of the sampling frequency of 6 or 12 leads to better performance with regard to the MM, RS, and VM basins. These patterns, which may be caused by features of discharge and nitrate-N concentration in the river basins, were also observed via analysis of the rRMSE metric.
Figure 6 presents the relationship between discharge and nitrate-N concentration for the seven river basins. In Figure 6, the upper panel shows the relationship between discharge and nitrate-N concentration via a log-log plot, and the lower panel shows a log-log plot of nitrate-N concentration against discharge, based on the binning method, for each basin. As shown in the lower panel, we determined the slope based on the average discharge values estimated from non-overlapping bins of nitrate-N concentrations. A regression line was fitted to the plot in the log-log scale and is shown as a dashed line to validate the significance of the observed slopes. The slopes corresponding to the CY, GD, GM, MM, MS, RS, and VM basins are −0.35, 0.01, −0.05, 0.50, 0.44, 0.50, and 0.54, respectively. The CY basin has the largest negative slope of −0.35, demonstrating a different behavior compared to that of the other river basins. The large negative slope implies that nitrate-N concentration increases as the discharge decreases. This result is expected because the CY basin is characterized by a highly impervious watershed with a large urban area, which affects the nitrate-N concentration [41,42]. In contrast, the MM, RS, and VM basins exhibit large positive slopes, ranging from 0.50 to 0.54. A large positive slope indicates that the nitrate-N concentration increases with an increase in the discharge values. For these basins, the performance of the model, with regard to the rRMSE and MPE indices, decreases when the sampling frequency increases. Slopes of 0.50 or higher thus appear to influence nitrate-N load estimation by decreasing the accuracy of the model when high sampling frequencies are employed.

Comparison of LSTM and WRTDS Models for Nitrate-N Load Estimation
To validate the proposed model with regard to the estimation of nitrate-N loads, the WRTDS model was applied to estimate the nitrate-N loads of the seven river basins based on the sampling frequencies of 6, 12, and 24. Hirsch et al. [14] suggested the use of the WRTDS model to estimate daily nitrate-N concentrations, and Kandel and Bhattarai [43] compared various methods, including the WRTDS technique, for predicting nitrate-N loads. The rRMSE and MPE metrics for the nitrate-N load estimates obtained with the LSTM and WRTDS models are shown in Table 3. For the LSTM model, four hidden layers were used when comparing its nitrate-N load estimates with those of the WRTDS model. The blue font shows the best performing method. The average loads estimated using the LSTM and WRTDS models for the sampling frequencies of 6, 12, and 24 are shown in Figure 7. From the figure, we can observe that the estimation error and the bias seem to increase with the load, particularly when the load is substantially large. For the CY basin, the LSTM model tends to overestimate the load, whereas the WRTDS model tends to underestimate it. In contrast, for the VM basin, the load estimated using the LSTM model seems to be underestimated, whereas that estimated with WRTDS seems to be overestimated. The two models show similar estimation performance for the other basins.
Furthermore, box plots were obtained for each river basin based on the rRMSE criterion for the analysis of nitrate-N load estimates using the LSTM and WRTDS models. Figure 8 presents the box plots for the rRMSE of nitrate-N load estimation with three sampling frequencies. In Figure 8, the centerline of each box plot shows the median value for the estimation, and the top and bottom of each box represent the 75th and 25th percentiles. The outliers are shown as cross symbols. Figure 8 shows that the LSTM model provides better nitrate-N load estimates compared to the WRTDS model for the sampling frequencies of 6, 12, and 24 in the cases of the CY, GD, GM, and MS basins. For the MM and RS basins, the LSTM model also performs better than the WRTDS model at a sampling frequency of 6. For the VM basin, the LSTM model performs better than the WRTDS model at sampling frequencies of 6 and 12. As with the LSTM model, the WRTDS model shows worse performance when the sampling frequency increases in the cases of the MM and VM basins. This observation may result from the slope of the average discharge obtained using the binning method, as discussed in Section 4.1.
Furthermore, box plots were obtained for each river basin based on the rRMSE criterion for the analysis of nitrate-N load estimates using the LSTM and WRTDS models. Figure 8 presents the box plots for the rRMSE of nitrate-N load estimation with three sampling frequencies. In Figure 8, the centerline of each box plot shows the median value for the estimation, and the top and bottom of each plot represent the 75th and 25th percentiles. The outliers are shown as cross symbols. Figure 8 shows that the LSTM model provides better nitrate-N load estimates compared to the WRTDS model for the  The analysis of the box plots based on the MPE was performed for each river basin considering the three sampling frequencies. Figure 9 shows the box plots for the MPE criterion of nitration load estimation derived from the LSTM and WRTDS models. The LSTM model performance in the cases of the CY, GD, GM, and MS basins was enhanced when the sampling frequency increased; this was also observed by examining the rRMSE. According to the MPE metric, the LSTM model with a sampling frequency of 6 showed better performance compared to that of the WRTDS model with regard to the GD, GM, MM, MS, and VM basins. The LSTM model provided better estimates compared to the WRTDS model in the cases of the GD, GM, MS, and VM basins for a sampling frequency of 12, and in the cases of the GD, GM, and MS basins for a sampling frequency of 24. The analysis of the load estimates shows that the LSTM model may provide enhanced load estimates when the sampling frequency is increased in the cases of the CY, GD, GM, and MS basins. Furthermore, the improved estimations can be used for water quality management. The analysis of the box plots based on the MPE was performed for each river basin considering the three sampling frequencies. Figure 9 shows the box plots for the MPE criterion of nitration load estimation derived from the LSTM and WRTDS models. 
The LSTM model performance in the cases of the CY, GD, GM, and MS basins was enhanced when the sampling frequency increased; this was also observed by examining the rRMSE. According to the MPE metric, the LSTM model with a sampling frequency of 6 showed better performance compared to that of the WRTDS model with regard to the GD, GM, MM, MS, and VM basins. The LSTM model provided better estimates compared to the WRTDS model in the cases of the GD, GM, MS, and VM basins for a sampling frequency of 12, and in the cases of the GD, GM, and MS basins for a sampling frequency of 24. The analysis of the load estimates shows that the LSTM model may provide enhanced load estimates when the sampling frequency is increased in the cases of the CY, GD, GM, and MS basins. Furthermore, the improved estimations can be used for water quality management.
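The quantities drawn in such box plots (median, 25th and 75th percentiles, and outliers) can be sketched as follows. The 1.5 × IQR rule used here to flag outliers is a common plotting convention and an assumption, since the outlier criterion is not stated in the text; the function name and the rRMSE values are hypothetical.

```python
import statistics

def box_stats(values):
    """Median, quartiles, and 1.5*IQR outliers for one box of a box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Values beyond the fences are the points drawn as separate symbols.
    outliers = [v for v in values if v < lower or v > upper]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}

# Hypothetical rRMSE values (%) from repeated sub-sampling at one frequency.
rrmse_values = [10.0, 12.0, 11.0, 13.0, 12.0, 40.0, 11.0]
print(box_stats(rrmse_values))
```

A lower median together with a narrower box indicates estimates that are both more accurate and more stable across the sub-sampling replicates.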

Discussion
The analysis of nitrate-N load estimation based on the LSTM and WRTDS models was performed for the seven river basins. The models tend to provide improved load estimates when the sampling frequency increases, except for the MM, RS, and VM basins. This may be because hydrological characteristics such as the amount of discharge affect the ability of the model to obtain an accurate load estimation. A correlation analysis was performed between discharge and nitrate-N concentration and between discharge and nitrate-N load for the river basins considered in this study. In Figure 10a, the correlation coefficients between the discharge and nitrate-N concentration range from -0.584 to 0.523, while the R-squared value ranges from 0.007 to 0.341. Figure 10b shows that the correlation coefficients between the discharge and nitrate-N load for the seven basins range from 0.776 to 0.919, and the R-squared value ranges from 0.601 to 0.844. The red line in Figure 10 is the linear regression line; it shows the trend of the nitrate-N concentration and load with respect to the discharge.
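The correlation analysis can be illustrated with a short Python sketch. For a linear regression with a single predictor, the R-squared value equals the square of the Pearson correlation coefficient, which is consistent with the reported ranges (for example, 0.584² ≈ 0.341). The units (mg/L and m³/s), the function names, and the data below are assumptions made for illustration only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def daily_load_kg(conc_mg_per_l, discharge_m3_per_s):
    """Daily load in kg/day.

    mg/L * m^3/s = g/s; 86400 s/day divided by 1000 g/kg gives 86.4.
    """
    return conc_mg_per_l * discharge_m3_per_s * 86.4

# Hypothetical daily discharge (m^3/s) and nitrate-N concentration (mg/L).
discharge = [50.0, 120.0, 300.0, 800.0, 1500.0]
concentration = [2.1, 1.8, 2.6, 1.5, 1.9]
load = [daily_load_kg(c, q) for c, q in zip(concentration, discharge)]

r_conc = pearson_r(discharge, concentration)
r_load = pearson_r(discharge, load)
# With one predictor, R-squared of the regression line equals r squared.
print(round(r_conc, 3), round(r_conc ** 2, 3))
print(round(r_load, 3), round(r_load ** 2, 3))
```

Because the load is the product of concentration and discharge, it is expected to correlate with discharge more strongly than the concentration does, as in Figure 10b.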

Table 4 presents statistical estimates for the average discharge values obtained by binning the nitrate-N concentrations. The maximum values of discharge range from 508.59 to 10,900.00 m³/s, and the minimum values range from 14.20 to 1507.74 m³/s. The standard deviations range from 14.62 to 2254.02 m³/s.
A large slope, for which the corresponding lowest discharge is below 1000 m³/s, affects the nitrate-N load estimates of the proposed LSTM model. Relatively worse performance is obtained for the MM, RS, and VM basins when large sampling frequencies are employed, with the related discharge values being lower than 1000 m³/s. The discharge characteristics and nitrate-N concentration also affect nitrate-N load estimation in the current study. These specific characteristics may limit the applicability of the LSTM approach. Therefore, the model may be improved using a combination of relevant features of the nitrate-N concentration and discharge based on canonical correlation analysis [4]. Several studies have been conducted to evaluate the effects of hydrological and biogeochemical processes on the relationship between nitrate-N concentration and discharge [44][45][46][47]. Duncan et al. [45] found that hydrological variability on a seasonal scale affects the nitrate-N concentration in streams, and that the slope of the relationship between discharge and nitrate-N concentration in a wet year differs from that in a dry year, which is typically characterized by low discharge. In this study, the nitrate-N load estimates tend to be sensitive to low discharge. Similarly, the concentration-discharge relationship analyzed by Duncan et al. [45] was found to be highly dependent on wetness. Future research should focus on sensitivity analysis based on different characteristics of this relationship with regard to nitrate-N load estimation.
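A rough sketch of a binning analysis on a log-log scale is given below. The binning procedure is not restated in this section, so the equal-count bins ordered by concentration, the least-squares slope, the function names, and the data are all assumptions made for illustration and are not the procedure used to produce Table 4.

```python
import math

def bin_means(concentration, discharge, n_bins):
    """Sort samples by concentration, split them into n_bins groups of
    near-equal size, and return (mean concentration, mean discharge)
    for each non-empty group."""
    pairs = sorted(zip(concentration, discharge))
    size = len(pairs) / n_bins
    means = []
    for k in range(n_bins):
        chunk = pairs[round(k * size):round((k + 1) * size)]
        if chunk:
            mc = sum(c for c, _ in chunk) / len(chunk)
            mq = sum(q for _, q in chunk) / len(chunk)
            means.append((mc, mq))
    return means

def loglog_slope(points):
    """Least-squares slope of log10(concentration) on log10(discharge)."""
    xs = [math.log10(q) for _, q in points]
    ys = [math.log10(c) for c, _ in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical samples: nitrate-N concentration (mg/L), discharge (m^3/s).
concentration = [0.8, 1.1, 1.5, 2.0, 2.4, 3.1, 3.9, 4.6]
discharge = [40.0, 65.0, 90.0, 150.0, 210.0, 380.0, 600.0, 950.0]
binned = bin_means(concentration, discharge, n_bins=4)
print(round(loglog_slope(binned), 3))
```

Comparing such slopes across basins, together with the lowest binned discharge, is one way to examine why the estimates for low-discharge basins behave differently.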

Conclusions
In this study, LSTM was employed for the estimation of nitrate-N loads. The estimated loads can be used to control nutrient enrichment and water pollutants to improve water quality in river basins. The proposed LSTM model was designed based on long-term data records of discharge and nitrate-N concentration in seven river basins in the United States. Monte Carlo sub-sampling with periodic sampling frequencies of 6, 12, and 24 was applied using a uniformly distributed random variable. The proposed model was evaluated using the rRMSE and MPE statistical indices, and its performance was compared with that of the WRTDS model.
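One possible reading of the sub-sampling step is sketched below: the record is divided into equal-length intervals, and one sampling day is drawn uniformly at random within each interval, which keeps the sampling approximately periodic. This scheme, the interpretation of the sampling frequency as the number of samples per year, and the function name are assumptions and may differ from the procedure actually used.

```python
import random

def subsample_days(n_days, n_samples, rng):
    """Pick n_samples day indices from a record of n_days.

    The record is cut into n_samples equal-length intervals and one day is
    drawn uniformly at random inside each interval (assumed scheme).
    """
    width = n_days / n_samples
    days = []
    for k in range(n_samples):
        start = int(k * width)
        stop = max(start + 1, int((k + 1) * width))
        days.append(rng.randrange(start, stop))
    return days

rng = random.Random(0)  # fixed seed so the sketch is reproducible
for frequency in (6, 12, 24):
    replicate = subsample_days(365, frequency, rng)
    print(frequency, replicate)
```

Repeating the draw many times for each frequency yields the replicate data sets from which the distributions of rRMSE and MPE can be summarized.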
The appropriate number of hidden layers in the LSTM model was determined to enhance model performance. The statistical metrics showed that the use of three or four hidden layers provided good nitrate-N load estimates. Relatively good performance was obtained using four hidden layers in the cases of the CY, GD, and GM basins, whereas a similar or better estimation was obtained with three or four hidden layers in the cases of the MM, MS, RS, and VM basins. Accordingly, four hidden layers were used in this study to estimate the nitrate-N loads. With four hidden layers, the performance of the LSTM model improved as the sampling frequency increased, except for the MM, RS, and VM basins. For these three basins, better nitrate-N load estimates were obtained using a sampling frequency of 6 than using sampling frequencies of 12 and 24. These differences may be caused by the characteristics of the discharge and nitrate-N concentration.
To evaluate the proposed LSTM model, the WRTDS model was applied for obtaining nitrate-N load estimates for the seven river basins. The rRMSE and MPE were represented using box plots for the three sampling frequencies. With regard to the rRMSE, the LSTM model in the cases of the CY, GD, GM, and MS basins provided better estimates compared to the WRTDS model for all three sampling frequencies. Considering the MM, RS, and VM basins, the proposed model exhibited a better performance compared to that of WRTDS, based on the rRMSE, with sampling frequencies of 6 or 12. With regard to the MPE, the LSTM model produced better estimates using a sampling frequency of 6 compared to the WRTDS model in the cases of the GD, GM, MM, MS, and VM basins. Furthermore, the proposed model showed better estimates compared to the WRTDS model in the cases of the GD, GM, MS, and VM basins when using sampling frequencies of 12 or 24. Although the LSTM model employing any of the three sampling frequencies may not be applicable to all river basins with regard to estimating nitrate-N loads, reasonable estimates were obtained for most river basins used in this study. The results of this study show that the LSTM model seems to have excellent potential for being applied to solve environmental issues by reducing water pollutants in river basins.
Future work should focus on the extension of nitrate-N load estimation via LSTM to other river basins in various environments. For example, the nine large tributaries of Chesapeake Bay, analyzed by Hirsch et al. [12], can be explored using the proposed model to reduce nutrient enrichment. Future research should also investigate other aspects of the relationship between nitrate-N concentration and discharge in various river basins. Extending the set of significant input variables in the model may further improve the nitrate-N load estimates.