Review of Nitrogen Compounds Prediction in Water Bodies Using Artificial Neural Networks and Other Models

Abstract: The prediction of nitrogen not only assists in monitoring the nitrogen concentration in streams but also helps in optimizing the use of fertilizers in agricultural fields. An accurate prediction model helps ensure the delivery of better-quality water for human use, as the operation of water treatment plants depends on the concentration of nitrogen in streams. Considering the stochastic nature of nitrogen concentration and the various hydrological variables upon which it depends, a predictive model should be able to account for these natural complexities. For two decades, artificial neural networks (ANNs) and other models (such as the autoregressive integrated moving average (ARIMA) model and hybrid models), used for predicting different complex hydrological parameters, have proved efficient and accurate up to a certain extent. In this review paper, such models, created for predicting nitrogen concentration, are critically analyzed by comparing their accuracy and input variables. Moreover, future research aiming to predict nitrogen using advanced techniques and more reliable and appropriate input variables is also discussed.


Introduction
Human activities have seriously affected the nutrient cycle, the ecological functioning of streams, and water quality [1][2][3]. Presently, agricultural production depends heavily on the amount of fertilizers and pesticides used. Fertilizers contain mainly nitrogen, compared with other chemicals. Crops require nitrogen for their growth and for the production of fruits or grains. Some agricultural specialists have also recommended using fertilizers that carry a higher percentage of nitrogen [4]. However, only 40-70% of the nitrogen compounds applied as fertilizers are absorbed by the crops. The remaining nitrogen compounds either percolate downward with water to join groundwater or flow along with the runoff water to join the streams [5,6]. In both cases, the nitrogen concentration in water rises, which can affect human health [7][8][9]. If pesticides and fertilizers are applied to the fields at a high rate, nitrate is more likely to percolate to the aquifer, increasing the nitrate level in groundwater [10][11][12]. In warmer countries, the loss of total nitrogen is greater, as the mineralization rate is probably higher due to the higher temperature; thus, the percolation of total nitrogen is increased [13].
The major proportion of the surplus nitrogen is transported by the runoff water to the streams, and consequently, the concentrations of nitrogen compounds such as ammonia-nitrogen, nitrite, and nitrate rise in the streams. A surfeit of nitrogen in streams is harmful to both human beings and aquatic life. In water bodies, it may lead to the excessive growth of aquatic plants and algae, which can deplete dissolved oxygen and hinder the contact of water with air and light. The presence of such excess nitrogen in drinking water reduces the amount of oxygen transported in the blood [5]. Most treatment plants are not designed for the full removal of nitrogen compounds from river water. In China, sewage treatment systems remove 40-70% of total nitrogen [14,15]. In Malaysia, sewage treatment plants are not designed for ammonia removal [16]. Recently, several water treatment plants in Malaysia have been forced to shut down after sample testing showed that ammonia-nitrogen pollution had crossed the acceptable limit in different rivers. The abrupt closure of a water treatment plant affects the water supply to consumers, thus putting additional pressure on the government to arrange an alternate source of water supply.
The lack of monitoring systems allows abrupt increases in pollution to go undetected, which can result in the closure of water treatment plants. A monitoring system should include a proper predictive system, which works on the basis of historical data, and a treatment system, which deals with the nitrogen pollutant; both should be developed in treatment plants. Predictive systems could provide daily pollutant data and thus save the daily effort of quantifying such data in the laboratory. Moreover, predictive systems would raise an alert for a nitrogen surge in rivers before it actually happens. Hence, the government would have ample time to optimize various nitrogen inputs to the rivers. Each river basin requires a separate predictive model trained on historical data of that basin's parameters, because a model well trained on the historical data of one basin will not necessarily perform with the same accuracy on other basins. Additionally, to account for upcoming seasonal changes, the predictive models need to be re-trained with real-time data on a quarterly or yearly basis. Given the increasing nitrogen pollution in rivers, this topic is important to evaluate.
The primary objective of this study is to classify the different types of ANN used for predicting nitrogen content in streams in rivers all around the world. Furthermore, the states of different rivers in the world were also evaluated, which points to the scope of future research work. This review paper also highlights the prediction accuracy and reliability, the parameters and methods used for prediction, and the details of the ANNs in the different models used for nitrogen prediction. This review paper is intended to be useful for researchers working on ANN modeling, for those modeling nitrogen compound pollution, and for those seeking information about nitrogen pollution levels in water bodies. The articles cited in this review are those published in reputable journals.

Nitrogen Sources in Streams
Nitrogen is a vital element for plants, as it supports their growth and productivity. Nitrogen present as N2 in the atmosphere cannot be utilized directly by plants until it is converted to reactive nitrogen compounds [39]. This conversion is naturally done by bacteria present in the soil and in the root nodules of legume crops. Additionally, nitrogen compounds are provided to the soil in the form of fertilizers. Nitrate is the main constituent of fertilizers, but ammonia, ammonium, urea, and amines are also present in minor proportions. Nowadays, fertilizers contain a higher percentage of nitrogen compounds in order to boost agricultural productivity.
In addition, the landscapes of the farmlands have been modified extensively. Farmlands are now designed to drain off the excess rainwater or irrigation water [40]. This drained water is rich in nitrogen compounds, which had been applied to the field for crop nourishment. The drained water then joins either running rivers or still water bodies such as lakes, leading to a surfeit of nitrogen entering the water system.
Sources of nitrogen to streams are not confined to agricultural fields. Industries and municipal and residential areas also contribute nitrogen compounds to streams. Comprehensively, the sources of nitrogen are classified into two:

a. Point Sources
A point source of nitrogen pollution is any single identifiable source of nitrogen pollution into rivers. Point sources include industries and municipal sewage treatment plants [15,41,42]. In urban areas, the contribution of nitrogen from point sources is dominant. Industries and municipal sewage treatment plants deliver more than 50% of the total nitrogen in rivers [39].

b. Non-Point Sources
Non-point sources are sources of nitrogen pollution whose specific locations of input to rivers are not defined. They mainly consist of agricultural fields and atmospheric and biological nitrogen fixation [15,41,42]. In rural areas, the contribution by non-point sources is dominant. In different regions of rural areas, different parts of non-point sources contribute major amounts of nitrogen in streams; for example, in farming regions, agricultural fields provide significant nitrogen to the streams, and in the regions of rivers surrounded by dense forests, atmospheric nitrogen deposition dominates [39].

Effects of Nitrogen
Nitrogen, if present in river water, causes different disorders, which are harmful to both humans and aquatic animals. Nitrogen in streams is mainly found in three compound states: ammonia, nitrate, and nitrite. Some of the ammonia present in river water is converted to nitrate, depending on the dissolved oxygen concentration in the water [43]. Nitrate is not very harmful by itself, but when present in surplus amounts, it starts converting into nitrite, which is very harmful even in minute concentrations. The Environmental Protection Agency has set standards which state that, for water that is to be distributed for public use, the maximum acceptable nitrate concentration is 10 mg/L [5,25] and that for nitrite is 1 mg/L.
There are two major effects of ammonia on the whole ecosystem: eutrophication of marine and terrestrial ecosystems [44,45] and an increase in the acidity of water bodies [46]. Excessive nutrients such as nitrogen and phosphorus in water bodies lead to the growth of algae on the water surface; this process is termed eutrophication. Excess algae cover the whole water surface, blocking the water's contact with sunlight and air. Additionally, the algae growth decreases the oxygen level in the water body, which affects aquatic life. Stream eutrophication was recognized as a major problem years ago, and the United States along with other countries commenced nutrient control measures in rivers [47,48].
Streams may become acidified due to the presence of surplus ammonia. The most common form of ammonia, ammonium sulphate, leads to the formation of a considerable amount of acid, as hydrogen ions are released during nitrification. Additionally, nitrite ions present in the streams, along with sulfate ions, lead to the formation of nitric acid under different situations, consequently acidifying the stream water [49]. Acidic stream water is not suitable for reuse to satisfy human water requirements. As stated by Gündüz [50], one day, reuse of treated water would be a reality for the rural population, and this would result in serious problems such as human health issues. Compared with urban areas, agricultural areas are more susceptible to health risks from the presence of nitrate-nitrogen in groundwater [51,52].
Nitrite has been found to be more toxic than nitrate and, if present in drinking water, can cause human health problems such as liver damage and, in the worst cases, can lead to various types of cancer [53] and two types of birth defects [54,55]. Nitrite present in surplus quantity in drinking water lowers the ability of the bloodstream to carry oxygen, leading to a lack of oxygen in the body. Infants and young livestock are severely affected, as this causes "blue baby syndrome" [53]. The reaction of nitrites with amines, either enzymatically or chemically, leads to the formation of potent carcinogenic nitrosamines [53,56].
Consumption of nitrates leads to various tumors in the human body [53,57]. In the digestive system, nitrate leads to the formation of N-nitroso compounds [53,58], which are considered to be carcinogenic. Iodine uptake can be restricted by nitrates, causing thyroid-related problems [53].

ANN
ANN is a black-box computational model [59] made up of interconnected nodes that pass values to one another along the connections. It contains an input layer, hidden layers as required, and an output layer. It is well known for its capability of predicting non-linear variables [60]. ANN mimics the structure of neurons in the human brain [6,20,61]. It functions like a biological neuron, receiving the input as a stimulus, evaluating the stimulus, and then providing the output as the response to the stimulus. Figure 1 represents a simple example of a neural network. The inputs are fed to the nodes in the input layer, and those nodes pass the values of the input data to the nodes in hidden layer 1 via interconnecting links. As the values are passed from the input nodes to the following nodes, they are multiplied by the weights and then passed to the corresponding layer through a transfer function [62]. Likewise, they are passed up to the output layer, where the error is calculated using the target vector. Based on this error, the weights are adjusted to obtain the weighted combination of the input data that best forecasts the target vector. The major advantage of the ANN model over a traditional model, such as a statistical model, is that it learns the complexity of nature by itself, without that complexity being explicitly transformed into mathematical form [63,64]. Statistical models have the limitation of assuming additional information to derive a sharp conclusion [65]. The major disadvantage of ANN is that it is susceptible to overfitting. Overfitting is the state in training beyond which the training error keeps decreasing but the model starts losing its ability to generalize the relation between input and output to a new data set, i.e., the testing set. This increases the testing error and decreases the overall performance of the model.
There are several ways to prevent the model from overfitting, among which a well-known method is early stopping, in which the training process is stopped before overfitting sets in. However, if the training is stopped too early, the model fails to learn important information. Hence, training should be stopped at the point where the model has learned the important information but has not yet begun to overfit.
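The training procedure and early stopping described above can be sketched in a few lines. This is a minimal illustration, not a method taken from any of the reviewed studies: the synthetic data, the layer sizes, the learning rate, and the `patience` value are all assumptions chosen only to make the example run.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, params):
    """Pass inputs through one sigmoid hidden layer and a linear output node."""
    w_h, b_h, w_o, b_o = params
    hidden = sigmoid(x @ w_h + b_h)  # weighted inputs through the transfer function
    return hidden, hidden @ w_o + b_o

def mse(x, target, params):
    return float(np.mean((forward(x, params)[1] - target) ** 2))

def train_step(x, target, params, lr=0.1):
    """Adjust the weights once using the error against the target vector.

    The gradients are those of 0.5 * mean squared error.
    """
    w_h, b_h, w_o, b_o = params
    hidden, output = forward(x, params)
    error = output - target
    n = x.shape[0]
    delta_h = (error @ w_o.T) * hidden * (1.0 - hidden)
    return (
        w_h - lr * (x.T @ delta_h) / n,
        b_h - lr * delta_h.mean(axis=0),
        w_o - lr * (hidden.T @ error) / n,
        b_o - lr * error.mean(axis=0),
    )

# Synthetic data (illustrative only, not from any cited study).
rng = np.random.default_rng(0)
x = rng.normal(size=(120, 3))
y = np.sin(x[:, :1]) + 0.5 * x[:, 1:2] + rng.normal(scale=0.1, size=(120, 1))
x_train, y_train = x[:80], y[:80]
x_val, y_val = x[80:], y[80:]

params = (rng.normal(scale=0.5, size=(3, 5)), np.zeros(5),
          rng.normal(scale=0.5, size=(5, 1)), np.zeros(1))

initial_val = mse(x_val, y_val, params)
best_params, best_val = params, initial_val
patience, waited, epochs_run = 25, 0, 0  # patience is an assumed setting
train_losses = [mse(x_train, y_train, params)]

for epoch in range(2000):
    params = train_step(x_train, y_train, params)
    epochs_run = epoch + 1
    train_losses.append(mse(x_train, y_train, params))
    val = mse(x_val, y_val, params)
    if val < best_val:
        best_params, best_val, waited = params, val, 0
    else:
        waited += 1
        if waited >= patience:  # validation error stopped improving
            break
```

The weights kept at the end are `best_params`, those with the lowest validation error, rather than the weights from the final epoch. A larger `patience` trains longer before stopping, which reduces the risk of stopping too early at the cost of more computation.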
Many types of ANNs feature different concepts of data processing. Each type is designed differently to obtain a more precise output with less data processing time. This is achieved by changing the network's architecture. According to Jain et al. [66], based on the network connection pattern, i.e., their architecture, ANN is classified into two categories:

a. Feed-Forward Neural Networks (FFNNs)
FFNN has the simplest network connection pattern, in which data flow in the forward direction only, starting from the input layer to the hidden layers, and then to the output layer. No loops are formed in the paths of the data flow. As shown in Figure 2, FFNN is classified into three subcomponents: single-layer perceptron, multilayer perceptron, and radial basis function neural network (RBFNN). The single-layer perceptron, which consists of one layer, i.e., the output layer, is the simplest form of neural network. It is mainly used for classifying linearly separable cases that use binary targets. The connection patterns of the multilayer perceptron and RBFNN are the same: an input layer, as many hidden layers as required, and an output layer. The only difference between the two is the data processing function used. The multilayer perceptron utilizes either a threshold function or a sigmoidal function [67] in each of its computational units, whereas RBFNN utilizes a radial basis function as the activation function in each unit of its hidden layers. Table 1 presents the advantages and disadvantages of different models of FFNN. These models are generally used for time series prediction, system control, and data classification.

b. Recurrent or Feedback Neural Networks
Recurrent or feedback neural networks experience the backward flow of data in some computational cells. The data flow is not unidirectional; loops within the cells transfer back the feedback of the errors encountered in computations, with reference to the target values. The feedback of errors helps in updating the weights of the corresponding inputs. As shown in Figure 2, feedback neural network is classified into four subcomponents: adaptive resonance theory model, Hopfield networks, Kohonen's networks, and competitive networks. Table 1 presents their advantages and disadvantages. These networks form very complex architectures, composed of a number of loops. These networks are utilized for complex computations, such as speech recognition, image processing, robotics, and process controls. This study is limited to the review of the FFNN.
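Since the review focuses on FFNNs, the difference between a multilayer perceptron unit and an RBFNN hidden unit can be illustrated briefly. The sketch below assumes a Gaussian radial basis function, which is a common choice but is not specified in the text; the input points, weights, center, and spread are illustrative values.

```python
import numpy as np

def sigmoid_unit(x, weights, bias):
    """Multilayer perceptron unit: sigmoid of a weighted sum of the inputs."""
    return 1.0 / (1.0 + np.exp(-(x @ weights + bias)))

def rbf_unit(x, center, spread):
    """RBF unit: response depends only on distance from a center (Gaussian form assumed)."""
    dist_sq = np.sum((x - center) ** 2, axis=-1)
    return np.exp(-dist_sq / (2.0 * spread ** 2))

# Three illustrative input points in two dimensions.
x = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]])
weights = np.array([1.0, 1.0])
center = np.array([1.0, 1.0])

mlp_out = sigmoid_unit(x, weights, bias=0.0)
rbf_out = rbf_unit(x, center, spread=1.0)
```

The sigmoid unit responds monotonically along the direction of its weight vector, whereas the RBF unit responds most strongly at its center and decays with distance in every direction.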

Hybrid Model
A hybrid model is a combination of different models to solve a computational task. The need for hybridization arose when learning models were observed to be very efficient in some cases and inefficient in most cases [68]. The main aim of hybridization is to resolve the limitations of an individual model by fusing decision-making models with learning models [69]. The main advantage of a hybrid model is that it provides better results than the standalone model. The decision-making model integrated in the hybrid model provides a good start by selecting initial values of the internal parameters of the learning model, hence increasing the productivity of the learning model. The disadvantages of hybrid models are that the overall training process is time consuming and that the complex architecture and training require modern computational resources. Some examples of hybrid models are [70]:
• ANN and genetic algorithm
• ANN and fruit fly optimization algorithm
• ANN and firefly algorithm
• ANN and artificial immune systems
• ANN and particle swarm optimization algorithm
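The idea of a search procedure supplying good initial parameter values to a learning model can be sketched as follows. This is an illustration under stated assumptions, not any of the hybrid models listed above: a single linear neuron stands in for the ANN, a simple selection-plus-mutation search (without crossover) stands in for a genetic algorithm, and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative only).
x = rng.normal(size=(80, 2))
y = 1.5 * x[:, 0] - 0.7 * x[:, 1] + 0.3

def loss(w):
    """Mean squared error of a linear neuron with weights w[:2] and bias w[2]."""
    return float(np.mean((x @ w[:2] + w[2] - y) ** 2))

def evolutionary_init(pop_size=20, generations=10, sigma=0.5):
    """Search step: keep the better half of the population and mutate it."""
    population = rng.normal(scale=2.0, size=(pop_size, 3))
    for _ in range(generations):
        scores = np.array([loss(w) for w in population])
        parents = population[np.argsort(scores)[: pop_size // 2]]
        children = parents + rng.normal(scale=sigma, size=parents.shape)
        population = np.vstack([parents, children])
    scores = np.array([loss(w) for w in population])
    return population[np.argmin(scores)]

def gradient_descent(w, lr=0.05, steps=200):
    """Learning step: refine the weights starting from the supplied initial values."""
    for _ in range(steps):
        err = x @ w[:2] + w[2] - y
        grad = np.append(2.0 * x.T @ err / len(y), 2.0 * err.mean())
        w = w - lr * grad
    return w

w_start = evolutionary_init()
w_final = gradient_descent(w_start)
```

Here the search step only chooses the starting point and gradient descent does the remaining fitting. The cost is the extra loss evaluations spent on the population, which corresponds to the longer training time noted above.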

Methods and Evaluation
This study is based on nitrogen compounds prediction in water bodies using ANN and other predictive models. In the section 'Application of ANN', the authors first analyze the sources of data collection, the methods used, the internal parameters of the predictive models, and then the final results of previous research works in the literature. On the basis of this analysis, the authors recommend various steps to be followed in future studies for achieving more accurate models.
Following [71], the authors of this study used relevant search engines such as Google Scholar and Science Direct. Additionally, the authors of [72] concluded in their study that Google Scholar is the most comprehensive source. While searching for relevant research works, the following keywords were used: nitrogen compounds prediction, use of ANN in nitrogen prediction, and nitrogen prediction in water bodies.

Nitrogen Monitoring
More than 60% of the world's rivers are affected by pollution [43], from point sources or non-point sources. Wastes generated by industrial, municipal, and agricultural activities are discharged into the rivers and pollute them [43,73]. Over time, human activities have raised the concentration of nitrogen species in water bodies. Nitrate concentrations in many European rivers have surged 5- to 10-fold since the 20th century [39]. In Malaysia, because of excessive chemical pollution in rivers, more than one of the nine water treatment plants in the Langat River basin were closed several times between 2012 and 2015 [41]. According to the Selangor Water Management Authority, Malaysia, between 2012 and 2015, the ammonia concentration level in the Langat River exceeded 7.0 mg/L, which led to the repeated closure of many water treatment plants during the period [41]. Moreover, in the Johor River basin, nearly five treatment plants were repeatedly closed between 2017 and 2019 due to the high concentration of ammonia in the Johor River [74][75][76].
There is no specific standard set for ammonia discharge in water bodies, but different agencies have provided separate guidelines for ammonia concentration in water bodies. The "Canadian Water Quality Guidelines for the Protection of Aquatic Lives" [77] state that the guideline value for unionized ammonia discharge in freshwater is a concentration of 0.019 mg/L. The guidelines for drinking water quality (2003) published by WHO state that natural levels of ammonia in groundwater are usually below 0.2 mg/L, and that this level may go up to 12 mg/L for surface waters.
For analyzing nitrate variations, Rekacewicz [76] designed a map, shown in Figure 3, that considers river data at the continental level and represents the concentration of nitrate-nitrogen in streams at various locations around the world. Rekacewicz [76] compared the data of two decades and observed that rivers in North America and Europe were fairly stable, but those of south-central Asia and southeast Asia showed high nitrate concentrations.
Furthermore, Basheer et al. [78] studied the water quality of the Langat River in Malaysia. They utilized 10 samples from different locations to quantify different water quality parameters. Their results showed that the pH range for the Langat River was between 5.91 and 6.79. The average value of ammonia for the Langat River was measured to be 0.24 mg/L. The total ammonia-nitrogen amounts added to the Langat River from point and non-point sources were calculated to be 9.51 ton/day and 12.67 ton/day, respectively [41,79], as displayed in Figure 4.
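The relative contribution of each source type follows directly from the two loads reported above. The short calculation below uses only those two figures [41,79]; the variable names are illustrative.

```python
# Total ammonia-nitrogen loads to the Langat River reported in [41,79], in ton/day.
point_load = 9.51
nonpoint_load = 12.67

total_load = point_load + nonpoint_load
point_share = 100.0 * point_load / total_load
nonpoint_share = 100.0 * nonpoint_load / total_load

print(f"Total load: {total_load:.2f} ton/day")
print(f"Point sources: {point_share:.1f}%, non-point sources: {nonpoint_share:.1f}%")
```

On these figures, non-point sources account for somewhat more than half of the total ammonia-nitrogen load (about 57%, against about 43% from point sources).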
Moreover, Zhang, Swaney, Li, Hong, Howarth and Ding [15] estimated the nitrogen input to the Huai River in China from anthropogenic point and non-point sources, as well as the impact of nitrogen discharge on the riverine ammonia-nitrogen flux. They used the data from Yan et al. [80], which stated that the average nitrogen concentration in the sewage discharged from industries in the Changjiang River basin was 25 mg/L. From previous studies, they concluded that ammonia-nitrogen in the river was about 10% (or less) of the total nitrogen [15,81,82], and that it could be as high as 70% in heavily polluted Asian rivers in urban areas [15,83,84]. They also used the data of Zhang et al. [85], which suggested that nitrate had become a major constituent of the riverine nitrogen flux; the data were obtained from measurements in 2008 at several stations in the Huai River basin, where the riverine nitrate concentration was found to vary between 0 and 15.7 mg/L nitrate-nitrogen, with a mean of 2.1 mg/L nitrate-nitrogen. When the authors of [15] measured the ammonia-nitrogen in the same river basin, they found that the ammonia-nitrogen concentration varied between 0.2 and 3.3 mg/L N, with an average of 1 mg/L N, which was about half of the average nitrate-nitrogen concentration measured in 2008. The calculation of nitrogen input to the Huai River showed that, on average, 27,200 ± 1100 kg km⁻² y⁻¹ of nitrogen was added to the river from 2003 to 2010 as the net anthropogenic nitrogen input.

Application of ANN
ANNs have been used extensively worldwide as predictive models for nitrogen in streams. Table 2 lists studies on the use of ANN by various authors, and Table 3 shows the different methodologies they used. ANN was probably first used for nitrogen prediction by Lek, Guiresse and Giraudel [20]. They used ANN to predict inorganic and total nitrogen concentration in streams using eight input parameters from the catchments along with the historical data of inorganic and total nitrogen. The input database was obtained from the U.S. National Eutrophication Survey (NES), which records many variables; in keeping with the scope of the research (prediction of stream nitrogen concentration), the following eight variables were included: average annual flow; animal unit density; mean annual streamflow; the percentages of forest cover, wetland, urban areas, and agriculture areas; and the percentage of the remaining area in the catchment. Sensitivity analysis showed five different types of variation in total nitrogen concentration and three different types of variation in inorganic nitrogen concentration. The sensitivity types (or contributions) for total nitrogen concentration are:
(i) Increasing sigmoid contribution: wetland and animal unit density. Low values of these independent variables lead to a low (minimum) value of total nitrogen, which then rises to its maximum value as the independent variable increases.
(ii) Weakly growing contribution: agricultural areas. For low values of agricultural area, total nitrogen is low, and it increases gradually with agricultural area.
(iii) Decreasing contribution: average annual flow and percentage of remaining area.
(iv) Gaussian: urban areas.
(v) Weak contribution: percentage of forest cover.
For inorganic nitrogen:
(i) Growing contribution: urban and agricultural areas. For low values of urban and agricultural areas, the inorganic nitrogen concentration is low, and it then increases rapidly with these independent variables.
(ii) Gaussian: percentage of wetland areas.
(iii) Decreasing contribution: percentage of forest cover, animal unit density, and remaining area. Forest cover rapidly and constantly decreases the inorganic nitrogen concentration. The other two independent variables also reduce the inorganic nitrogen concentration, but at low levels only.
The input variables were autoscaled, i.e., centered and reduced. Autoscaling reduces the chance that any one input variable dominates the prediction. The input database was divided into a training set and an independent testing set (two thirds and one third of the total database, respectively). Using data from 927 sites in different parts of the United States, Lek, Guiresse and Giraudel [20] developed a multilayer feed-forward ANN model with 1 hidden layer of 10 neurons, which achieved a correlation coefficient of 0.82 for total nitrogen concentration and 0.8 for inorganic nitrogen concentration. Examining the results obtained, they concluded that urban areas produced most of the inorganic nitrogen, and that animal husbandry contributed the most to the total nitrogen concentration in streams. It was assumed that fertilizers were used in small quantities, as their contribution to stream nitrogen was small. Forest cover lowered the inorganic nitrogen concentration in streams and had little effect on the total nitrogen concentration. Wetland areas helped in reducing the inorganic nitrogen in streams, but they increased the total nitrogen. The condition of the United States seemed to be critical in terms of nitrogen in streams, as four years after the study by Lek, Guiresse and Giraudel [20], a research work published by Suen and Eheart [25] stated that nitrate had become an important problem.
They conducted a study in the Upper Sangamon River, Illinois, and identified the use of chemical fertilizers in agriculture as responsible for the high nitrate concentration in streams. In their study, they developed two models, RBFNN and backpropagation neural network (BPNN), and compared the models on the basis of accuracy. The parameters used for modeling were daily highest temperature, seven-day cumulative daily rainfall, daily streamflow, and Julian date. Julian date was used as an input parameter to capture the common seasonal practice of fertilizer application. They used a dataset of eight years, i.e., 1993-2000. Two methods were adopted to divide the dataset into a training set and a testing set. In the first method, data from 1993 to 1996 were used as the training dataset and the remaining data were used for testing. In the second method, the data of odd years (i.e., 1993, 1995, 1997, and 1999) were used for training, and those of even years were used for testing. Comparing the results obtained from the models, they concluded that the odd-even years method was more accurate. The overall accuracy of the first method was 0.784 and 0.752 for BPNN and RBFNN, respectively, and that of the second method was 0.832 for both networks. With the second method, the neural network models predicted with greater precision when tested for Boolean output. The network signaled 1 when the nitrate concentration exceeded 10 mg/L and 0 when the nitrate concentration was below 10 mg/L. Considering Boolean output, they concluded that RBFNN had a higher accuracy (0.893) than BPNN (0.866).
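The two data-splitting methods and the Boolean evaluation described above can be sketched as follows. The year ranges and the 10 mg/L threshold come from the text; the function names and the sample concentrations are hypothetical, and the accuracy shown is the plain fraction of matching flags, which may differ from the exact measure used in [25].

```python
import numpy as np

years = np.arange(1993, 2001)  # 1993-2000 inclusive

# Method 1: first four years for training, remaining years for testing.
block_train = years[years <= 1996]
block_test = years[years > 1996]

# Method 2: odd years for training, even years for testing.
odd_train = years[years % 2 == 1]
even_test = years[years % 2 == 0]

NITRATE_LIMIT = 10.0  # mg/L

def exceedance(concentration):
    """Boolean output: 1 when the nitrate concentration exceeds the limit, else 0."""
    return (np.asarray(concentration) > NITRATE_LIMIT).astype(int)

def boolean_accuracy(observed, predicted):
    """Fraction of samples whose exceedance flag is predicted correctly."""
    return float(np.mean(exceedance(observed) == exceedance(predicted)))

# Hypothetical observed and predicted concentrations in mg/L.
observed = [8.0, 12.0, 11.0, 9.0]
predicted = [9.0, 13.0, 9.5, 7.0]
accuracy = boolean_accuracy(observed, predicted)
```

Interleaving odd and even years exposes the network to conditions from across the whole record, which is one plausible reason why that split performed better than training on the first four years only.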
In 2003, a research work published in Canada by Sharma, Negi, Rudra and Yang [6] stated that subsurface waters in Canada were being polluted by nitrate from the fertilizers used in agricultural fields. Their experimental site was a 14 ha field located at the Greenbelt Research Farm of Agriculture and Agri-Food Canada, near Ottawa. The authors proposed a neural network model to assist in optimizing the use of fertilizers. The input database was collected from the experimental field for the period 1991-1994, except for the temperature and precipitation data, which were collected at the station of Agriculture and Agri-Food Canada located 12 km from the site. Two neural network models, fast BPNN and self-organizing RBFNN, were examined in order to select the superior network. The inputs to the model were treatment (tillage or no tillage, i.e., whether the land was prepared or not), Julian day, rainfall per day, cumulative rainfall, total nitrogen applied, snowfall per day, and maximum and minimum temperature. Sensitivity analysis was performed to determine the optimum internal parameters of both networks. The input data were divided into two sets: a training set and a testing set. The training set consisted of eight input variables and two outputs, and the testing set consisted of only the unexposed inputs from the replicate plots. For fast BPNN, the parameters varied in the sensitivity analysis were the learning rate and the number of hidden neurons. This analysis comprised two stages. In the first stage, the number of hidden neurons was kept constant at 20 and the learning rate was varied from 0.02 to 0.08. Analysis of the fluctuation of the error with each variation led to the selection of 0.02 as the optimum learning rate. In the second stage, the learning rate was kept constant at 0.02 and the number of hidden neurons was varied from 5 to 25. Analyzed in the same way, the optimum number of hidden neurons was selected as 20.
Similarly, sensitivity analysis was performed for RBFNN, in two stages, by varying the tolerance and spread values from 5 to 20 and 1 to 20, respectively. The selected optimum value for tolerance and spread values were 20 and 15, respectively. Using these parameter values, both the models were further trained. Comparing the results of both networks, the authors concluded that the self-organizing RBFNN, with a correlation coefficient of 0.8079 for conventional tillage and 0.6911 for no tillage, outperformed the fast BPNN, with a correlation coefficient of 0.8017 for conventional tillage and 0.6635 for no tillage, for nitrate-nitrogen concentration prediction in drainage water. Lek, Guiresse and Giraudel [20] One year Sensitivity Analysis, Autoscaling 8 Input, 10 hidden neurons 5 Suen and Eheart [25] 1993-2000, (Daily) -- 6 Sharma, Negi, Rudra and Yang [6] 1991-1994, Holmberg, Forsius, Starr and Huttunen [19], predicted the future data of total organic carbon, total nitrogen, and total phosphorus in streams, considering the climate change effect and utilizing the data of three streams (Kelopuro, Hietapuro and Valkea-Kotinen) located in two catchments of the same name (Hietajärvi) in Finland. They developed a BPNN model employing the database of 13 input variables: month of data sampling, mean temperatures of 3 and 10 preceding days, runoff of sampling day, maximum and minimum runoffs of 3 preceding days, days of peak flow, days of low flow, catchment area, fractions of lake area and peatland area with respect to catchment area, catchment latitude, and elevation. This database was collected from the catchment, except for the daily temperature and precipitation, which was collected from the nearby Finnish Meteorological Institute weather station, Lammi, from 1990 to 2000. Samples of these variables were divided into two sets: training set and testing set. 
The samples were allocated into these sets by random choosing, provided it was ensured that the highest and lowest 10-percentile data were included in both the sets. While training, they were to test all the possible set of models with the available inputs, hence, they varied the number of inputs from 2 to 16, fixing the number of hidden layer to 1 and the neurons in the hidden layer were set as the integer part of (1 + number of inputs)/2. Training 10 sessions for each combination, resultant models were analyzed on the basis of their efficiency. The model resulted the best efficiency with 13 input variables and 1 hidden layer with 7 nodes, having the values of flux efficiencies of total organic carbon, total nitrogen, and total phosphorus as 0.94, 0.92, and 0.90, respectively. Using this model, they forecasted the total nitrogen data until 2050. They stated that if there is a low change in climate, then the total nitrogen flux will be near the value in 2005, but for a scenario of high change in climate, the nitrogen flux will increase by 26%, with respect of the value in 2005. Similar conditions have been stimulated in Melarchez, a catchment near Paris, France, where Anctil, Filion and Tournebize [61] investigated an agricultural catchment area to develop a neural network model for predicting the nitrate-nitrogen flux. Considering the soil moisture at different depths as the input parameter, the authors analyzed its effect on the nitrate-nitrogen flux. They developed a stacked multilayer perceptron model focusing mainly on the selection of best performing model among the list of models developed, based on different combinations of input variables and neurons in hidden layers. Fifty models were trained for each combination of inputs and neurons in hidden layers. Neurons in hidden layers were varied from 2 to 20. Every issue was tested discretely to make the final decision on the basis of the model accuracy. 
They had 12 different options for the input parameter: same-day stream flow, previous-day stream flow, increment in the flow from the previous day, same-day precipitation, previous-day precipitation, same-day historical mean flux, increment in the historical mean flux from the previous day, and same-day 10 cm-, 20 cm-, 40 cm-, 80 cm-, and 120 cm-depth soil moisture indices. These input variables were collected from the gauge station for the period of 1975 to 1993. Since standardization is an important step in the pre-processing of data [91], all the input variables were brought to the same scale by standardizing them linearly such that their standard deviation was 1 and their mean was 0. After optimization, the final model had 2 input parameters (same-day stream flow and same-day 80 cm-depth soil moisture index), 12 neurons in the hidden layers, and Levenberg-Marquardt with Bayesian regularization as the calibration procedure, and it performed well, with an efficiency index of 0.888. The utilization of soil moisture content at different depths revealed that soil moisture also had an effect on the nitrate-nitrogen flux generated from the agricultural field.
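The linear standardization described above (mean 0, standard deviation 1) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `standardize` and the stream flow values are hypothetical, and the population standard deviation is assumed because the source does not say which estimator was used.

```python
from statistics import mean, pstdev


def standardize(values):
    """Linearly rescale a series to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("cannot standardize a constant series")
    return [(v - mu) / sigma for v in values]


# Hypothetical same-day stream flow values (illustrative numbers only)
stream_flow = [0.8, 1.2, 3.5, 2.1, 0.9, 1.5]
z = standardize(stream_flow)
```

In practice the mean and standard deviation would be estimated on the training data only and then reused for the test data, so that no information from the test period leaks into the model inputs.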
Since a large number of candidate input variables are available for a neural network, the inputs should be chosen using sensitivity analysis [92]. Numerous authors have provided models with different sets of input parameters, which, according to them, were suitable for their models (Table 4). He, Oki, Sun, Komori, Kanae, Wang, Kim and Yamazaki [18] investigated 59 river basins all over Japan and developed an FFNN to predict the monthly total nitrogen concentrations in streams. They had to choose the most important independent input variables from a set of 16 input variables: the area of each basin, amount of fertilizer applied in each basin, average temperature, precipitation, sunshine duration and river discharge of each basin, and the ratios of paddy area, farmland area, forest area, bare land area, urban area, road area, river area, lake area, seashore area, and other land areas to the total basin area. This input database was collected from different sources. The land use variables were collected from the Ministry of Land, Infrastructure, Transport and Tourism (MLIT land use database), a digital database in Japan. Total nitrogen concentration was collected from the MLIT water information system. Sunshine duration, precipitation and temperature data were obtained from the Automated Meteorological Data Acquisition System. The input data were divided into three subsets: training, overfitting test and validation subsets. Among the data of 59 river basins, the data of 40 river basins were used for training and the overfitting test (80% and 20%, respectively). The data of the remaining 19 river basins were never exposed to the network during training and were used for validation only. The FFNN was trained with the backpropagation algorithm using different combinations of input variables and internal parameters: the number of input variables was varied from 7 to 9, and the number of hidden layers was fixed at 1, with the number of neurons in it set to 7 or 8.
Analyzing the results of all the trained networks on the basis of coefficient of regression, the authors found that the model with 8 input variables (river discharge, average temperature and precipitation of each basin, amount of fertilizer applied in each basin, and the proportions of forest land area, urban land area, road area, and other areas in the total basin area) and one hidden layer with seven nodes provided the best accuracy, with R² values of 0.96 for training, 0.84 for validation, and 0.90 for the overfitting test. In addition to ANN, other machine learning methods can also be used to predict nonlinear environmental variables. Wang, Oldham and Hipsey [86] compared 13 machine learning models, including ANN, on the basis of precision in the prediction of DON (dissolved organic nitrogen) in groundwater in urban areas in southwestern Australia. These 13 machine learning models are classified into five different groups: (1) tree-based and rule-based models (generalized boosted model (GBM), random forest (RF), conditional inference random forest (cforest), and cubist); (2) kernel-based machine learning models (Gaussian process with radial basis function kernel (GPR), Gaussian process with linear kernel (GPL), support vector machine with radial basis function kernel (SVMR), and support vector machine with linear kernel (SVML)); (3) generalized stepwise linear regression models (bagged MARS, multivariate adaptive regression spline (MARS), and generalized linear model with stepwise feature selection (GLM)); (4) instance-based model (k-nearest neighbors (KNN)); and (5) ANNs.
Using 401 groundwater samples (60% for training and 40% for testing), the models were examined under two scenarios: (1) training the models with all the data available, such as nutrients (DON, total nitrogen, and two further nutrient variables), landscape (vegetation, land use, and soil), hydrological conditions (surface water subarea, groundwater subarea, and catchment area), and sampling conditions (temperature, sample depth, sampling date, and pH); (2) training the models with only total nitrogen and all other non-nutrient data. The nutrient database was obtained from the Western Australian Department of Water for the period of 2006-2014. The ArcGIS spatial mapping feature provided the data on soil type, land use and vegetation type. These models were analyzed on the basis of their RMSE and R² values and compared with the manually calculated DON (DONcal) (Figure 5). Analysis of all the results revealed that scenario 1 produced lower model errors than scenario 2, indicating that nutrient data can improve the performance of the models. Among the 13 tested models, 3 models showed higher R² values.
Zhang, Zhang and Li [87] compared the ARIMA model, the RBFNN model, and a hybrid ARIMA-RBFNN model based on the analysis and prediction of water quality in Chagan Lake, China. The water quality database was collected from "The Second Songhua River Diversion Project Record" from the Chinese Academy of Science. The water quality parameters utilized for analysis were monthly total nitrogen and total phosphorus for the period of 2006-2011. The parameters of the ARIMA model were p = 1, d = 1 and q = 1 for total nitrogen and p = 2, d = 1 and q = 1 for total phosphorus. Water quality data from 2006 to 2010 were used for training, and the trained model was used to predict the water quality data of 2011. The width of training, σ, was 0.6 for the RBFNN model, with 2 nodes in the hidden layers.
ARIMA-predicted values were linearly superposed with RBFNN-derived predictions of the ARIMA residuals to generate the hybrid ARIMA-RBFNN model. These models were analyzed on the basis of their RMSE and mean absolute percentage error. Results showed that the RBFNN model predicted total phosphorus poorly; although it had learned the pattern of total nitrogen, its predicted values were not satisfactory. Although the ARIMA model did not have high prediction accuracy, it had successfully learned various trends for both total nitrogen and total phosphorus. The mean absolute percentage error for the monthly total nitrogen was 18.194%, 34.633%, and 7.017% for ARIMA, RBFNN, and hybrid ARIMA-RBFNN, respectively, and the mean absolute percentage error for the monthly total phosphorus was 27.299%, 126.957%, and 14.528% for ARIMA, RBFNN, and hybrid ARIMA-RBFNN, respectively. Following these results, it was stated that hybrid models had more capacity for predicting nonlinear variables.
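The linear superposition and the two error measures used in this comparison can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the numbers are invented, and the linear forecast and the residual forecast are supplied as plain lists rather than produced by fitted ARIMA and RBFNN models.

```python
import math


def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed values must be non-zero)."""
    n = len(observed)
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n


def hybrid_forecast(linear_pred, residual_pred):
    """Add the predicted residuals to the linear-model forecast."""
    return [lin + res for lin, res in zip(linear_pred, residual_pred)]


# Hypothetical monthly total nitrogen values (mg/L), for illustration only
observed = [2.0, 2.5, 3.0, 2.8]
arima_pred = [1.8, 2.6, 2.7, 3.0]        # stand-in for an ARIMA forecast
residual_pred = [0.15, -0.05, 0.2, -0.1]  # stand-in for RBFNN residual predictions
hybrid_pred = hybrid_forecast(arima_pred, residual_pred)

mape_arima = mape(observed, arima_pred)
mape_hybrid = mape(observed, hybrid_pred)
```

The hybrid forecast improves on the linear forecast only to the extent that the residual model captures structure the linear model missed; with poorly predicted residuals the superposition can also make the forecast worse.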
Markus, Hejazi, Bajcsy, Giustolisi and Savic [88] developed three models (BPNN, EPR and NBM) for predicting weekly nitrate-nitrogen in a small agricultural watershed in Illinois. For the ANN part, the authors utilized observed weekly river discharge, precipitation, air temperature, and nitrate-nitrogen concentration as input variables. The study used historical nitrate-nitrogen concentration data collected from the Upper Sangamon River near Decatur for the period of 1994-1999. Employing half of the data for training and the other half for testing, they predicted the weekly data of nitrate-nitrogen in streams. The input selection was performed on the basis of trial and error with two sets of variables and their time lags. The first set consisted of the four variables (air temperature, discharge, nitrate-N concentration and precipitation), and the second set consisted of the four variables and three time lags. The first set predicted better results and hence was used for ANN modeling. The EPR model has the capability of selecting the input subset; hence, it was fed with the larger input set, the second set. In the case of NBM, both sets were used for modeling. For modeling in the ANN part, the internal parameters selected were: epochs: 100,000; performance gradient: 1E-10; goal: 0; number of hidden nodes: 1, 2, 3, 4 and 5; input variables: 4 (air temperature, discharge, nitrate-N concentration and precipitation) and output variable: 1 (next week nitrate-N concentration). The results indicated that the ANN with 2 nodes was the most accurate, with RMSE values of 0.787 mg/L and 0.935 mg/L for training and testing, respectively. For EPR, two models (EPR1 and EPR2) were generated which had their equations as: = 0.827 and = 0.659 + 0.560 , respectively. The RMSE obtained for EPR1 was 1.092 mg/L for training and 1.170 mg/L for testing. EPR2 was more accurate, with RMSE values of 0.991 mg/L and 1.010 mg/L for training and testing, respectively.
The NBM model utilized two categories, high and low, for the values of each variable. Each variable, except for nitrate-N concentration, had its categories divided using the average value as the threshold. For nitrate-N concentration, the separation point was the emergency cutoff level (8.5 mg/L). NBM1 and NBM2 were the two models tested, taking the first (four-variable) and the second (seven-variable) input sets as predictors, respectively. The results of these models indicated that, for low concentrations, NBM1 had accurately predicted 79 of 80 concentrations, but for high concentrations, the prediction rate was 2 of 9. For NBM2, the number of predicted high concentrations (10) was close to the number observed (9). However, the number of false alarms for NBM2 (7) was higher than that for NBM1 (1). The critical success index for NBM1 was 0.214 and 0.200 for training and testing, respectively, and that for NBM2 was 0.286 and 0.188 for training and testing, respectively. The authors concluded that none of these models can be considered superior based on these analysis criteria, hence suggesting a multi-tool approach. In their previous study, Markus et al. [93] compared the ANN model and a linear regression model to calculate the uncertainty in forecasting the weekly nitrate-nitrogen in the Sangamon River, Illinois. They stated that the ANN model was more accurate than the linear regression model. The ANN model surpassed the linear regression model by 3.30% and 4.42% of RMSE in the testing and training phases, respectively.
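The critical success index reported for the NBM models can be illustrated with a short calculation, assuming the standard definition CSI = hits / (hits + misses + false alarms). The counts below are inferred from the figures quoted above, not taken directly from the original study: NBM1 detected 2 of 9 observed high concentrations with 1 false alarm, and NBM2 is assumed to have issued 10 high predictions of which 7 were false alarms, which leaves 3 hits and 6 misses.

```python
def critical_success_index(hits, misses, false_alarms):
    """CSI = hits / (hits + misses + false alarms); correct negatives are ignored."""
    denominator = hits + misses + false_alarms
    if denominator == 0:
        raise ValueError("CSI is undefined when no events are observed or predicted")
    return hits / denominator


# Counts inferred from the text (assumption, see the paragraph above)
csi_nbm1 = critical_success_index(hits=2, misses=7, false_alarms=1)
csi_nbm2 = critical_success_index(hits=3, misses=6, false_alarms=7)
```

Under these inferred counts the results agree with the reported testing values of 0.200 for NBM1 and 0.188 for NBM2. Because correct predictions of low concentrations do not enter the index, NBM1 scores low despite predicting 79 of 80 low concentrations correctly.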
Amiri and Nakane [89] compared BPNN and MLR on the basis of total nitrogen prediction in streams. The study was conducted in the Chugoku district of Japan, which contains 21 river basins. The total nitrogen database for the year 2001 was collected from the prefecture offices of Okayama, Shimane, Hiroshima, Tottori and Yamaguchi. Six input variables were used for the prediction: five variables for land cover percentage (urban area, forest area, agriculture area, grassland, and water body) and one variable for population density. The total nitrogen was predicted by utilizing 60% of the data for training, 25% for controlling, and the remaining 15% for testing. The BPNN consisted of six input nodes for the corresponding six input variables, one hidden layer, and one node in the output layer for total nitrogen prediction. The optimum number of nodes in the hidden layer was selected by varying the number of nodes from 0 to 13, training the network 5 times for each variation, and evaluating the results on the basis of the correlation coefficient. The selected optimum BPNN had the following internal parameters: input nodes: 6, hidden layer: 1, hidden layer node: 2, output node: 1, epochs: 11,600. The MLR model had the same inputs as the BPNN. For MLR modeling, a normality test was conducted for total nitrogen and land cover data using the Shapiro-Wilk test (p-value less than 0.05). Models were analyzed on the basis of regression statistics and coefficient of the model (if the resultant was normally distributed). The final regression model was developed using the backward approach. The goodness of fit of the models was evaluated by regression of observed versus predicted values and by scatter plot.
Comparison of the results for both models showed that the backpropagation model (R² = 0.94) predicted the results more precisely than the multiple regression model (R² = 0.85).
Zeleňáková, Čarnogurská, Šlezingr and Słyś [90] predicted nitrogen and phosphorus concentrations in the river Laborec in Slovakia, employing the dimensional analysis method. They used the Buckingham theorem to develop a prediction model utilizing important variables such as stream discharge, area of catchment, stream velocity, temperatures of air and water, and pollutant concentration. The equation established for nitrogen concentration was: = 0.0039 . and for phosphorus was: = 0.1868 . . These models were tested on data for eight years (2003-2010), which were collected from the Slovak Hydrometeorological Institute and the Slovakian Water Management Company in Košice. Sensitivity analysis of the model showed that air and water temperature have a major influence on the prediction of the concentrations of nitrogen and phosphorus. Velocity and flow of water have less influence, and the catchment area has no influence on the prediction. The results showed that the model equations calculated the prediction values with an average uncertainty of 31.33% for nitrogen and 32.30% for phosphorus.

Recommendation for Future Works
The precision of the predictive ANN model relies on many factors, such as the amount of input data provided to the model for training and testing, the relevance of the input variables, and the type of ANN method used in the model. Based on the reviewed research works, we suggest some techniques to improve the accuracy of the nitrogen prediction model and also to account for a large range of inputs.

a) Being the first step of modeling, training is the most important part of the modeling procedure. Various kinds of important information are provided to the model during training. The model learns different patterns in the input data, and its weights are updated during training [94]. Providing ample data for training can lead to better precision of the model. Input data are divided into three sets (training, validation and testing sets) [95], or sometimes into two sets (training and testing sets), depending on the model. The training set is used for updating the weights and biases of the model. The validation set is used for preventing the model from overfitting: if the validation accuracy starts to decrease during training, the model is likely overfitting and the training should be stopped. The testing set is used for testing the output of the model in order to confirm its accuracy. These sets are formed from fixed percentages of the input data, either provided by the user or set by default by the modeling software. By default, ANN modeling software commonly uses 70% of the input data for training, which may be too little for achieving higher accuracy, 15% for validation, and the remaining 15% for testing. In order to increase the accuracy of the model, we suggest using a higher percentage of data for training, i.e., about 80% to 90%, with the remainder divided equally between validation and testing. While dividing the input data into the training, validation and testing sets, it should be ensured that these sets are statistically similar. In order to increase the learning capacity of the model, it should be ensured that the model is exposed to the maximum and minimum values of the inputs while training.

b) The accuracy of the AI model also depends on the types of inputs provided to the model [96]. Since there are many input variables upon which the nitrogen in streams depends, we suggest considering all the relevant inputs and then performing a sensitivity analysis to select the highly sensitive input variables for the prediction. Some of the relevant inputs are daily average rainfall data, daily average river discharge, daily average water temperature, historical data of nitrogen in streams, land use pattern, Julian day, amount of fertilizer applied in the catchment area, and the amount of nitrogen per day added from point sources. Using many input variables increases the complexity of the network, which often affects its results. To avoid this complexity, the user should avoid selecting inter-dependent variables; for example, if the runoff data are included in the input data, then the precipitation data can be omitted, because runoff depends on precipitation and follows the same pattern.

c) ANNs are divided into different types, which are utilized for modeling hydrological parameters of different complexity levels. For creating a model involving a huge set of input variables, we suggest creating a hybrid model, in which the ANN model is coupled with other models to improve the accuracy of the resultant model. Zhang, Zhang and Li [87] utilized a hybrid model (ARIMA and RBFNN) to predict the monthly total nitrogen, and the mean absolute percentage error was reduced to 7.017%. However, in this case, they used only historical monthly data as input to the hybrid model; hence, a hybrid model with a wide range of relevant stochastic input variables is expected to attain increased accuracy.
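The data division suggested in recommendation a) can be sketched as follows. This is a minimal illustration under simplifying assumptions: a single variable is considered, the function name `split_with_extremes`, the 80% training share and the sample values are hypothetical, and statistical similarity between the sets is not checked. The sketch places the samples holding the minimum and maximum values in the training set, fills the rest of the training set at random, and divides the remainder equally between validation and testing.

```python
import random


def split_with_extremes(values, train_frac=0.8, seed=0):
    """Return index lists (train, validation, test); extremes always go to train."""
    rng = random.Random(seed)
    indices = list(range(len(values)))
    lowest = min(indices, key=lambda i: values[i])
    highest = max(indices, key=lambda i: values[i])
    forced = {lowest, highest}  # samples the model must see during training

    rest = [i for i in indices if i not in forced]
    rng.shuffle(rest)

    n_train = max(round(train_frac * len(values)), len(forced))
    n_extra = n_train - len(forced)
    train = sorted(forced) + rest[:n_extra]

    remaining = rest[n_extra:]
    half = len(remaining) // 2
    return train, remaining[:half], remaining[half:]


# Hypothetical nitrogen concentrations (mg/L), for illustration only
nitrogen = [round(0.5 + 0.1 * i, 1) for i in range(20)]
train, validation, test = split_with_extremes(nitrogen)
```

With several input variables, the same idea would be applied to the extremes of each variable, and the resulting sets would then be compared (for example by their means and variances) to confirm that they are statistically similar.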

Conclusion
This research paper reviews the previous uses of ANN for the prediction of nitrogen compounds in streams. The efforts that have been made in past decades to predict the nitrogen compounds with greater accuracy are also demonstrated in this work. The current condition of rivers in terms of nitrogen compound concentration is discussed. The major non-point source of nitrogen in the streams is the fertilizer applied in agricultural fields. Excess nitrogen concentration in streams leads to human health issues. The operations of many water treatment plants depend on the concentration of nitrogen in the river. In the past two decades, ANNs have shown greater reliability in predicting the nitrogen compounds and have also helped in optimizing the sources of nitrogen input to the streams. The analysis of the literature reveals that published papers on the prediction of nitrogen compounds using hybrid models are limited. This study suggests the usage of a hybrid model along with the set of suggested relevant input variables and training procedures.