Daily River Water Temperature Prediction: A Comparison between Neural Network and Stochastic Techniques

: The temperature of river water (TRW) is an important factor in river ecosystem predictions. This study aims to compare two different types of numerical model for predicting daily TRW in the Warta River basin in Poland. The implemented models were of the stochastic type—Autoregressive (AR), Moving Average (MA), Autoregressive Moving Average (ARMA) and Autoregressive Integrated Moving Average (ARIMA)—and the artiﬁcial intelligence (AI) type—Adaptive Neuro Fuzzy Inference System (ANFIS), Radial Basis Function (RBF) and Group Method of Data Handling (GMDH). The ANFIS and RBF models had the most ﬁtted outputs and the AR, ARMA and ARIMA patterns were the most accurate ones. The results showed that both of the model types can signiﬁcantly present suitable predictions. The stochastic models have somewhat less error with respect to both the highest and lowest TRW deciles than the AIs and were found to be better for prediction studies, with the GMDH complex model in some cases reaching Root Mean Square Error (RMSE) = 0.619 ◦ C and Nash-Sutcliff coefﬁcient (NS) = 0.992, while the AR(2) simple linear model with just two inputs was partially able to achieve better results (RMSE = 0.606 ◦ C and NS = 0.994). Due to these promising outcomes, it is suggested that this work be extended to other catchment areas to extend and generalize the results.


Introduction
For the monitoring of ecological status and proper functioning of the water ecosystem, it is important to have information about a river's thermal conditions [1]. The TRW is a good indicator of the control of hydrological and environmental pollution processes in water ecosystems [2].
Due to the influence of many factors which shape the features of the thermal regime of waters, predictions of changes and forecasts of TRW are a complex process [3]. Predictive and prognostic models of TRW changes take into account relationships with meteorological and hydrological factors, as well as morphological parameters [4][5][6][7]. Brosofske et al. [8], conducting research in western Washington, showed that significant changes in the reference characteristics of RWT are also caused by the features of the microclimate in the riparian zones of streams, which are often transformed by various forms of anthropogenic activity. In their opinion, a buffer 45 m wide (on each side of the stream) is necessary to maintain a natural riparian microclimatic environment along the streams. Air temperature is an important predictor of changes in TRW, and its relationships have been confirmed taking into account the nature of short-term to long-term fluctuations [9][10][11][12]. In TRW prediction, regression models and artificial intelligence models (AIs) are most often based on air temperature data [13][14][15][16][17][18][19]. Zhu and Piotrowski [3] presented a comprehensive review of stochastic models with the AI models in terms of their performance by applying evaluation measurements and indicating the most accurate prediction models.

Study Area and Source Material
The TRW modeling was carried out for the Warta River (Figure 1), the third longest river in Poland (length = 808.2 km). The Warta River is a tributary of the Odra River that carries water from the western part of Poland to the Baltic Sea. The basin area is characterized by its considerable size (54,529 km 2 ), and also by the large variety of environmental conditions. The sources of the river are located at an altitude of 380 m above sea level, while it flows into the Odra at an altitude of 12 m above sea level. From the middle section the Warta River is a Ist and IInd shipping class waterway, while in the metropolitan area (21 municipalities including Poznań) it flows through a water intake protection zone (Mosina-Krajkowo Intake), which supplies a large part of the metropolitan population with drinking water. On the river, just below Sieradz, is the Jeziorsko Reservoir, with an area of 42.3 km 2 and a dam in Skęczniew, which was built mainly to protect against floods. The Warta River is of great ecological importance, being a habitat for many species of fish and other organisms for which TRW is an important abiotic factor, shaping the conditions of their existence and evolution. It flows through lands of high natural value, including protected areas such as the Wielkopolski National Park (middle course) and the Warta Estuary National Park (lower course), and many locations of importance for the protection of habitats and birds, which were created under the European Natura 2000 project. Due to the large latitudinal and longitudinal extent of the Warta River catchment area, regional differences in climatic conditions are visible within its range. The area belongs to nine climatic regions [57]. The average annual temperature in the river basin ranges from 7.5 • C in the north to 8.5 • C in the west. Annual precipitation totals range from 520 mm in the northeast to 675 mm in the south.
In Poland, TRW measurements are conducted using the standard measurement and observation network of the Institute of Meteorology and Water Management-National Research Institute (IMWM-NRI, Warsaw, Poland). Measurements are performed daily at 7:00 a.m. (GMT + 1) at water gauge stations using automatic station probes. Research made use of daily TRW values for four gauge stations located on the Warta River ( Figure 1, Table 1): Bobry, Sieradz, Poznań and Gorzow Wielkopolski. The data were obtained from the database of the IMWM-NRI and covered a period of 20 hydrological years (1990-2009, specifically from 1 November 1989 to 31 October 2009).

Stochastic Models (Time Series Model)
A time series model is commonly used to simulate the data sorted by time. The time series of TRW can be considered as the result of a stochastic process. In this study, Autoregressive (AR), Moving Average (MA), Autoregressive Moving Average (ARMA) and Autoregressive Integrated Moving Average (ARIMA) time series models were used for TRW modeling. The ARMA and ARIMA models combine two basic models: the autoregressive and the moving-average. The construction of models is based on the autocorrelation phenomenon, i.e., on the correlation of the value of the forecasting variable with the time lags of the same variable [43].
The AR model is based on the assumption that there is an autocorrelation between the values of the predictable variable (target variable) and its values lagged in time. The form of the AR model is as follows: where {y t , y t−1 , y t−2 , . . . , y t−p } are the values of the predictable variable at the time steps {t, t − 1, t − 2, . . . , t − p}, respectively, {ϕ 0 , ϕ 1 , ϕ 2 , . . . , ϕ p are the model parameters, e t is model error (remainder) for time t, p is the amount of the time lag. The AR model is shown by its autoregressive degree (p) as AR(p). The MA model expresses the value of the variable as a function of the delayed values of the (stationary) random component. The equation form of MA model is as follows: where y t is the value of the target variable predicted in time t, {e t , e t−1 , e t−2 , . . . , e t−q } are errors (residuals) of the model at times {t, t − 1, t − 2, . . . , t − q}, {θ 0 , θ 1 , θ 2 , . . . , θ q } are the model parameters, and q is the amount of the time lag. The model is assigned moving average degree (q) as MA (q).
Combining the autoregressive and moving average model leads to ARMA, while adding the Integrated component to this model gives the ARIMA model; these have been shown below as equations [58]: where d is the differencing degree, which is related to the Integrated component. This ARMA is shown by its AR and MA degrees (p and q) as ARMA(p,q), while ARIMA additionally has an Integrated degree (d), making it ARIMA(p,d,q).

Artificial Intelligence Models
In this study, use was made of the Adaptive Neuro-Fuzzy Inference System (ANFIS), the Radial Basis Function (RBF) neural network, and the Group Method of Data Handling (GMDH) from the Artificial Intelligence models group.

Adaptive Neuro-Fuzzy Inference System (ANFIS)
The Adaptive Neuro-Fuzzy Inference System (ANFIS) is used primarily in automation and robotics related to control, decision making, and in the monitoring and diagnostics of hardly measurable processes and parameters. Uncertainty modeling is performed using the description of fuzzy variables based on so-called fuzzy logic. ANFIS is an alternative to systems based on models and traditional numerical algorithms in situations where information about a given field is uncertain and ambiguously formalized [59]. The system is a kind of artificial neural network (ANN) based on the fuzzy inference system (FIS). As part of this model, FIS provides an inference scheme, i.e., a method of constructing logical rules that are learned according to an algorithm taken from the theory of neural networks [60,61]. The learning rule of ANFIS, which determines its parameters, is a combination of methods, back-propagation, and least squares [62,63]. Inference in the neural-fuzzy system proceeds according to specific stages, each of which is carried out by an appropriate layer of the neural structure or similar. The first layer of the structure recreates the membership functions of fuzzy sets included in the premises of the rules. The second layer performs the operations of intersecting fuzzy sets using the algebraic product. The third and fourth layers perform operations related to sharpening the resulting membership function.
A linear combination of the consequent parameters is the result of the ANFIS model. The final output f out can be written as [64]: where ω i (output) represents the firing strength of a rule, f 1 and f 2 are the fuzzy rules, x and y are the input nodes of ANFIS, and p i , q i and r i are the parameters set (consequent parameters). The FIS usually uses two methods of inference: The Mamdani method [65] and the Sugeno method [66]. In the present modeling, the latter method was employed. Figure 2 shows a schematic diagram of ANFIS modeling processes.

Radial Basis Function (RBFNN) Neural Network
Radial Basis Functions (RBF) are used in approximation functions for timing and control sequences. In artificial neural networks, radial basis functions are used as activation functions. Neural networks with radial basis functions have found application in solving classification problems, approximating tasks of multivariable functions, and in prediction problems [67][68][69][70]. The idea of the RBF network is based on the solutions of statistical methods of approximating the function of a numeric variable. A typical RBNFN is a structure containing: an input layer on which signals described by the input vector x are applied, a hidden layer with radial neurons (weights correspond to cluster center, usually the Gaussian function), and an output layer normally composed of one neuron (linear weighted sum), whose role is the weight summation of signals from hidden neurons [71]. Figure 3 presents the topology of the RBF model as a multi-input-single-output network composition.
where σ is the widths (or spread) of the hidden neuron. The output layer (y i ) in RBF can be written as: where w ij represents a weighted connection between the radial basis function neuron and output neuron; and N is the number of hidden-layer neurons. The constant term B represents a bias.

Group Method of Data Handling (GMDH) Neural Network
The GMDH network is a fast-learning machine based on the principle of heuristic self-organization [72,73]. It is a polynomial theory of complex systems that is applied in a wide variety of areas in data mining and knowledge discovery, forecasting and systems modeling, and optimization and pattern recognition [74].
The Group Method of Data Handling (GMDH) is a self-organizing methodology, with the choice of input variables being made automatically. This is a combinatorial multi-layer algorithm in which a network of layers and nodes is generated using several inputs from the data stream being evaluated. The architecture of a polynomial network is formed during the training process, and the node activation function is based on elementary polynomials of arbitrary order [75,76].
A model which uses the GMDH algorithm creates a network of neurons. Through the connections par neurons via of quadratic and triquadratic polynomials, in each layer, new neurons are spawned. The formal assumption is to find a function,f , that can be approximately used instead of an actual function, f , in order to predict the output,ŷ, for a given input vector, X = (x 1 , x 2 , x 3 . . . , x n ), as close as possible to its actual output, y [74].
In this case: It is now possible to train a GMDH network to predict the output values,ŷ i , for any given input vector, To minimize the square of the difference between actual and predicted output, it is necessary to determine a GMDH network, that is: A connection between inputs and outputs can be expressed by the series functions of Volterra, which is the discrete analogous of the polynomial of Kolmogorov-Gabor [77,78].
where {x 1 , x 2 , x 3 . . . }: inputs; {a, b, c . . . }: polynomial coefficients; and y: the node output. Figure 4 shows a schematic diagram of a GMDH model with three layers and M numbers of input variables.

Evaluation Criteria
Several evaluation criteria were used to assess the accuracy of the modeling process and evaluate the correlation between observed-predicted samples: Root Mean Square Error (RMSE), Normalized Root Mean Square Error (NRMSE), Mean Absolute Error (MAE), coefficient of determination R 2 , and the Nash-Sutcliffe efficiency criterion (NS). The RMSE and MAE show the difference between predictions and actual values, while the NRMSE is a non-dimensional form of the RMSE. The R 2 criterion is used to evaluate the correlation between predictions and actual values. The Nash-Sutcliffe efficiency criterion (NS) normalizes the variance of the errors with the variance of the measurements [79]. These criteria are defined by the following equations: where y i and y-observed data and mean of the observed data (respectively); f i and fforecast data and mean of the forecast data (respectively); n-number of forecast data; and y max and y min -extreme values. When the RMSE, MAE and NRMSE are closer to zero, and R2 and NS are approaching 1, then the performance of the models is more favorable. When an NRMSE value is greater than 0.3 then the model is poor. High performance is achieved with values below 0.1 [50]. The general process of modeling and predicting the time series of TRW is shown as a flowchart in Figure 5.
In the current study, Minitab software was used for time series analysis and the implementation of stochastic models (AR, MA, ARMA and ARIMA), while MATLAB software was employed to implement the machine learning models (ANFIS, RBF and GMDH). Graphs of the results were prepared using Minitab, Excel and R software.

Modeling and Predicting TRW
To predict TRW, an input variable was determined in an initial step. To make a reasonable comparison between several models, they must use the same input variables. As the stochastic models can only use the main variable's previous amounts (time lags) as inputs, it is logical to use the time lags of TW for neural network models too. In this study, therefore, the daily time lags of TRW for each hydrometric station were investigated through Autocorrelation (ACF) and Partial Autocorrelation (PACF) functions. The graphs of these two functions are presented in Figure 6.
The ACF and PACF graphs are drawn for time lags of 130 days ( Figure 6). ACF plots for all stations indicate that there is no seasonal degree in the daily TRW time series, and we must therefore use non-seasonal patterns for the stochastic models. Furthermore, the ACFs show a decreasing trend of correlation as time lags increase. The autocorrelations are significant for daily time lags up to the 77th day, but using all of them as inputs increases model complexity and violates the principle of parsimony [57]. Finally, when correlations decrease, usage of the less correlated lags increases the prediction error. Thus, it is better to use the most correlated time lags. In these ACF graphs, the time lags 1, 2, 3, 4 and 5 have the largest correlation amounts and have therefore been selected as input variables. In the PACF graphs, too, there are some significantly correlated time lags, i.e., the time lags of 1, 2, 3, 4, 5, 6, 7 and 8 days for the Bobry station, of 1, 2, 3, 4 and 5 days for the Sieradz station, of 1, 2 and 3 days for the Poznań station, and of 1, 2 and 3 days for the Gorzow Wielkopolski station. However, among them, the one-day time lag is specifically the most correlated, so the PACF graphs only suggest the 1st time lag of TRW as their predictor input. After preparing the input-target samples, the data are divided into two parts of training and testing (as per Table 1). The best stochastic models are identified by trial and error of the Autoregressive, Moving Average and Integrated degrees, up to five nonseasonal degrees. This includes five models for the patterns AR and MA, 25 models for ARMA, and 125 models for ARIMA. The artificial intelligence models were each optimized by the trial and error of their own parameters: the ANFIS model was developed based on the fuzzy c-means (FCM) clustering method, with its optimization parameter being the number of clusters; the RBF's parameters are the spread and the maximum neurons; and the GMDH's parameters are the number of layers and the number of neurons in each layer. The modeling and prediction results are shown in Table 2. The evaluation in this step is performed using the RMSE and MAE criteria ( Table 2). Since the validity of the numerical models is specified during their test periods, in this table we discuss the test phase for the stations. At first glance, we can see partial differences between the AI and the stochastics. Among the stochastic methods, the MA had the highest prediction errors, but the others-including AR, ARMA and ARIMA-had better performance. Among the AI methods, the best performance was displayed by the ANFIS and RBF models, with the GMDH was the weakest AI model. For the Bobry station, ARMA was the best stochastic model, with RMSE = 0.920 • C and MAE = 0.690 • C, while ANFIS with inputs of ACF had the highest accuracy among AIs (RMSE = 0.916 • C and MAE = 0.694 • C). In a similar scenario for the Sieradz station, the ANFIS was reported as the best AI, and the ARIMA as the best stochastic model. For the Poznań and Gorzow Wielkopolski stations, the criteria values were less than for the previous stations (RMSE values of about 0.6 • C and MAE of about 0.4 • C), and the AR and RBF models were the best among the stochastic and AI models, respectively. Furthermore, there was no significant reported superiority when comparing ACF and PACF input selectors. Scatter plots were drawn to make a graphical comparison between the prediction results of the stochastics and the AIs (Figure 7).  Figure 7 shows the scatter plots for predicted TRW and its actual observations. In Figure 7, the best-performing stochastic models are shown on the left, and the best AIs on the right. These graphs demonstrate that predictions are correlate significantly with observed TRW at all stations. The R 2 values are above 97%, and the dots come together well around the fitted regression line. In all cases, the R 2 values of the stochastics and AIs are too close, and the differences are partial (in the 1st or even 2nd decimal place of R 2 ). In this comparison, therefore, the linear stochastic models can be considered more suitable due to their simplicity. The best R 2 value is 99.381%, and was obtained for the RBF model at the Gorzow Wielkopolski stations. The weakest R 2 value-97.687%-was determined for the ARMA model at the Bobry station.

Investigating the Models in Extreme TRW Deciles
To investigate the prediction abilities of the used models in extreme events, the highest and lowest TRW deciles were separated in the test period. Next, their error distributions were determined, and are shown as violin plots (Figure 8).   Table 2), with violin plots drawn separately for the highest and lowest deciles, by station. On the basis of the vicinity of the violins' main curvature to the Error = 0 • C line, it is demonstrated that all the models are suitable for both of the extreme deciles. A comparison of the models makes it obvious that the violins' main curvatures in stochastic models are somewhat closer to the Error = 0 line than in AI models. The clearest examples of this are the Bobry and Sieradz stations in the lowest decile, and the Bobry and Gorzow Wielkopolski stations in the highest decile. When comparing the two studied deciles, we see clearly that the violins are wider in the highest decile and more acute in the lowest. This demonstrates that the lowest TRW decile has less errors and can achieve better predictions through numerical models than the highest decile. If looked at from another perspective, the plots (Figure 8) generally show that the curvatures are sharper for Poznań and Gorzow Wielkopolski stations, which indicates that the TRW at these two stations offers better predictions than at Bobry and Sieradz.
A Spearman non-parametric correlation test was also implemented between the observations and the predictions. This test was applied seamlessly for all of the stations. The consequent correlation matrix is shown in Table 3. This non-parametric statistical test (Table 3) says that the predictions are significantly correlated at the 0.01 level. In the upper decile, the correlations are stronger than in the lower decile. In both deciles, the predictions of MA are reported to be the weakest correlated outputs (0.607 and 0.759) against the observations. Additionally, the greatest difference between the models is established as being between MA and GMDH, leading to correlation coefficients of 0.735 and 0.878 for the lower and upper deciles, respectively. However, in general, the correlations are high, and this indicates that there are no significant differences between the models' predictions with respect to extreme events of water temperature.

Comparing Prediction Performance between Stations
The ranges of TRW differ at the four stations ( Table 1). The NRMSE is a normalized form of the RMSE, and considers the range of the observation dataset and is a good measurement for making a comparison between different ranged datasets. In this step, the prediction accuracies were investigated using NRMSE and NS criteria. For this purpose, the combined line-bar chart was used for the test period of all seven studied models (Figure 9). At first glance, it is obvious that in all seven models, the NRMSE values have a decreasing trend, and NS values have an increasing trend at the stations. This indicates that the Bobry and Sieradz stations have lower prediction accuracies than the Poznań and Gorzow Wielkopolski stations. As a matter of fact, predictions offered by the models improve as we proceed downstream along the catchment area.
In the case of the Warta River, which constitutes the object of study, better prediction results for TRW were obtained for the middle and lower sections of its course (for the Poznań and Gorzow Wielkopolski water gauges), and slightly worse results were obtained for the water gauges located along the upper course (Bobry and Sieradz). Differences in the effectiveness of the prediction models may result from the influence of local factors modifying the features of both the thermal regime and the runoff regime, influencing, for example, an increase in the variability and irregularity of flow. Figure 10 shows the time series plots of best predicted values and their observations in both training and testing periods of the stations. As can be seen, in downstream stations (Bobry and Sieradz), there are more significant overlaps than in upstream ones (Poznań and Gorzow Wielkopolski). Research on the trend of changes in TRW in Poland, carried out by Graf and Wrzesiński [80], identified water gauges on the Warta River in which the observation series of TRW showed an opposite trend of changes compared to the rest of the country. In most cases, these are located along sections of the river that are subjected to anthropopressure, such as a retention reservoir, municipal wastewater discharges, or modifications of the way in which the valley is used. The Bobry and Sieradz water gauges (upper section of the river course) are located on the river above the Jeziorsko Reservoir, which was built to regulate flows on the river. In its upper reaches, the river shows greater flow variability and greater irregularity, which may affect the modification of TRW distribution and, consequently, void the prediction results. The impact of anthropogenic influences on the structure of the Warta River TRW measurement series has been confirmed in research performed by, among others, [80][81][82][83].

Discussion
In the prediction made for daily TRW in the Warta River, different results were obtained when using the stochastic and AI models. As emphasized by Zhu and Piotrowski [3], comparisons of the results of models used to forecast the TRW have not been entirely conclusive. This problem is also indicated by the results of studies assessing the possibilities of predicting the characteristics of the thermal regime of rivers using various statistical models and artificial intelligence [3,22,34,84,85]. This is the result of various assumptions and limitations in the use of methods, input data for modeling, the level of data resolution, and the length of the observation series [54].
According to Qiu et al. [86], machine learning models perform well in TRW modeling, offering a very accurate empirical basis for its prediction. Zhu et al. [21,34] used different versions of methods to predict daily TRW: the Forward Neural Network (FFNN), Gaussian Process Regression (GPR) and the Decision Tree (DT), demonstrating that these models had similar performance when only air temperature was used as a predictor. Additionally, when the day of the year was included as an input, the performance of the three machine learning models improved significantly. As Graf [54] emphasized, TRW as a forecast variable also depends on its values in previous periods, which is related to the long memory of the system.
In the predictive TRW models developed for the Warta River, we included daily temperature delays as an input variable. According to Santos-Fernandez et al. [87], when the purpose of the application of the stochastic method is time interpolation and forecasting of future TRW values at the locations of measuring points (water gauges), and there is no need to describe unique spatial relationships on streams, thermal conditions reflect the standard models of time series.
The most favorable TRW forecasts in the upper section of the Warta River were obtained through the ARMA (Beaver station) and ARIMA (Sieradz) stochastic models, and the ANFIS AI model. However, for the middle (Poznań) and lower (Gorzow Wielkopolski) sections of the river, the best predictions were given by the AR model (among the stochastic models) and the RBFNN model among the AI models. The weakest predictive effects were obtained through the GMDH model, which can be associated with its optimization. The validity of using the first-order autoregressive model AR (1) for predicting TRW has been demonstrated, among others, by [10,88,89], while the Autoregressive Integrated Moving Average (ARIMA) was applied by Graf [54]. According to Santos-Fernandez et al. [87], in practice, in order to fit a model that is simple and generates precise estimates of fixed effects in predicting TRW, the AR model should be applied.
In the literature, there is a study authored by Graf et al. [22], which predicted the TRW of the Warta River. The researchers investigated the present four hydrometric stations, but used different machine learning approaches. Their best fitted model was the Wavelet Multilayer Perceptron Neural Network (WTMLPNN). When comparing similar hydrometric stations, the RMSE and MAE values show that the models implemented in the current study have superior performance. For example, at Bobry station, Graf et al. [22] obtained RMSE = 1.217 • C and MAE = 0.930 • C using WTMLPNN, which is less commonly reported through ANFIS in the current study (RMSE = 0.916 • C and MAE = 0.694 • C).
Additionally, Graf et al. [22] and the present study obtained a similar comparison result for the Sieradz station; RMSE = 0.981 • C and MAE = 0.781 • C through WTMLPNN in [22] and RMSE = 0.889 • C and MAE = 0.652 • C through ANFIS in the current study. Furthermore, at the Poznań and Gorzow Wielkopolski stations, the current RBF model worked better than the WTMLPNN employed by Graf et al. [22]. In general, the AIs (including the ANFIS and the RBF) used in the present study achieve a superiority of 58.7% over the WTMLPNN. The reason for this seems to be related to the nature of temperature data. The temperature time series display a strong linear autocorrelation, and this linear relation can be better realized by simpler models. As we see in this comparison, the WTMLPNN is an MLP model with a large number of parameters (including the number of hidden layers, the number of neurons in individual layers, and the type of the transfer function in each neuron) combined by wavelet analysis, while the ANFIS has just one (number of clusters), and the RBF only two parameters (spread and the maximum number of neurons). Another difference between [22] and the current study is that the latter used stochastic methods, and in some instances (the Sieradz station, for example), these performed better than the AIs. The superiority of stochastic models in relation to AIs for temperature forecasting has also been reported for the short-and long-term forecasting of air temperature. Aghelpour et al. [39] also demonstrated that in different climates air temperature was better forecast by linear stochastic models than the complex AI and meta-innovative models; again, this shows the nature of linear autocorrelation in temperature time series.
Cole et al. [90] found the MLPNN to be better than a heat budget approach. Hong and Bhamidimarri [91] determined that the DNFIS model was superior not only to the classical ANFIS, but also to the MLPNN, at least in terms of short-term TRW forecasting. Zhu et al. [32] compared the performance of the MLPNN and ANFIS models, indicating that the MLPNN model provides the best overall performance, and that the choice of identification method significantly affects the performance of the ANFIS model. The applicability of the ANFIS model for forecasting TRW is quite restricted, although ANFIS are of great use in other research applications [90,92,93]. In another study, Zhu et al. [21] showed that the coupled neural network performs better than the GPR and DT models. According to Piotrowski et al. [31], the choice of a neural network depends on the method of comparing models.

Conclusions
The investigations show promising results in predicting daily TRW. The results can be summarized in the following points:

•
Both AI and stochastic model types had acceptable performance in predicting daily TRW. • Among the stochastic methods, the AR, ARMA and ARIMA, and among the AI methods, the ANFIS and RBF, offered the best-fitted predictions of TRW. The performance difference between these two types of models is very small, and indeed negligible.

•
The stochastic models have less prediction errors in extreme TRW events.
The general results of comparisons show the superiority of the linear stochastic models, due to simplicity and parsimony. Additionally, AI models with fewer parameters (ANFIS and RBF) offered better results than those that were more complex and had large numbers of parameters (GMDH). In fact, it can be stated that for the purposes of forecasting TRW, the simpler the model, the more appropriate and logical is its use; as can be seen for the Poznań and Gorzow Wielkopolski stations, the AR(2) model-a simple linear regression model with just two input variables-was the best variant. The study confirms the applicability of these numerical approaches for the current river basin, and that they have research value for other catchment areas (among others, in order to extend the results). Furthermore, it is suggested that future researchers use optimization algorithms, such as Genetic, Particle swarm, dragonfly, etc., hybridized by the Ais, to improve the abilities of the AI models. To examine the impacts of global warming and climate change on TRW, it is a possibility to use the meteorological variables for long-term future periods, which could be another suggestion for the long-term forecasting of the variable TRW.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The data that support the findings of this study are available from the corresponding author upon reasonable request.