Neural-Based Ensembles and Unorganized Machines to Predict Streamflow Series from Hydroelectric Plants

Estimating future streamflows is a key step in producing electricity for countries with hydroelectric plants. Accurate predictions are particularly important due to the environmental and economic impacts they entail. In order to analyze the forecasting capability of models applied to monthly seasonal streamflow series, we carried out an extensive investigation considering six versions of unorganized machines: extreme learning machines (ELM) with and without a regularization coefficient (RC), and echo state networks (ESN) using the reservoir designs of Jaeger and of Ozturk et al., with and without RC. Additionally, we addressed the ELM as the combiner of a neural-based ensemble, an investigation not yet accomplished in this context. A comparative analysis was performed using two linear approaches (the autoregressive (AR) and autoregressive moving average (ARMA) models), four artificial neural networks (multilayer perceptron, radial basis function network, Elman network, and Jordan network), and four ensembles. The tests were conducted at five hydroelectric plants, using horizons of 1, 3, 6, and 12 steps ahead. The results indicated that the unorganized machines and the ELM ensembles performed better than the linear models in all simulations. Moreover, the errors showed that the unorganized machines and the ELM-based ensembles reached the best general performances.


Introduction
Planning the operation of a power generation system is defined by establishing the use of energy sources in the most efficient way [1][2][3]. Renewable sources are those with the lowest operation cost since the fuel is provided free of charge by nature. Good predictions of river streamflows allow resource management according to their future availability [4]. Therefore, this is mandatory for countries where there are hydroelectric plants [5][6][7].
The International Hydropower Association published the Hydropower Status Report 2020 [8], showing that 4306 TWh of electricity was generated worldwide by hydroelectric plants in 2019, the single most significant contribution from a renewable energy source in history. In this context, it is important to produce accurate predictions of rivers' monthly seasonal streamflow, since the water flow makes the turbines spin, transforming kinetic energy into electric energy [5,9]. These series present a specific seasonal behavior, since the volume of water throughout the year is mostly dependent on rainfall [10,11]. Ensuring the efficient operation of such plants is necessary, since it significantly impacts the cost of production and the suitable use of water [12,13]. Additionally, their operation leads to a smaller environmental impact than burning carboniferous fuels. For these reasons, many studies have investigated this problem in countries such as China [14], Canada [15], Serbia [16], Norway [17], Malaysia [7], and Brazil [9].
Linear and nonlinear methodologies have been proposed to solve this problem. As discussed in [5,12], and [18], the linear methods of the Box-Jenkins family are widely used [19]. The autoregressive model (AR) is highlighted because its easy implementation process allows the calculation of its free coefficients in a simple and deterministic manner. An extended proposal for this task is the autoregressive and moving average model (ARMA), a more general methodology that uses the errors of past predictions to form the output response [19,20].
However, artificial neural networks (ANN) are prominent for this kind of problem [9,[21][22][23][24]. They were inspired by the operation of the nervous system of superior organisms, recognizing data regularities and patterns through training and determining generalizations based on the acquired knowledge [18,[25][26][27].
In recent times, some studies have indicated that the best results for time series forecasting can be achieved by combining different predictors using ensembles [28][29][30]. Many authors have applied these techniques to similar tasks [31][32][33]. However, the existing approaches commonly explore only the average of the single models' outputs, or the classic neural networks (multilayer perceptron (MLP) and radial basis function networks (RBF)) as a combiner. The specialized literature regarding streamflow forecasting shows that ANN approaches stand out, but some authors use linear models [16], support vector regression [14], and ensembles [15].
This work proposes using a special class of neural networks, the unorganized machines (UM), to solve the aforementioned forecasting task. The term UM defines the extreme learning machines (ELM) and echo state network (ESN) collectively. In this investigation, we addressed six versions of UMs: ELM with and without regularization coefficient (RC) as well as ESN using the reservoir designs from Jaeger's and Ozturk et al. with and without RC. Additionally, we addressed the ELM and the ELM (RC) as the combiner of a neural-based ensemble.
To realize an extensive comparative study, we addressed two linear models (AR and ARMA models); four well-known artificial neural networks (MLP, RBF, Elman network, Jordan Network); and four other ensembles, using as combiners the average, the median, the MLP, and the RBF. To the best of our knowledge, the use of ELM ensembles in this problem and similar repertoires of models is an investigation not yet accomplished. Therefore, we would like to fill this gap.
In this study, the database is from Brazil. In the country, electric energy is mostly generated by hydroelectric plants, which were responsible for 60% of all electric power produced in 2018 [8,34,35]. In addition, Brazil is one of the largest producers of hydropower in the world. Therefore, the results achieved can be extended to other countries.
The remainder of this work is organized as follows: Section 2 discusses the linear models from the Box-Jenkins methodology; Section 3 presents the artificial neural networks and the ensembles; Section 4 shows the case study, the details on the seasonal streamflow series, the computational results, and the result analysis; and Section 5 shows the conclusions.

Linear Forecasting Models
The definition of linear prediction models by Box-Jenkins makes use of linear filtering concepts [19]. The x_t element of a time series results from applying a linear filter Ψ to a Gaussian white noise a_t. Another way to represent a linear model is by weighting the previous signals (x_{t−1}, x_{t−2}, . . ., x_1) to forecast the next element (x_t), adding a noise term a_t and the mean of the series μ, as in Equation (1):

x_t = μ + π_1 x_{t−1} + π_2 x_{t−2} + . . . + a_t (1)

where a_t is the noise of the t-th term, and π_n is the weight assigned to the (t − n)-th term of the series.

Autoregressive Model
Given any value x_t of a time series, the delay p is defined as x_{t−p}. An autoregressive process of order p (AR(p)) is defined as the linear combination of p delays of the observation x_t, with the addition of a white Gaussian noise a_t, as shown in Equation (2) [19]:

x_t = φ_1 x_{t−1} + φ_2 x_{t−2} + . . . + φ_p x_{t−p} + a_t (2)

The term a_t is the inherent error of the regression process, i.e., the error of the forecast when the model is used to predict future values. Thus, the optimum coefficients φ_1, . . ., φ_p must be calculated to minimize the error a_t [36].
To determine the optimum values of φ_p, it is necessary to solve a recurrence relation that emerges from the autocorrelation function, as presented in Equation (3):

ρ_j = φ_1 ρ_{j−1} + φ_2 ρ_{j−2} + . . . + φ_p ρ_{j−p}, j > 0 (3)

Expanding this relation for j = 1, 2, . . ., p yields the set of linear equations denominated the Yule-Walker equations, which define φ_1, φ_2, . . ., φ_p as a function of ρ_1, ρ_2, . . ., ρ_p for an AR(p) model, as in Equation (4) [19]:

ρ = R φ (4)

where ρ = [ρ_1, . . ., ρ_p]^T, φ = [φ_1, . . ., φ_p]^T, and R is the p × p symmetric matrix of autocorrelations with entries R_{ij} = ρ_{|i−j|} (with ρ_0 = 1).
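The Yule-Walker solution above can be sketched in a few lines. The following is a minimal illustration (the function names and the use of NumPy are ours, not from the paper): the sample autocorrelations form a Toeplitz system whose solution gives the AR coefficients.

```python
import numpy as np

def fit_ar_yule_walker(series, p):
    """Estimate AR(p) coefficients by solving the Yule-Walker equations.

    The sample autocorrelations rho_1..rho_p form a Toeplitz system
    R @ phi = rho, whose solution gives the phi coefficients.
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Sample autocorrelations rho_0..rho_p.
    rho = np.array([np.dot(x[:n - k], x[k:]) / np.dot(x, x)
                    for k in range(p + 1)])
    # Toeplitz matrix of autocorrelations (left-hand side of the system).
    R = np.array([[rho[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, rho[1:])

def predict_ar(history, phi):
    """One-step-ahead AR forecast: weighted sum of the last p observations."""
    p = len(phi)
    return float(np.dot(phi, history[-1:-p - 1:-1]))
```

On a long synthetic AR(2) series, the recovered coefficients approach the true generating values, which is the deterministic calculation the text refers to.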

Autoregressive and Moving Average Model
Unlike the autoregressive model, the moving average (MA) model combines white noise signals [19]. An MA model is said to be of order q if the prediction of x_t utilizes q samples of white noise, as in Equation (5):

x_t = μ + a_t − θ_1 a_{t−1} − θ_2 a_{t−2} − . . . − θ_q a_{t−q} (5)

where θ_i, i ∈ {1, 2, . . ., q}, are the parameters of the model. An ARMA model is the union of the AR and MA models. To predict using an ARMA of order (p, q), it is necessary to address p prior signals (AR part) and q white noise signals (MA part). Mathematically, an ARMA(p, q) model is described as in Equation (6):

x_t = φ_1 x_{t−1} + . . . + φ_p x_{t−p} + a_t − θ_1 a_{t−1} − . . . − θ_q a_{t−q} (6)

where φ_i and θ_j are the model parameters.
Unlike the AR, the calculation of the ARMA coefficients requires solving nonlinear equations. However, it is possible to achieve an optimal linear predictor if these coefficients are chosen adequately [18,19].
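Once the coefficients are available, the one-step ARMA forecast of Equation (6) is a direct computation. The sketch below assumes the standard Box-Jenkins sign convention (minus sign on the MA terms) and hypothetical function names; it sets the unknown future noise to its expected value of zero.

```python
def arma_one_step(x_hist, a_hist, phi, theta, mu=0.0):
    """One-step ARMA(p, q) forecast: combine p past observations (AR part)
    and q past noise terms (MA part). Series are given most-recent-last;
    mu is the series mean; the future noise term is taken as zero.
    """
    p, q = len(phi), len(theta)
    ar_part = sum(phi[i] * (x_hist[-1 - i] - mu) for i in range(p))
    ma_part = sum(theta[j] * a_hist[-1 - j] for j in range(q))
    return mu + ar_part - ma_part
```

For example, with phi = [0.5], theta = [0.2], the last observation 1.0, and the last noise term 0.4, the forecast is 0.5 · 1.0 − 0.2 · 0.4 = 0.42.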

Artificial Neural Network
Artificial neural networks (ANN) are distributed and parallel systems composed of simple data processing units. These units are denominated artificial neurons and are capable of computing mathematical functions, which, in most cases, are nonlinear [37,38]. Artificial neurons are connected by normally unidirectional connections and can be arranged in one or more layers [25].
The ANNs present a learning ability through the application of a training method. They can generalize the knowledge acquired through the solution of problem instances for which no answer is known [39]. Neural networks are widely used in many areas of science, engineering, computing, medicine, and others [9,25,[40][41][42][43][44][45].

Multilayer Perceptron (MLP)
The multilayer perceptron (MLP) consists of a set of artificial neurons arranged in multiple layers so that the input signal propagates through the network layer by layer [25]. It is considered one of the most versatile architectures for applicability and is used in the universal approximation of functions, pattern recognition, process identification and control, time series forecasting, and system optimization [25,29].
The training process consists of adjusting the synaptic weights of the artificial neuron to find the set that achieves the best mapping of the desired event [46,47]. The most known training method for MLP is the steepest descent in which the gradient vector is calculated using the backpropagation algorithm [48,49].
The error signal of a neuron j in iteration t is given by Equation (7):

e_j(t) = d_j(t) − y_j(t) (7)

where e_j(t) is the error, d_j(t) is the expected result (desired output), and y_j(t) is the output of the network. Finally, the rule for updating the synaptic weights of each neuron is given by Equation (8):

w^m_{ij}(t + 1) = w^m_{ij}(t) − η ∂E(t)/∂w^m_{ij}(t) (8)

where w^m_{ij}(t) is the synaptic weight connecting input i to neuron j of layer m in iteration t, η is the learning rate, and ∂E(t)/∂w^m_{ij}(t) is the partial derivative of the error with respect to that weight.
The training algorithm consists of two phases. Initially, the input data are propagated by the network to obtain its outputs. These values are then compared with the desired ones to obtain the error. In the second step, the opposite path is performed from the output layer to the input layer. In this case, all the synaptic weights are adjusted according to the rule of error correction assumed, so that the output given by the network in the following iteration is closer to the expected one [25].
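The two phases described above can be condensed into a single training iteration for a one-hidden-layer network. This is a simplified sketch with hypothetical function names (biases omitted, one training sample, tanh activations), not the paper's implementation:

```python
import numpy as np

def mlp_train_step(x, d, W1, W2, eta=0.1):
    """One backpropagation iteration for a 1-hidden-layer MLP.

    Forward phase: propagate x through both layers.
    Backward phase: compute the error e = d - y and update every weight
    against the gradient of the squared error (gradient descent).
    """
    h = np.tanh(W1 @ x)            # hidden-layer outputs
    y = np.tanh(W2 @ h)            # network outputs
    e = d - y                      # error signal
    # Local gradients (using tanh'(v) = 1 - tanh(v)^2).
    delta2 = e * (1 - y ** 2)
    delta1 = (W2.T @ delta2) * (1 - h ** 2)
    # Weight updates, moving against the error gradient.
    W2 = W2 + eta * np.outer(delta2, h)
    W1 = W1 + eta * np.outer(delta1, x)
    return W1, W2, float(0.5 * np.dot(e, e))
```

Iterating this step on a fixed input-target pair drives the squared error down, which is the behavior the stopping criteria in Section 4 monitor.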

Radial Basis Function Network (RBF)
Radial basis function networks (RBF), unlike MLPs, have only two layers: one hidden and one output layer. In the first, all the kernel (activation) functions are radial [25]. One of the most used functions is the Gaussian, expressed in Equation (9):

φ(x) = exp(−‖x − c‖² / (2σ²)) (9)

in which c is the center of the Gaussian and σ² its variance. The training of RBFs is performed in two stages. First, the parameters of the hidden layer are determined: the center and the variance of each basis function are adjusted. Subsequently, the weights of the output layer are tuned in a supervised process similar to that of the MLP [25,50].
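The Gaussian kernel of Equation (9) is simple to evaluate; a minimal sketch follows (the 2σ² normalization is the convention assumed here, and the function name is ours):

```python
import math

def gaussian_rbf(x, c, sigma2):
    """Gaussian radial kernel: the response is 1 at the center c and
    decays with the squared Euclidean distance, scaled by the variance."""
    dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return math.exp(-dist2 / (2.0 * sigma2))
```

The response equals 1 exactly at the center and decreases monotonically as the input moves away from it, which is what makes the unit's activation "radial".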

Elman and Jordan Networks
The Elman Network is a recursive neural architecture created by Elman [51] based on an MLP. The author divided the input layer into two parts: the first comprises the network inputs and the second, denominated context unit, consists of the outputs of the hidden layer. An Elman network is shown in Figure 1.

As the context units of an Elman network are treated as inputs, they also have associated synaptic weights, which can be adjusted by the backpropagation through time algorithm. In this work, we used the truncated backpropagation through time version with one delay [25].
Jordan [52] created the first recurrent neural network based on similar premises. This neural network was initially used for time series recognition but is currently applied to all kinds of problems. Here, the context units are fed by the outputs of the output layer neurons instead of the hidden layer. Figure 2 illustrates this model.
As in the Elman network, the context units are treated as network inputs, also having associated synaptic weights, which allows the use of truncated backpropagation through time [25].

Extreme Learning Machines (ELM)
Extreme learning machines (ELM), introduced by Huang et al. [53], are an architecture of feedforward neural network with only a hidden layer. The main difference between them and the traditional MLP is that the synaptic weights of the hidden layer are chosen randomly and remain untuned during the training process. For this reason, the ELM and the ESN are classified as unorganized machines.
Adjusting an ELM consists of determining the matrix of output-layer synaptic weights W_out that generates the smallest error with respect to the desired output vector d, which can be done through an analytic solution. This process amounts to applying the Moore-Penrose pseudo-inverse operator, which ensures the minimum mean square error and confers to the ELM a fast training process. This solution is shown in Equation (10):

W_out = X_hid^† d = (X_hid^T X_hid)^{−1} X_hid^T d (10)

where X_hid ∈ R^{|x|×m} is the matrix with all the outputs of the hidden layer for the training set, |x| is the number of training samples, and m is the number of hidden neurons. This operator ensures that ELM training is much more computationally efficient than the application of backpropagation. However, network performance can be improved by inserting a regularization coefficient C, as in Equation (11) [54]:

W_out = (X_hid^T X_hid + I/C)^{−1} X_hid^T d (11)

where I is the identity matrix. To determine the best value of C, it is necessary to test all 52 candidate values [55].
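The whole ELM training procedure fits in a few lines. The sketch below (our own names, tanh activations, uniform random input weights) implements the pseudo-inverse solution of Equation (10) and the regularized solution of Equation (11):

```python
import numpy as np

def train_elm(X, d, n_hidden, C=None, seed=0):
    """Fit an ELM: random (untuned) hidden weights, analytic output layer.

    Without regularization, the output weights use the Moore-Penrose
    pseudo-inverse; with a regularization coefficient C, the ridge-style
    solution with the I/C term is used instead.
    """
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-1, 1, size=(n_hidden, X.shape[1]))
    H = np.tanh(X @ W_in.T)                      # hidden-layer output matrix
    if C is None:
        W_out = np.linalg.pinv(H) @ d            # Equation (10)
    else:
        W_out = np.linalg.solve(H.T @ H + np.eye(n_hidden) / C,
                                H.T @ d)         # Equation (11)
    return W_in, W_out

def predict_elm(X, W_in, W_out):
    """Forward pass: random projection, tanh, then the linear readout."""
    return np.tanh(X @ W_in.T) @ W_out
```

Note that training involves no iterations at all: the only adjusted parameters are solved for in closed form, which is the source of the speed advantage discussed in the text.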

Echo State Networks (ESN)
Jaeger proposed echo state networks (ESN) as a new type of recurrent neural network. Recursive networks allow different outputs for the same input, since the output depends on the internal state of the network. The idea of using the term echo is based on the perception that the most recent samples and the previous states influence the output more strongly [56]. The theoretical proof of the existence of an echo state is denominated the echo state property [9].
In the original proposal from Jaeger, the ESN presents three layers. The hidden layer is denominated the dynamic reservoir and consists of fully interconnected neurons, which generate a nonlinear characteristic. The output layer is responsible for combining the outputs of the dynamic reservoir and corresponds to the linear portion of the network. Unlike other recurrent neural networks, which can present feedback in any layer, the original proposal only presents feedback loops in the dynamic reservoir. Figure 3 shows a generic ESN.
For each new input at time t + 1, the states of the network are updated according to Equation (12):

x_{t+1} = f(W_in u_{t+1} + W x_t) (12)

in which x_{t+1} are the states at time t + 1, f(·) represents the activation functions of the reservoir neurons, W_in is the matrix of input-layer coefficients, W is the reservoir weight matrix, and u_{t+1} is the input vector at time t + 1.
In turn, the network output vector y_{t+1} is given by Equation (13):

y_{t+1} = W_out x_{t+1} (13)

where W_out ∈ R^{L×N} is the matrix with the synaptic weights of the output layer, and L is the number of network outputs. As occurs in the ELM, the synaptic weights of the ESN dynamic reservoir are not adjusted during training. The Moore-Penrose pseudo-inverse operator is also used to determine the weights of W_out. Additionally, the performance can be improved by using the regularization coefficient.
In this work, we consider two ways of creating the dynamic reservoir. Jaeger et al. [56] proposed building the weight matrix by setting three possible values, randomly chosen according to fixed probabilities (Equation (14)). Ozturk et al. [57], in turn, elaborated a reservoir rich in the mean entropy of the echo states: the eigenvalues follow a uniform distribution in the unit circle, creating a canonical matrix (Equation (15)) in which r is the spectral radius, set in the range [0, 1], and N is the number of neurons in the dynamic reservoir.
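The reservoir mechanics of Equation (12) can be sketched compactly. The generator below is a generic random sparse reservoir rescaled to a chosen spectral radius, a simple stand-in for illustration only, not the exact Jaeger or Ozturk et al. designs of Equations (14) and (15); function names are ours.

```python
import numpy as np

def make_reservoir(N, radius=0.9, density=0.05, seed=0):
    """A random sparse reservoir matrix rescaled so that its spectral
    radius equals `radius` (kept below 1 for the echo state property)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(N, N)) * (rng.random((N, N)) < density)
    eig_max = np.abs(np.linalg.eigvals(W)).max()
    return W * (radius / max(eig_max, 1e-12))

def esn_states(u, W_in, W, x0=None):
    """Run the reservoir over an input sequence with the update rule
    x_{t+1} = tanh(W_in @ u_{t+1} + W @ x_t); returns all visited states."""
    x = np.zeros(W.shape[0]) if x0 is None else x0
    states = []
    for u_t in u:
        x = np.tanh(W_in @ np.atleast_1d(u_t) + W @ x)
        states.append(x)
    return np.array(states)
```

The collected state matrix plays the role of X_hid in the ELM: the linear readout W_out is then fitted over it with the same pseudo-inverse (or regularized) solution.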

Ensemble Methodology
An ensemble combines the results of several individually adjusted models with the aim of improving the final response of the system [58]. The idea behind this methodology is that different methods, such as neural networks, produce different behaviors when the same inputs are applied. Therefore, a methodology can present better responses for a given range of data, while another works better in another band. A combination method (average, voting, or another neural network) is applied to produce the final ensemble output [29,30]. Figure 4 presents an example of this model.
The combination of several specialists through an ensemble does not exclude the need for the predictors to show good individual performance. The purpose of an ensemble is to improve upon existing good results. Therefore, the essential condition for its accuracy is that its models be accurate and diverse [59]. Over time, ensembles have been used to solve many problems [60][61][62][63].
In this work, we used several distinct combiners. First, we addressed non-trainable methods: the mean and the median of the experts' outputs [29]. Additionally, we applied feedforward neural combiners: MLP, RBF, and ELM with and without the regularization coefficient [29,30,64]. We highlight that these methodologies are not often used, especially for seasonal streamflow series forecasting.
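The combiner step can be sketched as follows. The non-trainable combiners are direct statistics of the experts' forecasts; the trainable one shown here is a plain ridge readout fitted on held-out data, a deliberately simplified stand-in for the ELM combiner of the paper (an ELM combiner would first pass the experts' outputs through a random hidden layer). All names are hypothetical.

```python
import numpy as np

def combine_mean(preds):
    """Non-trainable combiner: average the experts' forecasts.
    `preds` has one row per expert, one column per time step."""
    return np.mean(preds, axis=0)

def combine_median(preds):
    """Non-trainable combiner: median of the experts' forecasts."""
    return np.median(preds, axis=0)

def train_linear_combiner(P_val, d_val, C=2.0 ** 20):
    """Trainable combiner: ridge weights over the experts' outputs,
    fitted on validation predictions P_val against the targets d_val,
    using the same I/C regularization style as the ELM readout."""
    k = P_val.shape[1]
    return np.linalg.solve(P_val.T @ P_val + np.eye(k) / C, P_val.T @ d_val)
```

A trainable combiner can learn that one expert is more reliable in a given regime, which is exactly the per-band specialization the ensemble rationale above describes.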

Case Study
Streamflow series are a kind of time series in which each observation refers to monthly, weekly, daily, or hourly average flow. This work addresses the monthly average flow. These series present seasonality since they follow the rain cycles that occur during the year [9]. Seasonality changes the standard behavior of the series, which must be adjusted to improve the response of predictors [5].
Deseasonalization is an operation that removes the seasonal component of the monthly series, making it approximately stationary, with zero mean and unitary standard deviation. The new deseasonalized series is given by Equation (16) [50]:

z_{i,j} = (x_{i,j} − μ_j) / σ_j (16)

in which x_{i,j} is the streamflow of month j in year i, and μ_j and σ_j are the mean and the standard deviation of month j over the historical series. The tests were conducted using series from five hydroelectric plants; as shown in Table 1, they have different hydrological behaviors, which makes a robust performance analysis possible. All series comprise samples from January 1931 to December 2015, a total of 85 years, or 1020 months. The data were divided into three sets: 1931 to 1995 is the training set, used to adjust the free parameters of the models; 1996 to 2005 is the validation set, used in the cross-validation process and to adjust the regularization coefficient of the ELMs and ESNs; and 2006 to 2015 is the test set, used to measure the performance of the models. The mean squared error (MSE) was adopted as the performance metric, as done in other works in the area [12].
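The deseasonalization and the corresponding postprocessing step can be sketched as below (our own names; the series is assumed to start in January and be sampled monthly): each calendar month is standardized by its own historical mean and standard deviation, and the inverse transform reinserts the seasonal component.

```python
import numpy as np

def deseasonalize(series):
    """Standardize each calendar month by its own historical mean and
    standard deviation, removing the seasonal component."""
    x = np.asarray(series, dtype=float)
    mu = np.array([x[m::12].mean() for m in range(12)])
    sigma = np.array([x[m::12].std(ddof=1) for m in range(12)])
    months = np.arange(len(x)) % 12
    return (x - mu[months]) / sigma[months], mu, sigma

def reseasonalize(z, mu, sigma):
    """Postprocessing step: reinsert the seasonal component."""
    months = np.arange(len(z)) % 12
    return z * sigma[months] + mu[months]
```

After the transform, every month of the deseasonalized series has zero mean and unitary standard deviation, which is the stationarity property the predictors rely on.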
Predictions were made using up to the last six delays of the samples as inputs. These delays were selected using the wrapper method [66,67]. The predictions were performed for 1 (next month), 3 (next season), 6 (next semester), and 12 (next year) steps ahead, using the recursive prediction technique [68,69].
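The recursive prediction technique mentioned above reuses each one-step forecast as an input for the next step. A minimal sketch (hypothetical names; `one_step_model` is any callable that maps the current window to the next value):

```python
def recursive_forecast(history, one_step_model, steps):
    """Recursive multi-step prediction: each one-step forecast is fed
    back into the input window to produce the following forecast,
    up to `steps` ahead."""
    window = list(history)
    out = []
    for _ in range(steps):
        y = one_step_model(window)
        out.append(y)
        window.append(y)   # the prediction becomes an input
    return out
```

Because predictions are fed back as inputs, errors accumulate over the horizon, which is consistent with the growth of the errors for P = 3, 6, and 12 reported in Section 4.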
During the preprocessing stage, the seasonal component of the series was removed through deseasonalization to make the behavior almost stationary. Therefore, the predicted values required an additional postprocessing step, in which the seasonal component was reinserted into the data. Figure 5 shows the entire prediction process step by step.
In total, this work applied 18 predictive methods: two linear methods of the Box-Jenkins family, 10 artificial neural networks, and six ensembles. All neural networks used the hyperbolic tangent as the activation function with β = 1. The learning rate adopted for the models that use backpropagation was 0.1. As a stopping criterion, a minimum improvement in MSE of 10^−6 or a maximum of 2000 epochs was considered. The networks were tested with the number of neurons ranging from 5 to 200, in increments of 5. All these parameter values were determined after empirical tests.
The regularization coefficients were evaluated over all 52 possibilities mentioned in Section 3.4, and the one with the lowest MSE in the validation set was chosen for the ELM and ESN approaches. The wrapper method was used to select the best lags for the single models, as well as which experts were the best predictors in the ensembles. Holdout cross-validation was also applied to all neural networks and ensembles to avoid overtraining and to determine the C of the ELM and ESN.
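The wrapper selection used for lags and experts can be sketched as an exhaustive search scored by a model-based validation error. This is a simplified illustration with hypothetical names (practical wrappers may use greedy forward or backward search instead of full enumeration):

```python
import itertools

def wrapper_select(candidates, evaluate):
    """Wrapper input selection: score every non-empty subset of the
    candidate inputs (lags or experts) with a model-based validation
    error and keep the best. `evaluate(subset)` must train the model
    on that subset and return its validation MSE."""
    best_subset, best_mse = None, float("inf")
    for r in range(1, len(candidates) + 1):
        for subset in itertools.combinations(candidates, r):
            mse = evaluate(subset)
            if mse < best_mse:
                best_subset, best_mse = subset, mse
    return best_subset, best_mse
```

Because the score comes from the trained model itself rather than from a filter statistic, the selected inputs are tailored to the predictor being tuned, at the cost of many training runs.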
All proposed models were executed 30 times for each input configuration and number of neurons, and the best executions were chosen. Additionally, following the methodology addressed in [9] and [70], we adjusted 12 independent models, one for each month, for all proposed methods. This is allowed since the mean and the variance of each month are distinct, and this approach can lead to better results [9].

Experimental Results
In this section, we present and discuss the results obtained by all models and four forecasting horizons. Tables 2-6 show the results achieved for Agua Vermelha, Belo Monte, Ilha Solteira, Paulo Afonso, and Tucurui hydroelectric plants, for both real and deseasonalized domains. The best performances are highlighted in bold.
The Friedman [71,72] test was applied to the results for the 30 runs of each predictive model proposed, regarding the MSE in the test set. The p-values achieved were smaller than 0.05. Therefore, we can assume that changing the predictor leads to a significant change in the results.
To analyze the dispersion of the results obtained after 30 executions [73], Figure 6 presents the boxplot graphics [72,74]. In some cases, the best model in the deseasonalized domain was not the best in the real domain; this occurred because the deseasonalization process considered all parts of the series as having the same importance. In the literature, the error in the real space is adopted as the most important measure to evaluate the results [12,18].
As the forecasting horizon grows, the prediction process becomes more difficult, and the errors tend to increase for all models. This is directly related to the decreased correlation between the input samples and the desired future response. Therefore, the output estimates tend to approach the long-term (historical) mean [9].
We elaborated Table 7, using the results from Tables 2-6, to show a winner ranking illustrating which models achieved the best general performance.
For P = 1 step ahead, for all series, the best predictor was always an ensemble, highlighting the ELM-based combiner, which was the best for three of five scenarios (60%). This result indicates that the use of ensembles can lead to an increase in the performance. Moreover, the application of an unorganized machine requires less computational effort than the MLP or the RBF since its training process is based on a deterministic linear approach. It also corroborates that their approximation capability is elevated, overcoming their fully trained counterpart [9,12].
The results varied for P = 3, for which several architectures-ELM, Jaeger ESN, average ensemble, and MLP ensemble-were the best at least once. We emphasize that the ELM network was better in two cases, and the ESN in one. The UMs were better in 60% of the cases. Regarding P = 6, the ELM was also the best predictor, achieving the smallest error in four cases (80%), followed by the MLP, which was the best only for Tucurui.
Analyzing the last forecasting horizon, P = 12, four different neural architectures reached the best performance at least once: MLP, Elman, Jaeger ESN (twice), and Ozturk ESN (RC). An important observation is the presence of recurrent models among them. This horizon is very difficult to predict since the correlation between the input samples and the desired response is small. Therefore, there is an indication that the presence of internal memory in the model is an advantage.
In summary, the unorganized networks (ESN and ELM), in stand-alone versions or as a combiner of an ensemble, provided the best results in 14 of 20 scenarios (70%). This is relevant since such methods are newer than the others and are simpler to implement.
Considering the reservoir design of ESNs, we achieved almost a draw; in nine cases, the proposal from Ozturk et al. was the best, and in 11, the Jaeger model achieved the smallest error. Therefore, we cannot state which one is the most adequate for the problem.
Regarding the feedforward neural models, one can observe that in 16 of 20 cases (80%), the ELMs overcame the traditional MLP and RBF architectures. In the same way, the ESNs were superior to the traditional, fully trained Elman and Jordan proposals in 17 of 20 scenarios (85%). This is strong evidence that the unorganized models are prominent candidates to tackle such problems.
Linear models did not outperform neural networks in any of the 20 scenarios. For the problem of forecasting monthly seasonal streamflow series, the results showed that ANNs were the most appropriate. However, it is worth mentioning that linear models are still widely used today.
Finally, to provide a visual appreciation of the final simulation, Figure 7 presents the forecast made by the ELM ensemble for Água Vermelha plant with P = 1.

Conclusions
This work investigated the performance of unorganized machines-extreme learning machines (ELM), echo state networks (ESN), and ELM-based ensembles-on monthly seasonal streamflow series forecasting from hydroelectric plants. This is a very important task for countries where power generation is highly dependent on water as a source, such as Canada, China, Brazil, and the USA, among others. Due to the broad use of this kind of energy generation in the world, even a small improvement in the accuracy of the predictions can lead to significant financial resource savings as well as a reduction in the impact of using fossil fuels.
We also used many artificial neural network (ANN) architectures-multilayer perceptron (MLP), radial basis function networks (RBF), Jordan network, Elman network, and the ensemble methodology using the mean, the median, the MLP, and the RBF as combiner. Moreover, we compared the results with the traditional AR and ARMA linear models. We addressed four forecast horizons, P = 1, 3, 6, and 12 steps ahead, and the wrapper method to select the best delays (inputs).
The case study involves a database related to five hydroelectric plants. The tests showed that the neural ensembles were the most indicated for P = 1 since they presented the best performances in all the simulations of this scenario, especially those that employed the ELM. For P = 3 and 6, the ELM was highlighted. For P = 12, it was clear that the recurrent models were outstanding, mainly those with the ESN.
Regarding the linear models, this work showed their inferiority compared to the neural models in all cases. Furthermore, the unorganized neural models (ELM and ESN), in their stand-alone versions or as combiners of an ensemble, prevailed over the others, presenting the lowest errors in 14 of 20 scenarios (70%).
These results are important since the unorganized machines are easy to implement and require less computational effort than the fully trained approaches. This is related to the use of the Moore-Penrose pseudo-inverse operator to train their output layer, since it ensures the optimum value of the weights in the mean square error sense, whereas the use of backpropagation could lead the process to a local minimum. In at least 80% of the cases, the unorganized proposals (ELM and ESN) overcame the fully trained proposals (MLP, RBF, Elman, and Jordan).
Other deseasonalization processes should be investigated in future works. Additionally, the streamflows of other plants should be predicted and the results evaluated. Moreover, the use of bio-inspired optimization methods [76][77][78] is encouraged to optimize the ARMA model, as well as the application of the support vector regression method.