A Comparison of SWAT and ANN Models for Daily Runoff Simulation in Different Climatic Zones of Peninsular Spain

Streamflow data are of prime importance to water-resources planning and management, and the accuracy of their estimation is critical for decision making. In this study, the Soil and Water Assessment Tool (SWAT) and Artificial Neural Network (ANN) models were evaluated and compared to find a method to improve streamflow estimation. For a more complete evaluation, the accuracy of these models was also assessed separately for different flow periods using regional flow duration curves (FDCs). Specifically, the FDCs were divided into five sectors: very low, low, medium, high and very high flow. This segmentation allows model performance to be analysed precisely for every important discharge event. The models were applied in two catchments in Peninsular Spain with contrasting climatic conditions: Atlantic and Mediterranean. The results indicate that SWAT and ANNs were generally good tools for daily streamflow modelling. However, SWAT was more successful at simulating lower flows, while ANNs were superior at estimating higher flows in all cases.


Introduction
Streamflow is one of the most important variables of the hydrological cycle. In a watershed, streamflow data are necessary for many water resources issues such as management, planning and hydraulic engineering design [1]. Hydrological models are used in science and practice to predict extreme events, both floods and low flows, for river management [2]. A challenge for hydrological models is therefore to represent all flow phases adequately with the same model parameter set [3], so that very high flows (and hence flood risk) are not underestimated and very low flows are not overestimated, which would mask water supply problems. There are many hydrological models. Conceptual hydrologic models that simulate streamflow in a watershed take into consideration various processes of the hydrological cycle through mathematical formulation [4]. Numerous hydrologic models have been developed to simulate hydrologic processes; they are important tools for estimating streamflow and are capable of establishing rainfall-runoff relationships [5]. One sophisticated mathematical model is the Soil and Water Assessment Tool (SWAT) [6]. SWAT is a conceptual semi-distributed model and is currently one of the most popular hydrologic models at the watershed scale [7]. It has been widely used to estimate streamflow time series, but it requires a large amount of spatial and temporal data and input parameters. In addition, the broad range of parameter values and their complex interactions complicate model parameterization and calibration [4]. To facilitate this process, SWAT-CUP (Calibration and Uncertainty Procedures) has been developed. It is a stand-alone program for the calibration of SWAT which contains five calibration procedures and includes functionalities for validation and sensitivity analysis [8].
On the other hand, with the advances in computing over recent decades, the estimation of hydrological variables by machine learning has gained much attention among researchers. Recent real-life applications of soft computing techniques in hydrologic engineering include the following: Olyaie et al. (2015) [9] compared three artificial intelligence approaches, namely artificial neural networks (ANNs), adaptive neuro-fuzzy inference system (ANFIS) and coupled wavelet and neural network (WANN), for estimating the suspended sediment load (SSL) of river systems; Gholami et al. (2015) [10] modelled groundwater level fluctuations using dendrochronology (tree rings) and an ANN; Chen and Chau (2016) [11] developed a hybrid double feedforward neural network model for daily SSL estimation; and Jimeno-Sáez et al. (2017) [12] used different machine learning models, such as ANN and ANFIS, for instantaneous peak flow estimation based on maximum mean daily flow.
Specifically, ANNs, which have been widely applied to water resources system problems, were found to be powerful tools for the estimation of streamflow time series [13,14]. The particular advantage of the ANN is that the network can be trained to learn the relationships between inputs and outputs without requiring a priori knowledge of the physical characteristics of the process [15]. This feature makes ANNs an effective tool for modelling complex hydrological processes [16]. ANNs are empirical models which can be used as an alternative to simulate hydrological processes by connecting inputs and outputs through mathematical functions without the need to know the relationship with the catchment characteristics [17]. ANNs have been used in a considerable number of recent studies for estimating streamflow [4,12,18-20]. Both SWAT and ANNs have therefore been widely used for streamflow estimation. However, few studies have compared the two models for daily streamflow estimation [5,21-23].
Against this background, the first goal of this work is to use SWAT and ANNs to build hydrologic models that simulate streamflow in basins with contrasting climatic conditions. These models are assessed at basin scale and at daily time intervals. ANN models with different input variables (e.g., daily precipitation, daily precipitation of previous days, total precipitation of previous days, mean daily temperature) are compared to find the best and most efficient network structure. Secondly, the most appropriate model for each of the studied cases is selected by comparing the performance of the SWAT and ANN models. In addition, their efficiency in estimating different ranges of flow (from very high to very low flow) is determined based on the flow duration curve (FDC). The efficiency of these models has been assessed in two watersheds: the Ladra River Basin (LRB), with an Atlantic climate, and the headwaters of the Segura River Basin (HSRB), with a Mediterranean climate. These basins were selected for the wide diversity of climate conditions that they represent, including some of the rainiest areas in Europe in the northwest of Spain (LRB) and the driest areas in the southeast of Spain (HSRB). To ensure the validity of the results, both basins are in natural regime. Selecting the appropriate model to simulate streamflow in a watershed is a key challenge, and analysing the performance of these models in basins with different climates could help researchers to apply the most suitable model in each case.

Study Areas and Data Inputs
To compare the accuracy of SWAT and ANN models, two contrasting watersheds in Spain were selected as case studies in this work. Figure 1 shows the location map and the digital elevation models (DEMs) of the watersheds. Table 1 summarizes the characteristics of both watersheds. LRB is located in the north of the Miño-Sil Basin (NW Spain) and covers an area of 843 km² with an elevation range from 392 to 872 m asl. The climate is typical Atlantic, with higher differences between extreme temperatures in summer and winter. Annual precipitation ranges from 660 to 1632 mm. Streamflow is abundant and is fed by persistent Atlantic frontal systems from the west that generally occur from autumn to spring; summer is the driest season [24]. The mean monthly flow varies from 48 m³/s in winter to 4 m³/s in summer; it decreases gradually until summer and increases again during autumn. The predominant soil type is Humic Cambisol (82% of the total area). This area has a low permeability, and aquifers are much less important than surface water resources [25]. The major land cover in the Ladra River watershed is forest land (35%), followed by land with scrub and/or herbaceous vegetation (24%), heterogeneous agricultural areas (23%) and mixed mosaic (18%).
Water 2018, 10, x FOR PEER REVIEW

The other studied area is the headwaters of the Segura River Basin (SE Spain), which has an area of 235 km² and is characterized by steep terrain with an elevation range from 898 to 1912 m asl. The climate is Mediterranean, with very dry summers and a rainy season extending from October to May, during which over 80% of annual precipitation occurs. The mean annual precipitation ranges from 412 to 1234 mm. The main soil type is Rendzic Leptosol (88% of the total area) with good drainage [26]. The catchment is characterized by conditions which allow the infiltration of a large amount of water and which smooth the hydrological response, and groundwater is crucial in surface hydrology [27]. During the summer months rainfall is practically non-existent, so the streamflow in this period comes mainly from groundwater sources [28]. The mean monthly flow varies between 0.96 m³/s in September and 2.97 m³/s in January. The HSRB is mostly forest-dominated: forest covers about 61% of the basin, and 19% is covered by Mediterranean scrubland vegetation.
The SWAT and ANN models were constructed using freely available information. The daily discharge data of LRB were collected from the website of the Centre for Hydrographic Studies of CEDEX [29] and are available from 1971. The daily flow data for HSRB are available on the Hydrographic Confederation of the Segura River website [30] from 1987. This work used the climatic database SPAIN02 (see details in Herrera et al. (2012) [31]), which includes daily precipitation and temperature data for 1950 to 2007 on a 20 × 20 km grid covering Spanish territory. This grid was developed from a very dense network of quality-controlled stations by applying the kriging method in a two-step process. First, the occurrence was interpolated using binary kriging and, in a second step, the amounts were interpolated by applying ordinary kriging to the occurrence outcomes [31]. Distributed hydrological models require long-term, spatially distributed, continuous data to simulate the hydrological response of a basin. However, conventional weather stations cannot fully represent the climate conditions across a basin because they are often sparsely distributed, particularly if large hydroclimatic gradients exist [32]. In addition, weather station records often contain gaps or do not cover the proposed simulation period. For these reasons we have used grid-based data. DEMs with a resolution of 25 m were obtained from the National Geographic Institute of Spain [33]. The soil data were obtained from the Harmonized World Soil Database (HWSD), assembled by the Food and Agriculture Organization of the United Nations (FAO) [34]. Land cover maps were extracted from the reclassified Corine Land Cover (CLC) [35].

SWAT Model
SWAT is a semi-distributed and semi-physically based model. SWAT considers the heterogeneity of a watershed by dividing it into sub-watersheds based on the river network and topography; subsequently, sub-watersheds are divided into hydrologic response units (HRUs), which lump land areas with unique soil, land cover and slope combinations. SWAT simulates the hydrologic cycle based on the water balance, which is controlled by climate inputs such as daily precipitation and maximum and minimum air temperature. The water balance equation employed is [6]:

$$SW_t = SW_{init} + \sum_{i=1}^{t} \left( R_{day}(i) - Q_{surf}(i) - E_a(i) - W_{seep}(i) - G_{gw}(i) \right) \tag{1}$$

where $SW_t$ is the final soil water content (mm), $SW_{init}$ is the initial soil water content (mm), $t$ is the time in days, $R_{day}(i)$ is the precipitation on day $i$ (mm), $Q_{surf}(i)$ is the surface runoff (mm), $E_a(i)$ is the evapotranspiration (mm), $W_{seep}(i)$ is the percolation (mm) and $G_{gw}(i)$ is the amount of baseflow (mm).
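As an illustration of how the water balance accumulates over time, the following minimal sketch sums the daily terms defined above. It is not SWAT code: the function name and the tuple layout of the daily records are assumptions made for this example, and SWAT computes each daily term internally rather than taking it as input.

```python
def soil_water_balance(sw_init, daily_terms):
    """Return the final soil water content SW_t (mm).

    sw_init     -- initial soil water content SW_init (mm)
    daily_terms -- one tuple per day, in this (assumed) order:
                   (R_day, Q_surf, E_a, W_seep, G_gw), all in mm
    """
    sw = sw_init
    for r_day, q_surf, e_a, w_seep, g_gw in daily_terms:
        # Precipitation adds water; runoff, evapotranspiration,
        # percolation and baseflow remove it.
        sw += r_day - q_surf - e_a - w_seep - g_gw
    return sw


# Two hypothetical days: a wet day followed by a dry day.
days = [
    (10.0, 2.0, 3.0, 1.0, 0.5),
    (0.0, 0.0, 2.0, 0.5, 0.5),
]
final_sw = soil_water_balance(100.0, days)
```

The loop makes explicit that the balance is a running sum, which is why the state at the start of the simulation matters and a warm-up period is used before calibration.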

Model Setup and Data Sets
The SWAT model requires physically based inputs, such as hydro-meteorological data, topography, soil properties and land use/land cover in the catchment. Daily precipitation data (mm) and maximum and minimum temperature (°C) data from 1971 to 2007 in LRB and from 1987 to 2007 in HSRB were used for the SWAT model simulation. Relative humidity, solar radiation and wind speed were not available in the study areas. We therefore simulated the potential evapotranspiration using the Hargreaves method [36], which only requires maximum and minimum daily temperatures. Moreover, according to Schneider et al. (2007) [37], the potential evapotranspiration method adopted has a minor effect on the simulated discharge response. The DEMs, with a 25 m mesh size, were used to determine the watershed and sub-watershed boundaries. Soil maps were used to characterize each soil type from information on soil texture, hydraulic conductivity and available water content, among others. Land cover is one of the most important factors controlling processes such as runoff, evapotranspiration, sediment deposition and soil erosion [38]. In SWAT, the combination of these three data sets (DEM, soil maps and land cover maps) divided the watersheds into HRUs; three categories of slope were defined (0-8%, 8-30% and >30%) to characterize the variety of the surface, and a threshold level of 10% was established to simplify model processing and remove minor slopes, soils and land uses in each sub-watershed. As a result, LRB was divided into 11 sub-basins and 124 HRUs and HSRB into 3 sub-basins and 21 HRUs.
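To show why the Hargreaves method needs only temperature data, here is a minimal sketch of the commonly cited Hargreaves-Samani form of the equation. It is an illustration under stated assumptions, not the SWAT implementation: the function name is hypothetical, the mean temperature is approximated as the average of the daily maximum and minimum, and the extraterrestrial radiation `ra_mm` (which depends only on latitude and day of year) is passed in already expressed as an equivalent evaporation depth in mm/day.

```python
import math


def hargreaves_pet(t_max, t_min, ra_mm):
    """Potential evapotranspiration (mm/day), Hargreaves-Samani form.

    t_max, t_min -- daily maximum and minimum air temperature (deg C)
    ra_mm        -- extraterrestrial radiation as equivalent evaporation
                    (mm/day); assumed to be computed elsewhere
    """
    t_mean = (t_max + t_min) / 2.0          # assumed daily mean temperature
    t_range = max(t_max - t_min, 0.0)       # guard against inverted inputs
    return 0.0023 * ra_mm * (t_mean + 17.8) * math.sqrt(t_range)


# Hypothetical summer day: 25 deg C maximum, 10 deg C minimum.
pet = hargreaves_pet(25.0, 10.0, 15.0)
```

The daily temperature range acts as a proxy for cloudiness and humidity, which is what lets the method do without measured solar radiation, relative humidity and wind speed.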

Sensitivity Analysis, Calibration and Validation
Sensitivity analysis and calibration of the SWAT model parameters were carried out automatically in SWAT-CUP using the SUFI-2 algorithm [39]. Sensitivity analysis allowed us to calculate the rate of change in model output with respect to changes in model parameters [40] and thus to identify the parameters with the greatest influence on streamflow [41]. The parameters were calibrated using the observed daily discharge; the process consists of adjusting them so that the daily simulations are as close as possible to the observations. Firstly, we performed 500 model runs to obtain the sensitivities, and the most sensitive parameters were identified for each basin. Afterwards, two iterations of 1500 simulations were run as recommended by Yang et al. (2008) [42], readjusting the parameters after the second iteration. The input data series were divided into three phases: warm-up, calibration and validation. In the LRB, the period from 1971 to 1989 was chosen for model calibration, preceded by a five-year warm-up period (1966-1970). After calibration, the model was validated using daily streamflow from 1990 to 2007. For HSRB, the periods 1987-1997 and 1998-2007 were used for model calibration and validation, respectively, also preceded by a five-year warm-up.

Artificial Neural Network
An ANN is a computing method with a mathematical structure which mimics the human brain and nervous system. The network learns, memorizes and discloses the various relations found in the data. It is capable of modelling the complex nonlinear input/output time-series relationships of a watershed without prior and explicit knowledge of the physical characteristics of the process [17,18]. ANNs are composed of neurons or processing units which are organized in layers and connected through several links. There are many different ANN architectures: single-layer and multilayer networks according to the number of layers, and feed-forward, recurrent and self-organizing networks according to the direction of information flow and processing. In this work, we have used multilayer feed-forward networks, which are the most widely applied to simulate hydrological processes (e.g., [1,9,10,18]) and consist of a number of neurons organized in an input layer, one or more hidden layers and an output layer [20]. The input layer includes the neurons where input data are fed into the network. In the hidden layers, the neurons receive signals only from neurons in the previous layer and process data. Finally, the outputs for the given inputs are produced in the output layer. The nodes are connected to nodes in the neighbouring layers by weighted synaptic connections; each link has an associated weight that represents its connection strength. These weights store the knowledge of the network, that is, they parameterize the mathematical relationships between the input variables of the network; positive weights reflect excitatory connections, negative weights reflect inhibitory connections, and a zero weight means that the connection is considered non-existent. These networks operate as follows: (i) the information is processed in the neurons, and each neuron receives an array of inputs or signals; (ii) these signals pass between neurons through connection links; (iii) each neuron forms a linear combination of the input signals according to its weights and then passes it through an activation function to produce an output signal [43]. The mathematical operation of a neuron is given as Equation (2):

$$y_j = f\left( \sum_{i=1}^{n} w_i x_i + b_j \right) \tag{2}$$

where $y_j$ is the output of a neuron $j$, $f$ is an activation function, $x_i$ is an input of the vector of inputs ($i = 1, 2, \ldots, n$), $w_i$ is the weight associated with the connection link through which the input $x_i$ arrives at the current neuron $j$ from a neuron in the preceding layer and $b_j$ is a bias associated with neuron $j$. The activation function is usually a continuous and bounded nonlinear transfer function and controls the amplitude of the neuron output. The logistic sigmoid and hyperbolic tangent functions are the most commonly used in the hidden layers, with output ranges from 0 to 1 and from −1 to 1, respectively [5], while a linear activation function may be used in the output layer so that the output is not restricted to a bounded interval. The training process involves giving known input data and targets to the network and adjusting the internal parameters (weights and biases) based on the performance measure and other parameters.
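The neuron operation of Equation (2), and how neurons chain into a network with one hidden layer and a linear output, can be sketched as follows. This is an illustrative forward pass only, not the MATLAB implementation used in the study; the function names and the way layers are stored as lists of (weights, bias) pairs are assumptions made for the example, and the weights shown are arbitrary rather than trained values.

```python
import math


def logistic(z):
    """Logistic sigmoid activation; output lies between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation):
    """Equation (2): weighted sum of the inputs plus bias, then activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def feed_forward(inputs, hidden_layer, output_neuron):
    """One hidden layer with logistic neurons and one linear output neuron.

    hidden_layer  -- list of (weights, bias) pairs, one per hidden neuron
    output_neuron -- a single (weights, bias) pair
    """
    hidden = [neuron(inputs, w, b, logistic) for w, b in hidden_layer]
    w_out, b_out = output_neuron
    # A linear (identity) activation leaves the output unbounded.
    return neuron(hidden, w_out, b_out, lambda z: z)


# Arbitrary example: two inputs, two hidden neurons, one output.
y = feed_forward(
    [0.0, 0.0],
    hidden_layer=[([1.0, 1.0], 0.0), ([1.0, -1.0], 0.0)],
    output_neuron=([2.0, 2.0], 1.0),
)
```

With zero inputs and zero hidden biases, each hidden neuron outputs 0.5, so the linear output is 2 × 0.5 + 2 × 0.5 + 1 = 3.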
The ANN Modelling Approach
According to Govindaraju (2000) [43], there is no fixed method for determining the number of input-output data that will be required. An optimal data set should be representative of the probable occurrence of an input vector and should facilitate the mapping of the underlying nonlinear process. Inclusion of unnecessary patterns could slow network learning. In contrast, an insufficient data set could lead to poor learning. A typical ANN training requires three data sets: training, validation and testing [21]. In this work, the calibration data sets (1971 to 1989 in LRB, and 1987 to 1997 in HSRB) were divided into training sets (70% of data), validation data sets (15% of data) and testing data sets (15% of data). We trained the networks with a backpropagation algorithm, in which the result of the network (output of the ANN model) is compared to the actual target (observed data), and then the network error is calculated. The output errors are repeatedly propagated backwards through the network to adjust its parameters until optimal values are obtained [1]. Training was stopped when the error on the validation data sets was near its minimum. There are several backpropagation algorithms for network training. The superiority of the Levenberg-Marquardt (LM) algorithm [44,45] over other algorithms, in terms of better performance (lower estimated error) and higher convergence speed (when determining epoch size), was clearly established in several studies (e.g., [46,47]). LM is often the fastest backpropagation algorithm. Studies on streamflow forecasting [47,48] have shown that the LM algorithm performs appropriately in training this type of network. Therefore, the LM algorithm has been used in this study to iteratively reduce the mean squared error (MSE) (Equation (3)):

$$MSE = \frac{1}{n} \sum_{i=1}^{n} \left( O_i - E_i \right)^2 \tag{3}$$

where $O_i$ is the ANN target (observation), $E_i$ is the ANN output (simulated value) and $n$ is the total number of observations. To find optimal neural weights with backpropagation algorithms based on a least-squares criterion such as MSE, the transfer function must be easily differentiable, thus permitting the evaluation of weight increments via the chain rule for partial derivatives [49]. According to Dawson and Wilby (2001) [50], the logistic sigmoid is continuous and relatively easy to compute (as is its derivative). Thus, we used a feed-forward neural network with LM backpropagation learning and a sigmoid transfer function, which is one of the best selections for modelling hydrologic parameters [10]. Different types and numbers of inputs were employed in the ANNs to estimate daily flow. Understanding the temporal relationships between climatic variables and streamflow is fundamental for ANN development. Many studies use time-series correlation analysis to determine the dependency between the observed streamflow and the antecedent climate variables [1,18,20]. To check for overfitting, a cross-validation was performed. We divided the data sets into five subsets: four subsets were used to train, and the remaining subset was used to validate.
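The MSE objective of Equation (3) and the five-subset cross-validation described above can be sketched as below. The source does not state how the five subsets were formed, so the contiguous, equally sized folds used here are an assumption, as are the function names; the training step itself (LM backpropagation) is deliberately left out.

```python
def mse(observed, simulated):
    """Equation (3): mean squared error between targets and outputs."""
    n = len(observed)
    return sum((o - e) ** 2 for o, e in zip(observed, simulated)) / n


def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) for k contiguous folds.

    Each fold serves once as the validation subset while the remaining
    k - 1 folds are used for training. Contiguous folds are an assumption.
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


# Ten hypothetical samples split into five folds of two samples each.
folds = list(k_fold_indices(10, k=5))
```

Averaging a performance statistic over the five validation subsets gives a single score per candidate network, which makes it harder for a network that merely memorizes one part of the record to be selected.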

Evaluation Criteria for Model Comparison
In the cross-validation of the ANNs, we obtained the average performance over the five cross-validation steps. The networks selected for each basin were those with the best performance in terms of the Nash-Sutcliffe efficiency coefficient (NSE), percent bias (PBIAS) and root mean squared error (RMSE).
We evaluated and compared the SWAT and ANN results statistically based on four statistics, NSE, PBIAS, RMSE and the coefficient of determination (R²), which are the most widely used in hydrology studies. These statistics are defined in Table 2.

In Table 2, $O_i$ is the ith observed value, $\bar{O}$ is the mean of the observed data, $E_i$ is the ith estimated value, $\bar{E}$ is the mean of the estimated data and $n$ is the total number of observations. NSE indicates how well the plot of observed versus simulated data fits the 1:1 line and is recommended because it is very commonly used, so extensive information on reported values is available [51]. R² describes the degree of collinearity between simulated and measured data. PBIAS measures the average tendency of the simulated data to be larger or smaller than their observed counterparts and can clearly indicate poor model performance [52]. RMSE quantifies the prediction error in the units of the variable calculated by the model. The best performance for NSE and R² is 1, and for PBIAS and RMSE the best performance is 0. In addition, to evaluate the models we have used the criteria proposed by Kalin et al. (2010) [53], who adapted the monthly criteria of Moriasi et al. (2007) [51] to a daily scale (Table 3). Furthermore, this study is an exploratory analysis of the power of SWAT and ANN models for daily simulation of runoff, which is an additional reason to relax the performance ratings according to the American Society of Agricultural and Biological Engineers (ASABE, 2017) [54]. The results were also evaluated graphically using scatter plots. In addition to these goodness-of-fit measures, we analysed the results based on the flow duration curves to help visualize the differences between observed and estimated streamflow. Pfannerstill et al. (2014) [2] presented an approach to improve model evaluation by subdividing the flow duration curve into different segments. Their results showed that the segmentation into very low/high and low/high flow allows model performance to be analysed precisely for every important discharge event. In addition, they concluded that the additional segmentation of the flow duration curve into low and very low flows is essential for taking long low-flow periods into account. In this study, to assess different phases of the hydrograph, FDCs were divided into five segments as shown in Table 4, according to Pfannerstill et al. (2014) [2], where $Q_p$ represents the flow with a probability of exceedance equal to p%. The RMSE, defined in Table 2, was used to compare the performance of the models for each hydrograph phase.
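The four goodness-of-fit statistics can be computed as in the following sketch, which uses their standard textbook definitions with the symbols introduced above (observed values O, estimated values E). The exact expressions of Table 2 are not reproduced here, so these forms are assumptions; in particular, PBIAS is written with the convention in which positive values indicate underestimation, and the function names are hypothetical.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe efficiency; 1 is a perfect fit."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - e) ** 2 for o, e in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def pbias(obs, sim):
    """Percent bias; 0 is best (assumed sign: positive = underestimation)."""
    return 100.0 * sum(o - e for o, e in zip(obs, sim)) / sum(obs)


def rmse(obs, sim):
    """Root mean squared error, in the units of the flow series."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, sim)) / len(obs))


def r_squared(obs, sim):
    """Coefficient of determination (squared Pearson correlation)."""
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (e - mean_sim) for o, e in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((e - mean_sim) ** 2 for e in sim)
    return cov ** 2 / (var_obs * var_sim)


# Hypothetical series: the simulation is offset by +1 on every day.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
scores = (nse(observed, simulated), pbias(observed, simulated),
          rmse(observed, simulated), r_squared(observed, simulated))
```

The example illustrates why several statistics are reported together: a constant offset leaves R² at 1 (perfect collinearity) while NSE drops to 0.2 and PBIAS reveals a 40% overestimation.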

Hydrograph Phase    Definition
Very high flow      Flows greater than Q5
High flow           Flows between Q5 and Q20
Medium flow         Flows between Q20 and Q70
Low flow            Flows between Q70 and Q95
Very low flow       Flows smaller than Q95
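The segment-wise evaluation can be sketched as follows: each day is assigned to a hydrograph phase according to the exceedance probability of its observed flow, and the RMSE is then computed within each phase. This is a minimal illustration, not the procedure of Pfannerstill et al. (2014) [2]; the Weibull plotting position used to estimate exceedance probabilities, the handling of values exactly on a boundary and the function names are assumptions.

```python
import math

# (phase name, lower and upper exceedance probability in %)
SEGMENTS = [
    ("very high", 0.0, 5.0),
    ("high", 5.0, 20.0),
    ("medium", 20.0, 70.0),
    ("low", 70.0, 95.0),
    ("very low", 95.0, 100.0),
]


def exceedance_probabilities(flows):
    """Percent of time each flow is equalled or exceeded (Weibull position)."""
    n = len(flows)
    order = sorted(range(n), key=lambda i: flows[i], reverse=True)
    probs = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        probs[idx] = 100.0 * rank / (n + 1)
    return probs


def rmse_by_segment(obs, sim):
    """RMSE of sim against obs within each flow duration curve segment."""
    probs = exceedance_probabilities(obs)
    result = {}
    for name, lower, upper in SEGMENTS:
        sq_errors = [(o - e) ** 2
                     for o, e, p in zip(obs, sim, probs) if lower <= p < upper]
        # None marks a segment that contains no days.
        result[name] = (math.sqrt(sum(sq_errors) / len(sq_errors))
                        if sq_errors else None)
    return result


# Hypothetical record: flows 1..99, with errors only on the largest flows.
obs_flows = [float(q) for q in range(1, 100)]
sim_flows = [q + 2.0 if q > 95.0 else q for q in obs_flows]
segment_rmse = rmse_by_segment(obs_flows, sim_flows)
```

In this example the overall RMSE would be small because only four days are affected, whereas the very-high-flow segment shows an RMSE of 2, which is the kind of phase-specific error the segmentation is meant to expose.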

Sensitivity Analysis, Calibration and Validation of the SWAT Model
A global sensitivity analysis was conducted to identify the most influential parameters for streamflow simulation, which were then adjusted during calibration. A ranking of parameter sensitivities was obtained after 500 model runs. The effect of the parameters on the simulated streamflow was evaluated with the p-value, which determines the significance of the sensitivity, and the t-stat, which provides a measure of sensitivity. The ranking of the most sensitive parameters observed in this study (Table 5) is supported by the findings of Raposo et al. (2013) [55] in the LRB and Senent-Aparicio et al. (2017) [26] in the HSRB. Some of the most sensitive parameters are common to both basins with a similar order of sensitivity, for example ALPHA_BF, CH_N1, CH_N2, SOL_K, CN2 and GWQMN. After performing the global sensitivity analysis, the most sensitive parameters were selected for each studied basin; they are shown and defined in Table 6. All the selected parameters were also identified as the most relevant in other research [23,26,41,55]. The fitted values of these parameters reflect the contrasting climatic characteristics of the two basins. In HSRB, groundwater parameters (GWQMN, GW_DELAY, RCHRG_DP, ALPHA_BF and GW_REVAP) were significant, as expected in Mediterranean basins where aquifers are relevant [41,56]. A high deep aquifer percolation fraction (RCHRG_DP) and a very low delay time (GW_DELAY) for aquifer recharge reflect the highly permeable geology of HSRB. In contrast, no relevant aquifer is present in LRB, where RCHRG_DP was very low. In both basins, the low values of ALPHA_BF indicate a slow response [57]. The low value of CH_K1 in LRB indicated a moderate loss rate for soil with high silt-clay content, while a high value in HSRB reflected a very high loss rate for very clean gravel and large sand [57]. Another big difference between the two basins is the soil evaporation compensation factor (ESCO). The ESCO was higher in the LRB, with an Atlantic climate, than in the HSRB, with a Mediterranean climate where evapotranspiration has a higher relevance [26]. When the ESCO value decreases, the ability of the model to extract the evaporative demand from lower soil layers increases [58]. Lateral flow travel time (LAT_TTIME) in the LRB was very similar to that used by Raposo et al. (2013) [55] in nearby basins where a significant portion of groundwater flows laterally as interflow [59]. The value of GWQMN was calibrated as 82.5 in LRB, similar to that obtained in a nearby study [55]. In addition, an automated digital filter programme (Base Flow Filter Program) [60] was applied to determine the groundwater ratio. The results obtained are similar to those simulated by our model.

Input Selection, Training and Validation of ANN Models
The choice of input variables has a significant influence on the simulated flow. The basin rainfall and temperature data used by the ANNs were calculated using the Thiessen method, in which the basin climate values are an area-weighted average of the contributions of the grid cells in the basin. After reviewing other research [22,23,48], we selected the following variables as inputs to the ANN models to estimate daily streamflow: daily precipitation ($P_t$), daily temperature ($T_t$), precipitation of the previous n days ($P_{t-n}$), total rainfall of the preceding n days ($R_n$) and mean temperature over the previous n days ($Tm_n$). The most suitable delays of the climate variables were determined using cross-correlation analyses, which establish the temporal relationships between these input variables and streamflow. As shown in Figure 2a, the streamflow is highly positively correlated with daily precipitation of the current day t ($P_t$) and with daily precipitation of the previous days, until t-4 for LRB and until t-2 in HSRB.
Streamflow is strongly correlated with accumulated daily rainfall; the correlation is greatest for 4 days in LRB and for 48 days in HSRB, reflecting the minor and the major importance of groundwater, respectively, in these basins. With respect to the daily temperature, there are moderate negative correlations with the daily streamflow in both basins. Finally, a total of four input combinations have been proposed for each basin in this study (Table 7).
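The lag selection and the construction of ANN input vectors can be sketched as follows. The code computes the Pearson correlation between streamflow and precipitation at a given lag, and builds input rows from lagged precipitation, accumulated rainfall and temperature. It is an illustration only: the function names are hypothetical, the accumulated rainfall is assumed to exclude the current day, and the row layout does not reproduce any specific scenario of Table 7.

```python
import math


def pearson(x, y):
    """Pearson correlation; assumes both series have non-zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def lagged_correlation(precip, flow, lag):
    """Correlation between flow on day t and precipitation on day t - lag."""
    if lag == 0:
        return pearson(precip, flow)
    return pearson(precip[:-lag], flow[lag:])


def build_inputs(precip, temp, max_lag, window):
    """Rows of [P_t, P_t-1, ..., P_t-max_lag, R_window, T_t].

    R_window is the total rainfall of the `window` days before day t
    (day t itself excluded, which is an assumption).
    """
    rows = []
    for t in range(max(max_lag, window), len(precip)):
        lags = [precip[t - k] for k in range(max_lag + 1)]
        r_n = sum(precip[t - window:t])
        rows.append(lags + [r_n, temp[t]])
    return rows


# Hypothetical data: flow responds to rainfall with a one-day delay.
rain = [0.0, 5.0, 0.0, 0.0, 3.0, 0.0, 1.0]
flow = [0.0, 0.0, 5.0, 0.0, 0.0, 3.0, 0.0]
best_lag = max(range(3), key=lambda k: lagged_correlation(rain, flow, k))
rows = build_inputs([1.0, 2.0, 3.0, 4.0, 5.0],
                    [10.0, 11.0, 12.0, 13.0, 14.0], max_lag=1, window=2)
```

Scanning the correlation over a range of lags and window lengths is what identifies, for instance, that a long accumulation window carries information in a groundwater-dominated basin while a short one suffices elsewhere.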

[Table 7: Basin, Prediction Scenario, Input Combinations, Output]
For the network structure identification, we implemented and built the ANNs using MATLAB® software (version 8.2.0.701 (R2013b), The MathWorks, Natick, MA, USA). A multilayer feed-forward network was used. The number of hidden layers and hidden neurons was established by a trial-and-error procedure; one or two hidden layers with between two and ten neurons were considered. The number of neurons in the input layer depends on the number of input variables in each scenario, which varies from 3 to 7. Figure 3 shows the ANN structure used in this work.
The different scenarios defined in Table 7 were tested to determine the type and number of inputs to the ANN models. Table 8 shows the best ANN architecture and its performance for each scenario trained and validated for the studied basins. These performance measures are averages obtained over the five rounds of cross-validation. The results shown in Table 8 indicate four effective ANN structures with good performances for LRB. Scenario 1 for LRB, with a combination of six cells in the input layer (the precipitation of days t, t-1, t-2, t-3 and t-4, and the temperature of day t), one hidden layer with two neurons and one neuron in the output layer (the streamflow of day t), had the highest NSE and the lowest RMSE in the training and validation phases. Based on the criteria of Table 3, NSE and PBIAS of scenario 1 were good and very good, respectively. Therefore, scenario 1 was the selected architecture for LRB. However, the performance levels of ANN models for HSRB were lower in general because modelling the hydrological response of arid and semi-arid regions, where evapotranspiration rates are high and precipitation is irregular and/or limited, is especially complex [61]. The selected model for HSRB was scenario 3, where NSE and RMSE were better than those obtained in the other proposed scenarios. In this scenario, NSE was classified as satisfactory and PBIAS as very good. The rest of the scenarios were classified as unsatisfactory based on NSE. Therefore, the ANN configuration selected for HSRB was three cells in the input layer (the precipitation of days t and t-1, and total rainfall of the preceding 48 days), one hidden layer with four neurons and one neuron in the output layer (the streamflow of day t). In conclusion, the structure selected for both basins was formed by three layers, similar to other studies (e.g., [1,23]).
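As an illustration of the 3-4-1 structure selected for HSRB, the sketch below builds the three inputs of scenario 3 and evaluates one forward pass of a single-hidden-layer feed-forward network. The tanh hidden activation, the linear output, the absence of input scaling and every weight value are assumptions made for illustration; the study built its networks in MATLAB and does not report these details here.

```python
import math


def build_inputs(precip, t, lags=(0, 1), accum_days=48):
    """Input vector for day t: P(t), P(t-1) and rainfall summed over days t-48 .. t-1."""
    if t < accum_days:
        raise ValueError("not enough antecedent days for the accumulation window")
    lagged = [precip[t - k] for k in lags]
    accumulated = sum(precip[t - accum_days:t])
    return lagged + [accumulated]


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: tanh hidden layer (assumed) followed by a linear output neuron."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out
```

Here `w_hidden` would be a 4 x 3 matrix and `w_out` a vector of length 4 for the HSRB configuration; for the 6-2-1 network of LRB the same `forward` function applies with a 2 x 6 hidden weight matrix. In practice the inputs would normally be scaled before training.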

Comparison of Model Performance
Calibration of SWAT models and training of the selected ANNs (scenario 1 for LRB and scenario 3 for HSRB) were done using the training data sets (1971-1989 for LRB and 1987-1997 for HSRB). Then, we tested the models with the validation sets (1990-2007 for LRB and 1998-2007 for HSRB). A comparison of the flow estimation performance of SWAT and ANN for LRB and HSRB is provided in Table 9, which shows the performances for the calibration/training and validation periods separately. The values of NSE for both models were classified as good according to the criteria listed in Table 3 for the calibration/training phase in LRB and HSRB. For the validation phase, the NSE values ranged between 0.5 and 0.7; they were classified as good for both models of the LRB and as satisfactory for both models of the HSRB. The PBIAS values were less than 25%, so they were classified as very good in all cases. The values of RMSE for both models were similar. The NSE and R² values obtained by the ANN model were higher than those obtained by SWAT in both basins, and those during training were higher than those during the validation phase. After analysing these results, it was concluded that both SWAT and ANN were suitable. The more arid the catchment, the lower the performance obtained by the hydrological models, which is similar to the experience reported by Pérez-Sánchez et al. (2017) [61].
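For reference, the three statistics compared in Table 9 can be computed as in the following sketch. It uses the standard definitions of NSE, PBIAS and RMSE; the sign convention for PBIAS (positive values indicating underestimation) is an assumption, since the paper's own equations are not reproduced in this section.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus squared error relative to the variance of the observations."""
    mean_obs = sum(obs) / len(obs)
    numerator = sum((o - s) ** 2 for o, s in zip(obs, sim))
    denominator = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - numerator / denominator


def pbias(obs, sim):
    """Percent bias; positive means the model underestimates on average (assumed convention)."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def rmse(obs, sim):
    """Root mean squared error, in the units of the flow series (e.g. m³/s)."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))
```

Because both NSE and RMSE are built on squared errors, a few large peak-flow errors can dominate either statistic, which is the motivation for the segment-wise evaluation that follows.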
For a better understanding of the difference between the models, Figure 4 shows the results of SWAT and ANN models plotted against the observed values of streamflow for the calibration/training and validation periods with their correlation coefficients.
SWAT models performed poorly in estimating the large values of streamflow, whereas ANN models were worse in estimating the small values. In every panel of Figure 4, the points corresponding to large streamflow values lie farther from the 1:1 line when the values have been estimated by SWAT. In contrast, the points estimated by ANN models lie farther from the 1:1 line for small values.
The hydrographs (Figure 5) show the fit obtained for simulated versus measured streamflow in the studied basins during the validation period (from 1995 to 1997 for the LRB and from 2002 to 2004 for the HSRB). The models generally reproduce the streamflow fairly well. Although both models tended to underestimate the peak-flow events during the validation phase, ANN models were more sensitive to precipitation events than SWAT models, and their estimates always remained above those obtained by SWAT.
According to Chen and Chau (2016) [11], NSE and RMSE are based on the mean squared error of the estimates and therefore mainly reflect the performance on high values. Thus, the above discussions on evaluation criteria and plots of estimated data could not provide explicit performances on different intervals of values. To address this problem, different ranges of flow (from very high to very low flow) were determined. The reproduction of the streamflow was analysed by the FDC of LRB and HSRB for the validation periods (Figure 6). The FDC for LRB shows that the ANN performed generally better in the very high flow segment and SWAT was better in the very low flow segment. The values obtained by SWAT and by ANN were graphically similar for the rest of the flow segments in LRB. For HSRB, SWAT was better only in the very low flows.
An analysis of performance based on RMSE in each hydrograph phase was also done, as reflected in Table 10. The best results for each basin are highlighted in bold. As expected, high peaks are better simulated at the expense of low flows because RMSE is biased towards high values. The RMSE values suggest that the SWAT model was better in the estimation of very low flows and ANN in the estimation of very high flows in all cases.
Similar results regarding the peak-flow inefficiency of SWAT have been obtained in other studies (e.g., [5,22,23]), which suggested that it could be caused by the model formulation. The results obtained show that the use of ANN models can help reduce the error in the estimation of high streamflow values, although these were also underestimated. One of the reasons is that high values are scarce in the training data sets, with medium and low values being more numerous, as illustrated by the cloud of points in the scatterplots in Figure 4. This problem in the application of neural networks has also been reported by Minns and Hall (1996) [15] and Talebizadeh et al. (2010) [62]. On the other hand, SWAT models simulated the low flow values better than ANNs. In general, ANN models tended to overestimate the low values of streamflow. This limitation can be attributed to the complex non-linear relationships governing the low-flow process, often related to the base flow from groundwater. The performance of the ANN could deteriorate as non-linearity increases [15]. It is generally accepted that the processes of streamflow generation are likely to be quite different during low, medium, and high flow periods. The base flow mainly contributes to low flow events whereas intense storm rainfall gives rise to high flow events [63]. Therefore, a single global ANN model could not predict the high and low runoff events satisfactorily [15]. SWAT models may obtain satisfactory results for the estimation of low flows but could not simulate very high streamflow with the same accuracy. In contrast to SWAT, a single ANN can obtain better results for very high values but not for the lowest values; these results are similar to those obtained by Kim et al. (2015) [23]. Therefore, the use of these models is suitable for simulating the streamflow in a basin. In studies of extreme hydrologic events (e.g., floods), it is recommended to use an ANN model to simulate high-flow events. Conversely, in studies of hydrological management in which low-flow events are of greater interest, applying the SWAT model would be preferable. In addition, it is important to take into account the disadvantages of each model. In Spain, it is easier to obtain the input data for the ANN model, such as streamflow and precipitation data, through governmental online resources than the data on the physical characteristics of river basins needed by the SWAT model, such as soil moisture, infiltration, soil classes, groundwater level and evaporation. In addition, the time consumed in the setup and calibration of SWAT is higher than that consumed in the implementation of an ANN model. However, an ANN is a black box, and the water balance and its components are not obtained. On the other hand, the use of precipitation and temperature as the only inputs is a limitation of the ANN models used, because the rainfall-runoff relation is also affected by other physical parameters. Because the ANN model does not consider land use or land management, the SWAT model is more advantageous when a number of scenarios are to be simulated to investigate the response of the basin [1].
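The segment-wise evaluation discussed above (an FDC split into five flow sectors, with RMSE computed per sector) can be sketched as follows. The exceedance-probability limits in `SEGMENTS` and the Weibull plotting position are hypothetical choices made for illustration; the limits actually used in the study are those of Table 4, which is not reproduced in this section.

```python
import math

# Hypothetical upper exceedance-probability limit of each segment (assumption, not Table 4).
SEGMENTS = [
    ("very high", 0.05),
    ("high", 0.20),
    ("medium", 0.70),
    ("low", 0.95),
    ("very low", 1.00),
]


def exceedance_probabilities(obs):
    """Exceedance probability of each observation, Weibull plotting position rank / (n + 1)."""
    n = len(obs)
    order = sorted(range(n), key=lambda i: obs[i], reverse=True)
    prob = [0.0] * n
    for rank, i in enumerate(order, start=1):
        prob[i] = rank / (n + 1)
    return prob


def segment_of(p):
    """Name of the flow segment that contains exceedance probability p."""
    for name, upper in SEGMENTS:
        if p <= upper:
            return name
    return SEGMENTS[-1][0]


def rmse_by_segment(obs, sim):
    """RMSE of the simulated flows within each FDC segment of the observed series."""
    squared = {name: [] for name, _ in SEGMENTS}
    for o, s, p in zip(obs, sim, exceedance_probabilities(obs)):
        squared[segment_of(p)].append((o - s) ** 2)
    return {name: math.sqrt(sum(e) / len(e)) for name, e in squared.items() if e}
```

Calling `rmse_by_segment(observed, swat_sim)` and `rmse_by_segment(observed, ann_sim)` on the validation series would give two dictionaries that can be compared sector by sector, in the spirit of Table 10. Segments are assigned from the observed flows only, so both models are scored on the same days.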
The results of this study suggest, however, that the ANN approach is an efficient way to simulate a hydrological process because it requires very few input variables and minimal resources to implement. It is therefore promising for the development of other applications, such as the simulation of water quality processes, as reflected in some studies (e.g., [64-66]).

Conclusions
We proposed the use of SWAT, a semi-physically based model, and ANN, a machine learning technique, to simulate daily streamflow values and compared the results of both models in order to analyse their capabilities. They were applied in two basins with contrasting climates to check the validity of these models under different climatic conditions. To determine the type and number of inputs for the ANN models, four scenarios were considered in each studied basin, and they showed that the inclusion of daily precipitation, precipitation of previous days and total rainfall in the previous days was important to estimate the daily streamflow. After calibrating SWAT models for daily observed streamflow through the SUFI-2 algorithm, results indicated that SWAT performed better in estimating very low values of streamflow, whereas ANN estimated very high values with greater precision in all cases studied. Moreover, the results suggest that SWAT and ANN models performed better when the climate was more humid. When the basin has a more arid climate and is therefore more complicated to model, ANN obtained better performance in more hydrograph phases. One of the advantages of the ANN model is that it does not require any physical characteristics of the watershed and, therefore, its implementation is easier. Conversely, its totally implicit and physically meaningless features are also its major criticisms. It is still necessary to develop estimation models with conceptual ideas to reflect the characteristics of streamflow. ANN is a black box, and to gain knowledge about the water balance and its components, the SWAT model is more useful. Despite the advantages and disadvantages of each model, the results suggest that, to simulate streamflow time series, the choice between SWAT and ANN has an impact on the accuracy of the estimated flow. This idea for modelling streamflow can be extended to other machine learning techniques, which we could explore in future work. In addition, the ANN model only considers the inputs of precipitation and temperature. The influence of other inputs related to the streamflow could be explored to improve on the current study.

Figure 1. (a) Location of the Miño-Sil and Segura River Basins in Peninsular Spain; (b) Location of the LRB in the Miño-Sil River Basin; (c) Location of the HSRB in the Segura River Basin; (d) DEM of the LRB; (e) DEM of the HSRB.

Figure 2. Cross-correlation analyses for LRB and HSRB between daily streamflow and (a) daily precipitation; (b) total rainfall of the preceding n days; (c) daily temperature; (d) mean temperature over the previous n days.

Water 2018, 10

Figure 3. Structure of multilayer feed-forward network used in this research.

Figure 4. Scatterplots for daily streamflow obtained with SWAT and ANN in (a) calibration/training period of LRB; (b) validation period of LRB; (c) calibration/training period of HSRB; (d) validation period of HSRB.

Figure 5. Comparison of observed and simulated daily streamflow by SWAT and ANN in the validation phase; the bar graph at the top shows the daily precipitation for (a) LRB; (b) HSRB.

Table 1. Characteristics of the watersheds.
¹ According to data from 1971 to 2007. ² According to data from 1987 to 2007.

Table 3. Evaluation model criteria for daily time scale.

Table 4. Definition of five segments of different phases of the hydrograph.

Table 5. Sensitivity analysis of SWAT model parameters for LRB and HSRB.

Table 6. Parameters used in the SWAT model calibration in LRB and in HSRB.

Table 7. Estimation scenarios for each basin.

Table 8. Best network architectures and their average performance measures obtained in cross-validation for each prediction scenario.
¹ I is the number of neurons in the input layer; H is the number of neurons in the hidden layer (one or two hidden layers); O is the number of neurons in the output layer.

Table 9. Performances of SWAT and ANN models.

Table 10. RMSE values (m³/s) of SWAT and ANN models in each hydrograph phase.