Hydrological Early Warning System Based on a Deep Learning Runo ﬀ Model Coupled with a Meteorological Forecast

: The intensiﬁcation of the hydrological cycle because of global warming raises concerns about future ﬂoods and their impact on large cities where exposure to these events has also increased. The development of adequate adaptation solutions such as early warning systems is crucial. Here, we used deep learning (DL) for weather-runo ﬀ forecasting in regi ó n Metropolitana of Chile, a large urban area in a valley at the foot of the Andes Mountains, with more than 7 million inhabitants. The ﬁnal goal of this research is to develop an e ﬀ ective forecasting system to provide timely information and support in real-time decision making. For this purpose, we implemented a coupled model of a near-future global meteorological forecast with a short-range runo ﬀ forecasting system. Starting from a traditional hydrological conceptual model, we deﬁned the hydro-meteorological and geomorphological variables that were used in the data-driven weather-runo ﬀ forecast models. The meteorological variables were obtained through statistical scaling of the Global Forecast System (GFS), thus enabling near-future prediction, and two data-driven approaches were implemented for predicting the entire hourly ﬂow time-series in the near future (3 days), a simple Artiﬁcial Neural Networks (ANN) and a Deep Learning (DL) approach based on Long-Short Term Memory (LSTM) cells. We show that the coupling between meteorological forecasts and data-driven weather-runo ﬀ forecast models are able to satisfy two basic requirements that any early warning system should have: The forecast should be given in advance, and it should be accurate and reliable. In this context, DL signiﬁcantly improves runo ﬀ forecast when compared with a traditional data-driven approach such as ANN, being accurate in predicting time-evolution of output variables, with an error of 5% for DL, measured in terms of the root mean square error (RMSE) for predicting the peak ﬂow, compared to 15.5% error for ANN, which is adequate to warn communities at risk and initiate disaster response operations.


Introduction
In the last decade, a series of unprecedented extreme hydrological events have occurred, some of which have been attributed to climate change [1]. In fact, global projections indicate a positive correlation between global warming and the risk of extreme rainfall and floods [2]. This intensification of the hydrological cycle raises growing concerns about future floods and their impact on large cities where exposure has also increased [3]. Adequate adaptation solutions are required to increase resilience and reduce vulnerability of large urban centers. In this sense, the development of early warning These networks, are also known as feedforward neural networks because the information is transmitted in only one direction, forward from the input layer to the output layer. Therefore, there are no backward connections between layers (cycles or loops).
As an example, the outputs of a three-layer ANN are given by where = , , … , and = , , … , are the input and output vectors, and are the interconnection weights, represents the bias (or threshold) terms and (⋅) is the transfer function, usually a sigmoid function. The training problem consists of finding the weights and biases that will minimize the mean square error between the network prediction and the output targets.

Deep Learning Based on LSTM Neural Networks
Recurrent Neural Networks (RNNs) are one of the most powerful type of Neural Networks, capable of processing sequences of arbitrary input patterns [36]. RNNs, however, suffer from the vanishing gradient problem, which makes it difficult to perform backpropagation efficiently during training, causing great computational effort. To overcome this, alternative structures have been proposed, such as Gated Recurrent Units and the LSTM cells. Figure 2 illustrates the flow of a time series x , , … , through an unrolled LSTM layer. In this diagram, x corresponds to a vector of input features, hi denote the output vectors and ci denote the cell states, all of them evaluated at the i-th time step. These networks, are also known as feedforward neural networks because the information is transmitted in only one direction, forward from the input layer to the output layer. Therefore, there are no backward connections between layers (cycles or loops).
As an example, the outputs of a three-layer ANN are given by where x = {x 1 , x 2 , . . . , x n } and y = y 1 , y 2 , . . . , y m are the input and output vectors, w and v are the interconnection weights, b represents the bias (or threshold) terms and f (·) is the transfer function, usually a sigmoid function. The training problem consists of finding the weights and biases that will minimize the mean square error between the network prediction and the output targets.

Deep Learning Based on LSTM Neural Networks
Recurrent Neural Networks (RNNs) are one of the most powerful type of Neural Networks, capable of processing sequences of arbitrary input patterns [36]. RNNs, however, suffer from the vanishing gradient problem, which makes it difficult to perform backpropagation efficiently during training, causing great computational effort. To overcome this, alternative structures have been proposed, such as Gated Recurrent Units and the LSTM cells. Figure 2 illustrates the flow of a time series x 1 , x 2 , . . . , x n through an unrolled LSTM layer. In this diagram, x i corresponds to a vector of input features, h i denote the output vectors and c i denote the cell states, all of them evaluated at the i-th time step.
The classical LSTM block structure shown in Figure 2b consists of different processes called gates [37]. These gates compute the desired output from a new input data at a time t, along with elements obtained from the previous time step t − 1.
Equations (2)-(7) describe the processes in an LSTM block. Equations (2)-(4) represent the input, output and forget gate. These gates compute a linear combination of the new input data x t with the output of the previous time step h t−1 , which are evaluated using a sigmoid activation function. On the other hand, gate g t generates candidates that will become part of the new cell state. Equation (6) corresponds to the cell state c t , which represents a memory capsule containing information of all previous states. Finally, the output h t is computed by the element wise multiplication of the output gate with the activation of the cell state (Equation (7)). It should be noted that gates i t , o t , f t and g t represent independent neural networks, which possess their own weights and biases.
Water 2019, 11, x FOR PEER REVIEW 5 of 22 The classical LSTM block structure shown in Figure 2b consists of different processes called gates [37]. These gates compute the desired output from a new input data at a time , along with elements obtained from the previous time step − 1.
Equations (2)-(7) describe the processes in an LSTM block. Equations (2)-(4) represent the input, output and forget gate. These gates compute a linear combination of the new input data with the output of the previous time step , which are evaluated using a sigmoid activation function. On the other hand, gate generates candidates that will become part of the new cell state. Equation 6 corresponds to the cell state , which represents a memory capsule containing information of all previous states. Finally, the output is computed by the element wise multiplication of the output gate with the activation of the cell state (Equation (7)). It should be noted that gates , , and represent independent neural networks, which possess their own weights and biases.

Hydrological Description of the Study Area
Región Metropolitana (33.5 • S, 70.8 • W, 500 m a.s.l.) is located in central Chile, in the Maipo river watershed, at the foot of the Andes Mountain that have an average height of 3000 m above sea level (m a.s.l.) and maximum height of 6500 m a.s.l. Two rivers coming from the Andes Mountains pass across region Metropolitana: Maipo river in the south, and Mapocho river in the north (Figure 3). These rivers come from the high mountains range, above 3500 m a.s.l, in an area covered by discontinuous permafrost, snow and glaciers. In this area, glaciers occupy near 8% of the total surface (Table 1), and it was estimated that nearly 10% of the rest of the detrital surface is occupied by debris-covered glaciers [38].
During the colder months in the Austral winter, disturbances of the polar front, which usually affect the southern part of Chile, move northwards and generate sporadic eruptions of precipitation systems in central Chile. Consequently, rainfall is recorded occasionally and they show great irregularity, with an annual average of 350 mm in the valley and increasing with height, and snow accumulation above 1500 m a.s.l in the winter months [39]. Nevertheless, convective rainfall in high altitude can also be expected during summer time under warm conditions with elevated 0 • C isotherm. On the other hand, the austral summer dominance of the subtropical anticyclone results in hot and dry summers, thus resulting in a semi-arid climate [39].
As a consequence, the Mapocho and Maipo rivers have a complex hydrological regime, which can be classified as a pluvio-nival regime during the autumn-winter seasons and a nivo-glacial regime during the spring-summer seasons. In this context, air temperature and humidity are as important as precipitation to determine the river flows, as they define the contributing area and the rate at which snow and glaciers melt.
Water 2019, 11, x FOR PEER REVIEW 6 of 22 discontinuous permafrost, snow and glaciers. In this area, glaciers occupy near 8% of the total surface (Table 1), and it was estimated that nearly 10% of the rest of the detrital surface is occupied by debriscovered glaciers [38]. During the colder months in the Austral winter, disturbances of the polar front, which usually affect the southern part of Chile, move northwards and generate sporadic eruptions of precipitation systems in central Chile. Consequently, rainfall is recorded occasionally and they show great irregularity, with an annual average of 350 mm in the valley and increasing with height, and snow accumulation above 1500 m a.s.l in the winter months [39]. Nevertheless, convective rainfall in high altitude can also be expected during summer time under warm conditions with elevated 0 °C

Coupled Model of a Meteorological Forecast with a Short-Range Runoff Forecast
We implemented a coupled model of a near-future global meteorological forecast with short-range runoff forecasting systems that we call data-driven weather-runoff forecast models, which were designed to forecast the flow in the nine different flow stations shown in Figure 3, for which we used as an input data the Global Forecast System (GFS) provided by the National Centers for Environmental Prediction (NCEP) for the following 3 days. The NCEP model was used because it has had a meteorological analysis since March 2004, with a meteorological forecast that has been run under the same mesh grid and with the same parametrizations, thus allowing the use of historical meteorological forecasts for training, validating and testing data-driven runoff forecast models, as explained below.
The flowchart of the weather-runoff forecast models is shown in Figure 4, in which three different groups of input data were identified to train the data-driven weather-runoff forecast models: watershed morphology, meteorology, and initial condition. The data-driven runoff model was designed to predict entire hourly flow time-series for the following 3 days, based on two different data-driven approaches: ANN and DL models. Below is described the input data and the structure of the different runoff models.

Input Data
In subsection 2.2, the hydrological description of the study area has shown that the Mapocho and Maipo rivers regime can be classified as a pluvio-nival regime during the autumn-winter seasons and a nivo-glacial regime during the spring-summer seasons. As a consequence, the variables for the weather-runoff model includes the standard variables in a rainfall-runoff model (precipitation, catchment area, stream length, slope), as well as air temperature and humidity as they define the contributing area and the rate at which snow and glaciers melt. Considering this, three different group of input information were used ( Figure 4):

•
Watershed geomorphology: The geomorphological characteristics were calculated from NASA Shuttle Radar Topography Mission (SRTM) version 3.0 global 1 arc second. For this aim, the watershed associated to each one of the flow stations was defined, and the information was summarized in one single table that list as a function of the elevation, the following information: Watershed area (km 2 ); length of the mainstream (km); maximum, minimum and average elevation of the watershed; and the average slope. This table is read at each time to create time

Input Data
In Section 2.2, the hydrological description of the study area has shown that the Mapocho and Maipo rivers regime can be classified as a pluvio-nival regime during the autumn-winter seasons and a nivo-glacial regime during the spring-summer seasons. As a consequence, the variables for the weather-runoff model includes the standard variables in a rainfall-runoff model (precipitation, catchment area, stream length, slope), as well as air temperature and humidity as they define the contributing area and the rate at which snow and glaciers melt. Considering this, three different group of input information were used ( Figure 4):

•
Watershed geomorphology: The geomorphological characteristics were calculated from NASA Shuttle Radar Topography Mission (SRTM) version 3.0 global 1 arc second. For this aim, the watershed associated to each one of the flow stations was defined, and the information was summarized in one single table that list as a function of the elevation, the following information: Watershed area (km 2 ); length of the mainstream (km); maximum, minimum and average elevation of the watershed; and the average slope. This table is read at each time to create time series of the geomorphological information as a function of the 0 • C isotherm elevation that splits the watershed into solid and liquid precipitation areas. Besides the watershed area, the time-series of the length of the mainstream, and the average slope and maximum elevation of the watershed were used as inputs of the data-driven weather-runoff models, as they are associated with the computation of the concentration time of the watershed [40]. The average elevation of the watershed was used to vertically interpolate the precipitation time-series from the GFS-NCEP meteorological model.

•
Meteorological forecast: The weather forecast was obtained from a statistical scaling of gridded data of the GFS provided by NCEP, from which we obtained precipitation and air relative humidity at the average elevation of the watershed below the 0 • C isotherm, and air temperature at 2 m above the terrain in 3500 m a.s.l, and this reference terrain elevation was the same for all of the nine catchments. This last variable was chosen based on preliminary trial and error tests that showed that it gave better results in the representation of diurnal flow pulses during snow melt. We tested for other constant reference elevations (2000 and 4000 m a.s.l), and the results were not sensitive to this value. Furthermore, without good results, we also tested as reference temperature, the temperature at the average elevation of the catchment below the 0 • C isotherm that varies in time. We used the forecast datasets with a horizontal resolution of 0.5 × 0.5 degree, available from 2004 to present, and with a horizontal resolution of 0.25 × 0.25 degree, available from 2007 to present. Vertical scaling of the GFS information was made by linearly interpolating the meteorological variables as a function of the terrain elevation, using the GFS grid points located in a 0.5 degree of radius, regardless of the horizontal resolution of the GFS model, following the vertical scaling methodology described in [41]. Furthermore, each forecast starts with the weather forecast and is updated every 6 h, at 0:00, 6:00, 12:00 and 18:00 h UTC-time.

•
Initial condition: The present flow conditions of all nine flow stations were obtained from real-time hour measurements of the General Direction of Water of the Chilean government; so that, the observed flow was used as an initial condition (t = 0).
In summary, together with the flow observed at t = 0, weather-runoff forecast models receive as input the time-series of seven variables that describes temporal changes (1 day in the past and 3 days in the future) of the watershed morphology and meteorological conditions. These variables are: (i) Watershed area below the 0 • C isotherm; (ii) mainstream length below the 0 • C isotherm; (iii) average slope of the watershed below the 0 • C isotherm; (iv) elevation of the 0 • C isotherm; (v) precipitation rate and (vi) air relatively humidity, both vertically interpolated to the average elevation of the watershed; and (vii) air temperature at 3500 m a.s.l.

Data-Driven Weather-Runoff Forecast Models
Two data-driven approaches were used for weather-runoff forecast models-the ANN and DL techniques-both aiming in predicting the entire hourly time-series of the flow rate for the next three days. Both forecasts used the observed flow in t = 0 plus the GFS weather information of the previous 24 h (4 observations at t = −24, −18, −12 and −6 h per each one of the input variables) and the following 3 days (12 observations at t = 0, 6, . . . , 66 h per each one of the input variables). We used Matlab© (version R2018b, www.matworks.com, USA) for the computation, in which the ANN model consists of a sequence of fully connected layers (FFNN), whereas the DL model combines LSTM layers followed by a sequence of FFNN layers (see Figure 5). The transfer functions in the FFNN layers, the number of FFNN and LSTM layers, and the number of hidden neurons in the LSTM and FFNN layers were determined by looking at the minimum root mean square error of the validation data set.
testing subset is used for evaluating the metrics of the model skills. Table 2 summarizes the hydrological features (maximum flow, mean flow, minimum flow and percentile 90% and 99%) of each subset, for each one of the nine flow stations, showing that for each flow station, no significant differences in the hydrological characteristics as a function of the subset. This is an important aspect since previous research has shown that the decomposition into train, validation and test subsets has consequences on the final outcome of the data-driven model [20].  These forecast models aim to predict the flow time-series every hour based on the GFS weather data and the observed flow at t = 0. For doing this, the ANN model receives one vector of 113 inputs, corresponding to 112 GFS weather data (16 × 7) plus the observed flow at t = 0 ( Figure 5a). The output is the predicted flow at t = −24, −23, . . . , +71, +72 h (96 outputs). The DL model, on the other hand, receives as input one 16 rows times 8 columns matrix, in which each row is associated with each one of the GFS times (t = −24, −18, . . . , +60, +66 h), whereas the columns contain the 7 input variables and the observed flow at t = 0, which is repeated on each row. The output is a 16 rows times 6 column matrix (Figure 5b), in which the first column contains the flow at times t = −24, −18, . . . , +60, +66 h, the second column the flow at t = −23, −17, . . . , +61, +67 h, and so on until the last column contains the flows at t = −19, −13, . . . , +65, +71 h. It is important to notice that the DL model was not used in the standard recursive way, in which the past time-series dependent variable (flow in this case) is recursively used as input for predicting the flows for the following time-steps. The proposed forecast model predicts at once the entire flow time-series for 1 day in the past and 3 days in the future with no need of the past flow. This constraint was required because the continuum real-time observations of the flow are not reliable, thus being necessary to only estimate the flow at t = 0 in case of failure of the real-time flow monitoring system. This is one of the novelties of this manuscript.
Finally, for both forecasts, the input and output variables were normalized by subtracting the mean value and dividing it by the standard deviation, and the entire data set obtained for each flow station was randomly subdivided into three subsets: 70% of the data set for training the weights of the net, 10% for validating (hyperparameter tuning), and 20% for testing (model evaluation). The testing subset is used for evaluating the metrics of the model skills. Table 2 summarizes the hydrological features (maximum flow, mean flow, minimum flow and percentile 90% and 99%) of each subset, for each one of the nine flow stations, showing that for each flow station, no significant differences in the hydrological characteristics as a function of the subset. This is an important aspect since previous research has shown that the decomposition into train, validation and test subsets has consequences on the final outcome of the data-driven model [20].

Evaluation Metrics for Model Skills
Six indexes were used to evaluate the performance of the trained models: the root mean square error (RMSE), the Nash-Sutcliffe efficiency index (NSE), the normalized RMSE by the average flow, Pearson correlation coefficient C xy , the error of time to peak discharge and the error of peak discharge.
The root means square error, RMSE, is defined as where O i and P i denotes the observed and predicted value, N the total number of observations used for computing RMSE. The RMSE has units of flow (m 3 /s) and quantifies the standard error in the prediction. The NSE (Nash-Sutcliffe efficiency index) is defined as where O denotes the average observations. NSE is a dimensionless index that quantifies the magnitude of the RMSE with respect to the standard variation of the observed flows. NSE is equal to 1 for the perfect fit and it can take values smaller than 0 if the RMSE is larger than the standard deviation of the observations.
The normalized RMSE (RMSE/Q avg ) is the RMSE divided by the average observed flow, and it quantifies the magnitude of the error with respect to the observed flow: The correlation coefficient C xy takes values between −1 and 1, being equal to 1 if there is a perfect correlation between observed and predicted flows, C xy = −1 if there is a perfect inverse correlation, and is equal to 0 if there is no correlation between observed and predicted flows.
The error to maximum flow (EQ p ) was introduced by [25], and in the context of this article it quantify the relative error in predicting the maximum flow for the following three days. It is defined as where Q m max denotes the measured maximum flow for the next three days, and Q p max the predicted maximum flow.
Finally, the error of time to maximum flow (ET p ) was also introduced by [25], and it is the absolute difference in hours between the time at which the maximum flow is observed (T m max ) versus the time at which the maximum flow is simulated. (T p max ). It is defined as The last two indexes are used to evaluate the performance of the early warning system.

Structure of the Data-Driven Weather-Runoff Forecasts Models
The structure of the ANN and DL weather-forecast models is defined by the transfer functions, the number of hidden layers and the number of hidden neurons, whose values were determined by minimizing the RMSE of the validation subset of Maipo en el Manzano, and the same structure was used for all of the flow stations. Table 3 lists the functions and parameters to be defined in the ANN model along with the range of values investigated. The number of layers and the number of hidden neurons were initially set equal to two and fifty, respectively. Then, the model was evaluated with two different transfer functions in the hidden layers: Linear and a rectified linear (Relu). The transfer function of the output layer was set to linear, which is recommended for regression problems. Each case was trained and evaluated ten times to characterize their variability. The results are presented in Figure 6a, the mean value is illustrated with a grey circle and the standard deviation by error bars. The best results are obtained with a Relu transfer function in the hidden layers. Then, the performance was evaluated with one to five hidden layers; each case was trained and evaluated ten times. The results presented in Figure 6b indicate that the best results are obtained with four hidden layers. Lastly, Figure 6c shows the performance with different number of neurons in the hidden layers, which is optimum for 300 neurons. Therefore, the final architecture that will be used hereinafter for the ANN model is four hidden layers with 300 neurons each and for Relu transfer functions, the output layer has 96 neurons and a linear transfer function.   Furthermore, Table 4 lists the parameters and functions to be defined in the DL model along with the range of values investigated. The procedure to select the best parameters was the same for the ANN model. First, the number of neurons in the LSTM and FFNN hidden layers were set equal to one hundred and the number of LSTM and FFNN hidden layers to one and two, respectively. The transfer function of the FFNN output layer was set to linear and then the performance of the two transfer functions in the hidden FFNN layers was evaluated. Once the transfer functions were selected, the model was evaluated with different combinations for the number of LSTM and hidden FFNN layers. Finally, the model was evaluated for different number of hidden units. Each combination was trained and evaluated ten times with the validation subset of Maipo en el Manzano. The results of the sensitivity analysis are presented in Figures 7-9. Therefore, the final architecture for the detailed forecast model is three LSTM layers with 300 neurons each and one output FFNN layer with a linear transfer function. This means that there are only two FFNN layers: The input and the output layers. This architecture will be used hereinafter for all nine flow stations.  Furthermore, Table 4 lists the parameters and functions to be defined in the DL model along with the range of values investigated. The procedure to select the best parameters was the same for the ANN model. First, the number of neurons in the LSTM and FFNN hidden layers were set equal to one hundred and the number of LSTM and FFNN hidden layers to one and two, respectively. The transfer function of the FFNN output layer was set to linear and then the performance of the two transfer functions in the hidden FFNN layers was evaluated. Once the transfer functions were selected, the model was evaluated with different combinations for the number of LSTM and hidden FFNN layers. Finally, the model was evaluated for different number of hidden units. Each combination was trained and evaluated ten times with the validation subset of Maipo en el Manzano. The results of the sensitivity analysis are presented in Figures 7-9. Therefore, the final architecture for the detailed forecast model is three LSTM layers with 300 neurons each and one output FFNN layer with a linear transfer function. This means that there are only two FFNN layers: The input and the output layers. This architecture will be used hereinafter for all nine flow stations.

Performance of ANN versus DL Weather-Runoff Forecast Models
To evaluate the differences between the ANN and DL weather-runoff models, Figure 10 shows the direct comparison between observed and predicted maximum and average flow for the DL model ((a) and (b)) and the ANN model ((c) and (d)); using the test subset (the goodness of the fit is indicated in Table 5). The same figure for the rest of flow stations showed, in general, good agreement between predicted and observed maximum and average flows for both the DL and the ANN models, however, the performance of the DL weather-runoff model was better than the ANN model (see Figures S1-S8 in Supporting Information and Table 5). The normalized RMSE for the DL model were 5.9% and 4.3% for and , respectively; NSE and were very close to 1, and the RMSE was 7.0 m 3 /s and 4.3 m 3 /s for and Q , respectively (see Table 5). With respect to performance in predicting the entire time-series of the flow for the following 3 days, Figure 11 shows the comparison between the observed and predicted flow for different cases identified with open circles in Figure 10. These examples were chosen based on the cumulative frequency of the average observed flow, using percentiles of 99.9%, 99.6%, 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91% and 90%, for panels (a) to (l), respectively. Finally, Figure 11 compares predicted and observed flow time-series for the maximum flow event. Similar figures for the rest of the flow stations are found in the Supporting Information.
The results show that the ANN model makes an acceptable prediction of the average flow, but has difficulties in recognizing temporal changes. This is reflected in greater errors in the predicted maximum flow, with a tendency to underestimate it. This is explained by the fact that ANN do not have a temporal "memory", and therefore are not good at predicting temporal changes. The DL model, which incorporates LSTM cells, has a much better prediction performance for the time-flow series as well as for the maximum and average flow, demonstrating that the temporal capacity of LSTM-based algorithms allows a prediction of temporal changes.

Performance of ANN versus DL Weather-Runoff Forecast Models
To evaluate the differences between the ANN and DL weather-runoff models, Figure 10 shows the direct comparison between observed and predicted maximum and average flow for the DL model ((a) and (b)) and the ANN model ((c) and (d)); using the test subset (the goodness of the fit is indicated in Table 5). The same figure for the rest of flow stations showed, in general, good agreement between predicted and observed maximum and average flows for both the DL and the ANN models, however, the performance of the DL weather-runoff model was better than the ANN model (see Figures S1-S8 in Supporting Information and Table 5). The normalized RMSE for the DL model were 5.9% and 4.3% for Q avg and Q max , respectively; NSE and C xy were very close to 1, and the RMSE was 7.0 m 3 /s and 4.3 m 3 /s for Q max and Q avg , respectively (see Table 5). With respect to performance in predicting the entire time-series of the flow for the following 3 days, Figure 11 shows the comparison between the observed and predicted flow for different cases identified with open circles in Figure 10. These examples were chosen based on the cumulative frequency of the average observed flow, using percentiles of 99.9%, 99.6%, 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91% and 90%, for panels (a) to (l), respectively. Finally, Figure 11 compares predicted and observed flow time-series for the maximum flow event. Similar figures for the rest of the flow stations are found in the Supporting Information.
The results show that the ANN model makes an acceptable prediction of the average flow, but has difficulties in recognizing temporal changes. This is reflected in greater errors in the predicted maximum flow, with a tendency to underestimate it. This is explained by the fact that ANN do not have a temporal "memory", and therefore are not good at predicting temporal changes. The DL model, which incorporates LSTM cells, has a much better prediction performance for the time-flow series as well as for the maximum and average flow, demonstrating that the temporal capacity of LSTM-based algorithms allows a prediction of temporal changes.     Figure 11. Comparison between predicted and observed flow for different cases identified with red open circles in Figure 11, associated to percentiles of 99.9%, 99.6%, 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91% and 90%, for panels (a-l), respectively. (m) Plots predicted and observed time-series for the event with maximum flow.

Performance of the Early Warning System
In order to verify the early warning advantage of the DL weather-runoff model, two extreme events were analyzed in detail. These events correspond to floods that occurred in April of 2016 (with a peak flow of 1078.6 m 3 /s, Figure 12a), and in May of 2012 (with a maximum flow of 546.1 m 3 /s, Figure 12b). For each event, the DL weather-runoff forecast model was run several times, starting at different days before the time at which the maximum flow was observed. As an example, Figure 12 shows three of these runs: One that starts 6 days before the peak flow and ends before the flow start to rise (black line that starts with the circle and ends with the black x); the second (blue simulation) that starts 4 days the peak flow, at the beginning of the storm, and ends before the peak flow was observed; and the third simulation starting on 2 days before the peak flow, in the middle of the storm, and ending after the peak flow has passed. Similarly, Table 6 shows the errors of time to peak ETp (Equation 13), and peak discharge, EQp (Equation 12), calculated for each one of the different simulations.
In terms of the early warning system, the blue simulation of April 2016 would have predicted at the beginning of the storm an important increase in the flow of Maipo River, while the red simulation would have predicted the magnitude and timing of the peak flow with two days in advance, as well as the flood duration. A similar situation is observed for the May 2012 event. In terms of the errors EQp and ETp (Table 6), the relative error in predicting the maximum flow tends to be smaller for larger maximum flows, and it takes positive values; so that, in this case, the model predicts maximum flows  Figure 11, associated to percentiles of 99.9%, 99.6%, 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91% and 90%, for panels (a-l), respectively. (m) Plots predicted and observed time-series for the event with maximum flow.

Performance of the Early Warning System
In order to verify the early warning advantage of the DL weather-runoff model, two extreme events were analyzed in detail. These events correspond to floods that occurred in April of 2016 (with a peak flow of 1078.6 m 3 /s, Figure 12a), and in May of 2012 (with a maximum flow of 546.1 m 3 /s, Figure 12b). For each event, the DL weather-runoff forecast model was run several times, starting at different days before the time at which the maximum flow was observed. As an example, Figure 12 shows three of these runs: One that starts 6 days before the peak flow and ends before the flow start to rise (black line that starts with the circle and ends with the black x); the second (blue simulation) that starts 4 days the peak flow, at the beginning of the storm, and ends before the peak flow was observed; and the third simulation starting on 2 days before the peak flow, in the middle of the storm, and ending after the peak flow has passed. Similarly, Table 6 shows the errors of time to peak ET p (Equation (13)), and peak discharge, EQ p (Equation (12)), calculated for each one of the different simulations.
In terms of the early warning system, the blue simulation of April 2016 would have predicted at the beginning of the storm an important increase in the flow of Maipo River, while the red simulation would have predicted the magnitude and timing of the peak flow with two days in advance, as well as the flood duration. A similar situation is observed for the May 2012 event. In terms of the errors EQ p and ET p (Table 6), the relative error in predicting the maximum flow tends to be smaller for larger maximum flows, and it takes positive values; so that, in this case, the model predicts maximum flows slightly smaller than the measurements. With respect to the error in the time to maximum flow, it is equal to 0 most of the time.

Discussion
In this article, we detailed a methodology that couples a process-based meteorological model that forecasts atmospheric conditions in the near future, with data-driven weather-runoff forecast models, which use these meteorological inputs for predicting hourly flow time-series in the near future. We implemented this methodology in región Metropolitana of Chile, for which two datadriven techniques were used for the weather-runoff forecast models-a simple ANN approach and a DL approach based on LSTM cells.
The data-driven weather-runoff models were designed based on the following three central ideas: (i) The near future flow (3 days) in the studied flow stations responds to both the precipitation rate of the storm, but also to changes in the watershed area or rate of snow melt (see Figure 11f). Consequently, a rainfall-runoff scheme (e.g., [25]) is not enough for predicting near-future flow, which justifies the weather-runoff concept that also uses air temperature and relatively humidity and the 0 °C isotherm for predicting the near-future flow. Both, air temperature and relatively humidity were important variables that improved the performance of the weather-runoff model in the preliminary exercises. Particularly, air temperature at 3500 m a.s.l. can be associated with snow melt rate, whereas air humidity at the 0 °C isotherm controls the limit between liquid and solid precipitation [42]. (ii) Real-time flow observations are in general available, but it is not possible to

Discussion
In this article, we detailed a methodology that couples a process-based meteorological model that forecasts atmospheric conditions in the near future, with data-driven weather-runoff forecast models, which use these meteorological inputs for predicting hourly flow time-series in the near future. We implemented this methodology in región Metropolitana of Chile, for which two data-driven techniques were used for the weather-runoff forecast models-a simple ANN approach and a DL approach based on LSTM cells.
The data-driven weather-runoff models were designed based on the following three central ideas: (i) The near future flow (3 days) in the studied flow stations responds to both the precipitation rate of the storm, but also to changes in the watershed area or rate of snow melt (see Figure 11f). Consequently, a rainfall-runoff scheme (e.g., [25]) is not enough for predicting near-future flow, which justifies the weather-runoff concept that also uses air temperature and relatively humidity and the 0 • C isotherm for predicting the near-future flow. Both, air temperature and relatively humidity were important variables that improved the performance of the weather-runoff model in the preliminary exercises. Particularly, air temperature at 3500 m a.s.l. can be associated with snow melt rate, whereas air humidity at the 0 • C isotherm controls the limit between liquid and solid precipitation [42]. (ii) Real-time flow observations are in general available, but it is not possible to rely on the availability of a continuous measured time-series for forecasting the near-future flow, especially during large storms. As a consequence, the weather-runoff forecast model is predominantly based on the (very reliable) GFS-NCEP model, although the flow at t = 0 is (always) needed as input. In case of not having flow observation for t = 0, this flow can be estimated using simple cross correlations with the other flow stations. (iii) Early warning systems should be able to give accurate warning for extreme events, but should also be able to give accurate forecasts for small flows to avoid false positive warnings that eventually reduce the reliability of the system [6,43]. In this context, both data-driven approaches used the entire set of flow observations for training, validating and testing the forecast models, without paying specific attention to high flow events which are the important ones in early warning systems. For example, the range of predicted flows in the Maipo en el Manzano station varies between 25 to 1100 m 3 /s.
With respect to the performance of data-driven weather-runoff forecast models, it is possible to argue that both approaches are accurate for predicting Q avg and Q max ; however, flow prediction based on the DL approach is far more accurate than the flow prediction based on ANN approach, as shown in Table 5. This was pointed out by [25] with a rainfall-runoff model, and it is verified in this new approach with a weather-runoff model. The DL approach has an excellent performance, with values of RMSE 5%, compared to RMSE 15.5% for ANN approach for the prediction of the peak flow in station 1 (Table 5). These results are explained in the fact that, approaches such as ANN are not exactly adequate for the analysis of sequential data, such as the flow time series. To address the forecast of sequential data, is required to keep a certain memory of the previous state of the system, thus allowing the prediction using not only the present information but also the previous state. Since ANN do not have a temporal memory, they have difficulties in recognising temporal changes. This is reflected in greater error when predicting flow floods and therefore tend to underestimate the flow, as shown in Figure 11. In this context, one of the most successful techniques based on Recurrent Neural Networks (RNN) is the DL approach based on LSTM cells [24][25][26]. Although we use only one value of the flow as initial condition, the previous state for DL approach is obtaining through the seven sequences of the meteorological and geomorphological inputs of the previous days, which is used in the DL approach for generating the output sequential data.
Furthermore, another important feature of the proposed architecture of the DL weather-runoff forecast (Figure 5b) is that it is capable of predicting an output time-series with a finer temporal resolution (1 h) than for the input time-series (6 h), thus enabling the use of DL as a temporal downscaling technique. This allows to precisely locating the time at which the peak flow will occur, which gives the system an early warning advantage, as shown in Figure 12 and Table 6. The DL weather-runoff model is capable of capturing the peak flow, the time at which it will occur and the flow duration. All of this information would be available from three days in advance, which is very useful for allocating resources and warning the communities at risk.
Finally, it is important to notice that the methodology that was implemented for the nine flow stations in Maipo and Mapocho rivers, can, in principle, be scaled to the entire set of flow stations with real-time measurement in Chile (approximately 450 flow stations). The coupling between the GFS-NCEP model forecast and the DL weather-runoff forecast model may not vary; however, input variables to DL weather-runoff forecast model should be different in flow stations located in the desert of northern Chile (latitude: −22 • S) to the flow stations located in the austral part of southern Chile (latitude: −45 • S). For example, presumably, the air temperature associated to glacier melt should not be a relevant input data in northern Chile where there are no glaciers.

Conclusions
The intensification of the hydrological cycle because of the global warming raises growing concerns about future floods and their impact on large cities where exposure has also increased, such as the región Metropolitana of central Chile. Adequate water adaptation solutions as early warning systems are crucial. Given that the early warning systems give more importance to the simplicity and robustness of the forecasting model rather than an accurate description of the various internal sub-processes, it is certainly worth considering data-driven approaches for improving real-time runoff forecasts.
In this article, we implemented a coupled model of a near-future global meteorological forecast with short-range runoff forecasting systems based on DL, showing that DL is a valuable technique that allows the acceleration of the development of hydrological forecasting and early warning tools. The coupling between meteorological forecasts and the DL weather-runoff forecast model, on the other hand, are able to satisfy two basic requirements that any early warning system should have: The forecast should be given in advance in a time-frame larger than catchment concentration time, and should be accurate and reliable. In this context, meteorological forecasts are accurate and reliable in predicting near-future meteorological conditions, which feed the DL weather-runoff forecast, thus enabling a reliable flow forecast in advance.
Furthermore, DL significantly improves runoff forecasts when compared with a simple ANN approach, being accurate in predicting the time-evolution of output variables, with an error for predicting the peak flow of RMSE 5% compared to RMSE 15.5% for the ANN approach, which is adequate to warn communities at risk and initiate disaster response operations. Another interesting aspect of this approach is that it is capable of predicting an output time-series with a finer temporal resolution than the input time-series. This temporal downscaling allows us to precisely locate the time at which the peak flow will occur. Finally, the real-time implementation of these DL models can be found in the open access webpage www.AlertaHidrica.com.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2073-4441/11/9/1808/s1, Figures S1-S8 are to Figure 11 of the manuscript, but for stations 2-9. Figures S9-S16 are equivalent to Figure 12 of the manuscript, for stations 2-9. Figure Figure S16. Figure S9: Station 2. Comparison between predicted and observed flow for different cases identified with red open circles in Figure S1. (m) plots predicted and observed time-series for the event with maximum flow. Figure S10: Station 3. Comparison between predicted and observed flow for different cases identified with red open circles in Figure S2. (m) plots predicted and observed time-series for the event with maximum flow. Figure S11: Station 4. Comparison between predicted and observed flow for different cases identified with red open circles in Figure S3. (m) plots predicted and observed time-series for the event with maximum flow. Figure S12: Station 5. Comparison between predicted and observed flow for different cases identified with red open circles in Figure S4. (m) plots predicted and observed time-series for the event with maximum flow. Figure S13