Deep Learning Approach with LSTM for Daily Streamflow Prediction in a Semi-Arid Area: A Case Study of Oum Er-Rbia River Basin, Morocco

Abstract: Daily hydrological modelling is among the most challenging tasks in water resource management, particularly in terms of streamflow prediction in semi-arid areas. Various methods have been applied to deal with this complex phenomenon.


Introduction
Water resources are of great importance to meeting the world's needs, including agricultural, industrial, and domestic usage, as well as other environmental systems. However, the availability of water resources is becoming limited in many countries around the world, especially in arid and semi-arid regions, due to climate change, population increase, and irrigation expansion, which affect socio-economic development and food security. In Southern Mediterranean regions, the degree of water scarcity and drought conditions may further increase the pressure on water resources [1,2]. In addition, these areas are characterized by low precipitation with an irregular spatiotemporal distribution and high evaporation. This controls the streamflow process, which is a paramount component for understanding and monitoring the quality and quantity of the water supply [3,4].
Therefore, improving streamflow prediction in arid and semi-arid regions is a challenging task for sustainable water resources management and watershed planning, because it provides valuable statistics to decision-makers for the allocation of available water to different purposes, particularly the agricultural sector [5]. This is the case in the Oum Er-Rbia river basin, which serves as one of the heartbeats of the hydroelectric and irrigation networks in the kingdom [6]. Successful and efficient water resources management requires accurate and timely streamflow information. In this context, numerous methods have been used to estimate streamflow in gauged or poorly gauged watersheds, involving empirical, physical, conceptual, and data-driven methods [7]. Empirical models rely only on information from existing data, without taking into account the characteristics of hydrological processes [8]. Physical and conceptual models may be among the best hydrological models for simulating streamflow [9], but they need considerable parameters and require more effort to construct [10]. Therefore, data-driven approaches, including machine learning and deep learning, have become revolutionary tools in the watershed planning process; they have largely improved streamflow simulation with no requirement for knowledge of the physical and underlying processes [11].
In machine learning, Support Vector Machines (SVM), regression trees, and Artificial Neural Networks (ANNs) are popular tools for building prediction models, and they have markedly improved the ability to solve regression problems [12,13]. For streamflow simulation, several studies have thoroughly evaluated these methods. For example, Hadi and Tombul [14] indicated that ANN performs better than SVM in predicting streamflow at a daily scale across different physical characteristics. On the other hand, Parisouj et al. [15] revealed that machine learning models, especially SVR, achieve favorable performance at daily and monthly time steps in different climatic zones. The higher accuracy of each model stems from the input features and training data, or from the model structure, which can affect the selection of the most accurate one. Indeed, traditional machine learning algorithms have a simple structure and lower data requirements, but ANN and SVR are quite inefficient at capturing the sequential information in the input data, which is required for handling sequence variables [16].
To overcome the potential limitations of machine learning techniques, the use of deep learning, particularly for time-series data, provides higher accuracy [17]. Deep learning is a growing field, and various studies have applied it to time-series prediction. One of the best-known models in this field is the Recurrent Neural Network (RNN), whose sequential architecture allows information to be preserved across time steps [18]. However, because of this structure, the network's computation is slow, long sequences are difficult to process, and it cannot cope with the vanishing gradient problem. Long Short-Term Memory (LSTM), a powerful RNN architecture, was developed to address vanishing gradient issues [19]. Owing to this capacity, LSTM has been applied by many researchers for streamflow prediction, since streamflow is associated with past values over extended periods of time [20][21][22]. Apaydin et al. [23] indicated that LSTM delivers better performance, with accuracy that makes it useful for streamflow modeling, compared to ANN and simple RNN models, which showed inferior results. Nonetheless, there is a difference between simulating streamflow at daily and monthly time scales: the LSTM model is more applicable for daily prediction, while for monthly modeling the ANN achieved the most accurate results. For example, Cheng et al. [24] found that, compared to ANN, the LSTM model performs better in daily prediction and is less accurate at the monthly scale because of the absence of an extensive monthly training dataset. ANN and LSTM cover diverse scenarios, but both have advantages that make long- and short-term predictions more powerful and effective, with a priority given to LSTM [25]. The quantity of hydrological and meteorological data plays a critical role in predicting streamflow, as the potential of a model is related to the richness of the data [26]. Several studies have noted the effect of feeding the LSTM model with various meteorological data conditions on its performance [27] in the streamflow process. For example, Choi et al. [28] successfully adopted the LSTM network to evaluate the composition of input variables on a daily scale. Moreover, due to a lack of management, the observed data may be disordered and insufficient for training the model, which affects the LSTM's efficiency. Nevertheless, the implementation of the LSTM model in a gauged or poorly gauged river basin seems reliable, because the training data form the backbone of the LSTM structure [29]. Choi et al. [30] demonstrated the ability of the LSTM model to predict streamflow without hydrological observations; the findings revealed that the model is highly dependent on the amount of available data. Therefore, recent methods focus on overcoming this problem by proposing different inputs. Kieran et al. [31] trained the LSTM model to predict streamflow using hydrological and meteorological satellite data as well as antecedent streamflow observations. Similarly, Rahimzad et al. [32] explored the capabilities of LSTM compared with different data-driven techniques based on historical streamflow and precipitation time series. The results revealed that the LSTM model is a robust network for distinguishing sequential behaviors in streamflow modeling.
Furthermore, the LSTM requires a three-dimensional array as input: input samples, sequence length, and input features. The relationship between the input features and the sequence length may influence model performance, as the third dimension represents the number of features in the input sequence when the previous time steps are considered as input variables [33]. The impact of sequence length, together with the input data, on streamflow prediction performance needs further investigation.
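In practice, building this three-dimensional array amounts to sliding a window of `seq_len` days over the feature series so that each sample pairs the previous days' inputs with the next day's flow. A minimal sketch (array sizes and variable names are illustrative, not taken from the study):

```python
import numpy as np

def make_windows(features, target, seq_len):
    """Stack seq_len consecutive time steps of the input features so the
    LSTM receives a 3D array (num_samples, seq_len, num_features) and
    predicts the target at the step that follows each window."""
    X, y = [], []
    for t in range(seq_len, len(features)):
        X.append(features[t - seq_len:t])  # previous seq_len days
        y.append(target[t])                # next-day streamflow
    return np.array(X), np.array(y)

# Toy daily series: 100 days, 3 features (rainfall, temperature, SCA)
feats = np.random.rand(100, 3)
flow = np.random.rand(100)
X, y = make_windows(feats, flow, seq_len=10)
print(X.shape, y.shape)  # (90, 10, 3) (90,)
```

Changing `seq_len` here is exactly what the study varies through TS2, TS10, TS20, TS25, and TS30.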
Although some studies aim to solve hydrologic problems in Morocco, such as in Berrchid city, where machine learning models were used to forecast groundwater quality [34], there are scarcely any studies that evaluate streamflow prediction using deep learning techniques. Thus, we found it to be a new and interesting area to work on.
The experiments designed in this study focus on evaluating the reliability of the LSTM network for simulating daily streamflow in a semi-arid mountainous watershed in Morocco, using meteorological data and remotely sensed information. The strength of the LSTM model is that it can take advantage of information within time-series data, and it performs better in predicting streamflow variability that exhibits a tendency over time. Thus, to elucidate the significance of the LSTM model in streamflow prediction, we explore the capability of the model under different time-splitting schemes, as well as the effect of sequence length selection on model performance. Moreover, due to the lack of data, this study also evaluates the impact of antecedent streamflow values by comparing the performance of the model with two different forms of inputs. We first present the study area and the data used; this is followed by a description of the model architecture and the methodology. The last section discusses the training, validation, and testing results of the different approaches used to examine the effect of sequence length and input features on the LSTM's performance in daily streamflow prediction.

Case Study
Oued El Abid is the largest tributary basin of the Oum Er-Rbia river, with an area of 7975 km², located in central Morocco between meridians 6°15′ W and 6°30′ W and parallels 32° N and 32°5′ N. This basin is a mountainous area with significant water resource potential, feeding the Bin El Ouidane dam to support agricultural activities [35] and recharging the groundwater downstream in the Tadla plain [36]. The study area, a typical South Mediterranean basin, is characterized by a semi-arid climate with average precipitation of approximately 480 mm/year and strong spatiotemporal variations in precipitation. Westerly air movements and orographic effects play a crucial role in generating rainfall. The rainy period of the year lasts 6 months (November to April) and the dry period lasts 4 months (June to September), with the wet season starting in October, a maximum in January, and a minimum in July. Owing to the high altitude of the Atlas Mountains, with the yearly volume primarily accumulating during the spring snowmelt, the flow regime gradually shifts from rain-fed to snow-fed. The variation in temperature is notably influenced by the high elevation and the occurrence of snow: the temperature drops to −9 °C in winter and rises to 41 °C in summer [9]. The Oued El Abid river is made up of two main sub-basins, Ait Ouchene and Teleguide. Our study focuses on the Ait Ouchene watershed (Figure 1, Table 1).

Data
The simulation of streamflow requires timely datasets. In this study, daily hydroclimatic datasets (rainfall, streamflow, temperature, and snow cover area) from 2001 to 2010 were used. In situ observations of streamflow and rainfall were provided by the Oum Er-Rbia Hydraulic Basin Agency (ABHOER) [37]. The regional rainfall of the Ait Ouchene watershed was represented by the average of the gauges situated within the sub-basin. The variation of daily measured streamflow and daily rainfall values is shown below (Figure 2). Due to the lack of ground measurements of snow depth, remote sensing was the main solution for estimating snow occurrence, especially over large mountainous basins. The daily snow cover area (SCA) time series at a spatial resolution of 500 m were obtained from the National Snow and Ice Data Center (NSIDC) using Terra/MODIS (Moderate-Resolution Imaging Spectroradiometer) satellite data, MOD10A1 version 6 [38,39]. MODIS was chosen as a baseline for producing SCA because it is reliable, provides a good streamflow simulation according to Ouatiki et al., and has already been studied and tested in many basins [9,32,33]. Additionally, the lapse rate approach was used to generate the daily temperature data, at a rate of 0.56 °C per 100 m of elevation [40].
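The lapse-rate adjustment described above is a simple linear correction of a reference temperature by elevation difference. A minimal sketch, where the station reading and elevations are hypothetical and only the 0.56 °C per 100 m rate comes from the text:

```python
def lapse_temperature(t_station, elev_station, elev_target, lapse=0.56):
    """Shift a station air temperature to a target elevation using a fixed
    lapse rate (in degrees C per 100 m): temperature decreases with height."""
    return t_station - lapse * (elev_target - elev_station) / 100.0

# Hypothetical values: a station at 1100 m reads 12.0 degrees C;
# estimate the temperature at 2100 m
t_est = lapse_temperature(12.0, 1100, 2100)
print(round(t_est, 2))  # 6.4
```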

Long Short-Term Memory (LSTM)
An LSTM network is a particular variety of recurrent neural network (RNN) that was developed by Hochreiter et al. [19]; it has been applied by many researchers due to its specific design, which overcomes the long-term dependency problem faced by RNNs [41]. The structure of the LSTM depends on three basic components: the cell state, which defines the current long-term memory of the network; the output at the prior point, known as the hidden state; and the input data at the current time step [42]. Thus, the LSTM architecture controls how the information in a sequence of data flows through three special gates: the forget gate, the input gate, and the output gate (Figure 3). The first step in the process is the forget gate (Equation (1)), whose decision is taken through a sigmoid layer. Then, the input gate (Equation (2)) determines what value should be added to the cell state, taking into account the previous hidden state and the new input data. This step has two parts: the input gate layer, which decides which values to update, and the tanh layer (Equation (3)). The previous cell state C_{t−1} is then updated into the new cell state by combining the two layers (Equation (4)). The last phase is the output gate (Equation (5)), which determines the new hidden state: a sigmoid layer decides which components of the cell state should be output, and its result is combined with the tanh of the cell state (Equation (6)). The mathematical formulas of the model structure are:

f_t = σ(W_f · [h_{t−1}, x_t] + b_f) (1)
i_t = σ(W_i · [h_{t−1}, x_t] + b_i) (2)
C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C) (3)
C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t (4)
o_t = σ(W_o · [h_{t−1}, x_t] + b_o) (5)
h_t = o_t ⊙ tanh(C_t) (6)

where σ is the sigmoid function, x_t is the current input, h_t the hidden state, C_t the cell state, W and b the weight matrices and biases of each gate, and ⊙ the element-wise product.
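The gate computations described above can be illustrated numerically. The following NumPy forward pass is a generic sketch of the standard LSTM step, not the study's implementation; the weight shapes and sizes are arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step: the dicts W, U, b hold one input matrix,
    recurrent matrix, and bias per gate (f, i, g, o)."""
    f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])   # forget gate
    i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])   # input gate
    g = np.tanh(W['g'] @ x_t + U['g'] @ h_prev + b['g'])   # candidate values
    c_t = f * c_prev + i * g                               # updated cell state
    o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])   # output gate
    h_t = o * np.tanh(c_t)                                 # new hidden state
    return h_t, c_t

# Tiny example: 3 input features, 4 hidden units, random weights
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((4, 3)) for k in 'figo'}
U = {k: rng.standard_normal((4, 4)) for k in 'figo'}
b = {k: np.zeros(4) for k in 'figo'}
h, c = lstm_step(rng.standard_normal(3), np.zeros(4), np.zeros(4), W, U, b)
print(h.shape, c.shape)  # (4,) (4,)
```

Because the forget and input gates multiply into the cell state additively rather than through repeated matrix products, gradients can flow over many steps, which is the mechanism that mitigates the vanishing gradient problem mentioned above.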

Methodology
The Python-based TensorFlow open-source software package and Keras were used to create the LSTM model for this study. The process used is illustrated in Figure 4 and is divided into four main steps: feature selection (a), data pre-processing (b), hyperparameter tuning (c), and prediction and evaluation (d).

Feature Selection
In this study, we created two input scenarios to explore the sensitivity of the LSTM model in this region. First, rainfall (R), temperature (T), and snow cover area (SCA) were used as default inputs (scenario 1: LSTM). The second input scenario was generated by adding lagged data, which provide a historical point of reference for the following time steps, and was used to assess the effect of these additional inputs on model outcomes. The number of time lags of the streamflow was determined using the Partial Autocorrelation Function (PACF) [43]: lags of 1, 2, and 3 days were significant and had an impact on the streamflow at t = 1 day. Three lag days of rainfall, temperature, and SCA were therefore considered when selecting the model features. To find the best subset of features, we used the Forward Feature Selection (FFS) algorithm [44], which evaluates each individual feature by incrementally adding the ones most relevant to the target variable (streamflow) [45]. The subset of features found to be significantly correlated with the streamflow is presented in Table 2 (scenario 2: FFS-LSTM). This study also raised concerns regarding the accuracy and reliability of the LSTM model; accordingly, a mechanism to evaluate the overall performance of the model is needed. Splitting the input data into training (to fit the LSTM), validation (to evaluate it), and testing (to confirm the results) is a straightforward procedure to keep the model from overfitting and to compare its effectiveness in streamflow prediction. With time-series data, however, it is crucial to consider which past values will be used for testing and training [24]; hence, we split the data using three approaches:
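The greedy logic of forward feature selection can be sketched as follows. This is a generic illustration on synthetic data with a least-squares surrogate model, not the FFS implementation of [44]; in the study, the candidate features would be the lagged rainfall, temperature, SCA, and antecedent flow identified via PACF:

```python
import numpy as np

def forward_select(X, y, k):
    """Greedy forward feature selection: at each round, add the feature
    whose inclusion yields the best least-squares fit to the target,
    until k features are chosen."""
    chosen, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        def sse(cols):
            # Fit an ordinary least-squares model on the candidate columns
            A = np.column_stack([X[:, cols], np.ones(len(y))])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            return np.sum((y - A @ beta) ** 2)
        best = min(remaining, key=lambda j: sse(chosen + [j]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy data: feature 2 drives the target strongly, feature 0 weakly
rng = np.random.default_rng(2)
X = rng.standard_normal((200, 4))
y = 3.0 * X[:, 2] + 0.5 * X[:, 0] + 0.1 * rng.standard_normal(200)
print(forward_select(X, y, 2))  # expected: [2, 0]
```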

• Approach 3: with limited data samples, k-fold cross-validation is the most widely used method to assess a model's performance. It divides the dataset into k equal-sized folds; one of the k parts is used as the testing set while the model is trained on the remaining k−1 folds [43]. The cross-validation parameter refers to the number of split iterations into which the dataset is divided, typically from 2 to 10 depending on data availability. In this study, we tested different cross-validation (CV) values. The appropriate value is CV = 5, with 80% as the training set (7 years), 20% of the training data as the validation set, and 20% for testing (2 years) in each group (Figure 5). In addition, we separated the target variable (streamflow) and the input variables (rainfall, snow cover area, and temperature) in the dataset. The last step in the preprocessing is data transformation, which plays a critical role in the performance of neural network models when features are on a relatively similar scale and close to normally distributed. One of the most popular methods for scaling numerical data is normalization, which scales the input variables to a standard range between zero and one [46]; the same range is used for scaling the output, corresponding to the range of the activation function (tanh) on the output layer of the LSTM. The MinMaxScaler function (Equation (7)) subtracts the minimum value of each feature and divides by its range, using the original training data. It is important to estimate the minimum and maximum values on the training data and then apply the same scaling to the training, validation, and testing sets. This method scales each feature individually, so that it falls within the range of zero to one determined on the training set.
x′ = (x − min(x)) / (max(x) − min(x)) (7)

where x′ is the scaled value and x is the original value.
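Applied in code, Equation (7) is fitted on the training data only and then reused unchanged on the validation and test sets, exactly to avoid leaking test statistics into training. A minimal stand-in for the MinMaxScaler (the numerical values are illustrative):

```python
import numpy as np

class MinMaxScaler:
    """Minimal sketch of min-max normalization (Equation (7)): min and
    range are estimated on the training data only, then the same transform
    is applied to the validation and test splits."""
    def fit(self, X):
        self.min_ = X.min(axis=0)
        self.range_ = X.max(axis=0) - self.min_
        return self
    def transform(self, X):
        return (X - self.min_) / self.range_

train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
test = np.array([[5.0, 25.0]])
scaler = MinMaxScaler().fit(train)
print(scaler.transform(train))  # each training column mapped onto [0, 1]
print(scaler.transform(test))   # [[0.5, 0.75]]
```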

Hyper-Parameter Tuning
The configuration of neural networks remains difficult because there is no strong, specific approach for developing the algorithm [47]. This is why we need to explore different configurations to select the parameter values that control the learning process and avoid overfitting [42,44]. In general, neural networks have numerous hyperparameters that are tuned to minimize the loss function. The LSTM used in this study is composed of three neural network layers: the input layer, the hidden layer, and the output layer. Moreover, we used a regularization method named dropout to reduce overfitting and improve model performance. For the batch size, we used 32 for the first scenario and 10 for the second scenario. The number of epochs was 250, with early stopping after 10 epochs without improvement of the model performance on the validation set. The hyperparameters selected for the model are summarized in Table 3 [48,49]. The LSTM model takes a 3D input (num_samples, num_timesteps, num_features) [50]. In this study, the sequence length was evaluated using five values: 2, 10, 20, 25, and 30 days (denoted TS2, TS10, TS20, TS25, TS30) of input data employed to drive the LSTM network to predict the next day.
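A sketch of this configuration in Keras, which the study reports using: the layer layout, dropout, 250 epochs, batch size, and patience of 10 follow the text, while the number of LSTM units, optimizer, and loss are assumptions standing in for the values in Table 3:

```python
import tensorflow as tf

def build_lstm(seq_len, n_features, units=64, dropout=0.2):
    """One LSTM hidden layer with dropout and a dense output unit for
    next-day streamflow; units/optimizer/loss are assumed, not from Table 3."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(seq_len, n_features)),
        tf.keras.layers.LSTM(units),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer='adam', loss='mse')
    return model

model = build_lstm(seq_len=10, n_features=3)
stopper = tf.keras.callbacks.EarlyStopping(patience=10,
                                           restore_best_weights=True)
# Training call as described in the text (scenario 1 batch size):
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=250, batch_size=32, callbacks=[stopper])
print(model.output_shape)  # (None, 1)
```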

Model Evaluation Criteria
Typically, there are various criteria for evaluating model performance in streamflow prediction. In deep learning, these metrics compare the observed streamflow with the simulated output for the validation and testing values. In our study, we evaluated the LSTM performance using the Root Mean Squared Error (RMSE, Equation (8)), the Mean Absolute Error (MAE, Equation (9)), the Kling-Gupta Efficiency (KGE, Equation (10)), and the coefficient of determination (R², Equation (11)).
The most often used metric in prediction and regression tasks is the root mean square error. RMSE is the square root of the average squared difference between the true values and the predicted scores. Here, y_i (m³/s) is the observed streamflow for each data point, and ŷ_i (m³/s) is the predicted value. RMSE ranges from 0 to ∞, and a perfect prediction yields RMSE = 0 [51].
MAE is a popular metric whose error score has the same units as the predicted value; it is calculated as the average of the absolute errors. The MAE does not give more or less weight to different types of errors; instead, the score increases linearly with the error. The value of MAE ranges from 0 to ∞, and when MAE = 0 the prediction is perfect. |y_i − ŷ_i| is the absolute difference between the observed and predicted values [52].
The Kling-Gupta Efficiency (KGE) addresses certain weaknesses of the NSE and is increasingly utilized to calibrate and validate models. It was originally developed to compare predicted and observed time series, and it can be decomposed into the contributions of mean, variance, and correlation to model performance. Similarly to the NSE, at the optimal score of KGE = 1 the simulations and observations match perfectly [53]. Different researchers use positive KGE values as indicators of "good" model simulations, whereas negative KGE values are regarded as "poor"; KGE = 0 is implicitly used as the dividing line between the two. In KGE, r is the Pearson correlation coefficient between observed and simulated values, and β is the ratio of the simulation mean to the observation mean [33,46,54].
The coefficient of determination represents how much of the observed variation is explained by the model, and ranges from 0 to 1. A score of 0 indicates no association, whereas a value of 1 indicates that the model fully explains the observed variation [10].
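The four criteria can be computed directly from the observed and simulated series. A sketch, using the common 2009 form of KGE and the squared Pearson correlation for R² (both standard choices, assumed rather than transcribed from Equations (8)-(11)):

```python
import numpy as np

def rmse(obs, sim):
    """Square root of the mean squared error."""
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def mae(obs, sim):
    """Mean of the absolute errors."""
    return float(np.mean(np.abs(obs - sim)))

def kge(obs, sim):
    """Kling-Gupta Efficiency, 2009 form: correlation r, variability
    ratio alpha, bias ratio beta; 1 indicates a perfect match."""
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = np.std(sim) / np.std(obs)
    beta = np.mean(sim) / np.mean(obs)
    return float(1 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))

def r_squared(obs, sim):
    """Squared Pearson correlation between observed and simulated series."""
    return float(np.corrcoef(obs, sim)[0, 1] ** 2)

# Sanity check: a perfect simulation attains the ideal score of each metric
obs = np.array([3.0, 5.0, 7.0, 9.0])
print(rmse(obs, obs), mae(obs, obs), kge(obs, obs), r_squared(obs, obs))
```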

Results and Discussion
Three approaches were utilized in this study to assess the LSTM model's effectiveness, adopting random (approaches 1 and 2) and automatic (approach 3) data-splitting methods. The purpose of designing different datasets is to explore the impact of training series covering different time periods on the hydrological process, where year-to-year changes in hydroclimatic conditions cause significant variations in streamflow [55]. Moreover, the effect of input features and sequence length on model performance was examined to verify model reliability. The statistical metrics of LSTM and FFS-LSTM at TS2, TS10, TS20, TS25, and TS30 using the three approaches, comparing training, validation, and testing, are shown in Tables 4-7.

Evaluation of Model Performance Using Random Split
The quantitative analysis of the model behavior (Table 4) using approach 1 illustrates that LSTM (scenario 1) achieved extremely high RMSE and MAE values, low R² values, and negative KGEs. However, the performance of LSTM increases with the number of time steps. Thus, the LSTM network hardly remembers the sequence when using 2 days of data as input to predict the next day's flow. This is mainly due to the memory challenge in watersheds involving snow and, thus, the lag between rainfall and streamflow peaks. The results produced under the first input scenario in the validation and testing periods demonstrate that the model is unable to simulate daily streamflow in this study region; the highest R² values were found at TS = 30 (0.75 in training, 0.46 in validation, and 0.45 in testing), because the memory features of the LSTM were insufficiently fed. In addition, the performance of the model decreased during testing using approach 2 (Table 5), which takes the hydrological years from 1 September 2001 to 31 August 2007 as the training samples; the best statistics were found at TS = 25 days, with R² = 0.71, 0.51, and 0.34 for training, validation, and testing, respectively. This is mainly due to the meteorological input data and the strong spatiotemporal variability of rainfall. When defining an LSTM network, the network assumes multiple samples and requires the number of time steps and features to be specified: a time step represents one point of observation in a sample, and a feature is one observation at a time step. Thus, adding lagged data during training could lead the LSTM to learn how water is delayed and routed through the watershed, which is interesting for improving model performance. This notion is demonstrated in the second scenario (FFS-LSTM). The values of RMSE, MAE, KGE, and R² in the learning, validation, and prediction sets indicated an outstanding streamflow simulation capacity of the LSTM model at TS10 (approach 1) and TS20 (approach 2). There is, however, a change in the KGE and R² values of LSTM between the different periods at TS2, TS25, and TS30. This finding indicates that the generalization of the LSTM model may be considerably compromised by the appearance of extreme events, since the LSTM memory cell holds previous streamflow values to predict the current streamflow. In the first approach, when the sequence length was 25 days, the model tended to overfit in the testing phase, with a significant decrease in KGE. For both scenarios, the RMSE and MAE decrease with the time step. These results show that the model has the capability to capture the long-term streamflow components, as well as the reliability of LSTM using the second scenario with both approaches.
Figures 6-9 show the best hydrographs and scatterplots of the observed versus predicted daily streamflow during the testing phase. The green line represents the observed daily streamflow, and the blue and purple lines represent the prediction results from the LSTM and FFS-LSTM scenarios, respectively. The time series plots in Figures 6 and 7 show the testing results of the first approach, while Figures 8 and 9 show the same results for the second approach. From the figures, it appears that both scenarios using the first approach (70% of the data for training) produced hydrographs very similar to those of the second approach (considering the hydrological year). This is mainly because the periods of the training and validation datasets were almost identical.
Using the first input scenario, the LSTM reproduces the low-flow data reasonably well; however, at some points it underestimated the flow, which is vital for water supply planning and for preserving a quantity of water for irrigation. Moreover, as extreme peak volumes are essential for monitoring floods and disaster events, the second input scenario nearly captured the peak flow events, with a flow volume of 223 m³/s using approach 1 (Figure 7); the maximum volume captured using approach 2 was 241 m³/s (Figure 9). In the scatterplots (Figures 6b and 8b), the points between the simulated and observed streamflow show that the model underestimated the actual streamflow, since most of the points lie under the 1:1 diagonal line; this should be attributed to the size of the time-series data. The randomly split approaches predicted streamflow well using the LSTM model with the second scenario, and failed with the first scenario. With FFS-LSTM, however, it is important to evaluate the training data period, since model performance may depend on the input data. Therefore, we used an automatic splitting approach, cross-validation, to evaluate the model performance over different training periods.

Evaluation of Model Performance Using Automatic Split
The results of the third approach are summarized in Tables 6 and 7. Only the scores of sequence lengths TS30 for scenario 1 and TS10 for scenario 2 are shown, comparing the effect of the training split period on model performance. The outcomes using FFS-LSTM appear to be much more effective than those of LSTM (scenario 1): it produced higher R², NSE, and KGE in iterations 2 and 5. Lower scores occurred at CV = 1 because the learning data were taken from the end of the series. Decreasing the number of time steps with fewer features impairs the ability to carry informative signals through time, which makes prediction on the test data less efficient and probably erroneous. The performance at 10 and 30 days in approach 3 with both scenarios increases with the number of iterations, because the splitting of the learning period influences the memory of the LSTM network. The bold values of RMSE (Table 6) were compared with the performance of different time steps (Figure 10a). As may be seen in Figure 10a, the RMSE values decrease with the number of time steps at CV5, which shows the superiority of the long short-term memory (LSTM) unit state. However, the performance values of FFS-LSTM (Figure 10b) were quite stable and not affected by the splitting time zone of the data. In addition, at TS = 25 the testing period from 25 January 2007 to 11 November 2008 has a high RMSE of 10.16 m³/s, because the chronological data splitting introduced a discrepancy at this time step. With scenario 1, without adding lagged data as input, the model tends to overfit, although it was reliable at CV5. In addition, when using the lagged data, there is a variation of KGE values during training, validation, and testing in the first four folds. The high RMSE values in CV5 are due to the training and testing periods, where the fifth fold of the cross-validation was tested on a year with a high flow volume of over 349 m³/s (Figure 2). During the testing phase at TS10 using FFS-LSTM at CV1, CV2, CV3, CV4, and CV5, the RMSE values are 8.31 m³/s, 7.05 m³/s, 4.97 m³/s, 4.78 m³/s, and 12.40 m³/s, respectively, with maximum observed streamflows of 232.88 m³/s, 294.78 m³/s, 72.86 m³/s, 75.90 m³/s, and 349.06 m³/s, respectively. The flow volume variation influences the changes in the RMSE outputs. Similar to the first scenario, the high RMSE is related to the flow regime.
For better visualization of the model's performance, only the best hydrograph, using the FFS-LSTM scenario at iteration 5, is shown in Figure 11. Compared to the previous approaches, the prediction values are almost the same due to the size of the learning set, which is nearly identical to those of approaches 1 and 2. However, the output values of FFS-LSTM in approach 2 are slightly higher than those of approaches 1 and 3 in terms of peak streamflow values. Clearly, the model is not strongly affected by the hydrological-year condition in the splitting, nor by the splitting time. The prediction results of the FFS-LSTM model with the third approach overestimated the daily streamflow in the periods of January 2009 and August 2010. Previous studies have reported the importance of input sequence length for the storage capability of the basin, and the sensitivity of this hyperparameter with respect to overfitting [30,33]. In our study, analyzing the sequence length across the three approaches improved the capture of daily streamflow dynamics.

Reliability of LSTM Model
The outstanding results yielded by the FFS-LSTM scenario may be explained by the structure of LSTM, whose architecture is based on memory cells that retain valuable information over long periods by filtering and keeping the data through several gates. The LSTM model's capacity to nearly capture the peak of 241 m³/s carried over into the prediction period. This is a powerful asset of the LSTM model, which also attains relatively high precision for small streamflow volumes. Moreover, in this study we also focused on the impact of the splitting time zone: we compared random splitting over two approaches, one using the start and end of the hydrological year (a 12-month period), with chronological splitting defined by cross-validation, using two scenarios. According to the results, almost all the low values were captured by the model using the three approaches. The overestimation by the model in the testing set was found mostly in the third approach (Figure 11), on account of the dependence of the LSTM model on the hydrological input variables (the learning data). Moreover, the performance of the LSTM model in simulating streamflow using short-term datasets with the first scenario is weaker than the results of conceptual hydrological models, due to the data requirements of deep learning [9]. In most studies that used the LSTM model for hydrological problems, the composition of the 3D inputs was not a priority. In our study, we used the FFS method as a key solution to determine the optimal input combinations at the optimum time step. Hence, in the process of developing the model, the historical input features, as well as the length of the time steps, play an essential role in model performance. The results obtained with FFS demonstrate the capability to choose the appropriate predictors when adjusting the sequence length, so as to avoid the issue of overfitting. In addition, the main cause of overfitting is data scarcity, since all the models were tested on a limited time-series dataset. Hence, the LSTM model may not be sufficiently informed about the watershed hydrological processes, where there may be a discrepancy between streamflow and rainfall; this discrepancy was described by Ávila et al. using hydrological models [56]. It is worth highlighting that the LSTM model is capable of achieving good predictive performance with all approaches at TS10 and TS20. Based on these results, the selection of appropriate 3D input combinations and time lags helps make the LSTM model more reliable. The past history of each variable, with the time steps used in the model, can affect streamflow prediction: it enables the model to capture the direction of the time-series dataset, demonstrates a powerful predictive capacity, and empowers the memory process throughout the LSTM model. In this model, the structure uses information from specific previous computation steps to determine whether it should be passed on to the next iteration. Since the LSTM model processes the data over numerous time steps, the input data are used to update a set of parameters in the internal memory cell states at each step during training. During the prediction period, memory cell states are only influenced by the input at a single time step and the states from the previous time step. However, machine learning methods such as the ANN model lack chronological recall and presume that the model's inputs are independent of one another, making it impossible to detect temporal changes. As a result, the model's memory cells help the LSTM model better capture dataset trends and demonstrate its predictive power. However, the LSTM was unable to predict streamflow when using the default data as inputs. This demonstrates that the datasets were not enough to feed the model to capture
the streamflow.
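The greedy forward-selection loop described above can be sketched as follows. This is a minimal, model-agnostic sketch: the function name and the cheap least-squares scorer are illustrative stand-ins for the study's actual procedure of retraining the LSTM on each candidate feature set and scoring it on the validation set.

```python
import numpy as np

def forward_feature_selection(X, y, score_fn, max_features=None):
    """Greedy forward selection: at each round, add the candidate feature
    that most improves the score; stop when no candidate improves it."""
    n_features = X.shape[1]
    max_features = max_features or n_features
    selected, best_score = [], -np.inf
    while len(selected) < max_features:
        candidates = [j for j in range(n_features) if j not in selected]
        scores = {j: score_fn(X[:, selected + [j]], y) for j in candidates}
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best_score:
            break  # no remaining feature improves the score
        selected.append(j_best)
        best_score = scores[j_best]
    return selected, best_score

def lstsq_score(Xs, y):
    """Negative in-sample MSE of an ordinary least-squares fit -- a cheap
    hypothetical stand-in for training an LSTM on the candidate features
    and evaluating it (e.g. by NSE) on the validation period."""
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    return -float(np.mean((Xs @ coef - y) ** 2))
```

In the study's setting, each candidate "feature" is a meteorological or satellite predictor, and the same search can be repeated for each candidate sequence length (time step) to pick the best combination of predictors and lag.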

Conclusions
Accurate streamflow prediction has always been one of the primary concerns in watershed management. In this work, we studied the flexibility of the data-driven LSTM model for streamflow prediction over a semi-arid region. The LSTM model was tested comprehensively under three input conditions. We conclude that neither the hydrological-year splitting (approach 2) nor the chronological cross-validation splitting (approach 3) significantly affects model performance, and that the appropriate number of time steps depends on the selection of input features. On the other hand, the model showed outstanding performance in reproducing the streamflow time series when the Forward Feature Selection technique was used, whereas it performed poorly with the default data as input features. The FFS method is a rigorous process that significantly improved the prediction accuracy of the LSTM model.
The outcomes of the analyses used in this study illustrate the major issues connected with hydrological modeling, particularly the strong dependence of performance on the LSTM design and on the input conditions. However, some of our findings showed overfitting issues due to the scarcity of useful ground data, which represents a limitation of this study.
In conclusion, the streamflow prediction experiments carried out with the LSTM model, learning from meteorological and satellite data of the studied watershed, were impressive. Nevertheless, the stability of the LSTM model still needs to be investigated, which will be a priority in future studies.

Figure 1. The geographical setting of the study area.

Figure 3. The architecture of the Long Short-Term Memory (LSTM), where σ denotes the sigmoid function, tanh the hyperbolic tangent, C t−1 the previous cell state, h t−1 the previous hidden state, x t the input data, C t the new cell state, and h t the new hidden state. The addition and scaling of information are represented by the vector operations (+) and (×), respectively.
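The gate operations summarized in the caption can be written out explicitly. Using the symbols of Figure 3, with W and b denoting the learned weight matrices and bias vectors of each gate and ⊙ the element-wise product (standard LSTM notation, assumed here):

```latex
\begin{align}
f_t &= \sigma\!\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{(forget gate)}\\
i_t &= \sigma\!\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{(input gate)}\\
\tilde{C}_t &= \tanh\!\left(W_C\,[h_{t-1}, x_t] + b_C\right) && \text{(candidate cell state)}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(new cell state)}\\
o_t &= \sigma\!\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{(output gate)}\\
h_t &= o_t \odot \tanh(C_t) && \text{(new hidden state)}
\end{align}
```

The forget and input gates implement the "filter and keep" behaviour discussed above: f t controls how much of the previous cell state C t−1 is retained, while i t controls how much new information enters the memory.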

Figure 4. Flowchart of the modeling process for the streamflow prediction model.

• Approach 1: splitting the data into 70% training, 15% validation, and 15% testing [5,25]. The training period ran from 1 September 2001 to 14 December 2007, the validation period from 15 December 2007 to 24 April 2009, and the testing period from 25 April 2009 to 31 August 2010.
• Approach 2: splitting the data according to the hydrological year, which starts in September and ends in August: six years for training (1 September 2001-31 August 2007), one year and six months for validation (1 September 2007-28 February 2009), and one year and six months for testing (1 March 2009-31 August 2010).
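The date-based splitting of Approach 1 can be sketched as a simple boundary split on the daily record; the function name and signature are illustrative, not from the paper's code:

```python
from datetime import date, timedelta

def chronological_split(dates, train_end, valid_end):
    """Split a chronologically ordered sequence of dates into train,
    validation, and test index lists; each boundary date belongs to the
    earlier subset."""
    train = [i for i, d in enumerate(dates) if d <= train_end]
    valid = [i for i, d in enumerate(dates) if train_end < d <= valid_end]
    test  = [i for i, d in enumerate(dates) if d > valid_end]
    return train, valid, test

# Approach 1 boundaries applied to the 1 Sep 2001 - 31 Aug 2010 daily record
start, end = date(2001, 9, 1), date(2010, 8, 31)
dates = [start + timedelta(days=i) for i in range((end - start).days + 1)]
train, valid, test = chronological_split(dates, date(2007, 12, 14), date(2009, 4, 24))
```

Approach 2 uses the same mechanism with boundaries moved to hydrological-year edges (31 August 2007 and 28 February 2009).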

Figure 6. The hydrograph of observed and predicted daily streamflow of the LSTM scenario during testing (a), using approach 1 at TS = 30, along with the corresponding scatter plot (b).

Figure 7. The hydrograph of observed and predicted daily streamflow of the FFS-LSTM scenario during testing (a), using approach 1 at TS = 10, along with the corresponding scatter plot (b).

Figure 8. The hydrograph of observed and predicted daily streamflow of the LSTM scenario during testing (a), using approach 2 at TS = 25, along with the corresponding scatter plot (b).

Figure 9. The hydrograph of observed and predicted daily streamflow of the FFS-LSTM scenario during testing (a), using approach 2 at TS = 20, with the corresponding scatter plot (b).

Figure 10. Comparison of model predictions at different time steps and with different data splittings during the testing phase using approach 3: (a) LSTM; (b) FFS-LSTM.

Figure 11. The hydrograph of observed and predicted daily streamflow of the FFS-LSTM scenario during testing (a), using approach 3 (CV = 5) at TS = 10, with the corresponding scatter plot (b).

Table 3. Hyper-parameters used for training the LSTM network.

Table 4. Performance of the model scenarios for daily streamflow simulation using approach 1 (70% training; 15% validation; 15% testing) with a variety of sequence lengths (numbers of time steps).

Table 5. Performance of the model scenarios for daily streamflow simulation using approach 2 (considering the hydrological year) with a variety of sequence lengths (numbers of time steps).

Table 6. Performance of the LSTM scenario for daily streamflow simulation using approach 3 (cross-validation) at TS = 30.