Assessment of Short Term Rainfall and Stream Flows in South Australia

The aim of this study is to assess the relationship between rainfall and stream flow at Broughton River in Mooroola, Torrance River in Mount Pleasant, and Wakefield River near Rhynie, in South Australia, from 1990 to 2010. Initially, we present a short term relationship between rainfall and stream flow, in terms of correlations, lagged correlations, and estimated variability between wavelet coefficients at each level. A deterministic regression based response model is used to detect linear, quadratic and polynomial trends, while allowing for seasonality effects. Antecedent rainfall data were considered to predict stream flow. The best fitting model was selected based on the maximum adjusted R² value ($R^2_{adj}$), minimum sigma squared (σ²), and minimum Akaike Information Criterion (AIC). The best performance in the response model is obtained with lagged rainfall, covering at least one day and up to 7 days (past) difference in rainfall, including offset cross products of lag rainfall. With the inclusion of antecedent stream flow as an input with a one day time lag, the $R^2_{adj}$ values improve significantly from 0.18, 0.26 and 0.14 to 0.35, 0.42 and 0.21 at Broughton River, Torrance River and Wakefield River, respectively. A benchmark comparison was made with an Artificial Neural Network analysis. The optimization strategy involved adopting a minimum mean absolute error (MAE).


Introduction
A review of rainfall-runoff modeling has been given by [1]. Rainfall and stream flow models can be applied to a diverse range of purposes including daily control of reservoirs, projecting future stream flows and flood management. Rainfall and stream flow models can be classified as physically based, conceptual and empirical. Physically based models include the Système Hydrologique Européen with sediment and solute transport [2] and Gridded Surface Subsurface Hydrologic Analysis [3], both of which require extensive spatial and temporal data and are typically used for small catchments. An example of a conceptual model is the Modèle du Génie Rural à 4 paramètres Journalier (GR4J), which has been developed for understanding catchment hydrological behavior [4]. Other examples of conceptual rainfall-runoff models are the Sacramento Soil Moisture Accounting Model [5] and the SIMulation and HYDrologic model (SIMHYD) [6], which can be applied either as a lumped or gridded application. SIMHYD estimates daily stream flows from daily rainfall and areal potential evapotranspiration data. The class of empirical models includes time series models [7][8][9][10][11][12][13]. An advantage of an empirical model is that it can be fitted to situations where the hydrological data are restricted to rainfall and stream flow time series. A further advantage is that, in a parametric test, a distribution can be fitted for assessing the hydrological behavior for any time period in any region. In addition, empirical models can represent either linear or non-linear relationships. Time series models have been reported to perform as well as physically based alternatives [14]. A conceptual model has been combined with an artificial neural network (ANN) for forecasting inflow into the Daecheong Dam in Korea [15]. The wavelet decompositions of rainfall and runoff at four sites in the Tianshan Mountains were compared in [16]. The authors aimed to distinguish between errors in timing and errors in magnitude of hydrograph peaks. They used a cross-wavelet technique to quantify timing errors and hence provided an empirical adjustment to model predictions of stream flow.
In this study, we have proposed a novel method for assessing short-term rainfall and stream flow models. The travel time between rainfall and stream flow gauges has been estimated using cross-correlation functions [10,17]. Those studies reported that the travel time was less than one day for the Onkaparinga catchment in South Australia. In this paper, we presume that there is a higher order relationship between rainfall gauge and stream flow data. It is, therefore, important in this study to construct the correlation structure. Linear regression models are commonly used for time series analysis [18], particularly for assessing evidence of trends, higher order changes and variability, including allowing for seasonality. We developed deseasonalized and detrended time series rainfall and stream flow models from deterministic regression models including linear, quadratic and cubic terms. These models take account of both lag rainfall and the influence of stream flow. The results of this study will be useful for water managers and policy makers involved in sustainable water resource management and climate change adaptation for the catchments used in this study. ANNs are capable of modeling the non-linear relationships between inputs and outputs [19]. The first advantage of an ANN is that it only requires a small number of parameters and learns through a number of training iterations that adjust the parameters (weights) of the network [20]. A second advantage is that it is useful in situations where it is complex to build a physical or conceptual model, such as hydrological modeling of rainfall-stream flow processes [21][22][23][24][25].
ANN models were useful for finding the relationships between rainfall and river flow data in a river basin in India [26]. We present a statistical approach that uses the deterministic features of a regression model to build many neural networks with a combination of different lagged input patterns. A wavelet based regression model for stream flow, using the discrete wavelet transform (DWT) of the entire time series, was developed in [27]. The authors also provided a comparison of their model performance with ANN. A chaotic stream flow model using an ensemble wavelet network was proposed in [28]. Wavelet analyses of rainfall and runoff and wavelet rainfall-runoff cross-analyses have been used to investigate the temporal variability of the rainfall-runoff relationship [17,29]. These studies found that wavelet transforms provide a physical explanation of the temporal structure of the catchment response.

Data Collection and Preparation
The analysis is based on data from three rainfall and stream flow stations in South Australia, as presented in Figure 1. The Broughton River (BR) station is at Mooroola, which is located approximately 40 km north of Port Broughton and 20 km south west of Port Pirie. The Torrance River (TR) station is located at Mount Pleasant, and its rivers and tributaries are highly variable in flow and together drain an area of 508 km². Wakefield River (WR) is an ephemeral river near Rhynie, with a catchment area of approximately 1913 km². The elevation of the river may indicate the hydrological feature, presented in Table 1, Column 4. These stations were selected because they had long records of rainfall and stream flow and the highest quality control in terms of the Australian Bureau of Meteorology [30] and the Department for Environment, Water and Natural Resources [31] quality designations for rainfall and stream flow records. Information on these stations and data quality is presented in Table 1. In this paper, there was less than 1% missing data and these were replaced by the mean of the series of rainfall and stream flow, to give an unbroken time series for analysis. Methods for replacing periods of missing values are discussed in [18,32]. In this paper, we propose a dyadic signal time period (i.e., $2^n$, where n is an integer and n ≥ 0) for assessing the relationship between daily rainfall and stream flow during the period 1990-2012. We observe the discrete time series $\{y_t\}$, where t is an integer index over the length of the series. We extract multi-level information of observed rainfall and stream flow series in three catchments in South Australia using the Haar wavelet decomposition. We split $\{y_t\}$ into 10 sub-time series whose length is a power of two, i.e., $2^n$, where n is the level of the time series, starting from 0. We also investigate the correlation between rainfall and stream flow patterns for each sub-series from levels 0 to 8.

Assessing the Relationship between Rainfall and Stream Flow
The open source software R [33] was used for the analyses in this paper. We calculate 10 subseries of rainfall and stream flow from 1990 to 2012 using the "wavethresh" R package [34,35] for assessing the relationship between rainfall and stream flow. The length of time taken into account in each of the 10 subseries for rainfall and stream flow is a period of 512 days.
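To make the level structure concrete, the following minimal Python sketch decomposes one 512-day sub-series with an orthonormal Haar transform. It is an illustration only and not the wavethresh routine used in this study: the function name `haar_levels` and the synthetic input are hypothetical, and the level numbering (level 0 coarsest, level 8 finest) is assumed to follow the usual wavethresh convention.

```python
import math

def haar_levels(series):
    """Return ({level: detail coefficients}, overall smooth coefficient).

    The series length must be a power of two, 2**J. Level J - 1 is the
    finest scale and level 0 the coarsest (assumed numbering).
    """
    n = len(series)
    J = n.bit_length() - 1
    if n < 2 or 2 ** J != n:
        raise ValueError("series length must be a power of two (dyadic)")
    smooth = list(series)
    details = {}
    # Work from the finest level (J - 1) down to the coarsest level (0).
    for level in range(J - 1, -1, -1):
        pairs = [(smooth[2 * i], smooth[2 * i + 1]) for i in range(len(smooth) // 2)]
        details[level] = [(a - b) / math.sqrt(2) for a, b in pairs]
        smooth = [(a + b) / math.sqrt(2) for a, b in pairs]
    return details, smooth[0]

# Synthetic 512-day "rainfall" sub-series, for illustration only.
rain = [max(0.0, math.sin(2 * math.pi * t / 365.25)) * 5.0 for t in range(512)]
details, overall = haar_levels(rain)
level_sizes = {level: len(coeffs) for level, coeffs in details.items()}
```

A series of 512 = 2⁹ days yields nine detail levels, 0 to 8, with 2^level coefficients at each level, which matches the range of levels analysed here.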
The relationship between rainfall and stream flow within the 10 subseries is presented in Figure 2. The maximum correlation coefficients are 0.08, 0.23 and 0.31 at Broughton River, Torrance River and Wakefield River, respectively. These values are between −1 and +1 in all cases, indicating the degree of linear dependence between rainfall and stream flow. For assessing short term spatial variability, a correlation coefficient of the sub-series of rainfall and stream flow less than 0.4 indicates a significant difference from 0 at each station. For example, in sub-series 2, the correlation coefficients were 0.04, 0.15 and 0.28, which indicates only weak linear dependence between rainfall and stream flow at Broughton River, Torrance River and Wakefield River, respectively. In order to understand stream flow availability under the climatic conditions in South Australia, we investigated the characteristics of rainfall and stream flow patterns, as categorized by climatic phenomena. A statistical measure of the dispersion of rainfall and stream flow patterns around the mean is defined as follows:

$$CV = \frac{S_x}{\mu_x}$$

where CV is the coefficient of variation, given by the ratio of the standard deviation ($S_x$) to the mean ($\mu_x$). Table 2 shows the degree of variation in rainfall and stream flow patterns. In Table 2, the CV for stream flow patterns indicates higher variability than for the rainfall series. Figure 3 shows the variability of the wavelet coefficients from levels 0 to 8. There is evidence of a strong association between the rainfall and stream flow coefficients at the 5% significance level in Table 1.
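The two summary statistics used in this section can be sketched in a few lines of Python. This is a hedged illustration: the function names and the example values are hypothetical, and the sample standard deviation (divisor n − 1) is an assumption, since the text does not state which divisor was used.

```python
import math

def coefficient_of_variation(values):
    """CV = standard deviation / mean, using the sample SD (assumed n - 1 divisor)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return sd / mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values, for illustration only (not data from this study).
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
cv = coefficient_of_variation(sample)
r_perfect = pearson_r(sample, [2.0 * v + 1.0 for v in sample])
```

Applied to each 512-day sub-series of rainfall and stream flow, `pearson_r` would give one correlation per sub-series, as plotted in Figure 2.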

Correlation Structures between Rainfall and Stream Flow
In the previous sections, we calculated wavelet coefficients for each subset of the rainfall and stream flow series. In order to filter each of those series, we applied Haar wavelets.
The constructed correlation pattern for each rainfall and stream flow sub-series for levels 0 to 8 is given by: where $r_k$ is the constructed correlation with level n from 0 to 8 and is the jth sub-series of the rainfall and stream flow wavelet decomposed with the Haar procedure. The results are presented in Table 3.
The evidence of significant correlation (r ≥ 0.50) between the rainfall and stream flow wavelet coefficient series, with at least a 5% significance level, is shown in Table 3. Furthermore, to avoid co-linearity problems, the squared rainfall and stream flow wavelet coefficient series are also included. We found a correlation structure (r = 0.56) in which stream flow is determined by rainfall over at least 4 days, at the 5% level, at the Broughton River Basin and Torrance River Basin, as shown in Table 3. The adjusted squared stream flow and rainfall show little evidence of correlation (i.e., at the 5% level) up to 64 days at Torrance River, and there is a marginal correlation (r = 0.51) up to 128 days between squared adjusted rainfall and adjusted stream flow at Wakefield River. The rainfall and stream flow relationship was used to develop a response model for predicting stream flow.

Rainfall-Stream Flow Response Modeling
The constructed correlations described in the previous section may be partly due to common seasonal variations and trends, so a first step is to estimate these deterministic features with regression models for the entire period from 1990 to 2010. The residuals from these regressions form the deseasonalized and detrended (dsdt) time series. For all three stations, a cubic trend gave a statistically improved fit over a linear or quadratic trend over the study period. The seasonal variation was reasonably modelled by a sinusoidal curve. Therefore, the regression models are of the form:

$$T_i = \beta_0 + \beta_1\,\mathrm{time} + \beta_2\,\mathrm{time}^2 + \beta_3\,\mathrm{time}^3 + \beta_4 C + \beta_5 S + \varepsilon_t$$

where $T_i$ represents either rainfall or stream flow; time is the mean adjusted time, that is $(t - \bar{t})$, where t is the number of days from the start of the record and $\bar{t}$ is the mean of t; time² and time³ allow for possible quadratic and cubic trends; C is cos(2πt/365.25) and S is sin(2πt/365.25), and together these allow for seasonal variation with a period of one cycle per year; $\beta_j$ are the unknown coefficients to be estimated; and $\varepsilon_t$ are random variations with mean 0 and constant standard deviation.
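The deseasonalizing and detrending step can be sketched as an ordinary least squares fit of a cubic trend plus one annual harmonic, with the residuals kept as the dsdt series. The sketch below is a minimal pure-Python illustration and not the R code used in this study. The function names are hypothetical, and the mean adjusted time is additionally divided by its maximum, which is an assumption made here only for numerical stability (it rescales the trend coefficients but leaves the residuals unchanged).

```python
import math

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def design_row(t, t_mean, t_scale):
    """Regressors [1, time, time^2, time^3, C, S] for day t."""
    time = (t - t_mean) / t_scale  # mean adjusted time, rescaled (assumption)
    angle = 2 * math.pi * t / 365.25
    return [1.0, time, time ** 2, time ** 3, math.cos(angle), math.sin(angle)]

def fit_dsdt(y):
    """Return (coefficients, residuals); the residuals are the dsdt series."""
    n = len(y)
    t_mean = (n - 1) / 2
    t_scale = max(t_mean, 1.0)
    X = [design_row(t, t_mean, t_scale) for t in range(n)]
    p = len(X[0])
    # Normal equations (X'X) beta = X'y.
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    return beta, [yi - fi for yi, fi in zip(y, fitted)]

# Synthetic four-year series with a known trend and seasonal cycle.
n_days = 1461
t_mean = (n_days - 1) / 2
rows = [design_row(t, t_mean, t_mean) for t in range(n_days)]
y = [3.0 + 0.5 * row[1] + 2.0 * row[4] for row in rows]
beta, residuals = fit_dsdt(y)
```

On the synthetic series the fit recovers the intercept, linear and cosine coefficients and leaves residuals that are numerically zero; with observed rainfall or stream flow, the residuals would carry the variation not explained by trend and season.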
For the estimated coefficients, only a few values are significantly different from 0, even at the 5% significance level, as shown in Table 4. There is evidence of significantly different trends in rainfall at Wakefield River, which may correspond to increased stream flows if rainfall increases. We predict the stream flow ($Y_t$) on day t from rainfall ($X_t$) at corresponding lags k. This is referred to as a Response Model (RM). The regression is defined as: We assess stream flow in response to rainfall at lags 0 to 128. The best fitted model is selected based on the adjusted coefficient of determination ($R^2_{adj}$), minimum sigma squared (σ²) and the Akaike Information Criterion (AIC). The AIC is defined as:

$$AIC = 2 \times \text{number of parameters} - 2\log(L) \qquad (6)$$

where L is the maximized value of the likelihood function for the estimated model. Comparisons of the AIC for the different models are shown in Table 5. The $R^2_{adj}$ value reduces significantly and the estimated stream flow influence is close to zero after the exogenous rainfall at lag 7.
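The two selection criteria can be sketched as follows. This is a minimal illustration under stated assumptions: the errors are taken to be Gaussian so that the maximized log-likelihood has a closed form, the adjusted R² uses the standard degrees-of-freedom correction, and the function names and numbers are hypothetical rather than taken from Table 5.

```python
import math

def adjusted_r2(y, fitted, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - p - 1), p = number of predictors."""
    n = len(y)
    mean_y = sum(y) / n
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

def gaussian_aic(y, fitted, n_parameters):
    """AIC = 2 * number of parameters - 2 log(L), with a Gaussian likelihood (assumption)."""
    n = len(y)
    sigma2 = sum((v - f) ** 2 for v, f in zip(y, fitted)) / n  # ML variance estimate
    log_l = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1.0)
    return 2 * n_parameters - 2 * log_l

# Hypothetical observed and fitted values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
fitted = [1.1, 1.9, 3.2, 3.8, 5.0]
r2_adj = adjusted_r2(observed, fitted, n_predictors=1)
aic_small = gaussian_aic(observed, fitted, n_parameters=2)
aic_large = gaussian_aic(observed, fitted, n_parameters=3)
```

For the same fit, each extra parameter raises the AIC by exactly 2, which is the penalty that discourages retaining lags whose coefficients are close to zero.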
Therefore, we reduced the exogenous rainfall lags from 128 to 7 in the response model, referred to as RM0 in Table 5. This strategy is sub-optimal inasmuch as rejected terms might meet the retention criterion if added back individually. However, any small improvement in $R^2_{adj}$ would be balanced by increased complexity in the model, which is undesirable if interaction and squared terms are added. The regression model is defined as RM:

In the second model, we add deterministic features to the regression model including linear, quadratic and cubic terms of t, allowing for seasonality effects. This model is defined as RM_D:

The third model is defined as RMD_AR[1] and is an autoregressive model of order 1 (AR[1]) with RM_D. It can be written in the form:

The fourth model is defined as RMD_AR[2], and is an autoregressive model of order 2 (AR[2]) with RMD_AR[1]. It can be written in the form:

Finally, we develop a model for a benchmark comparison of stream flow on day t based on the entire previous period of stream flow and its influence (τ), added to model RM_D. This model is defined as RMD_tau. Tau (τ) is 0 if there is no stream flow influence from the previous day's rainfall. An example of counting the stream flow influence is given in Table 6.

Table 6. An example of count tau and stream flow influence rainfall over time.

In Table 6, when the day is t = 6 and $Y_6$ = 2, we count tau = 3 (the number of zeros), and $Y_{6-3-1}$ = 9 can be applied in the model RMD_tau.
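The counting rule in the Table 6 example can be sketched as below. This is one reading of that example, not the authors' code: tau is taken to be the number of consecutive zero-flow days immediately before day t, and the function names are hypothetical. Only the values $Y_2$ = 9, three zeros and $Y_6$ = 2 come from the text; the value used for day 1 is made up to complete the series.

```python
def count_tau(flow, t):
    """Number of consecutive zero-flow days immediately before day t (days are 1-based)."""
    tau = 0
    day = t - 1
    while day >= 1 and flow[day - 1] == 0:
        tau += 1
        day -= 1
    return tau

def flow_before_dry_spell(flow, t):
    """Return Y_{t - tau - 1}, the last non-zero flow before the dry spell, or None."""
    index = t - count_tau(flow, t) - 1
    return flow[index - 1] if index >= 1 else None

# Days 1..6; the day-1 value (4) is hypothetical, the rest follow the Table 6 example.
flow = [4, 9, 0, 0, 0, 2]
tau_day6 = count_tau(flow, 6)
lagged_day6 = flow_before_dry_spell(flow, 6)
```

For day 6 this gives tau = 3 and a lagged flow of 9, in agreement with the worked example.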
The model RMD_tau can be written in the form: The fitted model for predicted stream flow in response to exogenous rainfall, deterministic features of the regression model, and previous stream flow influence, is presented in Table 5. The best fitting model selection was based on minimum AIC and minimum root mean square error (RMSE). The RMSE is defined as:

$$RMSE = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(Y_t - \hat{Y}_t\right)^2}$$

where $\hat{Y}_t$ is the estimated stream flow, $Y_t$ is the observed stream flow, and n is the number of observations.
The response model RM0 has 128 predictor variables, namely the rainfall lags at 0 to 128. Therefore, there are 129 parameters to estimate including the intercept. The estimated rainfall effects are confined to lags of 0 to 7 days, so we reduced the rainfall lags from 128 to 7 days, and the optimized $R^2_{adj}$ values for this model are 0.16, 0.24 and 0.13 for Broughton River, Torrance River and Wakefield River, respectively, as presented in Table 5. We also offset the cross product term of lags to further reduce the complexity of this model. The second model included linear, quadratic and cubic terms, and this model is denoted as RM_D. The number of parameters to be estimated is therefore 8 + 3 = 11 and the $R^2_{adj}$ increased to 0.18, 0.26 and 0.14 for Broughton River, Torrance River and Wakefield River, respectively, which is a practical and statistically significant improvement. We then added a first order autoregressive term, referred to as the RMD_AR[1] model, and a second order autoregressive term, referred to as the RMD_AR[2] model. We also made a benchmark comparison by using the entire stream flow record and this model is denoted RMD_tau, as presented in Table 5.
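As an illustration of how the lagged rainfall regressors for a response model can be assembled, the sketch below builds one row per day from rainfall at lags 0 to a chosen maximum. It is a minimal sketch, not the code used in this study: the function name is hypothetical, and the optional intercept column is an assumption because the text does not make clear whether the intercept is counted among the parameters of the reduced model.

```python
def lagged_design(rain, max_lag, intercept=True):
    """Rows [1, X_t, X_{t-1}, ..., X_{t-max_lag}] for every day t with a full lag history."""
    rows = []
    for t in range(max_lag, len(rain)):
        lags = [rain[t - k] for k in range(max_lag + 1)]
        rows.append(([1.0] if intercept else []) + lags)
    return rows

# Hypothetical ten-day rainfall record, for illustration only.
rain = [float(v) for v in range(10)]
rows = lagged_design(rain, max_lag=7)
n_lag_terms = len(rows[0]) - 1  # lags 0..7
```

With a maximum lag of 7 there are eight rainfall terms per row, and the first seven days of the record are dropped because their lag history is incomplete.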
In Table 5, there is evidence of improvement in the $R^2_{adj}$ values and in the RMSE (in m³ s⁻¹) from RM to RM_D.
Adding an autoregressive term of order 1 (AR[1]) to RM_D results in substantially improved $R^2_{adj}$ values (from 0.18, 0.26 and 0.14 to 0.35, 0.42 and 0.21 for Broughton River, Torrance River and Wakefield River, respectively). Furthermore, when adding an autoregressive term of order 2 (AR[2]) to RMD_AR[1], there is evidence of improvement but this may be offset by the increasing number of parameters, which adds to the complexity of the model. In addition, the RMD_tau model represents a small improvement for two of the three river basins. The best fitted models, RMD_tau for Broughton River, RMD_AR[2] for Torrance River and RMD_tau for Wakefield River, were selected based on the minimum Akaike Information Criterion (AIC) and minimum root mean square error (RMSE) in m³ s⁻¹. The residuals from the best fitted models were transformed to normalized form by factor multiplication. A factor was calculated, which allows for the fact that the mean of a non-linear function of a random variable is not equal to that function of the mean. The transformed series follow a normalized form with mean (μ) of zero, variance (σ²) of 1 and a random disturbance term ($\varepsilon_t$) which is uncorrelated.
The transformed series were used to predict the stream flow on day t based on the predicted stream flow influence over the short term, as shown in Figure 4.
In Figure 4, we demonstrate the versatility of stream flow prediction. It can be seen that this is a non-linear relationship when expressed in terms of the physical interpretation of stream flow based on rainfall.

Modeling Stream Flow Using an Artificial Neural Network
Artificial neural network (ANN) techniques are motivated by the principles of biological nervous systems [36]. Although there are different types of ANN, the multilayer feed forward network is the most commonly used technique. For example, a common approach is training with back-propagation in a multi-layer feed forward network [23]. The network consists of input, hidden and output layers. Each layer is fully connected with the preceding layer, with a weight on each connection, as shown in Figure 5. In Figure 5, the number of nodes in the input layer is p, the number of nodes in the hidden layer is q and the number of nodes in the output layer is r. The initially assigned random weights are updated during the training process by comparing the predicted output and the known output for errors. Errors are then back-propagated to adjust the weights.

The dsdt daily rainfall and stream flow data from the regression model developed in the previous section are used to develop a prediction model for each of the three river basins for the years 1990 to 2010. Certain steps have been proposed for this purpose, such as input selection, model architecture selection, model calibration (training) and validation (testing) [37]. In addition, we emphasize that the ANN set-up has to be carefully carried out and described to obtain reliable results. This study describes the steps in building the prediction models for stream flow. We consider the prediction function as:

$$S_{t+1} = f(S_t, S_{t-1}, S_{t-2}, \ldots, S_{t-m}, R_t, R_{t-1}, R_{t-2}, \ldots, R_{t-n})$$

where S represents stream flow, R represents rainfall, t is the current day, m ∈ {3, ..., 8}, n ∈ {3, ..., 8} and f represents the ANN as a regression function. We investigate the necessary lagged inputs of rainfall and river flow for modeling the river flows at three locations in South Australia. ANN models are developed with all combinations of rainfall and river flow input ranges. In addition, a standard range of nodes in the hidden layer is also considered. Among all models based on inputs and hidden nodes, the best model is selected based on the mean absolute error criterion. This entire process is applied to all three locations. ANN models capture the non-linear relationships of rainfall and river flow patterns in modeling river flows from large time series data. For example, if we consider 3 days lag of stream flow and 5 days lag of rainfall, then the total number of input nodes in the ANN structure will be 8, and we consider the number of nodes in the hidden layer ranging from 1 to 10. To achieve the best model using ANN for each location, we not only apply all inputs in combination, but also set a range of parameters, such as different numbers of nodes in the hidden layer, for each combination of inputs.
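The layered structure with p input, q hidden and r output nodes can be illustrated by a forward pass through a small fully connected network. The sketch below is a hedged, minimal example and not the RWeka model used in this study: the function names are hypothetical, each node is assumed to have a bias weight, and a sigmoid is assumed on the output node, which suits targets normalized to the range 0 to 1.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_network(p, q, r, seed=0):
    """Random initial weights; each node holds a bias followed by one weight per input."""
    rng = random.Random(seed)
    hidden = [[rng.uniform(-0.5, 0.5) for _ in range(p + 1)] for _ in range(q)]
    output = [[rng.uniform(-0.5, 0.5) for _ in range(q + 1)] for _ in range(r)]
    return hidden, output

def forward(network, inputs):
    """Propagate one input vector through the hidden and output layers."""
    hidden, output = network
    h = [sigmoid(w[0] + sum(wi * x for wi, x in zip(w[1:], inputs))) for w in hidden]
    return [sigmoid(w[0] + sum(wi * hj for wi, hj in zip(w[1:], h))) for w in output]

def n_weights(p, q, r):
    """Number of adjustable weights, including biases (assumed)."""
    return (p + 1) * q + (q + 1) * r

# 3 stream flow lags + 5 rainfall lags = 8 inputs, 10 hidden nodes, 1 output.
network = init_network(p=8, q=10, r=1)
prediction = forward(network, [0.1] * 8)
total_weights = n_weights(8, 10, 1)
```

Training would then adjust these weights by back-propagating the prediction error; the forward pass alone is enough to show how the number of lagged inputs and hidden nodes determines the size of the model.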
In predicting stream flow one day ahead as output, we consider stream flow and rainfall with combinations of consecutive lags where the minimum lag is 3 days and the maximum lag is 8 days. Thus, for each location, the total number of models to be trained becomes 36. As the data set is large, one year of data is considered initially for testing. For training ANN models at each location, we consider stream flow and rainfall data for the period 1990 to 2009. The remaining data for the year 2010 is used for testing the best model found in the training phase.
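The search over lag combinations can be sketched as follows. The sketch is illustrative and its names are hypothetical. It assumes, following the worked example of 3 stream flow lags plus 5 rainfall lags giving 8 input nodes, that a lag of m days means the m most recent stream flow values and a lag of n days the n most recent rainfall values.

```python
from itertools import product

LAGS = range(3, 9)  # 3 to 8 days inclusive

def lag_combinations():
    """All (stream flow lag m, rainfall lag n) pairs."""
    return list(product(LAGS, LAGS))

def make_sample(flow, rain, t, m, n):
    """Inputs [S_t..S_{t-m+1}, R_t..R_{t-n+1}] and target S_{t+1} (0-based day t)."""
    if t - max(m, n) + 1 < 0 or t + 1 >= len(flow):
        raise ValueError("not enough history or no next-day target for day t")
    inputs = [flow[t - i] for i in range(m)] + [rain[t - j] for j in range(n)]
    return inputs, flow[t + 1]

combos = lag_combinations()

# Hypothetical series, for illustration only.
flow = [float(v) for v in range(20)]
rain = [10.0 * v for v in range(20)]
inputs, target = make_sample(flow, rain, t=10, m=3, n=5)

# One candidate network per lag pair and per hidden-layer size from 1 to 10.
n_candidates = len(combos) * 10
```

Six choices for each of the two lags give the 36 input combinations; combined with hidden-layer sizes from 1 to 10, this corresponds to 360 candidate networks per location under the assumptions above.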
The ANN stream flow prediction model was built with the Multilayer Perceptron (MLP) function of the RWeka package in the R language [38]. One of the important parameters to specify is the number of nodes in the hidden layer, which may vary for time series modeling in different locations. Using trial and error, the number of nodes in the hidden layer is varied from 1 to 10. This range is widely used in hydrological time series modeling [21]. We set the learning rate (the amount by which the weights are updated) to 0.3, the momentum to 0.2 and the number of training epochs to 500.
Back-propagation with a sigmoidal function was applied to the normalized data in the MLP function. Furthermore, the mean absolute error (MAE) in m³ s⁻¹ was minimized through an iterative process that varied the number of nodes in the hidden layer.
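The error measures and the selection rule can be sketched in a few lines. This is a minimal illustration with hypothetical function names and made-up numbers; it is not output from the models in this study.

```python
import math

def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def best_by_mae(observed, candidates):
    """Return the key of the candidate prediction series with the lowest MAE."""
    return min(candidates, key=lambda name: mae(observed, candidates[name]))

# Hypothetical observed flows and predictions from two candidate networks.
observed = [1.0, 2.0, 3.0]
candidates = {
    "hidden_nodes_1": [2.0, 2.0, 5.0],
    "hidden_nodes_2": [1.5, 2.0, 3.5],
}
errors = {name: mae(observed, pred) for name, pred in candidates.items()}
best = best_by_mae(observed, candidates)
```

Because the MAE weights all errors equally while the RMSE penalizes large errors more heavily, the two criteria can rank candidate models differently when a few peak flows are poorly predicted.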
The best lag combination at each location is presented in Figure 6, and the lowest root mean square error (RMSE) and mean absolute error (MAE) for each location are presented in Table 7. For Broughton River, 3 days rainfall and 6 days stream flow as lagged inputs with 9 nodes in the hidden layer produce the lowest MAE.
At Torrance River, 3 days rainfall and 8 days stream flow as lagged inputs with 2 nodes in the hidden layer produce the lowest MAE. For Wakefield River, 4 days rainfall and 5 days stream flow as lagged inputs with only one node in the hidden layer produce the lowest MAE. This indicates the variability in the ANN models for different locations.
When the best model has been identified based on the training data for each location, we use this model to predict the testing data. We show the prediction results for the testing data for each location. Figure 7 shows the predicted and observed stream flows using testing data for Broughton River, Torrance River and Wakefield River, respectively. The MAE for training and testing data is shown in Figure 8 for all three locations. We observed that the MAE for the training and testing data at Broughton and Torrance Rivers does not vary significantly.
For Broughton, in training, the best ANN model structure includes 3 days lagged rainfall and 6 days lagged stream flow as inputs with 9 nodes in the hidden layer. This model has the lowest MAE, at 45.53 m³ s⁻¹. We further use this best model for testing and find an MAE of 32.43 m³ s⁻¹. For Torrance, the best ANN model in training has 3 days lagged rainfall and 8 days lagged stream flow as inputs with 2 nodes in the hidden layer, achieving an MAE of 4.89 m³ s⁻¹. For the testing data, this model gives an MAE of 9.27 m³ s⁻¹. In the case of Wakefield, the best ANN model has 4 days lagged rainfall and 5 days lagged stream flow as inputs with 1 node in the hidden layer, achieving an MAE of 19.28 m³ s⁻¹. For the testing data, this model achieves an MAE of 42.88 m³ s⁻¹. The difference in MAE between the training and testing phases could be due to this river's ephemeral nature and its substantial dependence on rainfall.

Conclusions
Initially, we split the whole series with a dyadic signal process for assessing the short term relationship between rainfall and stream flow, including correlation using Haar wavelets. We have presented an innovative idea for the hydrological community for assessing stream flow for any catchment. In particular, the end user could assess the variability of changes and construct higher order correlations from 2 days up to as long as required. In addition, this study would be helpful for predicting stream flows using deterministic regression techniques, particularly where there is evidence of changes of statistical distribution characteristics, which is important for Water Sensitive Urban Design, as clearly demonstrated in [39]. Using a deterministic regression based response model, we found an increasing trend in stream flow when rainfall increased significantly. Predicted stream flow was more influenced by the previous few days' stream flows than when considering the entire previous period of stream flow. We also developed artificial neural network models for three locations. The results show that the influence of lagged rainfall and stream flow lies within a short temporal window. The results demonstrate that the ANN models perform better for Broughton and Torrance River in capturing the rainfall and stream flow relationships.

Figure 2. Correlation pattern subseries of rainfall and stream flow time series.

Figure 3. Standard deviations of wavelet coefficients of rainfall and stream flow from level 0 to 8. (a) Rainfall; (b) Stream flow.

Figure 4. Predicted stream flow based on dsdt rainfall for (a) Broughton River; (b) Torrance River; and (c) Wakefield River from 1990 to 2010.

Figure 5. A schematic ANN including input, hidden and output layers.

Figure 6. MAE for training data (1990-2009) using ANN with best lag combinations at each location, units in m³ s⁻¹.

Figure 7. Observed and predicted stream flow for (a) Broughton River; (b) Torrance River; and (c) Wakefield River for the year 2010.

Figure 8. Comparison of MAE for training and testing data, units are in m³ s⁻¹.

Table 1. Weather stations information, data quality and observations.

Table 2. Rainfall and stream flow variability at Broughton River, Torrance River and Wakefield River in South Australia (SA) from 1990 to 2011.

Table 3. Constructed correlation pattern for different levels between (a) adjusted rainfall and adjusted stream flow; (b) squared adjusted rainfall and adjusted stream flow; (c) adjusted rainfall and squared adjusted stream flow; (d) squared adjusted rainfall and squared adjusted stream flow.

Table 4. Estimated coefficients of rainfall and stream flow variability from 1990 to 2012.

Table 5. Fitted regression model for Broughton River, Torrance River and Wakefield River.

Table 7. Best prediction model based on $R^2_{adj}$, lowest RMSE and MAE on the training data; RMSE and MAE are in m³ s⁻¹.