Short-term Forecasting of Water Yield from Forested Catchments after Bushfire: a Case Study from Southeast Australia

Forested catchments in southeast Australia play an important role in supplying water to major cities. Over the past decades, vegetation cover in this area has been affected by major bushfires that in return influence water yield. This study tests methods for forecasting water yield after bushfire, in a forested catchment in southeast Australia. Precipitation and remotely sensed Normalized Difference Vegetation Index (NDVI) were selected as the main predictor variables. Cross-correlation results show that water yield with time lag equal to 1 can be used as an additional predictor variable. Input variables and water yield observations were set based on 16-day time series, from 20 January 2003 to 20 January 2012. Four data-driven models namely Non-Linear Multivariate Regression (NLMR), K-Nearest Neighbor (KNN), non-linear Autoregressive with External Input based Artificial Neural Networks (NARX-ANN), and Symbolic Regression (SR) were employed for this study. Results showed that NARX-ANN outperforms other models across all goodness-of-fit criteria. The Nash-Sutcliffe efficiency (NSE) of 0.90 and correlation coefficient of 0.96 at the training-validation stage, as well as NSE of 0.89 and correlation coefficient of 0.95 at the testing stage, are indicative of potentials of this model for capturing ecological dynamics in predicting catchment hydrology, at an operational level.


Introduction
Forested catchments of southeast Australia supply most of the water for at least 25% of Australia's population, as well as for nationally significant industries including agriculture [1,2].Underlying interactions between soil-plant-atmosphere make the relationship between rainfall and runoff non-linear and complex to model in forested catchments.In addition, catchment vegetation cover is continuously affected by activities such as logging, long-term land use and changes, and major disturbances (e.g., bushfires) that in return influence the catchment water yield [3,4].Bushfires affect catchment hydrology through changes in the structure and density of the vegetation cover, leading to changes in evapotranspiration and water yield over the regeneration period that last sometimes for several decades.For example a 5% change in forest evapotranspiration due to changes in the vegetation cover affects mean water yield by 20% in southeast Australia [5].In the state of Victoria, the impact of the 1939 bushfires on water yield from mountain ash forests peaked around 20 years after the fire following the relationship that exists between changes in the leaf area index [6] and catchment water yield [3,6].
Climate and vegetation cover characteristics have commonly been used as key predictors of post-fire runoff from forested catchments in Australia [3,[7][8][9][10].Previously, short-term estimation and forecast of catchment water yield were based on a range of methods, from purely empirical simple models to highly sophisticated distributed process-based models defined by partial differential equations (e.g., the Systeme Hydrologique Europeen model [11] or the Macaque model [9]).Over the past decades, data-driven models have become increasingly useful in hydrological forecasting on the basis that they avoid having to address the problems of the spatial and temporal variability, and the uncertainty of the inputs and the parameters, as opposed to the physically-based models that require a wide range of catchment and climate information [12][13][14].
The most commonly applied data-driven methods in short-term estimating and forecasting of catchment water yield are the parametric regressions such as the multivariate regressions [15][16][17], nonparametric regressions such as the K-nearest neighbor method [18][19][20], symbolic regressions such as genetic programming [21][22][23], and artificial intelligence based methods such as neural networks [24][25][26].Hydrological processes contain non-linearities that are commonly modeled with data-driven techniques as an alternative to linear regression methods.Numerous studies have compared data-driven techniques with regression models (linear or non-linear) and have underlined the interest in using data-driven methods (see for example [27][28][29]).In this study, we employ four data-driven techniques, namely Nonlinear Multivariate Regression (NLMR), K-Nearest Neighbor (KNN), Nonlinear Autoregressive with External Input based Artificial Neural Networks (NARX-ANN), and Symbolic Regression (SR) to explore their potential for capturing ecological dynamics in predicting catchment hydrology at an operational level.

Case Study and Data Sources
The Corin catchment is located in the Namadgi National Park and is part of the Cotter river catchment in the Australian Capital Territory (ACT).The catchment lies about 50 km west of Canberra, at the northern end of the Australian Alps (35°39'25" S, 148°49'53" E), and encompasses an area of 148 km 2 (Figure 1).The catchment is covered by native eucalypt forests and soils of the area are derived from highly weathered Ordovician sediments and are acidic and duplex in structure [30].The underlying rock types are granite, limestone and shale, and the topography is mountainous (steep with rocky outcrops).Summers are characterized as warm and often hot, with dry periods of between six and eight weeks.In winter (July), mean daily maximum and minimum temperatures in sheltered locations (mid-slope) are 14 and −1 °C respectively, while in summer (January) the respective temperatures are 24 °C and 10 °C.Mean annual rainfall is approximately 1150 mm.Annual evaporation and seepage losses from the catchment are estimated to be 630 mm and stream discharge typically peaks between August and September and reaches a minimum between March and May [31] (Figure 2).Identification and documentation of fires within the Australian Capital Territory (ACT) date back to 1730.In the past 100 years, there has been major bushfires in the catchment and surrounding areas in the summers of 1920, 1926, 1939, 1983 and 2003.With the exception of the 1920 fire, all have followed severe droughts where rainfall in the months preceding the fire was well below average [32].Most recently, bushfire in January 2003 affected nearly 100% of the catchment.
We used Normalized Difference Vegetation Index (NDVI) as a proxy for vegetation cover [33].Our time series data encompass a period between the latest bushfire in the catchment of study (January 2003) and January 2012.Daily time series of water yield and rainfall (verified for gaps and aggregated from hourly measurements) were provided by Actew-AGL (the utility that supplies water to Canberra).Water yield was measured at a gauging station upstream of the Corin reservoir, and rainfall was measured at a weather station in the middle of the catchment (Figure 1).An annual hydrograph of the inflow to the Corin Dam and a hyetograph at the Mount Ginini weather station (closest synoptic station to Corin dam) are presented in Figure 2.  Time series of MODIS NDVI version 5 derived from red (0.62-0.67 µm) and near-infrared (0.841-0.876 µm) reflectance data were extracted and used directly.MODIS NDVI was selected because: (1) it provides an estimate of vegetation ecological changes over time [34] and fire damage [35], and is directly related to leaf area index (LAI) [36,37]; (2) remotely sensed NDVI is available at a higher spatial resolution than LAI (250 m instead of 1 km); (3) there is less uncertainty associated with estimation of NDVI by satellite data, compared to LAI of eucalypt forests in Australia [38].Data obtained from the MODIS-TERRA sensor (MOD13Q1 product version 5) has a 250 m spatial resolution and is a composited output over 16 days.Version-5 MODIS/Terra NDVI is validated over a widely distributed set of locations and time periods [37].We downloaded scene h30v12 of the product, from the NASA Land Processes Distribution Active Archive Centre Data Pool.Using the MODIS reprojection tool (Version 4.1; USGS Earth Resources Observation and Science Center, Sioux Falls, SD, USA) NDVI scenes were cut to the study area and average values for the catchment were used.Average NDVI for the catchment was used since the catchment of study has a relatively homogenous forest cover in the sense that the native eucalypt forests that cover the catchment did not differ widely in their canopy characteristics.
A short-term forecast (intra-month) of water yield is worthwhile information for water allocation as well as water stress assessments, especially since immediate bushfire impacts on water yield are relatively unknown.NDVI is only available every 16 days, therefore water yield and precipitation variables were reformed to 16-day intervals by accumulating their daily values.

Standardization and Goodness of Fit Criteria
Data standardization adjusts all data so that they fall within a prescribed range and have common basic statistical characteristics.The result of standardization is a data space without the bias, which appears usually as a result of scale, and consequently all input variables are treated equally.Given the natural range of the dependent and independent variables being equal to or greater than 0, the most simple and efficient method for standardizing variables was: where ystan is the amount of variable y after standardization; and Max(y) is the maximum value of y within the time series.Following goodness of fit criteria were used for comparing the results of different data driven methods: where obsi and fori are observed and forecasted value of the dependent variable at time step i, respectively; and n is the total number of time steps.

• Nash-Sutcliffe Efficiency (NSE)
NSE is used to evaluate the estimative power of hydrological models [39]: obs obs for obs NSE (5) where obs is the average of observed values from i = 1:n.NSE can range from −∞ to 1. NSE = 1 corresponds to a perfect match of modeled discharge to the observed data.NSE = 0 indicates that the model forecasts are as accurate as the mean of the observed data, and NSE < 0 is an indication that observed mean is a better predictor than the model or in other words, when the residual variance is larger than the data variance.

Data-Driven Methods
Data-driven methods that were applied in this study were: • Non-Linear Multivariate Regression (NLMR) NLMR estimates unknown coefficients of predictors based on a non-linear optimization approach in the following form: where m is the number of samples used for calibration; k is the number of samples used for both calibration and validation; Qobs(i) and Qfor(i) are observed and forecasted water yields at time i; x1 to xn are n predictor variables; cj and bj are coefficients of predictors and cn+1 is the constant of the model.Here we used "Lingo optimization package" [40] to optimize the model for coefficients and constant values.

• K-Nearest Neighbor (K-NN)
The K-NN method develops a distribution function of estimated values using a nonparametric kernel distribution function.The concept is based on observing estimator variable values at a given time and searching for similar conditions in the past, within the time series.These similar conditions can be considered as possible solutions depending on the degree of similarity between estimator variables at current and past time points [18,20].General algorithm for K-NN would be: (1) setting a matrix with m columns (number of predictors) and n + 1 rows (length of time series).
(2) last row of the above-mentioned matrix is assumed as a vector of predictors at current time (xj,t j = 1:m).(3) remaining rows are assumed as a matrix of predictors at historical time series (xj,t-i j = 1:m i = 1:n).(4) vector Q is defined with n rows of independent variable values from t − n to t − 1.
(5) using a distance function, distances between xj,t and xj,(t-i) are calculated.
where wj are weights of predictor variables at the distance function.We chose the Euclidean function as the distance function with equal weights to predictor variables.(6) distance vector (Dist) is sorted from minimum to maximum (SDist) and vector Q is assorted based on SDist.
(7) best number of neighbors (k) are specified based on a variety of methods.Here we have used the empirical equation n K = in which n is the length of the time series which is used as historical data for calibration and validation stages [41].(8) a discrete Kernel function is used to give weights to k neighbors [42].
where T is the transpose operation.

• Nonlinear Autoregressive with External Input Based Artificial Neural Networks (NARX-ANN)
NARX-ANN is able to model nonlinear autoregressive time series and it is quite appropriate to identify nonlinear dependencies among dependent and estimator variables [43,44].This model is a recurrent dynamic network, with feedback connections compassing several layers of the network.The defining equation for the NARX-ANN model is [45]: (10) where y(t) is dependent time series at time t; f is a set of nonlinear functions; x is matrix of independent variables; e(t) is the white noise residuals; and d is number of delays.NARX-ANN was coded and run in "MATLAB R2013a" programming package.

• Symbolic Regression (SR)
SR searches the space of mathematical equations to find an equation, which appropriately fits the data, by changing both the type of mathematical functions as well as the value of the parameters.This process starts with choosing initial expressions that randomly couple mathematical building blocks.Then, latter equations are shaped by reincorporating previous equations and altering their sub-expressions via an evolutionary algorithm similar to genetic programming; and ultimately final mathematical equations are ranked using a ratio of accuracy and equation complexity [46].In this study we used "Eureqa Formulize" software [47] to form SRs.Recently this software has been increasingly applied by researchers in different environmental studies as a reliable tool for analyzing SR-based issues [48][49][50][51].

Results and Discussion
Cross-correlation (based on Pearson Correlation) with 95% confidence interval was undertaken on standardized time series to find the most appropriate time lags (TL) between all independent and dependent variables.Results shows that TL = 1 has the highest correlations in all cases with the exception of zero.This means for forecasting standardized water yield at time t + 1 by three methods of NLMR, K-NN and SR, the values of standardized precipitation, standardized NDVI and standardized water yield at time t should be used; however in NARX-ANN best TL would be derived based on a trial-error process which will be explained.
At the next step, data were split into three blocks of (1) calibration including 70% of data; (2) validation including 15% of data; and (3) verification including 15% of data.While data were randomly distributed into three blocks, we ensured that each block included extreme events, in addition to more normal values.During the calibration stage (also known as training) 70% of our total data is calibrated for the model parameters.The model is then tested during the validation stage with an unused 15% of the total data.This process is repeated until total error for the calibration/validation stages is minimum.Once the optimum model is selected, an independent 15% of the data set that has not been used in the modeling process is used to examine model's accuracy.This is called the verification stage.
As for the K-NN method, best number of neighbors was 13 based on the number of data, which were located in the calibration and validation blocks.The NLMR model for short-term forecasting of water yield was derived as following (all used variables in Equation (11) Based on the recommendations of previous studies [44,52], here in the three-layer NARX-ANNs model, "Levenberg-Marquardt" algorithm function was used as back-propagation calibrationvalidation algorithm, and "Tangent sigmoid" function was applied for the hidden layer neurons, and finally "Linear Transfer" function was employed in the output layer neuron.Further, a range of 1 to 5 for number of delays as well as 1 to 20 for number of neurons in the hidden layer were examined to reach the best neural network's architecture.Considering the results of the objective function of mean square error, the best number of neurons was found to be 10; moreover, the best number of delays for precipitation and NDVI ("inputDelays") equals 1 and ("feedbackDelays") equals 2 for the water yield.
As for SR, five main mathematical operators, exponential and trigonometry functions were considered for building the mathematical blocks.Figure 3 shows the mathematical solution's accuracy vs. its complexity.Here we considered the solution with a mean absolute error 0.08 and complexity of 33 as the optimum point on the frontier.After this point by increasing the complexity the amount of errors had ignorable discrepancies.The stability and maturity of final solutions after 4.7 × 10 7 generations were 0.77% and 98.6%, respectively.Ultimately based on the optimum point, the proposed mathematical solution can be presented as (all used variables in Equation ( 12) are standardized by Equation ( 1), and therefore are dimensionless): where i is time step; WY is water yield; and Prec is precipitation.
Table 1 presents a sensitivity analysis over Equation (12).Here, sensitivity means the relative impact that a predictor has on the target variable (streamflow).In Table 1 "%Positive" is the likelihood that increasing this variable will increase the target variable; "Positive Magnitude" is when increase in this variable lead to increase in the target variable, this is generally how big the positive impact is."%Negative" is the likelihood that increasing this variable will decrease the target variable, and finally "Negative Magnitude" is when increase in this variable leads to decrease in the target variable and this is generally how big the negative impact is.According to Table 1, the model is most sensitive to NDVI (sensitivity = 1.31) even more than to precipitation (sensitivity = 0.08).Results show that at 31% of times, NDVI will have positive impact with magnitude of 3.81, and at 55% of times, streamflow with time lag 1 will have positive impact on forecasting streamflow with a magnitude of 0.7.Table 2 summarizes final results of the goodness of fit criteria for all models, at three stages of Calibration-Validation, Verification, and the Entire Data.NARX-ANN produced the best correlation coefficient (at the Calibration-Validation, 0.91) and K-NN the worst (at the entire data, 0.63).Using the RMSE criterion, NARX-ANN had the best performance at the Calibration-Validation stage (0.07) and K-NN had the worst performance at the Verification stage (0.21).As for VE, SR performed the best (Calibration-Validation stage, 1.09) and K-NN the worst (Verification stage, 3.57), and finally considering the NSE criterion, NARX-ANN showed the best performance (Verification stage, 0.80) and K-NN the worst performance (Verification stage, −3.54).In a pair-wise comparison between different methods, the ranking of performances is: (1) NARX-ANN; (2) SR; (3) NLMR; and (4) K-NN.According to Figure 4, K-NN has overestimated the low streamflows, and underestimated the high streamflow values while with NLMR (Figure 5) only underestimation of high streamflow values is remarkable.NARX-ANN by far has forecasted a wide range of streamflows much more accurately in comparison with other three models (Figure 6).Finally, considering Figure 7 the forecast errors of low, moderate and high streamflows at SR are rationally similar, and it is hard to distinguish the advantages of this model in forecasting a specific range of streamflow values.For comparison purposes we also parameterized a conventional linear regression model of form y = a + bx1 + cx2 + dx3 and found its performance very poor (Corr = 0.45) and the technique inappropriate compared to the data-driven models.An encouraging aspect of the NLMR and SR models was that they presented mathematical equations to be used in short-term forecasting of water yield.Of these two models, SR outperformed NLMR especially in extreme events.Obviously, models presented here could further be improved by using longer periods of continuous data in the analysis.
In case top priority is given to forecasting extreme events, the NARX-ANN model can be improved by including new performance function networks [53], and the SR model may be improved by dividing the data into extreme and normal groups first, and then modeling each group separately, as proposed by Charhate et al. [54].In this study, a 16-day interval was considered due to limited NDVI data availability.In case NDVI values are available with a higher temporal resolution, a shorter time interval (e.g., daily) might provide a more realistic short term forecast of the stream flow, because flood events that occur during this time interval will not be smoothed anymore, and because precipitation and discharge data generally have more heterogeneity than NDVI values.
Concluding, this study shows that in this catchment in southeast Australia, different data-driven models perform differently.The NARX-ANN model is superior to the rest of the techniques and would be a suitable tool for catchment managers and water utilities in the absence of extensive climate, soil and vegetation data.

Conclusions
Hydrological processes contain non-linearities that are commonly modeled with data-driven techniques as an alternative to conventional regression and process-based methods.Nevertheless, the lumped data-driven models have limitations in delivering hydrological insights.An underlying justification for the variability in the fit metric values in this study could be the physical processes that are under-presented with the current inputs.For example, hydrological processes such as snowmelt at higher elevations within the catchment impact the timing of the discharge and are inadequately presented via precipitation inputs.When limited data is available, uncertainties associated with hydrological data exert even larger limitations on the model (e.g., rainfall uncertainties driven by spatial scale or discharge uncertainties dominated by flow condition and gauging method [55]).While a varied mix of data-driven techniques have emerged for modeling hydrological time series, limitations to the mechanistic and physical rationale that can be afforded to the internal structure and behaviors of such models still need to be considered.

Figure 1 .
Figure 1.Study area and location of hydrometry station and the precipitation gauge.

Figure 2 .
Figure 2. Annual hydrograph at the inflow of the Corin dam provided by the Bureau of Meteorology of Australia (a); and hyetograph at of the Mount Ginini automatic weather station (b).
forecast value at current time is calculated as:

Figure 3 .
Figure 3. Mathematical solution accuracy vs. model complexity in the SR method.Y-axis shows mean absolute error of the forecast vs. observations for the entire data.

Figure 4 .
Figure 4. Final results of standardized forecasts vs. standardized observations using the K-NN method.

Figure 5 .
Figure 5. Final results of standardized forecasts vs. standardized observations using the NLMR method.

Figure 6 .
Figure 6.Final results of standardized forecasts vs. standardized observations using the NARX-ANN method.

Figure 7 .
Figure 7. Final results of standardized forecasts vs. standardized observations using the SR method.

Table 1 .
Sensitivity analysis of the optimum solution derived from SR.

Table 2 .
Performance statistics for the tested data-driven methods.