4.1. Development and Calibration of LFRM for Australian Data
The selected Qmax values (i.e., the top 1 and top 3 maximum data points from each station’s AMFS data) are first standardised by the at-site average of the AMFS data (mean) and then plotted in the (CV, Qmax/mean) plane.
Figure 5 shows such plots (1 max and 3 max) for the study data set, consisting of 626 data points (1 max) and 1878 data points (3 max) from 626 sites, which suggest the following relationship (Equation (3)):
The coefficients of Equation (3) were estimated by the maximum likelihood method for each of the plots in Figure 5. The estimated coefficients, along with their R² values, are provided in Table 2.
The R² values in Table 2 suggest that the estimated coefficients provide a reasonably good fit to the experimental data; the fit is best when pooling only the top 1 AMFS value. When pooling the top 3 maxima, greater scatter is evident in Figure 5, which is also reflected in the drop in R² value. An important question is whether the weaker relationship with CV is compensated for later by having additional data points to define the lower end of the distribution. Table 2 also shows that the exponent is appreciably greater than unity (unity being the value expected for a Gumbel distribution for 1 maxima) and decreases slightly with the pooling of more data (i.e., 3 max).
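The standardisation used to build the (CV, Qmax/mean) plane can be sketched as follows. This is a minimal illustration only: it assumes that CV is the sample standard deviation divided by the at-site mean, and the function names, station labels and flow values are hypothetical, not study data.

```python
from statistics import mean, stdev

def standardise_top_maxima(amfs, k):
    """Return (cv, ratios) for one station's annual maximum flood series.

    cv     : sample standard deviation / at-site mean (assumed definition)
    ratios : the k largest values of the series divided by the at-site mean
    """
    site_mean = mean(amfs)
    cv = stdev(amfs) / site_mean
    top_k = sorted(amfs, reverse=True)[:k]
    return cv, [q / site_mean for q in top_k]

def pooled_points(stations, k):
    """Pool the (CV, Qmax/mean) points from all stations for the top-k maxima."""
    points = []
    for series in stations.values():
        cv, ratios = standardise_top_maxima(series, k)
        points.extend((cv, r) for r in ratios)
    return points

# Hypothetical flow series (m^3/s) for two made-up stations.
stations = {
    "A": [120.0, 80.0, 310.0, 45.0, 95.0],
    "B": [15.0, 22.0, 9.0, 60.0, 18.0, 30.0],
}

points_1max = pooled_points(stations, k=1)  # one point per station
points_3max = pooled_points(stations, k=3)  # three points per station
```

With 626 stations, the same pooling yields the 626 (1 max) and 1878 (3 max) points described in the text.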
Based on Figure 5, and assuming that a large part of the scatter can be explained by variations in the average recurrence interval (ARI/AEP) of the AMFS data, the scatter is best modelled by searching for an LFRM function of the form (Equation (4)):
where the coefficient of Equation (3) is replaced by a term that is assumed to be a function of the ARI/AEP only. From Equation (3), the calibration procedure is based on the introduction of a new standardised variable, which can be defined by (Equation (5)):
where the coefficients are those estimated for the number of annual maxima pooled (e.g., 1 or 3 maxima). This form of standardisation (Equation (5)) takes into account differences not only in the mean values but also in the CV, raised to the power appropriate for a specific regional data set. As expected, as a result of this new standardisation, Ymax was practically uncorrelated with the CV, as confirmed by the very small R² of 0.0037 for the same set of data points using the top 3 annual maxima. The plotting position formula (Equations (6)–(8)) proposed by Majone and Tomirotti [11] was applied to estimate the ARI or the empirical non-exceedance frequency (AEP) of each of the Ymax values in the pooled data sets (i.e., max of 1 and 3) from the N = 626 sites. To define the form of the distribution of Ymax, the top 1 and 3 annual maxima of each site’s data were used. The major assumption made here is that the ith value of the Ymax series is independent of the other values and that the Ymax values belong to the same population. It follows that the plotting position of Ymax can be obtained from the following equations (Majone and Tomirotti [11]):
where Ymax is the at-site standardised annual maximum and na is the site sample size (taken as the average of the site sample sizes, which is 34 for this study). The pooled sample of standardised maxima from the N = 626 sites (L = 626 or 1878 values) is then sorted in decreasing order, and the mth ranked value in the pooled sample is identified. The ARI of the mth ranked value (expressed as T years) can be estimated using
From this definition, the estimated ARI/AEP values would ideally be representative of actual return periods. However, this may not be the case for the Australian flood data set, as many of the gauging sites used here are spatially very close together (see Figure 1), and hence there would be significant inter-site dependence. The plot of Ymax vs. YT, where YT = −ln[−ln(1 − 1/T)] is the Gumbel reduced variate and is used as a surrogate for ARI or AEP, is shown in Figure 6 for L = 626 and L = 1878. The plots for L = 626 and L = 1878 in Figure 6 (bottom curves in both plots) are in line with what would be expected from using the additional data points. Using a greater number of maxima (e.g., 3 maxima) appears to provide a very smooth empirical distribution that is fitted closely by the distribution function. The plots also reveal that the experimental data can be approximated by a second-degree polynomial function of YT, as given by Equation (9), whose model coefficients and R² values are given in Table 3 for the different pooling of the annual maxima (i.e., top 1 and 3 maxima):
which, in terms of Qmax/mean, takes the following form (Equation (10)):
Equations (9) and (10) yield the analytical expression of the LFRM for the study data set using the top 1 and 3 annual maxima. The appropriate values of the coefficients in Table 3 are substituted into Equations (9) and (10). However, this formulation does not allow for the effects of inter-site dependence.
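The Gumbel reduced variate used as the frequency axis in Figure 6 can be computed directly from the expression YT = −ln[−ln(1 − 1/T)] given in the text. The sketch below also evaluates a generic second-degree polynomial in YT. The coefficient names (c0, c1, c2) and the placeholder values are hypothetical and are not the Table 3 coefficients; they only illustrate the functional form.

```python
import math

def gumbel_reduced_variate(T):
    """Y_T = -ln[-ln(1 - 1/T)] for an ARI of T years (requires T > 1)."""
    return -math.log(-math.log(1.0 - 1.0 / T))

def ari_from_reduced_variate(y_t):
    """Invert the expression above: T = 1 / (1 - exp(-exp(-Y_T)))."""
    return 1.0 / (1.0 - math.exp(-math.exp(-y_t)))

def quadratic_in_yt(y_t, c0, c1, c2):
    """Generic second-degree polynomial in Y_T (hypothetical coefficient names)."""
    return c0 + c1 * y_t + c2 * y_t ** 2

# Placeholder coefficients for illustration only; NOT the fitted values.
PLACEHOLDER_COEFFS = (0.5, 0.8, 0.02)

for T in (10, 100, 1000):
    y_t = gumbel_reduced_variate(T)
    y_fit = quadratic_in_yt(y_t, *PLACEHOLDER_COEFFS)
    print(f"T = {T:5d}  Y_T = {y_t:.3f}  polynomial value = {y_fit:.3f}")
```

For reference, T = 100 gives YT of about 4.60, so the reduced variate grows roughly with the logarithm of the ARI.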
4.2. Revision of LFRM for Spatial Dependence
The LFRM for the study data in its current form (see Equations (9) and (10)) does not allow for the effect of inter-site dependence. In this section, spatial dependence is accounted for through the spatial dependence model derived in the previous section (see Equation (2)). The calculation of the effective number of sites, Ne, and its use with the LFRM are illustrated here. Firstly, the average correlation for each pair of sites was calculated for the region by computing the correlation coefficient from a regional relationship with distance for all of the Australian states. The average correlation coefficient was found to be 0.26. Secondly, Ne was estimated using Equation (2) along with the coefficients for the Australian spatial dependence model given in Table 1 (using the real and simulated data) and the average correlation coefficient of 0.26. The calculated Ne value, along with the effective record length, is given in Table 4. From Table 4, it can be seen that the results from the real data match reasonably well with those from the simulated data.
Using the calculated Ne value of 207 (from the real data set) in Equations (7) and (8) instead of the total number of stations (N = 626), the new plotting positions of the pooled data points for 1 and 3 maxima can be estimated. The curve of Equations (9) and (10) fitted to these plotting positions has new coefficient values, which are thereby corrected for the spatial dependence in the data set. These coefficient values are given in Table 3. The coefficients of the LFRM clearly differ between the N and Ne cases; this is due to the reduction in the total useful information (i.e., the effective number of stations). The new interpolated frequency curves can be seen in Figure 6 (both panels, top curves).
What is striking in Figure 6 is the upward shift in the frequency curve of the pooled data. Taking the 1 max plot as an example, at a Ymax value of approximately 4, it can be seen that if spatial dependence is ignored, the flood magnitude risk may be notably underestimated (for N sites, Ymax = 4 corresponds to an AEP of 1 in 87; for Ne sites, Ymax = 4 corresponds to an AEP of 1 in 8.3). For the pooling of the 1 max with correction for spatial dependence (see the max of 1 plot in Figure 6), the range of Ymax values for which the fitted model (referred to as LFRM_Ne henceforth) might be considered reliable was found to be approximately 2.2 to 7, which corresponds to AEPs of 1 in 10 to approximately 1 in 2000.
Figure 7 shows the behaviour of the dimensionless quantiles derived from Equations (9) and (10) for AEPs of 1 in 100, 1 in 500, and 1 in 1000 for all the pooled data (i.e., 1 and 3 max) and for the quantiles estimated using N and Ne. The dimensionless quantiles for the world model (referred to as PM (world), based on 7300 gauging stations around the world) developed by Majone and Tomirotti [11] are also superimposed for comparison. The comparison with the PM (world) curves in Figure 5 indicates that the LFRM_Ne can explain a good amount of the scatter in these plots, as the set of curves (1 in 100 and 1 in 500 AEP curves) for this extended AEP range (including the 1 in 1000 AEP) captures most of the upper part of the points in the pooled data set of Qmax/mean values. The flatter slopes in Figure 7 for 3 max (bottom panel) are consistent with what was shown in Figure 5 and seem to reflect a weaker relationship of Qmax/mean with CV. Comparison of the curves for max of 1 and 3 for Ne and N seems to indicate that allowance for spatial dependence has a smaller influence on slope.
Figure 7 also indicates that the extra data (i.e., 3 max) provide a slightly better definition of the left-hand tail of the distribution, whereas the top few points in the right-hand tail are mostly common to both data sets (1 and 3 maxima). Further investigation also revealed that the LFRM_Ne can provide reasonably accurate growth curve estimation for CV values in the range 0.60–1.60 (approximately 81% (505 out of 626) of the study catchments fall in this range). However, the LFRM_Ne can perform poorly for some catchments with CV values greater than 1.70.
4.3. Application of the LFRM to Ungauged Catchments
Our interest is the application of Equations (9) and (10) to ungauged catchments, which requires the estimation of the mean flood and CV for the ungauged catchment in question. The BGLSR and the ROI approach, as discussed in [21], were used to develop the prediction equations for the mean flood and CV of the AMFS data as a function of catchment and climatic characteristics (predictor variables). The prediction equation for the mean flood used an ROI of 30–40 stations, while 65–80 stations were used for the CV, depending on the state being analysed and based on the findings from past studies (e.g., [21,22,23,27]).
The regression equations are presented in general form below:
The prediction equations developed above using the ROI approach, together with Equations (9) and (10) (LFRM_Ne model), were applied to the 28 test catchments, which were not used in developing the prediction equations. To make the comparison more useful and to benchmark the LFRM_Ne model, the mean flood and CV estimated from the developed prediction equations were also used with the PM (world) model developed by Majone and Tomirotti [11]. It must be pointed out, however, that the PM (world) model does not contain any of the data used to develop the Australian LFRM. The validation analysis was undertaken for AEPs down to 1 in 1000. For AEPs in the range of 1 in 50 to 1 in 100, the estimates were compared with at-site flood frequency analysis (FFA) estimates obtained from the fitted log Pearson type 3 (LP3) distribution using the FLIKE software [28]. Validation beyond the AEP of 1 in 100 with at-site FFA estimates was not viewed as reliable, given the very large extrapolation errors involved; any validation results obtained beyond this AEP would be of little significance for most of the stations.
For the lower AEPs (1 in 500 and 1 in 1000), comparison was made against the results of another regional method in which the parameters of the LP3 distribution (i.e., mean, standard deviation, and skew) were regressed against catchment characteristics (known as the PRT; see [21,26] for more details), and flood quantiles were then derived for the 1 in 500 and 1 in 1000 AEPs. The extrapolation of these distributions to the low AEPs also involves a large degree of uncertainty. To assess how well the derived large flood estimates approximate the observed flood estimates, two numerical measures were applied. Relative bias (BIASr) was used to assess whether the rare flood quantiles predicted by the LFRM_Ne or PM (world) models systematically under- or overestimated the at-site FFA or PRT estimates on average over all 28 test catchments.
The relative error (REr) values with respect to the at-site FFA or the regional parameter regression technique (PRT) estimates were also obtained. These are by no means the true errors of the LFRM_Ne or PM (world) models; rather, BIASr and REr may be taken as a reasonable indication of the consistency of the LFRM_Ne or PM (world) models with the FFA and PRT estimates. Both the FFA and PRT estimates are themselves associated with a high degree of uncertainty due to the considerable extrapolation involved. It is worth noting that in calculating the median relative error (REr), the sign of the relative errors was ignored.
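The two measures can be sketched as follows. This is an assumed formulation: BIASr is taken here as the mean of the signed percentage differences between the model and reference quantiles, and REr as the median of their absolute values (the text states only that REr is a median with the sign ignored). Function names and the numbers in the usage example are hypothetical.

```python
from statistics import mean, median

def relative_errors(predicted, reference):
    """Signed relative errors (%) of predicted quantiles against reference quantiles."""
    return [100.0 * (p - r) / r for p, r in zip(predicted, reference)]

def bias_r(predicted, reference):
    """Relative bias (%): average signed relative error (assumed to be the mean)."""
    return mean(relative_errors(predicted, reference))

def re_r(predicted, reference):
    """Median relative error (%), ignoring the sign of each relative error."""
    return median(abs(e) for e in relative_errors(predicted, reference))

# Hypothetical quantiles (m^3/s) for three test catchments, not study results.
model_quantiles = [110.0, 90.0, 150.0]      # e.g., from a regional model
reference_quantiles = [100.0, 100.0, 100.0]  # e.g., at-site FFA or PRT estimates

print(bias_r(model_quantiles, reference_quantiles))  # positive: overestimation on average
print(re_r(model_quantiles, reference_quantiles))
```

A negative BIASr indicates underestimation on average relative to the reference estimates, while REr summarises the typical size of the discrepancy regardless of direction.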
Table 5 summarises the various error statistics for the LFRM_N (i.e., no allowance for spatial dependence) and LFRM_Ne models (considering the pooling of 1 and 3 maxima) and the PM (world) model, based on the 28 test catchments. If spatial dependence is ignored in the Australian data set, the estimation for the AEP of 1 in 1000 using the LFRM_N model suffers from major underestimation on average (e.g., BIASr of −27%) for the ungauged catchment case. Moreover, Table 5 shows that, both for 1 max and when more data are pooled (i.e., 3 maxima), the BIASr is well corrected once spatial dependence is compensated for (LFRM_Ne). For example, for the 1 in 1000 AEP, the BIASr values for 1 and 3 max with the LFRM_Ne are overestimations of 5% and 7% on average, respectively.
Focusing on the 3 max results, for the AEPs of 1 in 50 to 1 in 1000, the BIASr values are positive on average for the LFRM_Ne, while for the PM (world) model there are a couple of cases of underestimation on average. Compared with the results of the preliminary LFRM models (i.e., [12]), the results obtained here present a significant improvement: as found in Haddad et al. [12], the underestimation on average was up to 40%. By pooling more data and also accounting for the inter-site dependence in the LFRM, the underestimation problem has, to a large extent, been rectified. The results as benchmarked against the PM (world) model are reassuring; this places a higher degree of confidence in the estimates given by the LFRM_Ne model developed here.
The REr values in Table 5 show acceptable results, which are comparable to similar regional models for the larger AEP ranges (Rahman et al. [27]). Focusing on the 3 max results, the REr values range from 30% to 60% (which is also very comparable to the PM (world) model). It should be noted that in the PM (world) data set most of the stations were so well separated that they were mostly independent of each other, which is why Majone and Tomirotti [11] did not need to work out an effective number of sites. The LFRM_Ne model in this study has refined the approach of the PM (world) model, as significant inter-site dependence exists between stations in the Australian data set.
An error bar plot of the BIASr values is given in Figure 8, which displays the central tendency and variability of the sample BIASr values over the 28 independent test catchments. Figure 8 displays the mean value (circle symbol) with a 95% limit bar for flood quantiles of AEP 1 in 100 to 1 in 1000. While the mean values appear to differ between the two methods (i.e., LFRM_Ne and PM (world) models), the difference is modest because the error bars overlap, suggesting that the LFRM_Ne model is very comparable to, or even better than, the PM (world) model. Moreover, this indicates that consistency is achieved for the 3 maxima pooling LFRM_Ne model, as the mean values and the spread of BIASr values are very similar to those of the PM (world) model. What is noteworthy is the difference between LFRM_Ne and LFRM_N. The mean values were found to be statistically different, which suggests that the LFRM_Ne has corrected the negative bias quite well and justifies the use of the LFRM_Ne. It is envisaged that, as part of the future assessment of the LFRM_Ne, model comparisons will be made against design flood estimates obtained by alternative methods (e.g., spillway design and dam safety studies based on design rainfall-based approaches).