Application of Principal Component Analysis and Cluster Analysis in Regional Flood Frequency Analysis: A Case Study in New South Wales, Australia

This paper examines the applicability of principal component analysis (PCA) and cluster analysis in regional flood frequency analysis. A total of 88 sites in New South Wales, Australia are adopted. Quantile regression technique (QRT) is integrated with the PCA to estimate the flood quantiles. A total of eight catchment characteristics are selected as predictor variables. A leave-one-out validation is applied to determine the efficiency of the developed statistical models using an ensemble of evaluation diagnostics. It is found that the PCA with QRT model does not perform well, whereas cluster/group formed with smaller sized catchments performs better (with a median relative error values ranging from 22% to 37%) than other clusters/groups. No linkage is found between the degree of heterogeneity in the clusters/groups and precision of flood quantile prediction by the multiple linear regression technique.

Some researchers have also adopted climatic and catchment characteristics to form homogeneous regions by using multivariate statistical techniques such as cluster analysis [8,[17][18][19][20][21][22][23][24][25]. Ward's method is the most common method for clustering as it can form regions of roughly equivalent size and this method is considered to be more suitable for regionalization of flood data [26]. Finally, many researchers adopted a region of influence (ROI) approach to circumvent complications related to fixed state boundaries [6,[11][12][13]16,25,[27][28][29][30][31][32][33]. The ROI is much more flexible than the fixed region approach and can be easily incorporated with a range of RFFA methods. ROI can also effectively reduce regional heterogeneity by sub-region formation within a large region. In the ROI approach, various ways can be adopted to form sub-regions such as by geographical distance [34,35] or distance in a multi-dimensional catchment attributes space [4,6,11,29].
Once a region is formed based on an appropriate condition [17,[34][35][36][37][38], the design flood can be estimated either by the index flood method [15,39,40] or a regression-based approach [6,41]. The regression-based approach is used in numerous studies. Both the ordinary least square and the generalized least square regression methods are adopted to estimate the coefficients of the prediction equations in regression-based approaches [6,29,[41][42][43][44][45][46]. For regression-based RFFA models, a linear relationship is generally assumed between the dependent variable (flood quantiles) and predictor variables (catchment and climatic characteristics). It is also assumed that the predictor variables are uncorrelated among each other; however, in many practical situations this assumption is not fully satisfied [47].
Principal component analysis (PCA) is a statistical technique capable of generating statistically uncorrelated principal components (PC) which are the linear amalgamation of the original variables (catchment and climatic characteristics). PCA has been used previously to delineate homogenous regions. For example, Burn [48], DeCoursey and Deal [49], Hawley and McCuen [50], Kar et al. [51], and Nathan and McMahon [22] adopted PCA to generate the PCs consisting of different physical, hydrological, and meteorological variables, and accordingly used them in principal component regression (PCR) to estimate flood quantiles. Choi et al. [52], Haque et al. [53], Haque et al. [54], and Koo et al. [55] used PCs in a PCR for water demand forecasting. Although PCA can produce uncorrelated PCs, one shortcoming of PCA is that due to the use of variance as an objective function, statistically independent structures are not always guaranteed in PCA [53].
There is a lack of research on delineation of homogeneous regions in RFFA. To fill this research gap, this study examines the formation of homogeneous regions using PCA and cluster analysis. This investigates the use of PCR method in RFFA with ROI and fixed region approaches. Multiple linear regression (MLR) models are developed for the regions generated by cluster analysis to estimate flood quantiles. A leave-one-out validation technique is adopted to assess the performance of the developed models.

Study Area
A very large catchment can have a significantly different flood frequency behavior compared to smaller sized catchments. Australian Rainfall and Runoff (ARR) [32,56] recommends an upper limit of 1000 km 2 for small to medium sized catchments [6], which appears to be rational to select candidate catchments for this study. From New South Wales (NSW), Australia, a total of 88 catchments are selected to carry out this study. These are natural catchments and free from any major storage and land use change. These selected catchments have catchment areas varying from 8 to 1010 km 2 . The mean of catchment area is found to be 352 km 2 and median is found to be 260 km 2 . It is recommended in Rahman et al. [57] to select catchments that have at least 20 years of flood data to develop the RFFA models in Australia. For this study, the selected catchments show a record length of annual maximum (AM) flow data in the range from 25 to 82 years (mean of 41.5 years and median 37 years). The catchments selected, vary from mountain to coastal region. The mean annual rainfall for the chosen catchments ranges from 625-1955 mm with a mean of 1000 mm and a median of 910 mm. Figure 1 shows the location of the selected 88 catchments.
A summary of descriptive statistics of selected catchment and climatic characteristics for the selected 88 sites is presented in Table 1. The design rainfall intensity of six-hour duration and two-year return period (I 62 ) at each catchment centroid is obtained from Australian Bureau of Meteorology website. The shape factor (SF) is defined as the shortest distance between a catchment's centroid and outlet divided by the square root of catchment area (A). Stream density (sden) is obtained as the sum of all the streamlines on a 1:100000 topographic map divided by catchment area (A). Mean annual rainfall (MAR) and mean annual evapotranspiration (MAE) data for each catchment are obtained from Australian Bureau of Meteorology website. The fraction forest (forest) is obtained as the total forested area shown on a 1:100000 topographic map divided by catchment area (A). The mainstream slope (S1085) is obtained as the difference in elevations at 10% and 85% of the mainstream length (measured Water 2020, 12, 781 3 of 26 from the catchment outlet) divided by 0.75 of mainstream length. The PCs are extracted using these characteristics to develop the PCR models using quantile regression technique (QRT) and to form regions using cluster analysis. A summary of descriptive statistics of selected catchment and climatic characteristics for the selected 88 sites is presented in Table 1. The design rainfall intensity of six-hour duration and twoyear return period (I62) at each catchment centroid is obtained from Australian Bureau of Meteorology website. The shape factor (SF) is defined as the shortest distance between a catchment's centroid and outlet divided by the square root of catchment area (A). Stream density (sden) is obtained as the sum of all the streamlines on a 1:100000 topographic map divided by catchment area (A). Mean annual rainfall (MAR) and mean annual evapotranspiration (MAE) data for each catchment are obtained from Australian Bureau of Meteorology website. The fraction forest (forest) is obtained as the total forested area shown on a 1:100000 topographic map divided by catchment area (A). The mainstream slope (S1085) is obtained as the difference in elevations at 10% and 85% of the mainstream length (measured from the catchment outlet) divided by 0.75 of mainstream length. The PCs are extracted using these characteristics to develop the PCR models using quantile regression technique (QRT) and to form regions using cluster analysis.

Principal Component Analysis
Multiple linear regression (MLR) models get unstable with an increasing number of predictor variables, in particular, if these are highly correlated. PCA is one of the multivariate statistical techniques that can be used to deal with highly correlated variables in regression [58]. In PCA,

Principal Component Analysis
Multiple linear regression (MLR) models get unstable with an increasing number of predictor variables, in particular, if these are highly correlated. PCA is one of the multivariate statistical techniques that can be used to deal with highly correlated variables in regression [58]. In PCA, original dataset of n variables, which are correlated to various degrees are transformed to n numbers of uncorrelated PCs. The PCs are linear transformation of the original variables in such a way that the original and the new variables have equal sums of the variances. Although the number of PCs and original variables are equal, the first few PCs explain the majority of the variance in the data set, reducing the dimensionality of the original data set [59]. The PCs are sequenced from the highest to the lowest variance, i.e., the first PC describes the data's highest variance proportion. The next highest variance is explained by the second PC and so on. The values of PCs can be obtained from Equations (1) and (2): Water 2020, 12, 781 4 of 26 where x 1 , x 2 , . . . x n are the original variables and a jj are the eigenvectors. The eigenvalues are the variances of the PCs. The covariance or correlation matrix of the data set is used to derive the coefficients a jj , which are the eigenvectors. The eigenvalues of the data matrix can be calculated by Equation (3): where C is the correlation/covariance matrix, λ is the eigenvalue, and I is the identity matrix. The PC coefficients or the weights of the variables in the PCs are then calculated by Equation (4): In the PCR analysis, PCs are used as predictor variable in MLR [60]. The general form of PCR model is as follows: where Y is the dependent variable (which is flood quantile here), α is the model intercept, β 's are the regression coefficients.

Cluster Analysis
The statistical distance measurement representing the similarity (or dissimilarity) among the collections of attributes (similarity measurements) selected for each gauging site is used for grouping sites in cluster analysis. There are various clustering techniques available in the statistical literature [61] and are used to delineate hydrologically homogeneous regions [17,26,35,36,[62][63][64][65]. Ward's method is the most commonly used method since this can produce clusters of similar sizes [26]. Hence, Ward's method is adopted in this study.
Ward's approach is an agglomerative hierarchical algorithm that starts with each site being its own cluster (or region). The algorithm successively merges clusters using a variance approach analysis in which the similitude between members in a region is measured in terms of the square error sum (ESS). For region k containing N k sites, the ESS is calculated as: where x j = [x 1 , x 2 , . . . , x p ] T is a vector of p characteristics measured at site j, and where each element denotes the mean value of a characteristic across the N k sites in the region. ESS k is calculated for the theoretical fusion of any two clusters at each step, and the actual fusions selected are those which minimize the increment in the total ESS across all regions.

Region of Influence Approach
Formation of regions without fixed boundaries was firstly carried out by Acreman [7]. Based on this, Burn 1990 a, b [12,13] introduced the ROI approach. In this approach, the individual site of interest (i.e., catchment where flood quantiles are to be estimated) forms its own region. Such identified regions can overlap and gauged sites for different sites of interest can be part of more than one ROI. The ROI may be formed for the site of interest using the group of sites in close proximity to the site of interest. More recently, the ROI approach has been adopted in ARR RFFA [32] and also in a study carried out by Rahman et al. [6]. A weighted Euclidean distance in an M-dimensional space may be used to measure the proximity. The distance metric can be defined by: where D i,j is the weighted Euclidean distance between site i and j, M is the number of features incorporated in the distance measure, and the X terms represent standardized values for feature m at site i and site j, and W m is a weight applied to attribute m, which reflects the relative significance of the feature. Standardization of attributes is performed to remove units and therefore bias due to scaling of the attributes can be avoided.

Homogeneity Assessment
Here, the Hosking and Wallis' [15] criteria of heterogeneity is adopted, which is based on L moments. A group of catchments is considered to be heterogeneous if H is too high. When H is smaller than 1, the group is taken as 'acceptably homogeneous', H falls between +1 to +2, the group is taken as 'possibly heterogeneous', and H ≥ 2 it indicates a 'definitely heterogeneous' group. Furthermore, there are three different measures of H: H 1 is based on L coefficient of variation, H 2 is based on L coefficient of variation, and L coefficient of skewness and H 3 is based on L coefficient of skewness and L coefficient of kurtosis [15].

Evaluation Statistics
A leave-one-out validation technique is adopted for assessing the performance of the developed RFFA models. Based on leave-one-out validation, during the construction of a model, a site is left out in each phase, i.e., this site is treated as an ungauged site. The following performance statistics for each of the models are computed using predicted flood quantile (Q pred ) and observed flood quantile (Q obs ): relative error (RE), median absolute relative error (med_RE r ), Q pred /Q obs ratio, median Q pred /Q obs ratio (med_Q pred /Q obs ), mean square error (MSE), root mean square error (RMSE), bias (BIAS), relative bias (RBIAS), relative root mean square error (RRMSE), and root mean square normalized error (RMSNE): Q obs is the observed flood quantile at site i. Q obs is obtained by carrying out at-site FFA using LP3 distribution by FLIKE software [66]. In this study, six flood quantiles with annual exceedance probabilities (AEPs) of 50%, 20%, 10%, 5%, 2%, and 1% are considered as dependent variables. Table 2 shows the magnitude and type of correlation between the original catchment and climatic characteristics (predictors). Correlation between the predictors, i.e., catchment area, rainfall intensity, shape factor, stream density, mean annual rainfall, mean annual evapo-transpiration, slope and fraction forest ('A', 'I 62 ', 'SF', 'sden', 'MAR', 'MAE', 'S1085', and 'forest', respectively) are calculated using the method described in methods section. Looking into Figure 2 and Table 2, it can be seen that catchment area has a negative correlation (−0.208; p-value 0.052) with the rainfall intensity indicating if the catchment area is large, rainfall intensity decreases. Rainfall intensity has a positive correlation with the variables shape factor, stream density, fraction forest, mean annual rainfall and mean annual evapo-transpiration, where the maximum positive correlation is between mean annual rainfall and rainfall intensity being 0.83 (p-value ≈ 0) and mean annual evapo-transpiration and rainfall intensity being 0.67 (p-value ≈ 0). This indicates that, if the rainfall intensity increases, mean annual rainfall also increases. Although these values are not close to ± 1, they have a very small p values (<0.10) indicating that these correlations are significant. Slope has a positive correlation of 0.387 with fraction forest and a negative correlation of −0.286 with mean annual evapo-transpiration where both the coefficients have smaller p values (<0.10). The variable fraction forest also has a positive correlation with the variable mean annual rainfall (0.405; p-value ≈ 0) and mean annual evapo-transpiration has positive correlations with stream density and mean annual rainfall (0.392 and 0.533; p-values ≈ 0). All the other correlation coefficients range from −0.007 to 0.303, which are statistically insignificant.  From the above discussion it is possible to say that some of the predictor variables have a notable degree of correlation between them. Therefore, PCA is applied to the eight selected predictors, i.e. catchment area, rainfall intensity, shape factor, stream density, mean annual rainfall, mean annual evapo-transpiration, slope and fraction forest ('A', 'I62', 'SF', 'sden', 'MAR', 'MAE', 'S1085', and 'forest', From the above discussion it is possible to say that some of the predictor variables have a notable degree of correlation between them. Therefore, PCA is applied to the eight selected predictors, i.e., catchment area, rainfall intensity, shape factor, stream density, mean annual rainfall, mean annual evapo-transpiration, slope and fraction forest ('A', 'I 62 ', 'SF', 'sden', 'MAR', 'MAE', 'S1085', and 'forest', respectively) to achieve the uncorrelated eight PCs. Figure 3 shows the transformed PCs without any correlation. The eigenvalues with their percentage of contribution represent the quantity of variability in the data and they are presented in Table 3. Table 3 confirms that the first two PCs explain the maximum degree of variability of the data set with the proportion of PC1 and PC2 being 35.3% and 20.5%, respectively. The proportions of other PCs (PC3, PC4, PC5, PC6, PC7, and PC8) range 1.8%-13.4%. Although, PC1 and PC2 have the highest percentages among all the PCs, however, the cumulative of these two PCs only accounts for 55.8% variance, meaning these two PCs can only explain half of the variability in the dataset. To explain at least 85% of the variability in the data the first five PCs are required though the individual percentage is quite low for some PCs.

Development and Testing of Regression Equation in Fixed Region and ROI Framework
To select the most impactful PCs, a significance level (p-value) for each of the PCs in the regression analysis is examined and the selection criterion is set to p ≤ 0.1. Based on this criterion, PC1, PC2, PC3, PC4, and PC5 are found to be the significant ones to be used in the development of regression equation to estimate the flood quantiles. However, the coefficient of determination (R 2 ) and adjusted coefficient of determination (adj_R 2 ) are found to be quite small for the regression equations, 0.46 and 0.43, respectively. The developed regression equation is then tested in both fixed region and ROI framework with leave-one-out validation. To apply fixed region (FR) approach, all the 88 sites are grouped together as 'one region' and all the flood quantiles for each site for the six AEPs (50%, 20%, 10%, 5%, 2%, and 1%) are estimated by leave-one-out validation.
In the ROI framework, at first, 10 sites are grouped together to form one region and the flood

Development and Testing of Regression Equation in Fixed Region and ROI Framework
To select the most impactful PCs, a significance level (p-value) for each of the PCs in the regression analysis is examined and the selection criterion is set to p ≤ 0.1. Based on this criterion, PC1, PC2, PC3, PC4, and PC5 are found to be the significant ones to be used in the development of regression equation to estimate the flood quantiles. However, the coefficient of determination (R 2 ) and adjusted coefficient of determination (adj_R 2 ) are found to be quite small for the regression equations, 0.46 and 0.43, respectively. The developed regression equation is then tested in both fixed region and ROI framework with leave-one-out validation. To apply fixed region (FR) approach, all the 88 sites are grouped together as 'one region' and all the flood quantiles for each site for the six AEPs (50%, 20%, 10%, 5%, 2%, and 1%) are estimated by leave-one-out validation.
In the ROI framework, at first, 10 sites are grouped together to form one region and the flood quantiles are estimated. Afterwards, at each of the iterations, five new sites are added to form a larger region until the site number reached 30. When the site number reached 30, ten sites are added at each of the iterations to form a larger region until the number of sites reached 80. Leave-one-out is applied at each of the iterations for validation. Table 4 shows the statistical evaluation for 5% AEP flood. The first column shows the statistical evaluations that are calculated for both the fixed region and ROI cases, which are MSE, RMSE, BIAS, RBIAS, RRMSE, RMSNE, R r , and med_Q pred /Q obs based on observed and predicted values for 5% AEP flood. Looking into Table 4, it is found that for 5% AEP flood, fixed region has the lowest MSE. The lowest R r is found in case of KNN80 (46.59%) and med_Q pred /Q obs close to 1 is found in case of KNN15. From Table 4, it is clear that KNN10 performs the poorest with largest MSE and R r , which means it is preferable to select more than ten sites to form a region to use PCR. Durocher et al. [67] carried out a study in Southern Quebec (Canada) and their results show a RMSE in the range of 38 m 3 /s and 45 m 3 /s in case of 10% and 1% AEP floods using spatial copula method. For the same dataset in Québec a number of studies [47,68,69] were carried out. The results show RMSE values being 41 m 3 /s to 51 m 3 /s for 10% AEP flood, and 49 m 3 /s to 70 m 3 /s for 1% AEP flood. These studies were carried out using ordinary kriging in PCA-space, generalized additive model and single artificial neural network. Studies carried out by Durocher et al. [67], Chokmani and Ouarda [68], Chebana et al. [20] and Shu and Ouarda [69] show RBIAS values ranging from −5% to −20% for 10% AEP flood and −7% to −27% for 1% AEP flood. A study carried out by Rahman et al. [6] found RBIAS values ranging from 22% to 69% for the six AEP floods.
For further clarification on the number of sites required to form regions in case of using this technique in FFA, boxplots are examined based on their RE and Q pred /Q obs ratio values for both the fixed region and ROI framework. Figures 4 and 5 show boxplots of the RE and Q pred /Q obs ratio values for both the fixed region and ROI framework for 5% AEP flood. Both Figures 4 and 5 starts with fixed region approach and have all the ROI approaches presented one by one after the fixed region. Looking at Table 3, one can see that fixed region performs better than the rest of the ROI models. However, Figures 4 and 5 show that, although the box size is smaller (i.e., a smaller error range), the median line is not close to the expected line (expected lines are set at zero and one for Figures 4 and 5, respectively) for the fixed region. KNN10 shows a similar performance as presented in Table 4. KNN15 and KNN25 both show promising results in Figures 4 and 5 with a smaller box size and median value being very close to the median line. However, it seems that KNN25 has smaller error bars than KNN15. There are number of outliers for all the models as shown by small circles, but they are not all visible in the figures as the range for the boxplots are set in the range of −300 to +300 for the RE and −2 to +3 for Q pred /Q obs ratio values to have a greater visibility. In Figure 5, it is seen that none of the top error bars are visible in the set range bringing in the question of how well they fit the regression analysis. The rest of the ROI models show that they represent a poorer fit with bigger box size and median RE being far away from the expected line. range for the boxplots are set in the range of −300 to +300 for the RE and −2 to +3 for Qpred/Qobs ratio values to have a greater visibility. In Figure 5, it is seen that none of the top error bars are visible in the set range bringing in the question of how well they fit the regression analysis. The rest of the ROI models show that they represent a poorer fit with bigger box size and median RE being far away from the expected line.   Tables 5 and 6. Although KNN25 comes out as the best model out of all of them leaving KNN15 behind, however from Tables 5 and 6, it seems that KNN15 outperforms KNN25 especially in the case of Qpred/Qobs ratio values. A fixed region approach or KNN80 does not show any better performance in this case. As seen earlier, KNN10 shows the worst results. The other models generate a mixture of results. In some cases, a very large RE (%) and Qpred/Qobs ratios are also found (i.e., for stations 206026, 210068, 210076, and 222016). As seen from the R 2 and adj_R 2 values, this regression model is found to be representing a poor fit. The analysis for both the fixed region and ROI framework also supports this finding.  Tables 5 and 6 compare the RE and the Q pred /Q obs ratio values for all the AEPs for both the fixed region and ROI, respectively. Rows 2 to 18 of Tables 5 and 6 show the mean, median and standard deviation (Std_Dev) of the selected AEPs for both the fixed region and ROI models, respectively. The last three rows show the overall mean, median, and Std_Dev for the AEPs. All the RE values are transformed to their absolute values by ignoring their sign. All the lowest values in case of both Q pred /Q obs ratio values and RE values are presented with blue color in both Tables 5 and 6. Although KNN25 comes out as the best model out of all of them leaving KNN15 behind, however from Tables 5 and 6, it seems that KNN15 outperforms KNN25 especially in the case of Q pred /Q obs ratio values. A fixed region approach or KNN80 does not show any better performance in this case. As seen earlier, KNN10 shows the worst results. The other models generate a mixture of results. In some cases, a very large RE (%) and Q pred /Q obs ratios are also found (i.e., for stations 206026, 210068, 210076, and 222016). As seen from the R 2 and adj_R 2 values, this regression model is found to be representing a poor fit. The analysis for both the fixed region and ROI framework also supports this finding.  Figure 6 shows the dendrogram by hierarchical cluster analysis (Ward's method) using all the predictors (standardized to 0 mean and unit variance). A total of five clusters are identified in Figure 6; each of the clusters has more than seven stations (cluster 4 has 23 stations, whereas the other clusters have 15 to 18 stations). The details of the five clusters are provided in Table 7. The median of streamflow record lengths for all the clusters are in the range 36 to 40 years. Cluster 5 contains catchments that are relatively large (area ranging from 454-1010 km 2 with a median value of 835.5 km 2 ). The median values of area for the other clusters are in the range 156-365 km 2 . Looking into design rainfall intensity i.e., I 62 , the median values range 38 mm-59 mm for all the clusters. The highest rainfall intensity is found in case of cluster 3, in the range 76 mm-133 mm. The variable 'SF' is found to be similar for all the five clusters; whereas 'sden' is found to be higher for clusters 1 and 3 and minimum for cluster 4. The variables 'MAR' and 'MAE' are found to be higher for cluster 1, i.e., 1480.2 mm and 1382.7 mm (median), respectively. Cluster 2 shows a relatively higher slope. Finally, fraction forest area is found to be relatively higher for clusters 1 and 2.

Homogeneity Analysis of the Clusters
To investigate the degree of homogeneity of the five clusters, the heterogeneity measure proposed by Hosking and Wallis [15] is applied to each cluster individually. According to Hosking and Wallis [38], any station showing Di ≥ 3 is considered to be discordant. Based on this criterion, no discordant station is found for clusters 1, 2, 4, and 5, yet one discordant station (Di = 3.01) is found for cluster 3. The heterogeneity measure is applied to the five clusters to calculate H-statistics (H1, H2,  and H3). For cluster 3, although the Di value is not very large, the heterogeneity measure is applied

Homogeneity Analysis of the Clusters
To investigate the degree of homogeneity of the five clusters, the heterogeneity measure proposed by Hosking and Wallis [15] is applied to each cluster individually. According to Hosking and Wallis [38], any station showing Di ≥ 3 is considered to be discordant. Based on this criterion, no discordant station is found for clusters 1, 2, 4, and 5, yet one discordant station (Di = 3.01) is found for cluster 3. The heterogeneity measure is applied to the five clusters to calculate H-statistics (H1, H2,  and H3). For cluster 3, although the Di value is not very large, the heterogeneity measure is applied

Homogeneity Analysis of the Clusters
To investigate the degree of homogeneity of the five clusters, the heterogeneity measure proposed by Hosking and Wallis [15] is applied to each cluster individually. According to Hosking and Wallis [38], any station showing D i ≥ 3 is considered to be discordant. Based on this criterion, no discordant station is found for clusters 1, 2, 4, and 5, yet one discordant station (D i = 3.01) is found for cluster 3. The heterogeneity measure is applied to the five clusters to calculate H-statistics (H 1 , H 2 , and H 3 ). For cluster 3, although the D i value is not very large, the heterogeneity measure is applied twice; firstly, with all the discordant station in the cluster and secondly, removing the discordant station.   Table 8 presents the heterogeneity measures for each cluster. It is visible from Table 8 that none of the clusters form homogeneous region. The lowest H-statistics is found for cluster 4; however, as the range is between 1 ≤ H ≤ 2, cluster 4 is 'possibly heterogeneous'. Cluster 3 shows two H-statistics as one discordant station is found for cluster 3 (station 419029). Removal of the discordant station does not improve the result for cluster 3. Although the values of H 2 and H 3 for some clusters are smaller, H 1 is mostly indicative of the heterogeneity in the group, which is much higher than 1.00. It is of interest to check how these heterogeneous clusters perform in regional flood estimation. Hence, QRT is applied to each cluster in the next section with leave-one-out validation to validate the QRT.

Development of Prediction Equation and Performance Testing
For the development of prediction equation, the dependent (flood quantiles) and predictor variables are natural-log transformed (i.e., a log-log modelling is adopted). A stepwise procedure based on their level of significance (p ≤ 0.1) is applied to select the best set of predictor variables. For different clusters, selection of the predictor variables generated different sets of equations because of their different levels of significance. The general regression equation for all clusters is given below for 5% AEP flood (Q 20 ) and the regression coefficients for each variable for each cluster are given in Table 9. The general form of the regression equation for the clusters: ln Q 20 = β 0 + β 1 (ln(A)) + β 2 (ln(I 62 )) + β 3 (ln(SF)) + β 4 (ln(sden))+ β 5 (ln(MAR)) + β 6 (ln(MAE)) + β 7 (ln( f orest)) + β 8 (ln(S1085)), The R 2 and adj_R 2 for each model for all the clusters are found to be quite high except for cluster 5. In case of cluster 1, R 2 and adj_R 2 values are 0.93 and 0.89, respectively, for the selected model; for cluster 2 they are 0.98 and 0.96, respectively; for cluster 3 they are 0.82 and 0.77, respectively; for cluster 4 these are 0.68 and 0.63, respectively, and for cluster 5 these are 0.66 and 0.48, respectively. It is evident that except cluster 5 the other four clusters generate regression models with satisfactory goodness-of-fit. Figures 9 and 10 show the standardized residuals versus the fitted or predicted value plots and normality plots for 5% AEP flood for all the five clusters. It is necessary for the residuals of a linear regression model to satisfy homoscedastic pattern as heteroscedasticity in the residuals indicates that a non-linear model is more appropriate for the data. It is evident from the standardized residual versus the fitted value plots that the residuals lie between −2 and +2 and no specific pattern is visible in the plots. No pattern in the spread of the residuals indicates that the residuals are homoscedastic, which satisfies the linearity model assumption. The Q-Q plots do not completely follow the reference line except for cluster 2, yet there is no specific pattern, which indicates that the normality assumption is not grossly violated.
Water 2020, 12, x FOR PEER REVIEW 13 of 29 which satisfies the linearity model assumption. The Q-Q plots do not completely follow the reference line except for cluster 2, yet there is no specific pattern, which indicates that the normality assumption is not grossly violated.   Figure 10 is set at zero as it indicates an un-biased model. For Figure 12, the expected line is set at one as the ratio being one is indicative of an unbiased model. Figure 11 has a set boundary of −300 to +300, whereas Figure 12 has a set boundary of −3 to 3 which satisfies the linearity model assumption. The Q-Q plots do not completely follow the reference line except for cluster 2, yet there is no specific pattern, which indicates that the normality assumption is not grossly violated.   Figure 10 is set at zero as it indicates an un-biased model. For Figure 12, the expected line is set at one as the ratio being one is indicative of an unbiased model. Figure 11 has a set boundary of −300 to +300, whereas Figure 12 has a set boundary of −3 to 3 to have better visibility.  Figures 11 and 12 show the boxplots of the selected quantiles for all the clusters in terms of RE and Q pred /Q obs ratio values. The expected line in Figure 10 is set at zero as it indicates an un-biased model. For Figure 12, the expected line is set at one as the ratio being one is indicative of an unbiased model. Figure 11 has a set boundary of −300 to +300, whereas Figure 12 has a set boundary of −3 to 3 to have better visibility.   It is seen in Figures 11 and 12 that cluster 5 is the worst performing group. It is clear that most of the predictions are underestimations. The boxplots are not visible for all the quantiles as there are cases with a gross underestimation. The median of 50% AEP flood is far below the expected line in case of both Figures 11 and 12 for cluster 5. Cluster 5 is made of stations having larger area than the other clusters. Poor performance of cluster 5 in QRT indicates that pooling of larger catchments into single group do not represent a viable choice in RFFA. Cluster 2 seems to be the best performing out of all the five clusters. Cluster 4 showing the best H 1 value does not perform as good as cluster 2 in the leave-one-out validation. Figure 13 shows plots of predicted vs observed flood quantiles for all the AEP floods for the five clusters. These plots generally present a good agreement between the predicted and observed flood quantiles. For cluster 1, there are a few cases of over-estimation when the observed flows are in between 100 m 3 /s to 200 m 3 /s and in case of larger observations ranging from 1200 m 3 /s to 1800 m 3 /s, there are some under-estimations by the regression model. For smaller discharges, cluster 2 seems to be performing well, although as the discharge gets larger the prediction by the model gets more erroneous, which is also visible from the boxplots. In case of clusters 3 and 4, the models perform well for smaller discharges; for the larger discharges, the models provide gross under-estimation. Cluster 5 is the worst performing group as seen from Figure 13. Cluster 2 performs the best in case of 5% AEP flood. As 5% AEP is the most frequently adopted flood quantile in design flood estimation, it can be said that regions formed based on small to medium sized area with a small range in other catchment characteristics will generate better prediction than other groups. Looking into the homogeneity analysis for all the five clusters, it can be concluded that homogeneity does not play a vital role in enhancing the prediction accuracy. Tables 9 and 10 show the comparison of mean, median and standard deviation (Std Dev) of absolute relative error (RE abs ) and absolute Q pred /Q obs ratio for all the clusters and the selected AEPs. Tables 10 and 11 again prove the worst performance by cluster 5 with very large values for both RE abs and Q pred /Q obs ratio values. Clusters 3 and 4 also show a mixture of under-and over-estimation. Clusters 1 and 2 seem to show promising results, although in the case of cluster 1, the mean RE abs and mean Q pred /Q obs ratio values for 1% AEP flood are quite high. Cluster 2 shows the lowest values for both the overall REabs and Q pred /Q obs further confirming the better performance of cluster 2.     Table 12 summarizes the evaluation statistics from application of leave-one-out validation with respect to MSE, RMSE, BIAS, RBIAS, RRMSE, RMSNE, RE r , and med_Q pred /Q obs based on observed and predicted flood values for all the five clusters in case of 5% AEP flood. A value close to zero is preferable for MSE as zero indicates no error in prediction. However, from Table 12 it is seen that all the MSE values for the five clusters are very large, in the range of 25,000 to 35,000,000. The smallest MSE is found in case of cluster 2 and the value is 25,309. The range of RMSE for all the clusters fall between 159 m 3 /s to 5800 m 3 /s. Cluster 2 shows the lowest RMSE with a value of 159 m 3 /s proving cluster 2 being the best performing group. Cluster 2 also shows the smallest values in case of RRMSE, RMSNE, BIAS, and RBIAS. Cluster 1 has large value for both BIAS and RBIAS (1480.61 and 388.4, respectively). Clusters 4 and 5 show a large negative BIAS and cluster 5 shows a very large negative RBIAS. The results here are notably higher than those reported in Durocher et al. [67], Chokmani and Ouarda [68], Chebana et al. [47], and Shu and Ouarda [69]. In Rahman et al. [6], independent component regression was adopted to develop flood prediction equations using the same data set as of this study, where error values are similar to this study. It should be noted that the values of MSE, RMSE, and BIAS depend on catchment size, a larger catchment generally has a larger discharge which is likely to result in higher MSE, RMSE, and BIAS.
The RE r and med_Q pred /Q obs both show, cluster 2 has the smallest values. Cluster 5 is the worst performing group with a high RE r and median_Q pred /Q obs (403.44 and −3.03, respectively). Hence, it can be said that group of stations having smaller catchment areas and lower range of other catchment characteristics such as cluster 2 is likely to generate more accurate flood prediction in QRT in the study region.

Comparison with ARR RFFA Model
An assessment of RE r values between ARR RFFA model [32] and PCR KNN15, PCR KNN25, and QRT models for cluster 2 is presented in this section. ARR RFFA model is developed using a Bayesian generalized least square based parameter regression technique to estimate regional flood quantiles using Australian flood data [32]. The RE r values for ARR RFFA model and PCR KNN15, PCR KNN25 and QRT models for cluster 2 are compared in Table 13. It is apparent from Table 13 that, the RE r values for ARR RFFA model (ranging from 56% to 64%) is greater than PCR KNN15, PCR KNN25 and QRT models for cluster 2 (RE r values ranging from 42% to 69%, 42% to 60% and 22% to 37%, respectively). ARR RFFA model is developed with data from 558 stations from NSW, Victoria and Queensland [58] and PCR KNN15, PCR KNN25 and QRT models for cluster 2 are developed for 15, 25 and 16 stations from NSW. This may be a possible reason for these differences in RE values. However, it is promising to see that the RE r values from the PCR KNN15, PCR KNN25, and QRT models for cluster 2 are analogous to the RE r values of ARR RFFA model. From this study, it may be argued that PCR may not be a good choice in RFFA in case of NSW. Moreover, a group of stations with smaller catchment areas such as cluster 2 may generate a better RFFA grouping. Further research with additional catchment characteristics data may enhance the reliability of PCR and cluster analysis based RFFA models in the study region.

Conclusions
A total of 88 stations form NSW, Australia and eight catchment characteristics variables are used in this study to compare regression-based RFFA models. Principal components are derived by applying the principal component analysis on the catchment characteristics data set and a multiple linear regression technique is applied to predict flood quantiles. The first five principal components are selected to be the predictor variables in the regression equations. According to the R 2 and adj_R 2 values of the developed regression equations, it is found that the principal component regression based RFFA models perform quite poorly.
The application of cluster analysis resulted into five clusters from the selected 88 stations. Cluster 1 has the smallest catchment areas and larger rainfall intensity, mean annual rainfall, mean annual evapo-transpiration, shape factor, forest, and stream density values. Stations in cluster 2 have smaller sized catchments along with moderate values for rainfall intensity, shape factor, stream density, and mean annual rainfall, although the mean annual evapo-transpiration is the highest in case of cluster 2. Cluster 3 seems to have all the catchment characteristics uniformly distributed. Cluster 4 is influenced by medium sized catchment area, smaller rainfall intensity, mean annual rainfall, mean annual evapo-transpiration, shape factor, and forest and stream density. The largest values for rainfall intensity, mean annual rainfall, mean annual evapo-transpiration, shape factor, and forest and stream density are found in case of cluster 5, although the other characteristics are quite small with large variance in forest cover. A quantile regression technique is applied to all five clusters with leave-one-out validation. Based on the findings of this study, it can be said that cluster 2 is the best performing group among the selected five clusters. It is also found that a relatively smaller catchment areas and small range of other catchment characteristics in a group is likely to result in more accurate RFFA models in the study region.
It is also found that cluster analysis does not generate any homogeneous groups of catchments for the selected dataset in NSW. Furthermore, the degree of heterogeneity does not have any link with RFFA model performance for the dataset used in this study. The best cluster-based quantile regression model in RFFA is found to be more accurate than the currently recommended ARR RFFA Model in Australia.