Comparing Performance of ANN and SVM Methods for Regional Flood Frequency Analysis in South-East Australia

Abstract: Design flood estimation at ungauged catchments is a challenging task in hydrology. Regional flood frequency analysis (RFFA) is widely used for this purpose. This paper develops artificial intelligence (AI)-based RFFA models (artificial neural networks (ANN) and support vector machine (SVM)) using data from 181 gauged catchments in South-East Australia. Based on independent testing, it is found that the ANN method outperforms the SVM (the relative error values for the ANN model range from 33% to 54%, compared with 37% to 64% for the SVM). The ANN and SVM models generate more accurate flood quantiles for smaller return periods; for higher return periods, both methods show a higher estimation error. The results of this study will help to recommend new AI-based RFFA methods in Australia.


Introduction
Floods are one of the most destructive natural hazards, resulting in billions of dollars of annual damage across the globe [1,2]. Floods damage infrastructure [3,4], transportation systems [5,6], properties, heritage sites and the environment, and cause loss of human life [7,8]. Due to global warming, floods are becoming more frequent and destructive [9,10]. Although intense rainfall and snow melt are the main causes of flooding, environmental degradation [11], land-use change [12] and other anthropogenic factors increase the severity of flooding [13,14]. Many countries are in danger of floods at different scales, with an estimated 1.3 billion people expected to be directly affected by floods by 2050 [15].
Australia has faced many devastating floods in the past, which resulted in thousands of casualties as well as mental and physical losses. Additionally, millions of dollars have been spent on the maintenance and rehabilitation of flood-affected infrastructure and communities across Australia [16]. Subtropical climates, low-lying cities and heavy rainfall put the eastern part of Australia at serious flood risk [17,18].
To reduce flood damage, a risk-based approach is generally adopted in the design of hydraulic structures and for numerous flood management tasks. Here, a design flood (or flood quantile) is defined as a flood discharge with a certain return period (such as the 100-year flood). Flood frequency analysis (FFA) and regional flood frequency analysis (RFFA) are widely used for this purpose for gauged and ungauged catchments, respectively [19]. Most of the previously developed RFFA models are linear in nature, such as the index flood method of Hosking and Wallis (1997) [19,20]. More recently, artificial intelligence (AI)-based RFFA models have been developed.
Using 21 years of data from 47 catchments in Iran, Ghaderi et al. [44] compared the performances of the SVM, ANFIS and GEP models. They also used the M-test and GM-test to find the best ratio of test to training data and the most critical input variables. They reported that the SVM method was the best-performing method in terms of the coefficient of determination (R²) and root mean square error (RMSE). The ANN, SVR and NLR methods were compared by Vafakhah and Khosrobeigi Bozchaloei [28] using a dataset from 33 stations in Iran. They reported that the SVR was the best-performing method for a regional analysis of flood duration curves. Using a dataset from 202 catchments in Australia, Haddad and Rahman [45] compared the performances of 15 combinations of RFFA methods, including Bayesian generalised least squares (BGLSR), multidimensional scaling (MDS) and SVR. They reported that the MDS-based SVR method using a radial basis function (RBF) kernel was the best-performing model in terms of consistency, accuracy of the results and generalisation.
Five different types of ANN methods were used by Kordrostami et al. [46] to estimate design floods in Australia, where they used a dataset from 88 gauging stations. They reported that using fewer predictor variables improved the performance of the ANN method, except when all the eight were used. The performances of some AI-based RFFA methods, including SVR, projection pursuit regression (PPR), boosted regression trees (BRT) and multivariate adaptive regression spline (MARS), were compared by Allahbakhshian-Farsani et al. [26] using data from 54 hydrometric stations in Iran. They used statistical indices such as RMSE and relative root mean square error (RRMSE) to compare the methods and reported that the SVR model based on the RBF kernel outperformed the other methods.
Using a dataset of 37 years from three hydrometric stations, Linh et al. [27] compared the performances of the ANN, MLR and WNN (a variant of the ANN) methods in estimating design floods. They used RMSE, R² and NASH to evaluate the performances of these methods and reported that the WNN method performed better in terms of generalisation capability and accuracy. A dataset from 151 catchments in Canada was used by Desai and Ouarda [47] to compare the performances of different combinations of canonical correlation analysis (CCA) with ANN, random forest regression (RFR), MLR and ANN ensembles. They reported the combination of CCA and RFR to be the best-performing method. In another study, Bozchaloei and Vafakhah [48] used 20 years of data recorded at 33 hydrometric stations to estimate design floods; they compared the performances of the ANN, ANFIS and NLR methods and reported ANFIS to be the best-performing method. Kumar et al. [49] applied the fuzzy inference system (FIS), ANN and L-moment methods to a dataset of 15-29 years from 17 catchments in India. They reported FIS to be the best-performing method, followed by ANN, in terms of accuracy and reliability.
AI-based RFFA models are generally more complex than the simplified RFFA techniques, such as the index flood method [20] and the quantile regression technique [50]. Some of the simplified RFFA techniques use only a few predictor variables, such as the catchment area and mean annual rainfall, and are easier to apply in practice. However, AI-based techniques are becoming more popular as computing power increases, and they often provide more accurate results [51]. Based on the previous studies on AI-based RFFA methods, the SVM and ANN are found to be quite popular in other countries; however, their application in Australia is quite limited. Hence, the objective of this study is to develop and test ANN and SVM-based RFFA methods on South-East Australian catchments. The results of this study will help to recommend more accurate RFFA models for design applications in Australia.

Study Area
South-East Australia (Figure 1) was selected as the study area, since that part of Australia has high-quality streamflow data. The region is highly populated and has been impacted by numerous floods in the past. Natural catchments with at least 30 years of streamflow data were selected for this study.

Data
A total of 181 catchments were selected for this study, as shown in Figure 1. The annual maximum (AM) flood data length of the selected stations ranged from 40 to 89 years (mean: 48 years). The catchment sizes ranged from 3 to 1010 km² (mean: 349 km²). The selected catchments were divided into a training dataset (consisting of 126 catchments) and a testing dataset (consisting of 55 catchments).
A total of eight predictor variables (Table 1) were selected as candidates [19]: the catchment area (AREA), design rainfall intensity with a 6-h duration and 2-year return period (I62), mean annual rainfall (MAR), shape factor (SF), mean annual evapotranspiration (MAE), stream density (SDEN), mainstream slope (S1085) and fraction of forested area (FOREST). It should be noted that all eight predictor variables were included in the developed ANN and SVM-based RFFA models presented in this study.
AREA is the main scaling factor and has been widely used in RFFA [52,53]. The design rainfall intensity is the main input to the flood generation process and has been adopted in many RFFA studies [50,54]. The minimum, maximum, average and median time of concentration values of the selected catchments are 1.15 h, 10.53 h, 6.45 h and 6.67 h, respectively. The duration of the design rainfall is taken as six hours, which is close to the average time of concentration (6.45 h) of the selected catchments. To account for the shape of a catchment in RFFA, Rahman et al. [55] introduced SF, which is defined as the distance between the catchment centroid and outlet divided by AREA. MAE and MAR are surrogates for other characteristics affecting flood generation [56]; these are obtained at the catchment centroid from the Australian Bureau of Meteorology website. SDEN is another important factor in the flood generation process and is calculated by dividing the total stream length within the catchment by AREA. FOREST is directly connected to the loss factor (the amount of water lost through infiltration during a flood event), and it also affects catchment roughness. S1085 is directly connected to the flood response (a higher slope means a higher flow velocity) and is defined by Equation (1) [19]:
$$\mathrm{S1085} = \frac{H_2 - H_1}{0.75\,L} \qquad (1)$$
where H2 and H1 are the elevations at 0.85 and 0.10 of the mainstream length, measured from the catchment outlet, and L is the mainstream length.
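The three derived catchment descriptors defined above (S1085, SF and SDEN) can be illustrated with a minimal Python sketch. The function names, argument names and units (metres for elevation, kilometres for length, square kilometres for area) are assumptions made for this illustration and are not taken from the study.

```python
def s1085(h2_m, h1_m, stream_length_km):
    """Mainstream slope: elevation difference between the points at 0.85
    and 0.10 of the mainstream length, divided by 0.75 of that length."""
    if stream_length_km <= 0:
        raise ValueError("mainstream length must be positive")
    return (h2_m - h1_m) / (0.75 * stream_length_km)


def shape_factor(centroid_to_outlet_km, area_km2):
    """SF as worded in the text: centroid-to-outlet distance divided by AREA."""
    if area_km2 <= 0:
        raise ValueError("catchment area must be positive")
    return centroid_to_outlet_km / area_km2


def stream_density(total_stream_length_km, area_km2):
    """SDEN: total stream length within the catchment divided by AREA."""
    if area_km2 <= 0:
        raise ValueError("catchment area must be positive")
    return total_stream_length_km / area_km2


# Hypothetical catchment, for illustration only.
slope = s1085(h2_m=420.0, h1_m=180.0, stream_length_km=32.0)
sf = shape_factor(centroid_to_outlet_km=12.0, area_km2=300.0)
sden = stream_density(total_stream_length_km=450.0, area_km2=300.0)
```

With these hypothetical inputs, the elevation difference of 240 m is spread over 24 km (0.75 of the 32 km mainstream length).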
Table 1 presents the summary of the selected predictor variables, and Figures 2 and 3 present boxplots and correlation plots of the selected predictor variables.
The dependent variables in this study are the flood quantiles for the 2-year, 5-year, 10-year, 20-year, 50-year and 100-year return periods (Q2, Q5, Q10, Q20, Q50 and Q100, respectively). These are estimated by fitting a log-Pearson Type 3 (LP3) distribution to the AM flood series of each selected station. The parameters of the LP3 distribution were estimated by the Bayesian method. Other distributions, such as the generalised extreme value (GEV) distribution, could have been used, but LP3 was found to be the best-fit probability distribution for South-East Australia in previous FFA studies [50,54]. The impacts of non-stationarity on the FFA results are worth considering [57] but are beyond the scope of this study.
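As a rough illustration of how an LP3 flood quantile is obtained from an AM series, the sketch below fits the distribution by the method of moments on log10 flows and uses the Wilson-Hilferty approximation for the frequency factor. This is a simplification assumed for illustration: the study itself estimated the parameters with a Bayesian method, and the function name and sample flows are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def lp3_quantile(am_flows, return_period_years):
    """Approximate LP3 quantile via method of moments on log10 flows."""
    if len(am_flows) < 3:
        raise ValueError("need at least three annual maxima")
    logs = [math.log10(q) for q in am_flows]
    n = len(logs)
    m = mean(logs)
    s = stdev(logs)
    # Sample skew coefficient of the log-transformed flows.
    g = n * sum((x - m) ** 3 for x in logs) / ((n - 1) * (n - 2) * s ** 3)
    # Standard normal variate for the non-exceedance probability 1 - 1/T.
    z = NormalDist().inv_cdf(1.0 - 1.0 / return_period_years)
    if abs(g) < 1e-6:
        k = z  # zero skew reduces to the log-normal case
    else:
        # Wilson-Hilferty approximation of the Pearson 3 frequency factor.
        k = (2.0 / g) * ((1.0 + g * z / 6.0 - g * g / 36.0) ** 3 - 1.0)
    return 10.0 ** (m + k * s)


# Hypothetical annual maximum series in m3/s, for illustration only.
flows = [120, 85, 310, 42, 190, 75, 260, 150, 98, 500, 66, 230]
quantiles = {t: lp3_quantile(flows, t) for t in (2, 5, 10, 20, 50, 100)}
```

The approximation is adequate for moderate skew only; a production analysis would use exact Pearson Type 3 quantiles and account for parameter uncertainty.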

ANN-Based RFFA Method
ANN is an empirical model capable of predicting flood quantiles using selected predictor variables [58]. The ANN modelling consists of three steps (model training, model testing and model evaluation). The evaluation is carried out using a set of statistical metrics, which compare the flood quantiles predicted by the ANN model with the observed flood quantiles. This study uses a multi-layered feed-forward neural network consisting of an input layer with three to four nodes, a hidden layer and an output layer with one node, as shown in Figure 4 [59,60]. The optimum numbers of hidden nodes and output nodes are selected according to a study conducted by Zhihua et al. [61].
Water 2022, 14, 3323

There are four different types of training algorithms: Levenberg-Marquardt, Bayesian Regularization, Scaled Conjugate Gradient and Multilayer perceptron (MLP). First, the input nodes are filled with the input variables. Hidden nodes are then connected to the input variables, and initial weights are used to assign the synaptic connections between the input and hidden nodes and between the hidden and output nodes. The initial weights are then replaced by random weight values to start the training process. These random values are used to generate normalised values, which are then used as new input nodes linked with the hidden nodes [61]. The sum of the input variables multiplied by the corresponding weights is activated to develop an MLR-type model [62]. The ANN method uses the following equation:
$$z = \sum_{i} w_i x_i \qquad (2)$$
where z is the symbol used in the graphical representation of the ANN shown in Figure 4, w_i is the weight coefficient and x_i is the input or independent variable.
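The weighted-sum step and a single forward pass through a one-hidden-layer feed-forward network can be sketched as follows. This is a minimal illustration, not the network trained in the study: the bias terms, the tanh activation and all numerical values are assumptions.

```python
import math


def weighted_sum(weights, inputs, bias=0.0):
    """Sum of inputs multiplied by their weights (plus an assumed bias)."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass: inputs -> tanh hidden layer -> single linear output."""
    hidden = [
        math.tanh(weighted_sum(w, inputs, b))
        for w, b in zip(hidden_weights, hidden_biases)
    ]
    return weighted_sum(output_weights, hidden, output_bias)


# Three normalised predictors (hypothetical values) and two hidden nodes.
x = [0.4, -0.2, 0.7]
prediction = forward(
    x,
    hidden_weights=[[0.5, -1.0, 0.3], [0.1, 0.8, -0.6]],
    hidden_biases=[0.0, 0.1],
    output_weights=[1.2, -0.7],
    output_bias=0.05,
)
```

Training then consists of adjusting the weights so that the predictions match the observed flood quantiles of the training catchments; that step is outside the scope of this sketch.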

SVM-Based RFFA Method
The SVM method assumes that there is a relationship between the independent (I) and dependent (Q) variables via an additional parameter called noise (N), as shown below:
$$Q = f(I) + N \qquad (3)$$
where the function f is developed based on the available (measured) data. This function can then be used to predict the flood quantile for an ungauged catchment using similar independent variables. Training an SVM model includes data classification and optimisation of an error function. The general process of an SVM method is shown in Figure 5. The method starts with a simple approach of finding the best formula, which simply connects the data, and then calculates the error of the formula. The SVM methods are classified into two types based on two different types of error function.

Error function type 1 is the epsilon-SVM, as shown in Equation (4).
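For orientation, the sketch below shows the textbook epsilon-insensitive loss and the corresponding regularised objective used in epsilon-SVM regression. It is the standard formulation, assumed here for illustration, and is not necessarily identical to the error function written out in the original paper; the names `c` and `epsilon` and all numbers are hypothetical.

```python
def epsilon_insensitive_loss(q_obs, q_pred, epsilon):
    """Errors smaller than epsilon cost nothing; larger errors cost linearly."""
    return max(0.0, abs(q_obs - q_pred) - epsilon)


def svr_objective(weights, observed_predicted_pairs, c, epsilon):
    """Textbook objective: 0.5 * ||w||^2 + C * sum of epsilon-insensitive losses."""
    regulariser = 0.5 * sum(w * w for w in weights)
    data_term = sum(
        epsilon_insensitive_loss(obs, pred, epsilon)
        for obs, pred in observed_predicted_pairs
    )
    return regulariser + c * data_term


# Two hypothetical (observed, predicted) flood quantile pairs in m3/s.
value = svr_objective(
    weights=[3.0, 4.0],
    observed_predicted_pairs=[(100.0, 104.0), (100.0, 110.0)],
    c=2.0,
    epsilon=5.0,
)
```

The parameter C trades off model flatness against tolerance of errors larger than epsilon, which is why both usually need tuning.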

Kernel Functions
Kernels represent the input data points in a higher-dimensional feature space. Linear, polynomial, radial basis function (RBF) and sigmoid kernels are some of the most common kernel types used in SVM methods, where K(X_i, X_j) is the kernel function. The gamma function can be used as a kernel function. The RBFs are a common choice of kernel type due to their localised and finite responses across the full range of the real x-axis.
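The four kernel types named above can be written out in their commonly used forms, as in the following sketch. The parameter names (`gamma`, `coef0`, `degree`) and their default values are assumptions for illustration and may differ from the notation and settings used in the study.

```python
import math


def dot(xi, xj):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(xi, xj))


def linear_kernel(xi, xj):
    return dot(xi, xj)


def polynomial_kernel(xi, xj, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * dot(xi, xj) + coef0) ** degree


def rbf_kernel(xi, xj, gamma=1.0):
    # Depends only on the squared distance, so the response is localised.
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(xi, xj, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(xi, xj) + coef0)


# Two hypothetical normalised predictor vectors.
a, b = [1.0, 2.0], [3.0, 4.0]
k_values = (
    linear_kernel(a, b),
    polynomial_kernel(a, b, degree=2),
    rbf_kernel(a, b, gamma=0.5),
    sigmoid_kernel(a, b, gamma=0.01),
)
```

The RBF kernel equals 1 for identical points and decays towards 0 as points move apart, which is the localised, bounded behaviour mentioned in the text.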

Statistical Metrics Used for Model Evaluation
Based on the results of the model testing on the selected 55 test catchments, the following metrics are used to compare the performances of the developed RFFA models [19]:
$$\text{Abs RE}_i = \left| RE_i \right|$$
Here, Qobs,i is the at-site flood quantile (in m³/s) estimated by fitting the LP3 distribution for each of the selected return periods at site i (i = 1, ..., N), and Qpred,i (in m³/s) is the flood quantile predicted by either the SVM or ANN at site i. N = 55, as there are 55 test catchments.
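A minimal sketch of how such site-wise metrics could be computed is given below. Only the absolute relative error is stated explicitly in the text; the definitions of RE (percentage difference relative to the observed quantile), Qratio (predicted divided by observed) and REr (median of the absolute relative errors) are commonly used forms assumed here for illustration.

```python
from statistics import median


def relative_error_percent(q_obs, q_pred):
    """Assumed definition: RE = 100 * (Qpred - Qobs) / Qobs."""
    return 100.0 * (q_pred - q_obs) / q_obs


def q_ratio(q_obs, q_pred):
    """Assumed definition: Qratio = Qpred / Qobs (1.0 means a perfect match)."""
    return q_pred / q_obs


def evaluate(q_obs_list, q_pred_list):
    """Return per-site RE, Abs RE and Qratio plus the median Abs RE (REr)."""
    re = [relative_error_percent(o, p) for o, p in zip(q_obs_list, q_pred_list)]
    abs_re = [abs(value) for value in re]
    ratios = [q_ratio(o, p) for o, p in zip(q_obs_list, q_pred_list)]
    return {"RE": re, "AbsRE": abs_re, "Qratio": ratios, "REr": median(abs_re)}


# Three hypothetical test catchments (flood quantiles in m3/s).
result = evaluate([100.0, 200.0, 50.0], [110.0, 150.0, 50.0])
```

Under these assumed definitions, a positive RE or a Qratio above 1 indicates overestimation by the model.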

Results
The final predictor variables were selected based on their p-values (the p-value must not exceed 0.10). The final predictor variables used are (i) AREA, I62, MAR and SDEN (for Q2); (ii) AREA, I62 and SDEN (for Q5 and Q10); (iii) AREA, I62, SDEN and MAR (for Q20 and Q50); and (iv) AREA, I62, MAE and MAR (for Q100). Table 2 shows the statistical metrics of the best ANN model for different flood quantiles. As shown in Appendix A (Table A1), the best methods were selected based on the most common evaluation statistics, such as MSE, RMSE, RRMSE, REr and Rbias. Table A1 shows some of the best-performing ANN methods with different algorithms; from these, the best one is presented in Table 2 and used for further investigation. Table A2 shows the different parameters used in developing the SVM methods, and Table A3 presents the best-performing SVM methods with different algorithms. The best-performing SVM methods are selected based on statistical indices, presented in Table 3 and used for further investigation. From Table 2, it can be seen that, for the ANN-based models, Q10 has the smallest Rbias and RMSNE values, whereas Q5 has the smallest REr value. From Table 3, it is found that Q2 has the smallest Rbias value and Q5 has the smallest RMSNE value.
Figure 6 shows boxplots of the Qratio values for the ANN and SVM models. As can be seen in this figure, the ANN model shows some overestimation for Q20, Q50 and Q100, whereas the SVM model shows overestimation for Q20 and Q100. In terms of Qratio, the ANN presents better results for Q5 (with a smaller box width) than the SVM, while the results for Q2 are better for the SVM than for the ANN. For Q10, Q20 and Q100, both models perform very similarly; however, the median values for the SVM seem to be further away from the 1:1 line. The Qratio results for Q50 show that the SVM method performs better than the ANN, with a smaller box width and a median value located near the 1:1 line. Overall, the Q5 model for the ANN is the best model (with the smallest box width), followed by Q10 (ANN), Q2 (SVM), Q5 (SVM), Q10 (SVM) and Q50 (SVM). Both the ANN and SVM for Q100, and the ANN for Q50, show remarkable overestimation.
Figure 7 shows the boxplots of the RE values for the ANN and SVM models. For Q2, the SVM performs better, since it produces a smaller box width with a median value closer to the zero line. The ANN produces better results for Q5, with a smaller box width. For Q10, Q20 and Q100, both models perform similarly; however, the SVM produces better results for Q50. Overall, the SVM shows a better performance, with smaller box widths. In terms of bias, Q5 (ANN), Q10 (ANN) and Q50 (SVM) present the best performances, as their median values are located closer to the zero line. The Q100 models for both the ANN and SVM, and the Q50 (ANN) and Q20 (SVM) models, produce notable overestimations. The best model is found for Q5 (ANN), followed by Q5 (SVM).
Figure 7. RE for the ANN and SVM methods (y-axis represents RE in %).
Figures 8 and 9 show the qualitative comparison of the performance of the ANN and SVM methods based on a classification of the results into three groups (Good, Fair and Poor). These ratings are assigned for different ranges of REr and Qratio values [63]. As seen in Figure 8, catchments with REr values in the range of 0-30% are rated as "Good", catchments with REr values in the range of 31-60% are rated as "Fair" and the remaining catchments, with REr values above 60%, are rated as "Poor". Figure 9 shows the qualitative comparison of the performance of the ANN and SVM models for the different test catchments based on Qratio. In this figure, "Good" is assigned to the test catchments with Qratio values between 0.8 and 1.3, "Fair" is assigned to the test catchments with Qratio values in the ranges of 0.6-0.79 and 1.31-2, and "Poor" is assigned to the remaining test catchments. The ANN method outperforms the SVM method in terms of REr values, because it has a Good-rated performance for more test catchments than the SVM, in particular for smaller return periods. Overall, both the SVM and ANN show a poor performance for Q100.
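The Good/Fair/Poor rating rules described above can be expressed as two small functions. This is a sketch under one assumption: the published ranges are quoted to two decimals (for example 0.6-0.79 and 0.8-1.3), so the boundaries are treated here as continuous thresholds.

```python
def rate_by_abs_re(abs_re_percent):
    """Rate a test catchment by its absolute relative error (in %)."""
    if abs_re_percent <= 30.0:
        return "Good"
    if abs_re_percent <= 60.0:
        return "Fair"
    return "Poor"


def rate_by_q_ratio(ratio):
    """Rate a test catchment by its ratio of predicted to observed quantile."""
    if 0.8 <= ratio <= 1.3:
        return "Good"
    if 0.6 <= ratio <= 2.0:
        return "Fair"
    return "Poor"


def count_ratings(values, rating_function):
    """Count how many catchments fall into each rating class."""
    counts = {"Good": 0, "Fair": 0, "Poor": 0}
    for value in values:
        counts[rating_function(value)] += 1
    return counts


# Hypothetical Qratio values for five test catchments.
summary = count_ratings([0.95, 1.25, 0.7, 1.8, 2.6], rate_by_q_ratio)
```

Counting the ratings per return period and per model gives the kind of summary that the qualitative comparison figures display.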
In terms of the cumulative percentage of stations based on Abs RE values, shown in Figure 10, both models perform very similarly. The ANN method performs slightly better than the SVM for all the return periods except Q2: the curve for the ANN lies above that of the SVM, showing that the ANN method performs better for a greater number of test catchments at lower ranges of Abs RE values.
Figure 11 shows the performance of the ANN and SVM models based on Abs RE values for the 55 test catchments over the geographical space, using Q20 as an example. The ANN model performs better than the SVM, having 19 catchments with Abs RE values less than 25%, while there are only 14 such catchments for the SVM method. There is no spatial pattern in the Abs RE values of the 55 test catchments.
The REr values of the ANN and SVM models can be compared with those of the Australian Rainfall and Runoff (ARR) Regional Flood Frequency Estimation (RFFE) model (57.25-64.06%). It should be noted that the ARR RFFE model was based on 558 stations from eastern Australia. Additionally, the ARR RFFE model used leave-one-out validation (as opposed to the split-sample validation adopted in this study), which generally generates a higher model error, because it is a more rigorous validation method. It should also be noted that the streamflow data lengths of the selected stations are much longer in the present study than in the ARR RFFE model, which has given an advantage to the present study. The ARR RFFE model used only four predictor variables, whereas the ANN and SVM-based RFFA models in the present study used eight predictor variables. The use of a higher number of predictor variables played a role in reducing the REr values associated with the ANN and SVM-based RFFA models; however, these models have a higher bias than the ARR RFFE model [55]. Further investigation is needed to reduce the bias of these AI-based RFFA models.

The results of the present study are compared with those of similar studies. For example, Allahbakhshian-Farsani et al. [26] compared the results of the SVM, MARS, BRT, PPR and NLR methods and reported an RMSE value of 50.70 for their best-performing method (SVM); this value is 50.15 for the ANN-Q2 and 50.17 for the SVM-Q2 models in the present study.
Similarly, Ghaderi et al. [44] reported an RMSE value of 239.94 for their best-performing method (SVM) when comparing it with the ANFIS and GEP methods. Vafakhah and Bozchaloei [28] compared the results of the ANN, SVM and NLR methods and reported an RRMSE of 1.45 for their best-performing method (SVM); the corresponding values were 0.79 and 0.80 for our ANN-Q5 and SVM-Q5 models, respectively. Ouarda and Shu [32] compared the results of the ANN method with an MLR model and reported an RRMSE value of 36.17 and an RMSE value of 27.33 for the ANN method as the best-performing method. Shu and Ouarda [31] used the ANFIS, ANN, NLR and NLR methods and reported RMSE and RRMSE values of 316 and 57, respectively, for their best-performing method. Jingyi and Hall [23] used cluster analysis and ANN methods and reported an RMSE value of 47 for their best-performing method. The above discussion shows that the ANN and SVM models developed in the present study provide results similar to those of the relevant international studies.

Conclusions
In this study, the ANN-based RFFA models are compared with the SVM-based models. The performance of these two models varies with the return period. Overall, based on the median relative error, the ANN outperforms the SVM. The best models are found to be those for Q5 and Q10 with the ANN, giving the smallest median relative error (33-36%). This is notably smaller than that of the ARR-RFFE model (57%). It should be noted that the ARR-RFFE model adopted only four predictors, whereas the ANN and SVM-based models presented here adopted eight predictor variables, which played a role in reducing the prediction error.
For Q100, both the SVM and ARR-RFFE models provide similar relative errors (64%). In terms of bias, both the ANN and SVM models provide significant overestimations for Q100. This highlights that the estimation of floods with higher return periods is challenging, even with artificial intelligence-based models.
A split-sample validation is adopted for comparing different models in the present study; in future studies, a Monte Carlo cross-validation should be adopted where the dataset can be randomly divided into numerous training and testing datasets. Furthermore, hybrid methods could be applied by combining different AI-based methods to reduce the prediction error and bias.
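A Monte Carlo cross-validation of the kind suggested here could generate its repeated random splits as in the sketch below. The split sizes mirror the 126/55 division used in this study; the function name, the number of repeats and the seed are assumptions for illustration.

```python
import random


def monte_carlo_splits(catchment_ids, n_test, n_repeats, seed=0):
    """Yield (training, testing) lists from repeated random partitions."""
    ids = list(catchment_ids)
    if not 0 < n_test < len(ids):
        raise ValueError("n_test must be between 1 and len(ids) - 1")
    rng = random.Random(seed)  # seeded so that the splits are reproducible
    for _ in range(n_repeats):
        shuffled = ids[:]
        rng.shuffle(shuffled)
        yield shuffled[n_test:], shuffled[:n_test]


# 181 catchments, 55 held out for testing in each of 100 repeats.
splits = list(monte_carlo_splits(range(181), n_test=55, n_repeats=100))
```

Each repeat would refit the ANN and SVM models on the training list and evaluate them on the testing list, so that the error statistics are summarised over many partitions instead of a single one.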
It should be noted that the relative accuracy of any RFFA technique depends on the quality and quantity of the streamflow and predictor variable data used to develop and test the technique. For example, a short streamflow record length can introduce a significant sampling error in the flood quantile estimates, which are used as the dependent variable in RFFA. Hence, the RFFA techniques examined in the present study should be re-evaluated when longer streamflow records become available in the study area. Furthermore, the impacts of climate change on RFFA need to be evaluated. The observed bias of the AI-based RFFA models should be subjected to a bias correction similar to that of the ARR RFFE technique; this was not implemented here, as it needs further research.

Conflicts of Interest:
The authors declare that they have no conflicts of interest.