Low-Flow Identification in Flood Frequency Analysis: A Case Study for Eastern Australia

Abstract: Design flood estimation is an essential step in many water engineering design tasks, such as the planning and design of infrastructure to reduce flood damage. Flood frequency analysis (FFA) is widely used in estimating design floods when the at-site flood data length is adequate. One of the problems in FFA with an annual maxima (AM) modeling approach is deciding how to handle smaller discharge values (outliers) in the selected AM flood series at a given station. The objective of this paper is to explore how the practice of censoring (which involves adjusting for smaller discharge values in FFA) affects flood quantile estimates. In this regard, two commonly used probability distributions, log-Pearson type 3 (LP3) and generalized extreme value (GEV), are used. The multiple Grubbs and Beck (MGB) test is used to identify low-flow outliers in the selected AM flood series at 582 Australian stream gauging stations. It is found that censoring is required for 71% of the selected stations when using the MGB test with the LP3 distribution. The differences in flood quantile estimates between LP3 (with MGB test and censoring) and GEV distribution (without censoring) increase as the return period reduces. A modest correlation is found (for South Australian catchments) between the number of censored data points and the selected catchment characteristics (coefficient of determination, R²: 0.43), with statistically significant associations for the mean annual rainfall and catchment shape factor. The findings of this study will be useful to practicing hydrologists in Australia and other countries to estimate design floods using AM flood data by FFA. Moreover, it may assist in updating Australian Rainfall and Runoff (national guide).


Introduction
Floods are among the most dreadful natural disasters, causing destruction, economic losses, property damage, and the deaths of humans and animals [1]. For example, the 2022 floods in Queensland and New South Wales (NSW) have been deemed to be one of the most expensive flood events in Australian history [2]. Climate change impacts the frequency, duration, intensity, and timing of major floods [3,4]. At-site flood frequency analysis (FFA) is the most commonly used method for estimating design floods (when flood record length is adequate), which is required when designing hydraulic infrastructure, conducting flood insurance studies, and carrying out a variety of other water resource management tasks [5]. However, a reliable FFA requires a long period of recorded flood data, which is not always available. As a result, regional flood frequency analysis (RFFA), a data-driven procedure, is adopted for ungauged stations [6]. RFFA enables transferring flood information from gauged to ungauged stations based on regional homogeneity [5,7,8].
Streamflow data preparation is one of the most important steps in any FFA and RFFA, since the accuracy of quantile estimates largely depends on the quality and length of the streamflow data. Furthermore, the selection of an appropriate probability distribution is important in FFA because a poor choice often results in significant error and bias in flood quantile estimates [9]. Many research studies have been conducted to compare various probability distributions in FFA. Several studies in Australia have argued that the log-Pearson type 3 (LP3) and generalized extreme value (GEV) distributions are the best-fit flood frequency distributions [10,11]. Estimating parameters is also an important step in fitting the selected probability distribution to the AM flood data of a given station. Maximum likelihood estimation (MLE) and L-moments are the most robust and widely accepted parameter estimation methods in FFA [11].
Water 2024, 16, 535
A potentially influential point ("outlier") is an observation that deviates significantly from the rest of the observations in the AM data series. This could be due to inconsistencies in data collection and recording, or it could simply be due to natural causes such as drought conditions. The identification of low-flow outliers is a critical step in FFA because their presence in an AM flood series can cause significant issues in fitting a probability distribution and can have a significant impact on the estimation of flood quantiles of higher return periods [12]. The presence of low-flow data points in an AM series can lead to unreliable flood quantile estimates, particularly when logged flow data are used to fit a probability distribution. Several test procedures for identifying low outliers in FFA have been proposed [13,14]. For example, the Grubbs-Beck (GB) test can be used to detect a single outlier at a time in FFA using a one-sided 10% significance level [13,15]. It was recommended for detecting low-flow outliers in FFA in Bulletin 17B, the federal guidelines in the USA [16].
In contrast to the GB test, which is used to detect a single low-flow outlier, the multiple Grubbs and Beck (MGB) test can identify multiple potentially influential low flows (PILFs) in the AM flood series at a given station [17]. Accordingly, the MGB test has been proposed in Bulletin 17C in the USA as a method for evaluating low-flow outliers [18,19]. Rosner [20,21], on the other hand, developed a consecutive two-sided outlier test based on a generalization of the GB test that can detect multiple outliers in the AM flood series, which are abnormally large or small. Similarly, Cohn et al. [17] presented a generalized GB (MGB) test, with significance levels calculated using new approximations, that can detect PILFs in AM flood series and improve the robustness of the FFA. Rahman et al. [22] used FLIKE software to compare the results of the LP3 and GEV distributions for ten Australian stations using two outlier identification tests (GB and MGB), and they found that flood quantile estimates produced from MGB with LP3 were more consistent than estimates derived from GB with LP3.
Blagojević et al. [23] used the GB test on 68 streamflow gauging stations in Serbia to investigate the presence of high- and low-flow outliers in the AM flood series and concluded that outlier detection is important in design flood estimation. Lamontagne et al. [24] proposed an MGB test to identify PILFs in AM flood data from California, and they found that identifying and censoring PILFs made FFA results more robust and precise. Stojković et al. [11] also performed an MGB test to detect low outliers in AM flood data from the Kolubara River Basin and concluded that detecting low outliers improves the estimation of design floods. Using the GB test, Ahmad et al. [3] identified outliers in AM flood data from ten stations in Pakistan's Indus Basin and concluded that censored samples outperformed uncensored ones. Furthermore, Ekeu-Wei et al. [4] used the MGB test to identify outliers in AM flood data from 17 stations in the Ogun-Osun River Basin in Western Nigeria. Jaiswal et al. [25] examined AM flood data from 26 stations in the Mahanadi basin in India's Chhattisgarh state and used the GB test to censor outlier data points from the AM flood data series in FFA.
All the studies summarized in Table 1 show that censoring is important in FFA. However, there is no study on the performance of censoring at a regional/country level. Previous studies have used only a handful of stations and hence the merits of censoring could not be generalized. To fill this knowledge gap, the objective of this paper is to investigate the impacts of censoring of PILFs in AM flood series using the MGB test for a large number of Australian stations and to compare the performances of the two most commonly used probability distributions in FFA (GEV and LP3) with and without censoring. The rest of the paper is laid out as follows: Section 2 presents the study area and data, and the adopted methodology is presented in Section 3. The results and discussion are provided in Section 4. Finally, the conclusions are presented in Section 5.

Study Area and Data
We selected eastern Australia for this study since it has a high density of gauging stations with long AM flood data. A total of 176 stream gauging stations from New South Wales (NSW), 195 from Queensland (QLD), 183 from Victoria (VIC), and 28 from South Australia (SA) (total 582 stations) were selected. These stations were used in Project 5 of the Australian Rainfall and Runoff (ARR) upgrade project using AM flood data until 2011 [6]; however, we updated the AM flood data until 2018 for these stations as part of this study. Figure 1 illustrates the study region and geographical distribution of the selected stations. Figure 2a depicts the distribution of the AM flood record lengths of the selected 582 stations. The average record length of the AM flood data for the selected 582 stations is 44 years, ranging from 20 to 109 years. Among these stations, 86% have a record length of at least 30 years, and only 13% of stations have a record length greater than 55 years. It should be noted that a 20-year record length is too small to carry out meaningful FFA, and, hence, caution should be exercised with the results of the stations (used in this study) having record lengths in the range of 20 to 29 years. Figure 2b depicts the distribution of catchment areas for the selected stations. The selected stations have an average catchment area of 289 km², ranging from 1 km² to 1036 km². Among the stations, 79% are smaller than 500 km² and 21% are larger than 500 km².

Methodology
In this study, special consideration is given to the challenges posed by the presence of PILFs in AM flood data. Therefore, the MGB test is selected for its ability to identify multiple PILFs in the AM flood series to overcome the limitation of the original GB test, which can detect only one outlier in a given station's AM flood series at a time [15].
GEV and LP3 distributions are selected as candidate distributions for FFA because previous studies found that these two distributions are better suited to Australian AM flood data [10]. The three parameters (location, scale, and shape) of the GEV and LP3 distributions are estimated for each station using L-moments for the GEV distribution (due to its robustness and efficiency in estimating distributional parameters) and the Bayesian parameter estimation method for the LP3 distribution.

Log-Pearson Type III Distribution
The LP3 distribution is described by the following equation [27]:
$$\ln Q_T = M + K_T S$$
where $Q_T$ is the flood quantile having an annual exceedance probability (AEP) of 1 in T; $M$ is the mean of the natural logarithms of the AM flood series; $S$ is the standard deviation of the natural logarithms of the AM flood series; and $K_T$ is the frequency factor for the LP3 distribution for an AEP of 1 in T, which is a function of the AEP and the skewness of the natural logarithms of the AM flood series [28].
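The quantile relation above can be illustrated with a minimal Python sketch. It assumes a method-of-moments fit to the log flows and the Wilson-Hilferty approximation for the frequency factor; neither is the Bayesian fitting used in this study, and the function names (`sample_skew`, `frequency_factor`, `lp3_quantile`) are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def sample_skew(values):
    """Sample skewness coefficient (needs at least three values)."""
    n = len(values)
    m = mean(values)
    s = stdev(values)
    return n * sum((v - m) ** 3 for v in values) / ((n - 1) * (n - 2) * s ** 3)


def frequency_factor(skew, T):
    """Approximate K_T via the Wilson-Hilferty transformation (an assumption)."""
    z = NormalDist().inv_cdf(1.0 - 1.0 / T)  # standard normal quantile, T > 1
    if abs(skew) < 1e-6:
        return z  # zero skew: LP3 reduces to the log-normal case
    k = skew / 6.0
    return (2.0 / skew) * ((1.0 + k * z - k * k) ** 3 - 1.0)


def lp3_quantile(flows, T):
    """Q_T from ln(Q_T) = M + K_T * S, using moments of the natural-log flows."""
    logs = [math.log(q) for q in flows]  # flows must be strictly positive
    M, S, g = mean(logs), stdev(logs), sample_skew(logs)
    return math.exp(M + frequency_factor(g, T) * S)
```

The Wilson-Hilferty form is only an approximation and loses accuracy for large absolute skew, so tabulated or exact Pearson type 3 frequency factors would be preferable in practice.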

Generalized Extreme Value Distribution
The GEV distribution is described by the following equation [29]:
$$Q_T = \mu + \frac{\alpha}{\kappa}\left[1 - \left(-\ln\left(1 - \frac{1}{T}\right)\right)^{\kappa}\right]$$
where $Q_T$ is the flood quantile having an AEP of 1 in T; $\mu$ is the location parameter; $\alpha$ is the scale parameter; and $\kappa$ is the shape parameter [30]. The at-site FFA for each of the selected stations was conducted using FLIKE software [31]. Flood quantiles were estimated for AEPs of 1%, 2%, 5%, 10%, 20%, and 50% (Q100, Q50, Q20, Q10, Q5, and Q2, respectively).
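As a small illustration of the GEV quantile function, the sketch below evaluates $Q_T$ for given parameters. It assumes the sign convention in which a positive shape parameter gives an upper-bounded distribution; the function name `gev_quantile` and the parameter values in the usage lines are hypothetical, not values from this study.

```python
import math


def gev_quantile(mu, alpha, kappa, T):
    """Flood quantile with an AEP of 1 in T for a GEV(mu, alpha, kappa)."""
    y = -math.log(1.0 - 1.0 / T)  # requires T > 1
    if abs(kappa) < 1e-9:
        # Limiting case kappa -> 0 is the Gumbel distribution
        return mu - alpha * math.log(y)
    return mu + (alpha / kappa) * (1.0 - y ** kappa)


# Illustrative parameters only
for T in (2, 5, 10, 20, 50, 100):
    print(T, round(gev_quantile(100.0, 30.0, 0.1, T), 1))
```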

Parameter Estimation Method
In estimating the three parameters (location, scale, and shape) of the GEV and LP3 distributions, L-moments were used for the GEV distribution and Bayesian parameter estimation for the LP3 distribution. The resulting parameters were then used to estimate the selected flood quantiles. Flood quantiles were estimated for various AEPs, and the results were carefully examined to understand the likelihood of extreme events occurring for different return periods at a given station.
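The L-moments step for the GEV distribution can be sketched as follows. This is a minimal illustration that assumes unbiased probability-weighted moments and Hosking's rational approximation for the shape parameter; it is not the FLIKE implementation, and the function names are hypothetical.

```python
import math


def sample_l_moments(data):
    """Return (l1, l2, t3): first two sample L-moments and L-skewness."""
    x = sorted(data)
    n = len(x)  # needs n >= 3
    # Unbiased probability-weighted moments b0, b1, b2 (i is zero-based rank)
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    l1 = b0
    l2 = 2.0 * b1 - b0
    l3 = 6.0 * b2 - 6.0 * b1 + b0
    return l1, l2, l3 / l2


def gev_from_l_moments(l1, l2, t3):
    """GEV (mu, alpha, kappa) from L-moments via Hosking's approximation."""
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    kappa = 7.8590 * c + 2.9554 * c * c  # undefined below if kappa is exactly 0
    g = math.gamma(1.0 + kappa)
    alpha = l2 * kappa / ((1.0 - 2.0 ** (-kappa)) * g)
    mu = l1 - alpha * (1.0 - g) / kappa
    return mu, alpha, kappa
```

The returned parameters follow the same sign convention as the quantile equation in the previous subsection, so they can be substituted directly into it.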

Multiple Grubbs-Beck Test
The GB and MGB test details are given by Cohn et al. [17] and Stedinger [32]. The original GB test, which was recommended in Bulletin 17B [16], describes a low outlier threshold by the following equation:
$$X_{crit} = \mu - K_n \sigma$$
where $K_n$ is a one-sided, 10% significance-level critical value for an independent sample of n normal variates, and $\mu$ and $\sigma$ are the sample mean and standard deviation of the whole data set [19]. Any observation less than $X_{crit}$ is labelled as a "low outlier" [16,17]. Bulletin 17B [16] uses a conditional probability adjustment to remove low outliers from AM flood data and adjust the frequency curve [16,17]. $K_n$ values are tabulated in Section A4 of the Interagency Advisory Committee on Water Data (IACWD) [16], based on Table A1 in Grubbs and Beck [13]. Stedinger et al. [32] present an approximation for 5 ≤ n ≤ 150:
$$K_n \approx -0.9043 + 3.345\sqrt{\log_{10} n} - 0.4046\,\log_{10} n$$
The original GB test can detect only one outlier/PILF in the AM flood series of a given station at a time [15]. However, an AM flood series may contain more than one low outlier.
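The single-outlier GB threshold can be sketched in a few lines of Python. The sketch assumes the test is applied to the natural logarithms of strictly positive flows and that the threshold is transformed back to flow units; the function names are hypothetical, and the conditional probability adjustment of Bulletin 17B is not shown.

```python
import math
from statistics import mean, stdev


def gb_critical_value(n):
    """Approximate one-sided 10% critical value K_n, valid for 5 <= n <= 150."""
    if not 5 <= n <= 150:
        raise ValueError("approximation is only stated for 5 <= n <= 150")
    lg = math.log10(n)
    return -0.9043 + 3.345 * math.sqrt(lg) - 0.4046 * lg


def gb_low_outlier_threshold(flows):
    """Low-outlier threshold in flow units, computed from the log flows."""
    logs = [math.log(q) for q in flows]  # assumes all flows > 0
    x_crit = mean(logs) - gb_critical_value(len(logs)) * stdev(logs)
    return math.exp(x_crit)


# Illustrative series with one very small annual maximum
flows = [1, 50, 60, 70, 80, 90, 100, 110]
threshold = gb_low_outlier_threshold(flows)
low_outliers = [q for q in flows if q < threshold]
```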
To overcome this limitation, Rosner [20,21] developed a generalised extreme Studentised deviate (ESD) test to detect one or more outliers in a data set, with the test statistic defined as:
$$T_1 = \max_i \frac{\left|X_i - \bar{X}\right|}{\sigma}$$
where $\bar{X}$ and $\sigma$ denote the sample mean and standard deviation of the whole set of observations [20,21]. The test removes the observation that maximises $|X_i - \bar{X}|$ and recomputes the statistic ($T_2$) using the remaining n − 1 observations [20,21]. It then repeats this process until r data points have been removed (where r is the number of outliers). Cohn et al. [17] developed a generalization of the GB test, called the MGB test, to identify multiple PILFs. To assess whether the k smallest logarithms of the AM series, $\{X_{[1:n]}, X_{[2:n]}, \ldots, X_{[k:n]}\}$, are consistent with a normal distribution and with the other observations in the sample, Cohn et al. [17] examine a statistic $\omega_{[k:n]}$, which is defined by the following equation:
$$\omega_{[k:n]} = \frac{X_{[k:n]} - \hat{\mu}_k}{\hat{\sigma}_k}$$
where $X_{[k:n]}$ represents the k-th smallest observation in the flood sample, while $\hat{\mu}_k$ is the partial mean and $\hat{\sigma}_k^2$ is the partial variance, both calculated using only the observations greater than $X_{[k:n]}$ [17].
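The statistic $\omega_{[k:n]}$ can be computed as in the minimal sketch below. It assumes natural logarithms of strictly positive flows and scans k up to half the sample, which is an illustrative choice. The significance levels (p-values) that the MGB test attaches to each statistic are not computed here, so the sketch only ranks candidate PILFs and does not make the censoring decision. The function name is hypothetical.

```python
import math
from statistics import mean, stdev


def mgb_statistics(flows, max_fraction=0.5):
    """Return [(k, omega_k)] for the k smallest log flows, k = 1, 2, ..."""
    x = sorted(math.log(q) for q in flows)  # assumes all flows > 0
    n = len(x)
    results = []
    for k in range(1, int(n * max_fraction) + 1):
        upper = x[k:]  # observations ranked above the k-th smallest
        if len(upper) < 2:
            break
        mu_k = mean(upper)      # partial mean
        sigma_k = stdev(upper)  # partial standard deviation (n - k - 1 divisor)
        results.append((k, (x[k - 1] - mu_k) / sigma_k))
    return results
```

Strongly negative values indicate that the k-th smallest observation lies far below the rest of the sample; tied values and a zero partial standard deviation are not handled in this sketch.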
The Multiple Grubbs-Beck (MGB) test was employed to identify PILFs in the AM flood series. Statistical measures such as the mean and standard deviation were calculated to assess the robustness of the detected outliers.

Comparison of Results
To further assess the differences between GEV and LP3 distributions (with and without censoring), the absolute percentage difference between the quantiles estimated by the two distributions was calculated using the following equation: where $X_1$ represents the flood quantile from the GEV distribution at a specified AEP, and $X_2$ represents the flood quantile from the LP3 distribution (with or without censoring) at the same AEP. This measure represents the relative variation between the two distributions as a percentage at different AEPs, focusing on the effect of censoring low flows in FFA. A higher value indicates a larger difference in the estimated quantiles, while a lower value suggests a closer agreement between the two distributions. The analysis was performed for various AEPs, including 1%, 2%, 5%, 10%, 20%, and 50%, providing a comprehensive understanding of the difference between the GEV and LP3 distributions in representing extreme events in the AM flood series with and without censoring.
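A percentage difference of this kind can be sketched as below. The normalisation is an assumption: the sketch divides by the GEV quantile ($X_1$) and takes LP3 minus GEV as the signed difference, which may differ from the exact form used in this study. The function name and the numbers in the usage lines are hypothetical.

```python
def percent_difference(gev_q, lp3_q, absolute=True):
    """Percentage difference of the LP3 quantile relative to the GEV quantile."""
    diff = (lp3_q - gev_q) / gev_q * 100.0  # assumed: GEV quantile as reference
    return abs(diff) if absolute else diff


# Illustrative values: GEV quantile of 200 m3/s and LP3 quantile of 150 m3/s
print(percent_difference(200.0, 150.0))                  # 25.0
print(percent_difference(200.0, 150.0, absolute=False))  # -25.0
```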

Regression Analysis
A general linear regression model is given by the following equation:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \varepsilon$$
where $Y$ is the number of censored data points (dependent variable); $\beta_0$ is the intercept; $\beta_1, \beta_2, \beta_3, \ldots$ are the regression coefficients for the corresponding catchment characteristics; $X_1, X_2, X_3, \ldots$ are the catchment characteristic variables; and $\varepsilon$ denotes the error term.
T-statistics and p-values are evaluated at a significance level of 0.05. A p-value less than 0.05 indicates a statistically significant association. Multiple R, the coefficient of determination (R²), and adjusted R² are calculated to assess the strength of the linear relationship.
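The three goodness-of-fit measures are related as in the following sketch. It assumes an ordinary least-squares fit with an intercept, where n is the number of stations and p the number of predictors; the function names and the small data set used for checking are hypothetical.

```python
import math


def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SSE / SST."""
    y_bar = sum(observed) / len(observed)
    sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    sst = sum((y - y_bar) ** 2 for y in observed)
    return 1.0 - sse / sst


def multiple_r(r2):
    """Multiple correlation coefficient for a fit with an intercept."""
    return math.sqrt(r2)


def adjusted_r_squared(r2, n, p):
    """R-squared penalised for the number of predictors p, given n samples."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Example: an R-squared of 0.43 corresponds to a multiple R of about 0.66
print(round(multiple_r(0.43), 2))
```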

Identification of Potentially Influential Low Flows (PILFs)
For NSW, it was found that 58 out of 176 stations did not require any censoring, while for 30 stations, more than 40% of the AM flood data points were PILFs that needed censoring. For VIC, 51 out of 183 stations did not require any censoring, while 40 stations required censoring for more than 40% of AM flood data points. For SA, it was found that 4 out of 28 stations did not require any censoring, while 5 stations required censoring for more than 40% of AM flood data points. For QLD, 54 out of 195 stations did not require censoring, while 35 stations required censoring for more than 40% of AM flood data points.
Figure 3 shows that 29% (167 out of 582) of stations do not require any censoring. For 88 stations, the MGB test detects between 1% and 10% of the AM flood data points as PILFs; for 69 stations, between 11% and 20%; for 78 stations, between 21% and 30%; for 78 stations, between 31% and 40%; and for 102 stations, at least 41%. Overall, 71% of the stations contain PILFs. Rahman et al. [22] note that the MGB test detects 40% to 50% of the AM flood series as PILFs at 10 stations in Eastern Australia, which has a temperate climate. By comparison, in the USA, Barth et al. [27] report that up to 50% of the peak flows are identified as PILFs using the MGB test in AM flood series from 624 stations on the central and southern coast of California, which is mainly due to precipitation from cold season rainfall. Paretti et al. [32] examine outliers in Arizona (in the Western United States) using the MGB test and identify around 12% of all the AM flood data points as PILFs, which is most likely due to the fact that Arizona is an arid, drought-prone state in the USA.

Figure 4 shows the fitting of the GEV distribution to the AM flood data of station 226222 from VIC; Figure 5 shows the LP3 distribution fitted to station 226222 without censoring; and Figure 6 shows the LP3 distribution fitted with 13 censored data points. The distinction between these three graphs is striking. Fitting the LP3 distribution (with censoring) and the GEV distribution (without censoring) gives a much better result than fitting the LP3 distribution without censoring, where many points lie far from the fitted line. The variation in the shape of the graph is obvious, and the results are similar to those reported by Rahman et al. [22], who concluded that the use of the MGB test provides a better fit for the LP3 distribution to the AM flood data series of ten stations in Eastern Australia. Also, Plavsic et al. [33] found that the LP3 distribution is very sensitive to the presence of low outliers, and using the MGB test improves the distributional fitting.
Figure 7 shows the location of the 415 stations that require censoring, while Figure 8 shows the percentage of AM flood data points that require censoring. The MGB test detects 1.2% to 51.7% of the AM flood data points as PILFs for these 415 stations. It can be seen that stations in north-western Victoria require more data points to be censored in fitting the LP3 distribution. Figure 9 depicts the location of the 167 stations that do not require censoring. It was found that 64 NSW stations, 75 QLD stations, 13 SA stations, and 76 VIC stations have more than 25% of their AM records censored.
Figures S1 to S6 (see the Supplementary Materials) show how the censoring of PILFs affects the fitting of the LP3 distribution to the AM flood data for station 215004 from NSW. They show the difference in fitting the LP3 distribution when censoring 0, 5, 13, 21, 23, and 31 data points, respectively. The application of the MGB test identifies, in total, 31 PILFs, and the fitting of the LP3 distribution appears to be best in the last graph (Figure S6, where all 31 points are censored) compared to the first one (Figure S1, where no points are censored). Similar to Lamontagne et al. [24], the censoring of PILFs using the MGB test improves the fitting of the LP3 distribution to the AM flood data series at a given station.
After the application of the censoring procedure with the LP3 distribution, Figure 10 displays a new distribution of data lengths (after censoring), which shows that 48 stations have censored data lengths less than 20 years (and 534 stations have a censored data length of 20 years or higher).As a 20-year data length is too small to carry out meaningful FFA, these 48 stations were excluded from further analysis.However, there are 165 stations that have a censored data length in the range of 20 to 29 years.Generally, a 30-year data length is considered to be adequate for a meaningful FFA; hence, caution should be exercised when interpreting the FFA results of these stations (having a data length smaller than 30 years).As a 20-year data length is too small to carry out meaningful FF these 48 stations were excluded from further analysis.However, there are 165 stations th have a censored data length in the range of 20 to 29 years.Generally, a 30-year data leng is considered to be adequate for a meaningful FFA; hence, caution should be exercis when interpreting the FFA results of these stations (having a data length smaller than years).Figure 11 shows a boxplot displaying the variations in flood quantiles for 1% A using the GEV distribution without censoring and the LP3 distribution with the MGB t and censoring.The plot shows that 50% of the 1% AEP quantiles vary between a minimu of −4% and a maximum of 22% for almost all the states.Furthermore, in NSW, for examp 75% of the variations are less than 55% and 50% are greater than 8% (median value).R garding the upper and lower whisker values, it is found that the variation of flood qua tiles for the 1% AEP flood quantiles falls in the range of −45% to 72% in QLD, −42% to 79 in VIC, −60% to 84% in NSW, and −18% to 30% in SA.However, except for a small numb of stations, there is no significant difference in the 1% AEP flood quantiles between t GEV and LP3 distributions.These findings support the results of Rahman et al. 
[22], w demonstrated that in most of their 10 study stations, the GEV distribution and LP3 dist bution (with censoring) produce very similar results.Figure 11 shows a boxplot displaying the variations in flood quantiles for 1% AEP using the GEV distribution without censoring and the LP3 distribution with the MGB test and censoring.The plot shows that 50% of the 1% AEP quantiles vary between a minimum of −4% and a maximum of 22% for almost all the states.Furthermore, in NSW, for example, 75% of the variations are less than 55% and 50% are greater than 8% (median value).Regarding the upper and lower whisker values, it is found that the variation of flood quantiles for the 1% AEP flood quantiles falls in the range of −45% to 72% in QLD, −42% to 79% in VIC, −60% to 84% in NSW, and −18% to 30% in SA.However, except for a small number of stations, there is no significant difference in the 1% AEP flood quantiles between the GEV and LP3 distributions.These findings support the results of Rahman et al. 
[22], who demonstrated that at most of their 10 study stations, the GEV distribution and the LP3 distribution (with censoring) produce very similar results. Table 2 shows the variation between the absolute values of the flood quantiles for 1% AEP estimated by different distributions for QLD, VIC, NSW and SA. The variation of the absolute values of the flood quantiles for 1% AEP is larger when there is no censoring. Figure 12 depicts the percentage variation in flood quantiles (Q10, Q20, Q50, and Q100) estimated using two methods: LP3 with censoring and GEV without censoring. The variation for Q10 ranges between −2.91% and 51.09%, and for eight stations, the quantiles estimated by GEV were greater than those estimated by LP3. The variation for Q20 ranges between −7.29% and 60.71%, and for 12 stations, the quantiles estimated by GEV were greater than those estimated by LP3. The variation for Q50 ranges between −30.21% and 77.79%, and for 103 stations, the quantiles estimated by GEV were greater than those estimated by LP3. The variation for Q100 ranges between −59.79% and 87.21%, and for 178 stations, the quantiles estimated by GEV were greater than those estimated by LP3. The number of stations with a negative difference (GEV quantile greater than the LP3 quantile) thus increases across the 10%, 5%, 2%, and 1% AEPs, while the number of stations with a positive difference decreases across the same AEPs. For comparison, in Rahman et al. (2014), the variations of the Q100 flood quantiles for 10 stations in Eastern Australia, estimated by two methods, LP3 with the MGB test and GEV with L-moments, range between −18.19% and 38.01%. Figure 12 shows that the Q100 variations for 87% of the stations (465/534) fall within this range. These findings highlight the similarity in flood quantile estimates between the LP3 and GEV distributions.

Figure 13 depicts the absolute variation of flood quantiles for various AEPs ranging from Q1 to Q10000 between GEV with L-moments and LP3, with and without censoring. The comparison was performed on the stations that required censoring. As shown, the quantile variations are larger when there is no censoring. Thus, censoring improves the fit of the LP3 distribution and enhances the accuracy of flood quantiles, particularly at higher AEPs.
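The percentage variation discussed above can be illustrated with a short Python sketch. The text does not state the exact formula, so the sign convention (LP3 minus GEV, divided by the GEV estimate, so that a negative value means the GEV quantile is larger) is an assumption chosen to agree with the reported signs. The function names and the four station values are hypothetical and serve only as an illustration.

```python
def percent_variation(q_lp3, q_gev):
    """Signed variation in percent; negative when the GEV quantile is larger.

    Assumption: the GEV estimate is used as the reference (denominator).
    """
    return 100.0 * (q_lp3 - q_gev) / q_gev


def summarise(pairs, low=-18.19, high=38.01):
    """Count stations where GEV exceeds LP3 and stations inside [low, high].

    The default bounds are the Q100 range quoted from Rahman et al. (2014).
    """
    variations = [percent_variation(lp3, gev) for lp3, gev in pairs]
    return {
        "min": min(variations),
        "max": max(variations),
        "n_gev_larger": sum(1 for v in variations if v < 0),
        "n_within_range": sum(1 for v in variations if low <= v <= high),
    }


# Hypothetical (LP3, GEV) Q100 estimates in m^3/s for four made-up stations.
example_pairs = [(120.0, 100.0), (95.0, 100.0), (150.0, 100.0), (60.0, 100.0)]
summary = summarise(example_pairs)
print(summary)

# Share reported in the text: 465 of 534 stations fall inside the range.
share_within = 100.0 * 465 / 534
print(f"{share_within:.1f}% of stations within the reference range")
```

Applied to the actual station estimates, the same summary would give the per-quantile ranges and station counts reported for Figure 12, provided the assumed sign convention matches the one used in the study.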

Relationship between Number of Censored Data Points and Catchment Characteristics
Regression analysis was conducted to investigate the relationship between the number of censored data points in the AM flood series and catchment characteristics such as base-flow index (BFI) (volume factor and peak factor), mean annual rainfall (MAR), and the catchment shape factor (SF). Only South Australian (SA) stations were used for this analysis. For the 28 SA stations, the catchment characteristics data were obtained from the ARR data hub. The regression results in Table 3 show that MAR and SF are statistically significant, with p-values less than 0.05, indicating a possible association with the number of censored data points. The volume and peak baseflow factors, on the other hand, were not statistically significant, as their p-values exceeded the 5% significance level. The multiple correlation coefficient (multiple R) of 0.66 suggests a reasonable linear relationship (Table 3). Nevertheless, the coefficient of determination (R²), representing the goodness of fit, indicates a modest overall fit of 0.43. This implies that the examined catchment characteristics explain only a limited portion of the variability in the number of censored points in the AM flood series of the SA stations examined. While the initial hypothesis focused on catchment characteristics, the results indicate only a modest correlation; hence, we acknowledge the importance of exploring alternative hypotheses to better understand the origin of censored data in the AM flood series. It is crucial to identify other potential factors contributing to the presence of censored data. Future investigations should consider larger datasets and additional catchment characteristics, including those from other states, to provide a more comprehensive understanding of this phenomenon.
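The relationship between multiple R and R² in a regression of this kind can be sketched in plain Python. This is a minimal illustration, not the procedure used to produce Table 3: the helper names, the two predictors, and all data values are invented, and p-values are not computed.

```python
import math


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Ordinary least squares with an intercept.

    Returns (coefficients with intercept first, R^2, multiple R).
    """
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(xi[a] * xi[b] for xi in x) for b in range(k)] for a in range(k)]
    xty = [sum(xi[a] * yi for xi, yi in zip(x, y)) for a in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, xi)) for xi in x]
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    r2 = max(0.0, 1.0 - ss_res / ss_tot)
    return beta, r2, math.sqrt(r2)


# Invented data: (MAR in mm, shape factor) and number of censored points.
predictors = [(400, 0.5), (450, 0.8), (500, 0.6), (550, 0.9), (600, 0.7), (650, 1.0)]
censored_counts = [12, 9, 10, 6, 7, 3]
beta, r2, multiple_r = fit_ols(predictors, censored_counts)
print(f"R^2 = {r2:.2f}, multiple R = {multiple_r:.2f}")
```

Because multiple R is the square root of R², the two values reported in the text are consistent with each other: the square root of 0.43 rounds to 0.66.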

Conclusions
This study focused on the treatment of low-flow outliers in flood frequency analysis (FFA). We employed the log-Pearson type 3 (LP3) and generalized extreme value (GEV) distributions to carry out FFA using FLIKE software. Of the 582 stations, 71% required low-flow censoring when applying the LP3 distribution in FFA. This study shows that the multiple Grubbs and Beck (MGB) test is effective in identifying low-flow outliers in Australian annual maximum (AM) flood data. In many cases, about one third of the AM flood data points at a given station need censoring to fit the LP3 distribution. North-western Victoria required more censoring than other parts of the study area. This region is drier than the other parts, which means that the AM flood series contain many low AM flow data points due to frequent droughts. When the GEV distribution is fitted with L-moments, censoring may not be needed, as the L-moments method is less affected by the presence of outliers than ordinary product moments. Censoring is generally needed for the LP3 and log-normal distributions, since these two distributions require the log-transformation of the AM flood data series before fitting a distribution. It should be noted that log-transformation gives higher weight to data points representing smaller peak discharges in the AM flood series. The record length of a station after censoring should not be too small, to ensure that the FFA results are not unduly affected by sampling errors. However, censoring does not mean discarding the low-flow outliers in fitting a probability distribution to AM flood data. The impact of censoring was more pronounced for smaller return periods than for larger ones, as expected. For South Australia, the number of censored low-flow data points in the AM flood series was found to have a modest correlation with catchment characteristics (base-flow index, mean annual rainfall, and catchment shape factor). Given that the correlation was modest, additional research is needed to confirm this finding using data from all Australian states and including additional catchment characteristics that could affect censoring. The findings of this study may be useful in updating the FFA methods recommended in Australian Rainfall and Runoff (national guide) [34].
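The remark that log-transformation gives higher weight to small peak discharges can be illustrated with a short Python sketch. The AM series below is invented, and the function name is hypothetical. Dropping the smallest value is only a shortcut to expose the sensitivity of the sample skewness; it is not the censoring procedure, which retains the information that a value fell below a threshold.

```python
import math


def sample_skewness(values):
    """Bias-adjusted sample skewness: n * sum(d^3) / ((n-1)(n-2) s^3)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return n * sum((v - mean) ** 3 for v in values) / ((n - 1) * (n - 2) * s ** 3)


# Invented AM flood series (m^3/s) with one very small annual maximum.
am_flows = [1.5, 180.0, 220.0, 260.0, 310.0, 350.0, 420.0, 500.0, 640.0, 900.0]
without_low = am_flows[1:]  # same series with the smallest value left out

skew_raw_all = sample_skewness(am_flows)
skew_raw_trim = sample_skewness(without_low)
skew_log_all = sample_skewness([math.log10(q) for q in am_flows])
skew_log_trim = sample_skewness([math.log10(q) for q in without_low])

print(f"raw flows: skew {skew_raw_all:.2f} -> {skew_raw_trim:.2f}")
print(f"log flows: skew {skew_log_all:.2f} -> {skew_log_trim:.2f}")
```

In this made-up series, the single small value drives the skewness of the log-transformed flows strongly negative, while the skewness of the untransformed flows changes far less. This is the kind of leverage that makes low-flow treatment matter for distributions fitted in log space, such as LP3.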

Figure 1. Geographical distribution of the selected 582 stream gauging stations in eastern Australia.

Figure 2. Distribution of record length of the AM flood data and catchment areas of the selected 582 stations.

Figure 3. Number of stations that required censoring among the 582 stations in this study.

Figure 4. Fitting of GEV distribution to station 226222 with L-moments.

Figure 7. Location of stations that required censoring with LP3 distribution (415 stations).

Figure 8. Percentage of AM flood data points that needed censoring.

Figure 9. Location of annual maxima (AM) flood data stations that do not need censoring (167 stations).

Figures S1-S6 (see the Supplementary Materials) show how the censoring of PILFs affects the fitting of the LP3 distribution to the AM flood data for station 215004 from NSW. They show the difference in fitting the LP3 distribution by censoring 0, 5, 13, 21, 23, and 31 data points, respectively. The MGB test identifies 31 PILFs in total, and the LP3 fit appears best in the last graph (Figure S6), where all 31 identified points are censored, compared to the first one (Figure S1), where no points are censored. Similar to Lamontagne et al. [24], the censoring of PILFs using the MGB test improves the fitting of the LP3 distribution to the AM flood data series at a given station. After the application of the censoring procedure with the LP3 distribution, Figure 10 displays the new distribution of data lengths (after censoring), which shows that 48 stations have censored data lengths of less than 20 years (and 534 stations have a censored data length of 20 years or more). As a data length of less than 20 years is too short to carry out a meaningful FFA, these 48 stations were excluded from further analysis. However, 165 stations have a censored data length in the range of 20 to 29 years. Generally, a 30-year data length is considered adequate for a meaningful FFA; hence, caution should be exercised when interpreting the FFA results of these stations (having a data length shorter than 30 years).
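The record-length screening described above can be written as a small Python sketch. The thresholds (20 and 30 years) come from the text; the function name, the category labels, and the station identifiers with their record lengths are hypothetical.

```python
def classify_record_length(years_after_censoring):
    """Classify a station by its AM record length after censoring.

    Below 20 years: excluded from further analysis.
    20 to 29 years: retained, but results need cautious interpretation.
    30 years or more: generally considered adequate for FFA.
    """
    if years_after_censoring < 20:
        return "excluded"
    if years_after_censoring < 30:
        return "use with caution"
    return "adequate"


# Hypothetical stations and their record lengths (years) after censoring.
hypothetical_stations = {"A": 45, "B": 27, "C": 18, "D": 30, "E": 20}

groups = {}
for name, years in hypothetical_stations.items():
    groups.setdefault(classify_record_length(years), []).append(name)
print(groups)

# Consistency check on the counts reported in the text.
n_excluded, n_retained = 48, 534
print(n_excluded + n_retained)  # total number of selected stations
```

With the counts reported in the text, the excluded and retained stations add up to the 582 selected stations, and the 165 stations in the 20 to 29 year band are a subset of the 534 retained ones.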

Figure 10. Distribution of new record length of the AM flood data after censoring.

Figure 13. Absolute variation (%) in flood quantile estimates between GEV and LP3 distributions (with and without censoring) for the stations that required censoring.

Table 1. Summary of literature review on censoring of outliers in flood frequency analysis.

Table 2. Variation between the absolute values of flood quantiles for 1% AEP for the stations that required censoring.

Table 3. Linear regression outputs exhibiting relationship between censored points and catchment characteristics.