1. Introduction
Natural disasters are well established in the literature as major causes of loss and damage around the world [1]. Flooding is one such disaster, threatening property, infrastructure, and human lives [2,3]. Floods can destroy homes, businesses, and infrastructure such as power lines, bridges, and highways [2,4,5]. This can cause major financial losses and disrupt daily life in the affected communities [1,6,7]. Floods can also cause loss of life, particularly in low-lying areas and in areas with poor infrastructure or inadequate drainage systems [6]. People caught off guard may find it difficult to escape strong currents and deep water. Flooding can also threaten food security through the loss of livestock and the washing away of fertile topsoil [8,9]. Flooding has become a prevalent natural disaster in Southern Africa. According to Boudrissa et al. [10], a fundamental challenge in climatology is determining the threats posed by severe rainfall in order to protect people and property.
This entails determining the frequency and severity of disasters and unforeseen events such as floods, with the aim of anticipating them and minimising their consequences as far as possible. Because of its geographic location, South Africa is among the nations that have experienced difficulties associated with flooding [11]. This study focuses on the KwaZulu-Natal province of South Africa, where severe flooding occurred on 11 April 2022. According to Naidoo et al. [12], these floods caused extensive damage to infrastructure and property and resulted in the loss of human lives and livestock. The province is vulnerable to flooding because it lies along the Indian Ocean on the east coast of South Africa, which receives high rainfall, especially in summer, owing to warm ocean currents and tropical air masses [2].
This eastern coast is prone to extreme flooding because of its landscape and geographical position. According to Du Plessis and Burger [13], the frequency and intensity of extreme weather phenomena such as heavy rainfall and flooding are expected to increase as the global climate continues to change. There is growing concern worldwide about rising greenhouse gas emissions from industrialised nations, which raise global temperatures and alter other climate variables such as precipitation and evaporation [14,15]. This could make floods in KwaZulu-Natal province and other regions more frequent and severe in the future. Coles [16] defines extreme value theory (EVT) as the study of very rare events.
The current study employs modern EVT techniques to model the floods in KwaZulu-Natal province. These techniques include the blended generalised extreme value distribution (bGEVD) and the generalised extreme value distribution for the r-largest order statistics (GEVDr), both of which extend the generalised extreme value distribution (GEVD) [17,18]. EVT is frequently used to analyse extreme events such as floods [16]; however, few studies have applied newer EVT techniques such as the bGEVD to flood events [17,18]. This study explores the use of these advanced EVT methods in analysing extreme flood events at selected stations in KwaZulu-Natal province.
The rest of the paper is structured as follows: Section 2 reviews the relevant literature; Section 3 summarises the key findings of this study; Section 4 outlines the methods and techniques employed; Section 5 discusses the results corresponding to each method; and Section 6 provides the conclusions, recommendations, and directions for future research.
2. Related Literature Review
Finance, insurance, engineering, hydrology, and climatology are just a few of the many fields in which EVT is used to analyse extremes and the associated return levels [14]. EVT has been a fundamental statistical approach in the applied sciences for over five decades [16,17,19]. According to Maposa et al. [19], many statistical applications treat extreme values as outliers and focus instead on the mean and other measures of central tendency. In the study of rare or extreme events, however, the focus is on the tails of the underlying distribution, since these uncommon, severe observations are precisely the ones that are often dropped as outliers during data cleaning and analysis [19]. According to Ferreira and De Haan [16,20], the distribution of block maxima is usually approximated by the GEVD. However, Castro-Camilo et al. [17] noted that some of its properties, such as its parameter-dependent support, may be problematic for some data. These authors used the bGEVD, which combines the right tail of a Fréchet distribution with the left tail of a Gumbel distribution and therefore has unbounded support. They also employed property-preserving penalised complexity priors to set the first and second moments of the GEVD a priori.
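The blending idea described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation of [17]: the GEVD (Fréchet, ξ > 0) and a Gumbel distribution are matched at two GEVD quantiles a and b, and a Beta(5, 5) CDF weight moves the CDF from the Gumbel left tail (below a) to the Fréchet right tail (above b). The quantile levels p_a = 0.1 and p_b = 0.2, the Beta(5, 5) weight, and all function names are illustrative assumptions.

```python
import math

def gev_cdf(x, mu, sigma, xi):
    """GEVD CDF; for xi > 0 (Frechet class) it is 0 below the lower bound mu - sigma/xi."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-z))  # Gumbel limit
    t = 1.0 + xi * z
    if t <= 0.0:
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def gev_quantile(p, mu, sigma, xi):
    """Inverse of the GEVD CDF at probability p."""
    y = -math.log(p)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)
    return mu + sigma / xi * (y ** (-xi) - 1.0)

def beta55_cdf(u):
    """Regularised incomplete beta I_u(5, 5), via the binomial-sum identity for integer shapes."""
    u = min(max(u, 0.0), 1.0)
    return sum(math.comb(9, j) * u ** j * (1 - u) ** (9 - j) for j in range(5, 10))

def bgev_cdf(x, mu, sigma, xi, p_a=0.1, p_b=0.2):
    """Blended GEVD CDF: Gumbel below a, GEVD above b, geometric blend in between (assumed weights)."""
    a = gev_quantile(p_a, mu, sigma, xi)
    b = gev_quantile(p_b, mu, sigma, xi)
    # Gumbel parameters chosen so that its p_a and p_b quantiles equal a and b
    s_tilde = (b - a) / (math.log(-math.log(p_a)) - math.log(-math.log(p_b)))
    m_tilde = a + s_tilde * math.log(-math.log(p_a))
    g = math.exp(-math.exp(-(x - m_tilde) / s_tilde))
    w = beta55_cdf((x - a) / (b - a))
    if w == 0.0:
        return g  # pure Gumbel left tail: unbounded support
    if w == 1.0:
        return gev_cdf(x, mu, sigma, xi)  # pure Frechet right tail
    f = gev_cdf(x, mu, sigma, xi)
    return math.exp(w * math.log(f) + (1.0 - w) * math.log(g))

mu, sigma, xi = 50.0, 20.0, 0.2  # hypothetical parameters (mm)
print(gev_cdf(-55.0, mu, sigma, xi))         # 0.0: below the Frechet lower bound mu - sigma/xi = -50
print(bgev_cdf(-55.0, mu, sigma, xi) > 0.0)  # True: the Gumbel left tail has unbounded support
```

The geometric blend keeps the heavy right tail that drives return levels while removing the parameter-dependent lower bound.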
Castro-Camilo et al. [17] provided a new parameterisation of the GEVD whose parameters are easier to interpret, which facilitates the specification of useful priors. They demonstrated the effectiveness of their strategy through simulations and an application to nitrogen dioxide pollution levels in California. Overall, their approaches improved on traditional GEVD models in several instances. A new technique for producing spatial maps of return level estimates for the annual maxima of sub-daily precipitation was proposed in [18]. To model annual precipitation maxima, that study employed a Bayesian hierarchical model with a latent Gaussian field and the bGEVD. To improve the efficacy of inference, the authors used a two-step approach in which the scale parameter of the bGEVD is modelled using peaks-over-threshold (POT) data.
Inference in [18] used the stochastic partial differential equation (SPDE) approach and integrated nested Laplace approximations (INLA), both implemented in R-INLA. Because INLA relies on numerical approximations rather than sampling-based inference methods such as Markov chain Monte Carlo (MCMC), it offers a substantial gain in computational speed. The study in [18] also provided heuristics for improving numerical stability when working with the GEVD and bGEVD. Fitted to the yearly maxima of sub-daily precipitation from southern Norway, the model rapidly produced high-resolution return level maps with uncertainty estimates. Modelling these yearly maxima with the bGEVD gave a better overall model fit than the usual inference techniques.
Tibari [21] concentrated on the assessment of hydrological extremes, which have substantial socio-economic implications and are essential for the planning and design of hydraulic structures. The researcher compared two statistical modelling approaches for extreme hydrological events: block maxima (BM) and peaks over threshold (POT). Tibari [21] investigated how projected future changes in extreme hydrological events depend on the chosen method, particularly when simplifications are applied in large-scale studies. The results suggest that the BM and POT methods agree on the direction of changes in flood and extreme precipitation intensities but differ in their magnitude. The disparity between the two methods becomes more pronounced for more extreme events, and it also varies by season.
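The difference between the two sampling schemes can be sketched in a few lines of Python. This is an illustrative sketch on hypothetical (year, value) pairs, not the procedure of [21]: block maxima keep exactly one value per year (the input to a GEVD fit), whereas POT keeps every excess over a threshold u (the input to a GPD fit). Threshold choice and declustering of dependent exceedances, which a real POT analysis needs, are deliberately omitted.

```python
from collections import defaultdict

def block_maxima(records):
    """Annual maxima from (year, value) pairs: one value per block, the input to a GEVD fit."""
    maxima = defaultdict(lambda: float("-inf"))
    for year, value in records:
        maxima[year] = max(maxima[year], value)
    return dict(sorted(maxima.items()))

def threshold_excesses(records, threshold):
    """Excesses over a threshold u: the input to a GPD fit (no declustering in this sketch)."""
    return [value - threshold for _, value in records if value > threshold]

# Hypothetical daily rainfall records (mm)
daily = [(2020, 5.0), (2020, 42.0), (2020, 61.5), (2021, 12.0),
         (2021, 38.0), (2022, 95.0), (2022, 70.0)]
print(block_maxima(daily))              # {2020: 61.5, 2021: 38.0, 2022: 95.0}
print(threshold_excesses(daily, 40.0))  # [2.0, 21.5, 55.0, 30.0]
```

Here the block maxima retain the modest 2021 maximum, while POT drops 2021 entirely and keeps two values from each of 2020 and 2022, which is one reason the two approaches can disagree in magnitude.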
Miniussi et al. [22] used daily mean streamflow records from 5311 stream gauges in the continental United States, obtained from the U.S. Geological Survey, to develop a tailored Metastatistical Extreme Value Distribution (MEVD) for flood frequency analysis. They compared the performance of the MEVD with two commonly used distributions, the GEVD and the Log-Pearson Type III (LP3), and investigated the role of the El Niño Southern Oscillation (ENSO) in flood generation. The MEVD outperformed the GEVD and LP3 distributions at approximately 76 and 86 percent of the stations, respectively, and substantially improved the accuracy of quantiles for return periods much longer than the length of the observed record.
Vasiliades et al. [23] employed the GEVD to analyse nonstationarity in annual maximum daily rainfall series in Greece and Cyprus. The GEVD parameters were modelled as functions of time-varying covariates using the conditional density network (CDN), an extension of the multilayer perceptron neural network. The model parameters were estimated by generalised maximum likelihood (GML) with the quasi-Newton BFGS optimisation algorithm, and the appropriate GEV-CDN architecture for each meteorological station was selected using the Akaike or the Bayesian information criterion. The study demonstrated the applicability of the GEV-CDN model for assessing nonstationarity in extreme rainfall and highlighted the importance of considering temporal variability in hydrometeorological processes when conducting extreme value analyses. Maposa et al. [24] employed the GEVD to model annual flood heights in the lower Limpopo River basin of Mozambique. The study considered four time series: annual daily maxima (AM1) and annual 2-day (AM2), 5-day (AM5), and 10-day (AM10) maxima. The results indicated that the AM5 model was suitable for analysing flood heights at the Chokwe station, while the AM10 model was appropriate for the Sicacate station. In another study [25], maximum likelihood estimation (MLE) and Bayesian estimation of the GEVD parameters were compared for the lower Limpopo River basin of Mozambique. The authors used an MCMC Bayesian method to estimate the GEVD parameters with the aim of predicting extreme flood heights and their return periods, and found that the Bayesian approach outperformed MLE in parameter estimation.
Rainfall extremes in East Africa were analysed in the context of climate change in [26]. The research examined how well convection-permitting climate models, compared with parameterised convection models, represent rare rainfall extremes. EVT and regional frequency analysis were used to quantify these rare events using the convection-permitting CP4A model and its parameterised counterpart (P25), with the CORDEX-Africa ensemble and observational data for comparison. The convection-permitting model (CP4A) agreed better with observations than the parameterised models, which exhibited unrealistic changes in the shape parameter of the extreme value distribution, resulting in large increases in return levels for longer return periods (greater than 20 years). The findings suggest that parameterised convection models may not be suitable for analysing relative changes in rare rainfall events under climate change.
Chikobvu and Chifurira [27] used the GEVD to model extreme minimum annual rainfall in Zimbabwe, applying it to annual rainfall data from 1901 to 2009. Model diagnostics revealed that minimum annual rainfall in Zimbabwe follows a distribution from the Weibull class. Boudrissa et al. [10] used the GEVD to examine annual maximum daily rainfall at selected stations in northern Algeria and found that the Gumbel distribution fitted the Algiers and Miliana stations well, while the Fréchet distribution was more suitable for the Oran station. The study in [28] modelled monthly extreme rainfall in Somalia over a 116-year period using several statistical models: the GEVD, the GPD, the r-largest order statistics model, and the point process (PP) characterisation. The optimal model was selected using the negative log-likelihood, the Akaike information criterion (AIC), and the Bayesian information criterion (BIC). The findings reveal that the GEVD, with specific parameter values, provides the best fit for modelling extreme rainfall in Somalia.
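The selection criteria mentioned above follow standard formulas: AIC = 2k + 2·NLL and BIC = k·ln(n) + 2·NLL, where NLL is the minimised negative log-likelihood, k the number of parameters and n the sample size, with lower values preferred. The Python sketch below applies them to two hypothetical fits; the NLL values and sample size are invented for illustration and are not results from [28].

```python
import math

def information_criteria(nll, k, n):
    """AIC = 2k + 2*NLL and BIC = k*ln(n) + 2*NLL; lower values are preferred."""
    return {"AIC": 2 * k + 2 * nll, "BIC": k * math.log(n) + 2 * nll}

# Hypothetical fits to n = 60 block maxima: (model, minimised NLL, number of parameters)
fits = [("GEVD", 520.4, 3), ("Gumbel", 523.9, 2)]
n = 60
scores = {name: information_criteria(nll, k, n) for name, nll, k in fits}
best_aic = min(scores, key=lambda m: scores[m]["AIC"])
best_bic = min(scores, key=lambda m: scores[m]["BIC"])
print(best_aic, best_bic)  # GEVD GEVD
```

BIC penalises the extra shape parameter more heavily than AIC once ln(n) > 2, so the two criteria can disagree when the likelihood gain from the extra parameter is small.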
Sikhwari et al. [29] used EVT to model maximum rainfall in Limpopo Province, South Africa, over the period 1960 to 2020. The study employed the r-largest order statistics approach with yearly blocks spanning 61 years, and the parameters of the selected model, the GEVD with r = 8, were estimated by maximum likelihood. The findings reveal that the estimated 50-year return level in the Thabazimbi area is 368 mm, meaning that the annual maximum rainfall has a probability of 0.02 of exceeding this level in any given year. Another study [30] employed spatial and spatio-temporal dependence modelling to analyse extreme daily maximum rainfall from selected weather stations in South Africa, combining the GPD with a flexible Bayesian latent Gaussian model (LGM). The results highlighted the effectiveness of the spatio-temporal GPD model in capturing systematic variation within a spatial and spatio-temporal modelling framework, with the temporal component modelled separately for weeks and months. The INLA algorithm was used to estimate the marginal posterior means of the parameters and hyperparameters of the Bayesian spatio-temporal models. INLA facilitated Bayesian inference and allowed return levels to be predicted at each station, incorporating uncertainty arising from model estimation and the inherent randomness of the processes.
Singo et al. [31] evaluated flood risks in the Luvuvhu River Catchment in Limpopo Province, South Africa, using flood frequency models. The main objective was to estimate flood risks by analysing the distribution of rainfall, and the Gumbel and LP3 distributions were selected for the flood frequency analysis. The findings indicated a notable rise in the occurrence of extreme events, leading to floods of greater magnitude. The study in [32] conducted a regional frequency analysis of the annual maximum series (AMS) of flood flows in KwaZulu-Natal province, South Africa, with the aim of identifying homogeneous regions and suitable regional frequency distributions for them. The area was divided into two regions based on monthly rainfall concentrations: Region 1 covered the coastal and midlands areas, while Region 2 covered the west north-western parts of the study area. The generalised normal, Pearson Type III, and generalised Pareto distributions were found to be suitable for modelling the AMS of flood flows in Region 2. In Region 1, however, no suitable regional frequency distribution could be identified because only a few flood events of extreme magnitude had occurred.
Researchers have applied EVT to predict extreme weather events in climatology, focusing mainly on the generalised Pareto distribution (GPD) and the GEVD rather than on more modern techniques. This limited use of modern techniques represents a methodological and literature gap that this study seeks to address. To enhance the applicability, dependability, and precision of EVT in modelling rainfall in KwaZulu-Natal province, this study employs modern EVT models, namely the bGEVD and GEVDr, using a Bayesian MCMC parameter estimation approach.
6. Discussion and Conclusions
This study presented a comparative analysis of advanced extreme value theory models, specifically the GEVD for r-largest order statistics and the bGEVD, applied to two meteorological stations in KwaZulu-Natal province: Port Edward and Virginia. Data for these stations were provided by the South African Weather Service (SAWS). Prior to analysis, the data were cleaned and arranged into blocks containing the five largest maxima, to facilitate the application of the GEVD for r-largest order statistics.
The GEVD was fitted to the data from each of the two stations, and MLE and Bayesian estimation were used to estimate its three parameters. The two methods gave similar parameter estimates, and both yielded positive values of the shape parameter, indicating that the Fréchet distribution is the most appropriate of the three GEVD classes. This finding was further supported by the confidence intervals of the shape parameter at both stations.
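Bayesian estimation of the three GEVD parameters can be illustrated with a random-walk Metropolis sampler. The Python sketch below is an assumption-laden illustration, not the sampler used in this study: it assumes flat priors on (μ, log σ, ξ), hand-picked proposal step sizes, a fixed burn-in, and simulated data from hypothetical parameters. A posterior concentrated on ξ > 0 corresponds to the Fréchet class.

```python
import math
import random

def gev_loglik(data, mu, sigma, xi):
    """GEVD log-likelihood; -inf when any observation falls outside the support."""
    if sigma <= 0.0:
        return -math.inf
    ll = 0.0
    for x in data:
        z = (x - mu) / sigma
        if abs(xi) < 1e-6:  # Gumbel limit
            ll += -math.log(sigma) - z - math.exp(-z)
            continue
        t = 1.0 + xi * z
        if t <= 0.0:
            return -math.inf
        ll += -math.log(sigma) - (1.0 + 1.0 / xi) * math.log(t) - t ** (-1.0 / xi)
    return ll

def simulate_gev(n, mu, sigma, xi, seed=0):
    """Draw GEVD samples by inverting the CDF (xi != 0)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        u = rng.uniform(1e-12, 1.0 - 1e-12)
        draws.append(mu + sigma / xi * ((-math.log(u)) ** (-xi) - 1.0))
    return draws

def metropolis_gev(data, n_iter=6000, burn_in=2000, seed=1):
    """Random-walk Metropolis on (mu, log sigma, xi) under flat priors; returns posterior means."""
    rng = random.Random(seed)
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    sigma0 = sd * math.sqrt(6.0) / math.pi  # Gumbel method-of-moments starting value
    state = [mean - 0.5772 * sigma0, math.log(sigma0), 0.1]
    current = gev_loglik(data, state[0], math.exp(state[1]), state[2])
    steps = [1.0, 0.05, 0.05]  # proposal standard deviations (tuning assumption)
    sums, kept = [0.0, 0.0, 0.0], 0
    for i in range(n_iter):
        proposal = [s + rng.gauss(0.0, h) for s, h in zip(state, steps)]
        cand = gev_loglik(data, proposal[0], math.exp(proposal[1]), proposal[2])
        # Symmetric proposal and flat priors: accept with probability min(1, exp(cand - current))
        if math.log(1.0 - rng.random()) < cand - current:
            state, current = proposal, cand
        if i >= burn_in:
            sums[0] += state[0]
            sums[1] += math.exp(state[1])
            sums[2] += state[2]
            kept += 1
    return [s / kept for s in sums]

data = simulate_gev(150, 50.0, 20.0, 0.2)  # hypothetical "true" parameters
mu_hat, sigma_hat, xi_hat = metropolis_gev(data)
print(f"mu={mu_hat:.1f}, sigma={sigma_hat:.1f}, xi={xi_hat:.2f}")
```

With flat priors the posterior mode coincides with the MLE, which is consistent with the two approaches giving similar estimates; informative priors are where the Bayesian approach can differ.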
The GEVD for r-largest order statistics was fitted to each station's rainfall using up to the five largest values per block. MLE was used for parameter estimation, diagnostic plots were used to assess goodness of fit, and deviance statistics were used for model selection. The 95% confidence interval for the shape parameter suggests that a distribution within the Fréchet domain of attraction can model the data from both stations effectively.
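The likelihood behind the r-largest model can be written down directly. The Python sketch below follows the standard r-largest GEVD log-likelihood given by Coles [16] for ξ ≠ 0: each block contributes a term involving only its r-th largest value plus a density-like term for each of its top r values. The blocks and parameter values are hypothetical; in practice the parameters would be maximised separately for each candidate r.

```python
import math

def top_r(block, r):
    """The r largest observations of one block, in decreasing order."""
    if len(block) < r:
        raise ValueError("block has fewer than r observations")
    return sorted(block, reverse=True)[:r]

def r_largest_loglik(blocks, r, mu, sigma, xi):
    """Log-likelihood of the GEVD for the r-largest order statistics (xi != 0)."""
    ll = 0.0
    for block in blocks:
        z = top_r(block, r)
        t = [1.0 + xi * (v - mu) / sigma for v in z]
        if min(t) <= 0.0:
            return -math.inf  # some order statistic lies outside the support
        ll -= t[-1] ** (-1.0 / xi)  # term involving only the r-th largest value
        ll += sum(-math.log(sigma) - (1.0 / xi + 1.0) * math.log(tk) for tk in t)
    return ll

# Hypothetical blocks (e.g. years) of daily rainfall maxima in mm
blocks = [[12.0, 40.5, 33.1, 8.0, 51.2, 20.0], [64.3, 22.0, 47.9, 30.2, 15.5, 18.0]]
for r in range(1, 6):
    print(r, round(r_largest_loglik(blocks, r, 30.0, 10.0, 0.1), 3))
```

With r = 1 the expression reduces to the ordinary GEVD log-likelihood of the block maxima, which is a useful sanity check.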
The optimal block size for Port Edward was determined to be , and was determined to be the optimal block size for the Virginia station. This choice is supported by the agreement between the diagnostic plots and the deviance test results. The return levels and their corresponding return periods were then computed, and the return levels obtained from the standard GEVD and from the GEVD for r-largest order statistics were closely comparable.
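A deviance test compares two nested models through D = 2(ℓ₁ − ℓ₀), where ℓ₁ and ℓ₀ are the maximised log-likelihoods of the larger and smaller models; for one extra parameter, D is approximately χ² with one degree of freedom, whose upper-tail probability is erfc(√(D/2)). The Python sketch below uses a Gumbel (ξ = 0) versus GEVD comparison with hypothetical log-likelihoods purely as an example of a nested pair; the exact comparisons used in this study are not reproduced here.

```python
import math

def deviance_test_1df(loglik_reduced, loglik_full):
    """Deviance D = 2*(l_full - l_reduced) for nested models differing by one parameter.

    Under the null hypothesis D is approximately chi-square with 1 degree of freedom,
    whose upper-tail probability is P(X > D) = erfc(sqrt(D / 2)).
    """
    d = 2.0 * (loglik_full - loglik_reduced)
    p_value = math.erfc(math.sqrt(max(d, 0.0) / 2.0))
    return d, p_value

# Hypothetical maximised log-likelihoods: Gumbel (xi = 0) nested in the GEVD
d, p = deviance_test_1df(-523.9, -520.4)
print(round(d, 2), round(p, 4))
```

A small p-value favours retaining the extra parameter; D above about 3.84 corresponds to p < 0.05.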
The bGEVD model was fitted to the time series data from the two stations in the study area, KwaZulu-Natal province. The results suggest a negative time trend in the explanatory variables, indicating a decline in rainfall maxima over time. This decline is reflected in the parameter estimates and results in substantially lower return levels for the bGEVD than for the two previously fitted models, the standard GEVD and the GEVD for r-largest order statistics.
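How a negative time trend lowers return levels can be sketched with the GEVD quantile z_T = μ − (σ/ξ)[1 − y^(−ξ)], where y = −ln(1 − 1/T). Because the bGEVD's right tail is the Fréchet (GEVD) tail [17], high return levels of a bGEVD fit are GEVD quantiles, so the same formula applies there. The linear location μ(t) = β₀ + β₁t and all coefficient values in this Python sketch are hypothetical assumptions, not the covariate structure or estimates of this study.

```python
import math

def gev_return_level(T, mu, sigma, xi):
    """T-year return level: the (1 - 1/T) quantile of the GEVD."""
    y = -math.log(1.0 - 1.0 / T)
    if abs(xi) < 1e-9:
        return mu - sigma * math.log(y)  # Gumbel limit
    return mu - sigma / xi * (1.0 - y ** (-xi))

def return_level_with_trend(T, year, beta0, beta1, sigma, xi):
    """Return level when the location is linear in time: mu(t) = beta0 + beta1 * t (hypothetical)."""
    return gev_return_level(T, beta0 + beta1 * year, sigma, xi)

# Hypothetical coefficients (mm); beta1 < 0 mimics a declining trend in rainfall maxima
for year in (0, 10, 20):
    print(year, round(return_level_with_trend(50, year, 50.0, -0.3, 20.0, 0.2), 1))
```

With only the location trending, every return level shifts down by β₁ per unit time; trends in scale or shape would change high return levels more strongly.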
Assessing the return levels of the modelling techniques employed in this study is crucial for the comparative analysis, because return levels summarise the statistical behaviour of extreme events. The estimated return levels and their corresponding return periods are therefore central to accurate extreme value analysis and to choosing an appropriate model for prediction and disaster assessment. Based on the findings of the study, the standard GEVD and the GEVD for r-largest order statistics produce consistent results and therefore adequately model extreme rainfall in KwaZulu-Natal province. Similar findings were obtained in [36], which modelled the maximum average daily temperature in South Africa. In contrast, the bGEVD yields lower return levels, and its results are inconsistent with those obtained from the GEVD and the GEVD for r-largest order statistics.
Future Research
The findings of this study suggest that more intense and more frequent extreme rainfall events may occur in KwaZulu-Natal province in the future. Based on these results, the study proposes several research directions that could improve the accuracy and reliability of extreme rainfall models.
The current research focused exclusively on KwaZulu-Natal province, using EVT models such as the GEVD, the GEVD for r-largest order statistics, and the bGEVD. Future researchers are encouraged to apply other advanced EVT models, such as the four-parameter kappa distribution for the r-largest order statistics (K4Dr), which may offer improved accuracy in modelling extreme events. Although the bGEVD was employed in this study to analyse extreme floods, the model still has limitations and has received little attention in the literature. Future researchers are therefore encouraged to further explore and refine the application of the bGEVD to contribute to the growing body of knowledge in EVT.
Additionally, this study found that Bayesian estimation slightly outperformed MLE in forecasting accuracy. We therefore recommend that future research prioritise Bayesian methods, incorporating expert knowledge into the estimation and forecasting processes for more robust results. Finally, while this study was confined to KwaZulu-Natal province, future research should consider extending the suggested models to all nine provinces of South Africa. Such broader studies would offer better assessments of severe weather events and provide valuable insights to help policymakers develop effective disaster mitigation strategies.
We acknowledge that the impacts of extremes can be mitigated through early warning systems and appropriate infrastructure, such as effective drainage networks; however, the complete elimination of natural hazards is beyond human capacity. Accordingly, the primary objective of this study is to address a methodological gap rather than to provide mitigation or preparedness guidance. Specifically, the study aims to conduct a comparative evaluation of model performance using applied data, namely rainfall observations. While some of the literature cited discusses mitigation strategies, this is not the primary focus of the present work.