Selection of Hydrological Probability Distributions for Extreme Rainfall Events in the Regions of Colombia

Frequency analysis of extreme events is used to estimate the maximum rainfall associated with different return periods and to plan hydraulic structures. When this type of analysis is carried out in engineering projects, the hydrological distributions that best fit the trend of maximum 24 h rainfall data are not known in advance. This study collected maximum 24 h rainfall records from 362 stations distributed throughout Colombia, with the goal of guiding hydraulic planners by suggesting the probability distributions they should consider before beginning their analysis. The generalized extreme value (GEV) probability distribution, with parameters estimated by the weighted moments method, gave the best fits in the frequency analysis of maximum daily precipitation for various return periods at the selected rainfall stations in Colombia.


Introduction
Frequency analyses of extreme events are used to estimate maximum rainfall associated with different return periods [1][2][3], and their results are used to plan stormwater network projects, longitudinal dikes, overflows, drainage channels, cofferdams, gutters, circular and box culverts and bridges, among other infrastructure works [4,5]; they can also be used to carry out erosion analysis in hydrographic basins [6].
In recent years, due to the influence of global warming as well as changes in the magnitude and patterns of extreme precipitation events, it is necessary to periodically update the magnitudes of the maximum rainfall that are used to design hydraulic works [7]. In particular, extreme weather events such as floods, droughts and storms can increase in frequency over time [8][9][10]; thus, it is necessary to determine probability functions that best represent current trends in the data.
In Colombia, several meteorological factors influence the climate and therefore the maximum precipitation over a 24 h period, among which are: (i) the relative position of subtropical high pressure centers, (ii) the equatorial convergence zone, (iii) the intertropical front, (iv) the prevailing winds, and (v) the effects of the local topography [11]. It is recommended that each region (Andean, Caribbean, Pacific, Orinoquía and Amazonas) be analyzed separately to take into account the geographic variability in maximum precipitation. The Institute of Hydrology, Meteorology and Environmental Studies (Instituto de Hidrología, Meteorología y Estudios Ambientales, IDEAM) is the governmental entity in Colombia that operates and manages the maximum 24 h rainfall records. However, regional autonomous corporations are also responsible for compiling hydroclimatological records.
When carrying out projections of maximum rainfall associated with specific return periods, it is necessary to perform frequency analysis [12][13][14]. In frequency analysis of extreme precipitation events, the hydrological probability distribution that best represents the trend of maximum 24 h rainfall data can be determined using functions such as the generalized extreme value (GEV) [15], Gumbel [1,3,13], log-Pearson type III [1,16], normal [3] and Pearson type III [17]. The parameters of the probability distributions are determined mainly by applying the method of maximum likelihood (ML) or the method of weighted moments (WM) [3,18]. To select the probability distribution function that best fits the trend of the data, different goodness of fit tests are usually used, such as the chi-square test or the Kolmogorov-Smirnov test [19][20][21]. The ML method requires many calculations to determine the parameters of hydrological distributions. Although the WM method is simpler than the ML method, it estimates the parameters with good accuracy. In this sense, Mahdi & Cenac [22] showed that the Gumbel probability distribution was fitted more adequately using the WM method than the ML method. A similar analysis showed that the WM method predicted the behavior of extreme values better than the ML method for the GEV and Log-Pearson Type III distributions; however, the Log-Normal distribution with the ML method provided the best prediction [23]. In this study, the parameters of the Log-Pearson type III distribution are estimated with the SAM method.
Typically, to design hydraulic structures, a return period must be selected that varies between 5 and 100 years depending on the importance of the structure. In Colombia, Resolution 0330 of 2017 [24] outlines the return periods that should be used for urban drainage projects, the Manual on Drainage Design for Highways [25] provides the values for road works, and international recommendations are often used for other types of structures. An inadequate selection of a hydrological distribution could lead to an oversized or undersized hydraulic structure. Because no official recommendation exists, the current research provides a starting point for selecting hydrological distributions.
However, the probability distribution that should be used to make the statistical projections is never known a priori [26]. Therefore, in this study, we analyzed 362 stations with 24 h maximum rainfall records distributed throughout Colombia. The most representative probability distributions in each region of Colombia were selected by fitting the Gumbel, log-Pearson type III, Pearson type III, normal and GEV distributions and comparing them with the chi-squared goodness of fit test. This study can be used by designers and engineers to determine a priori the hydrological distribution that should be used in a particular project.

Case Study
Colombia was selected as a case study (Figure 1) to determine the hydrological distributions that best represent the trend in the maximum 24 h rainfall data. During the compilation of the maximum 24 h rainfall records in Colombia, the following criteria were applied to each station: a minimum recording period of 30 years, elimination of outliers, use of the entire available recording period, and distribution of the stations throughout each of the five regions that make up Colombia (Caribbean, Pacific, Andean, Orinoquía and Amazonas). Table 1 shows the number of stations analyzed in each region, and Figure 2 shows their spatial distribution. The maximum 24 h rainfall records were obtained from the IDEAM (Institute of Hydrology, Meteorology and Environmental Studies), which maintains the most important database of rainfall records in Colombia. The results of Table 1 show that the Andean region represents 69% of the stations compiled, the Caribbean region 16%, the Pacific region 10% and the Orinoquía and Amazonas regions 3% and 2%, respectively. It is important to bear in mind that the regions with the lowest percentage of stations used in the present study (Orinoquía and Amazonas) also have the fewest stations installed. Appendix A shows the codes of the stations with maximum 24 h rainfall data.

Methodology
The methodology used to determine the hydrological distribution that best represents the trends in 24 h maximum rainfall data associated with different return periods is presented as follows.

Selection of Rainfall Stations
The 24 h maximum rainfall records were collected from 362 rainfall stations distributed across Colombia (see Appendix A). Once the 362 stations with maximum rainfall records were selected, the estimation error of the selected stations with respect to the total number of stations installed in Colombia was determined. The equation used for a finite population is shown below [27]:

$$ n = \frac{N \, Z_\alpha^2 \, p \, q}{e^2 (N - 1) + Z_\alpha^2 \, p \, q} \qquad (1) $$

where n = sample size (the 362 stations compiled); N = population size (the 2977 stations installed by IDEAM); 1 − α = the level of confidence chosen, assumed to be 95%; Z_α = z value (where z is a centered and reduced normal variable) that leaves a proportion α of the individuals outside the interval ±Z_α; p = proportion at which the variable studied occurs in the population; q = 1 − p; e = estimation error. The most critical condition was assumed (p = q = 0.5). Taking into account each of the previous variables, an estimation error of 4.83% was obtained.
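The reported estimation error can be checked with a short calculation. The sketch below assumes the standard finite-population sample-size relation n = N Z² p q / (e² (N − 1) + Z² p q), solved for e, and takes Z = 1.96 for the 95% confidence level; the function names are illustrative and do not come from the study.

```python
import math

def estimation_error(n, N, z=1.96, p=0.5):
    """Estimation error e for a sample of n items from a finite population N.

    Obtained by solving n = N z^2 p q / (e^2 (N - 1) + z^2 p q) for e.
    z = 1.96 is the assumed value for a 95% confidence level.
    """
    q = 1.0 - p
    return math.sqrt(z**2 * p * q * (N - n) / (n * (N - 1)))

def sample_size(e, N, z=1.96, p=0.5):
    """Sample size n required for an estimation error e (same relation)."""
    q = 1.0 - p
    return N * z**2 * p * q / (e**2 * (N - 1) + z**2 * p * q)

# Values stated in the text: 362 stations out of 2977 installed by IDEAM
e = estimation_error(n=362, N=2977)
print(f"estimation error = {100 * e:.2f}%")
```

With the values stated in the text, the result rounds to the 4.83% reported by the authors.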

Frequency Analysis
For each of the 362 stations, the annual series of maximum 24 h precipitation values was fitted with the Gumbel, GEV, Log-Pearson type III, Pearson type III and Normal probability distributions using the Hyfran Version 1.1 program [28].
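• Gumbel distribution
The Gumbel distribution has typically been used to fit the maximum 24 h precipitation values for different return periods. The parameters of this function are determined from the recorded data. Its probability density function is given by:

$$ f(x) = \frac{1}{\alpha} \exp\left[ -\frac{x - u}{\alpha} - \exp\left( -\frac{x - u}{\alpha} \right) \right] \qquad (2) $$

where f(x): probability density function; x: random variable; u: location parameter; α: scale parameter.

• GEV distribution
The generalized extreme value distribution is widely used by hydrologists worldwide and in Colombia due to its versatility.

$$ f(x) = \frac{1}{\alpha} \left[ 1 - \frac{k (x - u)}{\alpha} \right]^{\frac{1}{k} - 1} \exp\left\{ -\left[ 1 - \frac{k (x - u)}{\alpha} \right]^{\frac{1}{k}} \right\} \qquad (3) $$

where k: shape parameter. In the limit k = 0, the Gumbel distribution is obtained (see Equation (2)).

• Pearson type III distribution
This distribution is characterized by using the gamma function to perform the frequency analysis and has three parameters that must be determined when performing the probabilistic fit.

$$ f(x) = \frac{1}{\alpha \, \Gamma(\lambda)} \left( \frac{x - \beta}{\alpha} \right)^{\lambda - 1} \exp\left( -\frac{x - \beta}{\alpha} \right) \qquad (4) $$

where Γ: gamma function; β: location parameter; α: scale parameter; λ: shape parameter.

• Log-Pearson type III distribution
Applying the Pearson type III distribution to the natural logarithm of the variable gives the following distribution, which also has three parameters:

$$ f(x) = \frac{1}{\alpha \, x \, \Gamma(\lambda)} \left( \frac{\ln x - \beta}{\alpha} \right)^{\lambda - 1} \exp\left( -\frac{\ln x - \beta}{\alpha} \right) \qquad (5) $$

• Normal distribution
The Normal distribution can be applied to estimate maximum daily precipitation for several return periods:

$$ f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left[ -\frac{(x - \mu)^2}{2 \sigma^2} \right] \qquad (6) $$

where μ and σ are the parameters of the distribution (mean and standard deviation).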
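To illustrate how a fitted distribution turns into a design rainfall, the following sketch fits a Gumbel distribution by the method of weighted moments (via the first two L-moments) and evaluates quantiles for a return period T. It is not the Hyfran implementation. It assumes the Gumbel form F(x) = exp(−exp(−(x − u)/α)) and the GEV form F(x) = exp(−[1 − k(x − u)/α]^(1/k)), in which k = 0 reduces to Gumbel; the sample of annual maxima is hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329

def gumbel_fit_wm(annual_maxima):
    """Estimate Gumbel location u and scale alpha by weighted moments."""
    x = sorted(annual_maxima)
    n = len(x)
    b0 = sum(x) / n
    # Unbiased estimator of the first probability-weighted moment
    b1 = sum(i * xi for i, xi in enumerate(x)) / (n * (n - 1))
    l1, l2 = b0, 2.0 * b1 - b0          # first two L-moments
    alpha = l2 / math.log(2.0)
    u = l1 - EULER_GAMMA * alpha
    return u, alpha

def gumbel_quantile(u, alpha, T):
    """Rainfall with return period T (years) under a Gumbel distribution."""
    return u - alpha * math.log(-math.log(1.0 - 1.0 / T))

def gev_quantile(u, alpha, k, T):
    """Rainfall with return period T under the GEV form assumed above."""
    if abs(k) < 1e-9:                    # k -> 0 is the Gumbel case
        return gumbel_quantile(u, alpha, T)
    y = -math.log(1.0 - 1.0 / T)
    return u + alpha / k * (1.0 - y**k)

# Hypothetical annual maximum 24 h rainfall series (mm), for illustration only
sample = [62.0, 75.5, 58.3, 90.1, 81.4, 67.9, 110.2, 72.6, 95.0, 69.8]
u, alpha = gumbel_fit_wm(sample)
for T in (5, 25, 100):
    print(T, round(gumbel_quantile(u, alpha, T), 1))
```

The sign convention of the shape parameter k differs between software packages, so the convention should be checked before comparing fitted parameters across tools.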

Goodness of Fit Test and Methods of Estimation of Parameters
The chi-squared test was used as a measure of goodness of fit to evaluate whether the probability distribution adequately fit the trend of the data.

$$ \chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i} \qquad (7) $$

where χ²: value of the chi-squared test; O_i: recorded value; E_i: modeled value. To fit the parameters of each probability function, the ML, WM, and SAM methods were employed using the Hyfran program.
The following parameter estimation methods were applied to each hydrological distribution: ML and WM for the GEV, Gumbel and Pearson Type III distributions; SAM for the Log-Pearson Type III distribution; and ML for the Normal distribution.
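As a rough illustration of the goodness of fit calculation, the sketch below computes a chi-squared statistic of the form Σ (O − E)² / E by grouping the data into classes of equal probability under the fitted distribution. The use of equiprobable classes and the number of classes are assumptions made here for simplicity; the text does not state how Hyfran defines its classes.

```python
import math

def gumbel_cdf(x, u, alpha):
    """Gumbel cumulative distribution function."""
    return math.exp(-math.exp(-(x - u) / alpha))

def chi_squared_statistic(data, cdf, n_classes):
    """Chi-squared statistic with n_classes equiprobable classes under cdf.

    Each observation is assigned to a class through its fitted
    non-exceedance probability, so every class has the same expected count.
    """
    n = len(data)
    expected = n / n_classes
    observed = [0] * n_classes
    for x in data:
        idx = min(int(cdf(x) * n_classes), n_classes - 1)
        observed[idx] += 1
    return sum((o - expected) ** 2 / expected for o in observed)

# Hypothetical sample and hypothetical fitted parameters (illustration only)
sample = [62.0, 75.5, 58.3, 90.1, 81.4, 67.9, 110.2, 72.6, 95.0, 69.8]
chi2 = chi_squared_statistic(sample, lambda x: gumbel_cdf(x, 70.0, 13.0), 5)
print(round(chi2, 2))
```

A smaller statistic indicates closer agreement between recorded and modeled frequencies, which is the criterion used to rank the distributions at each station.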

Selection of Hydrological Distribution
To select the best hydrological distribution the following analysis was conducted:

• For each rainfall station, the mean, maximum and minimum values and the standard deviation of the chi-squared test were computed for the Gumbel-ML, Gumbel-WM, Log-Pearson Type III-SAM, Pearson Type III-ML, Pearson Type III-WM, Normal-ML, GEV-ML and GEV-WM. These eight methods were used because they have adequately fitted the trend of maximum daily precipitation in various publications [22,23]. Based on this analysis, a mean value of the chi-squared test for Colombia was calculated from the regional means, weighted by the number of stations in each region.

• The percentage of stations at which each hydrological distribution gives the best fit to the maximum daily precipitation records, that is, the minimum value of the chi-squared test, was estimated.
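The two selection steps can be sketched as follows. All chi-squared values, station names and region labels below are hypothetical and serve only to show the bookkeeping. Ties are counted for every method that reaches the minimum, on the assumption that a station may be represented by more than one distribution.

```python
from collections import Counter

def best_methods(chi2_by_method, tol=1e-9):
    """Return every method that reaches the minimum chi-squared value."""
    best = min(chi2_by_method.values())
    return [m for m, v in chi2_by_method.items() if abs(v - best) <= tol]

def percentage_best(stations):
    """Percentage of stations at which each method gives the best fit."""
    counts = Counter()
    for chi2_by_method in stations.values():
        counts.update(best_methods(chi2_by_method))
    n = len(stations)
    return {m: 100.0 * c / n for m, c in counts.items()}

def weighted_mean(regional_means, station_counts):
    """Mean of regional chi-squared values weighted by number of stations."""
    total = sum(station_counts.values())
    return sum(regional_means[r] * station_counts[r] for r in regional_means) / total

# Hypothetical chi-squared results for four stations and two methods
stations = {
    "A": {"Gumbel-WM": 2.9, "GEV-WM": 2.1},
    "B": {"Gumbel-WM": 1.5, "GEV-WM": 1.5},   # tie: both methods counted
    "C": {"Gumbel-WM": 4.0, "GEV-WM": 6.0},
    "D": {"Gumbel-WM": 5.0, "GEV-WM": 3.0},
}
print(percentage_best(stations))
print(weighted_mean({"R1": 4.0, "R2": 6.0}, {"R1": 3, "R2": 1}))
```

Because ties are counted for each tied method, the percentages across methods can add up to more than 100%.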

Analysis of Results
This section presents the results that determine which probability density function best fits the 24 h maximum rainfall data of the 362 stations located in Colombia and should therefore be included in the maximum precipitation projections associated with different return periods. The error percentage of the selected rainfall stations was computed using Equation (1), obtaining a value of 4.83% based on the total number of rainfall stations of the IDEAM database.
Taking into account the methodology previously presented, the results presented in Table 2 were obtained. The results should be interpreted in a way that allows planners to know a priori the hydrological distributions that can be expected in the regions of Colombia, which saves calculation time. Based on the results in Table 2, the following can be deduced:
• In all regions of Colombia, the best fits of the chi-squared test were obtained with the GEV probability distribution. The weighted moment method best fits the parameters for this distribution and has an average regional value for Colombia of 5.04. Other probability distributions also fit the trend of the data similarly well: GEV with the maximum likelihood method, Gumbel with the weighted moment and maximum likelihood methods, and Pearson type III with the method of weighted moments. The Gumbel distribution with the WM method gives a better estimate of maximum daily precipitation for several return periods than with the ML method, which is consistent with the result reported in the literature [22].
• In Colombia, the poorest fits were obtained with the Pearson type III probability distribution and the maximum likelihood method, for which the average value of the chi-square test was 56.57, and with the log-Pearson type III distribution and the SAM method, which had a value of 10.31. This finding is also supported by the maximum and minimum values and the standard deviation for these probability functions.

• In the Amazonas region, the best fit in the chi-squared test was obtained with the GEV probability distribution and the weighted moment method, with a value of 4.36. This value may have been obtained because few stations were used in the analyses.
Table 3 shows values of the chi-squared test for a sample of rainfall stations in Colombia to illustrate how a hydrological distribution is selected at each station. The green cells mark the minimum values, which identify the best-fitting hydrological distributions. It is important to note that a rainfall station can be represented by several hydrological distributions; for instance, the Doña Juana rainfall station (Andean region) can be simulated using the Gum-WM, LP-SAM, Pea-ML, Pea-WM, GEV-ML, and GEV-WM, since these all present a chi-squared value of 1.53.
Table 4 shows, for each hydrological distribution, the best agreement according to the minimum value of the chi-squared test considering the ML, WM, or SAM methods, which is marked in blue cells. The Andean, Caribbean, Pacific, and Orinoquía regions were fitted appropriately by the GEV distribution (using the ML or WM method) with percentages of 52, 44, 54, and 73%, respectively; these are the percentages of rainfall stations at which the GEV distribution reaches the minimum value of the chi-squared test (best agreement). In the Amazonas region, the Gumbel and Pearson Type III distributions fit the data adequately, with a value of 60%. The GEV distribution presents the best fit overall, with a value of 52%. These results agree with the study conducted by Gonzalez-Alvarez et al. (2019) for the Caribbean region [24]. Since the Gumbel distribution corresponds to the GEV distribution with the parameter k = 0, the combined percentage for the Gumbel and GEV distributions using the ML and WM methods is shown in Table 5.
According to the analyzed sample, 74% of rainfall stations in Colombia can be simulated using these hydrological distributions, since they reach the minimum values of the chi-squared test. To determine the actual ranges of maximum daily precipitation for various return periods and the spatial variability in each region, the GEV distribution was applied to the analyzed rainfall stations. A summary of extreme values is presented in Table 6. For a return period of 100 years, the minimum value occurs in the Andean region, with 42.6 mm (gray cell), and the maximum value occurs in the Caribbean region, with an extreme precipitation of 306 mm (green cell). It is important to mention that not every department in each region has a rainfall station in the sample: in the Andean region, Manizales and Norte de Santander are missing; in the Caribbean region, La Guajira; in the Pacific region, Chocó and Nariño; and in the Amazonas region, Guainía.

Conclusions and Recommendations
To estimate the maximum daily rainfall associated with different return periods for a particular project, it is recommended that designers and planners use the following hydrological distributions: the GEV, with the weighted moments and maximum likelihood methods; the Gumbel, with weighted moments and maximum likelihood; and the Pearson type III, with weighted moments. It is important to note that the GEV hydrological probability distribution (weighted moments method) best fits the trend of the data in all regions of the country.
For future studies, it is recommended to collect more data in the Amazonas and Orinoquía regions and to apply other goodness of fit tests. Similarly, it is recommended to perform a similar analysis using distributions that analyze non-stationary trends to evaluate the impact of climate effects, where the changes over time of rainfall records can be identified. This kind of analysis should be implemented for all regions in Colombia.