Water 2019, 11(3), 490; https://doi.org/10.3390/w11030490
Third-Order Polynomial Normal Transform Applied to Multivariate Hydrologic Extremes
Disaster Prevention and Water Environment Research Center, National Chiao Tung University, Hsinchu 300, Taiwan
Department of Civil, Environment, and Architecture, Korea University, Seoul 02841, Korea
Author to whom correspondence should be addressed.
Received: 26 December 2018 / Accepted: 4 March 2019 / Published: 8 March 2019
Hydro-infrastructural systems (e.g., flood control dams, stormwater detention basins, and seawalls) are designed to protect the public against the adverse impacts of various hydrologic extremes (e.g., floods, droughts, and storm surges). In their design and safety evaluation, the characteristics of the hydrologic extremes affecting hydrosystem performance are often described by several interrelated random variables, rather than just one, that need to be considered simultaneously. In practical problems, these random variables have a mixture of non-normal distributions for which the joint distribution function is difficult to establish. To tackle problems involving multivariate non-normal variables, one frequently adopted approach is to transform the non-normal variables from their original domain to multivariate normal space, in which a wealth of established theory can be utilized. This study presents a framework for a practical normal transform based on the third-order polynomial in a multivariate setting. In particular, the study focuses on the multivariate third-order polynomial normal transform (TPNT) with explicit consideration of sampling errors in sample L-moments and correlation coefficients. For illustration, the modeling framework is applied to establish an at-site rainfall intensity–duration–frequency (IDF) relationship. The annual maximum rainfall data analyzed cover seven durations (1–72 h) with 27 years of usable records. The numerical application shows that the proposed modeling framework can produce reasonable rainfall IDF relationships by simultaneously treating several correlated rainfall data series, and that it is a viable tool for dealing with multivariate data with a mixture of non-normal distributions.
Keywords: polynomial normal transform; multivariate modeling; sampling errors; non-normality; extreme rainfall analysis
In hydrosystem design, performance evaluation, and simulation, the problems often involve multiple correlated random variables with a mixture of non-normal marginal distributions. Under this condition, it is generally difficult, if not impossible, to establish an analytical joint probability distribution for these variables. In comparison with univariate distributions, there are relatively few analytical multivariate distribution functions, and they exist only for special combinations of parametric marginal distributions, most of which are of the same type [1,2]. Examples of analytical multivariate distributions used in hydrology are the bivariate Gamma distribution and the bivariate generalized extreme value distribution. Their use in many practical problems is limited because the marginal distributions of the variables involved are often of different types.
Due to the difficulty in establishing a truly multivariate joint distribution model for problems involving mixtures of several correlated, non-normal variables, approximate approaches, such as the copula or the normal transform, are often used. These approaches preserve the marginal distributions or moments, together with the correlation features among the variables. However, one should realize that, unlike a true multivariate joint distribution function, preserving the marginal distributions and dependence structure retains only partial information about the multivariate random variables in the analysis.
The copula is one type of approximate multivariate approach that has recently received tremendous attention from researchers in various disciplines, including hydrology. Examples of applying copulas in multivariate hydrologic modeling can be found in the analysis of floods [7,8], droughts [9,10,11,12], dam safety, and extreme rainfalls [14,15]. Most copula-based applications deal with bivariate problems, and some with trivariate problems under restrictive conditions on the correlation structure. Applications of copulas to higher-dimensional multivariate problems are rare, primarily because only a few copula families are available and they are rather restrictive in describing the dependence structure. Recently, vine copulas have been shown to overcome this limitation of currently used copulas in multivariate analysis [16,17,18,19]. A copula-based approach is parametric by nature in that analytical marginal distribution models for the variables involved must be specified.
Alternatively, another viable scheme for treating multivariate problems involving correlated non-normal random variables is the NORTA (normal-to-anything) algorithm. In NORTA, each non-normal variable is transformed to the normal domain by preserving its marginal probability content, $\Phi(z) = F_X(x)$, with $\Phi(\cdot)$ and $F_X(\cdot)$ being the cumulative distribution functions (CDFs) of the standard normal variable Z and the original variable X, respectively. In addition, a relationship must be established to determine the equivalent correlation coefficient, $\rho^{*}_{jk}$, of a pair of normal-transformed variables, $(Z_j, Z_k)$, from the correlation coefficient, $\rho_{jk}$, of the corresponding random variables, $(X_j, X_k)$, in the original space. Once the correlation matrix of the standard normal variables Z's, $\mathbf{R}_Z$, is obtained from that of the non-normal variables X's, $\mathbf{R}_X$, an appropriate orthogonal transformation can be applied to transform the original correlated variables into uncorrelated standard normal space for analysis.
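The marginal step of NORTA can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exponential marginal and the function names `to_standard_normal` and `exponential_cdf` are hypothetical choices, not part of the study.

```python
import math
from statistics import NormalDist


def to_standard_normal(x, cdf):
    """Map x to z so that Phi(z) = F_X(x) (probability content is preserved)."""
    p = cdf(x)
    # Keep p strictly inside (0, 1); inv_cdf is undefined at the end points.
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    return NormalDist().inv_cdf(p)


def exponential_cdf(x, rate=1.0):
    """CDF of an exponential variable (hypothetical marginal used for illustration)."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


# The median of the original variable maps to the median of Z, i.e., zero.
z_median = to_standard_normal(math.log(2.0), exponential_cdf)
```

Note that this step needs the full marginal CDF, which is the information requirement that TPNT is meant to relax.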
The determination of $\rho^{*}_{jk}$ from $\rho_{jk}$ is made through the Nataf transform, which requires solving an implicit non-linear equation in the form of a double integration involving the marginal distributions of the pair of random variables, $(X_j, X_k)$, under consideration:
$$\rho_{jk} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left(\frac{x_j-\mu_j}{\sigma_j}\right)\left(\frac{x_k-\mu_k}{\sigma_k}\right)\phi_{jk}\left(z_j, z_k; \rho^{*}_{jk}\right)\,dz_j\,dz_k \quad (1)$$
where $x_j = F_j^{-1}[\Phi(z_j)]$, $\mu_j$ and $\sigma_j$ = mean and standard deviation of $X_j$, and $\phi_{jk}(\cdot)$ = bivariate standard normal joint probability density function (PDF). Lebrun and Dutfoy provide an insightful analysis of the Nataf transform and show that it is a special model of the dependence structure using the Gaussian copula. To facilitate practical engineering applications, a set of empirical equations for 10 commonly used distribution functions has been established to relate $\rho^{*}_{jk}$ to $\rho_{jk}$ and the distribution properties. Such empirical relations were applied to the reliability analysis of engineering systems [5,24]. Later, computationally more efficient methods based on root finding and linear search, the false position method, and the artificial neural network method were proposed to solve Equation (1) for $\rho^{*}_{jk}$ from the known $\rho_{jk}$ and the marginal PDFs of $(X_j, X_k)$.
The above-mentioned schemes (i.e., copula, NORTA, and Nataf transform) all require the stipulation of marginal PDFs. Stipulating a distribution function implies knowing the complete statistical information of the random variable, including its moments of all orders. This ideal situation is attainable only when one has a large amount of data, which generally is not the case in practice. To relax this information requirement without having to specify the distribution functions, the third-order polynomial normal transform (TPNT) can be used. By TPNT, each individual non-normal random variable is related to a third-order polynomial function of the corresponding standard normal variable. The polynomial coefficients are determined by matching the statistical moments or quantiles of the individual random variables. The multivariate version of TPNT was first proposed by Vale and Maurelli to simultaneously consider statistical moments and correlation coefficients. Multivariate TPNT procedures have been applied in different fields including, but not limited to, Monte Carlo simulation for generating multivariate random variates [24,30,31,32,33], wind power modeling, load computation in power network planning, and reliability analysis [36,37].
It should be noted that the great majority of multivariate TPNT applications assume known marginal statistical moments (e.g., product-moments or L-moments) and correlation coefficients. However, in real-life hydrologic applications, the amount of available data is generally not large enough to reliably ascertain the true marginal probability distribution functions, statistical moments, and correlation coefficients. The sample statistical moments and correlation coefficients used are therefore subject to sampling errors. In this study, a procedure is proposed to (1) optimally estimate multivariate TPNT coefficients by explicitly incorporating the sampling errors associated with the sample moments and correlation coefficients, and (2) comply with a one-to-one monotonically increasing relation between quantiles of the original and normal-transformed variables. The procedure is illustrated by analyzing annual maximum rainfall data series of seven different durations to establish at-site rainfall intensity–duration–frequency (IDF) and depth–duration–frequency (DDF) relationships.
2.1. Third-Order Polynomial Normal Transform (TPNT)
2.1.1. Univariate TPNT
By TPNT, a univariate non-normal random variable, X, is approximated by the standard normal variable, Z, in the form of a third-order polynomial functional relation as
$$X \approx P_3(Z) = a_0 + a_1 Z + a_2 Z^2 + a_3 Z^3 \quad (2)$$
where $P_3(\cdot)$ denotes the third-order polynomial transform with a0, a1, a2, and a3 being the transformation coefficients. The TPNT coefficients can be determined by several methods of varying mathematical complexity. By preserving the first four product-moments, the TPNT coefficients are related to the first four product-moments of the standardized variable, $X' = (X - \mu_x)/\sigma_x$, through Equations (3)–(6), in which $\gamma_x$ = skew coefficient and $\kappa_x$ = kurtosis of the original random variable X. Alternatively, the TPNT coefficients in Equation (2) can be related to the first four L-moments through Equations (7)–(10), in which $\lambda_m$ = the mth-order L-moment of the original non-normal random variable, X. Other than these two moment-matching methods, TPNT coefficients can also be determined by the quantile-based least square method and the Fisher–Cornish asymptotic expansion (FC) method. Chen and Tung investigated the performance of different methods for determining the TPNT coefficients with regard to their accuracy and robustness in capturing the probabilistic features of the random variable X when the population distribution is known. They found that, among the various methods for estimating TPNT coefficients, the L-moment-based method is computationally simple and performs satisfactorily under a wide range of distribution conditions. The product-moment method can also yield a satisfactory normal transformation, provided that the skew coefficient and kurtosis in Equations (3)–(6) can be estimated accurately. However, when the statistical moments are to be estimated from finite data, sample L-moments have been shown to be more stable and robust than sample product-moments, especially when the sample size is not large.
By referring to Equations (3)–(6), one also realizes that determining TPNT coefficients from the product-moments requires solving a system of non-linear equations. Solving Equations (3)–(6) is therefore expected to be more difficult than solving the L-moment-based Equations (7)–(10), which are linear in the TPNT coefficients. Sometimes, a solution to the system of non-linear equations may not be attainable. According to Equations (7)–(10), the TPNT coefficients can be easily obtained in terms of L-moments, as given in Equations (11)–(14).
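The linear L-moment relations can be illustrated with a short Python sketch. It does not reproduce Equations (7)–(14); instead, as an assumption made here, the constants linking each power of Z to each L-moment are obtained by numerical integration of $\lambda_m = \int z^k P^{*}_{m-1}(\Phi(z))\,\phi(z)\,dz$, with $P^{*}$ the shifted Legendre polynomials. The function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def _shifted_legendre(r, u):
    """Shifted Legendre polynomial P*_r(u) for r = 0..3."""
    return (1.0,
            2 * u - 1,
            6 * u * u - 6 * u + 1,
            20 * u ** 3 - 30 * u * u + 12 * u - 1)[r]


def lmoment_of_power(m, k, half_width=10.0, steps=4000):
    """m-th L-moment of Z**k, integrated over z by the trapezoid rule."""
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        z = -half_width + i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += (w * z ** k * _shifted_legendre(m - 1, _STD_NORMAL.cdf(z))
                  * _STD_NORMAL.pdf(z))
    return total * h


def tpnt_from_lmoments(l1, l2, l3, l4):
    """Solve the linear L-moment relations for (a0, a1, a2, a3)."""
    c21, c23 = lmoment_of_power(2, 1), lmoment_of_power(2, 3)
    c41, c43 = lmoment_of_power(4, 1), lmoment_of_power(4, 3)
    c32 = lmoment_of_power(3, 2)
    # Odd powers (Z, Z**3) drive lambda_2 and lambda_4: a 2-by-2 linear system.
    det = c21 * c43 - c23 * c41
    a1 = (l2 * c43 - c23 * l4) / det
    a3 = (c21 * l4 - c41 * l2) / det
    # The even power Z**2 drives lambda_3; the mean is a0 + a2.
    a2 = l3 / c32
    a0 = l1 - a2
    return a0, a1, a2, a3
```

Because the system is linear, a solution always exists, which is the practical advantage over the product-moment equations noted above. The sketch does not check monotonicity of the resulting polynomial.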
In the transformation process, it is necessary to preserve the probability content in both the original space and the standard normal space, i.e., $F_X(x) = \Phi(z)$. This implies that the quantiles of the two variables should satisfy the following relationship:
$$x_p = a_0 + a_1 z_p + a_2 z_p^2 + a_3 z_p^3 \quad (15)$$
where $x_p$ and $z_p$ = pth-order quantiles of the random variable X and the standard normal random variable Z, respectively, that is, $F_X(x_p) = p$ and $\Phi(z_p) = p$. Furthermore, inherent in Equation (15) is the requirement of a one-to-one monotonically increasing relation between $x_p$ and $z_p$, i.e., $dx_p/dz_p = a_1 + 2a_2 z_p + 3a_3 z_p^2 > 0$ for all $z_p$. This requires that the TPNT coefficients comply with the following conditions:
$$a_3 > 0; \quad a_2^2 - 3a_1 a_3 < 0 \quad (16)$$
It should be noted that the TPNT coefficients obtained from solving Equations (3)–(6), Equations (7)–(10), or the other methods mentioned above are not guaranteed to comply with the monotonicity condition stipulated in Equation (16). This is a major concern especially when sample statistics are used to determine the TPNT coefficients.
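A monotonicity check is easy to state in code. The sketch below tests whether the derivative $a_1 + 2a_2 z + 3a_3 z^2$ is positive for every z, using the sign of the leading coefficient and of the discriminant; the function name and the handling of the degenerate case $a_3 = 0$ are assumptions made for illustration.

```python
def is_monotone_increasing(a1, a2, a3):
    """True if a1 + 2*a2*z + 3*a3*z**2 > 0 for every real z."""
    if a3 == 0.0:
        # Degenerate case: the derivative is linear, so it must be a positive constant.
        return a2 == 0.0 and a1 > 0.0
    # Upward-opening parabola with no real root.
    return a3 > 0.0 and a2 * a2 - 3.0 * a1 * a3 < 0.0
```

For instance, the hypothetical coefficients (a1, a2, a3) = (1, 0.2, 0.1) pass the check, whereas (1, 1, 0.1) fail it because the discriminant term $a_2^2 - 3a_1a_3$ is positive.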
2.1.2. Multivariate TPNT
The TPNT coefficients can be determined by preserving the statistical moments of individual random variables. Specifically, L-moments are used herein to determine the multivariate TPNT coefficients because of the simple, linear functional relationships between the TPNT coefficients and the L-moments shown in Equations (11)–(14). Furthermore, sample L-moments have several desirable sampling properties over the product-moments, as proven by Hosking. In the context of fitting the first four L-moments of a total of N correlated variables, Equations (7)–(10) can be re-written for each variable as Equations (17)–(20), in which $\lambda_{m,j}$ = the mth-order L-moment of the jth random variable Xj for j = 1, 2, …, N.
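In addition to preserving the marginal statistical moments of the variables involved, multivariate TPNT must also preserve the statistical dependence between random variables in the transformation. The correlation coefficient of any two correlated random variables, $(X_j, X_k)$, is embedded in their second-order cross-product moment, for which Vale and Maurelli gave an explicit expression in terms of the TPNT coefficients:
$$E(X_j X_k) = \mu_j \mu_k + \rho_{jk}\sigma_j\sigma_k = (a_{0,j}+a_{2,j})(a_{0,k}+a_{2,k}) + (a_{1,j}a_{1,k} + 3a_{1,j}a_{3,k} + 3a_{3,j}a_{1,k} + 9a_{3,j}a_{3,k})\,\rho^{*}_{jk} + 2a_{2,j}a_{2,k}\,\rho^{*2}_{jk} + 6a_{3,j}a_{3,k}\,\rho^{*3}_{jk} \quad (21)$$
in which $\rho_{jk}$, $\rho^{*}_{jk}$ = correlation coefficient of random variables $(X_j, X_k)$ and its equivalent in normal space; $\mu_j$, $\sigma_j$ = mean and standard deviation of random variable $X_j$, respectively; and $a_{0,j}, \ldots, a_{3,j}$ = TPNT coefficients of $X_j$. The correlation coefficient in the original scale, $\rho_{jk}$, is thus related to its counterpart in the normal space, $\rho^{*}_{jk}$, by a third-order polynomial relationship through the TPNT coefficients.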
Upon the determination of TPNT coefficients for the two concerned random variables, the correlation coefficient in the normal space, $\rho^{*}_{jk}$, corresponding to that in the original space, $\rho_{jk}$, can be obtained by finding the real root of Equation (21). The mathematical relations between the two correlation coefficients are
Equation (21) is used repeatedly to solve for $\rho^{*}_{jk}$ for all pairs of correlated random variables to establish the correlation matrix in multivariate normal space.
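Solving the cubic correlation relation for the normal-space correlation can be sketched as follows. The covariance expression used here follows from the joint moments of two standard normal variables, and the bisection root search on [-1, 1] is an assumption chosen for simplicity (it returns one real root, without checking uniqueness). All function names and coefficient values are hypothetical.

```python
import math


def tpnt_std(a):
    """Standard deviation implied by TPNT coefficients a = (a0, a1, a2, a3)."""
    _, a1, a2, a3 = a
    return math.sqrt(a1 * a1 + 6 * a1 * a3 + 2 * a2 * a2 + 15 * a3 * a3)


def original_correlation(rho_z, aj, ak):
    """Correlation of (Xj, Xk) implied by the normal-space correlation rho_z."""
    lin = aj[1] * ak[1] + 3 * aj[1] * ak[3] + 3 * aj[3] * ak[1] + 9 * aj[3] * ak[3]
    cov = lin * rho_z + 2 * aj[2] * ak[2] * rho_z ** 2 + 6 * aj[3] * ak[3] * rho_z ** 3
    return cov / (tpnt_std(aj) * tpnt_std(ak))


def normal_correlation(rho_x, aj, ak, tol=1e-12):
    """Find a root rho_z in [-1, 1] of the cubic relation by bisection."""
    lo, hi = -1.0, 1.0
    f_lo = original_correlation(lo, aj, ak) - rho_x
    f_hi = original_correlation(hi, aj, ak) - rho_x
    if f_lo * f_hi > 0:
        raise ValueError("no sign change of the cubic on [-1, 1]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = original_correlation(mid, aj, ak) - rho_x
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


# Hypothetical coefficients for two variables; round trip from 0.7 and back.
aj = (10.0, 2.0, 0.3, 0.05)
ak = (4.0, 1.0, 0.2, 0.02)
rho_back = normal_correlation(original_correlation(0.7, aj, ak), aj, ak)
```

When both transforms are linear (a2 = a3 = 0), the relation reduces to equality of the two correlation coefficients, which gives a quick sanity check.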
2.2. Optimization Framework for Determining Multivariate TPNT Coefficients
2.2.1. Objective Function
To determine the multivariate TPNT coefficients that best preserve the known values of L-moments, the least-square criterion is used in this study, by which the objective function can be expressed as
$$\text{Minimize} \; \sum_{j=1}^{N}\sum_{m=1}^{4} \delta_{m,j}^{2} \quad (23)$$
where $\delta_{m,j}$ = a decision variable defining the deviation between the mth-order TPNT-based L-moment computed by the left-hand side of Equations (17)–(20) and the known value, $\lambda_{m,j}$, of the jth random variable, Xj. Other forms of objective function, such as minimizing the sum of absolute deviations, can also be used.
Several constraints are essential to make sure that multivariate TPNT coefficients obtained are able to preserve the known statistical features and mathematical relationships of concerned random variables.
(a) Preservation of L-moments for the individual variable Xj:
The deviation in the objective function, which defines the degree to which the known values of the first four L-moments of individual variable Xj are preserved, can be written, according to Equations (17)–(20), as Equations (24)–(27).
Note that the value of $\delta_{m,j}$ is unrestricted in sign, meaning that it can be negative, zero, or positive, depending on the relative magnitudes of the TPNT-based L-moments and the known values.
In reality, statistical properties of a random variable are estimated from a finite number of sample data. Consequently, the sample L-moments of random variables, Xj, are subject to uncertainty. In practice, two approaches are used to estimate sample L-moments: plotting position-based estimators and unbiased estimators. This study adopts the latter, by which the first four unbiased estimators of L-moments, $l_1, l_2, l_3, l_4$, can be computed by Equations (28)–(31), in which $l_m$ = sample estimator of the mth-order L-moment, $\lambda_m$; $x_{(i)}$ = the ith ranked sample (ascending order) in a data set of size n.
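As an illustration of unbiased sample L-moments, the sketch below uses the standard route through unbiased probability-weighted moments $b_0, \ldots, b_3$. It is assumed, not confirmed, that this is algebraically equivalent to the form used in Equations (28)–(31); the function name is hypothetical.

```python
def sample_lmoments(data):
    """First four unbiased sample L-moments via probability-weighted moments."""
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("at least four observations are needed")
    b = [0.0, 0.0, 0.0, 0.0]
    for i, xi in enumerate(x, start=1):
        w = 1.0  # weight (i-1)(i-2)...(i-r) / ((n-1)(n-2)...(n-r)), built up with r
        for r in range(4):
            b[r] += w * xi
            if r < 3:
                w *= (i - 1 - r) / (n - 1 - r)
    b0, b1, b2, b3 = (v / n for v in b)
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3, l4
```

For the evenly spaced sample 1, 2, 3, 4, 5, this gives $l_1 = 3$, $l_2 = 1$, and $l_3 = l_4 = 0$, as expected for symmetric, uniformly spread data.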
Suppose that the sampling distributions of the sample L-moments are derived or approximated. Proper bounds can then be incorporated into Equations (24)–(27) to determine suitable and probabilistically plausible TPNT coefficients for all N random variables X1, X2, …, XN. Assuming that the lower and upper bounds of the L-moments can be determined from their corresponding sampling distributions, constraint Equations (24)–(27) can then be modified as Equations (32)–(35) for j = 1, 2, …, N. In Equations (32)–(35), $\lambda^{U}_{m,j}$ and $\lambda^{L}_{m,j}$ are, respectively, the upper and lower bounds containing the unknown population mth-order L-moment of random variable Xj, $\lambda_{m,j}$. The derivation of bounds for unknown population L-moments is described in Section 2.3.1.
(b) Preservation of the monotonic probability–quantile relationship for individual variable Xj:
(c) Preservation of the correlation between all pairs of different variables, Xj and Xk:
Based on Equation (21), any pair of correlated random variables must satisfy Equation (37), where $E(X_j X_k)$ = cross-product moment of variables Xj and Xk, defined in Equation (21), which is a function of the corresponding TPNT coefficients and $\rho^{*}_{jk}$; $\rho_{jk}$ and $\rho^{*}_{jk}$ = correlation coefficients of random variables $(X_j, X_k)$ and their normal equivalents $(Z_j, Z_k)$, respectively; $\mu_j$, $\sigma_j$ = mean and standard deviation of random variable $X_j$, respectively.
Similarly, constraint Equation (37) on correlation coefficients can be modified as Equation (38), in which $\rho^{L}_{jk}$, $\rho^{U}_{jk}$ = lower and upper bounds, respectively, of the unknown population correlation coefficient, $\rho_{jk}$, between the random variables Xj and Xk (see Section 2.3.2); $\rho^{*}_{jk}$ = equivalent correlation coefficient of the random variables in the standard normal domain; $\mu_j$, $\sigma_j$ = TPNT-based estimates of the mean and standard deviation of random variable Xj, which can be computed according to Equations (3) and (4) as
$$\mu_j = a_{0,j} + a_{2,j} \quad (39)$$
$$\sigma_j = \sqrt{a_{1,j}^{2} + 6a_{1,j}a_{3,j} + 2a_{2,j}^{2} + 15a_{3,j}^{2}} \quad (40)$$
Equation (38) can alternatively be expressed as the pair of inequality constraints given in Equations (41) and (42).
In summary, with the sampling errors of sample L-moments and correlation coefficients taken into account, the optimization model for determining the most plausible TPNT coefficients for establishing multivariate relationships is as follows:
The objective function is expressed in Equation (23) or its variations, which is subject to the following constraints:
- Equations (32)–(35) for preserving plausible L-moments (8 × N constraints);
- Equation (36) for complying with a probability–quantile monotonic relationship (2 × N constraints);
- Equations (41) and (42) for preserving plausible correlation coefficient (N × (N − 1) constraints); and
- unrestricted-in-sign polynomial coefficients $a_{i,j}$ and deviations $\delta_{m,j}$.
2.3. Determination of Bounds for L-Moments and Correlation Coefficients
2.3.1. Bounds for L-Moments
To determine the bounds for L-moments, the sampling distributions of the sample L-moments are needed. For independent random samples of size n from a distribution function having the mth-order population L-moment $\lambda_m$, Hosking showed that the statistic $\sqrt{n}\,(l_m - \lambda_m)$, with $l_m$ being the unbiased sample L-moment of order m, has a sampling distribution that converges asymptotically to the normal distribution with mean zero and variance $\Lambda_{mm}$. Therefore, the mth-order sample L-moment, $l_m$, has a variance of $\Lambda_{mm}/n$. For the first four orders of sample L-moment, the value of $\Lambda_{mm}$ can be computed by the double integrals in Equations (43)–(46). To estimate the values of $\Lambda_{mm}$ from the ranked sample observations, the double integration stated in Equations (43)–(46) can be carried out numerically as in Equations (47)–(50), in which $x_{(i)}$ = the ith ranked sample in ascending order, i.e., $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$; $p_i$ = estimated cumulative probability for the ith ranked sample, i.e., $p_i = \hat{F}(x_{(i)})$, obtained from the well-known Weibull plotting position formula, $p_i = i/(n+1)$. Makkonen has shown that the Weibull plotting position formula provides the best estimate of the underlying non-exceedance probability. The superiority of the Weibull formula becomes more pronounced as the sample size decreases. By adopting the normal distribution assumption, the α-confidence interval for the unknown population $\lambda_m$ can be obtained as
$$l_m \pm z_{(1-\alpha)/2}\sqrt{\Lambda_{mm}/n} \quad (51)$$
in which $z_{(1-\alpha)/2} = \Phi^{-1}[(1+\alpha)/2]$, a standard normal quantile with an exceedance probability of $(1-\alpha)/2$, with $\Phi^{-1}(\cdot)$ being the inverse standard normal CDF.
2.3.2. Bounds for Correlation Coefficients
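To quantify the lower and upper bounds of a correlation coefficient, the Fisher transform is often used, by which the sampling distribution of the inverse hyperbolic tangent of the sample correlation approximately follows a normal distribution as [45,46]
$$\tanh^{-1}(r_{jk}) \sim N\left(\tanh^{-1}(\rho_{jk}),\ \frac{1}{n-3}\right) \quad (52)$$
in which $r_{jk}$, $\rho_{jk}$ = sample and population correlation coefficients, respectively; n = number of sample pairs. With a specified confidence level α, the corresponding lower and upper bounds for the unknown population coefficient can be obtained as
$$\rho^{L}_{jk},\ \rho^{U}_{jk} = \tanh\left[\tanh^{-1}(r_{jk}) \mp \frac{z_{(1-\alpha)/2}}{\sqrt{n-3}}\right] \quad (53)$$
where $z_{(1-\alpha)/2} = \Phi^{-1}[(1+\alpha)/2]$ and the hyperbolic tangent function is defined as $\tanh(x) = (e^{x} - e^{-x})/(e^{x} + e^{-x})$.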
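The Fisher-transform bounds are straightforward to compute. The following sketch treats `conf` as the confidence level (e.g., 0.90), so the standard normal quantile is taken at $(1 + \text{conf})/2$; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def correlation_bounds(r, n, conf=0.90):
    """Lower and upper bounds of the population correlation by the Fisher transform."""
    z_crit = NormalDist().inv_cdf(0.5 * (1.0 + conf))
    center = math.atanh(r)               # Fisher transform of the sample correlation
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(center - half_width), math.tanh(center + half_width)


# Example with n = 27 sample pairs, as in the rainfall record analyzed here;
# the sample correlation 0.8 is a made-up value.
lower, upper = correlation_bounds(0.8, 27)
```

Because the interval is symmetric only in the transformed scale, the bounds in the original scale are asymmetric about the sample correlation and always stay within (-1, 1).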
2.4. Solution Algorithm
A recursive procedure is proposed to solve the above optimization model for determining multivariate TPNT coefficients. The procedure consists of four steps: initialization, optimization, validation, and updating. The solution algorithm, which considers the sampling errors of L-moments and correlation coefficients, is detailed below and outlined in Figure 1.
Step (1): Initialization. Since the problem involves a nonlinear optimization model, an initial solution is needed. A straightforward and sound initial solution for the TPNT coefficients, $(a_{0,j}, a_{1,j}, a_{2,j}, a_{3,j})$, is the one provided by Equations (11)–(14) for all variables j = 1, 2, …, N. The initial TPNT coefficients obtained this way automatically satisfy constraint Equations (24)–(27). However, they do not necessarily comply with the monotonicity constraint, Equation (36). As for the initial normal correlation coefficients, rather than choosing them arbitrarily, let $\rho^{*}_{jk} = r_{jk}$, in which $r_{jk}$ is the sample correlation coefficient between random variables $X_j$ and $X_k$. Alternatively, obtain a feasible set of initial $\rho^{*}_{jk}$ by solving the third-order polynomial function of $\rho^{*}_{jk}$ in Equation (21) according to the initially assumed TPNT coefficients.
Step (2): Optimization. Based on the initially adopted TPNT coefficients, $(a_{0,j}, a_{1,j}, a_{2,j}, a_{3,j})$, and the normal-transformed correlation coefficients, $\rho^{*}_{jk}$, solve the optimization model with objective function Equation (23) and constraint Equations (32)–(35), (36), and (41)–(42) for the optimal TPNT coefficients of each random variable $X_j$, j = 1, 2, …, N.
Step (3): Validation. From the optimal feasible TPNT coefficients for j = 1, 2, …, N obtained in Step (2), determine the equivalent normal variates corresponding to the sample data by solving the third-order polynomial function:
$$x_{i,j} = a_{0,j} + a_{1,j} z_{i,j} + a_{2,j} z_{i,j}^{2} + a_{3,j} z_{i,j}^{3} \quad (54)$$
where $z_{i,j}$ = unknown normal variate corresponding to the ith observation, $x_{i,j}$, of the jth random variable under the optimal set of TPNT coefficients $(a_{0,j}, a_{1,j}, a_{2,j}, a_{3,j})$. From the normal-transformed data series of two different variables, $z_{i,j}$ and $z_{i,k}$, the corresponding correlation coefficient in the normal space, $\rho^{*}_{jk}$, is calculated.
Step (4): Updating. Compare the initialized and validated values of $\rho^{*}_{jk}$ for all pairs of random variables. If the discrepancy for any pair is judged to be significant, update the initial normal correlation coefficients with the validated values and the initial TPNT coefficients with the optimized values, and repeat Steps (2)–(4). Otherwise, the optimal solution has been obtained and the iteration stops.
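The validation step can be sketched as follows: each observation is mapped back to its normal variate by inverting the cubic, and the correlation of two transformed series is then computed. The bisection search assumes the polynomial is monotonically increasing on the bracket; the function names, bracket, and coefficient values are hypothetical.

```python
import math


def tpnt(z, a):
    """Evaluate a0 + a1*z + a2*z**2 + a3*z**3 (Horner form)."""
    a0, a1, a2, a3 = a
    return a0 + z * (a1 + z * (a2 + z * a3))


def normal_variate(x, a, lo=-10.0, hi=10.0, tol=1e-12):
    """Solve tpnt(z, a) = x for z by bisection, assuming a monotone increasing cubic."""
    if not tpnt(lo, a) <= x <= tpnt(hi, a):
        raise ValueError("x is outside the range covered by the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tpnt(mid, a) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pearson(u, v):
    """Sample correlation coefficient of two equally long series."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    suu = sum((p - mu) ** 2 for p in u)
    svv = sum((q - mv) ** 2 for q in v)
    return suv / math.sqrt(suu * svv)


# Round trip with hypothetical monotone coefficients.
a_demo = (50.0, 12.0, 1.5, 0.4)
z_back = normal_variate(tpnt(1.3, a_demo), a_demo)
```

In the iteration, `pearson` would be applied to the transformed series of each pair of variables and the result compared with the correlation assumed at initialization.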
For the optimization in Step (2), the sequential quadratic programming (SQP) algorithm is implemented. SQP tackles a nonlinear optimization problem by successively finding the approximate optimum solution to a quadratic programming (QP) representation of the original problem. The approximate solution is improved iteratively by solving the QP problem. Boggs and Tolle elaborated some useful properties of the SQP algorithm. The subroutine “sqp.m” in Matlab is used in this study to solve the optimization model.
3. Numerical Example
In this section, at-site rainfall intensity–duration–frequency (IDF) and depth–duration–frequency (DDF) relations are established to demonstrate the proposed multivariate TPNT method and examine its general performance. Rainfall IDF relations are widely used in the planning, design, and management of hydrosystem infrastructures, such as stormwater sewer systems and detention basins [49,50]. Establishing such relations at a given location involves at-site frequency analysis of annual maximum rainfall intensity (or depth) data of several selected durations. The conventional approach in rainfall frequency analysis chooses a proper parametric probability distribution model to individually fit the observed annual maximum rainfall data of each duration. The choice of a distribution model for the rainfall intensity–frequency relations is largely statistical, without much physical justification.
With the conventional approach, the resulting rainfall intensity–frequency curves of different durations can sometimes intersect within the probability range of practical application. This crossover phenomenon often occurs when the data record is relatively short. Physically, rainfall intensity–frequency curves of different durations should not cross over or intersect. Porras and Porras attributed the crossover of rainfall IDF curves to short data records of questionable representativeness, for which the rainfall quantiles estimated by frequency analysis contain significant sampling errors. Another plausible reason for the crossover of IDF curves is that frequency analysis of rainfall data is performed separately for each duration, without considering the inter-correlations that are intrinsically embedded in rainfall data of different durations. Haktanir earlier pointed out that rainfall frequency analyses of different durations in the process of establishing IDF relationships should not be performed independently of each other, but did not propose a mechanism to handle the correlation directly. Recently, Gräler et al. applied the D-vine copula, along with generalized extreme value distributions, to derive rainfall IDF relationships based on rainfalls of five durations. You and Tung, under the TPNT framework, developed a constrained least square model to simultaneously consider rainfall data of seven durations for establishing at-site rainfall IDF relations. However, their model does not explicitly take into account the correlation among rainfall data of different durations.
The multivariate TPNT-based model presented above was applied to establish at-site rainfall IDF relations using annual maximum hourly rainfall data of various durations at a raingauge in Zhongli City of Taoyuan County, Taiwan. The annual maximum rainfall intensity data cover the record period of 1988–2015, but the year 1992 was excluded from the analysis because of long periods of records with technical issues. Hence, only 27 years of data (n = 27) with seven (N = 7) durations (i.e., 1, 2, 6, 12, 24, 48, and 72 h) are used in this illustration (see data in Table 1). The sample values of the mean, standard deviation, and first four L-moments of the rainfall data of different durations are tabulated in Table 2. Furthermore, the standard error values corresponding to the first four sample L-moments, according to Equations (47)–(50), are listed in Table 3. The sample correlation coefficients of all rainfall intensity pairs of different durations in the original and normal-transformed domains are shown in Table 3 and Table 4, respectively. Based on the information given in Table 2 and Table 4, one is able to define the lower and upper bounds for the L-moments and correlation coefficients according to the desired confidence level, α, by Equations (51) and (53), respectively. Table 5 lists the values of the correlation coefficients in normal-transformed space, $\rho^{*}_{jk}$, provided by the solution to constraint Equations (41) and (42) in the optimization model.
Under different constraint types and confidence levels for the L-moments and correlation coefficients, the corresponding optimal multivariate TPNT coefficients can vary. With the confidence level of α = 90% for both L-moments and correlation coefficients, Table 6a–d list the optimal TPNT coefficients under four different constraint types, including “LM” for L-moments by Equations (32)–(35), “Mono” for monotonicity by Equation (36), “Corr” for correlation by Equations (41) and (42), and “NC” for no-crossover by Equation (56). Once the optimal TPNT coefficients associated with each rainfall duration are obtained from solving the multivariate TPNT model, the rainfall IDF relations, according to Equation (15), can be established as
$$i_{t,T} = a_{0,t} + a_{1,t}\, z_T + a_{2,t}\, z_T^{2} + a_{3,t}\, z_T^{3} \quad (55)$$
where $i_{t,T}$ = estimated t-h, T-year rainfall intensity; $a_{0,t}$, $a_{1,t}$, $a_{2,t}$, and $a_{3,t}$ = optimum TPNT coefficients corresponding to rainfall of duration t (h); zT is the standard normal quantile corresponding to a return period of T years, which has an annual exceedance probability of 1/T, i.e., $z_T = \Phi^{-1}(1 - 1/T)$.
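Evaluating a T-year rainfall quantile from a set of TPNT coefficients can be sketched as below. The coefficient values are invented for illustration only and are not taken from Table 6; the function name is hypothetical.

```python
from statistics import NormalDist


def rainfall_quantile(return_period, a):
    """T-year quantile from TPNT coefficients a = (a0, a1, a2, a3)."""
    # Annual non-exceedance probability is 1 - 1/T.
    z_t = NormalDist().inv_cdf(1.0 - 1.0 / return_period)
    a0, a1, a2, a3 = a
    return a0 + z_t * (a1 + z_t * (a2 + z_t * a3))


# Hypothetical coefficients for one duration (intensity in mm/h).
a_1h = (60.0, 18.0, 2.0, 0.5)
curve = [rainfall_quantile(T, a_1h) for T in (2, 5, 10, 25, 50, 100, 200)]
```

For T = 2 the standard normal quantile is zero, so the 2-year value equals a0; with monotone coefficients the quantiles increase with return period.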
4. Results and Discussions
By varying the value of zT for different return periods in Equation (55), in conjunction with the optimal TPNT coefficients listed in Table 6a–d, one can establish IDF curves as shown in Figure 2 and Figure 3. Part (a) of Table 6 and Figure 2, Figure 3 and Figure 4 (denoted by “LM”) shows the results from considering only the bounding constraints of L-moments, Equations (32)–(35). In fact, the optimal TPNT coefficients corresponding to each duration can be obtained separately from the exact solutions using sample L-moments in Equations (11)–(14). Note that the TPNT coefficients obtained from each rainfall duration at this stage do not necessarily comply with a one-to-one monotonic increasing relation of rainfall quantile and probability. This can be clearly seen in Table 6a for 1 and 2 h rainfalls for which the two monotonicity constraints are violated (shown by *). Part (b) (denoted by “LM/Mono”) shows the results by considering both L-moment constraints, Equations (32)–(35), and the monotonicity constraints, Equation (36), for each rainfall duration. In this case, both results presented in Parts (a) and (b) in Table 6 and Figure 2, Figure 3 and Figure 4 can be obtained separately by treating rainfall data of different durations without considering their inter-correlations. Results in Part (c), denoted by “LM/Mono/Corr,” were obtained by incorporating correlation constraints of rainfall data with different durations, Equations (41) and (42), in determining the multivariate TPNT coefficients.
To show the goodness-of-fit of the normal-transformed rainfall data produced by the proposed multivariate TPNT procedure, a normal probability plot of the 24 h rainfall data (after normal transformation), with the fitted line and 95% confidence band, is shown in Figure 5 as an example. The goodness-of-fit was assessed by the Anderson–Darling test, for which the test statistic is 0.535 with a p-value of 0.155. Figure 5 represents the worst case among the seven durations considered. The p-values range from 0.155 (for 72 h) to 0.933 (for 2 h), all of which are higher than the generally adopted significance level of 0.05. This indicates that the normal transform by the proposed multivariate TPNT procedure is adequate.
Note that the solution obtained up to this stage does not necessarily comply with the physical reality that the rainfall intensity (depth) of a given return period is a decreasing (an increasing) function of duration. In other words, rainfall intensity/depth–frequency curves of different durations should not intersect or cross over each other. However, in the process of establishing rainfall IDF/DDF relationships, one does not know in advance whether any two resulting curves will intersect. Therefore, a special set of intersection-avoidance constraints, Equation (56), is imposed in establishing the IDF curves, where $T_{max}$ = upper limit of the selected rainfall return period below which no crossover of IDF curves is permitted to occur; $z_{T_{max}}$ = standard normal quantile obtainable from $z_{T_{max}} = \Phi^{-1}(1 - 1/T_{max})$. Hence, an additional N − 1 no-crossover (NC) constraints are included in the optimization model to solve for the multivariate TPNT coefficients. Part (d) results (denoted by “LM/Mono/Corr/NC”) show the rainfall IDF relations considering the NC constraints.
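A simple after-the-fact check for crossover can be sketched as follows. This is not Equation (56) itself: it merely evaluates the fitted intensity curves of successive durations on a grid of return periods up to an assumed upper limit and reports whether a shorter-duration intensity ever falls to or below a longer-duration one. The function names, the lower return period of 1.01 years, and all coefficients are hypothetical.

```python
from statistics import NormalDist


def tpnt(z, a):
    """Evaluate a0 + a1*z + a2*z**2 + a3*z**3 (Horner form)."""
    a0, a1, a2, a3 = a
    return a0 + z * (a1 + z * (a2 + z * a3))


def has_crossover(coeffs_by_duration, t_min=1.01, t_max=200.0, n_grid=400):
    """coeffs_by_duration is ordered from the shortest to the longest duration."""
    nd = NormalDist()
    z_lo = nd.inv_cdf(1.0 - 1.0 / t_min)
    z_hi = nd.inv_cdf(1.0 - 1.0 / t_max)
    for i in range(n_grid + 1):
        z = z_lo + (z_hi - z_lo) * i / n_grid
        values = [tpnt(z, a) for a in coeffs_by_duration]
        # Intensity must strictly decrease as duration increases.
        if any(short <= long for short, long in zip(values, values[1:])):
            return True
    return False


# Hypothetical pairs of curves: the first pair stays ordered, the second crosses.
ordered = [(60.0, 18.0, 2.0, 0.5), (40.0, 10.0, 1.0, 0.3)]
crossing = [(60.0, 5.0, 0.0, 0.0), (50.0, 12.0, 0.0, 0.0)]
```

A grid check of this kind can only detect a crossover after fitting; imposing the ordering as constraints inside the optimization, as done in the study, prevents it in the first place.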
Figure 2 shows the rainfall intensity–duration curves corresponding to various frequencies. For this particular data set, when only the sample L-moments are preserved, Figure 2a reveals two unusual features of the curves at high return periods (say, ≥100 years): (1) the curves tend to converge for rainfall durations in the vicinity of 1 h, and (2) the curves undulate relatively strongly for medium and long durations. These features indicate possible anomalies that should not appear in a reasonable rainfall IDF relation. The convergence of the rainfall intensity–duration curves in Figure 2a, shown in a different form as the intensity–frequency relation in Figure 3a, reveals that the 1 h curve (in red) clearly does not satisfy the monotonicity condition of Equation (36), which requires a rainfall intensity quantile to increase continuously with return period (see also Table 6a). In fact, the 2 h intensity–frequency curve (in gold) also mildly violates the monotonicity requirement, as the curve starts to bend down at high return periods. The violation of the monotonicity condition can also be observed in the depth–frequency curve for the 1 h duration (see Figure 4a). In this circumstance, the non-monotonicity of the 1 h rainfall intensity–frequency relation produces a crossover with the 2 h curve shown in Figure 3a.
Interestingly, Figure 3a also reveals that the 6 and 12 h rainfall intensity–frequency curves have a strong tendency to intersect as rainfall frequency increases. This tendency could be attributed to a relatively large undulation of the intensity–duration curves in the range of 6–12 h when rainfall frequency increases (see Figure 2a). From 6 to 12 h, the gradient of the intensity–duration curves flattens out for larger return periods. The empirical results show some evidence of improvement (in terms of a decrease in undulation, for large frequencies) when more constraints are considered. However, the improvement is not significant enough to remove the undulation. In practical engineering applications, the undulation of rainfall IDF curves such as those shown in Figure 2a–d is removed by fitting an empirical IDF model, such as Sherman’s equation, to the estimated rainfall intensity–duration data.
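As a rough illustration of such smoothing, the sketch below fits the commonly quoted form of Sherman's equation, i = a / (t + b)^c, to intensity–duration points for one return period. The fitting strategy (a grid search over b with log-linear least squares for a and c), the function name, and the synthetic data are assumptions made for this example; the source does not specify a fitting method.

```python
import math

def fit_sherman(durations, intensities, b_grid):
    """Fit i = a / (t + b)**c: grid search over b, log-linear least squares for a and c."""
    y = [math.log(i) for i in intensities]
    n = len(y)
    y_mean = sum(y) / n
    best = None
    for b in b_grid:
        # For a fixed b, log(i) = log(a) - c * log(t + b) is linear in log(t + b).
        x = [math.log(t + b) for t in durations]
        x_mean = sum(x) / n
        sxx = sum((xi - x_mean) ** 2 for xi in x)
        sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
        slope = sxy / sxx
        intercept = y_mean - slope * x_mean
        sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
        if best is None or sse < best[0]:
            best = (sse, math.exp(intercept), b, -slope)
    return best[1:]  # (a, b, c)

durations = [1, 2, 6, 12, 24, 48, 72]  # hours, the seven durations in the study
# Synthetic intensities generated from assumed parameters a = 120, b = 0.5, c = 0.7.
intensities = [120.0 / (t + 0.5) ** 0.7 for t in durations]

a, b, c = fit_sherman(durations, intensities, [k / 10 for k in range(0, 31)])
print(round(a, 1), round(b, 1), round(c, 2))
```

Because the fitted curve is smooth and decreasing in duration by construction (for positive a and c), any undulation in the estimated points is averaged out.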
It is clear that, by considering the monotonicity constraint, Equation (36), the crossover tendency of intensity–duration curves (see Figure 2b) in the vicinity of 1–2 h disappears (see also Table 6b), as does that of the 1 h and 2 h intensity–frequency curves in Figure 3b. Correspondingly, the concave down appearance of the 1 h depth–frequency curve and, to a lesser extent, the 2 h curve is corrected (see Figure 4b).
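Equation (36) is not reproduced in this section. A standard way to state monotonicity of a cubic, assumed here to serve the same purpose, is that its derivative a1 + 2 a2 z + 3 a3 z^2 never becomes negative, which for a nonzero a3 requires a3 > 0 and a2^2 − 3 a1 a3 ≤ 0. The sketch below checks these two conditions; the function name and the coefficient values are hypothetical.

```python
def is_monotone_increasing(a1, a2, a3):
    """Check whether x(z) = a0 + a1*z + a2*z**2 + a3*z**3 increases for every real z."""
    if a3 == 0.0:
        # Degenerates to a quadratic (never monotone on the whole line) or a straight line.
        return a2 == 0.0 and a1 > 0.0
    # The derivative is an upward-opening parabola with no sign change.
    return a3 > 0.0 and a2 * a2 - 3.0 * a1 * a3 <= 0.0

print(is_monotone_increasing(15.0, 2.0, 0.3))   # positive a3, non-positive discriminant term
print(is_monotone_increasing(15.0, 2.0, -0.2))  # negative a3: the curve eventually bends down
print(is_monotone_increasing(1.0, 3.0, 1.0))    # derivative changes sign between two roots
```

A negative a3 corresponds to the behavior described for the 1 h and 2 h curves, where the quantile curve starts to bend down at high return periods.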
Note that jointly enforcing the L-moment and monotonicity constraints does not take into account the inter-correlations of rainfall intensity or depth among different durations. The undulation in the rainfall intensity–duration curves for medium and long durations (≥6 h), which already satisfy the monotonicity condition, is not affected. Hence, the crossover tendency of the 6 and 12 h intensity–frequency curves (see Figure 3b) and the actual intersection of the 48 and 72 h depth–frequency curves (see Figure 4b) remain unchanged.
With further consideration of the inter-correlations of rainfalls of different durations, Equations (41) and (42), the resulting rainfall IDF and DDF curves are shown in Part (c) of Figures 2–4. Figure 2c shows that the crossover tendency of the rainfall intensity–duration curves at short durations and high return periods is completely removed. Both Figure 3c and Figure 4c show that the rainfall intensity–frequency and depth–frequency curves for 1 and 2 h are parallel to each other. Still, the 48 and 72 h rainfall depth–frequency curves intersect (see Figure 4c).
For illustration, this application arbitrarily selects a 5000-year return period in Equation (56) as the limiting frequency below which the rainfall depth–frequency or intensity–frequency curves of any two durations are not allowed to intersect. The obvious result of imposing the no-crossover constraints is that the 48 h rainfall depth–frequency curve in Figure 4d no longer intersects the 72 h curve.
As for the effect of the confidence level, numerical results indicate that a feasible solution for the TPNT coefficients may not exist when the confidence levels for the unknown true L-moments and correlation coefficients are too low. This is expected because the confidence interval shrinks toward the sample L-moments and correlation coefficients as the confidence level decreases. At a certain confidence level, the confidence band might become too narrow for the optimization model to find feasible TPNT coefficients that simultaneously satisfy the monotonicity constraints. How low the limiting confidence level is depends on the problem. In this numerical example, the limiting confidence level is about 70%, below which no feasible solution can be found for the multivariate TPNT coefficients. On the other hand, a reasonable confidence interval allows one to obtain a suitable set of TPNT coefficients to approximate multivariate relations.
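The narrowing of the interval with decreasing confidence level can be illustrated for a correlation coefficient. The sketch below uses Fisher's z-transform as an assumed stand-in, since the interval formulas used in the study are not reproduced in this section. The sample correlation of 0.8 and the function name are hypothetical, and n = 27 matches the record length.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, confidence):
    """Approximate confidence interval for a correlation coefficient via Fisher's z-transform."""
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z_crit / math.sqrt(n - 3)  # approximate standard error of atanh(r) is 1/sqrt(n - 3)
    centre = math.atanh(r)
    return math.tanh(centre - half_width), math.tanh(centre + half_width)

# r = 0.8 is a hypothetical sample correlation; n = 27 years of record.
for level in (0.95, 0.90, 0.70):
    lower, upper = fisher_ci(0.8, 27, level)
    print(level, round(lower, 3), round(upper, 3))
```

The bounds close in on the sample value as the level drops, which leaves the optimization less room to adjust the correlations and L-moments while meeting the other constraints.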
5. Summary and Conclusions
Statistical modeling and data analysis in hydrosystems engineering often encounter multiple correlated random variables following non-normal distributions. Because a full joint probability density function for the involved variables is difficult to establish, most methods for tackling multivariate problems preserve the marginal statistical properties (e.g., distributions or moments) of individual variables and their correlation structures. In this study, focus is placed on the third-order polynomial normal transform (TPNT) procedure, which relies on the preservation of marginal L-moments and correlations among variables. In particular, a general framework is presented to optimally determine multivariate TPNT coefficients by incorporating constraints that (1) preserve the statistical L-moments and correlations with explicit consideration of their associated sampling errors and (2) comply with a one-to-one, monotonically increasing relation between quantiles of the original and normal-transformed variables. Beyond these basic constraints, which are required to maintain the statistical and mathematical validity of the TPNT method, additional constraints relevant to the problem at hand can be incorporated into the modeling framework. In the illustrative example of establishing rainfall intensity–duration–frequency (IDF) relations, the no-intersection constraints for rainfall depth–frequency curves of different durations, Equation (56), are introduced in the model formulation to ensure that the resulting IDF relationships comply with physical reality. The proposed method not only solves for suitable multivariate TPNT coefficients that satisfy the monotonicity condition for individual variables, but also produces the correlation coefficients between random variables in the normal space.
At this stage, the proposed multivariate TPNT procedure has not undergone formal mathematical testing of its performance under different scenarios of multivariate distributions, correlation structures, and sample sizes. However, the procedure rests on sound logic and on established statistical and mathematical theory. The results from the empirical application to establish at-site rainfall IDF relationships appear to be quite reasonable.
Author Contributions
Conceptualization: Y.-K.T. and L.W.Y.; formal analysis: Y.-K.T., L.W.Y., and C.S.Y.; investigation: Y.-K.T., L.W.Y., and C.S.Y.; writing-original draft preparation: Y.-K.T. and L.W.Y.; writing-review & editing: Y.-K.T., L.W.Y., and C.S.Y.; project administration: Y.-K.T. and C.S.Y.; funding acquisition: Y.-K.T. and C.S.Y.
Funding
This study was primarily supported by the Joint Cooperative Research Program managed by the National Research Foundation of Korea (2016K2A9A1A06922023) and the Ministry of Science & Technology of Taiwan (MOST 105–2923-E-009-004-MY2). Additional funding was received from the General Research Program of the Ministry of Science & Technology of Taiwan (106-2221-E-009-067).
Acknowledgments
The authors wish to express their gratitude to the two anonymous reviewers for their thorough review and constructive comments.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Kotz, S.; Balakrishnan, N.; Johnson, N.L. Models and Applications. In Continuous Multivariate Distributions, 2nd ed.; Wiley and Sons Inc.: New York, NY, USA, 2005; Volume 1.
- Hutchinson, T.P.; Lai, C.D. Continuous Bivariate Distributions—Emphasizing Applications; Rumsby Scientific Publishing: Adelaide, South Australia, 1990.
- Yue, S. A bivariate gamma distribution for use in multivariate flood frequency analysis. Hydrol. Process. 2001, 15, 1033–1045.
- Nadarajah, S.; Shiau, J.T. Analysis of extreme flood events for the Pachang River, Taiwan. Water Resour. Manag. 2005, 19, 363–374.
- Der Kiureghian, A.; Liu, P.L. Structural reliability under incomplete probability information. J. Eng. Mech. 1986, 112, 85–104.
- Genest, C.; Chebana, F. Chapter 30: Copula Modeling in Hydrologic Frequency Analysis. In Handbook of Applied Hydrology, 2nd ed.; Singh, V.P., Ed.; McGraw-Hill Book Company: New York, NY, USA, 2017.
- Favre, A.C.; Adlouni, S.E.; Perreault, L.; Thiemonge, N.; Bobee, B. Multivariate hydrological frequency analysis using copulas. Water Resour. Res. 2004, 40, W01101.
- Ganguli, P.; Reddy, M.J. Probabilistic assessment of flood risks using trivariate copulas. Theor. Appl. Climatol. 2013, 111, 341–360.
- Chen, L.; Singh, V.P.; Guo, S.L.; Mishra, A.K.; Guo, J. Drought analysis using copulas. J. Hydrol. Eng. 2013, 18, 797–808.
- Khedun, C.P.; Chowdhary, H.; Mishra, A.K.; Giardino, J.R.; Singh, V.P. Water deficit duration and severity analysis based on runoff derived from Noah land surface model. J. Hydrol. Eng. 2013, 18, 817–833.
- Sadri, S.; Burn, D.H. Copula-based pooled frequency analysis of droughts in the Canadian Prairies. J. Hydrol. Eng. 2014, 19, 277–289.
- Tosunoglu, F.; Kisi, O. Joint modelling of annual maximum drought severity and corresponding duration. J. Hydrol. 2016, 543, 406–422.
- Requena, A.I.; Mediero, L.; Garrote, L. A bivariate return period based on copulas for hydrologic dam design: Accounting for reservoir routing in risk estimation. Hydrol. Earth Syst. Sci. 2013, 17, 3023–3038.
- Li, C.; Singh, V.P.; Mishra, A.K. A bivariate mixed distribution with a heavy tailed component and its application to single site daily rainfall simulation. Water Resour. Res. 2013, 49, 767–789.
- Jun, C.; Qin, X.S.; Gan, T.Y.; Tung, Y.K.; De Michele, C. Bivariate frequency analysis of rainfall intensity and duration for urban stormwater infrastructure design. J. Hydrol. 2017, 553, 374–383.
- Aas, K.; Czado, C.; Frigessi, A.; Bakken, H. Pair-copula constructions of multiple dependence. Insur. Math. Econ. 2009, 44, 182–198.
- Shafaei, M.; Fard, A.F.; Dinpashoh, Y.; Mirabbasi, R.; Michele, D.C. Modeling flood event characteristics using d-vine structures. Theor. Appl. Climatol. 2017, 130, 713–724.
- Vernieuwe, H.; Vandenberghe, S.; De Baets, B.; Verhoest, N.E.C. A continuous rainfall model based on vine copulas. Hydrol. Earth Syst. Sci. 2015, 19, 2685–2699.
- Tosunoglu, F.; Singh, V.P. Multivariate Modeling of Annual Instantaneous Maximum Flows Using Copulas. J. Hydrol. Eng. 2018, 23, 04018003.
- Qing, X. Generating correlated random vector involving discrete variables. Commun. Stat. Theory Methods 2017, 46, 1594–1605.
- Nataf, A. Determination des distributions dont les marges sont donnees. Comptes Rendus l’Academie Sciences 1962, 225, 42–43.
- Lebrun, R.; Dutfoy, A. An innovating analysis of the Nataf transformation from the copula viewpoint. Probab. Eng. Mech. 2009, 24, 312–320.
- Liu, P.L.; Der Kiureghian, A. Multivariate distribution models with prescribed marginals and covariances. Probab. Eng. Mech. 1986, 1, 105–112.
- Chang, C.H.; Tung, Y.K.; Yang, J.C. Monte Carlo simulation for correlated variables with marginal distributions. J. Hydraul. Eng. 1994, 120, 313–331.
- Chen, H.F. Initialization for NORTA: Generation of random vectors with specified marginal and correlations. INFORMS J. Comput. 2001, 13, 312–331.
- Li, H.S.; Lu, Z.Z.; Yuan, X.K. Nataf transformation based point estimate method. Chin. Sci. Bull. 2008, 53, 2586–2592.
- Niaki, S.T.A.; Abbasi, B. Generating correlation matrices for normal random vectors in NORTA algorithm using artificial neural networks. J. Uncertain Syst. 2008, 2, 192–201.
- Fleishman, A.L. A method for simulating non-normal distributions. Psychometrika 1978, 43, 521–532.
- Vale, C.D.; Maurelli, V.A. Simulating multivariate non-normal distributions. Psychometrika 1983, 48, 465–471.
- Headrick, T.C.; Sawilowsky, S.S. Simulating correlated multivariate nonnormal distributions: Extending the Fleishman power method. Psychometrika 1999, 64, 25–35.
- Chen, X.Y.; Tung, Y.K. Applications of TPNT in multivariate Monte Carlo simulation. In Water Resources Planning and Management; EWRI/ASCE: Philadelphia, PA, USA, 2003; pp. 23–26.
- Hodis, F.A. Simulating Univariate and Multivariate Nonnormal Distributions Based on a System of Power Method Distributions. Ph.D. Thesis, Southern Illinois University, Carbondale, IL, USA, 2008.
- Demirtas, H.; Hedeker, D.; Mermelstein, R.J. Simulation of massive public health data by power polynomials. Stat. Med. 2012, 31, 3337–3346.
- Yang, H.; Zou, B. The point estimate method using third order polynomial normal transformation technique to solve probabilistic power flow with correlated wind source and load. In Proceedings of the Asia-Pacific Power and Energy Engineering Conference, Shanghai, China, 27–29 March 2012; pp. 1–4.
- Cai, D.; Shi, D.; Chen, J. Probabilistic load flow computation with polynomial normal transformation and Latin hypercube sampling. IET Gen. Trans. Distrib. 2013, 7, 474–482.
- Chen, X.Y. Investigating Third-Order Polynomial Normal Transform and Its Applications to Uncertainty and Reliability Analyses. Master’s Thesis, Hong Kong University of Science and Technology, Hong Kong, China, 2002.
- Zhao, Y.G.; Lu, Z.H. Fourth moment standardization for structural reliability assessment. J. Struct. Eng. 2007, 133, 916–924.
- Tung, Y.K. Polynomial normal transformation in uncertainty analysis. Appl. Probab. Stat. 1999, 1, 167–174.
- Hosking, J.R.M. The Theory of Probability Weighted Moments; IBM Research Report, RC12210; IBM: Yorktown Heights, NY, USA, 1986.
- Fisher, R.A.; Cornish, E.A. The percentile points of distribution having known cumulants. Technometrics 1960, 2, 209–225.
- Chen, X.Y.; Tung, Y.K. Investigation of polynomial normal transform. Struct. Saf. 2003, 25, 423–445.
- Hosking, J.R.M. L-Moments: Analysis and Estimation of Distributions Using Linear Combinations of Order Statistics. J. R. Stat. Soc. Ser. B Method 1990, 52, 105–124.
- Makkonen, L.; Pajari, M.; Tikanmaki, M. Discussion on “Plotting positions for fitting distributions and extreme value analysis”. Can. J. Civil Eng. 2013, 40, 130–139.
- Weibull, W. A statistical theory of the strength of materials. R. Swed. Inst. Eng. Res. Proc. 1939, 151, 1–45.
- Fisher, R.A. On the ‘probable error’ of a coefficient of correlation deduced from a small sample. Metron 1921, 1, 1–32.
- Kennedy, J.B.; Neville, A.M. Basic Statistical Methods for Engineers and Scientists, 3rd ed.; Harper and Row Publishing: New York, NY, USA, 1986.
- Wilson, R.B. A simplicial Method for Convex Programming. Ph.D. Thesis, Harvard University, Boston, MA, USA, 1963.
- Boggs, P.T.; Tolle, J.W. Sequential quadratic programming. Acta Numer. 1995, 4, 1–51.
- Akan, A.O.; Houghtalen, R.J. Urban Hydrology, Hydraulics, and Stormwater Quality; Wiley: Hoboken, NJ, USA, 2003.
- Sun, S.A.; Djordjevic, S.; Khu, S.T. Decision making in flood risk based storm sewer network design. Water Sci. Technol. 2010, 64, 247–254.
- Singh, V.P.; Strupczewski, W.G. On the status of flood frequency analysis. Hydrol. Process. 2002, 16, 3737–3740.
- Porras, P.J.S.; Porras, P.J., Jr. New perspective on rainfall frequency curves. J. Hydrol. Eng. 2001, 6, 82–85.
- Haktanir, T. Divergence criteria in extreme rainfall series frequency analyses. Hydrol. Sci. J. 2003, 48, 917–937.
- Gräler, B.; Fischer, S.; Schumann, A. Joint modeling of annual maximum precipitation across different duration levels. In EGU General Assembly Conference Abstracts; Discussion Paper SFB 823; EGU: Munich, Germany, 2016.
- You, L.; Tung, Y.K. Derivation of rainfall IDF relations by third-order polynomial normal transform. Stoch. Environ. Res. Risk Assess. 2018, 32, 2309–2324.
- D’Agostino, R.B.; Stephens, M.A. Goodness-of-Fit Techniques; Marcel Dekker: New York, NY, USA, 1986.
- Sherman, C.W. Frequency and intensity of excessive rainfalls at Boston, Massachusetts. Transaction 1931, 95, 951–960.
Figure 1. Flow diagram showing the procedure of multivariate TPNT modeling considering sampling errors of sample statistics.
Figure 2. Multivariate TPNT modeling of rainfall intensity–duration relationships of varying return periods under different constraints which consider: (a) L-moments only; (b) L-moments and monotonicity; (c) L-moments, monotonicity and correlation; (d) L-moments, monotonicity, correlation and no crossover.
Figure 3. Multivariate TPNT modeling of rainfall intensity–frequency relationships of varying durations under different constraints which consider: (a) L-moments only; (b) L-moments and monotonicity; (c) L-moments, monotonicity and correlation; (d) L-moments, monotonicity, correlation and no crossover.
Figure 4. Multivariate TPNT modeling of rainfall depth–frequency relationships of varying durations under different constraints which consider: (a) L-moments only; (b) L-moments and monotonicity; (c) L-moments, monotonicity and correlation; (d) L-moments, monotonicity, correlation and no crossover.
Figure 5. Probability plot of normalized 24 h rainfall data.
Table 1. Annual maximum rainfall intensities (mm/h) at Zhongli Station, Taiwan.
| Year | 1 h | 2 h | 6 h | 12 h | 24 h | 48 h | 72 h |
Table 2. Sample moments (in mm/h) of rainfall intensity data.
| Moments | 1 h | 2 h | 6 h | 12 h | 24 h | 48 h | 72 h |
Table 3. Standard errors (in mm/h) of sample L-moments of rainfall intensity data.
| Std. Error | 1 h | 2 h | 6 h | 12 h | 24 h | 48 h | 72 h |
Table 4. Sample correlation coefficients between rainfall intensity of different durations.
| Duration | 1 h | 2 h | 6 h | 12 h | 24 h | 48 h | 72 h |
Table 5. Correlation coefficients of normal-transformed rainfall intensity of different durations.
| Duration | 1 h | 2 h | 6 h | 12 h | 24 h | 48 h | 72 h |
Table 6. Multivariate TPNT coefficients obtained under different constraints with α = 0.90.
(a) Constraints: L-moments only (LM)
| TPNT Coefficients | 1 h | 2 h | 6 h | 12 h | 24 h | 48 h | 72 h |
| | −1.46 * | −0.23 * | 0.29 | 0.84 | 0.44 | 0.17 | 0.07 |
| | 104.8 * | 15.4 * | −1.90 | −4.09 | −1.46 | −0.47 | −0.15 |
(b) Constraints: L-moments and Monotonicity (LM/Mono)
| 1 h | 2 h | 6 h | 12 h | 24 h | 48 h | 72 h |
(c) Constraints: L-moments, Monotonicity, and Correlation (LM/Mono/Corr)
| 1 h | 2 h | 6 h | 12 h | 24 h | 48 h | 72 h |
(d) Constraints: L-moments, Monotonicity, Correlation, and No Crossover (LM/Mono/Corr/NC)
| 1 h | 2 h | 6 h | 12 h | 24 h | 48 h | 72 h |
Note: * indicates a violation of the monotonicity condition.
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).