Analysis of Peak Flow Distribution for Bridge Collapse Sites

Bridge collapse risk can be evaluated more rigorously if the hydrologic characteristics of bridge collapse sites are demystified, particularly for peak flows. In this study, forty-two bridge collapse sites were analyzed to find any trend in the peak flows. Flood frequency and other statistical analyses were used to derive peak flow distribution parameters, identify trends linked to flood magnitude and flood behavior (how extreme), quantify the return periods of peak flows, and compare different flood frequency approaches for deriving the return periods. The results indicate that most of the bridge collapse sites exhibit heavy-tailed distributions and flood magnitudes that scale consistently with drainage area. A comparison of different flood frequency analyses reveals that no single approach is generally best for the dataset studied. These results indicate a commonality in flood behavior (outliers are expected, not random; the heavy-tail property) for the collapse dataset studied and provide some basis for extending the findings obtained for the 42 collapsed bridges to other sites to assess the risk of future collapses.


Introduction
The common causes categorized for bridge collapses in the USA, as recorded in the New York State Department of Transportation (NYSDOT) database, are hydraulic, collision, overload, deterioration, geotechnical, nature, and other [1], with 62.23% of over-water bridge collapses corresponding to hydraulic events, 6.9% to collision, 11.33% to overload, 9.03% to deterioration, 1.48% to geotechnical, 3.28% to nature (storm/hurricane, earthquake), and 5.75% to other [1]. Hydraulic events, therefore, are perceived to be the most common cause of total and partial bridge collapses in the US [1][2][3]. Whereas scouring is the most common hydraulic event leading to bridge collapse [2], flood-induced scouring is of particular concern due to the intensity of recent floods in different states of the USA and the associated high safety risk.
In response to flood risk, engineering design primarily aims at determining the magnitude of floods for a predefined return period. The methods to determine the flood magnitude of the design flood vary, with recommended methods including TR-55 [4], Bulletin 17C [5], and StreamStats [6], among several others. Regardless of the types of methods chosen, analyzing peak flow distribution parameters is not a common practice in bridge design procedures. Nonetheless, the analysis of peak flow distribution parameters provides some basis to reason formally about the counter-intuitive properties of flood events and to identify trends or commonalities among the critical bridge sites.
Peak flow distribution analysis requires the selection of an applicable extreme value distribution fitting approach. There are two approaches: block maxima, in which a series of annual peak floods is used to directly define an extreme value distribution; and peaks-over-threshold (POT), in which distributions are fit to both the frequency of floods above a threshold and their magnitudes. The U.S. Water Resources Council requires the use of log-Pearson type 3 (LP3) distributions for the annual maxima approach, although the generalized extreme value distribution (GEV) seems to be the most widely used model for extreme events [7]. The major benefit of the GEV model is its ability to fit highly skewed data [7]. Therefore, recent hydrologic research has focused on explaining the parameters of the GEV distribution of streamflow extremes [8][9][10][11][12][13][14].
If annual maxima are assumed to be GEV-distributed, then POT exceedances are assumed to be generalized Pareto (GP)-distributed [15], following recommendations in the field of statistics [16,17]. Fitting the GP distribution to exceedances over a high threshold, and also estimating the frequency of exceeding the threshold by fitting a Poisson distribution, allows for the simultaneous fitting of parameters concerning both the frequency and intensity of extreme events. Compared to the annual maxima approach, therefore, the main advantage of POT modeling is that it allows for a more rational selection of events to be considered as "floods" and is not confined to only one event per year. The POT approach considers a wide range of events and provides the possibility of controlling the number of flood occurrences to be included in the analysis by appropriate selection of the threshold. However, the POT approach remains under-employed, mainly because of the complexities associated with the choice of threshold and the selection of criteria for retaining flood peaks (Lang et al., 1999). Nonetheless, threshold selection is tightly linked to the choice of the process distribution, to the over-threshold distribution, and to the hypothesis of independence [18].
The objectives of this study are: (1) To derive peak flow distribution parameters, (2) to identify trends linked to flood magnitude and flood behavior, and (3) to compare different approaches in deriving return periods for bridge collapse sites. To attain the objectives, this paper presents an analysis of the derived GEV distribution parameters for 42 bridge collapse sites (Figure 1) and also presents a comparison of the GEV and POT (GP and Poisson) approaches, with a focus on the associated capabilities and uncertainties. The obtained results can reveal any existing trends in the statistical behavior of peak floods at bridge collapse sites, and can support the understanding of the mechanism behind the probabilistic generation of floods. Such analysis provides preliminary data to assess future collapse risk, as it highlights that extreme flood events should be expected to emerge in a specific manner (not as surprising outliers).

Identification of Bridge Collapse Sites
The New York State Department of Transportation (NYSDOT) bridge failure database (NYSDOT 2014a) is used here to identify 42 bridge collapse sites. It is the only US-wide database of bridge collapses. Totally or partially collapsed bridges are added to the database based on journalism databases and surveys of other state DOTs. The recorded information includes identifiers in the National Bridge Inventory, the location of the collapsed bridge, the feature under the bridge, the year of construction, the date or year of collapse, the bridge material and structure type, the type of collapse (total or partial), the number of casualties related to the collapse, and other comments. For this work, the bridges were selected according to the following criteria: 1. Existence of a stream gauge listed in the US Geological Survey (USGS) National Water Information System database. 2. The gage station is at the bridge location, near the bridge location (on the same tributary of the river), or at a further distance (not on the same tributary, but on the same river). 3. The bridge apparently collapsed (completely or partially) due to floods.

Flood Frequency Analysis
When conducting a flood frequency analysis, an initial step is to undertake a basic analysis of the peak flow and daily flow time series to check for obvious errors and to confirm that the data conform to the assumptions used in the frequency analysis.
For peak flows, an autocorrelation test [19] is used to check the validity of the assumption that annual peak flows are independent and identically distributed. This test checks the correlation between the value at one time step and the value at the previous (and following) time step. Trends in a time series are identified using the Mann-Kendall test [20], which is nonparametric and does not require that the data conform to any specific statistical distribution. This test uses Kendall's τ as the test statistic (Table 1) to measure the strength of the monotonic relationship between annual peak streamflow and the year in which it occurred. Positive values of τ indicate that annual peak streamflows are increasing with time over the period of record, whereas negative values of τ indicate that annual peak streamflows are decreasing with time over the period of record. A p-value (Table 1) was also calculated to determine the significance of the τ value. To check whether temporal dependence exists in threshold excess data, extremal index values [21] need to be calculated; extremal index values less than unity imply some dependence, with stronger dependence as the value decreases.
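These screening tests can be sketched in a few lines with SciPy. The annual peak series below is a hypothetical record used purely for illustration; in the study these tests were run on USGS gage records.

```python
# Sketch of the initial screening tests: lag-1 autocorrelation and a
# Mann-Kendall-style trend test via Kendall's tau. Data are hypothetical.
import numpy as np
from scipy import stats

# Hypothetical annual peak flows (cfs) for one site.
peaks = np.array([1200., 3400., 2100., 5600., 1800., 2900., 4100.,
                  2500., 3300., 6100., 2700., 3900., 4500., 2200.])

# Lag-1 autocorrelation: values near zero support the independence assumption.
lag1 = np.corrcoef(peaks[:-1], peaks[1:])[0, 1]

# Kendall's tau between flow and year: positive tau suggests increasing
# peaks over the record; the p-value indicates trend significance.
years = np.arange(len(peaks))
tau, p_value = stats.kendalltau(years, peaks)

print(f"lag-1 autocorrelation: {lag1:.2f}")
print(f"Kendall tau: {tau:.2f}, p-value: {p_value:.3f}")
```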
After the initial analysis, three approaches (GEV, GP, and PP) were employed to perform the flood frequency analysis. Each fitted model is evaluated with the Akaike information criterion (AIC) [22] and the Bayesian information criterion (BIC) [23] (Table 1). When multiple models are fitted to a specific dataset, each one including different predictor variables, the model with lower values of AIC and BIC is preferable. These criteria make it possible to assess whether one approach is best in general or, at least, preferable for sites with certain characteristics.
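As a minimal sketch of this model-comparison step, AIC and BIC can be computed directly from a model's maximized log-likelihood. The log-likelihood values below are hypothetical placeholders; in practice they come from fitting the GEV, GP, or PP models to the same record.

```python
# AIC/BIC comparison for two hypothetical candidate fits.
import math

def aic(log_lik, n_params):
    # AIC = 2k - 2 ln(L); smaller is better.
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_obs):
    # BIC = k ln(n) - 2 ln(L); penalizes extra parameters more as n grows.
    return n_params * math.log(n_obs) - 2 * log_lik

# Hypothetical: a 3-parameter GEV fit vs a 2-parameter GP fit to 60 annual peaks.
gev = {"ll": -412.3, "k": 3}
gp = {"ll": -415.8, "k": 2}
n = 60

for name, m in (("GEV", gev), ("GP", gp)):
    print(name, round(aic(m["ll"], m["k"]), 1), round(bic(m["ll"], m["k"], n), 1))
```

With these (made-up) likelihoods, the GEV fit has the lower AIC and BIC and would be preferred for this site.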

Generalized Extreme Value (GEV)
Whereas in practice it is very difficult to choose which of the three families of extreme value distributions (Gumbel, Fréchet, and Weibull) is the most appropriate for real data, the GEV offers a better analysis of block maxima data as it combines the three distributions into a single family of models. The cumulative distribution function of the GEV distribution is given by the following equation [24][25][26][27][28]:

F(x) = exp{ −[1 + k((x − μ)/σ)]^(−1/k) }, defined for 1 + k(x − μ)/σ > 0.

Here μ is the location parameter, σ is the scale parameter, and k is the shape parameter. Investigations of large discharge datasets from Central Europe, the UK, and the USA have focused on estimation of the scale, location, and shape parameters [29][30][31][32][33][34], and the most frequently identified empirical relationship is with basin area, particularly for the location and scale parameters [8]. The theoretical and empirical justification [9,10,29,30,33,35] of the location and scale parameterizations is derived from the concept that "location and scale parameters are mostly related to the magnitude of flow and therefore, mostly depend on the area of the basin [29]". The shape parameter is of a different nature: it is related to the behavior of the upper tail of the distribution and is therefore of particular interest for revealing "how extreme" the floods are. Recent studies suggest that the shape parameter depends on climate indices, while other attributes of the catchments are less important [8,9,36,37]. One recent study [8] examines the nature of the shape parameter across 591 basins with a large diversity of climate types and finds that the most important explanatory variables for the shape parameter are mean daily precipitation, frequency of high precipitation days, the precipitation GEV location parameter, and the precipitation GEV shape parameter. With such findings, the GEV shape parameter is preferably modelled by a normal distribution with a common mean across all sites [8][9][10]35,38]. The shape parameter is also of critical significance in that it prescribes the type of distribution fitted to a block maxima series: Gumbel (shape parameter = 0), Fréchet (shape parameter > 0), and Weibull (shape parameter < 0). Therefore, the shape parameter is related to how extreme the floods are.
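A minimal sketch of fitting a GEV distribution to annual maxima with SciPy follows; the data are synthetic. Note the sign convention: SciPy's shape parameter `c` equals the negative of the shape parameter k used here, so a heavy (Fréchet-type) tail corresponds to c < 0 in SciPy but k > 0 in the text.

```python
# Fit a GEV to synthetic annual maxima and read off a return level.
from scipy.stats import genextreme

# Synthetic annual peaks drawn from a heavy-tailed GEV; in SciPy's
# convention c = -k, so c = -0.2 corresponds to k = 0.2 (Frechet type).
annual_peaks = genextreme.rvs(c=-0.2, loc=2000.0, scale=800.0, size=80,
                              random_state=42)

c_hat, loc_hat, scale_hat = genextreme.fit(annual_peaks)
k_hat = -c_hat  # convert back to the convention used in the text

# 100-year return level: the quantile with annual exceedance probability 1/100.
q100 = genextreme.ppf(1 - 1 / 100, c_hat, loc=loc_hat, scale=scale_hat)
print(f"k = {k_hat:.2f}, location = {loc_hat:.0f}, scale = {scale_hat:.0f}")
print(f"estimated 100-year flood: {q100:.0f}")
```

The fitted k_hat being positive would indicate a heavy (Fréchet-type) upper tail, the property examined for the collapse sites below.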

Generalized Pareto (GP)
The generalized Pareto (GP) distribution has statistical justification for fitting to excesses over a high threshold [21] and was found to provide a good approximation for daily streamflow data in the USA [39][40][41]. Before fitting the GP distribution to the daily data, it is first necessary to choose a threshold. From the literature, three types of tests have been identified for threshold selection. Type I identifies the threshold by fixing the average number of peak exceedances per year [42][43][44][45] or using only events over a given return period [46][47][48]. The Type I test was rejected by Rosbjerg and Madsen (1992) [15] because it apparently disregards the physical properties within the region in the threshold determination. Type II identifies the threshold based on the domain where the mean exceedance above the threshold is a linear function of the threshold level [18,49,50]. This test is equivalent to choosing the threshold to maximize the stability of the POT distribution parameter estimates [18]. Type III selects a threshold based on the validity of the Poisson distribution of peak counts [18]. The Type III test is done when distributions are fit to both the frequency of floods above a threshold and their magnitudes, assuming a Poisson distribution of peak counts. The Type III test checks the assumption of the Poisson process that the random variable should be independent and exponentially distributed.
The implementation of GP in this study diagnosed an appropriate choice of threshold consistent with a Type II test. Two approaches from the extRemes R package were used. threshrange.plot repeatedly fits the chosen distribution to the data for a sequence of threshold choices along with variability information. The plots created are subjectively interpreted to select a threshold that appears to yield stable estimates (within uncertainty bounds) as the threshold increases further, while remaining low enough to utilize as much of the data as possible. Figure 2 shows the fitted scale and shape parameters over a range of thirty equally-spaced thresholds from 3000 to 8000 cfs for a site in Georgia (NYSDOT failure database Bridge ID #287, USGS Station ID 02208450). A subjective selection of 6000 cfs appears to yield estimates that maximize the stability of the POT distribution parameter estimates [18,49,50].
The second approach to threshold selection is implemented in mrlplot, which plots the mean excess values for a range of threshold choices [24]. The plot is used to select a threshold above which the graph is linear, again within uncertainty bounds, as the threshold increases (Figure 3). Again, 6000 cfs appears to be a reasonable choice for the threshold, as a reasonably straight line could be placed within the uncertainty bounds from this point up (Figure 3). The mean excess above the threshold changes linearly with the height of the threshold for a GP distribution (Figure 3) and is constant for an exponential distribution. The quantile plot in Figure 3 compares sample quantiles with those of a normal distribution and was used to check the normality of the data.
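The computation behind a mean residual life plot is simple to sketch: for each candidate threshold u, average the excesses (x − u) over the observations exceeding u. The daily flows below are synthetic (exponential-tailed), so the mean excess is roughly constant across thresholds; for GP-tailed data it would trend linearly.

```python
# Mean residual life (mean excess) computation behind an mrlplot-style check.
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical daily flows (cfs) with an exponential upper tail.
daily_flows = rng.exponential(scale=1500.0, size=5000)

# Same candidate range as the Georgia example: 30 thresholds, 3000-8000 cfs.
thresholds = np.linspace(3000.0, 8000.0, 30)
mean_excess = []
for u in thresholds:
    excess = daily_flows[daily_flows > u] - u
    mean_excess.append(excess.mean() if excess.size else np.nan)

for u, me in zip(thresholds[:3], mean_excess[:3]):
    print(f"threshold {u:.0f} cfs -> mean excess {me:.0f} cfs")
```

A threshold is then chosen where the mean-excess curve is (approximately) linear within its uncertainty bounds, mirroring the subjective 6000 cfs choice described above.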

Poisson Process (PP)
For fitting the PP distribution, three types of diagnostic plots [21] were used to evaluate the Poisson process distribution: the quantile plot, the density plot, and the "Z" plot (Figure 4). The quantile plots check the validity of the corresponding GP distribution for only the data exceeding the threshold suitable for the Poisson process. The density plot, on the other hand, shows the density for the equivalent GEV distribution. The Z plot is a quantile-quantile plot [44] and checks the underlying assumption of a Poisson process, i.e., that peaks over the threshold should be independent and exponentially distributed in time. Figure 4 provides example diagnostic plots for the fit to the daily flow data for the same bridge collapse site in Georgia (NYSDOT failure database Bridge ID #287, USGS Station ID 02208450). In Figure 4, all the fits appear reasonable except the Z plot. The Z plot suggests that the assumptions for the PP model are not met (infinite upper confidence level); for the present example, it turns out that while the threshold of 6000 cfs may be high enough to adequately model the threshold excesses, it may be too low to model the frequency of occurrence.
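The assumption the Z plot probes can also be checked numerically: under a Poisson process, waiting times between threshold exceedances are exponentially distributed. One way to sketch this (not the extRemes implementation) is a Kolmogorov-Smirnov test of the interarrival times against a fitted exponential; the event times below are hypothetical, and the p-value is only approximate because the rate is estimated from the same data.

```python
# Check exponential spacing of threshold exceedances (Poisson assumption).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Hypothetical waiting times (days) between threshold exceedances, drawn
# from a homogeneous Poisson process (i.e., exponential interarrivals).
interarrivals = rng.exponential(scale=30.0, size=200)

# Fit the exponential rate (location pinned at zero) and test against it.
loc, scale = stats.expon.fit(interarrivals, floc=0)
ks_stat, p_value = stats.kstest(interarrivals, "expon", args=(0, scale))

print(f"KS statistic: {ks_stat:.3f}, p-value: {p_value:.3f}")
# A large p-value is consistent with the Poisson-process assumption;
# a small one suggests clustering (dependence) of exceedances.
```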

Calculating Return Periods
The quantiles of the GEV distribution are of particular interest because of their interpretation as return levels: the value expected to be exceeded on average once every 1/p periods, where p is the probability associated with the quantile. The quantiles of the GP and PP distributions cannot be as readily interpreted as return levels because the data no longer derive from blocks of equal length; an estimate of the probability of exceeding the threshold (ζu) is required [21]. The peak flow value (x_m) that is exceeded on average once every m observations is estimated using the following equation [21]:

x_m = u + (σ/k)[(m ζu)^k − 1]

Here u is the threshold, σ is the scale parameter, k is the shape parameter, and ζu is the probability of an observation exceeding the threshold.
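This return-level relation (the standard GP/POT form x_m = u + (σ/k)[(m ζu)^k − 1], with the exponential limit as k → 0) can be coded directly. The site values below are hypothetical.

```python
# GP/POT return level: the level exceeded on average once every m observations.
import math

def gp_return_level(m, u, scale, shape, zeta_u):
    """Return level for threshold u, GP scale/shape, exceedance prob. zeta_u."""
    if abs(shape) < 1e-9:
        # Limiting exponential case as the shape parameter tends to zero.
        return u + scale * math.log(m * zeta_u)
    return u + (scale / shape) * ((m * zeta_u) ** shape - 1.0)

# Hypothetical site: threshold u = 6000 cfs, GP scale 1200, heavy tail
# (shape 0.2), and 5% of daily observations exceeding the threshold.
m = 100 * 365.25  # number of daily observations in 100 years
x_100 = gp_return_level(m, u=6000.0, scale=1200.0, shape=0.2, zeta_u=0.05)
print(f"100-year return level: {x_100:.0f} cfs")
```

Note how a positive (heavy-tail) shape inflates the return level well above the threshold, whereas the k → 0 case grows only logarithmically in m.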

Analyzing GEV Parameters
GEV parameters were analyzed to check whether the collapse sites were consistent when regressed on drainage area. In order to verify whether scaling with drainage area is reasonable for the bridge collapse sites within a physiographic region, the scatter plot of the logarithm of the maximum likelihood estimates (MLE) of the GEV parameters (scale and location) versus the logarithm of the drainage area was examined for a subset of 30 sites within the Appalachian Highland region. Analysis of GEV parameters within each specific region was not performed because it was not possible to obtain an even distribution of bridges across the different regions of the United States. There was a lack of comprehensiveness in the NYSDOT database (with a relative overrepresentation of the Appalachian Highland) as well as a lack of geographically uniform placement of stream gauges. The clustering of collapses in the Appalachian Highland reflects not a higher rate of hydraulic collapses in this region but rather the limitation of data availability. In this study, 30 sites are located in the Appalachian Highland, 10 sites in the Interior Plains, one in the Rocky Mountain region, and one in the Atlantic Plain. Due to the small sample sizes, analysis of GEV parameters was not performed for the Interior Plains, Rocky Mountain, and Atlantic Plain regions.
For the Appalachian Highland, as shown in Figure 5, both the location and scale parameters present a well-defined log-log linear relationship with drainage area. In line with expectations, the drainage area regression results for the shape parameter do not present strong evidence of a relationship, as shown in Figure 5. This result agrees with recent studies [29,30,32,33,37,43] which also suggest a poor relationship between the shape parameter and basin area, based on the finding that the shape parameter mostly depends on climate indices. The shape estimates also exhibit a strong degree of pooling; 37 out of 42 collapse sites have a heavy tail distribution (shape parameter > 0) (Figure 6). The shape parameter is modelled by a normal distribution (Figure 7) with a common mean across all sites in different physiographic regions.
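The scaling check amounts to a linear regression in log-log space. The areas and fitted location parameters below are hypothetical stand-ins for the 30 Appalachian Highland sites, generated from an assumed power law with scatter.

```python
# Log-log regression of a GEV parameter on drainage area (hypothetical data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
area_sq_mi = 10 ** rng.uniform(1, 3.5, size=30)  # ~10 to ~3160 sq mi
# Hypothetical power-law relation with scatter: location ~ 40 * A**0.8.
location = 40 * area_sq_mi ** 0.8 * rng.lognormal(0.0, 0.2, size=30)

res = stats.linregress(np.log10(area_sq_mi), np.log10(location))
print(f"log-log slope: {res.slope:.2f}, R^2: {res.rvalue ** 2:.2f}")
# A well-defined slope with high R^2 supports scaling the location
# parameter with drainage area within the region, as in Figure 5.
```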

Analyzing GEV Flood Index Values
For homogeneous regions, based on the concept of the index flood method, a power relationship can be expected between the mean annual flow and the flood quantile, with an exponent value equal to 1 [51]. This is because all flood frequency curves in a homogeneous region are scaled by the mean annual flow to produce one characteristic regional curve [51]. For the collapse sites, flood index values are calculated (Table 2) and diagnostic plots of mean annual flow versus quantile flow (Figures 8 and 9) are created to check the homogeneity of the collapse sites. As expected, within certain physiographic regions, i.e., the Appalachian Highland, a power correlation is detected between the mean annual flow and the flood quantiles (Q100 and Q500), with an exponent value of 0.9 (close to 1) (Figure 8). When considering all collapse sites across different physiographic regions, the exponent value decreases to 0.7 and 0.6 for the 100- and 500-year flood quantiles, respectively (Figure 9). Nonetheless, only a weak positive correlation is detected between the mean annual flow and the flood quantiles (R² = 0.6, Figure 8; R² = 0.3, 0.5, Figure 9), except for the QMean versus Q100 relationship in the Appalachian Highland (R² = 0.75, Figure 8). These correlation results are statistically significant (p < 0.05).

Peak Over Threshold Analysis
For the peaks-over-threshold analysis, a comparison was made between the GP and PP models. It was found that the threshold often had to be raised significantly above the level set for the GP model in order to meet the exponential-distribution-based tests of the PP model (Table 3). Whereas it is important to choose a sufficiently high threshold so that the theoretical justification applies, thereby reducing bias, a higher threshold also implies that fewer data remain. The uncertainty in the estimation of distribution parameters becomes greater as the threshold increases and the sample size decreases [51]. The comparatively higher standard error in shape parameter estimates for the PP models can be attributed to the fewer data remaining above the threshold (Figure 10).

Comparing GEV, GP, and PP
Comparing the GEV, GP, and PP models (Figure 11), certain comments can be made: 1. GEV predicts a higher median flow as compared to GP and PP (for return periods ≥50 years). 2. GP overestimates flow as compared to PP (particularly for return periods ≥100 years). For instance, for a site in NY (Failure database ID 811), the 50-year flow is about 2861, 1666, and 1639 cfs for the GEV, GP, and PP models, respectively. 3. For all larger flood events (return periods ≥50 years), an outlier is identified in the GP models, implying that an extremely high flood event can be expected for a return period higher than 50 years. The GEV and PP models do not identify any outliers. 4. The confidence interval is larger for the PP model as compared to the GP and GEV models, indicating greater uncertainty when using the PP approach.

Calculating Return Periods for Maximum Flows in the Collapse Year Using GEV, GP, and PP
The return periods of maximum flows in the collapse year, obtained using different types of flow data (maximum daily mean flow, maximum peak flow), are quantified in Table 4. For some of the sites, the return period cannot be calculated because of the unavailability of max flow data and/or the inability to use peaks-over-threshold analysis (due to the invalidity of the underlying assumptions). Like the flow values, the collapse return periods varied considerably between the bridges, as evidenced by the range of the estimated values (daily mean: 1-9452 years; peaks: 1 to >90027071 years). These results are in accordance with a recent study on bridge collapse [52], where the authors similarly attributed the lower variability of daily mean estimates, as opposed to annual peaks, to the smaller number of very large return periods in the analysis using daily mean data. In the present study, for daily mean data, the return period varies from 1 year to 9452.2 years; for annual peak data, the return period varies from 1 year to infinity.
Plots highlighting the correlation between different estimates of max flow return periods (in the collapse year) are provided in Figures 12 and 13. A significant difference is identified between the max flow return periods estimated using daily mean flow values for all comparisons (GP versus PP, GP versus GEV, PP versus GEV), since the probability associated with the t-test is <0.05; nonetheless, a statistically significant positive correlation is detected (p < 0.05, R² = 0.7) (Figure 12a-c). For the max flow return periods estimated using peak flows, a significant difference is also identified for two comparisons, as the probability associated with the t-test is <0.05 (GP versus GEV, PP versus GEV; Figure 13). No statistically significant difference exists for the GP versus PP estimates using the peak flow values (probability associated with the t-test = 0.6; Figure 13). Poor correlation is detected for the GP versus GEV (p < 0.05, R² = 0.3) and PP versus GEV (p = 0.15, R² = 0.14) comparisons (Figure 13).
In summary, GEV, GP, and PP mostly produced substantially different return periods for both annual peak and daily mean flow values. For the study sites, a non-significant difference was detected only for the GP versus PP estimates using peak flows. Poor correlation was also found for the GP versus GEV and PP versus GEV estimates using peak flows. These findings give cause for concern about implementing only one analysis using peak flow data when determining collapse-inducing flows and/or design flows.

Heavy Tail Distribution
A shape parameter indicating a heavy tail distribution is of critical significance for bridge hydraulics. Extreme values with heavy-tailed distributions follow catastrophe and/or Pareto principles (the principle of a single big jump) [53]. In river hydraulics, this implies that when sites with heavy tails exhibit big flows, those flows will be far bigger than typical; in other words, outliers are expected and are not random. Another striking feature of heavy tails is that a single big flow can be expected even at sites that mostly exhibit relatively low flows, which might otherwise tempt bridge engineers to ignore the outliers in a flood frequency analysis. Having a common mean for the normally distributed shape parameters across different physiographic regions (the Appalachian Highland, Interior Plains, and Atlantic Plain) indicates a commonality in flood behavior (i.e., heavy tail or extremeness) for the collapse dataset studied. The well-defined scaling of the location and scale parameters with drainage area provides some basis for extending the findings obtained for the 30 collapsed bridges to other sites to assess the risk of future collapses in the Appalachian Highland.

GEV, GP, PP
Based on the relatively lower values of AIC and BIC (which measure goodness of fit), GEV is apparently preferable over the GP and PP approaches for the bridge collapse sites studied. However, the shape parameters (defining the behavior of floods) derived from GP have comparatively lower standard errors. GP was also able to identify the outliers, which might be of importance for sites with heavy-tailed distributions. The difficulty with using GP analysis, however, is that it may not be possible to choose a threshold that satisfies all the underlying assumptions of peaks-over-threshold analysis, particularly for a poor dataset. It is, therefore, not possible to draw conclusions about which approach is generally best. Such a conclusion argues against the questionable practice of relying on only one flood frequency distribution when deciding the design flows of bridge structures.

Collapse Cause
Although the sample size is too small to retrieve any conclusive information regarding bridge collapse causes in the US, some findings appeared to be robust: a. Maximum flow return periods in the collapse year were found to span a significantly wide range.
For some sites (#350, #462, #992), maximum flow return periods were found to be quite low based on the available daily and annual peak flow data. It is possible that other hydraulic events may induce scour that accumulates over a long period of time even at low flows, resulting in total/partial bridge collapse. In fact, in the Appalachian Highland, (a) debris jams causing increased flooding, (b) highly erodible soil causing an accelerated rate of channel widening, lateral migration, and vertical degradation, and (c) extensive channel modification resulting in destabilized channels have led to accelerated scour even at low flows [54][55][56]. b. For some sites (#118-dam failure; #825-regulation, snowmelt, hurricane, debris jam; #826; #829; #830-snowmelt, hurricane, ice-jam, debris jam; #1455; #1464-historic peak), the max flow return period in the collapse year coincides with the maximum recorded flow, which suggests that the bridges collapsed during unprecedented events at those sites. For these unprecedented events, the annual peak flow data appear to be consistent with the event recorded by the United States Geological Survey (USGS). The calculated extremely high return periods (GP and PP estimates, 784.5 years to infinity) also indicate unprecedented events with no previous record at these sites.

Conclusions
In this study, flood frequency analyses have been performed using peak flow data from 42 bridge collapse sites. For the sites in the study, a trend was derived in the flood magnitude (regressed over drainage area in the Appalachian Highland) and in flood behavior/extremeness (heavy tail distributions for most of the sites across different physiographic regions). Comparing the different approaches of flood frequency analysis, it is also derived that no single approach is generally best, considering capability (only GP can identify outliers) and uncertainty (GEV gives the best fit). Major findings that might have important implications for the risk study of bridge collapse include the following: a. The bridge collapse return period varies widely (very low to very high) although the apparent collapse cause is flood. b. Bridge collapses can be attributed to unprecedented events that could not have informed the bridge design. c. Bridge collapse due to scour can be expected even at low flows within the context of other hydraulic events, i.e., debris jams, ice-jams, a high rate of channel migration, and channels destabilized by modification. d. Channels with mostly low flows throughout the year can experience an unprecedented extreme flow. Such incidents are not surprising events but rather are expected for peak flows with heavy tail distributions.