Abstract

Dam-break induced loads and their effects on buildings are of vital importance for assessing the vulnerability of buildings in flood-prone areas. A comprehensive methodology for risk assessment of buildings subject to flooding is nevertheless still missing. This research aims to take a step forward by building on previous research. To this end, (1) five statistical procedures (simple correlation analysis, multiple linear regression, stepwise multiple linear regression, principal component analysis and cluster analysis) are used to study the relationship between the mean normalized force on a structure and other related variables; (2) a new and efficient variable that takes into account both the shape of the structure and the flow conditions is proposed; (3) a new and practical formula for predicting the mean normalized force is suggested for different types of obstacles, which is missing in previous research.


Introduction
A dam break wave corresponds to an uncontrolled release of a mass of fluid in a channel due to structural failure. From the hydrodynamic point of view, dam break flow is characterized by shock waves and by subcritical, supercritical and transcritical flows [1]. Surge waves induced by a dam break may cause serious floods with numerous casualties and extensive damage. The surge front is characterized by sudden flow discontinuities and by variations in depth and velocity. Although it is a well-explored problem, dam break wave propagation over a rough surface still cannot be solved in general because of the number of influencing parameters.
Dam break wave propagation is one of the most studied examples of unsteady open channel flow. Even so, there is still debate about the factors that drive unsteady turbulent flow, such as spatial and temporal variation, bottom slope, friction characteristics and speed of failure. Major dam failures in the past century, such as the failures of the St Francis Dam (1928) and the Malpasset Dam (1959) and the overtopping of the Vajont Dam (1963), drew attention to the subject [2].
The basics of the dam break wave profile are explained under some simplifying assumptions. The first attempt to describe the dam break wave profile, by applying the method of characteristics to the differential form of the Saint-Venant equations, was made in [3]. Since then, many analytical and numerical methods have been developed to simulate dam break flows. As the dam break wave propagates downstream with rapidly increasing speed, it generates strong impact forces on structures and obstacles. For flood risk assessment and safety concepts, it is important to estimate these impact forces properly.

Literature Review
Forces on structures associated with dam break are generated by hydrostatic and hydrodynamic loading.
Hydrostatic forces, $F_{hs}$, are generated by pressure on the vertical sides of the structure and by buoyancy on its horizontal surfaces. Some of the existing codes propose calculating the hydrostatic force per unit width with a velocity head component, while the Federal Emergency Management Agency (FEMA) suggests a formulation without the velocity head, assuming it to be an insignificant component of the hydrostatic force [Equation (1)] [4]:

$$F_{hs} = \frac{1}{2}\rho g\left(h_{us}^{2} - h_{ds}^{2}\right) \quad (1)$$

where $g$ is the acceleration due to gravity, $h_{us}$ and $h_{ds}$ are the water depths upstream and downstream of the structure, respectively, and $\rho$ is the density of the fluid. The effect of the buoyant force, $F_b$, cannot be neglected, as it affects the resistance of the structure against sliding and overturning. It can be calculated as follows [Equation (2)]:

$$F_b = \rho g V \quad (2)$$

where $V$ is the volume of water displaced by the submerged structure.
Hydrodynamic forces, $F_{hd}$, are induced by loading that consists of drag/velocity dominated (quasi-static) loading and inertia/acceleration dominated (impulsive) loading. Hydrodynamic forces depend on the kinematics of the flow as well as on the geometry and dynamic characteristics of the structure. Although the general expression of this force [Equation (3)] is recommended by existing codes, there are still ambiguities in defining it:

$$F_{hd} = \frac{1}{2}\rho C_d\, b\, h\, u^{2} \quad (3)$$

where $F_{hd}$ is the total drag force acting in the direction of flow, $b \cdot h$ is the area of the structural element normal to the flow direction, $C_d$ is the drag coefficient, which is highly dependent on the Reynolds number and the shape of the obstacle, and $u$ is the flow velocity component orthogonal to the structure. This expression is widely used with different drag coefficient values, depending on the geometry of the affected structure. FEMA recommends a drag coefficient of 2.0 for rectangular piles and 1.0 and 1.2 for circular piles [4]. Although recommended, Equation (3) does not include inertia, which is an important factor in the total hydrodynamic force. A more complete relationship is given in [5]; it adds the inertia coefficients $C_{1,1}$ and $C_{1,2}$, related to the variation of flow velocity and water depth, as these two parameters have a significant influence on the damage to the affected structure.
The first experimental and analytical studies that tried to quantify forces due to hydraulic bores date back to Cumberbatch [6], who tried to estimate tsunami impact on a wall and gave the solution for a two-dimensional fluid impact on a vertical wall. Cross [7] studied surge impact on vertical walls and improved the formula of Cumberbatch [6] by adding gravity forces [Equation (5)], in which $C_F$ is the force coefficient, a function of the impacted wedge angle. Ramsden [8] observed the interaction of tsunamis with a vertical wall and found that the maximum measured force due to bores, surges and solitary waves propagating on a dry bed is exceeded by the force computed from the maximum measured run-up, assuming hydrostatic conditions. Due to vertical accelerations, the model of Cross [7] underpredicts the measured forces due to a bore on a dry bed by 30%-50%. In addition, empirical formulae were developed for the maximum force due to bore impact and the resulting moment exerted on a vertical wall [Equations (6) and (7)], where $F$ is the force on the wall; $F_l$ is the force on the wall due to a run-up equal to twice the wave height, assuming hydrostatic pressure; $H$ is the wave height at the wall; $h_o$ is the still water depth; $M$ is the moment on the wall corresponding to the force $F$; and $M_l$ is the moment corresponding to the force $F_l$.
Surge forces, $F_s$, are hydrodynamic forces generated by the impingement of the advancing water front of the bore against the structure. There are still uncertainties about the calculation of the surge force due to a bore. The most applicable formulation is the one recommended by the City and County of Honolulu Building Code (CCH) [9], based on Dames and Moore [10] [Equation (8)].
Impulsive forces, $F_{imp}$, are hydrodynamic forces caused by the initial impact of the leading edge of the bore [11]. According to Ramsden [8], the wave force over a dry bed due to the impulsive force does not exceed the expected drag force. However, a significant increase was observed in wet bed tests. As flood waves arrive as a set of subsequent waves, the initial wave might not cause any impulsive forces, but the waves that subsequently travel over flooded terrain may cause impulsive forces on structures.
Moreover, there are other studies on dam-break phenomena in the literature. Research on fluid-structure interactions after a dam break was initiated in [1,12]. Laboratory tests concerning dam break flows were carried out in straight and curved channels [13]. The initial stages of dam break flow were captured by recording the flow with CCD cameras [14]. Several numerical studies based on the shallow water equations have been compared with experimental data for dam break flow over a channel with a bottom obstacle [15]. Oertel et al. [16] studied 2D dam-break waves, comparing physical and numerical data as well as analytical approaches; drag forces were also analyzed and compared for obstacles placed at various positions in the propagation area. Soares-Frazão and Zech [17] investigated dam break flow in channels with a 90-degree bend. Chanson [18] applied the method of characteristics to the dam break wave problem.
Arnason [19] reached similar conclusions in his experimental work. Design standards [4], which incorporate the results of Arnason [19], recommend calculating the hydrodynamic force acting on an isolated building as in Equation (3). They suggest applying $C_d = 3$ in the equation to account for the impulsive component of the force, which implies the following formulation of the impulsive force [Equation (9)]:

$$F_{imp} = \frac{3}{2}\rho\, b\, h\, u^{2} \quad (9)$$

In this study, the most recent test results by Arnason et al. [19,20] are followed to develop an efficient and practical formula, suitable for engineering design, for predicting the mean normalized forces exerted on different types of obstacles. For this purpose, (1) five statistical procedures are performed to find a model that is both parsimonious and appropriate, in that it does not violate the restrictive regression assumptions; (2) a new variable that takes into account the shape of the obstacles exposed to the dam break wave is suggested; (3) a new and practical formula for predicting the mean normalized force is suggested for different types of obstacles, which is missing in previous research.
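The force expressions reviewed in this section can be sketched numerically as follows. This is a minimal illustration, not the procedure of any design code: the function names and the bore values are hypothetical, the hydrostatic expression assumes the net form based on the upstream and downstream depths, and only the water density and the drag coefficients are taken from the text.

```python
RHO = 998.0  # water density in kg/m^3, as reported for the experiments
G = 9.81     # gravitational acceleration in m/s^2


def hydrostatic_force_per_width(h_us, h_ds=0.0, rho=RHO, g=G):
    """Net hydrostatic force per unit width (N/m), velocity head neglected.

    Assumed form: 0.5 * rho * g * (h_us^2 - h_ds^2).
    """
    return 0.5 * rho * g * (h_us ** 2 - h_ds ** 2)


def buoyant_force(volume, rho=RHO, g=G):
    """Buoyant force (N) for a displaced water volume in m^3."""
    return rho * g * volume


def drag_force(u, h, b, c_d, rho=RHO):
    """Drag-type hydrodynamic force (N) on a frontal area b*h."""
    return 0.5 * rho * c_d * b * h * u ** 2


# Hypothetical bore (u in m/s, h and b in m) hitting a 0.12 m wide column.
u, h, b = 2.0, 0.10, 0.12
quasi_static = drag_force(u, h, b, c_d=2.0)   # rectangular-pile coefficient
with_impulse = drag_force(u, h, b, c_d=3.0)   # coefficient covering the impulsive part
print(quasi_static, with_impulse, with_impulse / quasi_static)
```

Raising the drag coefficient from 2 to 3 scales the force by 1.5 regardless of the flow values, which is the only effect the impulsive allowance has in this formulation.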

Experiments
In this study, the experimental results of Arnason et al. [19,20] are used for predicting mean normalized forces on obstacles due to dam break. The experiments were performed in a wave channel with glass sidewalls, 16.6 m long, 0.6 m wide and 0.45 m deep. The bores were generated by lifting a 6.4 mm thick stainless steel gate, which initially separated a thin layer of water from the impoundment behind the gate. The gate was lifted in 0.2 s or less by a 64 mm diameter pneumatic cylinder driven by 0.5 MPa air pressure. A similar generation scheme, which is capable of generating precise bores, was previously used by Ramsden [8] and Yeh et al. [21]. The obstacles that mimic structures in the tests are three circular columns with diameters ($D$) of 140 mm, 60.6 mm and 29 mm, referred to as large, medium and small, and two square columns (SCs) of 120 mm × 120 mm in two different orientations, referred to as square and diamond. The test matrix used in this study is given in Table 1.
The normalized resistance coefficient, $C_r$, is defined in terms of $F_x$, the force in the x-direction; $\rho$, the density of the water during the runs (998 kg/m³); $h$, the bore height; $b$, the width of the object perpendicular to the flow; and $u$, the flow velocity. In this study, the mean normalized force, $\bar{C}_r$, is the time average of $C_r$ over the interval $t_1$-$t_2$ [Equation (10)]:

$$\bar{C}_r = \frac{1}{t_2 - t_1}\int_{t_1}^{t_2} C_r\,dt \quad (10)$$

Table 1. Test matrix of [19].
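For sampled records, the time averaging over the interval $t_1$-$t_2$ can be approximated with the trapezoidal rule. The sketch below is only an illustration: the function name and the sample values are hypothetical, and it assumes that $t_1$ and $t_2$ coincide with sampling instants.

```python
def time_average(times, values, t1, t2):
    """Trapezoidal time average of sampled values between t1 and t2.

    Assumes times are increasing and that t1 and t2 are sample times.
    """
    window = [(t, v) for t, v in zip(times, values) if t1 <= t <= t2]
    if len(window) < 2:
        raise ValueError("need at least two samples inside [t1, t2]")
    integral = 0.0
    for (t_a, v_a), (t_b, v_b) in zip(window, window[1:]):
        integral += 0.5 * (v_a + v_b) * (t_b - t_a)  # area of one trapezoid
    return integral / (window[-1][0] - window[0][0])


# Hypothetical record of a normalized force: ramp up, plateau, ramp down.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
values = [0.0, 2.0, 2.0, 2.0, 0.0]
plateau_mean = time_average(times, values, 1.0, 3.0)
full_mean = time_average(times, values, 0.0, 4.0)
print(plateau_mean, full_mean)
```

The two results differ because the choice of $t_1$ and $t_2$ decides whether the rising and falling limbs of the record are included in the mean.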

The Statistical Analysis
The study is motivated by the need for a new, efficient formula that gives a better understanding of dam-break induced forces on structures of different cross sections, which is missing in Arnason [19].
In regression analysis, $R^2$ tends to increase as parameters are entered into the model, whether or not they are related to the output. The more important statistic is therefore the adjusted $R^2$, which accounts for the number of variables in the model. Statistical tests are performed in each regression model to better understand the relationship between the variables. The second important factor in regression analysis is to construct models that are parsimonious from an engineering perspective, even if the predictive capability drops somewhat according to statistical error criteria. In this research, both aspects are considered.
Several statistical techniques are applied to identify the most influential of the nondimensional parameters $R_{eh}$, $R_{eb}$, $F_r$ and $h/b$ on $\bar{C}_r$. Here $R_{eh} = uh/\nu$ is the flow Reynolds number and $R_{eb} = ub/\nu$ is the object Reynolds number, with $\nu$ being the kinematic viscosity and $u$ the flow velocity. $F_r = u/\sqrt{gh}$ is the Froude number, in which $h$ is the bore height.
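The four nondimensional inputs follow directly from the flow velocity, the bore height and the obstacle width. The sketch below assumes a kinematic viscosity of 1.0e-6 m²/s (a typical value for water near 20 °C, not stated in the text) and uses hypothetical flow values; the function and key names are illustrative.

```python
import math


def dimensionless_groups(u, h, b, nu=1.0e-6, g=9.81):
    """Return the flow and object Reynolds numbers, Froude number and h/b.

    u: flow velocity (m/s), h: bore height (m), b: obstacle width (m),
    nu: kinematic viscosity (m^2/s, assumed value), g: gravity (m/s^2).
    """
    return {
        "Re_h": u * h / nu,
        "Re_b": u * b / nu,
        "Fr": u / math.sqrt(g * h),
        "h_over_b": h / b,
    }


# Hypothetical bore in front of the large (140 mm) circular column.
groups = dimensionless_groups(u=2.0, h=0.10, b=0.14)
for name, value in groups.items():
    print(name, value)
```

Because $R_{eh}$ and $F_r$ are both built from $u$ and $h$ only, they cannot vary independently across runs, which is one source of the collinearity examined in the regression models.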

Basic Statistics and Simple Correlation Analysis
Table 2 shows the minimum and maximum values, means and standard deviations of $\bar{C}_r$ and of all its explanatory variables, in order to summarize the characteristics of the data set. Correlation coefficients between $\bar{C}_r$ and these variables are given in Table 3. The correlation matrix is useful for checking the pattern of relationships between pairs of variables. It must be noted that the correlation coefficient neither identifies causality nor measures nonlinear association; it measures only linear association. Correlation analysis is therefore performed to select statistically significant variables that have a strong linear relationship with $\bar{C}_r$. In correlation analysis, values close to 1 in absolute value mean there is a strong linear relationship, whereas 0 indicates that the two variables are linearly unrelated. Table 3 shows that $\bar{C}_r$ is highly correlated with $h/b$, $R_{eh}$ and $F_r$, and only weakly with $R_{eb}$. The parameter most linearly correlated with $\bar{C}_r$ is $h/b$. It should be noted, however, that the variables other than $h/b$ are moderately correlated with each other, which may cause a collinearity problem in regression analysis.
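The caveat that the correlation coefficient captures only linear association can be checked with a short example. The data below are synthetic and chosen only to make the point; they are not the values of Table 3.

```python
import math


def pearson_r(x, y):
    """Pearson linear correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


x = [-2.0, -1.0, 0.0, 1.0, 2.0]
linear = [3.0 * v + 1.0 for v in x]   # exact linear dependence on x
quadratic = [v ** 2 for v in x]       # exact but purely nonlinear dependence

r_linear = pearson_r(x, linear)
r_quadratic = pearson_r(x, quadratic)
print(r_linear, r_quadratic)
```

The quadratic series is fully determined by x, yet its coefficient vanishes, so a low value in a correlation table does not by itself rule out a relationship.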

Multiple Linear Regression Model
Linear regression is used to model the value of a dependent scale variable based on its linear relationship to one or more predictors; it assumes that there is a linear relationship between the dependent variable $Y_i$ and its predictors. This relationship is described by the following formula [Equation (11)]:

$$Y_i = b_0 + b_1 X_{i1} + \dots + b_p X_{ip} + e_i \quad (11)$$

where $b$ denotes the regression coefficients, $i$ is the case number, $p$ indexes the predictors and $e_i$ is the error of the observed value for the $i$th case. In our model, $\bar{C}_r$ is the dependent variable, whereas $F_r$, $R_{eh}$, $R_{eb}$ and $h/b$ are the independent variables. The $R^2$ of the linear regression model is 0.642, while the adjusted $R^2$ is 0.595. Usually, the more predictors are included, the higher the $R^2$ obtained. Hence, the adjusted $R^2$, which penalizes model complexity, is suggested to provide a fairer comparison of model performance. The adjusted $R^2$ is calculated as [Equation (12)]:

$$R^2_{adj} = R^2 - \frac{(1 - R^2)\,p}{C - p^*} \quad (12)$$

where $R^2$ is the square of the multiple correlation coefficient, $p$ is the number of input fields, $C$ is the sum of case weights and $p^*$ is the number of coefficients in the model. If the intercept is included, $p^* = p + 1$; otherwise $p^* = p$.
Table 4 presents the coefficient statistics, correlations and collinearity statistics. To interpret the contribution of the predictors to the regression, it is not sufficient to consider only the regression coefficients. The best-known measure of the relative importance of the significant predictors is the standardized coefficient. Even though $R_{eh}$ has a lower unstandardized coefficient than $F_r$, it contributes more to the model because it has a larger absolute standardized coefficient. The variable that contributes most to the model is $h/b$, with the highest standardized and unstandardized coefficients, 0.739 and 0.428, respectively. The remaining variables are found to be insignificant contributors according to the t statistics when all four variables are forced to enter the model.
Partial correlation and part correlation statistics are also presented in Table 4. Partial correlation is the correlation between a dependent and an independent variable when the linear effects of the other independent variables in the model have been removed from both. If the linear effect of the other variables is removed only from the examined independent variable, but not from the dependent variable, the correlation is called the part (or semipartial) correlation. The squared part correlation can therefore be viewed as the decrease in $R^2$ that results from removing a predictor from the model. The values of the partial and part correlations for all the parameters except $h/b$ drop sharply from the zero-order correlation, implying that some of the variance in $\bar{C}_r$ is explained by $R_{eh}$, $F_r$ and $R_{eb}$ together.
The second part of Table 4 is reserved for the best-known collinearity statistics and diagnostics. The tolerance is the percentage of the variance in a given predictor that cannot be explained by the other predictors. Thus, small tolerances show that a high proportion of the variance in a given predictor can be explained by the other predictors. When the tolerances are close to 0, indicating high multicollinearity, the standard errors of the regression coefficients are inflated. By definition, the variance inflation factor (VIF) is the reciprocal of the tolerance. A VIF greater than 2 is usually considered problematic. The smallest VIF in Table 4 is 6.264, for $h/b$. The other collinearity diagnostics, the condition indices, confirm that there are serious problems with multicollinearity. The condition indices are computed as the square roots of the ratios of the largest eigenvalue to each successive eigenvalue. Condition indices greater than 15 indicate a possible problem with collinearity, while values greater than 30 indicate a serious problem. One of the condition indices is larger than 30, suggesting a very serious problem with collinearity. Although a serious multicollinearity problem exists, the linear model indicates that the most predictive variable is $h/b$.
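The adjusted $R^2$ and the collinearity measures described in this section can be reproduced with a few lines. The sketch assumes unit case weights (so the sum of case weights equals the 35 runs), uses the reported $R^2$ and VIF values, and uses hypothetical eigenvalues for the condition indices, since the eigenvalues behind Table 4 are not given in the text.

```python
import math


def adjusted_r2(r2, n_cases, n_predictors, intercept=True):
    """Adjusted R^2 with p predictors and p* coefficients (p* = p + 1 with intercept)."""
    p_star = n_predictors + 1 if intercept else n_predictors
    return r2 - (1.0 - r2) * n_predictors / (n_cases - p_star)


def tolerance_from_vif(vif):
    """Tolerance is the reciprocal of the variance inflation factor."""
    return 1.0 / vif


def condition_indices(eigenvalues):
    """Square roots of the ratios of the largest eigenvalue to each eigenvalue."""
    largest = max(eigenvalues)
    return [math.sqrt(largest / ev) for ev in eigenvalues]


# Reported values: R^2 = 0.642 with four predictors; 35 runs assumed as cases.
adj = adjusted_r2(0.642, n_cases=35, n_predictors=4)
# Smallest reported VIF (for h/b).
tol = tolerance_from_vif(6.264)
# Hypothetical eigenvalues, for illustration only.
indices = condition_indices([4.0, 1.0, 0.0025])
print(adj, tol, indices)
```

With the rounded inputs the adjusted value comes out within rounding distance of the reported 0.595, and a tolerance of about 0.16 means that roughly 84% of the variance of $h/b$ is shared with the other predictors.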

Stepwise Multiple Linear Regression Model
This procedure is used to determine the variables that account for the majority of the total variability. In stepwise regression, the independent variable with the smallest probability of F is entered into the regression at each step. Variables that are already in the regression equation are removed if their probability of F becomes larger than the removal criterion.
In this study, the criteria are set as a probability of F ≤ 0.05 to enter and a probability of F > 0.10 to remove. From the perspective of collinearity, every variable must also pass a tolerance criterion to enter the regression equation, regardless of the entry method. The tolerance level is set very small (0.0001) so that all variables can enter the regression except perfectly correlated ones, for which the tolerance is 0.000. In addition, a variable is not entered if it would cause a tolerance violation for the variables already in the regression. The dependent variable $\bar{C}_r$ and the four independent variables are used to construct the stepwise model. The regression statistics of the final step are gathered in Table 5. The tolerance, VIF and condition index values show that the regression model does not suffer from collinearity. The stepwise model consists of two steps, and only two variables, $h/b$ and $R_{eh}$, are included in the final regression step, as these variables are found to be significant. $R_{eb}$ and $F_r$ are found to be insignificant for predicting $\bar{C}_r$, so they are not included in the regression. The $R^2$ change is only 0.052 when $R_{eh}$ is entered into the model in the second step. As this $R^2$ change shows, the contribution of $R_{eh}$ to the regression of $\bar{C}_r$ is very limited compared with that of $h/b$. The ratio of the absolute values of the standardized coefficients shows that $h/b$ is about three times more important than $R_{eh}$ (it contributes more to the prediction) in the regression of $\bar{C}_r$.

Principal Component Analysis
The principal component analysis starts from all four inputs: $R_{eb}$, $h/b$, $R_{eh}$ and $F_r$.
The variables in the system have to be intercorrelated, but they should not exhibit extreme multicollinearity or singularity, as this makes it difficult to determine the unique contribution of each variable to a factor. The variables $F_r$ and $R_{eh}$ are highly correlated because both are derived from the velocity, $u$, and the bore height, $h$. Therefore, $F_r$ is excluded from the principal component analysis to prevent singularity, which is the extreme form of multicollinearity.
The Kaiser-Meyer-Olkin (KMO) test, a measure of sampling adequacy that indicates whether the partial correlations among the variables are small, is performed. The sample is regarded as adequate if the KMO value is greater than 0.5. In the analysis, KMO is calculated as 0.526, which is greater than 0.5. Bartlett's test of sphericity is applied to check whether the correlation matrix is an identity matrix, which would imply that the variables are unrelated and therefore unsuitable for structure detection. Small values of the significance level (less than 0.05) indicate that factor analysis is valid. In the analysis, the significance level of Bartlett's test of sphericity is found to be 0, which is less than 0.05. The next step is to determine the number of factors to be retained. Several rules of thumb have been suggested in this regard:
1. Keeping the factors that have an eigenvalue greater than 1 (Guttman-Kaiser rule);
2. Retaining the factors that, in total, account for about 70%-80% of the variance;
3. Keeping the factors that lie before the breaking point (elbow) of the scree plot.
In this study, the first and the second rules are applied. The second column of Table 6 shows that components 1 and 2 have eigenvalues greater than 1, which meets the first criterion. In addition, the "cumulative %" column shows that components 1 and 2 account for 97.998% of the variance, which meets the second criterion. Therefore, two components are retained in the analysis. The first component is most highly correlated with $h/b$, while the second is most highly correlated with $R_{eh}$. Since the variable $R_{eb}$ has a similar influence on both components, it is excluded. As a result, the variables $h/b$ and $R_{eh}$ are suggested for regression analysis.
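The first two retention rules can be applied mechanically once the eigenvalues of the correlation matrix are known. The eigenvalues below are hypothetical stand-ins for three standardized inputs (they sum to 3); they are not the values of Table 6, which are not reproduced in the text.

```python
def retained_components(eigenvalues):
    """Apply the Guttman-Kaiser rule and compute cumulative variance percentages.

    Returns the number of eigenvalues greater than 1 and the cumulative
    percentage of variance explained by the components in descending order.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    n_kaiser = sum(1 for ev in ordered if ev > 1.0)
    cumulative = []
    running = 0.0
    for ev in ordered:
        running += ev
        cumulative.append(100.0 * running / total)
    return n_kaiser, cumulative


# Hypothetical eigenvalues for three inputs (illustration only).
n_kaiser, cumulative = retained_components([1.80, 1.14, 0.06])
print(n_kaiser, cumulative)
```

In this illustrative case both rules agree on two components, which mirrors the situation described for Table 6; when the rules disagree, the choice remains a judgment call.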

Cluster Analysis
Many algorithmic methods have been proposed for cluster analysis. In this study, the hierarchical cluster analysis technique is used. Hierarchical cluster analysis is a statistical method for finding relatively homogeneous clusters of cases based on measured characteristics. It begins with each case in a separate cluster and then combines the clusters sequentially, reducing the number of clusters at each step until only one cluster is left.
The distance method defines how the distance between two data points is measured. There is a variety of options, such as Euclidean, Minkowski, cosine, correlation, Chebyshev and Spearman. In this study, the Euclidean distance is employed. For two points in two-dimensional space, $P_1(x_1, y_1)$ and $P_2(x_2, y_2)$, the Euclidean distance is defined as $d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$. The linkage method defines how the distance between two clusters is measured. Nearest neighbor, furthest neighbor, centroid clustering, median clustering and Ward's method are among those widely used. In this research, Ward's method is applied. The main idea behind Ward's method is that the linkage function specifying the distance between two clusters is computed as the increase in the "error sum of squares" (ESS) after fusing the two clusters into a single cluster. Ward's method chooses the successive clustering steps so as to minimize the increase in ESS at each step.
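The Euclidean distance and the ESS increase that Ward's method uses as its merging cost can be sketched as follows. The points are hypothetical, and the function names are illustrative; a full hierarchical clustering would repeat the merge with the smallest cost until one cluster remains.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def ess(points):
    """Error sum of squares: squared distances of the points to their centroid."""
    center = centroid(points)
    return sum(euclidean(p, center) ** 2 for p in points)


def ward_increase(cluster_a, cluster_b):
    """Increase in ESS caused by fusing two clusters (Ward's merging cost)."""
    return ess(cluster_a + cluster_b) - ess(cluster_a) - ess(cluster_b)


# Two hypothetical clusters of two points each.
cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(4.0, 0.0), (4.0, 2.0)]
cost = ward_increase(cluster_a, cluster_b)
print(euclidean((0.0, 0.0), (3.0, 4.0)), cost)
```

The same cost can be obtained from the centroids alone as $|A||B|/(|A|+|B|)$ times the squared centroid distance, which is why Ward's method tends to merge small, nearby clusters first.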
The dendrogram, which is a graphical summary of the cluster solution, is given in Figure 1. Variables are listed along the left vertical axis, whereas the horizontal axis shows the distance between clusters when they are joined. Determining the number of clusters is a subjective process. As a rule of thumb, the largest gaps in joining distance are searched for, starting from the right. For example, there is a gap between distances 5 and 25. Therefore, two clusters are suggested. As a result of the cluster analysis in Figure 1, $\bar{C}_r$ shows similar behavior to $R_{eb}$, $F_r$ and $h/b$, whereas $R_{eh}$ belongs to a single-element cluster.
[...] statistically significant. In addition, the adjusted $R^2$ value of the model increased to 0.86, as seen in Table 10. This is a satisfactory result when compared with the other models constructed within the scope of this study. The collinearity diagnostics show a tolerable level of collinearity, because the condition index is below 15 and the VIF is smaller than 2 (Table 9). Therefore, a new model based on $h/b$ and $w/h_o$, which is in harmony with the regression assumptions, is suggested.
In order to test how well the newly proposed variable $w/h_o$ describes the "shape of influence" of the structures, triple diagram curves are plotted in Figure 4, in which all 35 data of [19] in Table 1 are employed. It is evident that $\bar{C}_r$ is highly dependent on these parameters. In addition, the newly developed parameter $w/h_o$ is capable of describing the behavior of $\bar{C}_r$ efficiently. Contrary to Figure 1, there are no obstacle-dependent transitional zones in Figure 4.
The regression assumption that the mean of the residuals must be zero is checked in Figure 5. The mean of the residuals is calculated as −0.0043, which meets the first criterion. The second check concerns the conditional distribution of the residuals, which should be normal. As seen in Figure 6, the residuals follow a normal distribution at the 0.05 significance level according to the Kolmogorov-Smirnov test.

Evaluation of the Proposed Model for Mean Normalized Force Computation
The performance of the proposed model is assessed graphically and with three error statistics: mean absolute relative error (MARE), mean squared error (MSE) and coefficient of determination ($R^2$). MARE and MSE quantify the model error, with MSE giving larger errors greater weight than smaller ones, whereas $R^2$ measures how much of the variation and trends in the observed data are reproduced by the proposed model. The performance results of the proposed model for the obstacles are given in Table 11. As can be seen in Table 11, the minimum MARE, 0.0508, occurs for the diamond obstacle, whereas the maximum, 0.1339, occurs for the small circular column. Overall, the MARE is 0.0755, which indicates that the obstacle types follow each other very closely and that the proposed variable $w/h_o$ captures the influence of the obstacles efficiently. Based on MSE, all obstacles yield small prediction errors, and the results are similar across obstacles; the overall prediction MSE is 0.0237. As for $R^2$, the predictions are successful for all obstacles except the rectangular one, whose $R^2$ is lower than the others (Figure 7).
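The three error statistics can be computed as sketched below. The observed and predicted values are hypothetical, and the coefficient of determination is taken here as one minus the ratio of the residual to the total sum of squares, which is an assumption, since the text does not state which definition is used.

```python
def mare(observed, predicted):
    """Mean absolute relative error; observed values must be nonzero."""
    return sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / len(observed)


def mse(observed, predicted):
    """Mean squared error."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted mean normalized forces.
observed = [1.0, 2.0, 4.0]
predicted = [1.1, 1.8, 4.0]
print(mare(observed, predicted), mse(observed, predicted), r_squared(observed, predicted))
```

In this example the two nonzero errors have the same relative size but different squared size, which shows why MARE and MSE can rank obstacles differently.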

Conclusions
In this study, several statistical approaches (simple correlation analysis, multiple linear regression, stepwise multiple linear regression, principal component analysis and cluster analysis) are employed to propose a new and efficient formula. A regression model based on $h/b$ and the newly proposed parameter $w/h_o$ is suggested to predict the mean normalized force due to dam break. The $w/h_o$ parameter takes the shape of the obstacles into account appropriately, which yields a parsimonious model suited to engineering applications. The average MARE, MSE and $R^2$ values of the proposed model are 0.0755, 0.0237 and 0.8643, respectively, which demonstrates that the proposed model gives satisfactory results. In terms of the coefficient of determination, only the rectangular obstacle is predicted less well than the others. Similar approaches can be applied in future research to predict maximum forces on obstacles.

Figure 2. The variation of $C_r$ (output) with relevant inputs.

Figure 3. Proposed "shape of influence" parameter for structures.
With the proposed $w/h_o$ variable, one can easily predict $\bar{C}_r$ by just reading $w/h_o$ and $h/b$ off Figure 4, without describing the shape of the structure in advance.

Figure 4. Distribution of $\bar{C}_r$ with $h/b$ and proposed $w/h_o$.

Figure 7. Scatter diagrams around perfect line for different types of obstacles.

Table 5. Stepwise regression statistics of final (second) step.

Table 6. Total variance explained. The rotated component matrix, which is used to interpret what the components represent, is given in Table 7.

Table 9. The regression statistics of stepwise model.

Table 11. Performance of the proposed model for variety of obstacles.