Investigation of Attributes for Identifying Homogeneous Flood Regions for Regional Flood Frequency Analysis in Canada

Ziyang Zhang; Tricia A. Stadnyk

doi:10.3390/w12092570

¹ Department of Civil Engineering, University of Manitoba, Winnipeg, MB R3T 5V6, Canada

² Matrix Solutions Inc., 3001-6865 Century Ave, Mississauga, ON L5N 7K2, Canada

³ Department of Geography, University of Calgary, Calgary, AB T2N 1N4, Canada

* Author to whom correspondence should be addressed.

Water 2020, 12(9), 2570; https://doi.org/10.3390/w12092570

This article belongs to the Special Issue Hydrological Extremes in a Warming Climate: Nonstationarity, Uncertainties and Impacts

Abstract

The identification of homogeneous flood regions is essential for regional flood frequency analysis. Regardless of the type of regionalization framework considered (e.g., region of influence or hierarchical clustering), selecting flood-related attributes that reflect flood generating mechanisms is required to discriminate flood regimes among catchments. To understand how different attributes perform across Canada for identifying homogeneous regions, this study examines five distinctive attributes (i.e., geographical proximity, flood seasonality, physiographic variables, monthly precipitation pattern, and monthly temperature pattern) for their ability to identify homogeneous regions at 186 gauging sites with their annual maximum flow data. We propose a novel region revision procedure that complements the well-known region of influence and L-Moments techniques and automates the identification of homogeneous regions across continental domains. Results are presented spatially for Canada to assess patterning of homogeneous regions. Memberships of two selected regions are investigated to provide insight into membership characteristics. Sites in eastern Canada are highly likely to identify homogeneous flood regions, while sites in the western prairie and mountainous regions are not. Overall, it is revealed that the success of identifying homogeneous regions depends on local hydrological complexities, on whether the considered attribute(s) reflect primary flooding mechanism(s), and on whether catchment sites are clustered in a small geographic region. Formation of effective pooling groups affords the extension of record lengths across the Canadian domain (where gauges typically have <50 years of record), facilitating more comprehensive analysis of the higher return period floods needed for climate change assessment.

Keywords:

regional flood frequency analysis; flood-related attribute; region of influence; flood region revision process; Canadian annual maximum flow

1. Introduction

Designing future infrastructure for flood resiliency is crucial for emerging design standards. Flood frequency analysis (FFA) is often used to estimate flood quantiles for river infrastructure design to prevent structural failure or inadequacy during extreme flood events. Given its importance, a growing number of countries have carried out nation-wide studies of advanced FFA methods to improve design flood estimation [,,]. Outcomes from these studies can be generalized into published guidelines, which benefit domestic end-users in terms of simplicity and consistency and reduce the element of subjectivity within the design process [,].

In Canada, flooding has been recognized as the most frequent and costliest of natural disasters over the past 100 years, claiming considerable economic and social losses for cities, urban clusters, and agricultural land use []. Consistent with the recent assessment of 23 unsolved problems in hydrology led by the International Association for Hydrological Sciences (IAHS), improved analysis of the magnitude of extreme (flood and drought) events and the variability in these events is a critical area of research in hydrology []. General requirements for design floods at primary crossings or for floodplain delineation in Canada necessitate a one in 50 year or 100 year event [,]. The 2T rule of traditional at-site FFA recommends at least 100 years flood data [], while only 1.05% of flow gauging sites (67/6379) in the Canadian HYdrologic DATabase (HYDAT) satisfy this criterion []. As a result, FFA based on regional information, or regional flood frequency analysis (RFFA), is a particularly important method for Canadian design flood estimation.

The identification of homogeneous flood regions is of paramount importance for RFFA, as it marks the first step in the FFA process, forming the period of record for analysis [,]. The homogeneous region is a collection of hydrologically similar catchments so that flood information, such as annual maximum flow, can be reasonably and effectively transferred within the defined flood region using transformation methods, such as index-flood [,,,,]. Many studies [,,] have focused on investigating different regionalization frameworks and techniques; for example, in Canada, the statistical hydrology research group in Québec (Groupe de Recherche en Hydrologie Statistique, GREHYS) [] compared four different regionalization techniques, including region of influence (ROI), canonical correlation analysis, hierarchical cluster analysis, and L-Moments statistics, for delineating homogeneous flood regions in Québec and Ontario. For the same study area, Wazneh et al. [] endorsed catchment regionalization based on the statistical depth function over the ROI and canonical correlation approaches because of its robust region identification process and improved accuracy of pooled estimation. Zadeh and Burn [] delineated 1114 Canadian gauging sites into six super hydrological regions based on flood seasonality statistics, drainage area, and mean annual precipitation. The concept of delineated super regions was later adopted to calibrate a nonparametric model for ungauged pooled estimation [,]. Though regionalization techniques may differ, the selection of arbitrary, flood-related attributes is required for all regionalization techniques in order to effectively discriminate between flood behaviors among catchment sites. Geographical contiguity has been frequently used as an attribute because hydrological variability tends to be smaller within smaller geographical regions [,,]. For large catchments with fewer gauging stations, however, cohesive flood behavior associated with geographic contiguity is often reduced. This is often the case for rural and northern Canada and for regions with highly dynamic flood response, such as the Prairie Pothole region [,].

Other widely used flood attributes include physiographic, climatic, and statistical types. Each type of attribute can effectively measure flood similarity and thus be used to identify homogeneous flood regions. Burn [] considered coincident annual peak flood values as the prime flood-related variable for 41 sites in southern Manitoba, Canada. Catchment geographical distribution and local empirical knowledge were also embedded in the regionalization process. Three homogeneous regions were formed and confirmed by a statistical homogeneity test. In the same study area, Burn [] derived a composite attribute to group catchments. The attribute comprised the coefficient of variation (CV) of floods, mean annual flow divided by drainage area (QDA), and the latitude and longitude of the gauging station. The results showed that CV and QDA were relatively more effective than geographical proximity for forming homogeneous regions, with the CV attribute being more informative than the QDA. Forming a composite attribute to describe multiple aspects of flood characteristics is often considered more informative for dividing catchments into distinct flood regions []. Weighting each variable within the composite attribute, however, introduces an element of subjectivity. Additionally, variable selection is often based mostly on localized physiographic and climatic knowledge rather than analytical reasoning [,,].

Recent studies of RFFA in Canada tend to focus on deriving a robust quantile regression model for ungauged frameworks [,,,,,]. Among these studies, the following variables were frequently considered in their quantile regression model as flood influential attributes: latitude and longitude of gauging stations, CV, QDA, mean annual precipitation, and basin slope.

The geographic extent of Canada means that water resources engineering practice is generally governed at the provincial level and within provincial boundaries, as opposed to federal jurisdiction, which is more common in other countries [,]. As a result, methods of RFFA have been inconsistently applied among government agencies, academic communities, and industrial partners [,,]. To tackle this problem, the FloodNet Strategic Network project, funded by the Natural Sciences and Engineering Research Council of Canada (NSERC), unified researchers across Canada to develop nation-wide flood forecasting and water resources management strategies. An important mandate was to research standardized FFA methods and techniques tailored for Canadian hydrological environments []. Within this network, Sandink et al. and Zahmatkesh et al. [,] examined FFA using a quantile regression model that considered ungauged catchments across Canada. Zhang et al. [] demonstrated that the generalized extreme value (GEV) distribution fits Canadian annual maximum flow data considerably better than other well-known distributions, including the generalized logistic, Pearson type III, and log Pearson type III distributions. Others [,,,,] focused on developing regionalization techniques using peaks-over-threshold (POT) flood data, which is advantageous for gauging sites where annual maximum flood records are limited. Little attention has been paid to the examination of different flood-related attributes and their characteristics for identifying homogeneous flood regions.

Here, we consider five distinct categories of frequently used attributes (i.e., geographical proximity, physiographic variables, flood seasonality, monthly precipitation pattern, and monthly temperature pattern) and investigate their relevance in identifying homogeneous flood regions for RFFA applications on a continental, Canada-wide scale. Their abilities to identify homogeneous regions are investigated across major hydrological sub-regions of Canada. Regional hydrological characteristics are used as input to analyze homogeneous region identification results. To increase the efficiency of our analysis and minimize the element of subjectivity, a novel automated regionalization process that combines the well-known ROI [,] approach with a proposed automatic region revision algorithm (ARRA) is introduced and demonstrated for applicability to continental domains. Memberships of two regions are selected as a case study to provide insight into membership characteristics. Findings of this study are deemed to be an important contribution toward the Canadian statistical flood estimation guideline under the FloodNet project.

2. Materials and Methods

2.1. Rationale for Attribute Selection

Geographical proximity is selected based on the rationale that catchments closer to each other generally share similar hydrological and physiographical characteristics; therefore, catchments separated by smaller geographical distances are likely to exhibit a similar flood regime and to form a homogeneous region. Large spatial variability in flood characteristics, however, can undermine the use of geographical proximity. Directly using physiographic variables that exert key influence on the dominant flood generating mechanisms therefore provides another way to group sites with similar flood behavior. Geographical proximity and physiographic variables are the most common flood-related attributes for catchment regionalization and thus are included in this study.

As previously noted, flood seasonality has the advantage of convenience in attribute extraction. In addition, it has been previously applied and was found to be beneficial for flood studies in Canada for catchment classification [,,] and in the formation of homogeneous regions [,].

Monthly precipitation and temperature patterns consider monthly average precipitation and temperature for the location of the catchment site. These values are provided by Environment and Climate Change Canada (ECCC) [], computed for each catchment site in this study using historical monthly climate grids for North America [,].

Flood generating mechanisms in Canada are generally dominated by rainfall (pluvial), snowmelt (nival), or rain-on-snow (mixed) events [,]. The monthly patterns of precipitation and temperature are considered to contain key information concerning the dominant flood generation process. For example, precipitation accumulation during winter months dominates the magnitude of the spring melt event. Large precipitation values in summer and fall suggest rainfall-driven peak floods. Temperature values in the melt season influence the timing and the magnitude of spring peak floods. Therefore, we explore these attributes given their potential usefulness in mapping regional flood characteristics.

2.2. Datasets

Annual maximum flood samples are taken from the Canadian Reference Hydrometric Basin Network (RHBN). Developed by Water Survey of Canada, the RHBN comprised 223 gauging sites in total at the start of this study and is only a small subset of the Canadian hydrometric gauging network (6379 gauging sites with flow records in total) []. RHBN sites are identified as near-pristine catchments with high-quality flow measurements and an absence of anthropogenic control [,]. These merits make their flood data ideal for RFFA. Of the 223 RHBN sites, only 186 have corresponding physiographic variables available, supported by ECCC []. Therefore, we consider a total of 186 gauging sites in this study, generating a total of 186 annual maximum flood samples. Although RHBN stations generally have flow records that are greater than 20 years in length, some sites are seasonally operated, which means that an annual maximum flood cannot be derived for every calendar year. The average station record length among our samples is 48 years, with a maximum of 103 years and a minimum of eight years. More than 80% of samples have station record lengths greater than 30 years.

The geographical distribution of the 186 sites is presented in Figure 1, with the corresponding record length distribution presented in Figure 2 (the x-axis corresponds to longitude, from west to east, noted by province or territory). Figure 1 and Figure 2 indicate that most study sites in British Columbia and the Atlantic provinces have relatively longer record lengths compared to other regions. The prairie provinces, particularly Saskatchewan and Manitoba, have relatively fewer stations and relatively shorter record lengths. The three northern territories have the fewest gauging sites and an average record length of 40 years.

Figure 1. Geographical location of 186 study sites identifying primary cause of flood response.

Figure 2. Distribution of record length for the 186 flood samples. Sites in the ten provinces are plotted in order of longitude from west (left) to east (right). Sites in the three territories are plotted to the right most of the figure, with three sites in northeast Québec embedded within the Atlantic provinces.

2.3. Defining Attribute Similarity Distance

2.3.1. Geographical Proximity

The latitude and the longitude of the gauging stations are used to calculate the geographical distance between two catchments. The similarity distance between catchment m and n is defined as:

d_{mn} = \left[ (Lat_{m} - Lat_{n})^{2} + (Lon_{m} - Lon_{n})^{2} \right]^{0.5}

(1)

where Lat_{m} and Lon_{m} are the latitude and the longitude coordinates of the gauging site of catchment m (and likewise Lat_{n} and Lon_{n} for catchment n). We use geographical coordinates in degrees in the above equation, which can cause minor discrepancies because the ground distance spanned by one degree of longitude shrinks toward the polar region.
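As a minimal sketch of Equation (1), the snippet below computes the distance in degrees and, separately, shows why a degree of longitude is not a constant ground distance. The function names, the spherical-Earth radius, and the example coordinates are illustrative assumptions and are not taken from the study.

```python
import math


def geographic_distance(lat_m, lon_m, lat_n, lon_n):
    """Equation (1): Euclidean distance between two sites, in degrees."""
    return math.hypot(lat_m - lat_n, lon_m - lon_n)


def longitude_degree_km(lat_deg, earth_radius_km=6371.0):
    """Approximate ground length of one degree of longitude at a latitude.

    Assumes a spherical Earth (illustrative only); the length scales
    with cos(latitude), so it shrinks toward the poles.
    """
    return math.radians(1.0) * earth_radius_km * math.cos(math.radians(lat_deg))


# Hypothetical coordinates for two gauging sites (not real stations).
d = geographic_distance(50.0, -100.0, 53.0, -96.0)
shrink = longitude_degree_km(60.0) / longitude_degree_km(0.0)
```

At 60° latitude a degree of longitude covers half the ground distance it covers at the equator, which is the source of the minor discrepancy noted above.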

2.3.2. Physiographic Variables

The selection of physiographic variables is based on the stepwise regression method, which has been used to select flood-related attributes in previous studies [,,]. The stepwise regression method is an automatic procedure used to select explanatory variables based on the development of a multilinear regression model. Candidate variables are iteratively added and removed based on the use of statistical t-test until the predictive power of the regression model is optimized. In this study, 66 sets of different physiographic variables at each site are obtained from ECCC []. Because different variables have different units and scale, variables are normalized by their standard deviation prior to the regression. The dependent variable for the stepwise regression considers the median value of each flood sample, which corresponds to a 2-year return period flood. The median value is considered a robust indicator of flood characteristics and is meant to reduce impact from outlier flood values [,]. Consequently, the stepwise method recognizes the following variables as sufficiently explanatory of flood characteristics: (1) catchment area, (2) waterbody area in the catchment, (3) standard deviation of elevation across the catchment, (4) average annual air temperature for the catchment, and (5) average annual precipitation for the catchment. Variables (2) and (3) are derived from the ECCC National Hydrology Network database. Variables (4) and (5) are computed based on 10 km historical gridded climate data representing a 30 year period of record from 1981 to 2010. Data provided by ECCC are computed using historical monthly climate grids for North America [,].

The similarity distance between catchment m and n is calculated based on a weighted Euclidean distance formula defined as:

d_{mn} = \left[ \sum_{j=1}^{k} w_{j} (x_{mj} - x_{nj})^{2} \right]^{0.5}

(2)

where k is the number of physiographic variables, w_{j} is the weighting factor for physiographic variable j, and x_{mj} is the standardized value of physiographic variable j for catchment m. The weight w_{j} controls the relative importance of variable j. Here, a weight of 0.4 was assigned to the basin area and 0.15 to each of the remaining four variables. These weights correspond to the variable coefficients in the computed stepwise model, rounded to the nearest 0.05.
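Equation (2) with the stated weights can be sketched as follows. The variable ordering (basin area first) and the function name are assumptions made for illustration, and the inputs are taken to be already standardized.

```python
import math

# Weights reported in the text: 0.4 for basin area, 0.15 for each of the
# other four variables. The ordering (area first) is an assumption.
WEIGHTS = (0.40, 0.15, 0.15, 0.15, 0.15)


def physiographic_distance(x_m, x_n, weights=WEIGHTS):
    """Equation (2): weighted Euclidean distance between two catchments.

    x_m and x_n hold the standardized physiographic variables of
    catchments m and n, in the same order as `weights`.
    """
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x_m, x_n)))
```

Because the weights sum to one, two catchments that differ by one standardized unit in every variable are separated by a distance of one, while a one-unit difference in basin area alone contributes a distance of sqrt(0.4).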

2.3.3. Flood Seasonality

Similarity between catchments is measured using a unit polar coordinate system. A catchment is presented as a point in the polar coordinate space and can be positioned by angular and radial values. The angular value reflects the average date of flood occurrence, whereas the radial value reflects the variability in the occurrence date of floods. A larger radial value indicates smaller variability in occurrence date; a radial value of one indicates no variability in occurrence date, implying that all floods occur on the same day of each year.

Based on Burn [], for a single flooding event, the date of occurrence of the event is transformed from Julian day to an angular value, where Julian day one is 1 January and day 365 is 31 December, using:

θ_{i} = (\text{Julian Date})_{i} \left( \frac{2π}{365} \right)

(3)

For a given catchment flood sample composed of k flooding events, its Cartesian coordinates \bar{x} and \bar{y} in the unit circle are calculated as:

\bar{x} = \frac{1}{k} \sum_{i=1}^{k} \cos(θ_{i})

(4)

\bar{y} = \frac{1}{k} \sum_{i=1}^{k} \sin(θ_{i})

(5)

Therefore, the similarity distance between catchments m and n is calculated as:

d_{mn} = \left[ (\bar{x}_{m} - \bar{x}_{n})^{2} + (\bar{y}_{m} - \bar{y}_{n})^{2} \right]^{0.5}

(6)
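Equations (3) to (6) can be illustrated with a short sketch. The function names are hypothetical; the radial value is simply the polar radius of the point (\bar{x}, \bar{y}).

```python
import math


def seasonality_point(julian_days):
    """Equations (3)-(5): mean Cartesian position of flood dates on the unit circle."""
    angles = [day * 2.0 * math.pi / 365.0 for day in julian_days]
    k = len(angles)
    x_bar = sum(math.cos(a) for a in angles) / k
    y_bar = sum(math.sin(a) for a in angles) / k
    return x_bar, y_bar


def regularity(julian_days):
    """Radial value of the point: 1 means every flood falls on the same day."""
    return math.hypot(*seasonality_point(julian_days))


def seasonality_distance(days_m, days_n):
    """Equation (6): Euclidean distance between two catchments in seasonality space."""
    x_m, y_m = seasonality_point(days_m)
    x_n, y_n = seasonality_point(days_n)
    return math.hypot(x_m - x_n, y_m - y_n)
```

A sample whose floods all occur on the same Julian day plots on the unit circle (radial value of one), whereas floods split evenly between two dates half a year apart plot at the origin.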

Following the Durocher et al. [] classification, sites used in this study are further classified into nival, pluvial, and mixed regimes based on their flood seasonality statistics and localized geographic and climatic environments (i.e., classifications noted on Figure 1 and Figure 3, respectively). Nival sites are subject to regular flood occurrence dates during the spring snowmelt period. These sites are generally located in cold regions of Canada, such as the continental interior, mountainous British Columbia, and northern Canada. A smaller number of sites are exclusively pluvial-driven, with average annual flood occurrence from November to February. These sites are in the warmest regions of Canada, which are coastal southwest British Columbia and Vancouver Island. A substantial number of study sites are classified as mixed response. These sites experience warm to mild winters and are predominantly located in southeastern Ontario, southern Québec, and the Atlantic provinces. Peak floods for these sites can result from spring snowmelt, rain-on-snow, or single heavy rainfall events. Their wide range of regularity in the flood seasonality space provides an effective indication of annual peak floods driven by multiple flood responses.

Figure 3. The 186 study sites plotted in flood seasonality space.

2.3.4. Monthly Precipitation Pattern

Similarity measures based on precipitation patterns use the monthly average precipitation values from January to December for each catchment site. The correlation coefficient is selected as the similarity measure between two catchments. In contrast to the Euclidean distance, the correlation coefficient is considered more effective for characterizing the pattern of two datasets, as it measures the degree of linear association between the datasets, while the Euclidean distance measures the distance between two points in a metric space. The correlation coefficient between catchments n and m is defined as:

r_{nm} = \frac{\sum_{i=1}^{12} (x_{ni} - \bar{x}_{n})(x_{mi} - \bar{x}_{m})}{\sqrt{\sum_{i=1}^{12} (x_{ni} - \bar{x}_{n})^{2}} \sqrt{\sum_{i=1}^{12} (x_{mi} - \bar{x}_{m})^{2}}}

(7)

where x_{ni} is the monthly average precipitation value for month i of catchment n, and \bar{x}_{n} is the average of the 12 monthly average precipitation values for catchment n, expressed as:

\bar{x}_{n} = \frac{1}{12} \sum_{i=1}^{12} x_{ni}

(8)

r_{nm} ranges from -1 to 1, with values exactly equal to 1 (-1) indicating a perfect positive (negative) linear relationship between two datasets, and values exactly equal to 0 indicating no linear relationship. For the similarity measure of catchments m and n, r_{nm} closer to 1 indicates a stronger positive linear relationship between the two catchments; therefore, the similarity distance based on the correlation coefficient is computed as:

d_{nm} = 1 - r_{nm}

(9)
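A minimal sketch of Equations (7) to (9) follows. The function name and the monthly values in the example are invented for illustration and do not come from the study data.

```python
import math


def pattern_distance(x_n, x_m):
    """Equations (7)-(9): one minus the correlation of two monthly series."""
    mean_n = sum(x_n) / len(x_n)  # Equation (8) for catchment n
    mean_m = sum(x_m) / len(x_m)  # Equation (8) for catchment m
    dev_n = [v - mean_n for v in x_n]
    dev_m = [v - mean_m for v in x_m]
    covariation = sum(a * b for a, b in zip(dev_n, dev_m))
    spread_n = math.sqrt(sum(a * a for a in dev_n))
    spread_m = math.sqrt(sum(b * b for b in dev_m))
    r_nm = covariation / (spread_n * spread_m)  # Equation (7)
    return 1.0 - r_nm  # Equation (9)


# Hypothetical monthly precipitation (mm), January to December.
site_a = [30, 25, 28, 35, 50, 70, 75, 68, 55, 40, 35, 32]
site_b = [2 * v + 5 for v in site_a]  # wetter site, same seasonal shape
```

The example reflects the design choice described above: `site_b` receives roughly twice the precipitation of `site_a` but follows the same seasonal shape, so its correlation-based distance is essentially zero, whereas a Euclidean distance between the two series would be large.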

2.3.5. Monthly Temperature Pattern

In common with the similarity measure for precipitation patterning, temperature patterning is computed from the monthly average temperature for each catchment. Monthly average temperature data for catchments n and m are input into Equations (7) and (8), and Equation (9) is then used to calculate the similarity distance between the two catchments.

2.4. Region of Influence Approach

The ROI approach [,] is used given its flexibility in identifying a flood region for each study site. The ROI approach treats each target site as having its own unique flood region. Other sites are added to the region in order from the shortest similarity distance to the greatest. Determining the number of sites in a region requires a trade-off between the size of the region and the quality of the region. A larger region benefits flood estimation at larger return periods (i.e., it generates longer records); however, the quality of the region (i.e., homogeneity in flood characteristics) generally decreases as more sites are added. For RFFA, the 5T rule for region size (i.e., total station-years of record in the region) states that a region should optimally contain five times as many station-years of record as the return period of interest (T), and it has been widely accepted as a guideline for an optimal trade-off [,]. The 5T rule was adopted in this study.
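The pooling logic described above can be sketched as follows, assuming the attribute distances to the target site have already been computed. The function name, argument layout, and site labels are hypothetical, and the target site is left out of its own region as an assumption of this sketch.

```python
def initial_region(target, distances, record_lengths, return_period):
    """Pool the nearest sites until the 5T station-year target is met.

    distances: dict mapping site id -> attribute distance to the target.
    record_lengths: dict mapping site id -> years of record.
    Returns the pooled site ids and their total station-years. If the
    candidate list runs out first, the region falls short of 5T.
    """
    required_years = 5 * return_period
    candidates = sorted((s for s in distances if s != target), key=distances.get)
    region, total_years = [], 0
    for site in candidates:
        if total_years >= required_years:
            break
        region.append(site)
        total_years += record_lengths[site]
    return region, total_years


# Hypothetical sites: a 20-year return period needs 100 station-years.
region, years = initial_region(
    "T",
    {"T": 0.0, "A": 0.1, "B": 0.2, "C": 0.3, "D": 0.4},
    {"T": 45, "A": 40, "B": 30, "C": 50, "D": 60},
    return_period=20,
)
```

In this example the three nearest sites are pooled (120 station-years), and the fourth is not needed.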

2.5. Generalized Extreme Value (GEV) Distribution and L-Moment Estimation Method

The GEV distribution is used to estimate flood quantiles. The GEV distribution has been determined to be more robust for fitting annual maximum flow at RHBN stations than other commonly used three parameter distributions []. The index flood L-Moment parameter estimation method is recommended by many studies for its simplicity, robustness, nearly unbiased estimation, and convenient integration with the GEV and the L-Moment homogeneity test [,,].

2.6. L-Moment Homogeneity Test

The homogeneity test aims to verify whether sites in the flood region exhibit similar flood characteristics at an acceptable level of statistical significance. Since L-Moments are considered unbiased statistics of flood data, the L-Moment homogeneity test has received much attention in RFFA applications [,,,,,]. Based on Hosking and Wallis [], the first step of the homogeneity test is to determine the regional L-Moment ratios t^{R}, t_{3}^{R}, and t_{4}^{R}, denoted as the regional L-CV, L-skewness, and L-kurtosis, respectively. For a region comprising N sites, the regional L-Moment ratio t^{R} (and similarly t_{3}^{R} and t_{4}^{R}) is calculated as:

t^{R} = \frac{\sum_{i=1}^{N} n_{i} t^{(i)}}{\sum_{i=1}^{N} n_{i}}

(10)

where t^{(i)} is the at-site L-Moment ratio for site i, and n_{i} is the record length for site i.

Dispersion can then be expressed as:

V = \left[ \frac{\sum_{i=1}^{N} n_{i} (t^{(i)} - t^{R})^{2}}{\sum_{i=1}^{N} n_{i}} \right]^{0.5}

(11)

To assess whether the dispersion, V, is within the limit of region homogeneity, two variables are required: μ_{V}, the expected mean of V; and σ_{V}, the expected standard deviation of V. μ_{V} and σ_{V} are estimated through many simulated reproductions of the original region. To do this, a Kappa distribution fitted to the L-Moment ratios 1, t^{R}, t_{3}^{R}, and t_{4}^{R} is used to reproduce the original region N_{sim} times (N_{sim} = 1000 in this study). Each reproduced region has the same region size (N sites in a region) and the same record length, n_{i} for site i, as the original region.

For each reproduced region, the dispersion, V, is calculated using Equations (10) and (11). Based on the N_{sim} values of V, the expected mean μ_{V} and the expected standard deviation σ_{V} can be obtained.

Lastly, the homogeneity statistic is defined as:

H = \frac{V - μ_{V}}{σ_{V}}

(12)

where H is the homogeneity statistic. A region is regarded as acceptably homogeneous if H < 1, possibly homogeneous if 1 \leq H < 2, and heterogeneous if H \geq 2. In this study, H < 1 was used to determine if the region was homogeneous.
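The chain from Equation (10) to Equation (12) can be sketched as below. This is an illustration under stated assumptions, not the study's implementation: the Kappa-distribution simulation is not reproduced (the simulated V values are supplied by the caller), the at-site L-CV uses the standard unbiased probability-weighted-moment estimator, and the sample standard deviation is assumed for σ_{V}. All function names are hypothetical.

```python
import statistics


def l_cv(sample):
    """At-site L-CV (t) from unbiased probability weighted moments."""
    x = sorted(sample)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum(rank * value for rank, value in enumerate(x)) / (n * (n - 1))
    return (2.0 * b1 - b0) / b0  # l2 / l1


def regional_ratio(ratios, lengths):
    """Equation (10): record-length-weighted regional L-Moment ratio."""
    return sum(n * t for n, t in zip(lengths, ratios)) / sum(lengths)


def dispersion(ratios, lengths):
    """Equation (11): weighted dispersion V of the at-site ratios."""
    t_r = regional_ratio(ratios, lengths)
    weighted = sum(n * (t - t_r) ** 2 for n, t in zip(lengths, ratios))
    return (weighted / sum(lengths)) ** 0.5


def homogeneity_statistic(v_observed, v_simulated):
    """Equation (12): H from the observed V and the simulated V values.

    v_simulated stands in for the N_sim values obtained from regions
    simulated with the fitted Kappa distribution (not implemented here).
    """
    mu_v = statistics.mean(v_simulated)
    sigma_v = statistics.stdev(v_simulated)  # sample std: an assumption
    return (v_observed - mu_v) / sigma_v
```

With these pieces, a region would be accepted as homogeneous in this study when `homogeneity_statistic(...)` returns a value below 1.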

2.7. Automatic Region Revision Algorithm (ARRA)

For a given target site and attribute, the initial flood region formed by the ROI approach often still tests heterogeneous. Many studies have reported this situation, and, subsequently, a region revision process is needed to reduce region heterogeneity by editing the initial group membership [,,,]. The revision process includes steps such as adding, deleting, and replacing site(s) within the initially formed region, with homogeneity tested after each progressive change. In past studies, this has largely been carried out through a heuristic process, meaning there is no set procedure regarding the order of steps or the methodology of revision [,,,]. For our large-scale study, however, it is impractical to proceed via a heuristic process for each region; therefore, an automatic region revision algorithm (ARRA) was designed with the intent of reducing region heterogeneity through an automatic and non-subjective modification of the region membership.

A heterogeneous region is input into the ARRA, and a revised region with improved homogeneity is output from the first iteration. If the output region does not meet the homogeneity criterion (i.e., H < 1), the ARRA can be reapplied to the region to further reduce heterogeneity. Each time the region membership is modified, the homogeneity of the membership increases, but the attribute similarity decreases because the newly added site(s) have larger attribute distance(s) compared to the replaced site. As a region should be formed primarily based on attribute similarity, the number of ARRA iterations needs to be constrained to ensure an appropriate trade-off between region homogeneity and attribute similarity. We perform a sensitivity analysis on the number of ARRA iterations used to revise 186 randomly formed initial pooling regions by counting the number of homogeneous regions produced after each ARRA iteration. From this sensitivity analysis, we determine that a maximum of five iterations of the ARRA should be applied (see Section 3.1. ARRA performance). If, after five iterations of the ARRA, a region still tests heterogeneous, this region is regarded as unable to form a homogeneous region.

Figure 4 illustrates the procedure followed by the ARRA. The L-Moment homogeneity test is embedded in the ARRA and used to identify sites that should be removed and new sites that should be added to achieve the greatest improvement in region homogeneity. The order of searching for a newly added site depends on attribute similarity, such that sites with shorter attribute distances are tested first. The process terminates once an improved region is formed and the 5T region size rule is satisfied.

Figure 4. Schematic of the automatic region revision algorithm (ARRA) process.

2.8. Flood Region Identification Process

For each of the five considered attributes, the process of identifying flood regions is as follows. First, the initial flood region for a study site is identified using the ROI approach, which groups sites based on attribute similarity alone. Specifically, catchment sites having the shortest attribute distance to the target site are pooled into the initial region. The region size is set to 500 station-years of record, which allows for accurate estimation up to the 100-year flood according to the 5T rule. Next, the homogeneity of the initial region is assessed using the L-Moment homogeneity test. If the initial region is heterogeneous, the ARRA is applied to revise region membership, up to a maximum of five iterations. The homogeneity of the revised region is re-evaluated using the homogeneity test. This process was repeated for all 186 study sites, and the total number of homogeneous regions identified for each attribute was determined.
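The control flow of this process can be sketched as a small driver loop. The three callables are hypothetical placeholders for the ROI pooling step, the L-Moment homogeneity test, and a single ARRA iteration; their internals (in particular the ARRA steps shown in Figure 4) are not reproduced here.

```python
def identify_region(target, form_initial_region, h_statistic, revise_region,
                    max_iterations=5, h_limit=1.0):
    """Form an initial region, then revise it until H < h_limit or the cap is hit.

    form_initial_region(target) -> region      (placeholder for ROI pooling)
    h_statistic(region) -> float                (placeholder for Equation (12))
    revise_region(target, region) -> region     (placeholder for one ARRA pass)
    Returns the final region, the number of revisions applied, and whether
    the final region satisfies the homogeneity criterion.
    """
    region = form_initial_region(target)
    iterations = 0
    while h_statistic(region) >= h_limit and iterations < max_iterations:
        region = revise_region(target, region)
        iterations += 1
    return region, iterations, h_statistic(region) < h_limit
```

Running this for every study site and counting the `True` flags would give the number of homogeneous regions for one attribute; the default cap of five iterations follows the limit stated in the text.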

Annual maximum flows for all region members are typically used for the homogeneity test and the subsequent flood quantile estimation. In this study, however, we purposely exclude annual maximum flows at the target site to afford more robust and rigorous evaluations of homogeneity and flood quantiles (i.e., a leave-one-out analysis), and therefore, our methodology can later be applied for ungauged regional analyses.

2.9. Assessing the Accuracy of Regional Flood Quantiles

Estimated regional flood quantiles are compared to the “true” flood quantiles determined from at-site samples. It is common, in practice, to determine “true” quantiles from at-site FFA when the return period of interest is below half the at-site record length (i.e., a 2T rule) []. Comparison of regional and at-site quantiles provides a means to assess the accuracy of regional estimates relative to standard practice.

There were only 11 sites with record lengths greater than 90 years included in this study; therefore, for the purpose of reliable at-site estimation, the return periods selected for comparison could not be extreme quantiles, and we selected a range of 20 to 45 years. For each return period, T, the selected sites were those for which a homogeneous region of size 5T could be identified for every attribute and that had record lengths greater than 2T for reliable at-site estimation. A homogeneous region is easier to form for smaller region sizes; therefore, the number of sites available for analysis differed among return periods, with more sites meeting our criteria at smaller return periods.

Table 1 lists the return periods considered for comparison, the number of sites considered at each return period, and the required record lengths for adequate at-site and regional quantile estimates. It is noteworthy that flood estimation for both at-site and regional methods was subject to sampling uncertainty, with the uncertainty bound decreasing with decreasing return period. Thus, the smaller return periods provided improved reliability for assessing results.
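The record-length requirements implied by the 2T and 5T rules are simple to compute. The sketch below is illustrative only: the function names are hypothetical, and the 5-year step between return periods is an assumption (the text states only the 20 to 45 year range).

```python
def required_record_lengths(return_periods, at_site_factor=2, regional_factor=5):
    """Return the minimum record lengths implied by the 2T and 5T rules."""
    return [
        {
            "T": T,
            "at_site_years": at_site_factor * T,            # 2T rule
            "regional_station_years": regional_factor * T,  # 5T rule
        }
        for T in return_periods
    ]


def is_eligible_at_site(record_years, T, at_site_factor=2):
    """A site qualifies for at-site comparison if its record exceeds 2T."""
    return record_years > at_site_factor * T


# Assumed 5-year steps over the 20 to 45 year range stated in the text.
rows = required_record_lengths(range(20, 50, 5))
```

For T = 45 years, the at-site requirement is 90 years, which is consistent with the 90-year threshold mentioned above.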

Table 1. Required record lengths for at-site and regional estimates at the different return periods used in the analysis.

\text{Relative bias} = \frac{1}{n} \sum_{i = 1}^{n} \frac{Q_{i} - q_{i}}{q_{i}} \times 100\%

(13)

\text{Relative RMSE} = {\left[\frac{1}{n} \sum_{i = 1}^{n} {\left(\frac{Q_{i} - q_{i}}{q_{i}}\right)}^{2}\right]}^{0.5} \times 100\%

(14)

where Q_{i} is the regional quantile estimate for site i, q_{i} is the at-site quantile estimate for site i, and n is the number of sites available for analysis at each return period.
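Equations (13) and (14) translate directly into code. The following is a minimal sketch; the function names are hypothetical, and any quantile values passed in would be illustrative inputs rather than results from this study.

```python
import math


def _check(regional, at_site):
    """Validate that both quantile lists are non-empty and of equal length."""
    if len(regional) != len(at_site) or len(regional) == 0:
        raise ValueError("quantile lists must be non-empty and of equal length")


def relative_bias(regional, at_site):
    """Equation (13): mean relative error, in percent."""
    _check(regional, at_site)
    n = len(regional)
    return 100.0 * sum((Q - q) / q for Q, q in zip(regional, at_site)) / n


def relative_rmse(regional, at_site):
    """Equation (14): root mean square relative error, in percent."""
    _check(regional, at_site)
    n = len(regional)
    mean_sq = sum(((Q - q) / q) ** 2 for Q, q in zip(regional, at_site)) / n
    return 100.0 * math.sqrt(mean_sq)
```

Because positive and negative errors cancel in Equation (13) but not in Equation (14), a set of regional estimates can show near-zero bias while still having a substantial RMSE.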

3. Results and Discussion

3.1. ARRA Performance

Table 2 shows the number of homogeneous regions produced by each attribute for each number of ARRA iterations. When the ARRA is not applied and the regions are formed based on shortest attribute distance alone, only five to ten sites (of 186) form homogeneous regions across all attributes. Forming homogeneous regions based on attribute similarity alone is, therefore, found to be unproductive, and the use of the region revision process (i.e., the ARRA) to revise initial regions is deemed necessary.

Table 2. The number of homogeneous regions identified for each attribute with target region size 500 station years of record. For each ARRA iteration, bold italicized number(s) indicate the best outcome across the five considered attributes; if two attributes tested equally, they were both best outcomes.

Once implemented, the number of homogeneous regions the ARRA identifies non-linearly increases with the number of ARRA iterations for all attributes. In general, the number of homogeneous regions increases significantly for one to three iterations of the ARRA and increases less from four to eight iterations. Two to four iterations of the ARRA results in identification of relatively more homogeneous regions when considering geographical proximity, precipitation, and temperature patterning than for flood seasonality and physiographic attributes. For five or more iterations, monthly precipitation pattern produces the most homogeneous regions.

To determine a suitable threshold for the number of ARRA iterations, an alternative series composed of 186 regions, for which membership was randomly formed (i.e., without the use of attribute similarity), is used and one to eight iterations of the ARRA applied (last column, Table 2). Comparing results between the five attributes and the alternative series, we find that attribute similarity is largely irrelevant to the identification of homogeneous regions after eight iterations. At five iterations, approximately half of the sites form homogeneous regions across all attributes, and the number of regions associated with each attribute remains greater than the alternative series. This suggests reasonable preservation of attribute similarity as a selection criterion. We therefore find a maximum of five iterations of the ARRA to be a suitable balance between maintaining attribute similarity for a region and leveraging the revision power of the ARRA.

With appropriate use of the ARRA (i.e., five iterations), approximately 79 to 99 of the 186 sites identify homogeneous regions across all attributes. This is significantly higher than the five to ten sites identified prior to the use of the ARRA.

3.2. Identification of Homogeneous Regions

When the ARRA is applied five times (or fewer), monthly precipitation pattern identifies the largest number of homogeneous regions among all attributes, followed by temperature pattern, geographical proximity, and flood seasonality. Physiographic variables produce the fewest homogeneous flood regions. Differences among the attribute results are relatively small; the total difference between the two most extreme results (flood seasonality and monthly precipitation pattern) is 21 sites, which is ~11% of the 186 study sites.

Figure 5 shows homogeneous region identification across Canada for each attribute. Note that sites that could not identify a homogeneous region but may be members of another site’s homogeneous region are also indicated in blue. Catchment sites are non-uniformly distributed across Canada, with clusters in southern Canada aligned with urban development and large populations, while remote and sparsely gauged regions are often found in the continental interior and mid to high latitudes of the continental landmass.

Figure 5. Sites achieving homogeneous regions (red) relative to those that did not (blue), shown by geographic location for each attribute. Note that the ARRA was applied up to a maximum of five iterations.

Results are generally similar across all attributes at the national scale, with regionalized discrepancies identified. In general, all attributes readily identify homogeneous regions in eastern Canada, while, in western Canada (particularly the interior and the northern regions), the identification of homogeneous regions is more problematic. Catchment sites in eastern Canada are generally clustered in small geographical areas; therefore, they experience more similar flooding behavior. Site clusters are also found on Vancouver Island and in southeast British Columbia, where a considerable number of homogeneous regions were also identified across all attributes.

As catchment sites in eastern Canada are more tightly clustered, less variability in flood attributes is expected. Figure 6 presents three boxplots comparing catchment physiographic variables between eastern and western sites with respect to catchment area, water body area in the catchment, and standard deviation of elevation across the catchment. The variability in physiographic attributes for the eastern sites is noticeably less than that for the western sites, particularly for the standard deviation of elevation across the catchment.

Figure 6. Boxplots comparing physiographic variability of eastern and western sites across Canada. Eastern (Western) statistics are computed based on 76 (110) sites; the Ontario-Manitoba border is considered the east–west divide. Boxes represent the 25th and 75th percentiles and the median (black line); whiskers extend to extreme values without outliers, where outliers are defined using 1.5 times the interquartile range (outliers are removed for scaling purposes).

Some site clusters are found across the southern Canadian prairies in Alberta, Saskatchewan, and Manitoba, where annual peak floods occur predominantly during the spring snowmelt period. The geographical proximity attribute is typically effective when sites are clustered. Important nival regime influences, such as snowpack accumulation and timing and rate of snowmelt, are reflected in attributes such as monthly precipitation pattern, monthly temperature pattern, and flood seasonality. The regional pooling results, however, show that not many catchment sites within the cluster groups identified homogeneous regions across all attributes. Though site clusters are found in both eastern Canada and the southern prairie region, homogeneity often occurs across large regions, indicating that geographical contiguity cannot always guarantee effective homogeneous region identification.

Literature indicates that the Canadian Prairies are known for their hydrological complexity, mainly attributed to the presence of potholes and hummocks, which results in highly variable fill and spill runoff processes and dynamic effective drainage areas, leading to highly non-linear flood generating mechanisms [,,,]. Zhang et al. [] provided statistical evidence that annual maximum flows from prairie RHBN stations are also difficult to fit adequately with robust distributions, including GEV, log Pearson type III, and generalized logistic distributions. This is a strong indication of multiple flood responses occurring at a single catchment site. Ehsanzadeh et al. [] studied prairie flood response based on nine prairie sites and revealed noteworthy non-linear flood frequency curves.

In addition, flood record length across the Prairies is generally limited (Figure 2). The average record length over 28 prairie sites is 25 years, which is substantially lower than that of the remaining 158 sites examined across Canada (which have an average record length of 52 years). In order to develop a flood region with 500 station years, more catchment sites must be pooled into these flood regions. This adds an additional challenge for developing homogeneous regions, since more sites lead to more variable flood responses within the flood region.

In Burn [,], where homogeneous regions were successfully formed for southern Manitoba, region identification did not rely on attribute similarity measures alone. A heuristic membership revision process was applied with subjective trial and error to improve region homogeneity. Such region revision approaches are more statistically rigorous than our proposed ARRA; however, they require practitioners to have sound knowledge of local hydrology and are more statistically sophisticated []. Our method, on the other hand, is designed to be accessible to all water resource practitioners seeking to perform flood frequency analysis.

For the mountainous western Canada region, annual peak floods are predominantly snowmelt and rain-on-snow regimes, though rare heavy rainstorms can also trigger annual peak floods in smaller basins []. Homogeneous region identification maps are noisy along the cordillera mountain chain for all attributes; that is, it is difficult to interpret a distinctive spatial pattern. In southern British Columbia and Alberta, some sites identify homogeneous regions; however, locations of these sites differ amongst attributes. In central British Columbia and Alberta, and the south of the Northwest Territories, only flood seasonality and monthly precipitation patterns identify homogeneous regions, and just for a few sites. The western mountain chain is subject to highly variable climate and basin characteristics. Flood generation mechanisms are influenced by combined basin features including catchment size, drainage topography (e.g., channel slope, floodplains, alluvial fans, canyons), localized snow accumulation and distribution, as well as glaciation and avalanches []. These basin features, as well as temperature and precipitation, are highly variable spatially and temporally in mountainous environments []. Attributes selected in this study capture flood behavior from a limited set of physiographic characteristics and are likely not rigorous enough for catchment regionalization in the mountains.

In northern Manitoba, Northwest Territories, and Nunavut, catchment sites are characterized by cold subarctic climate, barren and tundra rolling landscape, as well as long-lasting (five or six months of the year) snow and ice cover underlain by permafrost []. Annual peak floods are primarily snowmelt driven; therefore, the duration and the rate of snowmelt are key characteristics for grouping catchment sites. Homogeneous region identification shows that monthly temperature pattern is more effective than other attributes because it captures timing, rate, and duration of snowmelt driven flood behavior. Some sites also identify homogeneous regions using flood seasonality, possibly because duration and rate of snowmelt are inherently correlated with the average and the variation of peak flood dates.

We find two general and probable causes that account for the inability to identify homogeneous flood regions in Canada. First, the clustering or the proximity of gauge sites has considerable impact on the outcome of identifying homogeneous regions, regardless of the attribute considered. The tendency for attributes to be more similar within smaller geographical regions is significant, despite the fact that regions (for attributes other than geographical proximity) can also include sites that are non-proximal. Second, each attribute selected in this study measures a distinctive hydrological feature; however, across large spatial domains (e.g., the Canadian landmass), there exist significant local-scale hydrological complexities that influence flood generation mechanisms. For sites that are influenced by multiple hydrological characteristics, our attribute selection is not rigorous enough to capture the particulars of flood behavior and is thus unable to group catchment sites with similar flood frequency regimes. Related to this, Table 2 indicates that most sites identify homogeneous regions as an outcome of ARRA iterations; the ARRA revises region membership based on a specified attribute. If the specified attribute does not capture primary flood characteristics, the subsequent ARRA enhancement becomes ineffective.

3.3. Analyzing Membership Characteristics

To gain insight into membership characteristics, two catchment sites along with their region memberships were selected for more detailed case studies. Flood regions for both sites are identified based on flood seasonality and five ARRA iterations, with only one of the two regions being homogeneous.

The target catchment site, Water Survey of Canada (WSC) gauge 03MB002 (Whale River at 40.2 km from the Mouth) in northern Québec, cannot identify a homogeneous flood region with the flood seasonality attribute and ARRA iterations. This site and its 12 members are plotted geographically (Figure 7a) and in flood seasonality space (Figure 7b) and are summarized in Table 3 based on physiography. Figure 8 provides a group of boxplots showing the spread of physiographical variables of this flood region. Flood seasonality space indicates that member sites share similar annual average occurrence dates for flooding, resulting in the formation of this flood region.

Figure 7. Region membership for study sites 03MB002 and 06DA004, presented by (a) geographical extent and in (b) flood seasonality space. Members for the 03MB002 (06DA004) region are labeled in (a) with numbers (letters) as they are referenced in Table 3.
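For readers unfamiliar with flood seasonality space, the sketch below shows one common way to summarize peak flood dates with directional (circular) statistics: each date is mapped to an angle on the unit circle, and the mean date and a regularity measure r are derived from the average position. This is an assumed formulation for illustration; the definition used in this study is given in the methods and may differ in detail, and the function name is hypothetical.

```python
import math


def flood_seasonality(days_of_year, year_length=365.0):
    """Return (mean_day, r) for a list of annual peak flood dates.

    mean_day is the average date of occurrence (in days); r is the mean
    resultant length in [0, 1], where values near 1 indicate that floods
    recur at nearly the same time each year.
    """
    if not days_of_year:
        raise ValueError("at least one flood date is required")
    # Map each day of the year to an angle in radians.
    angles = [2.0 * math.pi * d / year_length for d in days_of_year]
    x = sum(math.cos(a) for a in angles) / len(angles)
    y = sum(math.sin(a) for a in angles) / len(angles)
    # Mean direction, wrapped to [0, 2*pi), converted back to days.
    mean_angle = math.atan2(y, x) % (2.0 * math.pi)
    mean_day = mean_angle * year_length / (2.0 * math.pi)
    r = math.hypot(x, y)
    return mean_day, r
```

Under this formulation, sites that plot close together share both a similar mean flood date and a similar regularity, which is the sense in which member sites are described as similar in flood seasonality space.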

Table 3. Physiographic variables for WSC 03MB002 and WSC 06DA004 and their regions formed based on flood seasonality.

Figure 8. Boxplots for physiographic variability of the 03MB002 and 06DA004 flood regions. Boxes represent the 25th/75th percentiles with the median (black line); whiskers extend to the extreme values without outliers; outliers (circles in plot) are defined using 1.5 times the interquartile range.

Based on geographical proximity (Figure 7a), site membership is supported from a climatic perspective. The map shows that target and member sites broadly span across Canada, from Pacific to Atlantic and from southern British Columbia to the northern edge of the Northwest Territories. All member sites are, however, situated near an ocean or a coastal region and receive substantial annual precipitation (see Table 3). Since member sites span a broad range of latitude, there is variation in annual temperature range that alters the amount and the temporal distribution of rain and snowfall, thus affecting the dominant runoff mechanism during the annual peak flood season. Expected differences in flood behavior are also reflected in the varying physiography among member sites (Table 3).

Four member sites in southeastern British Columbia have high mean annual precipitation and noticeably higher mean annual temperatures compared to other members further north. These four member sites are exposed to more pluvial or mixed rain-on-snow floods. Basin area also substantially varies among the membership; five member sites are small basins (i.e., <500 km²), whereas the seven others and the target site have basin areas ranging between 3500 km² and 49,000 km². Four of the five small basins are located in southeastern British Columbia. The basin compactness ratio (BasinArea/Perimeter²) is a surrogate measure for time to peak flow and is significantly greater (as expected) for the smaller basins, indicating much shorter routing times than what is seen for the larger basins. A wide spectrum of mean basin slope also exists, ranging from 3.3% to 45.4%, across member sites. Smaller basins in British Columbia mountains are remarkably steeper than member sites from other areas of Canada. Mean basin slope affects time to peak flow, as well as runoff ratios. Member sites that are highly variable in such physiographic characteristics are less likely to exhibit similar flood behavior.
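As a brief illustration of the compactness ratio, the sketch below computes area divided by perimeter squared and compares it with the value for a circle, 1/(4π) ≈ 0.0796, which is the largest value any shape can attain. The function name and the example geometries are hypothetical and are not taken from Table 3.

```python
import math


def compactness_ratio(area_km2, perimeter_km):
    """Basin compactness ratio: area / perimeter^2 (dimensionless)."""
    if perimeter_km <= 0:
        raise ValueError("perimeter must be positive")
    return area_km2 / perimeter_km ** 2


# Upper bound, attained by a perfectly circular basin.
CIRCLE_RATIO = 1.0 / (4.0 * math.pi)

# Idealized shapes for comparison (not study basins).
circle = compactness_ratio(math.pi * 10.0 ** 2, 2.0 * math.pi * 10.0)
square = compactness_ratio(10.0 * 10.0, 4.0 * 10.0)
elongated = compactness_ratio(100.0 * 1.0, 2.0 * (100.0 + 1.0))
```

The more elongated or irregular the basin outline, the smaller the ratio becomes relative to the circular limit.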

In contrast, target catchment site WSC gauge 06DA004 (Geikie River below Wheeler River) identifies a homogeneous region consisting of 12 catchment sites, excluding the target site (i.e., leave-one-out analysis). The target site is in northern Saskatchewan with very few other sites nearby. The climate is described as cold and sub-arctic, with physiography consisting of flat to rolling topography with numerous surface water bodies present in the catchment. The cold, sub-arctic climate causes annual peak flooding that is predominantly snowmelt driven; the amount of accumulated winter snowpack, as well as the timing and rate of snowmelt, are influential to flood generation. The geographical extent of the 12-site membership is shown (Figure 7a), along with flood seasonality (Figure 7b), physiographic values (Table 3), and boxplots of physiographical values (Figure 8). Flood seasonality space indicates the membership has good consistency in the regularity of date of occurrence, suggesting these 12 member sites likely have similar flood type and characteristics. Geographically, member sites are situated in the interior of Canada with most located in mid-to-northern Alberta, Saskatchewan, and Manitoba. This area is subject to a prolonged, colder, sub-arctic climate; hence the annual peak flood is a nival flood regime. Although catchment area and perimeter span a large range (Table 3 and Figure 8), catchment compactness ratio, mean basin slope, mean annual precipitation, and mean annual temperature are within the same order of magnitude. The spread of the 06DA004 box plots is noticeably smaller than the spread observed for the 03MB002 region among all physiographic variables, which reflects their contrasting results in terms of homogeneity.

The above case studies provide examples of the application of catchment physiographic variables to further investigate membership characteristics, which can potentially diagnose causes for region homogeneity or heterogeneity. We conducted a similar member physiographic analysis for the other considered attributes, and for prairie and mountainous sites that cannot identify homogeneous regions. Member sites in heterogeneous regions often displayed large physiographic variability. Therefore, it is generally found that our selected attributes and ARRA regionalization approach are not rigorous enough to identify homogeneous flood regions for catchments with significant hydrological complexity, which, in Canada, are those primarily located in prairie and mountainous regions.

3.4. Predictive Measures for Regional Quantile Estimation

Predictive measures for regional quantile estimation are presented as relative bias and relative RMSE for return periods ranging from 20 to 45 years (Table 4 and Figure 9). In general, relative bias across all considered attributes is small for all return periods (ranging from −0.6% to 3.7%). As biases are within ±5% deviation, regional estimation accuracy is considered satisfactory. Bias is generally positive, suggesting that regional estimates tend to overestimate “true” flood quantiles, but it is uncorrelated with the magnitude of the flood quantile. Comparing bias among attributes, flood seasonality and physiographic variables generally exhibit larger bias than geographical proximity, monthly precipitation pattern, and monthly temperature pattern.

Table 4. Relative bias and relative RMSE performance measures (in percentages) for quantiles produced from regionalized estimates. Bold italicized numbers indicate the best outcome for each return period.

Figure 9. Relative bias and relative RMSE results by return period.

RMSE generally increases with increasing return period across all attributes. Similar RMSE among attributes is found within each return period equal to and less than 35 years. At larger return periods (i.e., 40 and 45 years), flood seasonality and physiographic variables show noticeably larger RMSE than geographical proximity, monthly precipitation pattern, and monthly temperature pattern attributes. Though the “true” quantile is modeled by at-site estimates using accepted methods, estimation uncertainties caused by statistical extrapolation increase with increasing quantiles for both at-site and regional estimates. Therefore, higher relative RMSE at larger return periods is anticipated.

Geographical proximity, monthly precipitation pattern, and monthly temperature pattern perform better across both metrics than flood seasonality and physiographic variables, possibly because regions identified based on the first three attributes often have a higher degree of geographic proximity. Flood seasonality and physiographic measures end up grouping sites across a wider geographical extent; therefore, the degree of hydrological similarity between sites may be lower, resulting in slightly poorer (but acceptable) regional flood estimation results.

Overall, all considered attributes produced satisfactory regional flood quantile estimates for Canada based on an acceptable range of bias and a reasonable range of estimation uncertainty. The success in regional quantile estimation demonstrates the applicability of the proposed regionalization process based on ROI and ARRA.

4. Conclusions

This study provides insight into the behavior of five distinctive flood-related attributes in identifying homogeneous flood regions across Canada. All considered attributes show similar results regarding the number of homogeneous regions identified and the locations where homogeneous regions could be identified. In general, the success of homogeneous region identification depends on local hydrological complexities, on whether the considered attribute reflects primary flood generation mechanisms, and on the geographic clustering of the sites.

Through combinations of these factors, results of homogeneous region identification are highly distinctive when mapped for Canada. Catchment sites in eastern Canada are generally clustered in small geographic regions and are more likely to exist within similar hydrological environments. Annual peak floods in northern Canada are predominantly snowmelt driven and therefore sensitive to temperature variation, making monthly temperature pattern important for homogeneous region identification. The Prairie region and the western mountains are subject to highly variable physiographic characteristics, resulting in difficulty in identifying homogeneous regions, regardless of the attribute considered.

Use of a regionalization revision process to revise initial group membership was found to be important. We proposed an automated process, the ARRA, to efficiently revise group membership across large domains and showed it successfully increased the number of homogeneous regions. Flood quantiles obtained from the identified homogeneous regions were reasonably close to estimated at-site “true” quantiles, further demonstrating success of the regionalization process. The ARRA can be readily adopted for other types of regionalization frameworks (e.g., clustering) when subsequent region revision is required.

Findings of this study, on the basis of 186 catchment sites across Canada, provide valuable input on the identification of homogeneous flood regions as well as their attribute behaviors and spatial characteristics. The success of identifying homogeneous flood regions is essential for RFFA and thus for reliable flood quantile estimation. Within the FloodNet project, work on refining RFFA techniques will aid in appropriate sizing of flood resilient infrastructures, which is crucial to proactive protection of lives and properties against flood risk.

Author Contributions

Conceptualization, Z.Z. and T.A.S.; Methodology, Z.Z. and T.A.S.; Software, Z.Z.; Validation, Z.Z. and T.A.S.; Formal Analysis, Z.Z.; Investigation, Z.Z. and T.A.S.; Resources, T.A.S.; Data Curation, Z.Z.; Writing—Original Draft Preparation, Z.Z.; Writing—Review and Editing, Z.Z. and T.A.S.; Visualization, Z.Z. and T.A.S.; Supervision, T.A.S.; Project Administration, Z.Z.; Funding Acquisition, T.A.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by (i) the University of Manitoba through Graduate Enhancement of Tri-Council Stipends, and (ii) The Natural Sciences and Engineering Research Council (NSERC) of Canada through Canadian FloodNet Project (Grant number: NETGP 451456).

Acknowledgments

The authors gratefully acknowledge Donald H. Burn for providing valuable guidance for this study. The authors extend gratitude to Erika Klyszejko with data support from Environment and Climate Change Canada. The authors acknowledge the helpful comments from five anonymous reviewers that contributed to improving this manuscript. The authors would like to dedicate this study to the memory of Peter F. Rasmussen, who was a key contributor to the journey of this research.

Conflicts of Interest

The authors declare no conflict of interest.

References

Salinas, J.L.; Castellarin, A.; Viglione, A.; Kohnová, S.; Kjeldsen, T.R. Regional parent flood frequency distributions in Europe—Part 1: Is the GEV model suitable as a pan-European parent? Hydrol. Earth Syst. Sci. 2014, 18, 4381–4389.
Salinas, J.L.; Castellarin, A.; Kohnová, S.; Kjeldsen, T.R. Regional parent flood frequency distributions in Europe—Part 2: Climate and scale controls. Hydrol. Earth Syst. Sci. 2014, 18, 4391–4401.
Ball, J.; Babister, M.; Nathan, R.; Weeks, W.; Weinmann, E.; Retallick, M.; Testoni, I. Australian Rainfall and Runoff: A Guide to Flood Estimation; Commonwealth of Australia (Geoscience Australia): Barton, Australia, 2016.
Robson, A.; Reed, D. Statistical Procedures for Flood Frequency Estimation, Flood Estimation Handbook, Vol. 3; Centre for Ecology & Hydrology: Wallingford, UK, 1999.
England, J.F., Jr.; Cohn, T.A.; Faber, B.A.; Stedinger, J.R.; Thomas, W.O., Jr.; Veilleux, A.G.; Kiang, J.E.; Mason, R.R., Jr. Guidelines for determining flood flow frequency–Bulletin 17C. In U.S. Geological Survey Techniques and Methods, Book 4, Chap. B; U.S. Geological Survey: Reston, VA, USA, 2018.
Public Safety Canada Canadian Disaster Database. Available online: https://www.publicsafety.gc.ca/cnt/rsrcs/cndn-dsstr-dtbs/index-en.aspx (accessed on 13 November 2019).
Blöschl, G.; Bierkens, M.F.P.; Chambel, A.; Cudennec, C.; Destouni, G.; Fiori, A.; Kirchner, J.W.; McDonnell, J.J.; Savenije, H.H.G.; Sivapalan, M.; et al. Twenty-three unsolved problems in hydrology (UPH)—A community perspective. Hydrol. Sci. J. 2019, 64, 1141–1158.
Watt, W.E. Hydrology of Floods in Canada—A Guide to Planning and Design-NRC Publications Archive-National Research Council Canada; National Research Council Canada, Associate Committee on Hydrology: Ottawa, ON, Canada, 1989; ISBN 06660128764.
Moudrak, N.; Feltmate, B. Preventing Disaster before it Strikes: Developing a Canadian Standard for New Flood-Resilient Residential Communities; Intact Centre on Climate Adaptation, University of Waterloo: Waterloo, ON, Canada, 2017.
Water Survey of Canada Environment Canada Data Explorer-HYDAT Database 2020. Available online: https://www.canada.ca/en/environment-climate-change/services/water-overview/quantity/monitoring/survey/data-products-services/explorer.html (accessed on 16 December 2018).
GREHYS. Presentation and review of some methods for regional flood frequency analysis. J. Hydrol. 1996, 186, 63–84.
Burn, D.H.; Goel, N.K. The formation of groups for regional flood frequency analysis. Hydrol. Sci. J. 2000, 45, 97–112.
Wallis, J.R.; Wood, E.F. Relative accuracy of log Pearson III procedures. J. Hydraul. Eng. 1985, 111, 1043–1056.
Potter, K.W.; Lettenmaier, D.P. A comparison of regional flood frequency estimation methods using a resampling method. Water Resour. Res. 1990, 26, 415–424.
Stedinger, J.R.; Lu, L.H. Appraisal of regional and index flood quantile estimators. Stoch. Hydrol. Hydraul. 1995, 9, 49–75.
Hosking, J.R.M.; Wallis, J.R. Regional Frequency Analysis—An Approach Based on L-Moments; Cambridge University Press: Cambridge, UK, 1997; ISBN 0521019400.
Yang, T.; Xu, C.-Y.; Shao, Q.-X.; Chen, X. Regional flood frequency and spatial patterns analysis in the Pearl River Delta region using L-Moments approach. Stoch. Environ. Res. Risk Assess. 2010, 24, 165–182.
Jingyi, Z.; Hall, M.J. Regional flood frequency analysis for the Gan-Ming River basin in China. J. Hydrol. 2004, 296, 98–117.
Merz, R.; Blöschl, G. Flood frequency regionalisation - spatial proximity vs. catchment attributes. J. Hydrol. 2005, 302, 283–306.
Mediero, L.; Kjeldsen, T.R.; Macdonald, N.; Kohnova, S.; Merz, B.; Vorogushyn, S.; Wilson, D.; Alburquerque, T.; Blöschl, G.; Bogdanowicz, E.; et al. Identification of coherent flood regions across Europe by using the longest streamflow records. J. Hydrol. 2015, 528, 341–360.
Wazneh, H.; Chebana, F.; Ouarda, T.B.M.J. Identification of hydrological neighborhoods for regional flood frequency analysis using statistical depth function. Adv. Water Resour. 2016, 94, 251–263.
Zadeh, S.M.; Burn, D.H. A super region approach to improve pooled flood frequency analysis. Can. Water Resour. J. 2019, 44, 146–159.
Durocher, M.; Burn, D.H.; Mostofi Zadeh, S. A nationwide regional flood frequency analysis at ungauged sites using ROI/GLS with copulas and super regions. J. Hydrol. 2018, 567, 191–202.
Durocher, M.; Burn, D.H.; Mostofi Zadeh, S.; Ashkar, F. Estimating flood quantiles at ungauged sites using nonparametric regression methods with spatial components. Hydrol. Sci. J. 2019, 64, 1056–1070.
U.S. Water Resources Council. Guidelines for Determining Flood Flow Frequency, Bulletin 17B; Hydrology Committee: Reston, VA, USA, 1982.
Muhammad, A.; Evenson, G.R.; Stadnyk, T.A.; Boluwade, A.; Jha, S.K.; Coulibaly, P. Assessing the importance of potholes in the Canadian Prairie Region under future climate change scenarios. Water 2018, 10, 1657.
Whitfield, P.H.; Shook, K.R.; Pomeroy, J.W. Spatial patterns of temporal changes in Canadian Prairie streamflow using an alternative trend assessment approach. J. Hydrol. 2020, 582, 124541.
Burn, D.H. Delineation of groups for regional flood frequency analysis. J. Hydrol. 1988, 104, 345–364.
Burn, D.H. Cluster analysis as applied to regional flood frequency. J. Water Resour. Plan. Manag. 1989, 115, 567–582.
Zrinji, Z.; Burn, D.H. Regional flood frequency with hierarchical region of influence. J. Water Resour. Plan. Manag. 1996, 122, 245.
Sandrock, G.; Viraraghavan, T.; Fuller, G.A. Estimation of peak flows for natural ungauged watersheds in southern Saskatchewan. Can. Water Resour. J. 1992, 17, 21–31.
Ouarda, T.; Girard, C.; Cavadias, G.S.; Bobee, B. Regional flood frequency estimation with canonical correlation analysis. J. Hydrol. 2001, 254, 157–173.
El-Jabi, N.; Caissie, D.; Turkkan, N. Flood analysis and flood projections under climate change in New Brunswick. Can. Water Resour. J. 2016, 41, 319–330.
Faulkner, D.; Warren, S.; Burn, D. Design floods for all of Canada. Can. Water Resour. J. 2016, 41, 398–411.
Sandink, D.; Kovacs, P.; Oulahen, G.; McGillivray, G. Making Flood Insurable for Canadian Homeowners: A Discussion Paper; Institute for Catastrophic Loss Reduction & Swiss Reinsurance Company Ltd.: Toronto, ON, Canada, 2010.
Zahmatkesh, Z.; Jha, S.K.; Coulibaly, P.; Stadnyk, T. An overview of river flood forecasting procedures in Canadian watersheds. Can. Water Resour. J. 2019, 44, 213–229.
Aucoin, F.; Caissie, D.; El-Jabi, N.; Turkkan, N. Flood Frequency Analyses for New Brunswick Rivers; Fisheries and Oceans Canada: Moncton, NB, Canada, 2011.
FloodNet Floodnet–NSERC Network–Enhanced Flood Forecasting and Management Capacity in Canada. Available online: https://www.nsercfloodnet.ca/ (accessed on 11 March 2020).
Zhang, Z.; Stadnyk, T.A.; Burn, D.H. Identification of a preferred statistical distribution for at-site flood frequency analysis in Canada. Can. Water Resour. J./Rev. Can. Ressour. Hydr. 2020, 45, 43–58.
Ashkar, F.; El Adlouni, S.E. Adjusting for small-sample non-normality of design event estimators under a generalized Pareto distribution. J. Hydrol. 2015, 530, 384–391.
Ashkar, F.; Ba, I. Selection between the generalized Pareto and kappa distributions in peaks-over-threshold hydrological frequency modelling. Hydrol. Sci. J. 2017, 62, 1167–1180.
Durocher, M.; Zadeh, S.M.; Burn, D.H.; Ashkar, F. Comparison of automatic procedures for selecting flood peaks over threshold based on goodness-of-fit tests. Hydrol. Process. 2018, 32, 2874–2887.
Durocher, M.; Burn, D.H.; Ashkar, F. Comparison of Estimation Methods for a Nonstationary Index-Flood Model in Flood Frequency Analysis Using Peaks Over Threshold. Water Resour. Res. 2019, 55, 9398–9416. [Google Scholar] [CrossRef]
Mostofi Zadeh, S.; Durocher, M.; Burn, D.H.; Ashkar, F. Pooled flood frequency analysis: A comparison based on peaks-over-threshold and annual maximum series. Hydrol. Sci. J. 2019, 64, 121–136. [Google Scholar] [CrossRef]
Burn, D.H. An appraisal of the “region of influence” approach to flood frequency analysis. Hydrol. Sci. J. 1990, 35, 149–165. [Google Scholar] [CrossRef]
Burn, D.H. Evaluation of regional flood frequency analysis with a region of influence approach. Water Resour. Res. 1990, 26, 2257–2265. [Google Scholar] [CrossRef]
Burn, D.H.; Whitfield, P.H. Changes in floods and flood regimes in Canada. Can. Water Resour. J. 2016, 41, 139–150. [Google Scholar] [CrossRef]
Burn, D.H.; Whitfield, P.H.; Sharif, M. Identification of changes in floods and flood regimes in Canada using a peaks over threshold approach. Hydrol. Process. 2016, 30, 3303–3314. [Google Scholar] [CrossRef]
Burn, D.H. Catchment similarity for regional flood frequency analysis using seasonality measures. J. Hydrol. 1997, 202, 212–230. [Google Scholar] [CrossRef]
Burn, D.H.; Zrinji, Z.; Kowalchuk, M. Regionalization of catchments for regional flood frequency analysis. J. Hydrol. Eng. 1997, 2, 76–82. [Google Scholar] [CrossRef]
Klyszejko, E. (Environment and Climate Change Canada, Ottawa, Canada). Personal communication, 2016.
McKenney, D.W.; Pedlar, J.H.; Papadopol, P.; Hutchinson, M.F. The development of 1901-2000 historical monthly climate models for Canada and the United States. Agric. For. Meteorol. 2006, 138, 69–81. [Google Scholar] [CrossRef]
Historical Monthly Climate Grids for North America. Natural Resources Canada. Available online: https://cfs.nrcan.gc.ca/projects/3/3 (accessed on 22 April 2019).
Buttle, J.M.; Allen, D.M.; Caissie, D.; Davison, B.; Hayashi, M.; Peters, D.L.; Pomeroy, J.W.; Simonovic, S.; St-Hilaire, A.; Whitfield, P.H. Flood processes in Canada: Regional and special aspects. Can. Water Resour. J. 2016, 41, 7–30. [Google Scholar] [CrossRef]
Brimley, B.; Cantin, J.F.; Harvey, D.; Kowalchuk, M.; Marsh, P.; Ouarda, T.M.B.J.; Phinney, B.; Pilon, P.; Renouf, M.; Tassone, B.; et al. Establishment of the Reference Hydrometric Basin Network (RHBN) for Canada; Environment Canada: Ottawa, ON, Canada, 1999.
Whitfield, P.H.; Burn, D.H.; Hannaford, J.; Higgins, H.; Hodgkins, G.A.; Marsh, T.; Looser, U. Reference hydrologic networks I. The status and potential future directions of national reference hydrologic networks for detecting trends. Hydrol. Sci. J. 2012, 57, 1562–1579. [Google Scholar] [CrossRef]
Nathan, R.J.; McMahon, T.A. Identification of homogeneous regions for the purposes of regionalisation. J. Hydrol. 1990, 121, 217–238. [Google Scholar] [CrossRef]
Noto, L.V.; Loggia, G. La Use of L-moments approach for regional flood frequency analysis in Sicily, Italy. Water Resour. Manag. 2009, 23, 2207–2229. [Google Scholar] [CrossRef]
Eslamian, S.S.; Hosseinipour, E.Z. A modified region of influence approach for flood regionalization. In Proceedings of the World Environmental and Water Resources Congress 2010: Challenges of Change, Providence, RI, USA, 16–20 May 2010; pp. 2388–2414. [Google Scholar]
Vogel, R.M.; Fennessey, N.M. L moment diagrams should replace product moment diagrams. Water Resour. Res. 1993, 29, 1745–1752. [Google Scholar] [CrossRef]
Bobée, B.; Rasmussen, P.F. Recent advances in flood frequency analysis. Rev. Geophys. 1995, 33, 1111–1116. [Google Scholar] [CrossRef]
Zrinji, Z.; Burn, D.H. Flood frequency analysis for ungauged sites using a region of influence approach. J. Hydrol. 1994, 153, 1–21. [Google Scholar] [CrossRef]
Atiem, I.A.; Harmancioǧlu, N.B. Assessment of regional floods using L-moments approach: The case of the River Nile. Water Resour. Manag. 2006, 20, 723–747. [Google Scholar] [CrossRef]
Shook, K.R.; Pomeroy, J.W. Memory effects of depressional storage in Northern Prairie hydrology. Hydrol. Process. 2011, 25, 3890–3898. [Google Scholar] [CrossRef]
Muhammad, A.; Evenson, G.R.; Stadnyk, T.A.; Boluwade, A.; Jha, S.K.; Coulibaly, P. Impact of model structure on the accuracy of hydrological modeling of a Canadian Prairie watershed. J. Hydrol. Reg. Stud. 2019, 21, 40–56. [Google Scholar] [CrossRef]
Ehsanzadeh, E.; Spence, C.; van der Kamp, G.; McConkey, B. On the behaviour of dynamic contributing areas and flood frequency curves in North American Prairie watersheds. J. Hydrol. 2012, 414–415, 364–373. [Google Scholar] [CrossRef]
Prieto, C.; Le Vine, N.; Kavetski, D.; García, E.; Medina, R. Flow Prediction in Ungauged Catchments Using Probabilistic Random Forests Regionalization and New Statistical Adequacy Tests. Water Resour. Res. 2019, 55, 4364–4392. [Google Scholar] [CrossRef]

Figure 1. Geographical location of the 186 study sites, identifying the primary cause of flood response.

Figure 2. Distribution of record length for the 186 flood samples. Sites in the ten provinces are plotted in order of longitude from west (left) to east (right). Sites in the three territories are plotted at the far right of the figure, with three sites in northeast Québec embedded within the Atlantic provinces.

Figure 3. The 186 study sites plotted in flood seasonality space.

Figure 4. Schematic of the automatic region revision algorithm (ARRA) process.

Figure 5. Sites achieving homogeneous regions (red) relative to those that did not (blue), shown by geographic location for each attribute. Note that ARRA was applied up to a maximum of five iterations.

Figure 6. Boxplots comparing physiographic variability of eastern and western sites across Canada. Eastern (western) statistics are computed from 76 (110) sites, with the Ontario-Manitoba border taken as the east–west divide. Boxes represent the 25th and 75th percentiles and the median (black line); whiskers extend to the most extreme values that are not outliers, where outliers are values beyond 1.5 times the interquartile range from the box (outliers are removed for scaling purposes).

Figure 7. Region membership for study sites 03MB002 and 06DA004, presented in (a) geographical extent and (b) flood seasonality space. Members of the 03MB002 (06DA004) region are labeled in (a) with numbers (letters), as referenced in Table 3.

Figure 8. Boxplots for physiographic variability of the 03MB002 and 06DA004 flood regions. Boxes represent the 25th and 75th percentiles and the median (black line); whiskers extend to the most extreme values that are not outliers; outliers (circles in the plot) are values beyond 1.5 times the interquartile range from the box.

Figure 9. Relative bias and relative RMSE results by return period.

Table 1. Required record lengths for at-site and regional estimates at the return periods used in the analysis.

Return Period for Comparison (years)	Required Record Length for At-Site Estimate (years)	Number of Sites Available	Station-Years of Record for Regional Estimate
20	40	88	100
25	50	47	125
30	60	29	150
35	70	15	175
40	80	14	200
45	90	11	225
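
The two record-length columns in Table 1 follow a simple pattern: the at-site requirement is twice the return period, and the regional requirement is five times the return period. The short sketch below reproduces those columns. The function name and the 2T and 5T multipliers are inferred from the tabulated values, not taken from the authors' code.

```python
def required_record_lengths(return_period_years):
    """Return (at-site years, regional station-years) for a return period.

    Assumption: multipliers of 2 and 5 are inferred from Table 1.
    """
    at_site_years = 2 * return_period_years
    regional_station_years = 5 * return_period_years
    return at_site_years, regional_station_years


# Reproduce the record-length columns of Table 1.
for T in (20, 25, 30, 35, 40, 45):
    at_site, regional = required_record_lengths(T)
    print(f"T={T}: at-site {at_site} years, regional {regional} station-years")
```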

Table 2. The number of homogeneous regions identified for each attribute with a target region size of 500 station-years of record. For each ARRA iteration, bold italicized number(s) indicate the best outcome across the five considered attributes; if two attributes performed equally well, both are marked as best outcomes.

Number of ARRA Iterations	Geographical Proximity	Flood Seasonality	Physiographic Variables	Monthly Precipitation Pattern	Monthly Temperature Pattern	Alternative Series (Initial Regions Randomly Formed)
0	10	6	5	6	10	0
1	26	22	17	23	21	1
2	49	43	35	50	54	9
3	70	50	52	69	80	22
4	83	66	69	82	88	43
5	89	78	83	99	94	63
6	97	98	97	110	97	74
7	106	110	104	118	105	98
8	106	116	106	120	109	112

Table 3. Physiographic variables for WSC 03MB002 and WSC 06DA004 and their regions formed based on flood seasonality.

WSC ID	Flood Region of	Province	Map ID	Catchment Area (km²)	Catchment Perimeter (km)	Compactness Ratio (Area/Perimeter²) (%)	Mean Basin Slope (%)	Mean Annual Precipitation (mm)	Mean Annual Temperature (°C)
03MB002	Target Site	QC	Target Site	29,124	1417	1.5	2.6	732.3	−4.6
03KC004	03MB002	QC	1	39,371	1901	1.1	3.3	654.9	−5.4
03MD001	03MB002	QC	2	22,440	1627	0.8	3.5	815.2	−4.4
03NF001	03MB002	NL	3	7322	780.6	1.2	8.2	881.0	−3.8
10LA002	03MB002	NT	4	18,746	1173	1.4	18.1	385.9	−6.6
10ND002	03MB002	NT	5	65	44.4	3.3	3.5	220.0	−8.7
09BC001	03MB002	YT	6	48,867	1752	1.6	15.9	456.5	−3.9
08CD001	03MB002	BC	7	3555	488.9	1.5	7.4	562.1	−1.8
07EC002	03MB002	BC	8	5559	597.8	1.6	23.2	648.7	0.2
08NE006	03MB002	BC	9	330	103.3	3.1	45.4	1326	1.4
08NF001	03MB002	BC	10	416	105.1	3.8	31.3	796.1	0.0
08NH005	03MB002	BC	11	442	130.5	2.6	44.5	1218	1.2
08NN015	03MB002	BC	12	233	100.3	2.3	12.1	941.7	2.1
06DA004	Target Site	SK	Target Site	7729	684.0	1.7	2.2	506.7	−2.5
05AA008	06DA004	AB	A	403	105.2	3.6	25.4	753.2	1.9
05LJ005	06DA004	MB	B	348	115.5	2.6	2.5	522.7	1.7
05PB014	06DA004	ON	C	4768	585.6	1.4	2.4	718.7	2.6
05TG002	06DA004	MB	D	886	157.7	3.6	0.8	449.6	−1.4
05UH002	06DA004	MB	E	2191	369.4	1.6	0.4	466.1	−4.4
06BD001	06DA004	SK	F	3670	395.8	2.3	2.6	483.4	−1.5
06FB002	06DA004	MB	G	4274	355.4	3.4	0.4	478.5	−4.7
07CD001	06DA004	AB	H	30,792	1548	1.3	1.5	469.4	0.1
07KE001	06DA004	AB	I	9856	614.0	2.6	0.7	443.1	−0.3
07OB003	06DA004	AB	J	36,901	1278	2.3	0.9	450.8	−0.9
10FA002	06DA004	NT	K	9213	553.4	3.0	0.7	474.9	−3.2
10GB006	06DA004	NT	L	20,696	1146	1.6	0.9	351.3	−4.6
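
The compactness ratio column in Table 3 is defined in the header as Area/Perimeter², expressed as a percentage. The sketch below recomputes it from the tabulated area and perimeter for a few sites as a consistency check. The function name is hypothetical, and the input values are copied from Table 3.

```python
def compactness_ratio_percent(area_km2, perimeter_km):
    """Compactness ratio = area / perimeter^2, expressed in percent."""
    return 100.0 * area_km2 / perimeter_km ** 2


# (area in km^2, perimeter in km, tabulated ratio in %) from Table 3.
sites = {
    "03MB002": (29124, 1417.0, 1.5),
    "10ND002": (65, 44.4, 3.3),
    "06DA004": (7729, 684.0, 1.7),
    "05AA008": (403, 105.2, 3.6),
}

for wsc_id, (area, perimeter, tabulated) in sites.items():
    computed = compactness_ratio_percent(area, perimeter)
    print(f"{wsc_id}: computed {computed:.2f}%, tabulated {tabulated}%")
```

Small catchments such as 10ND002 tend to show higher ratios here, since a compact outline encloses more area per unit of perimeter squared.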

Table 4. Relative bias and relative RMSE performance measures (in percentages) for quantiles produced from regionalized estimates. Bold italicized numbers indicate the best outcome for each return period.

Statistic	Return Period (years)	Geographic Proximity	Flood Seasonality	Physiographic Variables	Precipitation Pattern	Temperature Pattern
Relative Bias	20	0.3%	1.0%	0.9%	0.6%	0.6%
	25	0.8%	2.0%	0.5%	0.4%	1.0%
	30	0.5%	2.5%	3.3%	1.1%	1.9%
	35	1.1%	3.1%	2.8%	−0.6%	−0.1%
	40	0.9%	1.8%	2.3%	0.2%	0.8%
	45	0.1%	2.1%	3.7%	0.004%	−0.2%
Relative RMSE	20	6.5%	6.7%	6.6%	6.4%	6.7%
	25	7.9%	7.7%	7.6%	7.7%	7.6%
	30	7.5%	7.9%	8.7%	8.5%	8.0%
	35	9.5%	8.9%	9.6%	8.4%	8.1%
	40	9.4%	9.7%	10.0%	8.8%	8.7%
	45	8.6%	11.1%	13.0%	9.9%	10.1%
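
As an illustration of how measures like those in Table 4 are typically computed, the sketch below uses a common formulation of relative bias and relative RMSE, in which the regional quantile at each site is compared with the at-site quantile. This formulation is an assumption; the exact definitions used in the study are given in Section 2.9 and may differ in detail. The function names and the sample quantiles are hypothetical.

```python
import math


def relative_errors(regional, at_site):
    """Per-site relative error of regional quantiles against at-site quantiles."""
    return [(r - a) / a for r, a in zip(regional, at_site)]


def relative_bias_percent(regional, at_site):
    """Mean relative error, in percent (assumed common definition)."""
    errors = relative_errors(regional, at_site)
    return 100.0 * sum(errors) / len(errors)


def relative_rmse_percent(regional, at_site):
    """Root mean squared relative error, in percent (assumed common definition)."""
    errors = relative_errors(regional, at_site)
    return 100.0 * math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical quantiles (m^3/s) at three sites, for illustration only.
regional_q = [105.0, 198.0, 310.0]
at_site_q = [100.0, 200.0, 300.0]

print(f"Relative bias: {relative_bias_percent(regional_q, at_site_q):.2f}%")
print(f"Relative RMSE: {relative_rmse_percent(regional_q, at_site_q):.2f}%")
```

Under this formulation, positive and negative errors can cancel in the relative bias but not in the relative RMSE, which is one reason the bias values in Table 4 are much smaller than the RMSE values.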

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
