A Data-Driven Approach to Bioretention Cell Performance: Prediction and Design

Bioretention cells are an urban stormwater management technology used to address both water quality and quantity concerns. A lack of region-specific design guidelines has limited the widespread implementation of bioretention cells, particularly in cold climates. In this paper, experimental data are used to construct a multiple linear regression model to predict the hydrological performance of bioretention cells. Nine observed parameters are considered as candidate regressors, of which inlet runoff volume, inlet runoff duration and initial soil moisture were chosen. These three variables are used to construct six regression models, which are tested against the observations. Statistical analysis showed that the amount of runoff captured by a bioretention cell can be successfully predicted from the inlet runoff volume and event duration. Historical data are then used to calculate the runoff volume for a given duration in different catchment types, and these volumes are used in the regression model to predict bioretention cell performance. The results are then used to create a design tool that can assist in estimating the bioretention cell size needed to meet different performance goals in southern Alberta. Examples of the functionality of the design tool are provided.


Introduction
Urban stormwater runoff is defined as the overland flow that occurs after a precipitation event in urbanized areas. These areas are characterized by a higher fraction of impervious surfaces than natural, undeveloped areas. An increase in impervious surfaces, such as roofs, roads and parking lots, can lead to higher runoff volumes with higher flow rates, lower evapotranspiration and lower infiltration [1]. This can increase the risk of flooding in downstream waterways and contribute to the degradation of water quality; in fact, urban stormwater runoff is considered a leading cause of the degradation of surface waters [2-4]. In response to the detrimental effects of runoff, a site design strategy known as Low Impact Development (LID) has been widely adopted; it aims to control, capture and treat runoff at the source and promotes sustainable water management [5]. In LID, structural and non-structural practices are implemented on site so that the post-development hydrology and water quality mimic, or improve upon, pre-development (undeveloped) conditions [3,6,7].
Bioretention cells, also known as rain gardens, are small, terrestrial, soil- and plant-based infiltration and treatment basins. They consist of a highly porous engineered soil media topped with dense vegetation and organic ground cover such as mulch (see Figure 1 for a schematic of the bioretention cell used in this study). Urban runoff is routed to the cell, where it infiltrates into the media. Depending on the design, the runoff can exit the cell via an under-drain beneath the cell, percolate into the surrounding sub-soils, or both [7]. The surface layer attenuates the peak discharge and lengthens the time of concentration of the incoming flow [8,9], and it provides storage when the runoff intensity exceeds the infiltration capacity. Losses to the surrounding sub-soils, plant uptake and storage in the soil media contribute to runoff volume reduction. Bioretention cells are used to address both runoff quantity and quality concerns. This paper focuses only on the quantity (i.e., hydrological) component, specifically the ability of a bioretention cell to decrease or capture the volume of runoff it receives. Other hydrological performance indicators include the change in peak flow rate and the lag time of the outlet hydrograph. The ability of bioretention cells to decrease or capture runoff volume has been reported in a number of studies. Field- and laboratory-scale experiments have shown that the cells can capture 100% of influent volumes from small to medium-sized events, and between 33% and 80% for larger events [5,10-12]. The large variability in performance in these studies can be attributed to the fact that current design guidelines are highly empirical and are not optimized for region-specific differences in geological characteristics and climatic conditions [1]. Thus, bioretention cells may not perform as intended if implemented without consideration of regional differences.
A number of bioretention cell design guidelines have been developed in the United States; a summary of some of these is provided in [13]. In most cases, the applicability of these guidelines is limited by location and climate, as described above. This is especially pertinent in cold climate regions, and particularly in prairie regions, where bioretention cell hydrology is unique [14,15]. Cold climate conditions are characterized by frozen soil, higher snowmelt volumes, lower runoff intensity from snowmelt, and repeated freeze-thaw cycles [15]. Thus, guidelines developed in warm or temperate regions may not be applicable in cold regions.
To overcome these limitations, a number of physically-based numerical models have been created to predict bioretention cell performance [13]. The overall objective of these models is to ascertain whether a particular design guideline is appropriate for the desired application or performance targets. These models can also predict the effects of design changes on performance. Examples include the RECARGA [16] and RECHARGE [17] models, which are based on the Richards and Green-Ampt equations, respectively. Both models were developed in Wisconsin, USA, and have been adopted by the state government [13]. Heasom et al. [18] developed a model using the HEC-HMS system that treats the bioretention cell as a reservoir and uses a weir system to control the outflow. Similar rainfall-runoff models were created in SWMM [19,20].
However, a major drawback of physically-based models is the data needed for simulations, which may not be readily available, as well as difficulties in model calibration. This is especially true when a number of "sub-models" representing different processes (e.g., rainfall-runoff, infiltration, evapotranspiration) are lumped together [21]. In addition, it may be difficult to mimic local characteristics with these models, for example, the frequent freeze-thaw cycles experienced in regions like Calgary, Canada. In contrast, data-driven models are based on analysis of the data characterizing a system, here the bioretention cell, with limited or no assumptions about the nature of the physical system being modeled. Typically, the model is defined on the basis of the connections between concurrent input and output variables [21]. An advantage of this approach is the potential to establish strong relationships between bioretention cell performance (the output variable) and widely available input variables (e.g., precipitation depth or air temperature). If such relationships exist, predicting bioretention cell performance may be much simpler than relying on physically-based models with more intensive and specialized data requirements. By using region-specific data, a data-driven model offers the potential to predict bioretention cell performance in that region.
In this paper, a data-driven model is proposed to predict region-specific bioretention cell performance in Calgary, Canada. The use of bioretention cells to address stormwater runoff concerns has been explored in Calgary [15,22]. However, to date, no comprehensive regional design guidelines or numerical models (data-driven or otherwise) have been developed for bioretention cell use in Calgary. The city is characterized by prairie hydrology and experiences frequent freeze-thaw cycles through the winter months. These characteristics require specific consideration in design guidelines before widespread bioretention implementation can occur. The objectives of this research are to simulate bioretention cell performance using a data-driven model and then to use the simulation results to create a design tool that assists in sizing bioretention cells to meet performance targets. The research presented here uses data from Calgary; however, the methodology can be applied to develop region-specific tools for any location.
The model described in this paper uses experimental data from a bioretention cell to construct and validate a multiple linear regression (MLR) model. The regressors were selected based on statistical analysis of the available parameters collected via field-scale experiments. The best performing model was then used to predict bioretention cell performance under different conditions using historical observations. These processes are described in detail below.

Site Description and Experiment Procedure
The bioretention cell used for the experiments is located in southwest Calgary. The cell was constructed in 2005 for experimental purposes; as such, it is not operational and no runoff drains to it. The bioretention cell measures 8 m wide by 4 m long (a surface area of 32 m²), with a depth of 1.5 m. The cell is planted with dense, native vegetation and topped with a 75 mm mulch layer. The soil media used in the cell is classified as loam to sandy-loam. The media is enclosed in a permeable geotextile, allowing moisture in the cell to drain into the surrounding sub-soils. Details on the design of the bioretention cell are included in [15,22].
Since the bioretention cell is experimental and offline, a runoff distribution system was designed to mimic precipitation events. Synthetic stormwater runoff was created using water from a stormwater pond and sediment used for road de-icing. The runoff was applied to the cell using a variable-speed pump and a series of hoses. The volume, flow rate and duration of each experiment were chosen to mimic storm events of different frequencies that occur in Calgary. The synthetic runoff drained through the bioretention cell to the under-drain layer, where a perforated pipe channeled it to a manhole equipped with monitoring equipment. Details on the synthetic runoff system and experimental procedure are provided in [15,22].

Data Collection and Analysis
For each experiment on the bioretention cell, a number of hydrological parameters were collected. Details on sampling methods have been described previously in [15]. The parameters pertinent to this research include the volumes of synthetic runoff applied to (V_i) and released from (V_o) the bioretention cell, the corresponding flow rates Q_i and Q_o, and the durations of the two hydrographs, t_i and t_o. The initial (θ_i), peak (θ_p) and final (θ_f) moisture content of the soil media was measured at four depths: 150, 300, 500 and 1000 mm below the surface of the cell. A number of climatic parameters, including air temperature (T), radiation, wind speed and direction, and precipitation, were also collected. In addition, several variables were calculated from the observed data, including the peak and center-of-mass flow rates and the time to peak for both the inlet and outlet hydrographs, and the delay (lag time) between the peaks and between the centers of mass.
Candidates for regressors were chosen from this collection of data based on correlation analysis. However, since the objective of this research is to use a data-driven model to predict performance, which is a function of V_o, parameters directly influenced by V_o were not considered as regressors. This limited the candidates to the following nine parameters: T, V_i, t_i, the influent peak and center-of-mass flow rates (labeled Q_pi and Q_ci, respectively), and θ_i at four depths. Pearson's correlation coefficient and significance tests (Student's t-test) were computed for these parameters to determine which had the strongest association with the dependent variable V_o.
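The correlation screening described above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: `pearson_r` and `t_statistic` are hypothetical helper names, and the t statistic is the standard test of zero correlation with n − 2 degrees of freedom. The usage lines plug in the coefficients reported later in this paper and assume n = 23 events, which is an assumption about the sample size used in the tests.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length (at least 3 values)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def t_statistic(r, n):
    """Student's t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        raise ValueError("|r| must be < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Assumed n = 23: t is about 4.9 for r = 0.73 and about 0.65 for r = 0.14,
# against a two-sided 5% critical value of about 2.08 for 21 degrees of freedom.
for r in (0.73, 0.64, 0.14):
    print(r, round(t_statistic(r, 23), 2))
```

Comparing |t| with the critical value reproduces the significant/non-significant split reported for V_i, t_i and θ_i-2, under the stated sample-size assumption.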
Bioretention cell performance was defined as, and limited to, the change in runoff volume between the inlet and outlet:

$$\Delta V = \frac{V_i - V_o}{V_i} \times 100 \qquad (1)$$

where ΔV = change in runoff volume between the inlet and outlet of the bioretention cell (%); V_o = volume of effluent (m³); and V_i = volume of influent (m³). A positive ΔV indicates a decrease in runoff volume between the inlet and outlet (a lower volume of effluent than influent), and ΔV = 100% indicates that all of the influent was captured.
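Equation (1) translates directly into a small helper. The sketch below assumes V_i and V_o are given in the same units (m³ or L); `volume_reduction` is a hypothetical name, and the 8000 L / 400 L event is invented for illustration.

```python
def volume_reduction(v_in, v_out):
    """Change in runoff volume, Equation (1): dV = (V_i - V_o) / V_i * 100 (%)."""
    if v_in <= 0:
        raise ValueError("inlet volume V_i must be positive")
    if v_out < 0:
        raise ValueError("outlet volume V_o cannot be negative")
    # Multiply before dividing so that integer-valued inputs stay exact.
    return 100.0 * (v_in - v_out) / v_in


# Hypothetical event: 8000 L applied, 400 L released.
print(volume_reduction(8000, 400))  # 95.0
```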

Model Construction
In this study, an MLR model was used as the data-driven technique to predict bioretention cell performance. The independent variables were selected from the larger group of observed variables based on correlation analysis. The general form of the regression equation, applied to log-transformed data, was:

$$\log(V_o) = A + \sum_{i=1}^{n} B_i \log(x_i) \qquad (2)$$

where x_i are the independent variables, A and B_i are the regression coefficients and n is the total number of regressors used. Different combinations of regressors were used to create six models, which are summarized in Section 4 below. For each regression analysis, the logarithms of the data were used as inputs to overcome the nonlinearity between the independent and dependent variables. The models were evaluated using the coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE) and the corrected Akaike information criterion (AICc), calculated as:

$$R^2 = 1 - \frac{\sum_{j=1}^{N} (O_j - P_j)^2}{\sum_{j=1}^{N} (O_j - \bar{O})^2} \qquad (3)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{j=1}^{N} (O_j - P_j)^2} \qquad (4)$$

$$\mathrm{MAE} = \frac{1}{N} \sum_{j=1}^{N} \left| O_j - P_j \right| \qquad (5)$$

$$\mathrm{AICc} = N \ln\left(\frac{1}{N} \sum_{j=1}^{N} (O_j - P_j)^2\right) + 2k + \frac{2k(k+1)}{N - k - 1} \qquad (6)$$

where O_j and P_j are the observed and predicted values for sample j, Ō is the mean of the observed values, N is the number of sample points used and k is the number of parameters in each regression model. For each model, approximately two thirds of the available data were used to construct the model and the remaining data were used for validation.
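The construction step (a log-log MLR fitted by ordinary least squares) and the four error statistics can be sketched in dependency-free Python as below. Assumptions: base-10 logarithms (the base is not stated), a normal-equation solve standing in for whatever software was actually used (in practice `numpy.linalg.lstsq` would be the usual choice), and AICc in the standard least-squares form of Equation (6). All function names are hypothetical.

```python
import math


def _solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_log_mlr(samples, v_out):
    """Fit log10(V_o) = A + sum_i B_i log10(x_i) by ordinary least squares.

    samples: one list of regressor values per event, e.g. [V_i, t_i].
    Returns [A, B_1, ..., B_n].
    """
    rows = [[1.0] + [math.log10(v) for v in s] for s in samples]
    z = [math.log10(v) for v in v_out]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xtz = [sum(r[i] * zj for r, zj in zip(rows, z)) for i in range(p)]
    return _solve(xtx, xtz)


def predict_log(coef, sample):
    """Predicted log10(V_o) for one event."""
    return coef[0] + sum(b * math.log10(v) for b, v in zip(coef[1:], sample))


def error_stats(obs, pred, k):
    """R^2, RMSE, MAE and AICc for observed vs predicted values (e.g. log10 V_o)."""
    n = len(obs)
    resid = [o - p for o, p in zip(obs, pred)]
    sse = sum(e * e for e in resid)
    mean_obs = sum(obs) / n
    r2 = 1.0 - sse / sum((o - mean_obs) ** 2 for o in obs)
    rmse = math.sqrt(sse / n)
    mae = sum(abs(e) for e in resid) / n
    aicc = n * math.log(sse / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)
    return r2, rmse, mae, aicc
```

For Model 2, `samples` would hold `[V_i, t_i]` pairs for the 15 construction events. Note that the AICc term requires N > k + 1 and a non-zero sum of squared errors.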

Experimental and Data Analysis Results
A total of 24 experiments were conducted on the bioretention cell; the corresponding data are listed in Table 1. The volume of synthetic runoff applied to the cell ranged from approximately 3720 L to 17,990 L, with experiment durations ranging from 14 min to 116 min. Eight experiments were conducted in cold climate conditions, defined as an air temperature at or below 5 °C. The average change in volume between the inlet and outlet, calculated using Equation (1), was 91.5%. Performance was significantly lower (i.e., V_o was higher) for the cold climate experiments; the reasons for the differences between the two climatic conditions are discussed in detail in [15].
Table 1. Summary of data collected from the 24 field experiments. Notes: * Cold climate experiments, when T ≤ 5 °C; + Analysis concluded that experiment S10 was an outlier with respect to the magnitude of V_i, and it was removed from further calculations; θ_i-1, θ_i-2, θ_i-3 and θ_i-4 refer to soil moisture measured 150, 300, 500 and 1000 mm below the surface of the bioretention cell.
A correlation analysis was conducted to determine the associations between V_o and the nine candidate parameters. Figure 2 shows pair-by-pair scatter plots of V_o and these parameters. The figure illustrates that no single parameter can describe all of the variance in V_o. Correlation coefficients were calculated for each pair; V_o was significantly (p < 0.05) correlated with V_i (Pearson's correlation coefficient r = 0.73) and t_i (r = 0.64). A non-significant (p > 0.05) correlation was found between V_o and θ_i-2, the soil moisture measured 300 mm below the surface (r = 0.14). Based on this, the candidate regressors for the MLR were reduced from the original nine to V_i, t_i and θ_i-2. Figure 3 shows pair-by-pair scatter plots of these three parameters versus V_o; in this figure, the values plotted are the logarithms of the original observations, and the linear correlation between the variables is clearer after the transformation. For events F3 to Sp1, Sp3 and S7 to S9, V_o was less than 10 L (in fact, V_o was zero in five of these seven cases, i.e., no effluent drained from the bioretention cell). Experiments producing minimal effluent volume were therefore considered to belong to the same population, with a very small variance, while the higher effluent volumes arise from a different population with different generating mechanisms. Accordingly, all effluent volumes below 10 L (on the order of 0.1% of the inlet volume) were assigned a value of 1 L, which also makes the log-transformation possible.
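A minimal sketch of the substitution rule, assuming base-10 logarithms (the base is not stated) and volumes in litres; `log_effluent` is a hypothetical helper name.

```python
import math

# Threshold and replacement value described in the text (litres).
MIN_EFFLUENT_L = 10.0
FLOOR_VALUE_L = 1.0


def log_effluent(v_out_l):
    """log10 of effluent volume after the '< 10 L -> 1 L' substitution."""
    if v_out_l < 0:
        raise ValueError("effluent volume cannot be negative")
    v = FLOOR_VALUE_L if v_out_l < MIN_EFFLUENT_L else v_out_l
    return math.log10(v)


# Zero and near-zero effluent both map to log10(1 L) = 0.
print([log_effluent(v) for v in (0.0, 7.5, 250.0)])
```

Using a fixed 1 L floor keeps zero-effluent events in the regression instead of discarding them, at the cost of treating every sub-10 L event as identical.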

Model Construction and Validation
After the candidate regressors to predict V_o were identified, the data were divided into construction and validation sets: 15 data points were used for model construction and 8 for validation. The data were split to ensure an equal representation of cold climate events in both sets and no significant deviation from the population mean. Table 2 summarizes the statistics of the original and divided data, and identifies the experiment ID corresponding to each data set. The construction data set was then used to build six MLR models with Equation (2), using different combinations of the regressors. The models are summarized in Table 3, which lists the regressors and the values of the regression coefficients for each. Based on the error statistics for both the construction and validation sets (also summarized in Table 3), Model 2, which uses V_i and t_i as regressors, emerged as the best performing model. Model 2 outperformed the other models on a collective evaluation of the error statistics, rather than by leading in any single category. An advantage of Model 2 is that its two regressors are typically readily available from historical records, so it can be used to predict bioretention cell performance without other detailed site characteristics (e.g., soil type or the degree of imperviousness of the upstream catchment). Figure 4 compares the observed and predicted data for Model 2. The figure shows that the model can accurately predict both the lower (~0 L) and higher ends of the observed V_o. Note that the model was designed so that predicted V_o values between 0 L and 10 L are set equal to 1 L, consistent with the rule described in Section 3. This assumes that low predicted effluent volumes (between 0 L and 10 L) belong to the same population as the case in which no runoff drains from the bioretention cell. Figure 4 also shows that the observed and predicted values closely follow the 1:1 line in both the construction and validation data sets.
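The cold-climate stratification of the split can be sketched as below. This is an illustrative Python sketch, not the authors' procedure: it assumes a random split within each climate group, uses hypothetical event IDs (with 8 of 23 events assumed cold), and does not implement the check that the subset means do not deviate significantly from the population mean.

```python
import random


def stratified_split(event_ids, is_cold, n_construct, seed=0):
    """Split events into construction and validation sets so that cold-climate
    events make up roughly the same share of both. Returns two lists of IDs."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    cold = [e for e, c in zip(event_ids, is_cold) if c]
    warm = [e for e, c in zip(event_ids, is_cold) if not c]
    rng.shuffle(cold)
    rng.shuffle(warm)
    # Allocate construction slots to each group in proportion to its size.
    n_cold = round(n_construct * len(cold) / len(event_ids))
    n_warm = n_construct - n_cold
    construction = cold[:n_cold] + warm[:n_warm]
    validation = cold[n_cold:] + warm[n_warm:]
    return construction, validation


# Hypothetical IDs: 23 events (outlier removed), assumed 8 of them cold.
ids = [f"E{i:02d}" for i in range(23)]
cold_flags = [i < 8 for i in range(23)]
construction, validation = stratified_split(ids, cold_flags, n_construct=15)
print(len(construction), len(validation))  # 15 8
```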
The ranges of the input data for Model 2 are set by the data used to construct the model: V_i ranged from 3720 L to 9100 L, and t_i from 14 min to 116 min. The application of the model is therefore approximately limited to these values. Furthermore, the predicted values of this model are bounded by V_o = 0 L (100% of inlet runoff captured) and by V_o ≤ V_i (outlet runoff cannot exceed inlet runoff). It is therefore important to ensure that the input variables used for prediction do not exceed the upper limits of V_i and t_i.
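These bounds can be enforced around any set of fitted coefficients. In this sketch, `predict_vo` is a hypothetical helper; it assumes base-10 logarithms, checks only the upper input limits emphasized in the text, and applies the 1 L rule and the V_o ≤ V_i cap. The Model 2 coefficients from Table 3 are not reproduced, so the caller must supply them.

```python
import math

# Upper limits of the calibration data for Model 2, as stated in the text.
MAX_VI_L = 9100.0
MAX_TI_MIN = 116.0


def predict_vo(coef, v_in_l, t_in_min):
    """Predict effluent volume V_o (L) from V_i (L) and t_i (min) with
    log10(V_o) = A + B_1 log10(V_i) + B_2 log10(t_i).

    coef = (A, B_1, B_2), supplied by the caller (e.g. from Table 3).
    """
    if v_in_l > MAX_VI_L or t_in_min > MAX_TI_MIN:
        raise ValueError("input exceeds the upper limits of the calibration data")
    a, b1, b2 = coef
    v_out = 10 ** (a + b1 * math.log10(v_in_l) + b2 * math.log10(t_in_min))
    if v_out < 10.0:
        return 1.0  # 0-10 L treated as 'no effluent', consistent with the 1 L rule
    return min(v_out, v_in_l)  # effluent cannot exceed influent
```

Raising an error, rather than silently extrapolating, makes out-of-range design events visible to the user.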

Model Application
One of the objectives of this research was to use an MLR model to create a design tool to assist in designing bioretention cells for region-specific applications. To do this, historical precipitation depth and duration data for Calgary were used to predict V_o with the developed model. An important intermediate step in this process is the conversion of precipitation depth to V_i. The method used, which accounts for the size of the bioretention cell and the size and degree of imperviousness of its upstream catchment, is described below.
Historical precipitation depth and duration data were obtained from Environment Canada [23]. The data, part of the Short Duration Rainfall Intensity-Duration-Frequency data set, include the maximum precipitation depth observed in Calgary (Calgary Int'l A, ID: 3031093) at nine event durations, from 5 min to 24 h, for the years 1947 to 2007 (with six years of data missing). Figure 5 summarizes the maximum precipitation depths at each duration over this period. Events with a duration of less than 10 min or more than 6 h were excluded from the figure and from the subsequent analysis, as these data do not meet the criteria for the MLR model outlined in Section 4.
The volume of runoff generated in a catchment from a given precipitation depth is a function of both the area and the degree of imperviousness of the catchment, so both factors must be included when calculating V_i from the historical precipitation depths. First, the area of the catchment has to be defined; typical guidelines recommend that a bioretention cell be sized at 5% to 20% of the catchment area [24,25]. For this research, four catchment sizes were used, for which the bioretention cell area (32 m² in the experiments) was 5%, 10%, 15% and 20% of the total catchment area. The corresponding catchment areas are 640, 320, 213.3 and 160 m², respectively.
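The catchment areas follow from A_C = A_B / (P_oC/100). The sketch below reproduces them and applies the duration filter. The list of nine durations is an assumption based on the usual layout of the Environment Canada short-duration IDF tables (5, 10, 15 and 30 min; 1, 2, 6, 12 and 24 h), and `catchment_area` is a hypothetical helper.

```python
CELL_AREA_M2 = 32.0  # experimental cell: 8 m x 4 m
DURATIONS_MIN = [5, 10, 15, 30, 60, 120, 360, 720, 1440]  # assumed IDF durations


def catchment_area(cell_area_m2, cell_pct):
    """Total catchment area (m^2) when the cell occupies cell_pct % of it."""
    return 100.0 * cell_area_m2 / cell_pct


areas = {pct: catchment_area(CELL_AREA_M2, pct) for pct in (5, 10, 15, 20)}
# Keep only durations from 10 min to 6 h, as in the analysis.
kept = [d for d in DURATIONS_MIN if 10 <= d <= 360]
print({pct: round(a, 1) for pct, a in areas.items()})  # {5: 640.0, 10: 320.0, 15: 213.3, 20: 160.0}
print(kept)  # [10, 15, 30, 60, 120, 360]
```

The six retained durations match the six duration intervals plotted in the design chart.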
Second, the impervious fraction of the catchment has to be defined. A typical residential area in Calgary has an impervious fraction of 40%, whereas in commercial areas or parking lots it can be as high as 100% (i.e., the entire catchment is impervious). For the application of the MLR model, four levels of imperviousness were used (100%, 80%, 60% and 40%), representing typical catchment characteristics found in urban areas. For example, for a 320 m² catchment that is 40% impervious, the area of impervious surfaces is 128 m².
A ratio known as the Impervious to Pervious Ratio (I/P) [15,26] was then used. I/P is the ratio of the upstream impervious area (I) to the area of the pervious bioretention cell (P) that receives the runoff. Continuing the example above, the impervious area (I) is 128 m² and the bioretention cell area (P) is 32 m², so I/P = 128 m²/32 m² = 4. The I/P approach is useful because it combines both factors, the total area and the imperviousness of the catchment, into a single relationship:

$$V_i = d \times A_B \times \left(\frac{I}{P} + 1\right) \qquad (7)$$

where V_i = volume of influent (m³); d = precipitation depth (m); A_B = bioretention cell area (m²); and I/P = Impervious to Pervious Ratio. Equation (7) accounts for the runoff generated from the impervious area of the catchment (d × A_B × I/P = d × I) and for the precipitation that falls on the bioretention cell itself (d × A_B). The I/P values calculated for the 16 combinations of catchment size and impervious area are summarized in Table 4. Although Equation (7) expresses V_i as a function of the bioretention cell area, A_B, this is not strictly necessary, and Equation (7) can be rewritten in a more general form as:

$$V_i = d \times \left(A_{imp} + \frac{P_{oC}}{100} \times A_C\right) \qquad (10)$$

where A_imp is the impervious area of the catchment (m²); P_oC is the size of the bioretention cell as a percentage of the total catchment area, e.g., between 5% and 20% (%); and A_C is the area of the catchment that drains to the bioretention cell (m²). After the I/P values were determined, V_i values were computed for the entire historical data set. Values of V_i exceeding 9100 L were excluded from further analysis, and the remaining values were used as inputs to Model 2.
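Equations (7) and (10) and the I/P values behind Table 4 can be checked with a few lines of Python. This sketch assumes, as Equation (7) implies, that all rain on impervious surfaces becomes runoff and that the pervious catchment outside the cell contributes none; the 25 mm event in the example is hypothetical, and the function names are illustrative.

```python
CELL_AREA_M2 = 32.0
MAX_VI_L = 9100.0  # upper limit of V_i used with Model 2


def ip_ratio(impervious_pct, cell_pct):
    """I/P = A_imp / A_B; with both as % of the catchment this is impervious_pct / cell_pct."""
    return impervious_pct / cell_pct


def inflow_eq7(depth_m, cell_area_m2, ip):
    """Equation (7): V_i (m^3) = d * A_B * (I/P + 1)."""
    return depth_m * cell_area_m2 * (ip + 1.0)


def inflow_eq10(depth_m, imp_area_m2, cell_pct, catchment_m2):
    """Equation (10): V_i (m^3) = d * (A_imp + P_oC/100 * A_C)."""
    return depth_m * (imp_area_m2 + cell_pct / 100.0 * catchment_m2)


# The 16 combinations behind Table 4, keyed by (cell size %, impervious %).
table4 = {(cell, imp): ip_ratio(imp, cell)
          for cell in (5, 10, 15, 20) for imp in (100, 80, 60, 40)}

# Example from the text (320 m^2 catchment, 40% impervious) with a hypothetical 25 mm event.
ip = ip_ratio(40, 10)
v7 = inflow_eq7(0.025, CELL_AREA_M2, ip)
v10 = inflow_eq10(0.025, 0.40 * 320.0, 10, 320.0)
usable = v7 * 1000.0 <= MAX_VI_L  # convert m^3 to L before comparing with the limit
print(ip, round(v7, 3), round(v10, 3), usable)  # 4.0 4.0 4.0 True
```

The I/P values span 2 (20% cell, 40% impervious) to 20 (5% cell, 100% impervious), and the two volume equations agree, as expected from A_B = (P_oC/100) × A_C.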
The resulting V_o values from the model output were used to calculate ΔV with Equation (1). The results are plotted in Figure 6. To illustrate the functionality of the figure, consider the following examples. Given an event with a duration of 60 min and a depth of 30 mm, the expected ΔV for a bioretention cell sized to an I/P of 8 is approximately 90%. Conversely, if ΔV is specified in advance, for example because development regulations stipulate that 80% of runoff must be captured, the figure can be used to determine the necessary I/P ratio. This ratio can then be used to calculate the size of bioretention cell needed to meet the performance goal for a given "design event", assuming that the total catchment size is known. For example, to capture 80% of the runoff generated by a 25 mm, 120 min event, the figure shows that the I/P of the cell has to be 8 or lower. However, if the "design event" is, for example, 25 mm over 60 min, the I/P can be up to 10 while still meeting the runoff reduction goal. This higher I/P can be accommodated by decreasing the size of the bioretention cell relative to the catchment, by adding impervious surfaces to the catchment, or by a combination of both.
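The chart-reading procedure can also be run numerically: for a design event, evaluate ΔV over a set of candidate I/P values, keep the largest one that meets the target, then convert I/P to a cell area. The sketch below assumes the experimental cell area (32 m²) used to build the chart, base-10 logarithms, and a set of coefficients `HYPOTHETICAL_COEF` that is not the fitted Model 2, so its outputs do not reproduce Figure 6. All function names are hypothetical.

```python
import math

CELL_AREA_M2 = 32.0  # experimental cell used to build the design chart
MAX_VI_L = 9100.0


def predicted_dv(coef, depth_mm, duration_min, ip, cell_area_m2=CELL_AREA_M2):
    """Predicted ΔV (%) for one design event and one I/P value, or None when
    V_i exceeds the calibration limit. coef = (A, B_1, B_2) of a log10 model."""
    v_in = depth_mm * cell_area_m2 * (ip + 1.0)  # Eq. (7); mm * m^2 = L
    if v_in > MAX_VI_L:
        return None
    a, b1, b2 = coef
    v_out = 10 ** (a + b1 * math.log10(v_in) + b2 * math.log10(duration_min))
    v_out = 1.0 if v_out < 10.0 else min(v_out, v_in)  # same bounds as Model 2
    return 100.0 * (v_in - v_out) / v_in  # Eq. (1)


def max_ip_for_target(coef, depth_mm, duration_min, target_dv, candidates):
    """Largest candidate I/P whose predicted ΔV meets target_dv, or None."""
    best = None
    for ip in sorted(candidates):
        dv = predicted_dv(coef, depth_mm, duration_min, ip)
        if dv is not None and dv >= target_dv:
            best = ip
    return best


def cell_area_for_ip(impervious_area_m2, ip):
    """Cell area (m^2) that gives the chosen I/P for a known impervious area."""
    return impervious_area_m2 / ip


HYPOTHETICAL_COEF = (-9.0, 3.0, 0.5)  # illustrative only, not Table 3 values
ip = max_ip_for_target(HYPOTHETICAL_COEF, 25, 60, 80.0, [2, 3, 4, 5, 6, 8, 10])
print(ip, cell_area_for_ip(128.0, ip))  # 5 25.6
```

With real Model 2 coefficients, the same loop would replace reading values off the chart by eye, while staying within the V_i limit that the chart itself respects.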
These examples show how Figure 6, which was developed using local experimental and historical data, can be used as a design tool to size bioretention cells in Calgary. Once the performance targets are known, different cell sizes can be explored with this figure, based on catchment characteristics and available area. Limitations of the tool include the maximum permissible V_i (9100 L) and the characteristics of the experimental bioretention cell used to generate the data.

Conclusions
LID is a sustainable water management strategy that aims to integrate water management into the urban fabric. As a site design strategy, LID can be considered more than a stormwater control technology; it is a tool that promotes a full spectrum of ecosystem benefits. LID practices, including bioretention cells, are emerging techniques for urban stormwater management that move away from the traditional practice of expanding urban infrastructure toward providing engineering solutions at the lot level.
Bioretention cells are a proven technology for addressing urban runoff concerns in cold climate conditions such as those of Calgary, Canada [15,22]. This paper developed a novel method to create a design tool that assists in bioretention design for Calgary and takes region-specific design considerations into account. Currently, no guidelines or models exist for designing bioretention cells for use in Calgary. In this paper, experimental data were used to create a data-driven model to predict bioretention cell performance, specifically the amount of runoff captured. This research shows that the volume of runoff draining to a bioretention cell and the duration of a precipitation event can be used successfully to predict bioretention cell performance. The model performed well with respect to R² (0.991), RMSE (0.128), MAE (0.086) and AICc (−28.8). Historical precipitation depth and duration data were then used to predict bioretention cell performance under 16 different catchment characteristics, including varied levels of imperviousness and catchment size. These results were collated and extrapolated to create a novel design tool, shown in Figure 6. This figure was then used to demonstrate how the required size of a future bioretention cell can be estimated if performance targets are known. An example also showed how bioretention cell performance can be estimated if the size and characteristics of the catchment are known. Although the methodology was applied to only one case study, Calgary, it can be applied in any region where the relevant data are available.
Future research should focus on advancing this technique by expanding the scope of performance parameters. Other hydrological performance parameters, such as peak flow reduction, are closely correlated with the decrease in runoff volume, and a data-driven technique may be applied to predict these parameters as well. Similarly, future work should focus on predicting water quality performance using hydrological performance as an indicator.

Figure 1. An example of a bioretention cell in Calgary, Canada.

Figure 2. Pair-by-pair scatter plots of V_o and the nine independent variables.

Figure 3. Pair-by-pair scatter plots of log-transformed V_o, the two significantly correlated variables, V_i and t_i, and the weakly correlated θ_i-2.


Figure 4. Model 2 results: time series comparison of observed and predicted log(V_o) for (a) construction and (c) validation data sets; and comparison of predicted and observed values for (b) construction and (d) validation data sets.

Figure 5. Summary of historical maximum precipitation depth and corresponding duration for Calgary, collected from Environment Canada for the years 1947-2007.

Figure 6. Bioretention cell design guide for Calgary: runoff volume reduction as a function of precipitation depth, duration and I/P ratio. The figure shows predicted ΔV values versus precipitation depth at six duration intervals; at each duration, the ΔV values are further categorized by I/P. Solid black lines represent values predicted by the model, whereas dashed lines are extrapolations of the predicted trends.

Table 2. Summary statistics of all experiments and of the construction and validation data sets for the three regressor candidates and V_o.

Table 3. Summary of the six multiple linear regression (MLR) model results, including the regressors x_i used, the constant term A, the coefficients B_i, and error statistics for model construction (top row) and model validation (bottom row).

Table 4. Summary of I/P values calculated for 16 combinations of catchment size and percent impervious area in the catchment.