Flood Hazard Risk Mapping Using a Pseudo Supervised Random Forest

Abstract: Devastating floods occur regularly around the world. Recently, machine learning models have been used for flood susceptibility mapping. However, even when these algorithms are provided with adequate ground truth training samples, they can fail to predict flood extents reliably. On the other hand, the height above nearest drainage (HAND) model can produce flood prediction maps with limited accuracy. The objective of this research is to develop an accurate and dynamic flood modeling technique that produces flood maps as a function of water level by combining the HAND model and machine learning. In this paper, the HAND model was utilized to generate a preliminary flood map; then, the predictions of the HAND model were used to produce pseudo training samples for a random forest (R.F.) model. To improve the R.F. training stage, five of the most effective flood mapping conditioning factors are used, namely Altitude, Slope, Aspect, Distance from River and Land use/cover. In this approach, the R.F. model is trained to dynamically estimate the flood extent with the pseudo training points acquired from the HAND model. However, due to the limited accuracy of the HAND model, a random sample consensus (RANSAC) method was used to detect outliers. The accuracy of the proposed model for flood extent prediction was tested on different flood events in the city of Fredericton, NB, Canada in 2014, 2016, 2018 and 2019. Furthermore, to ensure that the proposed model can produce accurate flood maps in other areas as well, it was also tested on the 2019 flood in Gatineau, QC, Canada. Accuracy assessment metrics, such as overall accuracy, Cohen's kappa coefficient, Matthews correlation coefficient, true positive rate (TPR), true negative rate (TNR), false positive rate (FPR) and false negative rate (FNR), were used to compare the predicted flood extent of the study areas to the extent estimated by the HAND model and the extent imaged by Sentinel-2 and Landsat satellites.
The results confirm that the proposed model can improve the flood extent prediction of the HAND model without using any ground truth training data.

(1) Hydrodynamic models use mathematical equations to simulate fluid motion and usually require high computational power [22]. These models require significant data inputs, such as flow discharge, water depth, gravitational acceleration and channel bed slope, to simulate flow in their attempt to replicate flow patterns and estimate flow velocity and the spatial extent of inundation. Hydrodynamic models are mainly divided into three categories depending on their dimensionality: 1D, 2D and 3D. 1D models consider the floodplain flow as one-dimensional along the river centerline [8,9] and, by solving mass and momentum conservation equations, model the flood over a floodplain on multiple cross-sections. In 2D models, an intensive collection of topographical data is used for flood simulation, assuming that the third dimension, water depth, is shallow [22]. 2D models are probably the most common hydrodynamic approach for flood mapping [22]. Furthermore, coupled 1D/2D solutions are becoming very popular, where the 1D component models flow in the river and the 2D component models flow on the floodplain [23]. 3D hydrodynamic models generate a complex three-dimensional representation of the floodplain, which is usually regarded as unnecessary because of the extensive parameters they require and their possibly comparable accuracy with 2D models [22]. For more complicated flood events, such as those caused by tsunamis and dam breaks, which require modeling of vertical turbulence and spiral flow, 3D models provide a better representation because of their capability to model the vertical dimension [24,25]. The higher the dimensionality of a model, the more parameters are required for flood simulation and the longer the computation time. In general, hydrodynamic models are extremely complex, have outputs that are sensitive to site-specific parameters, require a substantial amount of data and are costly to calibrate [26].
(2) Empirical methods generate flood maps mainly from observations, which can be obtained through satellite imagery, aerial photography, land surveys and so forth. Depending on the credibility and resolution of the observations, empirical methods generate flood maps with varying accuracy and are mainly used as a benchmark for modeling and for the assessment of other approaches [14]. For example, using remote sensing techniques it is possible to map areas covered by water and, through a pre- and post-event comparison, identify the affected areas. However, there are certain limitations to this approach, such as the often coarse spatial resolution of freely available satellite images, the availability of data at the time of the flood, the high cost of data collection through surveying and the varied types of observations, which require expert knowledge to interpret [22].
(3) Simplified conceptual models generally involve less physical detail than hydrodynamic models and are based on simplified hydraulic concepts. One example of this type of model is the Rapid Flood Spreading Method (RFSM), which separates the floodplain into distinct areas representing the depressions; then, based on the flood volume and a filling/spilling process, inundated areas are identified [16]. Teng, in Reference [22], categorized the height above nearest drainage (HAND) model [17] in the simplified conceptual class; the HAND model is essentially an elevation model normalized toward the nearest stream [27]. In the HAND model, the elevation of each pixel is calculated relative to the nearest stream through a series of steps, including producing the flow direction, the accumulation area and so forth. Then, by considering the water level in the stream, the inundated areas can be recognized. This capability of the HAND model can be used to predict inundated areas as a function of the water level in the nearby stream. Different hydrological models can estimate the amount of water in the streams each year, and with the HAND model it is possible to estimate inundated areas and manage further damages. However, depending on the topography of the region of interest and the existence of natural or artificial infrastructure that affects the water discharge, such as waterfalls or dams, the HAND model can produce different accuracies. In the studies reported by Momo [28] and McGrath et al. [29], the HAND model was able to obtain flood extents that compared closely to the actual flooded areas; however, in Reference [30], the generated HAND model considerably overestimated the real flooded extent. Despite the capability of the HAND model to predict the flood extent and its straightforward approach, the model is highly sensitive to the accuracy of the hydrologically conditioned Digital Terrain Model (DTM).
Due to its dependence on external factors, such as the precise water level of the river and a hydrologically conditioned DTM, the HAND model can predict flood extents only with limited accuracy.
(4) Machine learning and statistical models investigate the probability of an area being flooded by studying past flood events and generate flood susceptibility maps. The analysis is based on different topographical, hydrological and geological conditioning factors, which can vary depending on the area of study, the flood inventory datasets and the machine learning models available. Researchers have implemented different algorithms over different areas of study, such as K-Nearest Neighbors (K-NN) [31], Artificial Neural Networks (ANNs) [32], Support Vector Machines (SVM) [18,33,34], random forest (R.F.) [35,36], Genetic Programming (GP) [37], Frequency Ratio (FR) [34,38,39] and Logistic Regression (LR) [40,41]. Remote sensing and Geographic Information Systems (GIS), along with machine learning models, have made significant contributions to flood modelling [42,43]. As a case study, Esfandiari et al. [44] tested different conditioning factors for flood mapping and reported that five conditioning factors, namely Altitude, Slope, Aspect, Distance from River and Land use/cover, produce the best results when used with a R.F. classifier. However, these studies map an existing flood, as they take training samples from existing satellite images, and cannot predict floods in a future event.
One of the major problems of machine learning models is that they require numerous ground truth samples for training. Lee [45] proposed using pseudo labels, the classes with the maximum predicted probability, in place of true labels for training neural networks. This is a promising approach that can help reduce the number of training samples but, to the best of our knowledge, it has never been used in real-time flood mapping. On the other hand, the flood susceptibility maps produced by machine learning provide a general perspective of flood-prone areas based on previous flood events; therefore, they can fail in predicting possible future flood events.
To overcome the problems of machine learning mentioned above (the requirement of abundant training samples and low accuracy in flood prediction), we propose in this paper a new flood prediction model called Pseudo Supervised Random Forest (PS-RF). PS-RF uses the HAND model, a R.F. classifier and the random sample consensus (RANSAC) paradigm [46] to dynamically estimate the flood extent. In this model, the flood predictions from the HAND model are used as pseudo labels to train a R.F. classifier. Through a series of random data selections and outlier detection using the RANSAC paradigm, the most reliable predictions are selected for training the R.F. To help eliminate erroneous samples, the five best conditioning factors reported in Reference [44] are used. Then, the R.F. with the best selected pseudo training subset goes through a cross-validation procedure for hyperparameter optimization. The optimized R.F. classifier is used to produce the final flood map. In this model, rather than using ground truth samples to train the R.F. classifier, the training samples are selected from a thresholded HAND model with their associated uncertainties. However, since we include other conditioning factors, the resulting trained R.F. classifier produces higher accuracies than a thresholded HAND model alone.
The flood prediction accuracy of PS-RF was tested using four different flood events in Fredericton, NB, Canada, in the years 2014, 2016, 2018 and 2019. To show that the proposed model can work in other areas, we also tested it using the 2019 flood in Gatineau, QC, Canada. For accuracy assessment, multiple ground truth points were collected from Landsat 8 OLI and Sentinel-2 satellite imagery. PS-RF was able to produce flood maps with accuracies over 89%, surpassing the flood extent estimations produced by the HAND model in all five flood events. The results show that PS-RF provides a flood extent prediction model with which the inundated areas can be estimated as a function of the depth of water in the nearby stream. Thus, the relevant authorities can prepare for the rise of water each year to prevent further damage.

Study Areas
The first study area is in Fredericton (Figure 1b), the capital of the province of New Brunswick in Atlantic Canada, which covers an area of approximately 155 square kilometers. The city, with a population of nearly 94,000 and approximately 22,000 households, has experienced several flood events in its history [47]. The Saint John River, which flows from west to east through Fredericton, splits the city into northern and southern sides. Due to the cold winters in this region, the surface of the Saint John River freezes every year during winter and starts to melt as the weather gets warmer in April and May. Extreme snowmelt along with heavy rain can cause a significant rise in the river water level. Also, the Mactaquac Dam, shown in Figure 1b, which is located nearly 19 km upstream from the city, often cannot hold all the melted snow and ice in its headpond, so water may be released to the river downstream. The river in this study area is mainly surrounded by a relatively flat floodplain, which gets inundated by the water level rise.
The City of Gatineau (Figure 1c) is located on the northern bank of the Ottawa River. The city, with its 332,000 inhabitants and a total population of 1.3 million, is the fourth largest city in the province of Quebec. The Gatineau River goes through the center of the city, flowing from the lakes north of the Baskatong Reservoir in western Quebec and joining the Ottawa River in the city. The Ottawa River is regulated with more than 50 major dams and 13 principal reservoirs [48]. The area of interest contains the relatively flat area starting from the Chaudière Bridge, where hydrometric gauge station 02LA028 is located, continuing for nearly 6 km along the Ottawa River, and the Gatineau floodplain starting from the Ottawa River and continuing upstream for around 9 km. In May 2019, the city and adjacent regions experienced a severe flood that caused 111 homes to be evacuated and 923 to be damaged, and cost more than $3.4 million for government and insurance companies [49]. The flooding was a result of extreme snowmelt accompanied by heavy rain.
The water level in the rivers is observed by several hydrometric stations along each river every five minutes, and the observations are recorded on the Canadian Water Office website (wateroffice.ec.gc.ca). For this research, we used the 01AK003 gauge station located in downtown Fredericton and the 02LA028 station located in downtown Ottawa. Figure 2 shows the water levels in the five years of interest for the Saint John River in Fredericton and the Ottawa River.
The spring snowmelt and water level rise happen yearly with different degrees of intensity, depending on the volume and rate of precipitation, how rapidly the snowpack melts and the depth of the snowpack. In the two consecutive years 2018 and 2019, the Saint John River reached notable water levels of 8.216 m and 8.356 m respectively, which caused severe damage including the closure of about 50 streets in Fredericton and a total of 1000 evacuations across the province, with more than $23 million in damage to the province's infrastructure [50].
In the Gatineau area, the water level in 2019 reached 45.994 m which is nearly 3 m above the normal water level in the Ottawa River. This intense water level rise left 1212 flood victims and around 575 damaged homes [49].

Datasets
For this research, we used four different flood events of the Saint John River in Fredericton, which occurred in the years 2014, 2016, 2018 and 2019, and one flood event in the city of Gatineau in 2019 (Table 1). For accuracy assessment, we used optical satellite images of the areas of interest.
Pan-sharpened Landsat and multispectral Sentinel-2 images of the five flood events were collected from satellite archives, dated either at or close to the peak of the flood events. The Landsat images were pan-sharpened through the fusion of the panchromatic band (15 m) and multispectral bands (30 m) using the University of New Brunswick (UNB) pan-sharpening method available in the PCI Geomatica software. UNB pan-sharpening addresses significant image fusion challenges, including color distortion and dataset dependency, and was selected for the fusion of the panchromatic and multispectral bands of the Landsat 8 OLI images because it properly preserves the spectral signatures of the objects, as reported in Reference [51]. In other words, the pan-sharpening uses least squares to reduce color distortion and applies statistical techniques to remove dataset dependency [52]. More details can be found in Reference [53].
For each flood event, we selected a number of sample points whose label is set to either flooded or not-flooded, and we named the points according to the source of the labeling. When the points were labeled as flooded/not-flooded using satellite images, we call them ground truth points: satellite images capture one instance of a flood and clearly show the flooded pixels, so these labels are considered ground truth. For the Fredericton and Gatineau study areas, 700 (350 flooded and 350 not flooded) and 330 (165 flooded and 165 not flooded) ground truth points were collected from satellite images, respectively. The points were well distributed and spread along the floodplains, as shown in Figure 3. The ground truth points are used only for accuracy assessment in this study. On the other hand, we labeled the same points using a thresholded HAND model; we call these points pseudo ground truth points. Among the pseudo ground truth points, we selected 10% as pseudo training points and 90% as validation points.

Table 1. Flood events and the satellites used for accuracy assessment in each dataset.

For accuracy assessment, the ground truth points were selected using a newly developed Normalized Difference Water Index (NDWI) [54] produced from the satellite images (Figure 3). Prior to the NDWI calculation, a top-of-atmosphere (TOA) correction was applied to the images to remove atmospheric effects from the reflectance values, using the Semi-Automatic Classification Plugin in QGIS [55]. Equation (1) shows how the NDWI is calculated from the blue (Blue) and short-wave infrared (SWIR) bands (bands 2 and 6 for Landsat 8 OLI and bands 2 and 11 for Sentinel-2, respectively):

NDWI = (Blue - SWIR) / (Blue + SWIR)    (1)

Flooded and not-flooded ground truth points extracted from the NDWI were mainly selected within the floodplain to assess the capability of the HAND model and PS-RF in generating accurate flood maps.
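As a minimal illustration of Equation (1), the NDWI water mask can be sketched in a few lines of NumPy. The band arrays and the zero threshold used to separate water from non-water are illustrative assumptions, not values taken from this study.

```python
import numpy as np

# Hypothetical co-registered TOA reflectance bands (2x2 pixels):
# bands 2 and 6 for Landsat 8 OLI, bands 2 and 11 for Sentinel-2.
blue = np.array([[0.10, 0.30],
                 [0.05, 0.25]])
swir = np.array([[0.20, 0.05],
                 [0.15, 0.04]])

ndwi = (blue - swir) / (blue + swir)   # Equation (1)
water = ndwi > 0                       # positive NDWI taken as water here

print(water)
```

In practice the bands would be read from the corrected satellite rasters, and the water/non-water threshold would be chosen by inspecting the NDWI histogram rather than fixed at zero.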
Light Detection and Ranging (LiDAR) data were used to generate the HAND model and three other conditioning factors: Altitude, Slope and Aspect. For the Fredericton datasets, the LiDAR data were obtained from the geographic data catalog website of the Province of New Brunswick (GeoNB) [56]. The LiDAR data were collected over a period of nearly two months, from August 2nd, 2015 to September 28th, 2015, with an average point density of 6 points per square meter. The LiDAR data have a horizontal positional accuracy of ~0.257 m and a vertical positional accuracy of 0.275 m, which were tested using 216 GPS RTK survey points [57]. Using the available high-resolution LiDAR data, a Digital Terrain Model (DTM) with 1-m resolution was generated for the Fredericton area. For the Gatineau dataset, a bare earth High Resolution Digital Elevation Model (HRDEM) [58] with 1-m resolution was downloaded from the Government of Canada website. HRDEM is a national high-resolution DTM product generated from LiDAR across Canada. The downloaded HRDEM was used to generate the HAND model and the Altitude, Slope and Aspect conditioning factors for Gatineau, as was done for the Fredericton study area.

Conditioning Factors
In flood mapping using machine learning, it is crucial to use appropriate conditioning factors [59]. In a previous case study [44], the best conditioning factors for flood mapping in the Fredericton area were identified. In that study, the effectiveness of 12 different conditioning factors was investigated, including Altitude, Slope, Aspect, Distance from River, Land use/cover, Terrain Wetness Index (TWI), Terrain Roughness Index (TRI), Stream Power Index (SPI), Curvature, Plan Curvature, Profile Curvature and the HAND model. To find the best combination of conditioning factors, different combinations of the 12 layers were tested as input features of a R.F. classifier. The results confirmed that the combination of five layers, namely Altitude, Slope, Aspect, Distance from River and Land use/cover, provided the highest accuracy. It was found that the Altitude layer is one of the most important conditioning factors for flood mapping [44]. The results also demonstrated that using correlated conditioning factors reduces the accuracy of the R.F. model and that a higher number of conditioning factors does not necessarily improve the predictions [44,60]. Thus, for this research, the five best conditioning factors, namely Altitude, Slope, Aspect, Distance from River and Land use/cover (Figure 4), are used to help with the training of the R.F. model.
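The combination test described above can be sketched as follows. The data here are random stand-ins for the real sample points, and the exhaustive subset loop with cross-validation is one plausible way to implement the search, not necessarily the exact procedure of Reference [44].

```python
from itertools import combinations

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
factors = ["Altitude", "Slope", "Aspect", "Distance from River", "Land use/cover"]
X = rng.random((150, len(factors)))   # stand-in sample points x factors
y = (X[:, 0] < 0.5).astype(int)       # toy labels deliberately driven by Altitude

# Evaluate a R.F. on every subset of factors and keep the best-scoring one.
best_score, best_subset = -1.0, None
for r in range(1, len(factors) + 1):
    for subset in combinations(range(len(factors)), r):
        clf = RandomForestClassifier(n_estimators=25, random_state=0)
        score = cross_val_score(clf, X[:, subset], y, cv=3).mean()
        if score > best_score:
            best_score, best_subset = score, subset

print([factors[i] for i in best_subset])
```

With 12 candidate factors the exhaustive loop has 4095 subsets, so in practice the search is usually pruned, for example by discarding correlated factors first, as the cited study reports.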
For the Fredericton dataset, the Altitude layer, with 1-m spatial resolution, was generated from the LiDAR data. The Slope and Aspect conditioning factors were derived from the Altitude layer in ArcGIS Desktop 10.6.1 using the Spatial Analyst tool. Distance from River was generated with the Euclidean distance tool within ArcGIS from the river boundary shapefile obtained from the GeoNB website. The Land use/cover layer was generated by overlaying available polygon shapefiles provided on the GeoNB website. The polygons, obtained through aerial photography, provided seven classes: Urban, Forest, Grass Land, Bare Land, Roads, Water and Wetlands.
The generated conditioning factors for the Gatineau city were derived from HRDEM obtained from the federal governmental website. Conditioning factors were generated using the same tools mentioned earlier within ArcGIS software. The land use/cover layer for the Gatineau dataset was downloaded from Agriculture and Agrifood Canada [61].
The previously mentioned conditioning factors Altitude, Slope, Aspect and Distance from River had ordinal values that were normalized to the range 0 to 1 for better consistency in the R.F. [62]. All the conditioning factors covered the same areal extent with a grid cell size of 1 m², which resulted in rasters of 22,448 columns and 11,533 rows (~258 square kilometers) for the Fredericton region. The same conditions were applied for the City of Gatineau, where the raster files contained 10,000 columns and 10,000 rows (~100 square kilometers).
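The 0-to-1 normalization described above is a standard min-max rescaling, sketched here on a hypothetical Altitude raster; NaN cells stand in for no-data pixels.

```python
import numpy as np

def normalize(raster):
    """Min-max scale a raster to [0, 1], ignoring no-data (NaN) cells."""
    lo, hi = np.nanmin(raster), np.nanmax(raster)
    return (raster - lo) / (hi - lo)

altitude = np.array([[2.0, 6.0],
                     [10.0, 4.0]])   # hypothetical elevations (m)
print(normalize(altitude))
```

Each of the four ordinal layers would be rescaled independently in this way before being stacked as R.F. input features.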

HAND Model
The HAND model is used for several applications such as flood hazard mapping [29,63], landform classification [17] and remote sensing [64]. As shown in Figure 5, the HAND model is generated through a series of steps. One of the most important requirements for an accurate HAND model is a hydrologically conditioned DTM [17]. To obtain a properly hydrologically conditioned DTM, a total of 34 culverts and bridges were identified through aerial imagery and OpenStreetMap and burnt in using the Zonal Statistics tool and Raster Calculator within ArcMap (Step-1 in Figure 5). Then, the pit holes and depressions in the area were filled using the Pit Remove function from the Terrain Analysis Using Digital Elevation Models (TauDEM) suite within ArcGIS [65] (Step-2 in Figure 5). By burning in culverts and bridges and filling depressions, we ensured that our DTM does not contain erroneous flow directions. Then, the flow direction raster was computed using the D-Infinity Flow Directions function from TauDEM, in which the flow is calculated through a triangular facet [66] (Step-3 in Figure 5). The last required component of the HAND model is the stream raster, which is considered the elevation reference of the model; the elevation of all pixels is calculated from this layer, so it is necessary to ensure that the stream raster is accurate. The stream raster can be obtained through D-8 Flow Direction and thresholding the major stream based on Strahler stream order [29,67] but, since stream shapefiles representing all the streams were available in GeoNB and Ottawa Open Data [68], the geospatial databases for Fredericton and Gatineau respectively, we rasterized these shapefiles and used them as our stream rasters. The D-Infinity Distance Down function available in TauDEM is the last step, which generates the HAND model.

Different distance and statistical calculation methods are available within this tool; for the HAND model generation, the Vertical method was selected for Distance and the Average method for Statistics (Step-4 in Figure 5). Once the HAND model is generated through these steps, it can be used to delineate inundated areas based on the water level rise. To do that, the water level difference should be calculated from the gauge station observations and the HAND model. For more details on the HAND model, readers can refer to References [17,69].
To generate an early estimation of the flood extent, the HAND model is thresholded based on the water level recorded by the local river gauge. To properly threshold the HAND model, the difference between the water level at the time of the flood and at the time of LiDAR data collection should be calculated, and the model thresholded based on that difference [30]. As previously mentioned, for the Fredericton dataset the LiDAR data were collected over a period of approximately two months, from August 2nd, 2015 to September 28th, 2015, during which the river water level fluctuated from 1.121 m to 1.854 m. We used the average water level over that period, equal to 1.46 m, as the river water level at the time of LiDAR data collection. So, the thresholds for all the Fredericton datasets were calculated by subtracting the 1.46 m average from the water level at the time of the flood event. The reference water level for the Ottawa River was considered to be 42.04 m, which was obtained through gauge readings at the time of data collection. Figure 6 shows the HAND models generated for the five flood events discussed in this study.
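The threshold computation described above amounts to a simple subtraction, shown here with the water levels reported in this study; the small HAND raster at the end is hypothetical.

```python
import numpy as np

# Reference water levels at the time of elevation-data collection:
# 1.46 m average for the Fredericton LiDAR, 42.04 m for the Ottawa River.
FREDERICTON_REF = 1.46
OTTAWA_REF = 42.04

def hand_threshold(flood_level, reference_level):
    """Water-level difference (m) used to threshold the HAND model."""
    return round(flood_level - reference_level, 3)

print(hand_threshold(8.216, FREDERICTON_REF))   # Fredericton 2018 peak
print(hand_threshold(8.356, FREDERICTON_REF))   # Fredericton 2019 peak
print(hand_threshold(45.994, OTTAWA_REF))       # Gatineau 2019 peak

# Applying the threshold to a (hypothetical) HAND raster gives the
# preliminary inundation mask used for the pseudo labels.
hand = np.array([[0.5, 3.0],
                 [6.9, 8.1]])
flooded = hand <= hand_threshold(8.216, FREDERICTON_REF)
```

For the 2018 Fredericton flood this gives 8.216 - 1.46 = 6.756 m, so every pixel whose HAND value is at most 6.756 m is labeled flooded.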

Random Forest
R.F. is a supervised machine learning model and one of the most robust classifiers; it makes decisions based on the average of the results of a multitude of decision trees [70]. The classifier fits an arbitrary number of trees on different features of the datasets and, by averaging multiple predictions, controls over-fitting. R.F. was selected because of its capability to handle noisy data, its ability to take features of different natures and its ability to rank the features by their degree of importance [70,71]. The feature with the highest degree of importance is the major splitter in all the trees.
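A minimal sketch of the R.F. behaviour described above, using Scikit-Learn; the data are a random stand-in, with labels deliberately driven by the first feature so that the importance ranking is visible.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.random((300, 5))            # stand-in for the 5 conditioning factors
y = (X[:, 0] < 0.5).astype(int)     # toy flooded/not-flooded labels

# An ensemble of 100 trees; each prediction is the averaged vote of the trees.
clf = RandomForestClassifier(n_estimators=100, random_state=1)
clf.fit(X, y)

# Per-feature importance scores; the label-driving first feature dominates.
print(clf.feature_importances_.argmax())
```

`feature_importances_` is the built-in Scikit-Learn ranking referred to in the text; the most important feature here plays the role that Altitude plays in the real conditioning-factor stack.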


Pseudo Supervised Random Forest (PS-RF)
Figure 7 demonstrates the workflow of the proposed PS-RF model. In PS-RF, we used the HAND model predictions as pseudo labels to train an R.F. classifier in Python with the Scikit-Learn library [72]. However, due to the deficiencies of the HAND model discussed previously, the pseudo labels obtained from the HAND model are not 100% accurate. Therefore, we need to detect the outliers and remove them so that the R.F. algorithm is trained with the most reliable pseudo training points. For selecting the best pseudo training points and removing the outliers, the RANSAC paradigm was used. Through the RANSAC process, the HAND model is used as a cost function for R.F. optimization. By iteratively training R.F. on randomly selected pseudo training points, the best training subset is selected; as is evident from Figure 8, different training subsets achieve different accuracies. In this way, the outliers in the training data provided by the HAND model are identified.

First, the dataset, comprising 700 pseudo ground truth points for Fredericton and 330 pseudo ground truth points for Gatineau, with values from the conditioning factors and pseudo labels (flooded and not-flooded labels from the HAND model), is randomly split into two subsets of 10% training (in fact, pseudo training) and 90% validation. The R.F. algorithm is then trained on the pseudo training data and used to predict the validation subset. The overall accuracy of the validation subset is checked against the HAND model. This accuracy is stored, and the process is repeated 1000 times to ensure that different combinations of pseudo training points from the dataset are used to train R.F. Over the iterations, the pseudo training subset is updated each time the process reaches a higher accuracy over the validation subset than previous predictions.
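A minimal sketch of this RANSAC-style selection loop, using synthetic stand-in data and fewer iterations than the 1000 used in the paper (variable names are illustrative, not the authors' code):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.random((700, 5))                  # 700 points x 5 conditioning factors
y_hand = (X[:, 0] < 0.5).astype(int)      # pseudo labels from the HAND model

best_acc, best_idx = -1.0, None
for i in range(100):                      # the paper iterates 1000 times
    # Randomly split indices: 10% pseudo training, 90% validation.
    train_idx, val_idx = train_test_split(
        np.arange(len(X)), train_size=0.10, random_state=i)
    rf = RandomForestClassifier(n_estimators=50, random_state=0)
    rf.fit(X[train_idx], y_hand[train_idx])
    # Score agreement with the HAND pseudo labels on the validation subset.
    acc = rf.score(X[val_idx], y_hand[val_idx])
    if acc > best_acc:                    # keep the best pseudo-training subset
        best_acc, best_idx = acc, train_idx

# Retrain on the best-scoring subset, as in the final PS-RF step.
final_rf = RandomForestClassifier(n_estimators=50, random_state=0)
final_rf.fit(X[best_idx], y_hand[best_idx])
```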

Once the 1000 iterations were completed, the best pseudo training subset was used to train the R.F. model. The model hyperparameters, including the number of trees (n_estimators), the minimum number of samples required to split a node (min_samples_split), the minimum number of samples at a leaf (min_samples_leaf) and the maximum depth of the trees (max_depth), were optimized through 5-fold cross-validation. Then, the R.F. model trained on the best-achieved pseudo training subset was used to predict the ground truth points of the whole dataset.
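The cross-validated search over these four hyperparameters might look as follows with Scikit-Learn's GridSearchCV; the grid values and the synthetic data are illustrative assumptions, not the paper's settings:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
X = rng.random((150, 5))
y = (X[:, 0] < 0.5).astype(int)   # stand-in for the best pseudo training subset

# The four hyperparameters named in the text; grid values are hypothetical.
grid = {
    "n_estimators": [50, 100],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2],
    "max_depth": [5, None],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), grid, cv=5)
search.fit(X, y)
best_params = search.best_params_
```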

Results
In this study, different accuracy assessment measures were applied to compare the performance of PS-RF to that of a thresholded HAND model. To assess the accuracy of PS-RF, we used the true positive rate (TPR), true negative rate (TNR), false positive rate (FPR), false negative rate (FNR), overall accuracy, Cohen's Kappa Coefficient [73] and the Matthews Correlation Coefficient (MCC) [74], whose formulas are given in Table 2.
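All of these measures derive from the four confusion-matrix counts; a small self-contained sketch (the toy labels are illustrative, and Scikit-Learn supplies Kappa and MCC directly):

```python
from sklearn.metrics import cohen_kappa_score, matthews_corrcoef

y_true = [1, 1, 1, 1, 0, 0, 0, 0]   # 1 = flooded, 0 = not flooded
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

# The four base parameters.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

tpr = tp / (tp + fn)                  # sensitivity
tnr = tn / (tn + fp)                  # specificity
fpr = fp / (fp + tn)
fnr = fn / (fn + tp)
oa = (tp + tn) / (tp + tn + fp + fn)  # overall accuracy
kappa = cohen_kappa_score(y_true, y_pred)
mcc = matthews_corrcoef(y_true, y_pred)
```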

Overall accuracy represents the closeness of predictions to their true classes [75]. Due to recent concerns about Cohen's Kappa Coefficient and its undesired behavior, discussed in Reference [76], the MCC metric was also calculated to ensure the validity of our evaluation. MCC values close to 1 represent perfect agreement, a value of 0 is interpreted as random prediction and a value of −1 as completely opposite predictions. The calculated indices are all based on the four parameters true positive (T.P.), true negative (T.N.), false positive (F.P.) and false negative (F.N.). In Cohen's Kappa Coefficient, P0 and Pe are the relative observed agreement and the hypothetical probability of chance agreement, respectively.

Table 2. Formulas related to accuracy assessment section.

Parameter Name               Formula
Sensitivity (TPR)            TP / (TP + FN)
Specificity (TNR)            TN / (TN + FP)
False Positive Rate (FPR)    FP / (FP + TN)
False Negative Rate (FNR)    FN / (FN + TP)
Overall Accuracy             (TP + TN) / (TP + TN + FP + FN)
Cohen's Kappa Coefficient    (P0 − Pe) / (1 − Pe)
MCC                          (TP·TN − FP·FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))

Figure 8 presents the model loss for the five flood events. The vertical axis shows how the prediction accuracy over the validation data improved as diverse subsets of the pseudo ground truth points provided by the HAND model were selected; the horizontal axis is the iteration number. In the 2014 dataset, the accuracy of the R.F. model over the validation data improved by 0.06 through the iteration process. In 2016, an improvement of nearly 0.04 was achieved. In 2018, the accuracy was enhanced by 0.03 by shuffling and randomly selecting training data among the pseudo ground truth points. The 2019 Fredericton predictions improved by 0.025, and in the Gatineau flood event of 2019 the improvement was around 0.04.
The achieved overall accuracy, which represents the closeness of predictions to their true classes, shows that PS-RF improved on the HAND model in all five flood events (Figure 9). In the 2014 Fredericton flood event, PS-RF reached an overall accuracy of 94.23% compared to 88.86% for the HAND model, and its Cohen's Kappa and MCC were both ~11% higher than those of HAND. In the 2016 Fredericton flood event, PS-RF improved on the HAND model by 5%, with 94.14% accuracy for PS-RF versus 89.14% for HAND; the Cohen's Kappa and MCC metrics were 10% and ~9.5% higher for PS-RF, respectively. The 2018 predictions were nearly similar, but PS-RF still obtained the higher overall accuracy, 88.43% versus 85.57% for the HAND model; in 2018, PS-RF had a Cohen's Kappa of ~77%, ~6% higher than HAND, and an MCC score of ~79%, ~7% higher than HAND. In the 2019 Fredericton flood event, PS-RF reached an overall accuracy of ~95% compared to ~90% for the HAND model, with ~11% higher Cohen's Kappa and MCC scores. In the Gatineau 2019 flood event, the overall accuracy of PS-RF was 94.85% while the HAND model reached 93.03%; PS-RF's Cohen's Kappa and MCC scores were ~3.5% and ~3% higher than HAND's, respectively. To analyze the accuracy of the HAND model and PS-RF in detecting flood extent, the TPR, FPR, TNR and FNR indices were also calculated; they are presented in Figure 10 and discussed further in the Discussion section.


Discussion
PS-RF was tested on five different flood events and achieved higher accuracy than the HAND model in all of them. In flood mapping, it is very important to provide precise estimations: the model should neither over- nor underestimate the flood extent. To analyze the performance of the HAND model in estimating the extent of the five tested flood events, different accuracy metrics were calculated and are discussed below.
In the Fredericton 2014 flood event, the HAND model correctly predicted ~85% of the flooded ground truth points while PS-RF achieved ~98%. For the not-flooded points, the HAND model performed slightly better, with a TNR of ~93% compared to ~91% for PS-RF. This means PS-RF was better at detecting flooded points, but the HAND model was better at detecting not-flooded points (Figure 10a). In the Fredericton 2016 flood event, PS-RF performed better in predicting both flooded and not-flooded points, with ~9% and ~1% higher TPR and TNR values, respectively (Figure 10b). The Fredericton 2018 flood event produced nearly similar results for PS-RF and the HAND model, with PS-RF achieving a ~9% higher TPR and a ~3% lower TNR (Figure 10c). In the Fredericton 2019 flood event, PS-RF obtained a ~11% higher TPR and a ~0.3% lower TNR (Figure 10d). In the Fredericton 2016, 2018 and 2019 floods, PS-RF performed better than the HAND model in detecting both flooded and not-flooded ground truth points. In the Gatineau 2019 flood event, PS-RF reached a TPR of ~93% compared to ~87% for the HAND model; however, the HAND model performed better in detecting not-flooded points, with a ~2% higher TNR (Figure 10e). The HAND model can estimate the inundated areas given the depth of water and can therefore produce a dynamic flood mapping model; however, since it is a simplified model, it suffers from limited accuracy, as reported in different studies [30]. In this study, we proposed a new model that achieved better results than the HAND model. We trained an R.F. model that benefits from different conditioning factors, that is, altitude, slope, aspect, distance from river and land use/cover, to adjust the flood borders produced by the HAND model. Thus, the model combines the HAND results with the conditioning factors to improve the accuracy of the final flood maps.
In PS-RF, the predictions of the HAND model are used as presumptive labels of the ground truth points to train the R.F. model on the five conditioning factors of altitude, slope, aspect, distance from river and land use/cover. However, as concluded previously, the HAND model estimations contain errors that need to be eliminated from the pseudo training subset. To remove the HAND model errors, the RANSAC paradigm was employed. The philosophy of RANSAC is to use as little data as possible to train a model while attempting to remove the errors [46]. Thus, to find the optimum percentage of pseudo training points, different percentages, namely 10%, 15%, 25%, 50%, 70% and 90%, were tested on the 2016 Fredericton and 2019 Gatineau datasets. These pseudo training points were used to train the R.F. model. Figure 11 shows the overall accuracy of R.F. with the different training percentages over the test subset; for comparison, the HAND model accuracy is displayed in the figure as well.
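This percentage sweep can be sketched as a simple loop; the synthetic data below stand in for the real conditioning-factor samples (the 330-point size merely echoes the Gatineau dataset):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
X = rng.random((330, 5))                  # e.g. 330 points x 5 conditioning factors
y = (X[:, 0] + 0.3 * X[:, 1] < 0.7).astype(int)   # synthetic pseudo labels

# Train with each pseudo-training fraction and record overall accuracy
# on the held-out remainder.
accuracies = {}
for frac in (0.10, 0.15, 0.25, 0.50, 0.70, 0.90):
    Xtr, Xte, ytr, yte = train_test_split(
        X, y, train_size=frac, random_state=0, stratify=y)
    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xtr, ytr)
    accuracies[frac] = rf.score(Xte, yte)
```

Plotting `accuracies` against the training fraction reproduces the kind of comparison shown in Figure 11, with the HAND model accuracy as a horizontal reference line.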
Since the datasets for the Fredericton flood events include the same number of ground truth points, the different pseudo training subsets were tested on the Fredericton 2016 flood event. As is evident in Figure 11a, the highest overall accuracy, 92.86%, was obtained using 10% pseudo training points; as the training subset was increased, the overall accuracy became more similar to that of the HAND model.
In the Fredericton dataset, a pseudo training subset of 70% gave an overall accuracy of 91.29%, and 90% gave 91.14%, which is similar to the overall accuracy of the HAND model. In the Gatineau dataset, the highest overall accuracy, 93.94%, was obtained with 15% training data. Increasing the pseudo training percentage to 25% yields an overall accuracy of 93.48%, slightly higher than the HAND model's 93.03%. As shown in Figure 11b, the other training amounts achieved overall accuracies fairly similar to that of the HAND model, while 15% attained the highest rate. These results show how the HAND model can provide pseudo training data for R.F. to model flood events: through this integration, the required training data for R.F. were obtained and the flood events were modeled with higher accuracy than by the HAND model alone.
In this study, we proposed to improve the performance of the HAND model in flood mapping by integrating it with one of the most robust machine learning models, R.F. The R.F. model has been used in different studies, including [19,35], to generate flood susceptibility maps from the geospatial information of abundant inundated locations used as training data. In those studies, the R.F. model makes predictions based on data from previous flood events; the model therefore provides a general assessment of areas at risk of flooding by using datasets from multiple flood events. However, the required training data for R.F. may not always be available for all locations, which restricts the use of R.F. in this context. Hence, we proposed to use a simplified conceptual model to provide the training data for R.F. to dynamically identify the inundated areas.

Conclusions
Near-real-time flood mapping requires efficient and accurate estimations. Previous flood mapping techniques either lack the necessary precision or are too costly and intricate for rapid implementation. In this research, a new model, named PS-RF, was developed for flood mapping through the integration of a simplified conceptual model, HAND, and a robust machine learning model, R.F. The model was tested on five different flood events that occurred in Fredericton and Gatineau and achieved higher overall accuracy than the HAND model in all of them. First, the HAND model was used to simulate the extent of the flood events based on the river water level. To improve the accuracy of the flood maps obtained from the HAND model, the RANSAC methodology was used to filter outliers from the pseudo ground truth points and retain the most reliable ones. Those most certain pseudo ground truth points were then used as pseudo training points for the R.F. model, and the trained R.F. model was used to estimate the overall flood extent in the area. The proposed model obtained higher accuracy than the HAND model in all five years of interest. Moreover, by integrating the HAND model and R.F., the required training data for the R.F. model were provided by the HAND model itself. This approach can be implemented in areas with different climates and topographies where a high resolution DTM is available; however, different conditioning factors might be selected for other regions depending on their characteristics. In future work, we will consider how the feature selection can be further automated using different optimization algorithms, resulting in a more complete flood mapping procedure.