GFPLAIN and Multi-Source Data Assimilation Modeling: Conceptualization of a Flood Forecasting Framework Supported by Hydrogeomorphic Floodplain Rapid Mapping

Hydrologic/hydraulic models for flood risk assessment, forecasting and hindcasting have been greatly supported by the rising availability of increasingly accurate and high-resolution Earth Observation (EO) data. EO-based topographic and hydrologic open geo data are, nowadays, available on large scales. Data Assimilation (DA) models allow Early Warning Systems (EWS) to produce accurate and timely flood predictions. DA-based EWS generally use real-time river flow observations and 1D hydraulic models to identify potential inundation hot spots. Detailed high-resolution 2D hydraulic modeling is usually not used in EWS because of the computational burden and the numerical complexity of injecting multiple spatially distributed sources of flow observations. In recent times, DEM-based hydrogeomorphic models have demonstrated their ability to characterize river basin hydrologic forcing and floodplain domains, providing data-parsimonious opportunities for data-scarce regions. This work investigates the use of hydrogeomorphic floodplain terrain processing for optimizing the ability of DA-based EWSs to use diverse distributed flow observations. A flood forecasting framework with novel applications of hydrogeomorphic floodplain processing is conceptualized for empowering flood EWSs in preliminarily identifying the computational domain for hydraulic modeling, rapid flood detection using satellite images, and filtering geotagged crowdsourced data for flood monitoring. The proposed flood forecasting framework supports the development of an integrated geomorphic-hydrologic/hydraulic modeling chain for a DA that values multiple sources of observation.
This work investigates the value of floodplain hydrogeomorphic models in tackling the major challenges of DA for EWS, with specific regard to computational efficiency and the lack of data in ungauged river basins, towards improved flood forecasting able to use advanced hydrodynamic modeling and to inject all available sources of observations, including flood phenomena captured by citizens.


Introduction
DEM-based hydrogeomorphic models are fast and parsimonious tools aimed to identify floodplain boundaries. Geomorphic models define floodplains as those riparian areas underlying maximum flow levels associated with erosion and deposition processes linked to historical floods. Some of these models are based on the application of geomorphic laws [1,2] such as Nardi et al., 2006 [3] and Samela et al., 2017 [4] and were developed and implemented extensively at the basin, continental and global scale [5,6].
Hydrogeomorphic models have been used for different purposes, such as the delineation of flood-prone areas [7], the accuracy assessment of Digital Elevation Models (DEMs) [8], the assessment of the impact of levees on wetlands [9], and the investigation of floodplain connectivity patterns [10].
Moreover, geomorphic classifiers can also be integrated into Machine Learning techniques for rapid delineation of the maximum flood extent. For example, Tavares da Costa et al., 2020 [11] developed linear stepwise and random forest regressions trained with flood descriptors using geomorphic and climatic/hydrologic catchment characteristics to envelope flood extents, obtaining good results with respect to the standard flood hazard maps.
Hydrogeomorphic floodplain modeling has proven to be a valid complement to standard flood hazard zoning based on physically-based hydrologic/hydraulic models. The simplicity of reading breaks in slope of the floodplain morphology along fluvial corridors may constitute a valid surrogate for detailed simulations integrating rainfall-runoff and flow propagation modeling for flood hindcasting, forecasting and hazard mapping, especially in ungauged basins [12].
The hydrogeomorphic dissection of floodplains, which distinguishes flood-prone zones from surrounding hillslopes, defines the boundaries within which fluvial processes occur and may represent valuable information for improving flood monitoring and mapping. Geospatial data filtering within floodplain zones can be crucial where timely flood observations are needed for supporting flood Early Warning Systems. We, thus, argue that hydrogeomorphic floodplain mapping can be integrated into Data Assimilation (DA) frameworks, where flood model inputs, state variables and/or parameters are updated in real-time or near-real-time during a flood event according to observations of flow levels usually derived from stage gages [13] or satellite altimetry [14,15]. A preliminary screening of the areas where flood wave propagation may occur may be particularly important to value the increasing availability of Earth Observation (EO) data at different spatial and temporal scales, from satellite images to local observations taken by citizens with smartphones. The use of new sources of information whose spatial location is not necessarily known in advance (unlike standard flow gauges) is particularly important for flood forecasting models in DA frameworks.
Some examples of research investigations on this topic were recently developed. For example, satellite-derived flood extents were used as observation for updating the Data Assimilation framework based on 2D hydraulic models [16][17][18][19][20][21].
Moreover, geotagged social media content has proven potentially useful for gaining a quicker near-real-time understanding of the location, the timing, as well as the causes and impacts of floods [22]. Therefore, crowdsourced information on flow depths has started to be investigated for updating hydrologic and hydraulic models to improve flood forecasting [23][24][25][26].
Satellite-derived flood extents and geotagged crowdsourced datasets can provide crucial information during a flood event and can be considered complementary. In fact, on one side, satellite-derived flood extent datasets are increasing in temporal frequency because of the launching of new satellite missions and the integration of different constellations among different missions [27]. Moreover, the satellite flood extent accuracy is increasing because of the increasing accuracy of satellite sensors and the refinement of flood detection algorithms [28]; however, these datasets have strong limitations in small basins with flood responses faster than the satellite revisit time and in urban environments because of issues related to radar layover, foreshortening, shadows and double backscatter due to buildings and man-made structures [27,29]. On the other hand, crowdsourced data, even if affected by several issues related to accuracy and credibility of the users [30], location and timing errors [25], can provide dense and distributed information even in small ungauged basins and especially in an urban environment, filling the potential gaps left by satellite imagery.
To date, a flood early warning and forecasting framework that integrates hydrogeomorphic rapid flood modeling is not available, especially for supporting more advanced physically based flood inundation models in DA frameworks.
In this study, we propose a conceptual framework for integrating hydrogeomorphic modeling to: 1. Support fast hydrologic modeling for the real-time identification of areas related to critical nodes of small basins whose stream network is not completely covered by the available standard flood maps; 2. Improve multi-source data assimilation models for near-real-time flood forecasting and mapping.
Specifically, we propose the integration of the GFPLAIN model [3,5] in a DA framework both to bound the physically-based flow propagation processes and to mask the geospatial information adopted as real-time observations for updating the flood forecasting model. We identified satellite-derived flood extents and geotagged crowdsourced data as examples of intermittent and spatially distributed observations that can be ingested for updating the flood forecasting model.
The aim of this methodology is to improve the responsiveness and enrich the set of information of the DA framework, reducing the computational time of both the physical hydraulic model and the algorithms aimed to retrieve intermittent and spatially distributed information for the model updating.

The Methodology: Integrating Hydrogeomorphic and Data Assimilation for Flood Forecasting
The following subsections describe how the hydrogeomorphic model can be adopted as a supporting method for improving near real-time flood forecasting. The working hypothesis is that the physically based hydrologic/hydraulic flood routing model can be updated, by means of the DA, by intermittent and spatially distributed observations whose location is unknown a priori such as satellite-derived flood extents and geotagged crowdsourced contents. As a result, preliminary knowledge of potential flood extents and river basin hydrologic forcing spatial and temporal distributions may support the filtering and use of unconventional flood observations.
In this section, we first describe the hydrogeomorphic floodplain mapping method (Section 2.1). Then, we explore the potential benefits of adopting hydrogeomorphic floodplain mapping integrated with simplified hydrologic modeling for identifying critical areas in small ungauged basins (Section 2.2). Finally, the specific steps at which hydrogeomorphic modeling is applied in a DA framework for large-scale flood forecasting are described in Section 2.3.
The methodologies of the Data Assimilation modeling, i.e., the application of the sequential ensemble-based methods with Monte Carlo approaches, the generation of the probability density functions (pdf) of the model errors and observation errors, and the updating of the model states, model inputs or model parameters, are briefly described in Section 2.3.4.

The Hydrogeomorphic Model GFPLAIN
The GFPLAIN algorithm developed by Nardi et al., 2006, 2019 [3,5] is based on the implementation of well-known scaling laws [1,2] relating the basin contributing area (A) in a specific stream network section with the water energy level (d) related to a high magnitude flood event, with the following equation:

$$d = a A^{b} \tag{1}$$

where a and b are scaling law parameters dependent on the geomorphic and climatic basin settings. These parameters can be obtained in a GIS environment considering the resolution of the DTM and the morphometric and climatic setting of the basin in the study area [31].
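To make the scaling law concrete, the following minimal sketch evaluates d = a A^b for a few contributing areas and then flags cells lying below the resulting water level. The parameter values, the function names and the simple elevation test are assumptions made for illustration only; they are not the calibrated GFPLAIN parameters or its actual implementation.

```python
import numpy as np

def flow_depth(contributing_area, a, b):
    """Scaling law d = a * A**b, evaluated element-wise."""
    area = np.asarray(contributing_area, dtype=float)
    return a * np.power(area, b)

def is_floodplain(cell_elevation, channel_elevation, d):
    """Simplified test (assumption): a cell is flagged as floodplain when its
    elevation does not exceed the channel elevation plus the depth d."""
    return np.asarray(cell_elevation, dtype=float) <= channel_elevation + d

# Placeholder parameters, NOT calibrated values
a, b = 0.01, 0.3
areas = np.array([1.0, 100.0, 10000.0])
depths = flow_depth(areas, a, b)

# Two hypothetical cells draining to the section with the largest area
flags = is_floodplain([100.05, 100.5], channel_elevation=100.0, d=depths[-1])
```

Because b is positive, d grows with the contributing area, so the flagged corridor widens moving downstream.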

The Hydrogeomorphic Modeling GFPLAIN for Supporting Small-Scale Flood Hazard Mapping and Forecasting
DEM-based hydrogeomorphic models have been used effectively at the basin scale to extensively map riparian corridors of major rivers and tributaries, from upstream to coastal fluvial domains. A spatial comparison between GFPLAIN datasets and standard flood hazard maps (SFHM) was performed in previous studies at the basin [31,32] and continental [5] scale. These studies demonstrated that hydrogeomorphic floodplain datasets, such as GFPLAIN, usually cover a larger portion of the stream network than SFHM [12]. GFPLAIN is able to identify further potential flood hazard areas, outside the areas covered by SFHM, that may be the source of significant flood risk and, thus, require specific attention (e.g., river confluences with small-scale basins of major tributaries, complex floodplains impacted by road/railroad network infrastructures). Figure 1 shows a schematic sample comparing the current SFHM available for a small basin (Rio Galeria, tributary of the Tiber River, central Italy) and the GFPLAIN dataset applied with the highest available spatial resolution (5 m cell size) DEM (provided by Regione Lazio). The flood hazard mapping of the Tiber River and its tributaries was extensively analyzed in recent scientific literature [33,34]. Stream network initiation and the power law parameters of Equation (1) were determined as a function of the DTM resolution and morphometric settings according to Annis et al., 2019 [31]. The lengths of the stream network covered by SFHM and GFPLAIN are 42.4 and 137.5 km, respectively. Moreover, the GFPLAIN dataset shows that even if the main channel of the Galeria river was already analyzed with standard hydrologic/hydraulic modeling, the floodplain width could be even larger, especially at the confluence of many tributaries that are still not modeled with standard hydrologic/hydraulic modeling.
In the proposed conceptualization, the hydrogeomorphic floodplain dataset is adopted as a mask to identify areas at critical nodes of the stream network modeled with a simplified real-time lumped hydrologic model. The adopted geomorphic hydrologic modeling (WFIUH) is extensively documented in the literature [35] and was already applied for detecting critical nodes in small ungauged basins by Nardi et al., 2018 [34]. Figure 2 illustrates a schematic of an application of a lumped hydrological model (with a hydrogeomorphic Instantaneous Unit Hydrograph, WFIUH) applied in each upstream node of a stream network where the floodplain dataset bounds the extensions of critical nodes (even outside the available SFHM) at the stream confluences or at culvert/bridge intersections. This aspect is crucial, since the exposure of critical areas plays an important role, even more than vulnerability, in the magnitude of losses and damages estimation [36]. In this specific case study, a right tributary of the main Galeria river needs further analyses beyond the available standard flood maps, because of the presence of critical areas such as a road crossing close to an oil refinery deposit settled in a floodplain area (yellow star in Figure 1). This is confirmed by recent evidence of damages in the above-mentioned critical areas due to a flood event in January 2014.

GFPLAIN Hydrogeomorphic Modeling for Supporting DA in Large-Scale Flood Forecasting
The flowchart in Figure 3 schematizes how the floodplain dataset is used as a computational domain for: 1. identifying the maximum extension of the hydraulic model; 2. masking the flood detection algorithm applied on the satellite image; 3. filtering geotagged information from crowdsourced datasets related to the flood event. The hydrogeomorphic floodplain dataset is used both in the forecasting step and in the observation steps that precede the updating of the model states, inputs and parameters (updating step of the DA model). The specific phases in which the hydrogeomorphic model is integrated into the DA framework are illustrated in the following subsections.

Definition of the Hydraulic Model Domain Using GFPLAIN
The choice of the hydraulic modeling domain is usually entrusted to the experience of the flood modeler and/or based on the extent of the available high magnitude flood hazard maps. However, flood hazard maps, if available, may not consider floodplain portions beyond the flood protection structures (e.g., levees) that should be included to simulate, for example, unexpected inundations due to levee breaching or overtopping. Moreover, Figure 1 shows that the adoption of an SFHM as a reference hydraulic domain could underestimate the actual extension of the model boundaries where flood damages could occur at the confluence of small ephemeral tributaries.
In addition, advanced physical models for flood forecasting and mapping (e.g., 2D or Quasi-2D hydraulic models) are usually computationally demanding and need to be as fast as possible to meet the need for real-time or near-real-time response of DA frameworks. Coarse-resolution hydraulic models with simplified river channel geometry [37,38] can help to reduce the computational burden, but their performance can be considered acceptable only with large rivers and valley-filling flood events [39]. Small-scale domains, on the other hand, require high-resolution computational domains with high accuracy DTMs [40]. DA frameworks are often implemented with ensemble-based Monte Carlo methods (e.g., Ensemble Kalman Filter, EnKF, and Particle Filter, PF) requiring simultaneous simulations to represent the pdf of the forecasting model errors. Therefore, the hydrogeomorphic models can effectively provide the preliminary computational domains, excluding hillslope areas where channelized flood propagation does not occur.
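The saving obtained by excluding hillslope cells can be illustrated with a minimal sketch that clips a DEM to a floodplain mask and counts the cells that no longer need to be simulated across an ensemble. The toy arrays, the function name `clip_to_floodplain` and the ensemble size are hypothetical; a real application would read the GFPLAIN raster and the DTM from geospatial files.

```python
import numpy as np

def clip_to_floodplain(dem, floodplain_mask):
    """Keep DEM values inside the floodplain mask; set hillslope cells to NaN."""
    dem = np.asarray(dem, dtype=float)
    mask = np.asarray(floodplain_mask, dtype=bool)
    return np.where(mask, dem, np.nan)

# Toy 3 x 3 DEM (m) and floodplain mask (1 = floodplain, 0 = hillslope)
dem = np.array([[12.0, 11.0, 15.0],
                [10.5, 10.0, 14.0],
                [10.2,  9.8, 13.5]])
floodplain_mask = np.array([[0, 1, 0],
                            [1, 1, 0],
                            [1, 1, 0]], dtype=bool)

clipped = clip_to_floodplain(dem, floodplain_mask)
active_cells = int(np.count_nonzero(~np.isnan(clipped)))

# Each ensemble member is a separate simulation, so the saving scales with the ensemble size
ensemble_size = 50  # hypothetical value
cells_saved = ensemble_size * (dem.size - active_cells)
```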

Masking Satellite Images Using GFPLAIN for Flood Detection Algorithms
The ensemble-based methods (EnKF and PF) require Monte Carlo approaches to simulate the pdf of the observation errors adopted to update the model states, inputs and parameters. The flood extent can be used as a direct observation in DA frameworks [16] or to develop a cost-function of the internal model states [17,19]. Generating the pdf of the observation errors requires several simultaneous applications of the flood detection algorithms to multispectral or SAR images. Therefore, the hydrogeomorphic floodplain dataset can be adopted for reducing both the computational domain of these algorithms and the extent of potential overestimation errors due to radar shadow (for SAR images) or clouds (for multispectral images) [28] outside the flood-prone areas, thus avoiding unwanted overestimation of the observed flood extent.
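As an illustration of the masking step, the sketch below applies a plain backscatter threshold to a toy SAR tile and then restricts the detection to the floodplain mask. Simple thresholding is an assumption standing in for the flood detection algorithms cited above, and the threshold value and array contents are placeholders.

```python
import numpy as np

def detect_water(backscatter_db, threshold_db):
    """Flag pixels darker than the threshold (open water has low SAR backscatter)."""
    return np.asarray(backscatter_db, dtype=float) < threshold_db

def detect_flood_in_floodplain(backscatter_db, floodplain_mask, threshold_db):
    """Same detection, restricted to pixels inside the floodplain mask."""
    return detect_water(backscatter_db, threshold_db) & np.asarray(floodplain_mask, dtype=bool)

# Toy SAR tile (dB); pixel [0, 2] mimics a radar shadow on a hillslope
backscatter = np.array([[-20.0,  -8.0, -22.0],
                        [-21.0, -19.0,  -7.0],
                        [ -6.0, -20.0,  -9.0]])
floodplain_mask = np.array([[1, 1, 0],
                            [1, 1, 0],
                            [0, 1, 1]], dtype=bool)

threshold_db = -18.0  # placeholder value
unmasked = detect_water(backscatter, threshold_db)
flooded = detect_flood_in_floodplain(backscatter, floodplain_mask, threshold_db)
```

In this toy tile the dark hillslope pixel is flagged by the unmasked detection but discarded once the floodplain mask is applied.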

Filtering of the Crowdsourced Observations
The retrieval of crowdsourced observations for flood monitoring is affected by several limitations related to the accuracy of the information (e.g., flow levels/velocities provided by untrained people, location and timing uncertainties [25,41]) and to mining unstructured data [42,43]. Besides the issues related to the choice of semantic tags [44], spatial filtering is crucial for gathering useful flood-related information, for example by excluding water level reports outside the floodplain area that are related to other causes such as pluvial floods. Crowdsourced information can be gathered automatically through the Application Programming Interface (API) of the social media platforms by selecting keywords related to specific flood events. Examples of the adoption of geotagged, semantically filtered Twitter or Flickr contents can be found at a local [44,45] and global [46] scale. Once gathered, water-related information can be further analyzed to extract quantitative observations manually or automatically, for example by adopting deep learning techniques applied to images or videos [47].
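Spatial filtering of geotagged reports against a raster floodplain mask can be sketched as follows. The `Report` structure, the grid georeferencing (upper-left corner and square cell size in projected coordinates) and the sample reports are hypothetical; no social media API is called here.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Report:
    x: float              # projected easting of the geotag (m)
    y: float              # projected northing of the geotag (m)
    water_depth_m: float  # depth reported or extracted from the content

def filter_reports(reports, floodplain_mask, x_min, y_max, cell_size):
    """Keep only the reports whose geotag falls in a floodplain cell."""
    mask = np.asarray(floodplain_mask, dtype=bool)
    n_rows, n_cols = mask.shape
    kept = []
    for r in reports:
        col = int(np.floor((r.x - x_min) / cell_size))
        row = int(np.floor((y_max - r.y) / cell_size))
        inside_grid = 0 <= row < n_rows and 0 <= col < n_cols
        if inside_grid and mask[row, col]:
            kept.append(r)
    return kept

floodplain_mask = np.array([[0, 1, 0],
                            [1, 1, 0],
                            [1, 1, 0]], dtype=bool)
reports = [
    Report(15.0, 25.0, 0.4),  # floodplain cell
    Report(25.0,  5.0, 0.2),  # hillslope cell, e.g. a pluvial flooding report
    Report( 5.0, 12.0, 0.6),  # floodplain cell
    Report(45.0, 15.0, 0.3),  # outside the grid
]
kept = filter_reports(reports, floodplain_mask, x_min=0.0, y_max=30.0, cell_size=10.0)
```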
The proposed approach adopts the hydrogeomorphic floodplain dataset as the first spatial filter to select geotagged crowdsourced information. Note that this spatial mask is meant to be integrated into further manual or automatic data filtering related to the geotagged social media contents.

Scheme of a DA Approach for Flood Forecasting
Figure 4 illustrates an example of a DA approach for near-real-time flood forecasting that updates the model states, inputs and parameters with observations from satellite-derived flood extents, bounded by the hydrogeomorphic floodplain dataset. Examples of DA frameworks adopting satellite-derived flood extents can be found in the recent scientific literature [16,17,19,20,21]. In this section, we focus on the integration of the GFPLAIN dataset, whereas for the hydrological/hydraulic modeling updated by satellite-derived flood extents we refer to the above-mentioned literature. An application of a DA approach updated by crowdsourced observations can be found in [25].
The model state updating considered for a 1D-2D hydraulic model is usually related to the water levels [14,48,49,50] or flood extent [16]. Model input updating is related to the inflow hydrograph derived from stage gauge observations (assuming specific flow-stage rating tables) or from rainfall-runoff modeling [15,51,52]. The parameters updated in 1D-2D hydraulic models are the channel/floodplain friction coefficients, even if recent studies demonstrated that, in calibrated and validated models, this updating has a second-order effect on the results of flood inundation models compared with the variations due to the uncertainty of model inputs [50]. This second-order effect can be considered particularly negligible when the uncertainties of the model inputs are given by rainfall-runoff modeling, where rainfall and infiltration uncertainties, among others, are considered to have a much greater impact on flood simulations than hydraulic friction [51].
Satellite-derived flood extents for near-real-time model updating are usually gathered from SAR images because of their higher reliability regardless of the weather and daylight conditions with respect to the multispectral images [27]. The generation of the ensemble of the observed flood extent is related to its uncertainty and can be performed by estimating the pixel-by-pixel probability corresponding to open water given its backscatter value [52].
Ensemble-based DA filtering performed with PF has the advantage of handling non-Gaussian observation errors and of avoiding updates of the model states that may lead to model instability issues [16]. Conversely, EnKF allows for much smaller ensemble sizes and was recently used in different studies [27]. In this regard, the application of the hydrogeomorphic floodplain dataset as a spatial filter for generating the pdf of both observation and model errors can help to limit the computational burden due to the generation of the ensembles.
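As a rough illustration of how the floodplain mask shrinks the updating step, the sketch below runs a textbook stochastic EnKF analysis (perturbed observations) on a state vector restricted to floodplain cells. The grid, the prior ensemble, the observation values and their error standard deviation are synthetic placeholders, and the update shown is the generic EnKF formulation rather than the specific scheme of any of the cited studies.

```python
import numpy as np

def enkf_update(ensemble, obs, obs_idx, obs_std, rng):
    """Stochastic EnKF analysis step.
    ensemble: (n_state, n_members) forecast states
    obs:      (n_obs,) observed values
    obs_idx:  (n_obs,) indices of the observed state elements
    """
    n_state, n_members = ensemble.shape
    n_obs = len(obs_idx)
    H = np.zeros((n_obs, n_state))          # linear observation operator
    H[np.arange(n_obs), obs_idx] = 1.0
    anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
    P = anomalies @ anomalies.T / (n_members - 1)   # sample forecast covariance
    R = np.eye(n_obs) * obs_std**2                  # observation error covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)    # Kalman gain
    perturbed = obs[:, None] + rng.normal(0.0, obs_std, size=(n_obs, n_members))
    return ensemble + K @ (perturbed - H @ ensemble)

rng = np.random.default_rng(0)
floodplain_mask = np.array([[0, 1, 0],
                            [1, 1, 0],
                            [1, 1, 0]], dtype=bool)
n_members = 30

# Synthetic prior water levels (m) on every grid cell, then restricted to the floodplain
full_ensemble = rng.normal(1.0, 0.5, size=(floodplain_mask.size, n_members))
state = full_ensemble[floodplain_mask.ravel()]      # 5 cells instead of 9

obs = np.array([2.0, 2.0])       # synthetic observed levels
obs_idx = np.array([0, 2])       # positions within the floodplain-only state
updated = enkf_update(state, obs, obs_idx, obs_std=0.1, rng=rng)
```

Since the covariance matrix grows with the square of the state size, dropping hillslope cells before the update reduces both memory use and the cost of forming the gain.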

Conclusions
This work conceptualizes the integration of a hydrogeomorphic floodplain delineation model GFPLAIN to improve flood forecasting at different spatial scales, for both small ungauged basins and large major rivers. Specifically, we propose a flood hazard modeling and forecasting framework characterized by two novel main features:

• The adoption of hydrogeomorphic floodplain terrain processing to identify the maximum flood extent and capture the domain of inclusion of critical nodes whose hydrologic forcing is analyzed by means of a real-time lumped hydrologic model based on a hydrogeomorphic approach (e.g., WFIUH).

The proposed research aims to pave the way for adopting hydrogeomorphic floodplain modeling to improve consolidated flood forecasting frameworks for:

• Providing ancillary information on the extension of critical areas (e.g., in the case of the application of a simplified lumped hydrologic model) during flood events.
• Pre-processing the computational domain of physical models (e.g., 2D hydraulic models) and geospatial algorithms for detecting flood-related observations whose extension or position is unknown a priori.