Flood Hazard and Risk Mapping by applying an Explainable Machine Learning Framework using Satellite Imagery and GIS data

Abstract: Flooding is one of the most destructive natural phenomena worldwide, damaging property and infrastructure and causing loss of life. The escalation in the intensity and number of flooding events, driven by the combination of climate change and anthropogenic factors, motivates the need for real-time solutions for mapping flood hazards and risks. In this study, a methodological framework is proposed that enables the dynamic assessment of flood hazard and risk severity levels by fusing Synthetic Aperture Radar remote sensing (Sentinel-1) and GIS-based data from the region of the Trieste, Monfalcone and Muggia Municipalities. Explainable machine learning techniques were utilised to interpret the results of the flood hazard assessment. The flood inventory was randomly divided into 70% for training and the remaining 30% for testing. Various combinations of the models were evaluated for the assessment of flood hazard. The results revealed that the Random Forest model achieved the highest F1-score (approx. 0.99) among the models and was therefore used to generate the flood hazard maps. Furthermore, the flood risk was estimated by combining a rule-based approach for estimating exposure and vulnerability with the dynamic assessment of flood hazard.


Introduction
Over the past couple of decades, flood disasters have intensified, becoming more frequent and more destructive than earlier ones, especially in developing countries such as those in Latin America and the Caribbean [1], and causing loss of human lives and property worldwide. According to CRED's Emergency Events Database (EM-DAT), 44% of all disaster events from 2000 to 2019 were floods, which affected 1.6 billion people worldwide, the highest figure for any disaster type. Furthermore, floods are the most common type of event, with an average of 163 events per year [2]. Climate change along with anthropogenic factors plays a significant role in escalating the severe impacts of flood disasters in terms of economic loss, social disruption, and damage to the urban environment. Therefore, proper monitoring to identify flood-prone areas and effective mitigation countermeasures are considered very important for risk reduction [3–7].
The deployment of real-time solutions for mapping flood hazard and estimating the potential consequences of flood events can be extremely valuable for emergency response and for mitigating the impact of those events [8]. Recognising the need for effective flood management, the European Union adopted Directive 2007/60/EC on flood risk assessment and management, which entered into force on 26 November 2007. The Directive treats flood mapping as a crucial element of flood risk management and requires EU Member States to prepare two types of crisis maps, namely flood hazard maps and flood risk maps, by 2013 (Art. 6) and to update them every six years [9,10].
Flood mapping is a process that describes the expected extent of water inundation into dry land as a result of intense precipitation or river water level rise driven by natural or anthropogenic factors [11]. Although flood mapping basically comprises flood hazard maps and flood risk maps, its processes vary considerably from project to project and from country to country, depending on specific project requirements and country-specific guidelines, legislation, etc. [9,10,12,13].
Flood mapping provides the baseline for a good understanding of historical flood trends, future expectations, and the identification of vulnerable or susceptible locations likely to be impacted by flooding. Hence, flood hazard and risk maps are considered important tools for communicating flood risk to various target groups [12]. They convey the compiled information on flooding events to relevant public bodies such as civil protection and water management authorities, municipalities and local states, or disaster/crisis managers and control staff, and they also raise awareness among the general public [14].
Recently, the hazard, exposure and vulnerability associated with natural disasters have been assessed using machine learning methods in a descriptive and/or predictive manner. Descriptive machine learning methods focus on the Response and Recovery phases of the Disaster Management Cycle, while predictive machine learning methods concentrate on forecasting a natural disaster, enhancing the preparedness and mitigation processes of the cycle [5,6,15,16]. Specifically, flood hazard assessments that employ descriptive machine learning methodologies focus primarily on the response phase, by estimating current inundation extents and depths. The aim is to provide assistance at various levels: to emergency responders and those directly affected, as well as to public and government authorities assessing the impact of the event. The increasing volume of data produced by Earth Observation technologies, such as Synthetic Aperture Radar (SAR, e.g. Sentinel-1) and optical sensors (e.g. Sentinel-2), as well as by social media, provides opportunities for machine learning methods to improve the efficiency of existing flood detection approaches [5,6,15,17,18]. Satellite remote sensing has been used for timely, near-real-time flood detection. Specifically, SAR technology overcomes the limitations of optical remote sensing data, which are not usable under cloud cover or at night, and as a result improves the overall temporal resolution [6,7,15,17–19]. Advanced machine learning classification methods can be used to improve the assessment of the flood extent and, consequently, of the severity level of a flood hazard. However, the creation of these models requires annotated datasets to be used as training sets.
As stated in [5], one of the key research challenges in this domain is the lack of large-scale annotated datasets, related to social media and satellite sensing data, for training and evaluating machine learning models able to detect and analyse disasters generated by extreme natural events. Moreover, Said et al. [5] pointed out that another open issue in the application of remote sensing to the disaster management cycle is the low temporal frequency of satellite imagery. On the other hand, time is vital during a disaster event, so that authorities can respond effectively to minimise the socio-economic, ecological and cultural impact of the event, evacuate vulnerable people at risk, and support recovery processes in general [20]. Motivated by the above limitations, the main contribution of this work is a methodological framework for the near real-time creation of flood hazard and risk maps that relies on the fusion of satellite imagery outputs and GIS-based data. Explainable machine learning techniques are employed to analyse and aggregate the information in a pixel-based approach, aiming to estimate the flood hazard in terms of severity levels, namely moderate, medium and high hazard. A thorough pixel-level analysis of the specific local characteristics enhances the reliability of the proposed framework in classifying these small areas by severity level. The annotation of the datasets needed for the modelling phase is carried out automatically, by applying a rule that relies on expert knowledge. Furthermore, the exposure, vulnerability and flood risk are assessed with a rule-based approach, producing the corresponding crisis maps. Hence, the proposed framework enables authorities and other crisis managers to reliably map and monitor flooding events by generating crisis maps almost dynamically, which strengthens situational awareness by providing an adequate picture of the crisis.

Relevant Literature
Recently, numerous studies have proposed flood susceptibility maps as a tool for efficient flood risk management [21–30]. Flood susceptibility indicates the propensity of an area, given its physical-geographical characteristics, to be affected by flooding. Additionally, flood susceptibility mapping can be defined as a quantitative and qualitative assessment of an area where floods are likely to occur, which simultaneously provides the spatial distribution of the particular natural event [22,26]. Since the analysis and mapping of flood susceptibility identify the most vulnerable areas, they can be considered one of the most important aspects of early warning systems and of strategies for the prevention and mitigation of future floods [28,31]. It should be mentioned that, apart from flood hazard, vulnerability and exposure can also be visualised as maps; they are therefore spatially explicit and can be integrated into a GIS context. For instance, in a GIS grid cell of a certain size, we can explicitly show the expected depth of a flood, the presence of buildings and people, and the likelihood that they will be damaged or harmed.
With the rise of technological advances in Remote Sensing, Geographic Information Systems and Machine Learning, multidisciplinary approaches have been proposed that aim to map, monitor and manage floods efficiently. Hence, satellite-based flood mapping and monitoring from multiple sources can be considered an essential process in flood risk assessment. By leveraging the increasing availability of free-of-charge or low-cost satellite data with global coverage (e.g. Sentinel-1 and -2 from ESA, and Landsat and MODIS satellites from NASA) [32], new possibilities have emerged for near real-time mapping and modeling of flood risk and for assessing its impact [33]. As a result, authorities and stakeholders can be assisted in carrying out appropriate disaster response and relief activities, achieving disaster risk reduction and mitigation in the early stages [34]. Another low-cost Remote Sensing solution that has gained considerable interest in recent decades is Unmanned Aerial Vehicles (UAVs) [35,36]. Equipped with high-resolution camera sensors, UAVs can capture high-quality topographical data and facilitate the monitoring and mapping of a hazardous natural event [37].
Advanced machine learning methods coupled with multi-criteria analysis methods and remote sensing technologies have been developed and applied effectively in flood susceptibility mapping. To name a few, in [22] the performance of four machine learning methods, namely Kernel Logistic Regression, Radial Basis Function Classifier, Multinomial Naïve Bayes, and Logistic Model Tree, was compared in terms of their ability to create reliable flash flood susceptibility maps. Similarly, in [23]  Tree, and single Credal Decision Tree have been compared for flash flood susceptibility assessment. In [24] the authors focused on Support Vector Machines (SVMs) and applied various kernels to investigate how accurately they assess flood susceptibility and produce the corresponding maps. Logistic Regression (LR) was employed in [25] to determine the significance of flood conditioning factors for flood susceptibility. The researchers in [21] adopted an approach to identify areas susceptible to flash flooding that relies on the computation of the Flash-Flood Potential Index (FFPI) and uses two machine learning models (k-Nearest Neighbor and K-Star) along with their novel ensemble with an Analytical Hierarchy Process (AHP). Furthermore, [26] proposed an approach to derive an integrated model from the best-performing combinations of four models: Artificial Neural Network (ANN), AHP, LR, and Frequency Ratio (FR). The goal was to develop a unique flood hazard map of Bangladesh by increasing the precision of flood susceptibility assessments. In [38] a hybrid model comprising Principal Component Analysis, LR and Frequency Distribution analyses was presented, while [39] proposed an ensemble modeling approach that combines SVM with Multivariate Discriminant Analysis (MDA) and Classification and Regression Trees (CART) to create flood susceptibility maps. Another ensemble method, which combines SVM using a radial basis function kernel with the FR approach to estimate flood probability, has recently been proposed [40]. The ultimate goal was to assess the flood risk. In [41] two machine learning techniques, namely Convolutional Neural Network (CNN) and SVM, were fused to develop more reliable flood susceptibility maps using GIS data. In [42] the authors proposed a Deep Neural Network (DNN) model that uses Sentinel-1 satellite data, fusing the SAR backscatter coefficients with Digital Elevation Model (DEM) data to generate water-body masks.
Generally, in the majority of the above studies, the satellite imagery and GIS related data are provided in near real-time in order to assess the risk of an extreme flood event which is in progress.

Study Area
The study domain is located in north-eastern Italy, specifically in the eastern part of the Friuli Venezia Giulia Region and of the Eastern Alps River Basin District, close to the border between Italy and Slovenia. In particular, this work focuses on three distinct areas, each located in a different Municipality, namely Trieste, Muggia and Monfalcone, as illustrated in Figure 1. The area of Trieste and Muggia is unique in Italy from a hydrogeological perspective, having karst features and thus lacking surface hydrography and well-defined watersheds. As regards the topography, these two Municipalities are characterized by steep hillsides close to the shoreline, as can be seen from the elevation plotted in Figure 2. However, the urban centers of the two municipalities, on which this work focuses, have a low elevation, close to sea level. The Municipality of Monfalcone is mostly located in the plain called 'Pianura Isontina' in Italian, at the mouth of the Isonzo River. The elevation of the area is very close to, if not below, sea level, and the terrain is mostly flat with a very low slope (Figure 2). Because all three study areas are characterized by low ground elevation above sea level, they are particularly prone to floods due to high tides of the Adriatic Sea triggered by meteorological conditions. In fact, flood hazard in the coastal area often manifests through storm surges that coincide with specific weather conditions (rainfall, high tide, southern winds). Flooding in the urban areas of Trieste and Muggia is caused, in addition to the topography, by the excessive imperviousness of the soil and by the difficult discharge of surface runoff when high tide coincides with the flow of the surface drainage network [43]. In addition, in the area of Muggia, even though the karst geology largely causes the lack of surface water bodies, there are two streams: Rosandra and Ospo. These two streams present some critical points from a hydraulic point of view, due to insufficient maintenance and to the increasing pluvial runoff caused by intensive urbanization.
Regarding the Monfalcone area, the territory, located on the east side of the Isonzo River, is well known to be humid (swampland). In particular, the drainage network often fails during flood events that coincide with high tides. As can be seen from Figure 2, part of the territory also lies below mean sea level. In addition, the area has a significant underground hydrography (e.g. the karst river Timavo). Thus, in this area high tide can cause flooding due to the inadequacy of the marine levees, as well as to the overflowing of the drainage network [43]. Finally, for the Monfalcone area, the flood risk is also due to the presence of the Isonzo River, one of the most important rivers of the Eastern Alps River Basin District, as well as its most relevant transboundary water body. The Isonzo River originates in the Trenta valley, with springs at an altitude of 935 m, and flows into the Adriatic Sea near Monfalcone, where it forms a delta that tends, over time, to move from west to east. The Isonzo catchment basin covers a total area of approximately 3400 km², of which about 1150 km², that is about one third, lies in Italian territory. The Isonzo River, which has a purely torrential character, collects and discharges the waters of the southern side of the Julian Alps, which separate this basin from that of the Sava. The main right tributaries are the Coritenza, in Slovenian territory, and the Torre, which flows almost entirely in the Italian part. On the left, the Isonzo is fed by the Idria and the Vipacco, whose basins lie totally and almost totally, respectively, in Slovenian territory [44].

Digital Elevation Model in the Study Area
The Digital Elevation Model (DEM) was provided by the Eastern Alps River Basin District Authority (AAWA), which performed some GIS processing on the official DEM of the Friuli Venezia Giulia Region. The DEM is provided in the reference system UTM 33N (EPSG 3045). It was obtained using the Laser Imaging, Detection And Ranging (LIDAR) technique from a set of aerial flights performed in 2019. The raw data obtained from the flights (a point cloud) were processed in stages to provide the final product. This, in turn, consists of a representation of the terrain points, devoid of all the elements above the ground (such as buildings, vegetation, cables, etc.), on a regular grid with a pixel resolution of 0.5 m x 0.5 m, divided into many different tiles. The DEM has a planimetric accuracy of 0.15 m and an altimetric accuracy that ranges from 0.15 m (in open field) to 0.3 m (under vegetation cover), both estimated through a set of reference points all over the region. It should be noted that for the city of Trieste, which is particularly vulnerable to floods caused by the tide, identifying flat areas near the sea is very important. We used three areas with a DEM resolution of 0.5 m, as shown in Figure 2.
Elevation: the elevation of the terrain has a great influence on floods. Firstly, at a large scale, the dynamics of the event are usually completely different in high-elevation areas (mountains) than in low-elevation ones (i.e. plains), which are usually more vulnerable to flooding caused by river overtopping, drainage system failure and/or the rising water level of seas or other water bodies. Secondly, at a smaller scale, the terrain elevation determines the presence of preferential pathways, which channel the surface runoff, and of accumulation areas, which are usually local depressions of the terrain.

Flood Conditioning Factors
Slope is an essential factor for studying flash flood susceptibility because it affects the speed of water. The slope can be positive, negative, zero, etc. [27].
Aspect is related to the directions of water flow affecting flash flood occurrence.
Flat areas are more vulnerable to water accumulation and/or the spreading of water over a large surface, in particular when large volumes of water are involved. Therefore, by using this parameter, the flat regions can easily be identified [23,27].
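As an illustration of how slope, aspect and flat areas might be derived from a gridded DEM, the following minimal sketch uses central finite differences. It is not the GIS procedure used in this study: the function names, the aspect convention (downslope direction, in degrees clockwise from north, with row 0 as the northern edge) and the 1° flatness threshold are assumptions made only for the example.

```python
import math

def slope_aspect(dem, cell_size):
    """Return (slope_deg, aspect_deg) grids for the interior cells of a DEM.

    dem is a list of rows (row 0 is assumed to be the northern edge).
    Border cells are left as None because central differences need neighbours.
    """
    rows, cols = len(dem), len(dem[0])
    slope = [[None] * cols for _ in range(rows)]
    aspect = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Elevation gradient towards east (x) and towards north (y).
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_dy = (dem[r - 1][c] - dem[r + 1][c]) / (2 * cell_size)
            slope[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
            if dz_dx == 0 and dz_dy == 0:
                aspect[r][c] = None  # flat cell: aspect is undefined
            else:
                # Bearing of the downslope vector (-dz_dx, -dz_dy).
                aspect[r][c] = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360.0
    return slope, aspect

def flat_mask(slope, max_slope_deg=1.0):
    """Mark cells whose slope is below a (hypothetical) flatness threshold."""
    return [[s is not None and s <= max_slope_deg for s in row] for row in slope]

# Example: a plane rising towards the east with a 0.5 m grid spacing.
dem = [[0.5 * c for c in range(4)] for _ in range(4)]
slope, aspect = slope_aspect(dem, cell_size=0.5)
```

In this example the interior cells have a 45° slope and face west, because the terrain rises eastwards by one cell size per cell.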
Topographic Wetness Index (TWI) is a topo-hydrological factor that reflects the wetness potential of each pixel. It can be calculated from the flow accumulation, A_s, and the slope α (in degrees) at the pixel:
$$TWI = \ln\left(\frac{A_s}{\tan\alpha}\right)$$
An increase in the TWI, indicating higher wetness, means that high flow accumulation occurs on low-slope surfaces and therefore potentially indicates locations that are exposed to greater flood hazard [21,23–25,45].
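A minimal per-pixel sketch of the TWI is given below, assuming the commonly used form TWI = ln(A_s / tan α). The function name and the `min_slope_deg` floor, which avoids a division by zero on perfectly flat pixels, are assumptions introduced for the example.

```python
import math

def twi(flow_accumulation, slope_deg, min_slope_deg=0.1):
    """Topographic Wetness Index, assuming TWI = ln(A_s / tan(alpha)).

    flow_accumulation: A_s, must be positive.
    slope_deg: slope alpha in degrees; clamped to min_slope_deg (a hypothetical
    floor) so that flat pixels do not produce a division by zero.
    """
    if flow_accumulation <= 0:
        raise ValueError("flow accumulation must be positive")
    alpha = math.radians(max(slope_deg, min_slope_deg))
    return math.log(flow_accumulation / math.tan(alpha))

# A gentle slope with the same flow accumulation gives a higher TWI.
gentle = twi(100.0, 2.0)
steep = twi(100.0, 20.0)
```

Higher values of `flow_accumulation` and lower slopes both raise the index, which matches the interpretation given above.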
Topographic Position Index (TPI) is a ratio of the elevation of a pixel (grid cell) to the mean elevation of its neighboring pixels (cells) [21,45]:
Terrain Ruggedness Index (TRI) is in contrast to the TWI and is responsible for rugged [27]. TRI, which is defined as the mean difference between a central pixel and its surrounding cells, can be calculated as follows [45]: where x shows the elevation (m) of each neighbor cell to cell (0, 0). In addition, min (4) where:

For the flood detection we processed the Sentinel-1 GRD-IW products of the flooded day and the time-series images using ESA's Sentinel Application Platform (SNAP). The following preprocessing steps were applied [46]:
• Apply Orbit File: Applying a precise orbit file in SNAP automatically downloads and updates the orbit state vectors in the metadata of each SAR scene, providing accurate satellite position and velocity information.
• Thermal Noise Removal: Reduces noise effects in the inter-sub-swath texture, in particular, normalizing the backscatter signal within the entire Sentinel-1 scene and resulting in reduced discontinuities between sub-swaths for scenes in multi-swath acquisition modes.
• Subset: the initial product is cropped so it contains only the lake we want to observe.
Some balance between the inundated and non-inundated areas is desired.
• Radiometric calibration: Fixes the uncertainty in the radiometric resolution of the satellite sensor, so that the pixel values can be directly related to the radar backscatter of the scene.
The information required to apply the calibration equation is included within the Sentinel-1 GRD product.
• Speckle noise removal: Removes the salt-and-pepper noise pattern caused by the interference of electromagnetic waves. The "Lee Sigma" filter of Lee (1981) [47] with a 5×5 filter size is used to filter the intensity data. As noted by Jong-Sen Lee et al. (2009) [48], this step is essential in almost any analysis of radar images, because speckle noise complicates the interpretation process.
• Terrain correction: Projects the pixels onto a map system (WGS84 was selected) and resamples them to a 10 m spatial resolution. Topographic corrections with a Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM) are also performed, which correct the distortions over areas of terrain.
The deep valley of the histogram separates the inundated from the non-inundated areas.
This thresholding technique works better when there is an adequate number of inundated pixels to distinguish them from the dry ones; otherwise, threshold extraction may fail. In the satellite images of the areas under study, it is quite common that water and land areas are not in balance. Thus, to increase the chance of estimating a valid threshold, we split the image into nine (9) tiles and perform the thresholding on each of them, finally calculating the average threshold, which is used over the whole image to separate the inundated from the non-inundated areas. This pixel-based classification of the region of interest is then fused with the information from the DEM to estimate the Water Depth. For each separate water body (sub-area) of a water mask, the maximum elevation is detected using the DEM. Then, for this sub-area, the Water Depth is estimated by subtracting the DEM value of each pixel from the maximum elevation. It should be noted that flood depth, along with flood duration, directly contributes to flood occurrence [26].
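The tile-based thresholding and the water depth estimation described above can be sketched as follows. This is an illustration under stated assumptions rather than the processing chain used in the study: Otsu's criterion stands in for the histogram thresholding method, water bodies are taken to be 4-connected groups of water pixels, water is assumed to have lower backscatter than land, and all function names are hypothetical.

```python
from collections import deque

def otsu_threshold(values, bins=64):
    """Histogram threshold by Otsu's criterion (assumed method); None if flat."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return None  # a uniform tile has no valley to find
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    scores, w0, sum0 = [], 0, 0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += i * hist[i]
        w1 = total - w0
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        scores.append(w0 * w1 * (m0 - m1) ** 2)  # between-class variance
    best = max(scores)
    ties = [i for i, s in enumerate(scores) if s == best]
    cut = (ties[0] + ties[-1]) / 2 + 1  # middle of the plateau of best cuts
    return lo + cut * width

def tiled_threshold(image, n=3):
    """Average the per-tile thresholds over an n x n tiling (nine tiles for n=3)."""
    rows, cols = len(image), len(image[0])
    found = []
    for ti in range(n):
        for tj in range(n):
            tile = [image[r][c]
                    for r in range(ti * rows // n, (ti + 1) * rows // n)
                    for c in range(tj * cols // n, (tj + 1) * cols // n)]
            t = otsu_threshold(tile) if tile else None
            if t is not None:
                found.append(t)
    if not found:
        raise ValueError("no tile produced a valid threshold")
    return sum(found) / len(found)

def water_depth(mask, dem):
    """Depth per water body: maximum DEM value of the body minus the pixel DEM."""
    rows, cols = len(mask), len(mask[0])
    depth = [[0.0] * cols for _ in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            body, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first search over one 4-connected water body
                y, x = queue.popleft()
                body.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            top = max(dem[y][x] for y, x in body)
            for y, x in body:
                depth[y][x] = top - dem[y][x]
    return depth

# Toy scene: low backscatter (water) on the left, terrain rising eastwards.
image = [[-20.0 if c < 3 else -5.0 for c in range(6)] for _ in range(6)]
dem = [[0.5 * c for c in range(6)] for _ in range(6)]
threshold = tiled_threshold(image)
mask = [[v < threshold for v in row] for row in image]
depth = water_depth(mask, dem)
```

Tiles that contain only one surface type return no threshold and are skipped in the average, which is one simple way to handle the imbalance between water and land noted above.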

Machine Learning techniques
In this work, we utilised well-known machine learning techniques for classification, namely Support Vector Machines (SVMs), Naive Bayes (NB), an ensemble learning method called Random Forest (RF) and a feed-forward Neural Network (NN). A brief description of them follows:
• Support Vector Machine (SVM): The SVM classifier [49] is a supervised machine learning technique that exploits hyperplanes, mapping nonlinear problems into a space where they become linear in order to classify the features. A hyperplane is a decision plane that separates a set of objects and labels them into different classes. SVM thus aims to separate the features as effectively as possible using hyperplanes.
• Naive Bayes (NB): Based on Bayes' theorem, the Naive Bayes (NB) classifier is a statistical classification technique. It belongs to the group of supervised learning algorithms and is one of the simplest, with high accuracy and speed, especially when applied to large datasets. NB assigns class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from a given set.
The results of the classifiers are presented in a confusion matrix (Table 1), which uses the following terms: "True positives (TP)", the results that are predicted positive and are actually positive; "False positives (FP)", the results that are predicted positive but are actually negative; "True negatives (TN)", the results that are predicted negative and are actually negative; and "False negatives (FN)", the results that are predicted negative but are actually positive.

• Accuracy: Accuracy is the most commonly used percentage metric for judging the correctness of the results of machine learning models and can be calculated using the confusion matrix terms:
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$
• Precision: Precision answers the question of what proportion of the positive predictions was in fact correct and can be calculated using:
$$Precision = \frac{TP}{TP + FP}$$
• Recall: Recall, on the other hand, answers the question of what proportion of the actual positives was identified correctly and can be calculated using:
$$Recall = \frac{TP}{TP + FN}$$
• F1-score: The F1-score is a measure for evaluating classification systems that combines the precision and recall results. It is the harmonic mean of precision and recall and can be calculated using:
$$F1 = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}$$
• Cross-Validation k-fold: Cross-validation is a statistical method for evaluating machine learning models, in which the dataset is divided randomly into k segments that are used in turn for model training and testing; by comparing the results we select the best model. The process has a single parameter k, which refers to the number of segments into which each set of data is randomly separated. In our case k is equal to 10 and we choose the best model using the average result over the training runs.
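The metrics above follow directly from the confusion matrix counts, and the k-fold procedure only requires a random partition of the sample indices. The sketch below illustrates both with the standard definitions; the function names are hypothetical, and for the three hazard classes the counts would be computed one class at a time (one-vs-rest), which is an assumption about usage rather than a description of the implementation in this study.

```python
import random

def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, TN, FN for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, fp, tn, fn

def classification_scores(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1-score from confusion matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k random folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

# Example with a binary labelling (1 = flooded, 0 = not flooded).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = classification_scores(*confusion_counts(y_true, y_pred, positive=1))
```

In a 10-fold run, a model would be fitted on each `train` index list and scored on the matching `test` list, and the per-fold scores averaged to compare candidate models.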

Methodology
In the case of extreme natural events, such as floods, the hazard, exposure and vulnerability can be identified when the interactions between these events and human societies are assessed. Flood Hazard can be estimated from the physical characteristics of the flood event, such as the extent, water depth, persistence, and flow velocity. The hazard outcome is a map of flood intensity, provided by hydrological analysis and modelling, i.e. flood frequency analysis, the geomorphological characteristics of the region under assessment (pathway) and the man-made barriers against the hazard (attenuation) in the assessed area. Conventional approaches consider different return periods and measures of intensity, producing multiple hazard maps [13,14].
Furthermore, exposure refers to the characteristics of the people and assets that can be affected by flooding, focusing mainly on their social, environmental and economic value. Vulnerability is the human dimension of flood disasters and is the result of a range of economic, social, cultural, institutional, political, and psychological factors.
The physical component is captured by the likelihood that receptors located in the area considered could potentially be harmed (susceptibility of receptors). The social component is the ex-ante preparedness of society, given its risk perception and awareness, to combat the hazard and reduce its adverse impact, or its ex-post ability to overcome the hazard damages and return to the initial state (represented by adaptive and coping capacities).
These can increase the susceptibility of an individual, a community, assets, or systems to the impacts of flood hazards [51][52][53].
The proposed framework adapts the definition of disaster risk that was given in 2017 by the UN Office for Disaster Risk Reduction (UNISDR) and includes the Sendai Framework for Disaster Risk Reduction 2015-2030 [53,54]. Accordingly, Disaster Risk (R) is defined as the potential loss of life, injury, or destroyed or damaged assets which could occur to a system, society, or a community in a specific period of time, determined probabilistically as a function of hazard, exposure, vulnerability and capacity.
Based on the above definition, in the field of natural hazards, disaster risk results from the coupling between hazard (H), vulnerability (V) and exposure (E):
$$R = H \times V \times E$$
In our approach, the severity level of the flood hazard is dynamically assessed by employing machine learning techniques that are able to fuse multimodal data generated by the analysis of Sentinel-1 images and GIS-based data. Then, a rule-based approach is used to estimate, in near real-time, the vulnerability and the exposure in the region of interest. Specifically, the proposed framework consists of ten (10) successive steps, as illustrated in Figure 3. The remaining steps concern the assessment of vulnerability and exposure for three main categories: people; economic activities; and the environment, cultural-archaeological assets and protected areas. A rule-based approach has been used for this purpose. In the last step, the combination of the assessments of hazard, vulnerability and exposure generates the hydraulic risk. In the following sections, the steps of the proposed methodological framework are described in more detail.
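To make the coupling of hazard, vulnerability and exposure concrete, the sketch below combines three per-pixel grids, assuming the commonly used multiplicative form R = H × V × E with all inputs scaled to [0, 1]. The class labels and the bounds in `risk_class` are hypothetical placeholders for the example and do not reproduce the rules applied in this study.

```python
def risk_grid(hazard, vulnerability, exposure):
    """Per-pixel risk, assuming the multiplicative form R = H * V * E."""
    return [[h * v * e for h, v, e in zip(h_row, v_row, e_row)]
            for h_row, v_row, e_row in zip(hazard, vulnerability, exposure)]

def risk_class(score, bounds=(0.1, 0.3, 0.6)):
    """Map a risk score to a class; labels and bounds are hypothetical."""
    labels = ("low", "moderate", "high", "very high")
    for label, bound in zip(labels, bounds):
        if score < bound:
            return label
    return labels[-1]

# Two pixels: one fully exposed and vulnerable, one with nothing exposed.
hazard = [[1.0, 0.5]]
vulnerability = [[1.0, 0.5]]
exposure = [[1.0, 0.0]]
risk = risk_grid(hazard, vulnerability, exposure)
classes = [[risk_class(r) for r in row] for row in risk]
```

A consequence of the multiplicative form is that a pixel with no exposed elements has zero risk regardless of the hazard level.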

Dynamic Flood Hazard Assessment Algorithm
The proposed approach for Dynamic Flood Hazard Assessment consists of seven (7) steps, as illustrated in Figure 3. Specifically, the area of interest should be studied, including the gathering of appropriate information from past extreme flood events. Then the data acquisition phase takes place and the appropriate features are extracted from the data in order to create a dataset for the application of machine learning methods. The obtained data should be homogenised and pre-processed so as to deal with missing values or outliers, data impurity issues, different ranges across the features, etc. Hence, a flood inventory is created that contains data suitable for machine learning modeling. In the training/testing phase, machine learning models are fitted to the data and their performance is evaluated in terms of accuracy. The best machine learning model is chosen and used in the validation phase to create the flood hazard maps.
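The training/testing and model selection steps can be sketched as a small loop: split the flood inventory at random (70% training, 30% testing), fit each candidate model, score it on the held-out part and keep the best one. The two toy classifiers below are stand-ins chosen only to keep the example self-contained; they are not the SVM, NB, RF and NN models evaluated in this work, and the macro-averaged F1-score as the selection criterion is an assumption.

```python
import random

def train_test_split(rows, labels, test_ratio=0.3, seed=42):
    """Random split of the inventory into training and testing parts."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_ratio)
    pick = lambda ids: ([rows[i] for i in ids], [labels[i] for i in ids])
    return pick(idx[n_test:]), pick(idx[:n_test])

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    f1s = []
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(f1s) / len(f1s)

class MajorityClass:
    """Toy baseline: always predicts the most frequent training label."""
    def fit(self, X, y):
        self.label = max(sorted(set(y)), key=y.count)
        return self
    def predict(self, X):
        return [self.label] * len(X)

class NearestCentroid:
    """Toy model: predicts the class whose mean feature vector is closest."""
    def fit(self, X, y):
        self.centroids = {}
        for cls in sorted(set(y)):
            pts = [x for x, lab in zip(X, y) if lab == cls]
            self.centroids[cls] = [sum(col) / len(pts) for col in zip(*pts)]
        return self
    def predict(self, X):
        dist = lambda a, b: sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda c: dist(x, self.centroids[c]))
                for x in X]

def select_best(models, X, y):
    """Fit every candidate on the training part and rank by test macro-F1."""
    (X_tr, y_tr), (X_te, y_te) = train_test_split(X, y)
    scores = {name: macro_f1(y_te, model.fit(X_tr, y_tr).predict(X_te))
              for name, model in models.items()}
    return max(scores, key=scores.get), scores

# Synthetic, well-separated inventory with three hazard classes.
names = ["moderate", "medium", "high"]
X = [[10.0 * k + 0.1 * j, 10.0 * k] for k in range(3) for j in range(10)]
y = [names[k] for k in range(3) for _ in range(10)]
best, scores = select_best({"majority": MajorityClass(),
                            "nearest_centroid": NearestCentroid()}, X, y)
```

Any classifier exposing `fit` and `predict` in this style could be dropped into the `models` dictionary without changing the selection loop.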

Study Area and Historical Flood Events
As mentioned above (Section 3.1), the area of interest for further study is located in the municipality of Trieste. For this particular region, past flood events were chosen on dates for which satellite imagery capturing the events is available.

Data Acquisition and Feature Extraction
This step includes the processes of data collection and feature extraction, which aim to create an adequate feature space for use in the modelling phase. The data are gathered from two diverse sources (Figure 3), namely the analysis of satellite images and the DEM.
The Sentinel-1 images (SAR) were analysed using the preprocessing steps described in Section 3.3. Their spatial resolution was 10 m and their temporal resolution was approximately 6 days or less. The outcome of these steps undergoes a histogram thresholding analysis that generates the appropriate water masks.
The Flood Conditioning Factors employed in this work are derived from the DEM, as described in Section 3.2. Each of these factors can be considered an independent feature in the feature space. As they are provided as maps, they can be converted to raster images with a pixel size equal to that of the DEM. In this way, all the images have the same resolution. Then, a feature space of nine (9) attributes (features) is formulated, in which each feature corresponds to one raster image. The number of entries in the dataset depends on the total number of pixels in each image (width x height).
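The construction of the feature space can be illustrated as a flattening of co-registered rasters into a table with one row per pixel and one column per feature. The sketch below assumes the rasters have already been resampled to the same grid; the function name and the feature names in the example are hypothetical.

```python
def build_feature_table(rasters):
    """Flatten co-registered rasters into one row per pixel.

    rasters maps a feature name to a grid (list of rows) of equal shape.
    Returns (feature_names, rows), with width * height rows in row-major order.
    """
    names = list(rasters)
    height = len(rasters[names[0]])
    width = len(rasters[names[0]][0])
    for name in names:
        grid = rasters[name]
        if len(grid) != height or any(len(row) != width for row in grid):
            raise ValueError(f"raster '{name}' does not match the common grid")
    rows = [[rasters[name][r][c] for name in names]
            for r in range(height) for c in range(width)]
    return names, rows

# Two of the features on a tiny 2 x 3 grid (the study uses nine).
rasters = {
    "elevation": [[1.0, 1.5, 2.0], [0.5, 1.0, 1.5]],
    "water_depth": [[0.0, 0.0, 0.0], [0.7, 0.2, 0.0]],
}
names, rows = build_feature_table(rasters)
```

With nine rasters the same call would produce rows of nine values, and the number of rows would equal the number of pixels in the common grid.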

Data Preprocessing
The dataset generated after the fusion of all the features, as described in the previous section, undergoes the following preprocessing procedures:
• Create annotated dataset: Extend the dataset by adding a target variable so that Machine Learning techniques can be applied. Our goal is to create machine learning models that are able to assess the flood hazard level and that rely on the flood conditioning factors and the real-time analysis of satellite imagery. Hence, the target variable is the "Flood Hazard", which takes three potential values, namely Moderate (Low) Hazard, Medium Hazard and High Hazard. The dataset is annotated by applying the rule defined in [44,55]. This rule is based on the hypothesis of a medium-probability flood, which has a 100-year return period in the study area.
• Handle imbalanced dataset: inundated areas are usually a small portion of the whole region of interest, and floods are rare extreme events. The majority of entries in the "Flood Hazard" variable are therefore expected to belong to the Moderate Hazard class, producing an imbalanced dataset, and the machine learning models would be biased towards the majority class. To tackle this issue, random sampling is performed: a portion of the majority class is selected, equal in size to the amount of data belonging to the other two classes.
• Handle missing or extreme values: pixels with missing values, or with extreme values that indicate areas outside the area of interest (e.g. inside the sea), are detected and removed from the analysis.
• Data Normalisation: the aim is to eliminate the numerical differences between the features and transform them to the same range. Machine learning models require input data normalised to a common range, since features with a larger magnitude in the untransformed data may otherwise bias the results. Hence, the min-max scaler is used, which transforms each input feature (predictor) to the [0,1] scale. The formula is:

$$X = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (10)$$

where $X$ is the normalised data, $x$ is the raw data, $x_{\min}$ is the minimum value of each feature vector, and $x_{\max}$ is the maximum value of each feature vector.
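The undersampling and min-max steps of the list above can be sketched as follows. This is a minimal, standard-library illustration rather than the implementation used in this work; the function names, the fixed random seed and the handling of constant features (mapped to 0) are assumptions.

```python
import random


def undersample_majority(rows, labels, majority, seed=0):
    """Keep every minority entry plus a random subset of the majority class
    whose size equals the number of entries in all other classes."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y != majority]
    majority_idx = [i for i, y in enumerate(labels) if y == majority]
    keep = min(len(majority_idx), len(minority_idx))
    chosen = sorted(minority_idx + rng.sample(majority_idx, keep))
    return [rows[i] for i in chosen], [labels[i] for i in chosen]


def min_max_scale(rows):
    """Apply Eq. (10) column by column; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Toy inventory: two features, Moderate is the majority class.
rows = [[0.0, 10.0], [1.0, 20.0], [2.0, 30.0],
        [3.0, 40.0], [4.0, 50.0], [5.0, 60.0]]
labels = ["Moderate", "Moderate", "Moderate", "Moderate", "Medium", "High"]

balanced_rows, balanced_labels = undersample_majority(rows, labels, "Moderate")
scaled = min_max_scale(rows)
```

In this toy case two of the four Moderate entries are retained, matching the two entries of the other classes, and every feature is mapped onto [0, 1].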
It should be mentioned that the above two steps, namely data acquisition and feature extraction, and preprocessing, can be performed iteratively, taking into consideration historical flood events in a specific region. As a result, a Flood Inventory is created, which is exploited to fit Machine Learning models capable of assessing the flood hazard. The above process classifies each pixel according to the severity level of a potential flooding event, expressed by the Flood Hazard Index. To color the labels of the Flood Hazard categories, we followed the coloring suggestions of end-users (AAWA). The outcome of this process is a flood hazard map.

Dynamic Flood Risk Assessment Algorithm
To estimate the Hydraulic Flood Risk, three basic parameters must be calculated, namely the Flood Hazard, the Vulnerability and the Exposure, as mentioned above. The first parameter relates to the Flood Hazard Index, which is estimated by the process proposed in Section 4.1, fusing information from the analysis of satellite images and GIS-related data.
The other two parameters are the Vulnerability and Exposure of socioeconomic elements in the impacted area. The flood risk assessment algorithm presented in this work has been developed in collaboration with AAWA, as an adaptation of the procedure presented in AAWA's Flood Risk Management Plan (FRMP) of the Eastern Alps River Basin District. The FRMP was drafted by AAWA in compliance with Directive 2007/60/EC, which also prescribes a periodic update of the contents of the plan every six years. The first iteration of the plan was finalized in 2015 and approved in 2016 [44], while the second iteration (referring to the period 2022-2028) is being finalized [55]. From the first to the second cycle, some of the criteria have been updated. The methodology presented in this work is consistent with the newest criteria.
According to the FRMP, knowledge of the land use and land cover of the area of interest is necessary for the estimation of the Vulnerability and Exposure. Therefore, in this work we employ geospatial data files, such as Corine Land Cover [56]. Each land use type from the FRMP is then matched to a Corine Land Cover (CLC) code and the Manning roughness coefficient is estimated [44].

Vulnerability estimation
To mitigate the consequences of flood disasters, suitable Disaster Risk Reduction (DRR) measures need to be carried out. In addition to flood hazard awareness and knowledge, information on Elements at Risk (EaR), i.e., people, infrastructure and assets that may suffer damage when exposed to a flood hazard, needs to be considered [57]. Assessing the vulnerability of EaR to the specific flood hazard at different event magnitudes, together with the resulting risk, allows effective monitoring and early warnings in case of an impending hazardous situation.
In this work, the Flood Risk Assessment algorithm defines three vulnerability parameters: vulnerability of people (Vp), vulnerability of economic activities (Ve), and vulnerability of environments, cultural-archaeological assets and protected areas (Va). All these parameters are estimated for every pixel and take values between 0 and 1. These values depend both on the intrinsic characteristics of the different exposed assets and on the hydraulic conditions (water level and water depth) that are established during the flood and can affect the capacity of response. In other words, Vulnerability depends on the specific nature of the element, which can be related to land use, and simultaneously on the flood hazard. The FRMP provides a detailed description of the rationale behind these rules [44].
• Vulnerability of people (Vp):
The physical vulnerability associated with people considers the values of flow velocity (Water Velocity, v) and Water Depth (h) that produce "instability" with respect to remaining in an upright position [58]. The FRMP proposes a semi-quantitative equation that links a flood hazard index, referred to as the Flood Hazard Rating (FHR), to h, v and a factor related to the amount of transported debris, i.e. the Debris Factor (DF). According to this algorithm, the land use type classes are grouped in order to calculate the DF, which accounts for the possibility of floating materials that can harm the population.
After the calculation of DF, the FHR is estimated from the Water Depth (h), the Water Velocity (v) and the Debris Factor (DF) using the FRMP equation. Vp is then estimated according to the FHR (Table 2).
Regarding Va, erosion can disturb the land surface and vegetation, and can also damage infrastructure [58]. According to AAWA's FRMP [44,55], Va is set to 1 for certain land uses, while a residual Va value is assumed for all others.

Exposure estimation
Exposure depends on the spatial location of the assets, which is strictly related to the land use, and on the evaluation of the potential negative consequences for each category of exposed element. The flood risk algorithm sets three exposure parameters: exposure of people (Ep), exposure of economic activity (Ee), and exposure of environment and cultural elements (Ea). All these parameters are estimated for every pixel and take values between 0 and 1. For more detailed information about the literature behind the definition of these rules, we refer the reader to the FRMP [44,55].
• Exposure of people (Ep): The first step in calculating Ep is to estimate the population per pixel in the area of interest, which is divided into census areas by the Italian National Institute of Statistics (ISTAT). The population dataset is provided as shapefiles, a geospatial vector format, so the population can be calculated per pixel from the geolocation data. Ep is computed from two factors: Fd, which characterizes the density of the population in relation to the number of people present and is derived from census data, and Ft, the proportion of time spent in different locations (e.g. houses and schools), which is derived from the land use classes.
• Exposure of economic activity (Ee): The Ee calculation depends solely on the land use of the area of interest.
To create the Flood Risk Map for the area of interest, each level of the Hydraulic Flood Risk is assigned a specific color on the RGB scale.

Results and Discussion
To evaluate the accuracy of the Dynamic Flood Hazard Assessment algorithm, the machine learning models first need to be created. This takes place in the Training/Testing phase (Section 4.1.4) of the proposed methodological framework. Then, in the evaluation phase, the trained models are validated in terms of their precision, namely their ability to estimate the class of the Flood Hazard Index over "unknown" data.
For this purpose, a series of experiments was carried out to find the best set of parameters for training the machine learning models, and hence to select the best model. The dataset used in this phase was formed from satellite images and DEM data for specific dates on which floods occurred in the municipality of Trieste, due to extremely high sea tides and heavy rains.
As mentioned above, the dataset was divided into two sets: 70% of the entries were used for training and the remaining 30% for testing the accuracy of the models. k-fold Cross-Validation was used to evaluate the machine learning models. In our case, k is set to 10, and the best model is chosen on the basis of the average results. The set of parameters for each machine learning model that was employed and evaluated is presented in Table 4.
Similarly, to evaluate the performance of the Dynamic Flood Risk Algorithm, we extend the former analysis to the evaluation datasets created from satellite imagery of the areas of interest on various dates. The goal is to estimate the Hydraulic Flood Risk (R) for each entry in the dataset, assign its value to a corresponding risk level and create the corresponding Flood Risk Map.
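The 70/30 split and the 10-fold cross-validation used for model selection can be sketched with index bookkeeping alone. The helper names, the fixed seed and the round-robin assignment of entries to folds are assumptions of this sketch; any equivalent splitter could be substituted.

```python
import random


def train_test_split_indices(n, test_fraction=0.3, seed=0):
    """Shuffle the indices 0..n-1 and split them into train and test parts."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=10):
    """Yield (train, validation) index lists for k-fold cross-validation.

    Entries are dealt round-robin into k folds, so fold sizes differ by
    at most one; each fold serves once as the validation set.
    """
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation


train_idx, test_idx = train_test_split_indices(100)
folds = list(k_fold_indices(train_idx, k=10))
```

A model would be fitted on each `train` list and scored on the matching `validation` list, and the ten scores averaged before the final check on the held-out 30%.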

Trieste 2019/09/23
The confusion matrix (Figure 6) demonstrates the efficacy of the proposed approach, as the algorithm correctly assigns the entries of the validation dataset to the corresponding flood hazard labels (Predicted labels). Figure 7 and Figure 8 show the flood hazard map and the flood risk map of the Trieste area on 2019/09/23, respectively.

Muggia 2018/10/29
Similarly, the proposed approach was applied to the Muggia area on 2018/10/29. The confusion matrix (Figure 9) indicates the efficiency of the proposed approach. The flood hazard and risk maps for this area and date are illustrated in Figure 10 and Figure 11, respectively.

Monfalcone 2019/09/24
The proposed approach correctly classified the pixels that form the evaluation set in the Monfalcone area on 2019/09/24. The results are depicted in the corresponding confusion matrix (Figure 12). Figure 13 and Figure 14 illustrate the flood hazard map and the flood risk map of the Monfalcone area on 2019/09/24, respectively.

Discussion
In this work, the proposed framework aims to provide the Authorities with a methodology for evaluating and mapping the level of risk of a specific flood event using free data from widely available sources, namely satellite (Sentinel-1) data and GIS-related data. Initially, four well-known machine learning approaches, namely Naïve Bayes (NB), Random Forest (RF), Support Vector Machines (SVM) and Neural Networks (NN), were employed to fuse the available information and estimate the flood hazard levels in near real-time. In the experimental evaluation, Random Forest exhibited slightly better performance in terms of the F1-score than the others. Therefore, we used this approach as the predictor to create flood hazard maps in the region of the three Municipalities (Trieste, Muggia and Monfalcone) during the evaluation process. The high precision scores achieved by the machine learning algorithms during training and evaluation are mainly due to the pixel-based approach that we followed, instead of analysing a sample of pixels. Hence, the trained machine learning algorithms are able to classify areas correctly in terms of their flood hazard levels. Going a step further, a rule-based approach based on AAWA's FRMP was applied, which combines the flood hazard assessments with flood exposure and vulnerability estimations for the region of interest. The final goal was to produce a near real-time flood risk map.
Concerning the flood conditioning factors, their importance depends on the geomorphological characteristics of the area of interest as well as on the historical flood events examined [22,59]. In this work, Water Velocity, Water Depth, Slope and Roughness play a dominant role (approx. 91.5%) in the training and evaluation of the machine learning approaches applied. This is a reasonable conclusion, since these factors affect the propagation of the flood and are the most important hydrodynamic parameters. Slope and roughness affect the flow velocity and the water depth: the smoother and steeper an area is, the higher the velocity of the flood. On the other hand, high roughness slows the water flow but increases the water level. Moreover, as described in Section 3.1, the study areas are characterized by low slope and low ground elevation above sea level (coastal areas), factors that favor floods due to high tides.
Furthermore, water depth and water velocity, as described in Section 4, are the basis for both hazard and vulnerability estimations. These two factors participate in the annotation process that classifies each pixel into one of the severity level categories (Section 4.1.3). The lack of annotated datasets for training machine learning models that assess flood hazard levels is considered a crucial issue for the development of a robust system [5,16]. In this work, to overcome this limitation, an automated rule-based approach inspired by AAWA's FRMP has been adopted.
In general, the proposed framework enables Authorities to evaluate the flood risk in near real-time using low-cost or free satellite data, and thus it can be used to fill the information gap in areas with an irregular distribution of hydro-meteorological sensors. Additionally, even in the presence of legacy Decision Support Systems, such as systems monitoring water distribution networks or forecasting systems, the proposed framework can provide useful complementary information.
For example, hydrometers record a point measurement of the water level inside a river section. Thus, in the case of river overtopping, they cannot offer any useful information about the extent of the flood outside the river, or about its impact on the exposed assets. Similar considerations apply to flood forecasting systems based on 1D hydraulic models. Even when 2D hydraulic models are available, the information provided is limited to a hazard estimation, while the concept of risk is crucial for an effective response to an emergency situation and for mitigating the consequences.
Flood Risk in fact links together not only the intensity of the event itself (hazard) but also the potential impacts on communities, economic assets, the environment and cultural heritage.
For this reason, the Flood Directive (2007/60/EC) highlights the importance of drafting flood risk maps as part of flood management plans. However, flood risk maps refer to a set of pre-defined hydraulic and hydrological scenarios (floods of certain return periods), which may differ from those that occur during a real extreme event. From this perspective, this work aims to provide the Authorities with a 'dynamic' tool, complementing the 'static' flood risk maps, for obtaining a quick and reliable estimation of the level of risk of a specific flood event when it occurs. Moreover, the proposed methodology can be used to assess the risk caused by different flooding mechanisms, including those not currently covered by the Flood Directive (e.g. urban floods).
Finally, the proposed approach can support the calibration of 2D hydraulic models, which is a challenging and time-consuming process. Operators have to simulate a flood event based on past events for which hydrometer measurements are available, and then confirm whether the results of the model are consistent with those measurements. However, measurements are point-based (a hydrometer measures the water level in a specific place, called a river section), whereas the 2D model covers a broader area. Hence, calibrating a 2D model that covers a vast area using only sparse point values is not an easy task. Moreover, although it is very important to calibrate a 2D model in the areas surrounding the river, the hydrometers are located inside the river and, as a result, water level measurements in the flooded areas (areas outside the river affected by overtopping) are not available.

Conclusions
In flood management studies, the creation of accurate flood hazard and risk maps is essential for preparedness and mitigation.
The results are quite promising and encouraging. However, improvements are needed in the direction of integrating social media information into the Flood Risk algorithm.
Another aspect that we should address is the reduction of processing time and computational effort. These are mainly affected by the resolution of the satellite imagery, the DEM and the other derived flood conditioning factors. Due to the pixel-based approach followed in the analysis, higher image resolutions generate larger datasets, which are demanding in terms of resources. On the other hand, a poor image resolution affects the quality of the flood hazard and risk assessments and of the generated maps. Hence, a trade-off must be found between image quality and framework robustness. A potential solution for increasing the quality of the DEM, or for addressing its unavailability, is the adoption of low-cost UAV applications.
novel hybrid computational approaches of machine learning methods for flash flood susceptibility mapping, namely AdaBoostM1 based Credal Decision Tree, Bagging based Credal Decision Tree, Dagging based Credal Decision Tree, MultiBoostAB based Credal Decision

Figure 1. Location of the case study areas (the square boxes). The coordinates are expressed in the reference system WGS84 - EPSG 4326.

Figure 2. Elevation of the case study area in meters above sea level, referred to vertical datum EPSG 32632 (WGS84/UTM Zone 32), while the horizontal coordinates are expressed in the Geographic Reference System WGS84 - EPSG 4326 (Source of data: INGV, http://tinitaly.pi.ingv.it/, elaborated by AAWA).

Floods are natural phenomena caused by many different factors, including climatology, hydrology, geomorphology, topography and land use. For the purpose of this work, topography and land use are considered, and some of the most relevant conditioning factors, known as Flood Conditioning Factors, are extracted from DEM analysis. The application of accurate Remote Sensing techniques is essential for obtaining a reliable DEM and consequently more accurate factors. Furthermore, an equivalent spatial resolution should be employed to calculate these factors. Below, a brief description of the factors used in this work is given.

TRI quantifies the ruggedness of the terrain by portraying the local variance of surface gradients or curvatures. TRI is considered a morphometric measure that describes the heterogeneous condition of a land surface and facilitates characterizing it as smooth or rugged.

The symbols are defined as follows:
• $v_i$ denotes the Water Velocity (in m/s) at the i-th pixel;
• $h_i$ denotes the Water Depth (in m) at the i-th pixel;
• $S_i$ denotes the slope (in decimals) per pixel;
• $L$ denotes the resolution (in m) of each pixel;
• $n_i$ denotes the Manning Roughness (Gauckler-Manning-Strickler) coefficient (in s/m^{1/3}), which also depends on the land use and thus can be related to the Corine Land Cover index, indicating the surface roughness per pixel.
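As a generic illustration of how these quantities relate, the textbook Gauckler-Manning-Strickler formula, v = R^(2/3) S^(1/2) / n, can be sketched as below. The function name and the hydraulic radius of a rectangular section of width L and depth h, R = hL / (L + 2h), are assumptions of this sketch and may differ from the exact formulation used in this work.

```python
def manning_velocity(depth_m, slope, roughness_n, pixel_size_m):
    """Textbook Manning velocity in m/s for one pixel.

    Assumption: the pixel is treated as a rectangular section of width
    `pixel_size_m` (L) and water depth `depth_m` (h), so the hydraulic
    radius is R = h * L / (L + 2 * h).
    """
    if roughness_n <= 0 or pixel_size_m <= 0:
        raise ValueError("roughness and pixel size must be positive")
    if depth_m <= 0 or slope <= 0:
        return 0.0  # dry or flat pixel: no gravity-driven flow in this sketch
    radius = depth_m * pixel_size_m / (pixel_size_m + 2.0 * depth_m)
    return radius ** (2.0 / 3.0) * slope ** 0.5 / roughness_n


# Hypothetical pixel: 1 m of water, 1% slope, n = 0.05, 10 m resolution.
v_smooth = manning_velocity(1.0, 0.01, 0.05, 10.0)
v_rough = manning_velocity(1.0, 0.01, 0.10, 10.0)
v_steep = manning_velocity(1.0, 0.04, 0.05, 10.0)
```

Doubling the roughness halves the velocity, and a steeper slope raises it, which is the qualitative behaviour expected of slope and roughness as conditioning factors.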

• Linear to Decibel (dB): The dynamic range of the backscatter intensity values of the transmitted radar signal usually spans a few orders of magnitude. Thus, these values are converted from linear scale to logarithmic scale, leading to a histogram that is easier to manipulate and making water and dry areas more distinguishable.

The obtained Sentinel-1 images, which are downloaded from the Copernicus Open Access Hub (previously known as Sentinels Scientific Data Hub), are analysed in order to estimate the Water-bodies Masks (water delineation maps). In particular, we perform histogram thresholding on the processed VH band of the Area of Interest (AoI).
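The dB conversion and the histogram thresholding can be sketched as follows. The text does not state which thresholding rule is applied, so Otsu's method is used here purely as an assumed, commonly used choice; the function names, the bin count and the toy backscatter values are likewise hypothetical.

```python
import math


def to_decibel(values, floor=1e-6):
    """Convert linear backscatter intensities to dB: 10 * log10(value)."""
    return [10.0 * math.log10(max(v, floor)) for v in values]


def otsu_threshold(values, bins=64):
    """Return the histogram threshold maximising between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centres = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    total_sum = sum(h * c for h, c in zip(hist, centres))
    best_var, best_bin = -1.0, 0
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += hist[i] * centres[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0, mean1 = sum0 / w0, (total_sum - sum0) / w1
        variance = w0 * w1 * (mean0 - mean1) ** 2
        if variance > best_var:
            best_var, best_bin = variance, i
    return lo + (best_bin + 1) * width  # upper edge of the last low bin


# Toy VH backscatter: open water is dark (low values), land is brighter.
linear = [0.001, 0.0012, 0.0009, 0.0011, 0.05, 0.06, 0.04, 0.055]
db = to_decibel(linear)
threshold = otsu_threshold(db)
water_mask = [1 if v < threshold else 0 for v in db]
```

Pixels darker than the threshold are flagged as water, which yields the binary water mask used as one of the inputs of the feature space.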

Figure 3. Flowchart of the Dynamic Flood Hazard Assessment Algorithm. The first two steps concern the specification of the area of interest and the choice of dates on which flood events occurred. The essential condition is the availability of satellite images of the study area. Steps 3-7 concern the processes for creating flood hazard maps in near real-time, whenever new satellite images become available for the particular area. The water mask, water depth and velocity of the water body, along with other flood conditioning factors derived from the analysis of satellite imagery or extracted with GIS tools, are fused by employing machine learning techniques. The result is the generation, in near real-time, of flood hazard maps that highlight the areas that are affected by or are vulnerable to a potential flood hazard.

4.1.4. Training, Testing and Validation
In this phase, various Machine Learning methodologies are applied to assess the flood hazard based on the information in the Flood Inventory. The goal is to select the best machine learning model in terms of precision in the estimation of flood hazards. To achieve this, the dataset is divided randomly into two subsets: 70% of the data is used for training and the remaining 30% for testing, so as to evaluate the generalisation capability of each model. In this work, we use four different machine learning approaches, namely Naïve Bayes (NB), Random Forest (RF), Support Vector Machines (SVM) and Neural Networks (NN). The performance of each model is estimated in terms of statistical validation measures, such as Accuracy, Precision, Recall and F-measure, as well as the corresponding Confusion Matrix. The outcome (target) of the Machine Learning model is the Flood Hazard Index (H), which is estimated for every pixel in the area of interest and takes values between 0 and 1. The Flood Hazard Index represents the probability of flood occurrence in an area of interest and is classified into three (3) categories, namely Moderate, Medium and High.

4.1.5. Flood Hazard assessment and mapping

Table 2 column headers: FHR; Vp (0 ≤ Vp ≤ 1).
• Vulnerability of economic activities (Ve): The vulnerability associated with economic activities considers buildings, network infrastructure and agricultural areas [58]. It is a pixel-by-pixel function of the Water Depth (height) and Water Velocity (flow velocity). The vulnerability function depends on the specific nature of the assets, and thus different functions are applied to different land use types.
• Vulnerability of environments and cultural-archaeological assets and protected areas (Va): Environmental flood susceptibility is described using contamination/pollution and erosion as indicators. Contamination is caused by industry, animal/human waste and stagnant flood waters. Erosion can produce disturbance to the land surface and to vegetation, and can also damage infrastructure [58].

Figure 4. Confusion Matrix of the best Random Forest model.

Furthermore, the relative importance of the features, namely the significance of each attribute that participated in the training of a machine learning model, was examined; the results are illustrated in Figure 5. Figure 7 and Figure 8 show the flood hazard map and the flood risk map of the Trieste area on 2019/09/23, respectively.
Figure 13 and Figure 14 illustrate the flood hazard map and the flood risk map of the Monfalcone area on 2019/09/24, respectively.
Such maps are essential for the preparedness for, and mitigation of, an extreme flood incident. In the recent decade, numerous studies have been published that aim to assess the flood hazard and create more reliable hazard maps. State-of-the-art methodologies use advanced remote sensing techniques, including satellite imagery analytical tools and GIS-related data, along with machine learning techniques to estimate the flood susceptibility and develop the corresponding maps. In this work, a flood hazard assessment algorithm is proposed that deals with the problem of flood monitoring and mapping. It develops a machine learning model that is able to assess the severity levels of flood hazard. The use of satellite imagery along with the flood conditioning factors generated by GIS provides the opportunity to create an extensive flood inventory. The proposed approach attempts to resolve two main challenges:
1. the lack of annotated datasets in the domain for the training and evaluation of machine learning techniques able to detect and monitor a flood event using remote sensing techniques;
2. the low temporal frequency of satellite imagery acquisition, which hinders the real-time monitoring of an evolving flood.
Furthermore, in this paper the Dynamic Flood Hazard algorithm was extended to estimate the hydraulic flood risk by combining vulnerability and exposure information from impacted areas. Both approaches are evaluated in terms of their accuracy and their capability to create accurate flood hazard and flood risk maps.
• Random Forest - RF: Random Forest (RF) [50] is a well-known ensemble machine learning method for either classification or regression. The objective of this classification technique is to compare and analyze the dataset variables to define new weights for each factor. In our case study, the RF model exploits decision trees to estimate the connection between the Flood Hazard Index labels and the values of the flood feature factors, with the aim of classifying each vector of values into a predicted label. RF is simple, fast and able to handle large datasets; it generally achieves high performance through randomization and is applicable to multiclass problems.
• Neural Network - NN: Neural Networks can be portrayed as hierarchical multilevel relationships between neurons in a network, similar to the function of the brain. The neurons implement a feedback mechanism with each other, transmitting the necessary signals to the next levels based on the input received from the respective previous levels, reaching one or more final results.

3.5. Model evaluation metrics
• Confusion matrix: the Confusion Matrix is a table that compares the actual class labels with the predicted ones.
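To make the evaluation measures concrete, the sketch below builds a confusion matrix for the three hazard classes and derives Accuracy, Precision, Recall and F1 from it. The function names and the six toy labels are hypothetical; rows are actual classes and columns are predicted classes.

```python
def confusion_matrix(actual, predicted, classes):
    """Count entries per (actual class, predicted class) pair."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of entries on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_scores(matrix):
    """Return (precision, recall, f1) per class; undefined ratios are 0.0."""
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        predicted_k = sum(matrix[r][k] for r in range(n))  # column sum
        actual_k = sum(matrix[k])                          # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append((precision, recall, f1))
    return scores


classes = ["Moderate", "Medium", "High"]
actual = ["Moderate", "Moderate", "Medium", "Medium", "High", "High"]
predicted = ["Moderate", "Medium", "Medium", "Medium", "High", "Moderate"]

cm = confusion_matrix(actual, predicted, classes)
acc = accuracy(cm)
scores = per_class_scores(cm)
```

Averaging the per-class F1 values gives a macro F1-score, one common way of summarising a multiclass model in a single number.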

Table 2. Estimation of Vulnerability of people according to FHR.
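$$R = \frac{p_p\, H\, E_p\, V_p + p_e\, H\, E_e\, V_e + p_a\, H\, E_a\, V_a}{p_p + p_e + p_a} \qquad (13)$$

where $R$ is the Hydraulic Flood Risk, $H$ is the Flood Hazard, $E$ is the Exposure, $V$ is the Vulnerability, and $p_p$, $p_e$ and $p_a$ are the weights of the people, economic-activity and environmental-cultural terms, respectively.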
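Equation (13) is a weighted average of the three hazard-exposure-vulnerability products, which the following sketch evaluates for a single pixel. The function name, the dictionary layout and the weight values in the example are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def hydraulic_risk(hazard, exposure, vulnerability, weights):
    """Evaluate Eq. (13) for one pixel.

    `exposure`, `vulnerability` and `weights` are dicts keyed by
    "p" (people), "e" (economic activity) and "a" (environment/culture).
    """
    keys = ("p", "e", "a")
    total_weight = sum(weights[k] for k in keys)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(
        weights[k] * hazard * exposure[k] * vulnerability[k] for k in keys
    )
    return weighted / total_weight


# Hypothetical pixel and weights (not the values of the FRMP).
weights = {"p": 2.0, "e": 1.0, "a": 1.0}
exposure = {"p": 1.0, "e": 1.0, "a": 1.0}
vulnerability = {"p": 1.0, "e": 0.5, "a": 0.0}

risk = hydraulic_risk(0.5, exposure, vulnerability, weights)
upper_bound = hydraulic_risk(0.5, exposure, {"p": 1.0, "e": 1.0, "a": 1.0}, weights)
```

Because H, E and V all lie in [0, 1], R also lies in [0, 1], and it reduces to H when every exposure and vulnerability term equals 1.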

Table 3. Classification of Hydraulic Risk into four classes.

Table 4. Set of parameters per machine learning model.

Table 5. Summary table of results of the best-trained machine learning models over the test set.