Flooded Extent and Depth Analysis Using Optical and SAR Remote Sensing with Machine Learning Algorithms

: Recurrent ﬂooding occurs in most years along different parts of the Gulf of Mexico coastline and the central and southeastern parts of Mexico. These events cause signiﬁcant economic losses in the agricultural, livestock, and infrastructure sectors, and frequently involve loss of human life. Climate change has contributed to ﬂooding events and their more frequent occurrence, even in areas where such events were previously rare. Satellite images have become valuable information sources to identify, precisely locate, and monitor ﬂooding events. The machine learning models use remote sensing images pixels as input feature. In this paper, we report a study involving 16 combinations of Sentinel-1 SAR images, Sentinel-2 optical images, and digital elevation model (DEM) data, which were analyzed to evaluate the performance of two widely used machine learning algorithms, gradient boosting (GB) and random forest (RF), for providing information about ﬂooding events. With machine learning models GB and RF, the input dataset (Sentinel-1, Sentinel-2, and DEM) was used to establish rules and classify the set in the categories speciﬁed by previous tags. Monitoring of ﬂooding was performed by tracking the evolution of water bodies during the dry season (before the event) through to the occurrence of ﬂoods during the rainy season (during the event). For detection of bodies of water in the dry season, the metrics indicate that the best algorithm is GB with combination 15 ( F 1 m = 0.997, AUC = 0.999, K = 0.994). In the rainy season, the GB algorithm had better metrics with combination 16 ( F 1 m = 0.995, AUC = 0.999, Kappa = 0.994), and detected an extent of ﬂooded areas of 1113.36 ha with depths of <1 m. The high classiﬁcation performance shown by machine learning algorithms, particularly the so-called assembly algorithms, means that they should be considered capable of improving satellite image classiﬁcation for detection of ﬂooding over traditional methods, in turn leading to better monitoring of ﬂooding at local, regional, and continental scales.


Introduction
The scale of recent changes across the climate system as a whole and the present state of many aspects of the climate system are unprecedented over many centuries to many thousands of years. The frequency of occurrence of extreme weather events is increasing in many parts of the planet. Rain events have increased in frequency and intensity since the 1950s over most of Earth's surface, with the available observational data identifying trends in human-induced climate change as the main driver [1]. Climate change has innumerable impacts worldwide, which can be observed in terms of its repercussions on sea level rise, heat waves, storms, drought, floods, reduction of glaciers, economic instability, and the destruction of ecosystems, among others. Rain affects daily life in many ways, its distribution in space because of the way in which they are obtained, SAR images require careful processing prior to analysis [7].
The complexity of monitoring events using SAR images increases when studying tropical areas on account of the density and height of vegetation. The backscatter of waves emitted by the SAR sensor suffers distortions and double-bouncing effects caused by vegetation, which limits the detection of flooded areas. However, proposals have been made to improve the ability of the SAR method to obtain information of interest [8][9][10].
The use of optical multispectral satellite images and digital elevation models (DEMs) as complements to SAR images has the potential to increase the accuracy of classification based on the better spectral correlation between the optical sensors with their respective topographic and water-body information [11]. Combining these two types of image processing allows the extent and depth of flooding to be better detected, analyzed, and monitored [12][13][14].
DEMs are geospatial data of great importance in studies related to detection, monitoring, diagnosis, prediction and indirect estimation of depth of flood events [15,16]. DEMs are generated using different methods and technologies. Pure LIDAR data (raw data) have inherent errors due to obstacles and terrain conditions, which is why a set of cleaning procedures are usually applied [17,18]. Such procedures include interpolation techniques, which allowed us to obtain a DEM with a vertical and horizontal resolution of 5 m.
Timely information about floodwater depth is important for directing rescue and relief resources and determining road closures and accessibility. Once available, flood depth information can also be used for post-event analysis of property damage and floodrisk assessment. Several approaches for quantifying flood depth using remote sensingbased flood maps have been proposed; one of them is [19], who used "Floodwater Depth Estimation Tool (FwDET)", developed to augment remote sensing analysis by calculating water depth based on an inundation map with an associated digital elevation model (DEM).
Furthermore, current computational power enables the implementation of machinelearning techniques for modeling multi-dimensional problems. In agriculture, these techniques have been used for estimating the height of crops, extent of irrigation, and extent and depth of flooding, as well as for identifying pests and diseases [20]. One of the key tasks in detecting flooding is classification, for which machine learning algorithms have shown superior performance to traditional binary segmentation methods [21].
Machine learning (ML) models based on decision trees can use the input dataset (Sentinel-1 SAR, Optical Sentinel-2, and DEM), to establish rules that allow it to classify the set into the categories specified by previous tags [11,17,22]. The algorithm selects the best attribute or characteristics (image pixels) and recursively partitions the labeled data until the following rule is met: all samples belong to the same attribute, there are no more Atmosphere 2022, 13, 1852 3 of 20 samples, or there are no more features to perform the division [22]. The integration of remote sensing to ML algorithms has obtained a good performance in the detection water bodies [22,23]. One of the advantages of ML algorithms is flexibility of training with a smaller training set; and due to its structure, training and prediction times are shorter, maintaining a good precision [24].
In central and southeastern Mexico, floods occur every year because of the intense rainfall during the period from June to October, with surface runoff caused by deforestation and also due to effects of climate change [25,26]. Although some previous studies have used SAR image analysis and processing methods to monitor recurrent floods in the southern part of the country, traditional classification techniques were used in most investigations. Some of them used Radarsat-2 and Sentinel-1 SAR images to track flooding using single and dual polarizations; others evaluated the Sentinel-1 data for describing the spatial and temporal variability of water bodies [27][28][29][30]. The objective of this research was to evaluate and compare the performance of two assembly algorithms: gradient boosting (GB) and random forest (RF) for different combinations of Sentinel-1 SAR images, Sentinel-2 optical images, indexes to detect water bodies, and DEMs for monitoring the extent and depth of flooding generated by excess rainfall.

Study Area
The study area is located in the Municipality of Lerma de Villada, State of Mexico, Mexico (Figure 1), and has a surface area of 230,810 km 2 (23,081 ha) in an irregular polygon with a perimeter of 93,248 km. The study area has a minimum elevation of 2566.2 m above sea level (masl), and in the eastern part of the area, which comprises mountainous terrain, it reaches a maximum value of 3427 masl. The study area corresponds to the Lerma-Santiago hydrological region, whose natural drainage network captures runoff from the mountains in the east and transports it westward through tributaries of the Lerma River, whose path follows a course to the northwest of the State of Mexico (Figure 1).

Rainfall Data Analysis
The rainfall data for the year 2021 were analyzed, obtained from five stations near the study area (2RSMX, CGCMEX, COBMX, RECMX, and TROMX) of the National Meteorological Service [31]. To generate the spatial pattern of rainfall for the period 1 January to 30 September 2021, kriging interpolation was used based on statistical models that include autocorrelation.
Only one year of rainfall data observation was analyzed (2021), as this was an atypical rainy year in the study area. In this analysis, the daily precipitation information covered two seasons: (a) the dry season (January to May), and the rainy season (June to September).

Satellite Images Used
Sentinel-1 and Sentinel-2 images were selected due to their spatial and temporal resolution; their timing in data acquisition also allows access the necessary images to cover the flood event in its entirety. Additionally, there are free access computational tools for preprocessing, as well as extensive experience in the use of this type of images in problems related to detection and classification [32].
Two Sentinel-1 SAR and two Sentinel-2 multispectral scenes were used, one from the dry season and one from the rainy season for each sensor (Table 1). The characteristics of the Sentinel-1 SAR images used were: C band, a spatial resolution of 10 m, acquisition form interferometric wide-swath (IW), processing level 1 (ground range detected or GRD), and dual polarization (VV and VH). For Sentinel-2, bands 4, 3, and 2 with 10 m original resolution, and bands 7, 6, and 5 with resampling, were used to obtain a spatial resolution of 10 m. The images were obtained from the Alaska Satellite Facility (ASF) platform (https://asf.alaska.edu/, accessed on 10 September 2021).

Rainfall Data Analysis
The rainfall data for the year 2021 were analyzed, obtained from five stations near the study area (2RSMX, CGCMEX, COBMX, RECMX, and TROMX) of the National Meteorological Service [31]. To generate the spatial pattern of rainfall for the period 1 January to 30 September 2021, kriging interpolation was used based on statistical models that include autocorrelation. Only one year of rainfall data observation was analyzed (2021), as this was an atypical rainy year in the study area. In this analysis, the daily precipitation information covered two seasons: (a) the dry season (January to May), and the rainy season (June to September).

Satellite Images Used
Sentinel-1 and Sentinel-2 images were selected due to their spatial and temporal resolution; their timing in data acquisition also allows access the necessary images to cover the flood event in its entirety. Additionally, there are free access computational tools for preprocessing, as well as extensive experience in the use of this type of images in problems related to detection and classification [32].
Two Sentinel-1 SAR and two Sentinel-2 multispectral scenes were used, one from the dry season and one from the rainy season for each sensor (Table 1). The characteristics of the Sentinel-1 SAR images used were: C band, a spatial resolution of 10 m, acquisition form interferometric wide-swath (IW), processing level 1 (ground range detected or GRD), and dual polarization (VV and VH). For Sentinel-2, bands 4, 3, and 2 with 10 m original resolution, and bands 7, 6, and 5 with resampling, were used to obtain a spatial resolution of 10 m. The images were obtained from the Alaska Satellite Facility (ASF) platform (https://asf.alaska.edu/, accessed on 10 September 2021).

Software and Computing Requirements
Processing of SAR images was performed using the Sentinel-1 toolbox in the Sentinel application platform (SNAP) [33] and of Sentinel-2 images was performed with the semiautomatic classification plugin [34] extension of QGis Version 3.10.7.
The implementation of machine-learning algorithms was carried out in Python 3.7 language through the Scikit learn 0.21.3 [35] module in a Spyder 3.6 environment. Both vector and matrix mathematical operations were developed with Pandas 0.25.1, Numpy 1.16.5, and Matlab R2016a, with output graphs being constructed using Matplotlib 3.1.1 and Seaborn 0.9.0.

Sample Selection and Resampling
To incorporate pixel qualities and characteristics that correspond to each target class, the resampling process was performed on Sentinel-1 polarization VV and VH images and on the RGB false-color compounds (bands 4, 3, and 2) of Sentinel-2. The resampling was conducted with the binary classification approach, that is, there are two possible exclusive classes (1 if a pixel belongs to the "water" class, and 0 for the "non-water" class). Sample selection was developed in two phases: Atmosphere 2022, 13, 1852 5 of 20 (a) In Phase 1, the coordinates (x, y) of a set of pixels representing the two target classes were obtained. This procedure was performed on a subset of the image of variable size to select pixels with greater detail. For this, three methods were used: (1) growth by region [36,37], (2) manual definition of the threshold, and (3) use of the entire subset. The method for manually defining the threshold and using the entire subset consisted of defining a threshold for binary segmentation based on analysis of the histogram of the subset and the graphic display. This procedure starts with the transformation of the input image to a single band. In the case of Sentinel-1 (SAR), only one polarization is used. Next, the histogram of the scene is calculated [7]. In the case of Sentinel-2, a limit value is defined that allows the separation of water bodies and that also allows a pixel-by-pixel validation on the RGB composite image. Before exporting the coordinates of the sampling pixels, the intra-and inter-class duplicates database was purged, based on the value of the RGB false-color pixels for the "non-water" class and backscatter (VV) values for the "water" class. (b) In Phase 2, the database of phase 1 was updated, and using the geographic coordinates, the pixel values were recovered for the following input combinations: RGB and HSV composite of Sentinel-2, Sentinel-1 VV, and VH polarization, and DEM and indices to detect bodies of water (Table 2). Bands 4, 3, and 2 of the Sentinel-2 image were assigned to generate the RGB composite. The HSV model represents color of three components: Hue (H), Saturation (S), and Value (V). In uniform spaces, the color difference (Euclidean distance) is proportional to the human perception of that difference. In this sense, RGB is not uniform [38]. The conversion from RGB to HSV was obtained from the OpenCV library, using the cvtColor function in Python-3. Table 2. Indexes used to detect water bodies in multispectral images.

Index Reference
Where: MNDWI 1 is the modification of normalized difference water index; MNDWI 2 is the water difference index normalized modified by Rogers and Kearney; NDWI is the normalized difference index of water; AWEI is the automated water extraction index; AWEISH is the automated water extraction index.

Machine Learning Algorithms Used
Algorithms known as decision trees were introduced by Breiman et al., in 1984 [43]. These algorithms have a tree-like structure, starting with a root, followed by nodes that represent a division rule and give rise to branches on both sides. By successive divisions, one arrives at the sheets, which indicate the target classes. The decision tree is built recursively from top to bottom with the initial selection of the features to be used in each decision rule. The construction ends when all of the samples belong to the same class, the features to be subdivided are exhausted, and the maximum allowable depth of the structure has been reached [44]. The key hyperparameters that allow control of the creation of decision trees in the training process are (a) Max_features, (b) Min_samples_split, (c) Min_samples_leaf, and (d) Max_depth [45]. In this study, the GB and RF decision-treebased ensemble algorithms were used, as described below.

Gradient Boosting
This assembly algorithm proposed by Friedman [46] trains decision trees sequentially. The process involves creating "Ne" (number of estimators), and for each tree, the algorithm is trained and corrected with the residual values of its predecessor. This approach allows a robust and powerful algorithm to be obtained after "Ne" iterations [47].
The cost function (L) that allows optimization of the division for each tree is the entropy expressed in Equation (1) [48]: where y i = 1 is the i-th response up to Nc (number of classes) ∈ {0, 1}, and Pi (x) = Pr(y i = 1|x), is the probability that input x generates an output and is equal to unity.

Random Forest
The RF assembly algorithm is based on decision trees and was proposed by Breiman in 2001 [49]. The process consists of training decision trees with different subsets of "Nce" (features with replacement) to create "Ne".
The cost function (L) to be optimized indicates the impurity of the split in each tree, which is given by the Gini index [50] as where p(Cj|i) is the proportion of samples belonging to class j, and Nc represents the total number of classes for a particular node i.

The Training Algorithms
The approach used during training was cross validation. This involves optimizing the algorithms several times to obtain weighted ranking metrics. In this case, 10 cycles were performed, and in each one, 90% of the data was used to train the models, and the remaining 10% (which does not belong to the training set) was used to evaluate the model using validation metrics mentioned in Section 2.7.1.
Standard hyperparameters of each algorithm were selected, and data cross-validation was performed [51,52]. The process began with the definition of the number of partitions for cross-validation (Kf), whereby Kf subsets, disjoint and stratified with respect to the dependent variable, were generated for the dataset. The model was trained Kf times, and in each iteration, Kf-1 subsets were used for training, with the remainder being used for validation. In each iteration, the validation subset was changed, and at the end of the cycle, Kf trained models were obtained, allowing calculation of the indicators for comparison. The hyperparameters of the machine learning models found by the cross-validation processes were the following: (a) For Gradient Boosting (GB): Loss function to be optimized (log_loss, binomial and multinomial deviation). Learning rate (learning_rate = 0.1). Number of estimators (n_estimators = 100). Criterion to measure the quality of the branches ('friedman_mse', for the mean squared error with improvement score by Friedman). Minimum samples for each internal node split (min_samples_split = 2). Minimum number of samples to define a leaf (min_samples_leaf = 1). Maximum depth (max_depth = 3). Randomization seed (random_state = 1). Maximum features (None, set equal to the number of features or attributes available).
(b) For Random Forest (RF): Number of trees that make up the forest (n_estimators = 100). Function or criterion to measure the quality of the ramifications (gini index).

Training Algorithms of Machine Learning Models for Flood Prediction
In the prediction stage, the classifier with the independent variables as inputs classified a pixel into two classes: "1, water" and "0, non-water". Depending on the result of the classification and the actual (desired) label of the pixel, four possibilities arise: (a) true positive (VP), when a pixel is correctly classified as class 1; (b) false positive (FP), when a pixel of class "0" is wrongly classified as class 1 (type I error); (c) true negative (TN), when a pixel is correctly classified as class "0"; and (d) false negative (FN), when a pixel of class "1" is wrongly classified as class "0" (type II error).
To determine the global accuracy of the algorithms (PG), the relationship between the correctly classified values (VP + VN) and the total of the test set (VP + FP + VN + FN) was obtained. This parameter measures the performance of models on balanced samples, also considering the use of four auxiliary evaluation criteria or metrics (precision, sensitivity, F1 score, and Kappa), as described below.
Precision (P) is defined as the ability of the model to correctly classify class 1 pixels in a set whose labels are known and is calculated using Equation (3): Sensitivity (S) measures the ability of the classifier to identify those pixels that represent the water class in a completely unrelated set. It is evaluated and calculated in decimal values using Equation (4): The criteria of P and S follow inverse behavior curves and depend directly on the threshold that is defined to declare the model output as class 1. The harmonic mean of these criteria is better known as the F1 score and allows comparison of the evaluated models with greater certainty. In cases for which there are multiple target classes, it is convenient to use the mean of F1, known as F1 macro (F1m). F1 is calculated as The Kappa index [53] is another discrete multivariable technique that can be used to evaluate the accuracy of algorithms. The KHAT (K) index is calculated using Equation (6) [54]: where r is the total number of rows in the matrix; xii is the number of observations for the "ith" row and column; xi+ and x+i are the marginal totals for the i row and column, respectively; and N is the total number of observations.

Workflow
The workflow started with processing using optical Sentinel-2 [55,56] and SAR Sentinel-1 [57][58][59] data for the dry season to identify perennial water bodies (lagoons, rivers, and irrigation canals) through the growth model by region and manual definition of the threshold. The database was built with the independent variables, which were polarization (VH, VV, and VV/VH), RGB, HSV, and indexes ( bands. The clean database without duplicates between and inter-class allowed optimization of the models through cross-validation with Kf = 10 [60]. The classification accuracy of the standard models was evaluated and compared with the following indicators: P, S, F1, K, the ROC curve (ROC, receiver operating characteristic), and the area under the ROC curve (AUC, area under the curve). The best classifier generated the prediction for each pixel in the satellite images used in the dry and rainy seasons, with maps of flooded areas being calculated by area difference (Figure 2).  Table 3 describes the inputs and combinations obtained from Sentinel-2 and Sentinel-1 for the dry and rainy seasons using the GB and RF algorithms. Three main inputs were generated: (a) Sentinel-2 (S2), (b) Sentinel-1 (S1), and (c) DEM. Algorithm + H, V bands from HSV composite of S2 + DEM S1

Flood Extent
To determine the extent of flooding, the change in pixel classification between the two seasons was computed. Flooding was assigned for a change in pixel classification from 0 in the dry season to 1 in the rainy season, and permanent water was assigned for a value of 1 in both seasons.

Depth of Flooding
The DEM generated using LiDAR from sources E14A38 (2018) and E14A48 (2016) [61] was used to determine the depth of flooded areas by reclassifying the data every 0.50 m. Given the study area elevation range of 2567.5-3427.3 masl, the minimum elevation was used as the baseline for the reclassification and therefore for the depth of flooded areas. In addition, vector data were obtained from the hydrographic network (RH) sub-basin RH12Aa Almoloya-Otzolotepec region, Lerma-Santiago river basin [62], the pluvial drainage network, lagoons, and the great canal crossing the study area. The reclassification was carried out using the QGis reclass module. Other studies have been carried out using the DEM; for example, [19] used a Floodwater Depth Estimation Tool (FwDET), developed to augment remote sensing analysis by calculating water depth based solely on an inundation map with an associated digital elevation model (DEM). The results showed good correspondence, with an average difference of 0.18 m for the coastal (using a 1 m DEM) and 0.31 m for the riverine (using a 10 m DEM) case studies. Algorithm + H, V bands from HSV composite of S2 + DEM S1 10 Algorithm + (VH, VV) dual polarization of S1. S1, DEM 11 Algorithm + (VH, VV) dual polarization of S1 + DEM S1, DEM 12 Algorithm + (VH, VV) dual polarization of S1 + (VV/VH) polarization of S1 + DEM S2, S1 13 Algorithm + HSV composite of S2 + VH polarization of S1. S2, S1 14 Algorithm + HSV composite of S2 + VV polarization of S1. S2, S1, DEM 15 Algorithm + HSV composite of S2 + VV polarization of S1 + DEM S2, S1, DEM 16 Algorithm + H, V bands from HSV composite of S2 + VV polarization of S1 + AWEI index + DEM

Rainfall Record during 2021
The rainfall during September 2021 in the study area was the highest recorded monthly total in the year (201.22 mm) [31]. During the days of 4 to 7 September, excessive rainfall occurred in terms of both intensity and duration, which caused flooding and overflows of the Lerma River in the Lerma-Santiago basin. The minimum mean monthly precipitation from January to September of the five stations near the study area was 1.66 mm (January), and as the months passed, the cumulative precipitation increased to 815.85 mm by 30 September 2021 ( Figure 3).

Flood Extent
To determine the extent of flooding, the change in pixel classification between the two seasons was computed. Flooding was assigned for a change in pixel classification from 0 in the dry season to 1 in the rainy season, and permanent water was assigned for a value of 1 in both seasons.

Depth of Flooding
The DEM generated using LiDAR from sources E14A38 (2018) and E14A48 (2016) [61] was used to determine the depth of flooded areas by reclassifying the data every 0.50 m. Given the study area elevation range of 2567.5-3427.3 masl, the minimum elevation was used as the baseline for the reclassification and therefore for the depth of flooded areas. In addition, vector data were obtained from the hydrographic network (RH) subbasin RH12Aa Almoloya-Otzolotepec region, Lerma-Santiago river basin [62], the pluvial drainage network, lagoons, and the great canal crossing the study area. The reclassification was carried out using the QGis reclass module. Other studies have been carried out using the DEM; for example, [19] used a Floodwater Depth Estimation Tool (FwDET), developed to augment remote sensing analysis by calculating water depth based solely on an inundation map with an associated digital elevation model (DEM). The results showed good correspondence, with an average difference of 0.18 m for the coastal (using a 1 m DEM) and 0.31 m for the riverine (using a 10 m DEM) case studies.

Rainfall Record during 2021
The rainfall during September 2021 in the study area was the highest recorded monthly total in the year (201.22 mm) [31]. During the days of 4 to 7 September, excessive rainfall occurred in terms of both intensity and duration, which caused flooding and overflows of the Lerma River in the Lerma-Santiago basin. The minimum mean monthly precipitation from January to September of the five stations near the study area was 1.66 mm (January), and as the months passed, the cumulative precipitation increased to 815.85 mm by 30 September 2021 (Figure 3).

Samples Selected for Algorithm Training
For the Sentinel-2 RGB composite, a refined and unbalanced database of 62,400 samples (20,000 for the water class and 42,400 for the non-water class) was generated. Of these, 32,000 samples corresponded to the dry season (10,000 in the water class and 22,000

Samples Selected for Algorithm Training
For the Sentinel-2 RGB composite, a refined and unbalanced database of 62,400 samples (20,000 for the water class and 42,400 for the non-water class) was generated. Of these, 32,000 samples corresponded to the dry season (10,000 in the water class and 22,000 in the non-water class) ( Figure 4A), and 30,400 samples for the rainy season (10,000 in the water class and 20,400 in the non-water class) ( Figure 4B).
Atmosphere 2022, 13, x FOR PEER REVIEW 11 of 22 in the non-water class) ( Figure 4A), and 30,400 samples for the rainy season (10,000 in the water class and 20,400 in the non-water class) ( Figure 4B).  Figure 5 shows the input images described in Table 3 used for training and evaluating the machine learning algorithms. HSV false-color composites of Sentinel-2 for the dry and rainy seasons are presented in Figures 5A and 5D, respectively, VV polarizations of Sentinel-1 for the dry and rainy seasons are presented in Figures 5B and 5E, respectively, and the DEM is shown in Figure 5C.  Figure 5 shows the input images described in Table 3 used for training and evaluating the machine learning algorithms. HSV false-color composites of Sentinel-2 for the dry and rainy seasons are presented in Figure 5A and D, respectively, VV polarizations of Sentinel-1 for the dry and rainy seasons are presented in Figure 5B and E, respectively, and the DEM is shown in Figure 5C.

Results for Gradient Boosting Algorithm
For Sentinel-2 and the GB classification algorithm, for both the dry and rainy seasons, the best results were obtained with combination 8 (K = 0.9895 and 0.9716, respectively) ( Table 4). For SAR Sentinel-1 and the GB algorithm, the best result for the dry season was obtained with combination 12 (K = 0.9574), and for the rainy season was obtained with combination 11 (K = 0.9560).
When combining Sentinel-2, Sentinel-1, and MDE using the GB algorithm, combination 15 gave the best result for the dry season (K = 0.9945) and combination 16 for the rainy season (K = 0.9905) ( Table 4).

Results for Random Forest Algorithm
For Sentinel-2 and the RF classification algorithm, for both the dry and rainy seasons, the best results were obtained with combination 8 (K = 0.9885 and 0.9721, respectively) ( Table 5). Similarly, with Sentinel-1 and the RF algorithm, for both the dry and rainy seasons, the best results were obtained with combination 11 (K = 0.9547 and 0.9554, respectively).
When combining Sentinel-2, Sentinel-1, and DEM using the RF algorithm, combination 15 gave the best result for the dry season (K = 0.9937) and combination 16 for the rainy season (K = 0.9878) ( Table 5).

The Best Combinations for Determining Flooding According to Season and Algorithm
On the basis of the information in Tables 4 and 5, the best input combinations for detecting permanent bodies of water in the dry and rainy seasons and flooded areas in the rainy season were able to be identified.

Results for Gradient Boosting Algorithm
For Sentinel-2 and the GB classification algorithm, for both the dry and rainy seasons, the best results were obtained with combination 8 (K = 0.9895 and 0.9716, respectively) ( Table 4). For SAR Sentinel-1 and the GB algorithm, the best result for the dry season was obtained with combination 12 (K = 0.9574), and for the rainy season was obtained with combination 11 (K = 0.9560).
When combining Sentinel-2, Sentinel-1, and MDE using the GB algorithm, combination 15 gave the best result for the dry season (K = 0.9945) and combination 16 for the rainy season (K = 0.9905) ( Table 4).

Dry Season
Both algorithms (RF and GB) showed good performance for combination 15 (Algorithm + Sentinel-2 HSV composite + Sentinel-1 VV polarization + MDE). However, a better classification was obtained with the GB algorithm; the performance indicators (F1m, AUC, and Kappa) are shown in Table 6, which were used to validate the results. Values of indicators for evaluating the performance of the algorithms are presented in Table 7 and were obtained from the confusion matrix of the last test subset in the crossvalidation phase (Kf = 10). The best-ranked algorithm was GB, with an overall accuracy of 99.81%. The performance error of the GB and RF algorithms is directly related to the false positive (FP) rate. For the dry season, the GB algorithm predicted an extent of water bodies (surface area) of 968.43 ha (4.20% of the total study area), whereas the RF algorithm predicted an area of 836.6 ha (3.63% of the total study area) ( Table 8). The GB algorithm identified a greater extent of water bodies in the dry season compared with RF, although the distributions are similar for the two algorithms ( Figure 6A,B). This is because both models use decision trees for classification.

Rainy Season
Both algorithms obtained the best results for the detection of floods in the rainy season with combination 16. The index that contributes best to flood detection is the AWEI index. A comparison of the performance of the two algorithms reveals that GB shows the best result than RF; the performance indicators (F1m, AUC, and Kappa) are shown in Table 9, which were used to validate the results. Values of indicators for evaluating the performance of the algorithms are presented in Table 10 and were obtained from the confusion matrix of the last test subset in the crossvalidation phase (Kf = 10). The GB algorithm showed the best performance when classifying the test set, with an overall accuracy (GP) of 99.67%, compared with 99.47% for RF. Compared with results for the dry season, the two analyzed models have lower classification accuracies because of type II errors, with a high rate of false negative classifications. For the extent of permanent bodies of water and flooded areas in the rainy season, the GB algorithm identified 7.96% of the total area as being water, compared with 7.04% for RF (Table 11).

Rainy Season
Both algorithms obtained the best results for the detection of floods in the rainy season with combination 16. The index that contributes best to flood detection is the AWEI index. A comparison of the performance of the two algorithms reveals that GB shows the best result than RF; the performance indicators (F1m, AUC, and Kappa) are shown in Table 9, which were used to validate the results. Values of indicators for evaluating the performance of the algorithms are presented in Table 10 and were obtained from the confusion matrix of the last test subset in the cross-validation phase (Kf = 10). The GB algorithm showed the best performance when classifying the test set, with an overall accuracy (GP) of 99.67%, compared with 99.47% for RF. Compared with results for the dry season, the two analyzed models have lower classification accuracies because of type II errors, with a high rate of false negative classifications. For the extent of permanent bodies of water and flooded areas in the rainy season, the GB algorithm identified 7.96% of the total area as being water, compared with 7.04% for RF (Table 11). The GB algorithm correctly predicts flooded areas in agricultural areas located in valleys and transition zones (mountain slopes), as well as an increase in surface area along the left bank of the Lerma River and areas with urban infrastructure. The similarity of the pattern of flooded area for GB and RF can be seen in Figure 7A,B and is due largely to both of these algorithms being based on decision trees.

Extent of Flooding According to Season
Flooding was considered to be those pixels that were classified as "non-water"class during the dry season and as "water" class during the rainy season. With this criterion, the GB algorithm identified 1113.36 ha of flooding, and RF identified 670.46 ha of flooding (Table 12). The analysis of the extent of the flooding is based on the temporal change of the pixels as represented by the difference between Figure 6 (dry season) and Figure 7 (rainy season). The flooded areas are largely on the left bank of the Lerma River, agricultural plots in the Lerma valley, as well as intermittent stormwater channels in the catchment areas and runoff from the mountains into the Lerma valley ( Figure 8). Regarding the evaluated algorithms, the difference between GB and RF lies in the compactness of the flooded regions identified by the evaluation metrics. Both algorithms have acceptable performance for flood identification and monitoring.

RF
16 1623.98 7.04 The GB algorithm correctly predicts flooded areas in agricultural areas located in valleys and transition zones (mountain slopes), as well as an increase in surface area along the left bank of the Lerma River and areas with urban infrastructure. The similarity of the pattern of flooded area for GB and RF can be seen in Figure 7A,B and is due largely to both of these algorithms being based on decision trees.

Extent of Flooding According to Season
Flooding was considered to be those pixels that were classified as "non-water"class during the dry season and as "water" class during the rainy season. With this criterion, the GB algorithm identified 1113.36 ha of flooding, and RF identified 670.46 ha of flooding (Table 12). The analysis of the extent of the flooding is based on the temporal change of the pixels as represented by the difference between Figure 6 (dry season) and Figure 7 (rainy season). The flooded areas are largely on the left bank of the Lerma River, agricultural plots in the Lerma valley, as well as intermittent stormwater channels in the catchment areas and runoff from the mountains into the Lerma valley ( Figure 8). Regarding the evaluated algorithms, the difference between GB and RF lies in the compactness of the flooded regions identified by the evaluation metrics. Both algorithms have acceptable performance for flood identification and monitoring.

Depth of Flooded Areas
Agricultural plots of the center and north of the valley show flooded areas with depths of <0.5 m. In other areas, such as Lagoon, depths ranging from 1.0 to 1.5 m were observed. The left bank area of Lerma river, which is located in the west of the study area, showed depths of >2 m from overflow. In hillside areas, small valleys retain runoff water and flood water (Figure 9) generated by the intense rainfalls that occurred from 4 to 7 September 2021.

Depth of Flooded Areas
Agricultural plots of the center and north of the valley show flooded areas with depths of <0.5 m. In other areas, such as Lagoon, depths ranging from 1.0 to 1.5 m were observed. The left bank area of Lerma river, which is located in the west of the study area, showed depths of >2 m from overflow. In hillside areas, small valleys retain runoff water and flood water ( Figure 9) generated by the intense rainfalls that occurred from 4 to 7 September 2021.
Agricultural plots of the center and north of the valley show flooded areas with depths of <0.5 m. In other areas, such as Lagoon, depths ranging from 1.0 to 1.5 m were observed. The left bank area of Lerma river, which is located in the west of the study area, showed depths of >2 m from overflow. In hillside areas, small valleys retain runoff water and flood water (Figure 9) generated by the intense rainfalls that occurred from 4 to 7 September 2021.

Discussion
The presence of clouds, fog, and atmospheric particles in Sentinel-2 optical images can place limitations on detecting phenomena during rainy periods, as such conditions decrease solar illumination. During image processing, it is necessary to homogenize the histograms and normalize values of the RGB composite [63]. In SAR image processing, backscatter values are converted to backscatter coefficients (dB) to increase scene contrast and enhance visual differences between target classes.

Discussion
The presence of clouds, fog, and atmospheric particles in Sentinel-2 optical images can place limitations on detecting phenomena during rainy periods, as such conditions decrease solar illumination. During image processing, it is necessary to homogenize the histograms and normalize values of the RGB composite [63]. In SAR image processing, backscatter values are converted to backscatter coefficients (dB) to increase scene contrast and enhance visual differences between target classes.
The results obtained using machine learning algorithms (models) in this research indicate that, of the analyzed inputs, the combination of Sentinel-2, Sentinel-1, and DEM obtained more reliable results than other combinations [64,65], and the more robust and efficient model was gradient boosting algorithm to classify water bodies and flooded areas. The use of optical images with SAR images forms a complementation that can reduce Type I and II classification errors in the two analyzed machine learning algorithms (GB and RF). However, if a multiclass classification is to be carried out, then the use of optical images is necessary, as it allows a wide variety of shades and textures to be distinguished, which can be perfectly correlated to detect different uses and land covers, such as bodies of water and flooded areas [66,67]. In addition, SAR images can be used to identify bodies of water with aquatic vegetation, lagoons, rivers, lakes, and irrigation canals using VV polarization; however, the potential of SAR for monitoring floods is limited by the similarity of backscatter values in (shallow) flooded areas with soils without vegetation with high moisture content [68].
For the dry season, the utilized algorithms (GB and RF) detect similar regions of water bodies. The difference between the output maps lies in the compact regions, where the GB algorithm detects a larger area of water bodies than RF. For this season, the GB algorithm has the best performance for combination 15, generated by using Sentinel 2, Sentinel-1, and DEM (HSV composite, VV polarization, and DEM) with F1m = 0.997, AUC = 0.999, and K = 0.994.
For the rainy season, the models detect very similar spatial distributions of water bodies and flooded areas. There is a difference in the number of pixels that make up each region: GB identifies 1835.7 ha of flooded areas, and RF identifies 1623.98 ha. This difference is due to the identification by GB of pixels as "water" in small valleys located on the slopes of mountains. The best combination for both the RF and GB algorithms for the rainy season is combination 8, with F1m = 0.9861 and K = 0.9721 and F1m = 0.9858 and K = 0.9716, respectively.
In the comparison of the indexes derived from Sentinel-2 for detecting bodies of water for both algorithms (GB and RF); the best performance for the dry season is obtained with the reformulated automated water extraction index (AWEISH, combination 7) [42]; and the best performance for the rainy season is obtained with the automated water withdrawal index (AWEI, combination 6) [42]. The performance of the models does not improve significantly, when these indexes are integrated but the classification accuracy of the models increases with integration with the DEM.
According to Tables 6 and 9, the GB algorithm reports an extent of flooded areas of 1113.36 ha, and the evaluation metrics show that this model is superior to RF. The maps of flood extent obtained with both algorithms show a high degree of similarity. The depth of flooding in agricultural plots located in the central part of the Lerma valley and on the left bank of the Lerma River is less than 1 m, but in some parts of the lagoons, depths of 1-2 m are observed.
One of the advantages of this research is that the proposed methodology generates maps that show the extent of floods in a short time, whose results are validated using comparison metrics (F1m, AUC, and Kappa) using the best model.

Conclusions
In this study, we assessed 16 combinations of Sentinel-1 synthetic aperture radar (SAR) images, Sentinel-2 optical images, and DEM data to evaluate the performance of two widely used machine learning algorithms for providing information about flooding extent and depth, with a case study of flooding in the Municipality of Lerma de Villada, central Mexico.
To identify flooded areas, the combination of Sentinel-2 optical images, SAR Sentinel-1, and DEM, with GB and RF algorithms, gives more reliable results if we used only Sentinel-2 or Sentinel-1 for binary classification (water and non-water).
Determination of flooded areas requires independently analyzing the two seasons (dry and rainy) because meteorological conditions strongly affect the optical images in the rainy season and consequently generate a combined model that predicts high numbers of false positives.
For the three analysis inputs (Sentinel-2 + MDE, Sentinel-1 + MDE, and Sentinel-2 + Sentinel-1 + DEM) for both seasons (dry and rainy), the ensemble algorithms (GB and RF) show acceptable performance, with GB being slightly superior according to performance metrics.
For the detection of bodies of water in the dry season, the metrics indicate that the best algorithm is GB with combination 15 (F1m = 0.9973, AUC = 0.9999, K = 0.9945); however, the model gives a 0.27% classification error on the set of samples with which it was trained and evaluated. For the identification of flooded areas in the rainy season, with the same GB algorithm, the best results are obtained with combination 16 (F1m = 0.9953, AUC = 0.9999, and K = 0.9905); however, the model shows a 0.47% classification error on the set of samples with which it was trained and evaluated.
The GB algorithm generates the best result for the extent of flooding, with an area of 1113.36 hectares, with flooding depth ranging is <1.0 m.
The output maps of both the extent and depth of flooding should allow planning and execution of flood mitigation actions in a more timely manner and with a higher degree of accuracy than could otherwise be achieved.
This research aims to be the basis and approach to the implementation of machine learning algorithms applied to flood monitoring in southeastern Mexico. For future work, it is recommended to incorporate field validations and case studies elsewhere in order to obtain better calibrated models.