Identifying Dynamic Changes in Water Surface Using Sentinel-1 Data Based on Genetic Algorithm and Machine Learning Techniques

Abstract: Knowledge of water surface changes provides invaluable information for water resources management and flood monitoring. However, the accurate identification of water bodies is a long-term challenge due to human activities and climate change. Sentinel-1 synthetic aperture radar (SAR) data have drawn increasing attention for water extraction owing to their availability under all weather conditions, their sensitivity to water, and their high spatial and temporal resolutions. This study investigated the abilities of random forest (RF), Extreme Gradient Boosting (XGB) and support vector machine (SVM) methods to identify water bodies using Sentinel-1 imagery in the upper reaches of the Yangtze River, China. Three sets of hyper-parameters (default values, values optimized by grid search, and values optimized by a genetic algorithm) were examined for each model. Model performances were evaluated using a Sentinel-1 image of the development site and the transfer site. The results showed that SVM outperformed RF and XGB under the three scenarios on both the validation and transfer sites. Among them, SVM optimized by the genetic algorithm obtained the best results, with accuracies (ACC) of 0.9917 and 0.985, kappa statistics of 0.9833 and 0.97, and F1-scores of 0.9919 and 0.9848 on the validation and transfer sites, respectively. The best model was then used to identify the dynamic changes in water surfaces during the 2020 flood season in the study area. Overall, the study further demonstrated that SVM optimized using a genetic algorithm was a suitable method for monitoring water surface changes with a Sentinel-1 dataset.


Introduction
Flooding is a natural phenomenon in which the water volume or water level of rivers and lakes increases rapidly due to rainstorms, melting ice or a storm surge [1]. Worldwide, flooding is one of the most frequent and serious natural disasters [2]. Between 1996 and 2005, the number of large-scale flood disasters that occurred every decade worldwide was twice as much as that of , and the loss of economic property increased by five times [3][4][5]. Mapping water bodies to describe past and present flood disasters is essential for assessing the disaster situation, as well as for assisting flood rescue and relief efforts [6].
Traditional flood inundation impact analysis is usually accomplished by calculating relevant hydrological indicators (e.g., inundation range) based on hydrology and hydrodynamics [7], but this method cannot directly reflect the impact of flood inundation. Sensor networks that monitor rainfall and river flow [8] cannot cover many parts of the world, and these ground-based systems are expensive [9]. In recent years, satellite remote sensing technology has developed rapidly and can provide timely, objective and accurate geographic information at regional and global scales [10][11][12]. Among them, Sentinel-1 remote sensing data

Study Area
The study area, Chongqing (105°11′ E~110°11′ E, 28°10′ N~32°13′ N), is located in the upper reaches of the Yangtze River, southwestern China (Figure 2). Chongqing is rich in water resources, with average annual water resources of about 500 billion m³, ranking first in China in terms of water area per square kilometer. Further, Chongqing has many rivers and dense water systems. The main rivers flowing through Chongqing include the Yangtze River, Jialing River, and Wujiang River. The Yangtze River runs through the whole territory from west to east, with a flow length of 665 km. There are 553 rivers with a drainage area of more than 50 km², and seven rivers with a drainage area of more than 10,000 km² [42]. The terrain of Chongqing gradually descends from the north to the Yangtze River Valley in the south. In the northwest and central part, hills and low mountains are dominant. In the southeast, there are two high mountains, Daba Mountain and Wuling Mountain. The terrain is high in the southeast and northeast and low in the middle and west. Chongqing has a subtropical humid climate, which is characterized by hot summers and warm winters. The average annual temperature is 17.5 °C, rainfall is 1125.3 mm, relative humidity is 80%, sunshine hours are 1000-1400 h, and the sunshine percentage is only 25-35%. The rainfall of the rainy season (from May to September) accounts for about 70% of the total annual rainfall. Heavy rain usually occurs in June and August (Figure 3) [42]. Thus, Chongqing is prone to flood disasters in the summer. Rainstorms and flooding occurred frequently in China in 2020, and the country endured its most serious flood situation since 1998. In the 15 districts and counties of Chongqing, 263,200 people were affected, 251,000 people took emergency shelter, 132,700 people were transferred and resettled, 4095 houses were damaged, and the affected area of crops was 8636 hm² [43].
Remote Sens. 2021, 13, 3745

Data
In this study, seven Interferometric Wide swath (IW) Ground Range Detected (GRD) images of Chongqing acquired from June to August 2020 (10 June, 22 June, 4 July, 16 July, 28 July, 9 August and 21 August) were obtained from the Copernicus Open Access Hub (available online: https://scihub.copernicus.eu/dhus/#/home, accessed on 10 June 2020). The Sentinel-1 data were preprocessed in SNAP software using the functions Apply-Orbit-File, ThermalNoiseRemoval, Calibration, LinearToFromdB, Multilook, Speckle-Filtering, and Terrain-Correction (using DEM data of Chongqing from https://www.webmap.cn/, accessed on 10 June 2020). Finally, VV polarization, VH polarization and the projected local incidence angle were obtained as features for machine learning modeling. The data acquired on 10 June were used for model development and assessment. The best model was then applied to all the data to investigate the dynamic changes in water surface and to analyze the impact of flooding.
Four sites (sites A to D) were selected along the Yangtze River (Figure 2). Among them, site A was used for model development and validation, site B for model transfer, and site C (Shanhuba) and site D (Huangjin River) for investigating the dynamic changes in water surfaces during the 2020 flood season. Shanhuba is located in the center of Chongqing municipality. There is an island in Shanhuba that many people visit during the dry season. Huangjin River is a tributary on the north bank of the Yangtze River in the Three Gorges Reservoir area. There are paddy fields and dry land along the Huangjin River; their inundated areas were calculated, and the effects of flooding on cultivated land were analyzed for this rural area. For model development and transfer, the sample points in sites A and B were selected with the help of a 1:10,000 land-use map from 2019 and Google Earth Pro (v7.3.3.7673). In the process, the land-use map was divided into a 20 × 20 m grid according to the 20 m resolution of the Sentinel-1 data. The results indicated that the length and width of small water bodies over the study area were usually between 50 and 80 m. To ensure that each water sample point contained only water and to eliminate the influence of mixed pixels on the classification results, we used a 3 × 3 square grid window to extract pure pixels [44][45][46][47]. If a grid cell and all of its surrounding cells were of the same land-use type, the cell was a pure pixel. This step was implemented with GDAL (Geospatial Data Abstraction Library, version 2.2.2) and the Python v3.6 scripting language. Please refer to the Supplementary Materials for details of this process. In this way, 57,046 pure pixels (2118 water and 54,928 non-water samples) were obtained for site A. We used an under-sampling technique to eliminate data imbalance, which removed majority-class examples and decreased the overall level of class imbalance [48][49][50][51].
Specifically, we randomly selected 2000 water samples and 2000 non-water samples from these pure pixels, of which 70% were used as training samples for model development and 30% were used as the validation set for hyper-parameter optimization and accuracy verification. Similarly, 2000 water samples and 2000 non-water samples were randomly selected from the 2013 water samples and 31,884 non-water samples in site B to verify the transfer accuracy of the model.
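The pure-pixel rule and the balanced sampling described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the authors' GDAL script (which is in their Supplementary Materials): the land-use raster is assumed to be a list of lists of integer class codes, border cells are assumed never to be pure, and the function names `pure_pixel_mask` and `balanced_split` are hypothetical.

```python
import random


def pure_pixel_mask(grid):
    """Return a boolean grid that is True where a cell and all eight
    neighbours in its 3 x 3 window share the same land-use code.
    Assumption: cells on the raster border are never treated as pure."""
    rows, cols = len(grid), len(grid[0])
    mask = [[False] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            centre = grid[r][c]
            mask[r][c] = all(
                grid[r + dr][c + dc] == centre
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            )
    return mask


def balanced_split(water, non_water, n_per_class=2000, train_frac=0.7, seed=0):
    """Under-sample both classes to n_per_class samples, shuffle, and split
    into training and validation sets (labels: 1 = water, 0 = non-water)."""
    rng = random.Random(seed)
    samples = [(p, 1) for p in rng.sample(water, n_per_class)]
    samples += [(p, 0) for p in rng.sample(non_water, n_per_class)]
    rng.shuffle(samples)
    cut = int(round(train_frac * len(samples)))
    return samples[:cut], samples[cut:]


# Toy land-use raster: 1 = water, 0 = other land use.
toy_grid = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
toy_mask = pure_pixel_mask(toy_grid)

# Placeholder pixel ids sized like the site A pure-pixel counts in the text.
train, validation = balanced_split(list(range(2118)), list(range(54928)))
```

With the site A counts, the split yields 2800 training and 1200 validation samples. Whether the original split was stratified by class is not stated in the text; the sketch uses a plain shuffle.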

Genetic Algorithm (GA)
Genetic algorithms (GAs) were first proposed by Professor Holland from the United States in 1975 [44]. They are a kind of random search algorithm that draws on biological selection and natural genetic mechanisms. Genetic algorithms simulate reproduction, crossover and gene mutation in the process of natural selection and natural heredity. In each iteration, a group of candidate solutions is retained, and the better individuals are selected from this group according to some fitness index. Genetic operators (selection, crossover and mutation) are used to combine these individuals to produce a new generation of candidate solutions, and the process is repeated until some convergence criterion is satisfied [45].
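The loop of selection, crossover and mutation described above can be illustrated with a minimal real-valued GA. This is a sketch under assumptions, not the implementation used in the study: tournament selection, uniform crossover, random-reset mutation and elitism are choices made here for brevity, and the function name `genetic_search`, the rates and the toy fitness function are all hypothetical.

```python
import random


def genetic_search(fitness, bounds, pop_size=50, generations=100,
                   crossover_rate=0.8, mutation_rate=0.1, seed=0):
    """Maximise `fitness` over real-valued genes constrained by `bounds`."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament():
        # Selection: the fitter of two randomly drawn individuals survives.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best individual forward
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                # Uniform crossover: each gene comes from either parent.
                child = [x if rng.random() < 0.5 else y
                         for x, y in zip(p1, p2)]
            else:
                child = p1[:]
            for i, (lo, hi) in enumerate(bounds):
                # Mutation: occasionally redraw a gene within its bounds.
                if rng.random() < mutation_rate:
                    child[i] = rng.uniform(lo, hi)
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best, fitness(best)


def toy_fitness(genes):
    """Hypothetical fitness with its maximum (0) at genes = (3, -1)."""
    return -(genes[0] - 3) ** 2 - (genes[1] + 1) ** 2


toy_bounds = [(-10, 10), (-10, 10)]
best_genes, best_score = genetic_search(toy_fitness, toy_bounds, seed=1)
```

Because the best individual is always copied into the next generation, the best fitness can never decrease from one generation to the next.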

Random Forest (RF)
Random forest (RF) is an ensemble learning method that uses bootstrap resampling to draw multiple samples from the original samples, builds a decision tree model for each bootstrap sample, and then combines the predictions of the decision trees through voting to arrive at the final prediction. A large number of theoretical and empirical studies have shown that RF has a high prediction accuracy, a good tolerance to outliers and noise, and is not prone to over-fitting [46]. RF is a natural nonlinear modeling tool and is currently one of the most popular research topics in data mining and bioinformatics [47,48].
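The bootstrap-and-vote idea can be shown with a deliberately simplified sketch. It uses one-feature threshold classifiers ("stumps") in place of full decision trees and omits the random feature selection of a real random forest, so it illustrates bagging with majority voting rather than RF itself. The backscatter values and all function names are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) samples from data with replacement."""
    return [rng.choice(data) for _ in data]


def fit_stump(sample):
    """Pick the threshold t that minimises errors of the rule
    'value <= t -> water (1), otherwise non-water (0)'."""
    best_t, best_err = None, None
    for t in sorted({x for x, _ in sample}):
        err = sum((x <= t) != bool(y) for x, y in sample)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def fit_forest(data, n_trees=25, seed=0):
    """Fit one stump per bootstrap sample."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(data, rng)) for _ in range(n_trees)]


def predict(forest, x):
    """Majority vote over all stumps (an odd n_trees avoids ties)."""
    votes = Counter(int(x <= t) for t in forest)
    return votes.most_common(1)[0][0]


# Hypothetical (backscatter in dB, label) pairs: water tends to be darker.
samples = [(-24, 1), (-22, 1), (-21, 1), (-20, 1),
           (-12, 0), (-10, 0), (-9, 0), (-8, 0)]
forest = fit_forest(samples)
```

In practice a library implementation such as the `RandomForestClassifier` module of scikit-learn mentioned later in the text would be used instead.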

Extreme Gradient Boosting (XGB)
Extreme Gradient Boosting (XGB) is a machine learning system based on boosted trees that was proposed by Chen et al. [52]. It contains a set of iterative residual trees, in which the n-th tree learns the residual left by the preceding n − 1 trees. The final predicted value of a sample is the sum of the output values predicted for it by all trees. Unlike the commonly used gradient boosting decision tree [53], which only uses first-order derivative information during optimization, XGB carries out a second-order Taylor expansion of the cost function and uses the first and second derivatives at the same time, which contributes to its good results [54].
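For reference, the second-order expansion mentioned above is usually written as follows. The notation is an assumption introduced here and does not appear in the text: $l$ is the loss, $\hat{y}_i^{(t-1)}$ is the prediction after $t-1$ trees, $f_t$ is the tree added at step $t$, and $\Omega$ is the regularization term.

```latex
\mathcal{L}^{(t)}
  = \sum_{i=1}^{n} l\!\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)
  \approx \sum_{i=1}^{n} \left[ l\!\left(y_i, \hat{y}_i^{(t-1)}\right)
      + g_i\, f_t(x_i) + \tfrac{1}{2}\, h_i\, f_t^{2}(x_i) \right] + \Omega(f_t),
\qquad
g_i = \frac{\partial\, l\!\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial\, \hat{y}_i^{(t-1)}},
\quad
h_i = \frac{\partial^{2} l\!\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \big(\hat{y}_i^{(t-1)}\big)^{2}}
```

A first-order gradient boosting method would keep only the $g_i$ term; the $h_i$ term is the additional curvature information that XGB uses.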

Support Vector Machine (SVM)
Support vector machine (SVM) is a machine learning method based on VC dimension theory and the structural risk minimization principle of statistical learning theory [52]. It has many unique advantages for solving small-sample, nonlinear, and high-dimensional pattern recognition problems, and it largely overcomes the problems of the "curse of dimensionality" and "over-fitting". It has been widely used in pattern recognition, function estimation, regression analysis, time series prediction, and other fields [53,54].

Combining Machine Learning with Genetic Algorithm
For RF, XGB, and SVM, there are two or more hyper-parameters that need to be tuned to achieve the highest classification accuracy (Table 1). Grid searches or random searches are commonly used to find the optimal combination of hyper-parameters. However, if there are too many hyper-parameters, the time cost of a grid search increases exponentially, and a random search cannot always guarantee the optimal combination of hyper-parameters. Hence, this study attempts to use genetic algorithms to search for the optimal hyper-parameters of RF, XGB and SVM for identifying water bodies. To help the genetic algorithm find the globally optimal hyper-parameters while limiting the training time, the number of iterations of the genetic algorithm was set to 100, and the population size was set to 50. Further, the performances of the models optimized by the genetic algorithm (hereafter RF_GA, XGB_GA, and SVM_GA) were compared with those of the models optimized using grid search (RF_grid, XGB_grid, and SVM_grid) as well as the models with default hyper-parameters (RF, XGB, and SVM). The default hyper-parameter values of the RandomForestClassifier and SVC modules in the scikit-learn 0.20.3 package and the xgb module in the xgboost 0.90 package for Python v3.6 were used in this work.
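The exponential growth of the grid search can be made concrete by counting model evaluations. The sketch below assumes, purely for illustration, 10 candidate values per hyper-parameter; the GA budget uses the population size (50) and iteration count (100) stated above and is an upper bound that assumes every individual is evaluated once per generation.

```python
from math import prod


def grid_evaluations(values_per_parameter):
    """A full grid evaluates every combination of candidate values."""
    return prod(values_per_parameter)


def ga_evaluations(population_size, generations):
    """Upper bound: each individual is evaluated once per generation."""
    return population_size * generations


# Hypothetical grids with 10 candidate values per hyper-parameter.
grid_costs = {k: grid_evaluations([10] * k) for k in (2, 3, 5)}
ga_cost = ga_evaluations(population_size=50, generations=100)
```

Under these assumptions the grid cost grows as 10^k with the number of hyper-parameters k, while the GA budget stays fixed, so the GA only becomes the cheaper option once the grid is large enough.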

Statistical Indicators
Indicators including accuracy (ACC), kappa coefficient, and F1 score were applied to evaluate the model performance.
Accuracy is the most common and most easily understood index; it is the number of correctly classified samples divided by the number of all samples (TP: water body correctly predicted as water; FN: water body wrongly predicted as non-water; FP: non-water body wrongly predicted as water; TN: non-water body correctly predicted as non-water; Table 2). Generally, the higher the accuracy, the better the classifier. The kappa coefficient is an index that measures classification accuracy while accounting for agreement that would occur by chance.
Producer's accuracy (P) indicates how many of the actual positive examples in the sample are predicted correctly. User's accuracy (U) represents the proportion of examples predicted as positive that are actually positive.
Sometimes P and U contradict each other, so they need to be considered together. The F1 score (also known as the F-score) is the most common way to achieve this. The F1 score is the harmonic mean of P and U and therefore synthesizes the two results. A model with a higher F1 score is more effective.
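The indicators described above can be computed from the four confusion-matrix counts as in the following sketch. The formulas are the standard textbook definitions, assumed here because the original equations are not reproduced in this text; the function name and the example counts are hypothetical and are not the study's confusion matrix.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard two-class indicators from confusion-matrix counts."""
    n = tp + fn + fp + tn
    acc = (tp + tn) / n                      # overall accuracy (ACC)
    # Chance agreement: product of reference and predicted class
    # proportions, summed over both classes.
    p_e = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    kappa = (acc - p_e) / (1 - p_e)          # Cohen's kappa
    producers = tp / (tp + fn)               # P: share of actual water found
    users = tp / (tp + fp)                   # U: share of predicted water that is water
    f1 = 2 * producers * users / (producers + users)
    return {"ACC": acc, "kappa": kappa, "P": producers, "U": users, "F1": f1}


# Hypothetical balanced validation set: 600 water and 600 non-water samples,
# with 5 errors in each direction.
example = classification_metrics(tp=595, fn=5, fp=5, tn=595)
```

For a balanced two-class set with symmetric errors, as in this toy example, the chance agreement is 0.5 and kappa reduces to 2 × ACC − 1.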

Hyper-Parameter Optimization
As detailed in Section 2, a genetic algorithm was applied to select the optimal hyper-parameters of the RF, XGB, and SVM models. This was done 50 times for each machine learning model. The average maximum precision of each generation and the corresponding hyper-parameters were recorded during each GA iteration. Figure 4 illustrates the change in the average precision with each generation during hyper-parameter optimization. For RF and XGB, the average precision increased progressively over generations. An increase in the average precision was achieved after the 50th iteration for RF and 150th iteration for XGB. For SVM, the average precision increased very quickly and then remained almost unchanged. The computational costs of hyper-parameter optimization using the genetic algorithm were 1357 s for RF, 242 s for XGB, and 758 s for SVM. In addition, the results were compared with the grid search method. The computational costs of hyper-parameter optimization using a grid search were 1863 s for RF, 260 s for XGB, and 1172 s for SVM. For all three models, hyper-parameter optimization with the genetic algorithm therefore took less computational time than the grid search. These results indicate that genetic algorithms are efficient in searching for optimal hyper-parameters of ML methods. Table 3 shows the final optimal hyper-parameters of RF, XGB, and SVM produced by both the genetic algorithm and the grid search method.
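The size of the time difference can be worked out directly from the timings quoted above. The short sketch below computes the relative saving of the genetic algorithm with respect to the grid search; the helper name `relative_saving` is introduced here for illustration only.

```python
# Optimization times in seconds, as reported in the text.
ga_seconds = {"RF": 1357, "XGB": 242, "SVM": 758}
grid_seconds = {"RF": 1863, "XGB": 260, "SVM": 1172}


def relative_saving(ga, grid):
    """Percentage of the grid-search time saved by the genetic algorithm."""
    return (grid - ga) / grid * 100


savings = {
    model: round(relative_saving(ga_seconds[model], grid_seconds[model]), 1)
    for model in ga_seconds
}
# savings -> {"RF": 27.2, "XGB": 6.9, "SVM": 35.3}
```

The saving is largest for SVM (about 35%) and smallest for XGB (about 7%).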

Model Performance
The statistical indices of accuracy (ACC), kappa value, and F1 score were used to compare model performances. Table 4 shows the accuracy indicators of these ML models with default hyper-parameters and with hyper-parameters optimized using a genetic algorithm and grid search on the validation site. Models with the optimized hyper-parameters outperformed those with default hyper-parameters. Among them, SVM optimized with both the genetic algorithm and grid search had the highest ACC (0.9917), kappa (0.9833), and F1 score (0.9919), whereas the RF with default hyper-parameters had the lowest ACC (0.9817), kappa (0.9633), and F1 score (0.9823). In order to test the stability of the models, the developed models were transferred to an area that was independent of the development site. Table 5 shows the accuracy indicators of these ML models with default and optimized hyper-parameters using a genetic algorithm and grid search on the transfer site. SVM optimized using a genetic algorithm achieved the highest values of ACC (0.9850), kappa (0.9700), and F1 score (0.9848). RF with default hyper-parameters and RF optimized using a grid search had the lowest values of ACC (0.9800), kappa (0.9600), and F1 score (0.9797). All models had slightly lower performances on the transfer site than on the validation site. Figure 5 shows the relative variation of accuracy between the validation site and the transfer site, defined as

$$\text{relative variation of accuracy (\%)} = \frac{\mathrm{ACC}_{\text{validation}} - \mathrm{ACC}_{\text{transfer}}}{\mathrm{ACC}_{\text{transfer}}} \times 100 \tag{7}$$

Models with default hyper-parameters presented small changes in accuracy, whereas XGB optimized using both a grid search and the genetic algorithm had a large relative variation in accuracy.
Figure 5. The relative variation of accuracy of ML models with default and optimized hyper-parameters using a genetic algorithm and grid search method from validation site to transfer site.
In order to provide a better visual comparison, the water distribution maps of two sub-regions (Shanhuba and Huangjin River) produced by the nine classifiers are shown in Figures 6 and 7. Generally, the predicted maps were very similar to their corresponding satellite images. For Shanhuba (Figure 6), a few of the maps produced by the models with default hyper-parameters presented more areas of water. There was an obvious "salt and pepper" phenomenon in Figure 6c. Compared with the satellite image of Shanhuba, there was a large area of water in Figure 6c that was not found in other maps. Additionally, the maps created by SVM_GA and SVM_grid were closest to the satellite map of water distribution. For Huangjin River (Figure 7), maps produced by the models with default hyper-parameters also presented more areas of water. According to the land-use map of the study site, forests grew along the river sides and in the north and west parts of the site. RF_grid, RF_GA, XGB_grid and XGB_GA classified more forest land as water body than SVM_grid and SVM_GA. Meanwhile, the latter classifiers produced less "noise" than the other models (Figure 7d-i). Considering the model performance on the validation and transfer sites, SVM_GA was the best model for predicting water distribution with Sentinel-1 images, and it was applied to explore the dynamic changes in water surfaces.
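As a worked example of the relative variation in Equation (7), the sketch below evaluates it for the two models whose validation and transfer accuracies are both quoted in the text (SVM_GA and RF with default hyper-parameters). The function name is hypothetical.

```python
def relative_variation(acc_validation, acc_transfer):
    """Relative variation of accuracy (%) from validation to transfer site."""
    return (acc_validation - acc_transfer) / acc_transfer * 100


# Accuracies quoted in the text (Tables 4 and 5).
svm_ga = relative_variation(0.9917, 0.9850)      # about 0.68 %
rf_default = relative_variation(0.9817, 0.9800)  # about 0.17 %
```

Both values are below 1%, which is consistent with the statement that all models lost only a little accuracy on the transfer site.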

Dynamic Changes in Water Surface
The best-performing model (SVM_GA) was applied to investigate the dynamic changes in water surface in the upper reaches of the Yangtze River during the flood period (June to August) in 2020. Seven maps of water surface distribution were created using Sentinel-1 images acquired on 10 June, 22 June, 4 July, 16 July, 28 July, 9 August and 21 August 2020. The dynamic changes in water surface over Shanhuba and Huangjin River during the flood period are shown in Figures 8 and 9. Figure 8a shows that flooding had not yet started on 10 June, and a large area of Shanhuba was exposed. From 22 June to 16 July, the flood season arrived, and the exposed area of Shanhuba shrank (Figure 8c-e). On 28 July, Shanhuba was completely submerged (Figure 8f). On 9 August, the water level dropped, and Shanhuba emerged from the water (Figure 8g). On 21 August, flooding occurred again, and Shanhuba was submerged once more (Figure 8h). This agrees with news reports (available online: https://www.sogou.com, accessed on 10 June 2020). According to the news, from 1 June 2020 to the end of June, there were five rounds of heavy rainfall in southern China, one-sixth of which was more than 200 mm. At 8 a.m. on 18 July, the largest flood since the Yangtze River entered the flood season passed through the main city of Chongqing. At 8 a.m. on 18 August, Shanhuba was completely submerged. On 16 August, the water from Fujiang River, a tributary of Jialing River in the upper reaches of the Yangtze River, increased rapidly. Meanwhile, the Xiaoheba station exceeded the warning water level (238 m), and the water from Wusheng station on the main stream of Jialing River also increased greatly.
The best-performing model (SVM_GA) was also applied to investigate the flooding of dry land and paddy fields along the Huangjin River during the flood period (June to August) in 2020. Figure 9 shows the disaster situation from 10 June 2020 to 21 August 2020 in the Huangjin River basin. Navy blue represents the flood inundation area, a small part of which was inundated from 10 June to 4 July, potentially as a result of rainfall, model classification error, or change in land-use status. On 16 July, a small part of the area outside the inland shoals along the Huangjin River was inundated. On 28 July, the inundated area increased. On 9 August, the inundated area decreased slightly, and on 21 August, the inundated area increased again, which was consistent with the news reports. Table 6 shows the inundated areas of dry land and paddy field in the Huangjin River Basin. More dry land than paddy field was inundated during the flood season in 2020, because the proportion of dry land in the Huangjin River watershed was much larger than that of paddy field, and the dry land was closer to the river than the paddy field (Figure 9b).
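One simple way to turn two classified maps into an inundated area is to count the cells that are water on the later date but not on the reference date and multiply by the cell area. The sketch below assumes binary maps stored as lists of lists (1 = water, 0 = non-water) and the 20 m cell size mentioned in the Data section; it is not the procedure used to produce Table 6, and the function names and toy maps are hypothetical.

```python
def newly_inundated(reference, current):
    """Count cells that are water (1) in `current` but not in `reference`."""
    return sum(
        1
        for ref_row, cur_row in zip(reference, current)
        for ref, cur in zip(ref_row, cur_row)
        if cur == 1 and ref == 0
    )


def area_hm2(n_pixels, pixel_size_m=20.0):
    """Convert a pixel count to hectares (hm^2); 1 hm^2 = 10,000 m^2."""
    return n_pixels * pixel_size_m * pixel_size_m / 10_000.0


# Toy binary water maps for a reference date and a flood date.
reference_map = [[1, 0, 0],
                 [1, 0, 0]]
flood_map = [[1, 1, 0],
             [1, 1, 1]]

flooded_pixels = newly_inundated(reference_map, flood_map)
flooded_area = area_hm2(flooded_pixels)
```

Each 20 m × 20 m cell covers 0.04 hm², so the three newly flooded cells in the toy example correspond to 0.12 hm². Intersecting the flooded cells with the land-use classes (dry land, paddy field) would give per-class areas.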

Model Performance
Before hyper-parameter optimization, SVM was the best and RF was the worst on the validation site, whereas XGB was the best and RF was the worst on the transfer site. This was because the three models had different tolerances for noise. For the land classification problem, the land type information of a given place was usually obtained from a land-use map produced from satellite images or a land survey. Undoubtedly, some errors or noise caused by human activities in the natural environment and by human visual interpretation would be introduced into the data set. This noise might be the key factor affecting the accuracy of the classifier. For SVM, only a few support vectors were needed to determine the final decision function, which means that this method was not only simple, but also had good robustness [55]. Further, compared with other supervised classification algorithms, SVM required less training data [56]. Many studies have proved the effectiveness of SVM in land classification of remote sensing data [57][58][59][60][61]. Both XGB and random forest are ensemble models based on decision trees. In XGB, the classifiers depend on one another, because each tree is added to fit the residuals predicted by the preceding trees [62,63]. In random forest, there is no dependency between the classifiers, and they can be built in parallel [64]. Therefore, the tolerance of XGB for noise was better than that of random forest, which might have resulted in the higher accuracy of XGB than that of random forest [65,66].
After hyper-parameter optimization, the accuracies of XGB and SVM were higher than that of RF. In the validation area, the model optimized by a genetic algorithm had the same accuracy as the model optimized by the grid search method, but in the transfer area, the model optimized by a genetic algorithm had higher accuracy than the model optimized by the grid search method. Genetic algorithms have been widely used in combinatorial optimization and NP-hard problems [35,36,67] because of their practicability for such problems. Genetic algorithms are evolutionary algorithms that search for optimal solutions, including the global optimal solutions of optimization problems, by imitating the selection and genetic mechanisms of nature [68]. The algorithms are independent of the solution domain and have strong robustness [69][70][71].
In terms of training time and the resulting hyper-parameters, hyper-parameter optimization with the genetic algorithm was faster and better than with the grid search. This advantage of the genetic algorithm would become more pronounced as the number of hyper-parameters increases. The grid search used enumeration to obtain the optimal hyper-parameter combination, whereas the genetic algorithm imitated the crossover and mutation of chromosome genes in biological evolution by means of mathematics and computer simulation. Therefore, the genetic algorithm was able to obtain better optimization results faster than some conventional optimization algorithms when solving complex combinatorial optimization problems [68][69][70][71].
Comparing the extracted maps of Shanhuba and Huangjin River with their corresponding satellite images (Figures 6 and 7), the optimized SVM models delineated water bodies more accurately than the others. It has been speculated that SVM is more accurate for mixed pixels [72][73][74], since the pixels at the edge of the river were all mixed land-water pixels at the current resolution [75][76][77][78] and could be accurately classified by SVM [79]. SVM_GA was the best model for mapping the dynamic changes of water bodies.

Deficiencies
This study attempted to obtain a dynamic distribution map of a water body using machine learning techniques and remote sensing data. Although the studied models gave satisfactory accuracy, there were still some problems that need further work. For example, the flow path of the Huangjin River has not been completely extracted, indicating that more training and testing samples might be needed. Further, only three machine learning models were used for comparison. In the future, other algorithms such as convolutional neural networks will be considered for the identification of water bodies.

Conclusions
In this study, three widely used machine learning algorithms, namely random forest, extreme gradient boosting, and support vector machine, were evaluated for their ability to identify water surfaces using Sentinel-1 data. Grid searches and genetic algorithms were applied to optimize the models' hyper-parameters, and the main findings are as follows:

1. The optimized models performed better than the models with default hyper-parameters in both validation and transfer areas;
2. The genetic algorithm for hyper-parameter optimization was better than the grid search, as it had a shorter optimization time and higher model validation and transfer accuracies;
3. The support vector machine model based on a genetic algorithm was the best model for identifying water bodies and dynamic changes in water surfaces.