RFim: A Real-Time Inundation Extent Model for Large Floodplains Based on Remote Sensing Big Data and Water Level Observations

The real-time flood inundation extent plays an important role in flood disaster preparation and reduction. To date, many approaches have been developed for determining the flood extent, such as hydrodynamic models, digital elevation model-based (DEM-based) methods, and remote sensing methods. However, hydrodynamic methods are time-consuming when applied to large floodplains, high-resolution DEMs are not always available, and remote sensing imagery alone cannot be used to predict inundation. In this article, a new model for the accurate and rapid simulation of floodplain inundation, called “RFim” (real-time inundation model), is proposed to simulate the real-time flooded area. The model combines remote sensing images with in situ data to find the relationship between the inundation extent and the water level. It takes advantage of both remote sensing images, which have wide spatial coverage and high resolution, and in situ observations, which have continuous temporal coverage and are easily accessible. The approach was applied to East Dongting Lake, a large floodplain, to simulate inundation at a 30 m resolution. Compared with the observed submerged extent, the simulation accuracy exceeded 90%, ranging from 93% to 96%. Hence, the proposed approach is reliable for predicting the flood extent. Moreover, the inundation extent was simulated for every day of 2013 from daily water level observations. As the number of Earth observation satellites in operation grows and more high-resolution sensors are deployed, it will become much easier to acquire large quantities of very high-resolution images. Because RFim is built on remote sensing imagery and gauging station data, it is a promising tool for high-accuracy, high-resolution inundation simulation in the future.


Introduction
Floods are among the most common and harmful natural disasters in the world; over the last forty years, they have caused direct economic losses exceeding $1 trillion and killed more than 220,000 people [1]. Because of climate change, flood risk may not decrease and may even increase in many regions of the world [2]. Knowing the real-time flood inundation extent during a flood event is important for responding quickly and reducing disaster impacts [3]. Great efforts have been made to study flood inundation, and the methods can be roughly divided into three types: hydrodynamic methods, DEM-based (digital elevation model-based) inundation methods, and remote sensing methods.
Hydrodynamic methods, which are based on hydrodynamic models, mathematically express the physical laws governing flood movement and the inundation extent. From the perspective of dimension, the hydrodynamic […] in Section 5. The discussions are presented in Section 6. The conclusions and future work are summarized in Section 7.

Study Area
East Dongting Lake, which is connected to the Yangtze River, is the largest lake in the Dongting Lake system [17]. It spans approximately 28°59′N to 29°38′N and 112°43′E to 113°15′E [18] and drains an area of 1900 km², including a water area of 364 km² [19]. The location of East Dongting Lake is shown in Figure 1. Because it is a flood basin of the Yangtze River and precipitation is unevenly distributed through the year, East Dongting Lake fluctuates dramatically between the wet and dry seasons, reaching its maximum area in August and its minimum in January or February [20]. The elevations of East Dongting Lake are lower than 35 m (Huanghai Datum 1965), and slopes are less than 3° [21]. In recent years, under the impact of economic development and human activities, flood disasters have occurred frequently around East Dongting Lake. Hence, predicting the inundation extent in this area is important for flood disaster preparation and mitigation.

Data Selection
The size of East Dongting Lake changes throughout the year. To capture its extent as fully as possible, a dense time series of Landsat images of the lake was chosen. Large-scale flood inundation models are generally run at a resolution of 25-100 m [22], so Landsat images with a 30 m resolution were chosen. In this paper, archived orthorectified top-of-atmosphere reflectance images from the Landsat 5 Thematic Mapper (TM) and Landsat 8 Operational Land Imager (OLI) acquired between 2001 and 2016 were used. The 97 images acquired from 2002 to 2011 that were used to construct the simulation model are listed in Table 1, together with the images acquired from 2013 to 2016 that were used for validation or application. Landsat 5, launched on 1 March 1984, collected imagery of the Earth until 5 June 2013. Landsat 8, launched on 11 February 2013, is still operating. Landsat 5 and 8 revisit the same scene every 16 days at a 30 m resolution. Landsat 7 Enhanced Thematic Mapper Plus (ETM+) images were not used because of the scan line corrector (SLC) failure [23], which caused the loss of more than 20% of every scene acquired after 23 May 2003 [24]. Water level measurements were provided by the Chenglingji gauging station (29.41°N, 113.12°E), which is located at the confluence of East Dongting Lake and the Yangtze River and provides the daily water level of East Dongting Lake. The daily water levels collected at the Chenglingji gauging station from 1 January 2001 to 31 December 2016 were used in this study.

Data Preparation
To pre-process the images quickly, Google Earth Engine (GEE), a free cloud-computing platform, was used in this study. GEE provides Internet-based application programming interfaces (APIs) that enable users to process and analyze large quantities of remote sensing images at once. In addition, many geographical datasets are available in GEE, including Earth surface observations from Landsat, SPOT, MODIS, and Sentinel-1 [25]. The main task of data preparation is cloud removal. More than five hundred Landsat 5 and 8 images of the research area were acquired from 2001 to 2016, but most of them were partly or entirely covered by clouds, so the clouds had to be removed before the images could be used. GEE provides an API called 'simpleCloudScore' that calculates a cloud likelihood score from 1 to 100 for every pixel using the normalized difference snow index (NDSI) and the brightness and temperature of the Landsat imagery [26]; its details are given in the GEE documentation. Because simpleCloudScore is easy to use and saves time, several studies have used it to remove clouds [26,27], and we adopted it as well. In this study, images with more than 40% cloud cover were discarded, and a cloud score threshold of 20, chosen by visual interpretation of the Landsat images [27], was used to remove cloudy pixels. After cloud removal, only 97 images suitable for further processing remained.
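The two cloud rules described above can be illustrated with a small local sketch. It assumes that a per-pixel cloud score, such as the `cloud` band that GEE's `ee.Algorithms.Landsat.simpleCloudScore` adds to a Landsat TOA image, has already been exported as a NumPy array. This is not the authors' GEE script: the function names are hypothetical, and computing the 40% cloud-cover criterion from the same per-pixel score is an assumption.

```python
import numpy as np

# Values stated in the text: pixels with a cloud score above 20 are treated as
# cloudy, and images that are more than 40% cloudy are discarded.
CLOUD_SCORE_THRESHOLD = 20
MAX_CLOUD_FRACTION = 0.40


def mask_clouds(band: np.ndarray, cloud_score: np.ndarray) -> np.ma.MaskedArray:
    """Mask every pixel whose cloud score exceeds the threshold."""
    return np.ma.masked_where(cloud_score > CLOUD_SCORE_THRESHOLD, band)


def keep_image(cloud_score: np.ndarray) -> bool:
    """Keep an image only if at most 40% of its pixels are cloudy (assumed rule)."""
    cloudy_fraction = np.mean(cloud_score > CLOUD_SCORE_THRESHOLD)
    return bool(cloudy_fraction <= MAX_CLOUD_FRACTION)
```

Applying `keep_image` first and `mask_clouds` second mirrors the order in the text: whole images are screened before individual cloudy pixels are masked.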

Method
The basic idea of RFim is to find the relationship between water levels and inundation extents. The workflow of this real-time flood inundation simulation and prediction model, which is based on remote sensing data, is shown in Figure 2. It consists of six steps: (1) discretizing the study area and resampling the images; (2) extracting historical flood extents from remote sensing images; (3) relating the water level values to pixels; (4) forming inundation records for every grid cell in the study area; (5) establishing the relationship between inundation extent and water level; and (6) simulating and predicting the flood extent with this relationship.
Figure 2. The flowchart of the approach in this paper.

Discretizing the Study Area and Resampling the Images
The proposed method requires pixels covering the same position to coincide exactly in different images. Therefore, the research area is first discretized into grid cells, as shown in Figure 3. The remote sensing images used in the experiment are then resampled onto these grid cells using nearest-neighbor interpolation, as shown in Figure 4, so that the pixels covering the same location coincide exactly in all resampled images. For example, the study area, East Dongting Lake, is divided into 1560 rows and 1907 columns of 30 m grid cells, giving 2,974,920 (1560 × 1907) cells in total.
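The following sketch shows one way to perform nearest-neighbor resampling onto a common grid, such as the 1560 × 1907 grid of 30 m cells described above. It assumes north-up rasters in a projected coordinate system measured in metres, with each grid described by the map coordinates of its upper-left corner; the function and parameter names are hypothetical, and in practice a GIS library would normally handle this step.

```python
import numpy as np


def resample_nearest(src, src_origin, src_res, dst_origin, dst_res, dst_shape, fill=np.nan):
    """Nearest-neighbor resampling of a north-up raster onto a target grid.

    Origins are (x, y) map coordinates of each grid's upper-left corner;
    x increases eastward (columns) and y decreases southward (rows).
    """
    rows, cols = dst_shape
    # Map coordinates of the center of every target cell.
    xs = dst_origin[0] + (np.arange(cols) + 0.5) * dst_res
    ys = dst_origin[1] - (np.arange(rows) + 0.5) * dst_res
    # Index of the source pixel that contains each target-cell center.
    src_cols = np.floor((xs - src_origin[0]) / src_res).astype(int)
    src_rows = np.floor((src_origin[1] - ys) / src_res).astype(int)
    out = np.full(dst_shape, fill, dtype=float)
    # Only fill target cells whose centers fall inside the source raster.
    ok_r = np.nonzero((src_rows >= 0) & (src_rows < src.shape[0]))[0]
    ok_c = np.nonzero((src_cols >= 0) & (src_cols < src.shape[1]))[0]
    rr, cc = np.meshgrid(ok_r, ok_c, indexing="ij")
    out[rr, cc] = src[src_rows[rr], src_cols[cc]]
    return out
```

Because every image is resampled onto the same `dst_origin`, `dst_res`, and `dst_shape`, array index `[row, col]` refers to the same 30 m grid cell in every resampled image.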

Extracting the Historical Flood Extents Based on Remote Sensing Images
Water extent extraction is a common topic in object classification, and many methods have been developed to extract surface water successfully, such as the tasselled cap transformation, the normalized difference water index (NDWI) [28], the modified normalized difference water index (MNDWI) [29], and support vector machines (SVMs) [30]. In this research, the NDWI is selected to detect water extents from the images because of its simplicity and high accuracy for large open water bodies [31]. The NDWI separates water from the background using the different reflectances of different objects in the green band ($\rho_{Green}$) and the near-infrared band ($\rho_{NIR}$). It should be stressed that the water extraction method is not fixed; any method can be used if it suits the experimental conditions. The NDWI is defined as follows:
$$NDWI = \frac{\rho_{Green} - \rho_{NIR}}{\rho_{Green} + \rho_{NIR}} \qquad (1)$$
Determining the threshold is one of the most important steps in using the NDWI to distinguish water from non-water areas [32]. Unfortunately, the threshold separating water from other objects is unstable and varies with scenes and locations [33], so finding the threshold for every image by hand is impractical. Otsu thresholding [34] is widely used to determine the optimal threshold for water detection with the NDWI [35]. From the image histogram, the Otsu method selects the threshold that maximizes the between-class variance, which is equivalent to minimizing the within-class variance. Thus, Otsu thresholding was selected to determine the threshold automatically in this paper. The accuracy of water extraction in this study using the NDWI and Otsu thresholding is reported in Appendix A.
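Equation (1) and Otsu thresholding can be sketched in NumPy as follows. This is a minimal from-scratch Otsu implementation over a histogram whose 256-bin default is an assumption, and labelling pixels whose NDWI exceeds the threshold as water follows the usual convention that water has a higher NDWI; it is an illustration, not the authors' code.

```python
import numpy as np


def ndwi(green, nir):
    """Equation (1); returns NaN where green + nir is zero."""
    green = np.asarray(green, dtype=float)
    nir = np.asarray(nir, dtype=float)
    denom = green + nir
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom != 0, (green - nir) / denom, np.nan)


def otsu_threshold(values, bins: int = 256) -> float:
    """Histogram threshold that maximizes the between-class variance (Otsu)."""
    v = np.asarray(values, dtype=float)
    v = v[np.isfinite(v)]  # ignore NaN pixels (e.g. masked clouds)
    counts, edges = np.histogram(v, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    p = counts / counts.sum()        # probability of each bin
    w0 = np.cumsum(p)                # weight of the lower class for each split
    w1 = 1.0 - w0                    # weight of the upper class
    mu_k = np.cumsum(p * centers)    # cumulative first moment
    mu_t = mu_k[-1]                  # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_t * w0 - mu_k) ** 2 / (w0 * w1)
    between = np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)
    k = int(np.argmax(between))
    # Bins 0..k form the lower (non-water) class; split at their upper edge.
    return float(edges[k + 1])
```

A water mask for one image is then `index > otsu_threshold(index)` with `index = ndwi(green, nir)`; NaN pixels compare as False and are therefore never labelled as water.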

Relating the Water Level Values to Pixels
To find the relationship between water levels and inundation, each pixel must be associated with the water level recorded on the date when its image was acquired. When a single gauging station provides water level data, the procedure for relating water level values to images is illustrated in Figure 5. In the right part of Figure 5, a blue pixel labelled with a water level value was observed to be submerged at that water level, and a white pixel labelled with a water level value was observed to be unsubmerged at that water level. If there are two or more gauging stations, each pixel is assigned a water level interpolated from the levels at the gauging stations using the inverse distance weighted method, based on the distances between the gauging stations and the center of the pixel.
Figure 5. The procedure of relating water levels to pixels: (1) find the water level value on the date when the image was acquired; (2) relate the water level value to the pixels in that image while retaining the pixel values that represent dry or wet (thus, every pixel in that image holds two kinds of data at the same time: one represents dry or wet, and the other represents the water level).
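The two steps in Figure 5, together with the multi-station case, are sketched below. The daily water levels are assumed to be held in a dictionary keyed by date, and the inverse distance weighting uses a power of 2, which is a common default but an assumption here because the text does not state the exponent; all function names are hypothetical.

```python
from datetime import date

import numpy as np


def level_on(acquired: date, daily_levels: dict) -> float:
    """Step (1): water level gauged on the image acquisition date."""
    return float(daily_levels[acquired])


def idw_level(cell_xy, station_xy, station_levels, power: float = 2.0) -> float:
    """Inverse-distance-weighted water level at a pixel center (power is assumed)."""
    cell = np.asarray(cell_xy, dtype=float)
    stations = np.asarray(station_xy, dtype=float)
    levels = np.asarray(station_levels, dtype=float)
    dist = np.hypot(*(stations - cell).T)
    if np.any(dist == 0):
        # A pixel center on top of a gauge simply takes that gauge's level.
        return float(levels[np.argmin(dist)])
    weights = 1.0 / dist ** power
    return float(np.sum(weights * levels) / np.sum(weights))


def label_image(water_mask, level):
    """Step (2): pair each pixel's wet/dry flag with its water level.

    `level` may be a scalar (one gauge) or a per-pixel array (several gauges).
    """
    wet = np.asarray(water_mask).astype(bool)
    levels = np.broadcast_to(np.asarray(level, dtype=float), wet.shape).copy()
    return wet, levels
```

With a single gauge such as Chenglingji, `label_image(mask, level_on(day, levels))` is sufficient; `idw_level` is only needed when several gauges are available.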

Forming Inundation Records for Every Grid Cell in the Study Area
Based on the previous step, in which all the pixels in the images have been labelled with water level values, the pixels from different images that fall in the same grid cell are grouped to form a set of inundation information and corresponding water level values for that grid cell, as shown in Figure 6. This set is called the grid cell's "inundation records" in this article. From its inundation records, we can easily determine whether a grid cell is inundated at a given water level. Figure 7 displays a schematic diagram of the inundation records of one grid cell, sorted by water level, which looks like a column of cells.
Remote Sens. 2019, 11, 1585
Figure 6. Relating water levels to images and forming inundation records for every grid cell in the study area.
Inundation records: a cell in blue with a water level value means that the grid cell was observed to be submerged at that water level, and a cell in white with a water level value means that the grid cell was observed to be unsubmerged at that water level.
Ideally, the minimum water level at which a grid cell is inundated is always larger than the maximum water level at which it is not flooded. However, in some cases, the minimum level at which the grid cell is submerged is lower than the maximum level at which it is unsubmerged, as shown in the middle of Figure 8, because of errors in water detection. Like other classification methods, the NDWI cannot separate water from non-water with 100% accuracy, so a grid cell that is not submerged may be classified as inundated when the water surface is extracted from an image. When this error occurs, the minimum or maximum value, together with its inundation information, is called an "abnormal record" in this article. To prevent such anomalies from affecting the subsequent steps, abnormal records are removed from the inundation records. The procedure that determines abnormal records is as follows:
a. Find the minimum water level at which the grid cell is submerged (wet) and the maximum water level at which it is unsubmerged (dry) in its inundation records. If either value does not exist, there is no abnormal record, and nothing needs to be removed;
b. Compare the minimum and maximum values found in step a. If the minimum value at which the grid cell is submerged is lower than the maximum value at which it is unsubmerged, remove these abnormal records from the inundation records of that grid cell. If the minimum is larger than the maximum, there is no abnormal record, and nothing needs to be removed;
c. Repeat steps a and b on the remaining inundation records of that grid cell until no abnormal record remains.
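Forming the records of one grid cell and applying steps a to c can be sketched as below. Records are assumed to be (water level, wet) pairs with cloud-masked observations already excluded. Reading "remove these abnormal records" as removing both the lowest wet record and the highest dry record in each pass is an interpretation, as is keeping records whose wet and dry levels are equal (the text only removes records when the wet level is strictly lower). The function names are hypothetical.

```python
import numpy as np


def cell_records(wet_stack: np.ndarray, levels, row: int, col: int):
    """Step (4): (water_level, is_wet) pairs for one grid cell across all images.

    wet_stack has shape (n_images, rows, cols); levels[i] belongs to image i.
    """
    return [(float(levels[i]), bool(wet_stack[i, row, col])) for i in range(len(levels))]


def remove_abnormal(records):
    """Steps a-c: drop contradictory wet/dry records until the cell is consistent."""
    recs = list(records)
    while True:
        wet = [r for r in recs if r[1]]
        dry = [r for r in recs if not r[1]]
        if not wet or not dry:
            return recs  # step a: one of the two values does not exist
        lowest_wet = min(wet, key=lambda r: r[0])
        highest_dry = max(dry, key=lambda r: r[0])
        if lowest_wet[0] >= highest_dry[0]:
            return recs  # step b: no abnormal record remains
        # Step b: remove both conflicting records, then repeat (step c).
        recs.remove(lowest_wet)
        recs.remove(highest_dry)
```

For example, the records (20, dry), (22, wet), (21, wet), (23, dry), (25, wet) contain one conflict (wet at 21 m but dry at 23 m); removing that pair leaves a consistent set.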

Establishing the Relationship between the Inundation Extent and Water Level
The essence of establishing the relationship between flood extent and water level is to find, for every grid cell, the water level threshold that determines whether the cell is submerged.
It is clear from Figure 7 that the inundation threshold of a grid cell lies between the highest water level at which the grid cell is unsubmerged and the lowest water level at which it is submerged. However, any value in that range might be the true threshold. We choose the mean of the lowest water level at which the grid cell is submerged and the highest water level at which it is unsubmerged. Although the mean might not equal the exact threshold, it will be close to the true threshold when the observations are dense enough to make that range small. The impact of using the mean value is explained in the analysis section. The threshold is calculated as follows:
$$T = \frac{u + b}{2} \qquad (2)$$
where T is the inundation threshold of the grid cell, u is the lowest water level at which the grid cell is submerged, and b is the highest water level at which the grid cell is unsubmerged.
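Equation (2) can be applied to the cleaned inundation records of each cell as follows. The text does not say how to handle cells that were never observed wet or never observed dry; returning positive or negative infinity in those cases is a hypothetical convention used only in this sketch, and the function name is likewise hypothetical.

```python
import math


def cell_threshold(records) -> float:
    """Equation (2): T = (u + b) / 2 from one cell's (water_level, is_wet) records."""
    wet = [level for level, is_wet in records if is_wet]
    dry = [level for level, is_wet in records if not is_wet]
    if not wet:
        return math.inf   # assumed: never observed wet, so never predicted wet
    if not dry:
        return -math.inf  # assumed: never observed dry, so always predicted wet
    u = min(wet)          # lowest level at which the cell was submerged
    b = max(dry)          # highest level at which the cell was unsubmerged
    return (u + b) / 2
```

With records that are dry at 20 m and wet at 22 m and 25 m, u = 22 and b = 20, so T = 21 m.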

Simulating and Predicting Flood Extent with the Relationship between Inundation Extent and Water Level
Once the threshold of every grid cell has been calculated, the inundation extent on a desired date can be predicted easily by comparing the water level on that date with each cell's threshold: a cell is predicted to be inundated when the water level exceeds its threshold, as shown in Figure 9.
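Step (6) reduces to a per-cell comparison, sketched below together with a conversion of the predicted extent to an area on the 30 m grid. The sketch assumes that the thresholds are stored in a floating-point raster with NaN outside the study area and that a cell is wet when the water level is strictly greater than its threshold; these conventions and the function names are assumptions.

```python
import numpy as np

CELL_AREA_KM2 = 30 * 30 / 1e6  # area of one 30 m x 30 m grid cell in km^2


def predict_extent(thresholds, water_level: float) -> np.ndarray:
    """Boolean inundation map: wet where the water level exceeds the threshold.

    Cells holding NaN (outside the study area) are never predicted wet.
    """
    t = np.asarray(thresholds, dtype=float)
    wet = np.zeros(t.shape, dtype=bool)
    valid = ~np.isnan(t)
    wet[valid] = water_level > t[valid]
    return wet


def inundated_area_km2(wet: np.ndarray) -> float:
    """Total inundated area of a predicted extent on the 30 m grid."""
    return float(np.count_nonzero(wet)) * CELL_AREA_KM2
```

Calling `predict_extent` once per daily water level and passing each result to `inundated_area_km2` yields a daily inundation-area series.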

Results
The remote sensing images were divided into two parts: the images acquired from 1 January 2002 to 31 December 2011 were used for modelling, that is, for calculating the water level threshold of every grid cell, and the images acquired from 1 January 2013 to 31 December 2016 were used for validation and application.
Because of the location of East Dongting Lake and the data lost during cloud removal, at least two images were required to cover the study area. Hence, "composite images", that is, mosaics of several images, were used for validation in this study.
Figure 10.
The accuracies and kappa coefficients were determined by a cell-to-cell comparison between the predictions and the observations and are shown in Table 2. The flooded area obtained from Landsat imagery was regarded as the reference (true) flood extent. The prediction accuracy is the proportion of cells correctly predicted by the approach, as given in Equation (3):
$$\text{Prediction accuracy} = \frac{w + n}{s} \qquad (3)$$
where w is the number of correctly predicted water pixels, n is the number of correctly predicted non-water pixels, and s is the number of pixels in the study area.
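The cell-to-cell comparison and Equation (3) can be sketched as follows. The kappa coefficient is computed with the standard Cohen's kappa definition, which is an assumption because its formula is not given here; both input masks are assumed to contain only cells within the study area, and the function name is hypothetical.

```python
import numpy as np


def accuracy_and_kappa(predicted, observed):
    """Cell-to-cell agreement between predicted and observed water masks."""
    p = np.asarray(predicted).astype(bool).ravel()
    o = np.asarray(observed).astype(bool).ravel()
    s = p.size                            # cells in the study area
    w = np.count_nonzero(p & o)           # correctly predicted water
    n = np.count_nonzero(~p & ~o)         # correctly predicted non-water
    accuracy = (w + n) / s                # Equation (3)
    # Chance agreement from the marginal totals (Cohen's kappa).
    p_e = (np.count_nonzero(p) * np.count_nonzero(o)
           + np.count_nonzero(~p) * np.count_nonzero(~o)) / s ** 2
    kappa = (accuracy - p_e) / (1.0 - p_e)
    return accuracy, kappa
```

For a four-cell example in which the prediction marks two cells as water and the observation marks one of them, w = 1, n = 2, the accuracy is 0.75, the chance agreement is 0.5, and kappa is 0.5.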

Application
The model is easy to set up, even for users with relatively little hydraulic modelling experience. It can simulate real-time flood extents and can also reconstruct past flood extents on dates when no images of the floodplain are available. Therefore, the inundation extent on every day of 2013 was simulated with the model from the water levels observed during that period. The daily water level variations from 1 January 2013 to 31 December 2013 are shown in Figure 11, and the daily variations in the inundation area of East Dongting Lake over the same period are shown in Figure 12.
Figure 12 shows that the inundation area during 2013 first rose and then fell, a pattern similar to that of the water level. The largest inundation area in 2013 was 1069.8 km² at a water level of 29.83 m at Chenglingji station, and the smallest was 9.1 km² at a water level of 20.41 m. The variations in the inundation extent every four days from 5 May 2013 to 21 May 2013 are displayed in Figure 12.

Discussions
Among the predictions for the different time periods, the best agreement between the inundation prediction and the observation was obtained for the composite image from 1 July 2014 to 30 September 2014: 96% of the grid cells were predicted correctly, and the kappa coefficient was approximately 0.92, which represents almost perfect agreement. The worst performance was the prediction for the composite image from 1 January 2016 to 31 March 2016, with 93% of the cells predicted correctly and a kappa coefficient of approximately 0.67, which represents substantial agreement. Although 93% was the lowest accuracy among the four predictions, it is acceptable for flood simulation. The accuracies of the four predictions were all approximately 94%, and the kappa coefficients ranged from 0.67 to 0.91.
The main reason for the relatively low accuracy of the prediction from 1 January 2016 to 31 March 2016 is that fewer good-quality remote sensing images were available for that period than for the other three periods. The number of images in each month used to build the prediction model is shown in Figure 13. From 2001 to 2011, only 11 images acquired from January to March were considered good observations and used for modelling. The fewer good observations are acquired, the less historical water extent information is available, and this shortage of good observations made the model perform less well for that period.
Apart from the number of remote sensing images, three factors affected the model performance.
The first factor was the failure to extract narrow water bodies with the NDWI. The NDWI enhances the contrast between water and other objects, but its formulation is inherently sensitive to imagery noise [36]. In addition, mixed water pixels usually occur in narrow rivers and shallow water [37], which makes narrow water bodies difficult to delineate. During the flood season, the lake expanded and merged many narrow rivers and small ponds into a single large water body. This reduced the number of narrow water bodies that are hard to detect with the NDWI. During the dry season, the lake shrank and the narrow water bodies reappeared, which caused considerable confusion in water extraction. The water delineation results, which missed these narrow water bodies, were then used to simulate the inundation extent and degraded the predictions to some degree. Water extraction from the wet-season images was not affected by small water surfaces, so the inundation simulation from 1 July 2014 to 30 September 2014 had the highest accuracy in the experiment.
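This section does not restate the NDWI formula. The sketch below assumes the common McFeeters form, NDWI = (Green - NIR) / (Green + NIR), and labels pixels above 0 as water. It illustrates why mixed pixels in narrow channels are missed. The reflectance values are hypothetical. They were chosen only to show linear mixing of open water and vegetation within one pixel.

```python
import numpy as np


def ndwi(green: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """McFeeters NDWI = (Green - NIR) / (Green + NIR); NaN where both bands are zero."""
    green = np.asarray(green, dtype=float)
    nir = np.asarray(nir, dtype=float)
    denom = green + nir
    out = np.full(denom.shape, np.nan)
    np.divide(green - nir, denom, out=out, where=denom != 0)
    return out


def water_mask(green, nir, threshold: float = 0.0) -> np.ndarray:
    """Label a pixel as water when its NDWI exceeds the threshold (0 is an assumed default)."""
    index = np.nan_to_num(ndwi(green, nir), nan=-1.0)  # treat undefined pixels as non-water
    return index > threshold


# Hypothetical (green, NIR) reflectances of the two surfaces being mixed
WATER = (0.08, 0.03)
VEGETATION = (0.07, 0.30)

# Fraction of one pixel covered by a narrow channel; linear spectral mixing
f = np.array([1.0, 0.9, 0.5, 0.2])
g = f * WATER[0] + (1 - f) * VEGETATION[0]
n = f * WATER[1] + (1 - f) * VEGETATION[1]
print(np.round(ndwi(g, n), 3))
print(water_mask(g, n))  # [ True  True False False]
```

With these hypothetical values, setting Green = NIR gives 0.07 + 0.01f = 0.30 - 0.27f. A mixed pixel therefore crosses the NDWI = 0 threshold only when water covers more than about 82% of it (f > 0.23 / 0.28). Channels much narrower than one cell are then labelled non-water.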
The second factor that negatively affected the model performance was the use of only one gauging station, Chenglingji, for water stage measurements. The Chenglingji station can represent the daily water level of East Dongting Lake with a single value. However, the height of the water surface actually varies slightly from place to place. This is especially true in the dry season, when many water bodies shrink and separate from the lake.
The third factor was the use of a mean value to represent the real threshold in each grid cell. This affects the performance of RFim because the mean value and the real threshold are not equal in every grid cell. For a given grid cell, the gap is the difference between two levels: the lowest water level at which the cell is submerged and the highest water level at which it is unsubmerged. The real threshold lies between these two levels, so their mean can differ from it by at most half the gap. If the gap is small, the mean value obtained by RFim is therefore close to the real threshold. Figure 14a shows the distribution of the gaps for the grid cells whose thresholds were calculated from water levels from July to September. Figure 14b shows the same distribution for the grid cells whose thresholds were calculated from water levels from January to March. These figures show that the gaps for the July to September water levels were smaller than those for the January to March water levels. This observation can partly explain why the prediction from 1 July 2014 to 30 September 2014 was better than the prediction from 1 January 2016 to 31 March 2016.
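The threshold and gap described above can be computed per grid cell from a stack of historical water masks and the gauge levels on the image dates. This is a sketch, not the authors' code. It assumes that "the mean value" is the midpoint between the highest unsubmerged level and the lowest submerged level. The handling of cells that were always or never flooded (threshold of -inf or +inf) is an illustrative choice, because this section does not describe it.

```python
import numpy as np


def cell_thresholds(masks, levels):
    """Per-cell inundation threshold (mean of the two bounding levels) and gap.

    masks  : bool array (n_images, rows, cols); True where the cell is water
    levels : float array (n_images,); gauge water level on each image date
    Cells never observed dry get threshold -inf (always flooded) and cells never
    observed wet get +inf (never flooded); both get an infinite gap.
    A negative gap signals inconsistent observations (e.g. misclassified pixels).
    """
    masks = np.asarray(masks, dtype=bool)
    lv = np.asarray(levels, dtype=float)[:, None, None]      # broadcast over the grid
    lowest_wet = np.where(masks, lv, np.inf).min(axis=0)     # lowest level with the cell submerged
    highest_dry = np.where(~masks, lv, -np.inf).max(axis=0)  # highest level with the cell unsubmerged
    threshold = (lowest_wet + highest_dry) / 2.0
    gap = lowest_wet - highest_dry
    return threshold, gap


# Hypothetical history: four images at four gauge levels, a 1 x 3 grid
levels = np.array([20.0, 22.0, 24.0, 26.0])
masks = np.array([
    [[False, True, False]],
    [[False, True, False]],
    [[True, True, False]],
    [[True, True, False]],
])
threshold, gap = cell_thresholds(masks, levels)
print(threshold)  # [[ 23. -inf  inf]]
print(gap)        # [[ 2. inf inf]]
```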
The new approach described in this paper can determine the flood extent in a large floodplain thanks to the wide spatial coverage and short revisit cycle of remote sensing images. It successfully simulated the inundation extent over the approximately 2000 km² region in this study. In contrast, hydrodynamic models need detailed input parameters, which may not be available. The data for this approach are temporally continuous and easy to access, so the new method can simulate the flooded area in real time if real-time water levels are given.
Once the first five steps have been completed and the inundation thresholds of the grid cells are known, the final step needs little computation. It obtains a real-time inundation simulation from the real-time water stage measurement. This step is essentially a simple matrix operation that compares each element of the threshold matrix with the given water level. As a result, the method can deliver real-time inundation simulations.
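The final step can be sketched as a single element-wise comparison between the threshold matrix and the gauge reading. Broadcasting extends the same comparison to a whole daily series, such as the 2013 simulation. The function names are assumptions of this sketch. So is the convention that permanently wet cells have a threshold of -inf and never-flooded cells +inf.

```python
import numpy as np


def simulate_inundation(threshold: np.ndarray, water_level: float) -> np.ndarray:
    """Flooded where the gauge level reaches the cell threshold (one element-wise comparison)."""
    return threshold <= water_level


def simulate_daily(threshold: np.ndarray, daily_levels) -> np.ndarray:
    """Stack of daily maps, shape (n_days, rows, cols), computed by broadcasting."""
    levels = np.asarray(daily_levels, dtype=float)[:, None, None]
    return threshold[None, :, :] <= levels


# Hypothetical 2 x 2 threshold grid (m); -inf = permanent water, +inf = never flooded
threshold = np.array([[23.0, -np.inf], [np.inf, 25.5]])
print(simulate_inundation(threshold, 24.0))
print(simulate_daily(threshold, [22.0, 26.0]).sum(axis=(1, 2)))  # flooded cells per day: [1 3]
```

The cost of each simulation is one comparison per grid cell, independent of how many images were used to derive the thresholds.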
The method can also handle remote sensing big data of large volume and high complexity, although only archived Landsat data were used in this article. To apply the method to remote sensing big data, the inundation simulation follows the same six steps illustrated in this article. The only adjustment is the way water bodies are extracted, which depends on the resolutions and spectral bands of the images.
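Adapting the water extraction to another sensor mainly means choosing the right bands. Resampling to a common grid is also needed but is not shown here. The sketch below assumes the McFeeters NDWI with a threshold of 0. The band assignments are the standard green and near-infrared bands of Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI, and Sentinel-2 MSI. The dictionary keys and the function name are hypothetical.

```python
import numpy as np

# Green and near-infrared band names for each sensor (standard band numbering);
# the dictionary keys are hypothetical labels, not identifiers from any library.
NDWI_BANDS = {
    "landsat5_tm": ("B2", "B4"),
    "landsat7_etm": ("B2", "B4"),
    "landsat8_oli": ("B3", "B5"),
    "sentinel2_msi": ("B3", "B8"),
}


def extract_water(bands: dict, sensor: str, threshold: float = 0.0) -> np.ndarray:
    """NDWI water mask; only the band lookup changes from one sensor to another."""
    green_name, nir_name = NDWI_BANDS[sensor]
    green = np.asarray(bands[green_name], dtype=float)
    nir = np.asarray(bands[nir_name], dtype=float)
    denom = green + nir
    # Pixels with green + NIR == 0 get NDWI = -1 and are therefore labelled non-water
    index = np.divide(green - nir, denom, out=np.full(denom.shape, -1.0), where=denom != 0)
    return index > threshold


oli_scene = {"B3": [[0.10, 0.02]], "B5": [[0.02, 0.30]]}  # hypothetical reflectances
print(extract_water(oli_scene, "landsat8_oli"))  # [[ True False]]
```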
As the application section shows, the method can simulate the real-time flood extent. It can also reproduce past flood extents on dates when no images of the floodplain were available.
With the increasing number of Earth observation satellites equipped with higher-resolution mappers, predicting the real-time inundation extent at higher spatial resolutions from remote sensing imagery and in situ water height measurements is promising. Over the last twenty years, an increasing number of high-resolution sensors have been operating in space. For example, Ikonos, WorldView, QuickBird, and RapidEye can provide imagery at the metre or submetre level.
The new method has some drawbacks. First, it cannot model flood movement. However, when the flood extent is predicted at a daily time step, flood movement is relatively unimportant at that temporal scale.
Second, the new method may not predict the inundation area correctly at water levels higher than the historical maximum or lower than the historical minimum. RFim is built on images and the water levels recorded on their acquisition dates. The range of water levels over which RFim can simulate the corresponding inundation area is therefore limited. If a flood occurs with a water level higher than the historical maximum, the model may not perform well.
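This limit can be made explicit by flagging gauge readings that fall outside the range of levels on the calibration image dates. The sketch assumes that cells never observed wet get an infinite threshold. Under that convention, any level above the historical maximum simply reproduces the historical maximum extent. The flooded area then stops growing and is likely underestimated. Names and values are hypothetical.

```python
import numpy as np


def simulate_checked(threshold: np.ndarray, level: float, calibration_levels):
    """Return (flood_mask, in_range); outside the calibrated range the mask is unreliable."""
    lo = float(np.min(calibration_levels))
    hi = float(np.max(calibration_levels))
    in_range = lo <= level <= hi
    return threshold <= level, in_range


# Hypothetical calibration: images acquired at gauge levels of 20, 22, and 24 m.
# A cell never seen wet has threshold +inf under this sketch's convention.
threshold = np.array([[21.0, 23.0, np.inf]])
calib = [20.0, 22.0, 24.0]
mask_24, ok_24 = simulate_checked(threshold, 24.0, calib)
mask_30, ok_30 = simulate_checked(threshold, 30.0, calib)
print(ok_24, ok_30, np.array_equal(mask_24, mask_30))  # True False True
```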
Third, the target area needs at least one gauging station or other device that can measure and represent the water levels of its rivers or lakes.
Fourth, the new method works well only if the topography of the study area is relatively stable. If the terrain changes greatly, the inundation extent will differ under the same water level, which will affect the performance of the model. Images from more satellites, such as SPOT, Ikonos, and GF-1, would provide more pairs of water level and inundation extent within a short period. The terrain is likely to stay relatively stable over such a period, which avoids the impact of topographic changes in the study area.
Fifth, the performance will be affected if water level data are scarce. Establishing the relationship between flood extent and water level requires a water level and a flood extent acquired on the same date. With scarcer water level data, fewer such same-date pairs can be formed.
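The same-date requirement can be sketched as a join between image acquisition dates and daily gauge records. Any image without a reading on its acquisition date cannot contribute to the relationship between flood extent and water level. The data structures, dates, and levels below are hypothetical.

```python
from datetime import date


def pair_images_with_levels(image_dates, gauge_levels):
    """Match each image date to a same-day gauge reading.

    image_dates  : iterable of datetime.date (image acquisition dates)
    gauge_levels : dict mapping datetime.date -> water level (m)
    Returns (pairs, unmatched): matched (date, level) pairs and images without a reading.
    """
    pairs, unmatched = [], []
    for d in sorted(image_dates):
        if d in gauge_levels:
            pairs.append((d, gauge_levels[d]))
        else:
            unmatched.append(d)
    return pairs, unmatched


images = [date(2010, 7, 3), date(2010, 7, 19), date(2011, 1, 12)]
gauge = {date(2010, 7, 3): 29.1, date(2011, 1, 12): 21.4}  # hypothetical readings
pairs, unmatched = pair_images_with_levels(images, gauge)
print(len(pairs), unmatched)  # 2 [datetime.date(2010, 7, 19)]
```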

Conclusions and Further Work
In this paper, a new approach was proposed for flood extent simulation and prediction. The approach establishes a relationship between remote sensing big data and in situ data to build a real-time flood prediction model, called RFim. It takes advantage of the wide spatial coverage and high resolution of remote sensing images and of the continuous temporal coverage and easy accessibility of in situ observations. RFim was validated in East Dongting Lake. The prediction accuracy was approximately 94%, and the kappa coefficients ranged from 0.67 to 0.91. An increasing number of Earth observation satellites with high-resolution mappers are operating in space. Because RFim is based on remote sensing big data, the approach therefore has great potential for real-time flood simulation.
Several points need to be considered in future work. First, the trade-off between the cost of using remote sensing big data and the performance of the model needs to be considered. Using large quantities of images can improve the model performance, but the improvement may be small relative to the heavy cost in computational resources. Second, ways to reduce the negative effect of scarce water level data need to be investigated. One possible solution is to establish relationships between other factors and the flood extent. When water level data cannot be accessed, those factors could then serve as alternative inputs for simulating the flood extent. Third, extending the model to predict not only the flood extent but also the flood duration and water volume is worth exploring.
Author Contributions: Z.C. and J.L. conceived and designed the research, processed the data, and wrote the manuscript. N.C. conducted the fieldwork and reviewed the manuscript. R.X. and G.S. contributed materials.
Funding: This work was supported by the National Natural Science Foundation of China (nos. 41771422, 41890822).

Conflicts of Interest:
The authors declare no conflict of interest.

Water Extraction Accuracy Using NDWI
Here, we calculate the accuracy of water extraction for the composite image from 1 July 2014 to 30 September 2014. It was not feasible to verify all of the more than 100 images, so this composite image was selected for verification. The verification served two purposes. The first was to show that using the NDWI to detect water is reasonable. The second was to confirm that the NDWI water extraction can be treated as the real water distribution against which the flood simulation results are compared.
The verification method was as follows:
a. Select 250 samples randomly from the image.
b. Identify the class of each of the 250 samples by visual interpretation.
c. Compare the identified classes with the water extraction results for the image, and then calculate the extraction accuracy.
The water extraction accuracy for the composite image from 1 July 2014 to 30 September 2014 is 98.00% (245 of 250 samples). Of these samples, 130 were correctly classified as water, 115 were correctly classified as non-water, and only five were misclassified.
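The sampling procedure can be sketched as follows. Visual interpretation is a manual step, so the interpreted classes are supplied here as an array. The image size, random seed, and function names are hypothetical. The example only reproduces the reported arithmetic (130 + 115 = 245 correct out of 250 samples).

```python
import numpy as np


def draw_samples(shape, n_samples=250, seed=0):
    """Draw n_samples distinct random pixel positions (rows, cols) from an image of the given shape."""
    rng = np.random.default_rng(seed)
    flat = rng.choice(shape[0] * shape[1], size=n_samples, replace=False)
    return np.unravel_index(flat, shape)


def sample_accuracy(extracted, interpreted) -> float:
    """Share of samples whose NDWI class matches the visually interpreted class."""
    extracted = np.asarray(extracted, dtype=bool)
    interpreted = np.asarray(interpreted, dtype=bool)
    return float(np.mean(extracted == interpreted))


rows, cols = draw_samples((1000, 1200))  # hypothetical image size
# Reproducing the reported counts: 130 water and 115 non-water correct, 5 misclassified
# (which way the 5 errors went is not reported, so they are placed arbitrarily here)
interpreted = np.array([True] * 130 + [False] * 115 + [True] * 5)
extracted = np.array([True] * 130 + [False] * 115 + [False] * 5)
print(len(rows), sample_accuracy(extracted, interpreted))  # 250 0.98
```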