Mapping of Flood Areas Using Landsat with Google Earth Engine Cloud Platform

Abstract: The Earth Observation (EO) domain can provide valuable information products that can significantly reduce the cost of mapping flood extent and improve the accuracy of mapping and monitoring systems. In this study, Landsat 5, 7, and 8 were utilized to map flood inundation areas. Google Earth Engine (GEE) was used to implement the Flood Mapping Algorithm (FMA) and process the Landsat data. FMA relies on developing a “data cube”, which is a stack of spatially overlapped pixels of Landsat 5, 7, and 8 imagery captured over a period of time. This data cube is used to identify temporary and permanent water bodies using the Modified Normalized Difference Water Index (MNDWI) and site-specific elevation and land use data. The results were assessed by calculating a confusion matrix for nine flood events spread over the globe. The FMA had a high true positive rate ranging from 71% to 90% and an overall accuracy in the range of 74–89%. In short, observations from FMA in GEE can be used as a rapid and robust hindsight tool for mapping flood inundation areas, training AI models, and enhancing existing efforts towards flood mitigation, monitoring, and management.


Introduction
Water-Related Disasters (WRD), such as cyclones, floods, and droughts, account for 90% of natural disasters. Since the year 2000, over 5300 WRD have been reported, with over 325,000 fatalities and an economic loss exceeding USD 1.7 trillion globally [1]. Floods account for approximately 54% of all WRD [2]. Since the beginning of 2020, in South Asia alone, floods have affected over 17.5 M people, caused over 1000 deaths, and led to economic losses of billions of dollars [3].
Predicting the flood inundation extent of intense and extreme weather events is of critical importance [4]. The extent of flooding is defined by how the river channel overflows and spreads across the surrounding topography [5]; this overflow is a complex function of intense precipitation, the generation of runoff, and its accumulation and flow in streams [6]. Models of varying complexity are available to compute the extent of flooding for cross-sections (1D) [7], cells (2D) [8], and planes (3D) [9]. However, calibrating and running these complex models to map flood inundation extents at the national level is a resource- and time-intensive exercise [10]. In Canada, for example, it is expected to take one decade and USD 350 M to update national flood inundation extent maps [11]. In addition, the availability of flood inundation extent and flood risk maps in most developing countries is limited, and the maps that do exist are out of date and have poor temporal and spatial resolution [12].
A global survey of Flood Early Warning Systems (FEWS) conducted by the United Nations University Institute for Water Environment and Health (UNU-INWEH) shows that the majority of flood forecasting centers in flood-prone countries lack the ability to run complex flood forecasting models to improve the spatial coverage of FEWS and generate flood inundation extents [13].
Two recent developments in the Earth Observation (EO) domain can significantly reduce the cost of mapping flood extents and improve the accuracy of flood mapping and monitoring systems. The first development is open access to data from operational satellites such as Landsat and Sentinel. This has enabled the mapping of various natural phenomena at a relatively high spatial and temporal resolution [14]. The United States Geological Survey made its entire Landsat archive available to the public in 2008. The Landsat archive contains more than three decades of earth observation images and provides a unique opportunity to monitor changes in surface water at high spatial and temporal resolutions. The second development is the wider availability and adoption of cloud computing architectures to process EO data. Technologies and models like high-performance computing, data cubes, and analysis-ready data allow accelerated access to and processing of large volumes of EO data [15].
The Google Earth Engine (GEE), a planetary-scale platform for earth science data and analysis developed by Google, has enabled the development of global-scale products, tools, and services using temporal EO data such as Landsat [16]. GEE has been used to conduct various global and regional scale studies, including regional land cover mapping [17], surface water mapping [18], assessing food security situations [19], settlement and population mapping [20], and other applications [21].
In this study, we present an innovative Flood Mapping Algorithm (FMA), which harnesses the power of cloud computing (GEE) and EO data (Landsat) to generate historical global flood inundation extents at 30 m resolution. Previous relevant initiatives include an online tool launched in 2012 by the International Water Management Institute that maps significant floods in South Asia from 1980 to 2011 at 500 m resolution [22], the European Commission Joint Research Centre's online tool launched in 2016 that provides free access to global surface water indices [23], SERVIR-Mekong's flood analysis tool for Myanmar [18], GEE4Flood [24], a Sentinel-1 and Landsat data-based rapid and robust flood monitoring tool [25], and automatic extraction of flood-prone areas using digital elevation model based geomorphic approaches [26].
Our approach improves on these available models and tools by enhancing the resolution of the flood inundation extent to 30 m and generating flood inundation extents at a global scale. FMA is developed as a hindsight tool to generate historical flood inundation extents covering areas where the data and information gaps are prominent and annual losses due to floods are high. The inundation and flood risk maps are out of date in most of the Global South [6]. Developing these maps using conventional techniques is a costly exercise for developing countries [11]. FMA addresses this data gap.
Machine learning approaches offer faster and more accurate flood detection [27]. However, these models have high computational and data requirements. FMA generates valuable training data for machine learning algorithms in the form of historical flood inundation maps.
However, because FMA depends on a dense data cube, it cannot be used to map and monitor current floods until a sufficient number of satellite images are available for the inundated area.
The rest of the paper is organized as follows. Section 2 describes the study area, details of data sources, and the research methodology adopted for the study. Section 3 presents the results, and in Section 4, the discussion is presented. Finally, the conclusions of the study are summarized in Section 5.

Methodology
FMA leverages the processing power and analysis-ready data available on Google Earth Engine (GEE) to process Landsat satellite imagery from 1984 to the present to map flood inundation extents. GEE was launched in 2010 by Google; it is a cloud-based platform that enables analysis and visualization of analysis-ready big Earth data sets. The GEE JavaScript API was used to develop the FMA.
The FMA relies on developing a "data cube", a stack of spatially overlapped pixels of Landsat 5, 7, and 8 imagery captured over time. This data cube is used to identify temporary and permanent water bodies using the Modified Normalized Difference Water Index (MNDWI) and site-specific HAND data. "Permanent" water refers to distinct rivers and bodies of water, while "temporary" water indicates areas that were inundated during a flooding event. The algorithm is divided into five modules: (1) image extraction, (2) cloud filter, (3) band mapping, (4) water classification, and (5) HAND and NDVI masking. The conceptual framework of the flood mapping algorithm is shown in Figure 1.

Image Extraction
The first module of the FMA creates a subset of the GEE Landsat data cube based on the spatial extent of the area of interest and temporal extent. The area of interest is manually defined by creating a bounding box covering the reference flood inundation extents. The temporal extent is determined based on the duration of the flood; 45 days are added before and after the flood event to capture the presence or absence of surface water.
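The temporal extent described above can be illustrated with a minimal sketch. It assumes the 45-day buffer is applied symmetrically to the reported flood dates; the function name and the example dates are hypothetical and are not taken from the FMA script.

```python
from datetime import date, timedelta

# Buffer quoted in the text; symmetric application is an assumption.
BUFFER_DAYS = 45


def temporal_extent(flood_start, flood_end, buffer_days=BUFFER_DAYS):
    """Return the (start, end) dates of the image-extraction window."""
    if flood_end < flood_start:
        raise ValueError("flood_end must not precede flood_start")
    pad = timedelta(days=buffer_days)
    return flood_start - pad, flood_end + pad


# Hypothetical flood dates, for illustration only.
start, end = temporal_extent(date(2019, 4, 1), date(2019, 4, 30))
print(start.isoformat(), end.isoformat())  # 2019-02-15 2019-06-14
```

In GEE the resulting dates would typically be passed to a date filter on the Landsat collection, together with the bounding box of the area of interest.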

Cloud Filter
The data cube subset is refined in this module to remove the cloud cover and cloud shadow pixels. The annual average cloud cover on Earth is about 66%, which presents a significant challenge for temporal analysis. In FMA, the cloud cover on the image is removed using Google cloudScore, a multi-temporal image-based approach for cloud masking [28,29]. cloudScore uses clouds' spectral and thermal properties to identify and remove these artifacts from the image data; it identifies a pixel as a cloud if it is bright and cold and does not share the spectral properties with snow. A cloudScore greater than 0.2 for a pixel shows that the pixel is a cloud. FMA removes these cloud pixels from the analysis.
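The thresholding step can be sketched as follows. The sketch assumes the cloud score has already been computed per pixel and is scaled to the 0 to 1 range implied by the 0.2 threshold; the grid representation and the function name are illustrative only.

```python
# Threshold quoted in the text: a score above 0.2 marks a cloud pixel.
CLOUD_THRESHOLD = 0.2


def mask_clouds(band, cloud_score, threshold=CLOUD_THRESHOLD):
    """Replace cloudy pixels with None so later steps can skip them.

    `band` and `cloud_score` are 2-D lists of equal shape.
    """
    return [
        [value if score <= threshold else None
         for value, score in zip(band_row, score_row)]
        for band_row, score_row in zip(band, cloud_score)
    ]


reflectance = [[0.10, 0.20], [0.30, 0.40]]
scores = [[0.05, 0.50], [0.20, 0.21]]
print(mask_clouds(reflectance, scores))  # [[0.1, None], [0.3, None]]
```

Masked pixels are dropped rather than filled, so a location contributes to the data cube only on dates when it is cloud free.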

Band Mapping
FMA uses the blue, green, red, NIR, SWIR1, and SWIR2 bands to compute NDVI and MNDWI. These bands are numbered differently in Landsat 5, 7, and 8; e.g., blue is Band 1 in Landsat 5 and 7 and Band 2 in Landsat 8. In this module, these bands are mapped to standard names, which makes the bands readable and easy to access and simplifies band math.
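A minimal sketch of this mapping is given below. The band identifiers follow the standard Landsat band designations; the sensor keys ("L5", "L7", "L8"), the dictionary layout, and the function name are assumptions made for illustration.

```python
STANDARD_NAMES = ["blue", "green", "red", "nir", "swir1", "swir2"]

# Source band identifiers per sensor, in the same order as STANDARD_NAMES.
SENSOR_BANDS = {
    "L5": ["B1", "B2", "B3", "B4", "B5", "B7"],  # Landsat 5 TM
    "L7": ["B1", "B2", "B3", "B4", "B5", "B7"],  # Landsat 7 ETM+
    "L8": ["B2", "B3", "B4", "B5", "B6", "B7"],  # Landsat 8 OLI
}


def rename_bands(image, sensor):
    """Return a copy of `image` (a dict keyed by band id) keyed by standard names."""
    return {
        std: image[src]
        for src, std in zip(SENSOR_BANDS[sensor], STANDARD_NAMES)
    }


l8_pixel = {"B2": 0.08, "B3": 0.10, "B4": 0.09, "B5": 0.25, "B6": 0.18, "B7": 0.12}
print(rename_bands(l8_pixel, "L8")["blue"])  # 0.08
```

After renaming, the same index code can run on imagery from any of the three sensors.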

Water Classification
The majority of the algorithms detecting water using satellite imagery are based on the fact that water absorbs radiation at near-infrared wavelengths and beyond. This allows for the detection of open water features using a spectral index like the Normalized Difference Water Index [30]. In 2006, MNDWI was introduced as a more sensitive index to detect open water bodies [31]. FMA uses MNDWI to extract open water features from the Landsat imagery. MNDWI uses the green and SWIR bands to enhance open water features. MNDWI also diminishes built-up features that are often correlated with open water in other surface water mapping indices [32]. MNDWI is calculated using Equation (1).
MNDWI = (Green − SWIR) / (Green + SWIR)    (1)
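As a minimal sketch, the index can be computed per pixel as the normalized difference of the green and SWIR reflectances. Using SWIR1 rather than SWIR2, and returning 0 when both reflectances are zero, are assumptions of this example.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return 0.0 if total == 0 else (a - b) / total


def mndwi(green, swir1):
    """Modified Normalized Difference Water Index for one pixel."""
    return normalized_difference(green, swir1)


# Open water reflects more in green than in SWIR, so the index is positive.
print(mndwi(0.75, 0.25))  # 0.5
# Dry land usually reflects more in SWIR, so the index is negative.
print(mndwi(0.25, 0.75))  # -0.5
```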
The initial water mask is detected using the percentile images of reflectance. The percentile images are extracted from the Landsat image collection for the duration of the flood. All the Landsat images available for the duration of the flood form the image collection. The "ee.Reducer.percentile" function in GEE is used to compute the percentile value of each pixel per band in the image collection. Percentile images have been used to develop water masks in other studies [33,34], and the approach was found to be suitable when compared with other methods [35]. A lower percentile value is used to detect temporary water bodies, including floods, seasonal ponds, and paddy fields, and a higher percentile value is used for permanent water bodies.
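The per-pixel reduction can be illustrated outside GEE with a minimal sketch. It uses the nearest-rank percentile definition, which is an assumption and may differ from the interpolation used by GEE; the data layout (a list of 2-D grids with None for masked pixels) is also hypothetical.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list (p between 0 and 100)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def percentile_image(stack, p):
    """Reduce a time stack of equally shaped grids to one percentile grid.

    Masked observations (None) are ignored; a pixel with no valid
    observation stays None.
    """
    rows, cols = len(stack[0]), len(stack[0][0])
    result = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            series = [img[r][c] for img in stack if img[r][c] is not None]
            out_row.append(percentile(series, p) if series else None)
        result.append(out_row)
    return result


# Five dates, one row, two pixels; the second pixel is cloud-masked twice.
stack = [[[5, None]], [[1, 10]], [[3, 20]], [[2, None]], [[4, 30]]]
print(percentile_image(stack, 10))  # [[1, 10]]
print(percentile_image(stack, 40))  # [[2, 20]]
```

The reduction is applied to each band separately, so each percentile yields one composite image per band.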

HAND and NDVI Masking
The accuracy of FMA is improved by masking MNDWI's output with the elevation data, assuming that water does not flow on steep hilly areas and that most of the permanent and temporary water bodies are concentrated in local valleys. Elevation data are widely used to extract the drainage network in hydrological applications [36]. HAND (Height Above the Nearest Drainage), available on GEE, is used as the elevation data in FMA because of its higher accuracy in separating areas where water can occur from those where surface water is unlikely to occur [37]. Additional steps include using the Normalized Difference Vegetation Index (NDVI) as a mask with a very high threshold to exclude very dark vegetated areas and correcting for hill shadows and snow/ice. NDVI is calculated using Equation (2).
NDVI = (NIR − Red) / (NIR + Red)    (2)
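The two masks can be sketched per pixel as follows. The NDVI threshold of 0.4 is the value reported in the Results; the HAND cut-off of 30 m is a purely hypothetical placeholder, since no HAND threshold is stated, and the function names are illustrative.

```python
NDVI_MAX = 0.4      # NDVI threshold reported in the Results
HAND_MAX_M = 30.0   # hypothetical HAND cut-off in metres (not given in the text)


def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def keep_water(is_water, nir, red, hand_m, ndvi_max=NDVI_MAX, hand_max_m=HAND_MAX_M):
    """Keep a water pixel only if it is not densely vegetated and lies low
    above the nearest drainage."""
    return bool(is_water) and ndvi(nir, red) <= ndvi_max and hand_m <= hand_max_m


print(keep_water(True, nir=0.2, red=0.2, hand_m=5.0))   # True
print(keep_water(True, nir=0.6, red=0.1, hand_m=5.0))   # False (dense vegetation)
print(keep_water(True, nir=0.2, red=0.2, hand_m=80.0))  # False (high above drainage)
```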

Datasets
Optical images were used to map the spatial extents of the flood events. In addition, Landsat 5, 7, and 8 images available before, during, and after the flood were utilized. Table 1 shows the temporal coverage of the Landsat images used. Landsat images obtained from the GEE platform are already preprocessed and calibrated to top-of-atmosphere reflectance with a pixel size of 30 × 30 m. The Height Above the Nearest Drainage (HAND) is used as the elevation data [37]. HAND is a digital elevation model normalized using the nearest drainage; it is used for hydrological applications, such as flood hazard and risk mapping, land use classification, and surface water mapping. Global HAND data available on GEE are used with a pixel size of 30 × 30 m [38].

Study Areas
This study focuses on the significant floods listed in Table 2. These flood events were selected based on the flood intensity and the availability of flood inundation maps as geospatial data that could be used for validation purposes.

Thailand, Pathumthani, and Bangkok Flood Event
In 2011, Thailand had record-high rainfall during the monsoon season, immediately followed by four tropical storms that hit the country's north. This combined load on the already saturated and slow-draining catchments was too much for the system to handle, and river banks burst. In the later days of the floods, upstream dam releases further exacerbated the issue. This led to a flooding situation that persisted for over 150 days and caused approximately THB 30 B in economic losses and THB 12 B in insured losses. Over 30,000 km² were inundated, with 65 of the country's 77 provinces affected [46].

Validation Approach
The flood events used in this study happened at a regional level and covered a large spatial area, making it extremely challenging to collect field data for validation. Therefore, the validation is conducted by comparing the computed data with ground truth data published by various national and regional space agencies and flood monitoring labs, as listed in Table 2. The performance indices used to validate the FMA are listed in Table 3.
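The comparison against reference data can be illustrated with a minimal sketch of a pixel-wise confusion matrix and two indices reported in the Results, the true positive rate (TPR) and the overall accuracy. The flat list representation of the flood masks and the function names are assumptions of this example.

```python
def confusion_matrix(predicted, reference):
    """Count (TP, FP, FN, TN) for two equally long sequences of 0/1 flood labels."""
    tp = fp = fn = tn = 0
    for p, r in zip(predicted, reference):
        if p and r:
            tp += 1
        elif p and not r:
            fp += 1
        elif not p and r:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def true_positive_rate(tp, fn):
    """Share of reference flood pixels that were detected as flood."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp, fp, fn, tn):
    """Share of all pixels that were labeled correctly."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


fma_mask = [1, 1, 0, 0, 1, 0, 1, 0]        # illustrative values only
reference_mask = [1, 0, 0, 1, 1, 0, 1, 1]  # illustrative values only
tp, fp, fn, tn = confusion_matrix(fma_mask, reference_mask)
print(tp, fp, fn, tn)              # 3 1 2 2
print(true_positive_rate(tp, fn))  # 0.6
print(accuracy(tp, fp, fn, tn))    # 0.625
```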

Results
This section uses the 2019 Red River floods for visualization purposes. The reference flood extent is shown in red, permanent water bodies in dark blue, and the flood inundation extent mapped by FMA in light blue.

Data Cube Density
A function was developed in the GEE platform to report the number of Landsat scenes used to create the data cube for a particular flood event. The FMA requires a minimum of 40–50 Landsat scenes to detect the inundated area with stable accuracy, as shown in Figure 2. During the analysis, the start and end dates of the flood event were adjusted to ensure that the required number of Landsat scenes was available to construct the data cube.
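One possible way to automate this adjustment is sketched below. The text does not describe how the dates were adjusted, so the symmetric widening, the step size, and all function names are assumptions; only the minimum scene count comes from the text.

```python
from datetime import date, timedelta

MIN_SCENES = 40  # lower bound of the 40-50 scene range quoted in the text


def scenes_in_window(scene_dates, start, end):
    """Count the acquisition dates that fall inside [start, end]."""
    return sum(1 for d in scene_dates if start <= d <= end)


def widen_until_dense(scene_dates, start, end, min_scenes=MIN_SCENES,
                      step_days=8, max_steps=50):
    """Widen the window symmetrically until it holds enough scenes.

    Returns the last window tried, even if the target is not reached
    within `max_steps` widenings.
    """
    step = timedelta(days=step_days)
    for _ in range(max_steps):
        if scenes_in_window(scene_dates, start, end) >= min_scenes:
            break
        start, end = start - step, end + step
    return start, end


# Hypothetical acquisitions every 8 days, with a small target for brevity.
dates = [date(2019, 1, 1) + timedelta(days=8 * i) for i in range(20)]
window = widen_until_dense(dates, date(2019, 3, 1), date(2019, 3, 31), min_scenes=6)
print(window[0].isoformat(), window[1].isoformat())  # 2019-02-21 2019-04-08
```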

MNDWI, NDVI, and HAND
Percentiles are used to identify the temporary water and the permanent water; the 10th percentile is used for temporary water and the 40th percentile for permanent water. Figure 3a shows that the lower percentile detects temporary water bodies, and the higher percentile value detects permanent water bodies. However, percentiles above 40 detect features other than water, as shown in Figure 3e,f. MNDWI is calculated using the percentile images. Figure 4 shows the MNDWI calculated using the 10th and 40th percentile images. The temporary and permanent water layers derived from MNDWI are combined in a single 'water' layer for a particular flood event. FMA uses a very high NDVI threshold (0.4) to exclude very dark vegetated areas and to correct for hill shadows, snow, and ice; this mask is created with the assumption that land with dark vegetation is not flooded. Figure 5 shows the NDVI mask in green and the HAND mask in black for the 2019 Red River flood.
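The combination of the two MNDWI layers can be sketched as follows. The MNDWI cut-off of 0 and the rule that the permanent label takes precedence over the temporary one are assumptions made for illustration; neither is stated in the text.

```python
WATER_THRESHOLD = 0.0  # assumed MNDWI cut-off; not stated in the text


def classify_pixel(mndwi_p10, mndwi_p40, threshold=WATER_THRESHOLD):
    """Label one pixel from the MNDWI of the 10th and 40th percentile images."""
    if mndwi_p40 > threshold:
        return "permanent"
    if mndwi_p10 > threshold:
        return "temporary"
    return "dry"


def water_layer(p10_grid, p40_grid, threshold=WATER_THRESHOLD):
    """Combine both layers into one grid of labels."""
    return [
        [classify_pixel(a, b, threshold) for a, b in zip(row10, row40)]
        for row10, row40 in zip(p10_grid, p40_grid)
    ]


p10 = [[0.3, -0.2], [0.1, -0.5]]
p40 = [[0.2, -0.4], [-0.1, -0.6]]
labels = water_layer(p10, p40)
print(labels)  # [['permanent', 'dry'], ['temporary', 'dry']]
# The single 'water' layer is the union of both water classes.
print([[label != "dry" for label in row] for row in labels])  # [[True, False], [True, False]]
```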

Validation
The results obtained from FMA for the flood events are tabulated in Tables 4 and 5. The TPR value ranges from 0.71 to 0.90, indicating that FMA correctly detects a high proportion of the flood pixels in the ground truth data as flood. A higher TPR value is observed for flood events in rural areas with land covers such as rural built-up, rural agriculture, open spaces, and wetland. The Queensland flood event has the highest TPR and accuracy because the land cover is homogeneous and there is no tree cover or large urban structure hiding the flood water as observed from space, so the FMA can detect the flood pixels. The same pattern of higher accuracy (84–89%) and TPR (84–90%) is observed for the Red River, Bihar, and Malawi flood events. Figure 6a shows the Red River flood extent used as ground truth data, and Figure 6b shows the flood inundation extent mapped by FMA. A good overlap can be observed between the ground truth data and the FMA output, with a few horizontal and vertical stripes not detected as flood by the FMA. These stripes are the road network, which the ground truth labeled as flooded. False negatives are detected at the flood boundary areas, as shown in Figure 6e; this could be because of overestimation in the reference data or underestimation by the FMA. In some flood events, the false negatives were randomly distributed or concentrated in urban areas. A follow-up study is planned using reference data with higher accuracy supported by ground surveys. As shown in Figure 6d, the false positives are detected outside the boundary of the reference flood map; this behavior is observed for other flood events as well.
Figure 6c shows that the FMA detects flood with an accuracy sufficient to develop the flood monitoring and mapping applications discussed in Section 4.
The accuracy and TPR are lower in areas with a land cover like rural built-up and rural agriculture. Several horizontal and vertical lines that are not detected as flood by FMA can be seen in Figure 6b. These lines are either road networks or tree canopy cover. This trend in FMA is also observed for other flood events in the urban environment, including the Colombo, Dhaka, Phnom Penh, and Bangkok flood events. However, it is worth noting that despite the underperformance of FMA in the urban environment, the accuracy ranges from 0.74 to 0.78 for these flood events, which is sufficient to develop the flood mapping and monitoring applications discussed in Section 4.

Discussion
In the last decade, floods have caused an economic loss of nearly USD 500 B, equal to Singapore's GDP. Around 1.47 B people are exposed to the risk of intense flooding, which is more than Europe's total population. The majority of this population segment lives in low and low-middle-income countries, exponentially increasing the disaster-driven socio-economic risk [47].
Developing FEWS and historical flood and risk maps are the two primary approaches to address flood-related challenges. However, developing and deploying such systems in the Global South is an expensive and time-consuming exercise. The resource intensiveness of these solutions can be gauged from the fact that if Canada had to update its historical flood maps, it would cost USD 350 M and take one decade to complete the exercise [48].
FMA has been designed to address the data and information gaps and challenges in the Global South. This section identifies the possible application areas for FMA; this is not an exhaustive list, and more application areas exist.

Land Use
Using the FMA in countries with poor or no flood risk maps can help add a new level of robustness to land use planning in such places. Identifying the locations that are at risk of flooding allows resources and investment, such as upgrading infrastructure and developing agriculture, to be optimized in the short term. In the medium to longer term, land-use planning can benefit from knowledge of where detailed flood studies are required if growth is to be sustained in a given location.

Emergency Services
Metrics such as investment in public infrastructure, natural disaster-related insurance rates, flood-related human and economic losses, etc., can be estimated using FMA. The flood inundation extents are also available as an interoperable service that can be merged with other open datasets for decision making.
In developing countries and emerging economies, data collection of hydrological and other environmental variables is rarely a priority. This problem is often compounded by difficult terrain or cost barriers to technology and tools [49,50]. EO data is a simple solution to this problem in the short term, allowing the most vulnerable but poorly gauged locations to be observed. From these observations, low-resolution flood risk models can be generated. This will enable governments, funding agencies, and disaster management authorities to home in on the locations with the highest potential risk and generate higher resolution models. In addition, this methodology allows users to justify investment in higher resolution models around high-risk settlements and assets. In time, this strengthens the ability of these countries to better plan for and respond to disasters.

Insurance
The use and value of remotely sensed risk information in the insurance industry is not new [51,52]. There were significant losses/damage to property and agricultural concerns in several of the flood events above. In the case of Queensland, Australia, huge insurance payouts had to be made. In poorer nations such as Cambodia and Bangladesh, this safety blanket did not exist, and many livelihoods and homes were permanently lost due to flood events. FMA could be part of the conversation between governments/international donor organizations and the insurance industry generating agriculture insurance support for persons living and farming at the subsistence level. Creating this safety net and closing this gap in financial security has far-reaching implications for global development goals and promoting more secure economies and nations.

Conclusions
An algorithm for flood inundation extent mapping using GEE is proposed, utilizing EO data to map any flood event from 1985 to the present. FMA can be an effective supplement to current inundation and flood risk maps, especially in the Global South, where data and technological gaps are common. FMA can also be used in an exploratory capacity prior to flood mapping, as it is significantly lower in cost, only requiring access to the internet and using open-source EO data. There are some limitations where the terrain influences the accuracy of the outputs, but these are easily characterized and can be further calibrated and accounted for with more event inputs. The FMA can be used to create historical flood inundation maps and potential flood risk maps. Room exists to improve this product with the addition of other remotely sensed datasets.
Additional aspects we hope to study include integrating Sentinel-2 data into the FMA workflow to increase the model performance and combining the FMA output with the demographic and utility data available under open access to create a comprehensive tool to convey the impacts of flooding in an area.

Data Availability Statement: The Google Earth Engine script, ground truth data, and FMA outputs may be made available on request from the authors.