Article

Rapid Urban Flood Detection Using PlanetScope Imagery and Thresholding Methods

School of Advanced Science and Technology Coverage, Kyungpook National University, Sangju 37224, Republic of Korea
*
Author to whom correspondence should be addressed.
Water 2025, 17(7), 1005; https://doi.org/10.3390/w17071005
Submission received: 9 March 2025 / Revised: 25 March 2025 / Accepted: 27 March 2025 / Published: 28 March 2025
(This article belongs to the Special Issue Machine Learning Methods for Flood Computation)

Abstract

With advances in optical satellite remote sensing, urban flood mapping (UFM) that exploits water’s distinct spectral characteristics has gained increasing attention. PlanetScope’s daily 3 m resolution imagery enables detailed and time-sensitive flood monitoring. Unlike machine learning, which requires extensive training data, thresholding methods offer a faster and more adaptable solution for binary classification. This study assessed three global (Yen’s, Otsu’s, Isodata) and three local (Niblack, Sauvola, Gonzalez) thresholding methods, with their parameters optimized for each case study. Additionally, a hybrid approach was proposed and evaluated. In this approach, local thresholds are computed for each pixel using the respective local thresholding method, and a global threshold is derived as the simple arithmetic mean of all these local thresholds. This global threshold is then applied across the entire image to perform a binary classification, distinguishing flooded from non-flooded areas. To enhance water detection, we also evaluated 26 remote sensing indices. Each was computed using two formulations (the normalized difference and the ratio) from band pairs in which at least one of the eight PlanetScope bands was NIR or RedEdge. We tested this methodology on three flooding events with different water coverage scenarios in Brazil (34% water coverage), the USA (11%), and Australia (21%). The model performance was validated using the Matthews correlation coefficient (MCC) and the Fowlkes–Mallows index (FMI). The results demonstrated that combining PlanetScope imagery with carefully selected remote sensing indices and thresholding techniques enables efficient UFM. The hybrid methods outperformed the others by capturing local variations while maintaining global consistency, with the MCC and the FMI exceeding 0.9. The indices incorporating NIR and RedEdge, particularly NDRE, achieved the highest accuracy. However, each flood event was best classified by a different combination of method and index, so the remote sensing index and thresholding technique must be selected carefully for each specific case. This framework provides a fast, effective solution for UFM, adaptable to diverse urban environments and flood conditions.

1. Introduction

Urban flooding is an increasingly frequent and severe global issue, driven by factors such as rapid urbanization [1] and changing climate patterns [2]. These events cause substantial economic losses, disrupt critical infrastructure, and pose serious threats to public health and safety. Addressing these challenges requires rapid and effective urban flood mapping (UFM), which plays a vital role in disaster preparedness, emergency response, and long-term resilience planning. With advances in satellite remote sensing, UFM has evolved to leverage the distinct spectral and physical properties of water, enabling the precise identification of flooded areas and enhancing the disaster response.
Satellites distinguish water from other surfaces using distinct physical and spectral properties captured by both optical and synthetic aperture radar (SAR) sensors. Optical sensors exploit the fact that water absorbs light strongly in the near-infrared (NIR) region, causing water bodies to appear dark relative to vegetation and soil, which typically reflect more light in these wavelengths. This contrast across different spectral bands allows algorithms to delineate water from land. In contrast, SAR sensors use microwave signals, which are less affected by weather but face challenges in urban areas due to signal interactions with structures [3], causing issues like layovers and the double-bounce effect, which obscure the flood extent [4]. Additionally, SAR’s lower temporal resolution delays timely mapping [5].
Given these challenges with SAR in urban environments, researchers have increasingly explored optical satellite imagery as an alternative for UFM. Among optical sources, PlanetScope stands out due to its high spatial and temporal resolution. The 3 m spatial resolution of PlanetScope enables the precise detection of small-scale urban features, including roads, buildings, and drainage systems, which are critical for accurately delineating flood extents in complex urban environments. With over 200 satellites in orbit, its daily revisit capability helps overcome cloud cover challenges, improving the chances of capturing clear imagery. This is a notable advantage over common optical satellites such as Landsat, with its 30 m resolution and 16-day revisit cycle, or Sentinel-2, which offers a 10 m resolution with a revisit period of about five days under optimal conditions. Recent sensor upgrades in the PlanetScope constellation have expanded the original four-band configuration (Red, Green, Blue, and NIR) to eight spectral bands, incorporating additional wavelengths such as the RedEdge and Yellow. The inclusion of the RedEdge band improves water boundary delineation. One example is the study by Lu et al., which utilized the Gaofen-6 RedEdge band to improve water surface delineation in mountainous and urban areas [6]. By incorporating the RedEdge band alongside traditional visible and NIR bands, they were able to discriminate water from dense or partially submerged vegetation more effectively. Several studies have demonstrated the utility of PlanetScope data for UFM. For instance, Levin and Phinn employed PlanetScope’s optical indices to map urban inundated areas during severe rainfall events in Queensland, Australia, effectively assessing the impacts of extensive flooding [7]. 
Their research highlighted PlanetScope’s superiority over SAR in producing accurate flood assessments, which is crucial for understanding flood impacts on urban infrastructure and communities. Li et al. integrated PlanetScope imagery with machine learning (ML) algorithms to map urban floods in Brisbane, Australia [8]. Their study underscored PlanetScope’s high spatial resolution as a key factor in identifying small, scattered water bodies, demonstrating its value for flood detection in densely populated regions. However, applying ML techniques for UFM faces several notable challenges that can impact their effectiveness and reliability [9,10,11,12,13]. A major limitation is the requirement for large, diverse, and high-quality labeled datasets to train models effectively [14].
Thresholding methods are a fundamental and widely adopted approach in remote sensing for water mapping [15], involving the classification of image pixels into water and non-water categories based on a selected threshold value. This technique is particularly valued for its simplicity, speed, minimal computation, and no labeled data requirements, making it suitable for rapid flood assessment in emergency response scenarios. Early approaches to flood mapping relied on simple thresholding techniques, such as applying the Normalized Difference Water Index (NDWI) with a common threshold of 0.0 to effectively delineate water bodies in fluvial environments like lakes and rivers [16]. This method, while straightforward, often struggled to accommodate variations in water clarity and atmospheric conditions, particularly in more heterogeneous urban landscapes where impervious surfaces and mixed materials complicate spectral signatures. To address these challenges, researchers advanced the methodology by incorporating automatic thresholding techniques, with the global Otsu’s method becoming prominent due to its ability to objectively determine an optimal threshold based on maximizing inter-class variance within image histograms [14,17,18,19]. Recent research by Che et al. improved surface water mapping in China by optimizing the Otsu threshold in conjunction with remote sensing water indices, which further enhanced the accuracy of identifying water bodies [20]. Beyond Otsu’s method [21], the literature highlights alternative thresholding strategies to further enhance the classification accuracy in water mapping scenarios. For example, Gunen and Atasever compared Otsu’s method with various thresholding methodologies and indices using Landsat-9 data for effective water mapping in Turkey, underscoring the ongoing evolution towards more robust and context-sensitive frameworks in flood delineation [22]. 
However, the application of thresholding methods in urban settings presents unique challenges due to the complexity of urban land cover, including buildings, roads, and vegetation, which can have similar spectral signatures to water or create shadows that mimic flooded areas. While thresholding methods have been used in UFM, particularly with Sentinel-1 SAR data, their performance can vary, and there is a need for tailored approaches to improve accuracy. For example, Liang et al. noted that SAR-based threshold UFM is limited by speckle noise, polarization differences that can omit genuine flood pixels, and overlapping backscatter signatures between water and smooth non-water surfaces [23].
PlanetScope’s high spatial resolution and daily revisit capability are advantageous for capturing detailed urban features and increasing the likelihood of cloud-free imagery during flood events. Given these benefits, there is a compelling case to evaluate which combination of a spectral water index and a thresholding method is most effective for rapid, efficient UFM using PlanetScope data. The additional spectral bands in PlanetScope offer new possibilities for developing spectral indices to better distinguish floodwater from urban land cover. Previous research (e.g., Li et al. [8]) demonstrates the utility of PlanetScope for UFM, often with ML, but thresholding provides a simpler, faster alternative, especially in resource-constrained environments. In summary, the study focuses on two key components:
i.
The evaluation of remote sensing water indices: This study systematically evaluates the effectiveness of various water indices in detecting urban floodwaters. The goal is to minimize misclassification errors caused by urban features, ensuring a reliable distinction between floodwaters and non-water surfaces. The NDWI, a standard water detection index, serves as the baseline for comparison.
ii.
The selection of thresholding methods: This study compares global and local thresholding techniques to identify the most effective approach. Otsu’s method is used as the benchmark for evaluating alternative techniques.
The effectiveness of our approach is tested through three flooding events with different water coverage scenarios in Brazil (34% water coverage), the USA (11%), and Australia (21%). The selection of the events is driven by the challenges posed by thresholding methods, which are sensitive to the ratio of water to non-water coverage in satellite imagery [24,25,26,27]. By testing the thresholding methods across this diverse range of water coverage ratios, this study evaluates their performance and adaptability, ensuring they are robust enough to handle varying real-world flood conditions.

2. Study Area

Figure 1 shows three urban flood events from distinct geographic regions. The selection is intentional to ensure the comprehensive evaluation and validation of the proposed flood mapping methods. By covering diverse climatic conditions, land cover, and water coverage patterns, the chosen events allow for a robust assessment of the methodology’s adaptability, reliability, and general applicability in varied urban flood scenarios globally.
Brazil: The state of Rio Grande do Sul in southern Brazil experienced severe flooding events in May 2024, predominantly driven by intense rainfall associated with anomalous weather patterns. This southernmost state of Brazil features diverse topographic conditions, ranging from gently rolling hills inland to broad alluvial plains along major watercourses. The particular flood event examined in this study occurred in a low-lying floodplain region, where heavy rainfall caused rivers to overflow their banks and inundate adjacent urban areas. Elevations in these flood-prone zones are relatively modest (often below 50–100 m above sea level), allowing the rapid accumulation of water once rivers exceed their carrying capacity. The combination of built-up surfaces, agricultural lands, and partially forested terrain creates a heterogeneous backdrop for flood delineation. Selecting this event provides insight into the performance of the developed methodology in South America’s urban flood scenarios, characterized by densely built environments and heterogeneous land use.
USA: The flooding in Manville, New Jersey, was caused by the remnants of Hurricane Ida, which struck the eastern United States in September 2021. Manville was particularly affected due to its location near the confluence of the Raritan and Millstone rivers, exacerbating flooding conditions. This vicinity is characterized by a relatively flat floodplain that rarely exceeds 30 m above sea level, making it highly susceptible to inundation during intense or prolonged rainfall. The area’s geology consists primarily of unconsolidated floodplain deposits that do not provide significant drainage when the water level rises. Dense residential and commercial development further complicates water flow, as impervious surfaces reduce infiltration and increase surface runoff.
Australia: In March 2021, Windsor, located in the Greater Sydney area of New South Wales, Australia, experienced significant flooding resulting from prolonged heavy rainfall and the overflowing Hawkesbury River. This event is notable for its scale and impact on infrastructure and urban populations, highlighting the necessity for accurate, timely flood mapping to support disaster response efforts. Much of the terrain around Windsor is low-lying (typically 5–50 m above sea level) with gently undulating slopes, creating broad areas where overbank flows can rapidly spread. The local geomorphology comprises alluvial deposits adjacent to river channels, interspersed with urbanized zones and agricultural lands. This mix of land uses and gradual relief encourages floodwaters to linger, posing substantial risks to infrastructure and communities within the flood zone.

3. Methodology

3.1. Remote Sensing Indices

This study utilized PlanetScope satellite imagery to analyze the flood-affected regions. PlanetScope is a constellation of over 200 high-resolution Earth observation satellites operated by Planet Labs, designed to provide daily imaging of the entire planet. The constellation began with its first launch in 2013, deploying an initial fleet of 28 Dove satellites. It has since expanded through numerous launches, including multiple batches of small satellites between 2017 and 2023. Each satellite in the fleet is equipped with advanced multispectral cameras capable of capturing detailed imagery in multiple wavelengths. For this study, PlanetScope imagery of the area was acquired for the post-flood event. Efforts were made to ensure minimal cloud cover and to maintain similar sensor conditions across the datasets, ensuring the consistency and reliability of the analysis. Eight common bands—Coastal Blue (443 nm), Red (665 nm), Green (565 nm), Green I (531 nm), Blue (490 nm), Yellow (610 nm), RedEdge (705 nm), and NIR (865 nm)—were selected to generalize the data and facilitate the analysis of vegetation and water-related features. These multispectral bands provide valuable insights into changes in land cover and water distribution, contributing to a comprehensive assessment of flood impacts in the region.
To enhance the detection of flooded areas in urban environments, a comprehensive suite of remote sensing indices was derived. Each index was calculated using two distinct mathematical formulations: the normalized difference form (a − b)/(a + b) and the ratio form a/b, where a and b represent any pair of the eight original bands, selected such that at least one band is NIR or RedEdge to enhance the water detection accuracy. For the ratio, the NIR or RedEdge band is the denominator. This strategic selection leverages the unique spectral properties of water in these bands: water’s strong absorption in NIR provides a clear contrast with land surfaces, while the RedEdge band captures subtle reflectance changes that can aid in differentiating water from spectrally similar urban features. By systematically computing and evaluating indices in both forms across all possible band pairs meeting this condition, this study maximizes the likelihood of identifying the optimal spectral combinations for accurate flood mapping, making the approach robust against diverse environmental conditions and spectral complexities.
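The band-pair rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the band ordering, the small `eps` guard against division by zero, and the choice of NIR as the denominator for the RedEdge and NIR pair are assumptions, while the index names follow the pattern used later in the text (e.g., RedEdge_normalized_NIR). Under this rule there are 13 qualifying pairs, which gives 26 indices when both formulations are counted.

```python
from itertools import combinations

import numpy as np

# Band order is an assumption; RedEdge and NIR are placed last on purpose so
# that the second band of every qualifying pair is RedEdge or NIR.
BANDS = ["CoastalBlue", "Blue", "GreenI", "Green", "Yellow", "Red", "RedEdge", "NIR"]
KEY_BANDS = ("RedEdge", "NIR")


def band_pairs():
    """Return (a, b) band-name pairs in which b is RedEdge or NIR."""
    return [(a, b) for a, b in combinations(BANDS, 2) if b in KEY_BANDS]


def compute_indices(bands, eps=1e-12):
    """bands: dict of band name -> 2-D reflectance array.

    Returns a dict of index name -> array holding both formulations.
    eps is a hypothetical guard against division by zero.
    """
    indices = {}
    for a, b in band_pairs():
        band_a = np.asarray(bands[a], dtype=float)
        band_b = np.asarray(bands[b], dtype=float)
        indices[f"{a}_normalized_{b}"] = (band_a - band_b) / (band_a + band_b + eps)
        indices[f"{a}_ratio_{b}"] = band_a / (band_b + eps)
    return indices


print(len(band_pairs()))  # 13 pairs, i.e. 26 indices across both formulations
```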

3.2. Thresholding Techniques

Thresholding is a fundamental technique in image processing used to segment satellite imagery into binary classes—water and non-water—based on pixel intensity values. This study employed thresholding to distinguish flooded areas from non-flooded regions, leveraging the high spatial resolution and multispectral capabilities of the imagery. This section outlines three global thresholding methods—Yen’s, Otsu’s, and Isodata—and three local adaptive thresholding methods—Niblack, Sauvola, and Gonzalez. For each of the three case studies, the parameters of these methods were optimized to maximize the flood mapping accuracy (Figure 2). This included tuning the window size and sensitivity parameters (e.g., k in Niblack and Sauvola), which were optimized using a grid search approach. This process involved systematically evaluating a predefined range of parameter values to identify the combination that maximized the Matthews correlation coefficient (Section 3.3.2) for each method and case study, ensuring a robust performance tailored to the specific characteristics of the imagery. Additionally, a hybrid approach was proposed and evaluated, which derives a global threshold by averaging the local thresholds computed across the image.
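The grid search described above can be sketched generically as follows. The function `score_fn` is hypothetical: in this study's setting it would threshold an index image with the given parameters and return the MCC against the reference map. The parameter names and ranges shown are illustrative assumptions, not the values used by the authors.

```python
from itertools import product


def grid_search(score_fn, param_grid):
    """Evaluate score_fn(**params) on every combination and keep the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(window, k):
    """Stand-in for an MCC evaluation; it peaks at window=25, k=0.2."""
    return -abs(window - 25) - abs(k - 0.2)


# Hypothetical parameter ranges for a local method such as Niblack or Sauvola.
grid = {"window": [15, 25, 35], "k": [-0.2, 0.0, 0.2]}
best_params, best_score = grid_search(toy_score, grid)
```

An exhaustive search of this kind scales with the product of the grid sizes, which stays manageable when only a window size and one sensitivity coefficient are tuned.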

3.2.1. Global Approach

Global thresholding methods apply a single threshold value to the entire image, relying on the overall intensity distribution to separate water from non-water areas. These methods are computationally efficient and serve as benchmarks for comparison with more adaptive approaches. The following global methods were utilized:
  • Yen’s method: This entropy-based technique selects the threshold that maximizes the entropy of the thresholded image [28]. By optimizing the information content, Yen’s method aims to effectively separate the intensity distributions of water and non-water pixels, making it suitable for images with distinct class characteristics.
  • Otsu’s method: A widely adopted approach, Otsu’s method determines the threshold that minimizes the intra-class variance—or equivalently, maximizes the inter-class variance—between water and non-water regions [21]. It performs well when the image histogram exhibits a bimodal distribution, providing a robust baseline for global thresholding.
  • Isodata method: The Iterative Self-Organizing Data Analysis Technique (Isodata) iteratively refines the threshold by calculating the means of the two classes (below and above the threshold) and adjusting the threshold until it converges [29]. This method is particularly useful for images where the histogram lacks a clear bimodal structure, offering adaptability to varying intensity distributions.
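As a rough illustration of two of the global methods listed above, the following NumPy sketch implements Otsu's between-class variance criterion on a histogram and the iterative Isodata update. It is a simplified sketch under stated assumptions (256 histogram bins, bin centers as candidate thresholds, a fixed convergence tolerance) and not the implementation used in the study. Yen's criterion is omitted for brevity. Library versions of all three exist, for example `threshold_otsu`, `threshold_isodata`, and `threshold_yen` in scikit-image.

```python
import numpy as np


def otsu_threshold(values, bins=256):
    """Histogram bin center that maximizes the between-class variance."""
    hist, edges = np.histogram(np.ravel(values), bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                 # pixel count at or below each bin
    w1 = w0[-1] - w0                     # pixel count above each bin
    s0 = np.cumsum(hist * centers)       # cumulative intensity sum
    mu0 = s0 / np.maximum(w0, 1)         # mean of the lower class
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1)  # mean of the upper class
    between = w0 * w1 * (mu0 - mu1) ** 2
    return centers[np.argmax(between)]


def isodata_threshold(values, tol=1e-6, max_iter=100):
    """Iterate t = (mean below t + mean above t) / 2 until it stabilizes."""
    values = np.asarray(values, dtype=float).ravel()
    t = values.mean()
    for _ in range(max_iter):
        low, high = values[values <= t], values[values > t]
        if low.size == 0 or high.size == 0:
            break
        new_t = (low.mean() + high.mean()) / 2
        if abs(new_t - t) < tol:
            return new_t
        t = new_t
    return t


# Synthetic, imbalanced two-class index values (80% "land", 20% "water").
index_values = np.concatenate([np.linspace(-0.6, -0.4, 800),
                               np.linspace(0.4, 0.6, 200)])
t_otsu = otsu_threshold(index_values)
t_iso = isodata_threshold(index_values)
```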

3.2.2. Local Approach

Local thresholding methods compute a threshold for each pixel based on the statistical properties of its surrounding neighborhood, making them well suited for urban environments, where spectral signatures vary due to shadows, buildings, and mixed land cover. The following local methods were evaluated, with the parameters, such as window size and sensitivity coefficients, optimized for each case study:
  • Niblack’s method: This method calculates the threshold for each pixel as a function of the local mean and standard deviation within a defined window, typically expressed as threshold = mean + k × standard deviation, where k is a sensitivity parameter. Optimized for each case study, Niblack’s method excels at detecting local details but can be sensitive to noise in uniform regions [30].
  • Sauvola’s method: An improvement over Niblack’s approach, Sauvola’s method adjusts the threshold to account for the dynamic range of the standard deviation, using the formula threshold = mean × (1 + k × (standard deviation/R − 1)), where R is the maximum standard deviation (typically 128 for 8-bit images), and k is a tunable parameter. This adaptation reduces noise in low-contrast areas, making it effective for urban flood mapping, with the parameters being fine-tuned for each case study [31].
  • Gonzalez’s method: This method employs local statistics, such as the mean or median of the pixel neighborhood, to determine the threshold [32]. While specific formulations may vary, they adapt to local image conditions, offering flexibility in handling the diverse spectral characteristics of urban landscapes. In this study, its parameters were optimized to enhance the performance across the case studies.
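The Niblack and Sauvola formulas quoted above can be sketched with NumPy as follows. This is an illustrative sketch, not the study's implementation: the reflect padding at image borders, the default window size and k, and the fallback of R to the largest local standard deviation in the image (index values are not 8-bit) are all assumptions. Gonzalez's method is left out because the text does not fix a single formulation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_mean_std(image, window):
    """Mean and standard deviation in a window x window neighborhood (odd window)."""
    pad = window // 2
    padded = np.pad(np.asarray(image, dtype=float), pad, mode="reflect")
    patches = sliding_window_view(padded, (window, window))
    return patches.mean(axis=(-2, -1)), patches.std(axis=(-2, -1))


def niblack_threshold(image, window=15, k=0.2):
    """Per-pixel threshold = local mean + k * local standard deviation."""
    mean, std = local_mean_std(image, window)
    return mean + k * std


def sauvola_threshold(image, window=15, k=0.2, R=None):
    """Per-pixel threshold = local mean * (1 + k * (local std / R - 1))."""
    mean, std = local_mean_std(image, window)
    if R is None:
        # Assumption: use the largest local std when no dynamic range is given.
        R = max(float(std.max()), 1e-12)
    return mean * (1 + k * (std / R - 1))
```

Each function returns a threshold map with the same shape as the input, so a binary map follows from an elementwise comparison of the index image with that map.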

3.2.3. Hybrid Approach

To combine the strengths of global consistency and local adaptability, a hybrid thresholding approach was introduced and evaluated. For each local method, local thresholds were computed across the image, and these values were then averaged to derive a single global threshold. This resulted in three hybrid variants: Niblack_global, Sauvola_global, and Gonzalez_global. The hybrid approach aims to capture localized variations—such as those caused by shadows or small water bodies—while maintaining a consistent threshold applicable to the entire image. By averaging the local thresholds, it provides a balanced solution that potentially improves flood mapping accuracy in complex urban environments.
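A minimal sketch of the hybrid idea, using Niblack as the local method, is given below. The helper names, the reflect padding, and the convention that water pixels lie above the threshold are assumptions made for illustration. The same averaging step would apply to a Sauvola or Gonzalez threshold map.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def niblack_map(image, window=15, k=0.2):
    """Per-pixel Niblack thresholds: local mean + k * local standard deviation."""
    pad = window // 2
    padded = np.pad(np.asarray(image, dtype=float), pad, mode="reflect")
    patches = sliding_window_view(padded, (window, window))
    return patches.mean(axis=(-2, -1)) + k * patches.std(axis=(-2, -1))


def hybrid_classify(image, local_threshold_map):
    """Average the per-pixel thresholds into one global value and apply it.

    Assumption: pixels above the threshold are labeled as water.
    """
    t_global = float(np.mean(local_threshold_map))
    return np.asarray(image) > t_global, t_global


# Toy index image: left half "land" (0), right half "water" (1).
index_image = np.zeros((8, 8))
index_image[:, 4:] = 1.0
water_mask, t_global = hybrid_classify(index_image,
                                       niblack_map(index_image, window=3, k=0.2))
```

Because the final threshold is a single number, the classification step costs the same as any global method, and the local statistics only influence where that number falls.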

3.3. Validation and Performance Metrics

3.3.1. Reference Maps

The accurate generation of a ground truth reference image is paramount for validating the performance of flood mapping models, as it serves as the benchmark against which model outputs are assessed. In this study, the ground truth reference image was produced through a hybrid methodology that combines manual labeling by domain experts with Random Forest (RF) classification, leveraging the complementary strengths of human expertise and ML to achieve high reliability and precision [33]. Manual labeling by experts is a widely accepted practice in remote sensing and geospatial research (e.g., Lang et al. [34]), especially when direct ground truth data, such as field measurements or sensor readings, are unavailable or impractical to obtain. This is often the case during flood events, where rapid inundation, safety concerns, and logistical challenges prevent on-site data collection. Instead, visual interpretation of satellite imagery by trained professionals serves as a reliable substitute.
Initially, domain experts manually annotated pre- and post-flood imagery (true color and false color composites) to identify flooded and non-flooded regions, forming a robust foundation of labeled data. Over 500 data points were systematically selected to ensure a reliable and comprehensive dataset. A stratified sampling strategy was applied, distributing annotations across diverse land use types and geographical areas. Subsequently, these manually annotated samples were used to train an RF classifier, an ensemble-based ML algorithm renowned for its versatility and robustness in handling high-dimensional feature spaces [35]. The RF model was configured with 200 decision trees, a number determined through empirical testing to balance computational efficiency with predictive power, and optimized using k-fold cross-validation (k = 5) to fine-tune hyperparameters such as tree depth and the number of features considered at each split. To further bolster the robustness of the ground truth reference, the RF classifier’s outputs were subjected to a post-processing refinement step. This involved applying spatial consistency checks, such as morphological filtering, to smooth irregularities in the classified regions and aligning the results with expert-validated boundaries where necessary (Figure 1b). Additionally, the model’s performance was evaluated using the Matthews correlation coefficient, which exceeded 0.98 across the test datasets.

3.3.2. Validation Strategy

For validation purposes, the classified binary image was compared against a ground truth reference image, which also contained binary labels indicating the water and non-water regions. The comparison was structured around the construction of a confusion matrix comprising four fundamental components: true positives (TPs), true negatives (TNs), false positives (FPs), and false negatives (FNs). These components are defined as follows:
  • TPs: pixels correctly classified as flooded.
  • TNs: pixels correctly classified as non-flooded.
  • FPs: pixels incorrectly classified as flooded.
  • FNs: pixels incorrectly classified as non-flooded.
Using these components, two performance metrics were calculated to quantify the classification effectiveness: the Matthews correlation coefficient (MCC) [36] and the Fowlkes–Mallows index (FMI) [37] (Table 1). These metrics are well suited for binary classification because they capture different aspects of prediction performance and handle class imbalances. MCC considers the full confusion matrix (TP, TN, FP, and FN), yielding a high score only if the classifier performs well in both classes. Some researchers consider MCC the most informative single score for evaluating the quality of a binary classifier’s predictions [38,39]. FMI, the geometric mean of precision and recall, reflects the trade-off between the two: it is 1.0 only when both precision and recall are 1.0 (perfect classification). In our analysis, MCC and FMI provide insight into how well each thresholding method handles the imbalanced class problem (since water is the smaller class in all regions) and whether it achieves both a high recall (few missed water pixels) and high precision (few false water pixels).
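For reference, both metrics can be computed directly from the four confusion matrix counts. The sketch below uses the standard definitions of the MCC and the FMI, which are presumably what Table 1 lists. The function names and the convention of returning 0.0 when a denominator vanishes are assumptions made here.

```python
from math import sqrt


def confusion_counts(pred, truth):
    """Count TP, TN, FP, FN from two equal-length sequences of booleans."""
    pairs = list(zip(pred, truth))
    tp = sum(p and t for p, t in pairs)
    tn = sum((not p) and (not t) for p, t in pairs)
    fp = sum(p and (not t) for p, t in pairs)
    fn = sum((not p) and t for p, t in pairs)
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 if any marginal total is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def fmi(tp, fp, fn):
    """Fowlkes-Mallows index: geometric mean of precision and recall."""
    denom = sqrt((tp + fp) * (tp + fn))
    return tp / denom if denom else 0.0


# Hypothetical counts for a scene where water is the minority class.
print(round(mcc(90, 880, 10, 20), 3))  # 0.842
print(round(fmi(90, 10, 20), 3))       # 0.858
```

Note that the FMI ignores true negatives, so it is unaffected by how much non-water area the scene contains, whereas the MCC is close to zero for a classifier that performs no better than chance.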

4. Results

4.1. Thresholding Method Comparisons

Figure 3 arranges the nine thresholding techniques from left to right according to their decreasing overall performances in the mean MCC (panel a) and the FMI (panel b). Overall, hybrid methods that derive a single global threshold from local computations (e.g., Niblack_global, Gonzalez_global) and purely local approaches (e.g., Niblack, Sauvola) consistently rank at the upper end, confirming that spatially adaptive or semi-adaptive thresholding is often best suited for complex urban flood detection—even though simpler global algorithms can remain viable in less heterogeneous settings.
In panel (a), Gonzalez_global appears on the far left with a median MCC hovering around 0.45, showcasing notably high maxima but also a sizable spread. Moving right, Sauvola and Niblack_global demonstrate similarly strong performances, though their wider interquartile ranges suggest greater sensitivity to local scene conditions. Sauvola_global and Niblack occupy midrange positions, each maintaining respectable medians but occasionally dipping into negative MCC values for challenging images. Otsu and Isodata (further right) share a roughly median-level accuracy, with Otsu sometimes matching or exceeding local methods on select datasets, though both exhibit broad distributions reflecting scene-specific variability. Yen—on the far right—consistently scores the lowest, often slipping into negative territory, indicating limited effectiveness for complex urban flood delineation. Panel (b) (FMI) largely corroborates these trends: Niblack_global and Niblack generally cluster among the top performers, frequently yielding median FMI values above 0.50. Gonzalez_global and Gonzalez are not far behind, suggesting that both the purely local version and the aggregated global variant of the Gonzalez approach excel at capturing accurate flood–non-flood boundaries. Sauvola and Sauvola_global remain competitive but exhibit slightly wider spreads, underscoring their fluctuating success across different scenes and threshold settings. By contrast, Isodata and Otsu offer moderate yet more stable performance, while Yen again registers the lowest median FMI and the greatest prevalence of near-zero outliers.
These findings are consistent with established research on image thresholding and flood delineation, wherein spatially adaptive or semi-adaptive thresholding methods often outperform simpler global approaches in heterogeneous urban contexts. For instance, a study by Liang and Liu found that a local thresholding approach improved accuracy by 4–13% over global methods for flood water delineation using Sentinel-1 SAR imagery [23]. Similarly, Chen et al. demonstrated that an adaptive thresholding method outperformed the Otsu method, achieving up to 17% higher overall accuracy in certain scenarios, particularly in areas with medium and low water probabilities [40].

4.2. Best-Performing Indices

Figure 4 compares classification performance across all evaluated indices using two metrics: the MCC (panel a) and the FMI (panel b). Across both metrics, RedEdge_normalized_NIR (NDRE) consistently emerges as the top performer, exhibiting median values close to 0.7 for the FMI and the MCC. Notably, a few other normalized difference indices (the gray color in the background) that incorporate the NIR band, such as NDWI, NDVI, and Yellow_normalized_NIR, are regularly marked as not significantly different, indicating that their mean performances do not differ significantly from that of NDRE, although their interquartile ranges are slightly lower. Their median MCC and FMI are also close to 0.70. These findings align with a well-established body of research emphasizing the key role of near-infrared (NIR) reflectance for water delineation [41]. Indices that utilize normalized difference expressions involving the NIR band, such as NDWI [42], NDVI [43], and especially NDRE [6], have consistently shown high accuracy in detecting and mapping inundated areas, as documented by prior studies on water bodies and flood mapping. The top-tier performance of NDRE and closely related NIR-based indices corroborates existing evidence that exploiting the strong absorption of water in the NIR spectrum significantly improves classification metrics.
In contrast, ratio-based indices with NIR in the denominator (green-shaded) tend to occupy the middle tier, often displaying wider distributions of MCC or FMI values (the median MCC generally being around 0.50–0.65). Similar patterns have been reported by previous researchers (e.g., Chen et al. [24]), who note that while ratio-based approaches can succeed under specific hydrological or sensor conditions, they are more sensitive to threshold selection and image noise. Though statistically different from NDRE in most instances (*, p < 0.05), these ratio indices can still achieve high accuracy in specific conditions or when combined with certain thresholding methods. At the lower end of the spectrum, indices that exclude NIR and instead combine RedEdge with visible bands, in both normalized difference and ratio-based expressions (e.g., CoastalBlue_ratio_RedEdge, Blue_normalized_RedEdge), register lower, or even negative, median MCC values. These are also significantly different from the top performers, emphasizing how crucial the strong water-absorption signal in NIR is for accurate inundation mapping.

4.3. Geographical Evaluation of Classification

Figure 5a shows that, in terms of the MCC, Australia exhibits the strongest performance, with a median MCC of approximately 0.50 and an interquartile range (IQR) spanning from 0.10 to 0.70; the highest values reach 0.90, showing that some configurations classify the scene well. The statistical tests indicated that Australia’s MCC was significantly different from Brazil’s and the USA’s. Brazil follows with a lower median MCC and an IQR from 0.00 to 0.60, with the highest values again reaching 0.90, reflecting a moderate performance with greater variability. The USA demonstrates the weakest MCC performance, with a lower median than Brazil and an IQR from 0.00 to 0.40, suggesting that the classification is frequently close to random guessing (an MCC near zero).
For the FMI (Figure 5b), the ranking shifts: Brazil leads with a median FMI of approximately 0.60 and an IQR of 0.35 to 0.70, indicating a strong clustering similarity in many cases. The statistical tests indicate that Brazil’s FMI is not significantly different from Australia’s. Australia follows with a median FMI of approximately 0.60 and a wider IQR than Brazil’s, showing a solid but slightly lower clustering agreement. The USA trails again, with a median FMI of 0.50 and an IQR from 0.10 to 0.50.
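To make the two validation metrics concrete, the sketch below computes them from confusion-matrix counts. It assumes the standard binary forms, MCC = (TP·TN − FP·FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)) and FMI = TP/√((TP+FP)(TP+FN)); the helper names and the 100-pixel toy scene are hypothetical and are not taken from the study.

```python
import math

def confusion_counts(truth, pred):
    """Return (TP, TN, FP, FN) for two equal-length sequences of 0/1 labels."""
    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    tn = sum(1 for t, p in zip(truth, pred) if not t and not p)
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)
    return tp, tn, fp, fn

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def fmi(tp, fp, fn):
    """Fowlkes-Mallows index: geometric mean of precision and recall."""
    denom = math.sqrt((tp + fp) * (tp + fn))
    return tp / denom if denom else 0.0

# Toy scene with 11% water (1 = water), classified entirely as water.
truth = [1] * 11 + [0] * 89
all_water = [1] * 100
tp, tn, fp, fn = confusion_counts(truth, all_water)
print(mcc(tp, tn, fp, fn), fmi(tp, fp, fn))
```

Because the FMI ignores true negatives, a degenerate all-water prediction still receives a nonzero FMI while its MCC is zero. This may be one reason the regional rankings differ between the two metrics.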

5. Discussion

5.1. Effect of Water Coverage Ratio on Method Performance

Threshold-based water classification is a simple yet widely used approach to separate water and land in remote sensing images. This study evaluated three global thresholding methods, three local adaptive thresholding methods, and a hybrid approach that averages local thresholds to derive a global threshold. The methods were applied to locations with different water-to-non-water ratios: Brazil (~34% water), the USA (~11% water), and Australia (~21% water). In this section, we analyze how the proportion of water area influences each method’s effectiveness (Figure 6). We also compare these findings with the existing literature on water classification, highlighting the strengths and weaknesses of global versus local thresholding under various environmental conditions.

5.1.1. High Water Coverage

In the Brazil dataset (Figure 6a), water makes up roughly one-third of the area, a relatively high proportion that helps create a clearer separation between water and land in the image histogram. Global methods tend to perform well under these conditions. Otsu’s method, for instance, likely finds a near-optimal threshold because the water cluster is large and distinct enough to maximize the between-class variance (Figure 7). Previous research notes that Otsu’s algorithm achieves a high accuracy when the histogram is distinctly bimodal, a situation more likely when water coverage is substantial [24,25]. Given the large water presence, the Isodata method showed a similarly strong performance in Brazil. As a result, the MCC was high for global thresholds in Brazil, indicating both a high true positive rate (water correctly identified) and a low false positive rate. For example, Otsu and Isodata applied to the RedEdge_ratio_NIR, Red_ratio_NIR, and Yellow_ratio_NIR indices reach MCC values of around 0.91–0.97. In practical terms, the global thresholds did not grossly under- or overestimate the water extent here. This aligns with other studies in which a single threshold was adequate across an entire scene when water bodies occupied a considerable portion and had consistent spectral characteristics. For example, Tan et al. reported that an Otsu threshold could achieve around a 98% classification accuracy in a scene dominated by floodwater, a scenario akin to having a very high water percentage [44].
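The between-class variance criterion behind Otsu’s method can be illustrated with a short histogram-based sketch. The bin count, the function name, and the synthetic index values (a well-separated scene with roughly one-third water) are assumptions made for illustration and do not reproduce the study’s data or implementation.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Return the bin center that maximizes the between-class variance."""
    counts, edges = np.histogram(np.ravel(values), bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(counts)                # pixel count in bins 0..i
    w1 = np.cumsum(counts[::-1])[::-1]    # pixel count in bins i..end
    weighted = counts * centers
    m0 = np.cumsum(weighted) / np.maximum(w0, 1)                    # mean of bins 0..i
    m1 = (np.cumsum(weighted[::-1]) / np.maximum(w1[::-1], 1))[::-1]  # mean of bins i..end
    # Split after bin i: lower class is bins 0..i, upper class is bins i+1..end.
    between = w0[:-1] * w1[1:] * (m0[:-1] - m1[1:]) ** 2
    return centers[np.argmax(between)]

# Synthetic bimodal index values (hypothetical): 66% land near -0.3, 34% water near 0.5.
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(-0.3, 0.05, 6600), rng.normal(0.5, 0.05, 3400)])
t = otsu_threshold(values)
water_mask = values > t
print(t, water_mask.mean())
```

With two distinct, sizeable clusters, the maximum of the criterion falls in the gap between them, which is the favorable situation described above for high water coverage.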
Local adaptive methods also performed well in Brazil, but their advantage over global methods was less pronounced under these favorable conditions. Because the contrast between water and land was sufficiently uniform in much of the scene, adapting the threshold locally did not dramatically change the outcome—a correctly chosen global threshold would already have segmented most water pixels correctly. Sauvola’s method, for instance, may have achieved an MCC on par with Otsu in this high-water scenario. One potential benefit of local methods in Brazil was seen around transitional areas: the local threshold could fine-tune the boundary, possibly capturing some water pixels that a global method might misclassify. Overall, however, when water coverage is high, and the water/non-water distinction is clear, global and local methods both yield strong performances, with the global methods being “steady” and effective without needing to consider the spatial context. In such cases, the simplicity of a global threshold is an advantage, and the MCC differences between the methods were small. All the methods showed strengths: minimal false negatives (most water detected) and few false positives (land misidentified as water) due to the distinct spectral signature of plentiful water in Brazil.

5.1.2. Low Water Coverage

The USA dataset, with only ~11% water, presented a highly imbalanced scenario: water pixels are sparse relative to the majority land background. Compared to the Brazil dataset, this imbalance had a significant impact on the global thresholding performance (Figure 6b). The global methods struggled in the low-water-coverage setting, as evidenced by substantially lower MCC scores for Yen’s, Otsu’s, and Isodata on the USA data. The underlying reason is that when water occupies a small fraction of the image, its influence on a global histogram-based threshold is limited. Otsu’s algorithm, for instance, has been noted to yield unstable results when a small area of water is present amidst a large non-water area [45,46]. In our case, the histogram might not have a clear bimodal shape—the water peak could be shallow or even merged with dark land pixels. Otsu’s method tends to “favor the class with the larger variance” (often the larger class) when the histogram is not cleanly bimodal [26,27]. This can result in a threshold that is too conservative, effectively classifying most pixels as land and missing many true water pixels (FN), thereby lowering recall. Alternatively, if the water pixels are very dark (or very bright, depending on the index used) and form a tail in the histogram, a global method might set the threshold to capture those outliers, but then a lot of slightly darker land areas might also get misclassified as water (FP) (Figure 8). Both situations degrade the MCC and the FMI: the MCC drops because the errors are not balanced, and the FMI drops because the precision or the recall (or both) suffer.
In concrete terms, for the USA site, the global Otsu threshold likely under-detected the water. Many small water bodies or narrow streams might have gone unclassified because the threshold optimized global variance by treating them as part of the background. The literature supports this tendency: Cao et al. found that Otsu’s thresholding method produces the worst results due to the small prior ratio of water, which leads to an inaccurate threshold for water extraction—a situation closely mirroring the case in the USA [47]. Yen’s method and Isodata showed similar weaknesses because they, too, rely on the overall intensity distribution. All three global methods in the USA scenario would have exhibited a relatively low to moderate MCC—indicating that either the precision or the recall was poor. If the threshold was set too high (to avoid FP), the recall of water would be low (water pixels missed). If set too low, the precision would drop (many non-water pixels may be labeled as water). Without additional context, a single global threshold is a blunt tool in such a complex, imbalanced scene.
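The precision and recall trade-off described above can be reproduced on a small synthetic example. The index values below are hypothetical (about 11% water, with the two classes overlapping between 0.0 and 0.2), and the three candidate thresholds are arbitrary choices for illustration.

```python
import numpy as np

def precision_recall(truth, pred):
    """Precision and recall of a boolean prediction against a boolean reference."""
    tp = np.sum(truth & pred)
    fp = np.sum(~truth & pred)
    fn = np.sum(truth & ~pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return float(precision), float(recall)

# Hypothetical index values: 890 land pixels and 110 water pixels with overlapping ranges.
land = np.linspace(-0.6, 0.2, 890)
water = np.linspace(0.0, 0.5, 110)
values = np.concatenate([land, water])
truth = np.concatenate([np.zeros(890, dtype=bool), np.ones(110, dtype=bool)])

# Sweep a single global threshold from permissive to conservative.
for t in (-0.1, 0.1, 0.3):
    p, r = precision_recall(truth, values > t)
    print(f"threshold={t:+.1f}  precision={p:.2f}  recall={r:.2f}")
```

In this toy scene, the permissive threshold recovers all water but labels many land pixels as water, whereas the conservative threshold removes the false positives at the cost of missing most of the water. No single cutoff achieves both when the minority class overlaps the majority class.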

5.1.3. Moderate Water Coverage

The Australian dataset represents an intermediate situation: water is neither dominant nor extremely rare. At ~21% coverage, water bodies are a significant presence, but land still occupies roughly four-fifths of the area. The performance of thresholding methods in this moderate scenario fell between the extremes observed for Brazil and the USA (Figure 6c). For the global methods, the results in Australia were mixed but generally better than in the USA. Otsu’s and Yen’s algorithms each managed to detect a good portion of the water pixels, as the water class was large enough to influence the global threshold noticeably (unlike in the USA case). The histogram likely showed a clearer separation than in the 11% scenario. If the Australian water bodies included, for example, some turbid water or partially dry wetlands, the intensity distribution could be broad, making the thresholding task non-trivial. Nonetheless, global Otsu would try to maximize variance separation, which might result in slightly overestimating the water in some regions and underestimating it in others, yielding a compromise threshold that does not perfectly fit all parts of a diverse scene. The MCC for global methods in Australia was higher than in the USA and Brazil, with fewer misses/false alarms than in the severely imbalanced case. One observation was that among the global methods, Isodata performed comparably to Otsu in Australia, and Yen’s method sometimes produced a threshold that was a bit different (either more inclusive or exclusive) (Figure 9). Yen’s entropy-based criterion might have been slightly more sensitive to background variations in the Australian scene, given that it theoretically handles varying backgrounds well. If the Australian dataset had a heterogeneous background, Yen’s method could have found a threshold that balances information gain, possibly classifying some darker land areas as water. Otsu’s method might have been more conservative, focusing on variance and potentially leaving out some faint water pixels. Qualitatively, all three global methods showed an adequate but not flawless performance in Australia; each produced some FPs or FNs.
The local thresholding methods in Australia again provided improvements, though the margin of improvement over the global methods was smaller. By adapting to local intensity statistics, the approach was able to detect water in localized regions, even when those water pixels were globally rare. The literature suggests that an adaptive threshold approach can indeed yield a higher accuracy in complex environments. For example, Liang et al. combined global and local thresholding to incrementally increase water proportion in each segmented region, which improved the detection of water in varied landscapes, though at the risk of misclassifying small land islands as water [23].

5.2. Hybrid Global–Local Thresholding

The choice of threshold method can greatly affect accuracy, especially under imbalanced conditions. A global threshold may fail to identify small water bodies or thin rivers since the optimal threshold value tends to shift with the proportion of water pixels. Chen et al. observed that the NDWI thresholds vary depending on the subpixel water fraction; in practice, the threshold often needs adjustment for different water coverage scenarios [48]. Local thresholding mitigates this by responding to water in its immediate context, but at the cost of consistency—each region makes its own decision, which can produce a patchy classification. This trade-off motivates hybrid approaches that attempt to combine the strengths of both global and local methods.
The hybrid thresholding approach evaluated here (referred to as Niblack_global, Gonzalez_global, and Sauvola_global) first computes thresholds locally across the image using well-known adaptive techniques and then averages those local thresholds to obtain a single global threshold. In other words, it uses local computations to inform a global decision. The approach delivered MCC values that often exceeded those of both the naive global and the fully local methods in the test cases (Figure 3), which suggests that it offers a quantitative improvement in classification accuracy. For instance, one might see MCC improvements on the order of several percentage points over the best pure global or local method in the dataset, a non-trivial gain in remote sensing classification, where even a 1–2% improvement can be meaningful. This indicates that averaging local thresholds can enhance segmentation accuracy, combining the adaptability of local thresholding with the coherence of a global decision.
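The averaging step can be sketched as follows, using a Niblack-style local threshold as the example. This is a minimal sketch under stated assumptions, not the study’s implementation: the threshold form T = mean − k·std, the window size, the value of `k`, the reflect padding, and the synthetic scene are all choices made here for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def niblack_thresholds(image, window=15, k=0.2):
    """Per-pixel threshold T = mean - k * std over a square window (odd size)."""
    pad = window // 2
    padded = np.pad(np.asarray(image, dtype=float), pad, mode="reflect")
    windows = sliding_window_view(padded, (window, window))
    return windows.mean(axis=(-2, -1)) - k * windows.std(axis=(-2, -1))

def hybrid_global_threshold(local_thresholds):
    """Collapse the per-pixel thresholds into one cutoff by simple averaging."""
    return float(np.mean(local_thresholds))

# Synthetic index image (hypothetical): top third "water" near 0.5, rest "land" near -0.3.
rng = np.random.default_rng(0)
truth = np.zeros((48, 48), dtype=bool)
truth[:16, :] = True
image = np.where(truth, 0.5, -0.3) + rng.normal(0.0, 0.02, truth.shape)

local_t = niblack_thresholds(image)
t_hybrid = hybrid_global_threshold(local_t)

local_mask = image > local_t     # fully local decision
hybrid_mask = image > t_hybrid   # single cutoff informed by local statistics
print(t_hybrid, (local_mask == truth).mean(), (hybrid_mask == truth).mean())
```

In this toy scene, the local threshold tracks the local mean inside homogeneous regions, so the per-pixel decision there is driven largely by noise, while the averaged threshold lands between the two classes. The example uses untuned parameters and should not be read as representative of the tuned local results reported in the study.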
The concept of blending local and global thresholding has precedents in the broader literature. The driving idea is to leverage both the detailed sensitivity of adaptive methods and the stability of a single global criterion. In the context of cloud detection, another binary segmentation problem in environmental monitoring, Li et al. introduced a hybrid thresholding algorithm (HYTA) that combined the advantages of fixed thresholding and adaptive thresholding to improve cloud segmentation accuracy [49]. Their algorithm determined whether an image’s distribution was unimodal or bimodal and then applied either a global fixed threshold or an adaptive threshold accordingly. The success of HYTA in handling different sky conditions (some needing a single threshold, others needing local adaptation) supports the notion that a hybrid strategy can outperform one-size-fits-all approaches. In water detection, our hybrid averaging approach is conceptually simpler but follows the same principle of exploiting local variability to inform a global decision, rather than using a purely fixed or purely adaptive scheme.
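The switching logic described for HYTA can be illustrated in miniature. The sketch below is not Li et al.’s algorithm: the modality test (a standard-deviation cutoff, `std_cutoff`), the fixed cutoff (`fixed`), and the adaptive rule (an iterative intermeans threshold) are all stand-ins chosen here for illustration, as are the input values.

```python
import numpy as np

def intermeans_threshold(values, tol=1e-9, max_iter=100):
    """Iterative intermeans (Isodata-style) threshold: midpoint of the two class means."""
    v = np.ravel(values).astype(float)
    t = v.mean()
    for _ in range(max_iter):
        low, high = v[v <= t], v[v > t]
        if low.size == 0 or high.size == 0:
            break
        new_t = 0.5 * (low.mean() + high.mean())
        if abs(new_t - t) < tol:
            return float(new_t)
        t = new_t
    return float(t)

def switch_threshold(values, fixed=0.0, std_cutoff=0.1):
    """Use a fixed cutoff for narrow (unimodal-looking) data, else an adaptive one."""
    v = np.ravel(values).astype(float)
    if v.std() < std_cutoff:
        return fixed, "fixed"
    return intermeans_threshold(v), "adaptive"

# Hypothetical index values: a land-only scene and a mixed land/water scene.
land_only = np.linspace(-0.34, -0.26, 100)
mixed = np.concatenate([np.linspace(-0.34, -0.26, 70), np.linspace(0.46, 0.54, 30)])

print(switch_threshold(land_only))
print(switch_threshold(mixed))
```

The design point is that an adaptive rule applied to a scene with only one class would still split it somewhere, so a fallback to a fixed cutoff avoids inventing a second class where none exists.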
The literature on water segmentation specifically also acknowledges the need for such adaptive strategies. Liang and Liu’s local threshold method effectively created multiple thresholds for different land-cover clusters in an SAR image to address heterogeneity, which significantly improved the water delineation accuracy over any single global threshold [23]. Our hybrid approach shares a similar spirit—recognizing that no single threshold works for all parts of a complex image—but implements it by statistically blending local thresholds into one. While Liang and Liu achieved improvements by segmenting the image into land-type subsets and finding distinct thresholds per subset, an averaged approach like ours is less complex (no explicit land classification is needed) yet still adapts to the image content. The hybrid methods we evaluated provide a data-driven way to adjust thresholds on the fly. By drawing from many local measurements, the threshold effectively self-calibrates for each image. This approach aligns with the recommendations of Lang et al. [50] to adjust thresholds based on actual conditions, but it does so automatically through the local averaging process.
When comparing our results to the existing hybrid or adaptive approaches, it appears that the hybrid global–local averaging is particularly good at handling intermediate cases where neither purely global nor purely local methods are ideal. For instance, in an image with mixed bright and dark regions (say, water under the shadow in one part and sun glint on the water in another), a global threshold might miss the shadowed water, while a fully local method might falsely label sun-glint areas as water. A hybrid threshold, influenced by both areas, finds a middle ground that can detect the darker water pixels while remaining high enough to exclude most glint reflections. This kind of balanced outcome is echoed in the literature, where multi-threshold or two-step segmentation strategies yield more robust results than single-threshold methods [23]. Overall, our findings reinforce what previous research has suggested: combining global and local thresholding logic can lead to more reliable water/non-water segmentation under diverse conditions.

6. Conclusions

In this study, we tested a threshold-based framework for rapid UFM using daily 3 m resolution PlanetScope imagery across three major floods in the USA, Brazil, and Australia. We systematically compared an array of water indices with global (Yen’s, Otsu’s, Isodata), local (Niblack, Sauvola, Gonzalez), and hybrid thresholding methods. The results demonstrated that indices incorporating the NIR and RedEdge bands, particularly NDRE, consistently yielded a high classification accuracy. Among the thresholding techniques, the hybrid method, in which a global threshold is derived from local thresholds, generally emerged as superior, offering a balanced approach that captures localized variations while maintaining global consistency. However, each flood event was best classified by a different combination of method and index, indicating that the remote sensing indices and thresholding techniques must be selected carefully for each specific case. Ultimately, these results underscore that threshold-based UFM provides a viable alternative to data-intensive ML methods, especially when rapid flood detection is essential.

Author Contributions

Conceptualization, L.N.V.; methodology, L.N.V.; formal analysis, L.N.V. and G.V.N.; writing—original draft preparation, L.N.V.; visualization, L.N.V.; data curation, Y.K., M.T.T.D., S.K. and J.L.; writing—review and editing, G.V.N. and G.L.; supervision, G.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by the Korea Ministry of Environment (MOE) as Research and Development on the Technology for Securing the Water Resources Stability in Response to Future Change (RS-2024-00332494).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

We acknowledge the Planet Education and Research program, which provided data for this research. We would like to thank three anonymous reviewers and academic editors for their insightful comments and suggestions, which have greatly improved the quality of this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Chen, Y.; Zhou, H.; Zhang, H.; Du, G.; Zhou, J. Urban Flood Risk Warning under Rapid Urbanization. Environ. Res. 2015, 139, 3–10.
  2. Dharmarathne, G.; Waduge, A.O.; Bogahawaththa, M.; Rathnayake, U.; Meddage, D.P.P. Adapting Cities to the Surge: A Comprehensive Review of Climate-Induced Urban Flooding. Results Eng. 2024, 22, 102123.
  3. Lin, Y.N.; Yun, S.-H.; Bhardwaj, A.; Hill, E.M. Urban Flood Detection with Sentinel-1 Multi-Temporal Synthetic Aperture Radar (SAR) Observations in a Bayesian Framework: A Case Study for Hurricane Matthew. Remote Sens. 2019, 11, 1778.
  4. Clement, M.A.; Kilsby, C.G.; Moore, P. Multi-Temporal Synthetic Aperture Radar Flood Mapping Using Change Detection. J. Flood Risk Manag. 2018, 11, 152–168.
  5. Pulvirenti, L.; Chini, M.; Pierdicca, N.; Boni, G. Use of SAR Data for Detecting Floodwater in Urban and Agricultural Areas: The Role of the Interferometric Coherence. IEEE Trans. Geosci. Remote Sens. 2016, 54, 1532–1544.
  6. Lu, Z.; Wang, D.; Deng, Z.; Shi, Y.; Ding, Z.; Ning, H.; Zhao, H.; Zhao, J.; Xu, H.; Zhao, X. Application of Red Edge Band in Remote Sensing Extraction of Surface Water Body: A Case Study Based on GF-6 WFV Data in Arid Area. Hydrol. Res. 2021, 52, 1526–1541.
  7. Levin, N.; Phinn, S. Assessing the 2022 Flood Impacts in Queensland Combining Daytime and Nighttime Optical and Imaging Radar Data. Remote Sens. 2022, 14, 5009.
  8. Li, L.; Woodley, A.; Chappell, T. Mapping Urban Floods via Spectral Indices and Machine Learning Algorithms. Sustainability 2024, 16, 2493.
  9. Peng, B.; Meng, Z.; Huang, Q.; Wang, C. Patch Similarity Convolutional Neural Network for Urban Flood Extent Mapping Using Bi-Temporal Satellite Multispectral Imagery. Remote Sens. 2019, 11, 2492.
  10. Tanim, A.H.; McRae, C.B.; Tavakol-Davani, H.; Goharian, E. Flood Detection in Urban Areas Using Satellite Imagery and Machine Learning. Water 2022, 14, 1140.
  11. Huang, M.; Jin, S. Rapid Flood Mapping and Evaluation with a Supervised Classifier and Change Detection in Shouguang Using Sentinel-1 SAR and Sentinel-2 Optical Data. Remote Sens. 2020, 12, 2073.
  12. Gebrehiwot, A.; Hashemi-Beni, L.; Thompson, G.; Kordjamshidi, P.; Langan, T.E. Deep Convolutional Neural Network for Flood Extent Mapping Using Unmanned Aerial Vehicles Data. Sensors 2019, 19, 1486.
  13. Wang, Y.; Li, Z.; Zeng, C.; Xia, G.-S.; Shen, H. An Urban Water Extraction Method Combining Deep Learning and Google Earth Engine. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 769–782.
  14. Bangira, T.; Alfieri, S.M.; Menenti, M.; van Niekerk, A. Comparing Thresholding with Machine Learning Classifiers for Mapping Complex Water. Remote Sens. 2019, 11, 1351.
  15. Zhao, J.; Li, M.; Li, Y.; Matgen, P.; Chini, M. Urban Flood Mapping Using Satellite Synthetic Aperture Radar Data: A Review of Characteristics, Approaches, and Datasets. IEEE Geosci. Remote Sens. Mag. 2024, 13, 2–34.
  16. McFeeters, S.K. The Use of the Normalized Difference Water Index (NDWI) in the Delineation of Open Water Features. Int. J. Remote Sens. 1996, 17, 1425–1432.
  17. Tran, K.H.; Menenti, M.; Jia, L. Surface Water Mapping and Flood Monitoring in the Mekong Delta Using Sentinel-1 SAR Time Series and Otsu Threshold. Remote Sens. 2022, 14, 5721.
  18. Chini, M.; Hostache, R.; Giustarini, L.; Matgen, P. A Hierarchical Split-Based Approach for Parametric Thresholding of SAR Images: Flood Inundation as a Test Case. IEEE Trans. Geosci. Remote Sens. 2017, 55, 6975–6988.
  19. Moharrami, M.; Javanbakht, M.; Attarchi, S. Automatic Flood Detection Using Sentinel-1 Images on the Google Earth Engine. Environ. Monit. Assess. 2021, 193, 248.
  20. Che, L.; Li, S.; Liu, X. Improved Surface Water Mapping Using Satellite Remote Sensing Imagery Based on Optimization of the Otsu Threshold and Effective Selection of Remote-Sensing Water Index. J. Hydrol. 2025, 654, 132771.
  21. Otsu, N. A Threshold Selection Method from Gray-Level Histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66.
  22. Günen, M.A.; Atasever, U.H. Remote Sensing and Monitoring of Water Resources: A Comparative Study of Different Indices and Thresholding Methods. Sci. Total Environ. 2024, 926, 172117.
  23. Liang, J.; Liu, D. A Local Thresholding Approach to Flood Water Delineation Using Sentinel-1 SAR Imagery. ISPRS J. Photogramm. Remote Sens. 2020, 159, 53–62.
  24. Chen, J.; Wang, Y.; Wang, J.; Zhang, Y.; Xu, Y.; Yang, O.; Zhang, R.; Wang, J.; Wang, Z.; Lu, F.; et al. The Performance of Landsat-8 and Landsat-9 Data for Water Body Extraction Based on Various Water Indices: A Comparative Analysis. Remote Sens. 2024, 16, 1984.
  25. Yang, X.; Hong, L. A New Classification Rule-Set for Mapping Surface Water in Complex Topographical Regions Using Sentinel-2 Imagery. Water 2024, 16, 943.
  26. Xu, X.; Xu, S.; Jin, L.; Song, E. Characteristic Analysis of Otsu Threshold and Its Applications. Pattern Recognit. Lett. 2011, 32, 956–961.
  27. Yuan, X.; Wu, L.; Peng, Q. An Improved Otsu Method Using the Weighted Object Variance for Defect Detection. Appl. Surf. Sci. 2015, 349, 472–484.
  28. Yen, J.-C.; Chang, F.-J.; Chang, S. A New Criterion for Automatic Multilevel Thresholding. IEEE Trans. Image Process. 1995, 4, 370–378.
  29. Ball, G.H.; Hall, D.J. Isodata, a Novel Method of Data Analysis and Pattern Classification; Stanford Research Institute: Menlo Park, CA, USA, 1965.
  30. Niblack, W. An Introduction to Digital Image Processing; Prentice-Hall International: Hoboken, NJ, USA, 1986.
  31. Sauvola, J.; Pietikäinen, M. Adaptive Document Image Binarization. Pattern Recognit. 2000, 33, 225–236.
  32. Gonzalez, R.C.; Woods, R.E. Digital Image Processing, 2nd ed.; Prentice-Hall Inc.: Hoboken, NJ, USA, 2002.
  33. Feng, Q.; Liu, J.; Gong, J. Urban Flood Mapping Based on Unmanned Aerial Vehicle Remote Sensing and Random Forest Classifier—A Case of Yuyao, China. Water 2015, 7, 1437–1455.
  34. Cavallo, C.; Papa, M.N.; Gargiulo, M.; Palau-Salvador, G.; Vezza, P.; Ruello, G. Continuous Monitoring of the Flooding Dynamics in the Albufera Wetland (Spain) by Landsat-8 and Sentinel-2 Datasets. Remote Sens. 2021, 13, 3525.
  35. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  36. Matthews, B.W. Comparison of the Predicted and Observed Secondary Structure of T4 Phage Lysozyme. Biochim. Biophys. Acta (BBA)—Protein Struct. 1975, 405, 442–451.
  37. Fowlkes, E.B.; Mallows, C.L. A Method for Comparing Two Hierarchical Clusterings. J. Am. Stat. Assoc. 1983, 78, 553–569.
  38. Chicco, D. Ten Quick Tips for Machine Learning in Computational Biology. BioData Min. 2017, 10, 35.
  39. Chicco, D.; Jurman, G. The Matthews Correlation Coefficient (MCC) Should Replace the ROC AUC as the Standard Metric for Assessing Binary Classification. BioData Min. 2023, 16, 4.
  40. Chen, S.; Huang, W.; Chen, Y.; Feng, M. An Adaptive Thresholding Approach toward Rapid Flood Coverage Extraction from Sentinel-1 SAR Imagery. Remote Sens. 2021, 13, 4899.
  41. Amarnath, G. An Algorithm for Rapid Flood Inundation Mapping from Optical Data Using a Reflectance Differencing Technique. J. Flood Risk Manag. 2014, 7, 239–250.
  42. Li, W.; Du, Z.; Ling, F.; Zhou, D.; Wang, H.; Gui, Y.; Sun, B.; Zhang, X. A Comparison of Land Surface Water Mapping Using the Normalized Difference Water Index from TM, ETM+ and ALI. Remote Sens. 2013, 5, 5530–5549.
  43. Atefi, M.R.; Miura, H. Detection of Flash Flood Inundated Areas Using Relative Difference in NDVI from Sentinel-2 Images: A Case Study of the August 2020 Event in Charikar, Afghanistan. Remote Sens. 2022, 14, 3647.
  44. Tan, J.; Tang, Y.; Liu, B.; Zhao, G.; Mu, Y.; Sun, M.; Wang, B. A Self-Adaptive Thresholding Approach for Automatic Water Extraction Using Sentinel-1 SAR Imagery Based on OTSU Algorithm and Distance Block. Remote Sens. 2023, 15, 2690.
  45. Yang, X.; Zhao, S.; Qin, X.; Zhao, N.; Liang, L. Mapping of Urban Surface Water Bodies from Sentinel-2 MSI Imagery at 10 m Resolution via NDWI-Based Image Sharpening. Remote Sens. 2017, 9, 596.
  46. Guo, J.; Wang, X.; Liu, B.; Liu, K.; Zhang, Y.; Wang, C. Remote-Sensing Extraction of Small Water Bodies on the Loess Plateau. Water 2023, 15, 866.
  47. Cao, H.; Zhang, H.; Wang, C.; Zhang, B. Operational Flood Detection Using Sentinel-1 SAR Data over Large Areas. Water 2019, 11, 786.
  48. Chen, F.; Chen, X.; Van de Voorde, T.; Roberts, D.; Jiang, H.; Xu, W. Open Water Detection in Urban Environments Using High Spatial Resolution Remote Sensing Imagery. Remote Sens. Environ. 2020, 242, 111706.
  49. Li, Q.; Lu, W.; Yang, J. A Hybrid Thresholding Algorithm for Cloud Detection on Ground-Based Color Images. J. Atmos. Ocean. Technol. 2011, 28, 1286–1296.
  50. Lang, F.; Zhu, Y.; Zhao, J.; Hu, X.; Shi, H.; Zheng, N.; Zha, J. Flood Mapping of Synthetic Aperture Radar (SAR) Imagery Based on Semi-Automatic Thresholding and Change Detection. Remote Sens. 2024, 16, 2763.
Figure 1. Flood extent mapping across different geographic regions: Brazil, USA, and Australia. (a) True-color PlanetScope imagery displaying post-flood conditions in the three locations, showing visible floodwaters inundating urban and rural areas. The images highlight the extent of water coverage in each region, with brown-colored floodwaters distinguishing flooded areas from non-flooded terrain. (b) Water classification references classify areas into water (blue) and non-water (gray) regions. Below each map, the percentage of detected water and non-water areas is displayed, indicating the proportion of flooded land per region.
Figure 2. Parameter tuning across different local methods. Each row represents a different region, (a) Australia, (b) USA, and (c) Brazil, and each column corresponds to a different method. Each subplot includes a red, vertical, dashed line indicating the best parameter value.
Figure 3. Distribution of (a) MCC and (b) FMI values across different methods and three study areas, ordered from highest (left) to lowest mean MCC/FMI (right). Each box represents the interquartile range, with the median shown as a horizontal line inside the box. The whiskers extend to show variability. To assess statistical differences, a pairwise t-test was conducted between the highest mean MCC/FMI band and all the other bands. p-values are marked using symbols: * (p < 0.05) indicates a significant difference, while ** (p > 0.05) suggests no significant difference. The background colors distinguish local methods (yellow panels) from global methods (gray panels), though one global approach (e.g., Niblack_global) derives its threshold from the average of local Niblack thresholds—making it a custom hybrid between traditional global and purely local strategies.
Figure 3. Distribution of (a) MCC and (b) FMI values across different methods and three study areas, ordered from highest (left) to lowest mean MCC/FMI (right). Each box represents the interquartile range, with the median shown as a horizontal line inside the box. The whiskers extend to show variability. To assess statistical differences, a pairwise t-test was conducted between the highest mean MCC/FMI band and all the other bands. p-values are marked using symbols: * (p < 0.05) indicates a significant difference, while ** (p > 0.05) suggests no significant difference. The background colors distinguish local methods (yellow panels) from global methods (gray panels), though one global approach (e.g., Niblack_global) derives its threshold from the average of local Niblack thresholds—making it a custom hybrid between traditional global and purely local strategies.
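The hybrid idea mentioned in the Figure 3 caption, a single global threshold taken as the arithmetic mean of per-pixel local thresholds, can be sketched as follows. This is a minimal pure-Python illustration and not the authors' implementation: the function names, the 3 × 3 window, k = 0.2, the form T = mean − k·std (one common Niblack convention), the truncated windows at image borders, and the rule that values above the threshold are water are all assumptions made for the example.

```python
from statistics import fmean, pstdev


def niblack_local_thresholds(image, window=3, k=0.2):
    """Per-pixel thresholds T = mean - k * std over a square window.

    `image` is a list of equal-length rows of floats (e.g., an index raster).
    Windows are truncated at the borders (an assumption of this sketch).
    """
    rows, cols = len(image), len(image[0])
    half = window // 2
    thresholds = []
    for r in range(rows):
        row_thresholds = []
        for c in range(cols):
            patch = [
                image[i][j]
                for i in range(max(0, r - half), min(rows, r + half + 1))
                for j in range(max(0, c - half), min(cols, c + half + 1))
            ]
            row_thresholds.append(fmean(patch) - k * pstdev(patch))
        thresholds.append(row_thresholds)
    return thresholds


def hybrid_global_threshold(image, window=3, k=0.2):
    """Simple arithmetic mean of all local thresholds, used as one global threshold."""
    local = niblack_local_thresholds(image, window, k)
    return fmean([t for row in local for t in row])


def classify(image, threshold):
    """Binary map: 1 where the value exceeds the threshold (assumed water), else 0."""
    return [[1 if value > threshold else 0 for value in row] for row in image]


# Hypothetical 4 x 4 index raster: low values on the left, high on the right.
raster = [[0.2, 0.2, 3.0, 3.0] for _ in range(4)]
t = hybrid_global_threshold(raster)
flood_map = classify(raster, t)
```

Because the averaged value is applied to the whole image, the final classification behaves like a global method, while the threshold itself still reflects local statistics.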
Figure 4. Distribution of (a) MCC and (b) FMI values across different spectral indices and three study areas, ordered from highest (dark color) to lowest (light color) mean MCC. Each box represents the interquartile range, with the median shown as a vertical line inside the box. The whiskers extend to show variability. To assess statistical differences, a pairwise t-test was conducted between the index with the highest mean MCC/FMI and each of the other indices. p-values are marked using symbols: * (p < 0.05) indicates a significant difference, while ** (p > 0.05) indicates no significant difference. The background colors differentiate the two formulations: gray for indices in normalized-difference form and green for indices in ratio form.
Figure 5. Distribution of (a) MCC and (b) FMI values across three study areas, ordered from highest (left) to lowest (right) mean MCC. Each box represents the interquartile range, with the median shown as a horizontal line inside the box. The whiskers extend to show variability. To assess statistical differences, a pairwise t-test was conducted between the study area with the highest mean MCC/FMI and each of the other study areas. p-values are marked using symbols: * (p < 0.05) indicates a significant difference, while ** (p > 0.05) indicates no significant difference.
Figure 6. Heatmap comparison of spectral index classification performance across different thresholding methods in (a) Brazil, (b) USA, and (c) Australia. The color scale represents performance values, with red indicating higher MCC values and blue indicating lower MCC values.
Figure 7. An example of the flood detection results using the Otsu method, applied to the Red_ratio_NIR index in Brazil (MCC = 0.97). Panel (a) shows the histogram of Red_ratio_NIR values with the Otsu threshold (red, dashed line). Panel (b) shows the binary flood map generated from the threshold. Panel (c) is a confusion map comparing the predicted flood extent with the reference data. Panel (d) shows ground-truth satellite imagery for visual comparison.
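The workflow shown in Figure 7 (ratio index, global threshold, binary map, confusion map) can be sketched roughly as below. This is only an illustration under stated assumptions: the function names are hypothetical, pixels are handled as flat lists, the threshold search is an unbinned Otsu variant that tests midpoints between sorted values rather than a fixed-bin histogram, a small epsilon guards against zero NIR, and pixels above the threshold are assumed to be water.

```python
def ratio_index(red, nir, eps=1e-12):
    """Ratio-form index Red / NIR per pixel (eps avoids division by zero)."""
    return [r / max(n, eps) for r, n in zip(red, nir)]


def otsu_threshold(values):
    """Threshold maximizing the between-class variance w0 * w1 * (mu0 - mu1)^2.

    Candidates are midpoints between consecutive distinct sorted values.
    If all values are identical, the single value itself is returned.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    best_t, best_var = xs[0], -1.0
    cumulative = 0.0
    for i in range(1, n):
        cumulative += xs[i - 1]          # sum of the i smallest values
        if xs[i] == xs[i - 1]:
            continue                      # no valid split between equal values
        w0 = i / n
        w1 = 1.0 - w0
        mu0 = cumulative / i
        mu1 = (total - cumulative) / (n - i)
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var = between
            best_t = (xs[i - 1] + xs[i]) / 2
    return best_t


def confusion_labels(predicted, reference):
    """Per-pixel TP / FP / FN / TN labels, as in a confusion map."""
    names = {(1, 1): "TP", (1, 0): "FP", (0, 1): "FN", (0, 0): "TN"}
    return [names[(p, r)] for p, r in zip(predicted, reference)]


# Hypothetical reflectances: the first three pixels are water-like (low NIR).
red = [0.08, 0.09, 0.10, 0.05, 0.06, 0.04]
nir = [0.04, 0.05, 0.05, 0.30, 0.35, 0.25]
index = ratio_index(red, nir)
threshold = otsu_threshold(index)
predicted = [1 if v > threshold else 0 for v in index]
labels = confusion_labels(predicted, [1, 1, 0, 0, 0, 1])
```

The counts of each label in `labels` are the TP, FP, FN, and TN values that feed the validation metrics.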
Figure 8. An example of the flood detection results using the Yen method, applied to the RedEdge_ratio_NIR index in the USA (MCC = 0.82). Panel (a) shows the histogram of RedEdge_ratio_NIR values with the Yen threshold (red, dashed line). Panel (b) shows the binary flood map generated from the threshold. Panel (c) is a confusion map comparing the predicted flood extent with the reference data. Panel (d) shows ground-truth satellite imagery for visual comparison.
Figure 9. An example of the flood detection results using the Isodata method, applied to the NDRE index in Australia (MCC = 0.94). Panel (a) shows the histogram of NDRE values with the Isodata threshold (red, dashed line). Panel (b) shows the binary flood map generated from the threshold. Panel (c) is a confusion map comparing the predicted flood extent with the reference data. Panel (d) shows ground-truth satellite imagery for visual comparison.
Table 1. Statistical metrics used in this study.
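| Metric | Formula | Range | Optimal Value |
| --- | --- | --- | --- |
| FMI | $\mathrm{Precision} = \frac{TP}{TP + FP}$; $\mathrm{Recall} = \frac{TP}{TP + FN}$; $\mathrm{FMI} = \sqrt{\mathrm{Precision} \times \mathrm{Recall}}$ | 0.0 to 1.0 | 1.0 |
| MCC | $N = TN + TP + FN + FP$; $S = \frac{TP + FN}{N}$; $P = \frac{TP + FP}{N}$; $\mathrm{MCC} = \frac{\frac{TP}{N} - PS}{\sqrt{PS(1 - S)(1 - P)}}$ | −1.0 to 1.0 | 1.0 |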
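As a worked illustration of the Table 1 metrics, the sketch below computes FMI and MCC from confusion-matrix counts using the S and P formulation of the table. The function names are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here, not something stated in the source.

```python
from math import sqrt


def fmi(tp, fp, fn):
    """Fowlkes-Mallows index: sqrt(precision * recall)."""
    if tp == 0:
        return 0.0  # assumed convention when no true positives exist
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return sqrt(precision * recall)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient in the S/P form of Table 1."""
    n = tn + tp + fn + fp
    if n == 0:
        raise ValueError("confusion matrix is empty")
    s = (tp + fn) / n          # fraction of reference-positive pixels
    p = (tp + fp) / n          # fraction of predicted-positive pixels
    denominator = sqrt(p * s * (1 - s) * (1 - p))
    if denominator == 0:
        return 0.0  # assumed convention for degenerate cases
    return (tp / n - s * p) / denominator


# Hypothetical counts: 40 TP, 50 TN, 5 FP, 5 FN out of 100 pixels.
example_mcc = mcc(tp=40, tn=50, fp=5, fn=5)   # about 0.798
example_fmi = fmi(tp=40, fp=5, fn=5)          # about 0.889
```

For these counts, S = P = 0.45, so MCC = (0.40 − 0.2025) / (0.45 × 0.55) ≈ 0.798, which agrees with the more familiar product form (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).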

Van, L.N.; Nguyen, G.V.; Kim, Y.; Do, M.T.T.; Kwon, S.; Lee, J.; Lee, G. Rapid Urban Flood Detection Using PlanetScope Imagery and Thresholding Methods. Water 2025, 17, 1005. https://doi.org/10.3390/w17071005
