1. Introduction
Water is essential to all ecosystems on Earth. Surface water on Earth consists mainly of oceans, lakes and rivers [1]. The extent of lakes accounts for nearly
of the surface [2], and lakes perform irreplaceable functions: they supply water [3], control flooding [4], sustain species [5] and provide ecosystem services to nations and regions [6], owing to the unique role of water in climate [7], biological diversity [8] and human wellbeing [9]. Meanwhile, natural phenomena and human activities alter water occurrence, especially the water dynamics of inland freshwater lakes [10]. Timely monitoring of freshwater lake surfaces is therefore indispensable for sustainable development [11] and for understanding regional and global ecosystem dynamics [12].
Remote sensing, the science and art of detecting objects from a distance, has been the most common approach to monitoring and analyzing land features for several decades [13]. In imagery, land features are typically represented as mixed classes of different vegetation cover and surface types. Satellite-based sensors differ in temporal and spatial resolution, which correspond to the revisit time and the ground area represented by a pixel, respectively. Medium resolution imagery (with a revisit time of approximately 10 days and pixels ranging from 10 to 30 m) is the most widely used for lake water surface detection, because it is openly accessible, unlike higher resolution imagery that is costly to acquire [14], and because it is less prone to the mixed-pixel problem of coarse resolution imagery [15]. Aside from temporal and spatial resolution, sensors are either passive or active. Passive sensors, known as optical systems, have been employed since the 1970s, when the first satellite sensor, the Landsat multispectral scanner (MSS), was launched into space [16]. However, the lack of vertical information, the overlapping canopy of wetland vegetation, and haze and cloud cover have largely limited the accuracy of optical results [17]. Active sensors, particularly radar systems, have therefore also contributed to remote sensing of water-dominated systems such as lakes. Radar backscatter is sensitive to the moisture content and roughness of the landscape, and the wavelength of the C-band Sentinel-1 sensor enables penetration of both clouds and thick canopies, which helps with complicated weather and flora conditions [18].
Nevertheless, processing Sentinel-1 radar data, which involves data acquisition, calibration, speckle filtering, geometric and terrain correction, classification and validation [19], is extremely time consuming on traditional image processing platforms, even those with built-in toolboxes such as the ENVI and ERDAS software packages [20]. This cost can limit the timeliness and efficiency of research. Backed by high-performance computing and network systems, Google Earth Engine (GEE) allows users with a Google account to process and analyze radar imagery online by writing lightweight scripts, which speeds up processing on a cloud-based platform [21]. The extensive data catalogs and processing algorithms provided by GEE can substantially lower the barriers posed by traditional platforms. Radar-based water detection algorithms fall into several categories: thresholding, classification and object-based image analysis. Thresholding is commonly adopted to discriminate water from nonwater surfaces in the logarithmic representation of radar imagery, where water and nonwater features appear as two Gaussian distributions in the histogram of the backscatter coefficient in dB scale. Although it is limited by double-bounce scattering, because water beneath vegetation layers may cause extra radar backscatter [22], thresholding remains an efficient and simple method for water extraction in rural areas during the winter season, when vegetation coverage is less complicated.
One classical way to select the threshold is to manually pick the lowest point of the valley between the two peaks of the distributions, based on visual inspection by the researcher. The main issue with this method is the bias introduced by each individual observer. This bias can be offset by selecting the lowest point in the valley programmatically, which takes time linear in the number of histogram bins. However, the intensity histogram of a radar image does not necessarily have a sharp valley; it usually has a flat region between the peaks. Picking the lowest valley point is then less reliable, because the value at the selected point may differ only slightly from the values at its neighboring points. Furthermore, because of the noise in radar detection, strict convexity is not guaranteed in the valley region between the two peaks. In other words, there may be multiple local maxima and minima close to each other, so the lowest-valley method is strongly affected by noise.
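The programmatic valley search described above could be sketched roughly as follows. This is a minimal illustration, not the procedure of any cited study: the function name `valley_threshold`, the `min_separation_db` guard and the synthetic dB values are all assumptions introduced here.

```python
import numpy as np


def valley_threshold(db_values, bins=64, min_separation_db=4.0):
    """Return the center of the lowest histogram bin between the two main peaks.

    min_separation_db is a hypothetical guard: the second peak must lie at
    least this far (in dB) from the first, so it is not picked on its shoulder.
    """
    counts, edges = np.histogram(db_values, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    bin_width = edges[1] - edges[0]
    sep = int(np.ceil(min_separation_db / bin_width))

    # The highest bin is taken as the first peak.
    first = int(np.argmax(counts))

    # Mask the neighborhood of the first peak; the highest remaining bin
    # is taken as the second peak.
    masked = counts.copy()
    masked[max(0, first - sep):first + sep + 1] = -1
    second = int(np.argmax(masked))

    # One linear scan for the lowest bin between the two peaks.
    left, right = sorted((first, second))
    valley = left + int(np.argmin(counts[left:right + 1]))
    return float(centers[valley])


# Synthetic stand-in for backscatter in dB (assumed values, not real data).
rng = np.random.default_rng(0)
db = np.concatenate([
    rng.normal(-20.0, 1.5, 5000),  # "water" pixels
    rng.normal(-8.0, 2.0, 5000),   # "nonwater" pixels
])
print(valley_threshold(db, bins=64))
print(valley_threshold(db, bins=128))  # a different binning may move the result
```

Because the region between the two peaks is nearly flat, the selected bin generally depends on the binning and on the noise realization, which is the weakness discussed above.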
The Gaussian Mixture Model is another conventional method for binary classification based on distributions. In the radar intensity (dB) histogram, water and nonwater objects appear approximately as two Gaussian distributions, each with its own mean and standard deviation [23]. One distribution is the conditional probability of the dB value for water pixels, and the other is the conditional probability of the dB value for nonwater pixels. The objective of this model is to maximize the conditional probability of the predicted class given any dB value (x). According to Bayes' theorem, this is equivalent to maximizing the product of the conditional probability of x given the class and the marginal probability of the class.
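Written out, the decision rule described above might look as follows. The symbols are assumptions introduced here for illustration, since the original notation is not reproduced in this text: c denotes the class (w for water, n for nonwater), x the dB value, and \mu_c and \sigma_c the mean and standard deviation of class c.

```latex
\[
\begin{aligned}
\hat{c}(x) &= \arg\max_{c \in \{w,\, n\}} P(c \mid x)
            = \arg\max_{c \in \{w,\, n\}} \frac{p(x \mid c)\, P(c)}{p(x)}
            = \arg\max_{c \in \{w,\, n\}} p(x \mid c)\, P(c), \\
p(x \mid c) &= \frac{1}{\sqrt{2\pi}\,\sigma_c}
               \exp\!\left( -\frac{(x - \mu_c)^2}{2\sigma_c^2} \right).
\end{aligned}
\]
```

The denominator p(x) does not depend on c, which is why it can be dropped from the maximization.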
However, this formulation rests on the assumption that the dB values of water and land each follow a Gaussian distribution, and such an assumption cannot be taken as correct in all cases. Moreover, the distribution parameters (the two means and the two standard deviations) are unknown, so the researcher also needs to find estimators for these four parameters from the density diagram. These unknown parameters can be estimated with iterative methods such as Expectation Maximization [24]; however, this approach has two drawbacks. First, the iteration process takes a long time to reach a satisfactory accuracy. Second, it may become trapped in a local optimum and thus never reach the global optimal solution [25].
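To make the iterative nature of this estimation concrete, the following is a minimal sketch of Expectation Maximization for a two-component, one-dimensional Gaussian mixture. It is written for illustration only: the function name `fit_two_gaussians`, the percentile-based initialization, the stopping tolerance and the synthetic data are assumptions, not details taken from the cited works.

```python
import numpy as np


def fit_two_gaussians(x, iters=200, tol=1e-6):
    """Estimate means, standard deviations and weights of a 2-component mixture."""
    x = np.asarray(x, dtype=float)

    # Assumed initialization: quartiles as means, overall spread as sigmas.
    mu = np.array([np.percentile(x, 25), np.percentile(x, 75)])
    sigma = np.array([x.std(), x.std()])
    weight = np.array([0.5, 0.5])

    prev_loglik = -np.inf
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        dens = (weight
                * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2)
                / (sigma * np.sqrt(2.0 * np.pi)))
        total = dens.sum(axis=1, keepdims=True)
        resp = dens / total

        # Log-likelihood under the parameters used in this E-step.
        loglik = float(np.log(total).sum())

        # M-step: re-estimate the parameters from the responsibilities.
        nk = resp.sum(axis=0)
        weight = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

        # Stop once the log-likelihood no longer improves noticeably.
        if loglik - prev_loglik < tol:
            break
        prev_loglik = loglik

    return mu, sigma, weight


# Synthetic stand-in for backscatter in dB (assumed values, not real data).
rng = np.random.default_rng(0)
db = np.concatenate([
    rng.normal(-20.0, 1.5, 5000),  # "water" pixels
    rng.normal(-8.0, 2.0, 5000),   # "nonwater" pixels
])
mu, sigma, weight = fit_two_gaussians(db)
print(mu, sigma, weight)
```

On well-separated synthetic data such as this, the loop settles quickly; with overlapping classes or a poor starting point, the result depends on the initialization, which reflects the local-optimum concern raised above.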
Instead, we propose to use the Otsu Method to solve this thresholding problem. The Otsu Method is an unsupervised method that was originally designed to select a threshold separating an object from its background using the gray-level histogram of the image [25]. In practice, the Otsu Method can be extended to density histograms or distributions other than gray-level image histograms, and it can also be applied to multi-thresholding problems. The Otsu Method is better suited to this problem than some conventional solutions because it automatically selects a threshold between two mixed distributions from the density histogram [25]. In addition, the Otsu Method requires neither prior knowledge of nor assumptions about the distribution of objects [25]. Furthermore, the Otsu Method is equivalent to the K-Means Method, but the Otsu Method provides the global optimal solution, whereas the K-Means Method may be limited to a local optimum [26]. Although it is computationally heavy because of iterative searching [26], GEE can speed up the Otsu Method with its cloud computing platform. For instance, the Otsu Method has been applied to cloud-free Landsat TM images for urban land cover detection, differentiating urban from nonurban land in the Haidian District of Beijing, China [27]. That study reported a higher accuracy for the Otsu Method than for the conventional postclassification change detection method [27]. Another study applied the Otsu Method to SAR data to detect oil spills on the sea surface by finding a threshold in the radar data that delineates the edge of the oil film floating on the sea [28]. It examined the Penglai oil field and the Gulf of Dalian and reported the error rate of the Otsu Method at each site [28]. Although the Otsu Method has been widely applied to thresholding problems, it has seldom been used for surface water extraction. Furthermore, most previous studies do not provide algorithms or detailed scripts for implementing the Otsu Method. We were therefore interested in applying the Otsu Method to surface water detection and in providing reusable code for future implementations.
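As a rough offline illustration of the idea, the sketch below selects an Otsu threshold from a histogram of dB values by maximizing the between-class variance over all candidate splits. It uses plain NumPy rather than the Google Earth Engine API, and it is not the script developed in this work: the function name `otsu_threshold`, the bin count and the synthetic data are assumptions made for the example.

```python
import numpy as np


def otsu_threshold(values, bins=256):
    """Return the histogram split that maximizes the between-class variance."""
    counts, edges = np.histogram(values, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    total = counts.sum()

    # Cumulative class weight and cumulative mean for a split after each bin.
    w0 = np.cumsum(counts) / total
    w1 = 1.0 - w0
    cum_mean = np.cumsum(counts * centers) / total
    global_mean = cum_mean[-1]

    # Between-class variance; splits that leave one class empty are skipped.
    valid = (w0 > 0) & (w1 > 0)
    between = np.zeros(bins)
    between[valid] = ((global_mean * w0[valid] - cum_mean[valid]) ** 2
                      / (w0[valid] * w1[valid]))

    # The threshold is the upper edge of the best splitting bin.
    best = int(np.argmax(between))
    return float(edges[best + 1])


# Synthetic stand-in for backscatter in dB (assumed values, not real data).
rng = np.random.default_rng(0)
db = np.concatenate([
    rng.normal(-20.0, 1.5, 5000),  # "water" pixels
    rng.normal(-8.0, 2.0, 5000),   # "nonwater" pixels
])
threshold = otsu_threshold(db)
water_mask = db < threshold  # low backscatter is labeled as water
print(threshold, water_mask.mean())
```

Every candidate split is evaluated, so the search cannot stop at a local optimum, and no Gaussian assumption or initial guess is needed; the cost is one pass over the histogram bins.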
Therefore, the objectives of the present work are to:
Implement the Otsu Method and write reusable scripts that automatically select thresholds for surface water extraction from Sentinel-1 data on Google Earth Engine;
Analyze the advantages and disadvantages of an unsupervised classifier from both mathematical and application perspectives;
Contribute to the knowledge base of hydrological variation at Poyang Lake by mapping the surface water extent of the lake in January 2017, 2018, 2019 and 2020.