Fishpond Mapping by Spectral and Spatial-Based Filtering on Google Earth Engine: A Case Study in Singra Upazila of Bangladesh

: Inland aquaculture in Bangladesh has been growing fast in the last decade. The underlying land use / land cover (LULC) change is an important indicator of socioeconomic and food structure change in Bangladesh, and ﬁshpond mapping is essential to understand such LULC change. Previous research often used water indexes (WI), such as Normalized Di ﬀ erence Water Index (NDWI) and Modiﬁed Normalized Di ﬀ erence Water Index (MNDWI), to enhance water bodies and use shape-based metrics to assist classiﬁcation of individual water features, such as coastal aquaculture ponds. However, inland ﬁshponds in Bangladesh are generally extremely small, and little research has investigated mapping of such small water objects without high-resolution images. Thus, this research aimed to bridge the knowledge gap by developing and evaluating an automatic ﬁshpond mapping workﬂow with Sentinel-2 images that is implemented on Google Earth Engine (GEE) platform. The workﬂow mainly includes two steps: (1) the spectral ﬁltering phase that uses a pixel selection technique and an image segmentation method to automatically identify all-year-inundated water bodies and (2) spatial ﬁltering phase to further classify all-year-inundated water bodies into ﬁshponds and non-ﬁshponds using object-based features (OBF). To evaluate the performance of the workﬂow, we conducted a case study in the Singra Upazila of Bangladesh, and our method can e ﬃ ciently map inland ﬁshponds with a precision score of 0.788. Our results also show that the pixel selection technique is essential in identifying inland ﬁshponds that are generally small. As the workﬂow is implemented on GEE, it can be conveniently applied to other regions. ﬁshponds and non-ﬁshpond types, and the o ﬀ -diagonal subﬁgures show the scattered points on the 2D space of the corresponding OBF features. From the diagonal subﬁgures, we can see that IPQ, CONV, and SqP provide good class separability because the peaks of distributions are well separated, while the distributions of SOLI and PFD have a lot of overlap. From the most upper-right subplot, we can also see that SqP and IPQ show a near quadratic relationship as all points align on a simple curve. This can be proved using the formula of the two features. From all subﬁgures, the most likely feature pairs that may provide linear separability is the pair of PFD and SqP because the positive samples and negative samples roughly align along the opposite sides of the y = x line visually. Other pairs of features do not show clear separability between the two classes.


Introduction
Fishery in Bangladesh has been increasing rapidly in the last few decades as a major source of food and economic growth [1]. According to the International Food Policy Research Institute (IFPRI), the fish farming market has grown 25 times in all aspects of the aquaculture industry in the last three decades. Shahin et al. [2] also reported that the total fish production increased from 4.99 Lac MT (100,000 metric tons) in 1998-1999 to 14.47 Lac MT in 2012-2013. Though rice is still the major food source for Bangladesh, the booming aquaculture is bringing diversity to the dietary structure of people in Bangladesh and gradually improving people's health conditions [3]. However, the growing aquaculture puts pressure on already limited croplands. In recent years, a great portion of croplands in Bangladesh has been gradually transforming to other land use types, such as fishponds, brickyards, and residential area [1,4]. While Bangladesh Statistical Bureau (BSB) publishes statistical yearbooks every year, it lacks information of the spatial distribution of the land use changes. Such information can better help decision-makers make land use policies for better resource distributions. With earth observation (EO) data, especially newly published Sentinel-2 Multispectral Instrument (MSI) 10 m resolution images, mapping and monitoring individual fishponds become feasible. In addition, the advent of Google Earth Engine (GEE) significantly reduced the workload and time of remote sensing data preprocessing, analysis, and visualization [5]. Thus, it is important to investigate the potential of using Sentinel-2 images and GEE platform for fast, timely mapping of inland fishponds in Bangladesh.
There are many research works focus on mapping aquaculture ponds in coastal area of South Asia [6][7][8]. However, inland fishponds differ from coastal aquaculture ponds in that inland fishponds are typically owned by individual families, which means they can have arbitrary shape and size and are not necessarily well-aligned as many coastal aquaculture ponds do. Some key features of fishponds in Bangladesh are that: (1) they are usually filled with water all year round, (2) they are small, and (3) like many other man-made objects, they have regular boundaries and simple shapes, such as rectangles.
To address feature (1), multi-temporal and multi-spectral remote sensing images should be used. High-resolution images are the most suitable data to address feature (2). Specifically, based on the research conducted by Belton and Azad [9], the average size of homestead fishponds in Bangladesh is between 0.08 to 0.1 ha, the median value can be even less due to the skewness towards a few large fishponds, which can go up to over 100 ha each. Fishponds with such small size are challenging to detect on medium-resolution (2-30 m) remote sensing images, and it is almost not feasible to do with low-resolution (>30 m) images. However, high-resolution images, such as SPOT (Satellite Pour l'Observation de la Terre) and IKONOS, are usually not available free of charge. Sentinel-2 MSI L1C data has become increasingly popular for land use and land cover (LULC) mapping in recent years mainly because of its finer spatial resolution (10 m) and temporal resolution (10 days before Sentinel-2B launches and 5 days after) [10][11][12]. The significant improvement of both spatial resolution and temporal resolution offered great potentials of improving existing applications and enabling new missions, such as object detections [11]. Therefore, this research uses Sentinel-2 MSI L1C data for fishpond mapping.
Water indexes (WI), such as Normalized Difference Water Index (NDWI) [13], Modified Normalized Difference Water Index (MNDWI) [14], and Automated Water Extraction Index (AWEI) [15], have been developed to enhance water features on multi-spectral images. Most of the WIs utilize low reflectance of water in near-infrared (NIR) and shortwave-infrared (SWIR) spectrum [12,13,16,17]. NDWI takes the difference between the green band and the NIR band, which produce positive values for water and negative values for other LULC types [13]. To address false positives from built-up using NDWI, Xu introduced MNDWI, which is calculated with green and SWIR bands [14]. Previous research reported that MNDWI generally has a more stable threshold than NDWI [16,18,19]. Aside from the NDWI and MNDWI that use 2 bands to compute, AWEI uses 5 bands to compute, and it aims to reduce false positives coming from shadow pixels [15]. The AWEI consists of two formulas, AWEI sh for areas that are contaminated by shadows, and AWEI nsh for areas that are not [20]. Feyisa et al. reported that AWEI has much more stable optimal thresholds than MNDWI [15].
To address feature (3), shape-based metrics can be used to help characterize fishponds and non-fishponds. Object-based features (OBF), also referred to as geometrical features in Reference [21] and as shape metric in Reference [22], have been used in previous research as ancillary features in object-based image analysis. In a research conducted by van der Werff and van der Meer [23], shape measures were extracted from Landsat image objects and were used to help classify spectrally identical objects, e.g., rivers and different shapes of lakes. Jiao et al. [22] used 10 shape metrics to classify 8 LULC classes, including rivers and ponds, on SPOT-5 images. Their results showed that such metrics can well characterize all LULC classes quantitatively. For water features specifically, Remote Sens. 2020, 12, 2692 3 of 20 they characterized rivers as elongated, concave, and complex, while ponds were round, rectangular, convex, and simple [22]. Both research works reported that OBFs significantly improved classification accuracy especially when objects are spectrally similar. However, previous research typically works with high-resolution images or objects that are large compared to the pixel sizes, so it is unclear how such OBFs will help with classifying small objects on relatively coarse resolution images.
Therefore, to investigate the performance of using WI and OBFs for inland fishpond mapping, this research aimed to develop a GEE-based workflow that incorporates spectral-based filtering with multiple WIs and spatial-based filtering with OBFs for fishpond mapping. To our knowledge, this is the first study that presents fully automated workflow for inland fishpond mapping. A case study in the Singra Upazila of Bangladesh was conducted to evaluate the performance of the workflow.

Study Area and Dataset
The study area of the case study we chose is the Singra Upazila (24 • 30 N 89 • 08 E) in Bangladesh, as shown in Figure 1. It is a sub-district of Natore district in Northern Bangladesh that consists of 13 unions, namely Chamari, Chaugram, Chhatar Dighi, Dahia, Hatiandaha, Italy, Kalam, Lalore, Ramananda Khajura, Sherkole, Singra Paurashava, Sukash, and Tajpur. More than three hundred thousand people live in around 530 sq. km areas with a density of 607 persons per sq. km. Around 80% of people in this area are engaged with agriculture, more specifically rice crop farming. Since this area is located within one of the largest flood plains of the country, most of the agricultural fields are flooded in the rainy season every year. A recent trend of land use change from crop fields to fishponds has been found in this area because of the larger profit of fish culturing than growing rice.
Remote Sens. 2020, 12, x 3 of 20 with high-resolution images or objects that are large compared to the pixel sizes, so it is unclear how such OBFs will help with classifying small objects on relatively coarse resolution images. Therefore, to investigate the performance of using WI and OBFs for inland fishpond mapping, this research aimed to develop a GEE-based workflow that incorporates spectral-based filtering with multiple WIs and spatial-based filtering with OBFs for fishpond mapping. To our knowledge, this is the first study that presents fully automated workflow for inland fishpond mapping. A case study in the Singra Upazila of Bangladesh was conducted to evaluate the performance of the workflow.

Study Area and Dataset
The study area of the case study we chose is the Singra Upazila (24°30′N 89°08′E) in Bangladesh, as shown in Figure 1. It is a sub-district of Natore district in Northern Bangladesh that consists of 13 unions, namely Chamari, Chaugram, Chhatar Dighi, Dahia, Hatiandaha, Italy, Kalam, Lalore, Ramananda Khajura, Sherkole, Singra Paurashava, Sukash, and Tajpur. More than three hundred thousand people live in around 530 sq. km areas with a density of 607 persons per sq. km. Around 80% of people in this area are engaged with agriculture, more specifically rice crop farming. Since this area is located within one of the largest flood plains of the country, most of the agricultural fields are flooded in the rainy season every year. A recent trend of land use change from crop fields to fishponds has been found in this area because of the larger profit of fish culturing than growing rice.  In addition to the Singra upazila, where fishponds are located at, we also chose a subarea in Tibetan Plateau (upper-left corner: 84 • 52 N 31 • 49 E; lower-right corner: 87 • 49 N 29 • 55 E) to select non-fishpond water features to train a classifier for fishpond classification. Tibetan Plateau has an average elevation of over 4000 m, and the plateau is rich of water resources with over 1500 lakes scattered on the plateau [24]. There are three reasons why we selected a region on Tibetan Plateau for non-fishpond sample selection. First, Bangladesh does not have many permanent water bodies for classification. Second, Tibetan Plateau has over 1200 lakes larger than 1 sq. km, as well as multiple river streams [25]. Third, Tibetan Plateau has over 4000 m average altitude, which significantly limited human activities, such as fishery. Therefore, there are rarely any artificial water features in the area.
The dataset used in this study is the Sentinel-2 MSI Level-1C product hosted on GEE. It was preprocessed by radiometric and geometric corrections. As a result, the Sentinel-2 Level-1C is a Top-of-Atmosphere (TOA) reflectance dataset that consists of 100 km by 100 km image tiles projected in UTM/WGS84. The Multi-spectral instrument (MSI) sensors onboard Sentinel-2 satellites collect images with 13 spectral bands in the visible/near-infrared (VNIR) and SWIR spectrums, and the spatial resolutions of the spectral bands vary from 10 m to 60 m. The bands that were used in this study are listed in Table 1. The Military Grid Reference System (MGRS) tile number for all images used in this study is 45RYH.

Methodology
The methodology we used can be divided into two parts, (1) spectral filtering and (2) spatial filtering. The spectral filtering phase conducts image segmentation on multi-temporal Sentinel-2 images to detect all-year-inundated water features. The spatial filtering phase further classifies water features into fishpond class and non-fishpond class based on OBFs. A diagram of the workflow is shown in Figure 2. Details are discussed in the following subsections.

Data Preprocessing
Multi-temporal images are first prepared for the detection of all-year-inundated water features rather than seasonal water features, such as rice paddy and flooding [12,26,27]. Specifically, all Sentinel-2 Level-1C images with less than 10% cloud cover in 2016 were collected. The images that satisfy the selection criteria are the four images in 2016 on January 16th, January 31st, April 30th, and October 17th. Here, we denote the image collection as {T i }, i = 1, 2, . . . , n, where n is the total number of images. Note that, due to extremely high cloud cover during monsoon seasons, no images are available between May and September.
Three WIs, i.e., NDWI, MNDWI, and AWEI nsh , were calculated for each selected image T i . We denote the three WI image collections as {NDWI i }, {MNDWI i }, {AWEI i }. AWEI sh was not used because most fishponds are in rural areas where the noise from shadow is minimal. For the convenience of notation, the rest of the paper use AWEI to refer to the AWEI nsh . Table 2 summarizes the aforementioned three WIs.

Data Preprocessing
Multi-temporal images are first prepared for the detection of all-year-inundated water features rather than seasonal water features, such as rice paddy and flooding [12,26,27]. Specifically, all Sentinel-2 Level-1C images with less than 10% cloud cover in 2016 were collected. The images that satisfy the selection criteria are the four images in 2016 on January 16th, January 31st, April 30th, and October 17th. Here, we denote the image collection as { }, = 1,2, … , , where n is the total number of images. Note that, due to extremely high cloud cover during monsoon seasons, no images are available between May and September.
Three WIs, i.e., NDWI, MNDWI, and AWEInsh, were calculated for each selected image . We denote the three WI image collections as { }, { }, { }. AWEIsh was not used because most fishponds are in rural areas where the noise from shadow is minimal. For the convenience of notation, the rest of the paper use AWEI to refer to the AWEInsh. Table 2 summarizes the aforementioned three WIs. Figure 3. shows the WI images for a selected area. As shown in the figure, fishponds show consistently high WI values, while rice paddy and floods show seasonal variations. Table 2. Equations of the three water indexes (WIs) used in this study. in the equations represents Top-of-Atmosphere (TOA) reflectance of corresponding Sentinel-2 bands, e.g., is the TOA value of Sentinel-2 near-infrared band.  Table 2. Equations of the three water indexes (WIs) used in this study. ρ in the equations represents Top-of-Atmosphere (TOA) reflectance of corresponding Sentinel-2 bands, e.g., ρ NIR is the TOA value of Sentinel-2 near-infrared band.

Spectral Filtering
The spectral filtering phase transforms input multi-temporal Sentinel-2 images to a single mask which mask out all pixels that are not inundated all-year-round. The spectral filtering phase include two steps, a pixel selection step and an image segmentation step, and details are discussed in the following two subsections. Remote Sens. 2020, 12, x 6 of 20 .

Pixel Selection Technique
Many threshold-based image segmentation methods favor bimodal histograms because thresholds can be easily found in the valley of two peaks without much uncertainty [28,29]. Therefore, it is reasonable to do a pre-masking to reduce the number of non-water pixels in threshold selection for fishponds that are typically small and scarcely scattered in a large area. To achieve that, we adapted and modified a technique that was used in previous research [28][29][30][31]. Specifically, a preliminary thresholding step was conducted using a forgiving threshold to loosely identify water features, and then buffers are generated around the water features to include some but not all non-water pixels. Detailed steps of this approach are described below: (a) An empirical threshold 0 was used for all multi-temporal MNDWI images, denoted as {MNDWI i }, to loosely classify water features. (b) All classified MNDWI images were combined by inserting logical operator AND between images to get a single layer mask, denoted as M, where pixel value 1 represents all-year-inundated water features. (c) The mask from (b) was then vectorized by connecting neighboring same-value pixels. (d) Buffer polygons were generated from each water feature. The buffer distance was set to 5 pixels. (e) The buffered polygons were then rasterized as a binary mask image for pixel selection.
In the above steps, step (a) and (b) were to identify all-year-inundated water features, step (c) and (d) aimed to generate a pixel selection mask that includes only a portion of all pixels, the masking operation is illustrated as follows: Masked WI image collections WI i were then used as input to the image segmentation step that is discussed in the following subsection.

Image Segmentation
The threshold selection of WIs is the key factor of accurately mapping water on multispectral images [11,19]. An empirical threshold 0 is chosen by default for many WIs [13,16,32]. However, due to mixed pixels of water and other land cover types, the optimal WI thresholds usually depend highly on the scene and locations [15]. Therefore, previous research uses dynamic thresholds to adapt to varying situations. Among many automatic thresholding algorithms, the Otsu method, which is an automatic grey-level image segmentation algorithm that iteratively finds the optimal threshold that maximizes inter-class variance [33], is widely used for water body mapping [10,12,30,[34][35][36]. In research conducted by Yin et al. [37], the Otsu method achieves the best performance among 8 other automatic thresholding methods, and its performance is on par with support vector machine (SVM) and optimal thresholds. Thus, the Otsu method is chosen as the segmentation algorithm in the workflow.
The Otsu method takes in masked WI image collections WI i and produces thresholds for all images WI i within each collection WI i . The thresholds were then used to segment original WI images WI i into binary classes, the segmented image collections are denoted as WI * i . Then, each segmented WI image collection, i.e., NDWI * i , MNDWI * i , AWEI * i is self-combined by inserting logical operator AND between pairs of images, which produces single-layer consensus results where pixel value 1 represents all-year-inundated water features. With all three single-layer results, the final all-year-inundated water feature classification result was generated by conducting majority vote among all three layers, pixels that get more than 1 vote will be labeled as 1, while the rest are labeled 0.

Spatial Filtering
The spatial filtering phase follows the spectral filtering phase, and it transforms the binary mask image from spectral filtering phase to a feature collection. The feature collection contains all polygons that are classified as fishponds. The spatial filtering phase first vectorizes the binary mask image from spectral filtering phase by connecting neighboring homogeneous pixels, which groups connecting pixels into objects. For each water feature vector object, several OBFs were computed using its perimeter, area, and the perimeter and area of its convex hull. A trained machine learning model is then used to classify all water features into fishponds and non-fishponds using OBFs as features.

Object-Based features
OBFs are representations of geometries which are usually used to describe spatial characteristics of geometries [38] and used as auxiliary features in object detection [39,40]. OBFs are effective in measuring the shape complexity of polygons [41,42]. In this research, we mainly used OBFs that can be calculated with perimeters and the area of the target object and its convex hull because the workflow is implemented on GEE, which has limited functionalities for object-based analysis. Table 3 summarizes the OBFs that we used in this research. Table 3. Object-based features (OBFs) that were used in this research (A and P: area and perimeter of the features; A c and P c : area and perimeter of the convex hull of the features).

OBF ID Full Name Formula
IPQ, also known as FORM (form factors), SI (shape index), is a widely used measurement of shape compactness [21][22][23]41,43]. The IPQ measures similarities between an object and the most compact shape, i.e., circles, and it is scale-invariant. The range of IPQ is 0 to 1 with 1 being full circles and 0 being infinitely complex shapes. SOLI (Solidity) measures the extent to which an object is convex or concave [22]. The range of solidity is between 0 and 1 with 1 being completely convex. PFD (patch fractal dimensions) or simply fractal dimension is another widely used measurement of shape complexity [22,41]. In the equation in Table 3, the perimeter of the geometry is divided by 4, which accounts for the raster bias in perimeters. PFD values are close to 1 when geometry shapes are simple, e.g., squares and rectangles. It approaches 2 when shapes become complex. CONV (convexity) is a measurement of the convexity of a geometry similar to SOLI. The value range for CONV is between 0 and 1 with 1 being convex shapes, and values less than 1 for objects with irregular boundaries [23]. Lastly, SqP (Square pixel metric) is a very similar metric with IPQ, and it measures the shape convexity of an object. SqP is 0 for a square and approaches 1 as the shape becomes more complex [44].

Ground Truth Sample Collection
To classify fishponds based on OBFs, we first manually digitized fishponds in randomly selected 7 of 13 unions that are within the Singra Upazila as ground truth data. Of the 7 unions, 3 randomly selected unions (Chhatar Dighi, Kalam, and Ramananda Khajura) were then used to derive positive samples for training purposes, and the remaining 4 unions (Chaugram, Italy, Sherkole, and Singra Paurashava) were used to evaluate the method. Note that positive samples were simply a subset of water feature objects generated from the spectral filtering process that intersects with ground truth fishpond polygons. The reason why manually digitized ground truth polygons were not used directly for training purpose is that manually digitized polygons have over-simplified boundaries, Remote Sens. 2020, 12, 2692 9 of 20 which cannot represent the true shapes of water feature polygons generated by segmenting remote sensing images.
In addition to positive samples, i.e., fishponds, negative samples of non-fishpond water features were collected using the Joint Research Centre (JRC) Yearly Water Classification dataset [45]. The JRC dataset provides 30-m resolution images of seasonal and permanent water features every year since 1984. Permanent water features from the region on Tibetan Plateau that was introduced in Section 2 were extracted as negative samples.
The JRC dataset is a 30-m resolution dataset, which is 9 times the resolution of Sentinel-2 MSI in terms of the pixel area. As a result, same-area objects on the JRC layer should have much simpler boundaries than on Sentinel-2 images. More specifically, same-shape objects on Landsat images should be 9 times as large as they are on Sentinel-2 images. According to Belton and Azad [9], fishponds in Bangladesh are typically within a range of 0.02 ha and 100 ha. Therefore, to compensate for the simplification of shapes caused by the difference of spatial resolutions between the JRC dataset and Sentinel-2 images, water features that are 9 times the size of fishponds, i.e., water features that are within 0.18 ha to 900 ha in the region were selected. The geographic area used to select training samples is shown in Figure 4. In total, 708 positive samples and 1086 negative samples were used in the training process.
selected unions (Chhatar Dighi, Kalam, and Ramananda Khajura) were then used to derive positive samples for training purposes, and the remaining 4 unions (Chaugram, Italy, Sherkole, and Singra Paurashava) were used to evaluate the method. Note that positive samples were simply a subset of water feature objects generated from the spectral filtering process that intersects with ground truth fishpond polygons. The reason why manually digitized ground truth polygons were not used directly for training purpose is that manually digitized polygons have over-simplified boundaries, which cannot represent the true shapes of water feature polygons generated by segmenting remote sensing images.
In addition to positive samples, i.e., fishponds, negative samples of non-fishpond water features were collected using the Joint Research Centre (JRC) Yearly Water Classification dataset [45]. The JRC dataset provides 30-m resolution images of seasonal and permanent water features every year since 1984. Permanent water features from the region on Tibetan Plateau that was introduced in Section 2 were extracted as negative samples.
The JRC dataset is a 30-m resolution dataset, which is 9 times the resolution of Sentinel-2 MSI in terms of the pixel area. As a result, same-area objects on the JRC layer should have much simpler boundaries than on Sentinel-2 images. More specifically, same-shape objects on Landsat images should be 9 times as large as they are on Sentinel-2 images. According to Belton and Azad [9], fishponds in Bangladesh are typically within a range of 0.02 ha and 100 ha. Therefore, to compensate for the simplification of shapes caused by the difference of spatial resolutions between the JRC dataset and Sentinel-2 images, water features that are 9 times the size of fishponds, i.e., water features that are within 0.18 ha to 900 ha in the region were selected. The geographic area used to select training samples is shown in Figure 4. In total, 708 positive samples and 1086 negative samples were used in the training process.

Fishpond Classification
The last step of the workflow is to classify water feature polygons based on their OBF values. We compared two widely used classification models, i.e., the Logistic Regression (LR) model and Decision Tree (DT) model that are easy to implement on GEE and easy to interpret. The model with better performance was then selected and implemented as the classifier in the workflow.
DT is a widely used classifier that recursively partitions feature space to form purer small subspaces [46]. DTs are easy to interpret as decision rules are no more than a few chained thresholds on features. Moreover, DT provides feature importance score that can help with feature selection. LR is a widely used binary classifier. It has a solid theoretical background, and it is very fast to run due to its simplicity [47]. It is also very simple to implement. A major reason of choosing LR and DT in this experiment is that it is trivial to transplant the fitted models to GEE because LR can be implemented as thresholding the weighted sum of all feature values, and DT can be implemented as a set of nested IF-ELSE statements.

Accuracy Assessment
As mentioned in the previous section, we manually digitized fishponds in our test site for 2016, which is the study year of this research, based on Google Earth historical satellite images. Among all digitized fishponds, a small portion was used for training purpose, and the rest are used as testing samples. As we only have positive samples for testing, the evaluation was then based on the 'hit' rate. Specifically, an identified fishpond is considered correctly identified if its centroid is within a ground truth polygon, hence a 'hit'. A partial confusion matrix can be constructed with (1) TP (true positives), which is the number of 'hits', (2) FP (false positive), which is the number of all classified fishponds minus the number of 'hits', and (3) FN (false negative), which is the number of all ground truth polygons minus the number of 'hits'. Precision, Recall, and F1 score can then be calculated. The equations to calculate the evaluation metrics were shown in Equations (1)-(3):

Fishpond Classification Results
An LR classifier and a DT classifier were built using training samples. The classifiers were built and validated using a 5-fold cross-validation scheme. Specifically, the training dataset was split into 5 parts, and, for each iteration, 4 of them were used as training set, and the last one was used as validation set. To avoid building an overcomplicated DT classifier such that it is inconvenient to transplant the model on GEE and also to reduce overfitting, we limited the max depth of the DT to 3. Moreover, to further reduce overfitting, the minimum samples in the leaf node is set to 50. Table 4 shows the cross-validation results for both LR and DT. From Table 4 we can see that DT generally performs slightly better than LR as the average training and validation scores of DT are both around 3-4% higher than LR. The training score and validation scores are close, which means the models are not overfitting. Table 5 shows the weights of the trained LR model. Figure 5 illustrates the tree structure of the trained DT model. From Figure 5, we can see that the leaf nodes of the left branch of the tree are all fishpond class; thus, the tree can be simplified by representing all leaf nodes on the left branch as one node with splitting criteria SqP ≤ 0.134, which simplified the transplantation of the model on GEE.     Figure 6 shows fishpond classification results of three example subareas. The first subarea (left column) is a small village that contains densely located fishponds. By comparing the AWEI image in Figure 6a and the ground truth layer in Figure 6d, we can see that the majority of fishpond boundaries are visually clear to identify, while some fishponds are too close and small to identify individually. The second subarea (middle column) is a village with more scattered fishponds. Due to increased gaps between fishponds, their boundaries are easier to identify, and thus most of the fishponds are correctly identified. The third subarea (right column) is a village beside a river. As shown in Figure  6i,l, both LR and DT successfully rejected river body objects due to their elongated and concave shapes. For all three subareas, the results produced by LR (last row) are slightly better than DT (third row) because LR results have less false negatives, such as the circled fishponds in Figure 6g-l.  Figure 6 shows fishpond classification results of three example subareas. The first subarea (left column) is a small village that contains densely located fishponds. By comparing the AWEI image in Figure 6a and the ground truth layer in Figure 6d, we can see that the majority of fishpond boundaries are visually clear to identify, while some fishponds are too close and small to identify individually. The second subarea (middle column) is a village with more scattered fishponds. Due to increased gaps between fishponds, their boundaries are easier to identify, and thus most of the fishponds are correctly identified. The third subarea (right column) is a village beside a river. As shown in Figure 6i,l, both LR and DT successfully rejected river body objects due to their elongated and concave shapes. For all three subareas, the results produced by LR (last row) are slightly better than DT (third row) because LR results have less false negatives, such as the circled fishponds in Figure 6g Figure 7 shows another village with a few irregular-shaped fishponds, e.g., the 'L' shaped fishpond on the left side of the region labeled as 1. Both LR and DT misclassified most of the fishponds in this region. Specifically, fishpond 1 was misclassified because of its concave shape. As discussed in the methodology, OBFs are mostly designed to measure shape compactness and convexity, and the primary assumption of using OBFs to classify fishponds is that fishponds have relatively simpler shapes than other water objects. Fishpond 2 is in fact a group of small fishponds according to the ground truth data shown in Figure 7b. The overly small gaps between such small fishponds cannot be identified; thus, they were recognized as one giant fishpond with complex  Figure 7 shows another village with a few irregular-shaped fishponds, e.g., the 'L' shaped fishpond on the left side of the region labeled as 1. Both LR and DT misclassified most of the fishponds in this region. Specifically, fishpond 1 was misclassified because of its concave shape. As discussed in the methodology, OBFs are mostly designed to measure shape compactness and convexity, and the primary assumption of using OBFs to classify fishponds is that fishponds have relatively simpler shapes than other water objects. Fishpond 2 is in fact a group of small fishponds according to the ground truth data shown in Figure 7b. The overly small gaps between such small fishponds cannot be identified; thus, they were recognized as one giant fishpond with complex shapes. Lastly, fishpond 3 was misclassified due to its elongated shape. As a result, any small fluctuations on the longer side can significantly affect convex hull based OBFs, such as SOLI and CONV. The misclassification of fishpond 2 and 3 indicate the huge influence of spatial resolution on the fishpond recognition. Fishponds are generally small, and some of them are as small as 200 sq. meters, which is roughly 2 pixels on a Sentinel-2 image. The relatively coarse spatial resolution not only over-simplified boundaries of small fishponds but also amplifies the differences of OBFs caused by tiny changes to the shapes. shapes. Lastly, fishpond 3 was misclassified due to its elongated shape. As a result, any small fluctuations on the longer side can significantly affect convex hull based OBFs, such as SOLI and CONV. The misclassification of fishpond 2 and 3 indicate the huge influence of spatial resolution on the fishpond recognition. Fishponds are generally small, and some of them are as small as 200 sq. meters, which is roughly 2 pixels on a Sentinel-2 image. The relatively coarse spatial resolution not only over-simplified boundaries of small fishponds but also amplifies the differences of OBFs caused by tiny changes to the shapes.

Validation Results
The proposed method was evaluated using the holdout testing dataset in 4 of the unions within the study area. The results are shown in Table 1. Specifically, the LR model identified in a total of 841

Validation Results
The proposed method was evaluated using the holdout testing dataset in 4 of the unions within the study area. The results are shown in Table 1. Specifically, the LR model identified in a total of 841 fishponds within the test unions, and the DT model identified 789 fishponds. Of all the classified fishponds, LR correctly classified 663 of the 841, and the precision score is 0.788. DT correctly classified 610 of 789, and the precision score is 0.773. As a comparison, the ground truth data contains in total 1232 fishponds within the test region. From Table 6, we can see that LR has dominantly better performance on the test dataset with all metric scores higher than DT even though DT performed better during training. The precision score of LR is 1.5% higher than DT, and the recall rate of LR is over 4% higher than DT. The overall F1 score of LR is 3.6% higher than DT. Therefore, LR is recommended to be implemented as the classifier for fishponds.  Figure 8 shows the comparison of histograms of WI values with and without pixel selection, as well as optimal thresholds derived by the Otsu method. From Figure 8, we can see that most of the histograms show bimodal distributions. One peak on the high-value side indicates water types, and the peak on the lower WI value side indicates other land cover types. Comparing thresholds derived using the Otsu method with and without pixel selection step, we can find that most threshold values are close such that whether to use pixel selection may not result in much difference. However, there are a few exceptions where the thresholds derived with pixel selection are quite different from that without. Specifically, thresholds derived with pixel selection on the NDWI image on January 1st and MNDWI and AWEI images on 30 April 2020 all fall out the valleys of the histograms, which significantly changes the class distributions of the segmented images. Figure 9 shows an example of such differences for the MNDWI image on 30 April 2020. Figure 9 clearly shows that the segmentation result with pixel selection is much better than without because the latter includes a lot of false positives, especially in the southwest region of the study area. Moreover, the boundaries of fishponds cannot be clearly identified due to the false positives, as shown in the upper right subfigures. The pixel selection is especially effective when the target class is scarce so that the difference between the target class and other classes is shadowed by variations within other classes. For example, the histograms of NDWI on 1 January 2020 and MNDWI on 30 April 2020 in Figure 8 show two peaks, and the Otsu method finds the threshold in the valley if no pixel selection is applied. However, such peaks just represent the internal variations of non-water types instead of between water class and non-water class. By applying the pixel selection technique, the number of non-water type pixels is significantly reduced to the same level as water features and thus produce better segmentation results. The thresholds derived using the Otsu method are shown in Table 7.

Importance of the Pixel Selection Technique
histograms of NDWI on 1 January 2020 and MNDWI on 30 April 2020 in Figure 8 show two peaks, and the Otsu method finds the threshold in the valley if no pixel selection is applied. However, such peaks just represent the internal variations of non-water types instead of between water class and non-water class. By applying the pixel selection technique, the number of non-water type pixels is significantly reduced to the same level as water features and thus produce better segmentation results. The thresholds derived using the Otsu method are shown in Table 7.   Figure 9 demonstrated the effectiveness of the pixel selection technique in mapping small water features with Sentinel-2 images. Similar approaches also proved their effectiveness when applied to commercial high-resolution satellite images [31] and medium resolution images, such as Landsat images [29]. This also indicates that the pixel selection technique, in general, can help map water features at continental and global scale in an unsupervised way because water features consist of only a small portion of the continents [36]. For supervised water feature classification, the pixel selection method can also help automatically generate training samples such that positive class samples and negative class samples are well balanced [45]. Lastly, the pixel selection technique can also help with water feature mappings with non-optical sensors, such as Synthetic Aperture Radar (SAR) [48] and Soil Moisture Active Passive (SMAP) [49,50], because microwave-based water feature mapping also depends on finding optimal thresholds.    Figure 9 demonstrated the effectiveness of the pixel selection technique in mapping small water features with Sentinel-2 images. Similar approaches also proved their effectiveness when applied to commercial high-resolution satellite images [31] and medium resolution images, such as Landsat images [29]. This also indicates that the pixel selection technique, in general, can help map water features at continental and global scale in an unsupervised way because water features consist of only a small portion of the continents [36]. For supervised water feature classification, the pixel selection method can also help automatically generate training samples such that positive class samples and negative class samples are well balanced [45]. Lastly, the pixel selection technique can also help with water feature mappings with non-optical sensors, such as Synthetic Aperture Radar (SAR) [48] and Soil Moisture Active Passive (SMAP) [49,50], because microwave-based water feature mapping also depends on finding optimal thresholds.

Object-Based Features Comparisons
Five OBFs, i.e., IPQ, SOLI, PFD, CONV, and SqP were calculated for all vectorized all-yearinundated water features. Figure 10 shows the paired scatter plot of OBF values of training data samples. Specifically, the diagonal subfigures show the distribution of each OBF values for fishponds and non-fishpond types, and the off-diagonal subfigures show the scattered points on the 2D space of the corresponding OBF features. From the diagonal subfigures, we can see that IPQ, CONV, and SqP provide good class separability because the peaks of distributions are well separated, while the distributions of SOLI and PFD have a lot of overlap. From the most upper-right subplot, we can also see that SqP and IPQ show a near quadratic relationship as all points align on a simple curve. This can be proved using the formula of the two features. From all subfigures, the most likely feature pairs

Object-Based Features Comparisons
Five OBFs, i.e., IPQ, SOLI, PFD, CONV, and SqP were calculated for all vectorized all-year-inundated water features. Figure 10 shows the paired scatter plot of OBF values of training data samples. Specifically, the diagonal subfigures show the distribution of each OBF values for fishponds and non-fishpond types, and the off-diagonal subfigures show the scattered points on the 2D space of the corresponding OBF features. From the diagonal subfigures, we can see that IPQ, CONV, and SqP provide good class separability because the peaks of distributions are well separated, while the distributions of SOLI and PFD have a lot of overlap. From the most upper-right subplot, we can also see that SqP and IPQ show a near quadratic relationship as all points align on a simple curve. This can be proved using the formula of the two features. From all subfigures, the most likely feature pairs that may provide linear separability is the pair of PFD and SqP because the positive samples and negative samples roughly align along the opposite sides of the y = x line visually. Other pairs of features do not show clear separability between the two classes.

Limitations and Future Work
The major limitation of the work is that the spatial resolution of Sentinel-2 images is not high enough to map extremely small-scale fishponds, which is common for house-owned inland fishponds in Bangladesh. A great portion of the inland fishponds are only a few pixels large such that the pixel outlines cannot accurately represent true shapes of these fishponds. As for now, Sentinel-2 is the only operational optical satellite that provides continuous and high-resolution multi-spectral data free of charge. Thus, until new missions are launched, Sentinel-2 10 m spatial resolution is the best for long-term and continuous mapping of inland fishponds. One promising direction to address the limitations caused by spatial resolution is sub-pixel mapping which produces LULC maps at the resolution higher than the input images [51][52][53][54]. Many research works have shown the great potential

Limitations and Future Work
The major limitation of the work is that the spatial resolution of Sentinel-2 images is not high enough to map extremely small-scale fishponds, which is common for house-owned inland fishponds in Bangladesh. A great portion of the inland fishponds are only a few pixels large such that the pixel outlines cannot accurately represent true shapes of these fishponds. As for now, Sentinel-2 is the only operational optical satellite that provides continuous and high-resolution multi-spectral data free of charge. Thus, until new missions are launched, Sentinel-2 10 m spatial resolution is the best for long-term and continuous mapping of inland fishponds. One promising direction to address the limitations caused by spatial resolution is sub-pixel mapping which produces LULC maps at the resolution higher than the input images [51][52][53][54]. Many research works have shown the great potential of improving mapping accuracies, especially when spatial resolution is too coarse to accurately show object boundaries. However, the effectiveness of sub-pixel mapping when applied to extremely small objects that are only a few pixels large is yet to be investigated.
Another limitation is that cloud covers in monsoon seasons limited the number of images that can be used. Thus, a few previous research works use SAR images instead of optical images for aquaculture pond mapping in South Asia [6,7]. As SAR uses microwave spectrum that can penetrate clouds, and it does not rely on the presence of sun, it can provide significantly better temporal coverage for regions with heavy cloud contaminations. Moreover, water features have very low backscatter when comparing with other LULC types such that they stand out on SAR images [40,48,55]. However, SAR images are generally contaminated with speckle noises that needs to be preprocessed by filtering, which will somewhat reduce the spatial resolution of the product, and edges may not be preserved. The reduced spatial resolution may not affect large objects mapping, such as coastal aquaculture ponds, it may have a major impact on the performance of mapping small-scale inland fishponds [55]. Recent research showed that the integrative usage of optical and SAR sensors, especially Sentinel-1 and Sentinel-2, has great potential in LULC mapping in monsoon area [56], and this is a very promising direction for improving the current state of inland fishpond mapping.

Conclusions
Inland fishery is growing fast in Bangladesh and many other South Asia countries, such as India. The underlying LULC change, especially the transition of agriculture land to aquaculture land may put pressure on already limited agriculture land for large populations. Mapping and monitoring inland fishponds are essential to understand such LULC changes. While there are many research works focused on mapping coastal aquaculture ponds, which share some similarities with inland fishponds, there is rarely any research focus on mapping inland fishponds that are typically small and unorganized. Thus, this research presented a workflow that uses multi-temporal Sentinel-2 images for fishpond mapping in the context of a case study in the Singra Upazila in Bangladesh. The workflow first applies a spectral-based filtering that automatically segments multi-temporal images using WIs and generates an all-year-inundated water feature mask. A key step in the spectral filtering is to apply a pixel selection technique that limit number of pixels used in image segmentation, which is essential to map small objects in a large area. Then, a spatial-based filtering was applied to classify all-year-inundated water objects into fishponds and non-fishponds using OBFs. We used five OBFs to characterize water objects and trained a LR and a DT model using the five OBFs as features. The LR model achieved better results than DT, and the trained LR model was used in the workflow for final classification. In a case study in the Singra Upazila in Bangladesh, we manually digitized fishponds using Google Earth historical images and tested our method. The proposed workflow achieved around 79% precision and 54% recall rate. Lastly, the workflow was implemented on GEE and thus can be easily reapplied to new regions.