Application of Artificial Neural Networks for Mangrove Mapping Using Multi-Temporal and Multi-Source Remote Sensing Imagery

Mangroves, unique coastal wetlands that provide numerous benefits, are endangered mainly by the coupled effects of anthropogenic activities and climate change. Therefore, acquiring reliable and up-to-date information about these ecosystems is vital for their conservation and sustainable blue carbon development. In this regard, the joint use of remote sensing data and machine learning algorithms can assist in producing accurate mangrove ecosystem maps. This study investigated the potential of artificial neural networks (ANNs) with different topologies and specifications for mangrove classification in Iran. To this end, multi-temporal synthetic aperture radar (SAR) and multi-spectral remote sensing data from Sentinel-1 and Sentinel-2 were processed in the Google Earth Engine (GEE) cloud computing platform. Afterward, ANN topologies and specifications, namely the number of layers and neurons, learning algorithm, type of activation function, and learning rate, were examined for mangrove ecosystem mapping. The results indicated that an ANN model with four hidden layers, 36 neurons in each layer, the adaptive moment estimation (Adam) learning algorithm, the rectified linear unit (Relu) activation function, and a learning rate of 0.001 produced the most accurate mangrove ecosystem map (F-score = 0.97). Further analysis revealed that although the accuracy of the ANN models declined when a limited number of training samples was used, they still produced satisfactory results. Additionally, the ANN models were highly robust to wrongly labeled training samples, and only the ANN model with the Adam learning algorithm produced an accurate mangrove ecosystem map when no data standardization was performed. Moreover, further investigations showed the higher potential of multi-temporal and multi-source remote sensing data compared to single-source and mono-temporal (e.g., single-season) data for accurate mangrove ecosystem mapping. Overall, the high potential of the proposed method, along with the use of open-access satellite images and big geo data processing platforms (i.e., GEE, Google Colab, and scikit-learn), makes the proposed approach efficient and applicable to other study areas for all interested users.


Introduction
Mangrove ecosystems are among the most productive ecosystems that exist along coastal areas in tropical and sub-tropical regions. These ecosystems provide unique ecological and environmental benefits including coastal protection (i.e., flood protection and wave attenuation) [1,2], carbon sequestration [3,4], pollution and waste abatement [5,6], and pharmaceutical production [7,8]. Additionally, mangrove ecosystems are important habitats for various fauna, providing valuable food services for shrimp farming and fishery [9,10]. However, mangrove loss (i.e., in species and extent) due to anthropogenic activities, catastrophic natural hazards in coastal areas, and climate change has continued in recent decades and has led to severe environmental degradation [11][12][13][14]. Accordingly, it is a global, regional, and local concern to accurately map these valuable ecosystems to prevent their loss and establish effective practices for their sustainable management.
In the current era, the advancement of remote sensing technology has created an unprecedented opportunity to study various natural resources such as mangrove communities [15][16][17][18][19][20]. In particular, remote sensing systems provide frequent and accurate data sets over mangrove communities with spatial consistency and synoptic views. These capabilities make remote sensing an appealing choice for mangrove studies compared to conventional approaches that rely on in situ data collection, because conventional practices are time-consuming, resource-intensive, and, on some occasions, infeasible (e.g., due to the limited access and harsh environment of mangrove communities or the large scale of a study) [21,22].
Remote sensing data sets have different characteristics in terms of electromagnetic spectrum domains, spatial resolutions, temporal resolutions, and radiometric resolutions. In particular, multi-spectral, synthetic aperture radar (SAR), light detection and ranging (LiDAR), and hyperspectral data are common remote sensing resources that have been employed either individually or in conjunction for mangrove studies [23][24][25][26][27][28]. For instance, Ashiagbor et al. [29] examined the capability of the Sentinel-1 SAR data to obtain information about mangroves in the Keta Lagoon Complex Ramsar Site (KLCRS) to support sustainable conservation and restoration. Likewise, Bindu et al. [30] employed multispectral images, in situ data, and allometric equations [31] to derive the above-ground biomass of mangroves and then converted the estimated values to carbon content. Moreover, Hu et al. [32] incorporated LiDAR, multi-spectral, topographical, and climate data to estimate the above-ground biomass density of mangrove communities. In another study, Lucas et al. [33] integrated time-series multi-spectral and SAR data to estimate mangroves' age in Matang Mangrove Forest Reserve (MMFR), Malaysia. Later, interferometric SAR data were combined with very high-resolution stereo images to estimate the canopy height of mangroves [33].
Along with remote sensing data, machine learning algorithms have been extensively employed to exploit the full potential of these data for automated mapping of mangroves. In this respect, different machine learning algorithms, including maximum likelihood [34], support vector machine (SVM) [35], random forest (RF) [36,37], K nearest neighbor (KNN) [38], classification and regression trees (CART) [39], and artificial neural networks (ANNs) [40] have been utilized. For example, Parida and Kumar [35] implemented an SVM algorithm to map mangrove extent between 2009 and 2019 using Landsat-5 and Sentinel-2 data sets. Their results indicated an increase in the spatial extent of mangroves on the Odisha coast, which was mainly associated with plantation, awareness, restoration, and management. Moreover, Behera et al. [37] applied an RF algorithm for mangrove mapping in Bhitarkanika Wildlife Sanctuary, India. To this end, red-edge spectral bands and chlorophyll absorption information of AVIRIS-NG and Sentinel-2 images were employed, and the results indicated the superiority of Sentinel-2 images for this task. Likewise, Zhang et al. [41] used multi-temporal Landsat-5 and digital elevation model (DEM) data to map mangrove forests based on a decision tree algorithm. It was reported that employing multi-temporal data can efficiently enhance the classification results by reducing the tidal effect, and the decision tree approach was superior to conventional statistical classifiers. In another study, Bihamta Toosi et al. [42] compared the performances of four machine learning algorithms (i.e., linear SVM, radial SVM, RF, and regularization in discriminant analysis) for mangrove classification and change detection using Landsat archives. A k-fold cross-validation step was executed, and it was reported that the RF algorithm achieved the best performance.
Previous studies have acknowledged the benefits of multi-source remote sensing data and different machine learning algorithms for mangrove classification [41,43][44][45][46]. Meanwhile, few studies have been conducted to generate mangrove maps using multispectral data and the ANN algorithm [40,47][48][49], and its practicability is underreported [40]. In particular, the applicability of the ANN algorithm for mangrove ecosystem mapping using multi-temporal and multi-source remote sensing data has not been comprehensively explored.
To address this gap, this study aimed to investigate the potential of combining the ANN algorithm and multi-source (i.e., multi-spectral and SAR) remote sensing data for mangrove ecosystem mapping. In this regard, ANN algorithms with different topologies and specifications were implemented for mangrove ecosystem mapping. In particular, the effects of the number of layers and neurons, learning algorithms, type of activation functions, and learning rates for small-to-medium-sized ANN models were investigated. Subsequently, several other analyses were conducted to explore the impact of data transformation/standardization, a limited number of training samples, noisy labels, as well as multi-temporal and multi-source remote sensing data sets on the classification accuracy using the ANN algorithm.

Study Area
The study was conducted over the Hara protected area, located between Khamir port and the northwest estuaries of Qeshm Island at latitudes of 26°43′-26°59′N and longitudes of 55°28′-55°48′E (see Figure 1). It is the largest mangrove community in the Persian Gulf and Oman Sea, with an area of over 850 km² [50,51]. The Hara protected area has been officially registered as one of the biosphere reserves under the Ramsar Convention and included in UNESCO's Man and Biosphere Program convention list [52]. This area includes vast intertidal zones and grooved tidal channels with considerable semi-diurnal tidal fluctuations, with minimum and maximum ranges of 0.3 and 4.6 m, respectively [53]. This suggests that the tidal effect should be considered for accurate mangrove studies using multi-temporal imagery. Gray (Avicennia marina) and red (Rhizophora mucronata) mangrove species exist in the Persian Gulf, and the gray mangrove (which belongs to the Acanthaceae family) is the dominant mangrove species in the Hara protected area [42,54]. This species grows in sediments with low oxygen and high salinity concentrations and has light gray bark and thick, glossy, bright green leaves [8,42]. This region has been impacted by anthropogenic practices of local and regional communities including leaf-cutting, fishing, boat journeys for tourism, oil leakage, and petrochemical industries [55][56][57]. Accordingly, developing efficient approaches is vital to obtain reliable information for the sustainable management of such valuable natural resources.

Reference Samples
Reference samples are required to support the training phase of a supervised classification algorithm and to evaluate its performance. In this study, reference samples were collected through visual interpretation of very high-resolution satellite images in Google Earth and ArcMap, which were captured in 2020. Additionally, the latest version of the global mangrove extent map [58] and the global distribution of tidal flat map [59] were also employed. To this end, these maps were overlaid on very high-resolution satellite images to identify suitable locations for collecting reference samples of the associated classes. This permitted selecting proper locations with higher confidence. In the first step, homogeneous sites were considered to avoid selecting mixed pixels as reference samples. In total, eight classes of land cover with appropriate spatial distribution (i.e., having suitable representativeness of reference samples over the study area) were initially collected (see Figure 1). Later, these samples were randomly divided into two independent sets of training and test samples. To this end, two criteria of size and number were considered. The random splitting of reference samples into two independent sets (i.e., training and test) would lead to low bias in the performance of the final classification results [60]. This step was implemented at the polygon level, ensuring training and test samples with no spatial autocorrelation (i.e., spatially disjoint). This is because splitting reference samples at the pixel level may lead to information leakage (i.e., selection of training and test samples from a single polygon), affecting the evaluation step of the classification task and decreasing the generality of the classifier [61].
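The polygon-level split described above can be illustrated with a minimal sketch. It assumes that every reference pixel carries the identifier of the polygon it was drawn from; the function name, the 50% test fraction, and the toy identifiers are hypothetical, and the sketch ignores the size and number criteria mentioned in the text.

```python
import random

def split_by_polygon(polygon_ids, test_fraction=0.5, seed=0):
    """Randomly assign whole polygons (not single pixels) to training or test."""
    unique_ids = sorted(set(polygon_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    n_test = int(round(len(unique_ids) * test_fraction))
    test_ids = set(unique_ids[:n_test])
    # Every pixel follows its polygon, so no polygon contributes to both sets.
    train_idx = [i for i, pid in enumerate(polygon_ids) if pid not in test_ids]
    test_idx = [i for i, pid in enumerate(polygon_ids) if pid in test_ids]
    return train_idx, test_idx

# Hypothetical example: 10 pixels drawn from 4 reference polygons.
pixel_polygon_ids = [0, 0, 0, 1, 1, 2, 2, 2, 3, 3]
train_idx, test_idx = split_by_polygon(pixel_polygon_ids)
train_polygons = {pixel_polygon_ids[i] for i in train_idx}
test_polygons = {pixel_polygon_ids[i] for i in test_idx}
```

Because the split is drawn over polygon identifiers, pixels from one polygon can never appear on both sides, which is the leakage that a pixel-level split would allow.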
Afterward, several polygons were added to the mangrove test samples, selected from narrow mangrove patches and sparse areas, to better investigate the classification performance for the Mangrove class. This was because several parts of the study area included narrow and fragmented mangrove patches. Meanwhile, given the importance of mangrove delineation, this practice could assure the robustness of the proposed method for the accurate mapping of mangroves located along the tidal zones. Table 1 summarizes the number and area of training and test samples. In total, 824 reference polygons were collected, 144.17 and 160.1 ha of which belonged to the training and test samples, respectively. The Barren class had a higher number of reference samples due to its higher diversity (i.e., surface variation in structure and color) and distribution in the study area. In contrast, the Vegetation class (e.g., croplands and inland forests/trees) had the lowest number of reference samples due to its scarcity in the region.

Satellite Images
This study employed a combination of multi-temporal Sentinel-1 and Sentinel-2 satellite images, which were available in Google Earth Engine (GEE). Sentinel-1 and Sentinel-2 are two European satellite missions developed jointly by the European Space Agency (ESA) and the European Commission under the Copernicus initiative [62].
GEE is a cloud computing platform that hosts petabytes of open-access geospatial data sets and allows for processing of a massive volume of data for various earth science tasks [63,64]. The synergistic use of SAR and optical data provides complementary information regarding the physical and spectral characteristics of existing land covers and leads to better discrimination, thus enhancing the classification results [65][66][67][68]. Furthermore, the use of multi-temporal satellite images reduces the effects of tides and water level fluctuations in mangrove communities and, consequently, improves the reliability of the classification results [22].
Sentinel-1 captures all-weather C-band SAR data in dual polarization with a 6-day revisit time. This sensor provides SAR data in three modes: Stripmap, Interferometric Wide Swath (IW), and Extra Wide Swath [69]. In this study, Ground Range Detected (GRD) Sentinel-1 images from the IW mode in ascending and descending orbits with a spatial resolution of 10 m were used. Overall, 89 Sentinel-1 scenes in both VH (vertical transmit and horizontal receive) and VV (vertical transmit and vertical receive) polarizations, acquired from 1 January 2020 to 1 January 2021 (i.e., all scenes in 2020), were processed in this study (see Figure 2).
Sentinel-2 carries the MultiSpectral Instrument (MSI) sensor that records the Earth's surface radiation in 13 spectral bands ranging from the visible to the shortwave infrared regions of the electromagnetic spectrum. The MSI captures spectral bands at spatial resolutions of 10, 20, and 60 m. In this study, only spectral bands with the highest spatial resolution (i.e., blue, green, red, and near-infrared (NIR)) were utilized. This was because higher resolution images have been shown to enhance mangrove ecosystem classification results, as satellite imagery with a higher spatial resolution is better able to delineate mangrove patches with narrow shapes and small areas [70]. In total, 51 available Sentinel-2 scenes in GEE with a cloud cover of ≤5%, acquired between 1 January 2020 and 1 January 2021 (i.e., all scenes in 2020), were employed in the classification (see Figure 2).

Figure 2. The numbers of Sentinel-1 and Sentinel-2 satellite images that were used in the mangrove classification. The data were acquired between 1 January 2020 and 1 January 2021.

Methodology
The proposed methodology has three main steps, which are explained in the following three subsections. The satellite image preprocessing is initially described, followed by detailed explanations of the ANN models and classification procedure. Finally, the accuracy assessment of the classification results is discussed.

Satellite Images Preprocessing
In this study, time-series Sentinel-1 GRD products were collected from GEE. The GEE developers apply five common preprocessing steps to all GRD scenes, making them suitable as ready-to-use data for many applications [71]. These five steps, detailed by the GEE developers in [72], are: (1) orbit file correction, (2) GRD border noise removal, (3) thermal noise removal, (4) radiometric calibration, and (5) terrain correction. Afterward, a speckle filtering step was applied to all Sentinel-1 scenes. To this end, a mono-temporal improved Lee sigma despeckling algorithm with a kernel size of 5 × 5 was implemented to reduce the undesirable speckle effect, enhancing pixel-based classification results [73,74]. Finally, all the preprocessed Sentinel-1 scenes were categorized based on acquisition seasons, and a mean reducer function was applied to aggregate Sentinel-1 images and to create seasonal SAR features. For instance, all Sentinel-1 scenes within spring included 44 features (22 VV + 22 VH), and the mean reducer function was applied to generate only two features of VV and VH for the spring season. It is worth noting that the implemented aggregation approach would decrease the negative effects of image acquisition conditions and also reduce the volume of the input data and the speckle noise effect [46,75].
Sentinel-2 top of atmosphere (TOA) reflectance data were collected for this study. Similar to Sentinel-1, Sentinel-2 scenes were subjected to initial preprocessing steps in which orthorectification and radiometric calibration were performed [76]. Later, a cloud filtering step was applied to avoid the negative effect of clouds in optical images and, thus, only Sentinel-2 scenes with ≤5% cloud cover were included in further steps. Subsequently, a median reducer was applied to all Sentinel-2 images in each season, and seasonal optical features (i.e., four bands in each season) were generated. This allowed cloud-free and pure seasonal data sets to be generated, free of noise and of very dark/bright pixels [68,77].
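The seasonal aggregation described above (a mean reducer for Sentinel-1 and a median reducer for Sentinel-2) can be sketched for a single pixel in plain Python. This is an illustration only and not the GEE implementation used in the study: the month-to-season mapping, the function name, and all sample values are assumptions.

```python
from statistics import mean, median

# Assumed month-to-season mapping (the text does not define season boundaries).
SEASON_OF_MONTH = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "autumn", 10: "autumn", 11: "autumn"}

def seasonal_composite(scenes, reducer):
    """Reduce per-scene band values to one value per (season, band).

    scenes:  list of (month, {band_name: value}) for a single pixel.
    reducer: function mapping a list of numbers to one number.
    """
    grouped = {}
    for month, bands in scenes:
        season = SEASON_OF_MONTH[month]
        for band, value in bands.items():
            grouped.setdefault((season, band), []).append(value)
    return {key: reducer(values) for key, values in grouped.items()}

# Hypothetical Sentinel-1 backscatter values for one pixel, reduced by the mean.
s1_scenes = [(3, {"VV": -10.0, "VH": -17.0}),
             (4, {"VV": -12.0, "VH": -19.0}),
             (7, {"VV": -9.0, "VH": -16.0})]
s1_features = seasonal_composite(s1_scenes, mean)

# Hypothetical Sentinel-2 red reflectance; the median ignores the bright outlier.
s2_scenes = [(3, {"red": 0.10}), (4, {"red": 0.50}), (5, {"red": 0.12})]
s2_features = seasonal_composite(s2_scenes, median)
```

In the toy optical series, the value 0.50 stands in for a residual bright pixel; the median leaves the composite at 0.12, which illustrates why a median reducer suppresses very dark or bright observations.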
In summary, the final satellite data set contained eight SAR features (two polarizations in each of four seasons) and 16 optical features (four bands in each of four seasons), which were fed to the ANN algorithm. It is well established that the quality of a classification task directly relies on the input features. As such, incorporating multi-source (i.e., SAR + optical) data increases the discriminative capability of the classifier [65][66][67][68]. Moreover, time-series satellite data can capture the water level fluctuations in estuaries such as mangrove ecosystems [22]. Consequently, seasonal data sets can mitigate the tidal effects in the study area and allow for the production of cloud-free mosaics. Finally, all input features were transformed using the standard scaler approach (i.e., removing the mean and scaling to unit variance).
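The standard scaler transformation mentioned above maps each feature value x to (x - mean) / standard deviation. A minimal sketch follows, assuming the population standard deviation is used (the convention of scikit-learn's StandardScaler); the function name and sample values are hypothetical.

```python
from statistics import fmean, pstdev

def standardize(columns):
    """Scale each feature column to zero mean and unit variance."""
    scaled = []
    for values in columns:
        mu = fmean(values)
        sigma = pstdev(values)      # population standard deviation
        if sigma == 0:
            sigma = 1.0             # constant feature: avoid division by zero
        scaled.append([(v - mu) / sigma for v in values])
    return scaled

# Two hypothetical features with very different ranges:
# SAR backscatter (dB) and optical reflectance.
features = [[-20.0, -10.0, -15.0, -15.0],
            [0.1, 0.3, 0.2, 0.2]]
scaled = standardize(features)
```

In practice, the mean and standard deviation would normally be estimated from the training samples only and then reused to transform the test samples and the full image.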

ANN Models and Classification
The ANN is a supervised machine learning algorithm inspired by biological neural systems that emulates the recognition process of the human brain [78]. ANNs consist of interconnected neurons that attempt to simulate neural processing and have high capabilities in nonlinear classification tasks [79,80].
Various types of ANN models exist, of which the feedforward multi-layer perceptron (MLP) is the most common and practical model for classification and regression tasks [80][81][82]. The MLP models (hereafter called ANN) include input, hidden, and output layers in which a number of neurons (or nodes) and connections (or edges) exist. The data are transferred from the input layer to the output layer through these neurons and connections (see Figure 3). At the initial stage, the weight value of each connection is randomly assigned, and, later, the differences between the actual target classes/values and the classes/values predicted by the ANN model are computed. These differences are used to refine the weight values iteratively through a backpropagation algorithm, which calculates the gradient of the loss function and updates the weight values for the next iteration [83]. This procedure continues until the ANN model is adequately trained, as judged by the loss function.
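The forward transfer of data from the input layer to the output layer can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the study: the two hidden layers of 36 neurons, the uniform weight initialization, and the softmax output layer are choices made here for brevity, and the backpropagation update is not shown.

```python
import math
import random

def relu(x):
    return max(0.0, x)

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def softmax(scores):
    """Turn output scores into class probabilities that sum to one."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_in, n_out, rng):
    """Random initial weights; biases start at zero (illustrative choice)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(features, layers):
    """Propagate one sample through Relu hidden layers and a softmax output."""
    activations = features
    for weights, biases in layers[:-1]:
        activations = dense(activations, weights, biases, relu)
    weights, biases = layers[-1]
    return softmax(dense(activations, weights, biases, lambda s: s))

rng = random.Random(42)
sizes = [24, 36, 36, 8]   # 24 input features, two hidden layers, 8 classes
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
sample = [rng.gauss(0.0, 1.0) for _ in range(24)]
probabilities = forward(sample, layers)
predicted_class = probabilities.index(max(probabilities))
```

With randomly initialized weights the predicted class is arbitrary; training would repeatedly compare such outputs with the reference labels and adjust the weights along the gradient of the loss.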
Generally, four parameters are required to explicitly determine the architecture of an ANN model [80,84]: (1) number of layers, (2) number of neurons, (3) learning algorithm, and (4) type of activation function. These parameters directly influence the classification results of an ANN model and, hence, should be carefully determined to ensure achievement of the best possible accuracy.
In this study, different ANN topologies and specifications were implemented to examine the capability of ANN models for mangrove ecosystem mapping. To this end, the open-source scikit-learn package was used within the Google Colab platform so that other interested users could freely implement the analyses. In this regard, a grid search analysis was first performed to find the best number of layers (ranging between one and five) and neurons (ranging between six and 36) using three different learning algorithms. This resulted in the investigation of 240 models. The three learning algorithms were adaptive moment estimation (Adam) [85], stochastic gradient descent (SGD) [86], and limited memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) [87]. Subsequently, the effect of the type of activation function was explored with the best topologies for each learning algorithm. Four types of activation functions, including rectified linear unit (Relu; f(input) = max(0, input)), hyperbolic tangent (Tanh; f(input) = tanh(input)), logistic sigmoid function (Logistic; f(input) = 1/(1 + exp(−input))), and Identity (f(input) = input) were employed. Afterward, the effect of the learning rate was examined for the best ANN models determined thus far for the two learning algorithms of Adam and SGD. Finally, the best topologies and specifications of the three ANN algorithms (i.e., based on the three learning algorithms) were applied to map the mangrove ecosystem of the study area. To determine the best topologies and specifications, different statistical criteria (see Section 4.3) and computation times (i.e., for the training phase) were considered. It is worth noting that all other tunable parameters were kept equal during the experimental stages to allow a fair comparison of the ANN models.
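The size of the grid search and the four activation functions can be written out as a short sketch. The step of two neurons is the one reported with the results; the dictionary keys and the solver and activation strings follow the parameter names of scikit-learn's MLPClassifier, but no model is trained here and the variable names are illustrative.

```python
import math
from itertools import product

layer_counts = range(1, 6)            # one to five hidden layers
neuron_counts = range(6, 37, 2)       # 6, 8, ..., 36 neurons per layer
solvers = ["adam", "sgd", "lbfgs"]    # the three learning algorithms

# One configuration per (layers, neurons, solver) combination, with the same
# number of neurons in every hidden layer.
grid = [
    {"hidden_layer_sizes": (n_neurons,) * n_layers, "solver": solver}
    for n_layers, n_neurons, solver in product(layer_counts, neuron_counts, solvers)
]

# The four activation functions as defined in the text.
ACTIVATIONS = {
    "relu": lambda x: max(0.0, x),
    "tanh": math.tanh,
    "logistic": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "identity": lambda x: x,
}

n_models = len(grid)   # 5 layer counts x 16 neuron counts x 3 solvers
```

Five layer counts, 16 neuron counts, and three learning algorithms give the 240 models stated above.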

Accuracy Assessment
One critical step to ensure the reliability of any classification model is evaluating its performance [88]. Accordingly, the independent test samples (see Section 3.1), which were not involved in the training phase, were used for statistical accuracy assessment. In this regard, the macro-averaged F-score (hereafter F-score) values of each ANN model (i.e., all cases in Section 4.2) were first computed to determine the best ANN model [89]. Meanwhile, the computation time of each ANN model was recorded as a secondary criterion to assist in selecting the best ANN model. Later, the confusion matrices for the best ANN models were calculated. Additionally, several parameters, including overall accuracy (OA), kappa coefficient (KC), producer accuracy (PA), and user accuracy (UA), were derived from the confusion matrix for accuracy assessment. Finally, the maps produced by the three ANN models were visually compared with very high-resolution images.
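The accuracy measures listed above can all be derived from a confusion matrix, as the following minimal sketch shows. It assumes that rows hold the reference classes and columns the predicted classes; the function name and the two-class example matrix are hypothetical.

```python
def accuracy_metrics(cm):
    """Derive OA, KC, PA, UA, and the macro-averaged F-score.

    cm[i][j] = number of test samples of reference class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    row_sums = [sum(row) for row in cm]                              # reference totals
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]   # predicted totals
    diag = [cm[i][i] for i in range(n)]

    oa = sum(diag) / total
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (oa - expected) / (1 - expected)
    pa = [d / r if r else 0.0 for d, r in zip(diag, row_sums)]       # recall
    ua = [d / c if c else 0.0 for d, c in zip(diag, col_sums)]       # precision
    f1 = [2 * p * u / (p + u) if (p + u) else 0.0 for p, u in zip(pa, ua)]
    return {"OA": oa, "KC": kappa, "PA": pa, "UA": ua, "F-score": sum(f1) / n}

# Hypothetical two-class confusion matrix built from 100 test samples.
cm = [[45, 5],
      [10, 40]]
metrics = accuracy_metrics(cm)
```

For this toy matrix the overall accuracy is 0.85 and the kappa coefficient is 0.70. The macro-averaged F-score is the unweighted mean of the per-class F-scores, so small classes count as much as large ones.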

Results
This section covers the experimental results of exploring the ANN topologies and specifications (see Section 4.2). The results are separated into different subsections to provide an explicit overview of the different analyses. The effects of the number of layers and neurons based on different learning algorithms are first presented, followed by the impact of activation functions and learning rates. Finally, the produced maps of the study area, focusing on the mangrove ecosystem, are provided based on the best settings of the ANN models for each learning algorithm.
Figure 4 presents the F-score and computation time values of the ANN models with different topologies (i.e., number of layers and neurons) using three learning algorithms (i.e., Adam, LBFGS, and SGD). It should be noted that the Relu was used in this step due to the extensive application of this activation function [90,91]. ANN topologies with one to five layer(s) with a step of one layer and a number of neurons from 6 to 36 with a step of 2 neurons were employed as small-to-medium-sized ANN models. The results also demonstrate the effect of different learning algorithms for mangrove ecosystem mapping using the ANN algorithm. Based on the results (see Figure 4a), the best ANN topologies for the Adam, LBFGS, and SGD learning algorithms were four layers with 36 neurons in each (F-score of 0.97), one layer with 26 neurons (F-score of 0.95), and three layers with 36 neurons in each (F-score of 0.95), respectively. Additionally, it was observed that the Adam learning algorithm had more consistent classification results (i.e., convergence with higher overall accuracies) with an average F-score value of 0.95 (±1%). The SGD had similar behavior with an average F-score value of 0.94 (±1.1%), while the LBFGS had a weaker performance with an average F-score value of 0.90 (±11%) and one divergence case (i.e., dissatisfaction of the loss function). Furthermore, over 20% of the ANN topologies with the LBFGS learning algorithm obtained F-score values lower than 0.90. Meanwhile, computation time behaved consistently for all three learning algorithms: it generally increased as the number of layers and neurons increased, with more dependency on the number of neurons (see Figure 4b). Regarding the computation time (i.e., training phase), the LBFGS required the lowest computation time, with an average of 4.5 s, while the other two required approximately 6.5 s on average. It should be noted that the relative difference among computation times would increase when using more features or producing mangrove ecosystem maps of large-scale areas. Therefore, the time criterion could be considered an additional factor in identifying a suitable ANN topology.

Activation Function
As stated in Section 4.1, the Relu was used as the activation function to determine the best topologies for each learning algorithm. Adopting the best topologies from the previous section, the effect of four activation functions (i.e., Relu, Tanh, Logistic, and Identity) was explored. The results are in agreement with previous literature suggesting the high capability of the Relu activation function [90,91]. In particular, the Relu achieved higher F-score values and, where the scores were close, required lower computation times. Furthermore, the results suggested that the ANN model depends only weakly on its activation function, especially in terms of the achieved accuracy (i.e., F-score value), once the best topology has been identified. Moreover, the Identity entailed lower computation time, especially compared with the Tanh and Logistic activation functions, and the combination of SGD and the Logistic led to inconsistent results with no convergence, making it less appropriate.
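For reference, the four activation functions compared here can be written out explicitly. The sketch below uses NumPy and the option names accepted by the `activation` parameter of the scikit-learn `MLPClassifier`; the dictionary name `ACTIVATIONS` is introduced here only for illustration.

```python
import numpy as np


def relu(x):
    """Rectified linear unit: max(0, x), applied element-wise."""
    return np.maximum(0.0, x)


def logistic(x):
    """Logistic sigmoid: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))


def identity(x):
    """No-op activation: the layer stays linear."""
    return x


# Keys mirror the scikit-learn option names for MLPClassifier(activation=...).
ACTIVATIONS = {
    "relu": relu,
    "tanh": np.tanh,
    "logistic": logistic,
    "identity": identity,
}
```

The Identity involves no non-linear operation at all, which is consistent with its lower computation time, whereas the Tanh and Logistic both require evaluating exponentials.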

Learning Rate
After determining the three ANN models (i.e., one for each learning algorithm), the effects of learning rates were explored. It is worth noting that only Adam and SGD support different learning rate values, so the investigation was limited to these two learning algorithms. In this regard, 12 learning rate values ranging between 1 × 10⁻⁶ and 0.5 were applied. The results (see Figure 6) showed similar behaviors for Adam and SGD when different learning rates were used to train the ANN model. In particular, the ANN models with both learning algorithms achieved low F-score values when the learning rate values were very high or very low. Very low learning rate values prevent the ANNs from converging within a limited number of iterations, while high learning rate values make the algorithm overshoot minima, so that it cannot converge to the best solution. This also indicated a high dependency of the ANN model on the learning rate value, which should be tuned to obtain the best classification result.
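The effect of very low and very high learning rates can be reproduced on a toy problem. The sketch below applies plain gradient descent (not Adam, and not the ANN of this study) to the one-dimensional loss f(w) = 0.5·c·w²; the curvature value, the number of steps, and the logarithmic spacing of the 12 learning rates are assumptions made for illustration.

```python
def learning_rate_grid(low=1e-6, high=0.5, count=12):
    """Return `count` logarithmically spaced values in [low, high] (assumed spacing)."""
    ratio = (high / low) ** (1.0 / (count - 1))
    return [low * ratio ** i for i in range(count)]


def final_loss(lr, curvature=5.0, w0=1.0, steps=50):
    """Run gradient descent on f(w) = 0.5 * curvature * w**2 and return the final loss."""
    w = w0
    for _ in range(steps):
        gradient = curvature * w
        w -= lr * gradient  # each step multiplies w by (1 - lr * curvature)
    return 0.5 * curvature * w * w


if __name__ == "__main__":
    for lr in learning_rate_grid():
        print(f"lr={lr:.2e}  final loss={final_loss(lr):.3e}")
```

With these settings, the smallest learning rate leaves the loss almost unchanged after 50 steps, intermediate values drive it close to zero, and the largest value makes the iterate grow at every step because |1 − lr·c| exceeds 1.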

Mangrove Ecosystem Maps
After analyzing different ANN topologies and specifications in the previous sections, the best ANN models with the three learning algorithms were applied to classify the mangrove ecosystem. Figure 7 shows the confusion matrices of the produced thematic maps, computed using independent test samples. The confusion matrices are close to diagonal in all cases (i.e., for the three learning algorithms), and few confusions occurred. In particular, the highest confusions were between Shallow water and Tidal zone and between Urban and Barren. The former was mainly because these classes are highly similar due to water fluctuations (i.e., tides), which makes it challenging to fully separate them. The latter was primarily associated with the materials used for residential roof construction over the study area, which were similar to Barren (bare soil). Furthermore, minor confusion was observed between Mangrove and two other classes, Vegetation and Mudflat. The first (i.e., Mangrove and Vegetation) was related to the slight similarity between these two vegetated classes. The second (i.e., Mangrove and Mudflat) mostly occurred due to mixed pixels resulting from the proximity of the two classes at their boundaries. Table 2 also provides other statistical criteria derived from the confusion matrices. All three ANN models obtained high PAs and UAs in each class. Overall, the Mangrove and Mudflat classes had the first and second highest PAs and UAs on average. In contrast, Shallow water and Tidal zone obtained the lowest PAs and UAs on average across the ANN models with the three learning algorithms. This could be associated with their similarity due to water level changes.
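The per-class criteria derived from a confusion matrix can be computed as follows. This is a minimal NumPy sketch assuming that rows hold the reference (test) labels and columns hold the predicted labels; under that convention the PA is the row-wise recall and the UA is the column-wise precision. The function name and the 2 × 2 example matrix are hypothetical.

```python
import numpy as np


def accuracy_metrics(confusion):
    """Return per-class (PA, UA, F-score) from a square confusion matrix.

    Assumed layout: confusion[i, j] counts samples of reference class i
    that were predicted as class j.
    """
    confusion = np.asarray(confusion, dtype=float)
    correct = np.diag(confusion)
    pa = correct / confusion.sum(axis=1)  # producer's accuracy (recall)
    ua = correct / confusion.sum(axis=0)  # user's accuracy (precision)
    f_score = 2.0 * pa * ua / (pa + ua)
    return pa, ua, f_score


if __name__ == "__main__":
    example = [[8, 2],
               [1, 9]]  # hypothetical two-class matrix
    pa, ua, f_score = accuracy_metrics(example)
    print(pa, ua, f_score)
```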

The produced maps (i.e., only classes within the mangrove ecosystem) with a 10 m spatial resolution using different ANN models are presented in Figure 8. Based on the visual interpretation, the produced maps had acceptable accuracies, suggesting the high capability of combining the ANN algorithm and seasonal multi-source remote sensing data for mangrove ecosystem mapping. Overall, mangrove areas were delineated precisely, with accuracy decreasing in the order Adam, LBFGS, and SGD. All three ANN models were also capable of correctly classifying most of the small or narrow mangrove patches that extend along the coast, with the highest capability observed for the ANN model with the Adam learning algorithm. However, some mangrove regions were still not detected correctly because they were too sparse and fragmented. This could be addressed using satellite images with higher spatial resolutions [46]. In particular, based on Figure 8, the Adam learning algorithm had the best performance in depicting mangrove areas and distinguishing Mangrove from Mudflat. In contrast, the other learning algorithms did so with lower accuracy, especially SGD, which could not discriminate the small patches of Mudflat among Mangrove. Additionally, the deeper middle parts of water bodies were correctly identified by Adam and SGD, while several misclassifications with the Tidal zone occurred when using the LBFGS learning algorithm.
The agreement among the produced mangrove maps was also computed to obtain a quantitative measure of the differences among these maps. It was observed that nearly 90% of pixels had identical labels. The disagreements appeared primarily between the water classes (i.e., Deep water and Shallow water) and Tidal zone, while a minority was related to the Mangrove and Mudflat classes.
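One simple way to quantify such agreement is the fraction of pixels that receive the same label in every map. The NumPy sketch below assumes that the maps are co-registered integer label arrays of identical shape; the function name and the tiny example arrays are hypothetical and are not the maps of this study.

```python
import numpy as np


def pixel_agreement(*maps):
    """Return the fraction of pixels whose label is identical in all maps."""
    stack = np.stack([np.asarray(m) for m in maps])  # (n_maps, rows, cols)
    identical = np.all(stack == stack[0], axis=0)    # True where all maps agree
    return float(identical.mean())


if __name__ == "__main__":
    map_adam = np.array([[1, 1], [2, 3]])
    map_lbfgs = np.array([[1, 1], [2, 2]])
    map_sgd = np.array([[1, 1], [2, 3]])
    print(pixel_agreement(map_adam, map_lbfgs, map_sgd))
```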

Discussion
In this section, a few general remarks are provided first. Then, three different analyses are presented to provide a more comprehensive overview of the performance of the ANN models. Although the input data in these analyses were related to the mangrove ecosystem, the implications primarily reflect the behavior of ANN models and could therefore guide readers implementing ANN algorithms over different study areas.

General Remarks
Mangrove ecosystems provide many economic, ecological, and environmental benefits for humans and their surroundings [92]. Some of these services are unique, which underlines the importance of protecting mangrove ecosystems from degradation. One efficient approach for mangrove mapping is to employ remote sensing data to frequently monitor these natural resources and take the necessary actions to avoid their further loss [93]. In this regard, machine learning algorithms such as the ANN investigated in this paper can be incorporated to obtain highly accurate information about mangrove ecosystems. Currently, satellite images with medium spatial resolutions, such as the Sentinel-1, Sentinel-2, and Landsat archives, can be accessed at no cost, permitting cost-efficient monitoring of these ecosystems in both space and time [22]. Indeed, these satellites could be effectively employed by national and local organizations in their efforts to preserve mangrove ecosystems. However, more accurate information, especially on sparse mangroves and narrow mangrove patches, requires satellite images with higher spatial resolutions, and the high cost of these images is the primary obstacle [94].
The ANN models proved to be capable of producing accurate mangrove ecosystem maps. Likewise, the results and further discussions could guide other researchers in implementing the ANN algorithm in other areas. Additionally, the potential of the ANN algorithm shown in this study suggests that satisfactory results may be achievable in other mangrove ecosystems, even under more complex conditions. For instance, accurate results are expected using the ANN algorithm in mangrove ecosystems that include other classes such as terrestrial forest, other wetland types, shrubs, and other vegetated communities. Such conditions are more complex because other vegetated communities and wetlands have higher spectral similarity with mangroves [95], which decreases the separability of classes; robust algorithms are therefore vital for accurate mapping. Furthermore, all three ANN models performed satisfactorily, and the achieved PAs and UAs (see Table 2) suggested that an ensemble of ANN models could enhance the results, which could be investigated in future studies. This is mainly because each ANN model obtained its higher accuracies in different classes over the study area.
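As one possible form of such an ensemble, which was not evaluated in this study, the class predictions of several models could be combined by majority voting. The NumPy sketch below assumes non-negative integer class labels and resolves ties in favor of the lowest label; the function name and the tie rule are assumptions.

```python
import numpy as np


def majority_vote(predictions):
    """Combine label predictions of shape (n_models, n_samples) by majority vote.

    Labels must be non-negative integers; ties go to the lowest label.
    """
    preds = np.asarray(predictions)
    n_classes = int(preds.max()) + 1
    # counts[k, j] is the number of models that predicted class k for sample j
    counts = np.apply_along_axis(np.bincount, 0, preds, minlength=n_classes)
    return counts.argmax(axis=0)


if __name__ == "__main__":
    votes = [[0, 1, 2, 1],   # e.g., model trained with Adam
             [0, 1, 1, 2],   # e.g., model trained with LBFGS
             [1, 1, 2, 0]]   # e.g., model trained with SGD
    print(majority_vote(votes))
```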
In this study, the satellite images were collected from GEE. This cloud-based platform uses high-performance parallel computing that allows preprocessing steps to be applied to numerous images [20,68]. Consequently, it reduces the time required for preprocessing steps, which are largely repetitive procedures [46,96]. Despite these considerable advantages, this cloud platform does not currently support ANN models in its base form (i.e., JavaScript API); hence, the experiments had to be conducted on another platform. Here, the Google Colab platform and the scikit-learn package were used so that the implemented methodology can be applied at a low cost by users around the world.
The results of this study confirmed the capability of the implemented ANN model for mangrove ecosystem mapping. Accordingly, this algorithm could be implemented to produce an accurate baseline of mangrove ecosystems in other regions. Additionally, the use of free-of-charge satellite images (i.e., Sentinel-1 and Sentinel-2) and cloud computing platforms allows for frequent mangrove mapping at a low cost. In this regard, the temporal evolution of mangrove ecosystems can be mapped both for previous years, subject to the availability of Sentinel-1 and Sentinel-2 images, and for the following years. This framework permits consistent monitoring of mangrove ecosystems and creates the opportunity to enact practical workflows to preserve these natural resources, especially in protected areas, from adverse anthropogenic and natural processes. Furthermore, frequent monitoring through such a reproducible approach can assist in assessing the applied practices, such as conservation planning and mangrove plantation, to support sustainable development.

Impact of Data Standardization
Input data transformation has been recognized as an impactful preprocessing task when using machine learning algorithms such as ANN models [97]. Accordingly, the remote sensing data of this study were transformed using the standard scaler approach (i.e., removing the mean and scaling to unit variance). However, an analysis was also performed to investigate the impact of data transformation/standardization on ANN models based on the three learning algorithms. ANN models with different topologies were retrained using raw input features. Based on Figure 9, none of the ANN models fully retained the behavior observed with transformed input data (see Figure 4). In particular, the ANN models with the LBFGS and SGD learning algorithms failed to converge in almost all cases. However, the ANN model with the Adam learning algorithm performed more consistently. This analysis demonstrated the capability of the Adam learning algorithm to handle untransformed input data.
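The standard scaler transformation mentioned above can be sketched in a few lines of NumPy. The statistics are estimated on the training samples and then reused for any other samples, which is the same transformation that the scikit-learn `StandardScaler` applies; the function names and the example feature scales (a backscatter-like band and a reflectance-like band) are hypothetical.

```python
import numpy as np


def fit_standardizer(X_train):
    """Return the per-feature mean and standard deviation of the training data."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0.0] = 1.0  # leave constant features unscaled
    return mean, std


def standardize(X, mean, std):
    """Remove the mean and scale to unit variance using the given statistics."""
    return (X - mean) / std


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two hypothetical features on very different scales.
    X = np.column_stack([rng.normal(-15.0, 3.0, 1000),
                         rng.normal(2500.0, 800.0, 1000)])
    mean, std = fit_standardizer(X)
    Z = standardize(X, mean, std)
    print(Z.mean(axis=0), Z.std(axis=0))
```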

Figure 9. The F-score values of the grid search of the artificial neural network (ANN) algorithm with different topologies (i.e., number of layers and neurons) and three learning algorithms using untransformed input data.

Impact of Limited Training Samples
Training samples are required to support the training phase of any supervised machine learning algorithm. Furthermore, it is widely acknowledged that collecting training samples, either through in situ field campaigns or visual interpretation of high-resolution images, is time consuming and resource intensive [98]. Therefore, it is desirable to develop efficient approaches or incorporate robust machine learning algorithms that require a limited number of training samples [99,100]. Accordingly, the impact of limited training samples on the performance of the ANN models was also explored. To this end, only a limited number of training samples from the original set (see Section 3.1) was considered in this section. In particular, the number of training samples for each class was set between 10 and 500 to evaluate the performance of ANN models. Figure 10 illustrates the outcome of using a limited number of training samples for mangrove ecosystem mapping. All three ANN models experienced a drop in F-score compared with the case of using all training samples. For example, the F-score values of the ANN models with the Adam, LBFGS, and SGD learning algorithms decreased by 7%, 16%, and 7%, respectively, considering their best case using a limited number of training samples. Furthermore, Figure 10 shows that the LBFGS tended to obtain relatively lower accuracies when the number of training samples was limited.
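Drawing a fixed number of training samples per class can be sketched as follows. This NumPy sketch assumes that the labels are stored in a one-dimensional array and that sampling is random without replacement; the function name, the seed handling, and the rule of keeping all samples of a class that has fewer than the requested number are assumptions.

```python
import numpy as np


def sample_per_class(y, n_per_class, seed=0):
    """Return sorted indices holding at most n_per_class samples of each class."""
    rng = np.random.default_rng(seed)
    picked = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        size = min(n_per_class, idx.size)  # keep everything if the class is small
        picked.append(rng.choice(idx, size=size, replace=False))
    return np.sort(np.concatenate(picked))


if __name__ == "__main__":
    y = np.repeat(np.arange(3), [100, 40, 5])  # hypothetical class sizes
    subset = sample_per_class(y, n_per_class=10)
    print(np.bincount(y[subset]))
```

The returned indices can then be used to subset both the feature matrix and the label array before retraining each ANN model.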

Impact of Noise Labels
The quality of training samples can directly influence the classification results. As such, the existence of noise labels (i.e., training samples with wrong labels), from any source, in the training samples can adversely affect the performance of machine learning algorithms [101]. For instance, mislabeling 25% of the training samples reduced the OA of the RF by up to 10%. Additionally, based on experimental results, the KC obtained using the KNN classifier decreased by approximately 35% when 28% of the training samples had wrong labels [101]. From this perspective, the performance of the best ANN models was examined in the presence of noise labels. To this end, the labels of a portion (in percentage) of the training samples were randomly changed to other classes. It should be noted that an equal percentage was applied to the training samples of each class. Additionally, the impact of noise labels was investigated for different numbers of training samples to identify whether there was any relationship between these two factors. Noise label percentages between 1% and 100% were tested. It was observed (see Figure 11) that the ANN models were only slightly affected at lower noise label percentages, and a more dramatic decrease in the F-score occurred when over 60% of training samples were subjected to label change. Furthermore, no direct relationship was found between the number of training samples and noise labels.
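The label corruption procedure described above can be sketched as follows: within each class, the same percentage of samples is selected at random and each selected label is changed to a different class. The NumPy code below is an illustrative sketch; the function name, the rounding rule, and the uniform choice of the replacement class are assumptions.

```python
import numpy as np


def add_label_noise(y, fraction, seed=0):
    """Return a copy of y in which `fraction` of each class has a wrong label."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    classes = np.unique(y)
    noisy = y.copy()
    for label in classes:
        idx = np.flatnonzero(y == label)
        n_flip = int(round(fraction * idx.size))
        chosen = rng.choice(idx, size=n_flip, replace=False)
        others = classes[classes != label]  # the new label must differ
        noisy[chosen] = rng.choice(others, size=n_flip)
    return noisy


if __name__ == "__main__":
    y = np.repeat(np.arange(4), 50)  # hypothetical: 4 classes, 50 samples each
    noisy = add_label_noise(y, fraction=0.2)
    print(int((noisy != y).sum()))
```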

Figure 11. The obtained F-score values for the ANN models using (a) all (16,000), (b) 5000, (c) 1000, and (d) 500 training samples with noise labels (i.e., wrong labels).

Contribution of Multi-Temporal and Multi-Source Images
Using multi-temporal and multi-source remote sensing imagery is recognized as a practical approach to obtaining more accurate land cover (e.g., wetland) classification results [102]. This is because different remote sensing data sources can provide complementary information about the Earth's surface. For instance, multi-spectral and SAR data sets provide spectral and physical properties of the Earth's surface, respectively. Furthermore, multi-temporal remote sensing can provide discriminative information about classes with dynamic characteristics, reducing the confusion among classes and improving the classification results [21,103].
Regarding the contribution of the multi-source remote sensing data, the best ANN models were used to map the mangrove ecosystem using single-source data sets. The F-score value obtained using only multi-spectral (i.e., Sentinel-2) images for the best ANN model was 0.95, which was nearly 2% lower than that obtained using multi-source data sets. Furthermore, the results revealed that using only the SAR data set could not achieve satisfactory results (i.e., F-score = 0.75). The results indicated that the use of multi-source remote sensing data could enhance the classification results and lead to accurate mangrove maps. This improvement would be more considerable in locations with more complex conditions [65].
Moreover, the best ANN model was employed to examine the effects of using multi-temporal data sets on the classification results. In this regard, single-season data sets were fed into the best ANN model, and no single-season data set achieved a higher classification accuracy than the multi-temporal data set. In particular, the spring, summer, autumn, and winter data sets achieved F-score values of 0.94, 0.91, 0.92, and 0.92, respectively. These values were 3%, 6%, 5%, and 5% lower than the accuracy of the map produced using multi-temporal data sets.

Conclusions
This paper investigated the applicability of integrating ANN models with multi-temporal and multi-source remote sensing data. The results indicated the high potential of the ANN models, especially the ANN model with the Adam learning algorithm, for mangrove ecosystem mapping (F-score = 0.97). Furthermore, the results demonstrated the higher consistency of ANN models when incorporating the Adam learning algorithm. It was also observed that all three ANN models achieved high UAs and PAs, although the highest accuracies for individual classes were reached by different models. This showed the potential for improving classification accuracy by using an ensemble approach to integrate the classification results of ANN models with different learning algorithms. Additionally, it was concluded that data standardization is an essential preprocessing step when using the SGD and LBFGS learning algorithms. The ANN models proved to be highly resistant to noise labels (i.e., training samples with wrong labels), while the classification results with a limited number of samples suffered an accuracy loss, which was more pronounced when using the LBFGS learning algorithm. Finally, it was observed that incorporating multi-temporal and multi-source remote sensing data sets could enhance the mangrove classification results, and greater improvement is expected in other complex ecosystems.
(accessed on 19 November 2021)) satellite images were collected from the Google Earth Engine cloud computing platform and are freely available for all interested users.