A Transfer Learning Technique for Inland Chlorophyll-a Concentration Estimation Using Sentinel-3 Imagery

Abstract: Chlorophyll-a (Chla) concentration, which serves as a proxy for phytoplankton in inland waters, is one of the leading indicators of water quality. Generally, water samples are analyzed in professional laboratories, and Chla concentrations are measured regularly for water quality monitoring. However, limited spatial sampling and the labor-intensive nature of data collection make global and long-term monitoring difficult. The development of remote-sensing optical sensors and technologies makes long-term monitoring of Chla concentrations over an entire water body more achievable. Many studies based on machine learning techniques, such as regression and artificial neural network (ANN) methods, have recently been proposed for Chla concentration estimation using optical satellite images. Machine-learning-based methods can achieve accurate estimation. However, overfitting may arise because in situ Chla datasets are generally too small to train a complicated machine learning model, which limits the applicability of the trained models. In this study, an ANN model containing three convolutional and two fully connected layers with 4953 unknown parameters is designed. A transfer learning method, consisting of model pretraining, main-training, and fine-tuning stages, is proposed to ease the problem of insufficient in situ samples. In the model pretraining stage, the ANN model is pretrained and initialized using samples derived from an existing Chla concentration model. The pretrained ANN model is then fine-tuned using the proposed transfer learning technique with in situ samples collected during five campaigns in early 2019 at Laguna Lake, the Philippines. Before transfer learning, data augmentation and rebalancing are applied to enrich the variability of the in situ samples and to distribute them near-uniformly in Chla concentration space, respectively.
To evaluate how well model overfitting was alleviated, the ANN model trained on the in situ dataset from Laguna Lake was tested on an in situ dataset obtained in 2019 from Lake Victoria, Uganda, which has a trophic state similar to that of Laguna Lake. The experimental results from Sentinel-3 imagery indicated that the overfitting problem was significantly alleviated and that the trained ANN model outperformed related models in terms of the root-mean-squared error of the estimated Chla concentrations.


Introduction
Lakes are land-surrounded water bodies that generally provide freshwater for daily human needs. For instance, water from Lake Biwa, Japan, is used as a drinking water resource for people in Osaka and Kyoto, and the lake has been maintained as a conservation ecosystem with good water quality [1]. In Indonesia, a freshwater treatment plant, namely, PDAM Kabupaten Kerinci, was built around Lake Kerinci in Jambi to take, store, filter, and distribute water to people living nearby [2]. Meanwhile, the worldwide demand for

Table 1. Description of trophic state index [8].

Trophic class | Chla concentration range (µg/L) | Water condition
--- | --- | ---
Oligotrophic | 0-2.6 | A lake with very clear water and high drinking water quality due to low nutrient content and algal production.
Mesotrophic | 2.6-20 | Commonly clear-water lakes with beds of submerged aquatic plants and medium levels of nutrients.
Eutrophic | 20-56 | The water body is dominated by either aquatic plants or algae.
Hypertrophic | > 56 | Highly nutrient-rich lakes characterized by frequent and severe nuisance algal blooms and low transparency.
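The class ranges in Table 1 can be applied directly, for example to check whether a sample is mesotrophic, which is the criterion later used to select comparable lakes. The following is a minimal Python sketch with a hypothetical function name. Table 1 does not say which class a boundary value belongs to, so values of exactly 2.6 and 20 µg/L are assumed to fall in the higher class. A value of 56 µg/L is kept eutrophic because hypertrophic is defined as "more than 56".

```python
def trophic_class(chla_ug_per_l: float) -> str:
    """Return the Table 1 trophic class for a Chla concentration in µg/L.

    Assumption: 2.6 and 20 µg/L are assigned to the higher class; 56 µg/L
    stays eutrophic because hypertrophic is defined as "more than 56".
    """
    if chla_ug_per_l < 0:
        raise ValueError("Chla concentration cannot be negative")
    if chla_ug_per_l < 2.6:
        return "Oligotrophic"
    if chla_ug_per_l < 20:
        return "Mesotrophic"
    if chla_ug_per_l <= 56:
        return "Eutrophic"
    return "Hypertrophic"


print(trophic_class(8.5))  # Mesotrophic
```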
Appl. Sci. 2022, 12, 203

Another promising approach to estimating Chla concentrations is an artificial neural network (ANN). Buckton et al. [21] proposed a fully connected neural network with one hidden layer that demonstrated the capability of ANNs for Chla concentration estimation. Similar work was also conducted by other researchers [22][23][24]. Hafeez et al. [25] designed several fully connected neural networks and searched for the optimal hyperparameters, including the number of hidden layers and the number of neurons per layer. The study also showed that the optimal ANN model outperformed other machine learning methods, including random forest, cubist regression, and support vector regression, in Chla concentration estimation. Furthermore, several researchers used convolutional neural networks (CNNs), which incorporate neighborhood spectral information into Chla concentration modelling through convolutional layers with 3D kernels [26][27][28][29].
Training an ANN model requires a large amount of labelled data, that is, in situ Chla concentrations as outputs paired with the corresponding remote-sensing reflectance (Rrs) from satellite images as inputs, as well as initial values for the unknown parameters. Pyo et al. [28] constructed a CNN model with more than 2000 unknown parameters and trained it using only 238 labelled data. Meanwhile, Aptoula and Ariman [26] used 320 labelled data to train a CNN model containing 2432 unknown parameters. However, overfitting may arise when so few labelled data are used to search for the optimal values of thousands of unknown parameters during model training. Nguyen et al. [30] applied data augmentation to enrich the labelled data; however, they did not consider the data imbalance problem, which may affect estimation accuracy. Furthermore, some researchers used simulated datasets instead of in situ Chla concentration data to deal with the insufficiency of labelled data [31][32][33]. In a simulated dataset, the Chla concentrations are obtained from an existing model rather than from field measurements. This procedure solves the labelled data insufficiency; however, training a neural network model with a simulated dataset may not reach the global optimum of the defined loss function. Syariz et al. [34] proposed a two-stage training method, in which the model is first pretrained using a simulated dataset and then retrained using an in situ dataset. The advantage of this method is that pretraining provides good initial values for the unknown parameters before the main training process with the in situ dataset. This approach can train an ANN model reasonably well for Chla concentration estimation. However, overfitting is not fully alleviated because of the limited variability of the training samples and the imbalance among them.
In this study, the main objectives were (1) to propose a transfer learning technique based on the two-stage training approach to improve Chla concentration estimation accuracy; (2) to enrich and balance the Chla-labelled data using data augmentation and rebalancing techniques; and (3) to test the ANN model, trained on the in situ dataset from Laguna Lake with the improved two-stage transfer learning approach, using the in situ dataset acquired from Lake Victoria, Uganda. To evaluate the effectiveness of the proposed model learning methods, an ANN model, namely WaterNet, first proposed by Syariz et al. [34], was adopted. The input to WaterNet was a water-body image patch of size 7 (width) × 7 (height) × 16 (bands), and the output was the estimated Chla concentration at the center pixel of the input patch. By increasing the accuracy of Chla concentration retrieval in lake water bodies, the proposed transfer learning method can help governments better understand the state of lake water and develop management plans to prevent water quality degradation and maintain freshwater supplies in the future. The remainder of the paper is organized as follows. Section 2 describes the study area and the data acquisition and preprocessing. Section 3 elaborates on the proposed transfer learning technique, data augmentation, and data rebalancing. Section 4 presents the experimental results and compares the performance of the trained ANN model with related models, and Section 5 provides the conclusions and future work.

Data Materials and Preprocessing
The in situ dataset acquired from Laguna Lake, the Philippines, was used to train the proposed ANN model, while the in situ dataset acquired from Lake Victoria, Uganda, which has a trophic state (i.e., mesotrophic) similar to that of Laguna Lake, was used to test the trained model. The acquisition of these two datasets is described in Sections 2.1 and 2.2. The Sentinel-3 imagery used for Chla estimation and the data preprocessing are described in Section 2.3.

Laguna Lake of the Philippines
Laguna Lake, with an area of 900 km² and an average depth of 2.5 m, is the largest lake in the Philippines. More than 20 million people live in the areas surrounding Laguna Lake, indicating the importance of the lake in providing freshwater for local daily needs [35]. However, around 17% of the lake water body (~150 km²) is occupied by aquaculture cages, and nutrients and hazardous substances from industrial activity may pollute Laguna Lake, in addition to the issues of rapid population growth, industrialization, and urbanization [36,37]. In this study, field measurements of Chla concentrations were conducted during five campaigns in 2019, as shown in Figure 1. An Infinity-CLW ACLW2-USB, an optical data logger for measuring Chla concentrations, was installed on a boat at a depth of 0.5 m below the water surface. The data logger recorded Chla concentrations once per second during each 5-hour field survey, collecting more than 15,000 records per campaign. Outlier removal and data downsampling were conducted to remove noise and to match the sampling resolution of the Chla concentrations to the spatial resolution of the Sentinel-3 images, respectively. After this preprocessing, 257 in situ Chla samples were obtained from the five field campaigns, as shown in Table 2, and these samples were used to train the ANN model and the related models for comparison and evaluation.
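The text does not specify the outlier-removal rule or the downsampling method. The Python sketch below is one possible approach under stated assumptions. It removes outliers with the interquartile-range (IQR) rule and then averages the remaining logger records that fall within the same Sentinel-3 pixel. The IQR rule, the factor k = 1.5, and the (row, col, chla) record format are all hypothetical choices, not the authors' procedure.

```python
from collections import defaultdict
from statistics import mean, quantiles


def downsample_to_pixels(records, k=1.5):
    """Aggregate 1 Hz logger records to one Chla value per Sentinel-3 pixel.

    records: list of (row, col, chla) tuples, where (row, col) is the image
    pixel containing the boat position (hypothetical input format).
    Assumption: outliers are values outside [Q1 - k*IQR, Q3 + k*IQR].
    """
    chla_values = [chla for _, _, chla in records]
    q1, _, q3 = quantiles(chla_values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr

    per_pixel = defaultdict(list)
    for row, col, chla in records:
        if low <= chla <= high:  # keep only non-outlier records
            per_pixel[(row, col)].append(chla)

    # One in situ sample per pixel: the mean of its remaining records.
    return {pixel: mean(values) for pixel, values in per_pixel.items()}
```

Averaging per pixel is what makes the ground data comparable to a single Sentinel-3 pixel value. A median would be an equally reasonable, more outlier-robust choice.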

Table 2. Statistical summary of the in situ samples from Laguna Lake. "Min", "Max", "Mean", and "Std." represent the minimum, maximum, mean, and standard deviation of the Chla concentrations, respectively.

Lake Victoria of Uganda
The in situ Chla concentrations from Lake Victoria were obtained from the Mendeley Online Database (https://data.mendeley.com/ (accessed on 3 August 2021)), as provided by Deirmendjan et al. [38]. The study in [38] investigated dissolved organic matter (DOM) with the support of the Lake Victoria Greenhouse Gas Dynamics (LAVIGAS) project. This project had three campaign periods: 29 March to 8 April 2018, 25 October to 4 November 2018, and 7 June to 17 June 2019. During each period, water samples for Chla concentration measurement were collected daily at depths ranging from 1 to 40 m. Three selection criteria were applied: (1) the samples should have the same trophic state as Laguna Lake (2.6-20 µg/L) and a measurement depth similar to that at Laguna Lake (0.5 m); (2) the Chla sampling time should match the Sentinel-3 image acquisition time; and (3) the Sentinel-3 image pixels corresponding to the collected Chla samples should be cloud-free. Only two in situ samples, shown in Figure 2, satisfied these criteria. These two samples were used to evaluate the inference performance of the trained model and to compare it with related models.
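The three selection criteria can be expressed as a single filter. The following is a minimal Python sketch with a hypothetical record structure and function name. It assumes that "similar depth" means at most 1 m, the shallowest Lake Victoria sampling depth. It also assumes that a time match allows a gap of up to one day, because the LV2 sample was paired with an image acquired the day before. Both thresholds are illustrative parameters rather than values stated by the authors.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class VictoriaSample:
    """One in situ record (hypothetical structure for illustration)."""
    chla: float                 # Chla concentration in µg/L
    depth_m: float              # sampling depth in metres
    sample_date: date
    image_date: Optional[date]  # date of the Sentinel-3 image used, if any
    pixel_cloud_free: bool


def is_usable(s: VictoriaSample, max_depth_m: float = 1.0, max_day_gap: int = 1) -> bool:
    """Apply the three selection criteria (thresholds are assumptions)."""
    mesotrophic = 2.6 <= s.chla < 20    # criterion (1): trophic state
    shallow = s.depth_m <= max_depth_m  # criterion (1): depth
    time_match = (                      # criterion (2): image timing
        s.image_date is not None
        and abs((s.sample_date - s.image_date).days) <= max_day_gap
    )
    return mesotrophic and shallow and time_match and s.pixel_cloud_free  # (3)
```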

Sentinel-3 Image Dataset
Fifteen Level-2 water full resolution (WFR) images of Laguna Lake, acquired by the Ocean and Land Color Instrument (OLCI) of Sentinel-3, were used. A Sentinel-3 WFR image contains 16 atmospherically corrected bands. Bands 13-15 (λ761, λ764, λ767) and bands 18-19 (λ885, λ900), which are mainly designed for atmospheric correction, are excluded [39]. The water-leaving reflectance in Sentinel-3 WFR images is divided by π to derive the remote-sensing reflectance Rrs. In addition, the Sentinel-3 WFR product contains several water quality parameters, including Chla concentrations estimated using an inverse radiative transfer model-neural network (IRTM-NN) [40]. The Chla concentrations from the IRTM-NN were regarded as a simulated dataset in this study and were used for model pretraining.
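The band selection and the reflectance conversion can be written down in a few lines. The sketch below assumes that the water-leaving reflectances of a pixel are already available as a Python list. Reading the WFR product files is outside its scope, and the constant and function names are illustrative.

```python
import math

# OLCI records 21 bands (Oa01-Oa21). Following the text, bands 13-15 and
# 18-19, which mainly serve atmospheric correction, are excluded.
EXCLUDED_BANDS = {13, 14, 15, 18, 19}
USED_BANDS = [band for band in range(1, 22) if band not in EXCLUDED_BANDS]


def to_rrs(water_leaving_reflectance):
    """Convert water-leaving reflectance values to remote-sensing reflectance Rrs."""
    return [rho_w / math.pi for rho_w in water_leaving_reflectance]


print(len(USED_BANDS))  # 16
```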
In the image data preprocessing, cloud-free water pixels in the Rrs images and their neighboring 7 × 7 local patches were extracted. Image patches containing non-water pixels, such as cloud, or pixels with negative Rrs values caused by imprecise atmospheric correction or cloud shadow were excluded from the dataset, yielding full-water Rrs image patches. A summary of the Rrs image patches is presented in Table 3. Similarly, the cloud-free Sentinel-3 image patches at the locations of the simulated Chla concentrations generated by the IRTM-NN were extracted. These Rrs water patches and their corresponding simulated Chla data were used as the training set for model pretraining, denoted as $\{(K_i, s\_chla_i)\}_{i=1}^{n}$. Here, n denotes the number of simulated labelled data, and $K_i$ and $s\_chla_i$ represent the i-th Rrs water patch and its corresponding simulated Chla concentration, respectively. There were a total of 47,231 simulated labelled data. In addition, 275 in situ Chla data over Laguna Lake and their corresponding Rrs water patches were used as the retraining dataset, denoted as $\{(P_i, t\_chla_i)\}_{i=1}^{m}$. Here, m represents the number of in situ Chla samples, and $P_i$ and $t\_chla_i$ represent the i-th Rrs water patch and its corresponding in situ Chla concentration, respectively. One Sentinel-3 WFR Rrs image of Lake Victoria, acquired on 15 June 2019, was also obtained, and a 7 × 7 Rrs water patch located at field measurement point LV1 was extracted from it. Field measurement LV2 was taken on 16 June 2019, but its water patch was also extracted from the image acquired on 15 June 2019. The estimation for LV2 therefore used an image acquired one day before the field measurement.
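The patch-screening rule can be sketched as follows. This minimal Python illustration assumes that the Rrs image is held as nested lists indexed [row][col][band], together with a Boolean water mask in which cloud and land pixels are False. The function name and data layout are hypothetical.

```python
from typing import List, Optional

Patch = List[List[List[float]]]  # indexed [row][col][band]


def extract_full_water_patch(rrs, water_mask, row, col, size=7) -> Optional[Patch]:
    """Return the size x size patch centered at (row, col), or None if rejected.

    A patch is rejected if it extends beyond the image, contains a non-water
    pixel (e.g., cloud or land), or contains any negative Rrs value.
    """
    half = size // 2
    n_rows, n_cols = len(rrs), len(rrs[0])
    if not (half <= row < n_rows - half and half <= col < n_cols - half):
        return None  # patch would fall partly outside the image
    patch = []
    for r in range(row - half, row + half + 1):
        patch_row = []
        for c in range(col - half, col + half + 1):
            if not water_mask[r][c]:
                return None  # non-water pixel such as cloud
            if any(value < 0 for value in rrs[r][c]):
                return None  # negative Rrs, e.g., from imprecise atmospheric correction
            patch_row.append(list(rrs[r][c]))
        patch.append(patch_row)
    return patch
```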
For stable model training, the water patches from Laguna Lake and Lake Victoria containing Rrs at 16 spectral bands were normalized to the range [0, 1] using the minimum and maximum Rrs values at each spectral wavelength. The same normalization was also applied to the in situ and simulated Chla concentration data.
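Per-band min-max normalization and its inverse can be sketched as below. The following assumptions apply. The minimum and maximum are computed once over a list of patches and reused for every patch. A constant band maps to 0. A normalized network output is converted back to µg/L with the Chla minimum and maximum. The text implies this inverse step but does not describe it. All function names are hypothetical.

```python
import math


def fit_band_ranges(patches):
    """Per-band minimum and maximum Rrs over a list of [row][col][band] patches."""
    n_bands = len(patches[0][0][0])
    mins, maxs = [math.inf] * n_bands, [-math.inf] * n_bands
    for patch in patches:
        for row in patch:
            for spectrum in row:
                for band, value in enumerate(spectrum):
                    mins[band] = min(mins[band], value)
                    maxs[band] = max(maxs[band], value)
    return mins, maxs


def scale(value, low, high):
    """Min-max scale a value to [0, 1]; a constant band maps to 0."""
    return (value - low) / (high - low) if high > low else 0.0


def normalize_patch(patch, mins, maxs):
    """Scale every band of every pixel in a patch with its own band range."""
    return [[[scale(v, mins[b], maxs[b]) for b, v in enumerate(spectrum)]
             for spectrum in row] for row in patch]


def unscale(scaled, low, high):
    """Invert the scaling, e.g., to turn a normalized Chla estimate back into µg/L."""
    return low + scaled * (high - low)
```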

Artificial Neural Network Model
An ANN model, namely WaterNet, proposed by Syariz et al. [34], was adopted. As shown in Figure 3, the input to the model was an image patch of size 7 × 7 × 16, and the output was the estimated Chla concentration at the center pixel of the patch. The model is an end-to-end network consisting of three phases: band expansion, feature extraction, and Chla concentration estimation. In the band expansion phase, there were three convolutional layers with 1 × 1 × 3 kernel filters. These filters perform convolution in the spectral domain to augment the spectral features derived from the bands of the input image patch. This approach is also known as spectral feature extraction via band combination [41][42][43]. In the feature extraction phase, two convolutional layers, containing ten filters of size 3 × 3 × 42 and five filters of size 3 × 3 × 10, respectively, were used to extract spatial feature information. The output of this phase was a feature map of size 3 × 3 × 5, which was flattened and passed to the Chla concentration estimation phase, consisting of two fully connected layers. The rectified linear unit (ReLU) and sigmoid functions were used as the activation functions in the convolutional and fully connected layers, respectively. In total, this ANN model contained 4753 unknown parameters.
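The feature-extraction dimensions can be checked with simple arithmetic. The sketch below assumes stride 1, no padding, and one bias per filter. Under these assumptions it reproduces the stated 3 × 3 × 5 output from a 7 × 7 input. The band-expansion layers and the width of the fully connected layers are not fully specified in the text. The sketch therefore counts only the two feature-extraction layers and does not attempt to reproduce the reported total.

```python
def valid_conv_output(size: int, kernel: int, stride: int = 1) -> int:
    """Spatial output size of a convolution without padding."""
    return (size - kernel) // stride + 1


def conv_parameters(n_filters: int, kh: int, kw: int, in_channels: int) -> int:
    """Number of weights plus one bias per filter."""
    return n_filters * (kh * kw * in_channels + 1)


# Feature extraction phase: 10 filters of 3 x 3 x 42, then 5 filters of 3 x 3 x 10.
size = valid_conv_output(7, 3)     # 7 x 7 -> 5 x 5
size = valid_conv_output(size, 3)  # 5 x 5 -> 3 x 3
flattened = size * size * 5        # 3 x 3 x 5 feature map -> 45 inputs to the dense layers
feature_params = conv_parameters(10, 3, 3, 42) + conv_parameters(5, 3, 3, 10)

print(size, flattened, feature_params)  # 3 45 4245
```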

ANN Model Training
Using insufficient in situ Chla concentration data and unsuitable initial values for the unknown parameters in ANN model training may lead to model overfitting and make the loss function difficult to converge. Syariz et al. [34] proposed a two-stage training approach, consisting of pretraining and main-training, which was shown to address these problems. The first stage provides better initial values for the unknown parameters before the main stage by pretraining the model with the simulated labelled data $\{(K_i, s\_chla_i)\}_{i=1}^{n}$. In this stage, the estimation error is large, and backpropagating it may lead to suboptimal spatial feature extraction. Moreover, because simulated data are used, the loss function may not converge to its global minimum. Nevertheless, this stage gives the model suitable initial values for the unknown parameters before the main training stage. The pretrained model is then refined with the in situ labelled data $\{(P_i, t\_chla_i)\}_{i=1}^{m}$. This procedure is also known as transfer learning.
In this study, the two-stage training was adopted, and the main stage was improved by incorporating fine-tuning, another kind of transfer learning technique. Moreover, data augmentation and rebalancing were proposed and performed before training in the improved main stage. The aim was to obtain more in situ labelled data with a balanced number of samples across the Chla concentration distribution. Details regarding the data augmentation and rebalancing and the proposed transfer learning approach are explained below.

Data Augmentation and Rebalancing
To enrich the variability of the in situ Chla dataset, data augmentation was implemented, as convolutional processing is insensitive to rotation and scale [44,45]; however, augmentation alone does not address the balance of the data. In this study, data augmentation was performed on the in situ labelled data $\{(P_i, t\_chla_i)\}_{i=1}^{m}$ by rotating the image patches by 90°, 180°, and 270° and flipping the rotated patches from left to right. These operations leave the center pixel of a patch in place. Each rotated or flipped patch was therefore linked to the Chla concentration of its original patch to form a new dataset, namely, the augmented dataset, containing q rotated and flipped image patches (2216 in total). The augmented dataset was further divided into 12 classes, each covering 0.5 µg/L, with the first class starting at 6 µg/L and the last class ending at 12 µg/L, as shown in Figure 4. Figure 4a shows the frequency of the in situ Chla concentrations in the augmented dataset. As seen, the number of samples differs greatly between classes, indicating an imbalanced distribution of the data. Training the model with imbalanced data may reduce the achievable accuracy, so data rebalancing is necessary. For this purpose, a sample rebalancing technique was applied that randomly removed rotated and flipped in situ labelled data from any class containing more than 100 samples (see Figure 4b). This kept the number of samples in each class at or below 100, thereby balancing the data. In total, the data augmentation and rebalancing produced 900 rebalanced in situ labelled data $\{(q_i, n\_chla_i)\}_{i=1}^{r}$. Here, r denotes the number of rebalanced in situ labelled data, and $q_i$ and $n\_chla_i$ represent the i-th Rrs water patch and its corresponding in situ Chla concentration, respectively. This rebalanced dataset also includes the original data $\{(P_i, t\_chla_i)\}_{i=1}^{m}$. For simplicity, the dataset variants are summarized below.
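- the simulated labelled data $\{(K_i, s\_chla_i)\}_{i=1}^{n}$;
- the in situ labelled data, also referred to as the original dataset, $\{(P_i, t\_chla_i)\}_{i=1}^{m}$;
- the augmented dataset of q rotated and flipped samples; and
- the rebalanced dataset $\{(q_i, n\_chla_i)\}_{i=1}^{r}$.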
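The augmentation and rebalancing steps can be sketched in plain Python as follows. This is a minimal illustration that rests on four assumptions:

- Patches are square nested lists indexed [row][col][band].
- Each original patch contributes its 90°, 180°, and 270° rotations plus the left-right flip of each rotation, which is our reading of the text.
- Values outside 6-12 µg/L are assigned to the nearest edge class.
- Only augmented samples are removed, so every original sample is kept.

The function names are hypothetical.

```python
import random
from collections import defaultdict


def rot90(patch):
    """Rotate a square [row][col][band] patch by 90 degrees counter-clockwise."""
    n = len(patch)
    return [[patch[c][n - 1 - r] for c in range(n)] for r in range(n)]


def flip_lr(patch):
    """Flip a patch from left to right."""
    return [list(reversed(row)) for row in patch]


def augment(patch, chla):
    """Rotations by 90, 180, and 270 degrees and their left-right flips.

    The center pixel stays in place, so every new patch keeps the label chla.
    """
    samples, rotated = [], patch
    for _ in range(3):
        rotated = rot90(rotated)
        samples.append((rotated, chla))
        samples.append((flip_lr(rotated), chla))
    return samples


def chla_class(chla, start=6.0, width=0.5, n_classes=12):
    """Index of the 0.5 µg/L class; out-of-range values go to an edge class (assumption)."""
    return min(max(int((chla - start) // width), 0), n_classes - 1)


def rebalance(originals, augmented, cap=100, seed=0):
    """Randomly drop augmented samples so that each class holds at most `cap` samples."""
    rng = random.Random(seed)
    classes = defaultdict(lambda: ([], []))  # class index -> (originals, augmented)
    for sample in originals:
        classes[chla_class(sample[1])][0].append(sample)
    for sample in augmented:
        classes[chla_class(sample[1])][1].append(sample)

    rebalanced = []
    for orig, aug in classes.values():
        room = max(cap - len(orig), 0)  # originals are always kept
        if len(aug) > room:
            aug = rng.sample(aug, room)
        rebalanced.extend(orig + aug)
    return rebalanced
```

Keeping the originals and removing only augmented copies means that no measured sample is lost during rebalancing. A class whose originals alone exceeded the cap would stay above it, which the text does not address.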

Transfer Learning
In this study, the two-stage training was adopted, and the main stage was improved by incorporating fine-tuning. The main stage of the proposed transfer learning consists of two sub-stages, as follows.

1. Main-training stage. Thanks to the pretraining stage, the ANN model already contains suitable values for the unknown parameters. Training the model with the rebalanced dataset increases the likelihood that the search reaches the global minimum of the loss function, which means higher estimation accuracy and a smaller estimation error. The error is backpropagated to update the unknown parameters, making the spatial features more robust.

2. Fine-tuning stage. After the previous stage, the spatial feature extraction is already powerful, and continuing to train the previously trained ANN model on the rebalanced dataset may only degrade the spatial features. Therefore, fine-tuning is performed in this stage by means of "network surgery". First, the model is split into two parts. The body consists of the first and second phases of the ANN model (band expansion and feature extraction), and the head consists of the last phase, the Chla concentration estimation phase. The head is then removed, leaving only the body, so that inputting an image patch to the body yields a spatial feature image. In machine learning, a network truncated in this way is known as a feature extractor. Next, a new head with the same structure as the last phase, but with random initial values for the unknown parameters, is attached to the body. If the gradient from these random values were allowed to backpropagate all the way through the network, the powerful spatial features could be damaged.
To prevent this problem, the layers in the body, i.e., the first and second phases of the model, are frozen (set as untrainable), so that backpropagation during training updates only the new head. This allows the network to start learning from the powerful spatial features and to optimize the Chla concentration estimation. Lastly, all layers are unfrozen (set as trainable). Unlike the previous stage and sub-stage, in which training is conducted with a learning rate of 0.001, the learning rate is now set to a very small value of 0.0001. This very small learning rate allows suitable, gradual adjustment of both the body and the head. Figure 5 shows the workflow of the fine-tuning stage.
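The main-stage schedule and the effect of freezing can be illustrated without a deep learning framework. The sketch below makes several assumptions. The model parameters are represented as two flat lists, "body" and "head", which is a hypothetical layout. Plain gradient descent stands in for the Adam optimizer used in the study. The uniform(-0.1, 0.1) initialization of the new head is illustrative. The learning rates follow the text, and the gradients themselves would come from backpropagating the loss, which is not shown.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SubStage:
    name: str
    trainable: frozenset  # parameter groups updated in this sub-stage
    learning_rate: float


# Learning rates follow the text: 0.001, then 0.0001 once all layers are unfrozen.
MAIN_STAGE = [
    SubStage("main-training", frozenset({"body", "head"}), 1e-3),
    SubStage("fine-tuning, body frozen", frozenset({"head"}), 1e-3),
    SubStage("fine-tuning, all layers unfrozen", frozenset({"body", "head"}), 1e-4),
]


def replace_head(params, rng):
    """Network surgery: discard the trained head and attach a randomly initialized one."""
    params["head"] = [rng.uniform(-0.1, 0.1) for _ in params["head"]]


def gradient_step(params, grads, stage):
    """One plain gradient-descent update; groups not listed as trainable stay frozen."""
    for group in stage.trainable:
        params[group] = [w - stage.learning_rate * g
                         for w, g in zip(params[group], grads[group])]
```

In a full implementation, `replace_head` would be called between the first and second entries of `MAIN_STAGE`, and each sub-stage would run for its own epochs on the rebalanced dataset.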
1. Main-training stage. Thanks to the pretraining stage, the unknown parameters of the ANN model already have suitable values. Training them with the rebalanced dataset therefore increases the likelihood that the optimization reaches the global minimum of the loss function, which means a more accurate estimation, i.e., a smaller estimation error. The error is backpropagated to update the unknown parameters, making the spatial features more robust.
2. Fine-tuning stage. After the main-training stage, the spatial feature extraction is already powerful, and continuing to train the whole ANN model with the rebalanced dataset may only degrade the learned spatial features. Therefore, fine-tuning is performed in this stage by means of "network surgery". First, the model is split into two parts: the body, consisting of the first and second phases of the ANN model, and the head, consisting of the last phase, i.e., the Chla concentration estimation phase. The head is then removed, leaving only the body; feeding an image patch to the body alone yields a spatial feature image. In machine learning, a network truncated in this way is known as a feature extractor. Next, a new head with the same structure as the last phase and randomly initialized unknown parameters is attached to the body. If the gradient originating from these random values were allowed to backpropagate through the whole network, the powerful spatial features could be damaged. To prevent this, the layers in the body (the first and second phases of the model) are frozen, i.e., set as untrainable, so that backpropagation updates only the new head during training. This allows the network to start learning from the established spatial features while the Chla concentration estimation is optimized.
Lastly, all of the layers are unfrozen, i.e., set as trainable. Unlike the previous stage and sub-stage, in which training is conducted with a learning rate of 0.001, the learning rate is now set to a very small value of 0.0001, so that the body and head parts are adjusted only slightly to suit each other. Figure 5 summarizes the workflow of the fine-tuning stage.

For the hyperparameters, the Adam optimizer is employed because it adaptively tunes the learning rate and moments [46], and the mean squared error (MSE) is used as the loss function L, defined as follows:

$$L = \frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{prChla}_i - \mathrm{Chla}_i\right)^2 \tag{1}$$

where $N$ is the number of labelled data, $\mathrm{prChla}_i$ is the prediction or estimation of the Chla concentration from the input image patch of the i-th labelled data, and $\mathrm{Chla}_i$ is the corresponding in situ Chla concentration. Moreover, overfitting is alleviated by two regularization techniques: dropout and L2 regularization. The dropout rate is set to 0.5, meaning that during training each unit is randomly and temporarily deactivated with a probability of 50% when the loss is computed, whereas L2 regularization adds the squared Frobenius norm of the weights to the loss function to penalize large weights during error backpropagation. The maximum number of epochs is set to 30, and the network from the epoch with the smallest loss value is stored and used for Chla concentration estimation.
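The three-stage schedule and the network surgery described above can be sketched with a deliberately tiny, pure-Python stand-in for the ANN. This is a minimal illustration under stated assumptions, not the authors' implementation: the model is reduced to one scalar "body" weight plus a scalar "head" weight and bias, the names `train`, `fine_tune`, and `transfer_learning` are hypothetical, the initial values are arbitrary, and plain gradient descent on the MSE replaces the Adam optimizer, dropout, and L2 regularization used in the paper. Only the freezing logic and the learning rates (0.001, then 0.0001) follow the text.

```python
import random

# Hypothetical toy model: y_hat = w_head * (w_body * x) + b_head.
# "w_body" stands in for the band extension / feature extraction phases,
# "w_head" and "b_head" for the Chla concentration estimation phase.
BODY = ["w_body"]
HEAD = ["w_head", "b_head"]


def predict(params, x):
    return params["w_head"] * (params["w_body"] * x) + params["b_head"]


def mse(params, xs, ys):
    return sum((predict(params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradients(params, xs, ys):
    """Analytic gradients of the MSE loss with respect to every parameter."""
    n = len(xs)
    g = {name: 0.0 for name in BODY + HEAD}
    for x, y in zip(xs, ys):
        err = predict(params, x) - y
        g["w_body"] += 2 * err * params["w_head"] * x / n
        g["w_head"] += 2 * err * params["w_body"] * x / n
        g["b_head"] += 2 * err / n
    return g


def train(params, xs, ys, trainable, lr, epochs):
    """Gradient descent that updates only the parameters listed in `trainable`.

    Parameters left out of `trainable` are frozen: their gradients are
    computed but never applied.
    """
    for _ in range(epochs):
        g = gradients(params, xs, ys)
        for name in trainable:
            params[name] -= lr * g[name]
    return params


def fine_tune(params, xs, ys, epochs=200, seed=0):
    rng = random.Random(seed)
    # Network surgery: drop the old head, attach a randomly initialised one.
    params = dict(params, w_head=rng.uniform(-0.1, 0.1), b_head=0.0)
    # Sub-stage (a): body frozen, backpropagation updates the new head only.
    train(params, xs, ys, HEAD, lr=1e-3, epochs=epochs)
    # Sub-stage (b): everything unfrozen, ten times smaller learning rate.
    train(params, xs, ys, BODY + HEAD, lr=1e-4, epochs=epochs)
    return params


def transfer_learning(sim_xs, sim_ys, insitu_xs, insitu_ys, epochs=200):
    params = {"w_body": 0.5, "w_head": 0.5, "b_head": 0.0}  # arbitrary init
    # Stage 1: pretraining on samples simulated by an existing Chla model.
    train(params, sim_xs, sim_ys, BODY + HEAD, lr=1e-3, epochs=epochs)
    # Stage 2: main training on the (rebalanced) in situ samples.
    train(params, insitu_xs, insitu_ys, BODY + HEAD, lr=1e-3, epochs=epochs)
    # Stage 3: fine-tuning by network surgery.
    return fine_tune(params, insitu_xs, insitu_ys, epochs=epochs)
```

In a real deep-learning framework the same idea is usually expressed by excluding the body's parameters from gradient updates, for example by setting a layer's `trainable` attribute to `False` in Keras or a parameter's `requires_grad` to `False` in PyTorch, and then re-enabling them with the smaller learning rate.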

Experimental Results and Discussion
This study proposed a transfer learning technique consisting of model pretraining, main-training, and fine-tuning stages for training a Chla ANN model with an insufficient in situ dataset. In addition, data augmentation and rebalancing were integrated with the transfer learning to enrich the Chla in situ data and correct its imbalance. To evaluate the proposed method, k-fold cross validation was performed with the Chla in situ dataset from Laguna Lake, the Philippines, with k empirically set to 10. In this section, the results of the proposed transfer learning are presented in Section 4.1, and the effect of data imbalance on the trained ANN model is presented in Section 4.2. Section 4.3 compares the CNN model trained by the proposed transfer learning with related models using the dataset from Lake Victoria, Uganda. For accuracy assessment, the root mean squared error (RMSE), i.e., the square root of the MSE in Equation (1), is employed.
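The evaluation protocol (10-fold cross validation, with the RMSE computed as the square root of the MSE) can be sketched in standard-library Python. This is an illustrative sketch only: shuffling with a fixed seed and dealing indices round-robin into folds are assumptions, since the text does not state how the folds were formed, and `kfold_indices` is a hypothetical helper. The demonstration data and the mean-value stand-in estimator are likewise hypothetical.

```python
import math
import random


def mse(pred, obs):
    """Mean squared error, as in Equation (1)."""
    if len(pred) != len(obs) or not pred:
        raise ValueError("pred and obs must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred)


def rmse(pred, obs):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(pred, obs))


def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation.

    Indices are shuffled once with a fixed seed and dealt round-robin into
    k folds, so fold sizes differ by at most one sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train_idx, test_idx


if __name__ == "__main__":
    # Hypothetical Chla values (µg/L) and a trivial stand-in estimator that
    # predicts the training-fold mean; the real study uses the trained ANN.
    chla = [6.2, 7.8, 8.1, 9.6, 10.3, 7.5, 8.8, 11.2, 6.9, 9.9, 8.4, 7.1]
    fold_rmses = []
    for train_idx, test_idx in kfold_indices(len(chla), k=4):
        mean_train = sum(chla[j] for j in train_idx) / len(train_idx)
        fold_rmses.append(rmse([mean_train] * len(test_idx), [chla[j] for j in test_idx]))
    print("RMSE range:", min(fold_rmses), max(fold_rmses))
    print("average RMSE:", sum(fold_rmses) / len(fold_rmses))
```

Reporting the per-fold minimum, maximum, and mean RMSE mirrors the way the fold statistics are summarized in the following subsection.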

Evaluation of the Transfer Learning
To evaluate the proposed transfer learning technique with data augmentation and rebalancing, the ANN named WaterNet was used for Chla concentration estimation; for details about WaterNet, please refer to Section 3.2. To evaluate the performance of the three training stages, the hyperparameters (batch size, optimizer, and number of epochs) were kept the same, and 10-fold cross validation was performed on the rebalanced dataset from Laguna Lake. The evaluation results are presented in Table 4. After model pretraining, the accuracy of the estimated Chla concentrations was not satisfactory: the range and average of the RMSEs across folds were 2.070~2.228 µg/L and 2.144 µg/L, respectively. This indicates that training the ANN model on the simulated Chla data alone yields poor performance with high estimation errors. Although the ANN model at this stage cannot effectively retrieve the Chla concentrations, this training stage provides suitable initial values of the unknown parameters for the next stage. As a result, a better estimation was obtained in the main-training stage, where the range and average of the RMSEs across folds decreased to 0.4866~0.6887 µg/L and 0.5819 µg/L, respectively. The trained ANN model was then further fine-tuned in the next stage, and the average RMSE improved from 0.5819 µg/L in the second stage to 0.3724 µg/L in the third stage. This improvement is attributed to setting the layers in the band extension and feature extraction phases as untrainable and permitting backpropagation to update only the layers in the Chla concentration estimation phase. The ANN model trained by the proposed transfer learning was applied to five Sentinel-3 images acquired on dates close to those of the field campaigns in Laguna Lake, the Philippines. The resulting Chla concentration maps of the water body, shown in Figure 6, are visualized by colors ranging from yellow (6 µg/L) to red (12 µg/L).
In addition, the outputs of the feature extraction phase of the ANN shown in Figure 3 are convolutional feature maps of size 3 × 3 × 5, which indicate the importance of spatial features for Chla estimation. To visualize these feature maps over the whole lake, the center pixels of the feature maps were extracted and combined to form spatial feature maps. Figure 7 shows the Chla concentrations of Laguna Lake on 6 April 2019 estimated by the trained ANN, together with the spatial feature maps extracted from it. Spatial feature maps #1 and #3 show stronger contrast than the others, so two dashed boxes are drawn on these maps to mark areas of interest for discussion. In the brown dashed box, most features have smaller values in feature map #1 and higher values in feature map #3, and this significant difference between the two feature maps results in high predicted Chla concentrations. In the yellow dashed box, the opposite is observed, because the area is homogeneous and the pixels within it have similar values. This observation indicates that the proposed transfer learning is able to preserve spatial features that are important in Chla concentration estimation.
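The assembly of whole-lake spatial feature maps from the center pixels can be sketched as follows. This is a minimal sketch under assumptions not stated in the text: each water pixel's 3 × 3 × 5 feature map is stored channel-last as nested lists (rows × columns × channels), masked non-water pixels are represented by `None`, and the function names are hypothetical.

```python
def center_pixel_features(fmap):
    """Channel vector at the centre pixel of a (rows x cols x channels) map."""
    return list(fmap[len(fmap) // 2][len(fmap[0]) // 2])


def assemble_spatial_feature_maps(feature_grid):
    """Combine per-pixel feature maps into one image per channel.

    feature_grid[r][c] holds the 3 x 3 x 5 feature map produced for the
    image patch centred on pixel (r, c), or None for masked (non-water)
    pixels. Returns one image per channel, each a list of rows.
    """
    first = next((f for row in feature_grid for f in row if f is not None), None)
    if first is None:
        raise ValueError("feature_grid contains no water pixels")
    n_channels = len(center_pixel_features(first))
    maps = [[[None] * len(row) for row in feature_grid] for _ in range(n_channels)]
    for r, row in enumerate(feature_grid):
        for c, fmap in enumerate(row):
            if fmap is None:
                continue  # leave masked pixels as None in every map
            for k, value in enumerate(center_pixel_features(fmap)):
                maps[k][r][c] = value
    return maps
```

With five channels, the returned images would correspond to the spatial feature maps #1 to #5 in order, assuming the channel order of the network output is preserved.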

Performance of Data Augmentation and Rebalancing
Three datasets are used and tested in this subsection, namely, the original, augmented, and balanced datasets. The original dataset refers to the Chla in situ data acquired from Laguna Lake, the Philippines. The augmented and balanced datasets are augmented in situ datasets without and with, respectively, rebalancing of the in situ Chla concentration distribution. Figure 8 compares the proposed transfer learning using these three datasets. With the original dataset, the RMSEs ranged from 0.5 µg/L to 1.0 µg/L. With the augmented dataset, the RMSEs of the estimated Chla concentrations ranged from 7.5 µg/L to 9.5 µg/L, because the augmented dataset contains more Chla samples in the concentration ranges 7.5~8 µg/L and 9.5~10 µg/L. Consequently, the sample imbalance in Chla concentration makes the model trained on the augmented dataset perform worse than the one trained on the original dataset. When the augmented dataset is rebalanced according to the distribution of the samples' Chla concentrations, the RMSEs of the estimated Chla concentrations improve to 0.5~0.7 µg/L. Figure 9 shows similar statistical results: the correlation coefficient between the estimated and in situ Chla concentrations improved when data rebalancing was performed together with data augmentation.
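The exact rebalancing procedure is not given in this section, so the sketch below only illustrates one common way to make a dataset near-uniform in Chla concentration space; it is an assumption, not the authors' procedure. Samples are grouped into fixed-width concentration bins (the 0.5 µg/L width, matching the ranges quoted above, is a hypothetical choice), crowded bins are randomly undersampled when a per-bin target is given, and sparse bins are oversampled with replacement until every occupied bin holds the same number of samples. The function name `rebalance` is hypothetical.

```python
import math
import random
from collections import defaultdict


def rebalance(samples, bin_width=0.5, per_bin=None, seed=0):
    """Return a dataset with equal sample counts in every occupied Chla bin.

    samples: list of (features, chla) pairs, chla in µg/L.
    per_bin: target count per bin; defaults to the size of the largest bin,
             in which case only oversampling takes place.
    """
    bins = defaultdict(list)
    for sample in samples:
        bins[math.floor(sample[1] / bin_width)].append(sample)
    target = per_bin if per_bin is not None else max(len(v) for v in bins.values())
    rng = random.Random(seed)
    balanced = []
    for key in sorted(bins):
        members = bins[key]
        if len(members) >= target:
            # Crowded bin: keep a random subset without replacement.
            balanced.extend(rng.sample(members, target))
        else:
            # Sparse bin: keep every sample, then draw extra copies at random.
            balanced.extend(members)
            balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced
```

Oversampling by copying duplicates samples exactly; pairing it with the data augmentation step, so that repeated samples are also varied, is one way to avoid simply memorizing the minority bins.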


Comparisons of Chla Estimation Models
A real model test, in which a machine-learning model is trained and tested using two geographically different, independently collected sample datasets, is rarely conducted because of limited in situ Chla samples and overfitting problems. In this study, an ANN model was trained using the proposed transfer learning with data augmentation and rebalancing on the Chla samples collected from Laguna Lake, the Philippines, and then applied to the Chla samples acquired from Lake Victoria, Uganda, for testing and evaluation. The trained ANN model was also compared with related models, including the three-band model [9], the two-band model [13], NDCI [14], and WaterNet [34]. WaterNet is described in Section 3.1, and the other models are presented in Table 5. For fair comparison, the three-band and two-band models were calibrated by linear regression on the in situ Chla-labelled data from Laguna Lake; linear regression was selected to ease overfitting problems. In addition, the batch size, optimizer, and learning rate used to train WaterNet with the original two-stage training were the same as those in the proposed training. Unlike the other compared models, NDCI does not need to be calibrated, as it directly outputs the estimated Chla concentrations. All models requiring training or calibration were fitted on the Laguna Lake dataset, and all compared models were tested on the Lake Victoria dataset.
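The linear-regression calibration of the band-based models can be sketched as ordinary least squares on a single predictor, fitted on one lake and evaluated on the other. The model indices themselves (Table 5) are not reproduced here, so the index values below are hypothetical placeholders for a band index computed from Sentinel-3 Rrs, and all numbers in the demonstration are illustrative only; `fit_linear` and `apply_linear` are hypothetical helper names.

```python
import math


def fit_linear(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def apply_linear(coeffs, x):
    slope, intercept = coeffs
    return [slope * xi + intercept for xi in x]


def rmse(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


if __name__ == "__main__":
    # Hypothetical band-index values and in situ Chla (µg/L), illustration only.
    laguna_index, laguna_chla = [0.10, 0.20, 0.30, 0.40], [6.5, 7.9, 9.1, 10.6]
    victoria_index, victoria_chla = [0.15, 0.25, 0.35], [7.0, 8.6, 9.8]
    coeffs = fit_linear(laguna_index, laguna_chla)    # calibrate on Laguna Lake
    preds = apply_linear(coeffs, victoria_index)       # test on Lake Victoria
    print("slope, intercept:", coeffs)
    print("test RMSE:", rmse(preds, victoria_chla))
```

With only two free coefficients, the calibrated models have little capacity to overfit the small Laguna Lake dataset, which is the rationale the text gives for choosing linear regression.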

Table 6 compares WaterNet trained with the original two-stage training on the original data, the proposed method (the improved transfer learning with data augmentation and rebalancing), and the related models, using the Chla dataset from Lake Victoria.
The results indicate that the three-band model (RMSE = 0.588 µg/L) and the two-band model (RMSE = 0.509 µg/L) have similar Chla prediction accuracy. This may be because both models use similar Rrs features, namely Rrs at 443 nm and 490 nm, which have similar sensitivity to absorption [13]. WaterNet trained with the original two-stage training and data performed similarly, with RMSE = 0.496 µg/L. Better performance was obtained by WaterNet with the proposed training method and by NDCI, with RMSEs of 0.228 µg/L and 0.244 µg/L, respectively, so WaterNet with the proposed training was slightly better than NDCI. This suggests that the proposed transfer learning with data augmentation and rebalancing mitigates overfitting and that the trained model outperforms the related models.
Table 6. Comparisons of the ANN model trained by the proposed transfer learning with the related models using Chla samples acquired from Lake Victoria, Uganda.

[Table 6 columns: Station Name | Estimation Error (µg/L)]

Conclusions and Future Work
A transfer learning method comprising model pretraining, main-training, and fine-tuning stages was proposed to train ANN models for Chla concentration estimation using Sentinel-3 images. In addition, data augmentation and rebalancing were performed not only to increase the variability of the training dataset but also to balance the samples in terms of Chla concentration. To evaluate how well overfitting is mitigated and to compare with related models, the models were trained using the Chla dataset from Laguna Lake and then tested using the Chla dataset from Lake Victoria, which has the same trophic state as Laguna Lake. The quantitative assessments on Sentinel-3 WFR images demonstrate that the proposed transfer learning method outperforms the original WaterNet training, and that the trained CNN outperforms the related models in terms of Chla estimation accuracy. Considering that data rebalancing can strongly affect model performance, WaterNet will be redesigned in the near future so that the neural network can be applied to other optical satellite imagery with better spatial resolution, including Sentinel-2 and Landsat 8 images, to improve the extraction of important spatial features in lake water bodies. In addition, other water quality parameters, such as turbidity and total suspended matter, will be included in the modelling.