Deep Learning for Chlorophyll-a Concentration Retrieval: A Case Study for the Pearl River Estuary

: The abundance of phytoplankton is generally estimated by measuring the chlorophyll-a concentration ( C chla ), which is an important factor in photosynthesis and can be used to analyze the density and biomass of phytoplankton in the ecosystem. The band-ratio-based empirical or semianalytical algorithms are operationally applied to retrieve C chla in global oceans, which generally experience difﬁculties from the diversity of optical properties and the complexity of the radiative transfer equations in analytical analyses, respectively. With an attempt to develop an accurate C chla retrieval model for the optically complex coastal and estuarine waters, this study aimed to explore the deep learning (DL) methods in satellite retrieval of C chla . A two-stage convolutional neural network (CNN), named C chla -Net, was proposed, which utilized the spectral information of remote sensing reﬂectances at MODIS/Aqua’s visible bands. In the ﬁrst-stage phase, the C chla -Net was pretrained by a set of remote sensing patches, in which the C chla was generated from an existing model (OC3M). The pretrained results were than used as the initial values to reﬁne the network with the synthetic oversampled in-situ dataset in the second-stage training phase. Using in-situ samples for training with the new initial values has a higher probability to reach the global optimum. The quantitative analyses showed that the two-stage training was more likely to achieve a global optimum in the optimization than the one-stage training. Matchups of the in-situ C chla measurements were used to evaluate the retrieval models. Results showed that the proposed C chla -Net produced obvious better performance than the empirical and semi-analytical algorithms, implying the DL method was more effective for optically complex waters with extremely high C chla . This study provided an applicable method for remote sensing retrieval of C chla , which should be helpful for studying the spatial distribution and temporal variability in the productive Pearl River estuary (PRE) waters.


Introduction
Chlorophyll-a concentration (C chla ) is one of the key estuarine water quality parameters and serves as an essential indicator of ocean primary productivity [1]. Accurate retrieval of C chla from ocean color data is often an extremely challenging task in estuarine and coastal waters, due to the complex optical properties related to the inconstant and uncorrelated phytoplankton biomass, suspended sediments and colored dissolved organic matter (CDOM). The currently available satellite-derived water quality products are restricted to optically significant materials [2], and the standard ocean algorithms have tended to be largely dispersed in specific regions [3]. In addition, the atmospheric correction errors

Study Area
The PRE is a subtropical and high biological productivity estuary located in the continental shelf of the northern South China Sea (SCS). The SCS is a typical monsooninfluenced region. Southwest winds prevail in summer, and northeast winds prevail in winter [24,25]. In this study, the seasons refer to those for the northern hemisphere, i.e., summer refers to June, July and August, and winter refers to December, next January and February. As the third largest river in China, the Pearl River is well known for its complex river networks, and the water composition varies widely both spatially and temporally in the PRE [26]. Lingdingyang Bay of the Pearl River estuary (LBPRE) forms the largest estuarine bay in South China, which is a trumpet-shaped bay stretching in a near NNW-SSE direction and covering a sea area of about 2110 km 2 [27]. With rapid growth of the population and urbanization, the PRE is contaminated by industrial pollution, agricultural Remote Sens. 2021, 13, 3717 3 of 16 runoff and domestic sewage, which threaten the water quality of the PRE [28,29]. In the study area, there are turbid and high productive coastal waters and clear continental shelf waters. As a result, the C chla is characterized by wide ranges and fast changes, indicating that the PRE is a suitable place for training a representative retrieval network.

In-Situ Dataset
Ten campaigns were conducted between the year 2003 and 2012 to collect the water samples and optical spectrum. A total of 18 consistent stations were pre-set along the central y-axis of the PRE. The distance between neighboring stations was about 4.5 km, and all the stations covered a total distance of about 80 km from the sea upstream. Positions for sampling stations are plotted in Figure 1. Note that it only covered the first 16 or 17 stations in several campaigns due to weather conditions. A total of 165 in-situ R rs and the corresponding C chla dataset was collected. The statistical descriptions of the in-situ samples are summarized in Table 1. The water-leaving Rrs was measured using a spectrometer (USB4000, Ocean O Inc., Dunedin, FL, USA) following the National Aeronautics and Space Adminis (NASA) ocean optics standard protocol [30]. The upwelling radiance (Lu), sky ra (Lsky) and radiance reflected by a standard gray plaque (Lp) were measured, and R  The water-leaving R rs was measured using a spectrometer (USB4000, Ocean Optics, Inc., Dunedin, FL, USA) following the National Aeronautics and Space Administration (NASA) ocean optics standard protocol [30]. The upwelling radiance (L u ), sky radiance (L sky ) and radiance reflected by a standard gray plaque (L p ) were measured, and R rs was calculated using the following equation: where λ is the wavelength, ρ p is the reflectance of the gray plaque and ρ f is the water surface Fresnel reflectance, with a value of 0.028 for wind speeds less than 5 m·s −1 . The water samples for measuring C chla were collected from the surface layer (a depth of between 30 cm and 50 cm) and filtered through 25-mm Whatman GF/F filters under a low vacuum. The filters were measured using a 90% acetone method in a pre-calibrated Turner Design 10 fluorometer [31].

MODIS Imagery
Level-1A MODIS data onboard the Aqua spacecraft was obtained from the National Aeronautics and Space Agency (NASA) ocean color data archive. The remote sensing imageries were preprocessed using the SeaWiFS data analysis system (SeaDAS, version 7.5.3). The Management Unit of the North Seas Mathematical Models (MUMM) was employed for atmospheric correction [32]. Flags were used to mask contamination from land, clouds, sun glint and other potential disturbances. For the matchups between insitu and satellite data, the procedure developed by Evers-King et al. was adopted [33]. A 3 × 3 box surrounding the location of the in-situ measurement was used to extract satellite data. The mean value within the box was calculated for each parameter if the box contained at least 3 valid pixels.
The discrepancies between in-situ measured and sensor-observed R rs were minimized through the adjustment process based on a multilinear regression algorithm (MLR) [34]. The adjusted R rs adj (λ) was calculated as follows: where R rs or (λ) is the original MODIS-observed R rs , and ∆R rs (λ) is the discrepancy between in-situ measured and MODIS-observed R rs . The MLR scheme is as follows: where the input vectors are the original R rs at the MODIS's visible bands (412, 443, 469, 488, 547, 555, 645, 667 and 678 nm). The coefficients a i sat (i = 0,1,..., 9) were calculated through a multilinear regression between ∆R rs (λ) and the input vectors based on the matchup R rs dataset.

Overall Framework
Despite the complex hierarchical structures, all the DL-based models included three main components: the prepared input data, the core deep networks and the expected output data. The overall framework is briefly outlined in Figure 2. Four major steps were involved in the development of network, including feature generation, imagery patching, dataset oversampling and two-stage C chla -Net training and validating. In these steps, R rs at the MODIS/Aqua's visible bands were used to generate six sensitive features. The Ocean Colour 3 band ratio (OC3M) [4], a fourth-order band ratio algorithm that uses one of two blue and green band ratios, depending on the optical properties of different water types, was utilized for the initial C chla estimation. The formula of OC3M is defined as follows:  (4) A two-stage network was adopted to achieve a global optimum in the optimization. A synthetic minority oversampling technique was adopted to overcome the shortcoming of limited in-situ samples. The 10-fold cross-validation was applied for model training and validation. When the core deep network has been well-trained, it can be employed to predict the expected output of a given testing dataset. The coefficient of determination (R 2 ), root mean squared difference (RMSD), mean absolute difference (MAD) and mean absolute percentage difference (MAPD) between two datasets were used to evaluate model performance.  A two-stage network was adopted to achieve a global optimum in the optimization. A synthetic minority oversampling technique was adopted to overcome the shortcoming of limited in-situ samples. The 10-fold cross-validation was applied for model training and validation. When the core deep network has been well-trained, it can be employed to predict the expected output of a given testing dataset. The coefficient of determination (R 2 ), root mean squared difference (RMSD), mean absolute difference (MAD) and mean absolute percentage difference (MAPD) between two datasets were used to evaluate model performance.
Here, x m and x p denote the measured and predicted samples, respectively. x m denotes the mean value of the measured samples, and N is the number of samples.

Feature Generation and Data Preprocessing
The atmospheric-corrected and adjusted R rs at MODIS/Aqua's visible bands were considered for algorithm development. Band ratio algorithms involving R rs at blue and green bands have been widely employed for C chla retrieval [4,6,7]. To determine the optimal band ratios for the PRE waters, Figure 3 shows the R 2 from the linear regression analysis between different band ratios and C chla based on in-situ dataset. It can be seen that the correlation was insufficient with those band ratios involving R rs (412), which might be attributed to the atmospheric correction issues associated with the 412 nm band in turbid coastal waters. To improve the efficiency of C chla -Net, six different band ratios, with R 2 ranging from 0.38 to 0.54, were used as input features. The six band ratios were R rs (443)/R rs (555), R rs (469)/R rs (555), R rs (488)/R rs (555), R rs (547)/R rs (555), R rs (667)/R rs (645) and R rs 678)/R rs (667).
Here, xm and xp denote the measured and predicted samples, respectively. m x denotes the mean value of the measured samples, and N is the number of samples.

Feature Generation and Data Preprocessing
The atmospheric-corrected and adjusted Rrs at MODIS/Aqua's visible bands were considered for algorithm development. Band ratio algorithms involving Rrs at blue and green bands have been widely employed for Cchla retrieval [4,6,7]. To determine the optimal band ratios for the PRE waters, Figure 3 shows the R 2 from the linear regression analysis between different band ratios and Cchla based on in-situ dataset. It can be seen that the correlation was insufficient with those band ratios involving Rrs(412), which might be attributed to the atmospheric correction issues associated with the 412 nm band in turbid coastal waters. To improve the efficiency of Cchla-Net, six different band ratios, with R 2 ranging from 0.38 to 0.54, were used as input features. The six band ratios were

Rrs(443)/Rrs(555), Rrs(469)/Rrs(555), Rrs(488)/Rrs(555), Rrs(547)/Rrs(555), Rrs(667)/Rrs(645) and
Rrs678)/Rrs(667). The patching process was primarily performed to create a local 64 × 64 patch. The determination of patch size was a key procedure, which needed to take into account both the network's structure and characteristics of the remote sensing imagery (i.e., spatial resolution). Features extracted from too small of a patch were insufficient for a deep network, whereas a single pixel's Cchla from too large of a patch was not representative. Those The patching process was primarily performed to create a local 64 × 64 patch. The determination of patch size was a key procedure, which needed to take into account both the network's structure and characteristics of the remote sensing imagery (i.e., spatial resolution). Features extracted from too small of a patch were insufficient for a deep network, whereas a single pixel's C chla from too large of a patch was not representative. Those patches with clouds or lands were eliminated. The maximum of the OC3M-based C chla within the patch was used to represent the rough value of each patch. After the patching process, the log 10 -transformed C chla and six band ratio data were normalized to 0.0~1.0 to ensure that they were in the same range.

Oversampling In-Situ Dataset
C chla -Net is a deep network which requires a large number of in-situ samples for training. However, only 156 in-situ samples were insufficient, which would probably have increased the generalization errors. In addition, the sampling sites were mostly distributed in the estuary; therefore, the number of samples with a high C chla was more than that with a low C chla . This imbalance of the dataset could have made it difficult to adjust the weights and biases related to low C chla during training and finally reduce the accuracy of low C chla estimation. To solve this problem, a synthetic minority oversampling technique (SMOTE) [35] was adopted. The SMOTE technique, as an improved approach based on random oversampling, is commonly used for imbalanced data learning. Synthetic samples are generated in the following ways: For a dataset with m samples {x i ,y i }, i = 1,2,...,m, where x i is a vector with n dimensional features, and y i is the class label associated with x i . Take the difference between the feature vector under consideration and its nearest neighbor. Multiply the difference by a random number between 0 and 1, and add it to the feature vector under consideration [35]. For each minority class sample x si and the number of synthetic samples that need to be generated g i , repeat the following calculation from 1 to g i . Randomly choose one minority class sample x zi from the K nearest neighbors, and generate the synthetic sample s i .
where λ is a random number between 0 and 1. A novel adaptive synthetic (ADASYN) sampling approach for imbalanced learning was employed [36]. The essential idea of ADASYN is to use a density distribution to adaptively generate synthetic samples for minority datasets.

C chla -Net Structure
The C chla -Net layer configurations were designed following the same principles of VG-GNET16 [15], which has been demonstrated to be beneficial for the classification accuracy by increasing the depth with very small convolution filters. Figure 4 illustrates the network structure of C chla -Net. The input to the C chla -Net was a volume of a fixed size 64 × 64 × 6, and the output was the estimated C chla normalized at the center pixel. Each pixel in the patch contained six normalized band ratio features. The C chla -Net contained 13 convolution layers and three fully connected layers. The input volume was passed through a stack of convolution layers, where the filters used a small kernel size of 3 × 3 to capture the notion of left/right, up/down and center. The channel of convolution started from 64 in the first layer and then increased by a factor of 2, until it reached 512. The stride was fixed to 1 pixel, and the patch was padded with zeros to ensure the spatial size was preserved after the convolution. All convolution layers were equipped with a rectified linear unit function (ReLU) [17]. Spatial pooling was carried out by five max-pooling layers over a 2 × 2 window with stride of 2, following some of the convolution layers. The purpose of max-pooling layer was for downsampling and compressing features. The 3D volume was reshaped into a 1D vector by flattening and three fully connected layers: the first and second layer had 2048 neurons, respectively, and the final layer contained 1 neuron representing the normalized C chla at the center pixel. Remote Sens. 2021, 13, x FOR PEER REVIEW 9 of 18 The stochastic gradient descent with momentum (SGDM) optimizer, which utilizes mini-batch stochastic gradient, was employed for optimization. The batch size was set to 128 and momentum to 0.9. To alleviate the overfitting, the L2 regularization was added to the loss function during the network's backpropagation (the L2 penalty multiplier was set to 1.0 × 10 −5 ), whereas the dropout regularization for the first two fully connected layers was adopted. The dropout ratio was set to 0.5, indicating that 50% of the neurons in the two fully connected layers were temporally retained when computing the loss function for the weights' updating. In the training phase, the number of epochs was set to 30, and the initial learning rate was set to 0.01, with a drop factor of 0.1 after every 10 epochs. The half-mean-squared-error was used as the loss function, which is defined as: where ti is the labeled sample, yi is the corresponding prediction and R is the number of samples.

MLR Adjustment
A total of 15 pairs of matchups from all campaigns were used for extracting coefficients of MLR adjustment and for the network testing independently. The MLR adjustment relied on in-situ measurements for reducing uncertainty and bias due to systematic perturbations, as resulting from absolute calibration and minimization of the atmospheric effects.
The scatterplots in Figure 5 showed the in-situ measurements versus the Rrs before ('black' plots) and after ('red' plots) adjustment at all visible bands. The approach appeared quite effective at those center wavelengths, with the largest differences between in-situ and orbit measurements, which were 443 and 469 nm, and other shorter wavelengths. As expected, a better performance after the adjustment was observed. Specifically, the RMSD and MAPD of MODIS derived Rrs, with respect to in-situ measured Rrs at 443 nm, had shown values of 0.005 Sr −1 and 31.4% before adjustment and a value of 0.001 Sr −1 and 7.3% after adjustment, respectively. The stochastic gradient descent with momentum (SGDM) optimizer, which utilizes mini-batch stochastic gradient, was employed for optimization. The batch size was set to 128 and momentum to 0.9. To alleviate the overfitting, the L2 regularization was added to the loss function during the network's backpropagation (the L2 penalty multiplier was set to 1.0 × 10 −5 ), whereas the dropout regularization for the first two fully connected layers was adopted. The dropout ratio was set to 0.5, indicating that 50% of the neurons in the two fully connected layers were temporally retained when computing the loss function for the weights' updating. In the training phase, the number of epochs was set to 30, and the initial learning rate was set to 0.01, with a drop factor of 0.1 after every 10 epochs. The half-mean-squared-error was used as the loss function, which is defined as: where t i is the labeled sample, y i is the corresponding prediction and R is the number of samples.

MLR Adjustment
A total of 15 pairs of matchups from all campaigns were used for extracting coefficients of MLR adjustment and for the network testing independently. The MLR adjustment relied on in-situ measurements for reducing uncertainty and bias due to systematic perturbations, as resulting from absolute calibration and minimization of the atmospheric effects.
The scatterplots in Figure 5 showed the in-situ measurements versus the R rs before ('black' plots) and after ('red' plots) adjustment at all visible bands. The approach appeared quite effective at those center wavelengths, with the largest differences between in-situ and orbit measurements, which were 443 and 469 nm, and other shorter wavelengths. As expected, a better performance after the adjustment was observed. Specifically, the RMSD and MAPD of MODIS derived R rs , with respect to in-situ measured R rs at 443 nm, had shown values of 0.005 Sr −1 and 31.4% before adjustment and a value of 0.001 Sr −1 and 7.3% after adjustment, respectively.

K-Fold Cross-Validation
A 10-fold cross-validation was conducted, in which all patches and in-situ samples, except those for testing, were uniformly divided into 10 folds. In addition, a two-stage training consisting of pre-training and refinement was used. The first-stage procedure trained the network using the patches in which the C chla was estimated by OC3M algorithm, whereas the second-stage procedure refined the network by utilizing the in-situ samples. The scatterplots of estimated versus original log 10 -transformed C chla showed the network performance for cross-validation results ( Figure 6).

K-Fold Cross-Validation
A 10-fold cross-validation was conducted, in which all patches and inexcept those for testing, were uniformly divided into 10 folds. In addition training consisting of pre-training and refinement was used. The first-stag trained the network using the patches in which the Cchla was estimated by rithm, whereas the second-stage procedure refined the network by utilizin samples. The scatterplots of estimated versus original log10-transformed Cchl network performance for cross-validation results ( Figure 6). The RMSD, MAD and MAPD of second-stage training were decreased those of the first-training, with values that decreased from 0.48 to 0.07, 0.4 38.46% to 6.93%, respectively. The metrics of model accuracy were calculat transformed scale. The pretrained network may have exhibited large discre applied to the validating dataset, implying that the first-stage training could global optimum, because the input Cchla was estimated by the OC3M algorith from in-situ measurements. However, the purpose of first-stage training w suitable initial values of the network parameters. Training with the suitable may have had a higher probability of obtaining a better generalized networ when the number of in-situ samples was insufficient.
Convergence was evaluated by comparing the loss function of both o two-stage training. The loss function of 10-fold networks is presented in Figu the upper panel shows the loss values in the training phase, and the lower the loss values in the validating phase. By using the refined parameters fro training as the initial values of two-stage training, the network could conver ciently (about 6 epochs) than the one-stage training (about 11 epochs). Note the training and validating phases, the final loss value of the one-stage n smaller than that of the two-stage network, with values ranging between 0.0 0.004-0.005, respectively. It should be attributed to the different characteristi datasets. The in-situ dataset was more discrete and contained less features agery patches, despite it being oversampled by the SMOTE technique. The RMSD, MAD and MAPD of second-stage training were decreased compared to those of the first-training, with values that decreased from 0.48 to 0.07, 0.44 to 0.06 and 38.46% to 6.93%, respectively. The metrics of model accuracy were calculated in a log 10transformed scale. The pretrained network may have exhibited large discrepancy while applied to the validating dataset, implying that the first-stage training could not reach a global optimum, because the input C chla was estimated by the OC3M algorithm, instead of from in-situ measurements. However, the purpose of first-stage training was to obtain suitable initial values of the network parameters. Training with the suitable initial values may have had a higher probability of obtaining a better generalized network, especially when the number of in-situ samples was insufficient.
Convergence was evaluated by comparing the loss function of both one-stage and twostage training. The loss function of 10-fold networks is presented in Figure 7, in which the upper panel shows the loss values in the training phase, and the lower panel shows the loss values in the validating phase. By using the refined parameters from one-stage training as the initial values of two-stage training, the network could converge more efficiently (about 6 epochs) than the one-stage training (about 11 epochs). Note that in both the training and validating phases, the final loss value of the one-stage network was smaller than that of the two-stage network, with values ranging between 0.004-0.007 and 0.004-0.005, respectively. It should be attributed to the different characteristics of the two datasets. The in-situ dataset was more discrete and contained less features than the imagery patches, despite it being oversampled by the SMOTE technique.

Model Performances
To evaluate the feasibility and performance, the proposed Cchla-Net approach was compared with two representative algorithms based on the independent testing dataset. The two algorithms were the OC3M, an empirical model, and the Garver-Siegel-Maritorena (GSM01), a semi-analytical model [10]. Statistics of the model performance are listed in Table 2. The Cchla-Net demonstrated a more satisfactory performance than the other two algorithms with higher R2, lower RMSD, lower MAD and lower MAPD, and its slope values of linear fit between estimated versus in-situ measured Cchla (0.97, closer to the 1:1 line) were higher than the other two models (0.29 and 0.30, Figure 8). The OC3M model seemed to be underestimated in the high productive waters, especially when the Cchla was higher than 10 mg·m −3 . The OC3 model was defined on the basis that the difference of two spectral reflectances was small, such that the absorption of suspended sediments and colored dissolved organic matter (CDOM) could be omitted. However, as typical Case-II waters, the optical properties of PRE waters were complex, and the total absorption of phytoplankton, suspended sediments and CDOM and the back-scattering coefficient of phytoplankton and suspended sediments were spectrally variant. Thus, the traditional band-ratio algorithms through blue and green ratios simply did not work for the high productive and turbid PRE waters. The GSM model showed a tendency

Model Performances
To evaluate the feasibility and performance, the proposed C chla -Net approach was compared with two representative algorithms based on the independent testing dataset. The two algorithms were the OC3M, an empirical model, and the Garver-Siegel-Maritorena (GSM01), a semi-analytical model [10]. Statistics of the model performance are listed in Table 2. The C chla -Net demonstrated a more satisfactory performance than the other two algorithms with higher R2, lower RMSD, lower MAD and lower MAPD, and its slope values of linear fit between estimated versus in-situ measured C chla (0.97, closer to the 1:1 line) were higher than the other two models (0.29 and 0.30, Figure 8). The OC3M model seemed to be underestimated in the high productive waters, especially when the C chla was higher than 10 mg·m −3 . The OC3 model was defined on the basis that the difference of two spectral reflectances was small, such that the absorption of suspended sediments and colored dissolved organic matter (CDOM) could be omitted. However, as typical Case-II waters, the optical properties of PRE waters were complex, and the total absorption of phytoplankton, suspended sediments and CDOM and the backscattering coefficient of phytoplankton and suspended sediments were spectrally variant. Thus, the traditional band-ratio algorithms through blue and green ratios simply did not work for the high productive and turbid PRE waters. The GSM model showed a tendency to overestimate the lower C chla . Meanwhile, the correlation between the GSM-estimated and in-situ measured C chla was the lowest among the three algorithms (R 2 = 0.63, lower than 0.77 and 0.85). The optimal GSM parameter values were hard to determine, due to the spareness of in-situ data on the backscattering coefficient of particulates b bp (λ) and the lack of predicted knowledge for the particle phase function [37]. The assumed constants of the model might not be appropriate for the PRE waters.
Remote Sens. 2021, 13, x FOR PEER REVIEW 13 of 18 to overestimate the lower Cchla. Meanwhile, the correlation between the GSM-estimated and in-situ measured Cchla was the lowest among the three algorithms (R 2 = 0.63, lower than 0.77 and 0.85). The optimal GSM parameter values were hard to determine, due to the spareness of in-situ data on the backscattering coefficient of particulates bbp(λ) and the lack of predicted knowledge for the particle phase function [37]. The assumed constants of the model might not be appropriate for the PRE waters. The estimated results of Cchla-Net were very close to the in-situ measurement, with its slope being around 1.0 and R 2 being higher than 0.8. It demonstrated that the CNN model had a strong capability to learn the nonlinear relationship between the water-leaving Rrs and the corresponding Cchla of water body, as well as to make full use of the information at all the MODIS/Aqua's visible bands. Additionally, the oversampling approach, the SMOTE technique, allowed us to provide a massive synthetic in-situ dataset for the second-stage training, and it turned out that the trained Cchla-Net generalized well to the independent testing dataset.

Model Applications
Given the satisfactory performance of the proposed Cchla-Net developed using in-situ dataset from PRE, this model was applied to all available MODIS/Aqua Cchla data between 2003 and 2020 to construct a multi-year product for PRE waters. Figure 9 showed the climatological monthly MODIS/Aqua Cchla estimated by Cchla-Net and the difference between Cchla-Net and OC3M models in the PRE. In general, the estimated Cchla from both models agreed well in the temporal patterns in the continental shelf. However, the difference between the two models in the coastal and estuarine areas was remarkable. Especially during summer, the maximal difference was up to 5.80 mg·m −3 . Such differences were mainly due to the worse performance of OC3M model for high Cchla (>10 mg·m −3 ). Therefore, it is likely that Cchla-Net could serve as a better approach to provide the long-term The estimated results of C chla -Net were very close to the in-situ measurement, with its slope being around 1.0 and R 2 being higher than 0.8. It demonstrated that the CNN model had a strong capability to learn the nonlinear relationship between the water-leaving R rs and the corresponding C chla of water body, as well as to make full use of the information at all the MODIS/Aqua's visible bands. Additionally, the oversampling approach, the SMOTE technique, allowed us to provide a massive synthetic in-situ dataset for the second-stage training, and it turned out that the trained C chla -Net generalized well to the independent testing dataset.

Model Applications
Given the satisfactory performance of the proposed C chla -Net developed using insitu dataset from PRE, this model was applied to all available MODIS/Aqua C chla data between 2003 and 2020 to construct a multi-year product for PRE waters. Figure 9 showed the climatological monthly MODIS/Aqua C chla estimated by C chla -Net and the difference between C chla -Net and OC3M models in the PRE. In general, the estimated C chla from both models agreed well in the temporal patterns in the continental shelf. However, the difference between the two models in the coastal and estuarine areas was remarkable. Especially during summer, the maximal difference was up to 5.80 mg·m −3 . Such differences were mainly due to the worse performance of OC3M model for high C chla (>10 mg·m −3 ). Therefore, it is likely that C chla -Net could serve as a better approach to provide the long-term MODIS/Aqua products than the classical OC3M model in the PRE waters. As expected, C chla increased from the continental shelf to the coastal and estuarine area, as the latter received more direct influence of the highly productive freshwater. After exiting the LBPRE, the discharged freshwater generated a nearly stable bulge and formed a distinct plume, which was located in the southwestern LBPRE. The plume axis gradually shifted offshore as a result of the intensified Ekman drift. Therefore, The C chla of western PRE was observed to be higher than that of eastern PRE. During summer, a tongue with a relatively higher C chla tends to expand to the southern and southeastern LBPRE. Forced by the wind-driven coastal current, the plume was wider over the shelf due to the freshwater in the outer part of the bulge flowing downstream at the speed of the current. During winter, the plume was confined nearshore under the influence of the northeastly wind.
MODIS/Aqua products than the classical OC3M model in the PRE waters. As expected Cchla increased from the continental shelf to the coastal and estuarine area, as the latte received more direct influence of the highly productive freshwater. After exiting the LBPRE, the discharged freshwater generated a nearly stable bulge and formed a distinc plume, which was located in the southwestern LBPRE. The plume axis gradually shifted offshore as a result of the intensified Ekman drift. Therefore, The Cchla of western PRE was observed to be higher than that of eastern PRE. During summer, a tongue with a relatively higher Cchla tends to expand to the southern and southeastern LBPRE. Forced by the wind driven coastal current, the plume was wider over the shelf due to the freshwater in the outer part of the bulge flowing downstream at the speed of the current. During winter the plume was confined nearshore under the influence of the northeastly wind. To facilitate quantitative interpretations, the spatial and seasonal variations in the coastal and estuarine area ('Box 1' in Figure 1), as well as the continental shelf ('Box 2' in To facilitate quantitative interpretations, the spatial and seasonal variations in the coastal and estuarine area ('Box 1' in Figure 1), as well as the continental shelf ('Box 2' in Figure 1) were further examined. Figure 10 presents the monthly mean C chla in both areas. The monthly mean values estimated by C chla -Net ranged from 0.94 to 11.97 mg·m −3 in the LBPRE and from 0.09 to 0.65 mg·m −3 in the continental shelf. Different seasonal variations were found in coastal area and continental shelf, with relatively higher C chla observed during summer in the former region and during winter in latter region. These seasonal variations appeared to be regulated primarily by river discharge and mixing of the upper ocean [23].
Remote Sens. 2021, 13, x FOR PEER REVIEW 15 of 1 Figure 1) were further examined. Figure 10 presents the monthly mean Cchla in both areas The monthly mean values estimated by Cchla-Net ranged from 0.94 to 11.97 mg·m −3 in th LBPRE and from 0.09 to 0.65 mg·m −3 in the continental shelf. Different seasonal variation were found in coastal area and continental shelf, with relatively higher Cchla observed dur ing summer in the former region and during winter in latter region. These seasonal vari ations appeared to be regulated primarily by river discharge and mixing of the uppe ocean [23].

Conclusions
This study found that the Cchla-Net showed an apparent advantage over the empirica and semi-analytical models for extremely high Cchla. Therefore, Cchla-Net might be a prom ising method for the Cchla retrieval in optically complex coastal and estuarine waters. Th proposed Cchla-Net model worked well for low to high values especially, while the OC3M algorithm tended to underestimate high values in the coastal and estuarine area. The MLR adjustment, which specifically relies on matchups of corresponding in-situ and orbit measured Rrs to capture systematic differences, could remove the difference likely due t uncertainties in the absolute calibration of sensors and the minimization of atmospheri perturbations. The novel adaptive synthetic oversampling technique improved the D model with respect to the distribution of dataset in two ways: (i) reducing the bias intro duced by the imbalanced distribution of the dataset; (ii) adaptively shifting the classifica tion decision boundary to be more focused on the difficult to learn samples.
Considering the high performance, it has a great potential to be applied in the PRE especially for the productive and optically complex coastal and estuarine waters. How ever, there is still room for improvement. As a data-driven method, input training th dataset directly impacts the network performance. The accuracy of the DL network largel depends on the in-situ dataset, which covered a wide range of Cchla variations. More in-sit

Conclusions
This study found that the C chla -Net showed an apparent advantage over the empirical and semi-analytical models for extremely high C chla . Therefore, C chla -Net might be a promising method for the C chla retrieval in optically complex coastal and estuarine waters. The proposed C chla -Net model worked well for low to high values especially, while the OC3M algorithm tended to underestimate high values in the coastal and estuarine area. The MLR adjustment, which specifically relies on matchups of corresponding in-situ and orbit-measured R rs to capture systematic differences, could remove the difference likely due to uncertainties in the absolute calibration of sensors and the minimization of atmospheric perturbations. The novel adaptive synthetic oversampling technique improved the DL model with respect to the distribution of dataset in two ways: (i) reducing the bias introduced by the imbalanced distribution of the dataset; (ii) adaptively shifting the classification decision boundary to be more focused on the difficult to learn samples.
Considering the high performance, it has a great potential to be applied in the PRE, especially for the productive and optically complex coastal and estuarine waters. However, there is still room for improvement. As a data-driven method, input training the dataset directly impacts the network performance. The accuracy of the DL network largely depends on the in-situ dataset, which covered a wide range of C chla variations. More in-situ datasets are required to improve the model applicability. Furthermore, the OC3M products were used on a global scale and could not be directly applied to the PRE waters. Collecting more in-situ samples to adjust the parameters of the OC3M model could also be beneficial for the DL network training.