Developing a New Machine-Learning Algorithm for Estimating Chlorophyll-a Concentration in Optically Complex Waters: A Case Study for High Northern Latitude Waters by Using Sentinel 3 OLCI

: The monitoring of Chlorophyll-a (Chl-a) concentration in high northern latitude waters has been receiving increased focus due to the rapid environmental changes in the sub-Arctic, Arctic. Spaceborne optical instruments allow the continuous monitoring of the occurrence, distribution, and amount of Chl-a. In recent years, the Ocean and Land Color Instruments (OLCI) onboard the Sentinel 3 (S3) A and B satellites were launched, which provide data about various aquatic environments on advantageous spatial, spectral, and temporal resolutions with high SNR. Although S3 OLCI could be favorable to monitor high northern latitude waters, there have been several challenges related to Chl-a concentration retrieval in these waters due to their unique optical properties coupled with challenging environments including high sun zenith angle, presence of sea ice, and frequent cloud covers. In this work, we aim to overcome these difﬁculties by developing a machine-learning (ML) approach designed to estimate Chl-a concentration from S3 OLCI data in high northern latitude optically complex waters. The ML model is optimized and requires only three S3 OLCI bands, reﬂecting the physical characteristic of Chl-a as input in the regression process to estimate Chl-a concentration with improved accuracy in terms of the bias (ﬁve times improvements.) The ML model was optimized on data from Arctic, coastal, and open waters, and showed promising performance. Finally, we present the performance of the optimized ML approach by computing Chl-a maps and corresponding certainty maps in highly complex sub-Arctic and Arctic waters. We show how these certainty maps can be used as a support to understand possible radiometric calibration issues in the retrieval of Level 2 reﬂectance over these waters. This can be a useful tool in identifying erroneous Level 2 Remote sensing reﬂectance due to possible failure of the atmospheric correction algorithm.


Introduction
Arctic waters have been going through significant changes due to the rapid thinning and retreating sea ice over the past decade [1]. Larger areas of open waters, increasing nutrient supplies and favorable light conditions allowed phytoplankton blooms to occur, where they have not been observed before [2][3][4]. Phytoplankton are photosynthetic marine organisms [5], and their presence indicates

•
Introduce a unified Chl-a retrieval algorithm to monitor Chl-a concentration in complex high northern latitude waters, such as inland, coastal waters, the Marginal Ice Zone (MIZ), and for open Arctic oceans. • Design the model specifically to S3 OLCI so that the advantageous spectral, spatial and temporal resolutions of the instrument can be used for monitoring these complex high northern latitude waters. • Provide a tool to assess uncertainties in the retrieval of the Level 2 (L2) radiometric data, i.e., the Remote sensing reflectance (Rrs), which is the input to many of the Ocean Color algorithms, including the model introduced here.
A combination of machine-learning (ML) approaches has a great potential to achieve these objectives. ML has been demonstrated to perform successfully in ocean color applications. For instance, in [27,28] NN methods have been introduced to improve retrieval of Rrs for both open and complex waters. NN has also been studied to estimate Chl-a from remotely sensed data [29,30]. In addition to NN, other ML methods have also been studied for bio-physical parameter retrieval, which includes Support Vector Regression [31][32][33], Relevance Vector Regression [34], and Gaussian Process Regression (GPR) [35]. Several studies have shown that the Machine-Learning Gaussian Process Regression (ML GPR) model performs outstandingly in comparison to other parametric and ML methods [36][37][38][39].
The ML GPR method is not only a flexible and analytic approach, but it also outputs the certainty level of the estimated Chl-a concentrations, computed according to an analytic scheme. These certainty estimates allow assessment of whether the input Rrs data differs from the Rrs data used in the training of the ML GPR model. This means that in case one has control over the quality of the training Rrs data, possible errors in the new S3 OLCI Rrs data, which might arise due to calibration issues and/or failures of the atmospheric correction algorithm, can be revealed.
In the work [26], the Automatic Model Selection Algorithm (AMSA) was applied, which was introduced in [40] to a complexity-diverse aquatic environment. AMSA combines feature ranking and selection with ML regression methods to determine the number and combination of spectral bands and/or features needed to obtain the strongest performance for a given aquatic environment. Hence, applying the approach to an aquatic environment representing a wide range of optical complexities, might allow determination of a model, which can be generalized and optimized to complex high northern latitude waters. The ML GPR model was established on Lake Balaton, and was shown to improve Chl-a retrieval from S3 OLCI Rrs data [26] in the lake. The ML GPR Balaton model takes as input only three Rrs values, measured on bands centered at 442.5, 665, and 681.25 nm as input to estimate Chl-a concentration. The regression power of the method decreases, when additional OLCI Rrs bands are added. Using all the available OLCI bands in the ML GPR Balaton model resulted in a Normalized Root Mean Squared Error (NRMSE) of 0.11, a bias of 2.25 and Pearson correlation coefficient of 0.79, whereas using only the three selected bands as inputs resulted in a NRMSE of 0.10, a bias of 2.12 and a Pearson correlation coefficient of 0.83) [26]. This suggests that using only these bands in complexity-diverse waters might provide sufficient spectral information to estimate Chl-a concentration. This ML GPR Balaton model was evaluated on a Hydrolight simulated data partitioned to correspond to various Arctic waters with promising performance [41].
Finally, in this work the ML GPR Balaton model was optimized, analyzed, evaluated and tested on a comprehensive in situ dataset, including a significant number of measurements of optically complex waters obtained from Arctic, coastal, and open oceans. Here, we show that the ML GPR Balaton model has reasonable performance in estimating Chl-a concentration from S3 OLCI L2 Rrs data, where other approaches fail. We present the predictive performance of the optimized ML GPR Balaton model on highly complex sub-Arctic and Arctic waters. We show how the certainty maps can provide information about the quality of the L2 input Rrs data. The detailed description and flowchart of the analysis are presented in Section 2.2.3.

Data
In this work, we focused on Chl-a concentration retrieval from Rrs measured on three OLCI bands, which are included in the ML GPR Balaton model. These bands are centered at 442.5, 665, and 681.25 nm. Hence both the in situ and satellite derived radiometric data include measurements only on these three bands. We used two datasets of optically complex waters, an Arctic and coastal data (COASTlOOC), for optimizing and training the ML GPR Balaton model. The location of the stations can be seen in Figure 1. The red circles show the positions of the measurements of the COASTlOOC dataset, and the black circles represent the locations of the Arctic samples. It can be seen that the measurements originate from a wide spatial extent and from various water conditions. We chose three test sites for evaluating the predictive performance of the optimized ML GPR Balaton model. These three scenes include highly complex aquatic environments. The locations of the test sites are indicated with squares in Figure 1. It is challenging to estimate Chl-a concentration in these test sites from remotely sensed data due to the optical complexity of these waters. Therefore, they provide excellent case studies to examine the predictive strength of the proposed model. Furthermore, we use these scenes to present how uncertainties can be assessed by using the ML GPR Balaton model.

Training Data for Model Optimization
The Arctic Data The Arctic dataset contains measurements in the MIZ and from Arctic oceans, and it was composed by five cruises, the France-Canada-USA joint Arctic campaign MALINA [42], ICESCAPE2010 (Impacts of Climate on Ecosystems and Chemistry of the Arctic Pacific Environment), ICESCAPE2011 [43], Tara Oceans Polar Circle expedition [44], and GREEN EDGE [45].
Sampling of MALINA and GREEN EDGE was conducted onboard the Canadian Icebreaker CCGS Amundsen, and the two ICECAPEs were onboard the US Icebreaker USCGC Healy. The Tara expedition was conducted using a French schooner, Tara. Note that in some cases, during the same time periods of the CTD deployment on the ship, a barge and/or zodiac were also deployed to obtain surface water samples, and to avoid contamination due to the main ship including water quality as well as ship shadow. In MALINA, ICESCAPE2010, and ICESCAPE2011, samples were taken both from near sea ice and far from it. Details on the number of samples, sampling date and area are listed in Table 1.

Rrs Measurements
All Rrs(λ) of the five cruises used in this study were determined by the Compact-Optical Profiling System (C-OPS) [46] instrument, which is a radiometer system for determining apparent optical properties in aquatic systems. It consists of two 7 cm diameter radiometers: one measures in-water upwelling radiance L u (λ, z), and the other one measures downward irradiance E d (λ, z), pressure/depth, and dual axes tilts. Both radiometers are equipped with up to 19 optical-filter micro-radiometers. In this study, only three channels at 443, 665, 681 nm are used. The above-water global solar irradiance (E s ) was also measured to take into account the incident light field changes. Other corrections including dark currents, pressure tares, aperture offsets, tilt filtering (5 or less), and self-shading were also taken into account. Pressure tares were used to have radiometric data with vertical resolution of 1 cm [47]. Dark currents together with pressure tares were used to achieve low uncertainties for radiometric measurements (for most cases within a few percent). Data with tilt above 5 degree was not used. Self-shading was corrected in terms of diameter of the sensor and absorption values [48]. Finally, Rrs(λ) was obtained following the protocols described in [46,47]. (Depth interval was determined so that E d (0-) was calculated from E d (0+) through air-water interface, and E d (0-) was measured by using the in-water sensor agreeing within a few percent. Within the depth interval, ln(L u ) data was fitted and extrapolated into the surface, and L u (0-) was obtained. L w (0+) was then calculated through air-water interface. Rrs was calculated using L w (0+) normalized by E d (0+) measured during the same time interval as L u (z).)

The COASTlOOC Data
The second dataset originates from the coastal surveillance through observation of ocean color (COASTlOOC) project [49], and it includes measurements from both coastal and open oceans around Europe. Water samples and coincident radiometric measurements were collected from complex aquatic environments, such as river plums and coastal waters, and open ocean. Irradiance was converted to Rrs by using a Q-factor 3.8 as described in [50]. Note, Chl-a measurements were obtained following the same procedure as in case of the Arctic data, whereas Rrs was derived by a different technique.

The Merged Global Data
The COASTlOOC and Arctic datasets were merged to build a comprehensive dataset, including Arctic, coastal, and some open waters. This dataset was used for optimizing, training, evaluating, and validating the ML GPR Balaton model. Hence, the total number of samples available for training and validation is 521, and overall Chl-a range is 0.02 and 29.07 mg m −3 . The Rrs data included only the three spectral bands used by the ML GPR Balaton model. The model using only the three selected spectral bands was evaluated and compared with a model, when all the available spectral bands were used for estimating Chl-a [26]. The 3 bands model resulted in improvements in all the computed statistical measures. Figure 2 shows the Rrs of the Arctic and COASTlOOC datasets for all the available bands corresponding to OLCI and for the three selected bands. Note that although there is a loss of information, when the number of bands are reduced to three, using only the selected three most relevant bands can improve Chl-a retrieval. This might be due to the fact that only these selected three bands contain the information needed for Chl-a estimation in the ML GPR Balaton model for complexity-diverse waters. The Rrs spectra of the Arctic (c) and COASTlOOC dataset (d ) measured on the three most important OLCI bands from the Balaton model. The term"# of Chl-a observations" refers to the observed Chl-a concentration, which is sorted by an increasing order. Hence, measurements corresponding to low numbers are representing low Chl-a concentrations, while observations labeled with larger numbers are high Chl-a concentrations.

Total Chl-a Measurements
High Performance Liquid Chromatography (HPLC) analysis was used in this study. First, 25 mm GF/F filters were used to filter phytoplankton pigments samples, then certain process were taken including filters extracted in 100 % methanol, disrupted by sonication and filtered (GF/F Whatman) before analyzing it by using HPLC on the same day. For the Arctic data, the detailed protocols of MALINA, TARA-Arctic, and GREEN EDGE samples followed the analytical procedure described in [51], and the method by [52] was applied to ICESCAPEs samples. (For further details on the dataset see for example [53].) HPLC-determined concentration of total Chl-a was determined as the sum of mono-and divinyl Chl-a concentration, Chlorophyllide-a and the allomeric and epimeric forms of Chl-a. The range of Chl-a was between 0.02 and 10.11 mg m −3 for the Arctic data, and it was between 0.05 and 29.07 mg m −3 for the COASTlOOC data, thus representative for various water conditions.

Data for Prediction with the Trained ML GPR Balaton Model
We evaluated the trained ML GPR Balaton model on a swath acquired by S3 OLCI including complex sub-Arctic and Arctic waters. The locations of the study sites can be seen in Figures 3 and 4.
The area indicated as number 1 includes a part of the St. Lawrence river, a smaller river, and a lake. The RGB image indicates various degrees of complexity for this area. Site 2 is the delta of the St. Lawrence river, which is known to have high turbidity and rapidly changing water complexity [54]. Site 3 and 4 are two lakes in Quebec: Lake Mistassini and Lac Saint-Pierre. These chosen sites provide the possibility to test the performance of the ML GPR Balaton model on optically highly complex inland waters. For instance, it can be seen in Figure 4, site 3 (Lake Mistassini) gray features appearing along the shore of the lakes. These features are not likely to be areas with high Chl-a concentration, they are possibly shallower and sediment-dominated patterns. Therefore, they can be used to evaluate the sensitivity of the ML GPR Balaton model to various degree of turbidity.  These areas were used to test the performance of the unified ML GPR Balaton model for Chl-a estimation in highly complex waters by using the Rrs values on the three wavelengths (442.5, 665, and 681.25 nm). The Rrs data was atmospherically corrected Level 2 (L2) reflectances, on 300 m full resolution. The processing of the data was implemented in the Sentinel Application Platform (SNAP). The Level 2 Rrs data was produced by using the C2RCC atmospheric correction algorithm. The recommended flags were used to mask out invalid pixels. Then this data was used to predict Chl-a concentration with the trained ML GPR Balaton model for these complex waters. This prediction step was performed in the MATLAB toolbox.

Methodology
The ML GPR Balaton model was obtained on Lake Balaton by using the Automatic Model Selection Algorithm (AMSA) [40]. Lake Balaton is a complexity-diverse aquatic environment, including eutrophic, mesotrophic, oligotrophic, turbid, and clear water conditions. Following recent findings, the model has great potential to show similar performance, when applied to high northern latitude complex waters as well. (For further details on the establishing of the ML GPR Balaton model we refer to [26].) We optimized and evaluated the ML GPR Balaton model on optically complex waters of the Arctic and COASTlOOC datasets. The approach and the interpretation of the methodology are presented in Sections 2.2.1 and 2.2.2.

The ML GPR Balaton Model
Assume a training dataset, which in this case is the merged Arctic and COASTlOOC datasets, consisting of the output y n Chl-a measurements and the corresponding input x n Rrs, where n = 1, . . . N is the number of measurements, x n is the three-dimensional input Rrs vector, and the dimensions are the wavelengths centered at 442.5, 665, and 681.25 nm. The Chl-a is assumed to be a function of the Rrs(λ i ), for i = 1, 2 and 3, and the noise , and can be written by The noise follows a normal distribution (N) with zero mean and σ 2 variance. Then a zero mean Gaussian Process (GP) prior is placed over the function values f (x n ) and the noise, i.e., f (x) ∼ GP(0, k(x, x )). The term k(x, x ) is the covariance matrix, and the elements of the covariance matrix are computed by the kernel function. By definition, observations drawn from a GP at locations {x n } N n=1 , also follow a joint Gaussian distribution [55]. Using Bayes theorem, the posterior distribution for a new output y * can be analytically derived conditioned on the training data D and new input observations x * . This can be expressed by p(y * |x * , D) = N(y * |µ GP * , σ 2 GP * ), where µ GP * is the predicted Chl-a and σ 2 GP * is the certainty level of the estimate. The new input observations are the Rrs(λ i ) for either the training data (in the model evaluation) or the L2 OLCI Rrs over high northern latitude complex waters (in prediction). Then the estimated Chl-a can be expressed by (2) and the variance (certainty level of the estimated Chl-a concentration) is where k f * is the covariance between the training vector and the test point, α = (K ff + σ 2 I n ) −1 y is the weight vector of the GP mean, k * * is the covariance between the test point with itself, and K ff + σ 2 I n is the noise contaminated covariance matrix of the training data, where I n is the identity matrix. The elements of the covariance matrices are computed with the commonly used Squared Exponential (SE) kernel function. This can be expressed by k(x p , 2 ) for element p and q, where the hyper-parameters ν and λ are the scaling factor and length-scales, respectively, and d = 1, . . . D is the dimension. In this case, D = 3, corresponding to the three wavelengths. The total number of hyper-parameters in this case is 5, which includes the scaling factor, length-scales and the noise variance. These hyper-parameters were optimized by maximizing the negative log-marginal likelihood function with respect to the hyper-parameters. In this work, we used Bayesian optimization.

Interpretation of the ML GPR Balaton Model
The ML GPR Balaton model assumes that Chl-a concentration is a function of the Rrs measured at the three wavelengths and the additional noise, i.e., Chl-a = f (Rrs(λ i )) + , for λ i = 442.5, 665, and 681.25. The Rrs is furthermore a function of the water leaving radiance L w , Rrs(λ i ) = g(L w (λ i )). This functional relationship (g) can be written by Rrs where F 0 is the extraterrestrial solar irradiance, f s is a factor to adjust F 0 for the variation in Earth-Sun distance, t ds is the total downward transmittance of the atmosphere, f b is the bidirectional reflectance correction factor, and f (λ i ) is the correction for out-of-band response. L w is retrieved from the measured total Top Of Atmosphere (TOA) radiance by using atmospheric correction algorithm [56][57][58][59]. In this case, the C2RCC algorithm was used.
Hence, Chl-a = f (g(L w (λ))) + . The water leaving radiance contains the spectral characteristics of the water bodies, and in our model three wavelengths are used to describe the Chl-a pigment. Furthermore, the function f that relates Chl-a to L w through the Rrs, can be learned and analytically expressed. This can be interpreted by collecting all the available measured Rrs(λ) n for n = 1,..., N, and connecting them in the spectral space by a GP, so that the non-linear function of Rrs(λ) n will be jointly Gaussian distributed. The spectral distance between the observed Rrs(λ) n is controlled through the kernel function, more specifically, in case of the SE kernel, the length-scale hyper-parameter provides the measure for the similarity. Small values of the optimized length-scales indicate that the observations are close to each other in the spectral space, while large values show little spectral similarity. This length-scale hyper-parameter is used to parametrize the kernel function, which is used to compute the elements of the covariance matrices in the ML GPR Balaton model, and these covariance matrices are used in the final expression of the estimated Chl-a concentration (Equation (2)). Hence, the approach can be tracked, analytically expressed, and interpreted.
The additional automatic output of the ML GPR model, is the variance (certainty level) of the estimates (Equation (3)). This is a highly advantageous property, since it is independent of the Chl-a measurements, and it reveals whether the new observed Rrs is similar to the input data used in the training process.
Note, the L2 Rrs, which is the function of L w (λ) n is obtained by using atmospheric correction, hence the approach relies on the atmospheric correction algorithm.

Description of the Analysis
First, we evaluated the ML GPR Balaton model in a statistical analysis. The merged dataset was used for studying the learning strength of the approach and for cross-validation. We computed statistical measures commonly used in remote sensing [60,61]. These measures are the bias, which is the sum of the absolute mean errors, the Normalized Root Mean Squared Errors, NRMSE = 1/(y max − , and the p-value to assess the significance. N denotes the number of observations, y is the measured Chl-a concentration,ŷ is the predicted Chl-a, y max is the maximum observed value, y min is the minimum observed value, and y is the mean of the observed Chl-a concentrations. Note, the NRMSE and R 2 are unitless, and the unit of the bias is mg m −3 .
Then an image acquired by S3 OLCI was chosen, when cloud coverage was relatively low, for assessing the predictive strength of the approach in application. The swath includes high northern latitude complex waters: lakes, rivers, and the estuary of the St. Lawrence river. Hence, it provides an excellent site to test the method on various complex high northern latitude waters. In addition to estimating Chl-a concentration by the ML GPR Balaton model, we also present the certainty levels of the estimates, and the state-of-the-art L2 Chl-a products computed by using NNs and OC4 approaches. Note, although the state-of-the-art complex water product is obtained by the NN, we included the estimates of the OC4 algorithm as well, due to the general robustness of the approach and uncertainties in the composition of water constituents of the underlying water bodies. Figure 5 shows the flowchart of the approach. The 3 bands ML GPR Balaton model, which was established on data from Lake Balaton, was tuned and optimized on the Arctic and COOSlOOC datasets. We evaluated the 3 bands ML GPR Balaton model by using bootstrapping in 1000 iterations. The final model was optimized on all the available data, which was the merged Arctic and COOSTlOOC datasets. Finally, the optimized model was studied for Chl-a estimation and uncertainty assessment on data acquired by S3 OLCI over high northern latitude Arctic waters.

bands ML GPR Balaton model Evaluating the optimized bands ML GPR Balaton model by using bootstrapping
Applying the optimized 3 bands ML GPR Balaton model to high northern latitude waters Tuning and optimizing for optically complex waters Chl-a estimation Uncertainty assessment

Statistical Analysis
The trained ML GPR Balaton model was evaluated by performing cross-validation (bootstrapping). The dataset was randomly divided into 90 % for training and 10 % for testing. The statistical measures were computed for the test set for 1000 iterations. At each iteration step, a new model was optimized for the randomly chosen training data, and then tested on the rest of the observation. This assured randomness, hence revealing weaknesses and strengths of the approach. The distribution of the computed statistical measures for the 1000 cross-validations can be seen in Figure 6.
The  Figure 6c also shows skewed distribution, and most of the computed values are close to 0.9. The p-value Figure 6d indicates high significance. The histograms show the performance of the 3 bands ML GPR Balaton model on randomized test data. The aim of this study is to understand how the proposed model performs on various test data, for instance, when the test data differs from the training data. This way the bootstrapping procedure provides information about the statistics of the regression performance measures, whereas dividing the data into one training and one test set gives us only one value of these measures. The heights (y-axis) of the bars of the histograms show the occurrence frequencies of the values of the statistical measures, indicated on the x-axis.

Chl-a Maps in High Northern Latitude Complex Waters
We evaluated the ML GPR Balaton model on the four regions as defined in Section 2.1.2 (Figures 3 and 4). We computed the Chl-a estimates and the corresponding certainty maps pixelwise. Blue-green pixels in the certainty maps correspond to relatively high confidence, and yellow-red pixels show relatively lower certainty. These low certainty pixels reveal areas, where the new observed input Rrs data differs from the data used for obtaining the trained ML GPR Balaton model. At the same time, higher certainty shows regions, where the input Rrs data is similar to the one used for training. Hence, these areas will likely have similar accuracy as the computed measures in the cross-validation study. Figure 8 shows histograms of the in situ Rrs (left column) used for training the ML GPR Balaton model and the L2 OLCI Rrs (middle and right column) used for predicting Chl-a. The L2 OLCI Rrs seems to differ from the in situ measurements, and it includes negative values. Areas in the predicted maps, corresponding to these deviations are expected to show low certainties. However, the range of the in situ Rrs in the L2 OLCI Rrs (Figure 8 right column) reveals that most of the pixels are distributed in the range of the in situ values, and these histograms show similar distribution to the histograms of the in situ Rrs.  This suggests that input Rrs data differed from data used in the training process (as seen in Figure 8), and the assigned higher values to the Chl-a concentration are based on the learned functional relationship. It is likely to have low certainty for pixels representing this higher Chl-a range. Figures 10-13 show the results for the four study sites. Region 1 includes a small lake, a segment of the St. Lawrence river, and a flow connecting the lake with the St. Lawrence river. In general, the three approaches assign different Chl-a values for these regions. For instance, the ML GPR Balaton model shows lower estimates for regions, where the NNs estimates high Chl-a (see the lake and the lower part of the St. Lawrence river in the image). The highest certainty of the ML GPR model is observed in the lake and the upper middle part of the St. Lawrence river in the image. The estimates of the OC4 model show large contrasts in Chl-a concentrations.
The estimated Chl-a maps for region 2 can be seen in Figure 11, which is the mouth of the St. Lawrence river, and known to have a wide variety of optical properties. In this case, all three estimates show similarities in the pattern of the Chl-a concentration. The NNs estimates seem to have sensitivity to cloud shadows, which appears as areas with high Chl-a concentration (c). The ML GPR Balaton and OC4 model seem to assign similar numerical values to the Chl-a concentrations. The certainty map shows that areas by the edges of the clouds differ from the observed training input data, which is indicated by low certainty. This might suggest that the applied masks should have removed these pixels.  The RGB image for region 3, Lake Mistassini (in Figures 3 and 4) reveals some features along the shores of the lake. These features appear as grayish patterns in the RGB image, and not likely to be areas with high primary productivity. Interestingly, the ML GPR Balaton model could not only capture these features with relatively high certainty, but it showed no sensitivity to possible suspended matter, and hence assigned low Chl-a concentration to these pixels ( Figure 12). This indicates that the ML GPR Balaton model can discriminate between waters of various degrees of turbidity, and identify the signal from Chl-a in different water conditions. Note that none of the state-of-the-art L2 Chl-a concentration products could estimate Chl-a concentration in this lake. The last study site, region 4 was the Lac Saint-Pierre. Figure 13 shows the estimated Chl-a concentration maps. It can be seen that both the ML GPR Balaton model and the NNs estimated higher Chl-a values in the same areas; however, the ML GPR model assigned lower values than the NNs. Using NNs to estimate Chl-a concentration in turbid waters has been previously shown to overestimate Chl-a. Furthermore, the certainty map shows low certainty in areas, where the estimated Chl-a concentration is higher. The OC4 model seems to output a binary-like image for this area, which is less likely to be correct. (d) We have also applied the optimized ML GPR Balaton model to data acquired by S3 OLCI over the Marginal Ice Zone, coastal, and open Arctic waters. These Chl-a maps can be seen in Appendix, and they show that the approach also applies to various Arctic waters.

Discussions
This work aimed to evaluate the hypothesis that the Automatic Model Selection Algorithm (AMSA) can be used to establish a Chl-a concentration retrieval algorithm tailored for S3 OLCI for complexity-diverse waters. In addition, this algorithm can be generalized to perform well in high northern latitude complex waters, where the state-of-the-art approaches fail.
We optimized the unified ML GPR Balaton model on in situ radiometric measurements and corresponding Chl-a observations collected from Arctic, coastal (including river plumes) and open waters. Using in situ data for model optimization has several advantages in this case. It is independent on a certain calibration and atmospheric correction algorithm, which might change during a mission and by region. Furthermore, it can reveal uncertainties in the retrieval of satellite products. Hence, the model can contribute to improve the quality of the satellite derived water quality estimates.
We conducted a statistical analysis, which showed that ML GPR Balaton model performs well on data representing a wide variety of optical properties. Our findings (Figure 7) were consistent with previous results obtained on Lake Balaton. This suggests that the results obtained in this work are comparable with the results in case of Lake Balaton. Most importantly, the ML GPR Balaton model seems to be robust, stable, and reliable.
Note, the ML GPR Balaton model uses three features to estimate Chl-a concentration. These features are spectral bands positioned at 442.5, 665, and 681.25 nm center wavelength. This is in contrast with the commonly used machine-learning approaches, where all spectral information (all bands) is used [24,25]. The bias significantly decreased, when only these three spectral bands were used in the ML GPR Balaton model, instead of using all the OLCI bands in the visible part of the spectral region.
(For further details see [26].) These three bands not only improve Chl-a retrieval in complexity-diverse waters from S3 OLCI Rrs, but also reflect the physical properties of the water bodies. The blue band, positioned at 442.5 nm is probably the dominant band for clear, oligotrophic waters. At the same time, red bands are commonly used in turbid waters. The bands centered at 665 and 681.25 nm correspond to the second absorption peak of Chl-a and to the fluorescence band, respectively, and they are probably favored for turbid and/or eutrophic waters. Red bands are commonly used in parametric and semi-analytic Chl-a retrieval algorithms for complex waters [22,61,62].
The predicted Chl-a maps showed that the ML GPR Balaton model can estimate Chl-a concentration in complex high northern latitude waters, and capture patterns which are usually challenging for other Chl-a retrieval algorithms. The presented maps were focusing on Canadian inland waters and the St. Lawrence river estuary due to the difficulties in satellite derived Chl-a estimation in these regions. The optical complexity and the ongoing rapid transitions in water conditions due to environmental changes result in challenges in the retrieval of Level 2 water quality products. The proposed approach proceeds by using Bayesian inversion, which allows the model to reveal uncertainties in the input Level 2 products. We showed that the approach provides the pixelwise certainty level of the estimated Chl-a concentration, which is a highly advantageous property, since it can reveal areas where the acquired Rrs data differs from the data used for training the model. This means that low certainty can be due to deviations in the optical properties of the waterbodies. However, if the training data is representative, which is likely to be the case in this work, then low certainty is most likely due to erroneous Rrs retrieval of the applied atmospheric correction algorithm. Hence, the certainty maps provide a support to understand the challenges in ocean color monitoring by using S3 OLCI. These certainty maps can be used as a mask, to disregard areas with relative high uncertainties, and keep the estimates, where the computed statistical measures are valid.
Even though in this work, we presented the performance of the unified ML GPR Balaton model on high northern latitude complex inland waters, we also computed prediction maps for S3 OLCI Rrs data acquired over the Marginal Ice Zone (MIZ) and sub-Arctic/ Arctic coastal and open oceans. These results can be seen in the Appendix, and they also support that the model can be applied to various Arctic waters, with different degrees of optical complexity. It is advantageous to have a unified Chl-a retrieval algorithm for S3 OLCI, because it is often challenging to assess information about the type of water to be monitored in advance. Furthermore, rapid changes in Arctic waters can also change the type of water with time, causing difficulties to choose the correct type of S3 OLCI L2 Chl-a product. This work shows a possible solution to overcome these difficulties.

Conclusions
It can be concluded that the ML GPR Balaton model performs well and has the potential to be used for operational purposes for complex sub-Arctic/ Arctic waters. The cross-validation (bootstrapping) study resulted in a median of the normalized root mean squared error of 0.13, a median of the bias of 1.3 mg m −3 , a median of the Pearson correlation coefficient of 0.91, and a low p-value, indicating high significance. With regard to the wide variety of the data used for optimizing the ML GPR Balaton model, these computed measures would be valid for a new dataset, under the assumption that the atmospheric correction algorithm is properly calibrated. Using in situ data for model tuning and optimization instead of matchups allows the approach to be independent of the atmospheric correction algorithm used to produce the satellite Level 2 data.
We showed how the ML GPR Balaton approach can be a tool to locate and assist in solving calibration issues, with the use of the certainty maps. Low certainty can help to locate areas, where Rrs values differ from the representative training data. Hence, this work not only presents a candidate unified machine-learning model for monitoring a wide range of water conditions, but also shows how the approach can be used to locate waters, where calibration issues might need to be addressed.
For future work, we will automate the approach to detect uncertain areas. This will be done by defining a threshold from uncertainty estimates, and disregard values above the threshold. This would yield an uncertainty mask based on an analytical approach. Further studies include introducing the approach for other water quality parameters. Acknowledgments: The authors are grateful to Marcel Babin for suggesting to optimize the approach on the in situ data and for his useful comments. We would like to express our gratitude to Drew Gilbert for the proofreading. We thank Torbjørn Eltoft for his useful comments during the revision process. Part of this study was supported by the Japan Aerospace Exploration Agency (JAXA) GCOM-C project to AM (PI No. ER2GCF310).

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A. Chl-a Maps in the Marginal Ice Zone, Coastal, and Open Oceans
We applied the ML GPR Balaton model to two additional swaths acquired over the MIZ and coastal and open oceans. The RGB images of the two areas are presented in Figure A1. The predicted Chl-a maps by using the ML GPR Balaton model (a), the NN (c) and OC 4 (d) algorithms for the MIZ can be seen in Figure A2. It can be seen that the pattern of the estimated Chl-a concentration is similar for all the three algorithms in this case. However, the assigned Chl-a values differ. The certainty map (b) shows the relative certainty for the ML GPR Balaton model estimates. Figure A3 shows the corresponding enlarged areas. This also shows that in this case, all three algorithms could capture the pattern of Chl-a. The difference here also manifests itself in the amount of the assigned Chl-a concentration. (The range of the units is the same as in Figure A2.) The certainty map (b) reveals high certainty in areas of the appearance of the bloom (green pixels), indicating reliable Chl-a estimates. In general, the ML GPR Balaton model assigns lower Chl-a values, which is probably due to its insensitivity to TSM. The relative certainty is high, showing that input Rrs data is similar to the training data. Surprisingly, certainty decreases in the open ocean area, where there is a possible bloom. This might mean that the training data is lacking information from a similar bloom event that occurred when the image was taken.