## 1. Introduction

Ocean Color (OC) monitoring from spaceborne and airborne platforms using remote sensing techniques has been receiving an increased focus in the past decades [

1,

2]. This is due to the fact that an ever-increasing amount of remote sensing data is getting available, but also, because of increased anthropogenic activity and climate change have resulted in changes in the water quality [

3]. Coastal waters are one of the most sensitive areas due to their vulnerable ecosystems. Worsened water quality might endanger these ecosystems (such as fish’s habitats [

4]), which has both economical and ecological importance [

3]. It is well-known that the eutrophication of coastal waters and inland waters has been increasing lately, leading to decreased water-quality [

5,

6]. Continuously monitoring of the water-bodies, with special focus to coastal waters is therefore important for various reasons. It can also contribute to improved understanding of the ongoing changes, and the impact of increased anthropogenic activities on the ecosystems [

7].

The quality of water bodies, both globally and regionally, is most efficiently inferred from color using multi-spectral or hyper-spectral remote sensing. The color of the oceans is determined by the different type, amount and distribution of water constituents. Being able to monitor these water constituents allows to retrieve information about the environmental state of the water [

8]. The most common parameters used for monitoring water quality are Chlorophyll-a (Chl-a), Colored Dissolved Organic Matter (CDOM), Total Suspended Matter (TSM), Secchi Disk Depth (SDD), turbidity, Total Phosphorus (TP), to name some [

3].

However, the retrieval of water quality contents from remote sensing data is not always strait forward. Algorithms are generally dependent on the sensors’ characteristics, geographical location, and environmental conditions of the water body. The objective of this paper is to

present and

demonstrate a strategy for an Automatic Model Selection Algorithm (AMSA), for retrieval of water quality parameters from remote sensing data, given an appropriate matchup dataset. Since Chl-a is one of the most important and most studied of these water quality parameters [

5], we will use Chl-a as an example parameter throughout the paper. Besides, estimating aquatic Chl-a concentration has several important applications, in addition to providing information about water-quality. Chl-a occurs in phytoplankton in aquatic environments. Phytoplankton uses photosynthesis in order to live and grow. Capturing of light, which is the driving of photosynthesis [

9], takes place in the Chl-a molecule. Estimating Chl-a content allows to retrieve information about the aquatic biomass and several biophysical processes. During photosynthesis, phytoplankton takes up Carbon-Dioxid (CO

${}_{2}$) [

10]. Therefore, monitoring phytoplankton through Chl-a might also contribute to the understanding of climate change [

11,

12,

13].

Using Chl-a as an example, we will in the following give some rational and motivation for AMSA. Remote sensing of Chl-a content (and other water quality parameters) is done by optical imaging sensors onboard satellites, which have different spectral and spatial resolutions. Chl-a content is usually retrieved by relating the measured signal at the sensor, the remote sensing reflectance (Rrs), to coincident in-water Chl-a measurements (see for instance the National Aeronautics and Space Administration’s (NASA) OC products [

14,

15,

16,

17,

18]). This dataset is denoted a so-called matchup dataset, and forms the basis for most of the algorithms used for Chl-a content estimation from remotely sensed data. Since the various sensors have different number of bands at different central wavelengths (see Table 2), the matchup data has to be calibrated for each given sensor.

Furthermore, there are a manifold of retrieval algorithms available to the user [

19,

20,

21]. Some of them are designed to estimate Chl-a globally, whereas others are region specific. These algorithms are in general sensor specific, this means they require a new or adjusted model for each sensor. For an untrained user, it is often challenging to establish or choose the most suitable Chl-a retrieval model. This is especially the fact for optically challenging aquatic environments, such as coastal waters [

22]. Coastal waters are often dominated by other water constituents than Chl-a, such as CDOM, and CDOM and Chl-a are known to have their absorption peak in the same spectral region. This results in difficulties in distinguishing between the signals originating from Chl-a and CDOM, especially, when Chl-a content is estimated by algorithms that use the absorption peak of the Chl-a molecule.

As more datasets are collected, and computer processing power gets unlimited, machine learning (ML) algorithms have become more feasible in OC applications. ML models are not based on assumption about the Chl-a absorption spectrum. They learn the relationship between the in-situ Chl-a content and the available Rrs values, and use this learned functional relationship for prediction. These models use all the available spectral bands for learning and prediction, which results that the importance of the spectral bands in the regression process is kept hidden. It can be questioned whether all the bands are needed to obtain the best regression for a given model and region. Artificial Neural Networks (ANN) models have been lately successfully applied for Chl-a estimation [

23,

24,

25], and to various other applications, such as for predicting the amount of generated electricity [

26], suspendid sediment load in rivers [

27] and rainfall and runoff predictions ahead in time [

28]. For OC applications, satellite derived Chl-a in optically complex waters is also often estimated by using other ML algorithms [

29,

30,

31,

32].

Furthermore, complex waters show great regional variations, which leads to erroneous Chl-a estimates, when algorithms tuned on global datasets, are applied to a local region [

19]. Therefore, it is often required to design local algorithms, which are trained on datasets from the given region [

33,

34]. However, choosing the most suitable model for a given region can still be challenging.

The above arguments suggest that an automatic model selection approach could be an important tool in choosing the optimum model to monitor a given aquatic environment. Comparisons of models for various OC applications have been carried out in [

35,

36,

37], but to the best of our knowledge, a flexible and automatized model selection tool for OC application has not yet been proposed in the literature. Being able to objectively compare models and determine the most suitable one for the given data and purpose might be beneficial for the users.

The contribution of this paper is to present a strategy for an Automatic Model Selection Algorithm (AMSA), which outputs the most suitable water quality retrieval model, given the matchup dataset. The current AMSA model uses three ML models as input options. ML models usually rely on feature selection in prior to regression [

38]. This is due to the fact that dimensionality reduction is often required to increase accuracy, robustness and computational time [

39]. Using feature selection also helps to correctly interpret the data. The method for choosing the most optimal number and combination of features for the given model is model dependent, and needs to be developed in each case. AMSA uses feature ranking methods to assign relevance to the features, then it evaluates the number and combination of these ranked features in regression models using some quantitative regression performance measures.

Hence, AMSA is not only using feature selection prior to regression, but also feature ranking methods derived from regression models based on different principles. This means that the importance of the features is first determined by using several feature ranking approaches, one tailored to each regression model, then sequential forward selection is applied for comparison. Then the regression models are compared by computing regression performance measures. Finally, AMSA returns the best model for the given matchup dataset. Hence, AMSA is neither limited to a given water quality parameter nor to a feature ranking method/regression model/regression performance measure. The only input that it requires, is the matchup dataset.

For

demonstration of the performance of AMSA, we use three sophisticated ML models for feature ranking and regression. These regression models are the Gaussian Process Regression (GPR), Support Vector Regression (SVR) and Partial Least Square Regression (PLSR) models. GPR has been shown to outperform empirical [

31,

40] and ML regression models [

41] for biophysical parameter retrieval from remotely sensed data. GPR has several advantageous properties besides its excellent regression performance, for instance the certainty level of the estimates and the possibility to access feature relevance. Feature relevance for the GPR model can be accessed by the Sensitivity Analysis (SA) [

31,

32] and the Automatic Relevance Determination (ARD) [

40,

42] feature ranking methods.

The SVR model has also been shown to perform well for OC applications [

29,

43,

44]. In this work, we applied the SA to the SVR model in order to access feature relevance. For classification in neuroimage applications, this has been done in [

45]. Here, we introduced the methodology for regression in Chl-a content estimation.

The PLSR model was included in AMSA, because of the Variable Importance in Projection (VIP) feature ranking methods associated with it. PLSR is a strong regression model, which can handle high dimensional inputs, reduce noise and co-linearity in the data [

46]. The PLSR model has been applied for OC applications in optically complex aquatic environments [

47].

We have previously studied the SA of the GPR model, ARD and VIP feature ranking methods and the GPR and PLSR regression models for Chl-a content estimation in [

32]. In [

32], we used a MERIS matchup dataset and two additional matchups for the MODIS-Aqua and SeaWiFS sensors to evaluate the methodologies, and concluded that these feature ranking methods can be used to reduce the number of features, while still obtaining comparable estimates for Chl-a content, compared to the state-of-art algorithms.

In the current demonstration of AMSA, we show how the proposed strategy can be used to determine automatically a model for oceanic Chl-a content estimation for both global waters and optically complex waters. The matchup datasets we have used here include a synthetic dataset produced by the International Ocean-Colour Coordinating Group (IOCCG dataset) [

48], plus two additional matchups, one for the MERIS sensor (MEdium Resolution Imaging Spectrometer) and one for the MODIS-Aqua (MODerate-resolution Imaging Spectroradiometer) sensor. The IOCCG dataset provides the possibility to threshold the data based on the absorption of the CDOM, and the amount of Chl-a concentrations. Hence, observations which are more likely to occur in complex aquatic environments, can be selected. Furthermore, we resample the IOCCG dataset to match the spectral resolution of the MERIS and MODIS-Aqua matchups.

An additional contribution of this work, is to further extend the feature ranking methods by the sensitivity analysis of the SVR model, which allows us to include the SVR regression model in the AMSA model library. We choose to use the IOCCG dataset to have better control over the optical properties of observations, and include the two matchups for the MERIS and MODIS-Aqua sensors to show that the approach work well on different data sets and for different environmental situations. We highlight that the goal here is to show how the AMSA approach can be used to perform an objective comparison and selection of an optimal model for the given dataset, according to the regression criteria used. AMSA automatically performs feature ranking and training and testing of the regression models. Hence, the output model is already validated. Finally, the demonstration includes two images acquired by MERIS over optically complex aquatic areas to visualize the predictions given by the selected optimal AMSA model.

The rest of this work is organized as follows.

Section 2 introduces the general concept of the AMSA and explains the ML AMSA for oceanic Chl-a content estimation in details. Furthermore, the datasets used in this study are described.

Section 3 presents the results.

Section 4 discusses the results and approach, and highlights advantages and disadvantages of the methodology. Finally,

Section 5 concludes this paper and outlines future work.

## 3. Results

We applied AMSA to the eight datasets. For each dataset the total combination of models being evaluated by AMSA is (feature ranking) · (number of spectral bands) · (regression models). The total number of model evaluation are 84 and 96 for the MODIS-Aqua (7 bands) and MERIS (8 bands) datasets, respectively. This means that by using feature ranking methods, the total number of model evaluations are reduced, which speeds up the computational time required to return the most optimal model. Feature ranking reduces the total number of possible model-combinations by assigning relevance to the features. After the spectral bands were ranked, the sequential forward selection approach automatically trained and tested all the possible model combinations, and output the

best model based on the computed regression performance measures.

Table 3 shows the results of the AMSA algorithm for all the datasets. Note that the NMRSE and R

${}^{2}$ values in

Table 3 are calculated from the test data.

In case of all the synthetic datasets (MS 1a, MS 1b, MS 3a and MS 3b) the best regression model was found to be the GPR, while for most of the real datasets (MS 2b, MS 4a and MS 4b) the strongest regression was obtained by the SVR model. This can be due to the fact, that the synthetic dataset has low noise level in comparison to the real dataset (The parameter that handles noise in the GPR model, should have been tuned for the real datasets. However, in order to make the AMSA as robust as possible, we chose to compute the initial noise parameter by following the same formula).

For the MERIS datasets the best regression was achieved by using the spectral bands ranked by the VIP ranking method for most of the cases (MS 1a, MS 1b and MS 2b). In case of the MODIS-Aqua datasets, the ARD ranking method seemed to result in the best ranking (MS 3a, MS 3b and MS 4b).

For global monitoring, the best model was obtained by using most of the available spectral bands for almost all cases (MS 1a, MS 2a and MS 4a). The only exception was the synthetic MODIS-Aqua dataset, where the best model was already achieved by using only 3 spectral bands.

For eutrophic conditions AMSA resulted in the best regression, when only three or four bands were used. In case of the MERIS datasets (MS 1b and MS 2b), these bands are centered at 510, 560 and 620 nm. For the MODIS-Aqua datasets bands centered at 412, 488 and 678 nm were included in the regression models for both the synthetic (MS 3b) and real (MS 4b) dataset to achieve the strongest regression model.

The regression performance measures show, that the lowest NRMSE and highest R${}^{2}$ were achieved for the synthetic global datasets (MS 1a and MS 3a), while the models resulting in highest NRMSE and lowest R${}^{2}$ were for the eutrophic real datasets (MS 2b and MS 4b). These results also confirm the challenges of Chl-a content estimation from optically complex waters.

#### 3.1. Chlorophyll-a Maps

In order to illustrate the performance of the best models for eutrophic conditions, we chose two full resolution MERIS images acquired over areas, which are assumed to be optically complex waters.

#### 3.2. Cross Validation

The outputs of AMSA for the MERIS datasets (MS1b and MS2b), were the GPR and SVR models with bands centered at 510, 560 and 620 nm. We used cross validation to assess the robustness of the models. This was done by randomly dividing the datasets (MS1b and MS2b) into 80% for training and 20% for testing. Then training and testing of the models was performed by computing the NRMSE and R

${}^{2}$ measures. This was done in 500 iterations. The mean values of the computed measures for the cross validations can be seen in

Table 4.

The cross validation resulted in very similar computed measures for both models. In case of the MS1b dataset, the GPR model resulted in slightly better values, while for the MS2b data the SVR model showed some improvements. This is in good agreement with the measures output by AMSA (

Table 3). The cross validation results also indicate, that in case of the MS1b dataset the difference between the computed regression performance measures for the two models is larger, than is case of the MS2b dataset.

#### 3.3. Visual Illustrations

We applied the AMSA selected GPR and SVR models to the test images.

Figure 7 and

Figure 8 show the estimated Chl-a content.

Figure 7 shows the estimated Chl-a content for the coastal water of East USA by using the GPR (left-column) and SVR (right-column) model with bands centered at 510, 560 and 620 nm. The overall Chl-a maps show that the GPR model predicts higher Chl-a content than the SVR model (top-row). It can be seen in the enlarged area (bottom-row), that there are regions where the SVR model assigns higher values to the Chl-a contents.

Figure 8 shows the estimated Chl-a content maps for the southern part of the Baltic sea. In this case, the overall predicted Chl-a content values (top-row) seem to be more similar for the GPR and SVR models. There are some regional variations in this case as well. The bottom-row in

Figure 8 shows the enlarged area. Both models seem to capture the eddies in fine details.

## 4. Discussion

In this work, we presented a strategy to automatically determine the most suitable model for a given dataset for OC applications. The AMSA approach chooses the best model to estimate any water quality parameter from remotely sensed data. AMSA can determine the most suitable model for any regions and sensors. The input to AMSA is the matchup data, and the output is the best model. AMSA also outputs the number and combination of features needed to obtain the output model, and the regression performance measures for the best model.

We presented the AMSA for oceanic Chl-a content estimation by using ML methods. The AMSA we built here, has four feature ranking methods, the SA GPR, SA SVR, ARD and VIP methods, three regression models, the GPR, SVR and PLSR models, and two regression performance measures, the NRMSE and R${}^{2}$ to evaluate the regression models. The four feature ranking methods are associated with the three sophisticated regression models, therefore it was a natural choice to include them in the AMSA we chose here.

Both the GPR and SVR models have been shown to be strong regression models for OC applications. They are flexible non-linear kernel methods, using kernel functions in the regression stage. The choice of the kernel function is strongly dependent on the nature of the data. Here we used the most common kernel, the squared exponential kernel function, which has several advantageous properties. It is a universal kernel [

62], and infinitely differentiable. This is a very important property with regard to the SA feature ranking methods, which uses the partial derivatives of the mean function in the GPR and SVR models. The squared exponential kernel function also allows to assess feature relevance by using the length-scale parameter in the function. The ARD feature ranking method uses the inverse of the optimized length-scale parameter to assign relevance. Optimization is done numerically through the maximum likelihood function, which in some cases can be trapped in a local minimum. This might result in erroneous ranking. Furthermore, the initialization of the parameters in the kernel function also have an impact on the optimization, and hence on the regression as well. Therefore, developing the robustness of initializing these parameters from the data should be prioritized in future methodological development.

Despite the PLSR model differs from the kernel machines in its underlying fundamental principles, it also provides the possibility to assess feature relevance through the VIP method. In this work, the AMSA has not output the PLSR model as the most suitable algorithm for Chl-a content estimation. However, in many cases (see

Table 3) the VIP method seemed to rank the spectral bands such that the strongest regression was achieved by using the kernel machines. Thus, using the VIP method for feature ranking and kernel machines for regression might be a good combination of methods.

In this work we showed how these ML methods can be used to build an AMSA to estimate Chl-a content in different water conditions and for different sensors. The chosen matchup datasets (MERIS, MODIS-Aqua and the synthesized IOCCG dataset) allowed us to simulate water conditions with increased complexity. Note, although the Chl-a threshold we set here to 0.7 mgm

${}^{-3}$ might be low for optically complex waters, the observations in the real

eutrophic datasets above this value, still seem to originate from coastal environments (

Figure 6).

AMSA gave as result that for the synthetic datasets the GPR performed best, but for most of the real dataset the best model was obtained with the SVR model. However, the cross validation results suggest that the SVR model might only have slightly better performance than the GPR model for these datasets.

Generally, for global Chl-a content estimation most of the spectral bands were needed to achieve the

best regression with the chosen models. This might be due to the larger variety in the data. This result was in contrast to water conditions of increased complexity, where using only three or four of the available spectral bands as inputs resulted the strongest regression. In case of MERIS, these bands were centered at 510, 560 and 620 nm for both the synthetic and real datasets. The spectral band at 510 nm is used to estimate Chl-a content in CDOM rich waters [

70]. This is due to the fact that both CDOM and Chl-a has absorption in the blue region of the visible part of the electromagnetic spectrum. The spectral band at 510 nm is mostly representative for the accessory pigments. However, since these pigments are strongly correlated with Chl-a, this band has been widely used to estimate Chl-a content from optically complex waters [

16,

17,

70]. Furthermore, the green spectral band, centered at 560 nm is commonly used for Chl-a estimation, since there is little or no absorption due to phytoplankton in this region. Therefore, this is an important band to use as a reference wavelength in many Chl-a content retrieval algorithms [

70]. Using red bands, included the band centered at 620 nm, to estimate Chl-a content has also been commonly used for optically complex waters due to the second absorption peak of the Chl-a [

16].

For the

eutrophic MODIS-Aqua datasets, the spectral bands centered at 412, 488 and 678 nm were found to have importance in the estimation of Chl-a for both the synthesized and real datasets. The bands centered at 488 and 678 nm are in good correspondence with the results for the MERIS datasets. The spectral band centered at 412 nm has also been suggested for Chl-a estimation in complex waters due to the deviation in the absorption between CDOM and Chl-a in this spectral region [

71].

Since we used a synthetic resampled dataset to present the performance of AMSA, the model outputs differed from the real datasets. Therefore, we chose to illustrate the best models for both the synthetic and real datasets for eutrophic conditions for the MERIS sensor for applications. In case of the coastal part of eastern USA, the GPR assigned in general higher values to the Chl-a content than the SVR model. However, enlarging a region close to shore revealed that the SVR model estimated higher Chl-a than the GPR model. This was also observable for the southern part of the Baltic sea, with less pronounced differences. The illustrative example also showed that both models could capture the same patterns and reveal fine details. Most probably there is a systematic bias occurring in the models. This can be adjusted by tuning the initial parameters in the kernel function, once the model for a given purpose is determined.