Optical Water Type Guided Approach to Estimate Optical Water Quality Parameters

Currently, water monitoring programs are mainly based on in situ measurements; however, this approach is time-consuming, expensive, and may not reflect the status of the whole water body. The availability of Multispectral Imager (MSI) and Ocean and Land Colour Instrument (OLCI) free data with high spectral, spatial, and temporal resolution has increased the potential of adding remote sensing techniques into monitoring programs, leading to improvement of the quality of monitoring water. This study introduced an optical water type guided approach for boreal regions inland and coastal waters to estimate optical water quality parameters, such as the concentration of chlorophyll-a (Chl-a) and total suspended matter (TSM), the absorption coefficient of coloured dissolved organic matter at a wavelength of 442 nm (aCDOM(442)), and the Secchi disk depth, from hyperspectral, OLCI, and MSI reflectance data. This study was based on data from 51 Estonian and Finnish lakes and from the Baltic Sea coastal area, which altogether were used in 415 in situ measurement stations and covered a wide range of optical water quality parameters (Chl-a: 0.5–215.2 mg·m−3; TSM: 0.6–46.0 mg·L−1; aCDOM(442): 0.4–43.7 m−1; and Secchi disk depth: 0.2–12.2 m). For retrieving optical water quality parameters from reflectance spectra, we tested 132 empirical algorithms. The study results describe the best algorithm for each optical water type for each spectral range and for each optical water quality parameter. The correlation was high, from 0.87 up to 0.93, between the in situ measured optical water quality parameters and the parameters predicted by the optical water type guided approach.


Introduction
Remote sensing offers effective ways to observe spatial and/or temporal variations in water quality, which is vital for the comprehensive assessment and management of water bodies [1]. Currently, water monitoring programs are mainly based on in situ measurements; however, considering that water bodies are dynamic in nature, this method may not reflect the status of the whole water body. Therefore, it is important to implement techniques that allow more effective monitoring of the aquatic environment. However, remote sensing of inland and coastal waters can be challenging as they are independently influenced by different optically significant constituents (OSC)-coloured dissolved organic matter (CDOM), phytoplankton, and total suspended matter (TSM). All of these OSCs shape the spectral characteristics that are measured by the remote sensor.
Phytoplankton consists of single-celled, free-floating, photosynthetic organisms that form the base of the aquatic food web, being an important component of the carbon cycle. Seasonal phytoplankton This study was based on data from 415 in situ measurement stations gathered from the boreal region inland and coastal waters from April to September in 2012-2019. Data were collected from 51 different Estonian and Finnish lakes and from three coastal locations in the Baltic Sea ( Figure 1). The first area in the Baltic Sea was Pärnu Bay, located on the west coast of Estonia, where the mean depth is 4.7 m (maximum 8 m) and the water is well mixed. Since Pärnu Bay is shallow, open to winds, and has sandy, clayey, and muddy bottom, wind-derived resuspension can lead to quite high and changeable TSM concentrations. The second area is located in the Gulf of Finland region, where the mean depth is 37 m, with a maximum of 123 m (Paldiski Deep), and where the water column is vertically stratified [16]. The area is prone to upwelling and downwelling events in the summer and autumn [46,47]. The third area is located in the Western Gotland Basin close to the coast of Sweden. The Baltic Sea reaches its greatest depth in the Western Gotland Basin, Landsort Deep (459 m), but in our measurement stations, the water depth was up to 30 m. In situ lake data were mainly collected from Estonian lakes (42 lakes), from small lakes, such as Lake Holstre (0.04 km 2 ), to large lakes, such as Lake Peipsi (3555 km 2 ); from lakes with highly transparent water, such as Lake Nohipalo Valgjärv (Secchi disk depth 5 m), to lakes with very brown-water, such as Nohipalo Mustjärv (Secchi disk depth 0.3 m) ( Figure 2). The in situ dataset contains data from measurements of water-leaving reflectance (R(λ)), concentrations of OSC (such as the concentration of Chl-a and TSM, and aCDOM(442)), Secchi disk depth, and different environment parameters, such as wind speed, waves heights, cloudiness, and sun condition. The Secchi disk depths were measured on the shadowed side of the vessel with a white disk with a 30 cm diameter or with a 20 cm diameter white disk with holes. The wind speed was measured with a handheld mechanical anemometer. The wave height, cloudiness, and sun condition were estimated by visual inspection.  The in situ dataset contains data from measurements of water-leaving reflectance (R(λ)), concentrations of OSC (such as the concentration of Chl-a and TSM, and aCDOM(442)), Secchi Remote Sens. 2020, 12, 931 5 of 35 disk depth, and different environment parameters, such as wind speed, waves heights, cloudiness, and sun condition. The Secchi disk depths were measured on the shadowed side of the vessel with a white disk with a 30 cm diameter or with a 20 cm diameter white disk with holes. The wind speed was measured with a handheld mechanical anemometer. The wave height, cloudiness, and sun condition were estimated by visual inspection.
2.1.1. Water-Leaving Reflectance (R(λ)) R(λ) was measured with TriOS-RAMSES hyperspectral radiometers using two different setups. The first setup was used in 233 in situ measurement stations and the second setup was used on 182 in situ measurement stations.
For the first setup, the above the water system consisted of three TriOS-RAMSES hyperspectral radiometers: two radiance sensors measuring upwelling radiance (L t (λ)) and downwelling radiance (L sky (λ)) in the same azimuthal plane, and one irradiance sensor measuring downwelling irradiance (E d (λ)) [48]. The radiance sensor's nadir/zenith angles of 40 • were fixed in the frame. The measured spectral range was 350-900 nm. The recording interval was once per 10-second interval. The solar azimuth angle was kept between 90 • and 180 • and was adjusted manually during measurements. The calculation of R(λ) followed the protocol of REVAMP [48] and included the next steps. First, all measured radiance and irradiance spectra were linearly interpolated to a 1 nm step. Secondly, R(λ) was calculated as where ρ(w) is the sea surface reflectance as function of wind speed (w, m·s −1 ) and calculated as ρ(w) = 0.0256 + 0.00039w + 0.000034w 2 [48]. Finally, the median R(λ) was calculated and used as the representative of the measurement station. For the second setup, the profiling system consisted of two TriOS-RAMSES hyperspectral radiometers: one irradiance sensor measuring downwelling irradiance (E d (λ)) and one radiance sensor measuring upwelling radiance (L u (λ)). Measurements were made above the water, below the water surface, and at different depths in the water column. In this study, only above-the-water measurements were used. The measured spectral range was 350-900 nm. At every depth, five recordings were made. The R(λ) calculations included three steps. First, all measured spectra were linearly interpolated to a 1 nm step. Secondly, R(λ) was calculated as Finally, the median R(λ) was calculated and used as the representative of the in situ measurement station.
To study the implementation capacity of the OWT guided approach for retrieving water quality parameters, the R(λ) spectra representing the in situ measurement stations were also convolved into MSI and OLCI bands using spectral response functions (SRFs) of satellite sensor bands and calculated according to Uudeberg et al. [45]. The SRFs for the MSI and OLCI were taken from [49] and [50], respectively.

Analysis of Water Samples
Water samples for measurements of the concentrations of OSC were collected from the water surface (up to 0.5 m depth) according to ISO 5667-3 [51] and analysed according to ISO 10260 [52]. After the samples were filtered through Whatman GF/F filters that retain fine particles down to 0.7 µm, and the pigments were extracted with 5 ml of 96% ethanol, the concentrations of Chl-a were measured spectrophotometrically with a Hitachi U-3010 spectrophotometer and calculated according to Jeffrey and Humphrey [53]. After the samples were filtered through a filter with a pore size of 0.2 µm, CDOM was measured in a 5 cm optical cuvette against distilled water with a Hitachi Remote Sens. 2020, 12, 931 6 of 35 U-3010 spectrophotometer and calculated according to Lindell et al. [54]. The concentrations of TSM were measured gravimetrically after the samples were filtered through pre-washed, pre-ashed, and pre-weighed Whatman GF/F filters.
The water samples collected during July 2018 in the Gulf of Finland had some differences in analysis. For TSM, a fixed amount of water (0.750 L) was filtered through pre-combusted and pre-weighted Millipore membrane filters with a pore size of 0.45 µm. For Chl-a, the Thermo Helios γ spectrophotometer was used and the concentrations of Chl-a were calculated according to Lorenzen [55].

Classification of Optical Water Types (OWTs)
The classification of OWTs introduced by Uudeberg et al. [45] was used to determine the OWT. The classification divides the boreal region inland and coastal waters into five OWTs, each linked to the specific bio-optical condition in order to reflect the dominance of group OSC concentrations. The Clear OWT corresponds to water with the highest transparency and lowest OSC concentrations. In the Moderate OWT, the concentrations of OSC have risen, but none of them dominate. In the Turbid OWT, the TSM is the dominant OSC and the R(λ) values are the highest. In the Very Turbid OWT, Chl-a is the dominant OSC. The strong Chl-a peak, which is associated with blooms, occur in the red part of spectra. The Brown OWT water is dominated by CDOM.
The OWT classification is based on reflectance spectra key features, such as the wavelength of the maximum, the slopes, and the amplitude of R(λ). The maximum likelihood of individual spectra to reference spectra using a combination of spectral correlation similarity and modified spectral angle similarity was used to determine the OWT for measured R(λ) spectra. The OWTs were assigned on each in situ measurement of R(λ) and the in situ measurements of R(λ) convolved into MSI and OLCI sensor bands.

Algorithms for Retrieving Water Quality Parameters
OWT guided approach was used for finding models for retrieving water quality parameters, such as the concentration of Chl-a, the concentration of TSM, the aCDOM(442), and Secchi disk depth, from R(λ) spectra. Since the inland and coastal waters are optically complex and cover large variations of OSC concentrations, the OWT guided approach allows to find the best model for different bio-optical conditions. In this study, we used 132 previously published algorithms from manuscripts that were available for us, including 60 for Chl-a, 39 for TSM, 21 for CDOM, and 12 for Secchi disk depth. Details of used algorithms are shown in Appendix A Table A1.
Repeated K-fold Cross-validation [56] was the statistical method to build and select the model using published algorithms for optical water quality parameter and the R package caret [57] by Max Kuhn was used for implementation. The Repeated K-fold Cross-validation contains following steps: (1) the data are divided randomly into k folds of equal size; (2) the k-1 folds are used for the model training; (3) the hold-out fold is used for the model validation; and (4) steps (1)-(3) are repeated n times. Finally, based on a selected statistical metric, the model with the best average score is assigned as the final model. Ten folds, ten repeats, and the root mean squared error as metrics were used in this study.
The ranking system used a combination of scaled and threshold-based statistical metrics to select the model for retrieving the optical water quality parameter from R(λ) per OWTs. Statistical metrics, such as RMSE, RMSLE, MAE, MAPE, and BIAS, were scaled from 0 (maximum value) to 1 (minimum value); the R 2 was scaled from 0 (minimum value) to 1 (maximum value); and for the p-value, the thresholds were used as 1 for p-value under 0.001, 0.5 for p-value between 0.001 and 0.05, and 0, when p-value was over 0.05. Finally, all values were summed up and the model with the highest score was selected as the model for future retrievals. The following statistical metrics were used in this study and they are described in detail in R package Metrics [58]. To find the best solutions also for the MSI and OLCI sensors R(λ) spectral range, all calculations were made separately on R(λ) spectra range with a 1 nm step, R(λ) spectra convolved into OLCI sensor bands, and R(λ) spectra convolved into MSI sensor bands. Since the OLCI and MSI band locations were sometimes different than the band locations described in algorithms, the closest possible band was selected in case of OLCI or MSI were lacking exact bands.

Satellite Dataset
To assess the ability of the OWT guided approach to obtain optical water quality parameters from satellite data, the full resolution Level-1 images acquired in 2018 and 2019 by OLCI, onboard the Sentinel-3 satellite, were used. Level-1 images were processed with the Case-2 Regional CoastColour (C2RCC) atmospheric correction processor v1.15, based on the previous study's recommendation [45]. Image primary processing, such as subsetting the region of interest and atmospheric correction processing, and downloading was done using the ESTHub Processing Platform, which is the Estonian Land Board's national mirror site for Copernicus satellite data and processing [59]. After that, the OWT were determined for every pixel of the satellite images. Finally, the OWT-based models described in Tables 2-5 were used to find the optical water quality parameters.

Description of In Situ Dataset
Our in situ dataset from 415 in situ measurement stations covers wide ranges of the OSC concentrations and Secchi disk depth. However, datasets of all four parameters are significantly non-Gaussian. Table 1 shows the skewness of Chl-a and CDOM are 4.3 and 4.1, respectively. This indicates that their distributions are not symmetrical and are skewed strongly towards higher values. Moreover, Table 1 allows us to assume based on the high maximum and median values of the Chl-a, CDOM, and Secchi disk depth, that the dataset includes also measurements from extreme optical conditions for the boreal region. For example, the high transparency (Secchi disk depth up to 12.2 m) was measured in July in the Gulf of Finland during an upwelling event. Therefore, water masses with different properties (low OSC concentrations, high transparencies, and cold temperatures (~4 • C)) were present compared to the usual situations. High CDOM values around 40 m −1 were measured multiple times through the vegetation period in dark-water small Estonian lakes. Chl-a values of 215.2 mg·m −3 Remote Sens. 2020, 12, 931 8 of 35 and 168.4 mg·m −3 were captured during the blooming situation in large Lake Peipsi. However, in the dataset, there is a lack of measurement stations where the TSM was dominated by mineral particles. For instance, the median proportion of suspended particulate organic matter in TSM is 76%.  Figure 3 shows the in situ measured R(λ) at 415 stations classified into various OWTs, and Figure 4 shows the range of the variations of the OSC concentrations and Secchi disk depth for OWTs. The OWT for each in situ measured R(λ) was determined by the maximum likelihood of the individual spectrum to type averages. The Clear OWT was assigned to the R(λ) of 100 measurement stations. The maxima of R(λ) were between 540 and 580 nm and the mean maximum value was 0.012. Waters were the most transparent (maximum Secchi disk depth 12.2 m) and with the lowest OSC concentrations. However, in Figure 3, one Clear OWT station has a too high maximum value at 550 nm (0.06) compared to the rest of the Clear OWT spectra, but inspection of the station OSC concentration does not confirm a misclassification. The Moderate OWT was the most frequent in our dataset, with the R(λ) of 124 stations being assigned to this OWT. The Turbid OWT was assigned to the R(λ) of 98 measurement stations. This OWT is dominated by TSM and the highest TSM values were expected to be in this OWT. However, in this dataset, the highest TSM values were in stations assigned to the Very Turbid OWT. TSM is mainly dominated by SPOM in our dataset, and especially in these high TSM valued Very Turbid OWT stations (88% of TSM was SPOM). These waters are generally dominated by Chl-a, and stations are classified correctly. The Very Turbid OWT was assigned to the R(λ) of 51 measurement stations, which were dominated by Chl-a. The Brown OWT was assigned to the R(λ) of 42 stations. These stations had very low R(λ) values (median value 0.003), with maxima in the red part of the spectrum, and very high aCDOM(442) values with a median at 13.6 m −1 .
Remote Sens. 2020, 12, x FOR PEER REVIEW 9 of 36 Figure 3. In situ measured R(λ) classified into various optical water types by the maximum likelihood of individual spectra to type averages, using the spectral correlation similarity and modified spectral angle similarity.  . In situ measured R(λ) classified into various optical water types by the maximum likelihood of individual spectra to type averages, using the spectral correlation similarity and modified spectral angle similarity. The accuracies of OWT estimations, based on in situ measurements of R(λ) convolved into MSI and OLCI sensor bands using sensor-specific SRFs, were 96% for OLCI and 91% for MSI. As the goal was to use the OWT guided approach to retrieve optical water quality parameters, it was important to understand the capability to assign the same OWT on in situ measurements of R(λ) regardless of spectral scale (Ramses with 1 nm step resolution, and distinct bands for OLCI and MSI). Confusion matrices were constructed between the OWTs assigned on in situ measurements of R(λ) (set as the true OWT value) and those which were assigned based on in situ measurements of R(λ) convolved into various sensor bands (set as the predicted OWT value). As shown in Figure 5, the OLCI confusion The accuracies of OWT estimations, based on in situ measurements of R(λ) convolved into MSI and OLCI sensor bands using sensor-specific SRFs, were 96% for OLCI and 91% for MSI. As the goal was to use the OWT guided approach to retrieve optical water quality parameters, it was important to understand the capability to assign the same OWT on in situ measurements of R(λ) regardless of spectral scale (Ramses with 1 nm step resolution, and distinct bands for OLCI and MSI). Confusion matrices were constructed between the OWTs assigned on in situ measurements of R(λ) (set as the true OWT value) and those which were assigned based on in situ measurements of R(λ) convolved into various sensor bands (set as the predicted OWT value). As shown in Figure 5, the OLCI confusion matrix demonstrates a strong distinction for Clear and Brown OWTs (100% correct assignment) and the highest rate of misclassification in Very Turbid OWT, with 8% of spectra being misclassified as Brown or Turbid OWT. The MSI confusion matrix illustrates a strong distinction for Very Turbid, and Brown OWTs (100% and 98% correct assignment respectively); however, 16% of Moderate OWT spectra were misclassified.

Predictive Models for Concentration of Chlorophyll-a (Chl-a)
The best models to retrieve the concentration of Chl-a from the R(λ) spectra for each OWT and each spectral ranges, such as Ramses with 1 nm step, and OLCI and MSI bands, are defined in Table 2. We tested 60 published empirical algorithms to retrieve the Chl-a values and the best ones for all the cases were chosen based on our model ranking system described in paragraph 2.3. As a result, the four best models for each OWT and each spectral range are shown on radar plots in Appendix B Figure A1. The linear regression models showed a better performance for all the cases.
For the Clear OWT, in the case of Ramses and OLCI, the model CHL69 using the Gitelson et al. [60] algorithm had the highest score, and, for MSI, the model using the MERIS maximum chlorophyll index by Gower et al. [61] was the most suitable. The model CHL97 using the Mishra and Mishra [62] algorithm was the best for Moderate OWT for Ramses and in Very Turbid OWT, for all the different spectral ranges. For Turbid OWT, all three spectral ranges were described best with different models with different approaches: for Ramses, the model using the Kutser et al. [63] algorithm based on two-band ratio approach; for OLCI, the model using the Gitelson et al. [64] algorithm based on a three-band approach, and for MSI, the model using the Zimba and Gitelson [11] four-band approach.
For Brown OWT, retrieving the concentration of Chl-a from R(λ) was the most difficult and the coefficient of determination suggested that the regression model explained about 40% of the variance observed in the in situ measured data. However, all models were statistically significant based on the p-values.
Remote Sens. 2020, 12, x FOR PEER REVIEW 10 of 36 matrix demonstrates a strong distinction for Clear and Brown OWTs (100% correct assignment) and the highest rate of misclassification in Very Turbid OWT, with 8% of spectra being misclassified as Brown or Turbid OWT. The MSI confusion matrix illustrates a strong distinction for Very Turbid, and Brown OWTs (100% and 98% correct assignment respectively); however, 16% of Moderate OWT spectra were misclassified. Figure 5. The accuracy of the optical water type (OWT) assignment using in situ measurements of R(λ) convolved into Ocean and Land Colour Instrument (OLCI) and Multispectral Imager (MSI) bands by normalized confusion matrices. The rows show the true OWTs determined from in situ measurements of R(λ) using Ramses, and the columns show the predicted OWTs determined from the convolved R(λ). The diagonal elements represent the situation where the predicted OWT is the same as the true OWT (i.e., the correct classification).

Predictive Models for Concentration of Chlorophyll-a (Chl-a)
The best models to retrieve the concentration of Chl-a from the R(λ) spectra for each OWT and each spectral ranges, such as Ramses with 1 nm step, and OLCI and MSI bands, are defined in Table  2. We tested 60 published empirical algorithms to retrieve the Chl-a values and the best ones for all the cases were chosen based on our model ranking system described in paragraph 2.3. As a result, the four best models for each OWT and each spectral range are shown on radar plots in Figure A1. The linear regression models showed a better performance for all the cases.
For the Clear OWT, in the case of Ramses and OLCI, the model CHL69 using the Gitelson et al. [60] algorithm had the highest score, and, for MSI, the model using the MERIS maximum chlorophyll index by Gower et al. [61] was the most suitable. The model CHL97 using the Mishra and Mishra [62] algorithm was the best for Moderate OWT for Ramses and in Very Turbid OWT, for all the different spectral ranges. For Turbid OWT, all three spectral ranges were described best with different models with different approaches: for Ramses, the model using the Kutser et al. [63] algorithm based on twoband ratio approach; for OLCI, the model using the Gitelson et al. [64] algorithm based on a threeband approach, and for MSI, the model using the Zimba and Gitelson [11] four-band approach. For Brown OWT, retrieving the concentration of Chl-a from R(λ) was the most difficult and the coefficient of determination suggested that the regression model explained about 40% of the variance observed in the in situ measured data. However, all models were statistically significant based on the p-values.
The correlations shown in Figure 6, between the concentration of Chl-a predicted using OWT based models and in situ measured concentration of Chl-a, were strong, such as 0.93 for Ramses and OLCI, and 0.92 for MSI. Overall, the predicted concentrations of Chl-a were the most accurate compared to other optical water quality parameters explored in this study. Table 2. The best published chlorophyll-a (Chl-a) predictive models tested in this study for each optical water types (OWT) and for sensors with different spectral scales. Descriptions of the algorithms used for these models are found in Table A1, and the acronyms and abbreviations are in Table A2. The band central wavelength is used to mark the Ocean and Land Colour Instrument (OLCI) and Multispectral Imager (MSI) models bands. The correlations shown in Figure 6, between the concentration of Chl-a predicted using OWT based models and in situ measured concentration of Chl-a, were strong, such as 0.93 for Ramses and OLCI, and 0.92 for MSI. Overall, the predicted concentrations of Chl-a were the most accurate compared to other optical water quality parameters explored in this study.

Predictive Models for Concentration of Total Suspended Matter (TSM)
The best models to retrieve the concentrations of TSM from R(λ) spectra for each OWT and for each spectral range are defined in Table 3. We tested 39 published empirical algorithms to retrieve the concentrations of TSM. The model TSM18 using the Kutser et al. [65] algorithm based on the reflectance spectra peak at 810 nm was the most well represented based on our models ranking system results. This model suited well for the MSI sensor, where it had the highest score for Turbid, Very Turbid, and Brown OWTs and according to Figure A2, for the Moderate OWT, it had the second Figure 6. Comparison of concentrations of Chl-a estimated from in situ measured R(λ) spectra using the optical water type (OWT) guided approach and in situ measured concentrations of Chl-a for different spectral scales: (left to right) Ramses with 1 nm resolution, OLCI, and MSI. OWTs are indicated by colours and the line shows 1:1 relationship. Table 2. The best published chlorophyll-a (Chl-a) predictive models tested in this study for each optical water types (OWT) and for sensors with different spectral scales. Descriptions of the algorithms used for these models are found in Table A1, and the acronyms and abbreviations are in Table A2. The band central wavelength is used to mark the Ocean and Land Colour Instrument (OLCI) and Multispectral Imager (MSI) models bands.

Predictive Models for Concentration of Total Suspended Matter (TSM)
The best models to retrieve the concentrations of TSM from R(λ) spectra for each OWT and for each spectral range are defined in Table 3. We tested 39 published empirical algorithms to retrieve the concentrations of TSM. The model TSM18 using the Kutser et al. [65] algorithm based on the reflectance spectra peak at 810 nm was the most well represented based on our models ranking system results. This model suited well for the MSI sensor, where it had the highest score for Turbid, Very Turbid, and Brown OWTs and according to Figure A2, for the Moderate OWT, it had the second place due to its low bias contribution. Model TSM18 had high scores for Ramses and the OLCI spectral scale for Turbid, Very Turbid, and Brown OWTs. For the Clear OWT, the model TSM39 using Zhang et al. [66] developed log-transformed multiple linear regression algorithm, based on a combination of 488, 555, and 645 nm information, was the best for all sensors. The Moderate OWT had the lowest coefficient of determination for all the spectral ranges from all the OWTs; however, all models were statistically significant based on p-values.
The correlations between concentrations of TSM predicted using OWT based models and in situ measured concentrations of TSM values were strong: 0.87, 0.89, 0.88 for Ramses, OLCI, and MSI respectively. However, Figure 7 shows that model tends to underestimate strongly points with high TSM values (from 15 to 25 mg·L −1 ), present in Moderate OWT.

Predictive Models for Absorption Coefficient of Coloured Dissolved Organic Matter (CDOM)
The best models to retrieve the aCDOM(442) from the R(λ) spectra for each OWT and for each spectral range, are defined in Table 4. We tested 21 published empirical algorithms to retrieve the CDOM values. The power regression models (referred also as a log-log regression model) of the reflectance ratio showed better performance for all the spectral ranges in Clear, Moderate, and Turbid OWTs. For instance, the reflectance ratio 665/560 nm described by Menken et al. [67] was the best for OLCI and MSI; however, for Ramses, the model using the ratio 560/660 nm had the highest score. For the Very turbid OWT, the linear regression model CDOM119, using the Ammenberg et al. [68] algorithm, was the best for all spectral ranges. Furthermore, this model received high scores also in Clear, Moderate, and Turbid OWTs for OLCI and MSI, as shown in Figure A3. For the Brown OWT, the log-transformed multiple linear regression models had the best performance; however, this time the model for MSI used different reflectances compared to other spectral ranges. The p-value showed that all models were statistically significant, but only the Clear OWT coefficient of determination suggested that the regression model explained 74% of the variance observed in the in situ measured data.

Predictive Models for Absorption Coefficient of Coloured Dissolved Organic Matter (CDOM)
The best models to retrieve the aCDOM(442) from the R(λ) spectra for each OWT and for each spectral range, are defined in Table 4. We tested 21 published empirical algorithms to retrieve the CDOM values. The power regression models (referred also as a log-log regression model) of the reflectance ratio showed better performance for all the spectral ranges in Clear, Moderate, and Turbid OWTs. For instance, the reflectance ratio 665/560 nm described by Menken et al. [67] was the best for OLCI and MSI; however, for Ramses, the model using the ratio 560/660 nm had the highest score. For the Very turbid OWT, the linear regression model CDOM119, using the Ammenberg et al. [68] algorithm, was the best for all spectral ranges. Furthermore, this model received high scores also in Clear, Moderate, and Turbid OWTs for OLCI and MSI, as shown in Figure A3. For the Brown OWT, the log-transformed multiple linear regression models had the best performance; however, this time the model for MSI used different reflectances compared to other spectral ranges. The p-value showed that all models were statistically significant, but only the Clear OWT coefficient of determination suggested that the regression model explained 74% of the variance observed in the in situ measured data.
The correlations between the predicted aCDOM(442), using OWT based models, and the in situ measured aCDOM(442) were strong, 0.88 for Ramses and 0.87 for both OLCI and MSI. Still, as shown in Figure 8, the capability to estimate the Brown OWT high CDOM values over 20 m −1 was nonexistent and needs future investigation. Table 4. The best published coloured dissolved organic matter (CDOM) predictive models tested in this study for each optical water type (OWT) and for sensors with different spectral scales. Descriptions of the algorithms used for these models are found in Table A1, and the acronyms and abbreviations, in Table A2. The band central wavelength is used to mark the Ocean and Land Colour Instrument (OLCI) and Multispectral Imager (MSI) models bands.   Table 3. The best published total suspended matter (TSM) predictive models tested in this study for each optical water type (OWT) and for sensors with different spectral scales. Descriptions of the algorithms used for these models are found in Table A1, and acronyms and abbreviations, in Table A2. The band central wavelength is used to mark the Ocean and Land Colour Instrument (OLCI) and Multispectral Imager (MSI) models bands. The correlations between the predicted aCDOM(442), using OWT based models, and the in situ measured aCDOM(442) were strong, 0.88 for Ramses and 0.87 for both OLCI and MSI. Still, as shown in Figure 8, the capability to estimate the Brown OWT high CDOM values over 20 m −1 was non-existent and needs future investigation. Table 4. The best published coloured dissolved organic matter (CDOM) predictive models tested in this study for each optical water type (OWT) and for sensors with different spectral scales. Descriptions of the algorithms used for these models are found in Table A1, and the acronyms and abbreviations, in Table A2. The band central wavelength is used to mark the Ocean and Land Colour Instrument (OLCI) and Multispectral Imager (MSI) models bands.

Predictive Models for Secchi Disk Depth
The best models to retrieve the Secchi disk depth value from R(λ) spectra for each OWT and for each sensor with different spectral ranges are defined in Table 5. We tested 12 published empirical algorithms to retrieve the Secchi disk depth. The model ZSD104, using the Kloiber et al. [69] algorithm based on log-transformed multiple regression of form 485/660 nm ratio and the additional 485 nm, had the best performance based on our ranking system. In all spectral ranges, the ZSD104 model was the best for Clear, Moderate, and Turbid OWTs and was also in the top four models for Very Turbid and Brown OWTs, as shown in Figure A4. For Very Turbid OWT, the model ZSD112 using Wu et al. [70] algorithm, based on multiple linear regression of five variables, had the highest score for all spectral ranges. For the Brown OWT, retrieving the Secchi disk depth from R(λ) was the most difficult and the highest correlation of all tested models was with ZSD109 using Hellweger et al. [71] algorithm, based on the reflectance at 660 nm. However, the coefficient of determination was around 0.28, but a p-value lower than 0.0005 shows that the model is still statistically significant.
Correlations between the Secchi disk depth predicted using OWT based models and the in situ measured Secchi disk depth values were strong, 0.91 for all the tested spectral ranges. However, as

Predictive Models for Secchi Disk Depth
The best models to retrieve the Secchi disk depth value from R(λ) spectra for each OWT and for each sensor with different spectral ranges are defined in Table 5. We tested 12 published empirical algorithms to retrieve the Secchi disk depth. The model ZSD104, using the Kloiber et al. [69] algorithm based on log-transformed multiple regression of form 485/660 nm ratio and the additional 485 nm, had the best performance based on our ranking system. In all spectral ranges, the ZSD104 model was the best for Clear, Moderate, and Turbid OWTs and was also in the top four models for Very Turbid and Brown OWTs, as shown in Figure A4. For Very Turbid OWT, the model ZSD112 using Wu et al. [70] algorithm, based on multiple linear regression of five variables, had the highest score for all spectral ranges. For the Brown OWT, retrieving the Secchi disk depth from R(λ) was the most difficult and the highest correlation of all tested models was with ZSD109 using Hellweger et al. [71] algorithm, based on the reflectance at 660 nm. However, the coefficient of determination was around 0.28, but a p-value lower than 0.0005 shows that the model is still statistically significant.
Correlations between the Secchi disk depth predicted using OWT based models and the in situ measured Secchi disk depth values were strong, 0.91 for all the tested spectral ranges. However, as shown in Figure 9, some higher Clear OWT Secchi disk depth estimations from the R(λ) spectra had a lower accuracy. These in situ measurements were made during an upwelling event in the Gulf of Finland. Table 5. The best Secchi disk depth predictive models using published algorithms tested in this study for each optical water types (OWT) and for sensors with different spectral scales. Descriptions of the algorithms used for building models are found in Table A1, and the acronyms and abbreviations, in Table A2. The band central wavelength is used to mark the Ocean and Land Colour Instrument (OLCI) and Multispectral Imager (MSI) models bands. shown in Figure 9, some higher Clear OWT Secchi disk depth estimations from the R(λ) spectra had a lower accuracy. These in situ measurements were made during an upwelling event in the Gulf of Finland. Table 5. The best Secchi disk depth predictive models using published algorithms tested in this study for each optical water types (OWT) and for sensors with different spectral scales. Descriptions of the algorithms used for building models are found in Table A1, and the acronyms and abbreviations, in Table A2. The band central wavelength is used to mark the Ocean and Land Colour Instrument (OLCI) and Multispectral Imager (MSI) models bands.

Discussion
Our OWT guided approach to estimate optical water quality parameters, such as Chl-a, TSM, CDOM, and Secchi disk depth, from R(λ) spectra, can be the solution to monitor boreal region inland and coastal waters health operatively. Water bodies can vary in shape, size, depth, and also by water colour. As shown in Table 1, our in situ dataset from 52 different water bodies demonstrates that the optical water quality parameters can vary largely. For example, the concentration of the Chl-a range was from 0.5 to 215.2 mg·m −3 , the TSM range was from 0.6 to 46.0 mg·L −1 , the range for aCDOM(442) was 0.4 to 43.7 m −1 , and for the Secchi disk depth the range was from 0.2 to 12.2 m. Therefore, to assume to find one algorithm per optical water quality parameter to fit with a good accuracy for all the water bodies may be a bit too optimistic.
Optically complex waters are independently influenced by Chl-a, TSM, and CDOM [8]. Therefore, these can be the reasons why standard remote sensing algorithms often fail in optically complex waters [72,73], and there are many regional or waterbody based [25,65,70,74,75] empirical algorithms. However, optical properties can vary strongly even inside one waterbody, for instance, during the one day in Lake Peipsi the concentration of Chl-a varied from 16.8 to 215.2 mg·m −3 , and in addition there is seasonal and interannual variability; therefore, studies have suggested, that more detailed approach would be beneficial [76,77]. The use of OWT guided approach, which first classifies water into different OWTs, reflecting different bio-optical conditions, and then applies that OWT specific algorithm to estimate the optical water quality parameters, may be one of the possible solutions.
In the context of selecting the best empirical algorithm for predicting the optical water quality parameter from R(λ) spectra, this study showed, as expected, that the choice depends on the OWT and also the sensor spectral range. As the R(λ) spectra are the basis for all the future calculations and developments, the error in R(λ) may multiply in the final product or lead to incorrect conclusions [78]. Moreover, it is known that atmospheric correction over inland and coastal areas is difficult [79][80][81] and the result still can contain large errors [30,45,77,82,83]. Therefore, in this study, to eliminate as many errors as possible and focus on finding the best algorithm, we used in situ measured R(λ) spectra with 1 nm step and for OLCI and MSI we recalculated in situ measured R(λ) using SRFs into OLCI and MSI bands. The same approach has been used before [65]. However, it is still important to remember that in situ measured R(λ) is not the absolute truth and can contain large errors for example due to variable weather conditions [45,[84][85][86]. For example, R(λ) measurements are difficult to perform well in CDOM-rich waters, which often leads to strong overestimation in the blue region of the spectrum [45,65]. Moreover, the presence of sunlight increases the R(λ) values in the blue part of the spectrum.
Linear regression models were used for predicting the concentration of Chl-a. Compared to other optical water quality parameter model selections, Chl-a varies the most between OWTs and the spectral range. The widely used [10,63,68,[87][88][89][90][91][92] ratio 700/670 nm with a high correlation for high-biomass waters, was the best model only for Turbid OWT with a Ramses spectral range. However, the two variables ratio approach was the best for all the spectral ranges in Brown OWT, OLCI and MSI Moderate OWT, Ramses Turbid OWT, and MSI Very Turbid OWT. Matthews [93] suggests for lower Chl-a concentrations to use the ratio of the blue and green. In our best-performing models, all variables were selected from the red and NIR spectral range, with different combinations. Moreover, broadly used [10,30,61,87,94,95] algorithms that include chlorophyll fluorescence information using values of reflectance spectra around 681 nm were not the best for any cases. The Brown OWT retrieving models coefficients of determination were about 40%, thus needing future investigation and improvement.
From the 39 tested empirical algorithms to retrieve the concentration of TSM, the model TSM18 using the Kutser et al. [65] algorithm based on a reflectance peak at 810 nm was the best for MSI for Turbid, Very Turbid, and Brown OWTs. It also suited well for Ramses and OLCI Turbid, Very Turbid, and Brown OWTs. The authors of the algorithm pointed out the usefulness of NIR spectral part for retrieving the TSM in turbid waters was already known from previous studies [96,97]. Similarly, Matthews [93] discussed the potential of NIR spectral part for retrieving the concentration of TSM in the empirical algorithms overview article. However, despite being developed in waters with high mineral particles, Kutser et al. [65] showed the usefulness of NIR part of the spectra in waters, where the majority of TSM was organic, as in our dataset. Often, algorithms use the knowledge that near 560 nm is the phytoplankton absorption minimum; therefore, the reflectance is sensitive to changes in the TSM [93]. However, Zhang et al. [66] demonstrated that reflectance at 550 nm is sensitive to TSM changes in less turbid waters and reflectance at 645 nm is sensitive in turbid waters. For Clear OWT, the model TSM39 using Zhang et al.'s [66] developed log-transformed multiple linear regression algorithm, based on a combination of 488,555, and 645 nm information, was the best for all sensors.
For the aCDOM(442) the power regression predictive models of the reflectance ratio showed better performances for all the spectral ranges in Clear, Moderate, and Turbid OWTs. For instance, the reflectance ratio of 665/560 nm was the best for OLCI and MSI; however, for Ramses, the model using the ratio of 560/660 nm was preferred. These ratios are quite commonly used for predictions; however, as shown in Table A1, they are used with different statistical techniques, such as linear regression [67,68,91,98,99], and power regression [67,75,98,99]. For Brown OWT, the log-transformed multiple linear regression model using the Brezonik et al. [100] algorithm, using 488 nm and 830 nm, showed the highest results for all spectral ranges. However, Figure 8 shows that the Brown OWT model is not sensitive for higher than 20 m −1 values and strongly underestimates in situ measured values in that region. Details of the algorithm in Table A1 also show that the algorithm development range was up to 19.7 m −1 . Overall, none of these tested CDOM algorithms' data ranges were up to 42 m −1 as were the in situ measurements present in our dataset. Therefore, future work is needed to find or develop algorithms that are suitable for these kinds of humic lakes. Additionally, reflectance measurements, both in situ and satellite, are difficult to perform well for these very dark lakes. Since uncertainties are higher in the blue part, the longer wavelengths are preferred for algorithms from the reflectance measurement side.
The predictive models for the Secchi disk depth analysis showed that the model ZSD104, using the Kloiber et al. [69] algorithm based on log-transformed multiple regression of form 485/660 nm ratio and the additional 485 nm, was the best model for all spectral ranges in Clear, Moderate, and Turbid OWTs. As shown in Table A1, this algorithm was developed for data ranges up to 5 m; and from Clear, Moderate, and Turbid OWTs, 96% of measurements fit within the limits. Furthermore, in the Clear OWT the maximum measured Secchi disk depth is 12.2 m and Figure 9 shows a lower accuracy predicting high Secchi disk depths. However, studies using the same algorithm have demonstrated that this can be used up to 15 m [101,102]. Therefore, it may be that upwelling and optically extreme conditions are the main reason for the lower prediction accuracy. For the Very Turbid OWT, the model ZSD112 using Wu et al. [70] algorithm, based on multiple linear regression of five variables, was the best. For the Brown OWT, the best model was based on reflectance at 660 nm. Matthews [93] also suggested, that for humic lakes, a single band algorithm may work; however, the blue part of spectrum was suggested. Overall, the Brown OWT best model was not good enough and needs future work.
The presented OWT guided approach to estimate optical water quality parameters was applied to OLCI images acquired in the Pärnu Bay region to investigate ecosystem seasonal and spatial changes and responses to weather effects. As shown in Figure 10, the left column includes OWT estimations and optical water quality parameters derived images during the most common situations, when higher values were present close to coast; however, the bay was mainly classified into Clear OWT. There was a storm event with a daily average wind speed of 11.2 m·s −1 and the gusts over 21 m·s −1 at 22 th June 2018. The image captured the day after the storm showed the changes in the bay. As the bay is shallow with a soft bottom, the wind caused resuspension of sediments into the water column and as a result the bay was then classified into Turbid or Moderate OWTs. Derived images ( Figure 10) showed an increase of TSM concentration (due to both inorganic and organic part) and drastic decrease in Secchi disk depth. This indicates changes in the underwater light field. The Chl-a values rose, but we do not have a reason to believe that this was actually true in this case, and we assume the overprediction of the Chl-a due to TSM side-effect. Firstly, finding empirical algorithms that effectively separate the signals from TSM and Chl-a can be challenging [93], and secondly, our datasets mainly include TSM dominated by SPOM in situ measurements. In the future, it is necessary to add more SPIM dominated in situ measurements to train our Turbid OWT models. Five days later (right column of Figure 10) conditions had returned to near those on the 17 th , though with higher Chl-a values, possibly due to the added nutrients from the sediments. These changes are quick in nature and remain often uncaptured by traditional monitoring programs, adding more value to remote sensing possibilities.
Remote Sens. 2020, 12, x FOR PEER REVIEW 18 of 36 secondly, our datasets mainly include TSM dominated by SPOM in situ measurements. In the future, it is necessary to add more SPIM dominated in situ measurements to train our Turbid OWT models. Five days later (right column of Figure 10) conditions had returned to near those on the 17 th , though with higher Chl-a values, possibly due to the added nutrients from the sediments. These changes are quick in nature and remain often uncaptured by traditional monitoring programs, adding more value to remote sensing possibilities. Additionally, exploring our OWT guided approach applied to OLCI images, we discovered that sometimes artificial lines emerged as a transition from one OWT model to another OWT model.
As changes in natural water are usually continuous, but we are trying to push these into very strict limits, then these lines are expected to happen. In another study, fuzzy logic is often used to blend algorithms for optical water quality parameter algorithms [41,103,104]. Since the OWT classification we used uses similarity calculations, each pixel had a similarity assessment for every OWT. For the purpose of blending OWT models, the combination of similarities as weights and the threshold for the selection of when to add another OWT's model in blending calculations was giving promising results, but are in need of more detailed studies in the future.

Conclusions
Currently, water monitoring programs are mainly based on in situ measurements; however, considering that water bodies are dynamic in nature, this method may not reflect the status of the entire water body. Remote sensing offers effective ways to observe the spatial and temporal variations in water quality. Moreover, the launch of MSI and OLCI, and the free availability of their data with high spectral, spatial, and temporal resolution, have increased the potential to add remote sensing techniques into monitoring programs and to improve the quality of monitoring water. However, remote sensing of inland and coastal waters can be challenging as they are independently influenced by Chl-a, TSM, and CDOM. In this study, we introduce an OWT guided approach to estimate the optical water quality parameters, such as the concentration of Chl-a and TSM, the aCDOM(442), and the Secchi disk depth from the R(λ) spectra with different spectral ranges. We focused our study on the boreal region inland and coastal waters; however, we showed the large variation of optical water quality parameters.
We tested 132 previously published empirical algorithms to find the best existing solution for each OWT (Clear, Moderate, Turbid, Very Turbid, and Brown) and for different spectral ranges, such as Ramses with a 1 nm step, OLCI bands, and MSI bands. We demonstrated that the suitability of the algorithm depends on the OWTs and spectral ranges. It is necessary to select an appropriate algorithm for the region of interest; therefore, using only one algorithm for optically complex waterbodies does not work, and we suggest using the OWT guided approach as the possible of pre-selection method for choosing right algorithm. The OWT guided approach can provide a basis for understanding the seasonal and spatial variabilities of water bodies and can be an additional technique in water monitoring programs to improve the quality of monitoring the optical water quality parameters. Appendix A   Table A1. The algorithms used to retrieve the optical water quality parameters, such as chlorophyll-a (Chl-a), total suspended matter (TSM), absorption coefficient of coloured dissolved organic matter (CDOM), and Secchi disk depth (ZSD). In algorithms that were designed for satellite sensors, the central wavelength is used to the mark sensor band. Descriptions of acronyms and abbreviations are found in Appendix C Table A2. The data range is given in parametric units: for Chl-a mg·m 3 , for TSM mg·L −1 , for the absorption coefficient of CDOM at wavelength shown in brackets m −1 , and for Secchi disk dept m.      Figure A1. On radar plots, four Chl-a predictive models with the highest score from our ranking system are plotted for each optical water type and spectral range. The statistical metrics that were used to rank the model are displayed on each axis. The center of the radar plot indicates a low score.
The legend includes information about model position (1 to 4) and model number, without parametric specifications. Figure A1. On radar plots, four Chl-a predictive models with the highest score from our ranking system are plotted for each optical water type and spectral range. The statistical metrics that were used to rank the model are displayed on each axis. The center of the radar plot indicates a low score. The legend includes information about model position (1 to 4) and model number, without parametric specifications.
Remote Sens. 2020, 12, x FOR PEER REVIEW 26 of 36 Figure A2. On radar plots, four TSM predictive models with the highest score from our ranking system are plotted for each optical water type and spectral range. The statistical metrics that were used to rank the model are displayed on each axis. The center of the radar plot indicates a low score. The legend of the radar plots includes information about the model position (1 to 4) and model number, without parametric specifications. Figure A2. On radar plots, four TSM predictive models with the highest score from our ranking system are plotted for each optical water type and spectral range. The statistical metrics that were used to rank the model are displayed on each axis. The center of the radar plot indicates a low score. The legend of the radar plots includes information about the model position (1 to 4) and model number, without parametric specifications.
Remote Sens. 2020, 12, x FOR PEER REVIEW 27 of 36 Figure A3. On radar plots, four absorption coefficients of CDOM at a wavelength of 442 nm predictive models with the highest score from our ranking system are plotted for each optical water type and spectral range. The statistical metrics that were used to rank the model are displayed on each axis. The center of the radar plot indicates a low score. The legend of the radar plots includes information about the model position (1 to 4) and the model number, without parametric specifications. Figure A3. On radar plots, four absorption coefficients of CDOM at a wavelength of 442 nm predictive models with the highest score from our ranking system are plotted for each optical water type and spectral range. The statistical metrics that were used to rank the model are displayed on each axis. The center of the radar plot indicates a low score. The legend of the radar plots includes information about the model position (1 to 4) and the model number, without parametric specifications.
Remote Sens. 2020, 12, x FOR PEER REVIEW 28 of 36 Figure A4. On radar plots, four Secchi disk depth predictive models with the highest score from our ranking system are plotted for each optical water type and spectral range. The statistical metrics that were used to rank the model are displayed on each axis. The center of the radar plot indicates a low score. The legend of the radar plots includes information about the model position (1 to 4) and model number, without parametric specifications. Figure A4. On radar plots, four Secchi disk depth predictive models with the highest score from our ranking system are plotted for each optical water type and spectral range. The statistical metrics that were used to rank the model are displayed on each axis. The center of the radar plot indicates a low score. The legend of the radar plots includes information about the model position (1 to 4) and model number, without parametric specifications. Polynomial regression (number indicated degree of polynomial) LT10-LR Log-transformed linear regression with base 10 LL10R

Appendix C
Log-log regression with base 10 LT-P2R Log-transformed polynomial regression