1. Introduction
Remote sensing offers substantial advantages over traditional monitoring methods, mainly because of the synoptic coverage and temporal consistency of the data, and it has the potential to provide crucial information on inland and near-coastal transitional waters [
1]. For this reason, remote sensing is an essential tool for the study and monitoring of inland water bodies, often characterized by Case 2 waters (optically complex waters), especially when it comes to studying key variables to determine water quality along a reservoir’s longitudinal profile. Because of this importance, many works develop algorithms based on in situ reflectance for various reasons: to validate a previously published reflectance band-ratio algorithm [2], to use historical and unpublished data [3], to obtain algorithms in advance of the sensor’s activity [3,4], to develop cross-instrument algorithms to facilitate the spatial and temporal comparability of overlapping missions [5], and to achieve a wider range of data [6].
However, when the algorithms developed from in situ reflectances are applied to satellite imagery, their performance also depends on the errors in the water reflectance retrieval after the atmospheric correction. Estimating water quality variables from satellite images requires an accurate estimation of water-leaving radiance, but the radiation measured by the sensors has another important contributor, the atmosphere, consisting of atmospheric gases and aerosols, which account for at least 90% of the signal measured by the satellite sensor [
7,
8]. Therefore, the atmospheric correction (AC) procedure is an important step to subtract the atmospheric contribution (scattering effects of aerosols, water vapor, etc.) and sun glint (sunlight reflecting off the water surface at the same angle as the sensor) from the signal at the top of the atmosphere (TOA) [
9].
With regard to Case 2 waters, the algorithms designed to perform the AC of land surfaces are not directly applicable to an aquatic surface because it is darker, and the air-water interface is not Lambertian [
9]. Nevertheless, a significant correlation has been demonstrated between the reflectance obtained by the Sen2Cor processor, developed for the land surface study of Sentinel-2 (S2-MSI) imagery, and the in situ reflectance for hypertrophic waters [
10,
11]. In addition, Case 2 waters are much more complex than oceanic waters (Case 1 waters) because of their optically active constituents (OAC) as they contain a higher presence of phytoplankton, dissolved organic matter, and tripton, therefore requiring a specific AC. Each OAC has a very different effect on the reflective spectrum; the suspended matter increases the reflectance in the green, red, and NIR bands, while the CDOM (chromophoric dissolved organic matter) increases the absorption (reducing the reflectance) in the blue bands [
12]. Consequently, they strongly condition the water spectrum because the percentage contribution of water reflectance in TOA varies depending on the predominance of suspended sediment or CDOM in the water [
13]. Therefore, due to the higher complexity of inland waters in the NIR, the ACs designed for Case 1 waters are generally not applicable to Case 2 waters. This indicates that different ACs may be required for waters of different OAC compositions, depending on the part of the spectrum used to perform the atmospheric correction.
To respond to this challenge, from before the launch of the S2-MSI mission satellites up to the present day, different types of ACs have been developed for different Case 2 water types, even though the mission objective is the study of soil and vegetation. The great interest in S2-MSI stems from its proven usefulness for inland water study, thanks to the inclusion of new bands at the red edge (the boundary between the red and infrared spectral regions), their radiometric quality, and their high spatial resolution [14]. Many of the ACs for obtaining water reflectance from S2-MSI images have been developed by the European Space Agency (ESA) and are provided in the freely available SNAP (SeNtinel Application Platform) program toolkit. Initially, there was only the C2RCC (Case 2 Regional Coast Colour) processor, which was developed from the original Case 2 regional processor [15] and adapted to different multi- and hyperspectral sensors (e.g., S2-MSI and Sentinel-3-OLCI). Currently, a set of three different neural networks (NN) is available for S2-MSI imagery, trained with different databases to perform the AC on different water types and named here as C2-Nets: C2RCC, C2X (Case2eXtreme), and C2X-COMPLEX (C2XC). Another algorithm is Polymer; with its generic approach, it has been applied to many sensors, including S2-MSI and S3-OLCI. The Polymer approach is a physical model based on a spectral optimization method called spectral matching, which is aimed at recovering the radiation scattered and absorbed by water from the signal measured by satellite sensors in the visible spectrum [
16,
17]. One of the strengths of this algorithm is the ability to recover water radiation in the presence of sun glint, achieving a spatial coverage higher than other products. With Polymer and the C2-Nets, the best S2-MSI radiation validation results have been obtained, as published in previous studies for Case 2 waters, such as those carried out by Ansper and Alikas [
18], Pereira-Sandoval et al. [
12], Uudeberg et al. [
19], Warren et al. [
20], and Bui et al. [
21].
Once the AC has been applied and the water-leaving reflectance has been obtained, the next step is to obtain the retrieval algorithms that estimate the different bio-optical variables, either empirically or analytically. The objective is always to obtain a general algorithm, but, as in the AC step, the great differences among waters with respect to their OAC strongly condition the results. The problem with validating the AC algorithms lies in the need to measure the in situ reflectance coinciding with the satellite pass, with a cloud-free image and good wind conditions, and in all the costs involved in performing the sampling. Thus, an alternative is to develop the algorithms using in situ reflectance data from previous studies, even if they do not coincide with the satellite pass or predate its launch, and then validate them with different AC applications and with in situ water quality measurements. In this way, the database can be much larger, giving greater consistency and making the resulting water quality retrieval algorithms more robust.
In this way, the work published by Sòria-Perpinyà et al. [6] (hereafter, the previous study) was developed, in which five retrieval algorithms using in situ spectrometry were implemented to estimate five water quality variables, namely transparency (Secchi disk depth, SDD), total suspended solids (TSS), CDOM, chlorophyll concentration (Chl_a), and phycocyanin concentration (PC), for both S2-MSI and S3-OLCI. This provides us with accurate and robust algorithms developed with in situ reflectance and not specific to a given AC. However, the water quality algorithms obtained from in situ spectrometry need to be assessed using the water reflectance from different ACs, because applying them to reflectances obtained after an AC process may lead to unexpected results. This assessment is necessary because, in the case of water, validating the reflectance resulting from the atmospheric correction is very difficult (the error can be of the same order as the surface reflectance itself). That is why validating the final water quality products is the right approach, in particular for water. After all, the interest is not so much in the reflectance of the water itself but in its properties.
Therefore, this article is the second part of a previous study and completes the applicability step that was missing in other works, where only in situ reflectances were used to develop algorithms. To assess this applicability, a validation process has been conducted to test the best of those algorithms when they are applied to S2-MSI satellite imagery after the correction of atmospheric and sun specular effects.
4. Discussion
Merging the databases of two recent projects has provided us with a large amount of data to carry out the validation of Rrs and water quality algorithms. The database covers a wide gradient of limnological variables but is always within the wide range used to develop the algorithms. It is a database representative of the variability of the climatic and limnological conditions of the Mediterranean Basin, with a greater representation of semi-arid environments.
Regarding cross-correlations between the variables, similar to the data used to develop the water quality algorithms by in situ spectrometry, the OACs mainly affecting water transparency are TSS and phytoplankton pigments and as a consequence, have greater influence on the spectra of the studied water bodies, while CDOM has less influence on water transparency and the spectral features of the dataset. This higher or lower influence on the water spectrum facilitates or hinders its determination through remote sensing.
Rrs validation has been performed for four ACs: the three C2-Nets, which are based on NN, and Polymer, a physical model based on a spectral optimization method. The C2-Nets use the full top-of-atmosphere spectrum as the input, while Polymer recovers the radiation scattered and absorbed by water from the signal measured by satellite sensors from the blue to the near-infrared spectral range.
Considering the characteristics of the AC models tested with our data and the validation results, with the Polymer program we obtain the best validation results for the visible region, precisely the part of the spectrum used to apply the spectral matching method, and the worst in the NIR, in agreement with Pereira-Sandoval et al. [
12], Warren et al. [
20], and Caballero et al. [
33].
Regarding the C2-Nets, which use the full top-of-atmosphere spectrum as the input, both C2X and C2XC, but not C2RCC, obtain better validation results than Polymer in the Red-edge and NIR bands. The differences among the C2-Nets are consistent with the IOP ranges used in their training. With our data, C2XC, the C2-Net trained with intermediate IOP ranges, achieves the best validation results for all bands except R740 and R865, whose best results are obtained with C2X. C2X improves its results for longer wavelength bands because its trained ranges of scattering coefficients of typical sediments and white particles (calcareous sediments) include extreme cases, better reproducing the increased scattering in the NIR and giving its best results for turbid waters, in agreement with Pereira-Sandoval et al. [
12], Pahlevan et al. [
34], and Tavares et al. [
35].
Another important aspect to take into account is the restrictiveness of the respective quality flags. By this criterion, Polymer obtained the highest number of match-ups of all the ACs, in agreement with Steinmetz et al. [
17], Caballero et al. [
33], Warren et al. [
20], Pereira-Sandoval et al. [
12], and Bui et al. [
21]. Regarding the C2-Nets, after applying quality flags and match-up exercises, the fewest data were removed with C2RCC because it is trained for a lower IOP range and coastal waters and, as can be seen in Figure 2, most of the samples have low concentrations of IOPs. However, its validation results were the worst for all bands except R560 and R665, for which C2X had the worst results. Meanwhile, the most restrictive AC, and therefore the one with the fewest data for the validation process, was C2X because it is trained for a high IOP range, and our dataset has few samples with high IOP concentrations. Therefore, for our dataset, the reduced number of valid match-ups and the lower consistency in accuracy along the spectrum observed with this processor made it the most uncertain C2-Net, in agreement with Soriano-González et al. [
32]. These results indicate that the C2-Nets development is improving and adapting their training range for inland waters, although the sun glint handicap has not yet been solved.
The results illustrate well that the ACs developed from NN are more limited to the water conditions of their training database. Polymer may also have limitations due to its internal marine model, which depends on chlorophyll and sediments, although its high performance here shows its suitability for our water types. A drawback of Polymer is that images are difficult to process for a non-professional user, since the program does not have a user-friendly interface.
Regarding the validation of the water quality algorithms, the best results were obtained with the ACs with the best Rrs validation results, C2XC and Polymer. The best-validated algorithm for SDD estimation uses the Blue/Green ratio (R492/R560) applied to Polymer Rrs. It is the algorithm with the shortest wavelength bands of all tested algorithms, and therefore the bands with the fewest errors in the validation of the Rrs. The Blue/Green ratio gave the second-best result in the previous study, with results very similar to ours: an R² of 0.65, an RMSE of 1.21 m, an RRMSE of 51%, and a bias of 0.47 m, obtained for a 0.14–9.55 m range. In our validation results, the RMSE is slightly higher (1.59 m), although the RRMSE is lower (47%) for the same range of data. The results also agree with the RMSE of 1.4 m obtained with 82 samples in the study carried out by Delegido et al. [36], using the same AC and band relation. The good results also obtained with C2XC (Table A2) indicate that the algorithm obtained from 266 in situ reflectances is applicable and reliable for ACs that achieve an accurate estimate of water-leaving radiance.
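The error statistics quoted throughout this discussion can be illustrated with a minimal sketch. The exact definitions used in the study are not restated in this section, so the formulas below are assumptions: R² as the squared Pearson correlation, bias as the mean signed difference, and RRMSE as the RMSE divided by the mean observed value. The function name `validation_metrics` is hypothetical.

```python
import numpy as np


def validation_metrics(observed, estimated):
    """Return R2, RMSE, RRMSE (%) and bias for paired match-up values."""
    obs = np.asarray(observed, dtype=float)
    est = np.asarray(estimated, dtype=float)
    diff = est - obs

    rmse = float(np.sqrt(np.mean(diff ** 2)))
    # Assumption: RRMSE is the RMSE normalised by the mean observed value.
    rrmse = 100.0 * rmse / float(np.mean(obs))
    # Assumption: bias is the mean signed difference (estimated - observed).
    bias = float(np.mean(diff))
    # Assumption: R2 is the squared Pearson correlation coefficient.
    r2 = float(np.corrcoef(obs, est)[0, 1] ** 2)

    return {"r2": r2, "rmse": rmse, "rrmse": rrmse, "bias": bias}


if __name__ == "__main__":
    # Made-up values, for illustration only.
    print(validation_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5]))
```

Under this assumed normalisation, two datasets with the same RMSE give different RRMSE values when their mean observed values differ, which is one way an RMSE can decrease while the RRMSE increases.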
The best-validated algorithm for CDOM estimation, based on the Red-edge1/Blue ratio (R704/R492) and applied to C2XC Rrs, was the algorithm with the largest errors in the previous study. However, the validation results were very similar among the different algorithms in the previous study, which obtained for the R704/R492 ratio an R² of 0.5, an RMSE of 1.03 µg/L QSE, an RRMSE of 56%, and a bias of 0.26 µg/L QSE, for a data range between 0.3 and 5.3 µg/L QSE. The values improved in our validation for R² (0.8) and RMSE (0.42 µg/L QSE), while the bias (0.32 µg/L QSE) is similar and the RRMSE (87%) is much higher because the range of data is much lower, 0.03–1.75 µg/L QSE. These results are in agreement with the band relations obtained in the study carried out by Ruescas et al. [
37], from simulated water-leaving radiance by Hydrolight and with the potential correlation between the Rrs and CDOM values obtained in Kutser et al. [
38], Kutser [
39], Slonecker et al. [
40], and Chen et al. [
41]. The poorer results compared with the other variables could be due to the lesser influence of CDOM on transparency in our database and, consequently, on the reflectance spectrum. Nevertheless, although the RMSE seems low, the algorithm should be tested with a larger CDOM range.
For TSS, in situ reflectance algorithms for two data ranges were validated, below and above 20 mg/L, and in both groups, the best algorithm was obtained using the R704 band and Polymer Rrs. Both algorithms are obtained with a linear calibration, but the slopes and offsets are very different. Of all the bands used in the tested algorithms, R704 is not simply the band with the lowest errors in the validation of the Rrs, but it is the band with the best results in the previous study. The best validation results for values below 20 mg/L in the previous study were also obtained with the R704 band, with an R² of 0.85, an RMSE of 1.79 mg/L, an RRMSE of 43%, and a bias of 0.39 mg/L, very similar values to those obtained in this work, where all the validation results improve except the bias (0.74 mg/L). On the other hand, for values above 20 mg/L, with only five data points available for validation in the previous study, the R704 band did not reach the best results. With these five data points, only an R² of 0.02, an RMSE of 85.97 mg/L, an RRMSE of 190%, and a bias of 20.82 mg/L were obtained. The results greatly improved in this work, using 13 data points in the validation process. The results are in agreement with the study of Soomets et al. [42], using TOA, C2RCC, and C2X ACs according to the optical water type.
The other variable with two algorithms was Chl_a, but this time, the best algorithms did not coincide for concentrations above and below 5 mg/m³. The threshold value was 5 mg/m³ because waters with high Chl_a (above 3–5 mg/m³ [43]) produce discernible spectral features in the red and NIR regions of the reflectance spectrum [
44]. The best algorithm for values below 5 mg/m³ was the same as that obtained in the previous study: the log10 of the ratio between the maximum of the R443 and R492 bands as the numerator and R560 as the denominator, using C2XC Rrs, the bands with the fewest errors in the validation of the Rrs. The validation results in the previous study were an R² of 0.55, an RMSE of 0.94 mg/m³, an RRMSE of 43%, and a bias of 0.09 mg/m³, similar values to those obtained in this work, where the RMSE improves to 0.93 mg/m³ and the bias to 0.01 mg/m³. The second-best result was obtained with Polymer Rrs, the AC with the better validation results for the bands used, obtaining an RMSE of 1.04 mg/m³, only 0.1 mg/m³ higher than the previous study validation results but with a very high RRMSE of 64%. The RRMSE is higher as the number of data below 2 mg/m³ increases, with 72% of the data points in the Polymer validation compared with 66% in the C2XC validation and 56% in the previous study. These results are in agreement with the study carried out by Pereira-Sandoval et al. [
12], obtaining an absolute error of 0.89 mg/m³. The results corroborate the applicability of the algorithm developed by O’Reilly and Werdell [5] for hyperspectral sensors using in situ radiometry to multispectral sensors using atmospherically corrected MSI Rrs.
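As an illustration of the predictor described above for Chl_a below 5 mg/m³, the following minimal sketch computes the log10 of the maximum of R443 and R492 divided by R560. The function name `blue_green_log_ratio` is hypothetical, and the calibration that converts this predictor into a concentration is not given in this section, so it is deliberately left out.

```python
import numpy as np


def blue_green_log_ratio(r443, r492, r560):
    """Element-wise log10(max(R443, R492) / R560) from Rrs band values."""
    r443, r492, r560 = (
        np.asarray(band, dtype=float) for band in (r443, r492, r560)
    )
    # The numerator is whichever blue band is larger at each sample.
    numerator = np.maximum(r443, r492)
    return np.log10(numerator / r560)


if __name__ == "__main__":
    # Made-up reflectances, for illustration only.
    print(blue_green_log_ratio([0.004, 0.010], [0.002, 0.020], [0.040, 0.020]))
```

The inputs can be scalars, per-sample arrays, or full image bands of matching shape.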
The best algorithm for values above 5 mg/m³ was the Green/Red ratio (R560/R665) applied to Polymer Rrs, although in the previous study, the validation results of this algorithm using in situ Rrs were poor, with an R² of 0, an RMSE of 144 mg/m³, an RRMSE of 156%, and a bias of 43 mg/m³. However, the algorithm generated with 144 data points in the previous study, applied to the 42 data points used in the Polymer validation, provided the best validation results: an R² of 0.9, an RMSE of 28.1 mg/m³, an RRMSE of 50%, and a bias of 1.35 mg/m³. This result may be due to two factors: (1) the data range and (2) the bands used. Regarding the data range, only 4 of the 42 data points used were higher than 100 mg/m³, and in the previous study, the RMSE for the R560/R665 ratio would be 34.72 mg/m³ using only values lower than 100 mg/m³. Regarding the bands used, the R560/R665 ratio was the one that used the shortest wavelength bands of all the tested algorithms using concentrations higher than 5 mg/m³, the wavelengths for which Polymer gives the best results in the validation of the Rrs. The band relation corresponds to the ratio between reflectances at the minimum absorption in the green region between 550 and 555 nm and reflectances at the second absorption peak in the red region between 670 and 675 nm, used by Ha et al. [
45] in S2-MSI images using the empirical line method as an AC and obtaining a standard error of 0.14 mg/m³ for a data range of 1.58–6.00 mg/m³. The same equation was used by Pereira et al. [46], obtaining an RMSE of 21.9 for a larger data range of 4.14–76.44 mg/m³, with the RMSE increasing with the data range. It is noted that for C2X and C2XC, using only 13 and 17 data points, respectively, the best algorithm obtained was the three-band model of Dall’Olmo et al. [
47], using the R740, R704, and R665 bands, for which C2X and C2XC were the ACs with the best validation results. A three-band model obtained the fourth-best validation result in the previous study, with an R² of 0.85, an RMSE of 41.8 mg/m³, an RRMSE of 52%, and a bias of 4.77 mg/m³, while for the C2XC Rrs validation results, the R² was 0.9, the RMSE was 50.6 mg/m³, the RRMSE was 57%, and the bias was 3.22 mg/m³. This RMSE is similar to that obtained by Cairo et al. [
48] of 56.9 mg/m³ in their study using the three-band model, the 6S (Second Simulation of the Satellite Signal in the Solar Spectrum) model for AC, and a Chl_a data range up to 600 mg/m³, and is slightly higher than the RMSE of 33.5 mg/m³ obtained by Ogashawara et al. [49] for a data range between 2.3 and 306 mg/m³. It may be that an improved recovery of the Red-edge bands with Polymer will make the three-band model the best Chl_a retrieval algorithm.
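The two band relations compared above for Chl_a above 5 mg/m³ can be sketched as follows. The three-band index is written in its commonly cited form, (1/R665 − 1/R704) × R740; the assignment of the S2-MSI bands to the three terms is an assumption here, since this section only names the bands. The function names are hypothetical, and the calibration coefficients that relate each index to Chl_a are not given in this section and are therefore omitted.

```python
import numpy as np


def green_red_ratio(r560, r665):
    """Green/Red band ratio R560/R665."""
    return np.asarray(r560, dtype=float) / np.asarray(r665, dtype=float)


def three_band_index(r665, r704, r740):
    """Three-band index, assumed form: (1/R665 - 1/R704) * R740."""
    r665, r704, r740 = (
        np.asarray(band, dtype=float) for band in (r665, r704, r740)
    )
    # Difference of reciprocal reflectances, scaled by the longer-wavelength band.
    return (1.0 / r665 - 1.0 / r704) * r740


if __name__ == "__main__":
    # Made-up reflectances, for illustration only.
    print(green_red_ratio(0.02, 0.01))
    print(three_band_index(0.01, 0.02, 0.01))
```

Because the three-band index relies on the Red-edge bands, its output is only as reliable as the AC retrieval at R704 and R740.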
The best-validated algorithm for PC estimation was based on the Red-edge1/Red ratio (R704/R665) and applied to C2XC Rrs. The Red-edge1/Red ratio was the algorithm with the shortest wavelength bands of all tested algorithms, and therefore the bands with the fewest errors in the validation of the Rrs, and it was also the best result in the previous study. The validation results in the previous study were an R² of 0.8, an RMSE of 43.7 mg/m³, an RRMSE of 55%, and a bias of 14.64 mg/m³ for a data range between 0.7 and 1040 mg/m³. The results improved in this work, with an R² of 0.9, an RMSE of 38.9 mg/m³, an RRMSE of 39%, and a bias of 5.45 mg/m³ for a data range between 0 and 751.1 mg/m³. This demonstrates the applicability in S2-MSI images of this band ratio, which was used in the previous study and in the drone and aircraft studies of Kwon et al. [
50] and Beck et al. [51]. The calculated RMSE is similar to the standard error of 45.5 mg/m³ obtained by Simis et al. [2] for a PC higher than 50 mg/m³, using samples from the same climatic region.
The application of water quality algorithms developed from in situ reflectances, using reflectances obtained after an AC process, may lead to different errors than those calculated in the development of the algorithms. Algorithms with the shortest wavelength bands had better results if their validation results in the previous study were not very high. Only three algorithms were coincident and showed similar error statistics. However, the results demonstrate the applicability of the algorithms developed from in situ reflectances to satellite S2-MSI imagery after the correction of atmospheric and sun specular effects, a step that was missing in other works.
Indeed, it is always difficult to obtain general algorithms that cover a wide range of data. One solution to obtain a good retrieval algorithm for each bio-optical variable is to obtain specific algorithms for each water type according to a pre-established classification based on their optical spectrum (optical water types), as has been done in several studies (e.g., Moore et al. [
52]; Neil et al. [
53]; Uudeberg et al. [
19]; Soomets et al. [
42]). If the same problem exists for AC, the results would probably be more accurate if the optical water type differentiation was defined before applying the AC.