A New Algorithm for the Retrieval of Sun Induced Chlorophyll Fluorescence of Water Bodies Exploiting the Detailed Spectral Shape of Water-Leaving Radiance

: Sun induced chlorophyll ﬂuorescence (SICF) emitted by phytoplankton provides consider-able insights into the vital role of the carbon productivity of the earth’s aquatic ecosystems. However, the SICF signal leaving a water body is highly affected by the high spectral variability of its optically active constituents. To disentangle the SICF emission from the water-leaving radiance, a new high spectral resolution retrieval algorithm is presented, which signiﬁcantly improves the ﬂuorescence line height (FLH) method commonly used so far. The proposed algorithm retrieves the reﬂectance without SICF contribution by the extrapolation of the reﬂectance from the adjacent regions. Then, the SICF emission curve is obtained as the difference of the reﬂectance with SICF, the one actually obtained by any remote sensor (apparent reﬂectance), and the reﬂectance without SICF, the one estimated by the algorithm (true reﬂectance). The algorithm ﬁrst normalizes the reﬂectance spectrum at 780 nm, following the similarity index approximation, to minimize the variability due to other optically active constituents different from chlorophyll. Then, the true reﬂectance is estimated empirically from the normalized reﬂectance at three wavelengths using a machine learning regression algorithm (MLRA) and a cubic spline ﬁtting adjustment. Two large reﬂectance databases, representing a wide range of coastal and ocean water components and scattering conditions, were independently simulated with the radiative transfer model HydroLight and used for training and validation of the MLRA ﬁtting strategy. The best results for the high spectral resolution SICF retrieval were obtained using support vector regression, with relative errors lower than 2% for the SICF peak value in 81% of the samples. This represents a signiﬁcant improvement with respect to the classic FLH algorithm, applied for OLCI bands, for which the relative errors were higher than 40% in 59% of the samples.


Introduction
Phytoplankton is the base of the trophic pyramid in aquatic ecosystems, using solar energy for energy fixation in carbon compounds, and playing a key role in the earth's carbon cycle. The monitoring of chlorophyll-a (Chla) fluorescence emitted by phytoplankton is one of the most used methods to understand the state of aquatic ecosystems [1][2][3]. When the Chla molecules present in phytoplankton are excited by absorbed light, the excitation energy can either (1) be used in the photosynthetic chain, (2) be dissipated as heat, or (3) be emitted as Chla fluorescence in the 650-800 nm region and measurable as a contribution to the peak within the 660-750 nm region on the water-leaving radiance or reflectance of water bodies [4]. As shown by Gower and Maritorena [5,6], the sun induced chlorophyll fluorescence (SICF) emission curve in natural waters can be approximated by a Gaussian function with a maximum at ∼685 nm and a full width at half maximum (FWHM) of 25 nm.
In addition to SICF, the fluorescence quantum yield (φ SICF) is defined as the ratio of the number of photons emitted to the number of photons absorbed by the Chla.
The measurement of SICF, and its quantum yield of photosynthetic organisms in natural waters, adds information that can be interpreted in terms of phytoplankton physiology [1,7,8] and species composition [9]. The SICF signal has been used to estimate the phytoplankton's biomass [1,[10][11][12], photosynthetic activity, and primary productivity [13,14]. The use of optical remote sensing data in the visible-near-infra-red (VNIR) spectral range (400-800 nm) for the estimation of SICF and φ has demonstrated its potential to describe the spatial and temporal variability of the physiological status of marine phytoplankton, linked to nutrient limitation [10,15].
The fluorescence line height (FLH) algorithm, developed to retrieve SICF [16], estimates the height of the red fluorescence peak by using the water-leaving radiance (L w ) or remote-sensing reflectance (Rrs, defined as the ratio of L w and E d , the sun irradiance arriving at the water surface, called here apparent reflectance) at three wavelengths, one which is centered on the SICF peak and two others that are used to define a baseline below the SICF peak. The FLH algorithm is routinely implemented for global ocean image processing using MODIS and MERIS/OLCI data [17][18][19]. The accuracy in the estimation of the actual fluorescence peak height (centered around 685 nm) depends on the position of the three bands used in the FLH [18,20] and the signal-to-noise of the sensor used. It has been shown that the SICF signal retrieved by MODIS (peak band at 677 nm, with 11 nm of FWHM) responds to only 57% of SICF signal, while MERIS/OLCI (peak band at 681.25 nm, with 7.5 nm of FWHM) reaches 78% of the SICF signal [21].
The main difficulty in retrieving SICF is the variability in the concentration and optical properties of the optically active constituents (OACs) of water bodies, producing a large variability in the shape of Rrs, especially in the visible wavelengths. The main OACs are (1) pure water itself; (2) phytoplankton, containing a large number of different pigments, the most abundant being Chla, which is present in all phytoplankton cells; (3) colored dissolved organic matter (CDOM); and (4) non-algal particles (NAP) composed of organic particles (e.g., bacteria, protists, zooplankton, detrital organic matter) and suspended inorganic particles. Hence, the detection of SICF by remote sensors is mainly affected, within the emission region, by the absorption properties of water and phytoplankton, plus the particle scattering (from both phytoplankton cells and NAP). The combination of these three effects produces a reflectance peak in the SICF region that overlaps with the actual emission peak at low Chla concentration ([Chla]) and could become dominant with increasing [Chla] and/or NAP concentration [22][23][24][25].
Further difficulties to obtain SICF accurately from the reflected radiance signal are the strong effects of the atmospheric compounds, selectively attenuating the signal. The earth's atmosphere is composed mainly of molecular gases (i.e., O 2 , O 3 , CO 2 , water vapor) and particles suspended in the gases (aerosols). The scattering and absorption produced by these atmospheric compounds strongly affects the signal detected by any remote sensor, especially for signals leaving from water surfaces, where the contribution of L w in general is much lower than the total radiance arriving at any remote sensor. In the spectral SICF range, the signal is strongly affected by the water vapor and O 2 absorptions. In addition to the standard atmospheric effects to be compensated for water targets, the case of fluorescence emission requires that the in-filling of the O2-B band due to fluorescence emission should be taken into account as part of the atmospheric correction process, but this is not analyzed in this paper.
As introduced above, the performance of the FLH algorithm applied on Rrs is affected by the overlap between SICF and the reflectance peak in the same spectral region in water bodies with high [Chla] [4], producing a "shoulder" in the 660-750 nm region. The shape of this reflectance "shoulder" is due to the combined absorption features of phytoplankton and water, while its magnitude is due to the particle scattering. The combined result of these effects determines that, in productive waters, the linear baseline assumption no longer holds, since the fluorescence peak adds to a curved reflectance beneath. Therefore, the FLH calculated with the simple baseline approach leads to inaccurate SICF retrievals [26].
The use of a high spectral resolution sensor providing narrower bands than MODIS or MERIS/OLCI to detect SICF in water allows one to detect a more accurate contribution of SICF from a higher detailed shape of the water-leaving radiance, avoiding the atmospheric O 2 -B (687 nm) absorption band. The ongoing satellite missions that include sensors with higher spectral and radiometric resolutions (i.e., FLEX [27], CHIME [28]), increase the possibilities of retrieving a more accurate SICF peak height and width and, therefore, a more accurate total emission. Hence, the aim of this work is to define an algorithm able to separate the SICF emission signal from the peak produced by the combined effects of water and phytoplankton absorption and scattering by exploiting the detailed spectral shape of water-leaving radiance and considering the spectral variability of water.
The measurement of SICF in natural waters is, however, challenging. There are no standard instruments or protocols and very few actual measurements. The fluorescence measurements routinely done in the ocean, obtained with artificial light sources, can be considered as proxies of the SICF, but not its actual measurement in W m −2 sr −1 . Since in situ SICF data were not available, the methodology proposed in this manuscript was based entirely on simulated data. To ensure the maximum representativeness of the results, a very comprehensive radiative transfer model was used, and the ranges of OAC and the inherent optical properties were chosen to represent the widest variability of ocean and coastal waters.

Background and Definitions
Throughout this work, two different definitions of the remote sensing reflectance are used. The Rrs widely used in ocean optics is called here "apparent" reflectance, because the water-leaving radiance includes the emitted fluorescence contribution (SICF). Further, the "true" reflectance is defined as the water-leaving remote sensing reflectance without the SICF contribution (Rrs −SICF ) (Equations (1) and (2)): where E d (λ) is the downwelling plane irradiance arriving at the water surface (W·m −2 ·nm −1 ). The issue with the definitions of "apparent" reflectance and "true" reflectance also comes from the consideration of the bidirectional reflectance distribution function (BRDF), and from adjacency effects, which are not negligible for coastal areas and inland waters. While a full consideration of these effects is outside the scope of this paper, they must be taken into account during the atmospheric correction procedure. In water bodies, the upwelling radiance is the result of (1) scattering and absorption from particles and dissolved substances, and (2) emission contribution from SICF. Therefore, Rrs corresponds to Rrs −SICF plus the SICF contribution, with Rrs being the reflectance observed by the sensor and Rrs −SICF the unknown true reflectance component. It is important to recall that outside the fluorescence emission region (i.e., <660 nm and >750 nm) both reflectances are identical ( Figure 1).
To improve the SICF retrieval method for water bodies, we propose to further consider the spectral variability of water bodies due to the OAC of water, based on their spectral modelling. Moreover, the advantage is that the SICF spectral range is only slightly affected by the OAC. While the variability in the shape of Rrs in the visible wavelengths is high, the shape of the Rrs −SICF spectrum in the near-infrared (NIR) spectral range (700-900 nm) is largely determined by pure water absorption and thus is almost invariant [29].
Next, the modeling of the inherent optical properties (absorption and scattering) performed in this work is described.

Absorption
The total absorption coefficient (a(λ)) can be expressed as the sum of the absorption coefficients of each OAC: The absorption by pure water a w (λ) corresponds to the absorption described by Pope and Fry [30].
The absorption by CDOM a CDOM (λ) is modeled as [31] The absorption by phytoplankton a phy (λ) is given by [32] as follows: where [Chla] is supplied in mg·m −3 , and a * c (λ) is the chlorophyll-specific absorption coefficient in m 2 ·mg −1 representative of marine phytoplankton [33,34].
In the case of the absorption by NAP, a N AP (λ), the next expression is used [35]: where [NAP] is supplied in g·m −3 , and a * (λ) is the mineral particle-specific absorption coefficient in m 2 ·g −1 .

Scattering
The angular distribution of light scattered by a particle in a direction ψ at a wavelength λ is defined as the volume scattering function (β(ψ, λ)). The total spectral scattering coefficient (b(λ)) is the total magnitude of the scattered light, which corresponds to the integral of β(ψ, λ) over all angles 4π. The phase function ( β(ψ, λ)) corresponds to β(ψ, λ) normalized to b(λ), providing the shape of the β(ψ, λ) regardless of the intensity of the scattered light. The total backscattering coefficient (b b (λ)) is the total light scattered in the backward direction, and the backscattered fraction with respect to total scattering ( b(λ)) is defined as the ratio between b b (λ) and b(λ) [32].
Regarding scattering modeling, it only considers the scattering coming from phytoplankton and NAP. The contribution of CDOM in the total scattering (b(λ)) is considered to be negligible [31].
As the scattering of phytoplankton and NAP depends on the size, structure, distribution, and composition of the particles, it is not easy to model the integral effects. Hence, a realistic approximation given by the spectral beam attenuation coefficient of particles c p (λ) is used to model b(λ) for phytoplankton and NAP [36]: where λ o is a nominal scaling wavelength, and n can have values from 0 to 1, depending on the size distribution (large particles have a small n; small particles have a large n). Therefore, b(λ) is calculated as the difference between c p (λ) and a(λ) according to [37]

Similarity Index
Rrs is related to the total spectral absorption coefficient (a(λ)) and total backscattering coefficient (b b (λ)) [29]: where κ is a coefficient that depends on the illumination conditions. In the NIR, it is assumed that the total absorption coefficient (a(λ)) is dominated by a w (λ), since a CDOM (λ) is negligible, and a phy (λ) and a N AP (λ) are very low in this region. It is also assumed that b b (λ) is almost spectrally flat for wavelengths in the NIR (λ N IR ), and therefore Equation (9) can be redefined as [29,38] Rrs The normalized Rrs (Rrsn), derived from the similarity index concept applied to the NIR region, is defined by dividing the reflectance by its value at 780 nm [29], a wavelength that is only affected by pure water absorption. The concept of similarity index is therefore useful since it allows Rrsn in the red-NIR region to be obtained, where other OACs except for pure water can be neglected. Therefore, it provides a first guess of the shape of the Rrs −SICF in the region where the SICF emission peak occurs. The calculation of Rrsn is the first concept of the strategy underlying the SICF retrieval algorithm presented in this paper.
In the entire visible region, however, the normalized remote sensing reflectance without SICF contribution (Rrsn −SICF (λ)) deviates from the ideal similarity index shape due to the non-negligible absorption and scattering of the different OACs. If a comprehensive simulation of the variability of OACs and illumination and surface conditions is performed in water bodies, the deviations of Rrs −SICF (λ) within the SICF region could be empirically estimated from a database of Rrs(λ) outside that region. This is the second concept underlying the algorithm strategy presented here.

Materials and Methods
To build the simulated databases, the radiative transfer model (RTM) HydroLight (HL) [32] was used. HL is an RTM that computes radiance distributions and derived quantities (irradiances, reflectances, fluorescence, etc.) for water bodies. The spectral radiance distribution is computed as a function of depth, direction, and wavelength within the water. The upwelling radiance just above the sea surface includes both the waterleaving radiance (L w (λ)) and that part of the incident direct and diffuse sky radiance that is reflected upward by the wind-blown sea surface [32]. HL employs mathematically sophisticated invariant embedding techniques to solve the radiative transfer equation [32]. It includes the effects of SICF assuming that Chla always fluoresces in the band centered at 685 nm, regardless of whether it is excited by light at near ultraviolet, blue, green, or even red wavelengths [32]. To calculate the amount of emitted SICF, HL uses the computed scalar irradiance, the absorption by Chla, and various assumptions about the fluorescence efficiency and the wavelength redistribution function approximated by a Gaussian, with 685 nm the mean wavelength and 10.6 nm the standard deviation of the Gaussian, which corresponds to a value of 25 nm for the FWHM of the emission band [32,35]. To train the retrieval algorithm for true reflectance, which is the base for the SICF retrieval described in this paper, a large database of "true" remote sensing reflectances was constructed with HL. This database consisted of 7200 simulations of Rrs −SICF with 1 nm of spectral sampling, obtained from all combinations of the input variables given in Table 1, comprising high variability in water components (Chla and CDOM), as well as including the variability of the scattering coefficient due to the particles' shapes and sizes (by varying the parameters b p (λ 0 ) and m in Equation (11) [35]) and illumination conditions (solar zenith angle (SZA)).
The backscattered fraction with respect to total scattering from particles ( b p (λ)) is defined as follows [35]: where λ 0 = 550 nm, and b p (λ 0 ) and m are values supplied by the user. Accordingly, HL generates a Fournier-Forand phase function using b p (λ) [39]. Hereby, the simulated phase functions are a combination of "small" and "large" particle phase functions, depending on the b p (λ 0 ) and m values. This database is intended to be a representation of the typical spectral behavior of Rrs −SICF in coastal and oceanic waters. Due to the effect of high scattering by NAP over the SICF signal [23,40], NAP was not included in this simulation. The simulated database spectra are shown in Figure 2 (left) together with the calculated Rrsn −SICF for each simulation Figure 2 (right). A high variability of Rrsn −SICF between 640-720 nm and a very low variability for wavelengths higher than 720 nm is observed, in accordance with the similarity index concept. The 7200 remote sensing reflectance without SICF contribution, Rrsn −SICF (left), and remote sensing reflectance without SICF contribution normalized at 780 nm, Rrsn −SICF (right), simulated spectra using the settings described in Table 1. It is noted that the spectral variability is very low in the NIR spectral range (>720 nm).

Validation Database of Remote Sensing Reflectance and Normalized Remote Sensing
Reflectance with and without SICF Contribution (Rrs, Rrsn, Rrs −SICF , Rrsn −SICF ) To validate the SICF retrieval algorithm, a database of 400 random spectra was simulated, here called the validation database (VDB). The VDB was simulated with the HL chlorophyll fluorescence option enabled (considered here as "observed data") to simulate water-leaving remote sensing reflectance (Rrs) and disabled (considered here as "reference data") to simulate water-leaving remote sensing reflectance without SICF contribution (Rrs −SICF ) data. The VDB includes the simulation of NAP according to the values in Table 2. Unlike the training database, the b (λ) modeling for the simulation of the VDB by HL has been performed by selecting a discretized phase function available in the HL library. These discretized phase functions used by HL are the quad-averaged and Fourierdecomposed phase functions computed as described in [32]. This b (λ) modeling is made for both Chla and NAP. Several mass-specific absorption spectra for mineral particles are included in the HL library and are available for brown earth, calcareous sand, yellow clay, red clay [44], and Bukata mix [45]. The variability of SICF is simulated, variating the fluorescence quantum yield (φSICF) parameter in HL between 0.001 and 0.02 according to [25,46]. The considered variable parameters in VDB are described in Table 2.
In Figure 3, the variability of the SICF curve, obtained in the simulations of the VDB, is shown.

Regression Methods
Machine learning regression algorithm (MLRA) methods were used to estimate Rrsn −SICF (λ). MLRA methods have become one of the key tools used in many applications, such as signal and image processing and computer science. They are based on the concept that an algorithm can be trained to learn from data without being explicitly programmed to perform specific tasks. MLRA methods have been used in many remote sensing studies as powerful tools for the inversion of RTMs [49,50]. MLRA methods are robust, and in most cases they are very fast to apply once trained and are able to manage the nonlinear relationship between the output/input data. However, the large majority of methods do not have multi-output capability, and superior accuracies are achieved when MLRA methods are trained per variable, as then the algorithm is more optimized [51,52].

Algorithm Development
In this work, the retrieval of SICF was based on the modeling of Rrsn −SICF (λ) (normalized remote-sensing reflectance without SICF contribution), which is key in the modeling of the major unknown Rrs −SICF (λ), to resolve the SICF contribution from the water-leaving radiance according the following steps: Assuming that after atmospheric correction of remotely sensed images (or from in situ measurements), Rrs(λ) and E d (λ) can be obtained with an acceptable error, the remaining variable to be retrieved is Rrs −SICF (λ), and this is the objective of the algorithm described in this paper.
The normalization of Rrs at 780 nm (Rrsn(λ)) in the NIR region reduces the spectral variability in the red-NIR spectral range and thus facilitates finding a better mathematical adjustment for the estimation of Rrsn −SICF (λ), the major unknown. The proposed methodology to find such adjustment consists of the following steps: (1) analyzing normalized remote-sensing reflectance at 780 nm without SICF contribution (Rrsn −SICF (λ)) behavior based on the representative database of Rrs −SICF (λ), simulated in Section 3.1; (2) obtaining an algorithm from the analysis done in the previous step, allowing the estimation of Rrsn −SICF (λ) within the SICF spectral range from Rrsn(λ), assuming that Rrsn −SICF (λ) within the SICF spectral range is unknown, and Rrsn(λ) and Rrsn −SICF (λ) are equal outside of the SICF range; (3) training different MLRA methods to estimate Rrsn −SICF (λ) using the large database simulated in step 1; and (4) using the simulated random database containing Rrsn −SICF (λ) and Rrsn(λ) described in Section 3.2 to validate the obtained algorithm.
Estimation of Normalized Remote-Sensing Reflectance without SICF Contribution (Rrsn −SICF (λ)) from Normalized Remote-Sensing Reflectance (Rrsn) It is assumed that Rrsn(λ) and Rrsn −SICF (λ) have the same spectral shape in the outer ranges 640-650 nm and 720-750 nm due to the fact that these ranges are, respectively, not or minimally affected by the SICF contribution ( Figure 3). Hence, the estimation of Rrsn −SICF (λ) from Rrsn is based on this assumption of shape stability in the outer SICF regions [32]. The ranges outside of the SICF signal (from 640 to 650 nm and from 720 to 750 nm) are called "outside range 1" and "outside range 2", respectively, and the range inside the SICF signal (from 661 to 710 nm) is hereafter called the "inside range". The "outside range 2" was defined to 750 nm to avoid the O 2 -A absorption band (760 nm) (Figure 4). . Graphic explanation of the Rrsn −SICF (λ) retrieval concept. The green spectrum corresponds to the "observed data" (from a radiometer or remote sensor) Rrsn(λ). The black diamonds indicate the three anchor points estimated by using MLRA methods. The blue spectrum is the Rrsn −SICF (λ) "reference spectra" from the simulated training database and the orange spectrum is the Rrsn −SICF (λ) "estimated spectra" calculating the cubic spline fitting using the bands in the "outside ranges" (green diamonds) and the three anchor points. It is observed that Rrsn −SICF (λ) has a "shoulder" produced by the absorption/scattering effects from OACs.
Estimation of Rrsn −SICF (λ) from Rrsn(λ) was performed using MLRA methods taking all the bands in the "outside ranges" as inputs to calculate the reflectance in the inside region. Due to the multi-output limitations in the MLRA methods, only three anchor points located within the "inside range" were defined for the estimation. These three "inside range" anchor points of Rrsn −SICF (λ) were chosen evenly spaced within the SICF emission region at 670, 685, and 700 nm (Figure 4).
Once these anchor points were estimated, the Rrsn −SICF (λ) spectra in the range between 640 and 750 nm were re-constructed by a cubic spline fitting using the outside ranges, as shown in the Figure 4.
The steps followed to obtain SICF from Rrsn −SICF (λ) are summarized in Figure 5.

Fitting of Normalized Remote-Sensing Reflectance without SICF Contribution (Rrsn −SICF (λ))
The performance of the cubic spline fitting method was assessed for the 7200 Rrsn −SICF (λ) spectra of the calibration database in the spectral range between 640 and 750 nm. Each of the Rrsn −SICF (λ) spectra was fitted using a cubic spline function, and then this fitting was compared to the simulated Rrsn −SICF (λ) for all the bands in the spectral range between 640 and 750 nm. The results are shown in Figure 6. Normalized remote-sensing reflectance without SICF contribution (Rrsn −SICF (λ)) vs. adjusted normalized remote-sensing reflectance without SICF contribution in the spectral range between 640 and 750 nm for all the wavelengths in this range.
It could be observed that despite the high variability, the true normalized remote sensing reflectance spectra of the database in the mentioned range could be modelled by using a cubic spline fitting (Figure 6b). This confirmed the cubic spline fitting method based on the MLRA-derived anchor points as adequate for the purposes of the algorithm.

Validation of the Normalized Remote-Sensing Reflectance without SICF Contribution (Rrsn −SICF (λ)) Estimation
The "observed data" from the VDB was used as input to test the performance of different MLRA methods. Once the three anchor points into the "inside range" were estimated by each MLRA method, the cubic spline fitting was performed to estimate the Rrsn −SICF (λ). Next, the estimated Rrsn −SICF (λ) was compared with the "reference data" from VDB to evaluate each MLRA method.
The relative error ((reference value-estimated value)/reference value) for the estimated Rrsn −SICF (λ) in the "inside range" (661-710 nm) was calculated for each band in the "inside range" for all the 400 spectra from the VDB (Figure 7). All four MLRA methods showed similar performance, resulting in maximum errors near to 0.1, with RMSE values lower than 0.1 and RRMSE values close to 0.02 (Table 3), except GPR with errors up to 0.8 in some samples and with an RRMSE value of 0.05. However, SVR showed a more accurate retrieval, with 0% of samples with errors >0.8 (Figure 8).

Figure 7.
Relative error between the "reference data" from VDB and the retrieved Rrsn −SICF (λ) in the spectral range between 661 and 710 nm for each MLRA trained. The X-axis is the number of validated spectra (1 to 400) and the Y-axis are the wavelengths of the estimated bands. The scale color represents the relative error values, the blue being the lower errors, and the red being the higher errors.

Validation of Sun Induced Chlorophyll Fluorescence (SICF) Estimation
Once Rrs −SICF (λ) was estimated, SICF was calculated according to Equations (12)- (14). The relative error between the reference and retrieved SICF peak (SICF at 685 nm) from the VDB was calculated. A relative error of less than 2% was obtained for the four MLRA methods in more than 70% of the samples. SVR showed the least number of samples with errors >0.8 (0.5%) (Figure 9).
The scatterplots in Figure 10 show that all MLRA methods performed a good estimation of SICF peak (at 685 nm), except the GPR, with an R 2 = 0.14 and an RRMSE greater than 5. The best estimation was performed by using SVR, obtaining an R 2 = 0.98 and an RRMSE lower than 0.25. Figure 8. Histograms of the relative errors for all bands in the spectral range between 661 and 710 nm between the spectra of the "reference data" from VDB and the retrieved Rrsn −SICF (λ) for each MLRA trained. The X-axis is the relative error and the Y-axis are the number of counts. Figure 9. Histograms of the relative errors between the "reference data" from VDB and the estimated SICF peak (at 685 nm), with each MLRA retrieval method used to estimate the intermediate ρ wn . The X-axis is the relative error and the Y-axis is the number of counts.

Comparison of the Proposed Sun Induced Chlorophyll Fluorescence (SICF) Retrieval Method with the Fluorescence Line Height (FLH) Method
To compare SICF obtained by using the algorithm here proposed, from a high spectral resolution reflectance, with the SICF that would be obtained from the ocean and land color instrument (OLCI) by using the FLH method, the spectra of the VDB were convolved to the instrument spectral response function (ISRF) of OLCI, followed by the application of the FLH method and comparison with our algorithm. To illustrate the effect of the reflectance shoulder (or peak) beneath the SICF emission peak, the FLH for OLCI was not only calculated from Rrs(λ), as in its normal operation (red line in Figure 11), but also from the simulated Rrs −SICF (λ), without fluorescence (blue line in Figure 11). Two issues are evident from the spectra shown in Figure 11: (1) the FLH from OLCI underestimated the actual SICF emission, and (2) part of the retrieved OLCI FLH did not correspond to fluorescence, but to "residual" reflectance above the baseline. Both effects can lead to misinterpretations of the meaning of the FLH. To compare our results on SICF estimation with the original FLH method, the VDB was convolved to the OLCI ISRF, to which the FLH method was applied.
Similar to our previous analyses, the relative error of the SICF peak between this retrieval and the "reference" integrated SICF was calculated (Figure 12), showing that the FLH method was only able to obtain 9% of cases with a less than 10% relative error, while the majority of the cases showed a relative error higher than 40%. Figure 12. Histogram of the relative errors between the "reference" SICF peak and the estimated SICF peak by using FLH method over convolved data to OLCI ISRF. The X-axis is the relative error, and the Y-axis is the number of counts.
The SICF peak is a distinct feature in water-leaving remote sensing reflectance, which adds to the "background" reflectance in the 660-750 nm region. In clear ocean waters, that background is roughly linear with a negative slope towards the infrared, and this is the basis of the baseline approach commonly used for fluorescence retrieval. However, in productive coastal waters, the combined effects of particle scattering and phytoplankton and water absorption creates a reflectance shoulder or even a peak in the 670-720 nm region, invalidating the baseline approach based on two wavelengths outside the SICF emission peak. This effect was thoroughly described in [22,23] and affects FLH use as a proxy for the actual SICF emission. In the most extreme cases (very high [Chla]) the peak is entirely dependent on backscattering, due to the strong Chla reabsorption of the fluorescence emission. The only way to estimate the actual SICF curve is to decouple the emission from the background reflectance. Since both signals are combined in actual measurements, it is necessary to estimate somehow the background. The unavailability of SICF in-situ data, limits the validation of the algorithm. Measurements of [Chla] or in-situ fluorescence (measured using artificial light sources) as proxies of SICF could be used to validate it; however, it should be taken into account that the use of this data is not free of biases and errors. The relationship of SICF and [Chla] is not linear, and for the high [Chla] found in productive waters, the contribution of particulate scattering to the fluorescence peak invalids the use of a baseline approximation in the fluorescence peak for SICF retrieval.

Conclusions
The development and testing of a new method for the retrieval of SICF of coastal and ocean waters is presented in this work. The algorithm first performs a normalization of the observed apparent (Rrs) reflectance at 780 nm (following the similarity index approximation), followed by the fitting of the normalized reflectance without fluorescence contribution. This proposed fitting is based on the estimation of three reflectance anchor points within the SICF emission range, by using MLRAs methods and adjusting the normalized true reflectance spectra with a cubic spline fitting.
A training database composed of 7200 Rrsn −SICF (λ) spectra designed to typify the high spectral variability in coastal and ocean waters affected by a wide range of Chla contents and CDOM was used to test the algorithm. The validation database was simulated for these typical ranges too, while using different forward formulations and adding nonalgal particles (NAP) to reduce consistency with the calibration database. The approach of estimating the shape of the true reflectance or Rrs −SICF (λ) from the Rrsn −SICF (λ) shape, obtained from the anchor point fitting inside the SICF emission range of Rrsn(λ), allows the SICF signal to be discriminated from the "shoulder" produced by the absorption/scattering of the OAC in water. All four tested MLRA methods showed similar results for the retrieval of Rrsn −SICF (λ), but the GPR method showed relative errors > 0.8 in a small part of the samples, while lower error values were obtained by using SVR.
The final SICF validation using a validation database with simulated SICF resulted in relative errors close to 2% in the estimation of Rrsn −SICF (λ) and relative errors lower than 2% in more than 70% of the samples in the SICF peak retrieval. When applied to simulated OLCI bands, the proposed SICF retrieval method significantly improved the regular method to retrieve detectable SICF based on the FLH algorithm, as the latter resulted in relative errors higher than 40% in 59% of the samples.
The newly proposed method can be applied for future satellite missions such as FLEX [27] and CHIME [28], which will provide a more detailed spectral shape of waterleaving radiance. A higher accuracy in SICF retrievals from these sensors will provide better insights into the functioning and productivity of ocean and coastal phytoplankton.