## 1. Introduction

The most widely used product derived from satellite ocean colour is undoubtedly the concentration of chlorophyll-a ($chla$), an index of phytoplankton biomass with numerous applications in biogeochemical oceanography, such as phytoplankton ecology and phenology [1,2], carbon cycles [3], climate change, transfer of energy to higher trophic levels, and water quality [4]. Satellite-derived $chla$ concentration is a mature ocean colour product. It is used not only by experts in bio-optics but by the entire oceanographic community in various ways, including modelling and data assimilation [5], fisheries applications [6], and ecosystem management [7]. For instance, programs such as the European Copernicus programme (https://www.copernicus.eu/) and NOAA’s CoastWatch (https://coastwatch.noaa.gov/cw/index.html) provide daily $chla$ data that are publicly accessible for fisheries and water quality applications. In Canada, Fisheries and Oceans Canada has relied on ocean colour for several decades to monitor the state of the marine ecosystem [8,9]. In particular, it uses $chla$ concentration and its derived product, primary production, to infer the ecosystem indicators that underpin ocean management.

$Chla$ concentration is derived from remote sensing reflectance (${R}_{rs}$). ${R}_{rs}$ is the ratio of water-leaving radiance to downwelling irradiance, corrected for sun geometry so that measurements can be compared independently of location, time and date. Several approaches to infer $chla$ from ${R}_{rs}$ have been developed and embedded by space agencies in their data processing software, including band ratio [10] and semi-analytical models [11]. As the name suggests, band ratio algorithms exploit the ratio of blue to green wavebands to retrieve $chla$. The National Aeronautics and Space Administration (NASA) developed the Ocean Colour X (OCx) suite of empirical algorithms, where x (2, 3, or 4) indicates the number of bands used. These algorithms fit a fourth-degree polynomial between ${R}_{rs}$ band ratios and in situ $chla$ concentrations from a global dataset (NOMAD, the NASA bio-Optical Marine Algorithm Dataset [12]).

Semi-analytical algorithms (e.g., Garver–Siegel–Maritorena (GSM)), on the other hand, optimize bio-optical parameters (including $chla$ concentration) in an approximate solution of the radiative transfer equation so that the modelled reflectance matches the reflectance measured by the satellite. This type of algorithm has the advantage of decoupling the contributions of the optically active components (i.e., phytoplankton, non-algal particles and coloured dissolved organic matter). In theory, $chla$ concentration should therefore be retrieved more accurately than with band ratio algorithms. Both types of approach are well suited to so-called case-1 waters (i.e., waters where $chla$ concentration drives the optical characteristics of bulk seawater). They do not perform as well in case-2 waters (i.e., where the optically active components vary independently of one another). For case-2 waters, models that use fluorescence [13] or more advanced statistical methods, such as neural networks [14] or principal component analysis [15], have been shown to perform better than band ratio or semi-analytical approaches. Note that the performance of any algorithm that uses remote sensing reflectance inherently depends on the performance of the atmospheric correction procedure, which is not addressed here.
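The band ratio approach can be sketched in a few lines. OCx-type algorithms take the maximum of the blue ${R}_{rs}$ values relative to a green band, $X = \log_{10}\left(\max R_{rs}(\lambda_{blue}) / R_{rs}(\lambda_{green})\right)$, and evaluate $\log_{10}(chla) = \sum_{i=0}^{4} a_i X^i$. This is a minimal sketch. The function name `ocx_chla` is hypothetical, and the values in `EXAMPLE_COEFFS` are illustrative placeholders, not NASA’s operational coefficients, which differ by sensor and algorithm version.

```python
import math
from typing import Sequence

# Illustrative placeholder coefficients a0..a4 (NOT NASA's operational values;
# real OCx coefficients depend on sensor and algorithm version).
EXAMPLE_COEFFS = (0.3272, -2.9940, 2.7218, -1.2259, -0.5683)


def ocx_chla(rrs_blue: Sequence[float], rrs_green: float,
             coeffs: Sequence[float] = EXAMPLE_COEFFS) -> float:
    """Return chla (mg m^-3) from an OCx-style maximum band ratio polynomial.

    rrs_blue  -- Rrs (sr^-1) of the candidate blue bands
    rrs_green -- Rrs (sr^-1) of the green reference band
    coeffs    -- polynomial coefficients a0..a4 applied in log10 space
    """
    if rrs_green <= 0 or max(rrs_blue) <= 0:
        raise ValueError("band ratio requires positive Rrs in the bands used")
    # Maximum band ratio: the largest blue Rrs divided by the green Rrs
    x = math.log10(max(rrs_blue) / rrs_green)
    # log10(chla) = a0 + a1*x + a2*x**2 + a3*x**3 + a4*x**4
    log_chla = sum(a * x ** i for i, a in enumerate(coeffs))
    return 10.0 ** log_chla
```

Because $a_1$ is strongly negative, a larger blue-to-green ratio (clearer water) yields a lower $chla$. Excluding a band from the ratio, such as 443 nm, amounts to leaving it out of `rrs_blue`.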

It has been shown that the OCx [15,16] and GSM [17] algorithms contain regional biases. Even when their overall performance is satisfactory, applying them to a given region results in systematic bias. Here, we assessed the performance of the OCx and GSM algorithms in Canadian waters (Northwest Atlantic (NWA) and Northeast Pacific (NEP)). We used a dataset of in situ $chla$ concentration collected by Fisheries and Oceans Canada as part of its monitoring of the marine ecosystem. The performance of these algorithms was evaluated for three sensors whose data are processed by NASA, namely the Sea-viewing Wide Field-of-view Sensor (SeaWiFS, 1998–2010), the Moderate Resolution Imaging Spectroradiometer on the Aqua platform (MODIS-Aqua, 2002–current) and the Visible Infrared Imaging Radiometer Suite on the Suomi NPP platform (VIIRS, 2012–current). Uncertainties of the algorithms with respect to environmental variables are also discussed.

## 4. Discussion

Satellite-derived $chla$ concentration remains the most used product for inferring global-scale information on the status of the marine ecosystem, and non-specialists represent an important fraction of ocean colour data users. Here, we assessed the performance of two algorithms currently implemented in NASA’s SeaDAS software (https://seadas.gsfc.nasa.gov) in Canadian waters (Atlantic and Pacific Oceans) to characterize the biases associated with these algorithms in those regions. In addition, we modified these original models by optimizing their parameterization and formulation to correct for regional bias in the NWA and NEP. Both OCx and GSM exhibited a negative overall bias, particularly in the NWA. There, low $chla$ concentrations were overestimated and high concentrations were underestimated, which resulted in a linear fit of satellite-derived on in situ $chla$ with a slope lower than one. This bias was partially corrected in the modified versions of the models, which also tightened the relationship, as indicated by a higher ${r}^{2}$ than for the original algorithms. This improvement came at the expense of the “win ratio”, i.e., the number of individual retrievals with the lowest magnitude of error among the algorithms (Table 3 and Table 4). Significantly different mean $chla$ concentrations between spring and fall in the NWA (see Section 3.4.1) suggest that the dataset could be further subdivided by season, but at the expense of a synoptic approach. This would also require a larger matchup dataset to achieve statistical significance.
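The win ratio can be computed by finding, for each matchup, the algorithm with the smallest error and counting how often each algorithm wins. This minimal sketch rests on several assumptions. The function name `win_ratio` is hypothetical. The error metric, $|\log_{10}(chla_{sat}) - \log_{10}(chla_{in\,situ})|$, is an assumed stand-in for the metric defined in the methods. A failed retrieval (NaN) is assumed never to win, which is how an algorithm with fewer valid retrievals, such as GSM, can lose wins.

```python
import numpy as np


def win_ratio(sat_chla: dict[str, np.ndarray], insitu_chla: np.ndarray) -> dict[str, float]:
    """Fraction of matchups on which each algorithm has the smallest error.

    sat_chla maps an algorithm name to its retrievals (NaN = no valid retrieval).
    Error is assumed to be |log10(satellite) - log10(in situ)|.
    """
    names = list(sat_chla)
    insitu = np.asarray(insitu_chla, dtype=float)
    # Rows: algorithms; columns: matchups
    errors = np.vstack([np.abs(np.log10(np.asarray(sat_chla[n], dtype=float))
                               - np.log10(insitu)) for n in names])
    # A failed retrieval (NaN) can never be the winner
    errors = np.where(np.isnan(errors), np.inf, errors)
    # Only score matchups where at least one algorithm produced a retrieval
    valid = np.isfinite(errors).any(axis=0)
    n_valid = int(valid.sum())
    if n_valid == 0:
        return {n: 0.0 for n in names}
    # argmin gives ties to the algorithm listed first
    winners = np.argmin(errors[:, valid], axis=0)
    return {n: float(np.sum(winners == i)) / n_valid for i, n in enumerate(names)}
```

For example, `win_ratio({"OCx": ocx_values, "GSM": gsm_values}, insitu_values)` returns one fraction per algorithm, and the fractions sum to one over the scored matchups.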

Our approach differed from NASA’s in that the algorithms were fitted directly to matchups between satellite data and in situ $chla$, under the assumption that the satellite ${R}_{rs}$ were reliable. The original algorithms, by contrast, were developed using both in situ radiometric and $chla$ measurements. Furthermore, the regional parameters were optimized, tested, and compared to the original algorithms using the entire dataset for a given region and sensor, rather than extracting a subset for training and using the remainder for testing. Confidence intervals for the polynomial algorithm coefficients were estimated using a bootstrap method that resamples the original dataset to compute 2000 different sets of coefficients (see Section 2.4.2). Interval widths ranged from 0.08 to 1.26 for the first three coefficients and from 0.38 to 2.39 for the last two, which had the smallest impact on the shape of the polynomial over the range of observed band ratio values. This suggests that the coefficients derived from the full dataset are representative of any subset.

For the GSM algorithms, the sensitivity study on combinations of spectral variations in $P$, $S$, and $Y$ (see Section 2.4.4) was performed for each region, sensor, and set of $g$ coefficients (spectral or constant). The median of the subset of highest-scoring combinations was then selected. For this reason, we can assume that using the entire dataset or a subset for each combination would have led to similar results after retrieving the median of the optimal combinations. Finally, the small size of some of the datasets, particularly the NEP SeaWiFS set, made it difficult to develop a reliable set of parameters. Such a set would need to span a wide range of values for in situ $chla$ and for the environmental variables that were tested for correlations with algorithm error. For these reasons, the regional tuning of the algorithms was carried out on the full datasets rather than on subsets.
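The bootstrap confidence intervals described above can be sketched as a percentile bootstrap: resample the matchups with replacement, refit the polynomial, and take quantiles of each coefficient across refits. This is a minimal sketch under stated assumptions. The function name `bootstrap_poly_ci` is hypothetical. The fit is an ordinary least-squares polynomial in $\log_{10}$ space rather than the authors’ exact regional fitting procedure. The resampling details of Section 2.4.2 may differ. The interval width reported in the text corresponds to `upper - lower`.

```python
import numpy as np


def bootstrap_poly_ci(x, log_chla, degree=4, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence intervals for OCx-style polynomial coefficients.

    x        -- log10 band ratios from satellite Rrs (one per matchup)
    log_chla -- log10 of the matched in situ chla
    Returns (coeffs, lower, upper), each ordered a0..a_degree.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(log_chla, dtype=float)
    rng = np.random.default_rng(seed)
    # np.polyfit returns the highest-degree coefficient first; reverse to a0..an
    coeffs = np.polyfit(x, y, degree)[::-1]
    boot = np.empty((n_boot, degree + 1))
    for b in range(n_boot):
        # Resample matchups with replacement and refit the polynomial
        idx = rng.integers(0, x.size, size=x.size)
        boot[b] = np.polyfit(x[idx], y[idx], degree)[::-1]
    lower = np.percentile(boot, 100 * alpha / 2, axis=0)
    upper = np.percentile(boot, 100 * (1 - alpha / 2), axis=0)
    return coeffs, lower, upper
```

Wide intervals on the higher-order coefficients are expected when the observed band ratios span a narrow range, because those terms then barely change the fitted curve.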

One of the main results of our study was the identification of the 443 nm waveband as a strong contributor to the overall uncertainty in the OCx algorithms. We removed this waveband from the OCx algorithms and iteratively forced the slope and intercept of the linear model between satellite-derived and in situ $chla$ to one and zero, respectively. This provided better results than the original algorithms (Table 3). Increasing the degree of the polynomial improved the results for some combinations of region and sensor (see scores in Table 3 and Table 4), but overall offered minimal improvement. The negative bias of the original GSM model was corrected by introducing a new exponent, $P$, on the chlorophyll term. This exponent lowered satellite-derived $chla$ concentrations $<1$ mg m$^{-3}$ and increased concentrations $>1$ mg m$^{-3}$. $P$ was derived in combination with the exponents on the ${a}_{dg}$ and ${b}_{bp}$ terms, giving the best set of spectral slopes for each region and sensor. Including a spectral dependence in the $g$ coefficients in Equation (11) slightly improved the results. However, optimizing the spectral slopes of the phytoplankton and yellow substance absorption terms, as well as of the particulate backscattering term, had a larger positive impact. This finding highlights the importance of accounting for regional properties of the absorption and backscattering terms, and particularly their spectral dependence. Overall, the GSM-type algorithms provided the highest win ratio (i.e., the highest number of matchups with the smallest $chla$ uncertainties). This good performance, however, was offset by a lower number of retrievals compared to the polynomial formulations (Table 3 and Table 4). This limitation must be weighed against the end user’s intended application of the $chla$ product.
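The general structure of a GSM-type inversion can be illustrated with a generic semi-analytical model. Below-surface reflectance is modelled as $r_{rs}(\lambda) = g_1 u + g_2 u^2$, with $u = b_b/(a + b_b)$, $a = a_w + chla\,a_{ph}^{*} + a_{dg}(443)\,e^{-S(\lambda-443)}$ and $b_b = b_{bw} + b_{bp}(443)\,(\lambda/443)^{-Y}$. Here $S$ and $Y$ are the spectral slopes that were regionally tuned. The three unknowns are then solved by nonlinear least squares. This is a minimal sketch, not the authors’ modified algorithms: the exponent $P$ and spectrally varying $g$ coefficients are not reproduced. The names `gsm_rrs` and `gsm_invert` are hypothetical, and the spectral arrays are illustrative placeholders rather than the operational GSM tables. $g_1 = 0.0949$ and $g_2 = 0.0794$ are the constant values commonly used in GSM. The defaults `s=0.0206` and `y=1.0337` are the values usually quoted for the global parameterization and should be checked against the SeaDAS documentation. The input is assumed to be below-surface $r_{rs}$, so conversion from above-water ${R}_{rs}$ is not shown.

```python
import numpy as np
from scipy.optimize import least_squares

# Constant coefficients of the quadratic rrs(u) relation
G1, G2 = 0.0949, 0.0794

# Illustrative placeholder spectra (NOT the operational GSM tables)
WAVELENGTHS = np.array([412.0, 443.0, 490.0, 510.0, 555.0, 670.0])  # nm
A_W = np.array([0.0046, 0.0071, 0.0150, 0.0325, 0.0596, 0.4390])    # water absorption, m^-1
BB_W = np.array([0.0033, 0.0024, 0.0016, 0.0013, 0.0009, 0.0005])   # water backscattering, m^-1
APH_STAR = np.array([0.030, 0.040, 0.025, 0.018, 0.007, 0.020])     # chla-specific absorption, m^2 mg^-1


def gsm_rrs(chla, adg443, bbp443, s=0.0206, y=1.0337):
    """Forward model: below-surface rrs (sr^-1) at WAVELENGTHS."""
    a = A_W + chla * APH_STAR + adg443 * np.exp(-s * (WAVELENGTHS - 443.0))
    bb = BB_W + bbp443 * (WAVELENGTHS / 443.0) ** (-y)
    u = bb / (a + bb)
    return G1 * u + G2 * u ** 2


def gsm_invert(rrs_obs, s=0.0206, y=1.0337, guess=(0.3, 0.02, 0.002)):
    """Retrieve (chla, adg443, bbp443), or None if no valid retrieval."""
    rrs_obs = np.asarray(rrs_obs, dtype=float)
    # Non-positive rrs in any band: no valid retrieval for this matchup
    if np.any(rrs_obs <= 0):
        return None

    def residuals(log_p):
        # Optimize in log space so all three parameters stay positive
        chla, adg443, bbp443 = np.exp(log_p)
        return (gsm_rrs(chla, adg443, bbp443, s, y) - rrs_obs) / rrs_obs

    fit = least_squares(residuals, np.log(guess))
    return tuple(np.exp(fit.x)) if fit.success else None
```

Regional tuning in this framework amounts to choosing `s` and `y` (and, in the authors’ versions, $P$) that minimize the error against in situ $chla$ over the matchups.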

Uncertainties in satellite-derived products are currently an active field of research (IOCCG report on Uncertainties in Ocean Colour Remote Sensing, in preparation). The field remains highly complex given the many possible sources of uncertainty associated with satellite observations, including, among others, uncertainties in radiometry (and calibration), atmospheric correction, data binning, and bio-optical algorithms. For instance, it has been shown that at the global scale, the OCx algorithm carries an inherent uncertainty of about 35% [43]. The difference in scale between satellite and in situ measurements creates another source of uncertainty: in situ data refer to about two litres or less of seawater at a discrete depth, while satellites integrate a volume of tens of millions of litres of seawater. Note that uncertainties in HPLC-derived $chla$ were partially accounted for in the linear model comparing the results. This was done using Type 2 linear regression, as described in Section 2.5.1, which assumes that both variables contain errors and minimizes the sum of the areas of the triangles formed between each point and the regression line.

Here, we have addressed discrepancies between satellite-derived and in situ $chla$ and included temporal and spatial effects. For the two areas of interest, namely the Northwest Atlantic and the Northeast Pacific, we did not find significant patterns of temporal drift. This stability is attributable to the regular reprocessing of ocean colour products by NASA following a vicarious calibration exercise [44] that ensures stable measurements of the radiative field over time. We did not find spatial patterns either, with two exceptions: minor positive correlations between latitude and the magnitude of error in the NWA VIIRS dataset, and negative correlations between open ocean bathymetry and error magnitude in the NEP datasets. These are likely due to changes in $chla$ concentration that affect the degree of error. Seasonal biases were observed, with an overall underestimation of $chla$ in the spring and a small overestimation in the fall. This could be explained by changes in phytoplankton community composition (Table 5): as the ratio of fucoxanthin, a marker pigment for diatoms, to $chla$ concentration increased, the magnitude of error in $chla$ retrieval also increased. This pattern was observed across all algorithms but was weaker in the regionally tuned algorithms. GSM-type algorithms offered the greatest improvements, in particular GSM_GC in the VIIRS dataset, where the correlation coefficient between $|{E}_{chla}|$ and the $fucox$ to $chla$ ratio was reduced from 0.54 to 0.27. This demonstrates the effectiveness of regional algorithms at addressing changes in phytoplankton absorption properties. This is notably the case for the modified GSM algorithms, which include an exponent on the phytoplankton absorption term to account for changes in absorption properties with changes in phytoplankton community composition in the Northwest Atlantic [45,46,47]. However, the $fucox$ to $chla$ ratio affected the regional algorithms for SeaWiFS (i.e., both OCx-like and GSM-like) to a similar degree as the original models. This suggests that the 510 nm band, which is used across all models for the SeaWiFS sensor, may contain information on the presence of diatoms, as highlighted by Sathyendranath et al. [48], who developed an algorithm to discriminate diatoms from other phytoplankton types for SeaWiFS.
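The Type 2 regression mentioned above can be sketched as a reduced major axis (geometric mean) regression. Its slope is the ratio of the standard deviations of the two variables, signed by their correlation, and this line minimizes the summed triangle areas between the points and the line. This is a minimal sketch. The function name `rma_regression` is hypothetical. The implementation in Section 2.5.1 may differ in detail, and applying it to $\log_{10}$-transformed $chla$ is an assumption.

```python
import numpy as np


def rma_regression(x, y):
    """Type 2 reduced major axis (geometric mean) regression of y on x.

    Returns (slope, intercept, r). Errors in x and y are treated symmetrically,
    so regressing x on y gives the reciprocal slope.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    # Slope: ratio of standard deviations, signed by the correlation
    slope = np.sign(r) * np.std(y, ddof=1) / np.std(x, ddof=1)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept, r
```

With $\log_{10}$ in situ $chla$ as `x` and $\log_{10}$ satellite $chla$ as `y`, a slope below one corresponds to the pattern described in the text: low concentrations overestimated and high concentrations underestimated.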

Finally, another notable source of uncertainty is the atmospheric correction method applied to NASA’s level 2 images. It is based on the “black pixel assumption”: the water-leaving signal in the near-infrared bands, from approximately 670 nm to 865 nm [49], is assumed to be negligible, so that the ocean appears black. This was a reasonable assumption for case-1 waters [50]. Optically complex case-2 waters, however, can contain constituents that contribute to scattering in those bands, resulting in an inaccurate correction of the remote sensing reflectance values used in $chla$ algorithms [51]. Given that many matchups in this study were located near coastal or shelf areas, part of the error may be associated with poor atmospheric correction. As discussed in Section 3.4.1, many satellite matchups contained negative ${R}_{rs}$ in a waveband near one end of the visible spectrum. These matchups could not yield valid $chla$ concentrations with the GSM algorithms but were included in the band ratio algorithms.
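The difference in retrieval counts follows from which bands each algorithm needs. A GSM-type inversion uses every band, whereas a band ratio only uses a subset, so a negative ${R}_{rs}$ in an unused band does not block it. This minimal screening sketch makes that explicit. The band sets `GSM_BANDS` and `OCX_BANDS` are hypothetical SeaWiFS-like choices, and the functions `usable_for` and `screen_matchups` are illustrative names. The actual bands depend on the sensor and algorithm version.

```python
import math
from typing import Iterable, Mapping

# Hypothetical SeaWiFS-like band sets (nm); actual bands depend on sensor and algorithm
GSM_BANDS = (412, 443, 490, 510, 555, 670)
OCX_BANDS = (443, 490, 510, 555)


def usable_for(rrs: Mapping[int, float], bands: Iterable[int]) -> bool:
    """True if every band required by an algorithm has a finite, positive Rrs."""
    return all(b in rrs and math.isfinite(rrs[b]) and rrs[b] > 0 for b in bands)


def screen_matchups(matchups):
    """Count the matchups usable by each algorithm family."""
    counts = {"OCx": 0, "GSM": 0}
    for rrs in matchups:
        counts["OCx"] += usable_for(rrs, OCX_BANDS)
        counts["GSM"] += usable_for(rrs, GSM_BANDS)
    return counts
```

A matchup whose only negative value is at 412 nm is counted for the band ratio family but not for GSM, which reproduces the asymmetry in retrieval counts noted above.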