Chlorophyll-a Concentration Retrieval in the Optically Complex Waters of the St. Lawrence Estuary and Gulf Using Principal Component Analysis

Empirical methods based on band ratios to infer chlorophyll-a concentration by satellite do not perform well over the optically complex waters of the St. Lawrence Estuary and Gulf. Using a dataset of 93 match-ups, we explore an alternative method relying on empirical orthogonal functions (EOF) to develop an algorithm that relates the satellite-derived remote sensing reflectances to in situ chlorophyll-a concentration for the Sea-viewing Wide Field-of-view Sensor (SeaWiFS). Results show that the EOF method retrieves chlorophyll-a concentration with a mean absolute percentage difference of 41%, compared to 140% for the widely used Ocean Chlorophyll 4 (OC4v4) empirical algorithm, 53% for the Garver-Siegel-Maritorena (GSM01) and 54% for the Generalized Inherent Optical Property (GIOP) semi-analytical algorithms. This improvement is possible because the EOF approach extracts region-specific radiometric features from the satellite remote sensing reflectances that are related to the absorption properties of optical components (water, coloured dissolved organic matter and chlorophyll-a) using the visible SeaWiFS channels. The method could easily be applied to other ocean-colour satellite sensors (e.g., MODIS, MERIS, VIIRS, OLCI) to extend the time series for the St. Lawrence Estuary and Gulf waters.


Introduction
The St. Lawrence Estuary and Gulf (SLEG), in Eastern Canada, is a large (250,000 km²) and complex coastal ecosystem where the biological, physical and chemical features are highly dynamic as a result of strong tides, winds, a high volume of freshwater runoff, complex bathymetry and winter sea ice [1,2]. Phytoplankton form the basis of this ecosystem. Their abundance is estimated by measuring the concentration of chlorophyll-a (a photosynthetic pigment found in all phytoplankton), a proxy for phytoplankton biomass [3]. Phytoplankton are primary producers [4,5] that transfer energy to higher trophic levels and export carbon to the deep ocean [6]. Knowledge of phytoplankton standing stock and distribution helps characterize the status of marine ecosystems, thereby facilitating their protection through sustainable management practices [7]. Because of their short life cycles, phytoplankton are also sensitive indicators of changing chemical and physical conditions [8,9].
To our knowledge, the only peer-reviewed paper presenting a SLEG chlorophyll-a (Chl) climatology is based on a limited Coastal Zone Colour Scanner dataset (80 images) covering the years 1979-1981 [10]. Even though that work described the major features of Chl distribution in the SLEG (upwelling regions, mesoscale circulation), the results were based on a small number of images for each month (see their Table 1) and did not cover the entire seasonal cycle, with no data after September. There is thus a need to revisit these results with a larger dataset covering a longer time frame. Constructing such a climatology from in situ data is difficult because of the relatively sparse spatial and temporal coverage of measurements over such a large area. As shown by [10], satellite-derived Chl concentration could provide important information on the ecological status of the SLEG. However, this region is an optically complex marine environment [11] due to the presence of coloured dissolved organic matter (CDOM), which derives from decaying organic matter and is composed of humic and fulvic acids. Common CDOM sources include land runoff and the degradation products of phytoplankton. CDOM distribution is driven by the interplay of hydrodynamic mechanisms at different spatial and temporal scales and by exposure to ultraviolet (UV) light, which causes photodegradation [12,13]. CDOM absorbs very strongly in the UV-blue region of the spectrum, competing with phytoplankton for the absorption of blue photons. As a result, when CDOM is present and does not covary with Chl, blue-green band-ratio algorithms are confounded, causing overestimation of Chl by as much as 400% [14][15][16][17][18].
Previous studies that attempted to measure Chl using satellite ocean colour in these optically complex waters [18][19][20] indicated the need for more accurate retrieval algorithms in order to exploit this information for operational applications such as ecosystem and fisheries management. Precise estimates of Chl can also provide environmental indicators of ecosystem health, support response planning for ship-source oil spills, or help detect changes related to climate variability [21]. Considering the poor performance of the Ocean Chlorophyll 4 (OC4v4) algorithm in the SLEG [18], the first step towards deriving robust seasonal climatologies, phytoplankton phenology, inter-annual trends and phytoplankton functional types is to establish a better inverse model to estimate Chl from remote sensing observations.
The goal of the current study is to develop a more accurate satellite-based regional Chl retrieval algorithm for the SLEG using a statistical approach. Because they are tailored to specific optical conditions, regional models are needed to improve the interpretation of upwelling light from distinctively complex water bodies and are thus regularly published (e.g., [22][23][24]). We selected a method based on principal component analysis (PCA, also known as empirical orthogonal functions (EOF)) for its ability to capture the essential information contained in satellite-derived remote sensing reflectance (Rrs) spectra and relate it to Chl concentration [25]. The dominant modes of variability extracted from the satellite Rrs are related to the measured Chl concentration via a multilinear regression model. This study thus reinforces the use of an easily implementable PCA-type approach for the inversion of geophysical variables [26][27][28].
Such an approach has the potential to significantly improve the reliability of Chl retrievals in the SLEG, as shown recently for the optically complex waters of the Bedford Basin, Canada [29].

Data and Method
The EOF-based model was developed using a dataset covering the entire Sea-viewing Wide Field-of-view Sensor (SeaWiFS) era, and its performance is compared against standard and regionally adapted Chl retrieval algorithms using an extensive in situ database. SeaWiFS was chosen over more recent sensors such as the Moderate Resolution Imaging Spectroradiometer (MODIS) because of the large in situ measurement database that was assembled during the SeaWiFS era (1997-2010). SeaWiFS was also the first ocean colour sensor with high-quality measurements (higher signal-to-noise ratio), precise calibration and improved spectral and spatial resolution, which led to improved geophysical products compared to its predecessor, the Coastal Zone Color Scanner (CZCS), which operated between November 1978 and June 1986.

In Situ Data
Surface measurements (2927 samples) of in situ Chl collected by Fisheries and Oceans Canada were obtained from the St. Lawrence Global Observatory (SGDB) repository [30] for the years of the SeaWiFS mission (1997-2010), in a region bounded by 45.5-50.5°N and 59.0-71.0°W. The water samples were processed using fluorometric methods [31,32], either onboard or after being frozen for later laboratory analysis. High-pressure liquid chromatography (HPLC) measurements are usually preferred over fluorometric methods for satellite validation exercises. Unfortunately, HPLC samples in the SLEG are too few to obtain a large enough match-up dataset. HPLC and fluorometric measurements are, however, very well correlated (log-linear regression R² = 0.88, slope = 0.995, intercept = 0.12, N = 161). Figure 1 shows that the dataset covers the entire ice-free season (April-November). The Chl concentration range is [0.03, 76.

Match-Up Dataset
The complex and dynamic SLEG ecosystem requires the highest available spatial and temporal resolution images to capture small-scale and short-term events and to minimize errors when applying non-linear models. We used SeaWiFS Level-1A MLAC (Merged Local Area Coverage, 1.1-km spatial resolution at nadir) data downloaded from NASA's Ocean Biology Processing Group (OBPG) website [33], which were processed with the SeaWiFS Data Analysis System (SeaDAS, Version 7.3.2) software to derive Rrs. As recommended by [34], a window of ±3 h between in situ data collection and satellite overpass was used to identify match-ups. This time window represents a good compromise between maximizing the number of match-ups and minimizing the effect of SLEG spatio-temporal variability when comparing satellite and in situ measurements. A 3 × 3 pixel box centered on each in situ sample in the database was extracted from each concomitant satellite pass. This area (9 km²) was considered large enough to average out small-scale variability in the SLEG [35], and it also reduces the impact of potential satellite navigation errors. Any spectrum with negative values was removed, as such values likely indicate pixels where the atmospheric correction failed. All SeaDAS default flags were used during the processing, including atmospheric correction failure, stray light, sun glint and possible sea ice or cloud contamination. Match-ups were kept when at least 6 of the 9 pixels were valid, and for each match-up we used the median of the positive, non-flagged Rrs spectra. The final dataset was composed of 93 valid match-ups from 67 overpasses out of the initial 2927 samples. This comparatively small set of valid data points emphasizes the impact of cloud cover on the potential of satellite ocean colour and the challenges of atmospheric correction in the validation of geophysical products in coastal zones.
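The screening rules above (±3 h window, 3 × 3 box, rejection of negative or flagged pixels, at least 6 valid pixels, median spectrum) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SeaDAS-based workflow used in the study: it assumes the Rrs bands have already been loaded into a NumPy array, that all SeaDAS flags have been collapsed into one boolean mask, and that the station has already been located on the pixel grid. The function `extract_matchup` and its arguments are hypothetical.

```python
from datetime import datetime, timedelta

import numpy as np

MAX_DT = timedelta(hours=3)  # match-up window: |t_insitu - t_overpass| <= 3 h
MIN_VALID = 6                # at least 6 of the 9 box pixels must be valid


def extract_matchup(rrs_cube, flags, row, col, t_insitu, t_overpass):
    """Median Rrs spectrum of the 3 x 3 box around (row, col), or None if rejected.

    rrs_cube: float array (rows, cols, bands) of Rrs in sr^-1, NaN where missing.
    flags:    bool array (rows, cols), True where any processing flag is raised.
    """
    if abs(t_insitu - t_overpass) > MAX_DT:
        return None
    n_rows, n_cols = flags.shape
    # Reject stations whose 3 x 3 box would fall off the image edge.
    if not (1 <= row < n_rows - 1 and 1 <= col < n_cols - 1):
        return None
    box = rrs_cube[row - 1:row + 2, col - 1:col + 2, :].reshape(9, -1)
    box_flags = flags[row - 1:row + 2, col - 1:col + 2].reshape(9)
    # Valid pixel: not flagged, and every band finite and strictly positive.
    valid = ~box_flags & np.all(np.isfinite(box) & (box > 0), axis=1)
    if valid.sum() < MIN_VALID:
        return None
    return np.median(box[valid], axis=0)


if __name__ == "__main__":
    scene = np.full((5, 5, 6), 0.002)        # synthetic, uniform six-band Rrs scene
    no_flags = np.zeros((5, 5), dtype=bool)
    t0 = datetime(2003, 6, 1, 15, 0)
    print(extract_matchup(scene, no_flags, 2, 2, t0, t0 + timedelta(hours=2)))
```

Taking the median rather than the mean of the valid pixels limits the influence of a single residual outlier pixel that escaped the flags.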
Figures 1 and 2 show that, despite its small size, the match-up dataset retains properties similar to those of the original, larger SGDB dataset in terms of Chl concentration distribution, temporal distribution and spatial coverage. The match-up dataset is spatially homogeneous but biased towards June (typically bloom or post-bloom conditions). Compared to the initial dataset, fewer match-ups were found for November because cloud cover and low sun elevation limit the number of good images. The Chl concentration range of the match-up dataset is [0.14, 22.4] mg chl m⁻³, with a mean (median) value of 1.20 (0.62) mg chl m⁻³. The maximum value of 22.4 mg chl m⁻³ was collected on 31 May 1999 at a station periodically sampled in the St. Lawrence Estuary (48.66°N, 68.58°W), which is known to reach high values during the spring bloom [36]. The match-up dataset therefore adequately represents the SLEG in terms of geographic coverage and phytoplankton dynamics.

Atmospheric Correction
A critical step of ocean colour data processing is to generate the most accurate Rrs (sr⁻¹), defined as:

$$R_{rs}(\lambda) = \frac{L_w(\lambda)}{E_d(0^+, \lambda)} \qquad (1)$$

where Lw is the water-leaving radiance (µW cm⁻² nm⁻¹ sr⁻¹), Ed is the downwelling irradiance (µW cm⁻² nm⁻¹) at the sea surface (0⁺) and λ (nm) is the wavelength of interest. Ocean colour sensors measure the radiant energy reflected by the Earth-atmosphere system, so the contribution of the atmosphere to the total signal reaching the sensor must be removed. Because this correction depends not only on illumination conditions and atmospheric composition but also on the water type, we characterized the optical signature of the SLEG waters using the approach of [37]. This method uses the relationship between a proxy for the relative abundance of CDOM to Chl (Rrs(412)/Rrs(443)) and a proxy for Chl concentration (Rrs(555)/Rrs(490)), as these two quantities are very well correlated in Case-1 waters (black line in Figure 2a). Applying that method to the SeaWiFS climatological Rrs data available from the OBPG website [38] shows that almost all the SLEG data fall outside the area considered by [37] as representative of Case-1 waters (Figure 2a, purple interval), i.e., waters whose optical properties are solely determined by Chl.
Figure 2b maps the normalized distance between each climatological Rrs value and the perfect Case-1 line. In Figure 2, the colours represent the distance between the retrieved climatological Rrs(412)/Rrs(443) value and the black line (the theoretical perfect Case-1 relationship), normalized by the width of the purple interval at the corresponding Chl proxy value. This colour coding does not provide absolute quantitative information; rather, it illustrates the spatial contrast in the relative abundance of CDOM to Chl, an indicator of water type. This figure clearly shows that Rrs is influenced by CDOM to a far greater degree than by phytoplankton, such that the SLEG cannot be considered Case-1 water. This result is consistent with the optical classification of coastal waters performed by [11], in which the SLEG Rrs spectral shapes were classified as waters dominated by CDOM absorption. According to [39], the standard NIR-iteration atmospheric correction (AC) procedure [40][41][42] is better suited to CDOM-dominated waters than the Management Unit of the North Sea Mathematical Models (MUMM) AC [43], which was developed for turbid waters dominated by mineral particles. Considering the low concentration of suspended particulate matter (SPM) measured over most of the SLEG [44], the standard NIR-iteration AC was used to process the SeaWiFS images.
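The water-type diagnostic can be sketched as below, assuming Rrs is first obtained as Lw/Ed. The Case-1 reference relation and the width of its tolerance interval from [37] are not reproduced in this section, so they are passed in as placeholder callables (`ref_curve`, `interval_width`); both, like the function names, are hypothetical and must be replaced by the published relations before use.

```python
import numpy as np


def rrs_from_radiometry(lw, ed):
    """Rrs = Lw / Ed(0+); in sr^-1 when Lw is per steradian in Ed's units."""
    return np.asarray(lw, dtype=float) / np.asarray(ed, dtype=float)


def case1_distance(rrs412, rrs443, rrs490, rrs555, ref_curve, interval_width):
    """Normalized distance of observed spectra from a Case-1 reference relation.

    ref_curve(x):      expected Rrs(412)/Rrs(443) of Case-1 water at x = Rrs(555)/Rrs(490)
    interval_width(x): width of the Case-1 tolerance interval at x
    Both callables are placeholders for the relations published in [37].
    """
    x = np.asarray(rrs555, dtype=float) / np.asarray(rrs490, dtype=float)  # Chl proxy
    y = np.asarray(rrs412, dtype=float) / np.asarray(rrs443, dtype=float)  # CDOM:Chl proxy
    # Negative values: lower Rrs(412)/Rrs(443) than Case-1 water at the same x,
    # as expected when CDOM absorption is relatively strong.
    return (y - ref_curve(x)) / interval_width(x)
```

Because CDOM absorbs more at 412 nm than at 443 nm, CDOM-rich pixels tend to plot below the Case-1 line, which is why the sign of the distance is informative even though its magnitude is only relative.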

Performance Evaluation
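To assess agreement between measured and predicted Chl, we calculated the mean bias (a measure of systematic error; Equation (2)), the root mean squared error (RMSE, a measure of accuracy; Equation (3)), the mean absolute percentage difference (APD, a measure of accuracy as a percentage; Equation (4)) and the relative error (RE, the accuracy of each individual estimate as a percentage; Equation (5)):

$$\mathrm{bias} = \frac{1}{n}\sum_{i=1}^{n}\left[\log_{10}(\mathrm{Chl}_{i}^{\mathrm{pred}}) - \log_{10}(\mathrm{Chl}_{i}^{\mathrm{meas}})\right] \qquad (2)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[\log_{10}(\mathrm{Chl}_{i}^{\mathrm{pred}}) - \log_{10}(\mathrm{Chl}_{i}^{\mathrm{meas}})\right]^{2}} \qquad (3)$$

$$\mathrm{APD} = \frac{100}{n}\sum_{i=1}^{n}\frac{\left|\mathrm{Chl}_{i}^{\mathrm{pred}} - \mathrm{Chl}_{i}^{\mathrm{meas}}\right|}{\mathrm{Chl}_{i}^{\mathrm{meas}}} \qquad (4)$$

$$\mathrm{RE}_{i} = 100\,\frac{\left|\mathrm{Chl}_{i}^{\mathrm{pred}} - \mathrm{Chl}_{i}^{\mathrm{meas}}\right|}{\mathrm{Chl}_{i}^{\mathrm{meas}}} \qquad (5)$$

where n is the number of match-ups and the superscripts pred and meas denote the predicted and measured Chl of match-up i. We also used the slope, intercept and R² of the log-linear relationships to evaluate the performance of the various algorithms. The slopes and intercepts were derived with a Type II reduced major axis (RMA) regression (R package lmodel2 [45]).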
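A minimal NumPy sketch of the four statistics and of the RMA point estimates follows. It assumes, consistent with the log10 units quoted for bias and RMSE elsewhere in the text, that bias and RMSE are computed on log10-transformed Chl while APD and RE use linear values. The closed-form RMA slope, sign(r)·s_y/s_x, gives the same point estimates as an RMA line but none of the confidence intervals or permutation tests that lmodel2 reports; the function names are hypothetical.

```python
import numpy as np


def performance(chl_meas, chl_pred):
    """Bias and RMSE in log10 units, APD and per-match-up RE in percent."""
    m = np.asarray(chl_meas, dtype=float)
    p = np.asarray(chl_pred, dtype=float)
    dlog = np.log10(p) - np.log10(m)
    re = 100.0 * np.abs(p - m) / m          # relative error of each match-up
    return {
        "bias": dlog.mean(),                 # systematic error, log10 units
        "rmse": np.sqrt(np.mean(dlog ** 2)), # accuracy, log10 units
        "apd": re.mean(),                    # mean absolute percentage difference
        "re": re,
    }


def rma_regression(x, y):
    """Type II reduced major axis fit y = intercept + slope * x, plus R^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    slope = np.sign(r) * y.std(ddof=1) / x.std(ddof=1)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept, r ** 2
```

For the log-linear relationships, pass log10(measured Chl) as `x` and log10(predicted Chl) as `y`; the share of match-ups under 50% relative error is then simply `np.mean(stats["re"] < 50)`.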

Performance of Generic Algorithms
Prior to developing a new algorithm, we tested three Chl algorithms readily available in SeaDAS (OC4v4, GSM01 and GIOP). Figure 3a shows that the SLEG dataset is very different from the global dataset used to develop the OC4v4 algorithm, which attests to the strong impact of CDOM absorption on the remote sensing signal (Figure 2). Several studies have shown that global empirical relationships do not perform well when external sources of organic and inorganic compounds are present [20,46,47]. As expected, the OC4v4 algorithm overestimates Chl in the SLEG, with an APD of 140% (Table 1). This value can be compared to validation exercises carried out in other coastal environments such as the Baltic Sea (≈159 to 201% overestimation) [16,48], the southeastern Beaufort Sea (188%) [49], the La Plata estuary (106-250%) [50] and the Black Sea (≈400%) [51]. Many site-specific empirical relationships between ratios of Rrs at different wavebands and in situ measurements of Chl have been derived in previous studies [16,48,49,52,53]. Local tuning methods have the great advantage of being very simple to implement, with the model parameters optimized for a given dataset. This type of approach removes or decreases the systematic bias in Rrs caused by components other than phytoplankton, because the fit follows the average trend of the data. We therefore attempted to apply the band-ratio approach to the SLEG match-up dataset. Figure 3a shows that the 93 match-ups contain both low and high Chl values for a given band ratio. Tests showed that fitting the data with higher-order polynomial functions (quadratic, cubic, fourth and fifth order) did not yield significantly better performance than a simple linear regression (p > 0.05). Similar conclusions were drawn by [54] for the Beaufort and Chukchi seas and by [48] for Baltic waters. The band-ratio algorithm that resulted in the best performance for the SLEG (RMSE, APD and R², Figure 3b and Table 1) is therefore the linear regression:

$$\log_{10}(\mathrm{Chl}) = a_0 + a_1 X \qquad (6)$$

where a0 = 0.047, a1 = −2.1 and X is the log10 of the maximum of Rrs(443)/Rrs(555), Rrs(490)/Rrs(555) or Rrs(510)/Rrs(555). This Ocean Chlorophyll 4 Linear (OC4L) algorithm provides better estimates of Chl than OC4v4 (Table 1), with an RMSE of 0.29 log10 mg chl m⁻³, an APD of 56% and an R² of 0.35. Yet the results suggest that even a local optimization for the SLEG does not lead to satisfactory results for assessing the status of ocean biota, since this approach cannot remove the contribution of components other than Chl to the total reflectance signal.

Two semi-analytical algorithms available in SeaDAS were also tested. The Garver-Siegel-Maritorena version 1 (GSM01) [56] algorithm was developed mostly using an oceanic dataset, whereas the Generalized Inherent Optical Property (GIOP) [57] algorithm was developed more recently to provide better performance in coastal waters. Unlike OC4-type algorithms, the GSM01 and GIOP algorithms use absolute Rrs values rather than band ratios. A semi-analytical reflectance model is coupled to an optimization algorithm (e.g., the Levenberg-Marquardt method) to find the combination of Chl concentration, yellow substance absorption and particulate backscattering that minimizes the quadratic difference between measured and modeled Rrs. In theory, this type of algorithm decouples the contributions of CDOM and phytoplankton, making it suitable for coastal waters. Nevertheless, it requires a priori knowledge of the spectral shapes of absorption and backscattering of the marine components, which can increase the errors in the retrieved quantities when these assumed spectral shapes are biased. Table 1 shows that both semi-analytical approaches (GSM01 and GIOP) perform relatively well in the SLEG, with APDs of 53% and 54%, respectively. In other coastal environments, results obtained with the GSM01 algorithm [49,50,52] indicated significant overestimation of Chl (96-121% in the La Plata estuary, 49% in the Chesapeake Bay and 101% in the Beaufort Sea). Neither GSM01 nor GIOP performed well in the South China Sea, even when using in situ Rrs [58]. Together, the results obtained with OC4v4, OC4L and the two semi-analytical algorithms indicate that there is still room for improvement in the precision of Chl estimation in the SLEG and justify the development of a SLEG-specific algorithm that relies on a different approach.
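The OC4L regression (log10(Chl) = a0 + a1·X, with a0 = 0.047 and a1 = −2.1 fitted to the SLEG match-ups) can be applied as below. This sketch assumes the regression predicts log10(Chl), consistent with the log10 units of the reported RMSE. Because the three ratios share Rrs(555) as denominator, the maximum ratio is simply the largest of the three numerators divided by Rrs(555). The function name is hypothetical.

```python
import numpy as np

A0, A1 = 0.047, -2.1  # OC4L coefficients fitted to the 93 SLEG match-ups


def oc4l_chl(rrs443, rrs490, rrs510, rrs555):
    """Chl (mg m^-3) from the maximum blue/green band ratio (OC4L form)."""
    blue = np.maximum.reduce([np.asarray(rrs443, dtype=float),
                              np.asarray(rrs490, dtype=float),
                              np.asarray(rrs510, dtype=float)])
    x = np.log10(blue / np.asarray(rrs555, dtype=float))  # X: log10 of max band ratio
    return 10.0 ** (A0 + A1 * x)                          # back-transform from log10
```

The negative slope encodes the usual behaviour of blue/green ratios: as phytoplankton absorption increases, blue reflectance drops relative to green and the ratio falls. In the SLEG, CDOM also lowers blue reflectance, which is why a single ratio cannot separate the two.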

Development of a Chlorophyll Algorithm Using EOF
The approach of [29] (hereafter referred to as EOF) demonstrated two important results when applying non-parametric statistical methods to the inversion of the ocean colour signal. First, it is possible to build a stable model to derive Chl concentration from a small training set of remote sensing reflectances (fewer than 30 spectra); second, multispectral resolution (SeaWiFS-like channels) shows similar capability to hyperspectral resolution when inferring Chl concentration from Rrs.
The 93 Rrs match-ups for the six visible SeaWiFS wavelengths (412, 443, 490, 510, 555 and 670 nm) were aggregated into a single matrix (93 × 6), which was log-transformed to reduce the skewness of its distribution. No spectral normalization was performed, as this did not improve the results. A principal component analysis was performed on this matrix with the principal function from the psych package in R [59]. The log-transformed and correlated visible channels were thereby transformed into uncorrelated principal components (also called modes of oscillation) that represent the global covariance structure of the Rrs. The primary mode accounts for the maximum variability of Rrs, and each successive (subordinate) mode accounts for as much of the remaining variability as possible.
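A minimal NumPy sketch of this decomposition follows. It is not the psych::principal call used in the study: it performs an unrotated PCA on the correlation matrix of log10(Rrs) (i.e., on standardized bands; psych::principal also works from the correlation matrix by default), and whether a rotation was applied in the original analysis is not stated here. The training mean, standard deviation and loadings are returned so that new spectra can be projected onto the same modes. Function names are hypothetical.

```python
import numpy as np


def fit_pca(rrs):
    """Unrotated PCA of log10(Rrs) for an (n_matchups, n_bands) array of positive Rrs."""
    logr = np.log10(np.asarray(rrs, dtype=float))
    mean = logr.mean(axis=0)
    std = logr.std(axis=0, ddof=1)
    z = (logr - mean) / std                        # standardized bands
    corr = np.corrcoef(z, rowvar=False)            # band-by-band correlation matrix
    eigval, eigvec = np.linalg.eigh(corr)          # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1]               # reorder: Mode 1 explains most variance
    eigval, eigvec = eigval[order], eigvec[:, order]
    return {
        "mean": mean,
        "std": std,
        "loadings": eigvec,                        # one column per mode
        "explained": eigval / eigval.sum(),        # fraction of variance per mode
        "scores": z @ eigvec,                      # score vectors S_i (unscaled)
    }


def project(model, rrs):
    """Scores of new spectra, using the training mean, std and loadings."""
    z = (np.log10(np.asarray(rrs, dtype=float)) - model["mean"]) / model["std"]
    return z @ model["loadings"]
```

The scores are left unscaled; rescaling a score column only rescales the corresponding regression coefficient, so predictions from the subsequent multilinear regression are unaffected.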
A full linear model was first constructed using all six available modes. The model was then refined by performing an Akaike information criterion (AIC)-based stepwise regression (MASS package in R [60]), which attempts to find the best variable selection by adding and removing modes while keeping the model as simple as possible. The minimum AIC value reflects a balance between the fit of the model to the data and a penalty based on the number of explanatory variables [61]. Chl estimates were thereby derived from the multilinear regression with the smallest AIC value, using the scores of the selected modes:

$$\log_{10}(\mathrm{Chl}) = B_0 + \sum_{i=1}^{N} B_i S_i \qquad (7)$$

where Bi are the regression coefficients (B0 being the intercept), N is the number of retained modes (N ≤ 6, corresponding to the number of visible SeaWiFS channels) and Si are the score vectors. For our complete dataset (Section 2.2), the final model is the same as the full one, meaning that all modes parsimoniously explain variance in Chl within the training dataset [62]. In addition, all predictors of the regression are significant (p < 0.05).
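The AIC-based selection can be sketched without R as below. It mimics a bidirectional search started from the full model, in the spirit of MASS::stepAIC with direction = "both", scoring each least-squares fit with AIC = n ln(RSS/n) + 2k, the form R's extractAIC() uses for linear models. It assumes the response is log10(Chl) and the predictors are PCA score vectors; the function names are hypothetical and this is an illustration, not a reimplementation of MASS.

```python
import numpy as np


def ols_aic(y, x_cols):
    """Least-squares fit y ~ 1 + x_cols; return (AIC, coefficients [B0, B1, ...])."""
    n = len(y)
    X = np.column_stack([np.ones(n)] + list(x_cols))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ coef) ** 2)
    return n * np.log(rss / n) + 2 * X.shape[1], coef


def stepwise_aic(y, scores):
    """Bidirectional stepwise selection of modes (columns of scores) by AIC."""
    n_modes = scores.shape[1]
    selected = list(range(n_modes))                      # start from the full model
    best_aic, _ = ols_aic(y, [scores[:, k] for k in selected])
    while True:
        candidates = []
        for j in range(n_modes):
            # Drop mode j if it is in the model, otherwise add it.
            trial = [k for k in selected if k != j] if j in selected else selected + [j]
            aic, _ = ols_aic(y, [scores[:, k] for k in trial])
            candidates.append((aic, sorted(trial)))
        aic, trial = min(candidates, key=lambda c: c[0])
        if aic >= best_aic:                              # no single move lowers AIC
            break
        best_aic, selected = aic, trial
    _, coef = ols_aic(y, [scores[:, k] for k in selected])
    return selected, coef, best_aic                      # coef[1:] follows `selected`
```

When every mode is worth keeping, as reported above for the complete dataset, the loop stops at the first pass and the full model is returned unchanged.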
To validate the approach of [29], in which the subordinate modes are sometimes removed, we carried out a cross-validation exercise in which the full set of scores was divided into a training sample used to define a multilinear regression and a test sample used to evaluate the method. The training sample ranged from 20% to 80% of the data. This exercise was repeated 1000 times for each training sample size, in 1% increments, each time retaining the most important modes identified by the stepwise AIC-based procedure (generally between two and six). For every iteration, the number of modes was minimized against the prediction power, using a combination of forward selection and backward elimination of the potential predictors [60]. The Bi coefficients of the selected models were used with all Rrs-derived score vectors to predict Chl. Figure 4a shows the performance of the stepwise regression (selection of the lowest-AIC model) as a function of the training data size (%). Stable performance was reached once the training sample included 50% of the data (i.e., ∼47 match-ups). This cross-validation exercise provides insight into the stability of the model with respect to the number of match-ups needed to represent the temporal and spatial Chl sampling constraints shown in Figures 1 and 2.
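The resampling experiment can be sketched as follows. To keep the example short, it fits all six modes at every iteration instead of re-running the stepwise AIC selection, and it scores only the held-out test sample, whereas the text above applies the selected coefficients to all score vectors; both are simplifications. The function name and the use of the median test RMSE (log10 units) as the summary statistic are assumptions.

```python
import numpy as np


def cv_curve(y, scores, fractions=np.arange(0.20, 0.81, 0.01), n_iter=1000, seed=0):
    """Median held-out RMSE (log10 units) as a function of the training fraction."""
    rng = np.random.default_rng(seed)
    n = len(y)
    X = np.column_stack([np.ones(n), scores])      # intercept + all score vectors
    curve = []
    for f in fractions:
        n_train = int(round(f * n))
        rmses = np.empty(n_iter)
        for it in range(n_iter):
            idx = rng.permutation(n)               # random split for this iteration
            train, test = idx[:n_train], idx[n_train:]
            coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
            resid = y[test] - X[test] @ coef
            rmses[it] = np.sqrt(np.mean(resid ** 2))
        curve.append(np.median(rmses))
    return np.asarray(fractions), np.asarray(curve)
```

Plotting `curve` against `fractions` gives a stability diagram comparable in spirit to Figure 4a: the curve should flatten once the training sample is large enough for the coefficients to stop changing appreciably.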
The performance of the EOF algorithm is shown in Figure 4b and Table 1. It is a clear improvement over OC4L and the two semi-analytical algorithms, with an RMSE of 0.22 log10 mg chl m⁻³, an APD of 41% and an R² of 0.65. The slope and intercept values are also better than those of OC4L.

Discussion
In the optically complex SLEG region, the EOF model has considerable advantages over band-ratio algorithms. Using all six SeaWiFS channels retains valuable information. In our dataset, the further apart two channels are, the less correlated they are (correlation matrix not shown), so incorporating the outermost channels should add independent variance. The blue channel (412 nm) probably records variations that are not directly related to Chl concentration but rather to CDOM absorption, which partly overlaps with the dominant blue phytoplankton absorption peak at 443 nm. At the other end of the spectrum, although the red channel (670 nm) is dominated by water absorption, a smaller but informative contribution may also arise from the chlorophyll-a absorption band centered at 675 nm.
Figure 4c shows the six modes of oscillation derived from the Rrs matrix. Because Rrs is a function of both absorption and backscattering, its variations are governed by spectral variations of the inherent optical properties [63]. Backscattering should not vary strongly in the SLEG owing to the very low SPM concentrations [44], leaving absorption as the primary source of spectral variability. The work in [47] showed the contributions of different optical components to the absorption spectrum based on 371 in situ surface absorption measurements from all the major ocean basins. According to these results, our principal modes of oscillation can be attributed to the fractional contributions to total absorption of CDOM (Mode 1, 66% of variance explained), water (Mode 2, 29% of variance explained) and phytoplankton (Mode 3, 4.1% of variance explained), while Mode 4 (0.5% of variance explained) mostly represents SeaWiFS variations in the red channel and, to a lesser extent, the green channel. The spectral behaviour of the red phytoplankton absorption peak is known to vary among species [64,65]. The minor variations in the red could therefore be caused by different species and slightly improve predictive ability when this variance is integrated into the model. Considering the weak visible absorption of detrital particles in the SLEG (a_nap(440) = 0.031 m⁻¹, N = 148 [13]), it is unclear whether the subordinate modes (Modes 5 and 6, which together account for <0.3% of the variance) should be associated with a definite variation in optical properties.
The EOF model is a superior predictor of Chl concentration in the SLEG for SeaWiFS data, as it separates the misleading spectral effects of the other major optically active component (CDOM). Overall, the EOF statistics show better performance than all the other tested algorithms. The log-based bias and RMSE of the EOF method highlight its ability to operate well across the dynamic range of our Chl match-ups (∼2 orders of magnitude). Considering the small magnitude of the values involved in the calculation of the APD and the precision of the in situ fluorometric measurements, these results are encouraging. As shown in Figure 5, the model performance was further evaluated by examining the relative error (1) as a function of Chl concentration, to assess whether the algorithm was consistent over the entire range of Chl concentration, and (2) as a function of the geographical location of the match-ups. The Chl concentrations used to develop the EOF model span a limited range, with few values above 3 mg m⁻³ (Figure 5a). The EOF model may thus not adequately capture the magnitude of the Rrs spectral variations in more eutrophic waters. This limitation is inherent to any empirical method, which will only perform well within the range of data used to develop it. Figure 5a also shows that the largest relative errors (>100%) are mostly limited to Chl concentrations lower than 0.5 mg m⁻³. Table 1 shows that most (71%) of the EOF predictions remain under 50% relative error. Figure 5b shows no obvious large-scale spatial pattern in the geographic distribution of relative error. In the St. Lawrence Estuary, where the density of match-ups is high, only two of the 21 retrievals yield a relative error above 80%. The EOF approach therefore does not appear to have a major geographical bias over SLEG waters.
Previous attempts at estimating Chl in the SLEG from satellite remote sensing showed the difficulty of achieving reasonable accuracy. The work in [18] tested several models using fluorometric Chl values and a wide variety of band ratios and band differences. Their results showed that an algorithm based on two band ratios (Rrs(443)/Rrs(510) and Rrs(443)/Rrs(555)) yielded an estimation error of 70% using in situ radiometric measurements and 95% using SeaWiFS-derived Rrs. Other studies in optically complex coastal waters showed that it is often necessary to avoid band-ratio algorithms to better reflect the specificity of local/regional biophysical characteristics [66][67][68][69][70]. As part of the CoastColour validation program, a neural network approach was applied to MERIS data to estimate Chl. When evaluated against a network of moored buoys (Larouche, unpublished), it proved inadequate for the SLEG, with errors of about 300%. The EOF method is thus the most appropriate Chl algorithm developed for the SLEG to date and is a major improvement over operational and past regional algorithms.
Figure 6 shows the application of the EOF method to a Level-2 (L2) image acquired in the fall of 2003. Values below 0.1 and above 10 mg chl m⁻³ (representing <0.07% of the Chl estimates) were removed to better depict Chl patterns. The image displays strong spatial variability of Chl concentration in the SLEG, including mesoscale features in the estuary. One of the reasons our dataset was limited to 93 match-ups is atmospheric correction (AC) failure. Addressing the AC challenge in the SLEG should provide a major improvement in future algorithm development for these optically complex waters. Local measurements for developing such a region-specific AC scheme were recently derived by [71].
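Applying a trained EOF model to an image requires projecting each pixel onto the training modes with the training mean and standard deviation, not with statistics recomputed from the image. The sketch below assumes the model pieces come from a correlation-based PCA of log10(Rrs) and a regression of log10(Chl) on the retained scores, and it reproduces the 0.1-10 mg chl m⁻³ display range used for Figure 6. All names are hypothetical.

```python
import numpy as np


def chl_image(rrs_cube, mean, std, loadings, selected, coef, lo=0.1, hi=10.0):
    """Apply a trained EOF model pixel by pixel to an (rows, cols, bands) Rrs cube.

    mean, std, loadings: training PCA statistics (one entry / column per band / mode)
    selected:            indices of the retained modes
    coef:                regression coefficients [B0, B_selected...] for log10(Chl)
    """
    rows, cols, n_bands = rrs_cube.shape
    flat = rrs_cube.reshape(-1, n_bands)
    chl = np.full(flat.shape[0], np.nan)
    ok = np.all(np.isfinite(flat) & (flat > 0), axis=1)  # log10 needs positive Rrs
    z = (np.log10(flat[ok]) - mean) / std                 # training statistics only
    s = z @ loadings[:, selected]                         # scores of retained modes
    vals = 10.0 ** (coef[0] + s @ coef[1:])
    vals[(vals < lo) | (vals > hi)] = np.nan              # display range of Figure 6
    chl[ok] = vals
    return chl.reshape(rows, cols)
```

Pixels with any non-positive or missing band are left as NaN, mirroring the rejection of negative spectra used when building the match-up dataset.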

Conclusions
A method based on empirical orthogonal functions was used to improve the accuracy with which chlorophyll-a (Chl) concentrations can be retrieved in the optically complex St. Lawrence Estuary and Gulf. The main strength of the method lies in its ability to decouple the different contributors (i.e., water, yellow substances, phytoplankton) to the total reflectance signal. The Chl retrieval error (APD) was greatly reduced, from 140% with the OC4v4 band-ratio approach to 41%. In addition, the EOF method did not show any spatial bias, making it applicable to the entire region of interest. The third report of the International Ocean-Colour Coordinating Group [72] stresses the need for new algorithms and a fresh approach to derive information from optically complex waters. A major impediment to such an endeavour is the lack of in situ data publicly available for research. With the increasing popularity of the SeaBASS dataset, the EOF method could easily be applied to global data. Water-type tuning could also help improve algorithm performance. Our work using the SeaWiFS ocean colour satellite was a proof of concept. The same approach could be applied to other satellite sensors (MODIS, MERIS, VIIRS, OLCI), provided that a sufficiently large match-up dataset is assembled. This would extend the time series beyond 2010 to the present day. Implementing the method with data from several ocean colour satellites would also help overcome some of the temporal limitations by taking advantage of different equatorial crossing times, filling inter-orbit gaps and limiting sun glint effects.

Figure 1. Frequency distribution of the 2927 original Chl measurements, 93 of which were considered valid match-ups. (a) Chl concentration frequency distributions (the inset presents the same data binned by order of magnitude). Temporal distributions of Chl concentration for (b) the original and (c) the match-up datasets.

Figure 2. Representation of the different water types. (a) Classification of the SLEG climatological Rrs using the methodology of [37]. The black line represents the perfect Case-1 relationship between Rrs(412)/Rrs(443) and Rrs(555)/Rrs(490). (b) Map of the normalized difference between the climatological SeaWiFS Rrs spectra and the perfect Case-1 line (see the text), with black crosses representing the locations of the original Chl samples and grey dots representing the retained Chl samples. The same colour scale applies to both panels.

Figure 4. Features of the EOF method: (a) stability of the EOF model; (b) scatter plot of in situ Chl versus satellite-derived Chl using the EOF model, with the dashed line as the 1:1 line; (c) all modes of oscillation (solid coloured lines are linear interpolations over wavelength of the discrete spectral data and serve only as a visual guide to the spectral signature).

Figure 5. Relative error as a function of (a) Chl concentration and (b) geographic location of the match-ups.