
Remote Sens. 2016, 8(1), 45;

A Statistical Algorithm for Estimating Chlorophyll Concentration in the New Caledonian Lagoon
Sciences and Technologies Department, University of New Caledonia, Nouville Campus BP R4, Nouméa CEDEX 98851, New Caledonia
Aix-Marseille University, CNRS/INSU, University of Toulon, IRD, Mediterranean Institute of Oceanography (MIO), UM 110, Marseille 13288, France
Institut de Recherche Pour le Développement (IRD), BP A5 98848 Nouméa CEDEX 98848, New Caledonia
University of Montpellier II, University of Reunion, University of French West Indies, University of French Guiana, IRD, ESPACE-DEV UMR 228, Montpellier 34093, France
LEGOS, University of Toulouse CNES, CNRS, IRD, UPS, Toulouse 31401, France
Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA 92037, USA
Author to whom correspondence should be addressed.
Academic Editors: Stuart Phinn, Chris Roelfsema, Raphael M. Kudela, Xiaofeng Li and Prasad S. Thenkabail
Received: 16 September 2015 / Accepted: 10 December 2015 / Published: 7 January 2016


Spatial and temporal dynamics of phytoplankton biomass and water turbidity can provide crucial information about the function, health and vulnerability of lagoon ecosystems (coral reefs, sea grasses, etc.). A statistical algorithm is proposed to estimate chlorophyll-a concentration ([chl-a]) in optically complex waters of the New Caledonian lagoon from MODIS-derived remote-sensing reflectance (Rrs). The algorithm is developed via supervised learning on match-ups gathered from 2002 to 2010. The best performance is obtained by combining two models, selected according to the ratio of Rrs in spectral bands centered on 488 and 555 nm: a log-linear model for low [chl-a] (AFLC) and, for high [chl-a], either a support vector machine (SVM) model or the classic OC3 model. The log-linear model is derived from an SVM regression analysis. This approach outperforms the classical OC3 approach, especially in shallow waters, with a root mean squared error 30% lower. The proposed algorithm enables more accurate assessments of [chl-a] and its variability in this typical oligo- to meso-trophic tropical lagoon, from shallow coastal waters and nearby reefs to deeper waters and the open ocean.
chlorophyll-a concentration; MODerate resolution Imaging Spectroradiometer (MODIS); ocean color; remote sensing; statistical algorithm; oligotrophic waters; New Caledonia; coral lagoon

1. Introduction

New Caledonia is a South Pacific archipelago located between longitudes 162°E and 169°E and latitudes 23°S and 19°S. The New Caledonian lagoon, which extends over 24,000 km², contains one of the most extensive reef systems in the world. These systems exhibit an exceptional diversity of coral and fish species and a continuum of habitats from mangroves to sea grasses [1]. UNESCO added the New Caledonia Barrier Reef to the World Heritage List on 7 July 2008 [2], emphasizing the importance of preserving such biodiversity sites.
However, this fragile environment is subject to both anthropogenic and environmental stresses. Nickel mining is the major sector of the economy in New Caledonia, whose islands contain about 10% of the world's nickel reserves [3]. The massive erosion caused by mineral extraction delivers mineral matter to the coastal ecosystem, particularly during rain events [4]. These coastal inputs sometimes lead to fish deaths, coral bleaching, and other damage [5]. Thomas et al. quantified farm discharges and evaluated their impacts in the lagoon of New Caledonia [6], where chlorophyll-a concentration ([chl-a]) is generally lower than 1.2 µg·L⁻¹, except in bays subject to anthropogenic influences, where [chl-a] may reach 3.6 µg·L⁻¹ [7]. The proliferation of Acanthaster planci (a coral-eating starfish) is probably driven by algal proliferation, itself a consequence of increased anthropogenic inputs, and a recent study highlighted a link between Acanthaster outbreaks and ocean productivity, favored by wind-driven increases in upwelling [8].
Such increases in chlorophyll (up to 2 µg·L⁻¹ observed in the South-West lagoon) are linked either to rain [9,10] or to other processes such as upwelling or tides [11,12], which were recently modelled [13,14]. Climate change is also a source of stress for reefs and lagoon ecosystems. Rising ocean temperature and acidity, overexposure to sunlight, and decreasing salinity affect the rate at which lagoons lose or gain water through evaporation, precipitation, surface runoff, and exchange with the ocean, and therefore water quantity and quality. Disturbances and other stressors may act concomitantly, or even interact, at multiple spatial and temporal scales, with consequences, already documented or expected, for the physical structure, ecological properties, and social values associated with lagoons. Many coral reefs in the world already suffer from climate and anthropogenic changes. Since 1985, the Great Barrier Reef in Australia has lost more than half of its coral cover [5]. Coral bleaching events occurred in 1998 and 2002: more than 60% of the coral populations were affected, and even though the situation improved after several weeks, about 10% of the population perished [5,15,16]. To prevent or monitor such events, water properties must be assessed accurately in terms of chemical, biogeochemical and thermal characteristics, among which chlorophyll concentration is an indicator of phytoplankton biomass.
Empirical algorithms have been developed to predict [chl-a] from satellite ocean color [17,18]. The NASA OC3 algorithm relates [chl-a] to the logarithm of Rrs band ratios. For MODIS imagery, log10([chl-a]) is a polynomial function of the logarithm of the maximum of two band ratios, Rrs(443)/Rrs(555) and Rrs(488)/Rrs(555). This algorithm is valid in oceanic waters, where a change in [chl-a] mainly causes a shift in the blue-to-green reflectance ratio [19]. By using a color index defined as the difference between Rrs in the green and a reference line formed linearly between Rrs in the blue and in the red, the authors of [20] improved OC3 assessments in the global ocean where [chl-a] is less than or equal to 0.25 µg·L⁻¹. In coastal waters, the ratios used in these algorithms can vary under the influence of optical components other than [chl-a] (colored dissolved organic matter, CDOM, and suspended sediments), which may introduce large errors in satellite [chl-a] retrievals [21,22]. Recently, a round-robin exercise for MERIS was conducted for some coastal areas of the world, including the Great Barrier Reef, and gave clues to the discrepancies observed in optically complex waters [23]. In coastal waters with high [chl-a] (10 µg·L⁻¹), algorithms based on regressions between MODIS reflectance ratios and [chl-a] improved the [chl-a] estimates [24]. Supervised learning was also used to develop an algorithm adapted to a coastal eutrophic region ([chl-a] from 5 up to 50 µg·L⁻¹) [25,26]. Camps-Valls et al. [27] showed how to improve the use of Support Vector Regression (SVR) or Relevance Vector Machine (RVM) [28] to estimate oceanic [chl-a] from remote sensing.
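To make the OC3 band-ratio form concrete, here is a minimal Python sketch of an OC3-style estimator: a 4th-degree polynomial in the base-10 logarithm of the maximum blue-to-green ratio, with 555 nm as the green band as in this text. The coefficient tuple is an assumption: it holds the values we recall for NASA's OC3M (version 6) and should be checked against the NASA OBPG documentation before any real use.

```python
import math

# ASSUMPTION: OC3M polynomial coefficients a0..a4 as we recall them from NASA
# OBPG documentation (version 6); verify before operational use.
OC3M_COEFFS = (0.2424, -2.7423, 1.8017, 0.0015, -1.2280)


def oc3(rrs443: float, rrs488: float, rrs555: float,
        coeffs: tuple = OC3M_COEFFS) -> float:
    """[chl-a] (ug/L) from the maximum blue-to-green Rrs band ratio (OC3-style)."""
    # Maximum band ratio: Rrs(443)/Rrs(555) or Rrs(488)/Rrs(555), whichever is larger
    max_ratio = max(rrs443 / rrs555, rrs488 / rrs555)
    r = math.log10(max_ratio)
    # The 4th-degree polynomial in log10(max ratio) gives log10([chl-a])
    log_chl = sum(a * r ** k for k, a in enumerate(coeffs))
    return 10.0 ** log_chl
```

Because the coefficients are a parameter, the same function can be refitted to regional match-ups, which illustrates how a regional band-ratio algorithm differs from the global one only through its fitted coefficients.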
Past inter-comparison studies [29,30] reported negative or positive biases in [chl-a] retrievals when OC3 was applied to imagery of the New Caledonian lagoon, which led us to investigate a new algorithm. The average depth of the lagoon is 25 m, and its waters are mainly oligotrophic, with low [chl-a] and turbidity [29,31]. The changing bathymetry, highly variable bottom types, and the oligotrophic and shallow nature of these waters are sources of errors when OC3 is applied to MODIS data [29]. In such shallow waters, Rrs depends not only on the absorption and scattering properties of dissolved and suspended materials in the water column but also on the depth and reflectivity of the bottom [32]. Recently, an inversion method was developed for the lagoon waters of New Caledonia [33], and a study has shown that depth strongly influences [chl-a] assessments [34]. Moreover, the influence of attenuation on seabed reflectance and on accurate bathymetry retrievals has been characterized for the New Caledonia lagoon [35]. Thus, NASA products based on OC3 are not well suited to New Caledonian coastal waters.
Another [chl-a] algorithm, OC5 [36], was tested on New Caledonian lagoon waters. However, OC5 was designed specifically for the very turbid waters of the shallow Brittany coast, or for Tunisian waters [37], with high [chl-a], so it is not well suited to the oligotrophic waters of New Caledonia. Moreover, OC5 sets a minimum [chl-a] value of 0.1 µg·L⁻¹, whereas lower values are sometimes encountered in the New Caledonian lagoon [38] (see also below).
A preliminary study [39], inspired by encouraging results described in [25,26,27,28], showed that a statistical approach could improve [chl-a] retrievals in coastal New Caledonian waters. A statistical approach is expected to take into account the particular optical properties of the study region. Moreover, even though atmospheric correction algorithms are generally inaccurate for coastal applications and can lead to large [chl-a] errors, a statistical approach trained on regional match-ups could overcome such problems for our specific area. In this paper, we show how to design a semi-empirical algorithm for estimating [chl-a] in the New Caledonian lagoon from in situ data collected in the region coincident with MODIS data from 2002 to 2010 [38]. The resulting algorithm is compared with NASA OC3 version 6. No comparison was made with the reflectance difference algorithm [20], because that algorithm targets the global ocean whereas our focus is on lagoon waters. We also test whether adding variables such as bathymetry [32,34] and distance to the coast [7] improves [chl-a] retrievals.

2. Material and Methods

2.1. Data

Two databases are used in this study: world data from the SeaWiFS Bio-optical Archive and Storage System (SeaBASS [40]) and data collected in the New Caledonia area (NCDataBase). Each database contains in situ and MODIS Rrs values in spectral bands centered on 412, 443, 488, 531, 555, and 667 nm for the NCDataBase [29,31,38], with 547 nm instead of 555 nm for SeaBASS [40,41]. All MODIS Rrs over New Caledonia in the NCDataBase were extracted for the period 2002–2010 [42]. When the two databases are merged into what we call the Full DataBase (FDB), the 547 and 555 nm bands are assumed to give equivalent signals, i.e., they are treated as the same band [20]. The NCDataBase contains bathymetry (in meters) and in situ [chl-a] (µg·L⁻¹) measured by fluorometry and spectrofluorometry [29,43]. Water samples were collected with a Niskin bottle at 2 m depth. SeaBASS contains in situ [chl-a] obtained by fluorometry and HPLC, but for consistency only fluorometric measurements were used; bathymetry was extracted at the latitude and longitude of each measurement.
Figure 1a,b display the distributions of satellite and in situ [chl-a] in SeaBASS and the NCDataBase, respectively. The SeaBASS distribution is bimodal, with a separation at about 3 µg·L⁻¹. The two methods for satellite assessments (closest pixel and weighted mean) are introduced in Section 2.2. To construct the NCDataBase [43], all field measurements of [chl-a] by fluorometry collected from 1997 to 2010 during more than ten campaigns, mainly in the Southern lagoon [26], were selected when coincident MODIS Rrs were available. The full area extends from 165.95°E to 168.65°E and from 24°S to 19.99°S [38]. Figures 2 and 3 display the measurement stations, and Table 1 lists the dates and campaigns used for the NCDataBase. Data were collected in every season over 13 years, which ensures a large range of situations. As several years and all seasons are sampled, we expect no bias due to El Niño or La Niña events or to seasonal variations.
Figure 1. (a) SeaBASS [chl-a] histogram (from the SeaBASS website): green line for in situ measurements and blue line for MODIS Aqua assessments; (b) NCDataBase [chl-a] histogram: green line for in situ measurements, cyan line for “OC3-Closest pixel” assessment and blue line for “OC3-Weight Mean” assessment.
Table 1. Sea campaigns from 1997 to 2010 in New Caledonia in [43].
| Sea campaign | Dates | Study area |
|---|---|---|
| Camelia and Camecal (1–9) | 21 Oct. 1997 to 27 Jun. 2003 | South-West lagoon |
| Diapalis (1–9) | 13 Oct. 2001 to 15 Oct. 2003 | Loyalty Channel/Ouinné lagoon |
| Topaze 1–29 | 26 Apr. 2001 to 26 Jan. 2004 | South-West lagoon |
| Transects 1 | 1 Apr. 2003 to 10 Apr. 2003 | South-West lagoon |
| Timeseries | 12 Dec. 2001 to 22 Apr. 2003 | South-West lagoon |
| Transects 2 | 4 May 2002 to 29 Feb. 2004 | South-West lagoon |
| Southern and Northern 1 transect | 21 Jun. 2003 to 7 Aug. 2003 | South-West lagoon |
| Southern and Northern 2 transect | 9 Nov. 2004 to 9 Dec. 2004 | South-West lagoon |
| Bissecote | 1 Feb. 2006 to 14 Feb. 2006 | South-West lagoon |
| Echolag | 14 Feb. 2007 to 5 Mar. 2007 | Great South Lagoon |
| Zonalis | 2 Mar. 2008 to 14 Mar. 2008 | South of New Caledonia |
| Valhybio | 22 Mar. 2008 to 8 Apr. 2008 | LSO and GLS, offshore stations |
| ValhybioSM | 27 Apr. 2008 to 21 Jul. 2010 | Lagoon and offshore OC1 station |
Table 2 describes the content of the two databases in terms of [chl-a] values. In the NCDataBase (811 coincidences), we distinguish oceanic waters (bathymetry > 70 m, 159 coincidences), for which the bottom does not affect water color; deep lagoon waters (20 m ≤ bathymetry ≤ 70 m, 352 coincidences), for which the bottom a priori has little influence on water color; and shallow lagoon waters (bathymetry ≤ 20 m, 300 coincidences), for which the bottom may strongly affect water color [29]. We classify the SeaBASS data by bathymetry in the same way, even though the influence of bathymetry is probably not the same in the world data as in the NCDataBase (see Section 4.1). However, in what follows, and especially when constructing our algorithm, no distinction is made based on station depth, because bathymetry was not found to be a good explanatory variable (see Section 4.3). Furthermore, we distinguish data by [chl-a] value, since high values (> 3 µg·L⁻¹) and low values (≤ 3 µg·L⁻¹) are treated separately (see Section 2.3).
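The depth classes and [chl-a] groups above amount to two simple rules, sketched below in Python. The function names are hypothetical, and counting a station at exactly 20 m as shallow is an assumption, since the text (≤ 20 m) and Table 2 (< 20 m) differ at that boundary.

```python
def depth_class(bathymetry_m: float) -> str:
    """Depth class used to stratify the NCDataBase match-ups."""
    # ASSUMPTION: a station at exactly 20 m counts as shallow ("<= 20 m" wording)
    if bathymetry_m <= 20:
        return "shallow lagoon"
    if bathymetry_m <= 70:
        return "deep lagoon"
    return "oceanic"  # bottom does not affect water color


def chl_group(chl_ug_per_l: float, threshold: float = 3.0) -> str:
    """Low (<= 3 ug/L) or high (> 3 ug/L) [chl-a] group."""
    return "low" if chl_ug_per_l <= threshold else "high"
```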
Figure 2. Visited stations in New Caledonia.
Figure 3. Visited stations in the south-west lagoon.
Table 2. Database description.
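| Database | N | Min [chl-a] (µg·L⁻¹) | Max [chl-a] (µg·L⁻¹) | [chl-a] ≤ 3 µg·L⁻¹ (%) | [chl-a] > 3 µg·L⁻¹ (%) |
|---|---|---|---|---|---|
| FDB (NCDataBase + SeaBASS) | 1378 | 0.03 | 38.07 | 81.28 | 18.72 |
| NCDataBase (<20 m) | 300 | 0.08 | 2.71 | 100 | 0 |
| NCDataBase (20 m ≤ bathy ≤ 70 m) | 352 | 0.11 | 3.70 | 99.53 | 0.47 |
| NCDataBase (>70 m) | 159 | 0.08 | 1.05 | 100 | 0 |
| NCDataBase (total) | 811 | 0.08 | 3.70 | 99.75 | 0.25 |
| SeaBASS (<20 m) | 262 | 0.37 | 38.07 | 13.36 | 86.54 |
| SeaBASS (20 m ≤ bathy ≤ 70 m) | 20 | 0.22 | 6.26 | 75.00 | 25.00 |
| SeaBASS (>70 m) | 285 | 0.03 | 13.18 | 91.58 | 8.42 |
| SeaBASS (total) | 567 | 0.03 | 38.07 | 54.85 | 45.15 |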

2.2. Match-Up

The in situ [chl-a] and Rrs data in the NCDataBase were matched with MODIS Aqua standard retrievals at the original resolution (1 km, non-gridded data) [41], as provided by the NASA Ocean Biology Processing Group (OBPG). The atmospheric correction scheme accounted for non-black pixels in the near infrared, but not for adjacency effects. SeaDAS flags were applied to the satellite data to eliminate situations with sun glint, large viewing zenith angle, high water turbidity, clouds, land, high top-of-atmosphere radiance, and stray light [42]. Two methods were used to assign a satellite value to a station on a given day. The first, the closest neighbor method (CL), assigns the value of the closest pixel [38,41]. The second, the weighted mean method (WMM), averages the values of neighboring pixels using weights that depend on their distance to the station [38,41]. Both methods were applied to the spectral bands centered on 412, 443, 488, 531, 555 and 667 nm.
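The two match-up rules can be sketched in Python for one station, one day and one band, given the valid (unflagged) pixels as (latitude, longitude, Rrs) tuples. The text does not state the distance-based weights used by the WMM, so inverse-distance weighting is an assumption here, as are the equirectangular distance approximation and the function names; the 0.04° box is the one used for the match-ups.

```python
import math
from typing import List, Optional, Tuple

# One valid (unflagged) MODIS pixel: (latitude, longitude, Rrs value)
Pixel = Tuple[float, float, float]


def distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Equirectangular distance approximation, adequate over a few kilometres."""
    km_per_deg = 111.32  # approximate length of one degree of latitude
    dx = (lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2)) * km_per_deg
    dy = (lat2 - lat1) * km_per_deg
    return math.hypot(dx, dy)


def match_up(lat: float, lon: float, pixels: List[Pixel],
             half_box_deg: float = 0.02) -> Tuple[Optional[float], Optional[float]]:
    """Return (CL value, WMM value) for one station, or (None, None) if no pixel."""
    # Keep pixels inside the 0.04 deg square centred on the station
    box = [p for p in pixels
           if abs(p[0] - lat) <= half_box_deg and abs(p[1] - lon) <= half_box_deg]
    if not box:
        return None, None
    dists = [distance_km(lat, lon, p_lat, p_lon) for p_lat, p_lon, _ in box]
    # CL: value of the pixel whose centre is nearest to the station
    cl_value = box[dists.index(min(dists))][2]
    # WMM: ASSUMPTION, inverse-distance weights; the small offset avoids
    # division by zero when a pixel centre coincides with the station
    weights = [1.0 / (d + 1e-6) for d in dists]
    wmm_value = sum(w * p[2] for w, p in zip(weights, box)) / sum(weights)
    return cl_value, wmm_value
```

Inverse-distance weighting is only one plausible choice; a Gaussian kernel in distance would slot into the same place by changing the `weights` line.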
The match-ups from MODIS Aqua images were created using a 0.04° square (about 4 × 4 km²) centered on the visited station, as in [41], and a temporal window of up to 5 days. The two methods were compared for temporal windows ranging from 0 to 5 days, using the following indices: the variation coefficient (VC), the normalized mean bias (NMB), the mean normalized bias (MNB), and the root mean square error (RMSE):
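$$\mathrm{VC} = \frac{\sigma}{\bar{x}} \tag{1}$$
$$\mathrm{NMB} = \frac{\bar{y} - \bar{x}}{\bar{x}} \tag{2}$$
$$\mathrm{MNB} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i - x_i}{x_i} \tag{3}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - x_i\right)^2} \tag{4}$$
where n is the number of observations, x_i is the i-th observation of the in situ parameter, y_i is the i-th observation of the remote sensing parameter, x̄ is the mean of the in situ parameter, ȳ is the mean of the remote sensing measurements, and σ is the standard deviation of the remote sensing measurements.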
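Equations (1)–(4) translate directly into code. A minimal Python sketch follows; the function name is hypothetical, and since the text does not say whether σ divides by n or n − 1, the population form (n) is an assumption.

```python
import math
from typing import Sequence


def match_up_stats(x: Sequence[float], y: Sequence[float]) -> dict:
    """VC, NMB, MNB and RMSE as in Equations (1)-(4).

    x: in situ values (non-zero); y: remote-sensing values, same length.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # sigma: standard deviation of the remote-sensing values
    # ASSUMPTION: population form (divide by n)
    sigma = math.sqrt(sum((yi - y_mean) ** 2 for yi in y) / n)
    return {
        "VC": sigma / x_mean,
        "NMB": (y_mean - x_mean) / x_mean,
        "MNB": sum((yi - xi) / xi for xi, yi in zip(x, y)) / n,
        "RMSE": math.sqrt(sum((yi - xi) ** 2 for xi, yi in zip(x, y)) / n),
    }
```

Note that NMB and MNB can differ in sign and size: NMB compares the means, whereas MNB averages per-observation relative errors, so small in situ values weigh more heavily in MNB.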
Table 3 displays the comparison statistics for matched MODIS and in situ Rrs(443) data. RMSE is hardly affected by the temporal window: the difference is lower than 0.001 sr⁻¹ both between 0-day CL and 5-day CL and between 0-day WMM and 5-day WMM. The VC values are also very close (0.358 for 0-day WMM and 0.355 for 5-day WMM). Moreover, NMB and MNB are better with a 5-day window (NMB from −0.266 for 0-day WMM to −0.204 for 5-day WMM, and MNB from −0.171 for 0-day WMM to −0.104 for 5-day WMM). Thus, a 5-day temporal window does not substantially degrade Rrs(443) retrievals, and the WMM [29,38,41,43] provides the best performance. Figures 4 and 5 display the error densities between in situ measurements and remote sensing assessments of Rrs(443) for the two methods. These error densities show whether an algorithm tends to overestimate or underestimate and whether errors are balanced around 0. They show that errors obtained with a 5-day temporal window are not much larger than those obtained with a narrower window. This can be explained by the fact that algorithm errors are similar in magnitude to those introduced by temporal variability over a few days (see also Section 4.4). Moreover, more than 86% of the match-ups in our full dataset have a temporal window of 2 days or less. To keep as many coincidences as possible, we used the 5-day temporal window with the WMM. Since the weighted mean method performs better, Rrs values in our NCDataBase were determined with this method to investigate appropriate [chl-a] algorithms for the region.
Table 3. Different methods for generating Rrs(443) (sr⁻¹) match-ups in New Caledonian waters. Min, Max, Mean, Median and RMSE are given in sr⁻¹.
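| Method | N | Min | Max | Mean | Median | VC | NMB | MNB | RMSE |
|---|---|---|---|---|---|---|---|---|---|
| 0-day CL | 397 | 0.0000 | 0.0281 | 0.0062 | 0.0057 | 0.4298 | −0.2700 | −0.1633 | 0.0047 |
| 0-day WMM | 397 | 0.0003 | 0.0213 | 0.0062 | 0.0058 | 0.3584 | −0.2660 | −0.1705 | 0.0044 |
| 1-day CL | 752 | 0.0000 | 0.0281 | 0.0063 | 0.0059 | 0.3990 | −0.2495 | −0.1592 | 0.0047 |
| 1-day WMM | 752 | 0.0003 | 0.0213 | 0.0065 | 0.0062 | 0.3452 | −0.2311 | −0.1454 | 0.0044 |
| 5-day CL | 986 | 0.0000 | 0.0281 | 0.0062 | 0.0058 | 0.4096 | −0.2289 | −0.1274 | 0.0045 |
| 5-day WMM | 986 | 0.0003 | 0.0213 | 0.0064 | 0.0061 | 0.3550 | −0.2044 | −0.1036 | 0.0042 |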
Figure 4. Error densities between in situ measurements and satellite assessments of Rrs(443) on the same day (D0) and for temporal windows from 1 day (D1) to 5 days (D5). Closest neighbor method.
Figure 5. Error densities between in situ measurements and satellite assessments of Rrs(443) on the same day (D0) and for temporal windows from 1 day (D1) to 5 days (D5). Weighted mean method.

2.3. Algorithm Steps

Our goal was to find an algorithm providing good [chl-a] assessments in the lagoon of New Caledonia from MODIS ocean color imagery. The explanatory variables used by the models to estimate in situ [chl-a] are satellite Rrs. This differs from NASA's OC* algorithms, which use in situ Rrs as explanatory variables. The statistical study was conducted without a priori knowledge, i.e., all potentially explanatory variables (Rrs in the various spectral bands) were taken into account.
As indicated in Table 2, there are few data with high [chl-a] (> 3 µg·L⁻¹) in the NCDataBase, so an algorithm built from the NCDataBase alone would not be adapted to high-[chl-a] cases. The steps to obtain an algorithm adapted to New Caledonia are therefore the following: (1) using the NCDataBase, determine a model well suited to waters with low [chl-a] (AFLC); (2) using the SeaBASS database, determine a model for waters with high [chl-a] (AFHC); (3) using the two merged databases, determine a criterion to distinguish low and high [chl-a]; and (4) implement a continuous connection between the models for low and high [chl-a].
Step 1 consists of determining which variables give a good [chl-a] estimate. As the variables are generally not independent, the support vector machine (SVM) method was used to select the best set of explanatory variables [26,28,44]. This kernel method finds the best regression according to optimality criteria, even if this requires implicitly mapping the variables into a higher-dimensional space. Note that choosing SVM parameters is easier than designing a neural network, whose architecture can be very complex and hard to interpret. A bootstrap with fifty random draws was performed to determine the best parameters. For each draw, every combination of the explanatory variables was used to create and test a model. With six variables (Rrs(412), Rrs(443), Rrs(488), Rrs(531), Rrs(555) and Rrs(667)), there are $\sum_{i=1}^{6}\binom{6}{i} = 2^6 - 1 = 63$ combinations. When a model with many variables gave results equivalent to a model with fewer explanatory variables, the model with fewer variables was chosen. For each of these 63 models, 50 RMSE values, one per draw, were computed. Results were compared using averages, confidence intervals of the RMSE averages, and tests of equality of means. As the computed averages did not follow a normal distribution, the nonparametric Kruskal–Wallis test was used for these comparisons. For the SeaBASS and NCDataBase data combined, the best results were obtained with Rrs(443), Rrs(488) and Rrs(531). Once the best predictors were known, relations between [chl-a] and the predictors or their ratios, such as linear or log regressions, were sought with the same bootstrap method (50 draws). When a simpler relation and the best SVM gave statistically equivalent results on test samples, the simpler relation was selected.
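The variable-selection loop of Step 1 can be sketched in Python as below: enumerate all 63 non-empty band combinations, then collect 50 test-sample RMSE values per combination. This is an illustration under assumptions: each draw is modeled as a random 70/30 learning/test split (as in Section 2.4) rather than resampling with replacement, the dictionary keys are hypothetical, and `fit` is a hypothetical placeholder for the SVM regression, which is not reproduced here.

```python
import itertools
import math
import random
from typing import Callable, List, Sequence, Tuple

BANDS = ("Rrs412", "Rrs443", "Rrs488", "Rrs531", "Rrs555", "Rrs667")


def all_subsets(bands: Sequence[str] = BANDS) -> List[Tuple[str, ...]]:
    """Every non-empty combination of candidate predictors (63 for six bands)."""
    return [combo for k in range(1, len(bands) + 1)
            for combo in itertools.combinations(bands, k)]


def draw_rmses(rows: List[dict], target: str, subset: Tuple[str, ...],
               fit: Callable, n_draws: int = 50, learn_frac: float = 0.7,
               seed: int = 0) -> List[float]:
    """Test-sample RMSE of one predictor subset over n_draws random splits.

    fit(X, y) is a HYPOTHETICAL placeholder for the SVM regression: it must
    return a function mapping a list of feature rows to predictions.
    """
    rng = random.Random(seed)
    rmses = []
    for _ in range(n_draws):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        cut = int(learn_frac * len(shuffled))
        learn, test = shuffled[:cut], shuffled[cut:]
        predict = fit([[r[b] for b in subset] for r in learn],
                      [r[target] for r in learn])
        preds = predict([[r[b] for b in subset] for r in test])
        sq_err = [(p - r[target]) ** 2 for p, r in zip(preds, test)]
        rmses.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return rmses
```

Running `draw_rmses` for every element of `all_subsets()` yields 63 lists of 50 RMSE values; competing subsets could then be compared with a rank-based test such as `scipy.stats.kruskal`, in the spirit of the Kruskal–Wallis comparison described above.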
In Step 2, only data with high [chl-a] were kept to build a specific model for [chl-a] greater than 3 µg·L⁻¹. An SVM model was built with a method similar to that of Step 1, using as predictors the Rrs in the five spectral bands centered on 412, 443, 488, 531, and 555 nm. This SVM model was compared with OC3, and the better of the two was chosen to complete the algorithm for high [chl-a].
Step 3 consists of determining from MODIS Rrs whether [chl-a] is high or low. Two methods were tested to determine which MODIS color ranges are linked to high or low [chl-a]: SVM (as a classifier) and a decision tree. As explained in more detail later (Section 4.1), the decision tree was preferred to the SVM because of its practicality: it uses only the ratio Rrs(488)/Rrs(555) to determine to which [chl-a] group a MODIS color belongs.
For Step 4, several kinds of continuous connection between the AFHC and the AFLC, based on weight functions, were tried: linear, quadratic, square root, logarithmic, exponential, and arc-tangential. Equations (5.1)–(5.4) give some of these weight functions, where s is the threshold separating high and low [chl-a], ε ∈ ]0; s[ is the tolerance that sets the width of the transition interval, a = s − ε is the lower bound of the transition interval, b = s + ε is its upper bound, and x is the ratio Rrs(488)/Rrs(555).
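Linear:
$$f(x) = \begin{cases} 0 & \text{when } x \in [0; a] \\ \dfrac{x - a}{b - a} & \text{when } x \in \left]a; b\right[ \\ 1 & \text{when } x \geq b \end{cases} \tag{5.1}$$
Quadratic:
$$f(x) = \begin{cases} 0 & \text{when } x \in [0; a] \\ \left(\dfrac{x - a}{b - a}\right)^2 & \text{when } x \in \left]a; b\right[ \\ 1 & \text{when } x \geq b \end{cases} \tag{5.2}$$
Square root:
$$f(x) = \begin{cases} 0 & \text{when } x \in [0; a] \\ \sqrt{\dfrac{x - a}{b - a}} & \text{when } x \in \left]a; b\right[ \\ 1 & \text{when } x \geq b \end{cases} \tag{5.3}$$
Arc-tangential:
$$f(x) = \begin{cases} 0 & \text{when } x \in [0; a] \\ \dfrac{1}{\pi}\tan^{-1}\left(\left[\dfrac{1}{b - x} - \dfrac{1}{x - a}\right]\dfrac{b - a}{s}\right) + \dfrac{1}{2} & \text{when } x \in \left]a; b\right[ \\ 1 & \text{when } x \geq b \end{cases} \tag{5.4}$$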
Given a value of the ratio Rrs(488)/Rrs(555), the weight f is applied to the [chl-a] value given by the AFLC and the weight 1 − f to the value given by the AFHC (SVM or OC3), so that the final estimate is f·AFLC + (1 − f)·AFHC.
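The weight functions (5.1)–(5.4) and the final blending can be sketched in Python as follows. The function names are hypothetical, and the arc-tangential branch assumes the scale factor (b − a)/s inside the arctangent, which should be checked against the published equation.

```python
import math


def weight(x: float, s: float, eps: float, kind: str = "linear") -> float:
    """Weight f(x) given to the AFLC estimate; 1 - f goes to the AFHC estimate.

    x: Rrs(488)/Rrs(555); s: threshold; eps in ]0; s[ sets the transition
    interval ]a; b[ with a = s - eps and b = s + eps.
    """
    a, b = s - eps, s + eps
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    t = (x - a) / (b - a)  # relative position inside the transition interval
    if kind == "linear":
        return t
    if kind == "quadratic":
        return t ** 2
    if kind == "sqrt":
        return math.sqrt(t)
    if kind == "arctan":
        # ASSUMPTION: scale factor (b - a) / s, as in our reading of (5.4)
        return math.atan((1 / (b - x) - 1 / (x - a)) * (b - a) / s) / math.pi + 0.5
    raise ValueError(f"unknown weight function: {kind}")


def blend(chl_low: float, chl_high: float, f: float) -> float:
    """Continuous connection f * AFLC + (1 - f) * AFHC."""
    return f * chl_low + (1 - f) * chl_high
```

Assuming s is set to the 0.76 decision-tree threshold, a pixel with x = 0.76 receives f = 0.5 under the linear and arc-tangential rules, so both models contribute equally; the quadratic and square-root rules shift that balance toward the AFHC and the AFLC, respectively.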
We also tested a general SVM (SVMg) built from the merged NCDataBase and SeaBASS database without differentiating between the two [chl-a] groups. For this SVMg, we used the bootstrap method described above (selection with 50 random draws). Explanatory variables were selected with learning and test samples, and the model with the lowest RMSE on test samples was retained. This SVM uses a radial basis kernel, and its predictors are Rrs at 443, 531 and 555 nm. The SVMg and the “AFLC + AFHC” algorithms were also compared with OC3.

2.4. Statistical Tests

To verify the effectiveness of an algorithm without overtraining effects, the data were systematically divided into two samples: a learning sample used to build the model, and a test sample to which the model was applied and checked with the indicators specified below. The learning sample comprised 70% of the data and the test sample the remaining 30%. To maintain the proportions of high and low [chl-a] in each sample when the NCDataBase and SeaBASS database were merged, samples were obtained by “semi-random draws”: the dataset was partitioned into two groups (high and low [chl-a]), and a random draw was then made within each group. For model comparisons, we mainly used RMSE (Equation (4)). In order not to rely on a single indicator, we also computed the correlation coefficient between the values given by the algorithms and the measured values, which measures “the link between two random variables” [31].
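The “semi-random draws” amount to a stratified 70/30 split. A minimal Python sketch, assuming each record is a dict with a hypothetical `chl` key holding in situ [chl-a] in µg·L⁻¹:

```python
import random
from typing import List, Tuple


def semi_random_split(rows: List[dict], chl_key: str = "chl",
                      threshold: float = 3.0, learn_frac: float = 0.7,
                      seed: int = 0) -> Tuple[List[dict], List[dict]]:
    """Learning/test split drawn separately in the low and high [chl-a] groups."""
    rng = random.Random(seed)
    learn: List[dict] = []
    test: List[dict] = []
    low = [r for r in rows if r[chl_key] <= threshold]
    high = [r for r in rows if r[chl_key] > threshold]
    for group in (low, high):
        shuffled = group[:]
        rng.shuffle(shuffled)
        cut = round(learn_frac * len(shuffled))  # 70% of this group to learning
        learn.extend(shuffled[:cut])
        test.extend(shuffled[cut:])
    return learn, test
```

Drawing within each group keeps the share of high-[chl-a] records roughly equal in both samples, which a single random draw over the merged data would not guarantee.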

3. Results

3.1. Algorithm Specifics

As indicated above, the algorithm uses a log-log linear model for [chl-a] below 3 µg·L⁻¹ and an SVM model or OC3 for [chl-a] above 3 µg·L⁻¹. The log-log linear model, built in Step 1, uses the Rrs ratios of the spectral bands centered on 488 and 531 nm and on 443 and 531 nm:
$$\ln([\text{chl-a}]) = \alpha \ln\left(\frac{R_{rs}(488)}{R_{rs}(531)}\right) + \beta \ln\left(\frac{R_{rs}(443)}{R_{rs}(531)}\right) + \gamma \tag{6}$$
where α = 2.53276, β = 0.49286 and γ = 0.16763. This model has similarities with OC3: OC3 uses a 4th-degree polynomial in one variable, log[max(Rrs(443)/Rrs(555); Rrs(488)/Rrs(555))], whereas our relation is a 1st-degree polynomial in two variables. The SVM model for high [chl-a], built in Step 2, uses a radial basis kernel and Rrs in five spectral bands (412, 443, 488, 531 and 555 nm) as predictors.
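Equation (6) is simple enough to apply pixel by pixel. The Python sketch below uses the coefficient values exactly as printed in this text and exposes them as parameters, so they can be checked against the published article; the function name is hypothetical.

```python
import math

# Coefficients of Equation (6), as printed in the text
ALPHA, BETA, GAMMA = 2.53276, 0.49286, 0.16763


def aflc(rrs443: float, rrs488: float, rrs531: float,
         alpha: float = ALPHA, beta: float = BETA, gamma: float = GAMMA) -> float:
    """Low-[chl-a] model: ln(chl) = alpha ln(R488/R531) + beta ln(R443/R531) + gamma."""
    log_chl = (alpha * math.log(rrs488 / rrs531)
               + beta * math.log(rrs443 / rrs531)
               + gamma)
    return math.exp(log_chl)  # back-transform from natural log to ug/L
```

Unlike OC3, which selects one of two ratios through the max operator, this relation uses both blue-to-green ratios at once, each with its own coefficient.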
For Step 3, the two methods tested to determine whether a MODIS color is linked to high or low [chl-a] (SVM vs. decision tree) gave close results. With SVM parameters optimized by bootstrap tests, the SVM correctly determined whether a pixel had a [chl-a] below or above 3 µg·L⁻¹ in 95.9% of the test cases. The decision tree achieved a 95.7% success rate with only two branches:
  • Rrs(488)/Rrs(555) ≥ 0.76: 97% of the pixels have a low [chl-a] and 3% have a high [chl-a];
  • Rrs(488)/Rrs(555) < 0.76: 12% of the pixels have a low [chl-a] and 88% have a high [chl-a].
Given the close results obtained with the SVM and decision tree methods and the simplicity of the latter, the decision tree method was selected to determine pixel category.
To apply our Complete Algorithm (AFLC + AFHC), we proceeded as follows: (1) determine from the 488/555 nm ratio whether the pixel should be considered a low or a high [chl-a] pixel; (2) for a low [chl-a] pixel, apply the AFLC model (Equation (6)), and for a high [chl-a] pixel, apply the AFHC model (SVM model or OC3). Finally, to ensure continuity of the algorithm, compute the weight function f (see Section 2.3) and use it to combine the AFLC and AFHC estimates. Our algorithm was first tested without a continuous connection; the different types of weight functions are examined in Section 3.3.
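Putting the steps together, the per-pixel procedure can be sketched in Python as follows. Several parts are assumptions: the AFHC model (SVM or OC3) is passed in as a caller-supplied function because the fitted SVM is not available here, the linear weight of Equation (5.1) stands in for whichever connection is finally retained, the transition interval is assumed to be centered on the 0.76 decision-tree threshold, and the dictionary layout of `rrs` is hypothetical. With `eps = 0` the sketch reduces to the hard switch used before any continuous connection.

```python
import math
from typing import Callable, Dict

THRESHOLD = 0.76  # decision-tree split on Rrs(488)/Rrs(555)


def aflc(rrs: Dict[int, float]) -> float:
    """Low-[chl-a] model, Equation (6), with the coefficients as printed."""
    return math.exp(2.53276 * math.log(rrs[488] / rrs[531])
                    + 0.49286 * math.log(rrs[443] / rrs[531])
                    + 0.16763)


def complete_algorithm(rrs: Dict[int, float],
                       afhc: Callable[[Dict[int, float]], float],
                       eps: float = 0.0) -> float:
    """[chl-a] (ug/L) for one pixel from its MODIS Rrs.

    rrs:  band (nm) -> Rrs (sr^-1); HYPOTHETICAL layout, keys 443, 488, 531, 555.
    afhc: high-[chl-a] model (SVM or OC3) supplied by the caller.
    eps:  half-width of the transition interval around THRESHOLD;
          eps = 0 gives the hard switch of the decision tree.
    """
    x = rrs[488] / rrs[555]
    a, b = THRESHOLD - eps, THRESHOLD + eps
    if x >= b:  # low-[chl-a] pixel: AFLC only
        return aflc(rrs)
    if x <= a:  # high-[chl-a] pixel: AFHC only
        return afhc(rrs)
    # Transition zone: ASSUMPTION, linear weight of Equation (5.1)
    f = (x - a) / (b - a)
    return f * aflc(rrs) + (1 - f) * afhc(rrs)
```

For example, `complete_algorithm(pixel, afhc=my_oc3)` would correspond to the “AFLC + OC3” variant, where `my_oc3` is any function mapping the same dictionary to an OC3 estimate.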

3.2. Algorithm Performance

Our Complete Algorithm (AFLC + AFHC) was compared with two other algorithms, SVMg and OC3, and the two forms of the AFHC (SVM model or OC3) were also compared. Comparisons were carried out on the NCDataBase (for shallow and deep lagoon waters, for oceanic waters, and for all types of water) and on the merged NCDataBase and SeaBASS database (Full DataBase, FDB).
Table 4 provides basic statistics (mean, variance, and range of RMSE) computed on 50 test samples randomly drawn as explained in Sections 2.3 and 2.4, for the depth classes defined in Section 2.1 and Table 2. Figure 6 displays scatterplots of estimated versus measured [chl-a] for the various algorithms and databases. The comparisons are made on a test sample containing data from SeaBASS and from the NCDataBase (frames A, C, and E), and on the full NCDataBase, including both learning and test samples (frames B, D, and F). As bathymetry is known for the stations in the NCDataBase (frames B, D, and F), differences in algorithm behavior are highlighted for shallow lagoon waters (bathymetry < 20 m), deep lagoon waters (20 m ≤ bathymetry ≤ 70 m), and oceanic waters (bathymetry > 70 m). The 20 m separation between shallow and deep lagoon waters was determined from observations of data gathered in the lagoon of New Caledonia [23,34].
Table 4. Comparison of algorithm performance on 50 test samples.
| Product name | Database | Mean of RMSE (µg·L⁻¹) | Variance of RMSE ((µg·L⁻¹)²) | RMSE range (µg·L⁻¹) |
|---|---|---|---|---|
| AFLC + SVM | FDB | 2.811 | 0.217 | 1.859–3.799 |
| AFLC + OC3 | FDB | 2.871 | 0.202 | 1.993–3.798 |
| OC3 | NCDataBase (shallow) | 1.130 | 0.063 | 0.700–1.709 |
| SVMg | NCDataBase (shallow) | 0.940 | 0.061 | 0.584–1.560 |
| AFLC + SVM | NCDataBase (shallow) | 0.923 | 0.222 | 0.234–1.829 |
| AFLC + OC3 | NCDataBase (shallow) | 0.713 | 0.113 | 0.234–1.463 |
| OC3 | NCDataBase (deep) | 0.363 | 0.011 | 0.228–0.558 |
| SVMg | NCDataBase (deep) | 0.412 | 0.011 | 0.294–0.816 |
| AFLC + SVM | NCDataBase (deep) | 0.364 | 0.052 | 0.149–0.802 |
| AFLC + OC3 | NCDataBase (deep) | 0.280 | 0.012 | 0.149–0.481 |
| OC3 | NCDataBase (ocean) | 0.208 | 0.001 | 0.164–0.256 |
| SVMg | NCDataBase (ocean) | 0.406 | 0.125 | 0.215–2.767 |
| AFLC + SVM | NCDataBase (ocean) | 0.163 | 0.001 | 0.108–0.217 |
| AFLC + OC3 | NCDataBase (ocean) | 0.163 | 0.001 | 0.108–0.217 |
| OC3 | NCDataBase (total) | 0.669 | 0.017 | 0.426–0.969 |
| SVMg | NCDataBase (total) | 0.667 | 0.023 | 0.442–1.075 |
| AFLC + SVM | NCDataBase (total) | 0.589 | 0.060 | 0.205–1.108 |
| AFLC + OC3 | NCDataBase (total) | 0.449 | 0.027 | 0.205–0.829 |
On the FDB, the “AFLC + SVM” algorithm globally outperforms OC3 and SVMg. The results are nonetheless very similar: the mean RMSE differs by only 1%, and the main difference between OC3 and “AFLC + SVM” is the RMSE range, which is wider for “AFLC + SVM”. In other words, “AFLC + SVM” gives better results on some samples but worse ones on others. On New Caledonia data (NCDataBase), and for each depth group, results are better with both AFLC-based algorithms, especially in shallow waters and in the open ocean; the improvement is smaller for the deep lagoon waters. On the total NCDataBase, the mean RMSE is 12% lower with “AFLC + SVM” (0.589 µg·L−1) than with OC3 (0.669 µg·L−1). Pairing AFLC with OC3 rather than with SVM gives better results on New Caledonia data: the mean RMSE is about 33% lower with “AFLC + OC3” (0.449 µg·L−1) than with OC3 (0.669 µg·L−1). “AFLC + OC3” also improves results in both shallow and deep lagoon waters.
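The relative improvements quoted above follow directly from the mean RMSE values in Table 4. A quick check:

```python
def relative_reduction(new, reference):
    """Percentage by which the `new` RMSE is lower than the `reference` RMSE."""
    return 100.0 * (reference - new) / reference


# Mean RMSE values (µg·L−1) on the total NCDataBase, copied from Table 4.
OC3_TOTAL = 0.669
for name, value in [("AFLC + SVM", 0.589), ("AFLC + OC3", 0.449)]:
    print(f"{name}: {relative_reduction(value, OC3_TOTAL):.1f}% lower than OC3")
# AFLC + SVM: 12.0% lower than OC3
# AFLC + OC3: 32.9% lower than OC3
```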
Table 4 also shows why the AFLC model was preferred to an SVM model. On the NCDataBase, the models using AFLC give better results than SVMg, with mean RMSE values of 0.449 µg·L−1 (AFLC + OC3) and 0.589 µg·L−1 (AFLC + SVM) versus 0.667 µg·L−1. The simpler model (i.e., AFLC, Equation (6)) is therefore the natural choice.
On the graphs where the AFLC algorithm is applied to world data (Figure 6a,c), the scatterplots of low and high [chl-a] are separated. This gap, partly due to a lack of [chl-a] data in the range 1–5 µg·L−1, is less obvious with OC3 (Figure 6e). It is, however, also linked to the fact that two different algorithms are applied below and above 3 µg·L−1, hence the need to introduce a continuous connection between the two models forming the complete algorithm. Considering the bathymetry groups in oligotrophic New Caledonian waters (Figure 6b,d,f), the overestimation in shallow waters and the underestimation in the open ocean observed with OC3 both disappear with the AFLC-based algorithms. Figure 6e shows that neither the overestimation in shallow waters nor the underestimation in deep waters by OC3 is observed with the SeaBASS data. The real improvement is therefore obtained in the New Caledonia area (Figure 6b,d,f), where the points of both oceanic and shallow stations are distributed around the first bisector (y = x). The AFLC points are generally closer to the line y = x than the OC3 points. In some instances, overestimation remains large, especially in shallow waters, but it is reduced when using AFLC instead of OC3 in oligotrophic waters.
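The statistics annotated on Figure 6 (adjusted R² and a regression line in log space) can be reproduced along the lines below. Regressing log10 of the estimate on log10 of the in situ value by ordinary least squares is an assumption about how the panels were built, and the function name is hypothetical. A perfect algorithm gives slope 1 and intercept 0, i.e., the y = x line.

```python
import math


def log_regression_stats(measured, estimated):
    """OLS of log10(estimated) on log10(measured), with adjusted R^2.

    Returns (slope, intercept, adjusted R^2) for a single predictor.
    """
    x = [math.log10(v) for v in measured]
    y = [math.log10(v) for v in estimated]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)  # one predictor (p = 1)
    return slope, intercept, r2_adj
```

A constant multiplicative bias (e.g., every estimate twice the in situ value) shows up as an intercept of log10(2) with a slope still equal to 1.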
As explained in Section 2.2, “error densities” show whether an algorithm tends to overestimate or underestimate. Comparisons are made on errors of base-10 logarithms: for estimates X̂ of a random variable X obtained with a given algorithm, the density of log10(X) − log10(X̂) was plotted. Figure 7 displays results for a test sample containing data from SeaBASS and NCDataBase (a) and for the data from NCDataBase alone within the same test sample (b). With the Full DataBase (Figure 7a), the difference between the three algorithms is not obvious, but errors are closer to 0 with the algorithms using AFLC. With data from New Caledonia (Figure 7b), the error density shows much better performance for both algorithms using AFLC (“AFLC + SVM” and “AFLC + OC3”, whose curves coincide in Figure 7b) than for OC3: the density is higher around 0 and the curve is narrower (most values lie between −0.5 and 0.5), indicating smaller errors. The benefit of AFLC is thus errors centered on 0 and less dispersed than with OC3, especially for the New Caledonia data.
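The error densities of Figure 7 can be approximated with a plain Gaussian kernel density estimate of log10(X) − log10(X̂). The bandwidth rule (1.06 σ n^(−1/5)) and the helper names are assumptions, since the smoothing used for the figure is not stated here; `fraction_within` quantifies the reading "most values lie between −0.5 and 0.5".

```python
import math
import statistics


def log_errors(measured, estimated):
    """Errors as defined in the text: log10(X) - log10(X_hat)."""
    return [math.log10(x) - math.log10(xh) for x, xh in zip(measured, estimated)]


def gaussian_kde(samples, grid, bandwidth=None):
    """Gaussian kernel density estimate of `samples` evaluated on `grid`.

    Default bandwidth: normal-reference rule 1.06 * sd * n**(-1/5) (ASSUMED).
    """
    n = len(samples)
    if bandwidth is None:
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - s) / bandwidth) ** 2) for s in samples)
            for g in grid]


def fraction_within(errors, half_width=0.5):
    """Share of log errors lying in [-half_width, half_width]."""
    return sum(abs(e) <= half_width for e in errors) / len(errors)
```

Because the errors are log10(X) − log10(X̂), a density shifted to positive values indicates underestimation, and one shifted to negative values indicates overestimation.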
Finally, the results reveal that using an SVM model for high [chl-a] does not provide any improvement. This may be because the SVM model uses Rrs in blue spectral bands that may be noisy and insensitive to [chl-a] in the presence of CDOM. Moreover, using OC3 to complete the algorithm for waters with high [chl-a] provides good results and is more generic than an SVM model. An SVM model is suitable for the New Caledonia area, but not necessarily for other parts of the world.
Figure 6. Comparisons of different algorithms on a test sample including data from SeaBASS and NCDataBase (left column) and on the full NCDataBase (right column), with the three bathymetry groups distinguished: (a) “AFLC + OC3”; (b) idem on the full NCDataBase; (c) “AFLC + SVM”; (d) idem on the full NCDataBase; (e) OC3; (f) idem on the full NCDataBase. Each panel reports the RMSE, the adjusted R², and the linear regression line between log([chl-a] in situ) and log([chl-a] algorithm). The line y = x is red and the regression line is green.
Figure 7. Log[chl-a] error densities for two test samples. (a) Test sample from Full DataBase; (b) Test sample uniquely from NCDataBase.

3.3. Continuous Connection between Low and High [chl-a]

Connecting AFLC and OC3 continuously yields a complete algorithm (i.e., valid over the entire [chl-a] range) without an unrealistic jump between low and high [chl-a]. Table 5 lists the RMSE computed on the two databases and on a test sample used for validation, and Figure 8 compares the continuous connections obtained with various weighting functions. The weighting-function parameters are s = 0.76 and ε = 0.2.
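A minimal sketch of a continuous connection with s = 0.76 and ε = 0.2 follows. The exact weighting functions are defined in Section 2.3 and are not reproduced here. The mapping of the ratio onto the zone [s − ε, s + ε], the shapes in the hypothetical `WEIGHTS` table (including the arc-tangent steepness of 5) and the blending in linear [chl-a] space are therefore all assumptions. The weight on AFLC grows with Rrs(488)/Rrs(555), since a high blue/green ratio indicates low [chl-a].

```python
import math

S, EPS = 0.76, 0.2  # s and epsilon quoted in the text


def transition_position(ratio, s=S, eps=EPS):
    """Map Rrs(488)/Rrs(555) to t in [0, 1] across the zone [s - eps, s + eps]."""
    t = (ratio - (s - eps)) / (2 * eps)
    return min(1.0, max(0.0, t))


# HYPOTHETICAL shapes g(t) with g(0) = 0 and g(1) = 1.
WEIGHTS = {
    "linear": lambda t: t,
    "quadratic": lambda t: t * t,
    "square_root": math.sqrt,
    "arc_tangential": lambda t: 0.5 + math.atan(5 * (2 * t - 1)) / (2 * math.atan(5)),
}


def connected_chl(ratio, chl_low, chl_high, shape="linear"):
    """Blend the AFLC (low [chl-a]) and OC3 (high [chl-a]) estimates."""
    w = WEIGHTS[shape](transition_position(ratio))  # weight given to AFLC
    return w * chl_low + (1.0 - w) * chl_high
```

With this choice, pixels well inside either regime get exactly the AFLC or the OC3 value, and only the transition zone is blended.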
Table 5. RMSE (in µg·L−1) shown to compare continuous connections on different databases.
Algorithm (without or with connection) | RMSE on NCDataBase | RMSE on SeaBASS | RMSE on a test sample
No continuous connection (AFLC + OC3) | 0.496 | 2.976 | 2.000
Linear weight function | 0.448 | 2.799 | 2.050
Square root weight function | 0.366 | 2.874 | 2.161
Quadratic weight function | 0.515 | 2.766 | 2.011
Logarithmic weight function | 0.415 | 2.845 | 2.087
Exponential weight function | 0.469 | 2.773 | 2.033
Arc-tangential weight function | 0.476 | 2.910 | 2.021
The continuous connection between AFLC and OC3 works well with all the weighting functions tested: the shift around 3 µg·L−1 visible in Figure 6a,c has disappeared in Figure 8. The RMSE computed on NCDataBase is lower with “AFLC + OC3” continuously connected, whatever the weighting function (0.515 µg·L−1 at most), than with OC3 (0.640 µg·L−1). Results from the different kinds of connection are very close: for the test sample (last column of Table 5), RMSE values range from 2.011 to 2.161 µg·L−1. Moreover, according to Table 5, accuracy is not greatly affected by the connection scheme, i.e., performance is very similar with and without it. The worst result on the test sample, obtained with the square root weight function (2.161 µg·L−1), gives an RMSE 8% higher than “AFLC + OC3” without continuous connection (2.000 µg·L−1).
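The worst-case penalty of a connection on the test sample can be read off the last column of Table 5:

```python
# RMSE (µg·L−1) on the test sample, copied from the last column of Table 5.
test_rmse = {
    "none": 2.000, "linear": 2.050, "square root": 2.161, "quadratic": 2.011,
    "logarithmic": 2.087, "exponential": 2.033, "arc-tangential": 2.021,
}
baseline = test_rmse["none"]
connected = {k: v for k, v in test_rmse.items() if k != "none"}
worst = max(connected, key=connected.get)          # highest RMSE among connections
increase = 100.0 * (connected[worst] - baseline) / baseline
print(f"{worst}: {increase:.0f}% higher than without connection")
# square root: 8% higher than without connection
```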
Figure 8. Complete algorithm with a continuous connection applied on a test sample: (a) linear; (b) quadratic; (c) square root; and (d) arc-tangential connection. Panel annotations are as in Figure 6.

3.4. Application to MODIS Imagery

Figure 9a,b display the [chl-a] imagery obtained by applying OC3 to two parts of the New Caledonia area (North-East and South lagoon), and Figure 10a,b display the corresponding imagery obtained by applying “AFLC + OC3 linearly connected”. Results obtained with weighting functions other than the linear one are very close, except for very coastal pixels, and are not shown here.
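Applying the linearly connected algorithm to an image is a per-pixel operation that vectorizes naturally with NumPy. In this sketch the model functions are passed in by the caller as hypothetical stand-ins for AFLC and OC3, the linear weight over the ratio zone [s − ε, s + ε] is an assumed form, and pixels with missing or non-positive Rrs are masked as NaN.

```python
import numpy as np

S, EPS = 0.76, 0.2  # connection parameters quoted in the text


def apply_to_image(rrs443, rrs488, rrs531, rrs555, aflc_fn, oc3_fn):
    """Pixel-wise 'AFLC + OC3 linearly connected' on 2-D Rrs arrays.

    aflc_fn(r443, r488, r531) and oc3_fn(r443, r488, r555) must accept and
    return 1-D arrays; invalid pixels (NaN or non-positive Rrs) become NaN.
    """
    bands = np.stack([rrs443, rrs488, rrs531, rrs555])
    valid = np.all(np.isfinite(bands) & (bands > 0), axis=0)
    chl = np.full(rrs488.shape, np.nan)
    ratio = rrs488[valid] / rrs555[valid]
    w = np.clip((ratio - (S - EPS)) / (2 * EPS), 0.0, 1.0)  # weight on AFLC
    low = aflc_fn(rrs443[valid], rrs488[valid], rrs531[valid])
    high = oc3_fn(rrs443[valid], rrs488[valid], rrs555[valid])
    chl[valid] = w * low + (1.0 - w) * high
    return chl
```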
The best results are obtained in the lagoon. In the North-East lagoon (Figure 9a and Figure 10a), [chl-a] values from the proposed algorithm are less saturated in the bays and across the whole lagoon. Around reefs, our algorithm still gives high [chl-a], but lower than OC3: in the black circle, [chl-a] is 2.6 µg·L−1 with OC3 and 1.5 µg·L−1 with the proposed algorithm. In the South lagoon (Figure 9b and Figure 10b), [chl-a] values in the red circle are about 0.2 µg·L−1 with our algorithm, whereas OC3 gives 0.3 µg·L−1.
Maps built with AFLC + OC3 exhibit more homogeneous [chl-a] patterns than those built with OC3, especially in the open ocean and in the lagoon. However, both types of maps highlight the major difference in [chl-a] between the lagoon and the open ocean. Near reef areas, the AFLC + OC3 values are more realistic and more continuous with adjacent waters. Comparisons between in situ [chl-a] and [chl-a] retrieved with OC3 led us to conclude that OC3 overestimates [chl-a] in most of the lagoon but underestimates it in the open ocean (Figure 6). The AFLC + OC3 algorithm balances these errors, yielding lower [chl-a] in the lagoon and higher [chl-a] in the open ocean. The differences between OC3 and in situ [chl-a] in the open ocean may be partly due to the fact that our in situ [chl-a] were obtained with a fluorometric extraction method, i.e., they may be slightly higher than HPLC values (used in constructing OC3). Since AFLC + OC3 was built from our in situ dataset, it is not surprising that it yields higher [chl-a] outside the lagoon.
Figure 9. (a) Coral reefs and [chl-a] assessment (µg·L−1) with OC3 in the North-East lagoon of New Caledonia on 20 July 2008; (b) Coral reefs and [chl-a] assessment (µg·L−1) with OC3 in the South lagoon of New Caledonia on 20 July 2008.
Figure 10. (a) Coral reefs and [chl-a] assessment (µg·L−1) with “AFLC + OC3 linearly connected” in the North-East lagoon of New Caledonia on 20 July 2008; (b) Coral reefs and [chl-a] assessment (µg·L−1) with “AFLC + OC3 linearly connected” in the South lagoon of New Caledonia on 20 July 2008.
Figure 11 displays the distribution of [chl-a] estimated with OC3 and with “AFLC + OC3 linearly connected” in the lagoon of New Caledonia on 20 July 2008. The [chl-a] values are more homogeneous with “AFLC + OC3 linearly connected”, centered on the in situ median value, and possibly less sensitive to bottom effects. They range from 0.04 to 58.26 µg·L−1, whereas with OC3 they range from 0 to 1913 µg·L−1 (Table 6). The interquartile range is about half as large with “AFLC + OC3 linearly connected”, indicating a much smaller spread.
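The quantiles of Table 6 and the interquartile-range comparison can be computed with the standard library. The helper name is hypothetical, and `statistics.quantiles` uses its default "exclusive" interpolation, which may differ slightly from the method used to build Table 6.

```python
import statistics


def quantile_summary(values):
    """Minimum, quartiles, maximum and interquartile range of a [chl-a] sample."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return {"min": min(values), "q1": q1, "median": q2, "q3": q3,
            "max": max(values), "iqr": q3 - q1}
```

Dividing the OC3 `"iqr"` by the same quantity computed for the connected algorithm gives the spread ratio discussed above.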
Figure 11. Densities of [chl-a] assessments in the lagoon of New Caledonia on 20 July 2008. The red line is the OC3 assessment density and the blue line is the “AFLC + OC3 linearly connected” assessment density.
Table 6. Quantiles of [chl-a] assessments (µg·L−1) in the lagoon of New Caledonia on 20 July 2008.
Quantile | OC3 | AFLC + OC3 Linearly Connected

4. Discussion

4.1. Comparison with Other Algorithms

On the one hand, the numerical indicators, which give a global view because they were computed over 50 random draws, clearly demonstrate that AFLC, especially “AFLC + OC3”, is well suited to New Caledonian waters and performs comparably to OC3 on world data. On the other hand, the error densities in Figure 7 for the algorithms using AFLC are higher around 0 and narrower than for OC3, showing that AFLC yields smaller errors. Figure 6 also shows that the overestimation in shallow waters and the underestimation in oceanic waters of New Caledonia are reduced with AFLC.
Moreover, seabed effects are reduced in the lagoon. In areas with coral reefs and white sand (South lagoon), [chl-a] is overestimated with OC3 (values higher than 5 µg·L−1), whereas with AFLC + OC3 values are generally lower than 1 µg·L−1. Figure 9 and Figure 10 indicate that coral reefs affect [chl-a] retrieval less with the new algorithm than with OC3: [chl-a] values are smoother around the barrier reef. Since [chl-a] estimated with the complete algorithm (AFLC + OC3 continuously connected) is higher in the open ocean and lower in the lagoon than with OC3, and the bottom effect in shallow waters is reduced, the results are satisfactory overall.
Note that the effect of bathymetry seen on NCDataBase with OC3 (Figure 6f) is less obvious on world data (Figure 6e). There is nonetheless a small underestimation for the offshore group (dark blue) especially for low [chl-a]. In Figure 6a,c, low [chl-a] points are also closer to the y = x line. Even though the underestimation with OC3 is weak on offshore world data, the use of AFLC reduces this underestimation.
Table 7 compares OC3 and “AFLC + OC3” on FDB (merged SeaBASS and NCDataBase), on SeaBASS only, and on NCDataBase only, for the different bathymetry groups. It provides further information about the effect of bathymetry on algorithm performance. Errors are generally higher in shallow waters for both algorithms: RMSE and MNB are always largest in the group “bathy < 20 m”, except on “SeaBASS (20 m ≤ bathy ≤ 70 m)”, for which the MNB reaches 1.29 with OC3. Shallow bathymetry clearly degrades algorithm performance, and the best results are obtained in the deepest waters, where the water column is deep enough to avoid a bottom effect on the retrievals. For instance, the RMSE computed with OC3 is 4.59 µg·L−1 for the group “bathy < 20 m” and 1.26 µg·L−1 for the group “bathy > 70 m”; results are similar with AFLC + OC3 (4.66 µg·L−1 and 1.33 µg·L−1, respectively). MNB values highlight the efficiency of AFLC in each bathymetry group, since they are always lower in absolute value than those computed with OC3. This indicates that AFLC gives less biased results, especially for low [chl-a], in both SeaBASS and NCDataBase.
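RMSE and MNB per bathymetry group, as in Table 7, can be computed as below. The MNB is taken here with its usual definition, the mean of (estimate − measurement)/measurement, which is dimensionless; this definition is an assumption, and the paper's own formula may differ. A positive MNB indicates overestimation. The function names are hypothetical.

```python
import math
from collections import defaultdict


def mnb(estimated, measured):
    """Mean normalized bias: mean of (estimate - measurement) / measurement."""
    return sum((e - m) / m for e, m in zip(estimated, measured)) / len(measured)


def rmse(estimated, measured):
    """Root mean squared error in µg·L−1."""
    sq = sum((e - m) ** 2 for e, m in zip(estimated, measured))
    return math.sqrt(sq / len(measured))


def metrics_by_group(records):
    """records: iterable of (group, estimated, measured) -> {group: (RMSE, MNB)}."""
    groups = defaultdict(lambda: ([], []))
    for group, est, meas in records:
        groups[group][0].append(est)
        groups[group][1].append(meas)
    return {g: (rmse(e, m), mnb(e, m)) for g, (e, m) in groups.items()}
```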
Table 7. Algorithm performance according to bathymetry.
Group | RMSE (µg·L−1), OC3 | RMSE (µg·L−1), AFLC + OC3 | MNB, OC3 | MNB, AFLC + OC3
FDB (bathy < 20 m) | 4.59 | 4.66 | 0.69 | 0.42
FDB (20 m ≤ bathy ≤ 70 m) | 0.42 | 0.39 | 0.24 | 0.14
FDB (bathy > 70 m) | 1.26 | 1.33 | 0.10 | 0.03
SeaBASS (bathy < 20 m) | 6.23 | 6.37 | 0.41 | 0.35
SeaBASS (20 m ≤ bathy ≤ 70 m) |  |  |  | 
SeaBASS (ocean) | 1.56 | 1.65 | 0.38 | 0.05
NCDataBase (bathy < 20 m) | 1.08 | 0.82 | 1.01 | 0.49
NCDataBase (20 m ≤ bathy ≤ 70 m) | 0.34 | 0.30 | 0.18 | 0.13
NCDataBase (bathy > 70 m) | 0.21 | 0.16 | −0.40 | −0.01
The main difference between OC3 and the AFLC-based algorithms lies in their design. OC3 is built by empirically regressing in situ [chl-a] against in situ Rrs ratios; users then rely on a suitable atmospheric correction to retrieve Rrs before applying the relation obtained with in situ Rrs. For AFLC, in contrast, the match-ups directly link in situ [chl-a] to satellite Rrs that has already been atmospherically corrected. Consequently, the AFLC relation depends on the MODIS sensor and on the atmospheric correction applied to retrieve Rrs. It will be worth checking whether refitting the coefficients α, β, γ is enough to obtain good results with another sensor and/or another atmospheric correction scheme, or whether a relation different from Equation (6) is needed.
The atmospheric correction used by the NASA OBPG assumes a relation between red and infrared Rrs. This relation is not necessarily suitable for the New Caledonian lagoon. Even though it is imperfect, statistical learning on the large NCDataBase dataset makes it possible to overcome or reduce this uncertainty. Further improvement in [chl-a] retrievals can be expected only once atmospheric corrections provide more accurate Rrs values.

4.2. Functional Form of the Algorithm

During the selection of predictors via SVM regression, no a priori information was used: all models built from combinations of the available variables were tested in the same way. This variable selection shows that the best wavelengths for retrieving [chl-a] are those already known and used by most algorithms. In the log-log regression (Equation (6)), the selected wavelengths are 443, 488, and 531 nm, i.e., blue and green light. Interestingly, 555 nm was not selected; this might be due to a greater sensitivity to bottom effects at that wavelength than at 531 nm in the New Caledonia lagoon. Sensitivity to [chl-a] is also lower at 531 nm than at 555 nm, which may partly explain the reduced range of retrieved [chl-a] values. Finally, to retrieve [chl-a], we use a first-degree polynomial in two variables, each a ratio of Rrs in the blue and green.
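The functional form described above, a first-degree polynomial in the logarithms of two blue/green Rrs ratios, can be refitted by least squares, for instance for another sensor. This sketch uses a swapped-in technique: the paper derives the model through SVM regression analysis, whereas ordinary least squares is used here. The ratio pairing Rrs(443)/Rrs(531) and Rrs(488)/Rrs(531) is an assumption, and `fit_aflc` is a hypothetical name.

```python
import numpy as np


def fit_aflc(rrs443, rrs488, rrs531, chl):
    """Least-squares fit of log10(chl) = alpha + beta*log10(R1) + gamma*log10(R2).

    R1 = Rrs(443)/Rrs(531) and R2 = Rrs(488)/Rrs(531) are ASSUMED ratios;
    Equation (6) in the paper fixes the actual form. Returns (alpha, beta, gamma).
    """
    x1 = np.log10(np.asarray(rrs443) / np.asarray(rrs531))
    x2 = np.log10(np.asarray(rrs488) / np.asarray(rrs531))
    design = np.column_stack([np.ones_like(x1), x1, x2])  # intercept + two predictors
    coeffs, *_ = np.linalg.lstsq(design, np.log10(chl), rcond=None)
    return coeffs
```

Fitting in log space matches the log-log form of the model and keeps relative errors comparable across the [chl-a] range.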
To determine whether a pixel should be classified in the high or low [chl-a] group, the selected reflectance ratio is that of Rrs at 488 and 555 nm, i.e., blue to green, which is consistent with expectations (e.g., [11]). Note that Kahru et al. [45] found a similar relation in the California Current: using the ratio of Rrs at 488 and 547 nm to determine whether [chl-a] is high or low, they found that when Rrs(488)/Rrs(547) < 0.8, [chl-a] is greater than or equal to 3.3 µg·L−1.
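One simple way to calibrate such a ratio threshold on match-ups is to scan candidate values and keep the one whose split best agrees with in situ [chl-a] ≥ 3 µg·L−1. This is an illustrative procedure, not necessarily how the value 0.76 was obtained, and the function names are hypothetical.

```python
def classification_accuracy(ratios, chl, threshold, chl_split=3.0):
    """Share of match-ups where 'ratio < threshold' agrees with 'chl >= chl_split'."""
    hits = sum((r < threshold) == (c >= chl_split) for r, c in zip(ratios, chl))
    return hits / len(chl)


def best_threshold(ratios, chl, candidates, chl_split=3.0):
    """Candidate ratio threshold with the highest agreement (first one wins ties)."""
    return max(candidates,
               key=lambda t: classification_accuracy(ratios, chl, t, chl_split))
```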

4.3. Adding Other Variables Than Reflectance in the Area of New Caledonia

For data from the New Caledonia lagoon, the bathymetry was known and we computed the distance of stations to the coast. Since the bottom affects [chl-a] estimates [29,34,35], we tried to remove the bottom effect by introducing bathymetry as an explanatory variable. The distance to the coast might help determine whether a station is affected by high nutrient and/or sediment inputs, which can facilitate or limit phytoplankton production and thus modify the chlorophyll concentration; a station close to the coast generally has higher [chl-a] than distant stations. Unfortunately, neither bathymetry nor distance to the coast was statistically decisive for estimating [chl-a] in the New Caledonia lagoon. This suggests that bathymetry information is, in some way yet to be determined, contained in the MODIS-derived Rrs.
We found that the overestimation in shallow waters and the underestimation in oceanic waters have been largely eliminated with our complete algorithm, since the points are evenly distributed around the y = x line (Figure 6). Using Rrs alone is therefore probably sufficient. Nevertheless, bottom types have recently been mapped in the southern part of the New Caledonia lagoon [35]. This information could be used to obtain more efficient estimates: depending on the bottom type, we could predict whether an algorithm tends to overestimate or underestimate [chl-a]. Bathymetry could complement this information through depth-dependent coefficients, further reducing the bottom effect on [chl-a] assessments. With such an approach, it would be possible to generalize the algorithm to other areas of the world, provided that both bottom color and bathymetry maps can be retrieved there with dedicated algorithms [33,34,35].

4.4. Temporal Window for Match-Ups

An a posteriori check was performed on the 5-day temporal window. The algorithm was applied to the full NCDataBase, and results were analyzed according to the temporal window of each match-up, using the WMM for Rrs (which performs better than the CL; see Section 2.2).
Table 8 shows that, in the New Caledonia area, choosing 0-day or 5-day match-ups is equivalent, consistent with Table 3 and Figures 4 and 5. The greatest difference in RMSE between windows is 0.092 µg·L−1, obtained for AFLC + OC3 without continuous connection. The choice of algorithm (OC3, AFLC, or a combination of AFLC and OC3) matters more than the choice of a 0- or 5-day temporal window, suggesting that algorithm errors may be larger than the variability of water properties within the window.
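The 0.092 µg·L−1 figure, and the larger spread across algorithms than across windows, can be checked from the RMSE column of Table 8:

```python
# RMSE (µg·L−1) from the last column of Table 8, per algorithm and window (days).
rmse_by_window = {
    "OC3": {0: 0.682, 1: 0.667, 5: 0.640},
    "AFLC": {0: 0.296, 1: 0.263, 5: 0.267},
    "no continuous connection": {0: 0.588, 1: 0.547, 5: 0.496},
    "linear connection": {0: 0.493, 1: 0.483, 5: 0.448},
    "quadratic connection": {0: 0.580, 1: 0.558, 5: 0.515},
    "arc-tangential connection": {0: 0.542, 1: 0.519, 5: 0.476},
}
# Spread across temporal windows for each algorithm.
spread = {name: max(v.values()) - min(v.values()) for name, v in rmse_by_window.items()}
largest = max(spread, key=spread.get)
print(largest, round(spread[largest], 3))  # no continuous connection 0.092

# Spread across algorithms for the 5-day window, for comparison.
five_day = [v[5] for v in rmse_by_window.values()]
print(f"{max(five_day) - min(five_day):.3f}")  # 0.373
```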
Recall that match-ups are determined in a 0.04° square centered on a station, despite significant variability in bottom types and bathymetry in the lagoon. The Rrs, and therefore the [chl-a] estimates, are also affected by the atmospheric correction. Spatial errors may therefore be greater than temporal errors, but this does not mean that lagoon waters in New Caledonia have a long residence time. On the contrary, coastal variability from rivers, upwelling processes, and tides has major impacts [13,14]. A larger temporal window does provide more match-ups, but improving atmospheric correction and obtaining coincidences as close as possible in space and time will certainly improve [chl-a] assessments.
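The spatial and temporal match-up criteria can be expressed as a simple predicate. The 0.04° square comes from the text; interpreting the temporal window as ± `window_days` around the sampling day is an assumption, the function is hypothetical, and the aggregation of the pixels within the square (WMM, Section 2.2) is not reproduced.

```python
from datetime import date


def is_matchup(station_lat, station_lon, station_day,
               pixel_lat, pixel_lon, pixel_day,
               box_deg=0.04, window_days=5):
    """True if a pixel lies in the box_deg square centred on the station
    and within +/- window_days of the in situ sampling date."""
    half = box_deg / 2.0
    in_box = (abs(pixel_lat - station_lat) <= half
              and abs(pixel_lon - station_lon) <= half)
    in_time = abs((pixel_day - station_day).days) <= window_days
    return in_box and in_time
```

Setting `window_days=0` reproduces the same-day selection compared in Table 8.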
Table 8. Indices to compare the accuracy of assessment when using different temporal windows for six algorithms: OC3, AFLC, AFLC + OC3 without continuous connection, AFLC + OC3 with a linear connection, AFLC + OC3 with a quadratic connection and AFLC + OC3 with an arc-tangential connection. Min, Max, Mean, Median, and RMSE are given in µg·L−1.
Temporal window and algorithm | N | Min | Max | Mean | Median |  |  |  | RMSE
0-day OC3 | 330 | 0.019 | 7.912 | 0.598 | 0.382 | 1.847 | 0.280 | 0.270 | 0.682
0-day AFLC | 330 | 0.098 | 1.488 | 0.444 | 0.387 | 0.508 | −0.051 | 0.177 | 0.296
0-day no continuous connection | 330 | 0.098 | 7.912 | 0.514 | 0.387 | 1.573 | 0.099 | 0.223 | 0.588
0-day linear connection | 330 | 0.098 | 7.023 | 0.508 | 0.387 | 1.358 | 0.088 | 0.226 | 0.493
0-day quadratic connection | 330 | 0.098 | 7.789 | 0.529 | 0.387 | 1.574 | 0.132 | 0.246 | 0.580
0-day arc-tangential connection | 330 | 0.098 | 7.657 | 0.512 | 0.387 | 1.476 | 0.094 | 0.225 | 0.542
1-day OC3 | 614 | 0.019 | 8.053 | 0.577 | 0.386 | 1.812 | 0.315 | 0.314 | 0.667
1-day AFLC | 614 | 0.098 | 1.893 | 0.437 | 0.389 | 0.528 | −0.003 | 0.192 | 0.263
1-day no continuous connection | 614 | 0.098 | 8.053 | 0.485 | 0.389 | 1.452 | 0.105 | 0.242 | 0.547
1-day linear connection | 614 | 0.098 | 7.249 | 0.489 | 0.389 | 1.307 | 0.114 | 0.250 | 0.483
1-day quadratic connection | 614 | 0.098 | 7.948 | 0.506 | 0.389 | 1.505 | 0.153 | 0.269 | 0.558
1-day arc-tangential connection | 614 | 0.098 | 7.825 | 0.488 | 0.389 | 1.395 | 0.113 | 0.246 | 0.519
5-day OC3 | 811 | 0.019 | 8.053 | 0.565 | 0.385 | 1.688 | 0.287 | 0.307 | 0.640
5-day AFLC | 811 | 0.098 | 1.893 | 1.436 | 0.391 | 0.518 | −0.007 | 0.172 | 0.267
5-day no continuous connection | 811 | 0.098 | 8.053 | 0.472 | 0.391 | 1.286 | 0.075 | 0.209 | 0.496
5-day linear connection | 811 | 0.098 | 7.249 | 0.478 | 0.391 | 1.178 | 0.089 | 0.222 | 0.448
5-day quadratic connection | 811 | 0.098 | 7.948 | 0.493 | 0.391 | 1.357 | 0.124 | 0.241 | 0.515
5-day arc-tangential connection | 811 | 0.098 | 7.825 | 0.476 | 0.391 | 1.246 | 0.085 | 0.217 | 0.476

4.5. Behavior of the Algorithm in New Caledonian Waters

This study covered both the lagoon and the ocean. Recall that the waters of New Caledonia are oligotrophic. The lagoon is delimited by coral reefs and is connected to the open ocean by passes (Figure 2, Figure 3, Figure 9, and Figure 10). Bathymetry in the lagoon does not exceed 70 m. The effect of non-algal particles (mineral suspended solids, NAP) is negligible in this typical oligotrophic lagoon, except in small, enclosed, shallow bays. Measured SPM concentrations range from 0.2 mg·L−1 off the barrier reef to 0.38 mg·L−1 in the middle part of the lagoon (deep lagoon) and up to 2 mg·L−1 in bays [46] (exceptional values of 6 mg·L−1 occur in some laterite-impacted bays during special events such as cyclones or heavy rains [29,31]). In this study, few match-ups are located in bays, and match-ups are difficult to obtain during rainy events because of clouds. Most of the coincidences in NCDataBase are thus obtained over clear waters, i.e., the impact of NAP on algorithm performance is not significant.
In New Caledonian waters, despite the many differences in water composition between ocean and lagoon waters, the new algorithm designed in this study provides better results for both lagoon and offshore waters (Table 4, Figure 6, and Figure 7).

5. Conclusions

In this paper, we have introduced an algorithm for estimating [chl-a] from satellite-derived Rrs without a priori information, based solely on statistical considerations. This approach yields an algorithm suited to the optically complex waters of New Caledonia, with a smaller bottom influence in the lagoon than OC3. The main improvement is obtained for waters with [chl-a] below 3 µg·L−1, with an RMSE about 30% lower on average than OC3 in New Caledonian lagoon waters. We have also shown satisfactory results for both world data and New Caledonia data.
It is notable, but not surprising, that the best explanatory variables from the SVM regression analysis are Rrs at blue and green wavelengths. For the data sets considered, the best wavelengths are 443, 488, and 531 nm. To classify a pixel in the high or low [chl-a] group, it is sufficient to use a threshold on the ratio of Rrs in the blue (488 nm) and green (555 nm), here 0.76, to separate waters with [chl-a] below and above 3 µg·L−1. This algorithm is sensor-dependent, but it has been constructed and checked with around 1400 match-ups from two different data sources. The risk of overfitting is therefore very low, and the algorithm can be applied at least to MODIS data. Tests should be performed to extend it to other sensors, with coefficients adjusted accordingly.
A great deal of work is ongoing on atmospheric correction in coastal, optically complex waters. This work is essential to obtain satisfactory match-ups. A major step will be made when much better agreement with in situ Rrs measurements is achieved. The [chl-a] algorithms will then provide more accurate results, allowing more efficient evaluation of the impact of environmental stress factors on lagoon ecosystems, especially coral reefs. Stress factors affect coral health through both their intensity and their duration, hence the value of continuously monitoring water properties over large areas, which is only possible with satellite data.


This work was supported by the French INSU Programme National de Télédétection Spatiale “Validation hyperspectral biogeochemical model (VALHYBIO)”, IRD SPIRALES “VALHYSAT”, INSU Ecosphère continentale: processus et modélisation ECCO “Transfer of organic matter from land to ocean (TREMOLO)”, the “Grand Observatoire de l’Environnement et de la biodiversité terrestre et marine du Pacific Sud” (GOPS) “Dissolved organic matter around Pacific islands (DROPS)”, and “CNRT Nickel et son Environnement DYNAMINE” projects. R. Frouin was supported by grants from the National Aeronautics and Space Administration (NASA). We thank the captains and crews of IRD R/V Alis and R/V Coris and engineers and technicians of the IMAGO UMS Jean-Yves Panché, Jean-Pierre Lamoureux, David Varillon, Léocadie Jamet, Francis Gallois and finally Philippe Gérard for [chl-a] analyses, and all IRD researchers and students who organized cruises and provided the [chl-a] data from the NCDataBase for other cruises than Topaze, Valhybio, ValhybioSM and Diapalis, i.e., Renaud Fichez, Jacques Neveux, Pascal Douillet, Sylvain Ouillon, Jean-Michel Fernandez, Jean-Marie Froidefond, Jean-Pascal Torréton, Martine Rodier, Aubert Le Bouteiller, Robert Le Borgne, Christophe Menkes, Emma Rochelle-Newall, Xavier Mari, Christel Pinazo, Olivier Pringault, Jean-Pierre Lefebvre, Marc Despinoy, Loic Charpy (†), Sandrine Chifflet, Séverine Jacquet, Vincent Faure, Sandrine Bouisset, Xavier Combres, Guillaume Dirberg, Anthony Gonzalez, Marcio Tenorio, Aymeric Jouon, Sébastien Hochard, Clément Fontana, Rosalie Fuchs. We especially thank Tatiana Donnay-Savranski, who extracted the data used to create match-ups in the New Caledonian area, and Guillaume Rousset, who extracted MODIS images for applying the algorithm. 
We gratefully acknowledge the NASA Ocean Biology Processing Group (OBPG) for making MODIS ocean-color imagery and products available and for allowing access to in situ data via the SeaWiFS Bio-optical Archive and Storage System (SeaBASS). We thank the IRD Center of Noumea for its administrative support. Diapalis 1-9 cruises were supported by INSU PROOF “Diazotrophy in a Pacific Zone”, Camelia transects, Time series, Echolag, Camecal 1-9 and different transects, by the French Programme National Environnement Côtier (PNEC) “Chantier Nouvelle-Calédonie”, Bissecote by the CNRS ACI “Observation de la Terre” program, Zonalis by New Caledonia ZONECO program, and Valhybio and ValhybioSM by PNTS Valhybio.

Author Contributions

Guillaume Wattelez and Morgan Mangeas built the algorithm and performed the analyses. Cecile Dupouy organized the in situ database from IRD cruises and is engaged in the elaboration of ocean color algorithms for New Caledonia lagoon waters. Jérôme Lefèvre provided MODIS satellite images in a database to easily get match-ups. Robert Frouin and Touraivane provided advice about methodology and results interpretation, and reviewed and edited this article with the other authors.

Conflicts of Interest

The authors declare no conflict of interest.


  1. Labrosse, P.; Fichez, R.; Farman, R.; Adams, T. Regional Chapters: The Indian Ocean to the Pacific. In Seas at the Millenium: An Environmental Evaluation: 2; Sheppard, C.R.C., Ed.; CRC Press: Boca Raton, FL, USA, 2000; pp. 723–736. [Google Scholar]
  2. Lagoons of New Caledonia: Reef Diversity and Associated Ecosystems—UNESCO World Heritage Centre. Available online: (accessed on 24 July 2015).
  3. USGS. Mineral Commodity Summaries 2011. Available online: (accessed on 24 July 2015).
  4. Sarramegna, S.; EMR. Expertise Environnementale Des Conséquences des Fortes Précipitations Observées les 02 et 03 juillet 2013 sur les Communautés Récifo-Lagonaires Des Baies Kué et Port-Boisé. Available online: (accessed on 24 July 2015).
  5. De’ath, G.; Fabricius, K.; Sweatman, H.; Puotinen, M. The 27-year decline of coral cover on the Great Barrier Reef and its causes. Proc. Natl. Acad. Sci. USA 2012, 109, 17995–17999.
  6. Thomas, Y.; Courties, C.; El Helwe, Y.; Herbland, A.; Lemonnier, H. Spatial and temporal extension of eutrophication associated with shrimp farm wastewater discharges in the New Caledonia lagoon. Mar. Pollut. Bull. 2010, 61, 387–398.
  7. Torréton, J.-P.; Rochelle-Newall, E.J.; Jouon, A.; Faure, V.; Jacquet, S.; Douillet, P. Correspondence between the distribution of hydrodynamic time parameters and the distribution of biological and chemical variables in a semi-enclosed coral reef lagoon. Estuar. Coast. Shelf Sci. 2007, 74, 766–776.
  8. Houk, P.; Raubani, J. Acanthaster planci outbreaks in Vanuatu coincide with oceanically-derived chlorophyll blooms, furthering consistencies throughout the Pacific. J. Oceanogr. 2010, 66, 435–438.
  9. Tenorio, M.M.B.; le Borgne, R.; Rodier, M.; Neveux, J. The impact of terrigeneous inputs on the Bay of Ouinne (New Caledonia) phytoplankton communities: A spectrofluorometric and microscopic approach. Estuar. Coast. Shelf Sci. 2005, 64, 531–545.
  10. Dupouy, C.; Frouin, R.; Röttgers, R.; Neveux, J.; Gallois, F.; Panché, J.Y.; Gérard, P.; Fontana, C.; Pinazo, C.; Ouillon, S.; et al. Ocean color response to an episode of heavy rainfall in the lagoon of New Caledonia. Proc. SPIE 2009, 7459.
  11. Ganachaud, A.; Vega, A.; Rodier, M.; Dupouy, C.; Maes, C.; Marchesiello, P.; Eldin, G.; Ridgway, K.; le Borgne, R. Observed impact of upwelling on water properties and biological activity off the southwest coast of New Caledonia. Mar. Pollut. Bull. 2010, 61, 449–464.
  12. Neveux, J.; Lefebvre, J.-P.; le Gendre, R.; Dupouy, C.; Gallois, F.; Courties, C.; Gérard, P.; Ouillon, S.; Fernandez, J.M. Phytoplankton dynamics in New-Caledonian lagoon during a southeast trade winds event. J. Mar. Syst. 2010, 82, 230–244.
  13. Fuchs, R.; Dupouy, C.; Douillet, P.; Dumas, F.; Caillaud, M.; Mangin, A.; Pinazo, C. Modelling the impact of a La Niña event on a South West Pacific Lagoon. Mar. Pollut. Bull. 2012, 64, 1596–1613.
  14. Fuchs, R.; Pinazo, C.; Douillet, P.; Fraysse, M.; Grenz, C.; Mangin, A.; Dupouy, C. Modeling the ocean-lagoon interaction via upwelling processes on the South West of New Caledonia. Estuar. Coast. Shelf Sci. 2013, 135, 5–17.
  15. Berkelmans, R.; De’ath, G.; Kininmonth, S.; Skirving, W.J. A comparison of the 1998 and 2002 coral bleaching events on the Great Barrier Reef: Spatial correlation, patterns, and predictions. Coral Reefs 2004, 23, 74–83.
  16. Baird, A.H.; Marshall, P.A. Mortality, growth and reproduction in scleractinian corals following bleaching on the Great Barrier Reef. Mar. Ecol. Prog. Ser. 2002, 237, 133–141.
  17. International Ocean-Color Coordinating Group. Minimum Requirements for an Operational, Ocean-Color Sensor for the Open Ocean, Reports of the International Ocean-Color Coordinating Group; IOCCG Report Number 1; International Ocean-Color Coordinating Group: Dartmouth, NS, Canada, 1998.
  18. O’Reilly, J.E.; Maritorena, S.; Mitchell, B.G.; Siegel, D.A.; Carder, K.L.; Garver, S.A.; Kahru, M.; McClain, C. Ocean color chlorophyll algorithms for SeaWiFS. J. Geophys. Res. 1998, 103, 24937–24953.
  19. Morel, A.; Prieur, L. Analysis of variations in ocean color. Limnol. Oceanogr. 1977, 22, 709–722.
  20. Hu, C.; Lee, Z.; Franz, B. Chlorophyll a algorithms for oligotrophic oceans: A novel approach based on three-band reflectance difference. J. Geophys. Res. 2012, 117, C01011.
  21. International Ocean-Color Coordinating Group. Remote Sensing of Ocean Colour in Coastal, and Other Optically-Complex, Waters, Reports of the International Ocean-Color Coordinating Group; IOCCG Report Number 3; International Ocean-Color Coordinating Group: Dartmouth, NS, Canada, 2000.
  22. Cannizzaro, J.P.; Carder, K.L. Estimating chlorophyll a concentrations from remote-sensing reflectance in optically shallow waters. Remote Sens. Environ. 2006, 101, 13–24.
  23. Nechad, B.; Ruddick, K.; Schroeder, T.; Oubelkheir, K.; Blondeau-Patissier, D.; Cherukuru, N.; Brando, V.; Dekker, A.; Clementson, L.; Banks, A.C.; et al. CoastColour Round Robin datasets: A database to evaluate the performance of algorithms for the retrieval of water quality parameters in coastal waters. Earth Syst. Sci. Data (ESSD) 2015, 8, 173–258.
  24. Ha, N.T.T.; Koike, K.; Nhuan, M.T. Improved accuracy of chlorophyll-a concentration estimates from MODIS imagery using a two-band ratio algorithm and geostatistics: As applied to the monitoring of eutrophication processes over Tien Yen Bay (Northern Vietnam). Remote Sens. 2014, 6, 421–442.
  25. Samli, R.; Sivri, N.; Sevgen, S.; Kiremitci, V.Z. Applying artificial neural networks for the estimation of chlorophyll-a concentrations along the Istanbul coast. Pol. J. Environ. Stud. 2014, 23, 1281–1287.
  26. Zhan, H. Application of support vector machines in inverse problems in ocean color remote sensing. Stud. Fuzziness Soft Comput. 2005, 177, 387–398.
  27. Camps-Valls, G.; Bruzzone, L.; Rojo-Alvarez, J.L.; Melgani, F. Robust Support Vector Regression for biophysical variable estimation from remotely sensed images. IEEE Geosci. Remote Sens. Lett. 2006, 3, 1–5.
  28. Camps-Valls, G.; Gómez-Chova, L.; Muñoz-Marí, J.; Vila-Francés, J.; Amorós-López, J.; Calpe-Maravilla, J. Retrieval of oceanic chlorophyll concentration with relevance vector machines. Remote Sens. Environ. 2006, 105, 23–33.
  29. Dupouy, C.; Neveux, J.; Ouillon, S.; Frouin, R.; Murakami, H.; Hochard, S.; Dirberg, G. Inherent optical properties and satellite retrieval of chlorophyll concentration in the lagoon and open waters of New Caledonia. Mar. Pollut. Bull. 2010, 61, 503–518.
  30. Dupouy, C.; Wattelez, G.; Fuchs, R.; Lefèvre, J.; Mangeas, M.; Murakami, H.; Frouin, R. The Colour of the Coral Sea. In The Future of the Coral Sea Reefs and Sea Mounts, Proceedings of the 12th International Coral Reef Symposium, Cairns, Australia, 9–13 July 2012.
  31. Ouillon, S.; Douillet, P.; Petrenko, A.; Neveux, J.; Dupouy, C.; Froidefond, J.M.; Andrefouet, S.; Muñoz-Caravaca, A. Optical algorithms at satellite wavelengths for total suspended matter in tropical coastal waters. Sensors 2008, 8, 4165–4185.
  32. Dekker, A.G.; Phinn, S.R.; Anstee, J.; Bissett, P.; Brando, V.E.; Casey, B.; Fearns, P.; Hedley, J.; Klonowski, W.; Lee, Z.P.; et al. Intercomparison of shallow water bathymetry, hydro-optics, and benthos mapping techniques in Australian and Caribbean coastal environments. Limnol. Oceanogr. Methods 2011, 9, 396–425.
  33. Murakami, H.; Dupouy, C. Atmospheric correction and inherent optical property estimation in the southwest New Caledonia lagoon using AVNIR-2 high-resolution data. Appl. Opt. 2013, 52, 182–198.
  34. Minghelli-Roman, A.; Dupouy, C. Influence of water column chlorophyll concentration on bathymetric estimations in the lagoon of New Caledonia, using several MERIS images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2013, 6, 739–745.
  35. Minghelli-Roman, A.; Dupouy, C. Correction of the Water Column Attenuation: Application to the Seabed Mapping of the lagoon of New Caledonia using MERIS images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 2617–2629.
  36. Gohin, F.; Druon, J.N.; Lampert, L. A five channel chlorophyll concentration algorithm applied to SeaWiFS data processed by SeaDAS in coastal waters. Int. J. Remote Sens. 2002, 23, 1639–1661.
  37. Katlane, R.; Dupouy, C.; Zargouni, F. Chlorophyll and turbidity concentrations as an index of water quality of the Gulf of Gabes from MODIS in 2009. Teledetection 2012, 11, 265–273.
  38. Dupouy, C.; Savranski, T.; Lefèvre, J.; Despinoy, M.; Mangeas, M.; Fuchs, R.; Faure, V.; Ouillon, S.; Petit, M. Monitoring optical properties of the Southwest Tropical Pacific. In Proceedings of the Remote Sensing of the Coastal Ocean, Land, and Atmosphere Environment, Incheon, Korea, 4 November 2010; Frouin, R.J., Yoo, H.R., Won, J.-S., Feng, A., Eds.; Volume 7858.
  39. Wattelez, G.; Dupouy, C.; Mangeas, M.; Lefèvre, J.; Touraivane, T.; Frouin, R.J. A statistical algorithm for estimating chlorophyll concentration from MODIS data. In Proceedings of the Ocean Remote Sensing and Monitoring from Space, Beijing, China, 13–17 October 2014; Frouin, R.J., Pan, D., Murakami, H., Son, Y.B., Eds.; Volume 9261.
  40. SeaBASS. Available online: (accessed on 20 December 2015).
  41. Bailey, S.W.; Werdell, P.J. A multi-sensor approach for the on-orbit validation of ocean color satellite data products. Remote Sens. Environ. 2006, 102, 12–23.
  42. Lefèvre, J. The VALHYSAT Project: MODIS-DB Database: Description Guide of the Database; Valhysat Report 1; IRD Internal Report: Noumea, New Caledonia, 2010.
  43. Savranski, T. Télédétection de la chlorophylle de surface dans un système lagonaire tropical: Validation de données MODIS couleur de l'eau du lagon Sud-Ouest de Nouvelle-Calédonie, Rapport de stage Master 2 Professionnel: Surveillance et Gestion de l'Environnement (direction de C. Dupouy); MSc Report: University of Toulouse, Toulouse, France, 2010.
  44. Matarrese, R.; Chiaradia, M.T.; Tijani, K.; Morea, A.; Carlucci, R. Chlorophyll a multi-temporal analysis in coastal waters with MODIS data. Ital. J. Remote Sens. 2011, 43, 39–48.
  45. Kahru, M.; Kudela, R.M.; Anderson, C.R.; Manzano-Sarabia, M.; Mitchell, B.G. Evaluation of Satellite Retrievals of Ocean Chlorophyll-a in the California Current. Remote Sens. 2014, 6, 8524–8540.
  46. Ouillon, S.; Douillet, P.; Lefebvre, J.P.; le Gendre, R.; Jouon, A.; Bonneton, P.; Fernandez, J.M.; Chevillon, C.; Magand, O.; Lefèvre, J.; et al. Circulation and suspended sediment transport in a coral reef lagoon: The south-west lagoon of New Caledonia. Mar. Pollut. Bull. 2010, 61, 269–296.
Remote Sens. EISSN 2072-4292 Published by MDPI AG, Basel, Switzerland