Deriving Particulate Organic Carbon in Coastal Waters from Remote Sensing: Inter-Comparison Exercise and Development of a Maximum Band-Ratio Approach

Recently, different algorithms have been developed to assess near-surface particulate organic matter (POC) concentration over coastal waters. In this study, we gathered an extensive in situ dataset representing various contrasted bio-optical coastal environments at low, medium, and high latitudes, with various bulk particulate matter chemical compositions (mineral-dominated, 50% of the data set, mixed, 40%, or organic-dominated, 10%). The dataset includes 606 coincident measurements of POC concentration and remote-sensing reflectance, Rrs, with POC concentrations covering three orders of magnitude. Twelve existing algorithms have then been tested on this data set, and a new one was proposed. The results show that the performance of historical algorithms depends on the type of water, with an overall low performance observed for mineral-dominated waters. Furthermore, none of the tested algorithms provided satisfactory results over the whole POC range. A novel approach was thus developed based on a maximum band ratio of Rrs (red/blue, red/yellow or red/green ratio). Based on the standard statistical metric for the evaluation of inverse models, the new algorithm presents the best performance. The root-mean square deviation for log-transformed data (RMSDlog) is 0.25. The mean absolute percentage difference (MAPD) is 37.48%. The mean bias (MB) and median ratio (MR) values are 0.54 μg L−1 and 1.02, respectively. This algorithm replicates quite well the distribution of in situ data. The new algorithm was also tested on a matchup dataset gathering 154 coincident MERIS (MEdium Resolution Imaging Spectrometer) Rrs and in situ POC concentration sampled along the French coast. The matchup analysis showed that the performance of the new algorithm is satisfactory (RMSDlog = 0.24, MAPD = 34.16%, MR = 0.92). A regional illustration of the model performance for the Louisiana continental shelf shows that monthly mean POC concentrations derived from MERIS with the new algorithm are consistent with those derived from the 2016 algorithm of Le et al. which was specifically developed for this region.


Introduction
Carbon is unevenly distributed in the biosphere among three major reservoirs: atmospheric, oceanic, and terrestrial (on land in vegetation, soils and freshwaters). Knowledge of the carbon exchanges between the different reservoirs is a key issue for better understanding the carbon cycle in the biosphere [1,2]. Although coastal waters represent only 7% of the oceanic surface area, this domain is of a particular interest, being an active exchange zone between the terrestrial and oceanic reservoirs. A significant amount of terrestrial carbon is transported from soils to river headstreams. A fraction of this carbon returns to the atmosphere via degassing occurring in inland waters, a fraction is stored in freshwater organic sediments, and the remaining amount is delivered by estuaries to the coastal waters as dissolved inorganic carbon (DIC), dissolved organic carbon (DOC) and particulate organic (POC) and inorganic (PIC) carbon [3][4][5]. Besides this pool of terrestrial carbon origin, marine coastal ecosystems are also productive areas where dissolved and particulate organic carbon may be locally produced and degraded due to complex biogeochemical and physical processes. For a better understanding of the role of coastal waters in the carbon cycle, the spatiotemporal evolution of the different carbon compartment (POC, DOC, PIC, DIC) and their respective standing stocks, must be evaluated and analyzed [6].
POC is one of the main pools of ocean organic carbon, which is composed of living material (heterotrophic bacteria, phytoplankton, zooplankton) and detritus (i.e., non-living cells). Despite its relatively small stock in open ocean waters, its high turnover rate makes POC a central component of the oceanic carbon cycle. Knowledge of the POC concentration distribution and dynamics is indeed a key parameter to study the biological export of carbon from the surface to the deep ocean but also the transfer of carbon throughout the marine food web. Indeed, being the first level of the trophic chain, the production of organic matter (and organic carbon) by phytoplankton supports higher trophic levels and marine diversity. Thus, the amount of POC is an indicator of productivity in the euphotic zone, and can also be used as an indicator of pollution events in coastal areas impacted by human activities.
Over the past few years, many bio-optical algorithms have been developed to derive the POC concentration at both the surface [7][8][9][10][11][12][13][14] and within the euphotic [15] or mixed oceanic layers [12,16] from ocean-color radiometry (OCR). These algorithms were developed for open waters and rely on the fact that the variability of the inherent optical properties is driven by phytoplankton and its associated material (heterotrophic bacteria, detritus, colored dissolved organic matter (CDOM)). The performance of different available POC algorithms for oceanic waters has recently been evaluated [17] showing that empirical approaches based on band ratios [11] and semi-analytical approaches based on the back scattering coefficient (b bp ) and chlorophyll-a concentration (Chla) [8] performed the best. While the application of these algorithms to OCR observations allowed the pool of POC over the open ocean to be estimated (about 0.4 and 1.2 Pg. C in the first attenuation and euphotic layers, respectively), [15], such information is still not available for global coastal waters, which are more complex bio-optical environments [18]. To overcome this limitation on our understanding of the POC dynamics, some purely empirical approaches were recently developed from in situ measurements performed in offshore and coastal waters [19][20][21][22] or exclusively from measurements collected in coastal waters (mainly in river-dominated systems) [23] to estimate the surface POC concentration from OCR. However, these algorithms were almost all developed from limited datasets gathered in specific regions. This dictates that the results and performance of these approaches at a global scale may be strongly conditioned by the representativeness of the dataset used for their development. In other words, these algorithms may be not suitable to catch the large POC variability encountered in optically contrasted coastal areas.
This study emerged in this context and aimed at improving the retrieval of POC concentration for global scale applications in coastal waters (i.e., over large POC concentration range). For this purpose, a large number of in situ measurements performed at low, medium, and high latitudes were first gathered to constitute an extended dataset attesting to the high diversity of physical and biological coastal environments. Different existing algorithms built on different approaches and assumptions (DSV), which represent 67.8% (N = 411) and 32.2% (N = 195) of DSW, respectively. The ranges of R rs (490), R rs (555), and R rs (665) and proportion of mineral-dominated, organic-dominated, and mixed waters are quite similar for DSW, DSD, and DSV ( Figure 3). DSD and DSV cover about three orders of magnitude in terms of POC concentration ( Figure 4) being representative of the large natural variability of coastal environments.
Remote Sens. 2019, 11, x FOR PEER REVIEW 4 of 31 measurements of our dataset were sampled within mineral-dominated waters, 10% within organicdominated waters, and 40% within mixed waters (Figure 3a). The whole dataset (DSW, NPOC = N = 606) was randomly split (Monte Carlo method) into a development dataset (DSD) and a validation dataset (DSV), which represent 67.8% (N = 411) and 32.2% (N = 195) of DSW, respectively. The ranges of Rrs (490), Rrs (555), and Rrs (665) and proportion of mineral-dominated, organic-dominated, and mixed waters are quite similar for DSW, DSD, and DSV ( Figure 3). DSD and DSV cover about three orders of magnitude in terms of POC concentration ( Figure 4) being representative of the large natural variability of coastal environments.         Dotdashed lines represent the values, which delimit the POC/SPM ranges for the mineral-dominated, mixed, and organic-dominated waters according to [39]. The percentages between brackets indicate the percentage of in situ data for each type of water. Dot-dashed lines represent the values, which delimit the POC/SPM ranges for the mineral-dominated, mixed, and organic-dominated waters according to [39]. The percentages between brackets indicate the percentage of in situ data for each type of water.

Satellite-In Situ Matchup Data Base
The matchup dataset is composed by in situ measurements of POC collected in the frame of the French Coastal Monitoring Network (SOMLIT, Service d'Observation en Milieu LITtoral, http://somlit.epoc.u-bordeaux1.fr/fr/) nearly simultaneously with the overpass of the MERIS onboard the European Space Agency (ESA)'s Envisat platform. The SOMLIT network gathers several coastal stations along the French coastline (Eastern English Channel, Atlantic Ocean, and Mediterranean Sea) sampled bi-monthly since 1995 ( Figure 5). MERIS level 1 data were processed using the polymer atmospheric correction algorithm [41], which was adapted for coastal waters in the frame of the GlobCoast project funded by the French Research National Agency [42]. From daily MERIS images, a total of 336 in situ data points were extracted during the satellite lifetime (2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012). The criteria considered for the matchup selection were: (1) selection of a 3 × 3 pixel window centered on matchup location, (2) a time difference between in situ measurements and satellite overpass lower than 3 h (a time window of one hour also was tested with no significant differences in the results), (3) spatial homogeneity in the satellite pixels (coefficient of variation of Rrs(λ), ratio of the standard deviation to the mean computed over the pixel window, lower than 30%), and (4) number of valid pixels per pixel

Satellite-In Situ Matchup Data Base
The matchup dataset is composed by in situ measurements of POC collected in the frame of the French Coastal Monitoring Network (SOMLIT, Service d'Observation en Milieu LITtoral, http://somlit.epoc.u-bordeaux1.fr/fr/) nearly simultaneously with the overpass of the MERIS onboard the European Space Agency (ESA)'s Envisat platform. The SOMLIT network gathers several coastal stations along the French coastline (Eastern English Channel, Atlantic Ocean, and Mediterranean Sea) sampled bi-monthly since 1995 ( Figure 5). MERIS level 1 data were processed using the polymer atmospheric correction algorithm [41], which was adapted for coastal waters in the frame of the GlobCoast project funded by the French Research National Agency [42]. From daily MERIS images, a total of 336 in situ data points were extracted during the satellite lifetime (2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012). The criteria considered for the matchup selection were: (1) selection of a 3 × 3 pixel window centered on matchup location, (2) a time difference between in situ measurements and satellite overpass lower than 3 h (a time window of one hour also was tested with no significant differences in the results), (3) spatial homogeneity in the satellite pixels (coefficient of variation of R rs (λ), ratio of the standard deviation to the mean computed over the pixel window, lower than 30%), and (4) number of valid pixels per pixel window greater than 6. After the application of all these criteria, the final matchup dataset is then composed of 154 matched points (Table 2). For the matchup dataset, POC concentrations are between 26.58 and 658.2 µg L −1 , Chla ranges from 0.04 to 10.04 µg L −1 , and SPM from 80 to 20,270 µg L −1 , respectively ( Figure 6). According to the value of the in situ POC/SPM ratio, 38.31%, 15.58%, and 46.1% of matchups concern mineral-dominated, organic-dominated, and mixed waters, respectively. window greater than 6. After the application of all these criteria, the final matchup dataset is then composed of 154 matched points ( Table 2). For the matchup dataset, POC concentrations are between 26.58 and 658.2 μg L −1 , Chla ranges from 0.04 to 10.04 μg L −1 , and SPM from 80 to 20,270 μg L −1 , respectively ( Figure 6). According to the value of the in situ POC/SPM ratio, 38.31%, 15.58%, and 46.1% of matchups concern mineral-dominated, organic-dominated, and mixed waters, respectively.

Candidate Algorithms
Many bio-optical algorithms have been developed to derive the near-surface concentration of POC from satellite measurements [7][8][9][10][11][12][13][14] and an inter-comparison exercise was recently performed to test their respective performances [17]. These algorithms were developed for open ocean waters and rely on the dominance of phytoplankton biomass in the total POC concentration. More recently, some purely empirical approaches were developed from in situ measurements performed in offshore and coastal waters [19][20][21]22] or exclusively from measurements collected in coastal waters (mainly in river-dominated systems) [23]. These algorithms are all empirical because the use of a semi-analytical algorithm to estimate POC requires a perfect knowledge of the spectral specific POC-IOPs properties, which are not available as yet.
These latter coastal and offshore-coastal algorithms, which were almost all developed for specific regions (Table 3), were all considered for the inter-comparison exercise. In addition, we selected Stramski et al., (2008) algorithm that was implemented by the NASA Ocean Biology Processing Group to derive POC from MODIS at global scale. Because this algorithm is dedicated to open waters, the discussion on its performance is limited to organic-dominated waters. The selected models are representative of the different approaches found in the literature: purely empirical approaches using Rrs(λ) [23], Rrs band ratios [11,[19][20][21], or color index [22] as input parameters.

Candidate Algorithms
Many bio-optical algorithms have been developed to derive the near-surface concentration of POC from satellite measurements [7][8][9][10][11][12][13][14] and an inter-comparison exercise was recently performed to test their respective performances [17]. These algorithms were developed for open ocean waters and rely on the dominance of phytoplankton biomass in the total POC concentration. More recently, some purely empirical approaches were developed from in situ measurements performed in offshore and coastal waters [19][20][21][22] or exclusively from measurements collected in coastal waters (mainly in river-dominated systems) [23]. These algorithms are all empirical because the use of a semi-analytical algorithm to estimate POC requires a perfect knowledge of the spectral specific POC-IOPs properties, which are not available as yet.
These latter coastal and offshore-coastal algorithms, which were almost all developed for specific regions (Table 3), were all considered for the inter-comparison exercise. In addition, we selected Stramski et al., (2008) algorithm that was implemented by the NASA Ocean Biology Processing Group to derive POC from MODIS at global scale. Because this algorithm is dedicated to open waters, the discussion on its performance is limited to organic-dominated waters. The selected models are representative of the different approaches found in the literature: purely empirical approaches using R rs (λ) [23], R rs band ratios [11,[19][20][21], or color index [22] as input parameters. Table 3. Candidate algorithms used for the inter-comparison exercise. The four last columns provide relevant information on the algorithms: inputs of algorithm, region where the data were collected, the range of POC, and number of data used for the development of the algorithm.

Band Ratio-based Algorithms
S08-Stramski et al., 2008 [11] developed an empirical algorithm based on R rs (λ) ratios at two different wavelengths to derive POC concentration in open ocean waters. This algorithm was developed from in situ data (N = 53) collected within oligotrophic and upwelling waters of the eastern South Pacific Ocean (10 ≤ POC ≤ 270 µg L −1 ) during the Biosope campaign. This algorithm is based on an empirical power-law between near-surface POC and blue-to-green ratio of R rs (λ). Stramski et al. (2008) showed that the following two algorithms have the best performance for their dataset: These two algorithms will be named S08-1 (Equation (1)) and S08-2 (Equation (2)) in the following. W16-Woźniak et al. 2016 [21] proposed an empirical algorithm built from 73 in situ measurements collected during four field campaigns during spring and late summer within the open waters of the Southern Baltic and coastal regions of the Gulf of Gdańsk. POC measurements cover a wide range from 145 to 2370 µg L −1 . The authors tested correlations of POC concentrations with reflectance ratios at various spectral bands leading to the selection of the R rs (555)/R rs (589) ratio: A good performance was also obtained with the R rs (490)/R rs (625) ratio, which has benefits to consider the input R rs close to bands potentially available from satellite observations: Equations (3) and (4) will be further named W16-1 and W16-2, respectively. Note that the 589 nm spectral band is absent in all ocean color sensors, while the 625 nm band is only available for MERIS and OLCI (Ocean and Land Colour Instrument).
Hu16-Hu et al., 2016 [20] developed a regional algorithm using coastal and offshore data (N = 120) collected in the South China Sea mainly during the summer. The POC values range between 25.3 and 687.5 mg m −3 . Their algorithm is quite similar to the 2008 algorithm from Stramski et al. as it is based on a blue-to-green reflectance ratio. The authors examined the following band ratios: They showed that the algorithm described by Equation (5) provides the best performance. The Equations (5)-(7) will be further named Hu16-1, Hu16-2, and Hu16-3, respectively.
Liu15-Liu et al., 2015 [19] developed an algorithm for estimating surface POC concentration in the highly turbid waters of the Pearl River estuary in China. Their algorithm is based on band ratios of the equivalent reflectance computed for MODIS/AQUA (Moderate Resolution Imaging Spectro-radiometer) spectral bands (412, 443, 488, 531, 551, 667, 678 and 748 nm). The equivalent reflectance is defined by Equation (8): where r equi (λ) is the equivalent reflectance for a band with a central wavelength λ, f(λ) is the spectral response function, available from the Ocean Color website (http://oceancolor.gsfc.nasa.gov/), r(λ) is the in situ remote-sensing reflectance, L(λ) is the solar irradiance at mean Earth-Sun distance, λ min is equal to 350 nm, and λ max is 800 nm. The authors selected an approach based on two different band ratios, and used an optimization approach to determine the best band combination: After testing all possible band combinations, the algorithm achieved the best fit with λ 1 = 678 nm, λ 2 = 488 nm, λ 3 = 748 nm, and λ 4 = 412 nm. The regression coefficients are b 0 = 0.0078, b 1 = 1.3973, and b 2 = −1.2397. Equation (9) is named Liu15. It was shown that the R rs (λ) values at the central bands are very similar to the equivalent reflectance values calculated using the band spectral response at all visible bands of the different OCR [36]. For this reason, the central remote-sensing reflectance at 678, 488, 748, and 412 nm are used instead of the equivalent reflectance in this inter-comparison exercise.
It is worth noticing that the two-step algorithm developed by Woźniak et al. in 2016 [21], based on the particulate absorption (a p ) coefficient at 570 and 675 and the concentration of SPM, was not tested because of the complexity in retrieving a p in the green and in the red from OCR [43].

Absolute R rs -based Algorithms
Le16-Using a dataset of in situ POC concentrations matched with satellite R rs (λ), Le et al., 2016 [23] developed multiple regression algorithms for two river-dominated estuaries in the northern Gulf of Mexico (the Louisiana Continental Shelf and Mobile Bay). The multiple linear regression models which showed the lowest prediction error between log(POC) (log-transformed base 10) and R rs (λ) are given in Equation (10) (named Le16-1) and in Equation (11)

Color Index Algorithm
Le18-In 2018, Le et al., [22] formulated two algorithms using in situ POC data from the SeaWiFS Bio-optical Archive and Storage System (SeaBASS) of NASA (https://seabass.gsfc.nasa.gov) and satellite R rs (λ) data obtained from matchups. This approach uses three spectral bands centered at 490 nm, 550 nm, and 670 nm to determine a color index (CI POC , Equation (12)), from which the POC concentration is estimated (Equations (13) and (14)). This approach is named Le18-1.
Le et al. formulated another algorithm based on the blue-to-green ratio (Equations (15) and (16)). The color index (Equation (12)) is used to pick up a suitable relationship. Indeed, the authors indicated that Equation (15)

Statistical Indicators Used for Model Development and Validation
To assess the performance of the 12 selected algorithms, we use the graphical comparison of model predictions and observations as well as quantitative statistical metrics of differences between the corresponding predictions of model and observations. These indicators include the root mean square deviation for log-transformed data (RMSD log ) (Equation (17)), the root mean square Deviation for un-transformed data (RMSD) (Equation (18)), and the median absolute percent difference (MAPD) (Equation (19)):  (20)) and the median ratio (MR) (Equation (21)) are also calculated: The statistical indicators obtained for the 12 algorithms are normalized to be compared graphically. For that purpose, the normalized statistical indicators for each algorithm are defined as follows: where j identifies the algorithm (j ="S08-1","S08-2", "W16-1", "W16-2", "Hu16-1", "Hu16-2", "Hu16-3", "Liu-15", "Le16-1", Le16-2", "Le18-1", "Le18-2") and k is the number of tested algorithms. Radar charts are also used to compare the performance of the algorithms. A radar chart is a graphical method displaying multi parameters in the form of a two-dimensional chart. The radar plots in used in the present study give an overview of the statistical indicators by displaying the normalized MAPD, RMSDlog, MB and MR (Equations (22)-(25)). For the development of the new model, we also use a half-matrix representation of the determination coefficient (R 2 ) calculated between POC and R rs (λi)/R rs (λj) as it does not matter which R rs (λ) is taken as a numerator or denominator.

Development of a New Algorithm for POC
As explained in Section 2.3.1, empirical algorithms based on band ratios at two different wavelengths were first developed for open waters. More recently, they were adjusted to coastal waters but from limited in situ datasets in terms of the number of samples or geographical distribution ( Table 3). The objective of this section is to define a band ratio algorithm for coastal waters using the large DSD dataset. The use of band ratios allows the influence of potential errors due to atmospheric correction being minimized [21].
The performance of different color ratios is tested considering only the hyperspectral R rs (λ) measurements in DSD (N=298, representing 72.5% of DSD). For each spectrum, R rs (λ) in the visible part (400-700 nm) is measured at 300 different wavelengths (R rs (λ i )). Mathematically, the different R rs (λ) ratios, defined as R rs (λ i )/R rs (λ j ) where i j, correspond to k-combinations (R rs (λ i ), R rs (λ j )) of the set composed of DSD hyperspectral R rs (λ). The number of k-combinations is equal to 44,850. It corresponds to the binomial coefficient calculated using the factorials according to: where k = 2 and n = 300. So, 44,850 linear type II regressions were computed between POC and the different R rs (λ i )/R rs (λ j ) ratios. Both POC concentration and R rs (λ i )/R rs (λ j ) were log-transformed to base 10. Figure 7 summarizes through a half matrix representation the value of the coefficient of determination (R 2 ) for the 44,850 regressions. The highest R 2 values (about 0.68) are obtained for band ratios R rs (λ i )/R rs (λ j ) with λ i ranging from 675 to 695 nm and λ j ranging from 490 and 590 nm. It corresponds to red-to-blue, red-to-green, or red-to-yellow ratios. However, as the spectral region around 680 nm corresponds to the maximum of chlorophyll fluorescence [44,45], the R rs (λ) signal may be contaminated by light emission, which may bias the POC retrieval. Figure 7 shows that R rs (λ) between 660-670 nm and R rs (λ) between 490-560 are "statistically promising" spectral band combinations. The advantage is that many ocean color sensors have bands in the corresponding spectral region. These results are in agreement with those of Woźniak et al. in 2016, who obtained good performance for a blue-to-red ratio (R rs (490)/R rs (625) (Equation (4)). As MERIS data will be used in the matchup exercise, a focus is performed on bands for that sensor, selecting the following red-to-green and red-to-blue band ratios: R rs (665)/R rs (555), R rs (665)/R rs (510), R rs (665)/R rs (490) ( Table 4 and Figure 8). Note that these bands are also available on OLCI, SeaWiFS, and VIIRS (Visible Infrared Imaging Radiometer Suite). Because of the slightly different statistics (in terms of R 2 and RMSD) observed between POC and the different latter band ratios (see Table 4), a linear (and polynomial) type II regression (log-transformed data) based on the maximum band ratio (MBR) are also examined ( Figure 9). Considering both statistical and graphical criteria, the linear type II regression based on MBR presents rather better performance with linear type II regression based on a single band ratio. The coefficient of determination is a bit higher (R 2 = 0.67 instead of R 2 between 0.59 and 0.66) and the RMSD log is a bit lower (RMSD log = 0.242 instead of RMSD log between 0.246 and 0.267) ( Table 4). As already discussed by [46] for Chla estimates from OC4 algorithm, the MBR allows to switch from a given band ratio to another, thereby avoiding, in some cases, a low and potentially noisy band ratio. Thus, in the context of satellite applications, it is expected that using MBR instead of a single band ratio allow maximization of the model precision over the entire range of POC. Among the three band ratios, R rs (665)/R rs (490) and R rs (665)/R rs (555) are maximal for POC above 500 µg L −1 and POC below 500 µg L −1 , respectively ( Figure 10). The latter pattern can be explained by the fact that, over their broad range of variability, SPM and POC tend, at first order, to co-vary in coastal waters (R 2 = 0.63 on DSW). For instance, high SPM values increase R rs in the red, while associated high POC values will increase absorption in the blue-green part of the spectrum, hence decreasing R rs (490). Although the R rs (665)/R rs (510) ratio is maximum for only 4 data points over the present dataset, this ratio is more frequently selected as the MBR over the MERIS coastal archive (not shown here). As observed for OC4 [46,47], there is an overlap, over the POC range, in the bands selected for the MBR definition, so there is a smooth transition for MBR around 0.2, which corresponds to POC concentrations between 100 and 500 µg L −1 (Figure 10). Second and third order polynomial regressions, which theoretically allow for a better account of specific spectral features, were also tested. Statistical results obtained for the second-order polynomial are similar to those obtained for the type II linear regression (Figure 9, Table 4) while those for the third polynomial regression do not show any improvements (not shown). Therefore, only the linear (Equation (27)) and second-order polynomial (Equations (28)-(29)) fits based on the maximum band ratio, named CPOC-1st and CPOC-2nd (Coastal POC), will be evaluated in the following section.   The performance of different color ratios is tested considering only the hyperspectral Rrs(λ) measurements in DSD (N=298, representing 72.5% of DSD). For each spectrum, Rrs(λ) in the visible part (400-700 nm) is measured at 300 different wavelengths (Rrs(λi)). Mathematically, the different Rrs(λ) ratios, defined as Rrs(λi)/Rrs(λj) where i ≠ j, correspond to k-combinations (Rrs(λi), Rrs(λj)) of the set composed of DSD hyperspectral Rrs(λ). The number of k-combinations is equal to 44,850. It corresponds to the binomial coefficient calculated using the factorials according to: where k = 2 and n = 300. So, 44,850 linear type II regressions were computed between POC and the

Inter-comparison Exercise of Existing Algorithms
The inter-comparison exercise is carried out on DSV (N = 195). Because some algorithms (W16, Le16 and Liu15) require Rrs(λ) at wavelengths available from hyperspectral measurements only, multi-spectral measurements of DSV cannot be used for their evaluation. The W16-1, W16-2, Le16-2, and Liu15 algorithms are therefore tested using 150, 146, and 144 data points, respectively. The For open circles, it is R rs (665)/R rs (555), which is maximum, whereas for crosses and filled triangles, it is R rs (665)/R rs (490) and R rs (665)/R rs (510), respectively.

Inter-comparison Exercise of Existing Algorithms
The inter-comparison exercise is carried out on DSV (N = 195). Because some algorithms (W16, Le16 and Liu15) require R rs (λ) at wavelengths available from hyperspectral measurements only, multi-spectral measurements of DSV cannot be used for their evaluation. The W16-1, W16-2, Le16-2, and Liu15 algorithms are therefore tested using 150, 146, and 144 data points, respectively. The number of used hyperspectral data changes based on the availability of the bands required by each algorithm. DSV was however not restricted to hyperspectral data to allow a wide representativeness of this validation dataset in terms of geographical distribution as well as biogeochemical and optical variability. The inter-comparison exercise is realized in two steps. First, the performance of the algorithms was assessed using hyperspectral and multispectral data (when possible) to cover a wide range of variability. Second, the inter-comparison exercise was performed on a consistent number of hyperspectral data (N = 144) to observe if the number of data impacts the ranking of the algorithms' performance. Figure 11 shows POC concentration derived from the different algorithms described in Section 2.3 and Table 3 against in situ POC measurements. Figure 12 compares the histograms of the frequency distribution of in situ and model-derived POC concentrations. Relevant statistical metrics are summarized in Table 5. The Le18-1 algorithm presents the highest MAPD, MB, RMSD log and MR values, and a much wider POC distribution compared to the in-situ ones (Figures 11f and 12f). For the organic-dominated waters, Le18-1 performs quite well as data points (green dots) are distributed along the 1:1 line. However, for mixed or organic-dominated waters, the variability in POC concentration is not reproduced and modeled POC values can be 100 times higher than in situ ones. In view of these results, this algorithm was excluded from further steps of the inter-comparison exercise. The statistical metrics (Table 5) provide a range of values among the 11 remaining algorithms for which the MAPD values vary between 38.83% and 64.40%, RMSD log between 0.27 and 0.44, MR between 1.04 and 1.59, and MB between −104.78 and 1237 µg L −1 . Note that the Liu15 algorithm generates 10 negative POC values, which were not considered within the log-scale statistics may be leading to an artificial increase in the algorithm performance. The radar plot in Figure 13 gives an overview of the statistical indicators by displaying the normalized MAPD, RMSD log , MB and MR (Equations (20)(21)(22)(23)). The best performance, related to the smallest area in the normalized radar diagram (Figure 13), is obtained for the Hu15-3 algorithm. The Hu15-3 presents the smallest MAPD (38.87%), the smallest RMSD log (0.27), and an MR close to one (1.11) ( Table 5). The Hu15-3 algorithm provides relatively good performance for organic-dominated (green dots close to the 1:1 line in Figure 11c). Data points sampled in mixed waters are scattered but follow the 1:1 line. However, for mineral-dominated waters, the Hu15-3 algorithm tends to underestimate high concentrations (POC > 1000 µg L −1 ) (Figure 11c and 12c) and overestimates POC concentrations lower than 1000 µg L −1 (blue dots in Figure 11c). This results in the the regression slope between in situ and derived POC being lower than 1 (= 0.5). Similar observations can be made for the Hu15-2 and Hu15-1 algorithms.
The S08-1 and S08-2 algorithms provide accurate estimates at low concentrations for organic-dominated and mixed waters (around 100-200 µg L −1 ), as it was developed mainly from data collected in open ocean waters. The performances of these algorithms are degraded at high concentrations, and for mineral dominated waters (blue dots in Figure 11h,i). Two Biosope data points in DSV were used previously by Stramski et al. (2008) to establish the S08-1 and S08-2 relationships and may artificially increase the performance of the S08-1 and S08-2. The performances of the S08 algorithms however do not change when removing these Biosope data points. The performance of the L18-2 algorithm is quite similar to the performance of the S08-1, 2 (Figure 11g, Tables 5 and A1). This was expected as Le18-2, S08-1 and S08-2 are based on a blue-to-green ratio.
Woźniak's algorithm W16-1 presents the best linear fit with a slope of 1.1. Both W16-1 and W16-2 algorithms overestimate POC concentration in mineral-dominated waters and underestimate POC concentration in organic-dominated and mixed waters (Figure 11k,l). For the W16-2 algorithm, data are less scattered and more accurate estimates of POC are achieved, especially for POC concentrations around 100 mg m −3 . This results in MAPD and RMSD log being 61.32% and 0.337 for W16-2, respectively, against 64.08% and 0.427 for W16-1. The histograms in Figure 12k,l show that the shape of the frequency distribution of POC estimates and in situ data differ. The peak of frequency between 200 and 500 µg L −1 observed for the in-situ data is absent for modeled data.
Concerning the Liu15 algorithm, despite the fact that this algorithm was developed for highly turbid waters, it does not provide better POC retrievals over the mineral-dominated water dataset (Figure 11j and Table 5).
The second step of the inter-comparison exercise, performed on the same number of hyperspectral data (N = 144), does not change the respective statistical results for the different algorithms, and Hu15-3 remains the best algorithm (Appendix A).      (Table 5).

Performance of the New Algorithms
The radar plot in Figure 14 and Table 5 illustrate that the CPOC-1 st and CPOC-2 nd algorithms improve the overall performance as compared to Hu15-3 algorithm. The second-order polynomial relationship shows the best performance with a smaller MAPD and smaller MB than the Type II linear regression. The MAPD is 37.48%, the RMSDlog is 0.25, MB is = 0.54, MR is equal to 1.02, and the regression slope is around 0.78. Most of the POC values in waters identified as mineral-dominated are overestimated, whereas the POC values in organic-dominated waters are underestimated ( Figure  15). We tried to develop a specific second-order polynomial regression according to the type of waters to adjust the POC concentration from Equation (28) according to the POC/SPM ratio with SPM concentration from Han et al., 2016. Results are not shown as it does not improve the accuracy of the estimates. As no specific regional pattern has been noticed on the validation results (not shown), the performance of the algorithm is more related to the chemical nature of the bulk suspended particulate matter, as opposed to any regional aspects. The shape of the POC distribution of model-derived and in situ data are quite similar (Figure 15b and Figure 16b). Nevertheless, the maximum of occurrence is slightly shifted towards higher POC values. This results in the median calculated for modeled data being a bit higher (425 μg L −1 instead of 391 μg L −1 ), whereas the mean values are equal (=569 μg L −1 ). Figure 13. Comparison of the statistical performance of the eleven algorithms. The algorithms were tested using hyperspectral and multispectral data. The number of data changes according the considered algorithm (Table 5).

Performance of the New Algorithms
The radar plot in Figure 14 and Table 5 illustrate that the CPOC-1 st and CPOC-2 nd algorithms improve the overall performance as compared to Hu15-3 algorithm. The second-order polynomial relationship shows the best performance with a smaller MAPD and smaller MB than the Type II linear regression. The MAPD is 37.48%, the RMSD log is 0.25, MB is = 0.54, MR is equal to 1.02, and the regression slope is around 0.78. Most of the POC values in waters identified as mineral-dominated are overestimated, whereas the POC values in organic-dominated waters are underestimated ( Figure 15). We tried to develop a specific second-order polynomial regression according to the type of waters to adjust the POC concentration from Equation (28) according to the POC/SPM ratio with SPM concentration from Han et al., 2016. Results are not shown as it does not improve the accuracy of the estimates. As no specific regional pattern has been noticed on the validation results (not shown), the performance of the algorithm is more related to the chemical nature of the bulk suspended particulate matter, as opposed to any regional aspects. The shape of the POC distribution of model-derived and in situ data are quite similar (Figures 15b and 16b). Nevertheless, the maximum of occurrence is slightly shifted towards higher POC values. This results in the median calculated for modeled data being a bit higher (425 µg L −1 instead of 391 µg L −1 ), whereas the mean values are equal (=569 µg L −1 ).  [43,48]. Note that overestimations and underestimations of POC values are only observed for two of the eight sampling stations (Banyuls-sur-Mer and Marseille). These differences can be due to uncertainties on in situ POC measurements as well as satellite remote-sensing reflectance (partly due to atmospheric corrections uncertainties). Concerning this latter aspect, we verified the retrieval accuracy of Rrs based on an extensive matchup exercise of 760 coincident data points collected from the AERONET-OC sites (Ocean Color component of the Aerosol Robotic Network). A bias of 3.4 × 10 −3 , 1.5 × 10 −3 , −1.9 × 10 −4 between in situ AERONET and MERIS Rrs(λ) at 490, 555 and 665 has been observed, respectively. Taking into account these bias values for the MERIS Rrs(λ) values over the SOMLIT matchup dataset, we showed that this correction only slightly modifies the POC estimates. Inaccuracies can also be explained by the fact that the range of POC concentration used in this matchup exercise is lower than the range of POC concentration of DSD. For instance, the median POC value for the SOMLIT in situ dataset is 108.4 μg L −1 , while it reaches 366.4 for DSD. The fact that the statistical values are better for the match-up data set than for DSV may be partly explained by the fact that the SOMLIT POC data set used for the match-up exercise gathers less mineral dominated waters (38 %) for which a slight over-estimation by CPOC-2nd has been observed, against 50% for DSV. The matchup analysis was conducted by applying CPOC-2nd on the satellite R rs (λ) measurements gathered in the match-up dataset described in Section 2.2. The matchup analysis shows that the CPOC-2nd algorithm, developed only on R rs (λ) and POC in situ data, is able to estimate satisfactorily the surface POC concentration from satellite observation over coastal waters. The histograms ( Figure 17 [43,48]. Note that overestimations and underestimations of POC values are only observed for two of the eight sampling stations (Banyuls-sur-Mer and Marseille). These differences can be due to uncertainties on in situ POC measurements as well as satellite remote-sensing reflectance (partly due to atmospheric corrections uncertainties). Concerning this latter aspect, we verified the retrieval accuracy of R rs based on an extensive matchup exercise of 760 coincident data points collected from the AERONET-OC sites (Ocean Color component of the Aerosol Robotic Network). A bias of 3.4 × 10 −3 , 1.5 × 10 −3 , −1.9 × 10 −4 between in situ AERONET and MERIS R rs (λ) at 490, 555 and 665 has been observed, respectively. Taking into account these bias values for the MERIS R rs (λ) values over the SOMLIT matchup dataset, we showed that this correction only slightly modifies the POC estimates. Inaccuracies can also be explained by the fact that the range of POC concentration used in this matchup exercise is lower than the range of POC concentration of DSD. For instance, the median POC value for the SOMLIT in situ dataset is 108.4 µg L −1 , while it reaches 366.4 for DSD. The fact that the statistical values are better for the match-up data set than for DSV may be partly explained by the fact that the SOMLIT POC data set used for the match-up exercise gathers less mineral dominated waters (38 %) for which a slight over-estimation by CPOC-2nd has been observed, against 50% for DSV. Remote Sens. 2019, 11, x FOR PEER REVIEW 23 of 31

Satellite POC Estimates for Coastal Regions
The present algorithm (CPOC-2nd) is applied to MERIS data over the Louisiana Continental Shelf, where the 2016 algorithm of Le et al. has been specifically developed. The MERIS monthly mean POC concentrations (June 2006) are displayed in Figure 18. The POC estimates were derived using the CPOC-2nd algorithm (Figure 18a) and the Le16-2 algorithm (Equation (11)), which is suitable for MERIS data (Figure 18b). The spatial patterns of the POC concentration derived using the CPOC-2nd mimics the spatial distribution observed using the Le16 algorithm. Figure 18a,b shows a strong gradient from the inner to outer shelves, which was observed by Le et al. from in situ measurements. A slight overestimation of POC by CPOC-2 nd compared to Le16-2 is observed when all pixels are taken into account (Figure 18c). However, by restricting only the comparison to coastal pixels (as defined by Rrs(665) > 0.0012 [49]) for which the CPOC-2nd has been developed, an excellent agreement between the two POC products is observed (Figure 18d-f). The two histograms are more similar than previously, and the medians are closer: 700.25 μg L −1 for Le16-2 and 627.50 μg L −1 for CPOC-2nd estimates. This pattern shows that part of the discrepancies between the Le16-2 and CPOC-2nd are related to open water situations for which the CPOC-2nd and Le16-2 are not suitable.

Satellite POC Estimates for Coastal Regions
The present algorithm (CPOC-2nd) is applied to MERIS data over the Louisiana Continental Shelf, where the 2016 algorithm of Le et al. has been specifically developed. The MERIS monthly mean POC concentrations (June 2006) are displayed in Figure 18. The POC estimates were derived using the CPOC-2nd algorithm (Figure 18a) and the Le16-2 algorithm (Equation (11)), which is suitable for MERIS data (Figure 18b). The spatial patterns of the POC concentration derived using the CPOC-2nd mimics the spatial distribution observed using the Le16 algorithm. Figure 18a,b shows a strong gradient from the inner to outer shelves, which was observed by Le et al. from in situ measurements. A slight overestimation of POC by CPOC-2 nd compared to Le16-2 is observed when all pixels are taken into account (Figure 18c). However, by restricting only the comparison to coastal pixels (as defined by R rs (665) > 0.0012 [49]) for which the CPOC-2nd has been developed, an excellent agreement between the two POC products is observed (Figure 18d-f). The two histograms are more similar than previously, and the medians are closer: 700.25 µg L −1 for Le16-2 and 627.50 µg L −1 for CPOC-2nd estimates. This pattern shows that part of the discrepancies between the Le16-2 and CPOC-2nd are related to open water situations for which the CPOC-2nd and Le16-2 are not suitable.  [49] and offshore pixels with Rrs(665) < 0.0012 (c,d). Density plots of POC as derived with the CPOC-2nd and Le16-2 algorithms for (c) all the pixels of the scene (d) only for pixels with Rrs(665) > 0.0012. Distribution of POC estimates with CPOC-2nd and Le16-2 algorithm for (e) all pixels of the scene (f) only for pixels with Rrs(665) > 0.0012. The black and red lines represent the median of POC estimates using the Le16-2 and CPOC-2nd algorithms, respectively.  [49] and offshore pixels with R rs (665) < 0.0012 (c,d). Density plots of POC as derived with the CPOC-2nd and Le16-2 algorithms for (c) all the pixels of the scene (d) only for pixels with R rs (665) > 0.0012. Distribution of POC estimates with CPOC-2nd and Le16-2 algorithm for (e) all pixels of the scene (f) only for pixels with R rs (665) > 0.0012. The black and red lines represent the median of POC estimates using the Le16-2 and CPOC-2nd algorithms, respectively.

Concluding Remarks
A variety of POC inversion algorithms based on different approaches were evaluated from an extensive dataset composed of coincident measurements of POC and R rs (λ), which were sampled in highly contrasting bio-optical coastal environments. While these existing algorithms perform relatively well over POC ranges for which they were developed, they present some lack of accuracy over a broad range of POC concentrations. With the objective of improving POC estimates at the global scale for coastal waters, a new empirical relationship was proposed based on a second-order polynomial regression using the maximum band ratio. The new algorithm (CPOC) shows relevant capacity to estimate POC concentrations on the in-situ validation dataset (MAPD = 37.48%, RMSD log = 0.25, MB = 0.54 µg L −1 , and MR = 1.02). Robust results were found when the algorithm was tested on the matchup dataset as illustrated by the consistency in the median values computed for the modeled POC (102.5 µg L −1 ) and the in situ POC datasets (108.4 µg L −1 ). While the use of a spectral band ratio to retrieve POC reduces the impact of atmospheric corrections, the latter continues to have an impact as the accuracy of R rs retrieval is spectrally dependent [50]. The new algorithm was applied to MERIS data in the Louisiana Continental Shelf in June 2006. The observed spatial and temporal (not shown) patterns were in good agreement with the patterns observed by Le et al. who developed an algorithm specifically for this region in 2006. The algorithm developed in this study performs consistently across the three types of water (mineral-dominated, mixed, and organic-dominated). However, the POC concentration in mineral-dominated waters tends to be overestimated, whereas POC concentration in organic-dominated waters tends to be underestimated. Second-order polynomial regressions specific to the type of waters to adjust the POC concentration according to the POC/SPM ratio were evaluated, but such formulations did not improve the estimates. From these results, several key points of development are highlighted as necessary to the development of greater knowledge pertaining to the composition of the particulate pool.
The use of POC/SPM ratio as a proxy for particulate composition according to the criteria of [39] should be re-examined. The criteria established by [39] to define the type of waters were fixed regarding the spectral shape of the absorption coefficient and the particle size distribution (PSD). This exercise was based on measurements sampled from optically contrasted conditions but in a restricted area (Imperial Beach, San Diego, Tijuana River watershed). It will be interesting to re-examine these criteria on a larger number of observations performed in different coastal regions to ensure their possible generalization at a global scale. However, in many cases, PSD is not available from the optical measurements and measurements of the particulate absorption spectra using benchtop spectrophotometers are not systematically performed. In this framework, the ratio of the particulate backscattering to scattering coefficient (b bp /b p ), measured by commercial optical backscattering sensors, such as WetLabs Eco-VSF and ECO-BB, could be a valuable parameter. Indeed, previous studies [37,51,52] have shown that b bp /b p could be an indicator of the amount of organic material relative to mineral particles. It would be interesting to re-conduct these studies on coincident POC, R rs (λ), SPM, and b bp /b p in situ measurements to define some criteria at the global scale and to better characterize which kind of particles dominate in a defined water mass. Table A1. Statistics obtained on the 144 hyperspectral data of DSV with hyperspectral R rs only. Multispectral data coming from CASES [36], Biosope [37] and Coastlooc [26,27] Figure A1. Comparison of the statistical performance of the eleven algorithms. The algorithms were tested using hyperspectral data only. The number of data (N = 144) is the same for the 11 algorithms (Table 5).