Assessment of Chlorophyll-a Algorithms Considering Different Trophic Statuses and Optimal Bands

Numerous algorithms have been proposed to retrieve chlorophyll-a concentrations in Case 2 waters; however, the retrieval accuracy is far from satisfactory. In this research, seven algorithms are assessed with different band combinations of multispectral and hyperspectral bands using linear (LN), quadratic polynomial (QP) and power (PW) regression approaches, resulting in altogether 43 algorithmic combinations. These algorithms are evaluated by using simulated and measured datasets to understand the strengths and limitations of these algorithms. Two simulated datasets comprising 500,000 reflectance spectra each, both based on wide ranges of inherent optical properties (IOPs), are generated for the calibration and validation stages. Results reveal that the regression approach (i.e., LN, QP, and PW) has more influence on the simulated dataset than on the measured one. The algorithms that incorporated linear regression provide the highest retrieval accuracy for the simulated dataset. Results from simulated datasets reveal that the 3-band (3b) algorithm that incorporate 665-nm and 680-nm bands and band tuning selection approach outperformed other algorithms with root mean square error (RMSE) of 15.87 mg·m−3, 16.25 mg·m−3, and 19.05 mg·m−3, respectively. The spatial distribution of the best performing algorithms, for various combinations of chlorophyll-a (Chla) and non-algal particles (NAP) concentrations, show that the 3b_tuning_QP and 3b_680_QP outperform other algorithms in terms of minimum RMSE frequency of 33.19% and 60.52%, respectively. However, the two algorithms failed to accurately retrieve Chla for many combinations of Chla and NAP, particularly for low Chla and NAP concentrations. In addition, the spatial distribution emphasizes that no single algorithm can provide outstanding accuracy for Chla retrieval and that multi-algorithms should be included to reduce the error. Comparing the results of the measured and simulated datasets reveal that the algorithms that incorporate the 665-nm band outperform other algorithms for measured dataset (RMSE = 36.84 mg·m−3), while algorithms that incorporate the band tuning approach provide the highest retrieval accuracy for the simulated dataset (RMSE = 25.05 mg·m−3).

Zhang et al. [47]). A dataset covers wide ranges of constituents is required to evaluate these algorithms. Thus, a comparison of these algorithms with a huge dataset that covers a broad range of trophic statuses is required, particularly when the optimal bands and regression model are considered. In this study, two simulated datasets, each comprising 500,000 reflectance spectra, were used together with a measured dataset to achieve the following objectives: (1) assess the performance of seven Chla retrieval algorithms over wide ranges of Chla, NAP and CDOM concentrations; (2) evaluate the effect of different band combinations and regression approaches on the Chla retrieval accuracy; and (3) provide a reference for algorithm selection based on the concentrations of Chla and NAP, which can be used as a base for developing a classification scheme.

Field and Laboratory Measurements
Sixteen field campaigns were conducted in Tokyo Bay (35 • 25 N; 139 • 47 E) from June 2010 to August 2013 ( Figure 1). Tokyo Bay is located in the center of Japan. Many rivers flow into the northern and western sides of the bay. Tokyo Bay has a surface area of 1500 km 2 , with a mean and maximum water depth of 40 and 70 m, respectively. Water samples and the corresponding water surface reflectance spectra were recorded in each field campaign. Water samples were collected at a depth of 0.5 m below the surface between 09:00 and 14:00 JST. The absorption and backscattering of water constituents were also measured in four campaigns (12 sites): October 2011 (4 sites), May 2012 (3 sites), June 2012 (4 sites) and August 2012 (1 site). Remote sensing reflectance (R rs (λ)) were estimated by measuring the in-water upwelling radiance (L u (0 − , λ)) and downwelling irradiance (E d (0 + , λ)) with three TriOS-RAMSES hyperspectral radiometers in the spectral range between 350 nm and 900 nm at a 2-nm spectral interval and a field-of-view of 7 • . Irradiance sensor was attached to the ship to measure the E d (0 + , λ), whereas the other two sensors were placed in the water for measuring L u (0 − , λ). Both E d (0 + , λ) and L u (0 − , λ) are acquired for a few minutes at each station and then the irregular spectra (i.e., based on the standard deviation) are removed before averaging the remaining spectra to obtain E d (0 + , λ) and L u (0 − , λ) [21]. The R rs (λ) shown in Figure 2 were calculated as follows: where L w (λ) denotes the water leaving radiance, ρ refers to the Fresnel reflectance of the air sea interface with a value of 0.025 and n represents the refractive index of seawater with a value of 1.34 [50]. Thus, a comparison of these algorithms with a huge dataset that covers a broad range of trophic statuses is required, particularly when the optimal bands and regression model are considered. In this study, two simulated datasets, each comprising 500,000 reflectance spectra, were used together with a measured dataset to achieve the following objectives: (1) assess the performance of seven Chla retrieval algorithms over wide ranges of Chla, NAP and CDOM concentrations; (2) evaluate the effect of different band combinations and regression approaches on the Chla retrieval accuracy; and (3) provide a reference for algorithm selection based on the concentrations of Chla and NAP, which can be used as a base for developing a classification scheme.

Field and Laboratory Measurements
Sixteen field campaigns were conducted in Tokyo Bay (35°25′ N; 139°47′ E) from June 2010 to August 2013 ( Figure 1). Tokyo Bay is located in the center of Japan. Many rivers flow into the northern and western sides of the bay. Tokyo Bay has a surface area of 1500 km 2 , with a mean and maximum water depth of 40 and 70 m, respectively. Water samples and the corresponding water surface reflectance spectra were recorded in each field campaign. Water samples were collected at a depth of 0.5 m below the surface between 09:00 and 14:00 JST. The absorption and backscattering of water constituents were also measured in four campaigns (12 sites): October 2011 (4 sites), May 2012 (3 sites), June 2012 (4 sites) and August 2012 (1 site). Remote sensing reflectance ( ( )) were estimated by measuring the in-water upwelling radiance ( (0 , )) and downwelling irradiance ( (0 , )) with three TriOS-RAMSES hyperspectral radiometers in the spectral range between 350 nm and 900 nm at a 2-nm spectral interval and a field-of-view of 7°. Irradiance sensor was attached to the ship to measure the (0 , ), whereas the other two sensors were placed in the water for measuring (0 , ). Both (0 , ) and (0 , ) are acquired for a few minutes at each station and then the irregular spectra (i.e., based on the standard deviation) are removed before averaging the remaining spectra to obtain (0 , ) and (0 , ) [21]. The ( ) shown in Figure 2 were calculated as follows: where ( ) denotes the water leaving radiance, refers to the Fresnel reflectance of the air sea interface with a value of 0.025 and represents the refractive index of seawater with a value of 1.34 [50].  The Chla concentrations were fluorometrically measured by using a Turner Designs 10-AU fluorometer. The fluorometer was calibrated using the method described by Mitchell et al [51]. Water samples of 20 mL were filtered through 25-mm Whatman GF/F filters with a 0.7-µm pore size. The filter was immediately soaked in 6 mL of N,N-dimethylformamide (DMF) and stored in the dark at 4 °C for 4 h to extract the pigments. At each station, the Chla measurements repeated three times and the Chla concentration was obtained by averaging the three values. There are several advantages of using DMF instead of acetone including: (1) pigment extraction is less sensitive to variation of temperature; and (2) shorter time is required for pigment extraction [52]. The total suspended solids (TSS), organic suspended solids (OSS), and inorganic suspended solids (ISS) were determined gravimetrically [53]. The absorption coefficients of the non-algal particles, colored dissolved organic matter, and phytoplankton were measured with the quantitative filter technique (QFT) approach [51] by using a UV/VIS spectrophotometer (JASCO, V550, Hachioji, Japan). The vertical profiles of the total backscattering coefficients ( ( )) were measured at six wavelengths (420, 442, 488, 510, 550 and 676 nm) with a HydroScat-6 (HOBI Labs). The backscattering coefficients for particles ( ( )) were estimated by subtracting the backscattering of pure water [36,54] from ( ) [55]. Table 1 summarizes the descriptive statistics of the optical water quality parameters.  The Chla concentrations were fluorometrically measured by using a Turner Designs 10-AU fluorometer. The fluorometer was calibrated using the method described by Mitchell et al [51]. Water samples of 20 mL were filtered through 25-mm Whatman GF/F filters with a 0.7-µm pore size. The filter was immediately soaked in 6 mL of N,N-dimethylformamide (DMF) and stored in the dark at 4 • C for 4 h to extract the pigments. At each station, the Chla measurements repeated three times and the Chla concentration was obtained by averaging the three values. There are several advantages of using DMF instead of acetone including: (1) pigment extraction is less sensitive to variation of temperature; and (2) shorter time is required for pigment extraction [52]. The total suspended solids (TSS), organic suspended solids (OSS), and inorganic suspended solids (ISS) were determined gravimetrically [53]. The absorption coefficients of the non-algal particles, colored dissolved organic matter, and phytoplankton were measured with the quantitative filter technique (QFT) approach [51] by using a UV/VIS spectrophotometer (JASCO, V550, Hachioji, Japan). The vertical profiles of the total backscattering coefficients (b b (λ)) were measured at six wavelengths (420, 442, 488, 510, 550 and 676 nm) with a HydroScat-6 (HOBI Labs). The backscattering coefficients for particles (b bp (λ)) were estimated by subtracting the backscattering of pure water [36,54] from b b (λ) [55]. Table 1 summarizes the descriptive statistics of the optical water quality parameters.

Bio-Optical Model
Gordon et al. [11] fitted the remote sensing reflectance just beneath the water surface, r rs (λ), with a polynomial function of inherent optical properties (IOPs), which can be simplified as where a(λ) and b b (λ) are the total absorption and total backscattering coefficients, respectively; f is the light field factor; and Q denotes the light distribution factor. Kirk [56] found that f is a function of the solar altitude, which is expressed as a linear function of the cosine of the zenith angle of the refracted photons µ o : The value of f /Q was 0.09 based on the Tokyo Bay in-situ measurements, which was consistent with the previously reported value from a highly turbid water body (i.e., Lake Taihu, China) [57]. Both values of a(λ) and are due to the collective contributions of pure water and optically active components within the water, which include three constituents: phytoplankton, NAP and CDOM. The total absorption a(λ) is expressed as where a w (λ) is the absorption coefficient of pure water [58]; and a ph (λ), a N AP (λ), and a CDOM (λ) are the absorption coefficients for phytoplankton, NAP, and CDOM, respectively ( Figure S1 in Supplementary Materials). The total backscattering coefficient b b (λ) can be described as where b b,w (λ) is the backscattering coefficient of pure water [36,54] and b b,p (λ) is the backscattering coefficient of particles. The radiance just above the surface was related to the radiance just below the surface by a factor of 0.544 [59]. Hence, the remote sensing reflectance just above the water surface, R rs (λ), can be calculated as

Simulating the Remote Sensing Reflectance
Numerous researchers have proposed many models to generate simulated R rs based on bio-optical models [38,60,61]. The main difference between these models is the specific absorption and backscattering parameters [62]. The phytoplankton absorption, a ph (λ), and chlorophyll concentration, [Chla], have a proportional relationship [63] that can be expressed as where a * ph (λ) refers to the specific absorption of phytoplankton. The specific IOPs for phytoplankton, NAP, and CDOM were computed by averaging their values for the 12 stations within Tokyo Bay. The absorption spectra of both NAP and CDOM have an exponential decay distribution with increasing wavelength and can be determined from their reference values at 440 nm as follows: [CDOM] refers to the CDOM absorption at 440 nm; a * N AP (λ) and a * CDOM (λ) are the specific absorption of NAP and CDOM at any wavelength, respectively; a * N AP (440) and a * CDOM (440) stand for the specific absorption of NAP and CDOM at 440 nm, respectively; and S N AP and S CDOM represent the decay slope coefficients of NAP and CDOM, respectively. The backscattering coefficient of particles, b b,p (λ), at different wavelengths was estimated by fitting the measured backscattering at the six wavelengths with an exponential function: where n stands for the spectral slope of b b,p (λ); b b,p (λ o ) refers to the backscattering at the reference band (λ o ); and λ o was assigned to 550 nm. The segregation of b b,p (λ) into the backscattering of phytoplankton (b b,ph (λ)) and non-algal particles (b b,N AP (λ)) was based on the assumption that their contributions to b b,p (λ) were proportionally correlated to their concentrations [38]. Separating NAP from the TSS with laboratory analysis is impossible [64], so the approach of Gons et al. [65] was employed to divide the TSS into NAP and phytoplankton suspended solids (PSS). First, a linear regression model was created a relationship between Chla and OSS at 70 stations of Tokyo Bay. This relationship showed that the 1.0 mg·m −3 Chla concentration was relatively equivalent to 0.0687·g m −3 TSS. Therefore, the NAP concentration was calculated by subtracting the PSS (=0.0687 × Chla) from the TSS. The contribution of chlorophyll fluorescence emission to the remote sensing reflectance was simulated by Gilerson et al. [63] for Case 2 waters. The fluorescence spectral distribution follows a Gaussian shape with a peak at 685 nm, a full-width half maximum of 25 nm and a standard deviation (σ) of 10.6 nm [66]. Thus, the chlorophyll fluorescence emission reflectance R rs, f l (λ) was modeled as where E d (685) refers to the downwelling irradiance just above the water surface at 685 nm with a value of 1.
The total reflectance was calculated as the sum of the reflectance that was estimated in Equations (5) and (10) [32]. Table 2 summarizes the IOPs values that used to simulate reflectance.

Candidate Chla Algorithms and Band Selection
During this research, seven algorithms (i.e., OC4E, two-band ratio, three-band algorithm, four-band algorithm, maximum chlorophyll index, normalized difference chlorophyll index, and synthetic chlorophyll index) were selected to assess Chla retrieval for highly turbid waters. The formulas of the seven algorithms are listed in Appendix A and a brief summary of these algorithms can be found in the Supplementary Materials (Section S1). The aforementioned algorithms are general expressions that can be applied to various band combinations of multispectral and hyperspectral data. All the possible band combinations for each algorithm were examined, resulting in 15 algorithms ( Table 3). The proposed bands were the same as MERIS's central bands. Although the MERIS mission ended in 2012, the Ocean and Land Colour Instrument (OLCI) sensor on board of the Sentinel-3 satellite was recently launched with similar configurations to MERIS and additional bands to improve atmospheric correction. The 665-nm and 680-nm bands were selected for three reasons. First, these two bands are present in most ocean color sensors (e.g., the spectral bands of the MEdium Resolution Imaging Spectrometer (MERIS) and MODerate Resolution Imaging Spectro-radiometer (MODIS) sensors); Second, these two wavelengths are interchangeable for Case 2 water algorithms; for instance, Le et al. [39] and Matsushita et al. [42] used the 680-nm and 665-nm bands with the MCI algorithms, respectively; Finally, these two bands are correlated with chlorophyll-a; as the 665-nm band is located near the red Chla absorption maximum in the spectral curves, while the 680-nm band is closer to the maximum fluorescence of Chla. The band tuning approach was applied to the three-and four-band algorithms to find the optimal band, as proposed by Sun et al. [69]. All 15 algorithms, except for OC4E, provided indicators that could be related to the measured constituents through the regression process. Researchers considered linear [37,70], polynomial [32,47,71,72] and power [10,73] regression approaches. Consequently, 14 out of the 15 proposed algorithms, excluding OC4E, were assessed with linear, quadratic polynomial and power regression approaches. Thus, this study proposed 43 algorithms to be evaluated. The linear, polynomial and power regressions were as follows: where Ind alg denotes the indicators from each algorithm; and a-c are the regression coefficients. The selected power regression varies from the standard power regression expression (Chla = a × (alg r ) b ) as this standard expression creates error for negative values of alg r . The proposed power model was also employed by Matsushita et al. [42].

Accuracy Assessment of the Algorithms
The algorithms' performance was evaluated using the root mean square error (RMSE) and the mean absolute relative error (MARE).
where X i,input and X i,retr denote the input (i.e., measured or reference concentrations) and retrieved Chla concentrations, respectively; and N is the total number of samples. The log-based error (Error log ) was used as proposed by Zhu et al. [60] to investigate the error of the Chla concentration.

Simulated Reflectance Assessment
Given the lack of extensive field data covering different tropic status, a simulated reflectance dataset is required to assess the proposed algorithms for waters that range from low to high turbidity. Bio-optical modeling is used to connect remote sensing reflectance with Chla, NAP, and CDOM concentrations. Changing any of these three constituents will change the remote sensing reflectance. Remote sensing reflectance spectra were generated based on a bio-optical model. Figure 3 illustrates a comparison between the measured and simulated reflectance spectra at two stations, whereas Figure S2 in Supplementary Materials shows the comparison of the Tokyo Bay's twelve stations that have IOPs measurements. The simulated reflectance spectra for these two stations were generated using the bio-optical model and average value of IOPs measured in Tokyo Bay for these two stations ( Table 2). This comparison revealed that the simulated reflectance spectra were similar to the measured reflectance in terms of both the spectral shape and spectral magnitude, especially in the red-NIR region, which is mainly used to substitute for Case 2 water algorithms.
In order to generate the simulated reflectance datasets covering a wide range of constituents, the concentrations of Chla (hereafter called the reference Chla), NAP, and CDOM (the CDOM concentration refers to the absorption of CDOM at 440 nm) were changed in ranges of 1-200 (mg·m −3 ), 1-200 (g·m −3 ) and 0.1-10 (m −1 ), respectively. Increments of 2 mg·m −3 , 2 g·m −3 and 0.2 m −1 were used for Chla, NAP and CDOM, respectively. The simulated reflectance was generated using the bio-optical model explained in Section 2.2 (Equations (3)-(11)). The values of SIOPs were randomly changed during the simulation process ( Table 2, wide range). Specific phytoplankton absorption used to simulate reflectance was taken from Ciotti [74] with different weighting factors (0.1 ≤ Sf ≤ 0.5) range for microplankton and picoplankton. Two simulated datasets were generated with 500,000 reflectance spectra (i.e., 100 × 50 × 100) for each dataset. These two datasets were used for calibration and validation stages.  (Figure 4b). Based on these evaluations, the simulated reflectance could be a reliable dataset to assess the performance of Chla algorithms over wide ranges of Chla, NAP and CDOM concentrations.
during the simulation process (Table 2, wide range). Specific phytoplankton absorption used to simulate reflectance was taken from Ciotti [74] with different weighting factors (0.1 ≤ Sf ≤ 0.5) range for microplankton and picoplankton. Two simulated datasets were generated with 500,000 reflectance spectra (i.e., 100 × 50 × 100) for each dataset. These two datasets were used for calibration and validation stages. Figure 4 demonstrates the simulated reflectance for different Chla concentrations of 1-200 mg m −3 and CDOM concentration of 5.1 m −1 for two groups of data, while the NAP was 5.1 g m −3 (Figure 4a) and 61 g·m −3 (Figure 4b). Based on these evaluations, the simulated reflectance could be a reliable dataset to assess the performance of Chla algorithms over wide ranges of Chla, NAP and CDOM concentrations.   simulate reflectance was taken from Ciotti [74] with different weighting factors (0.1 ≤ Sf ≤ 0.5) range for microplankton and picoplankton. Two simulated datasets were generated with 500,000 reflectance spectra (i.e., 100 × 50 × 100) for each dataset. These two datasets were used for calibration and validation stages. Figure 4 demonstrates the simulated reflectance for different Chla concentrations of 1-200 mg m −3 and CDOM concentration of 5.1 m −1 for two groups of data, while the NAP was 5.1 g m −3 (Figure 4a) and 61 g·m −3 (Figure 4b). Based on these evaluations, the simulated reflectance could be a reliable dataset to assess the performance of Chla algorithms over wide ranges of Chla, NAP and CDOM concentrations.   During the calibration stage, algorithms' indicators were estimated using reflectance spectra from one of the generated simulated datasets (hereafter, calibration dataset). The algorithms' indicators were then correlated with the reference Chla of the calibration dataset to obtain the regression coefficients of linear, quadratic polynomial and power regression approaches (Table 4). Figure 5 shows scatterplots of reference Chla versus algorithms' indicators. The 4b_tuning, 3b_tuning, and 3b_680 revealed a relatively high correlation with coefficient of determination (R 2 ) ranging from 0.93 to 0.94. The SCI_4b and OC3E algorithms introduced the lowest correlation with R 2 < 0.07. For the validation stage, the Chla concentrations were retrieved using regression coefficients that were estimated during the calibration stage, as well as algorithms' indicators from the second simulated dataset (hereafter, validation dataset). In the following sections, an evaluation of the retrieved Chla will be conducted. 3b_tuning, and 3b_680 revealed a relatively high correlation with coefficient of determination (R 2 ) ranging from 0.93 to 0.94. The SCI_4b and OC3E algorithms introduced the lowest correlation with R 2 < 0.07. For the validation stage, the Chla concentrations were retrieved using regression coefficients that were estimated during the calibration stage, as well as algorithms' indicators from the second simulated dataset (hereafter, validation dataset). In the following sections, an evaluation of the retrieved Chla will be conducted. conducted.  Reference reflectance versus algorithms' ratio for the calibration dataset of simulated reflectance. The highest three performing algorithms were highlighted in bold. The lowest three performing algorithms were single-underlined. LN, QP, and PW stand for linear, quadratic polynomial, and power regression approaches, respectively.

Overall Performance
The retrieval accuracy of each of the 43 algorithmic combinations was compared in terms of the R 2 , RMSE and MARE values (Table 4). Overall, the comparison of the three regression approach (i.e., linear, quadratic polynomial and power regression) for each algorithm revealed that their retrieval accuracies were comparable. The most important factor that contributed toward the algorithms' accuracy is the algorithm indicators. Consequently, the RMSE and MARE values were averaged for the three regression approaches within each algorithm (e.g., the RMSE of 2b_665_LN, 2b_665_QP, and 2b_665_PW were averaged as 2b_665), as shown in Figure 6 The results from the SCI_4b and SCI_max_min revealed that the SCI_max_min provided less error in terms of RMSE, which decreased from 55.55 mg·m −3 to 28.80 mg·m −3 ( Figure 6). These results can be attributed to the selected bands as SCI_4b did not contain the 709-nm band, which is significant to Chla retrieval; this is consistent with the findings of Gower et al. [34]. The OC4E algorithm was evaluated using relatively low Chla and NAP concentrations in ranges of 1-20 mg·m −3 and 1-20 g·m −3 , respectively. Although the OC4E algorithm was originally trained using a huge dataset (2804 station) with wide ranges of Chla concentrations (0.01-64 mg·m −3 ) [75], it introduced relatively high error (RMSE = 9.89 mg·m −3 , MARE = 191.79%). The MARE of OC4E was very high because the MARE is a relative error, which assesses the Chla retrieval over a small range of Chla values (≤20 mg·m −3 ); thus, any small difference between the reference and retrieved Chla concentrations would cause a high relative error. These results reveal the importance of executing an optimization process to determine the coefficients of OC4E with a calibration dataset to improve retrieval accuracy. the three regression approaches within each algorithm (e.g., the RMSE of 2b_665_LN, 2b_665_QP, and 2b_665_PW were averaged as 2b_665), as shown in Figure 6. The results from the SCI_4b and SCI_max_min revealed that the SCI_max_min provided less error in terms of RMSE, which decreased from 55.55 mg·m −3 to 28.80 mg·m −3 ( Figure 6). These results can be attributed to the selected bands as SCI_4b did not contain the 709-nm band, which is significant to Chla retrieval; this is consistent with the findings of Gower et al. [34]. The OC4E algorithm was evaluated using relatively low Chla and NAP concentrations in ranges of 1-20 mg·m −3 and 1-20 g·m −3 , respectively. Although the OC4E algorithm was originally trained using a huge dataset (2804 station) with wide ranges of Chla concentrations (0.01-64 mg·m −3 ) [75], it introduced relatively high error (RMSE = 9.89 mg·m −3 , MARE = 191.79%). The MARE of OC4E was very high because the MARE is a relative error, which assesses the Chla retrieval over a small range of Chla values (≤20 mg·m −3 ); thus, any small difference between the reference and retrieved Chla concentrations would cause a high relative error. These results reveal the importance of executing an optimization process to determine the coefficients of OC4E with a calibration dataset to improve retrieval accuracy. Figure 6. Overall assessment of the 15 algorithms in terms of root mean square error (RMSE) and mean absolute relative error (MARE) for simulated dataset. The values for RMSE and MARE were averaged for the three regression approaches of each algorithm (e.g., the RMSE of 2b_665_LN, 2b_665_QP, and 2b_665_PW were averaged as 2b_665).

Algorithm Performance by Considering Chla and NAP
As explained, the bio-optical model links the remote sensing reflectance with IOPs (i.e., total absorption and total backscattering). The CDOM only contributes to the total absorption, while the Chla and NAP are fractions of both the absorption and backscattering. The CDOM absorption is very high in the blue wavelengths and exponentially decreases with increasing wavelength. Thus, its contribution to the total absorption in the red-NIR wavelengths is very low compared to that of phytoplankton, which has a relatively high absorption in the red-NIR wavelengths [76]. In addition, increasing the NAP increases the total backscattering and the backscattering in the NIR wavelengths. The reflectance spectra were compared for two groups of simulated reflectance to reveal the influence of changing CDOM concentrations on the simulated reflectance for blue-green and red-NIR regions (Figure 7). Within each group, the concentrations of Chla and NAP were fixed, while the CDOM was examined at three different concentrations: low (0.1 m −1 ), moderate (2.5 m −1 ) and high (9.9 m −1 ) concentrations. For both groups, the influence of changing CDOM concentrations was relatively higher in the blue-green region than in the red-NIR region. Case 2 waters' algorithms mainly use the Figure 6. Overall assessment of the 15 algorithms in terms of root mean square error (RMSE) and mean absolute relative error (MARE) for simulated dataset. The values for RMSE and MARE were averaged for the three regression approaches of each algorithm (e.g., the RMSE of 2b_665_LN, 2b_665_QP, and 2b_665_PW were averaged as 2b_665).

Algorithm Performance by Considering Chla and NAP
As explained, the bio-optical model links the remote sensing reflectance with IOPs (i.e., total absorption and total backscattering). The CDOM only contributes to the total absorption, while the Chla and NAP are fractions of both the absorption and backscattering. The CDOM absorption is very high in the blue wavelengths and exponentially decreases with increasing wavelength. Thus, its contribution to the total absorption in the red-NIR wavelengths is very low compared to that of phytoplankton, which has a relatively high absorption in the red-NIR wavelengths [76]. In addition, increasing the NAP increases the total backscattering and the backscattering in the NIR wavelengths. The reflectance spectra were compared for two groups of simulated reflectance to reveal the influence of changing CDOM concentrations on the simulated reflectance for blue-green and red-NIR regions (Figure 7). Within each group, the concentrations of Chla and NAP were fixed, while the CDOM was examined at three different concentrations: low (0.1 m −1 ), moderate (2.5 m −1 ) and high (9.9 m −1 ) concentrations. For both groups, the influence of changing CDOM concentrations was relatively higher in the blue-green region than in the red-NIR region. Case 2 waters' algorithms mainly use the reflectance spectra in the red-NIR region. Accordingly, the algorithms' retrieval accuracy was assessed for various combinations of Chla and NAP.
The RMSE of the 43 algorithms were estimated between the reference and retrieved Chla for each Chla and NAP combination (Figure 8). The red colors indicate low error, whereas the blue colors indicate high error with an upper limit of 40 mg·m −3 for RMSE. In general, the selection of regression approach (i.e., linear, quadratic polynomial and power regression) had low influence on the retrieval accuracy ( Figure 8). For example, the 2b_665_LN, 2b_665_QP and 2b_665_PW provided similar accuracies (Figure 8). In addition, the band incorporated with each algorithm (e.g., 665-nm, 680-nm and tuning approach) had low influence on the retrieval accuracy. For instance, the 2b_665, 2b_680 and 2b_max_min with linear, quadratic polynomial and power regression approaches introduced comparable accuracy (Figure 8). However, the retrieval accuracy significantly changed across different algorithms (i.e., two-band, three-band, four-band, MCI, NDCI and SCI algorithms).
Twelve out of the 43 combinations (3b_665, 3b_680, 3b_tuning and 4b_tuning with linear, quadratic polynomial and power regression approaches) introduced high retrieval accuracy for different concentration combinations of Chla and NAP (Figure 8). The second-tier of accurate algorithms were 2b_665, 2b_680, 2b_max_min, NDCI_665, NDCI_680, and NDCI_max_min with linear, quadratic polynomial and power models. The second-tier algorithms were accurate with moderate Chla and NAP concentrations (Figure 8). The MCI and SCI algorithms with different combinations showed the lowest retrieval accuracies. The performance of SCI_max_min was higher than SCI_4b as using the maximum and minimum technique to find the optimal band enabled SCI_max_min to be correlated with Chla concentrations. The Chla retrieval accuracy of OC4E in terms of RMSE was about 8.0 mg m −3 for various combinations of Chla and NAP ( Figure 8). However, this retrieval accuracy can be considered of low accuracy because the OC4E was investigated in low ranges of Chla (≤20 mg·m −3 ) and NAP (≤20 g·m −3 ). The low retrieval of OC4E can be attributed to two reasons; (1) the coefficients of OC4E need to be optimized with the calibration dataset to improve the retrieval accuracy; and (2) high absorbance of NAP and CDOM along with Chla in the blue-green wavelengths could affect the accuracy of OC4E to accurately retrieve Chla.
The algorithms illustrated in Figure 8 can be classified into two groups based on the influence of NAP concentrations on the retrieval accuracy. The first group consists of twelve algorithms (i.e., 3b_665, 3b_680, 3b_tuning and 4b_tuning with linear, quadratic polynomial and power regression approaches), whereas the second group represents the rest of the algorithms. In the first group, the NAP had almost the same influence at a given Chla concentration. For example, the 3b_680_LN algorithm produced same retrieval accuracy at different NAP concentration (i.e., 1-200 g·m −3 ) and Chla concentration of 100 mg·m −3 . In the second group, the retrieval accuracy will change for different NAP concentrations at a given Chla concentration. For instance, the retrieval accuracy of 2b_665_LN was different across different NAP concentrations (i.e.,   The RMSE of the 43 algorithms were estimated between the reference and retrieved Chla for each Chla and NAP combination (Figure 8). The red colors indicate low error, whereas the blue colors indicate high error with an upper limit of 40 mg·m −3 for RMSE. In general, the selection of regression approach (i.e., linear, quadratic polynomial and power regression) had low influence on the retrieval accuracy ( Figure 8). For example, the 2b_665_LN, 2b_665_QP and 2b_665_PW provided similar accuracies (Figure 8). In addition, the band incorporated with each algorithm (e.g., 665-nm, 680-nm and tuning approach) had low influence on the retrieval accuracy. For instance, the 2b_665, 2b_680 and 2b_max_min with linear, quadratic polynomial and power regression approaches introduced comparable accuracy (Figure 8). However, the retrieval accuracy significantly changed across different algorithms (i.e., two-band, three-band, four-band, MCI, NDCI and SCI algorithms).
Twelve out of the 43 combinations (3b_665, 3b_680, 3b_tuning and 4b_tuning with linear, quadratic polynomial and power regression approaches) introduced high retrieval accuracy for different concentration combinations of Chla and NAP (Figure 8). The second-tier of accurate algorithms were 2b_665, 2b_680, 2b_max_min, NDCI_665, NDCI_680, and NDCI_max_min with linear, quadratic polynomial and power models. The second-tier algorithms were accurate with moderate Chla and NAP concentrations (Figure 8). The MCI and SCI algorithms with different combinations showed the lowest retrieval accuracies. The performance of SCI_max_min was higher than SCI_4b as using the maximum and minimum technique to find the optimal band enabled SCI_max_min to be correlated with Chla concentrations. The Chla retrieval accuracy of OC4E in terms of RMSE was about 8.0 mg m −3 for various combinations of Chla and NAP ( Figure 8). However, this retrieval accuracy can be considered of low accuracy because the OC4E was investigated in low ranges of Chla (≤20 mg·m −3 ) and NAP (≤20 g·m −3 ). The low retrieval of OC4E can be attributed to two reasons; (1) the coefficients of OC4E need to be optimized with the calibration dataset to improve the retrieval accuracy; and (2) high absorbance of NAP and CDOM along with Chla in the blue-green wavelengths could affect the accuracy of OC4E to accurately retrieve Chla.
The algorithms illustrated in Figure 8 can be classified into two groups based on the influence of NAP concentrations on the retrieval accuracy. The first group consists of twelve algorithms (i.e., 3b_665, 3b_680, 3b_tuning and 4b_tuning with linear, quadratic polynomial and power regression approaches), whereas the second group represents the rest of the algorithms. In the first group, the NAP had almost the same influence at a given Chla concentration. For example, the 3b_680_LN algorithm produced same retrieval accuracy at different NAP concentration (i.e., 1-200 g·m −3 ) and Chla concentration of 100 mg·m −3 . In the second group, the retrieval accuracy will change for different NAP concentrations at a given Chla concentration. For instance, the retrieval accuracy of 2b_665_LN was different across different NAP concentrations (i.e., 1-200 g·m −3 ) and Chla of 150 mg·m −3   The retrieval accuracies of the algorithms by using linear, quadratic polynomial and power regression were comparable, as concluded in Section 3.1.1. Thus, the quadratic polynomial regression approaches, along with OC4E, were compared to find the most accurate algorithms among Chla and NAP combinations, resulting in 15 algorithms listed in Table 5. A total of 10,000 combinations of Chla and NAP concentrations existed. The lowest RMSE values were selected among the 15 algorithms to find the most accurate algorithm for each combination of Chla and NAP. Table 5 summarizes the frequency of producing the minimum error for each algorithm. The 3b_tuning_QP outperformed other algorithms in terms of frequency (33.19%). In addition, the ten algorithms that required only The retrieval accuracies of the algorithms by using linear, quadratic polynomial and power regression were comparable, as concluded in Section 3.1.1. Thus, the quadratic polynomial regression approaches, along with OC4E, were compared to find the most accurate algorithms among Chla and NAP combinations, resulting in 15 algorithms listed in Table 5. A total of 10,000 combinations of Chla and NAP concentrations existed. The lowest RMSE values were selected among the 15 algorithms to find the most accurate algorithm for each combination of Chla and NAP. Table 5 summarizes the frequency of producing the minimum error for each algorithm. The 3b_tuning_QP outperformed other algorithms in terms of frequency (33.19%). In addition, the ten algorithms that required only multispectral data (i.e., OC4E and the nine multi-band algorithms, excluding all of the algorithms that required band tuning) were also examined ( Table 5). The 3b_680_QP had the most outstanding accuracy, as it was the most frequent algorithm that achieved the minimum RMSE among the ten multi-band algorithms (60.52%). Figure 9 illustrates the spatial distribution of the best three algorithms by considering all 15 algorithms and the 10 multi-band algorithms. Overall, there was a complex interaction between the spatial distributions of the most accurate algorithms (Figure 9a,b). These results reveal that no single algorithm has the best accuracy for Chla retrieval and that multi-algorithms should be included to reduce the error. This finding is consistent with those of recently published results [40,44]. The four-band algorithm produced higher accuracy for high Chla and NAP concentrations, due to the fact that the four-band algorithm was proposed for highly turbid water [47]. multispectral data (i.e., OC4E and the nine multi-band algorithms, excluding all of the algorithms that required band tuning) were also examined ( Table 5). The 3b_680_QP had the most outstanding accuracy, as it was the most frequent algorithm that achieved the minimum RMSE among the ten multi-band algorithms (60.52%). Figure 9 illustrates the spatial distribution of the best three algorithms by considering all 15 algorithms and the 10 multi-band algorithms. Overall, there was a complex interaction between the spatial distributions of the most accurate algorithms (Figure 9a,b). These results reveal that no single algorithm has the best accuracy for Chla retrieval and that multi-algorithms should be included to reduce the error. This finding is consistent with those of recently published results [40,44]. The fourband algorithm produced higher accuracy for high Chla and NAP concentrations, due to the fact that the four-band algorithm was proposed for highly turbid water [47]. The best three algorithms in terms of frequency were highlighted in bold.

Algorithm Assessment with Field Measurements
The 43 algorithms were also evaluated based on the in-situ dataset measured in Tokyo Bay (i.e., 70 samples with full water quality parameters and water surface reflectance spectra). The 70 stations were randomly divided into the following ratio: 70% as a calibration dataset and 30% for validation stage. The Chla retrieval accuracy for the 43 algorithms was assessed, as shown in Table 6. The best

Algorithm Assessment with Field Measurements
The 43 algorithms were also evaluated based on the in-situ dataset measured in Tokyo Bay (i.e., 70 samples with full water quality parameters and water surface reflectance spectra). The 70 stations were randomly divided into the following ratio: 70% as a calibration dataset and 30% for validation stage. The Chla retrieval accuracy for the 43 algorithms was assessed, as shown in Table 6. The best three algorithms in terms of R 2 , RMSE and MARE for the measured dataset were highlighted in bold (Table 6). During the calibration stage, the 3b_tuning_QP, 3b_tuning_PW and 3b_80_PW outperformed other algorithms with R 2 ≥ 0.72. However, 2b_665_QP, 2b_665_PW, and NDCI_665_QP provided the highest retrieval accuracy with R 2 of 0.85 and RMSE ≤ 9.87 mg·m −3 for the validation dataset.
The SCI_4b algorithm with linear, quadratic polynomial, and power regression models introduced the lowest retrieval accuracy during the calibration (R 2 ≤ 0.12) and validation stages (R 2 ≤ 0.14, RMSE ≥ 22.80 mg·m −3 , and MARE ≥ 297.10 g·m −3 ). In contrast, SCI_max_min_QP provided better accuracy, which can be attributed to the band selection as SCI_max_min incorporated the reflectance peak at 709 nm, while SCI_4b did not. These results reveal the importance of incorporating the 709-nm wavelength to retrieve Chla with adequate accuracy. OC4E also provided low retrieval accuracy in terms of RMSE of 21.28 mg·m −3 . This result was observed because OC4E was proposed for applications in open oceans, where the optical properties are dominated by the Chla concentrations. In contrast, the optical properties of Tokyo Bay are influenced not only by Chla but also by NAP and CDOM, especially in the blue and green wavelengths, which are used to substitute for the OC4E algorithm. Therefore, OC4E failed to correlate the measured reflectance with the Chla concentrations. The highest three performing algorithms were highlighted in bold. The lowest three performing algorithms were single-underlined. Linear regression (LN) as expressed in Equation (12) (Chla = a × ind alg + b) LN, QP, and PW stand for linear, quadratic polynomial, and power regression approaches.
The Chla retrieval accuracies of the ten algorithms with the lowest RMSE of in-situ dataset were evaluated by comparing the measured and retrieved Chla (Figure 10a) and by calculating Error log between the measured and retrieved Chla (Figure 10b). A common trend existed among these algorithms, which overestimated low Chla concentrations (Chla ≤ 6.0 mg·m −3 ) with Error log > 0.0 and underestimated high Chla concentrations (Chla > 50.0 mg·m −3 ) with Error log < 0.0, as shown in Figure 10a,b. No bias occurred for Chla concentrations from 7.0 to 50 mg·m −3 . The results revealed the limitation of the investigated algorithms to accurately retrieve low and high Chla concentrations, which implies the importance of developing new algorithms that can reduce the influence of NAP and CDOM to accurately retrieve the Chla concentrations in turbid water. The highest three performing algorithms were highlighted in bold. The lowest three performing algorithms were single-underlined. Linear regression (LN) as expressed in Equation (12) (Chla = a × indalg + b) LN, QP, and PW stand for linear, quadratic polynomial, and power regression approaches.
The Chla retrieval accuracies of the ten algorithms with the lowest RMSE of in-situ dataset were evaluated by comparing the measured and retrieved Chla (Figure 10a) and by calculating Errorlog between the measured and retrieved Chla (Figure 10b). A common trend existed among these algorithms, which overestimated low Chla concentrations (Chla ≤ 6.0 mg·m −3 ) with Errorlog > 0.0 and underestimated high Chla concentrations (Chla > 50.0 mg·m −3 ) with Errorlog < 0.0, as shown in Figure  10a,b. No bias occurred for Chla concentrations from 7.0 to 50 mg·m −3 . The results revealed the limitation of the investigated algorithms to accurately retrieve low and high Chla concentrations, which implies the importance of developing new algorithms that can reduce the influence of NAP and CDOM to accurately retrieve the Chla concentrations in turbid water. Figure 10. Assessment of the ten best algorithms of the in-situ dataset: (a) measured versus retrieved chlorophyll-a (Chla); and (b) errors versus the measured Chla, where the errors were estimated between the measured and mean retrieved Chla of the ten algorithms. Figure 10. Assessment of the ten best algorithms of the in-situ dataset: (a) measured versus retrieved chlorophyll-a (Chla); and (b) errors versus the measured Chla, where the errors were estimated between the measured and mean retrieved Chla of the ten algorithms.

Validity of the Simulated Reflectance Dataset's Models on Chla Retrieval in Tokyo Bay
Although the current study was not intended to provide regression relationships from the simulated dataset to retrieve Chla concentrations in local water bodies, the retrieval accuracy of simulated dataset's models (i.e., regression models summarizes in Table 4 obtained during calibration stage) was investigated using Tokyo Bay dataset. The measured Chla concentrations were compared with the retrieved Chla using the relationships of the 43 algorithms summarized in Table 4. In general, most of the algorithms overestimated Chla concentrations and many algorithms produced negative Chla values, especially with linear regression ( Figure S3 in Supplementary Materials). Although the three-band and four-band algorithms were the most accurate algorithms in terms of R 2 > 0.56 (i.e., 3b_tuning_LN, 3b_tuning_QP, 4b_tuning_LN and 4b_tuning_QP) ( Table S1 in Supplementary Materials), they produced negative Chla concentration, particularly for low measured Chla ( Figure S3f,g in Supplementary Materials). The 3b_680_LN and 4b_tuning_PW had the lowest RMSE of 17.56 and 17.14 mg·m −3 , respectively. There was good agreement between the measured and retrieved Chla from 3b_680_LN and 4b_tuning_PW algorithms without negative values ( Figure S3e,g in Supplementary Materials). The lowest algorithm in terms of MARE was 3b_665_PW (86.08%) ( Table S1 in Supplementary Materials), which overestimated low Chla concentrations. Figure S3h,i in Supplementary Materials reveals the limitation of the MCI by producing same retrieved Chla for different measured Chla. These results can be attributed to the poor correlation between reference Chla and the algorithms' indicators (Figure 5h,i). Similarly, the SCI_4b provided the lowest retrieval accuracy.

Comparing the Algorithms' Performance for Measured and Simulated Datasets
The overall retrieval accuracy of the band selection (e.g., algorithms with the 665-nm or band tuning) and regression models (i.e., linear, quadratic polynomial, and power models) were compared for the measured (i.e., the 70 samples), and simulated datasets ( Figure 11). Except for OC4E and SCI_4b, which provided the lowest retrieval accuracy, the other 39 of the 43 algorithms were classified into four groups: algorithms with the 665-nm band (665_algs), algorithms with the 680-nm band (680_algs), algorithms that incorporated band tuning (Tuning_algs), and algorithms that utilized the maximum and minimum approach (Max_min_algs). The retrieval accuracy in terms of the RMSE for all of the algorithms within each group was averaged (Figure 11a). The Tuning_algs outperformed the other three groups for simulated dataset with RMSE of 25.05 mg·m −3 , while the 665_algs provided the highest retrieval accuracy with RMSE of 11.58 mg. The second most accurate group varied among the two datasets: Max_Min_algs and 680_algs were the second most accurate for the simulated (RMSE = 33.21 mg·m −3 ) and in-situ datasets (RMSE = 14.23 mg·m −3 ), respectively. The 665_algs algorithm had the lowest retrieval accuracy for the simulated dataset with RMSE of 36.84 mg·m −3 . The Max_Min_algs had the worst performance for the in-situ dataset with RMSE of 17.09 mg·m −3 .
Sensors 2017, 17, 1746 19 of 24 Figure 11. Comparison of the average retrieval accuracy of the in-situ and simulated datasets in terms of RMSE: (a) 665_algs, 680_algs, tuning_algs, and Max_Min_algs denote the average retrieval for all the algorithms that included the 665-nm band, 680-nm band, band tuning and maximum and minimum, respectively; and (b) LN_algs, QP_algs, and PW_algs represent the average of all the algorithms with linear, quadratic polynomial, and power regression models, respectively.

Conclusions
In this study, the performances of seven Chla algorithms were investigated with all possible band combinations and three regression models (i.e., linear, quadratic polynomial and power regression approaches). In total, 43 algorithms were assessed based on in-situ and simulated datasets. Figure 11. Comparison of the average retrieval accuracy of the in-situ and simulated datasets in terms of RMSE: (a) 665_algs, 680_algs, tuning_algs, and Max_Min_algs denote the average retrieval for all the algorithms that included the 665-nm band, 680-nm band, band tuning and maximum and minimum, respectively; and (b) LN_algs, QP_algs, and PW_algs represent the average of all the algorithms with linear, quadratic polynomial, and power regression models, respectively. Similarly, 39 of the 43 algorithms, excluding the OC4E and SCI_4b algorithms, were classified into three groups: namely, LN_algs, QP_algs, and PW_algs, which were estimated by averaging the Chla retrieval accuracy for all the algorithms with the linear, quadratic polynomial and power regression models, respectively (Figure 11b). In general, the regression approach (i.e., LN, QP, and PW) had more influence on the simulated dataset than the measured dataset due to the fact that the variation of the remote sensing reflectance among the simulated dataset was very large, as shown in Figure 4a, comparing with measured reflectance (Figure 2). The LN_algs had the best retrieval accuracy for the simulated dataset (RMSE = 27.71 mg·m −3 ), whereas the QP_algs had the lowest retrieval accuracy for simulated dataset in terms of RMSE of 42.03 mg·m −3 . The accuracy for in-situ measurements ranged between 13.77 mg·m −3 and 14.55 mg·m −3 .

Conclusions
In this study, the performances of seven Chla algorithms were investigated with all possible band combinations and three regression models (i.e., linear, quadratic polynomial and power regression approaches). In total, 43 algorithms were assessed based on in-situ and simulated datasets. Two simulated datasets that covered wide ranges of Chla (1-200 mg·m −3 ), NAP (1-200 g·m −3 ), and CDOM (0.1-10 m −1 ) concentrations were generated to calibrate and validate the proposed algorithms. Each of the simulated dataset comprised 500,000 reflectance spectra. Having a large pool of simulated reflectance enabled us to thoroughly evaluate the Chla algorithms. Across all algorithms, the 2b_665_QP and 4b_tuning_LN algorithms outperformed the other algorithms for measured (R 2 = 0.85%, RMSE = 9.64 mg·m −3 ) and simulated datasets (R 2 = 0.93%, RMSE = 14.73 mg·m −3 ). The SCI algorithms showed the highest error for both datasets, with average RMSE of 22.95 mg·m −3 and 55.55 mg·m −3 for measured and simulated datasets, respectively. The spatial distribution of the most accurate algorithms among 15 algorithms (i.e., OC4E and all of the algorithms incorporated the quadratic polynomial regression approach) for the 10,000 combinations of Chla, and NAP revealed that the three-band incorporated tuning selection approach outperformed other algorithms with minimum RMSE frequency of 33.19%. In addition, the spatial distribution highlighted the importance of incorporating multi-algorithms to improve retrieval accuracy. The two-, three-, and four-band, and NDCI algorithms tend to have acceptable accuracy among measured and simulated datasets, while some other algorithms have high error (i.e., SCI algorithms). Overall, the regression approach has more influence on simulated datasets than the measured dataset, due to the wide range of simulated datasets magnifying the influence of the fitting process. In addition, the power regression can cause errors for negative values of the algorithm's ratio. As a result, the linear and quadratic polynomial regression is recommended for simulated and measured datasets, respectively.