Retrieval of Phytoplankton Pigment Composition from Their In Vivo Absorption Spectra

Abstract: Algal pigment composition is an indicator of phytoplankton community structure that can be estimated from optical observations. Assessing the potential capability to retrieve different types of pigments from phytoplankton absorption is critical for further applications. This study investigated the performance of three models and the utility of hyperspectral in vivo phytoplankton absorption spectra for retrieving pigment composition using a large database (n = 1392). Models based on chlorophyll-a (Chl-a model), Gaussian decomposition (Gaussian model), and partial least squares (PLS) regression (PLS model) were compared. Both the Gaussian model and the PLS model were applied to hyperspectral phytoplankton absorption data. Statistical analysis revealed the advantages and limitations of each model. The Chl-a model performed well for chlorophyll-c (Chl-c), diadinoxanthin, fucoxanthin, photosynthetic carotenoids (PSC), and photoprotective carotenoids (PPC), with a median absolute percent difference for cross-validation (MAPD_CV) < 58%. The Gaussian model yielded good results for predicting Chl-a, Chl-c, PSC, and PPC (MAPD_CV < 43%). The performance of the PLS model was comparable to that of the Chl-a model, and it exhibited improved retrievals of chlorophyll-b, alloxanthin, peridinin, and zeaxanthin. Additional work undertaken with the PLS model revealed the prospects of hyperspectral-resolution data and spectral derivative analyses for retrieving marker pigment concentrations. This study demonstrated the applicability of in situ hyperspectral phytoplankton absorption data for retrieving pigment composition and provided useful insights regarding the development of bio-optical algorithms from hyperspectral and satellite-based ocean-colour observations.


Introduction
The magnitude of oceanic carbon fixation by phytoplankton through photosynthesis is comparable to net production by terrestrial plants at the global scale, making phytoplankton a key component in the global carbon cycle. Phytoplankton dynamics are important when examining existential threats such as the greenhouse effect and ocean acidification [1–4]. The role of phytoplankton in various biogeochemical cycles varies with their functional type. Important phytoplankton types can be distinguished through auxiliary pigments that are referred to as marker pigments. It is a basic requirement in the field of marine ecology to study the quantitative distribution of these marker pigments at large spatiotemporal scales. Their concentrations, together with that of the main photosynthetic pigment, chlorophyll-a (Chl-a), can be estimated using high-performance liquid chromatography (HPLC) techniques. However, HPLC methods are time-consuming and expensive, limiting the number of observations that can be made. During the last few decades, optical data, including reflectance and spectral absorption measurements from various platforms, have been successfully used to determine the concentration of Chl-a as well as other pigments at large spatiotemporal scales [3,5–9]. Many approaches have been developed to retrieve pigment compositions of phytoplankton from multispectral or hyperspectral data.
Most current ocean-colour sensors provide spectral information on remote sensing reflectance (R_rs) in the visible domain at a limited set of wavebands. However, the potential ability of hyperspectral retrieval of phytoplankton pigment compositions has attracted much attention [6,10], especially with the launch of the PRISMA satellite [11,12] and the upcoming PACE mission of NASA [13]. Algorithms using empirical orthogonal functions and Gaussian decomposition have been designed to improve retrieval of phytoplankton pigments from in situ hyperspectral R_rs data [6,7,10,14], preparatory to their use in hyperspectral satellite missions. Their results showed the potential utility of optical methods for retrieval of certain types of pigments and suggested that using continuous hyperspectral instruments was beneficial for improving algorithm performance. However, challenges of finding a global set of have also been raised. With an inversion algorithm that defines phytoplankton absorption spectra (a_B(λ), where we have used the subscript B to indicate phytoplankton biomass) as a sum of Gaussian functions, Chase et al. [7] found that this method is only feasible for a limited set of pigments (Chl-a; chlorophyll-b, Chl-b; chlorophyll-c1 + chlorophyll-c2, Chl-c1,2; and photoprotective carotenoids, PPC) and similar errors were found for empirical algorithms with Chl-a as an intermediate variable.
Xi et al. [15] re-tuned the empirical orthogonal function algorithm by including sea surface temperature as an adaptational input parameter.
Algorithms for retrieval of phytoplankton pigments from R_rs typically rely on the influence of pigments on the spectral absorption characteristics of phytoplankton [6,7,16–18], or empirical relationships between the concentration of Chl-a and the major auxiliary pigments [19]. Phytoplankton-absorption-based methods make possible a more direct approach to predict phytoplankton community structures. Many studies focused on phytoplankton absorption spectra to investigate the possibilities of using hyperspectral data to get more information about pigment composition [5,8,20]. Decomposition of a_B(λ) into Gaussian components attributed to specific pigments or pigment groups was first proposed by Hoepffner and Sathyendranath [21] and was successfully extended for partitioning particulate absorption (a_p(λ)) data measured with underway flow-through systems [5,9] and for characterizing the specific absorption spectra of eight phytoplanktonic groups [20]. Derivative analysis was used as another promising approach for isolating phytoplankton absorption signatures from hyperspectral ocean colour observations [14,22]. Organelli et al. [23] used the multivariate least squares regression technique to estimate phytoplankton size classes from the fourth-derivative spectra of phytoplankton or total particulate absorption. Catlett et al. [8] developed a novel optical model with principal components regression based on the unique relationships between pigments and absorption signatures isolated using spectral derivative analysis of a_B(λ). The method provided robust results for obtaining concentrations of many specific pigments. In short, these algorithms all have benefits and limitations. Understanding the advantages and limitations of different models may help us to assess the relative merits of various methods of pigment retrieval from optical data.
In this study, based on a large database of phytoplankton absorption and pigment composition data collected in three different areas (Northwest Atlantic, the Arabian Sea, and off the coast of Chile), we compare the performance of three model types for retrieval of pigment concentrations from in vivo a_B(λ) or from the concentration of Chl-a. The three types of retrieval models investigated here comprised (i) Chl-a models (of the type proposed by Hirata et al. [19] and Chase et al. [7]) that rely on observed relationships between the Chl-a concentration and the concentration of various auxiliary pigments; (ii) the Gaussian decomposition model (Gaussian model) [5,21,24,25], in which absorption spectra are broken down into a number of Gaussian components and their relationships with pigments explored; and (iii) the partial least squares model (PLS model), in which concentrations of various pigments are expressed as linear functions of phytoplankton absorption at a number of key wavelengths [23], which have first been subjected to principal component analysis. The advantages of derivative analysis and the superiority of hyperspectral data over multispectral are also discussed.
Remote Sens. 2021, 13, 5112

Datasets
From 1994 to 2003, 3205 samples of phytoplankton pigment and absorption data were collected during 27 cruises. Sampling sites were located in the Northwest Atlantic, the Arabian Sea, and off the coast of Chile, with water types ranging from eutrophic coastal waters to stratified and oligotrophic waters (Figure 1). Water samples were collected using Niskin bottles from the surface to a maximum depth of 140 m. In this study, 85% of the samples considered were collected within the top 50-m layer. Additional information regarding the measurement methods and the cruises is available in Sathyendranath et al. [26].

Phytoplankton Absorption Spectra
Particulate samples were collected on GF/F glass fiber filters and stored at −70 °C until processed. The quantitative filter-pad method was used for measuring a_p(λ) over a range of wavelengths from 350 to 750 nm (at 1-nm increments), with a spectrophotometer equipped with a 15-cm integrating sphere. Phytoplankton pigments were extracted from the filters using a mixture of 90% acetone and dimethyl sulfoxide (6:4 vol:vol) for some cruises [27,28] or using 20 mL hot methanol for other cruises [26]. An exponential curve was fitted to the absorption spectrum of the residual material on the filter, to obtain the detrital absorption spectrum (a_d(λ)) [27,29]. The detrital absorption was subtracted from the particulate absorption spectrum to obtain a_B(λ). This method of calculating phytoplankton absorption spectra from total particulate absorption spectra corrects for the effect of incomplete extraction of pigments on the absorption spectrum of the residual material. Correction for path-length amplification was implemented using the method of Hoepffner and Sathyendranath [24,30], with modifications as in Kyewalyanga et al. [31].

Phytoplankton Pigments
Pigment composition was analyzed using reverse-phase HPLC, as described in Head and Horne [32]. Filters were homogenized in 1.5 mL of 90% acetone, then centrifuged and diluted with 0.5 M ammonium acetate buffer at a ratio of 2:1 before injection. The samples were run on a Beckman C18 reverse-phase, 3-µm Ultrasphere column (70 × 4.6 mm), using methanol/0.5 M ammonium acetate (80/20) and methanol/ethyl acetate (70/30) as eluents. Chromatographic peaks were detected using a UV-VIS photodiode array detector (Beckman 168) and identified based on retention time and comparison with absorbance spectra of known pigment standards (Sigma Chemical Company, MO, USA; and the DHI Institute for Water and Environment, Hørsholm, Denmark). Divinyl chlorophyll-a and divinyl chlorophyll-b, characteristic of Prochlorococcus sp., were quantified by acidifying samples with 1 N HCl and recording the peak heights of divinyl phaeophytin-like pigments. The quality of the pigment data was strictly controlled following the method described by Aiken et al. [33]. After quality control, 1392 samples with matching phytoplankton absorption data were retained for this study.

Model Development
As illustrated in Figure 2, three model types were examined in this study to estimate pigment concentrations based on Chl-a concentration or phytoplankton absorption spectra. Various parameters were applied to compare the performance of each of the three model types. The same 75% of all samples were used as a training set for building the models and the same 25% of all samples were used as a testing set for validating the models.
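A fixed 75%/25% split that is shared by all three model types can be sketched as follows. This is only an illustration: the function name `split_indices`, the seed value, and the use of NumPy are assumptions, since the text does not say how the split was generated.

```python
import numpy as np

def split_indices(n_samples, train_fraction=0.75, seed=0):
    """Return (train_idx, test_idx) for one reproducible random split."""
    rng = np.random.default_rng(seed)        # fixed seed -> identical split for every model
    order = rng.permutation(n_samples)       # shuffled sample indices
    n_train = int(round(train_fraction * n_samples))
    return np.sort(order[:n_train]), np.sort(order[n_train:])

# 1392 samples were retained in this study
train_idx, test_idx = split_indices(1392)
```

Reusing the same index arrays for the Chl-a, Gaussian, and PLS models keeps their validation statistics directly comparable.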

Chl-a Model
The Chl-a model, Equation (1), is an adjusted form of that used in Chase et al. [7]. In Equation (1), C_j is the value of the j-th predicted pigment concentration, C_1 is the observed concentration of Chl-a, and A_j and B_j are model coefficients associated with the j-th pigment, determined using the training set. The coefficients were calculated with the MATLAB function "fit", with "NonlinearLeastSquares" selected as the fitting option.
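As a rough illustration of how a two-coefficient Chl-a model can be fitted, the sketch below assumes a power-law form, C_j = A_j · C_1^B_j. That functional form is an assumption made here for illustration only, because the equation itself is not reproduced in this section. The fit is done by linear regression in log10 space with NumPy, which is a swapped-in simplification and not the MATLAB "NonlinearLeastSquares" fit named in the text; the two weight residuals differently when the data are noisy. The function names and the synthetic numbers are hypothetical.

```python
import numpy as np

def fit_power_law(chl_a, pigment):
    """Fit pigment = A * chl_a**B (assumed form) by linear regression in log10 space."""
    x = np.log10(np.asarray(chl_a, dtype=float))
    y = np.log10(np.asarray(pigment, dtype=float))
    B, log_A = np.polyfit(x, y, 1)           # slope = B, intercept = log10(A)
    return 10.0 ** log_A, B

def predict_power_law(chl_a, A, B):
    """Predict an auxiliary pigment concentration from Chl-a."""
    return A * np.asarray(chl_a, dtype=float) ** B

# Synthetic, noise-free values for illustration (not data from the study)
chl = np.array([0.1, 0.3, 1.0, 3.0, 10.0])
fuco = 0.2 * chl ** 1.1
A_hat, B_hat = fit_power_law(chl, fuco)
fuco_pred = predict_power_law(chl, A_hat, B_hat)
```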

Gaussian Model
The principle of Gaussian decomposition is that a phytoplankton absorption spectrum can be segmented typically into 11-13 Gaussian bands, each of which would be associated with a specific pigment group, and that the concentration of that pigment group could be estimated from the amplitude of the Gaussian band.
We used the method of Hoepffner and Sathyendranath [21], as implemented by Chase et al. [5] and Ye et al. [25], in which an optimized fixed-parameter configuration was selected for the central wavelength of each Gaussian band and its width. Each spectrum was decomposed by solving the overdetermined equation that expresses the absorption spectrum as the sum of the individual absorption bands, minimizing the error between the observed and reconstructed absorption spectra:

$$a_B(\lambda) = \sum_{i=1}^{n} a_g(\lambda_{mi}) \exp\left[-\frac{(\lambda - \lambda_{mi})^2}{2\sigma_i^2}\right] \qquad (2)$$

where a_B(λ) is the measured phytoplankton absorption, n is the number of Gaussian bands, a_g(λ_mi) denotes the amplitude of the i-th Gaussian function, and λ_mi and σ_i represent the central wavelength with maximum absorption and the width of the i-th Gaussian band, respectively. The range of wavelengths (400-700 nm) was treated at 1-nm increments. Because the central wavelengths and widths are fixed, Equation (2) is linear in the amplitudes; the MATLAB function "lsqnonneg" was used to solve the resulting non-negative least-squares problem and to invert the amplitudes a_g. The fixed central wavelengths (peak wavelengths) of the 12 Gaussian bands and their widths, together with the pigments associated with each band, are presented in Table 2. Although biliproteins were not analyzed in this work, one peak was assigned to phycoerythrin to ensure good fits.

Table 2. Central wavelengths of the Gaussian peak and the widths (σ) of the Gaussian functions are assigned for different phytoplankton pigments (PE = phycoerythrin).
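The decomposition with fixed band centres and widths can be sketched in Python as a non-negative least-squares fit. This is a minimal illustration, not the authors' code: `scipy.optimize.nnls` stands in for MATLAB's "lsqnonneg", the helper names are hypothetical, and the three centres and widths below are placeholders rather than the 12 bands of Table 2.

```python
import numpy as np
from scipy.optimize import nnls

def gaussian_basis(wavelengths, centers, widths):
    """Matrix whose i-th column is exp(-(lambda - lambda_mi)^2 / (2 sigma_i^2))."""
    lam = np.asarray(wavelengths, dtype=float)[:, None]
    c = np.asarray(centers, dtype=float)[None, :]
    s = np.asarray(widths, dtype=float)[None, :]
    return np.exp(-((lam - c) ** 2) / (2.0 * s ** 2))

def decompose(a_b, wavelengths, centers, widths):
    """Non-negative amplitudes a_g minimising ||G a_g - a_B||_2."""
    G = gaussian_basis(wavelengths, centers, widths)
    amplitudes, residual_norm = nnls(G, np.asarray(a_b, dtype=float))
    return amplitudes, residual_norm

wavelengths = np.arange(400, 701)           # 400-700 nm at 1-nm steps, as in the text
centers = [440.0, 490.0, 675.0]             # placeholder values, NOT those of Table 2
widths = [20.0, 18.0, 12.0]                 # placeholder values, NOT those of Table 2
true_amp = np.array([0.05, 0.02, 0.03])     # synthetic amplitudes
spectrum = gaussian_basis(wavelengths, centers, widths) @ true_amp
amp_hat, res = decompose(spectrum, wavelengths, centers, widths)
```

With 301 wavelengths and a dozen bands the system is strongly overdetermined, and the non-negativity constraint keeps every band amplitude physically meaningful.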

A power function was used to model the relationship between the magnitude of the Gaussian peak and the concentration of the associated pigment, as reported in Chase et al. [7]:

$$C_j = A_{2,i,j} \left[a_g(\lambda_{mi})\right]^{B_{2,i,j}} \qquad (3)$$

where C_j is the value of the j-th predicted pigment concentration, and a_g(λ_mi) is the value of the i-th Gaussian peak at its central wavelength, λ_mi. The coefficients A_2,i,j and B_2,i,j are the fitted parameters. The MATLAB function used to calculate A_2 and B_2 was the same as that described in Section 2.2.1.

Partial Least Squares Model
In the PLS model, principal component analysis and canonical correlation analysis were applied to phytoplankton absorption data (predictor variables, Xv) and pigments (response variables, Yv). First, the analyses were repeated for Xv and Yv to decide the required number of principal components (PCN) that maximize the covariance of Xv and Yv and maximize the correlation between Xv and Yv simultaneously. Details of the steps adopted for the selection of PCN are given in the following.

• Step 1: For each analysis, all absorption samples with full wavelengths were split randomly into two parts: 75% of the dataset was used for training, and the remaining 25% was used for validation.
• Step 2: Typically, the first 15 or fewer principal components (PCs) explained more than 99% of the total variance of the absorption spectrum, which demonstrates that the first 15 or fewer PCs contain most of the signals of the absorption spectrum. The explained variance for the independent and dependent variables was a valuable parameter derived from the function "plsregress" in MATLAB. Therefore, the analyses were limited to the first 15 PCs. Within the first 15 PCs, the PCs were added sequentially from number 2. The weights associated with the corresponding PCs in the training dataset were regressed against the corresponding pigment concentration based on a leave-one-out cross-validation (CV) calculation. The coefficient of determination (r²) of the PLS regression between each pigment and the corresponding principal component was tabulated.
• Step 3: Steps 1 and 2 were repeated 100 times, and the mean r² and the mean value of the root mean square error (RMSE) were calculated for each pigment-PCN pair (Figure 3).
• Step 4: High RMSEs implied that the PCN was unsuitable for building the relationship between the PCs and pigment concentration. The ideal pigment-PCN pair would therefore have a high r² value and a low RMSE. We thus identified the optimal number of PCs necessary for the analysis based on the maximum r² and a low RMSE.

Following the determination of the PCN, the inversion model could be established using the built-in function "plsregress" provided in MATLAB. The inputs for the PLS model were absorption spectra, target pigment concentration, and PCN related to the specific pigment. The main outputs of the PLS model were the corresponding coefficients for retrieving pigment concentrations.
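The component-selection procedure and the final regression described above can be sketched in Python. This is a simplified illustration under stated assumptions: it implements single-response PLS (PLS1) with the NIPALS algorithm in NumPy rather than calling MATLAB's "plsregress", and it scores each number of components by the mean r² on the held-out 25% of repeated random splits instead of the leave-one-out calculation used in the study. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 via NIPALS; returns (coefficients, intercept) for y ~ X @ coef + b0."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                        # weight: direction of maximum covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                           # scores
        tt = t @ t
        p = Xk.T @ t / tt                    # X loadings
        c = yk @ t / tt                      # y loading
        Xk = Xk - np.outer(t, p)             # deflate X
        yk = yk - c * t                      # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, y_mean - x_mean @ coef

def pls1_predict(X, coef, intercept):
    return np.asarray(X, dtype=float) @ coef + intercept

def mean_r2_by_components(X, y, max_components, n_repeats=100, seed=0):
    """Mean held-out r^2 for 2..max_components over repeated 75/25 random splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(0.75 * n))
    scores = {k: [] for k in range(2, max_components + 1)}
    for _ in range(n_repeats):
        order = rng.permutation(n)
        tr, te = order[:n_train], order[n_train:]
        for k in scores:
            coef, b0 = pls1_fit(X[tr], y[tr], k)
            r = np.corrcoef(pls1_predict(X[te], coef, b0), y[te])[0, 1]
            scores[k].append(r ** 2)
    return {k: float(np.mean(v)) for k, v in scores.items()}

# Synthetic example (not data from the study): 60 "spectra" with 5 channels
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(60, 5))
beta_demo = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y_demo = X_demo @ beta_demo + 0.3
coef_demo, b0_demo = pls1_fit(X_demo, y_demo, 5)
y_noisy = y_demo + rng.normal(scale=0.1, size=60)
r2_demo = mean_r2_by_components(X_demo, y_noisy, 5, n_repeats=5)
```

In practice the number of components with the highest mean r² and a low mean RMSE would be kept for each pigment, and the model would then be refitted on the whole training set with that number.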
Derivative spectra (i.e., the second and fourth derivatives of the phytoplankton absorption spectra, a″_B(λ) and a^4th_B(λ), respectively) were used to identify the major absorption signatures and to explore their utility for retrieving pigment concentrations. The derivative of a_B(λ) was estimated using a finite difference approximation, as recommended by Organelli et al. [23], with a band separation of 9 nm after smoothing [8].
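A finite-difference estimate of the second and fourth derivatives might look like the sketch below. It assumes a 1-nm wavelength grid, a central second difference with a 9-nm band separation, and a fourth derivative obtained by applying the second difference twice. The boxcar smoother and its window are assumed choices, since the smoothing method is not specified in this section, and the function names are hypothetical.

```python
import numpy as np

def second_derivative(spectrum, band_separation=9):
    """Central second difference [a(l+d) - 2a(l) + a(l-d)] / d^2 on a 1-nm grid.

    The result is shorter than the input by 2 * band_separation points.
    """
    a = np.asarray(spectrum, dtype=float)
    d = int(band_separation)
    return (a[2 * d:] - 2.0 * a[d:-d] + a[:-2 * d]) / float(d) ** 2

def fourth_derivative(spectrum, band_separation=9):
    """Fourth derivative obtained by applying the second difference twice."""
    return second_derivative(second_derivative(spectrum, band_separation), band_separation)

def moving_average(spectrum, window=9):
    """Simple boxcar smoothing (an assumed choice of smoother)."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(spectrum, dtype=float), kernel, mode="valid")

# Polynomial test curves whose derivatives are known exactly
wavelengths = np.arange(400.0, 701.0)
quadratic = wavelengths ** 2                 # second derivative is 2 everywhere
quartic = (wavelengths - 550.0) ** 4         # fourth derivative is 24 everywhere
d2 = second_derivative(quadratic)
d4 = fourth_derivative(quartic)
```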
Selected multispectral bands of a_B(λ) were also used as inputs to the PLS model. The bands were selected based on typical ocean-colour satellite sensors (i.e., SeaWiFS, MODIS, and MERIS).
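Extracting multispectral inputs from a hyperspectral spectrum amounts to picking the values nearest to each sensor band centre. In the sketch below, the band lists are nominal visible band centres for each sensor and are an assumption; the exact band sets used in the study are not listed in this section. The function name `extract_bands` and the synthetic spectrum are hypothetical.

```python
import numpy as np

# Nominal visible band centres in nm (assumed; not taken from the study)
SENSOR_BANDS = {
    "Multi_SeaWiFS": [412, 443, 490, 510, 555, 670],
    "Multi_MODIS": [412, 443, 488, 531, 547, 667, 678],
    "Multi_MERIS": [413, 443, 490, 510, 560, 620, 665, 681],
}

def extract_bands(spectrum, wavelengths, band_centres):
    """Return the spectrum values at the wavelengths closest to each band centre."""
    wl = np.asarray(wavelengths, dtype=float)
    spec = np.asarray(spectrum, dtype=float)
    idx = [int(np.argmin(np.abs(wl - b))) for b in band_centres]
    return spec[idx]

# Synthetic spectrum on the 350-750 nm, 1-nm grid used for the measurements
wavelengths = np.arange(350, 751)
spectrum = 0.001 * wavelengths
multi = {name: extract_bands(spectrum, wavelengths, bands)
         for name, bands in SENSOR_BANDS.items()}
```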

Model Assessment
The Pearson correlation coefficient (r), r², RMSE, bias (Bias), and standard deviation (std) values were all calculated using the log-transformed predicted pigment concentration (C_pi) and the log-transformed observed pigment concentration (C_oi). The median absolute percent difference (MAPD) was calculated using the non-log-transformed pigment concentration. We discuss the results obtained using the different models from the following three perspectives:
1. Training model. Statistical parameters were calculated based on the leave-one-out CV method [8,23]. The model was trained using k − 1 samples selected from the training set (k samples) and verified using the remaining sample. After all samples were traversed, k model results were obtained for the model evaluation.
2. Validating model. Statistical parameters were calculated by putting the testing set into the model built with the training set.
3. Validating model with CV calculation. Statistical parameters were calculated based on the CV for 500 permutations [6,34,35]. The training set was divided into two parts randomly (S_train and S_test), as mentioned in Section 2.2.3. For each pigment, the model was fitted using S_train and validated using S_test for the 500 permutations, following which we obtained 500 groups of evaluation parameters. For each parameter, the mean of the 500 results was used as the final indicator of the model.
Parameters without "CV" and parameters with "CV" were calculated using separate sets of equations. In the latter, C^val_o and C^val_p are the observed and predicted concentrations selected or retrieved from S_test, respectively, and N is the number of samples used in the model.
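The assessment statistics can be illustrated with the sketch below. The formulas used here are the common textbook definitions (RMSE and bias of the log10 differences, MAPD as the median of the absolute relative differences in percent) and are an assumption, because the equations themselves are not reproduced in this section; the function names and the three-sample example are hypothetical.

```python
import numpy as np

def log_metrics(observed, predicted):
    """r, r^2, RMSE and bias computed on log10-transformed concentrations."""
    o = np.log10(np.asarray(observed, dtype=float))
    p = np.log10(np.asarray(predicted, dtype=float))
    diff = p - o
    r = float(np.corrcoef(o, p)[0, 1])
    return {
        "r": r,
        "r2": r ** 2,
        "rmse": float(np.sqrt(np.mean(diff ** 2))),
        "bias": float(np.mean(diff)),
    }

def mapd(observed, predicted):
    """Median absolute percent difference on untransformed concentrations."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(100.0 * np.median(np.abs(p - o) / o))

# Tiny synthetic example (not data from the study)
obs = np.array([0.1, 1.0, 10.0])
pred = np.array([0.2, 1.0, 5.0])
stats = log_metrics(obs, pred)
mapd_value = mapd(obs, pred)
```

For the "CV" variants, the same functions would be applied to the S_test part of each permutation and the results averaged over the permutations.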

Chl-a Model
As shown in Table 3 and Figure 4, the retrieval performance for Chl-c1,2, chlorophyll-c (Chl-c), diadinoxanthin (Diadino), fucoxanthin (Fuco), PSC, and PPC derived using the Chl-a model was better than that for other pigments, with r² values of >0.7, RMSEs of <0.53 mg m⁻³, and Bias of <0.18 mg m⁻³, except for Fuco. Predicted and measured data points of these pigments were distributed evenly around the 1:1 line. A high r² value corresponds to a slope (S) close to 1 and an intercept (I) close to 0, which can be observed in Figure 4. Relatively low values of their MAPD also suggest that the Chl-a model could retrieve these pigments reliably. With consideration of the analysis in Table 1, it can be ascertained that the correlation coefficients between these pigments and Chl-a were all >0.8, suggesting that these pigments have reasonably strong empirical relationships with Chl-a.

Table 3. Statistics for the Chl-a model including A and B in Equation (1), slope (S), intercept (I), and determination coefficient (r²) for the linear regression, root mean square error (RMSE, mg m⁻³), median absolute percent difference (MAPD, %), and bias (Bias, mg m⁻³).

Values of RMSE, Bias, and MAPD for Fuco were higher than for the other pigments, which might reflect the poor performance of the model for lower concentration samples of Fuco (Figure 4). Moreover, other pigments such as alloxanthin (Allo), Chl-b, chlorophyll-c3 (Chl-c3), 19′-hexanoyloxy-fucoxanthin (Hex19), peridinin (Perid), and zeaxanthin (Zea) were hardly retrieved by the Chl-a model. Specifically, all these pigments were estimated with values of r² < 0.29, Bias > 0.27 mg m⁻³, and RMSE > 0.76 mg m⁻³. The slopes of the data trend for these pigments were all <0.2, as reflected by the scattered points in Figure 4. Although there was an obvious trend along the 1:1 line for 19′-butanoyloxy-fucoxanthin (But19), the retrieved points were more dispersed than those for other pigments such as Chl-c. Correspondingly, the correlations between the concentrations of these pigments and Chl-a were all <0.56 (Table 1), suggesting weak relationships between these pigments and Chl-a.

Gaussian Model
As shown in Table 4, large differences existed in the model coefficients for different pigments. Specifically, the A_2 coefficient varied widely among the models based on different wavebands for Chl-a inversion. Validation results calculated from the testing set for Chl-a, Chl-b, Chl-c, PSC, and PPC showed consistency with those calculated from the training set (Table 4). For Chl-a, the highest value of r² and the lowest values of RMSE and MAPD were all obtained at 675 nm. The r² value of Chl-b, calculated either from the 470-nm band or from the 660-nm band, was lower than that of other pigments. The sample points of Chl-b deviated obviously from the 1:1 line (Figure 5) and the slope of the Chl-b inversion was far from the value of 1 (Table 4). The inversion quality obtained for Chl-c was similar at 584 and 638 nm with r² > 0.7. A clear 1:1 trend could be found for Chl-c despite the higher values of RMSE, MAPD, and Bias in Table 4. Satisfactory performance was also obtained for PSC and PPC at 523 and 492 nm, respectively. The r² value of PPC was slightly lower than that of PSC, but the values of RMSE, MAPD, and Bias for PPC were lower than those for PSC. The above results indicate that the Gaussian model performed well for Chl-a, Chl-c, PSC, and PPC but not for Chl-b.

Analysis of Inversion Results
In this section, we analyze only the PLS models based on original data and full hyperspectral data; the models corresponding to derivative data or multispectral data are discussed in Sections 3.3.2 and 3.3.3. Results showed that the explained variances for the independent values were >99.8% for all pigments, suggesting excellent capability for signal extraction using the selected PCN. The explained variances for the dependent parameters showed lower values and large differences in performance among these pigments. It can be seen in Table 5 and from Figure 6 that the points of some pigments were scattered in the low-concentration region, but compact in the high-value region, e.g., Chl-a, Chl-c3, and Fuco. Specifically, the values of MAPD for all pigments were relatively low (24–49%), while the RMSE values for But19, Chl-c3, Fuco, Perid, and Zea were slightly higher (RMSE > 0.6 mg m⁻³). Values of r² for Chl-a, Chl-b, Chl-c1,2, Chl-c, Diadino, Fuco, PSC, and PPC were higher than those for other pigments. Overall, the inversion performances of Chl-a, Chl-b, Chl-c1,2, Chl-c, PSC, and PPC were better than those of other pigments considering the r², RMSE, and Bias values. The other pigments had relatively low values of r² but a clear distribution of points (Figure 6) around the 1:1 line. Reasonable consistency could be found among the slopes of the data trends, r² obtained from the training set, and those calculated from the testing set, suggesting that PLS is a promising and stable method for retrieval of concentrations of various pigments.

Table 5. Statistics for the PLS model including the explained variance for independent (X, %) and dependent (Y, %) variables corresponding to the relative PCN (see Table 6), slope (S), intercept (I), determination coefficient (r²) for the linear regression, root mean square error (RMSE, mg m⁻³), median absolute percent difference (MAPD, %), and bias (Bias, mg m⁻³).

Table 6. The PCN for each pigment when different types of spectra are used, which includes a_B(λ), a″_B(λ), a^4th_B(λ), Multi_SeaWiFS, Multi_MODIS, and Multi_MERIS.

Comparison of Inversion Methods Using Absorption Spectra and Their Derivative Spectra
Derivative spectral analysis is usually used to analyze high-resolution and spectrally continuous remote sensing data [22,36]. To determine the advantage of using the derivative of phytoplankton absorption spectra for retrieving pigment concentrations, we compared the performances of the PLS models trained with a_B(λ), a″_B(λ), and a^4th_B(λ). The first derivative of the spectra was not selected because it could reduce the absorption signal to zero (the first derivative vanishes at absorption maxima). The corresponding optimal PCN for each type of spectra is listed in Table 6, and the CV results of the model performances are shown in Figure 7.
As shown in Figure 7, a lower r²_CV value was generally obtained from a^4th_B(λ) for most pigments except Diadino. Values of RMSE_CV derived from a^4th_B(λ) were slightly higher than those derived from a_B(λ) and a″_B(λ) for But19, Chl-c3, Fuco, and Perid. Among these pigments, values of MAPD_CV were also higher for But19 and Perid when computed from a^4th_B(λ). By contrast, the performance of both a_B(λ) and a″_B(λ) was better than that of a^4th_B(λ) for most pigments, and there was generally little difference between the performances of a_B(λ) and a″_B(λ). There was also little difference between the standard deviations of r²_CV, RMSE_CV, and MAPD_CV for all pigments among the different derivative spectra.

Comparison of Inversion Performance of Hyperspectral Absorption and Multispectral Absorption
To investigate whether hyperspectral data could improve the retrieval accuracy in comparison with limited spectral bands, we compared the inversion performances of the PLS models by inputting hyperspectral or multiband a B (λ) to the models as independent variables. Multispectral a B (λ) corresponding to SeaWiFS bands, MODIS bands, and MERIS bands were denoted as Multi_SeaWiFS, Multi_MODIS, and Multi_MERIS, respectively.
In our study, hyperspectral and multispectral data yielded similar retrieval accuracy for Chl-a, But19, Chl-c1,2, Chl-c, Diadino, Fuco, Hex19, PSC, and PPC according to the values of r² CV, RMSE CV, and MAPD CV; the MAPD CV of Fuco calculated using multispectral data was particularly low. For the other pigments (Allo, Chl-b, Chl-c3, Perid, and Zea), the hyperspectral absorption spectra had an obvious advantage that was reflected in the values of r² CV, RMSE CV, and MAPD CV (Figure 8).
The performances achieved with the different multispectral band sets also differed. Allo was retrieved better using the MERIS bands than using the bands of the other two sensors (Figure 8). The MERIS bands differed from those of SeaWiFS and MODIS at 413, 560, 620, 665, and 681 nm, suggesting that the SeaWiFS and MODIS bands do not carry the same spectral information for Allo as the MERIS bands.
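As a rough illustration of how multiband inputs such as Multi_SeaWiFS, Multi_MODIS, and Multi_MERIS could be derived from a hyperspectral spectrum, the sketch below samples a spectrum at a set of band centres by linear interpolation. The band-centre lists are nominal visible-band values assumed for illustration (only the MERIS values 413, 560, 620, 665, and 681 nm are stated in the text), the function name `sample_bands` is hypothetical, and the sensors' spectral response functions are ignored.

```python
import bisect

# Nominal visible band centres in nm. These lists are assumptions for
# illustration; the exact bands used in the study are not given here.
SENSOR_BANDS = {
    "Multi_SeaWiFS": [412, 443, 490, 510, 555, 670],
    "Multi_MODIS": [412, 443, 488, 531, 547, 667, 678],
    "Multi_MERIS": [413, 443, 490, 510, 560, 620, 665, 681],
}

def sample_bands(wavelengths, spectrum, band_centers):
    """Linearly interpolate a spectrum (ascending wavelengths) at band centres."""
    sampled = []
    for band in band_centers:
        if not wavelengths[0] <= band <= wavelengths[-1]:
            raise ValueError(f"band {band} nm lies outside the spectrum")
        j = bisect.bisect_left(wavelengths, band)
        if wavelengths[j] == band:
            sampled.append(spectrum[j])          # band falls on a grid point
        else:
            w0, w1 = wavelengths[j - 1], wavelengths[j]
            t = (band - w0) / (w1 - w0)
            sampled.append(spectrum[j - 1] + t * (spectrum[j] - spectrum[j - 1]))
    return sampled

# Hypothetical 1 nm hyperspectral spectrum (a simple ramp, not real data).
wavelengths = list(range(400, 701))
spectrum = [w / 1000 for w in wavelengths]

multiband = {
    name: sample_bands(wavelengths, spectrum, bands)
    for name, bands in SENSOR_BANDS.items()
}
for name, values in multiband.items():
    print(name, len(values), "bands")
```

Each reduced vector can then replace the full spectrum as the set of independent variables, which makes it easy to see which absorption features a given band set can or cannot resolve.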

Comparison of the Three Model Types
The performances of the three model types can be compared in terms of two aspects: the pigment types retrieved and model accuracy. Taylor diagrams (Figure 9) were used to compare the relative precision of the different models, and the CV results are listed in Table 7. The three model types differed in their capabilities for pigment species retrieval. The Chl-a model showed a reasonable capacity for retrieving Chl-c1,2, Chl-c, Diadino, Fuco, PSC, and PPC with satisfactory statistics. The Gaussian model could retrieve the four main pigment groups from phytoplankton absorption spectra with satisfactory results, except for Chl-b. In comparison, the PLS model exhibited an obvious advantage in retrieving more types of pigment, including Chl-a, Chl-b, Chl-c1,2, Chl-c3, Chl-c, Diadino, Fuco, Hex19, PSC, and PPC, even when there were no strong correlations between those pigments and Chl-a (Table 1).
Inversion accuracy is an important index for evaluating a model and can be illustrated intuitively using Taylor diagrams. Regardless of the model used, Chl-a (except for the Chl-a model), Chl-c1,2 (except for the Gaussian model), Chl-c, PSC, and PPC were all retrieved with relatively high accuracy. For But19, the Chl-a model and the PLS model produced similar results with lower retrieval precision, whereas the Gaussian model could not produce an estimate. In comparison with the Chl-a model, the PLS model improved the retrieval of Allo, Chl-c3, and even Hex19, Perid, and Zea, as indicated by the higher r² values and lower RMSE, Bias, and MAPD values in the CV calculation (Table 7). The retrieval accuracy for Chl-b predicted by the PLS model was higher than that of both the Chl-a model and the Gaussian model. The data points retrieved using the PLS model for these pigments were close to the reference point "A" shown in the Taylor diagrams (Figure 9), highlighting the improvement and advantages of this model. Among all the pigments, the best results for Chl-c, Chl-c1,2, Diadino, and PSC were obtained by the Chl-a model, with higher accuracy and lower error; Chl-a was retrieved with slightly higher accuracy by the Gaussian model.
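For readers unfamiliar with Taylor diagrams, the following sketch computes the three quantities such a diagram displays (normalized standard deviation, correlation coefficient, and normalized centred root-mean-square difference) and checks the geometric relation that links them. The input values are synthetic, the function name `taylor_statistics` is hypothetical, and whether the statistics in this study were computed on log-transformed concentrations is not assumed here.

```python
import math

def taylor_statistics(reference, predicted):
    """Return the statistics plotted on a normalized Taylor diagram."""
    n = len(reference)
    if n != len(predicted) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_ref = sum(reference) / n
    mean_pred = sum(predicted) / n
    dev_ref = [r - mean_ref for r in reference]
    dev_pred = [p - mean_pred for p in predicted]
    std_ref = math.sqrt(sum(d * d for d in dev_ref) / n)
    std_pred = math.sqrt(sum(d * d for d in dev_pred) / n)
    correlation = sum(a * b for a, b in zip(dev_ref, dev_pred)) / (n * std_ref * std_pred)
    centered_rmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_ref, dev_pred)) / n)
    return {
        "std_ratio": std_pred / std_ref,            # radial coordinate
        "correlation": correlation,                 # angular coordinate
        "centered_rmsd": centered_rmsd / std_ref,   # distance to the reference point
    }

# Synthetic example values (not data from the study).
reference = [-1.2, -0.8, -0.5, -0.1, 0.3, 0.6]
predicted = [-1.0, -0.9, -0.4, -0.2, 0.2, 0.7]

stats = taylor_statistics(reference, predicted)
perfect = taylor_statistics(reference, reference)

# Law-of-cosines relation underlying the diagram (normalized form):
# E'^2 = 1 + s^2 - 2 * s * R
s, corr = stats["std_ratio"], stats["correlation"]
identity_gap = stats["centered_rmsd"] ** 2 - (1 + s**2 - 2 * s * corr)
print(stats, identity_gap)
```

A perfect retrieval has a standard deviation ratio of 1, a correlation of 1, and a centred difference of 0, which is the position of the reference point; the closer a model's point lies to it, the better the agreement in pattern and amplitude. The diagram does not show overall bias, which is why Bias is reported separately.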

Performances of the Three Model Types
Each of the three model types produced stable and satisfactory results for Chl-a, Chl-c, and PSC but not for Chl-b, which might reflect the models' different operating principles. The main advantage of the Chl-a model is its easy implementation; its performance is determined by the intrinsic relationships between Chl-a and the target pigments. Pigments that correlate strongly with Chl-a, e.g., Chl-c1,2, Chl-c, Diadino, Fuco, and PSC, could be retrieved with relatively high accuracy. The Gaussian model performed better for Chl-a, but it was limited for the other specific accessory pigments owing to the similar absorption features within each pigment group, as noted by Bricaud et al. [37]. In comparison, the PLS model exhibited a considerable advantage for more accessory pigments by capturing the principal characteristics of phytoplankton absorption spectra. This means that the PLS model has the potential to resolve more aspects of phytoplankton community structure, which is important for estimating ocean productivity and for the quantitative evaluation of climate change. As reported in previous studies, it is difficult to retrieve Chl-b with satisfactory accuracy [7,9,24,38]. In this study, the Chl-a model also showed low inversion accuracy for Chl-b owing to the weak covariance between Chl-b and Chl-a. A similar result was observed using the Gaussian model, which was attributed to the poor correlation between Chl-b and the absorption at a single band. However, the PLS model significantly improved Chl-b retrieval, which could reflect its use of the full spectral absorption signal. The PLS model also performed better for other pigments, e.g., Allo, Chl-c3, Hex19, Perid, and Zea.
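The dependence of a Chl-a-based model on covariance with Chl-a can be made concrete with a minimal sketch. It assumes a power-law form, pigment = A × Chl-a^B, fitted by least squares in log10 space, and it assumes that the median absolute percent difference is computed as median(|predicted − measured| / measured) × 100; neither form is confirmed by this section, and the function names and data are hypothetical.

```python
import math
import statistics

def fit_power_law(chl_a, pigment):
    """Least-squares fit of log10(pigment) = log10(A) + B * log10(chl_a)."""
    x = [math.log10(c) for c in chl_a]
    y = [math.log10(p) for p in pigment]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    exponent = sxy / sxx
    coefficient = 10 ** (mean_y - exponent * mean_x)
    return coefficient, exponent

def predict_pigment(chl_a, coefficient, exponent):
    """Predict a pigment concentration from Chl-a with the fitted power law."""
    return coefficient * chl_a**exponent

def median_absolute_percent_difference(measured, predicted):
    """Assumed MAPD form: median of |predicted - measured| / measured, in percent."""
    return statistics.median(
        abs(p - m) / m * 100 for m, p in zip(measured, predicted)
    )

# Noise-free synthetic data that follow pigment = 0.2 * Chl-a^0.9 exactly.
chl_a = [0.05, 0.1, 0.3, 1.0, 3.0, 10.0]
pigment = [0.2 * c**0.9 for c in chl_a]

coefficient, exponent = fit_power_law(chl_a, pigment)
predicted = [predict_pigment(c, coefficient, exponent) for c in chl_a]
mapd = median_absolute_percent_difference(pigment, predicted)
print(coefficient, exponent, mapd)
```

When a pigment covaries tightly with Chl-a, as in this idealized case, a single regression recovers it well. For a pigment that varies largely independently of Chl-a, the scatter about the fitted line is large and no choice of A and B can reduce it, which is consistent with the weak Chl-b retrievals discussed above.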
For the examined models, it is important to consider the internal variability of the dataset used. As reported by Sathyendranath et al. [26], the retrieval accuracy of Chl-b derived from an empirical regression model could be improved by restricting the samples to those with a lower Fuco/Chl-a ratio. When we repeated the same analysis on our dataset, however, the improvement in r² for Chl-b was negligibly small. A regionally developed model can perform better than one built on a large-scale dataset, as revealed by the comparison between the higher-accuracy results of Liu et al. [9] and the relatively lower-accuracy results of our study, both of which were obtained using the Gaussian model. In addition, the results for Chl-c1,2, Chl-b, and PPC obtained from the Chl-a model trained in our study were compared with those of Chase et al. [7], who built a global Chl-a model based on a global dataset of in situ hyperspectral reflectance. Similar results were obtained for Chl-c1,2 and PPC in both studies, but an obvious difference existed for Chl-b. Phytoplankton adjust activities such as photosynthesis to adapt to specific and changeable conditions. Such complex and dynamic vital processes highlight the importance of regional models for analyzing phytoplankton community structures. The pigment concentration of the samples also has an important influence on a model's ability to extract pigment information. Differences in retrieval accuracy for PSC and PPC were found between our study and that of Liu et al. [9]; in their dataset, the concentration of PPC was lower, and that of PSC approximately one order of magnitude lower, than in ours (Table 1). Lower pigment concentrations might produce relatively weak signals, making the inversion of pigment concentration more difficult.

Influences of Derivative Analysis and Spectral Resolution on the PLS Model
Overall, a B (λ), a″ B (λ), and a 4th B (λ) presented similar performances for most pigments, except Diadino, Fuco, and Perid. Using derivative spectra resulted in better performance for Diadino and poorer results for Fuco and Perid. These results could be explained by the smooth and wide spectral features of Fuco and Perid, as shown in Figure 2 of Clementson and Wojtasiewicz [39]; for such features, derivative analysis might alter the spectral signals. Conversely, Diadino showed relatively "sharp" absorption features within the blue wavebands, and derivative analysis could be beneficial for the extraction of such signals. Therefore, the benefit of derivative analysis depends on the spectral features of the target pigment.
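The sensitivity of derivative spectra to feature width can be made explicit under the idealizing assumption that a pigment absorption feature is a single Gaussian band with amplitude $A$, centre $\lambda_0$, and width $\sigma$ (symbols introduced here for illustration only):

```latex
a(\lambda) = A \exp\!\left[-\frac{(\lambda-\lambda_0)^2}{2\sigma^2}\right],
\qquad
\left.\frac{\mathrm{d}a}{\mathrm{d}\lambda}\right|_{\lambda_0} = 0,
\qquad
\left.\frac{\mathrm{d}^2 a}{\mathrm{d}\lambda^2}\right|_{\lambda_0} = -\frac{A}{\sigma^2},
\qquad
\left.\frac{\mathrm{d}^4 a}{\mathrm{d}\lambda^4}\right|_{\lambda_0} = \frac{3A}{\sigma^4}.
```

Under this assumption, doubling the width of a band of fixed amplitude reduces its second-derivative signal at the band centre by a factor of 4 and its fourth-derivative signal by a factor of 16, whereas measurement noise is amplified by differentiation regardless of feature width. Broad features are therefore suppressed relative to narrow ones in derivative spectra.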
The effect of spectral resolution on retrieval accuracy was evaluated by comparing continuous hyperspectral data with data at multispectral wavebands. The results showed that phytoplankton absorption spectra with hyperspectral resolution can offer considerable advantages for retrieving certain individual pigments compared with data of reduced spectral resolution. Our study suggests that the effectiveness of a multispectral model depends on whether the multispectral absorption contains the main spectral features of the target pigment. However, the bands configured on current satellite sensors cannot readily cover all spectral features of the various pigments; therefore, inversion performance can be expected to improve with higher-resolution spectra. This finding is consistent with the investigation of Wolanin et al. [40], which suggested that continuous hyperspectral data usually produce better results than multiband data and that retrievals of pigment concentration could be improved by adding specific bands containing the optical signatures of those pigments.

Conclusions
We explored several methods for retrieving pigment concentrations from bio-optical measurements: an empirical model based on chlorophyll-a, a Gaussian model, and a PLS regression model applied to phytoplankton absorption data after they were subjected to principal component analysis. According to validation of the models using in situ data, the Gaussian model could retrieve Chl-a, Chl-c, PSC, and PPC with satisfactory results. Chl-c1,2, Chl-c, Diadino, Fuco, PSC, and PPC could be derived well using the Chl-a model because these pigments have strong covariance with the concentration of Chl-a. For pigments that correlate poorly with Chl-a, the PLS model is the more promising option. Based on our results, we offer the following recommendations. If the pigment has a strong and stable correlation with Chl-a, the use of the Chl-a model is recommended. If only the Chl-a concentration is to be retrieved, the use of the Gaussian model should be considered. The PLS model based on hyperspectral data is recommended for simultaneous inversion of multiple pigment concentrations.
Our study demonstrated the superiority of continuous hyperspectral phytoplankton absorption over the multispectral bands of existing sensors for retrieving pigment concentrations, at least for some auxiliary pigments. This work is a step towards the development of satellite-based methods applicable to hyperspectral missions. To extend the results to remote sensing data, additional long-term in situ observations and further analysis of the relationships between R rs and phytoplankton absorption will be necessary.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement: The data that support the findings of this study are available from the author upon reasonable request.