Retrieval of Phytoplankton Pigments from Underway Spectrophotometry in the Fram Strait

Phytoplankton in the ocean are extremely diverse. The abundance of various intracellular pigments is often used to study phytoplankton physiology and ecology, and to identify and quantify different phytoplankton groups. In this study, phytoplankton absorption spectra (aph(λ)) derived from underway flow-through AC-S measurements in the Fram Strait are combined with phytoplankton pigment measurements analyzed by high-performance liquid chromatography (HPLC) to evaluate the retrieval of various pigment concentrations at high spatial resolution. The performances of two approaches, Gaussian decomposition and the matrix inversion technique, are investigated and compared. Our study is the first to apply the matrix inversion technique to underway spectrophotometry data. We find that Gaussian decomposition provides good estimates (median absolute percentage error, MPE 21–34%) of total chlorophyll-a (TChl-a), total chlorophyll-b (TChl-b), the combination of chlorophyll-c1 and -c2 (Chl-c1/2), photoprotective carotenoids (PPC) and photosynthetic carotenoids (PSC). This method outperformed one of the matrix inversion algorithms, i.e., singular value decomposition combined with non-negative least squares (SVD-NNLS), in retrieving TChl-b, Chl-c1/2, PSC and PPC. However, SVD-NNLS enables robust retrievals of specific carotenoids (MPE 37–65%), i.e., fucoxanthin, diadinoxanthin and 19′-hexanoyloxyfucoxanthin, which Gaussian decomposition currently cannot provide. More robust predictions are obtained with the Gaussian decomposition method when the observed aph(λ) is normalized by the package effect index at 675 nm. The latter is determined as a function of the "packaged" aph(675) and the TChl-a concentration, which shows the potential for improving pigment retrieval accuracy through the combined use of aph(λ) and TChl-a concentration data. To generate robust estimation statistics for the matrix inversion technique, we combine leave-one-out cross-validation with data perturbations.
We find that both approaches provide useful information on pigment distributions, and hence on indicators of phytoplankton community composition, at a spatial resolution much finer than can be achieved with discrete samples.
Remote Sens. 2019, 11, 318; doi:10.3390/rs11030318 (www.mdpi.com/journal/remotesensing)


Introduction
Phytoplankton account for approximately half of global primary production via photosynthesis [1] and form the base of the marine food web. Intracellular pigments of phytoplankton, composed of chlorophylls (a, b and c), carotenoids (carotenes and xanthophylls) and phycobiliproteins (phycoerythrin, phycocyanin and allophycocyanin) [2], play a vital role in photoprotection and in the light-driven part of photosynthesis. Chlorophyll-b, -c and photosynthetic carotenoids (PSC), such as fucoxanthin (Fuco), act as antenna pigments that harvest light and transfer the energy to chlorophyll-a in the reaction centers of the photosystems. In cyanobacteria, red algae and cryptophytes, phycobiliproteins are the major light-harvesting pigments [3]. Chlorophyll-a is crucial in converting the absorbed light energy into chemical energy. The carotenoids not involved in photosynthesis are photoprotective (PPC). In particular, some xanthophylls such as violaxanthin (Viola), zeaxanthin (Zea), diadinoxanthin (Diadino) and diatoxanthin (Diato) are involved in the xanthophyll cycle, one of the most important photoprotective mechanisms, which drives the non-radiative dissipation of excess light energy to prevent photoinhibition [4,5]. Therefore, their relative abundance can be used as a tracer of photoacclimation processes [5].
In the context of global climate change, knowledge of the distributions of phytoplankton pigments helps us understand the impacts of the changing environment on primary productivity [6] and, through appropriate analyses such as CHEMTAX [7] and diagnostic pigment analysis [8], on phytoplankton diversity and community composition. In remote sensing applications, phytoplankton pigment databases have been extensively used to develop, validate or refine bio-optical algorithms for estimating phytoplankton biomass (often represented by the total chlorophyll-a (TChl-a) concentration) and functional types (via diagnostic pigment analysis), defined by both cell size (micro-, nano- and pico-phytoplankton) and biogeochemical function (e.g., calcification, silicification, dimethyl sulphide production and nitrogen fixation) ([9] and references therein). These data sets are mainly based on high-performance liquid chromatography (HPLC) analysis of discrete water samples. This technique enables the accurate quantification of 25–50 pigments in a single analysis [10]. However, it requires highly trained personnel and is labor-intensive, time-consuming, expensive and complex. It is also limited in sampling frequency and spatial coverage, and discrete sampling raises further issues of sample handling, storage and transportation. While HPLC pigment analysis remains indispensable, it is necessary to explore methods that provide easier access to pigment data at higher spatial and temporal resolution.
Because optical measurements are currently the only means of collecting synoptic-scale information on upper-ocean particles (e.g., operational open-ocean satellite ocean color provides daily data with pixel sizes down to 300 m by 300 m), attempts have been made to quantify the concentrations of various phytoplankton pigments from these measurements (e.g., absorption or reflectance spectra). Optical methods take advantage of the distinctive absorption characteristics of different pigments, and various approaches have been applied, such as the decomposition of spectra into Gaussian functions, e.g., [11], spectral reconstruction, e.g., [12], derivative analysis, e.g., [13], partial least squares regression, e.g., [14], multiple linear regression [15], reflectance band ratios, e.g., [16,17], principal component analysis, e.g., [18] and artificial neural networks [19,20].
The Gaussian decomposition method decomposes phytoplankton absorption spectra (aph(λ)) into Gaussian functions and correlates the amplitudes of the Gaussian functions with the concentrations of major pigment groups. The amplitude of each Gaussian function is assumed to represent the magnitude of the absorption coefficient of a specific pigment or pigment group at the Gaussian peak wavelength, based on known pigment absorption properties determined in laboratory analyses. This method simultaneously retrieves the concentrations of chlorophyll-a, chlorophyll-b, chlorophyll-c and carotenoids [11,21–23] or of chlorophyll-a and phycocyanin [24,25]. However, the retrieval accuracy is generally limited by variations in the pigment package effect of field samples. Nevertheless, the Gaussian absorption coefficients of specific pigment groups were recently incorporated into the reconstruction of hyper- and multispectral remote sensing reflectance. This allows the concentrations of TChl-a, total chlorophyll-b (TChl-b), the combination of chlorophyll-c1 and -c2 (Chl-c1/2) and PPC to be robustly estimated from remote sensing reflectance data globally [23], as well as phycocyanin in cyanobacteria bloom waters [24,25].
The spectral reconstruction method assumes that aph(λ) can be reconstructed as the linear combination of pigment-specific absorption coefficients multiplied by the corresponding pigment concentrations [26]. Moisan et al. [27,28] applied matrix inversion analysis to the reconstruction model and successfully estimated the concentrations of a series of pigments directly from aph(λ). This technique involves a first inversion of the observed pigment concentrations that derives pigment-specific absorption spectra, and a second inversion of these derived pigment-specific absorption spectra that solves for pigment concentrations. Four methods that solve least squares problems were compared for the two inversions: singular value decomposition (SVD) [29], non-negative least squares (NNLS) [30] and two nonlinear least squares minimization schemes based on the Levenberg-Marquardt algorithm [31,32]. Moisan et al. found that when the first inversion was carried out with SVD and the second one with NNLS, the inverse modeling technique yielded the most accurate pigment estimates. However, several factors affect the retrieval accuracy: the level of correlation between pigment concentrations, the contribution of a specific pigment to aph(λ), the pigment package effect, absorption by components that are present in the samples but not measured by standard HPLC (e.g., mycosporine-like amino acids and phycobiliproteins) [27,28], and the number of spectral bands of aph(λ) used in the inversion model [27,28,33]. Overall, the SVD-NNLS method achieved simultaneous, statistically significant retrievals of TChl-a, total chlorophyll-c (TChl-c), β-carotene (β-Caro), Fuco, Viola, Diadino and peridinin (Peri) in U.S. east coast waters [27,28]. It was recently applied to aph(λ) modeled from MODIS-Aqua TChl-a data for northeastern U.S. waters, yielding maps of the concentrations of ten pigments [34]. Similar approaches have successfully inferred phytoplankton size classes globally [35,36] and taxonomic groups in the Chukchi and Bering Seas [33] from absorption data.
Derivative analysis of absorption spectra separates the secondary absorption peaks and shoulders contributed by phytoplankton pigments within overlapping absorption regions [37]. Bidigare et al. [13] found strong linear relationships between the fourth-derivative maxima of particulate absorption spectra (ap(λ)) and chlorophyll a, b and c concentrations in the Sargasso Sea. However, this method failed to estimate carotenoid concentrations. The carotenoids have similar spectral properties and broad absorption bands with relatively rounded peaks, which are less accessible to derivative analysis.
Principal component analysis (a.k.a. empirical orthogonal function analysis) derives several dominant modes (known as "principal components") that account for most of the variability in spectral shape, and relates them to pigment concentrations. Bracher et al. [18] performed this analysis on both hyperspectral and multispectral remote sensing reflectance data in the Atlantic Ocean. From linear combinations of the principal components, they retrieved the concentrations of TChl-a, monovinyl-chlorophyll-a, PPC, PSC, Chl-c1/2, 19′-butanoyloxyfucoxanthin (But), 19′-hexanoyloxyfucoxanthin (Hex), Zea, phycoerythrin and the sum of α- and β-Caro. This method is, however, only applicable to pigments that are present in most collocated samples; it failed to retrieve pigments that were mostly absent or below the detection limit. Similarly, Soja-Woźniak et al. [38] applied this analysis to hyperspectral and multispectral remote sensing reflectance data and successfully retrieved TChl-a, phycocyanin and phycoerythrin in the Gulf of Gdansk.
An artificial neural network relates spectra to pigment data with a nonlinear model whose parameters (i.e., the weight matrix) are adjusted to achieve the best fit. Bricaud et al. [19] developed a multilayer perceptron using a global data set and estimated the concentrations of TChl-a, TChl-b, TChl-c, PSC and PPC. TChl-a was the most accurate estimate and TChl-b the poorest. The main limitation of this method is its dependence on the biological variability captured in the training data set.
More recently, in situ hyperspectral optical sensors have increasingly been used to obtain pigment data from continuous optical measurements, e.g., [22]. In-line and autonomous measurements by new miniature sensors deployed on various platforms (e.g., profiling floats, autonomous surface vehicles) have substantially increased the sampling frequency and spatial coverage of measurements. Shipboard underway spectrophotometry considerably facilitates the acquisition of ap(λ) at unprecedented spatial resolution. It uses an AC-S hyperspectral spectrophotometer (or the 9-wavelength AC-9) (Sea-Bird Scientific, Philomath, OR, USA) operated in flow-through mode. ap(λ) is derived by subtracting measurements of temporally adjacent 0.2-µm filtered seawater samples from bulk seawater absorption measurements, e.g., [22,39–48]. The method has provided surface TChl-a data along cruise tracks via empirical relationships between spectrophotometry-derived ap(λ) and HPLC-measured TChl-a concentrations [39,40,45–48]. Furthermore, Chase et al. [22] performed Gaussian decomposition to retrieve major pigment groups from a globally extensive underway AC-S-derived ap(λ) data set. Here we use a data set obtained with a similar underway system to compare and contrast two different methods of obtaining information on the underlying pigments.
The Fram Strait, the region between Svalbard and Greenland, provides the only deep connection between the North Atlantic and Arctic Oceans (Figure 1). It is of great importance to the Arctic climate, as it accounts for 75% of the mass exchange and 90% of the heat exchange between the Arctic Ocean and the rest of the world ocean [49]. In recent decades, the Fram Strait has undergone significant warming, high variability of Atlantic water inflow [50] and an overall increase in sea ice area export [51–55]. These changes affect phytoplankton biomass, community composition and distribution by altering light and nutrient regimes. The seasonal cycle of phytoplankton biomass in the shallow upper water layers has been significantly enhanced since 2008 [56]. Phytoplankton distributions reflect the dominant local physical processes [56,57]. A significant increase in summertime chlorophyll-a concentration was observed in the eastern Fram Strait, whereas the western side showed only minor changes [56]. Furthermore, a summertime shift in the dominant phytoplankton assemblages has been suggested [56,58,59]: from diatoms (mainly Thalassiosira spp., Chaetoceros spp. and Fragilariopsis spp.) towards coccolithophores (mainly Emiliania huxleyi) and, more recently, Phaeocystis spp. (mainly Phaeocystis pouchetii) and other small pico- and nanoflagellates. Such a shift can strongly affect the functioning and stability of marine food webs [60,61]. Studies of phytoplankton community composition in this region are mainly based on discrete water samples or moored sediment traps. Because of the inherent limitations of these methods, observations are scarce. Furthermore, it remains difficult to obtain information on phytoplankton community composition from satellites, for two reasons: ocean color data have poor spatial and temporal coverage in this region, e.g., [57], and satellite algorithms for determining phytoplankton community structure have not been assessed for applicability here. Additionally, algorithms developed for other waters to quantify phytoplankton community structure or pigment composition from in situ optical measurements have not yet been assessed in this region.
The Fram Strait cruises PS93.2, PS99.2 and PS107 of R/V Polarstern collected a comprehensive in situ bio-optical data set and offer a unique opportunity for bio-optical modeling. In particular, underway spectrophotometry was applied during all three cruises. Here, we aim to obtain information on individual phytoplankton pigments or pigment groups (e.g., PSC and PPC) from underway spectrophotometry. To this end, we compare and optimize the performances of two pigment retrieval approaches, Gaussian decomposition [22] and the matrix inversion technique [27,28]. We also determine the number and types of pigments that can potentially be retrieved, and assess the applicability of the two approaches to the Fram Strait and its vicinity.

Data Collection
Data were collected during three expeditions of R/V Polarstern: PS93.2 (July to August 2015), PS99.2 (June to July 2016) and PS107 (July to August 2017). These cruises followed a repeated survey design. Sampling sites were located in the Fram Strait and its vicinity, between approximately 72° and 80° N and between 10° W and 15° E (Figure 1).
Underway ap(λ) and discrete pigment concentration measurements of surface water were collected during each expedition. The average speed of the ship while underway was ~10 knots (5.1 m s⁻¹). Sampling methods and data analysis are detailed in Liu et al. [45]. Briefly, a 25-cm-pathlength AC-S spectrophotometer (spectral range: 400–740 nm, full width at half maximum (FWHM): 10 nm, wavelength resolution: ~3.5 nm) was integrated into the shipboard flow-through system following the setup of Slade et al. [46]. Seawater was sampled from the ship's keel at roughly 11 m below the sea surface using a membrane pump, at a flow rate of 1–2 L min⁻¹. The ap(λ) spectra were derived by subtracting the absorption coefficients of 0.2-µm filtered seawater from those of unfiltered seawater, both measured by the AC-S. They were subsequently corrected for the temperature and salinity dependence of pure water absorption [63], scattering errors [64] and the residual temperature effect [46]. Additionally, the smoothing of the measured ap(λ) spectra caused by the AC-S filter factors was corrected [22].
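The differencing step can be sketched in Python as a minimal illustration. The sketch assumes that the filtered-seawater spectra are linearly interpolated in time to the timestamps of the bulk (unfiltered) measurements. The function name `underway_ap` is hypothetical, and the subsequent temperature, salinity, scattering and filter-factor corrections are deliberately left out.

```python
import numpy as np

def underway_ap(t_total, a_total, t_filt, a_filt):
    """Hypothetical helper: ap(λ) = a_total(λ) - a_filtered(λ) interpolated in time.

    t_total : (n,) times of the unfiltered (bulk) measurements, e.g. in minutes
    a_total : (n, l) bulk absorption spectra [m^-1]
    t_filt  : (k,) times of the 0.2-µm filtered measurements, sorted ascending
    a_filt  : (k, l) filtered-seawater absorption spectra [m^-1]
    """
    # Assumption: linear interpolation in time, done wavelength by wavelength
    a_filt_interp = np.column_stack(
        [np.interp(t_total, t_filt, a_filt[:, k]) for k in range(a_filt.shape[1])]
    )
    # Particulate absorption = bulk minus temporally adjacent dissolved/water signal
    return a_total - a_filt_interp
```

For example, with filtered spectra measured at 0 and 60 min, a bulk spectrum at 30 min is differenced against the midpoint of the two filtered spectra.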
Discrete seawater samples were collected from the unfiltered AC-S outflow approximately every three hours. Seawater (1–3 L) was filtered onto GF/F glass fiber filters (nominal pore size 0.7 µm) for HPLC phytoplankton pigment analysis (see Table 1 for the names and abbreviations of the pigments and pigment groups used in this study). Pigments were grouped following Hooker et al. [65]. Divinyl-chlorophyll-a and divinyl-chlorophyll-b were not detected in our data set. For convenience, in what follows, the term "pigment" denotes either a specific pigment or a pigment group such as PSC and PPC. In addition, the spectral absorption coefficient of non-algal particles (aNAP(λ)) in discrete water samples was measured to determine its spectral exponent. Seawater (0.2–1 L) was filtered to concentrate particulate material on GF/F filters, and ap(λ) of the discrete samples was determined using the Quantitative Filter Technique [66–68]. Samples from PS93.2 were measured on a dual-beam UV/VIS spectrophotometer (Cary 4000, Varian Inc., Palo Alto, CA, USA) (spectral range: 300–850 nm, FWHM: 2 nm, wavelength resolution: 1 nm) equipped with a 150-mm integrating sphere following Simis et al. [69]. Filters collected during PS99.2 and PS107 were measured using a small portable integrating cavity absorption meter (spectral range: 300–850 nm, FWHM: 2 nm, wavelength resolution: 0.3 nm) [70], as detailed in Liu et al. [45]. aNAP(λ) was then obtained by measuring the sample filters after bleaching with 10% NaClO solution [69], following the same procedure as for the ap(λ) measurements. aNAP(λ) was approximated by an exponentially decaying function [71,72]:

$$a_{NAP}(\lambda) = a_{NAP}(400)\, e^{-S(\lambda - 400)} \quad (1)$$

where S is the spectral exponent of aNAP(λ). Equation (1) was fitted to aNAP(λ) between 380 and 620 nm using the non-linear least squares method [72]. The 400–480 nm range was excluded from the fit to eliminate the residual chlorophyll-a absorption peak. The median value of S for all three expeditions is 0.016 nm⁻¹ (standard deviation with respect to the median: 0.006 nm⁻¹). This value was subsequently used in the decomposition of the AC-S-derived ap(λ) to obtain aph(λ) (see Section 2.2.1). AC-S-derived ap(λ) spectra were averaged over the period from ten minutes before to ten minutes after each HPLC sampling time and matched with the HPLC pigment data. aph(λ) (400–700 nm, wavelength resolution: ~3.5 nm) was obtained by numerical decomposition (see Section 2.2.1). In total, 298 ap(λ)-pigment match-ups were obtained and used as the pigment retrieval data set. The link to the data used in this study is provided in the supplementary materials.
Note: TChl-a = monovinyl-chlorophyll-a + chlorophyllide-a; TChl-b = monovinyl-chlorophyll-b; TChl-c = Chl-c1/2 + Chl-c3; PSC = Fuco + But + Hex + Peri [65]; PPC = Allo + Diadino + Diato + Zea + α- + β-Caro [65].
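The fit of Equation (1) with the spectral windowing described above can be sketched as follows. This is a minimal sketch that assumes SciPy's `curve_fit` (Levenberg-Marquardt by default) as the non-linear least squares solver, inclusive window boundaries, and a starting slope of 0.015 nm⁻¹. The helper names `nap_model` and `fit_nap_slope` are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def nap_model(wl, a400, S):
    """Equation (1): aNAP(λ) = aNAP(400) * exp(-S * (λ - 400))."""
    return a400 * np.exp(-S * (wl - 400.0))

def fit_nap_slope(wl, a_nap, s0=0.015):
    """Fit Equation (1) to one bleached-filter spectrum; returns (aNAP(400), S)."""
    wl, a_nap = np.asarray(wl, float), np.asarray(a_nap, float)
    # Fit window: 380-620 nm, skipping 400-480 nm to avoid residual Chl-a absorption
    mask = (wl >= 380) & (wl <= 620) & ~((wl >= 400) & (wl <= 480))
    # Starting guess: measured value nearest 400 nm and a typical slope s0 (assumed)
    p0 = [a_nap[np.argmin(np.abs(wl - 400.0))], s0]
    (a400, S), _ = curve_fit(nap_model, wl[mask], a_nap[mask], p0=p0)
    return a400, S

# The median slope over all samples would then be, e.g.:
# S_median = np.median([fit_nap_slope(wl, spec)[1] for spec in spectra])
```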

Retrieval of Phytoplankton Pigments
Figures 2 and 3 illustrate the steps of applying Gaussian decomposition and the matrix inversion technique, respectively, to retrieve phytoplankton pigment concentrations. These steps are described in detail in the following subsections. The link to the data processing code is provided in the supplementary materials.

Gaussian Decomposition
Following Chase et al. [22], the AC-S-derived ap(λ) was decomposed into twelve Gaussian functions and one aNAP(λ) exponential function (Equation (1)) over the range 400–700 nm. Each Gaussian function represents the absorption by a certain phytoplankton pigment. The absorption by the water-soluble photosynthetic pigment phycoerythrin was also represented by a Gaussian function, although its concentration could not be validated with HPLC. The peak location and width of each Gaussian function (Table 2) were fixed based on known pigment absorption shapes [73]. The decomposition is optimized by minimizing the following cost function using a weighted least squares method [22]:

$$\text{cost} = \sum_{\lambda=400}^{700} \left[ \frac{a_{p}(\lambda) - \sum_{i=1}^{12} a_{gaus,i}(\lambda) - a_{NAP}(\lambda)}{\sigma_{SD}(\lambda)} \right]^{2} \quad (2)$$

where agaus,i(λ) = agaus,i(λ0,i) exp[−0.5((λ − λ0,i)/σi)²] denotes the ith Gaussian function with peak wavelength λ0,i and width σi, and σSD(λ) is the standard deviation of the 20-min averaged matched ap(λ) spectra.
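Because the peak wavelengths, the widths and the NAP slope S are all fixed, the model in Equation (2) is linear in its 13 amplitudes (12 Gaussians plus aNAP(400)). Minimizing the cost therefore reduces to a weighted linear least squares problem. The sketch below solves it with SciPy's `lsq_linear`. The non-negativity bounds on the amplitudes are an assumption of this sketch. The `PEAKS` and `WIDTHS` values are illustrative placeholders only; substitute the values listed in Table 2.

```python
import numpy as np
from scipy.optimize import lsq_linear

# Illustrative placeholders (nm); replace with the λ0 and σ values of Table 2.
PEAKS = np.array([406, 434, 453, 470, 492, 523, 550, 584, 617, 638, 660, 675], float)
WIDTHS = np.array([16, 12, 12, 13, 16, 14, 14, 16, 13, 11, 11, 10], float)
S_NAP = 0.016  # median NAP spectral slope from this study [nm^-1]

def design_matrix(wl):
    """One column per unit-amplitude Gaussian, plus one unit-amplitude NAP exponential."""
    gauss = np.exp(-0.5 * ((wl[:, None] - PEAKS[None, :]) / WIDTHS[None, :]) ** 2)
    nap = np.exp(-S_NAP * (wl - 400.0))[:, None]
    return np.hstack([gauss, nap])

def decompose(wl, ap, sd):
    """Return Gaussian amplitudes agaus(λ0) [m^-1], aNAP(400) and aph(λ) = ap - aNAP."""
    M = design_matrix(wl)
    # Weighted least squares: each wavelength's residual is divided by σSD(λ)
    res = lsq_linear(M / sd[:, None], ap / sd, bounds=(0.0, np.inf))
    amps, a_nap400 = res.x[:-1], res.x[-1]
    aph = ap - a_nap400 * M[:, -1]
    return amps, a_nap400, aph
```

Solving the problem linearly avoids the need for starting values. A general non-linear optimizer would only be needed if peak positions or widths were also free parameters.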
Table 2. The peak wavelengths (λ0) and widths (σ) of the Gaussian functions for phytoplankton pigments, the statistics for the power function regression of agaus(λ0)-pigment pairs in the training set (regression coefficients A and B of Equation (3) were calculated with 95% confidence bounds), and the statistics based on leave-one-out cross-validation. MAE is in mg m⁻³ (values outside the parentheses were calculated with linear-scale values, those inside the parentheses with log10-scale values), MPE is in %, and N is the number of data points for the regressions.
[Table 2 (a): Decomposition of aph(λ)]
The amplitude aNAP(400) was derived by minimizing Equation (2) sample by sample and used to reconstruct aNAP(λ) for each sample according to Equation (1). aph(λ) was obtained by subtracting aNAP(λ) from ap(λ). The amplitude of each Gaussian function, agaus(λ0) [m⁻¹], was derived by minimizing Equation (2). It was then related to the concentration of the corresponding pigment measured by HPLC (c [mg m⁻³]) by fitting the following equation with the bisquare robust non-linear least squares method (Table 2). Data pairs in which either agaus(λ0) or c was 0 were excluded from the fit.

$$c = A \left[ a_{gaus}(\lambda_0) \right]^{B} \quad (3)$$

For convenience, we denote the five pigments that can be retrieved using Gaussian decomposition (Table 2), i.e., TChl-a, TChl-b, Chl-c1/2, PSC and PPC, as "Gauss-5 pigments".
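A robust fit of Equation (3) can be sketched as iteratively reweighted least squares with Tukey bisquare weights (tuning constant 4.685). This is an approximation under stated assumptions, not a reproduction of MATLAB's robust fitting: the residual scale here is a simple median absolute residual without leverage adjustment, and the starting values come from an ordinary log-log fit. The function name `bisquare_power_fit` is hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares

def bisquare_power_fit(a_gaus, c, n_iter=30, k=4.685):
    """Fit c = A * a_gaus**B (Equation (3)) by iteratively reweighted least
    squares with Tukey bisquare weights. Returns np.array([A, B])."""
    a_gaus, c = np.asarray(a_gaus, float), np.asarray(c, float)
    keep = (a_gaus > 0) & (c > 0)            # drop pairs where either value is 0
    x, y = a_gaus[keep], c[keep]
    # Starting values from an ordinary straight-line fit in log-log space
    B0, lnA0 = np.polyfit(np.log(x), np.log(y), 1)
    p = np.array([np.exp(lnA0), B0])
    w = np.ones_like(y)                      # first pass: unweighted
    for _ in range(n_iter):
        sw = np.sqrt(w)
        # Weighted non-linear least squares in linear concentration space
        p = least_squares(lambda q: sw * (y - q[0] * x ** q[1]), p).x
        r = y - p[0] * x ** p[1]
        s = max(np.median(np.abs(r)) / 0.6745, 1e-12)  # robust residual scale
        u = r / (k * s)
        # Bisquare weights: points with |u| >= 1 get zero weight
        w = np.where(np.abs(u) < 1.0, (1.0 - u ** 2) ** 2, 0.0)
    return p
```

Once A and B are known, a pigment concentration is estimated from a new Gaussian amplitude as `A * a_gaus ** B`.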

Matrix Inversion Technique
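The aph(λ) spectra can be reconstructed as the linear combination of the absorption spectra of individual pigments. Each of these equals the pigment-specific absorption coefficient (a*j(λ)) multiplied by the pigment concentration (cj) [26], i.e., aph(λ) = Σ_{j=1}^{m} cj a*j(λ). When the data set of collocated pigment concentrations and aph(λ) contains more than one sample, the reconstruction model can be written in matrix form as:

$$\begin{bmatrix} a_{ph,1}(\lambda) \\ \vdots \\ a_{ph,n}(\lambda) \end{bmatrix} = \begin{bmatrix} c_{1,1} & \cdots & c_{1,m} \\ \vdots & \ddots & \vdots \\ c_{n,1} & \cdots & c_{n,m} \end{bmatrix} \begin{bmatrix} a^{*}_{1}(\lambda) \\ \vdots \\ a^{*}_{m}(\lambda) \end{bmatrix}, \quad \text{i.e.,} \quad \mathbf{A}_{ph} = \mathbf{C}\,\mathbf{A} \quad (4)$$

where c is the observed pigment concentration (e.g., from HPLC), a*(λ) is the derived pigment-specific absorption coefficient, n is the number of samples, i is the sample index, m is the number of pigments measured in each sample, and j is the pigment index. c and aph(λ) are known (the former from HPLC and the latter from spectrophotometry), whereas a*(λ) is unknown. To solve for a*(λ), i.e., the elements of matrix A, the inverse of matrix C is computed. The derived a*(λ) is then used with the observed aph(λ) to solve for the pigment concentrations of sample i:

$$\begin{bmatrix} a_{ph,i}(\lambda_1) \\ \vdots \\ a_{ph,i}(\lambda_l) \end{bmatrix} = \begin{bmatrix} a^{*}_{1}(\lambda_1) & \cdots & a^{*}_{m}(\lambda_1) \\ \vdots & \ddots & \vdots \\ a^{*}_{1}(\lambda_l) & \cdots & a^{*}_{m}(\lambda_l) \end{bmatrix} \begin{bmatrix} \hat{c}_{i,1} \\ \vdots \\ \hat{c}_{i,m} \end{bmatrix} \quad (5)$$

where l is the number of wavelengths and ĉ is the estimated pigment concentration. Likewise, this step requires computing the inverse of matrix A.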
Hence, this is a two-step approach. First, where HPLC data are available, Equation (4) is used to obtain the pigment-specific absorption spectra. Once these are available, Equation (5) is used to derive pigment concentrations directly from aph(λ), a step that does not require HPLC data. Moisan et al. [27,28] showed that the SVD-NNLS approach gives the best pigment estimates. This approach solves Equation (4) at each wavelength with the SVD least squares method and Equation (5) with the NNLS method.

Singular Value Decomposition-Non-Negative Least Squares (SVD-NNLS)
The SVD-NNLS approach was adapted and tested using our data set. Matrix C in Equation (4) (the concentrations of the pigments listed in Table 1) was inverted using SVD. The least-squares solution of the overdetermined Equation (4) (in this study, n > m) is given by A = C⁺ · Aph, where C⁺ is the Moore-Penrose pseudoinverse of matrix C computed by SVD (MATLAB function pinv). A provides the specific absorption spectrum of each pigment. It is then used in Equation (5) to solve for pigment concentrations via NNLS (MATLAB function lsqnonneg), i.e., by inverting Equation (5) using the least squares method with the constraint ĉi,j ≥ 0.
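The two inversions can be sketched in Python, with `numpy.linalg.pinv` (SVD-based) and `scipy.optimize.nnls` standing in for MATLAB's `pinv` and `lsqnonneg`. This is a minimal sketch with hypothetical function names. Applying the pseudoinverse to the whole (n, l) matrix at once is equivalent to solving Equation (4) separately at each wavelength, because the wavelengths do not interact in the first inversion.

```python
import numpy as np
from scipy.optimize import nnls

def derive_specific_spectra_svd(C, Aph):
    """First inversion, Equation (4): A = pinv(C) @ Aph.

    C   : (n, m) HPLC pigment concentrations [mg m^-3]
    Aph : (n, l) phytoplankton absorption spectra [m^-1]
    returns (m, l) pigment-specific absorption spectra a*_j(λ) [m^2 mg^-1]
    """
    return np.linalg.pinv(C) @ Aph

def retrieve_concentrations_nnls(A, aph):
    """Second inversion, Equation (5): solve aph = A.T @ c subject to c >= 0."""
    c_hat, _residual_norm = nnls(A.T, aph)
    return c_hat
```

In use, `A` is derived once from the match-ups, and `retrieve_concentrations_nnls(A, aph_i)` is then applied to every underway spectrum, whether or not HPLC data exist for it.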
To ensure robust solutions of the overdetermined systems, matrix C in Equation (4) and matrix A in Equation (5) should be constructed to avoid ill-conditioning. This requires that the columns and rows of matrix C be sufficiently linearly independent, and that the shapes of any two a*(λ) be sufficiently different from each other [36]. Here, we used the condition number (ncond) (MATLAB function cond) as a diagnostic of how well-conditioned matrix C is (a matrix with a high ncond is ill-conditioned, and vice versa). In addition, a similarity index (SIi,j) [74,75] was used to represent the similarity between the absolute values of two specific absorption spectra a*i(λ) and a*j(λ) (denoted as a*+(λ)):

$$SI_{i,j} = \frac{\mathbf{a}^{*+}_{i}(\lambda) \cdot \mathbf{a}^{*+}_{j}(\lambda)}{\lVert \mathbf{a}^{*+}_{i}(\lambda) \rVert \, \lVert \mathbf{a}^{*+}_{j}(\lambda) \rVert}$$

where ||a*(λ)|| is the norm of the vector a*(λ) (MATLAB function norm). SIi,j ranges from 0 (no similarity) to 1 (perfect similarity). Our goal was to maximize the number of pigment types to be determined (m) while reducing the degree of ill-conditioning. We therefore tested all possible combinations of pigments composing matrix C and examined ncond and SIi,j for every case. With 20 pigment types measured in total, there are 20!/(m!(20−m)!) combinations. For example, when m = 20, matrix C includes the concentrations, in the 298 samples, of all the pigments listed in Table 1 except TChl-c, PSC and PPC (denoted as "Fram-20 pigments"). When 1 ≤ m < 20, matrix C includes the concentrations of m pigment types and the summed contribution of the other Fram-20 pigments. m was then determined as the largest number for which ncond was smaller than 60 and SIi,j was smaller than 0.9. For convenience, we denote the retrieval of these m pigments using SVD-NNLS as SVD-NNLS-m.
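The combination screening can be sketched as an exhaustive loop over pigment subsets. For each subset, the sketch builds C from the m selected pigments plus one column holding the summed remaining pigments, derives a*(λ) by pseudoinverse, and records ncond and the largest pairwise SI. The sketch assumes that the SI threshold applies to every pair of derived spectra, so the maximum is tested. `screen_combinations` and `passing` are hypothetical helper names.

```python
import itertools
import numpy as np

def similarity_index(a_i, a_j):
    """SI between the absolute values of two spectra: 0 = no similarity, 1 = identical shape."""
    x, y = np.abs(np.asarray(a_i, float)), np.abs(np.asarray(a_j, float))
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def screen_combinations(C_all, Aph, m):
    """For each m-pigment subset, build C = [subset | summed remaining pigments],
    derive a*(λ) with the pseudoinverse and return {subset: (n_cond, max SI)}."""
    n_pig = C_all.shape[1]
    results = {}
    for combo in itertools.combinations(range(n_pig), m):
        rest = [k for k in range(n_pig) if k not in combo]
        blocks = [C_all[:, list(combo)]]
        if rest:  # the "other pigments" column
            blocks.append(C_all[:, rest].sum(axis=1, keepdims=True))
        C = np.hstack(blocks)
        A = np.linalg.pinv(C) @ Aph  # one derived spectrum per column of C
        si_max = max(
            (similarity_index(A[p], A[q])
             for p, q in itertools.combinations(range(A.shape[0]), 2)),
            default=0.0,
        )
        results[combo] = (float(np.linalg.cond(C)), si_max)
    return results

def passing(results, cond_max=60.0, si_max=0.9):
    """Subsets meeting the thresholds used in the study (n_cond < 60, SI < 0.9)."""
    return [k for k, (nc, si) in results.items() if nc < cond_max and si < si_max]

# Exhaustive search grows quickly: for 20 pigments and m = 10 there are 184,756 subsets.
```

m would then be the largest value for which `passing(screen_combinations(C_all, Aph, m))` is non-empty.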
For comparison with the results from Gaussian decomposition, the SVD-NNLS method was also applied to retrieve only the Gauss-5 pigments (denoted as SVD-NNLS-5). In this case, matrix C in Equation (4) includes only the concentrations of these five pigments and the summed contribution of the other pigments.

Non-Negative Least Squares-Non-Negative Least Squares (NNLS-NNLS)
Although SVD provides a powerful tool for matrix inversion, one concern is that SVD-derived specific absorption spectra can be negative, which is physically unrealistic and hard to interpret. To address this issue, NNLS was used for both inversions (denoted as NNLS-NNLS): to invert Equation (4) to solve for the non-negative matrix A, and to invert Equation (5) to derive non-negative pigment concentrations. Similarly, NNLS-NNLS-5 was tested for comparison with the results from Gaussian decomposition and SVD-NNLS-5. Bricaud-NNLS-NNLS was also tested in the same way as Bricaud-SVD-NNLS, except that a*(λ) for the missing pigments was solved using NNLS. The NNLS-NNLS approach was also applied by Moisan et al. [27,28]. The pigment estimation results from NNLS-NNLS are provided in Appendix A.
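In NNLS-NNLS, only the first inversion differs from SVD-NNLS. Instead of one pseudoinverse, a non-negative least squares problem is solved at each wavelength, which guarantees a*(λ) ≥ 0. A minimal sketch using `scipy.optimize.nnls` follows; the function name is hypothetical. The second inversion is the same NNLS call used for SVD-NNLS.

```python
import numpy as np
from scipy.optimize import nnls

def derive_specific_spectra_nnls(C, Aph):
    """First inversion with NNLS: at each wavelength k, solve
    Aph[:, k] = C @ A[:, k] subject to A[:, k] >= 0.

    C   : (n, m) pigment concentrations; Aph : (n, l) absorption spectra
    returns (m, l) non-negative pigment-specific absorption spectra
    """
    m, l = C.shape[1], Aph.shape[1]
    A = np.empty((m, l))
    for k in range(l):
        A[:, k], _residual_norm = nnls(C, Aph[:, k])
    return A
```

The trade-off is l separate constrained solves instead of one pseudoinverse. With an AC-S spectrum of roughly 86 wavelengths between 400 and 700 nm, this cost is small.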

Sensitivity Analysis
The solution of the matrix inversion can be sensitive to input errors, depending on how well-conditioned the input matrices are, and this can affect the pigment estimation accuracy. To ensure the stability of the matrix inversion model and obtain reliable pigment estimation statistics, the input data were perturbed in 300 iterations. The related parameters, i.e., ncond, SI, a*(λ) and the cross-validation statistics (see Section 2.2.4), were calculated as the median values of the 300 sets.
Assuming an uncertainty of 15% for the HPLC pigment data, matrix C was perturbed with random values within ±15% of the measurements. Matrix Aph was perturbed by adding random values within ±σSD(λ) (see Equation (2)) to the measurements. Three cases were considered for both SVD-NNLS and NNLS-NNLS: with C perturbed, with Aph perturbed, and with both matrices perturbed.
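The perturbation scheme can be sketched as below. The distribution of the random values is not specified above, so this sketch assumes uniform draws: a multiplicative factor in [−15%, +15%] for C and an additive offset in [−σSD(λ), +σSD(λ)] for Aph. Only the median condition number and the median SVD-derived a*(λ) are returned; the cross-validation statistics would be collected in the same loop. All function names are hypothetical.

```python
import numpy as np

def perturb_inputs(C, Aph, sd, rng, rel_c=0.15, perturb_c=True, perturb_aph=True):
    """One perturbation draw (uniform distributions are an assumption):
    C scaled by U(-15%, +15%); Aph shifted by U(-σSD, +σSD)."""
    Cp = C * (1.0 + rng.uniform(-rel_c, rel_c, C.shape)) if perturb_c else C.copy()
    Ap = Aph + rng.uniform(-1.0, 1.0, Aph.shape) * sd if perturb_aph else Aph.copy()
    return Cp, Ap

def median_over_perturbations(C, Aph, sd, n_iter=300, seed=0, **kw):
    """Median condition number and median derived a*(λ) over n_iter draws."""
    rng = np.random.default_rng(seed)
    conds, spectra = [], []
    for _ in range(n_iter):
        Cp, Ap = perturb_inputs(C, Aph, sd, rng, **kw)
        conds.append(np.linalg.cond(Cp))
        spectra.append(np.linalg.pinv(Cp) @ Ap)  # first inversion (SVD)
    return float(np.median(conds)), np.median(np.stack(spectra), axis=0)
```

The three cases correspond to calling the function with `perturb_aph=False`, with `perturb_c=False`, or with the defaults. `sd` may be a (l,) vector or an (n, l) array of per-sample σSD(λ).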

Normalization of a ph (λ) by Pigment Package Effect
The pigment package effect index Q*a(λ) can be calculated as the ratio of the measured aph(λ) to the absorption coefficient of the same pigments if they were dispersed in solution [73,76]. Moisan et al. [27,28,34] also sought to partially account for the package effect. Here, the observed aph(λ) was normalized by the package effect index at 675 nm:

$$\hat{a}_{ph}(\lambda) = a_{ph}(\lambda) \cdot \frac{0.033\, c_{TChl\text{-}a}}{a_{ph}(675)}$$

where cTChl-a is the TChl-a concentration (in mg m⁻³), 0.033 (in m² mg⁻¹) is the "unpackaged" Chl-a-specific absorption coefficient at 675 nm measured by Bricaud et al. [73], and the fraction is the inverse of Q*a(675). Note that chlorophyll-a, divinyl-chlorophyll-a, chlorophyll-b, divinyl-chlorophyll-b and Chl-c1/2 all absorb light at 675 nm, e.g., [12,73]. For simplicity, we assume that Q*a(675) is determined by TChl-a alone. This is because TChl-a contributes most to aph(675), and because the assumption weakens the dependence of the Q*a(675) calculation on pigment data. In theory, Q*a(675) ranges from 0 (fully "packaged") to 1 ("unpackaged"). Because of uncertainties, samples with a calculated Q*a(675) greater than 1 are unavoidable in practice; they were included in the calculation of âph(λ).
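The normalization amounts to one ratio per sample. In the sketch below, aph(675) is linearly interpolated from the spectrum; this is an assumption, because the AC-S wavelength grid does not necessarily contain 675 nm exactly. The function names are hypothetical.

```python
import numpy as np

A_STAR_CHLA_675 = 0.033  # "unpackaged" Chl-a-specific absorption at 675 nm [m^2 mg^-1] (Bricaud et al.)

def package_index_675(aph_675, tchla):
    """Q*a(675) = aph(675) / (0.033 * TChl-a); 1 = unpackaged, < 1 = packaged."""
    return aph_675 / (A_STAR_CHLA_675 * tchla)

def normalize_by_package_effect(wl, aph, tchla):
    """âph(λ) = aph(λ) / Q*a(675), with aph(675) linearly interpolated (assumption)."""
    aph_675 = np.interp(675.0, wl, aph)
    return aph / package_index_675(aph_675, tchla)
```

For example, a sample with aph(675) = 0.0165 m⁻¹ and TChl-a = 1 mg m⁻³ has Q*a(675) = 0.0165 / 0.033 = 0.5, so its whole spectrum is doubled.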
âph(λ) was subsequently used to retrieve pigment concentrations via the Gaussian decomposition, SVD-NNLS and NNLS-NNLS methods, and the results were compared with those obtained using aph(λ). Unless otherwise stated, the results in what follows are based on aph(λ).

Statistics
For the development of the pigment retrieval models, all match-up points were used as training data, allowing the models to best account for the biological variation in the data set. Statistics for applying the model to the training data include the slope and intercept of the Model-1 bisquare robust linear regression, the coefficient of determination (R²), the mean absolute error (MAE) and the median absolute percentage error (MPE). MAE and MPE are defined as:

$$\mathrm{MAE}_j = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{c}_{i,j} - c_{i,j} \right|$$

$$\mathrm{MPE}_j = \operatorname{median}_{i=1,\dots,n} \left( \frac{\left| \hat{c}_{i,j} - c_{i,j} \right|}{c_{i,j}} \right) \times 100\%$$

where n is the number of samples, ci,j is the concentration of the jth pigment in the ith sample measured by HPLC, and ĉi,j is the estimated pigment concentration. Because phytoplankton pigment concentrations are approximately log-normally distributed in the ocean, the slope, intercept and R² were computed in log10 space. MAE was additionally calculated using log10(ci,j) and log10(ĉi,j). Unless otherwise stated, MAE values in what follows are calculated from linear-scale concentrations.
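The error metrics can be written in a few lines of NumPy. This is a sketch with hypothetical function names. For the log10-space regression it substitutes an ordinary least squares fit (`np.polyfit`) for the bisquare-robust Model-1 regression used in the study, and it computes R² as the squared Pearson correlation of the log10 values, which equals the OLS coefficient of determination.

```python
import numpy as np

def mae(c_obs, c_est, log10=False):
    """Mean absolute error; optionally on log10-transformed concentrations."""
    c_obs, c_est = np.asarray(c_obs, float), np.asarray(c_est, float)
    if log10:
        c_obs, c_est = np.log10(c_obs), np.log10(c_est)
    return float(np.mean(np.abs(c_est - c_obs)))

def mpe(c_obs, c_est):
    """Median absolute percentage error in %."""
    c_obs, c_est = np.asarray(c_obs, float), np.asarray(c_est, float)
    return float(100.0 * np.median(np.abs(c_est - c_obs) / c_obs))

def log10_fit(c_obs, c_est):
    """Slope, intercept and R^2 in log10 space. NOTE: ordinary least squares,
    a stand-in for the bisquare-robust regression used in the study."""
    x, y = np.log10(c_obs), np.log10(c_est)
    slope, intercept = np.polyfit(x, y, 1)
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return float(slope), float(intercept), float(r2)
```

For example, for observations [1, 2, 4] mg m⁻³ and estimates [1.1, 1.8, 4.4] mg m⁻³, every relative error is 10%. MPE is therefore 10%, while MAE = (0.1 + 0.2 + 0.4)/3 ≈ 0.23 mg m⁻³ is dominated by the largest concentration.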
For model evaluation, leave-one-out cross-validation was performed (MATLAB function crossvalind) to estimate the likely performance of each model on out-of-sample data. The pigment retrieval data set with N data points was split into two partitions: a testing set with one data point and a training set with all the other data points. Statistics were calculated iteratively N times, using a different data point as the testing set each time. The model prediction errors for out-of-sample data were defined as the average of the statistics over the N sets (the mean for MAE and the median for MPE). Cross-validation is an effective way of estimating model test errors when the amount of available data is relatively small [18,77]. Compared with a random train/test split and k-fold cross-validation, leave-one-out cross-validation has an advantage: the training set closely resembles the whole data set, having only one data point fewer. This avoids the bias in the estimated test errors that arises from training on fewer data than are available.
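Leave-one-out cross-validation for the matrix inversion technique can be sketched with a plain loop. For each held-out sample, a*(λ) is derived from the remaining N − 1 samples by pseudoinverse (Equation (4)), and the held-out concentrations are predicted by NNLS (Equation (5)). Per-pigment errors are then aggregated as described above, with the mean for MAE and the median for MPE. Excluding zero HPLC values from the percentage error is an assumption of this sketch, and the function names are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

def loocv_svd_nnls(C, Aph):
    """Leave-one-out predictions: train a*(λ) on N-1 samples, predict the held-out one."""
    n = C.shape[0]
    C_pred = np.empty(C.shape, dtype=float)
    for i in range(n):
        train = np.arange(n) != i
        A = np.linalg.pinv(C[train]) @ Aph[train]  # first inversion (Eq. 4), SVD
        C_pred[i], _ = nnls(A.T, Aph[i])           # second inversion (Eq. 5), NNLS
    return C_pred

def loocv_errors(C, C_pred):
    """Per-pigment MAE (mean over held-out points) and MPE in % (median)."""
    abs_err = np.abs(C_pred - C)
    # Assumption: zero HPLC values (e.g., below detection) are excluded from MPE
    rel = np.where(C > 0, abs_err / np.where(C > 0, C, 1.0), np.nan)
    return abs_err.mean(axis=0), 100.0 * np.nanmedian(rel, axis=0)
```

Combined with the perturbation loop, this cross-validation would be repeated for each perturbed input, and the median of the resulting statistics would be reported.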
For clarity, statistics (errors) obtained from running the trained model back on the training data are denoted as "training statistics (errors)", while those obtained when applying the trained model to the test data are called "test (or estimation or prediction) statistics (errors)".
The composition and data range of pigments (minimum, maximum, mean and standard deviation) are shown in Table 1. TChl-a concentration spans the range 0.06-3.87 mg m−3. Only TChl-a, TChl-c, Fuco, Hex and PSC have a mean concentration greater than 0.2 mg m−3. Among the remaining pigments, those with mean concentrations greater than 0.05 mg m−3 are Chl-c1/2, Diadino, TChl-b and PPC. Large standard deviations were observed within individual pigments, with ratios of standard deviation to mean in the range 0.58-4.35. The correlation between the concentrations of phytoplankton pigments was quantified by Spearman's rank correlation coefficient (Spearman's ρ) (Figure 4b). A high level of correlation (e.g., Spearman's ρ > 0.7 or < −0.7) is a concern because it can cause ill-conditioning of Equation (4), reducing the accuracy of pigment estimation by the matrix inversion technique.
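Screening for strongly correlated pigment pairs can be sketched as follows. This is a Python/NumPy sketch in which Spearman's ρ is computed as the Pearson correlation of average ranks (equivalent to what `scipy.stats.spearmanr` reports); the helper names and the use of the 0.7 threshold as a hard cut-off are illustrative assumptions.

```python
import numpy as np

def average_ranks(x):
    """Ranks starting at 1, with tied values assigned their mean rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1                                  # extend the run of tied values
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman_matrix(conc):
    """Spearman's rho between the columns (pigments) of conc (samples x pigments)."""
    conc = np.asarray(conc, dtype=float)
    ranked = np.column_stack([average_ranks(conc[:, j]) for j in range(conc.shape[1])])
    return np.corrcoef(ranked, rowvar=False)

def collinear_pairs(rho, names, threshold=0.7):
    """Pigment pairs with |rho| above the threshold (candidates for ill-conditioning)."""
    pairs = []
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            if abs(rho[a, b]) > threshold:
                pairs.append((names[a], names[b], rho[a, b]))
    return pairs
```

Pairs flagged this way are the ones most likely to inflate the condition number of matrix C if both pigments are inverted for together.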

Gaussian Decomposition
Strong correlations were found between the Gaussian function amplitudes and the corresponding pigment concentrations (R² > 0.5) (Table 2 (a)). Among all the Gaussian functions representing TChl-a absorption, the amplitude at 434 nm has the strongest relationship to TChl-a concentration (R² = 0.87, MAE = 0.21 mg m−3), closely followed by that at 675 nm. The amplitude a_gaus(638) is much better correlated with Chl-c1/2 than a_gaus(584), while a_gaus(660) provides a slightly better correlation with TChl-b than a_gaus(470). The R² values below 0.6 for TChl-b and PPC are likely due to the smaller dynamic range of these two pigments compared with TChl-a, Chl-c1/2 and PSC (Table 1). Considering the relationships for PSC and PPC as well as the strongest correlations for TChl-a, TChl-b and Chl-c1/2, the MPE ranges from 20.7% to 33.4%. The training errors for the five pigments increase in the order TChl-a, TChl-b, PPC, Chl-c1/2 and PSC. The strongest correlations for all five pigments are shown in Figure 5a,c,e,g,i. For comparison, TChl-a is also plotted against a_gaus(675) in Figure 5k.
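To make the notion of a Gaussian amplitude a_gaus(λ0) concrete, the sketch below fits non-negative amplitudes of fixed-centre, fixed-width Gaussians to an a_ph(λ) spectrum with `scipy.optimize.nnls`. It is a simplified illustration, not the Chase et al. [22] implementation: only the peak positions named in this section are used, and the common width `SIGMA_NM` is a hypothetical value chosen for the example.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical Gaussian set: centres are the peak positions named in the text;
# the common width SIGMA_NM is an illustrative assumption, not a published value.
CENTERS_NM = np.array([434.0, 470.0, 492.0, 523.0, 584.0, 638.0, 660.0, 675.0])
SIGMA_NM = 12.0

def gaussian_basis(wavelengths, centers=CENTERS_NM, sigma=SIGMA_NM):
    """Matrix (n_wavelengths x n_gaussians) of unit-amplitude Gaussian shapes."""
    wl = np.asarray(wavelengths, dtype=float)[:, None]
    return np.exp(-0.5 * ((wl - centers[None, :]) / sigma) ** 2)

def gaussian_amplitudes(wavelengths, a_ph, centers=CENTERS_NM, sigma=SIGMA_NM):
    """Non-negative amplitudes a_gaus(lambda0) that best reproduce a_ph(lambda)."""
    basis = gaussian_basis(wavelengths, centers, sigma)
    amplitudes, _residual_norm = nnls(basis, np.asarray(a_ph, dtype=float))
    return dict(zip(centers, amplitudes))
```

Each returned amplitude is then the quantity regressed against the HPLC concentration of the pigment assigned to that peak.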
Accordingly, a_gaus(434), a_gaus(660), a_gaus(638), a_gaus(523) and a_gaus(492) were used to predict the concentrations of TChl-a, TChl-b, Chl-c1/2, PSC and PPC, respectively. Statistics based on leave-one-out cross-validation (Table 2 (a)) show that, overall, all five pigments were reasonably well retrieved (MPE 20.8-34.0%). TChl-a and TChl-b have the lowest and second-lowest prediction errors, respectively, while PSC is most poorly estimated.
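The amplitude-to-concentration step can be sketched as a power-law fit in log10 space. Equation (3) is not reproduced in this excerpt; the sketch assumes it has the form pigment = A · a_gaus^B, consistent with the text's reference to a regression coefficient B that approaches one. The fit uses ordinary least squares, and the helper names are hypothetical.

```python
import numpy as np

def fit_power_law(a_gaus, conc):
    """Fit conc = A * a_gaus**B by least squares in log10 space.

    Returns (A, B). The functional form is assumed, and all inputs must be > 0.
    """
    B, log10_A = np.polyfit(np.log10(a_gaus), np.log10(conc), 1)
    return 10.0 ** log10_A, B

def predict_power_law(a_gaus, A, B):
    """Pigment concentration predicted from a Gaussian amplitude."""
    return A * np.asarray(a_gaus, dtype=float) ** B
```

Plugged into the leave-one-out loop, `fit_power_law` would be refitted N times, each time without the held-out match-up.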
When the package-effect-normalized spectrum â_ph(λ) was used, the performance of the Gaussian decomposition improved significantly. The relationships between the Gaussian function amplitudes and pigment concentrations differed strongly before and after applying the package effect normalization according to Equation (7) (Figure 5, Table 2 (b)). All statistical parameters for the eleven relationships between Gaussian absorption and pigment concentrations improved (Table 2 (b)). The R² for the a_gaus(434)-TChl-a correlation is 0.96 (0.87 before the normalization), and the regression coefficient B (Equation (3)) is close to one. Similarly, the R² for the a_gaus(638)-Chl-c1/2 correlation is 0.91 (0.81 before the normalization), and B is also close to one. The relationships for PSC and PPC also improved, with increased R² and decreased MAE and MPE for both pigments, and with B shifting much closer to one for PPC. In contrast, the relationship for TChl-b is least affected by the package effect normalization of a_ph(λ): R² still increased, but MAE and MPE decreased only slightly, and B deviated considerably from unity (0.35-0.60 for both 470 nm and 660 nm regardless of the normalization). Cross-validation results (Table 2 (b)) further confirm the improved performance of Gaussian decomposition in estimating TChl-b, Chl-c1/2, PPC and PSC after taking the package effect into account (MPE 29.3-34.0% before the normalization and 20.5-27.4% afterwards). The pigment with the lowest estimation accuracy is now Chl-c1/2, although its accuracy still improved following normalization. TChl-b retrievals also have slightly lower MAE and MPE than before the normalization.
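The normalization can be sketched as follows. Equation (7) lies outside this excerpt, so the functional form here is an assumption: Q*_a(675) is taken as the measured ("packaged") a_ph(675) divided by the absorption the same TChl-a would produce if unpackaged, TChl-a × a*_sol(675), and â_ph(λ) = a_ph(λ)/Q*_a(675). The coefficient `a_star_sol_675` is a required, hypothetical input, and reading a_ph(675) by linear interpolation is also an assumption.

```python
import numpy as np

def package_index_675(a_ph_675, tchla, a_star_sol_675):
    """Q*_a(675): packaged a_ph(675) over the unpackaged absorption tchla * a_star_sol_675.

    The form and the coefficient a_star_sol_675 are assumptions of this sketch.
    """
    return np.asarray(a_ph_675, dtype=float) / (np.asarray(tchla, dtype=float) * a_star_sol_675)

def normalize_package(a_ph, wavelengths, tchla, a_star_sol_675):
    """Return (a_hat_ph, Q) with a_hat_ph(lambda) = a_ph(lambda) / Q*_a(675) for one spectrum."""
    wl = np.asarray(wavelengths, dtype=float)
    a_ph = np.asarray(a_ph, dtype=float)
    a_ph_675 = np.interp(675.0, wl, a_ph)       # a_ph at 675 nm (linear interpolation)
    q = package_index_675(a_ph_675, tchla, a_star_sol_675)
    return a_ph / q, q
```

Under this assumed form, the normalized value at 675 nm equals TChl-a × a*_sol(675) by construction, which is consistent with the observation that almost all normalized a_gaus(675)-TChl-a points fall on the regression line.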

The Number of Pigment Types to Be Estimated
To apply the matrix inversion technique to estimate phytoplankton pigments, a key issue is to determine the number of pigments m included in matrix C in Equation (4). Table 3 summarizes the final selections of m that fulfilled the criteria described in Section 2.2.2, together with the corresponding n_cond and SI for all cases of the matrix inversion technique. The values of n_cond and the maximum SI did not change significantly after data perturbations were introduced, regardless of whether the package effect normalization was applied.
As shown in Figure 6, the minimum values of n_cond over all pigment combinations increase with increasing m, indicating that the more pigment types are estimated, the more sensitive the matrix inversion is to input errors. The largest m for which the minimum n_cond remains below the threshold of 60 is nine for combinations of the Fram-20 pigments and six for combinations of the Bricaud-12 pigments. When composed of the Gauss-5 pigments, matrix C has an n_cond of 47.8.
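The n_cond screening can be sketched as an exhaustive search over pigment subsets. This Python/NumPy sketch uses the definition given later in the text (n_cond is the ratio of the largest to the smallest singular value, which is what `np.linalg.cond` returns by default) and the threshold of 60; the helper names are hypothetical. Because adding a column to a subset cannot lower its condition number, the minimum over m-subsets never decreases as m grows, so the search can stop at the first failing m. For the Fram-20 pigments with m = 9 this means checking 167,960 subsets, which is still tractable.

```python
import itertools
import numpy as np

def min_condition_number(C, m):
    """Smallest 2-norm condition number over all m-pigment subsets of C.

    C: samples x pigments concentration matrix. Returns (n_cond, column indices).
    """
    best = (np.inf, None)
    for cols in itertools.combinations(range(C.shape[1]), m):
        n_cond = np.linalg.cond(C[:, list(cols)])   # largest / smallest singular value
        if n_cond < best[0]:
            best = (n_cond, cols)
    return best

def largest_admissible_m(C, threshold=60.0):
    """Largest m whose best subset still has n_cond below the threshold."""
    m_ok = 0
    for m in range(1, C.shape[1] + 1):
        if min_condition_number(C, m)[0] < threshold:
            m_ok = m
        else:
            break                                    # minimum n_cond is non-decreasing in m
    return m_ok
```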
Among the pigment combinations fulfilling the n_cond criterion, when a*(λ) is derived by SVD, the SI criterion allowed m to reach nine for the Fram-20 pigments and was also satisfied by the Gauss-5 pigments. However, for the NNLS-NNLS method, m decreased to six for the Fram-20 pigments when package effect normalization was not performed, because the NNLS algorithm set some of the derived a*(λ) in the pigment combinations with m > 6 to zero to avoid negative values [30]. When the package effect normalization was applied, m increased to nine, i.e., combinations of up to nine pigments became valid for SI calculation, possibly because this normalization increases the differences among the derived a*(λ). However, the maximum SI value exceeded 0.9 for the Gauss-5 pigments; therefore, the NNLS-NNLS-5 method based on â_ph(λ) was not considered for pigment estimation. For the Bricaud-12 pigments, the choice between SVD and NNLS influences the derived specific absorption of the missing pigments, as indicated by the different SI values. In this case, m reaches four. The specific pigments to be inverted for in all cases are shown in Tables 4 and 5 and Table A1.
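A possible implementation of the SI check is sketched below. SI is defined in Section 2.2.2, which is not part of this excerpt; the sketch assumes SI is the cosine similarity between pairs of derived a*(λ) spectra and compares the largest off-diagonal value with 0.9. An all-zero spectrum, such as one returned by NNLS, makes the cosine undefined, which mirrors why such combinations are not valid for SI calculation.

```python
import numpy as np

def similarity_index(a_star):
    """Pairwise cosine similarity between pigment-specific absorption spectra.

    a_star: m x n_wavelengths array (one row per pigment). Treating SI as the
    cosine of the angle between spectra is an assumption of this sketch.
    """
    a_star = np.asarray(a_star, dtype=float)
    norms = np.linalg.norm(a_star, axis=1)
    if np.any(norms == 0):
        raise ValueError("zero spectrum: SI undefined (e.g. NNLS returned all zeros)")
    unit = a_star / norms[:, None]
    return unit @ unit.T

def max_off_diagonal_si(a_star):
    """Largest SI between two different pigments (compare with 0.9)."""
    si = similarity_index(a_star)
    mask = ~np.eye(len(si), dtype=bool)
    return si[mask].max()
```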

SVD-NNLS
The pigment-specific absorption coefficients obtained from SVD-NNLS-9, SVD-NNLS-5 and Bricaud-SVD-NNLS-4 without input data perturbations are displayed in Figure 7. Each SVD-derived specific spectrum varies smoothly across the full spectral range and differs sufficiently from the others (SI < 0.9). Negative coefficients are permissible because these spectra are mathematical constructs. Pigment concentrations were then estimated via NNLS and compared with HPLC pigment concentrations. Training statistics (Table 4) and the scatter plots (Figure 8) show that the estimated and measured pigment concentrations were correlated (R² > 0.30) except for Pheo-a (SVD-NNLS-9), TChl-b (SVD-NNLS-5) and Hex (Bricaud-SVD-NNLS-4). For SVD-NNLS-9, package effect normalization reduced the training errors for all pigments except Peri and Pheo-a (Table 4 (a)). In contrast, this normalization increased training errors when using Bricaud-SVD-NNLS-4 (Table 4 (c)). For SVD-NNLS-5, all Gauss-5 pigments except PPC had smaller training errors with this normalization (Table 4 (b)). To obtain robust pigment estimation statistics and evaluate the influence of the package effect normalization on them, data perturbations and cross-validation were combined to generate the test errors (Table 5). The SVD-NNLS-9 method (Table 5 (a)) exhibited stable prediction accuracy (MPE 16-65%) for six pigments, i.e., TChl-a, TChl-b, Chl-c1/2, Diadino, Fuco and Hex, for which MPE varied by less than 10% and MAE by less than a factor of 1.2 across the input data perturbations. TChl-a had the lowest prediction error (MAE 0.17-0.22 mg m−3, MPE 16-22%); Fuco, Hex and Chl-c1/2 shared the second-best estimation (MPE 35-45%); Diadino was least accurately estimated (MAE 0.07-0.08 mg m−3, MPE 61-65%), and TChl-b showed a slightly smaller MPE of 53-60% (MAE 0.03-0.04 mg m−3). Although included in the calculation, But, Peri and Pheo-a exhibited relatively inconsistent prediction errors across the three cases of data perturbations, likely because of their relatively low concentrations (Table 1) and infrequent occurrence in the data set. With the package effect normalization, however, all nine pigments except Pheo-a achieved stable prediction statistics (MPE 36-76%) across the different cases of data perturbations (MPE varied by less than 13% and MAE by less than a factor of 1.3, Table 5 (a)). The normalization yielded prediction errors comparable to those without normalization for TChl-b, Diadino and Hex, an 8% higher MPE for Chl-c1/2 and Fuco, and additional estimates for But (MPE 67-70%) and Peri (MPE 68-75%).
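The two SVD-NNLS steps can be sketched as follows. This is a minimal Python sketch, not the authors' MATLAB code, and it assumes Equation (4) takes the form A = C a*, with samples as rows of A (spectra) and C (concentrations). The specific spectra a* are obtained with the Moore-Penrose pseudo-inverse, which `np.linalg.pinv` computes via SVD without truncating small singular values, so a* may contain negative entries. Concentrations for a new spectrum are then obtained with `scipy.optimize.nnls`.

```python
import numpy as np
from scipy.optimize import nnls

def derive_specific_absorption_svd(C_train, A_train):
    """Least-squares a*(lambda) from A = C @ a_star via the SVD-based pseudo-inverse.

    C_train: samples x m pigment concentrations; A_train: samples x wavelengths.
    Returns a_star (m x wavelengths); entries may be negative.
    """
    return np.linalg.pinv(C_train) @ A_train

def retrieve_concentrations_nnls(a_star, a_ph):
    """Non-negative concentrations c minimising ||a_star.T @ c - a_ph||."""
    c, _residual_norm = nnls(a_star.T, np.asarray(a_ph, dtype=float))
    return c
```

In a leave-one-out run, `derive_specific_absorption_svd` would be called on the N-1 training match-ups and `retrieve_concentrations_nnls` on the held-out spectrum; perturbing C_train, A_train or both before each call reproduces the three perturbation cases in spirit.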

Feasibility of SVD-NNLS-9 for Multispectral a ph (λ)
The performance of the matrix inversion technique is sensitive to the number of wavebands and their locations on the a_ph(λ) spectrum [27,33]. To test the feasibility of this technique with multispectral a_ph(λ), SVD-NNLS-9 was performed with a_ph(λ) at ten MODIS bands, i.e., 412, 443, 469, 488, 531, 547, 555, 645, 667 and 678 nm. In this case, only four pigments, i.e., TChl-a, TChl-b, Chl-c1/2 and Hex, were retrieved with stable and acceptable estimation statistics (Table 6), both with and without package effect normalization. The estimation errors of TChl-a and TChl-b increased slightly, by 4 to 14%, with multispectral a_ph(λ), while for Chl-c1/2 and Hex the MPE values were approximately 30% and 20% higher, respectively, than those with hyperspectral a_ph(λ).
Table 6. Statistics of phytoplankton pigment retrieval using SVD-NNLS-9 with a_ph(λ) at ten MODIS bands based on leave-one-out cross-validation. MAE is in mg m−3 (values outside the parentheses were calculated with linear-scale values, those inside the parentheses with log10-scale values) and MPE in %. "Perturb 1, 2 and 3" represent the input data with perturbations of pigment concentrations solely, a_ph(λ) solely and both, respectively.
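Reducing hyperspectral a_ph(λ) to the ten MODIS bands listed above could be done as sketched below. The paper does not state how band values were extracted, so this sketch assumes linear interpolation at the band centres and ignores MODIS spectral response functions; the resulting samples x 10 matrix would replace the hyperspectral matrix in the SVD-NNLS-9 workflow.

```python
import numpy as np

MODIS_BANDS_NM = np.array([412, 443, 469, 488, 531, 547, 555, 645, 667, 678], dtype=float)

def to_multispectral(wavelengths, a_ph, bands=MODIS_BANDS_NM):
    """Linearly interpolate hyperspectral a_ph(lambda) to band centres.

    a_ph: 1-D (one spectrum) or 2-D (samples x wavelengths); returns samples x bands.
    Ignoring band spectral response functions is a simplifying assumption.
    """
    wl = np.asarray(wavelengths, dtype=float)
    spectra = np.atleast_2d(np.asarray(a_ph, dtype=float))
    return np.vstack([np.interp(bands, wl, row) for row in spectra])
```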

Discussion
The study of phytoplankton dynamics in relation to a changing climate requires access to high-resolution phytoplankton pigment information in space and time. Unlike HPLC analysis of discrete water samples, underway spectrophotometry can provide nearly continuous in situ records of spectral particulate absorption as well as phytoplankton absorption. The latter depends on the concentrations and composition of phytoplankton pigments. Therefore, algorithms that link hyperspectral a_ph(λ) obtained from underway spectrophotometry to the concentrations of various phytoplankton pigments provide pigment information at high spatial resolution (~300 m for one-minute bin-averaged spectra when the ship is moving at ~10 knots). This pigment information can support the evaluation of ocean color algorithms and coupled hydrodynamic-biological modelling. With the advancement of hyperspectral radiometers, these algorithms have great potential to be incorporated into the inversion of satellite ocean color measurements for the synoptic detection of phytoplankton pigments and thus the monitoring of phytoplankton spatial and temporal dynamics.
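The ~300 m figure follows directly from the ship speed, since one knot is one nautical mile (1852 m) per hour:

$$v = 10\ \text{kn} = 10 \times 1852\ \text{m h}^{-1} = 18\,520\ \text{m h}^{-1}, \qquad \Delta x_{1\,\text{min}} = \frac{18\,520\ \text{m}}{60} \approx 309\ \text{m}$$

so each one-minute bin-averaged spectrum integrates roughly 300 m of cruise track.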
Gaussian decomposition is an effective pigment retrieval algorithm that dates back to the 1990s [11] and was recently modified for broader applications, e.g., [22][23][24][25]. At the time of writing, it is the only method that has been applied to underway spectrophotometry data and has successfully retrieved the concentrations of various pigments [22]. Compared with Gaussian decomposition, which does not resolve individual carotenoids, and other methods from previous studies, e.g., [14,15,19,26], which are incapable of resolving PPC and PSC, the matrix inversion technique can estimate PPC and PSC as well as various marker pigments indicative of phytoplankton composition [27,28,34]. Both Gaussian decomposition and the matrix inversion technique rely on the physical links between phytoplankton pigments and their distinct light absorption properties, whereas the outputs of other methods (e.g., principal component analysis) are often not physically interpretable.
To assess the utility of the two methods and improve their performance in our study area, we improved the Gaussian decomposition algorithm of Chase et al. [22] by considering the pigment package effect and revisited the matrix inversion technique of Moisan et al. [27] by taking matrix conditioning into account. In the following discussion, we compare the results of this study with those from the literature, highlight the improvements we have made, and show applications to underway absorption data.

Gaussian Decomposition
Our a_gaus(λ0)-pigment data fall within the range of the global data set from the Tara expeditions [22] and largely overlap with the global data (Figure 5, Table 7), except for TChl-b (Figure 5i), likely owing to the lack of low TChl-b concentrations in this study. Furthermore, our a_gaus(λ0)-pigment data show less scatter (Figure 5a,c,e,g,i,k), and the estimation errors for all Gauss-5 pigments are lower than those of the Tara data set (MPE 7.5-20% lower). This is probably because of greater variability in the pigment package effect in the global data set than in the Fram Strait, as indicated by the greater quartile coefficient of dispersion of the pigment-specific absorption coefficient (a*_i(λ0), defined as the ratio of a_gaus,i(λ0) to the corresponding pigment concentration) for the global data set (Table 7). It could also be due to errors arising from the duration of the Tara expeditions (2.5 years) and the resulting increased potential for methodological variability, whereas only four people were involved in discrete sampling in the current study.
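The dispersion measure used for Table 7 can be sketched as follows, using the standard definition of the quartile coefficient of dispersion, (Q3 − Q1)/(Q3 + Q1), together with a*_i(λ0) as defined in the text. Quartiles come from `np.percentile` with its default linear interpolation; other quartile conventions give slightly different values on small samples.

```python
import numpy as np

def specific_absorption(a_gaus, conc):
    """a*_i(lambda0) = a_gaus,i(lambda0) / [pigment i]."""
    return np.asarray(a_gaus, dtype=float) / np.asarray(conc, dtype=float)

def quartile_coefficient_of_dispersion(x):
    """(Q3 - Q1) / (Q3 + Q1): a scale-free spread measure that is robust to outliers."""
    q1, q3 = np.percentile(np.asarray(x, dtype=float), [25, 75])
    return (q3 - q1) / (q3 + q1)
```

Because the measure is scale-free, it allows the spread of a*_i(λ0) to be compared between data sets whose absolute absorption levels differ.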
After applying the package effect normalization, as expected, almost all the a_gaus(675)-TChl-a data points fall on the regression line (Figure 5l, Table 2), i.e., no additional package effect was found at 675 nm, suggesting that Q*_a(675) can successfully be calculated from the TChl-a concentration alone. Both the R² for the a_gaus(434)-TChl-a correlation and the regression coefficient B (Equation (3)) are close to one, indicating that variations in the magnitude of Q*_a(675) account for a large proportion of the variations in the package effect of TChl-a at 434 nm. Similar improvements were observed for Chl-c1/2, PSC, PPC and, to a lesser extent, TChl-b, which implies covariation between the package effect of these pigments at the corresponding wavelengths and that of TChl-a. This is further supported by the reduced dispersion of a*_i(λ0) for all pigments except TChl-b (Table 7) and by the strong power-law relationships (R² 0.39-0.87) between a*_TChl-a(434), a*_TChl-b(660), a*_Chl-c1/2(638), a*_PPC(492) and a*_PSC(523), respectively, and a*_TChl-a(675). The package effect of a specific pigment is a function of the concentration of this pigment as well as phytoplankton cell size [76]. The strong correlations between TChl-a concentration and the concentrations of Chl-c1/2 (Spearman's ρ = 0.9), PSC (Spearman's ρ = 1.0), PPC (Spearman's ρ = 0.8) and TChl-b (Spearman's ρ = 0.7) (Figure 4b) explain the covariation of the package effect among these pigments.
In essence, the Gaussian decomposition method takes advantage of the correlation between absorption and pigment concentration. The shape and magnitude of a_ph(λ) are controlled by phytoplankton pigment composition and the level of the pigment package effect. In Gaussian decomposition, the absorption of individual pigments is separated and related to the corresponding pigment concentrations. The effects of pigment composition and packaging are therefore separated, and the observed variations in the absorption-pigment concentration correlations are mainly attributed to the package effect. Overlapping absorption by more than one pigment that this method fails to separate also contributes to these variations. Though simplified, this is the first time the package effect has been taken into account in the Gaussian decomposition of a_ph(λ), and it improved the pigment estimation accuracy. It provides new insight into increasing pigment retrieval accuracy via the combined use of concurrent a_ph(λ) and TChl-a concentration, obtained either from field instrument measurements/estimates or from satellite data. To test the applicability of this normalization to underway spectrophotometry data when HPLC TChl-a data are not available, TChl-a was first calculated from the AC-S-derived a_ph(675) via the cruise-specific power functions (see Section 3.1). In this case, the improvement was also found with the normalization (Table 8). However, when TChl-a was derived either from a_p(440) using the power law relationship described in Section 3.1 or the global relationship SVD-NNLS-9 and SVD-NNLS-5 (Table 4 (a,b)), most of the pigments showed improved MAE and MPE with this normalization. However, the cross-validation results calculated by taking the data perturbations into account (Table 5 (a,b)) showed improved, worsened or unchanged pigment prediction errors, without a consistent pattern, after the application of the normalization. This inconsistency is probably because neither Moisan et al. [27] nor the training statistics in this study took into account the sensitivity of matrix inversion to input errors. Moisan et al. [27] also pointed out that the inverse model solutions are sensitive to the level of errors in the measured a_ph(λ). The inconsistency between the training errors (Table 4) and test errors (Table 5) confirms this sensitivity and shows that, in this case, training statistics are not an appropriate indicator of pigment estimation errors. The results from cross-validation with data perturbations, on the other hand, include the training statistics as a special case and effectively reduce the sensitivity of the SVD-NNLS method, providing robust and stable pigment estimation statistics. Nevertheless, when package effect normalization was performed on SVD-NNLS-9, two more pigments with lower concentrations also obtained robust statistics, possibly due to the enhanced differences between the a*(λ) of different pigments.
The sensitivity of the matrix inversion technique stems from the ill-conditioning of the linear systems (Equations (4) and (5)), which originates from the multicollinearity of phytoplankton pigments. In natural water samples, the multicollinearity of phytoplankton pigments (Figure 4b) is physiologically unavoidable; in other words, the ill-conditioning of Equations (4) and (5) cannot be completely avoided. To reduce the degree of ill-conditioning, the choice of the pigments included in matrix C (Equation (4)) is crucial. Our proposed n_cond and SI criteria for determining which pigments should be inverted for are empirically based on an understanding of the characteristics of the input data and on many trials, and their thresholds may change from case to case. The sensitivity analysis based on data perturbations (Table 3) shows relatively stable values of n_cond and SI. This is consistent with the singular value perturbation theorems [79], i.e., the singular values of a matrix are very stable with respect to changes in the elements of the matrix, because n_cond is by definition the ratio of the largest to the smallest singular value of a matrix. In contrast, though aware of the multicollinearity issue, Moisan et al. [27] did not pre-select the pigments to be determined. Instead, all types of pigments available from HPLC were included in the inverse modelling; this approach was also tested in this study (results not shown) and can reduce pigment retrieval accuracy.
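The stability of singular values can be checked numerically. Weyl's inequality, one of the classical singular value perturbation results, states that each singular value of C + E differs from the corresponding singular value of C by at most the spectral norm of E. The Python/NumPy sketch below compares the observed shifts with that bound; the multiplicative noise used in the example is a hypothetical perturbation model, not the one used in this study. As long as the bound is small relative to the smallest singular value, n_cond can change only modestly.

```python
import numpy as np

def weyl_check(C, E):
    """Largest singular-value shift |sigma_k(C+E) - sigma_k(C)| and Weyl's bound ||E||_2."""
    s = np.linalg.svd(C, compute_uv=False)          # sorted in descending order
    s_pert = np.linalg.svd(C + E, compute_uv=False)
    max_shift = np.max(np.abs(s_pert - s))
    bound = np.linalg.norm(E, 2)                    # spectral norm = largest singular value of E
    return max_shift, bound
```

For example, with `C` a 30 x 5 concentration-like matrix and `E = 0.05 * C * rng.standard_normal(C.shape)` (about 5% multiplicative noise), the returned shift never exceeds the returned bound.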

Applications
With the particulate absorption data collected by the underway AC-S flow-through system, phytoplankton pigment concentrations along the cruise tracks were retrieved. Figure 9 shows an example of the underway Fuco and Hex concentrations estimated by SVD-NNLS-9. Fuco and Hex are dominant in diatoms and prymnesiophytes, respectively, which are the two common phytoplankton groups in the Fram Strait, e.g., [56]. However, we have to bear in mind that differentiating phytoplankton groups by marker pigments can be problematic, as pigment concentrations vary substantially with physiological responses to environmental conditions. More importantly, a given marker pigment can be present in several phytoplankton groups (e.g., Fuco in diatoms and prymnesiophytes; more details in Wright and Jeffrey [80]). Data from 2015 indicate the concurrent prevalence of diatoms and prymnesiophytes. Overall, Fuco concentrations are higher than Hex concentrations, which possibly reflects an overall higher biomass of diatoms (Figure 9a). In 2016, while both groups remained prevalent, Hex concentrations exceeded Fuco concentrations in the western part of our study area (2 July 2016) (Figure 9b). In contrast, 2017 showed overall higher concentrations of Hex than of Fuco (Figure 9c). These results are consistent with previous observations of a shift in phytoplankton assemblages from diatoms to Phaeocystis spp. (a prymnesiophyte) during the summer months in the Fram Strait [56,58,59]. In the future, access to similar high-resolution phytoplankton pigment data verified by microscopic and flow cytometric techniques could support studies of biogeophysical coupling in the Fram Strait and further enhance our understanding of the responses of phytoplankton community composition and physiology to climate change.

Conclusions
We demonstrated the retrieval of phytoplankton pigment concentrations at high spatial resolution in the Fram Strait and its vicinity from underway hyperspectral a_ph(λ) (400-700 nm, ~3.5 nm wavelength resolution) by applying Gaussian decomposition [22] and the matrix inversion technique [27]. Gaussian decomposition enables robust predictions of the Gauss-5 pigments (MPE 21-34%). Improved retrieval accuracy was obtained by normalizing the a_ph(λ) spectra with the pigment package effect factor at 675 nm. For the matrix inversion technique, although SVD cannot guarantee non-negative pigment-specific absorption spectra, it generates more accurate pigment estimates than the NNLS-derived spectra or the measured spectra of pigments in solution. To minimize the effect of ill-conditioned matrices on pigment retrieval accuracy, we propose an innovative approach for selecting the pigments to be determined, and we combine data perturbations with leave-one-out cross-validation to generate robust pigment estimation statistics. Considering the overall pigment retrieval accuracy, SVD-NNLS-9 performed best among the three SVD-NNLS methods. The SVD-NNLS-9 method enables robust estimates of six pigments (MPE 16-65%), i.e., TChl-a, TChl-b, Chl-c1/2, Diadino, Fuco and Hex, and, with the application of the package effect normalization, less accurate estimates of two more pigments (MPE 67-76%), i.e., But and Peri. Gaussian decomposition outperforms SVD-NNLS-5 in retrieving TChl-b, Chl-c1/2, PPC and PSC, while both methods show similar capability in estimating TChl-a.
The matrix inversion technique has the advantage of retrieving the concentrations of several specific carotenoids, which is currently not accomplished by Gaussian decomposition, derivative analysis [13], partial least squares regression [14] or multiple linear regression [15]. However, its performance is sensitive to input errors when the input matrix is ill-conditioned to some extent. Therefore, a sensitivity analysis, such as the one based on data perturbations used in our study, is always needed when assessing the performance of the matrix inversion technique in retrieving phytoplankton pigments or pigment-related parameters. Future studies using methods such as principal component analysis and artificial neural networks may show promise for retrieving not only chlorophylls but also different types of carotenoids in our study area.
In addition to the number of pigments, the number of spectral bands used for pigment retrieval also significantly influences the performance of the matrix inversion technique. Compared with the results using hyperspectral a_ph(λ), the number of pigments that SVD-NNLS-9 could retrieve was reduced to four, i.e., TChl-a, TChl-b, Chl-c1/2 and Hex, with increased estimation errors, especially for Chl-c1/2 and Hex, when multispectral a_ph(λ) (at ten MODIS bands) was used. This suggests the advantage of using hyperspectral data for increasing the accuracy of phytoplankton pigment retrievals. It follows that a_ph(λ) inverted from hyperspectral remote sensing reflectance measured by in situ or satellite radiometry has greater potential for the application of Gaussian decomposition and the matrix inversion technique than multispectral radiometric measurements.
To apply Gaussian decomposition or the matrix inversion technique to a study area, prior knowledge of concurrent AC-S-derived a_ph(λ) and HPLC pigment concentrations in the region is necessary to derive either the regional a_gaus(λ0)-pigment concentration relationships or the regional pigment-specific absorption spectra. With this knowledge, both approaches can be applied to underway AC-S measurements at times when no HPLC data are available. Given that these proxy relationships may change in the future, it is imperative to continue collecting some HPLC data to verify that the derived relationships or coefficients remain valid.
The application of the two methods to our data obtained during three Fram Strait expeditions enables the derivation of pigment data sets along the cruise tracks. Future work could build upon these results by deriving phytoplankton functional types from marker pigments retrieved from hyperspectral phytoplankton absorption as well as hyperspectral remote sensing reflectance data. Such a high-resolution data set will strengthen the study of phytoplankton dynamics in response to environmental variables in the context of climate change.

Supplementary Materials:
The quality-controlled particulate absorption data from the underway AC-S flow-through system, the HPLC phytoplankton pigment data from discrete samples, and the estimated pigment concentrations along the cruise tracks mentioned in this paper are available on PANGAEA: https://doi.pangaea.de/10.1594/PANGAEA.894875. The MATLAB code for data processing is available online at https://github.com/phytooptics.

Author Contributions:
A.B. and Y.L. conceived and designed the experiments; Y.L. collected HPLC pigment data from the expeditions PS99.2 and PS107, and processed AC-S absorption data from all three expeditions; Y.L. led the data analysis and all coauthors assisted in it; E.B., A.C., A.B., H.X., X.Z. and R.R. helped with the interpretation of the data; E.B. and A.C. contributed the Gaussian decomposition software and its application to this study; Y.P. contributed to programming and data visualization; Y.L. drafted the manuscript and all coauthors provided substantial comments and suggestions to improve it.
for the discussions of SVD-related issues. Thanks to Jan Streffing and Nils Haëntjens for help with MATLAB programming. Thanks to Florian Riefstahl for help with data visualization. Thanks to three anonymous reviewers whose comments significantly improved the manuscript.

Table A1. Statistics of phytoplankton pigment retrieval using NNLS-NNLS based on leave-one-out cross-validation. MAE is in mg m−3 (values outside the parentheses were calculated with linear-scale values, those inside the parentheses with log10-scale values) and MPE in %. "Perturb 1, 2 and 3" represent the input data with perturbations of pigment concentrations solely, a_ph(λ) solely and both, respectively.

Figure 1. Cruise tracks for PS93.2 (July-August 2015), PS99.2 (June-July 2016) and PS107 (July-August 2017). Symbols denote locations where both AC-S and HPLC data were collected. Bathymetric grid data are extracted from the International Bathymetric Chart of the Arctic Ocean Version 3.0 [62]. A Lambert azimuthal equal-area projection was used for mapping.

Figure 2. Schematic overview of the steps of applying Gaussian decomposition for phytoplankton pigment retrieval.

Figure 3. Schematic overview of the steps of applying the matrix inversion technique for phytoplankton pigment retrieval.

Figure 4. (a) Variations of the AC-S-derived a_p(440) as a power function of TChl-a concentration; (b) Spearman's rank correlation coefficients between the concentrations of phytoplankton pigments in our data set (linear color bar scale).

Figure 5. Concentrations of phytoplankton pigments measured by HPLC versus the amplitudes of the corresponding Gaussian functions obtained from the Gaussian decomposition of both a_ph(λ) (a,c,e,g,i,k) and â_ph(λ) (b,d,f,h,j,l). The results of Chase et al. [22] are based on a_ph(λ).

Figure 6. Variations in the minimum values of the condition number (n_cond) of matrix C in Equation (4) with different pigment combinations (m pigment types to be estimated): (a) pigment data unperturbed; (b) pigment data perturbed.

Table 1. Names and abbreviations of phytoplankton pigments and pigment groups analyzed in this study, and the minimum, maximum, mean and standard deviation of the pigment concentrations (mg m−3).

Pigment/Pigment Group Abbreviation Minimum Maximum Mean Standard Deviation
normalized the measured a_ph(λ) by dividing it by Q*_a(675) and found an improved capability of the matrix inversion technique in retrieving pigment concentrations. To test the performance of both Gaussian decomposition and the matrix inversion technique with the normalization strategy, Q*_a(675) was calculated, and a_ph(λ)

Table 3. The m types of pigments to be estimated using the matrix inversion technique and the corresponding n_cond and maximum SI values.
(c) Both pigment and a_ph(λ) data perturbed
Columns: Method; Pigments; a_ph(λ)-based (n_cond, Maximum SI, m); â_ph(λ)-based (n_cond, Maximum SI, m)

Table 4. Statistics for the Model-1 linear regressions between SVD-NNLS retrieved and measured pigment concentrations (regression coefficients were calculated with 95% confidence bounds). MAE is in mg m−3 (values outside the parentheses were calculated with linear-scale values, those inside the parentheses with log10-scale values), MPE in %, and N is the number of data points for the regressions. The unperturbed training data set was used.
(a) SVD-NNLS-9

Table 5. Statistics of phytoplankton pigment retrieval using SVD-NNLS based on leave-one-out cross-validation. MAE is in mg m−3 (values outside the parentheses were calculated with linear-scale values, those inside the parentheses with log10-scale values) and MPE in %. "Perturb 1, 2 and 3" represent the input data with perturbations of pigment concentrations solely, a_ph(λ) solely and both, respectively.