Hybrid Inversion Algorithms for Retrieval of Absorption Subcomponents from Ocean Colour Remote Sensing Reﬂectance

: Semi-analytical algorithms (SAAs) invert spectral remote sensing reﬂectance (cid:0) R rs ( λ ) , sr − 1 (cid:1) to Inherent Optical Properties (IOPs) of an aquatic medium ( λ is the wavelength). Existing SAAs implement different methodologies with a range of spectral IOP models and inversion methods producing concentrations of non-water constituents. Absorption spectrum decomposition algorithms (ADAs) are a set of algorithms developed to partition a nw ( λ ) , m − 1 (i.e., the light absorption coefﬁcient without pure water absorption), into absorption subcomponents of phytoplankton (cid:16) a ph ( λ ) , m − 1 (cid:17) and coloured detrital matter (cid:16) a dg ( λ ) , m − 1 (cid:17) . Despite signiﬁcant developments in ADAs, their applicability to remote sensing applications is rarely studied. The present study formulates hybrid inversion approaches that combine SAAs and ADAs to derive absorption subcomponents from R rs ( λ ) and explores potential alternatives to operational SAAs. Using R rs ( λ ) and concurrent absorption subcomponents from four datasets covering a wide range of optical properties, three operational SAAs, i.e., Garver–Siegel–Maritorena (GSM), Quasi-Analytical Algorithm (QAA), Generalized Inherent Optical Property (GIOP) model are evaluated in deriving a nw ( λ ) from R rs ( λ ) . Among these three models, QAA and GIOP models derived a nw ( λ ) with lower errors. Among six distinctive ADAs tested in the study, the Generalized Stacked Constraints Model (GSCM) and Zhang’s model-derived absorption subcomponents achieved lower average spectral mean absolute percentage errors (MAPE) in the range of 8–38%. Four hybrid models, GIOP GSCM , GIOP Zhang, QAA GSCM and QAA Zhang , formulated using the SAAs and ADAs, are compared for their absorption subcomponent retrieval performance from R rs ( λ ) . GIOP GSCM and GIOP Zhang models derived absorption subcomponents have lower errors than GIOP and QAA. Potential uncertainties associated with datasets and dependency of algorithm performance on datasets were discussed.


Introduction
The colour of natural water bodies is dependent on the quantity and distribution of optically active substances (OAS) like phytoplankton, coloured dissolved organic matter (CDOM) and non-algal particulate (NAP) matter. The OAS absorb, transmit and scatter the incident solar irradiation, thereby influencing light availability in various parts of the ocean. The light absorption properties of OAS are crucial in studying primary productivity, biogeochemical cycles, phytoplankton community, carbon pools and cycling, sources of CDOM origin and distribution [1]. The bulk total spectral absorption coefficient, a(λ) and the total backscattering coefficient, b b (λ) are inherent optical properties (IOPs) and depend only on the available OAS in the water column and are unaffected by the variations in the incident solar radiation. In all natural waters, except with extremely turbidity, a(λ) is expressed as the sum of absorption by phytoplankton a ph (λ) , CDOM (a g (λ)), NAP (a d (λ)) and pure using two simulated datasets and two measured datasets covering a wide range of optical properties are used. Potential candidates that perform better than existing SAAs are investigated by assessing the applicability of ADAs in deriving absorption subcomponents from ( ).

Models and Methods
To achieve the above-mentioned objectives, six ADAs with different methodologies and three operational SAAs are used. Figure 1 shows the systemic methodology of the study. Figure 1. Semi-analytical Algorithms (SAAs) use R rs (λ) and provide bulk and subcomponent IOPs as outputs. Absorption decomposition algorithms (ADAs) derive absorption subcomponents from total or non-water absorption coefficient. In hybrid SAAADA models, a nw (λ) from an SAA is input to ADA to derive absorption subcomponents.
The first ADA is the non-linear optimization method by Ciotti and Bricaud (2006) [18] based on the inversion technique by Roesler and Perry (1995) [23]. In this method, the phytoplankton light absorption was modelled according to the spectral mixing model that uses normalized shapes for pico-and micro-plankton and a size parameter, [18]. An exponentially decreasing slope model is used to model ( ). The three parameters optimized in this model are , (443) and . To find optimal values of the three parameters that minimize the relative error between measured and modelled ( ), MATLAB's inbuilt routine of "fminsearch" is used. "fminsearch" uses Nelder-Mead simplex algorithm wherein the concept of a simplex is used. The algorithm takes series of steps to achieve the minimum of the objective function. At each step, the objective function is evaluated and the next step is taken to reduce the value of the objective function. The initial guess value of the variables should be chosen carefully based on the observed range Figure 1. Semi-analytical Algorithms (SAAs) use R rs (λ) and provide bulk and subcomponent IOPs as outputs. Absorption decomposition algorithms (ADAs) derive absorption subcomponents from total or non-water absorption coefficient. In hybrid SAA ADA models, a nw (λ) from an SAA is input to ADA to derive absorption subcomponents.
The first ADA is the non-linear optimization method by Ciotti and Bricaud (2006) [18] based on the inversion technique by Roesler and Perry (1995) [23]. In this method, the phytoplankton light absorption was modelled according to the spectral mixing model that uses normalized shapes for pico-and micro-plankton and a size parameter, S f [18]. An exponentially decreasing slope model is used to model a dg (λ). The three parameters optimized in this model are S f , a dg (443) and S dg . To find optimal values of the three parameters that minimize the relative error between measured and modelled a nw (λ), MATLAB's inbuilt routine of "fminsearch" is used. "fminsearch" uses Nelder-Mead simplex algorithm wherein the concept of a simplex is used. The algorithm takes series of steps to achieve the minimum of the objective function. At each step, the objective function is evaluated and the next step is taken to reduce the value of the objective function. The initial guess value of the variables should be chosen carefully based on the observed range of variability in parameters to avoid local minima entrapment. The initial guess values of the three variables are set to 0.5, 0.02 m −1 and 0.015 nm −1 . More information about the "fminsearch" algorithm can be found at [24]. Table 1. Absorption spectrum decomposition algorithms used in the study with outputs. "X" indicates the outputs provided by each algorithm.

Output Variable
Reference Zhang X --X Pico, micro and Nano size classes; Chlorophyll concentration [20] The SCM model partitions a nw (λ) into absorption subcomponents based on the stacked-constraints approach. SCM model uses several inequality constraints derived using an extensive, quality-verified set of field data covering clear ocean to turbid coastal waters from low to high latitudes. SCM requires an input of a nw at a minimum of six wavelengths of 412, 443, 490, 510, 555 and 670 nm. The model outputs an optimal and a range of feasible solutions of a ph (λ) and a dg (λ). GSCM model is based on the SCM model and is capable of separating a d (λ) and a g (λ) using weakly restrictive assumptions about the component absorption coefficients. The shapes used to parameterize a d (λ) and a g (λ) were obtained from Chesapeake Bay, USA. GSCM requires an input of a nw at a minimum of six wavelengths of 412, 443, 469, 490, 555, 670 nm. GSCM exhibited good performance in deriving absorption subcomponents from the Chesapeake Bay. Lin's model partitions a nw (λ) three times to obtain absorption subcomponents along with respective slopes for each derived subcomponent. The eigenvectors used in this method are based on second-order polynomial fit obtained with data from NASA bio-Optical Marine Algorithm Dataset (NOMAD) and field samples from the Northern South China Sea (NSCS). Lin's model was validated on an independent dataset derived from the Bermuda bio-Optics project.
The Optical Signature Inversion (OSI) method uses non-linear constrained least squares regression procedure to minimize the error between modelled and ac-9 measured a nw (λ). This model uses three normalized spectral phytoplankton curves [25] for modelling phytoplankton absorption and an exponential model to model NAP and CDOM absorption components. OSI method imposes constraints on CDOM exponential slope, phytoplankton absorption, CDOM and NAP absorption weights. Zhang's model partitions a nw (λ) in two steps. A kernel containing basis vectors of pico-, nano-and microplankton ( [26]) and a * dg is created. An exponentially decreasing curve with a slope S dg is used to model a dg (λ). The S dg value is randomly varied in the range [0.004, 0.02] nm −1 . The first step inverts a linear system of equations to retrieve phytoplankton size classes and a dg (443) from input a nw (λ). In the second step, the optimal S dg that minimizes the error between modelled and measured a nw (λ) is derived by implementing MATLAB's "fminbnd" non-linear optimization scheme. "fminbnd' finds the value of a single parameter of a function that minimizes the function on a bounded domain. This algorithm finds only one minimum, can only find minima based on one variable and operates only on one variable at a time. This algorithm is based on golden section search and parabolic interpolation. More information about the algorithm can be found at [27,28]. Since S dg is the only variable to be optimized in the second step of the Zhang's algorithm, the "fminbnd" function is used.

Semi-Analytical Algorithms (SAA)
To combine ADAs with SAAs, a(λ) or a nw (λ) derived from remote sensing reflectance using SAAs is necessary. Many SAAs were proposed in the past decade to derive the bulk Remote Sens. 2021, 13, 1726
GSM was initially developed by Garver and Siegel (1997) [10] and later updated by Maritorena et al., 2002 [11]. It uses a bio-optical model parameterized by a chlorophyllspecific phytoplankton absorption coefficient (a * ph ), S dg and the slope of b bp (λ) parameters derived from a global quasi-real dataset. A non-linear optimization scheme is employed to obtain optimal values of chlorophyll concentration [Chl], a dg (443) and b bp (443) that provide the least error between the modelled and measured values of R rs (λ).
The GIOP framework allows the user to construct multiple SAA's with a different configuration at each run time. The configurable options in the GIOP framework are bidirectional reflectance distribution function (BRDF), type of computational method of inversion, eigenvector or representative shape vectors for the subcomponents, wavelength range, Raman scattering. The default configuration of GIOP uses a non-linear spectral optimization routine with the Levenberg-Marquardt method as an optimization technique. The default configuration for GIOP uses: a specific absorption model for phytoplankton from Bricaud et al. (1995) [32] normalized by 0.055 m −2 (mg C) −1 ; the slope of particulate backscattering model from QAA; exponential spectral shape model for modelling a dg (λ) with fixed slope, S dg of 0.018 nm −1 . The outputs of the GIOP model are Chlorophyll and spectral IOPs. All retrieved IOPs falling in between −0.005 and 5 m −1 are considered valid. The derived IOPs are considered invalid if the reconstructed R rs differs from the observed R rs by more than 33%.
QAA is an algebraic algorithm that uses empirical, semi-analytical and analytical equations in a step-wise manner to derive bulk and subcomponent IOPs from R rs (λ) of optically deep waters. In the first step, a(λ) is derived from R rs (λ) and in the second, a(λ) is decomposed into subcomponents, a ph (λ) and a dg (λ).
The hybrid models are formulated by combining SAAs and ADAs. The first step of hybrid models involve deriving a nw (λ) from R rs (λ) using SAAs. In the second step, the SAA model-derived a nw (λ) is used as input to various ADAs to derive absorption subcomponents ( Figure 1). The hybrid models are indicated by the notation SAA ADA . For example, if a nw (λ) derived from GIOP is used as input to GSCM model, the notation would be GIOP GSCM .

Statistical Indicators
The mean absolute percentage error (MAPE), root mean squared error (RMSE) in log scale and N% are used to evaluate the performance of various models in deriving IOPs since the datasets cover a wide range of optical properties.

Datasets
Four datasets covering a wide range of optical properties are used to evaluate the performance of ADAs, SAAs and SAA ADA models ( Table 2). The variation in R rs (λ) and corresponding subcomponent IOPs for the four datasets is presented in Figure 2. The first dataset is a simulated dataset created as a part of the International Ocean-Colour Coordinating Group (IOCCG)'s Algorithm Working Group [33]. This dataset was generated using Hydrolight [34] using known concentrations of IOP inputs covering a wide range of optical properties observed in natural waters. This dataset consists of 500 simulated R rs (λ) along with corresponding spectral absorption coefficient of a, b b , a ph , a d and a g in 400-800 nm wavelength range with 10 nm spacing. For this study, the spectral IOPs and R rs at six wavelengths of 412, 443, 490, 510, 555 and 670 nm present in most of the ocean colour sensors were used. This dataset is hereafter referred to as the IOCCG dataset. The second dataset corresponds to a subset of the global bio-optical in situ dataset curated as a part of the European Space Agency's Ocean Colour Climate Change Initiative (OC-CCI) [35]. This dataset consists of R rs (λ) and the IOPs a, a ph and a dg at six wavelengths (same as IOCCG) for 860 stations acquired over a wide range of environments covering clear to turbid ocean waters. This dataset is hereafter referred to as the GB dataset.
For development and evaluation of algorithms for coastal water quality, Coast Colour Round Robin (CCRR) project brought together an in situ dataset with sampling locations at 11 coastal sites spread around the globe [36]. The third dataset is a subset of the in situ dataset and consists of 348 spectral R rs and corresponding IOPs of a, a ph and a dg at six wavelengths 412, 443, 490, 510, 555 and 665nm. Most of the stations in this dataset are located along the coastal waters of Southern California and Florida, USA. This dataset is hereafter referred to as the CCRR dataset. The locations of stations in GB and CCRR datasets are shown in Figure 3.
The fourth dataset corresponds to a simulated dataset generated using Hydrolight with inputs of known concentrations of bio-optical parameters in the Red sea oligotrophic waters [37]. Briefly, this dataset consists of 5000 random spectra of R rs (λ) and spectral IOPs of a, a ph , a d , a g in 350-800 nm wavelength range with 5 nm spacing. The ranges of chlorophyll and non-algal particulate material used in simulations are 0.01-0.50 mg/m 3 and 0-0.1 g/m 3 . The slopes, S g and S d are varied from 0.01-0.03 nm −1 and 0.007-0.021 nm −1 . Pure water absorption coefficients are taken from Pope and Fry (1997) [38]. This dataset is hereafter referred to as the Red Sea dataset.
In IOCCG and the Red Sea simulated datasets, the input IOPs are generated using random numbers and equations for the absorption subcomponents. These IOPs are used as input to Hydrolight to simulate R rs (λ). In the case of GB and CCRR datasets, both IOPs and R rs (λ) are concurrently measured in situ. While in situ measured datasets are crucial to evaluate model performance in real-world scenarios, the number of data points and their extent may not cover the entire variability observable in natural waters. Hence, simulated datasets that cover additional natural variability in waters not observed in the in situ datasets are required. Therefore, it is crucial to include both simulated and measured datasets in evaluating model performance.
is hereafter referred to as the Red Sea dataset.
In IOCCG and the Red Sea simulated datasets, the input IOPs are generated using random numbers and equations for the absorption subcomponents. These IOPs are used as input to Hydrolight to simulate ( ). In the case of GB and CCRR datasets, both IOPs and ( ) are concurrently measured in situ. While in situ measured datasets are crucial to evaluate model performance in real-world scenarios, the number of data points and their extent may not cover the entire variability observable in natural waters. Hence, simulated datasets that cover additional natural variability in waters not observed in the in situ datasets are required. Therefore, it is crucial to include both simulated and measured datasets in evaluating model performance.

Results
Firstly, the performance of the existing global semi-analytical algorithms, G GSM and QAA in deriving ( ) from measured or simulated ( ) from the four data is evaluated. This step is critical for quantifying the errors in the SAA model-derived

Results
Firstly, the performance of the existing global semi-analytical algorithms, GIOP, GSM and QAA in deriving a(λ) from measured or simulated R rs (λ) from the four datasets is evaluated. This step is critical for quantifying the errors in the SAA model-derived a(λ) or a nw (λ), which are then used as inputs to the second step of hybrid SAA ADA models. The second comparison involves evaluating the performance of each ADA in deriving absorption subcomponents using measured a nw (λ). Evaluating the performance of ADAs using measured a nw (λ) demonstrates their capability in partitioning in ideal conditions, i.e., negligible or minimal errors in the input. The ADAs capable of providing absorption subcomponents with lower errors are used for combining with SAAs to form SAA ADA models. The SAA ADA models are then evaluated for deriving absorption subcomponents from R rs (λ) and are compared with existing global SAAs.

Performance Evaluation of Semi-Analytical Algorithms in the Retrieval of a nw (λ)
The three existing operational SAAs, i.e., GIOP with the default configuration, QAA and GSM, use measured or simulated R rs (λ) as input to estimate the bulk and subcomponent IOPs as outputs ( Figure 4)

Performance Evaluation of ADAs in the Retrieval of Absorption Subcomponents from Measured
( ) The performance of six ADAs are evaluated based on the errors observed in the derived ℎ ( ) and ( ) from ( ). For the IOCCG dataset, ℎ (555) and (670) derived from all ADAs have the highest MAPE and RMSElog10 values than the rest of the wavelengths ( Figure 5A). The MAPE values obtained for ℎ ( ) and ( ) from all ADAs at all wavelengths lie in the range of 5-88% and the average spectral MAPE in 22-44% ( Figure 5A). The average spectral MAPE for ℎ ( ) derived from GSCM is the lowest (22%). The average spectral RMSElog10 values obtained for GSCM and Zhang models derived ℎ ( ) are lowest (0.122-0.126 −1 ) among the ADAs. In the case of ( ), In both GB and CCRR datasets, the GSM model exhibited higher MAPE values in the blue-green region in the range of 20-50% and exhibited a decreasing trend towards red wavelengths. The MAPE values for QAA-derived a(λ) from GB and CCRR datasets are higher at 412 nm and decreased with an increase in wavelength till 555 nm; at 670 or 665 nm, the MAPE values reached~40%. The MAPE and RMSE log10 values for GIOP algorithm derived a(λ) for GB and CCRR datasets are higher at 412 nm and decrease with wavelength. Unlike QAA, higher errors were not observed for the derived a(670) or a(665) for GIOP or Remote Sens. 2021, 13, 1726 9 of 22 GSM models. As the present study focuses on the errors in SAA-derived a nw (λ), the total and particulate backscattering coefficients accuracy is not assessed. Compared with GIOP and QAA models, the GSM model resulted in relatively higher MAPE values for all four datasets. Hence, only GIOP and QAA model derived a(λ) are used as input to the ADAs.

Performance Evaluation of ADAs in the Retrieval of Absorption Subcomponents from Measured a nw (λ)
The performance of six ADAs are evaluated based on the errors observed in the derived a ph (λ) and a dg (λ) from a nw (λ). For the IOCCG dataset, a ph (555) and a dg (670) derived from all ADAs have the highest MAPE and RMSE log10 values than the rest of the wavelengths ( Figure 5A). The MAPE values obtained for a ph (λ) and a dg (λ) from all ADAs at all wavelengths lie in the range of 5-88% and the average spectral MAPE in 22-44% ( Figure 5A). The average spectral MAPE for a ph (λ) derived from GSCM is the lowest (22%). The average spectral RMSE log10 values obtained for GSCM and Zhang models derived a ph (λ) are lowest (0.122-0.126 m −1 ) among the ADAs. In the case of a dg (λ), average spectral MAPE values of all models, except SCM, lie in the range of 14-17%. GSCM and Lin model-derived a dg (λ) obtained lowest average spectral RMSE log10 of 0.082-0.085 m −1 ( Figure 5B). The N% for absorption subcomponents from all ADAs are always >80%, with GSCM exhibiting the lowest N% of 82% among all ADAs ( Figure S1). Based on the observed errors, the order of performance of ADAs in deriving the absorption subcomponents from a nw (λ) of IOCCG dataset is GSCM > Zhang > CB > Lin > OSI > SCM, where GSCM model-derived absorption subcomponents have lowest errors and SCM the highest.
In the case of the GB dataset, average spectral MAPE values for a ph (λ) from CB, GSCM, Zhang and Lin's models are identical and vary from 34 to 36%. OSI model-derived a ph (λ) exhibited the highest average spectral MAPE of 45%. In the case of a dg (λ), the average spectral MAPE values for GSCM and CB model-derived a dg (λ) are the lowest (20%) and highest (45%). The average spectral MAPE values for a dg (λ) derived from remaining models lie in the range of 22-25%. Based on MAPE and RMSE log10 errors ( Figure 5A,B), the order of performance of ADAs is GSCM > (Zhang ≈ Lin) > SCM > (OSI ≈ CB).
MAPE values for the derived absorption subcomponents from all ADAs for the CCRR dataset lie in the range of 11-77%. GSCM model-derived a ph (λ) and a dg (λ) obtained the lowest average spectral MAPE values of 16% and 25%, respectively. The average spectral MAPE values for a ph (λ) and a dg (λ) from the remaining models lie in 23-27% and 31-43%, respectively. As observed in MAPE statistic, the lowest RMSE log10 values are obtained for GSCM derived a ph (λ) and a dg (λ) with average spectral RMSE log10 values of 0.104 and 0.147 m −1 , respectively. The order of performance of ADAs based on the error statistics for the derived absorption subcomponents is GSCM > (Lin ≈ OSI) > Zhang > SCM > CB.
In the case of the Red sea dataset, Zhang model-derived a ph (λ) obtained the lowest average spectral MAPE value of 11% among all ADAs. Average spectral MAPE for a ph (λ) from rest of the models lie in the range of 19-59%. In agreement with MAPE, the RMSE log10 values for Zhang's model-derived a ph (λ) are the lowest with an average spectral RMSE log10 of 0.072 m −1 . In the case of a dg (λ), GSCM model obtained the lowest average spectral MAPE of 9% and spectral RMSE log10 of 0.047 m −1 . Based on MAPE and RMSE log10 statistics obtained for both absorption subcomponents, the order of performance of ADAs for the Red Sea dataset is GSCM > Zhang > SCM > CB > Lin > OSI.
Error statistics computed for the derived absorption subcomponents from the four datasets show that the GSCM model performed better than all other ADAs with the lowest errors. After GSCM, lower errors were observed in Zhang's model for IOCCG, GB and the Red Sea datasets. Hence, these two ADAs are selected for combining with SAAs of GIOP and QAA to form SAA ADA models. As a special case, the CB model is also included as this model was evaluated previously [39] for studying spatio-temporal variations in phytoplankton size and coloured detrital matter at global and regional scales. ens. 2021, 13, x FOR PEER REVIEW 11 of 24

Hybrid Inversion Models for Deriving Absorption Subcomponents from Remote Sensing Reflectance
The six hybrid SAAADA models, GIOPCB, QAACB, GIOPGSCM, QAAGSCM, GIOPZhang and QAAZhang, are compared with GIOP and QAA using statistical indicators of MAPE, RMSElog10 and %. As observed in the previous section, GIOP model-derived ( ) from the Red Sea dataset obtained the least average spectral MAPE value of 6%. In an ideal situation, ( ) derived from each SAA will have the least possible error. In such a case, assessing each ADA's performance in waters with a wide range of optical variability is necessary for deriving accurate absorption subcomponents. Hence, an ideal case wherein the measured ( ) provided as input to each ADA to derive absorption subcomponents is included in each comparison. The comparison of remaining SAAADA models with SAAs is presented in Figures S3-S5 as supplementary information.

Performance Evaluation of Hybrid Models in Deriving a ph (λ)
For the IOCCG dataset ( Figure 6A, Row 1), GIOPCB and QAACB model-derived ℎ ( ) have 23-42% lower MAPE compared to GIOP and QAA models. Higher MAPE values were observed for ℎ (555) for all the models. MAPE values for GIOPCB and QAACB derived ℎ ( ) are 6-11% lower compared to the CB model. For the GB dataset, MAPE values for GIOPCB and QAACB models derived ℎ ( ) are 4-25% lower than GIOP and QAA models. In the case of the CCRR dataset, GIOPCB model-derived ℎ ( ) obtained 4-

Hybrid Inversion Models for Deriving Absorption Subcomponents from Remote Sensing Reflectance
The six hybrid SAA ADA models, GIOP CB , QAA CB , GIOP GSCM , QAA GSCM , GIOP Zhang and QAA Zhang, are compared with GIOP and QAA using statistical indicators of MAPE, RMSE log10 and N%. As observed in the previous section, GIOP model-derived a nw (λ) from the Red Sea dataset obtained the least average spectral MAPE value of 6%. In an ideal situation, a nw (λ) derived from each SAA will have the least possible error. In such a case, assessing each ADA's performance in waters with a wide range of optical variability is necessary for deriving accurate absorption subcomponents. Hence, an ideal case wherein the measured a nw (λ) provided as input to each ADA to derive absorption subcomponents is included in each comparison. The comparison of remaining SAA ADA models with SAAs is presented in Figures S3-S5 as Supplementary Information.

Performance Evaluation of Hybrid Models in Deriving a ph (λ)
For the IOCCG dataset ( Figure 6A, Row 1), GIOP CB and QAA CB model-derived a ph (λ) have 23-42% lower MAPE compared to GIOP and QAA models. Higher MAPE values were observed for a ph (555) for all the models. MAPE values for GIOP CB and QAA CB derived a ph (λ) are 6-11% lower compared to the CB model. For the GB dataset, MAPE values for GIOP CB and QAA CB models derived a ph (λ) are 4-25% lower than GIOP and QAA models. In the case of the CCRR dataset, GIOP CB model-derived a ph (λ) obtained 4-17% lower MAPE compared to GIOP model-derived a ph (λ). For the Red Sea dataset, GIOP CB model-derived a ph (λ) have 12-22% and 4-6% lower MAPE values compared to the GIOP model. Similarly, compared to the QAA model, MAPE values for a ph (λ) from GIOP CB model are 1-25% and 2-14% lower. QAA CB model-derived a ph (λ) have 25-29% and 13-36% lower MAPE values than GIOP and QAA models. The validity of absorption subcomponents derived from GIOP CB and QAA CB models varied in the range 82-100% ( Figure S2A). These results indicate that only GIOP CB model performed better with lower errors than GIOP and QAA models for the Red Sea dataset. In the case of GB, CCRR and the Red Sea datasets, GIOP CB and QAA CB model-derived absorption subcomponents obtained 1-34%, 11-70% and 1-43% higher MAPE values than the CB model. Absorption subcomponents derived from QAA CB and GIOP CB models deviated less in simulated datasets than GB and CCRR datasets ( Figure 7A).
The average spectral MAPE value for a ph (412 − 555 nm) derived from GIOP GSCM is~7% less than GIOP for the IOCCG dataset. The average spectral MAPE values for GIOP GSCM and QAA are similar, with a difference of less than 1% ( Figure 6A,B, Row 2). For the GB dataset, MAPE values for a ph (412 − 510 nm) from GIOP GSCM are 2-15% and 2-10% less than GIOP and QAA models. Similarly, QAA GSCM derived a ph (412 − 555 nm) obtained 4-13% and 1-8% lower MAPE values than GIOP and QAA models. Overall, for GB dataset, QAA GSCM model-derived absorption subcomponents are lower than GIOP and QAA models. The lower MAPE values can be a consequence of the lower N% of 19% observed for QAA GSCM derived a ph (λ), as compared to 90-100% valid retrievals of GIOP and QAA. GIOP GSCM derived absorption subcomponents also obtained a lower N% of 76% ( Figure S2A). For the CCRR dataset, compared to GIOP and QAA-derived a ph in 412-555 nm range, GIOP GSCM model obtained 14-41% and 2-7% lower MAPE values. QAA GSCM derived a ph obtained 2-56% lower and 2-18% higher MAPE values than GIOP. The N% values obtained for both a ph (λ) and a dg (λ) derived from GIOP GSCM , QAA GSCM , GIOP and QAA models are 92%, 22%, 94% and 100%, respectively. These results imply that the lower MAPE values from the QAA GSCM model can be a consequence of the lowest N%, as witnessed in GB dataset.
In comparison with GIOP and QAA models, a ph (λ) derived from GIOP Zhang models obtained 6-18% lower MAPE values in 410-510 nm range for IOCCG dataset. GIOP and QAA-derived a ph at 555 and 670 nm wavelengths have >100% MAPE; hence, they are excluded from the comparison (Row 3, Figure 6A,B). For the GB dataset, GIOP Zhang model-derived a ph in 412-510 nm range have 18-26% and 6-24% lower MAPE than GIOP and QAA models. QAA Zhang model-derived a ph (λ) obtained 1-7% lower MAPE values than GIOP and 1-9% higher MAPE values than QAA models. For the CCRR dataset, GIOP Zhang model-derived a ph (λ) obtained 14-58% and 4-20% lower MAPE than GIOP and QAA models. QAA Zhang model-derived a ph (λ) exhibited~7% higher error in the 412-490nm range and 9-50% lower error in 510-670 nm compared to the GIOP model. Similarly, MAPE values for a ph from QAA Zhang are 10-17% higher than QAA-derived a ph in 412-490 nm range. For the Red Sea dataset, MAPE values for GIOP Zhang derived a ph (412 − 555 nm) are 8-16% and 2-45% lower than GIOP and QAA. Similarly, QAA Zhang model-derived a ph in 412-555 nm range have 12-35% and 4-73% lower MAPE compared to the QAA model. In comparison with Zhang's model, GIOP Zhang and QAA Zhang modelderived a ph (λ) obtained 16-32% and 6-32% higher MAPE values. GIOP Zhang modelderived a ph (λ) from IOCCG, GB, CCRR and the Red Sea datasets have N% in the ranges of 37-64%, 63-79%, 65-87% and 59-79% ( Figure S2A). Similarly, QAA Zhang model-derived a ph (λ) from IOCCG, GB, CCRR and the Red Sea datasets are 46-63%, 31-50%, 22-74% and 7-11% valid ( Figure S2A). QAA Zhang model-derived absorption subcomponents (Blue dots in Figure 7C) obtained higher biases than GIOP Zhang (Red "+") for all four datasets. Especially for the GB dataset, the deviations observed from the 1:1 line are higher for QAA Zhang MAPE and (B). RMSE in log scale. Here, GIOPCB indicates that GIOP algorithm with default configuration is used to derive a nw (λ) from R rs (λ) of each of the four datasets. In the second step, the GIOP-derived a nw (λ) is used as input to CB algorithm to obtain absorption subcomponents of a ph (λ) and a dg (λ). The same procedure is used in GIOPZhang, QAAZhang, GIOPGSCM and QAAGSCM models. GIOP, QAA and GSM are the existing operational semi-analytical algorithms that use R rs (λ) as input and provide absorption subcomponents as outputs. The CB, GSCM and Zhang's models use spectral non-water absorption coefficient, a nw (λ) as input to derive the absorption subcomponents. Here, GIOP CB indicates that GIOP algorithm with default configuration is used to derive a nw (λ) from R rs (λ) of each of the four datasets. In the second step, the GIOP-derived a nw (λ) is used as input to CB algorithm to obtain absorption subcomponents of a ph (λ) and a dg (λ). The same procedure is used in GIOP Zhang , QAA Zhang , GIOP GSCM and QAA GSCM models. GIOP, QAA and GSM are the existing operational semi-analytical algorithms that use R rs (λ) as input and provide absorption subcomponents as outputs. The CB, GSCM and Zhang's models use spectral non-water absorption coefficient, a nw (λ) as input to derive the absorption subcomponents. The hybrid models QAACB, QAAGSCM, QAAZhang, GIOPCB, GIOPGSCM and GIOPZhang models use R rs (λ), spectral remote sensing reflectance to derive absorption subcomponents. Blue dots and red plus indicate the hybrid models that use GIOP and QAA in the first step. The absorption decomposition algorithms, CB, GSCM and Zhang's (Black circles) models, use a nw (λ), spectral non-water absorption coefficient to derive the absorption subcomponents, λ is the wavelength. The inputs for the models, i.e., R rs (λ) and a nw (λ) are taken from the four datasets, IOCCG, GB, CCRR and the Red Sea. The 1:1 relationship is shown in a grey dotted line. and GIOP Zhang models use R rs (λ), spectral remote sensing reflectance to derive absorption subcomponents. Blue dots and red plus indicate the hybrid models that use GIOP and QAA in the first step. The absorption decomposition algorithms, CB, GSCM and Zhang's (Black circles) models, use a nw (λ), spectral non-water absorption coefficient to derive the absorption subcomponents, λ is the wavelength. The inputs for the models, i.e., R rs (λ) and a nw (λ) are taken from the four datasets, IOCCG, GB, CCRR and the Red Sea. The 1:1 relationship is shown in a grey dotted line.

Performance Evaluation of Hybrid Models in Deriving a dg (λ)
For the IOCCG dataset (Row 1, Figure 8A,B), GIOP CB and QAA CB derived a dg (λ) obtained 2-8% lower MAPE than GIOP and QAA models. Higher MAPE values were observed for a dg (670) for all the models. MAPE values for GIOP CB and QAA CB modelderived a dg (λ) are 2-13% higher than the CB model. For the GB dataset, the average spectral MAPE for a dg (λ) derived from QAA CB and GIOP CB models are 3% higher and 2% lower compared GIOP model. In comparison with QAA-derived a dg (λ), average spectral MAPE for QAA CB and GIOP CB models-derived a dg (λ) are 2% and 8% higher. For the Red Sea dataset, compared to the QAA model, MAPE values for a dg (λ) from GIOP CB model are 2-14% lower. The errors for QAA CB derived a dg (λ) are 4-30% and 5-23% higher compared to GIOP and QAA models.
The average spectral MAPE obtained for GIOP GSCM derived a dg (λ) is~2% lower than GIOP in the case of the IOCCG dataset. QAA GSCM derived a dg (λ) obtained lower MAPE values in the range of 2-27% compared to GIOP and 1-20% lower MAPE than QAA (Row 2, Figure 8A). For the GB dataset (Row 2, Figure 8A), except at 670 nm, GIOP GSCM derived a dg obtained lower MAPE values compared to GIOP and higher MAPE values compared to QAA. In 412-555 nm range, QAA GSCM derived a dg obtained 12-19% lower MAPE values than QAA and 7-14% lower MAPE values than QAA. The lower MAPE values can be a consequence of the lower N% of 19% observed for QAA GSCM derived a dg (λ) retrievals, as compared to 90-100% valid retrievals of GIOP and QAA. GIOP GSCM derived absorption subcomponents also obtained a lower N% of 76% (Row 2, Figure S2B). For the CCRR dataset, GIOP GSCM obtained~5% lower average spectral MAPE compared to GIOP. QAA GSCM derived a dg (λ) obtained 9-23% and 4-14% lower MAPE compared to GIOP and QAA models. The N% values obtained for a dg (λ) derived from GIOP GSCM , QAA GSCM , GIOP and QAA models are 92%, 22%, 94% and 100%, respectively.
These results imply that the lower MAPE values from the QAA GSCM model can be a consequence of the lowest N%, as witnessed in GB dataset. MAPE values from QAA GSCM are 8-15%, 13-20%, 7-23% and 7-18% higher than GSCM. In addition, QAA GSCM and GIOP GSCM derived absorption subcomponents have higher biases than the GSCM model, as indicated by deviation from the 1:1 line. GIOP GSCM model-derived absorption subcomponents have higher biases ( Figure 9B) compared to QAA GSCM for the Red Sea dataset, resulting in higher MAPE values (Row 2, Figure 8A). GIOP Zhang and QAA Zhang derived MAPE values for a dg (λ) from IOCCG dataset are higher or similar to QAA and GIOP models (Row 3, Figure 8A,B). For the GB dataset, QAA Zhang obtained 3-8% lower MAPE values compared to GIOP and similar values to QAA. For the CCRR dataset, GIOP Zhang derived a dg (λ) obtained~5% lower error compared to GIOP in 412-490 nm range and 2-7% higher in 510-670 nm range, indicating a similar overall error. QAA Zhang model-derived a dg (λ) have 5-6% and 2-21% higher MAPE compared to GIOP and QAA models. For the Red Sea dataset, the GIOP Zhang model obtained 4-17% and 8-35% lower MAPE than GIOP and QAA in 490-555 nm. QAA Zhang model-derived a dg (λ) obtained 2-33% and 7-57% lower MAPE values than GIOP and QAA models. For the GIOP Zhang model, the N% observed for IOCCG, GB, CCRR and the Red Sea datasets are 82-83%, 86-88%, 92-94% and 79-100% respectively (Row 3, Figure S2B). Similarly, QAA Zhang model-derived a dg (λ) from IOCCG, GB, CCRR and the Red Sea datasets obtained N% in the ranges of 75-73%, 43-50%, 44-47% and 10-11%, respectively. These results indicate that GIOP Zhang model-derived absorption subcomponents are more valid compared to QAA Zhang . However, the N% obtained for the GIOP Zhang model is still lower than GIOP and QAA at all wavelengths and all datasets. GIOP Zhang and QAA Zhang model-derived a dg (λ) obtained 10-43% and 8-25% higher MAPE than Zhang's model. Overall, the QAA Zhang model obtained lower MAPE values than GIOP and QAA in IOCCG, GB and the Red Sea datasets. However, the observed lower errors from QAA Zhang can result from the lowest N% values obtained for absorption subcomponents.  MAPE and (B). RMSE in log scale. Here, GIOPCB indicates that GIOP algorithm with default configuration is used to derive a nw (λ) from R rs (λ) of each of the four datasets. In the second step, the GIOP-derived a nw (λ) is used as input to CB algorithm to obtain absorption subcomponents of a ph (λ) and a dg (λ). The same procedure is used in GIOPZhang, QAAZhang, GIOPGSCM and QAAGSCM models. GIOP, QAA and GSM are the existing operational semi-analytical algorithms that use R rs (λ) as input and provide absorption subcomponents as outputs. The CB, GSCM and Zhang's models use spectral Here, GIOP CB indicates that GIOP algorithm with default configuration is used to derive a nw (λ) from R rs (λ) of each of the four datasets. In the second step, the GIOP-derived a nw (λ) is used as input to CB algorithm to obtain absorption subcomponents of a ph (λ) and a dg (λ). The same procedure is used in GIOP Zhang , QAA Zhang , GIOP GSCM and QAA GSCM models. GIOP, QAA and GSM are the existing operational semi-analytical algorithms that use R rs (λ) as input and provide absorption subcomponents as outputs. The CB, GSCM and Zhang's models use spectral non-water absorption coefficient, a nw (λ) as input to derive the absorption subcomponents. non-water absorption coefficient, a nw (λ) as input to derive the absorption subcomponents. The hybrid models QAACB, QAAGSCM, QAAZhang, GIOPCB, GIOPGSCM and GIOPZhang models use R rs (λ), spectral remote sensing reflectance to derive absorption subcomponents. Blue dots and red plus indicate the hybrid models that use GIOP and QAA in the first step. The absorption decomposition algorithms, CB, GSCM and Zhang's (Black circles) models, use a nw (λ), spectral non-water absorption coefficient to derive the absorption subcomponents, λ is the wavelength. The inputs for the models, i.e., R rs (λ) and a nw (λ) are taken The errors observed in absorption subcomponents derived from GIOP ADA and QAA ADA models are induced in two steps. First, a nw (λ) from GIOP and QAA are not entirely errorfree; hence, using erroneous a nw (λ) induces additional error in GIOP ADA and QAA ADA model-derived absorption subcomponents. Second, ADA model-derived absorption subcomponents also are not error-free, implying that even if error-free a nw (λ) are used as inputs, each ADA model derives absorption subcomponents with some error.
Summarizing the performance of hybrid SAA ADA models, GIOP CB and QAA CB modelderived absorption subcomponents obtained lower errors than GIOP and QAA only in the case of simulated datasets. GIOP GSCM model obtained lower errors for the derived absorption subcomponents in IOCCG, GB and CCRR datasets than GIOP and QAA models. Higher errors for the GIOP GSCM derived absorption subcomponents were observed in the case of the Red Sea dataset. A possible reason could be the Chesapeake Bay based parameterization of absorption subcomponents in the GSCM model, whereas the Red Sea dataset is mostly oligotrophic waters. For the Red Sea dataset, GIOP Zhang model-derived absorption subcomponents achieved lower errors than GIOP and QAA models. Although QAA GSCM and QAA Zhang derived absorption subcomponents obtained lower errors than GIOP and QAA, the lower errors can be attributed to lower N% values.

Effect of Different Datasets on Model Performance
This paper focuses on developing hybrid inversion models to derive absorption subcomponents from remote sensing reflectance. Two simulated and two measured datasets covering a wide range of optical properties are used to facilitate intercomparison of the ADAs performance. GB dataset is the most extensive global dataset of in situ R rs (λ) and concurrent IOPs. However, the distribution of the samples in the GB dataset is not equivalent to the global ocean. Eutrophic waters are over-represented in GB and oligotrophic waters are under-represented [35]. To compensate for this data gap, the Red Sea simulated dataset was used [37]. An ideal case of inter-comparison of models requires datasets exclusive of data used to parameterize the SAAs or ADAs. However, in the present study, it was difficult to assess the impact of IOCCG, GB and CCRR datasets on model performance, as these datasets influence most models to some extent. For example, QAA used in the present study uses an updated parameterization [14], wherein some empirical equations were based on the IOCCG and NOMAD, a significant contributor to the GB dataset used in the study [13]. Among the ADAs, the spectral shape model used in Lin's model for modelling a ph is based on a subset of NOMAD. Hence, independent datasets are required to assess the dataset's influence on various models performance and in further evaluation of models.
An assumption related to in situ datasets is their closeness to the truth. In reality, in situ data may have some associated inherent errors. The use of robust quality control procedures to remove measurement outliers was adopted during the compilation of GB and CCRR datasets [35,36]. Measurements of absorption subcomponents and a nw (λ) contain inherent errors depending on the type of method and instruments used [2]. Despite the application of the best available correction methods, a nw (λ) measured using in situ absorption measuring instruments may contain a residual error of 20% or more. The magnitude of errors depend on wavelength region as well, with relatively higher errors observed in red and NIR wavelengths [40]. Quantifying the error associated with each variable at each wavelength with varying uncertainty levels is a tedious task. Future efforts to compile in situ observations with accompanying uncertainties are required. GB and CCRR datasets are in situ datasets, wherein the error associated with R rs (λ) can be small as compared to satellite-derived R rs (λ). The performance of SAAs and ADAs may vary when data with higher levels of noise is used.

Applicability of Existing ADAs in Partitioning a nw (λ)
Each ADA used in the study was designed for a specific purpose, as indicated by the different outputs they provide (Table 1). Each ADA's advantages and disadvantages are dependent to a certain degree on the methodology, spectral shape models used and wavelengths. The performance of each ADA and the hybrid inversion methods developed using these ADAs are dependent on these factors. For example, the CB model provides S f , a parameter useful to spatio-temporal patterns of phytoplankton size at a global level [39]. Zhang's model provides pico-, nano-and micro-plankton size classes useful to study phytoplankton diversity at global scales. Despite this, some of the ADAs are not devoid of limitations. CB model requires Chlorophyll concentration as input. To avoid dependency on field-measured inputs, [Chl] is derived from Ocean Colour Chlorophyll-a empirical algorithms like OC4 [41,42]. However, ancillary inputs can be a source of error that may affect the algorithm's performance [1,20]. The shape and magnitude of the phytoplankton absorption spectrum vary based on phytoplankton community size, species and other effects like pigment packaging and photodegradation. Use of a second-order polynomial equation as in Lin's ADA to model a ph (λ) may limit its variation in spectral shape and thus limits its representativeness. Secondly, Lin's model generates two sets of a ph (λ) and a d (λ) that are not identical, indicating an incomplete closure [17]. As observed from the results, OSI method exhibited higher errors for the derived absorption subcomponents. A possible reason could be its parameterization focused on application to coastal regions. Along with a ph (λ), a dg (λ) also exhibit high variability in both shape and magnitude [43] in world's oceans and use of strict assumptions such as a fixed shape model as in CB, Lin's and Zhang's models may limit the retrieval performance [1]. GSCM model was developed and validated with data from Chesapeake Bay only and its applicability in other aquatic environments needed to be studied.

Effect of SAA Methodology in Deriving a nw (λ) from R rs (λ)
Based on results from spectral plots (Figure 4), QAA model-derived a nw has higher errors at 670 nm in the case of GB and CCRR datasets. GSM model exhibited higher errors in blue-green wavelengths for all datasets. For most Case-1 global waters, water absorption dominates at higher wavelengths, i.e., at 665 or 670 nm and above. As a result, R rs (665) or R rs (670) approaches zero. QAA is an algebraic approach that derives the IOPs at each wavelength from the corresponding R rs at that wavelength. When the available magnitudes are smaller, the errors in deriving a(665 or 670) increase, there by the errors in a nw (665 or 670). The results for QAA model-derived IOPs lie in accordance with another comparison study [44]. A possible reason for lower errors observed in GIOP modelderived a nw at longer wavelengths can be attributed to the indirect method of calculating a nw . In GIOP, a nw (665 or 670) is not directly derived from R rs (665 or 670), instead, it is calculated as the sum of a ph (665 or 670) and a dg (665 or 670). The values of absorption subcomponents are derived using a ph and a dg at a reference wavelength, typically 443 nm and associated spectral shape models. Hence, the absorption subcomponents and a nw are inferred from R rs at shorter wavelengths.
All ADAs obtained higher errors in deriving a ph at 555 nm and exhibited an increase in error for the derived a dg towards longer wavelengths ( Figure 5). A similar trend is observed in absorption subcomponents derived from ADAs that use a nw (λ) derived GIOP and QAA. Among various ADAs, the ADA models that use a nw from QAA didn't perform well as compared to using a nw from GIOP (Figures 6 and 8).

Validity of the Derived Absorption Subcompnents and Statistical Indicator Selection
The range of absorption subcomponents used in the present study varies by more than three orders of range. For example a ph (670) for GB dataset varies in the range of 0.001-0.8199 m −1 . If most of the model-retrieved values are biased and deviate from the actual values, the observed MAPE value increases. Higher MAPE values, i.e., >100% were obtained for few models at 670 or 665 nm. The preliminary reason for such discrepancy in the model-derived value as compared to the actual value is as follows. The absorption due to pure water at these longer wavelengths (red and NearInfrared) is higher compared to its absorption at blue-green wavelengths. Thereby, the magnitudes of the absorption coefficients of subcomponents at longer wavelengths is very low. Deriving accurate absorption coefficients of subcomponents with smaller magnitudes at these longer wavelengths is particularly difficult. Similar higher errors in the derived absorption subcomponents were reported in previous studies [1]. To reduce the effect of such high spurious MAPE values on the entire model performance, the longer wavelengths are omitted. Further, to perform a proper comparison, the spurious MAPE values are removed from all the models in the comparison wherever necessary.
In most of the cases, the N% values obtained for various model-derived absorption subcomponents are greater than 75%. This statistic is important to find the model that is capable of deriving a maximum number of valid retrievals. Although, the present study deals with in situ and simulated data only, the proposed hybrid models can be applied to satellite data in further study. In the case of satellite datasets, the model's ability to derive a maximum number of valid absorption subcomponents is crucial. The N% statistic at a wavelength depends on a numerous factors like type of methodology used in partitioning, errors present in the a nw at the wavelength, magnitude of the absorption subcomponents etc. Except in longer wavelengths, the N% at blue-green wavelengths are nearly the same for all model-derived absorption subcomponents.
The performance of GSCM and Zhang's model-based hybrid inversion approaches are better than other models because of the following reasons: 1. the spectral library of the GSCM model includes several measured distinct shapes for the absorption subcomponents, unlike other ADA's wherein a modelled shape is used; 2. Zhang's model provides more flexibility by allowing the model to choose optimal contributions of pico-, nanoand microplankton.
The present study may further be extended by deriving Chl from the derived a ph (λ) or directly from R rs (λ). GIOP and GSM models in the SAA's and Zhang's ADA model only provide Chl as output and the rest of the models provide a ph (λ) as outputs. Although Chl can be derived from a ph (λ) using a * ph and Chl relationships, the errors would be higher because of two reasons. First, a ph (λ) derived from SAA ADA models are affected by errors in datasets and methodology adopted in SAA and ADA. Second, using a a * ph and Chl relationship with erroneous input of a ph (λ), may further induce additional errors in the derived Chl. Hence, this step is not pursued further.

Conclusions
Absorption decomposition algorithms partition the total absorption coefficient or non-water absorption coefficient into absorption subcomponents. Comparing the existing ADAs is essential to suggest an ADA capable of deriving absorption subcomponents over a wide range of waters. Based on the motive for partitioning a nw (λ), the output variables and methods vary for each ADA. The a ph (λ) and a dg (λ) are two standard outputs from the six ADAs, so this study focuses on comparing the models based on these two retrievals. Similar to CB and GSCM models, other ADAs need to be assessed for their applicability to remote sensing applications, i.e., the derivation of absorption subcomponents from R rs . The purpose of the present study is to formulate and evaluate the performance of hybrid approaches that combine SAAs and ADAs to derive absorption subcomponents from both measured and SAA-derived a nw (λ). Among the six ADAs, GSCM and Zhang's models had lower errors for the derived absorption subcomponents from measured a nw (λ). Similarly, GIOP and QAA algorithms obtained lower errors at most wavelengths in deriving a nw from R rs (λ). Hence, a nw (λ) from these two algorithms is used as inputs to GSCM, Zhang and CB models to formulate SAA ADA hybrid models to derive absorption subcomponents from R rs (λ). Based on the statistical comparison, GIOP GSCM model-derived absorption subcomponents obtained lower errors than GIOP and QAA models for IOCCG, CCRR and GB datasets. In the Red Sea dataset, GIOP Zhang model-derived absorption subcomponents