A Quantitative Comparison of Total Suspended Sediment Algorithms : A Case Study of the Last Decade for MODIS and Landsat-Based Sensors

Abstract: A quantitative comparative study was performed to assess the relative applicability of Total Suspended Solids (TSS) models published in the last decade for the Moderate Resolution Imaging Spectroradiometer (MODIS) and Landsat-based sensors. The quantitative comparison was performed using a suite of statistical tests and HydroLight simulated data for waters ranging from clear open ocean case-1 to turbid coastal case-2 waters. The quantitative comparison shows that there are clearly some high performing TSS models that can potentially be applied in mapping TSS concentration for regions of uncertain water type. The highest performing TSS models tested were robust enough to retrieve TSS from different water types with Mean Absolute Relative Errors (MARE) of 69.96%–481.82% for HydroLight simulated data. The models were also compared in regional waters of northern Western Australia where the highest performing TSS models yielded a MARE in the range of 43.11%–102.59%. The range of Smallest Relative Error (SRE) and Largest Relative Error (LRE) between the highest and the lowest performing TSS models spanned three orders of magnitude, suggesting users must be cautious in selecting appropriate models for unknown water types.


Introduction
The health of coastal waters not only determines the health of marine habitats in the region but also signifies the health of the nearby human inhabitants with nearly 60% of the earth's population settled in the coastal zones of our oceans and seas, and over 90% of the world's fish caught for consumption being sourced from coastal waters [1].The health of water systems is typically determined from a key indicator, the water clarity (turbidity) which is influenced by the amount of dissolved matter and total suspended solids (TSS) comprising organic matter such as algae and other micro-organisms and inorganic particulate matter from minerals [2].Monitoring TSS along with other water quality parameters is crucial for coastal ecology because TSS can directly affect the turbidity and color of water [3] and turbidity determines the amount of light availability at depth for primary production [4][5][6].
Monitoring the temporal and spatial distribution of TSS in the coastal environment can be a huge undertaking and nearly impossible in terms of financial and time resources if performed using traditional in situ water sampling methods [7] unless coupled with satellite-based remote sensing.Since the early space-borne sensors of the 1970s there has been, and continues to be, a great improvement in the spectral, spatial and temporal resolutions [8].For example, the Landsat-based series of sensors has evolved over the years from three (red, green and blue) spectral bands with spatial resolutions of 185 m and a revisit time of 18 days to the newest Landsat-8 with 11 spectral bands (433-12,500 nm) with spatial resolutions of 30 m (and 15 m panchromatic) and a revisit time of 16 days.The shortcoming of the long revisit time for Landsat can be filled by the readily available MODIS-Aqua and Terra sensors which have shorter revisit times of one day, and with 36 spectral bands (405-14,385 nm) and spatial resolutions from 250 m to 1000 m.
The majority of the models developed in retrieving TSS by remote sensing methods are typically locally tuned to a regional water or waters with similar optical properties.Regional tuning of a TSS model is necessary because of the potentially large variation in the inherent optical properties (IOPs) of the water constituents.The theoretical basis of ocean color remote sensing has shown that sensor-measured reflectance of the water is related to the IOPs of the water-absorption and scattering coefficients.IOPs vary with the types and amounts of the water's constituents, such as sediments, phytoplankton, detrital matter and CDOM [44] which may be different for different sediment types and phytoplankton types in different regions.In addition, factors such as water depth, viewing geometry, and atmospheric conditions all add to the complexity of the relationship between the measurement of reflectance of the water surface and the IOPs and concentrations of constituents [45].
TSS models are generally classified into three categories, (1) an empirical model where TSS is modelled directly using a statistical analysis to relate the apparent optical properties (AOPs); (2) an analytic model that relates the IOPs and AOPs of water through radiative transfer theory to derive TSS; and (3) a semi-analytic model that is partly based on the empirical analysis and grounded on the radiative transfer theory [7].Individual TSS model designs have their own limitations and advantages.An empirical model is often sought for its simplicity and explanatory power because unique properties of local waters are tuned to each model, but it may lack general applicability.An analytic model is potentially applicable to other water bodies because it is not dependent on the in situ water constituents, but it requires accurate knowledge of water column properties which is often difficult to acquire.The semi-analytic model has both the limitations and advantages associated with the first two models, and it is generally preferred because it has higher explanatory power and is more convenient than the analytic model [7,8].
In the last decade, various TSS models have been developed [35,37,38,46,47] and applied to their respective regions with a wide range of success with reported retrieval errors ranging from lows of ~18% to highs of ~61%.Considering each model is developed and tuned for a specific region, water type and its associated IOPs, the application or transferability of the models to other regions is limited, and the likely accuracy of the results unknown.Even when an existing TSS model is applied to waters in similar regions it is often first re-calibrated before being applied.The availability of many TSS algorithms for different regions and sensors warrants one to ask if we can use someone else's algorithm to estimate TSS in regions where we do not have any in situ observations?For the cross applicability of TSS models between different regions the design of a TSS algorithm has to either be based on analytic methods and grounded on theoretical functions of radiative transfer theory, or the waters must be assumed to have similar optical and physical properties.However, considering the vast number of TSS models that have been developed across different geographical regions with different optical and physical properties we can seek to establish the robustness in the applicability of these existing TSS algorithms for different regions.
A recent study by Brewin, et al. [44] developed an objective methodology where comparison of different bio-optical algorithms are quantitatively and qualitatively considered for use in climate studies.Following the methods of Brewin, et al. [44] and their quantitative methodology to rank the algorithms, in this study we objectively compare the performance of TSS algorithms for MODIS and Landsat sensors developed during the last decade using HydroLight simulated data for different water and sediment types.If shown to be robust, these algorithms would provide marine remote sensing scientists and coastal managers some level of confidence in their ability to assess the quality of water with minimal resource for coastal monitoring of optically unexplored waters.Specifically, this study aims to quantitatively assess the applicability of established TSS algorithms to different water types and quantify the variability in retrieving TSS when using off the shelf TSS algorithms for MODIS and Landsat sensors.

HydroLight Simulation
A set of ocean reflectance spectra were derived using the radiative transfer numerical model HydroLight 4.2 (Sequoia Scientific, Inc., Bellevue, WA 98005, United States of America) in the four component case-2 waters mode.Using a forward model HydroLight solves radiance distributions and derives reflectance and radiance for water bodies with specific inherent optical properties (SIOPs) for given sky and water state conditions [48].Sub-surface remote sensing reflectance's (r rs ) were computed for infinitely deep water using a range of SIOPs, sea-state, and sky conditions.The spectral range for r rs from HydroLight was simulated for wavelengths (λ) in the range of 400 nm-800 nm at a nominal bandwidth of 4 nm.
For all the HydroLight simulations the sea state was chosen to have a wind speed of 5 m•s −1 and the sky radiance computed using the Harrison and Coombes (1988) normalized radiance model for a clear sky.The diffuse and direct sky irradiances were computed using the Gregg and Carder (1990) irradiance model for a solar zenith angle of 30 • [49].The four components, pure water, chlorophyll (CHL), colored dissolved organic matter (CDOM), and mineral (TSS) were modelled in varying concentrations, presented in Table 1, to be representative of open ocean to turbid coastal waters.For the TSS component, five different sediment types were used, namely (1) brown earth; (2) calcareous sand; (3) yellow clay; (4) red clay; and (5) Bukata from the default database of HydroLight.The phase functions for the components were modelled as Rayleigh like phase function for pure water, Fournier-Forand phase function with b b (λ)/b(λ) of 0.01 for CHL, and Petzold "average particle" phase function for TSS for all the aforementioned HydroLight simulations.In addition to the aforementioned parameters for HydroLight simulations, we further carried out additional simulations using the parameters outlined above but with solar zenith angles of 15   , and 60 • and b b (λ)/b(λ) ratios of 0.001, 0.01, 0.018, 0.05, and 0.1 for calcareous sand to study the robustness of TSS models to changes in solar angles and the backscattering ratios.
The SIOP models allow the scaling of the IOP of each component with concentration (X): where i is the component and a i *(λ) and b i *(λ) are component specific absorption and scattering coefficients.The SIOP of each component was either obtained from HydroLight's default dataset or modeled using established models.For the specific absorption and scattering coefficients: the absorption coefficient for pure water was obtained from Pope and Fry (1997) [50] and mass-scattering coefficient from Smith and Baker (1981) [51], the CHL mass-specific absorption coefficient (a ϕ *(λ)) from Prieur-Sathyendranath (1981) [52] and the CHL mass-specific scattering coefficient modeled using Equation (6), the CDOM mass-specific absorption was modeled using Equation (7) and CDOM was considered to be a non-scattering component, and the mineral mass-specific absorption and scattering coefficients were obtained from HydroLight's default dataset for brown earth, calcareous sand, yellow clay, red clay, and Bukata.Figure 1a,b shows the mass-specific absorption and scattering coefficients of the five different minerals used in the HydroLight modelling of water reflectance.The IOP models used in this HydroLight simulation are described by Equations ( 1) and ( 2).The total absorption coefficient (a(λ)) is the sum of absorption coefficients of pure water (aw(λ)), CHL (aϕ(λ)), CDOM (acdom(λ)) and TSS (ap(λ)): The total scattering coefficient (b) is the sum of scattering coefficients of pure water (bw(λ)), CHL (bϕ(λ)), and TSS (bp(λ)): The total backscattering coefficient is expressed as the sum of backscattering coefficients for pure sea water (bbw(λ)), particulates (bbp(λ)), and phytoplankton pigments (bbφ(λ)).
The SIOP models allow the scaling of the IOP of each component with concentration (X): where i is the component and ai*(λ) and bi*(λ) are component specific absorption and scattering coefficients.
The SIOP of each component was either obtained from HydroLight's default dataset or modeled using established models.For the specific absorption and scattering coefficients: the absorption coefficient for pure water was obtained from Pope and Fry (1997) [50] and mass-scattering coefficient from Smith and Baker (1981) [51], the CHL mass-specific absorption coefficient (aϕ*(λ)) from Prieur-Sathyendranath (1981) [52] and the CHL mass-specific scattering coefficient modeled using Equation (6), the CDOM mass-specific absorption was modeled using Equation (7) and CDOM was considered to be a non-scattering component, and •the mineral mass-specific absorption and scattering coefficients were obtained from HydroLight's default dataset for brown earth, calcareous sand, yellow clay, red clay, and Bukata.Figure 1a b ϕ * (λ) = 0.407CHL 0.795 600 λ (6)

Extrapolation of Simulated Dataset
The IOP data output by HydroLight do not extend beyond 800 nm, however some of the TSS algorithms for MODIS and Landsat utilize bands beyond the 800 nm reflectance data generated by the HydroLight simulations.To include algorithms which utilize bands in the NIR region of the electromagnetic spectrum, we extrapolated the r rs (λ) data from HydroLight to 1300 nm using Equation (1) of the quasi-analytical model of Lee, et al. [53] at a nominal wavelength of 1.0 nm: where g 0 and g 1 are assigned either g 0 = 0.0949 and g 1 = 0.0794 for oceanic case-1 water [54], g 0 = 0.084 and g 1 = 0.17 for coastal water, or averaged values of g 0 = 0.0895 and g 1 = 0.1247 for coastal and case-1 waters [53].The selection of values for g 0 and g 1 were based on the condition that the selected values provided the minimum Mean Absolute Relative Error (MARE) as defined in Equation (C1) in the Appendix C between HydroLight and Equation (8) r rs (λ) spectra.
To model the r rs (λ) spectra to 1300 nm using Equation (8), we used the following IOPs-the total absorption coefficient was computed using Equation (1) while the total backscattering coefficient was computed using Equation (2).Equations ( 4) and ( 5) were used to compute individual component-specific absorption and scattering coefficients using the respective component concentration and the phase function used in the HydroLight simulations as mentioned in Section 2.1.1.The total backscattering coefficient in Equation (8) was computed from the respective backscattering components in Equation (3) which in turn were computed using respective scattering components from Equation (2) and scattering phase functions and backscattering ratios discussed in Section 2.1.1.The mineral specific absorption and backscattering coefficients were spline extrapolated to 1300 nm to compute the mineral-specific absorption and backscattering coefficients required in Equations ( 4) and (5).The r rs (λ) spectra generated using HydroLight and modelled using Equation (8) had MARE of 1.6% to 13.73%.The higher relative error was toward the blue end of the spectral region.

Grouping of Datasets
Using the extrapolation methods discussed in Section 2.1.2,in total 2.2 × 10 4 r rs (λ) spectra were generated for the spectral range of 350 nm to 1300 nm at the nominal wavelength of 1.0 nm for the parameters discussed in Section 2.1.1.The water, from the point of view of remote sensing, can be classified into case-1 and case-2 water types: case-1 waters are optically dominated by phytoplankton (CHL) while case-2 waters are more optically complex with varying concentrations of CHL, CDOM and TSS that are region specific [28,45].With respect to modelling the water types, it is not feasible to model each water type that is optically similar to the optical properties of the water where each individual TSS model was developed.The TSS models that are robust enough in one region can often fail when applied to other regions because each TSS model is typically tuned to a specific region where the waters are optically unique.Thus, due to the problem of accurately modelling the waters to suit any specific TSS model, and acknowledging the fact that we cannot simulate all the conditions and compositions of ocean constituents for different regions, we resorted to five different classes (shown in Table 2) to represent varying cases of water where concentrations of one ocean constituent might dominate the others or there are different degrees of contributions from each constituent.CLASS I from the water classification in Table 2 represents high CHL and low CDOM concentration which in a physical world would be associated with high phytoplankton blooms in eutrophic lakes where concentration of CHL dominates other optically active substances [55].CLASS II with high CDOM and low CHL represents water where CDOM dominates other optically active substances, which is the case in lakes where CHL is generally low, for example as in the case in lakes in boreal regions and waters off the coast in the Baltic Sea [55].CLASS III and IV represent the extreme cases where both CDOM and CHL are either high or low, which can be associated with high phytoplankton blooms in coastal waters for CLASS III and open ocean water with low CHL for CLASS IV.CLASS V represents a general case of coastal waters where CHL and CDOM are moderate.For all the classes of water discussed above, the TSS is varied in its concentration independent of different water cases considered.The TSS retrieval algorithms developed by various researchers use different types of reflectance measurements to relate to TSS concentrations.The most common choice among all the TSS algorithms considered here is the remote sensing reflectance (Rrs(λ)), which is defined by Equation (9).
where Lw (0 + , λ) is the water leaving radiance and Ed (0 + , λ) is the downwelling irradiance evaluated above the water surface.The HydroLight generated r rs (λ) was converted to R rs (λ) following [53] as defined by Equation (10).
R rs (λ) = 0.52r rs (λ) 1 − 1.7r rs (λ) (10) After converting r rs (λ) to R rs (λ), depending on the sensor and the bands used by particular TSS algorithms, we convolved R rs (λ) from Equation (10) to each sensor's respective band reflectance using the spectral response function of the sensor in their respective bands using Equation (11).
where R k rs is the band averaged R rs for each band, k, with band width ∆k and spectral response function s(λ) of the sensor.
The next common reflectance type used in TSS algorithms is a normalized water-leaving reflectance which is related to R rs (λ) as follows: There are also algorithms which employ normalized water leaving radiance which is calculated using Equation (13).
where F o (λ) is the extraterrestrial solar irradiance band averaged to each sensor's band using their respective band spectral response functions.

TSS Models
This section lists the available TSS algorithms from 2000-2015 that are empirical and semi-analytic in their design for MODIS and Landsat-based sensors.We made an effort to select all the available TSS algorithms for the sensors considered in this study using a search database 'Scopus' (https://www.scopus.com/),but we acknowledge that some of the literature for TSS algorithms, which were not present in the database, might have been missed.However, within the limitation of our search capability we made an effort to use other science databases and discovered 42 MODIS empirical models and 7 semi-analytical models, 22 Landsat empirical models and 5 semi-analytical models.The summaries of each TSS algorithm are provided in Table A1.Semi-analytical models described in this section encompass all the semi-analytical models from MODIS (MOD-A) and Landsat (LAN-A).Models are considered semi-analytic because they are derived based on a physical form [56] or one or more parameters in the TSS algorithms are either parameterized using site-specific or global in-water bio-optical properties [38].Semi-analytic algorithms for the two sensors considered here consist of algorithms that are based on radiative transfer modelling to relate the dependence of geo-physical properties of the water, TSS in our case, to the reflectance via IOPs of the water.
Empirical models consist of TSS algorithms that are directly related with in situ AOPs of water and the TSS using linear or non-linear regression methods.For the two optical sensors considered here the empirical algorithms from MODIS (MOD-E) and Landsat (LAN-E) will be collectively known as empirical algorithms unless otherwise stated explicitly.The form of the equations used in the empirical methods ranged from simple linear [17,26,57,58], exponential [9, 21,34,59], power [10,46,60] and other polynomial relationships [61][62][63] using single, multiple or combinations of different bands in band ratio or self-formulated indexes.To differentiate the algorithms within each sensor, algorithms will be labeled with a respective number following each sensor's name, MOD-A1 and MOD-E1 will represent MODIS semi-analytic algorithm 1 and MODIS empirical algorithm 1 respectively; likewise, a similar naming convention is followed for TSS algorithms for Landsat-based sensors.

Statistical Tests and Scoring System
The statistical tests used to evaluate the performance of each TSS algorithm for different types of water described in Section 2.1.2are based on the statistical tests used by Brewin et al. [44].Further, to objectively rank the TSS algorithms we used the point scoring system of Brewin et al. [44].The details of each statistical test and scoring system of each test adopted from [44] are described in the following sections.Further, to contain the effect of spurious TSS generated by some of the TSS models being applied outside their range, we only included TSS estimations that were between a lower bound available in each TSS model (zero for the TSS models which did not contain the lower bound) and an upper bound of twice the highest TSS concentration reportedly used to calibrate each TSS model.

Pearson Correlation Coefficient (r) Test
The point scoring system for the r test involves determining if the r-value for each TSS algorithm is statistically significant when compared with the mean r-value for all TSS algorithms.The statistical significance is determined through z-scores and the z-score is computed through Fisher's r-to-z transformation using relationships between the r-values of two models and the total number of samples used to determine the r-values, described in [44] as: where r 1 is the r-value of a specific TSS algorithm and r 2 is the mean of all r-values from all the TSS algorithms.Similarly, n 1 is the number of samples in a specific TSS algorithm and n 2 is the mean number of samples from all TSS algorithms.In the event that the TSS model fails to produce a reasonable estimate of TSS within the accepted bounds of each TSS model when tested for a particular water type then in such cases the value of n 1 can be different between two different water conditions, similarly, the value of n 2 also changes as it is the average number of samples of all TSS models in that particular water type.
For algorithm comparison, a two-tailed test was performed using the z-score to determine the p-value.If the p-value was less than 0.05 then the r-values were considered as statistically significant and for each TSS algorithm that were statistically significant the following scores were assigned comparing the r-value and the mean r-value (r) of all TSS algorithms:

Root Mean Square Error (ψ) Test
The Root Mean Square Error (ψ) of a model estimate, y i , with respect to a true value, x i , can be computed using Equation ( 18): The 95% confidence intervals were also calculated for each TSS algorithm and the mean of all TSS algorithms.For each TSS algorithm, the following scoring points were assigned according to the conditions in Equation ( 19): where ψ 95%CI and ψ 95%CI is the 95% confidence interval of ψ and mean-ψ (ψ) of all TSS algorithms respectively.Figure 2 shows an example of scoring point classification for Landsat algorithms used in retrieving TSS concentration for the ψ-test.

The Bias (δ) Test
The bias (δ) of model estimate yi and true xi is calculated using Equation (20): ( ) For each TSS algorithm, following score points were awarded according to the conditions in Equation ( 21): 0 points AND 0 or 0 1 point or 0 0 or 0 0 2 points AND 0 0 where 95%CI δ and 95% CI δ is the 95% confidence interval of mean-δ ( ) δ of all TSS algorithms respectively.Further, to score one point only one conditions must be satisfied while to score two points both the conditions must be satisfied

The Center-Pattern Root Mean Square Error (Δ) Test
The center-pattern Root Mean Square Error (Δ) is calculated using Equation ( 21): The 95% confidence intervals were also calculated for each TSS algorithm and the mean of all TSS algorithms.For each TSS algorithm, the following scores were assigned according to the conditions in Equation (23):

The Bias (δ) Test
The bias (δ) of model estimate y i and true x i is calculated using Equation (20): For each TSS algorithm, following score points were awarded according to the conditions in Equation ( 21): where δ 95%CI and δ 95%CI is the 95% confidence interval of mean-δ (δ) of all TSS algorithms respectively.Further, to score one point only one conditions must be satisfied while to score two points both the conditions must be satisfied.

The Center-Pattern Root Mean Square Error (∆) Test
The center-pattern Root Mean Square Error (∆) is calculated using Equation ( 21): The 95% confidence intervals were also calculated for each TSS algorithm and the mean of all TSS algorithms.For each TSS algorithm, the following scores were assigned according to the conditions in Equation (23): where ∆ 95%CI and ∆ 95%CI is the 95% confidence interval of ∆ and mean-∆ (∆) of all TSS algorithm respectively.The Slope (S) and Intercept (I) of a type-2 regression [64] were calculated using Equation (24): where Y is the TSS estimates derived from the TSS algorithms and X the true TSS.The following scores were assigned by comparing the S-value of each TSS algorithm and mean-S (s) value of all TSS algorithms.
where σ s is the standard deviation of s from all TSS algorithms.
For the I parameter, for each TSS algorithm, the following scores were assigned according to the conditions in Equation (26).
where σ I is the standard deviation of mean-I (I) from all TSS algorithms.Further, in the S and I-test in Equations ( 25) and ( 26), to score one point only one of the two conditions must be satisfied while to score two points both the conditions must be satisfied.

Percentage of Possible Retrievals (η)
The percentage of possible retrievals (η) was calculated using Equation ( 27): where N E is the total number of TSS retrieved using each TSS algorithm from the total number of TSS concentrations (N M ) considered in the study.For the point scoring system the following basis was followed: where η and σ η is the mean η-value and its standard deviation for all TSS algorithm in η-test.

Total Points
For objective comparison the performance of each TSS algorithm with respect to different water types, all points from each statistical test were added and normalized by the mean score of all TSS algorithms.Thus, a score of zero indicates that the TSS algorithm is performing lower than the mean of all TSS algorithms, a score of one indicates that the TSS algorithm is at par with the average of all TSS algorithms, and a score of 2 means the TSS algorithm is better than the mean of all TSS algorithms.
Further, to test the stability of the scoring systems used in this study, we used a bootstrap method [65], with 1000 runs and each time generating a new dataset by resampling via the replacement method for model generated TSS and HydroLight TSS.Each new dataset was tested using the statistical tests and scored using the scoring system described in Section 2.3.The results reported in the Section 3 are the mean values of the total points from the bootstrapping method with 2.5 and 97.5 percentiles reported as uncertainty estimates for 95% confidence limits.

Mean of Total Points
The mean of total points is achieved by averaging the score of each TSS model across different sediment types or solar zenith angles or backscattering ratios for the particular water classes described in Section 2.3.7.For example, in the case of different sediment types in CLASS-I water, the mean of total points in CLASS-I is an aggregate of total scores of each TSS model for different sediment types.For the case of different backscattering ratios and solar zenith angle, the mean of total points is an aggregate of each TSS model for different backscattering ratios and solar zenith angles, respectively, for a specific sediment type in a particular water class.The error bars in the mean of total points are the mean of uncertainty estimates of the total points obtained from the 95% confidence limit from the bootstrapping method.

Final Score
The final score is the aggregate of the mean of total points across all water classes for different sediment types, backscattering ratios and solar zenith angles.For example, the final score for MOD-E1 is derived as the mean from the aggregate score of MOD-E1 at five different sediment types, backscattering ratios, and solar zenith angles across all five different water classes.The error bars are the standard deviation of errors from the mean of total points across all five different water classes.Figure 3 shows an illustration of the point score system adapted from [44] and used in comparing TSS models in this study.The error bars in the Final score are the mean of uncertainty estimates from the mean of total points.algorithms.Thus, a score of zero indicates that the TSS algorithm is performing lower than the mean of all TSS algorithms, a score of one indicates that the TSS algorithm is at par with the average of all TSS algorithms, and a score of 2 means the TSS algorithm is better than the mean of all TSS algorithms.Further, to test the stability of the scoring systems used in this study, we used a bootstrap method [65], with 1000 runs and each time generating a new dataset by resampling via the replacement method for model generated TSS and HydroLight TSS.Each new dataset was tested using the statistical tests and scored using the scoring system described in Section 2.3.The results reported in the Section 3 are the mean values of the total points from the bootstrapping method with 2.5 and 97.5 percentiles reported as uncertainty estimates for 95% confidence limits.

Mean of Total Points
The mean of total points is achieved by averaging the score of each TSS model across different sediment types or solar zenith angles or backscattering ratios for the particular water classes described in Section 2.3.7.For example, in the case of different sediment types in CLASS-I water, the mean of total points in CLASS-I is an aggregate of total scores of each TSS model for different sediment types.For the case of different backscattering ratios and solar zenith angle, the mean of total points is an aggregate of each TSS model for different backscattering ratios and solar zenith angles, respectively, for a specific sediment type in a particular water class.The error bars in the mean of total points are the mean of uncertainty estimates of the total points obtained from the 95% confidence limit from the bootstrapping method.

Final Score
The final score is the aggregate of the mean of total points across all water classes for different sediment types, backscattering ratios and solar zenith angles.For example, the final score for MOD-E1 is derived as the mean from the aggregate score of MOD-E1 at five different sediment types, backscattering ratios, and solar zenith angles across all five different water classes.The error bars are the standard deviation of errors from the mean of total points across all five different water classes.Figure 3 shows an illustration of the point score system adapted from [44] and used in comparing TSS models in this study.The error bars in the Final score are the mean of uncertainty estimates from the mean of total points.

TSS Model Comparisons
Figures 4 and 5 show the quantitative comparison between the models using the final scores which are aggregates of the total scores from different sediment types, backscattering ratios, and solar zenith angles across all five different water classes for MODIS and Landsat-based models respectively.The final results presented in Figures 4 and 5 are indications of the overall performance of the TSS models when weighted across different water types, sediment types and backscattering ratios.The detailed results of individual model performance in respective sediment types, backscattering ratios, and water types are presented in Supplementary Material, S11.In addition, the Supplementary Materials S1-S10 also provide the detailed statistical test results for each TSS model.
zenith angles across all five different water classes for MODIS and Landsat-based models respectively.The final results presented in Figures 4 and 5 are indications of the overall performance of the TSS models when weighted across different water types, sediment types and backscattering ratios.The detailed results of individual model performance in respective sediment types, backscattering ratios, and water types are presented in Supplementary Material, S11.In addition, the Supplementary Materials S1-S10 also provide the detailed statistical test results for each TSS model.
From the final scores displayed in Figures 4 and 5 we can visually observe that there are clearly high and low performing models.The high performing MODIS TSS models with final scores greater than 1.5, in the order of highest to lowest final score, are MOD-E6, MOD-A1, MOD-E28, MOD-A4, MOD-E10, and MOD-E42 and low performing MODIS TSS models with scores less than 0.5 are MOD-E8, MOD-E2, MOD-E24, MOD-E22 and MOD-E32.For the Landsat TSS models, LAN-E3, LAN-A4, LAN-E9, LAN-A5, and LAN-A1 have final scores greater than 1.5 while LAN-E11, LAN-E22, LAN-E16, and LAN-E18 have final scores less than 0.5.In the final scores of low performing TSS models, the LAN-E18 model has scores of zeros which shows that LAN-E18 failed to derive TSS within the acceptable TSS bounds of 0.4-5.8mg/L.We suspect the published algorithm includes an error.The overall ranking of the TSS models using the final scores for each TSS model is also presented in Tables B1 and B2 for MODIS and Landsat respectively.Further, Tables B1 and B2 provides mean total scores for different sediment types, backscattering ratios, and solar zenith angles in all five water classes for respective TSS model.With respect to the results displayed in Figures 4 and 5, without the inclusion of error bars the distinction between the high performing TSS models is clear and we can easily compare the scores of each TSS model to obtain a ranking.For instance, in Figures 4 and 5, the MOD-E6 and LAN-E3 are the highest scoring models with final scores of 1.70 and 1.73 respectively.However, on inclusion of the error bars, all high performing TSS models may be considered comparable and difficult to separate in terms of robustness, thus may all be ranked equally.Likewise, the case is similar for low From the final scores displayed in Figures 4 and 5 we can visually observe that there are clearly high and low performing models.The high performing MODIS TSS models with final scores greater than 1.5, in the order of highest to lowest final score, are MOD-E6, MOD-A1, MOD-E28, MOD-A4, MOD-E10, and MOD-E42 and low performing MODIS TSS models with scores less than 0.5 are MOD-E8, MOD-E2, MOD-E24, MOD-E22 and MOD-E32.For the Landsat TSS models, LAN-E3, LAN-A4, LAN-E9, LAN-A5, and LAN-A1 have final scores greater than 1.5 while LAN-E11, LAN-E22, LAN-E16, and LAN-E18 have final scores less than 0.5.In the final scores of low performing TSS models, the LAN-E18 model has scores of zeros which shows that LAN-E18 failed to derive TSS within the acceptable TSS bounds of 0.4-5.8mg/L.We suspect the published algorithm includes an error.The overall ranking of the TSS models using the final scores for each TSS model is also presented in Tables B1 and B2 for MODIS and Landsat respectively.Further, Tables B1 and B2 provides mean total scores for different sediment types, backscattering ratios, and solar zenith angles in all five water classes for respective TSS model.
With respect to the results displayed in Figures 4 and 5, without the inclusion of error bars the distinction between the high performing TSS models is clear and we can easily compare the scores of each TSS model to obtain a ranking.For instance, in Figures 4 and 5, the MOD-E6 and LAN-E3 are the highest scoring models with final scores of 1.70 and 1.73 respectively.However, on inclusion of the error bars, all high performing TSS models may be considered comparable and difficult to separate in terms of robustness, thus may all be ranked equally.Likewise, the case is similar for low performing TSS models where their error bars overlap.Further, we observe that two and three of the top five high scoring TSS models in MODIS and Landsat respectively are semi-analytic while none of the semi-analytic models were in the bottom five low scoring models.

Model Evaluation Using HydroLight Data
The five high and low scoring models from MODIS and Landsat TSS models were selected to further evaluate their performance.From all available HydroLight data discussed in Section 2.1.2,the aforementioned high scoring TSS models were evaluated for their Relative Error (RE) between model-derived and HydroLight TSS.From the results presented in Table 3 we observe that there is high variability in the RE results amongst the respective MODIS and Landsat TSS models.The differences in the Smallest Relative Error (SRE) for high scoring TSS models were not as large as the differences within the MARE and Largest Relative Error (LRE).The MARE ranged from a low of 69.96% to a high of 481.82% while the SRE and LRE ranged from 15% to 63.14% and 139.35% to 1109.80% respectively.In the low scoring models, the high variability in the RE was observed with the MARE for low performing models ranging from 106.43% to 1832.79% while the SRE and LRE ranged from 39.90% to 213.54% and 118.16% to 6778.93% respectively.In both MODIS and Landsat high scoring models, the LRE results were for backscattering ratios of 0.001 and for Bukata type sediment.The SRE results were for backscattering ratios of 0.01 and calcareous sand sediment.Further, for the SRE in both the high and low performing TSS models, we observe that the high and low performing TSS models scored reasonably well in either one of the categories in sediment types, backscattering ratios, solar zenith angle and water classes.For instance, the low performing LAN-E22 scored higher than most of the high scoring TSS models in SRE results which indicated that LAN-E22 retrieves better in one of the water types.The TSS derived using real satellite-data are bound by uncertainty related to observational, instrumental, measurement and data processing errors, the latter largely associated with the atmospheric correction procedure [66].Thus, to assess the tolerance of high and low performing TSS models to the uncertainties in R rs , which is the key input in derivation of the TSS concentration, we simulated the effect of R rs uncertainty (∆R rs ) by varying the R rs by ±10%, ±20% and ±50% of the HydroLight generated R rs at each of the MODIS and Landsat bands.The R rs ± ∆R rs was used in deriving TSS concentration and compared with HydroLight input TSS to calculate the Absolute Relative Error (ARE) of the TSS model.Table 3 reports the ARE and the MARE of HydroLight Data Validation as defined in Equation C2 in Appendix C. In general, we observe that with the increase in ∆R rs the ARE also increases and the errors are higher for +∆R rs than −∆R rs .The ARE for high scoring TSS models ranged from 33.14% to 1974.47% while for low scoring TSS models it ranged from 82.69% to 12747.84% which shows both high and low performing TSS models are not impervious to uncertainty in R rs measurements.However, high scoring TSS models show better tolerance to ∆R rs than the low scoring TSS models.The details of the TSS models deviation in estimating TSS concentration from the error-free HydroLight data with ∆R rs are shown in Table 3.

Model Evaluation Using In situ Data
As part of a regional water monitoring program, in situ reflectance and TSS measurements were carried out for the waters off the coast of northern Western Australia to develop regional TSS models (see MOD-A1 and LAN-A1 in Appendix A) [67].The details of the in situ measurements and regional TSS model developed using in situ data can be obtained from [67].A set of high scoring models (MOD-E10, MOD-A4, LAN-E9, and LAN-A5) and low scoring models (MOD-E1, MOD-E38, LAN-E6, and LAN-A3) were selected to compare with MOD-A1 and LAN-A1 in the context of in situ data comparisons.These subsets of models were selected because the reflectance bands used by other high scoring models were beyond the available reflectance bands in the in situ data.Table 4 shows the Mean RE results obtained from each of the model evaluations against in situ data.Table 4 displays a high variability in the Mean RE for model comparisons for high scoring models with in situ data, from a low of 43.11% for LAN-E9 to a high of 102.59% for LAN-A5.When compared with the regional model's MOD-A1 and LAN-A1 MARE results, we see that both the high scoring TSS models MOD-E10 and LAN-E9 and low scoring TSS models LAN-E6 and LAN-A3 were comparable.However, the results presented in Table 4 also show the extreme variability observed in the Mean RE for the low scoring models with a low of 35.62% and a high of 256%.

Data and Methodological Limitation
The data used in this study to quantitatively compare TSS models have been generated using the widely used [68,69] in-water radiative transfer model HydroLight 4.2.The simulated data do not encompass all different water types in which each TSS model was developed to be used, however, it does provide us with a dataset that is independent of the data that has been used to parameterize the models to avoid biases in the results.To include all the models in comparisons, the simulated data were extrapolated to the NIR region of the spectrum using the methods discussed in Section 2.1.2.The extrapolation of reflectance data can introduce unrealistic values if the underlying assumptions of the spline extrapolation methodology does not hold true for the NIR regions.The extrapolation of the data is not ideal when used in modelling remote sensing products but the error for extrapolation had a MARE of 4.0% which was considered to be acceptable for this study.The ideal case for data for model comparisons would be to use a real global water data base, which is currently not available.The NOMAD dataset (http://seabass.gsfc.nasa.gov/)that is currently the most extensive dataset of in situ reflectance measurement and in-water variables did not contain the TSS measurements essential for this study.
The use of the objective methodology [44] of comparing models, used in this study to compare TSS models, can aid users in selection of TSS models that are best suited for waters of regional interest in the absence of means and a method to produce their own regionally tuned TSS algorithms.However, the objective methodology used here is not without limitations, as discussed by [44] with respect to using average performance to classify between high and low performing models.The very low performance of one particular model would affect the average of all other models to the extent that it becomes difficult to differentiate scores between models.For example, in Figure S11.1 for the score of MODIS TSS models in yellow clay, MOD-E1-2, E8-9, E15, E22-24, E32, and E38 all have low scores which increases the score of other TSS models making it difficult to differentiate among high scoring models.This problem is further exacerbated when the majority of TSS models score low which makes the few remaining high scoring models to appear similar in score, which is the case in Figure S11.22 for b b /b of 0.001.
The objective classification was conducted on a case by case basis for different water types, sediment types, solar zenith angles, and backscattering ratios.The overall low performance of models in the final scores in Figures 4 and 5 does not necessarily mean that low performing TSS models scored less in all the categories used in deriving the final score.For example, in Figure 5, LAN-E22 scored a very low final score when compared to other TSS models, but when considering specific results as presented in Figures S11.16-S11.20,LAN-E22 received a score at least comparable with most of the best performing models in all water classes for the red clay sediment type.Likewise, similar cases can be ascertained for all the respective TSS model's scores for specific water classes, sediment types, backscattering ratios and solar zenith angle (Results provided as Supplementary Material S11 for other overall low scoring TSS models in Figures 4 and 5).An additional disadvantage of the objective methodology used here is that the final score does not necessarily show the performance of all models in different categories considered, it shows only the relative performance of models in comparison with the mean scores of TSS models.In Figure S11.1, we observe that almost all TSS models score relatively higher total points for brown earth and lower for Bukata sediment types when compared with other sediments.
To account for the methodological uncertainties from the range of univariate statistical tests described in Section 2.3, we used a bootstrapping method [44,65] which generates the confidence limit in the final score.The results from the 1000 bootstrap runs presented in all the score charts shows that the mean score of models did not vary significantly for each different run, the ranges of 95% confidence limits were smaller for most of the models.Further, to limit the effect of spurious TSS values derived by some of the models, especially models with exponential and power functions, we filtered out any derived TSS value below a minimum of 0.001 mg/L and greater than a twice the maximum TSS concentration of each TSS model.Filtering out the spurious results can artificially inflate the final scores because only values that are within the upper and lower bounds would be considered for statistical tests.However, the possible percentage retrieval test discussed in Section 2.3.6 negate such an effect because filtering out spurious results would result in lower possible percentage retrieval and lower score in the percentage retrieval test.

TSS Model Selection Guidelines
Even though there were clearly distinct higher and lower performing TSS models from the final score chart presented in Figures 4 and 5, the performance of individual models varied widely when viewed against respective water types, sediment types, and backscattering ratios.The results presented in Figures 4 and 5 can be of use to the end-users who are clearly interested in TSS models that are robust enough to be used in waters for which they have little or no information of their optical and physical properties to generate TSS products.Figures 4 and 5 indicate that the MODIS TSS models MOD-E6, MOD-A1, MOD-E28, MOD-A4 and MOD-E10 and the Landsat TSS models LAN-E3, LAN-A4, LAN-E9, LAN-A5 and LAN-A1 are ranked the highest in terms of likely suitability for estimating TSS concentration of unknown water types.An example of the selection of high performing TSS models using a real water dataset was demonstrated in Section 3.2.2 and it can be seen that the results varied widely among the high scoring TSS models, with MOD-E10 and LAN-E9 producing results within a MARE of 46.20% and 43.11% and other higher scoring models producing results as high as 102.59%.Considering the retrieval error of TSS concentrations from MODIS algorithms are typically reported as in the range of ~18.0% to ~61% for many studies conducted in the last decade [35,37,38,46,47], we consider the regional TSS models MOD-A1 and LAN-A1, and the empirical models MOD-E10 and LAN-E9 as being the most appropriate for the waters in the north of Western Australia.
However, readers with prior information of water and sediment types can use information provided in S11, and Tables B1 and B2 as a guideline in selecting the model that is best suited for that particular water type.The difference in Relative Error between the high and low scoring models validated using HydroLight data and the in situ data showed that there is a huge difference between the two.The best performing model from the high scoring models shows that TSS can be estimated with a Mean RE between 69.96% and 481.82% (for different water conditions), but the low scoring model's results can vary dramatically within a Mean RE ranging from 106.43% to 1832.79%.The high Mean RE for low scoring models does not necessarily mean that the low scoring model performs low for all waters types.The low scoring TSS model's performance in one category or more can be significantly better than other models, but overall on average the model performs poorly when compared with high scoring models across all water types.For example, the low scoring model LAN-E22 displays the Smallest RE of 19.41% which is certainly better than the Smallest RE of most of the high scoring model's Smallest RE.Thus, with prior knowledge of water types and bio-geochemical properties of the region, we can select a TSS model from both high and low performing TSS models presented in Tables B1 and B2 that have higher scores in the water that are similar to the region where TSS model would be applied.
The results also showed that semi-analytic models were generally higher in ranking when compared with empirical models.The reason for most semi-analytic models performing better than empirical models can be attributed to the fact that semi-analytic models, by design, were based on radiative transfer theory [38,70] and one or more parameters were calibrated using general in situ bio-optical properties representative of a wide range of global waters [7,38].

Conclusions
In summary, in this study we have applied an objective methodology to compare the TSS models and their suitability in use for retrieving TSS in the absence of a regionally tuned TSS model.From the study we have identified the MODIS TSS models MOD-E6, MOD-A1, MOD-E28, MOD-A4 and MOD-E10 and the Landsat TSS models LAN-E3, LAN-A4, LAN-E9, LAN-A5 and LAN-A1 as suitable for estimating TSS concentration in waters with no prior knowledge of bio-optical or bio-geochemical properties.The results from this study highlighted the impact of "local tuning" of algorithms, showing that some low scoring models performed better than the high scoring models in one or more specific sediment, backscattering, solar zenith and water types.The results from this study can be used to ascertain which TSS models perform well in particular water types, sediment types and backscattering ratios for use in aiding the selection of a TSS model suited for use in a particular water type.In addition, the results also show that the semi-analytic TSS models are generally better than empirical TSS models in deriving TSS estimation in unknown water types.

Figure 2 .
Figure 2. Point classification for Landsat algorithms using the Root Mean Square Error Test.The upper and lower dashed lines indicate the mean ± 95% confidence limits and the solid horizontal line is the mean RMSE of all the TSS algorithms.Unfilled circles are RMSE of each TSS algorithm with respective ± 95% confidence limits shown by error bars.

Figure 2 .
Figure 2. Point classification for Landsat algorithms using the Root Mean Square Error Test.The upper and lower dashed lines indicate the mean ± 95% confidence limits and the solid horizontal line is the mean RMSE of all the TSS algorithms.Unfilled circles are RMSE of each TSS algorithm with respective ± 95% confidence limits shown by error bars.

Figure 3 .
Figure 3. Flow diagram showing the methodology of the point scoring system described in Section 2.3.

,Figure 3 .
Figure 3. Flow diagram showing the methodology of the point scoring system described in Section 2.3.

Table 1 .
Concentration of colored dissolved organic matter (CDOM), chlorophyll (CHL), and total suspended solids (TSS) used in HydroLight modelling.The pure water component in all the HydroLight runs remains unchanged.

Table 2 .
Five different water classes.

Table 3 .
Relative Error and ∆R rs Uncertainty Tolerance results for the highest and lowest scoring models' evaluation using HydroLight Data.The highest scoring models are in bold text and the lowest scoring models are in regular italic text.The results provided in parenthesis represent the +∆R rs and '-' indicates the model failed to provide TSS estimation within acceptable bounds.SRE: Smallest Relative Error.LRE: Largest Relative Error.MARE: Mean Absolute Relative Error.ARE: Absolute Relative Error.

Table 4 .
The MARE for high and low scoring models for in situ data.The high scoring models are in bold text and the low scoring models are in italics.

Table B1 .
Mean Relative Error, MARE = Mean Absolute Relative Error, MAE = Mean Absolute Error, MPE = Mean Percentage Error, SD = Standard Deviation, MRAD = Mean Relative Absolute Difference, RMSE = Root Mean Square Error, SE = Standard Error, ARE = Absolute Relative Error, RPD = Relative Percentage Difference, APD = Absolute Percentage Difference, RRMSE = Relative Root Mean Square Error.Mean of Total Point and Final Scores of MODIS TSS models across different water classes as derived from different sediment types, backscattering ratios and Solar Zenith Angles.The top five and bottom five scores from each water types and the final scores are in bold (top) and bold italics (bottom).

Table B2 .
Mean of Total Point and Final Scores of Landsat TSS models across different water classes as derived from different sediment types, backscattering ratios and Solar Zenith Angles.The top five and bottom five scores from each water types and the final scores are in bold (top) and bold italics (bottom).E14 1.47 1.32 1.46 1.33 1.45 1.76 1.37 1.78 1.42 1.56 1.75 1.09 1.74 1.12 1.56