Classification of Several Optically Complex Waters in China Using in Situ Remote Sensing Reflectance

Determining the dominant optically active substances in water bodies via classification can improve the accuracy of bio-optical and water quality parameters estimated by remote sensing. This study provides four robust centroid sets from in situ remote sensing reflectance (Rrs (λ)) data representing typical optical types, obtained by plugging different similarity measures into fuzzy c-means (FCM) clustering. Four typical types of waters were studied: (1) highly mixed eutrophic waters, with the proportion of absorption of colored dissolved organic matter (CDOM), phytoplankton, and non-living particulate matter at approximately 20%, 20%, and 60% respectively; (2) CDOM-dominated relatively clear waters, with approximately 45% by proportion of CDOM absorption; (3) nonliving solids-dominated waters, with approximately 88% by proportion of absorption of nonliving particulate matter; and (4) cyanobacteria-composed scum. We also simulated spectra from seven ocean color satellite sensors to assess their classification ability. POLarization and Directionality of the Earth's Reflectances (POLDER), Sentinel-2A, and MEdium Resolution Imaging Spectrometer (MERIS) were found to perform better than the rest. Further, a classification tree for MERIS, in which the characteristics of Rrs (709)/Rrs (681), Rrs (560)/Rrs (709), Rrs (560)/Rrs (620), and Rrs (709)/Rrs (761) are integrated, is also proposed in this paper. The overall accuracy and Kappa coefficient of the proposed classification tree are 76.2% and 0.632, respectively.
Remote Sens. 2015, 7, 14732 (open access)


Introduction
Remote sensing reflectance from the surface of optically complex waters is comprised of the backscattering from optically active constituents, including pure water, phytoplankton, nonliving solids, and colored dissolved organic matter (CDOM). In addition to containing more than one type of constituent, optically complex waters are characterized by a high degree of spatiotemporal diversity in their optical properties. Using remote sensing to classify such waters according to their optical properties will help identify optically complex waters and understand biogeochemical processes therein. A number of recent works have developed optical pre-classification schemes aiming to select regional inversion algorithms for water constituents [1-3]. Several studies in the literature have focused on the development of such local or regional inversion algorithms for China's waters [4-11]. These are limited, however, by their high dependency on the dataset used for algorithm development, and local or regional findings therefore tend not to be applicable elsewhere or to larger scales.
Early studies aiming to identify water types made use of their color. In the beginning, oceans were classified into two types: Case I comprised blue, clear ocean waters, and Case II comprised waters of different colors caused by increasing turbidity [12-14]. Case I comprised open ocean waters with all constituents co-varying with phytoplankton. Case II was later expanded to comprise all other natural waters. It was subsequently proposed that waters could be classified as optically shallow or optically deep, depending on whether or not bottom reflectance contributed to the water-leaving reflectance [15,16]. Most water constituent algorithms are intended for optically deep waters. In recent years, optically complex waters have been classified by remote sensing into several distinct types, such as phytoplankton-dominated, suspended solids-dominated, and other mixed waters [2,3].
Previous studies classified waters based on different types of parameters. Many [17,18] developed optical classification schemes based mainly on light penetration parameters, including euphotic zone depths (EZD), Secchi depth (SD), the spectral irradiance reflectance (R), the total attenuation coefficient (c), the diffuse attenuation coefficient for downwelling irradiance (Kd), and their ratio (Kd/c). Some studies classified waters using only water quality parameters (e.g., SD, turbidity, chlorophyll-a concentration (chl-a), yellow substance, and sea surface temperature) [19] or inherent optical properties (IOPs; e.g., absorption and scattering coefficients) [15,20-22]. Some studies used a combination of water quality parameters and optical properties [21,23,24]. Classification results tend to improve as more parameters of different types (i.e., water quality parameters, IOPs, and apparent optical parameters (AOPs)) are included as inputs. However, obtaining such an extensive set of parameters through in situ measurements is inefficient. An operational approach to surface water classification makes use of remote sensing reflectance (Rrs (λ)) or irradiance reflectance (R (λ)) [2,25-29], which can be obtained directly from satellite images.
Feature-based methods are often adopted when using Rrs (λ) to classify waters. The related research has used supervised classifiers, such as decision trees, maximum likelihood, supervised neural networks, or support vector machines [19,30] and unsupervised classification, based on K-means or fuzzy c-means (FCM) clustering [27,31-34], hierarchical algorithms [11,25], unsupervised neural networks [35], or eigenvector classifiers using variance analysis [2,36]. Unsupervised classification is based on the data alone and characterized by less human interference than supervised classification. Furthermore, many researchers have attempted to classify water types using Rrs (λ) alone [1,2,26,32,37], which has been associated with some algorithm improvement. Vantrepotte et al. [2] applied an unsupervised classification method developed by Ward [38] to divide data into homogeneous groups. Vantrepotte and Mélin [3] used the iterative self-organizing data analysis technique (ISODATA) clustering method. The above clustering methods could not provide class memberships simultaneously. Moore et al. [1,27] first determined the optimal number of classes using a suite of cluster validity functions, and then applied the FCM algorithm to return the centroid sets for a number of classes and a membership function matrix, which expressed the likelihood that a point, with its observed reflectance vector, belongs to a class with a known Rrs (λ) vector. The method in the current work is primarily based on Moore's clustering framework, but with minor improvements.
Similar applications of optical classification dedicated to the global scale or coastal ocean are currently limited [1-3,33]. Vantrepotte and Mélin [3] optically classified the global coastal ocean, from very turbid to oligotrophic conditions, and analyzed optical diversity using a global seven-year SeaWiFS data set. In this study, the samples were collected from Chinese eutrophic lakes, reservoirs, and coastal waters, and exhibit large variations in water quality parameters. Our samples span four orders of magnitude of chl-a (0.0139-943 mg·m⁻³), two of TSS (3.00-300 g·m⁻³), and one of CDOM absorption at 440 nm (0.138-4.79 m⁻¹). Many studies observed in situ reflectance spectra and defined typical classes of waters based on researcher experience [8,37]. In contrast, subjectivity can be reduced by clustering in situ reflectance and using the centroid reflectance of the resulting water types in classification. However, methods based on mathematical algorithms typically lack an underlying physical mechanism. Few studies have focused specifically on first clustering Rrs (λ) and then using IOPs and water quality parameters to support the classification results [2,32,33].
The primary objective of this study is to classify several optically complex waters found in China using Rrs (λ). Further, the classification method presented here could inform future classification efforts in similar settings. Fuzzy clustering is used to provide a set of centroids from in situ Rrs (λ) representing typical optical types. IOPs and water quality parameters are then used to validate which water type each cluster represents. The conceptual model of the study is comprised of four parts (Figure 1). In the first, fuzzy clustering is applied to all in situ Rrs (λ) spectra at least twice. Each time, a different similarity measure is used and new centroid sets are extracted. The centroid sets corresponding to Class i (i = 1, 2, …, c) are intersected to form the final robust centroid sets. In the second part of the study, relationships between Rrs (λ) characteristics of the centroids and environmental parameters of each class are assessed, which indicates whether the proposed centroids are reasonable. The given centroid sets represent different types of optically complex waters. Based on the clustering results, a classification tree is proposed in the third part of the study. Finally, the ability of seven ocean color radiometers to discriminate the given optical water types is assessed.

Figure 1. Conceptual model of the study. Recoverable panel text: in situ Rrs data; (3) a classification tree was proposed for hyperspectral sensors; (4) the ability of seven ocean color radiometers to differentiate the given optical water types was assessed.

Data Acquisition and Processing
Collected data consist of Rrs (λ), the concentrations of optically active substances, and the IOPs of the water at each station. Data were obtained from 447 stations during 12 cruises between 2006 and 2012. These stations are located in several typical Chinese optically complex waters, including the Three Gorges Reservoir, Lake Chaohu, Lake Dianchi, Lake Taihu, and the Yellow River estuary (Figure 2). Data from Lake Taihu were obtained by Zhang et al. [4,39,40] in January, July, and October 2006, and January and April 2007. Data from Lake Chaohu (June 2009), Lake Taihu (March and April 2009), the Three Gorges Reservoir (August 2009), and Lake Dianchi (September and December 2009) were obtained by Sun et al. [41], and data from the Yellow River estuary were obtained by Zhang et al. [42] in June 2012. An independent test data set, comprising 143 in situ Rrs (λ), was collected from the above areas during other cruises. Rrs (λ) was measured following the above-water method described by Mueller et al. [43], using an ASD FieldSpec 3 spectroradiometer, with a spectral range of 350-2500 nm, a spectral resolution of 3 nm at 700 nm, and a sampling interval of 1.4 nm over the spectral range 350-1000 nm. Radiance spectra were collected ten times each for a reference panel, the water, and the sky at each station, and were then visually examined to eliminate abnormal spectra. The retained spectra from each station were averaged and Rrs (λ) derived according to Equation (1):

$$R_{rs}(\lambda) = \frac{L_t(\lambda) - r \times L_{sky}(\lambda)}{\pi\, L_p(\lambda) / \rho_p} \qquad (1)$$

where Lt (λ) is the total upwelling spectral radiance above the water surface; r × Lsky (λ) is the direct upwelling radiance reflected on the water surface contributed by the sky; r is calculated from the Fresnel formula; Lp (λ) is the simultaneously observed radiance of the reference panel, which has an accurately calibrated reflectance, ρp, of approximately 30%. The Rrs (λ) measurement error is less than 5%.
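As a minimal sketch of the processing described above, the following Python fragment averages repeated scans and derives Rrs (λ), assuming the common above-water form Rrs = (Lt − r × Lsky) / (π × Lp / ρp). The function names, the radiance values, and r = 0.025 are hypothetical and serve only as illustration; in the study, r is calculated from the Fresnel formula.

```python
import math

def mean_spectrum(scans):
    """Average repeated scans (a list of equal-length lists) band by band."""
    n = len(scans)
    return [sum(band) / n for band in zip(*scans)]

def remote_sensing_reflectance(lt, lsky, lp, r, rho_p):
    """Rrs per band, assuming Rrs = (Lt - r * Lsky) / (pi * Lp / rho_p)."""
    return [(t - r * s) / (math.pi * p / rho_p)
            for t, s, p in zip(lt, lsky, lp)]

# Hypothetical radiances for two bands and two retained scans per target.
lt = mean_spectrum([[0.012, 0.010], [0.012, 0.010]])    # water
lsky = mean_spectrum([[0.10, 0.08], [0.10, 0.08]])      # sky
lp = mean_spectrum([[0.30, 0.28], [0.30, 0.28]])        # reference panel

# r = 0.025 is an assumed skylight reflectance; rho_p = 0.30 follows the text.
rrs = remote_sensing_reflectance(lt, lsky, lp, r=0.025, rho_p=0.30)
```

The result has one value per band, in sr⁻¹ when the radiances share the same units.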
Optically active substances considered in this study include chl-a (mg·m⁻³), TSS (g·m⁻³), particulate inorganic matter (PIM, g·m⁻³), particulate organic matter (POM, g·m⁻³), and dissolved organic carbon (DOC, g·m⁻³). Water samples for chl-a measurement were filtered using Whatman GF/C fiberglass filters with a 0.7 µm pore size. Chl-a was extracted using 80 °C ethanol (90%), and analyzed spectrophotometrically at 750 and 665 nm, with correction for phaeopigments [44,45]. For the TSS measurements, water samples were filtered through Whatman GF/C fiberglass filters with a 0.7 µm pore size, precombusted at 550 °C for 4 h and weighed using an electrobalance with an accuracy of 10⁻⁴ g. The filtered samples were dried at 105 °C for 4 h, weighed, and filter weights were subtracted to obtain only the sample weights. Next, filtered samples were recombusted at 550 °C for 4 h and reweighed to obtain the PIM value; POM was obtained by subtracting PIM from TSS. The DOC was obtained using a 1020 Total Organic Carbon Analyzer after filtering with Whatman GF/F fiberglass filters.

Classification of Rrs (λ) Spectra
In situ Rrs (λ) spectra with similar features were classed together through the FCM algorithm [34,47], which returns a membership function matrix. The centroid sets for a number of classes were obtained via the class memberships. The following three-step clustering framework was adopted. The second step, with a Principal Component Analysis (PCA) transform, and the third step, based on more than one similarity measure, constituted improvements on the classification method proposed by Moore et al. [1,24].
In the first step, the number of clusters, c, needs to be provided as an input to the FCM clustering routine. The optimal value for c was assessed by two cluster validity measures, the Bayesian information criterion (BIC) index [48] and Dunn's index [49]. In addition, the fuzziness parameter m was set to two [50].
In the second step, PCA was used to reduce the dimensionality of the in situ Rrs (λ). Using too many bands of in situ Rrs (λ) may lead to a decline in classification accuracy; therefore, we applied a PCA transform to acquire an optimal feature subset.
In the last step, the FCM routine was applied to the principal components from the Rrs (λ) of the 447 samples. Each time, clustering was performed based on a different similarity measure, namely Euclidean distance (ED), spectral angle distance (SAD), orthogonal projection divergence (OPD) [51], and transformed divergence (TD) [52]. For each FCM routine, c membership grades were obtained for each point in the c clustering groups. The larger the grade corresponding to Class i, the greater the likelihood that the pixel belongs to Class i (where i = 1, 2, …, c). Samples with any one membership grade greater than 0.9 were attributed to the corresponding centroid set. After running the four routines (ED, SAD, OPD, TD), four centroid sets derived from the four similarity measures were formed for each Class i.
In each centroid set every point was viewed as a vector. The same vectors appearing in all four centroid sets were selected to form the Class i centroid set (where i = 1, 2, …, c). Finally, c centroid sets were formed, which were much more robust than the centroid sets based on only one similarity measure.
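The membership threshold and the intersection step can be sketched as follows. This is a simplified illustration, not the authors' implementation: it computes FCM membership grades (m = 2) against fixed, hypothetical centroids instead of iterating FCM separately for each measure, it uses only ED and SAD, and all sample values are invented.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def spectral_angle(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def memberships(sample, centroids, dist, m=2.0, eps=1e-12):
    """FCM membership grades of one sample for fixed centroids."""
    power = 2.0 / (m - 1.0)
    inv = [1.0 / (dist(sample, c) ** power + eps) for c in centroids]
    total = sum(inv)
    return [v / total for v in inv]

def confident_sets(samples, centroids, dist, threshold=0.9):
    """Per class, indices of samples whose top membership exceeds threshold."""
    sets = [set() for _ in centroids]
    for idx, sample in enumerate(samples):
        u = memberships(sample, centroids, dist)
        best = max(range(len(u)), key=u.__getitem__)
        if u[best] > threshold:
            sets[best].add(idx)
    return sets

def robust_sets(samples, centroids, measures, threshold=0.9):
    """Intersect the per-measure confident sets class by class."""
    per_measure = {name: confident_sets(samples, centroids, d, threshold)
                   for name, d in measures.items()}
    robust = [set.intersection(*(s[i] for s in per_measure.values()))
              for i in range(len(centroids))]
    return per_measure, robust

# Hypothetical three-band centroids and samples (illustrative values only).
centroids = [(1.0, 2.0, 1.0), (3.0, 1.0, 0.5)]
samples = [
    (1.1, 2.1, 1.0),    # close to centroid 0 in amplitude and shape
    (3.1, 1.1, 0.5),    # close to centroid 1 in amplitude and shape
    (2.2, 4.0, 2.0),    # shape of centroid 0 but about twice the amplitude
    (2.0, 1.5, 0.75),   # midway between the two centroids
]
per_measure, robust = robust_sets(
    samples, centroids, {"ed": euclidean, "sad": spectral_angle})
```

In this toy case the amplitude-scaled sample passes the 0.9 threshold under SAD but not under ED, so the intersection drops it; only samples on which every measure agrees remain in the robust sets.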
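Here the four similarity measures, SAD, ED, OPD, and TD, were defined as:

$$\mathrm{SAD}(x_r, x_c) = \arccos\left(\frac{x_r^{T} x_c}{\lVert x_r \rVert \, \lVert x_c \rVert}\right)$$

$$\mathrm{ED}(x_r, x_c) = \lVert x_r - x_c \rVert = \sqrt{\sum_{k} \left(x_{r,k} - x_{c,k}\right)^{2}}$$

$$\mathrm{OPD}(x_r, x_c) = \left(x_r^{T} P^{\perp}_{x_c} x_r + x_c^{T} P^{\perp}_{x_r} x_c\right)^{1/2}, \qquad P^{\perp}_{x} = I - x \left(x^{T} x\right)^{-1} x^{T}$$

$$\mathrm{TD} = 2\left[1 - \exp\left(-\frac{D}{8}\right)\right], \qquad D = \frac{1}{2}\,\mathrm{tr}\left[\left(C_r - C_c\right)\left(C_c^{-1} - C_r^{-1}\right)\right] + \frac{1}{2}\,\mathrm{tr}\left[\left(C_r^{-1} + C_c^{-1}\right)\left(\mu_r - \mu_c\right)\left(\mu_r - \mu_c\right)^{T}\right]$$

where xr and xc are the two spectral reflectance vectors; C_r is the covariance matrix of vector xr; μr is the mean value of vector xr (C_c and μ_c are defined likewise for xc); and tr is the trace function.
The lower the SAD, the greater the similarity of the two spectra, as is also the case for ED.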
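For illustration, three of the four measures can be written in a few lines of Python. The sketch assumes the standard textbook forms of ED, SAD, and OPD, which may differ in detail from the exact expressions used in the study; TD is omitted because it requires class covariance matrices. The function names are hypothetical.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def ed(x, y):
    """Euclidean distance between two spectra."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def sad(x, y):
    """Spectral angle (radians), assuming the arccos-of-cosine form."""
    c = dot(x, y) / math.sqrt(dot(x, x) * dot(y, y))
    return math.acos(max(-1.0, min(1.0, c)))

def opd(x, y):
    """Orthogonal projection divergence.

    Uses x^T P_y x = x.x - (x.y)^2 / (y.y), where P_y projects onto the
    orthogonal complement of y, and the symmetric counterpart for y.
    """
    xy = dot(x, y)
    rx = dot(x, x) - xy * xy / dot(y, y)
    ry = dot(y, y) - xy * xy / dot(x, x)
    return math.sqrt(max(0.0, rx + ry))

a = [1.0, 2.0, 1.0]
b = [2.0, 4.0, 2.0]   # same shape as a, twice the amplitude
```

For these two vectors SAD and OPD are zero, since both ignore a pure amplitude change, whereas ED is not. This is one reason the measures can disagree on samples that differ mainly in concentration.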

Factor Analysis of Environmental Parameters
Factor analysis was applied to test for correlation between variables, and to analyze whether several variables, such as chl-a, TSS, and aCDOM (λ), could be replaced by common factors to represent optical water types. Of all 447 stations, 213 contained eleven variables (chl-a (mg·m⁻³), TSS (g·m⁻³), PIM (g·m⁻³), POM (g·m⁻³), DOC (g·m⁻³), aCDOM (440) (m⁻¹), bp (m⁻¹), cpg (m⁻¹), aCDOM/(ap + aCDOM), ad/(ap + aCDOM), and aph/ap) and were analyzed. Here, aCDOM (440) is aCDOM (λ) at 440 nm. Factor analysis was carried out as follows. First, the Kaiser-Meyer-Olkin (KMO) measure [53] and Bartlett's sphericity test [54] were used to verify that these eleven variables were indeed suitable for factor analysis. Then, factor analysis, in which PCA was selected as the extraction method and varimax [55] as the rotation method, was implemented. Further, missing values were excluded listwise, an option in which the entire observation is omitted from the analysis if any variable is missing.

Characteristic Wavelength Extraction
To reveal the implications of local peaks and troughs of the Rrs (λ) spectra of different classes, the characteristic wavelengths from the Rrs (λ) spectra were extracted. The extraction method was proposed by Lee et al. [56] and improved upon by Shen et al. [57], to distinguish effectively between the maximum and minimum values. A Savitzky-Golay filter [58,59] with a window size of 15 and a second-order polynomial was selected to smooth the Rrs (λ). The Savitzky-Golay filter can suppress significant noise in a curve while maintaining as much as possible the true shape of the original curve. Then, the first-order derivatives were calculated for each group, and f1max (λ) and f1min (λ) were used to represent the cumulative frequency at which the overall Rrs (λ) showed maximum and minimum values at a wavelength λ, respectively. The larger the f1max (λ) or f1min (λ), the greater was the probability of a maximum or minimum value appearing at λ.
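The smoothing and counting steps can be sketched as below. This is an assumption-laden illustration, not the method of Lee et al. [56] or Shen et al. [57]: it uses the closed-form Savitzky-Golay weights for a second-order polynomial with a 15-point window, treats only interior points, and counts a peak or trough wherever the first difference changes sign. Function names and the synthetic parabolic spectra are hypothetical.

```python
def savgol_quadratic_weights(half_width):
    """Closed-form Savitzky-Golay smoothing weights, 2nd-order polynomial."""
    m = half_width
    denom = (2 * m + 3) * (2 * m + 1) * (2 * m - 1)
    return [3.0 * (3 * m * m + 3 * m - 1 - 5 * i * i) / denom
            for i in range(-m, m + 1)]

def smooth(values, half_width=7):
    """Smooth interior points only; output index i maps to input i + half_width."""
    w = savgol_quadratic_weights(half_width)
    n = len(values)
    return [sum(wk * values[i + k] for k, wk in enumerate(w))
            for i in range(n - 2 * half_width)]

def extrema_frequency(wavelengths, spectra, half_width=7):
    """Count, per wavelength, how many spectra have a local peak or trough."""
    peaks, troughs = {}, {}
    for spectrum in spectra:
        s = smooth(spectrum, half_width)
        d = [b - a for a, b in zip(s, s[1:])]   # first-order difference
        for j in range(1, len(d)):
            wl = wavelengths[j + half_width]
            if d[j - 1] > 0 and d[j] < 0:
                peaks[wl] = peaks.get(wl, 0) + 1
            elif d[j - 1] < 0 and d[j] > 0:
                troughs[wl] = troughs.get(wl, 0) + 1
    return peaks, troughs

# Synthetic spectra: parabolas peaking at 560, 560, and 570 nm.
wavelengths = list(range(500, 601))
spectra = [[1000.0 - (wl - c) ** 2 for wl in wavelengths]
           for c in (560, 560, 570)]
frequency_peaks, frequency_troughs = extrema_frequency(wavelengths, spectra)
```

Here `frequency_peaks` plays the role of f1max (λ) and `frequency_troughs` that of f1min (λ) for one group of spectra.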

Results and Discussion
FCM was used on all Rrs (λ) spectra to obtain final centroid sets corresponding to each Class i (i = 1, 2, 3, 4). To match the four centroid sets to types of optically complex waters, the environmental parameters associated with different Rrs (λ) spectra were assessed. Based on clustering results, a classification tree is provided as a supplement, and the feasibility of our centroids is estimated for several multispectral sensors.

Clustering and Determination of Centroid Sets
The centroid sets are the basis for defining the membership of each pixel to each class. FCM was used to cluster Rrs (λ) spectra and obtain the centroid sets. To promote the robustness of FCM, the original Rrs (λ) data were strictly inspected prior to inclusion. Meanwhile, the classified spectra were inspected to determine whether certain classes contain very few samples that may be outliers. Following such inspection, 447 stations were selected.
We ran the FCM routine on the 447 samples with values of c ranging from 2 to 15. The two validity measures were found to indicate the same optimal number of clusters, c, which is four for our data set. Figure 3 shows the final results of the four clusters. Note that the seven samples in the ellipse of Figure 3 are far from the other samples. These seven samples were considered to be an independent Class 4, and the two validity measures were run again on the rest of the data (Classes 1-3). The indices again implied that there were three clusters. The remaining data, except for Class 4, were then clustered into three clusters. The FCM routine was run with different similarity measures and the centroid sets were intersected. Of the 447 samples, 135 samples that were more representative than other samples were selected. The final four centroid sets covering 135 samples corresponding to Classes 1-4 are discussed in Section 3.2.
The test data set of 143 samples was used to extract the three centroid sets for Classes 1-3. The overall accuracy [60] and Kappa coefficient [61] were used to estimate the accuracy of the clustering method. The results show that the accuracy of our method with four similarity measures was better than that of the general FCM method with the Euclidean distance only. For the latter, the overall accuracy and Kappa coefficient were 89.7% and 0.844, respectively. The centroid sets corresponding to Classes 1-3 covered 16, 28, and 34 samples respectively. For the former, the overall accuracy was 100% (the respective overall accuracy for ED, SAD, OPD, TD was 89.2%, 100%, 100%, and 82.5%). Because the centroid sets were intersected, some misclassified samples were removed. The three centroid sets corresponding to Classes 1-3 included 8, 28, and 26 samples, respectively. The three centroid vectors for each Class i (where i = 1, 2, 3) calculated from the four similarity measures were very similar to one another in shape (Figure 4). This demonstrates that the choice of similarity measure affected the centroids to a certain extent but did not drastically affect the final classification. This reflects the stability of the method presented here.
Normalization of Rrs [2,3,25], which could remove the differences in amplitude arising from gradients in concentrations, was recommended. However, we did not apply normalization here. Instead, we compared the mean reflectance spectra of the classes, derived from the (a) non-normalized (Classes 1, 2, 3, 4) and (b) normalized (Classes 1', 2', 3', 4') in situ measurements, as shown in Figure 5. The mean spectra of classes with or without normalization yield similar distributions. In contrast, in Figure 3 in Vantrepotte et al. [2], the differences between classes in terms of spectral shape are lower when derived from the raw reflectance data than when derived from normalized spectra data. Furthermore, we compared the average ±1 standard deviation of the raw reflectance spectra and the normalized reflectance spectra for the four classes (not shown). The variations within a class were slightly smaller in terms of normalized spectra than raw spectra, and did not differ significantly [3]. In summary, normalization has a minimal impact on our results. The reason may be the PCA transform applied before FCM. Normalization removed the differences in amplitude. The PCA transform preserved the main features of the spectra vectors, such as shape, which resulted in the FCM results not being predominantly influenced by the amplitude of the spectra.
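The overall accuracy and Kappa coefficient used in this section can be computed from a confusion matrix as sketched below. The function names and the example matrix are hypothetical; the matrix does not reproduce the study's results.

```python
def overall_accuracy(confusion):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total

def cohen_kappa(confusion):
    """Cohen's Kappa: agreement corrected for chance agreement."""
    total = sum(sum(row) for row in confusion)
    po = overall_accuracy(confusion)
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(col) for col in zip(*confusion)]
    pe = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    return (po - pe) / (1.0 - pe)

# Hypothetical 2-class confusion matrix (rows: reference, columns: predicted).
example = [[10, 2],
           [3, 15]]
accuracy = overall_accuracy(example)
kappa = cohen_kappa(example)
```

Kappa is lower than the overall accuracy whenever part of the agreement could be expected by chance, which is why the two are reported together.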

Links Between Rrs (λ) Spectra and Environment Characteristics
Three steps were carried out to determine which types of optically complex waters the four centroid sets extracted by unsupervised classification represent. First, environmental parameters in each class were calculated. Second, differences between the Rrs (λ) among the four centroid sets were located. Third, these were paired to explain Rrs (λ) differences by essential IOPs knowledge and water quality parameters. All of the 135 samples selected in Section 3.1 were included.
(1) Environment Characteristics of Each Group
Our results (Table 1) show that each optical class, defined by Rrs (λ) spectra, has unique environment characteristics.
Note to Table 1: Averages are shown in bold, with standard deviations in parentheses; minimum and maximum values are given in the square brackets. Minimums of all four classes are italicized, and maximums of all four classes are italicized and underlined. N is the number of samples.
Class 1 represents highly mixed eutrophic waters containing nonliving particulate matter, phytoplankton, and CDOM. In this class, the proportions of absorption of CDOM, phytoplankton, and nonliving particulate matter are about 20%, 20%, and 60%, respectively. For Class 1, the various parameters ranged between the maximum and minimum values of all classes, except for aCDOM (440). The proportion of phytoplankton absorption is lower only than that of Class 4 and is higher than the average of the other three classes. Higher chl-a/TSS also indicates that the water is eutrophic. In addition, ad/(ap + aCDOM) has a broader range than in the other classes, where its range is more limited.
Class 2 is comprised of relatively clear and CDOM-dominated water masses. Class 2 samples have the lowest average level of chl-a, TSS, PIM, POM, bp, and cpg of all classes (Table 1). Compared to the other classes, the aCDOM/(ap + aCDOM) values are the highest, with an average of 0.447. CDOM may be generated by native production related to the bacterial activity that occurs during the phytoplankton biomass senescence phase. In addition, Class 2 has a high average absolute DOC concentration, of 7.78 g·m⁻³, which is higher than that of both Class 1 and Class 3. The relatively high average DOC values suggest that the particulate matter assemblage includes a relatively high proportion of detrital material from biodegradation.
Class 3 samples have extremely high levels of nonliving particulate matter. Most Class 3 samples are from windy lakes in spring or winter, and are affected by local sediment resuspension. Compared to the other classes, Class 3 has the highest levels of TSS, PIM, PIM/TSS, bp, cpg, and ad/(ap + aCDOM) (Table 1). In contrast, Class 3 has the lowest levels of DOC, aCDOM (440), chl-a/TSS, POM/TSS, and aph/ap (Table 1).
Finally, Class 4 is comprised of seven samples collected during heavy cyanobacteria blooms. Although the number of Class 4 samples was low because there were few in situ measured samples, the seven samples are still representative of this class. Their Rrs differ from those of the other classes, and represent the typical types of waters with a surface scum of cyanobacteria. Class 4 samples have extremely high concentrations of absolute chl-a and POM, and very high values of chl-a/TSS and POM/TSS ratios. The aph/ap ratio of Class 4 is the highest of all four classes. In comparison, the PIM/TSS, ad/(ap + aCDOM) and aCDOM/(ap + aCDOM) ratios of Class 4 are far lower than those of the other classes.
In all 447 samples, chl-a varies from 0.0139 to 943 mg·m⁻³ (with an average of 34.5 mg·m⁻³). The maximal chl-a values correspond to heavy cyanobacteria blooms occurring annually in Lake Taihu between April and October. The TSS concentration ranges between 3.00 and 300 g·m⁻³, with an average value of 68.4 g·m⁻³. The range of aCDOM (440) variation (0.138-4.79 m⁻¹, with an average of 0.83 m⁻¹) covers a large fraction of the natural variability reported in high contrast areas. The dataset of 447 samples considered in this work spans four orders of magnitude in chl-a concentration and two orders of magnitude in TSS, PIM, and POM concentrations, and bp and cpg. The four water types determined here should therefore be broadly representative. Environmental conditions, governed by the water constituents and the optical properties of the water column, are not unique to any particular lake, reservoir, or coastal area. Table 2 shows the distribution of data from each site across the four classes. The distributions of data for each class span multiple water types, with freshwater and marine stations found in the same class. This demonstrates the limitations of regional inversion algorithms, which are temporally and spatially dependent, as well as the advantages of approaches based on classification.
In this study, several environmental parameters were considered, including water constituents (e.g., phytoplankton and nonliving particulate matter) and optical properties of the water column (e.g., bp, aph, ap, aCDOM). To examine the potential relationships among such parameters, factor analysis (a statistical method used to describe variability among observed variables in terms of a lower number of unobserved variables) was applied to the 447 samples (Figure 6). Figure 6 shows that chl-a, POM and aph/ap are similar in nature and close to 1 at the Component 2 axis. TSS, PIM, bp, cpg and ad/(ap + aCDOM) are clustered near 1 at the Component 1 axis. DOC, aCDOM (440), and aCDOM/(ap + aCDOM) are grouped with large values at the Component 3 axis. It can be inferred that the three components represent nonliving particulate matter, phytoplankton, and dissolved matter of biological origin (both living and detrital) respectively. Among the 11 components, only three components' eigenvalues are larger than one. The cumulative percentage of the first three components reaches 89.32%, explaining the majority of the total variance. The rotated factor loadings represent how the variables are weighted for each factor. The parameters bp and cpg contribute more to Component 1 than do TSS and PIM. aCDOM (440) contributes more to Component 3 than does DOC, and aph/ap contributes less to Component 2 than do chl-a and POM. In general, the IOPs contribute to the components as much as the water quality parameters do. The above correlated parameters, including water quality parameters and IOPs, can be replaced by three potential components, nonliving particulate matter, phytoplankton, and CDOM, which are precisely the three main optically active substances in waters [62].
(2) Characteristics of Rrs (λ) Spectra and Correspondence with Environmental Parameters
The link between Rrs (λ) and environmental parameters has a theoretical basis. The concentrations and constituents of organic and inorganic particulate and dissolved matter impact the absorption and scattering of light. The IOPs (including absorption and backscattering coefficients), as well as illumination and viewing geometries, jointly determine Rrs (λ). In other words, optically complex waters that have similar environment conditions can be expected to have apparent reflectance spectra with similar shapes [63]. Here, we provide some implications of the local peaks and troughs of the spectral curves of each class, aiming to both reveal the relationships between Rrs (λ) spectra and environment conditions as well as to achieve a simplified representation of Rrs (λ).
The wavelength distributions of f1max (λ) and f1min (λ) were obtained as the "frequency_peaks" and "frequency_troughs" for the four different classes (Figure 7). The frequency diagrams show a correspondence between the local frequency maxima and the extreme points (maxima and minima) of the average spectral curves in each centroid set.
The spectral curves also exhibit different characteristics in each of the four classes. On the basis of field experience, Class 4 is unique, representing waters with a surface scum of cyanobacteria bloom. The distinctly high values in the near-infrared spectral range result from the backscattering of the algae, which far outweighs the absorption of pure water. A reflection peak at around 359 nm in Class 4 is caused by a local absorption minimum, which is formed by the coinciding negative exponential decrease of ad + aCDOM and an increase in aph at wavelengths of approximately 300-400 nm. Chl-a absorption around 440 nm caused the narrow reflection troughs near 439-440 nm in Class 4.
For Class 1 to Class 3, five characteristic wavelengths were observed, which can be used to discriminate optical types: peaks around 567-585 nm, 637-651 nm and 684-695 nm, and troughs around 626-632 nm and 674-677 nm. The five features listed above are mainly affected by pigment absorption, and also by the integrated absorption or scattering of pure water, nonliving particles, phytoplankton, and CDOM in each optical class.
The central positions of the trough near 674-677 nm and of the peaks near 637-651 nm and 684-695 nm are due to the absorption of chl-a. Class 4 and Class 1 have relatively higher chl-a concentrations in comparison with the low chl-a concentration waters of Class 2 and Class 3. The peaks near 560-580 nm shift to longer wavelengths when TSS concentrations increase as long as there is no algal bloom. In Class 3 samples, TSS is high and dominates; the central position of the peak is 585 nm, ranging from 560-596 nm. In contrast, Class 1 and Class 2 samples are from waters with lower TSS; the central position of the peak is consistently at 567 nm. The peaks near 684-695 nm all have a large range of variation for the four classes. The peak in Class 3 samples shifts to 695 nm, demonstrating that it does not matter whether TSS or chl-a dominates, the peak will be found at a longer wavelength as long as chl-a concentration is high.
Note that there is an obvious broad, flat peak between 560-695 nm in Class 3 samples. Class 2 samples show a peak at around 567 nm and a decreasing slope between 567 nm and 690 nm. Class 2 is comprised of CDOM-dominated water masses, with about 45% aCDOM/(ap + aCDOM). Class 1 samples fall between Classes 3 and 2, representing highly mixed eutrophic waters, with two obvious peaks at around 567 nm and 695 nm.
To summarize, the characteristics of the Rrs (λ) spectra correspond with the measured water quality parameters and optical properties. They could be matched up well in each class. This also partially indicates that the Rrs (λ) spectra of the resulting centroid sets are representative and reasonable.

Classification Tree
Centroids can be used in both fuzzy and hard classification.However, a classification tree can only be used in hard or strict classification.The results of the former are more practical than the latter.However, the operational efficiency of the latter is higher than that of the former.To supplement a classification method and to make the most of the given Rrs (λ) features, an classification tree is provided here.
To make the classification tree more applicable, we also expanded our sample size. In addition to the 135 classified samples, the remaining 312 samples were grouped into the given classes according to the maximum of their four grades of membership. The ranges of the above 10 ratios were calculated for the samples belonging to each of the four classes. Some of the ratios and thresholds were then found to distinguish any two classes very well, in spite of individual exceptions. For example, Rrs (695)/Rrs (676) ranged 0.76-1.23 for Class 3, except for one sample with a ratio of 2.39, whereas the ratio ranged 2.13-3.32 for Class 4. Thus, when Rrs (695)/Rrs (676) was chosen as a classification feature with a threshold of 2.0, that sample had to be moved from Class 3 into Class 4. The range of a given ratio in a given class then no longer overlaps with that of any other class. After slight adjustment, a new set of classifications was formed (Figure 8).
Among the four classes, Class 4 samples could be easily distinguished due to the obvious high values at near-infrared wavelengths, which resulted from the backscattering of the algae. The ratio Rrs (695)/Rrs (676) can therefore be used to distinguish Class 4. This ratio ranges 0.91-1.98, 0.85-1.23, and 0.94-1.16 for Classes 1-3, respectively. The threshold to distinguish Class 4 is thus selected as 2.0, since the ratio ranges 2.13-3.32 for the 16 samples in Class 4.
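The threshold choice described above can be checked with a short sketch. It uses only the adjusted Rrs (695)/Rrs (676) ranges quoted in the text; the helper name `separating_gap` is hypothetical and is not part of the original study.

```python
def separating_gap(target_range, other_ranges):
    """Return the (low, high) gap between one class and all others.

    Returns None when the target range overlaps any other range.
    This helper is illustrative only, not the authors' procedure.
    """
    target_min, target_max = target_range
    highest_other = max(hi for _, hi in other_ranges)
    lowest_other = min(lo for lo, _ in other_ranges)
    if highest_other < target_min:
        # Target class lies entirely above the other classes.
        return (highest_other, target_min)
    if target_max < lowest_other:
        # Target class lies entirely below the other classes.
        return (target_max, lowest_other)
    return None


# Adjusted ranges of Rrs(695)/Rrs(676) quoted in the text.
class4_range = (2.13, 3.32)
other_class_ranges = [(0.91, 1.98), (0.85, 1.23), (0.94, 1.16)]

gap = separating_gap(class4_range, other_class_ranges)
threshold = 2.0  # value selected in the text
threshold_is_valid = gap is not None and gap[0] < threshold < gap[1]
```

Any threshold inside the returned gap separates Class 4 from Classes 1-3 for these samples; the selected value of 2.0 falls inside it.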
In Class 4, based on in situ observations, the samples corresponding to light degrees of surface scum have lower values at near-infrared wavelengths than those that correspond to heavy degrees of surface scum. As shown in the dashed rectangle in Figure 8, the Rrs (695)/Rrs (766) ratios of these nine samples, ranging 0.89-1.64, are obviously higher than those of the seven samples (0.29-0.74) corresponding to heavy degrees of surface scum. The heavy surface scum samples can therefore be distinguished when the Rrs (695)/Rrs (766) value is less than 0.8. In addition, the eutrophic Class 1 samples, which form the largest of the four classes, can be divided according to chl-a concentration levels using Rrs (695)/Rrs (766).
Class 2 samples tend to be easily distinguished by a spike near 567 nm and a steep decline following the spike. The Rrs (567)/Rrs (695) value, usually known as the green/red ratio, can therefore be utilized to distinguish Class 2 from the other classes. The range of Rrs (567)/Rrs (695) in Class 2 samples is 2.01-4.76, compared with 1.03-1.96 in Class 1 and 0.82-1.91 in Class 3. The weaker decline between 567 nm and 695 nm in Class 1 or Class 3 is due to strong backscattering over the corresponding range in relatively turbid waters.
Class 3 and Class 1 are relatively difficult to distinguish. The essential differences between them are as follows: Class 3 is dominated by nonliving particulate matter with no obvious pigment absorption characteristics, while Class 1 is also significantly affected by phytoplankton, with obvious spectral characteristics at around 630 nm and 676 nm. Thus, Rrs (695)/Rrs (676), which is positively correlated with chl-a [64][65][66], and Rrs (630), which is related to phycocyanin [45,67], are selected as indices. Also note that the Rrs (λ) spectra in Class 3 are typically flat between 560 and 695 nm. In contrast, the Rrs (λ) spectra in Class 1 decrease from 560 nm to 695 nm. Low ratios of Rrs (567)/Rrs (695) are therefore chosen as an additional criterion to identify Class 3. When only Rrs (695)/Rrs (676) < 1.16 and Rrs (567)/Rrs (630) < 1.2 were applied, 78 Class 1 samples were misclassified into Class 3. Combined with the Rrs (567)/Rrs (695) index, these 78 misclassified samples of Class 1 were completely separated from the 133 Class 3 samples.
In summary, the five ratios consisting of 567 nm, 630 nm, 676 nm, and 695 nm were primarily used to identify Classes 1-4. A classification tree for several optically complex waters in China was proposed based on hyperspectral Rrs (λ) images. The structure of the dendrogram is shown in Figure 9. The classification tree was tested using a test data set comprising 143 samples. The overall accuracy was 79.7% and the Kappa coefficient was 0.697.
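The band-ratio rules described in this section can be sketched as a small decision function. This is a sketch under stated assumptions, not a reproduction of Figure 9. The order of the tests is assumed. The Class 2 cut-off of 2.0 is an assumed value lying between the quoted ranges (1.96 and 2.01). The text does not state the Rrs (567)/Rrs (695) cut-off used for Class 3, so it is left as a required, hypothetical parameter `flat_threshold`.

```python
def classify_rrs(rrs, flat_threshold, class2_threshold=2.0):
    """Assign one of the four optical classes from band ratios.

    rrs: mapping from wavelength in nm to Rrs, with keys 567, 630, 676, 695.
    flat_threshold: hypothetical upper bound on Rrs(567)/Rrs(695) for
        Class 3; the text gives no value for it.
    class2_threshold: assumed cut-off for the green/red ratio.
    """
    r695_676 = rrs[695] / rrs[676]
    r567_695 = rrs[567] / rrs[695]
    r567_630 = rrs[567] / rrs[630]

    if r695_676 > 2.0:  # threshold stated in the text for Class 4
        return 4
    if r567_695 > class2_threshold:  # steep green-to-red decline
        return 2
    if r695_676 < 1.16 and r567_630 < 1.2 and r567_695 < flat_threshold:
        return 3  # flat spectrum without clear pigment features
    return 1


def is_heavy_scum(rrs):
    """Within Class 4, flag heavy scum where Rrs(695)/Rrs(766) < 0.8."""
    return rrs[695] / rrs[766] < 0.8


# Synthetic spectra for illustration only; these are not measured data.
scum_like = {567: 0.03, 630: 0.02, 676: 0.01, 695: 0.03, 766: 0.05}
cdom_like = {567: 0.03, 630: 0.015, 676: 0.01, 695: 0.01}
flat_like = {567: 0.05, 630: 0.05, 676: 0.05, 695: 0.05}
mixed_like = {567: 0.03, 630: 0.02, 676: 0.015, 695: 0.02}

labels = [classify_rrs(s, flat_threshold=1.2)
          for s in (scum_like, cdom_like, flat_like, mixed_like)]
```

Keeping the unstated cut-off as an explicit argument makes it clear which part of the rule set comes from the text and which part has to be supplied by the user.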

Reflectance Discrimination Using Ocean Color Satellite Sensors
For hyperspectral remote sensors, such as Hyperion, equipped with hundreds of continuous spectral bands, the classification scheme proposed in this study can be applied directly, because the centroids provided are from continuous spectra and the band ratios require several narrow bands. However, for multispectral sensors, bands generally do not cover the full spectrum, only discrete sections, and the band width is usually greater than 10 nm. Some subtle characteristics may therefore be missed or obscured by the coarser resolution sensors. The feasibility of using our centroids with data from multispectral sensors should therefore be assessed. Seven sensors commonly used in ocean color were analyzed: Sentinel-2A, MEdium Resolution Imaging Spectrometer (MERIS), the Visible Infrared Imager Radiometer Suite (VIIRS) onboard the earth-observing satellite NPP, Sea-Viewing Wide Field-of-View Sensor (SeaWiFS), POLarization and Directionality of the Earth's Reflectances (POLDER), Ocean Scanning Multispectral Imager (OSMI), and MODerate-resolution Imaging Spectroradiometer (MODIS). Here, the relative spectral responses of POLDER-1 and MODIS-Terra were selected to represent POLDER and MODIS.
The ground sample distances of the seven sensors are approximately 0.2 km, 0.3 km (full resolution), 0.4 km (subastral point), 1.1 km (local area coverage), 6 km, 0.85 km, and 1 km, respectively. Figure 10 shows the band placements of the seven selected sensors. If the Rrs spectra acquired from satellite images covering the four types of waters differ significantly, the satellite sensor can be viewed as suitable for classifying the several optically complex waters in China. The average in situ Rrs (λ) of each centroid set was first calculated to obtain the Rrs (λ) centroids associated with the four optical types. The Rrs (λ) was then convolved with solar irradiance and the relative spectral response (RSR) of each ocean color remote sensor. Figure 11 compares the centroids from the in situ measured spectra and those simulated for the seven ocean color satellite sensors. Figure 11 shows that the spectrum of a given sample has a similar shape when observed by six of the sensors: Sentinel-2A, MERIS, VIIRS, SeaWiFS, POLDER, and MODIS. However, OSMI does not have a band near 670 nm, and therefore misses the trough there. In general, the spectra of the Class 4 optical water type are the most distinctive because they represent a surface scum composed of cyanobacteria blooms. The spectra of the other three optical water types show some differences; however, it is difficult to interpret such differences qualitatively from Figure 11. To quantitatively assess the discriminability of the four centroids by the commonly used satellite sensors, the SADs between any two centroids were calculated, as shown in Figure 12. In general, the seven sensors were all found to effectively discriminate Class 4 from the other classes, and the SAD values between Classes 1 and 4, Classes 2 and 4, and Classes 3 and 4 mostly exceeded 0.2. The SAD values were also able to differentiate between Class 2 and Class 3. However, it is difficult for them to discriminate Class 1 from Class 2, and Class 1 from Class 3.
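The two computations described above, band simulation and SAD, can be sketched as follows. This is a minimal illustration under assumptions. The band-equivalent Rrs is taken to be the average of Rrs (λ) weighted by solar irradiance and RSR, which is one common reading of "convolved". The SAD is taken as the unnormalized spectral angle in radians, since this section does not restate the exact normalization used. The Gaussian RSR and flat irradiance are hypothetical placeholders, not real sensor data.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal integral of y over x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def band_average(wavelengths, rrs, solar_irradiance, rsr):
    """Band-equivalent Rrs weighted by solar irradiance and RSR (assumed form)."""
    weights = np.asarray(solar_irradiance, dtype=float) * np.asarray(rsr, dtype=float)
    weighted = np.asarray(rrs, dtype=float) * weights
    return trapezoid(weighted, wavelengths) / trapezoid(weights, wavelengths)


def spectral_angle(a, b):
    """Angle in radians between two spectra treated as vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cos_angle = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))


# Hypothetical inputs for illustration only.
wavelengths = np.arange(400.0, 801.0, 1.0)
flat_rrs = np.full_like(wavelengths, 0.02)
solar = np.ones_like(wavelengths)  # placeholder irradiance
rsr_560 = np.exp(-0.5 * ((wavelengths - 560.0) / 5.0) ** 2)  # placeholder RSR

band_value = band_average(wavelengths, flat_rrs, solar, rsr_560)
same_shape_angle = spectral_angle([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Because the spectral angle ignores overall magnitude, two centroids that differ only by a scale factor give an angle near zero, which is why SAD compares spectral shape rather than brightness.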
POLDER is the sensor best able to distinguish between the four water types, followed by MERIS. Sentinel-2A ranks third in general. When Class 4 is excluded, Sentinel-2A is the sensor best able to distinguish between Classes 1-3. To summarize, Sentinel-2A (with a better spatial resolution), MERIS, and POLDER are the best choices for discriminating between the four optical water types. The performance of VIIRS, SeaWiFS, MODIS, and OSMI in this respect was found to decrease successively. A classification tree for MERIS was also proposed. The overall accuracy and Kappa coefficient were 76.2% and 0.632, respectively. This method, which is based on various similarity measures, is better than the general FCM method, which uses Euclidean distance only. The method is found to enhance the representativeness and robustness of the resulting centroids, which can be applied in other studies to classify optical water types. By comparing the observed Rrs (λ) with the known typical spectra, the class to which each pixel belongs can be determined. Via classification, knowledge of which substance (e.g., nonliving solids, CDOM, or phytoplankton) dominates in different types of waters can be obtained. The final centroid sets could also be used in a class-based inversion approach to retrieve the concentrations of water constituents from remote sensing images, using the classification memberships as weights. In addition, the four water types proposed in this paper are not only suitable for hyperspectral remote sensing but also appropriate for multispectral sensor-based classification.
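The overall accuracy and Kappa coefficient quoted for the classification trees are standard confusion-matrix statistics. A minimal sketch of how they are computed is given below. The confusion matrix is invented for illustration and does not reproduce the results of this study, and the function name is hypothetical.

```python
import numpy as np


def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's Kappa from a square confusion matrix.

    Rows are reference classes and columns are predicted classes.
    """
    confusion = np.asarray(confusion, dtype=float)
    total = confusion.sum()
    observed = np.trace(confusion) / total
    # Agreement expected by chance, from the row and column marginals.
    expected = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / total ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return float(observed), float(kappa)


# Invented two-class example, not data from the paper.
example_confusion = [[40, 10],
                     [10, 40]]
overall_accuracy, kappa = accuracy_and_kappa(example_confusion)
```

Kappa discounts the agreement expected by chance, so it is lower than the overall accuracy whenever some agreement could arise from the class proportions alone.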

Figure 1. Conceptual model of the study, which comprises four parts. (1) Fuzzy clustering was applied to all in situ Rrs (λ) spectra to form the final robust centroid sets. (2) The relationships between the Rrs (λ) characteristics and environmental parameters of each class were assessed. (3) A classification tree was proposed for hyperspectral sensors. (4) The ability of seven ocean color radiometers to differentiate the given optical water types was assessed.

Figure 2. (a) Locations of the five study areas. (b) Locations of the field measurement stations in Lake Taihu. Crosses represent the same stations sampled in January, April, and October 2006, and January and April 2007. Triangles represent the stations sampled in March 2009. Circles represent the stations sampled in April 2009. (c) Locations of the field measurement stations in Lake Dianchi. Crosses and triangles are the stations sampled in September 2009 and December 2009, respectively. (d) Field measurement stations in the Three Gorges Reservoir. (e) Field measurement stations in Lake Chaohu. (f) Field measurement stations in the Yellow River estuary.

Figure 3. The clustering results with four clusters. Component 1 and Component 2 are the first two principal components from Principal Component Analysis (PCA) of in situ Rrs (λ).

Figure 4. The clustering centroids when the similarity measures are SAD, TD, OPD, and ED, respectively.

Figure 7. The Rrs (λ) of the four class centroids and the corresponding frequency distributions of the peaks and troughs. Blue circles mark the positions of the main peaks and troughs of interest. (a): class 1 centroid; (b): the frequency distributions of class 1; (c): class 2; (d): the frequency distributions of class 2; (e): class 3; (f): the frequency distributions of class 3; (g): class 4; (h): the frequency distributions of class 4.

Figure 8. The adjusted classification based on the extended dataset, with the centroids in bold ((a): class 1; (b): class 2; (c): class 3; (d): class 4). The dashed rectangle in Class 4 includes seven samples from a heavy surface scum composed of cyanobacteria blooms. Black bars mark the discussed wavelengths.

Figure 10. Band placement and width specifications of the seven satellite sensors selected. The y-axis (atmospheric transmission) applies to the grey curves only, not to the vertical placement of the different sensors.

Figure 11. In situ measured centroid spectra and those simulated for seven ocean color satellite sensors for the four optical water types. The seven ocean color satellite sensors are Sentinel-2A, MERIS, VIIRS, SeaWiFS, POLDER, OSMI, and MODIS.

Figure 12. The six spectral angle distance (SAD) combinations produced by any two classes, the centroids of which have been simulated for the seven satellite sensors. The y-axis shows the SAD, which takes values between 0 and 1. The x-axis shows the six combinations of the four classes. For each combination, the seven color bars denote the SADs of Sentinel-2A-, MERIS-, VIIRS-, SeaWiFS-, POLDER-, OSMI-, and MODIS-measured spectra.

Table 1. Overall and class-specific statistics for the bio-optical data collected during the cruises.

Table 2. Numbers of in situ data distributed across the four classes.