Optimization of Landsat Chl- a Retrieval Algorithms in Freshwater Lakes through Classiﬁcation of Optical Water Types

: The application of remote sensing data to empirical models of inland surface water chlorophyll-a concentrations (chl- a ) has been in development since the launch of the Landsat 4 satellite series in 1982. However, establishing an empirical model using a chl- a retrieval algorithm is difﬁcult due to the spatial heterogeneity of inland lake water properties. Classiﬁcation of optical water types (OWTs; i.e., differentially observed water spectra due to differences in water properties) has grown in favour in recent years over traditional non-turbid vs. turbid classiﬁcations. This study examined whether top-of-atmosphere reﬂectance observations in visible to near-infrared bands from Landsat 4, 5, 7, and 8 sensors can be used to identify unique OWTs using a guided unsupervised classiﬁcation approach in which OWTs are deﬁned through both remotely sensed reﬂectance and surface water chemistry data taken from samples in North American and Swedish lakes. Linear regressions of algorithms (Landsat reﬂectance bands, band ratios, products, or combinations) to lake surface water chl- a were built for each OWT. The performances of chl- a retrieval algorithms within each OWT were compared to those of global chl- a algorithms to test the effectiveness of OWT classiﬁcation. Seven unique OWTs were identiﬁed and then ﬁt into four categories with varying degrees of brightness as follows: turbid lakes with a low chl- a :turbidity ratio; turbid lakes with a mixture of high chl- a and turbidity measurements; oligotrophic or mesotrophic lakes with a mixture of low chl- a and turbidity measurements; and eutrophic lakes with a high chl- a :turbidity ratio. With one exception (r 2 = 0.26, p = 0.08), the best performing algorithm in each OWT showed improvement (r 2 = 0.69–0.91, p < 0.05), compared with the best performing algorithm for all lakes combined (r 2 = 0.52, p < 0.05). Landsat reﬂectance can be used to extract OWTs in inland lakes to provide improved prediction of chl- a over large extents and long time series, giving researchers an opportunity to study the trophic states of unmonitored lakes.


Introduction
The turn of the century has seen an apparent increase in the frequency and magnitude of harmful algal blooms in lakes, resulting in significant social, economic, and ecological damage [1][2][3]. It is theorized that the increase in blooms is a result of atmospheric changes (e.g., increased temperatures) and land use changes (e.g., agricultural intensification) [4]. The repercussions of frequent and intense blooms have motivated improved lake sampling efforts; however, there is often a sampling bias towards large lakes close to settled areas, while smaller lakes that scatter remote landscapes are often not sampled [5]. Lakes are regarded as sentinels of change in atmospheric and terrestrial systems, with smaller lakes often having a larger response compared to larger lakes [6,7].
Monitoring of lake algae typically relies on measurements of algal density and biomass or biovolume [8]. While ground-based measurement options provide precise information, remote sensing options are preferable-if not the only ones possible-in remote locations. continental forest, steppe, desert, mountain, subtropical humid forest, and forest) from July to October (1984-2016) (see Table S1 in the Supplementa more information). Ground-based samples were provided by the Govern Columbia's Environmental Monitoring System (EMS) surface water data USGS Storage and Retrieval (STORET) database, the USGS National Wat System (NWIS) database, and the Swedish University of Agricultural Miljödata MVM Environmental database. Samples were selected in these they provided consistent open data sources for lake water quality parame tabases also helped provide a geographic spread of data from the tropics to perate ecoregions, which may provide a diverse range of potential wat graphic clustering of data occurs as only specific ecoregions had frequentl ter quality results. Only samples where both chl-a and turbidity were taken of a Landsat 4, 5, 7, or 8 satellite overpasses were selected. This window s to allow for an adequate number of matchups between samples and satel while maintaining a relationship with measured reflectance [50]. Limited oured dissolved organic matter and total suspended solids metrics were fou window and therefore were not used in this study. A total of 204 sample p lakes were selected ( Figure 1, Table S1). Lake sizes ranged from 5.3 to 86,66 = 119.3 ha). Due to a lack of available metadata for public data records, ground-based measurement processing and calibration will occur and of potential error in the remote sensing retrieval.

Landsat Image Acquisition, Processing, and Analysis
Sample locations were mapped to the Worldwide Reference System (WRS-2) Landsat catalogue system to identify the (longitudinal) paths and (latitudinal) rows in which the samples were found. A total of 105 pairs of Landsat Level-1 and -2 images with <10% cloud coverage and within ±3 days of sample dates were downloaded from the USGS EarthExplorer data catalogue (https://earthexplorer.usgs.gov/, last accessed: 3 November 2021) (72 Landsat 4-5 TM, 11 Landsat 7 ETM+ (SLC-on), and 22 Landsat 8 OLI) (Table S1).
Various atmospheric correction options are available for the remote sensing of water quality using Landsat data (e.g., 6S, DOS, COST, iCOR); however, such methods often result in errors due to the violation of the dark pixel assumption in turbid waters when estimating aerosol optical thickness in the N [51,52]. While the SWIR band can be used in lieu of the N, it often results in lower aerosol accuracy estimation due to a poorer signal-noise ratio [53]. Some studies have instead opted for simple atmospheric correction of Rayleigh scatter (and not of aerosol contributions) for chl-a retrieval in turbid waters [54][55][56][57][58]. To reduce opportunities for overestimation of atmospheric contributions, this study corrected Landsat data for Rayleigh scatter contribution only.
OWTs were identified from top-of-atmosphere (TOA) reflectance values (0-1) in B (band 1 TM and ETM+, band 2 OLI), G (band 2 TM and ETM+, band 3 OLI), R (band 3 TM and ETM+, band 4 OLI), and N (band 4 TM and EMT+, band 5 OLI) bands. TOA radiance (W/(m 2 × sr × µm)), measured by Landsat sensors, were scaled using multiplicative (gain) and additive (bias) scaling factors to 8-bit (0-255; TM and ETM+) and 16-bit (0-65,000; OLI) integer value ranges (digital numbers or DNs) for transmission and storage in Landsat Level-1 products. DNs were recalibrated to TOA radiance using the standard equation [59], as follows: where L is TOA radiance for wavelength (λ) range or band λ. TOA radiances were corrected for Rayleigh scatter (attributed to the molecular properties of the atmosphere) using an inverse algorithm based on a simplified radiative transfer model presented by Gilabert [60], as follows: where L r is the Rayleigh path radiance for band λ, ESUN is the mean solar exo-atmospheric irradiance for band λ, P r is the Rayleigh phase function, θ s is the solar zenith angle in degrees, θ is the satellite viewing angle in degrees (equal to 0 • for Landsat 4, 5, and 7 images and for nadir-looking Landsat 8 images), τ r is the Rayleigh optical thickness, and t oz ↑ and t oz ↓ are upward and downward ozone transmittance, respectively. The Rayleigh phase function (P r ) [61,62] describes the angular distribution of scattered light and was calculated as follows: where Θ is the scattering angle (180 • − θ s ), γ = δ/(2 − δ), and δ is the depolarization factor that denotes the polarization of anisotropic particles at right angles-dependent on the wavelength, atmospheric pressure (constant), and air mass (constant) [63,64]. Rayleigh optical thickness (τ r ) [65,66] was calculated as follows: Ozone transmittance (t oz ↑ and t oz ↓) [67] were calculated as follows:  (6) where τ oz is the ozone optical thickness, as calculated by [68]. L r was subtracted from L for each band λ to determine Rayleigh-corrected TOA radiance (L) as follows:L L was then converted to unitless TOA reflectance (ρ; 0-1) for each band λ to avoid issues regarding shifts in the solar zenith angle due to latitude and time of year, as follows: where d is the Earth-Sun distance in astronomical units. Lake boundaries were delineated from Level-2 images using the dynamic surface water extent (DSWE) model developed by Jones [69] and adapted by DeVries et al. [70]. Contiguous groups of pixels identified as water by the DSWE model were vectorized without polygon simplification (i.e., lake vector boundaries matched the pixel boundaries), and the vectors were then buffered inwards by 15 m (0.5 pixel width) to minimize the spectral effects of edge pixels where the reflectances of vegetation and shallow depths mix with the reflectance of water. Only buffered lake polygons ≥4.5 ha (50 pixels) were used in this study to further minimize the spectral effects of edge pixels.
In each buffered lake polygon, pixels identified as having a high probability of cloud or cloud shadow in the pixel quality assessment band, provided with Level-2 products, were removed. Because ground-based samples were taken from multiple sources, an assumption of spatial homogeneity in the water chemistry was made due to potential inaccuracies in reported sampling coordinates. To meet this assumption, the standard deviation of ρ in all remaining pixels in each buffered lake polygon was calculated for each visible-N band ρλ; homogeneity is expressed as the sum of the band standard deviations (SSD; [71,72]); and lakes with an arbitrary threshold of SSD larger than the median SSD of all lakes were discarded. While a 3 × 3 or 5 × 5 filter may reduce the effects of homogeneity, some public water quality data may only provide lake coordinates and not sampling coordinates. Filters will not provide adequate smoothing for larger waterbodies, and thus lake averages and SSD thresholds were used.

Identification of OWTs
OWTs are defined as waters with diverse water chemistry compositions resulting in a wide range of spectral signatures in the visible-N spectrum [73]. Common methods of OWT separation use unsupervised classifiers such as k-means or fuzzy c-means [44][45][46]; however, the small number of Landsat bands limits the number of potential observable spectral signatures. To overcome this limitation, a guided approach was implemented, whereby, the ratio of chl-a:turbidity (Chl:T) was used in addition to ρλ in the visible-N bands in a unsupervised hierarchical clustering method. The use of Chl:T indicates whether the optical signal is influenced by a high biomass presence (high Chl:T) or a low biomass presence (low Chl:T). The hierarchical clustering method was done in R using the "hclust" function found in the base "STATS" package using the "Ward" method. The hierarchical clustering distance values were calculated using the "Canberra" method. Distance is measured as the space (referred to as Euclidian space) between data points in a multivariate dataset, which represents how closely clustered points are. Chl:T and ρλ in the visible-N bands were normalized in R using the "preProcess" function found in the "caret" package, with "scale" selected as the method (i.e., dividing each column by its standard deviation) [74].
To determine the optimal number of classes, an elbow method was used, whereby the total within sums of squares for numbers of clusters from 2 to 24 were calculated using the "fviz_nbclust" function as part of the "factoextra" package in R [75]. A three-point piecewise regression of total within sum of squares vs. number of clusters was fit to determine at which point the increase in clusters no longer significantly reduced the total within sum of squares. Each OWT defined using this method was defined as OWT-A h or OWT-B h , etc.
To be applicable to lakes where in situ water chemistry is unknown, a supervised classifier was trained using normalized ρλ in the visible-N bands and the now defined OWTs. A quadratic discriminative analysis (QDA) model was selected as it reduces dimensionality and uses the mean vector of each class to define non-linear boundaries between the defined classes. A random stratified sampling technique was used to select 70% normalized training and 30% normalized testing data using the "stratified" function from the "splitstackshape" package in R (seed = 854) [76]. The QDA was calculated in R using the "qda" function found in the "MASS" package [77]. Each OWT defined using this method is defined as OWT-A q or OWT-B q , etc.

Development of Chl-a Retrieval Algorithms
A total of 82 algorithms-including all possible band products and ratios, as well as commonly used multiband combinations found in the literature (Tables S2 and S3)-were tested for the empirical retrieval of chl-a across all lakes (i.e., global models) and within each OWT using linear regression. Chl-a and turbidity values were log-scaled to meet the assumption of normality. Shapiro-Wilk tests were used to assess the normality (p ≥ 0.05) of relationships between dependent and independent variables. Breusch-Pagan tests were used to assess constant variance (p ≥ 0.05) between dependent and independent variables using the "lmtest" R package [78]. Outliers in selected models were identified using Cook's distance >4/n prior to regression modelling of chl-a. Algorithm strength and significance were evaluated using coefficients of determination (r 2 ) and regression p-values: these were used to compare the strength and significance (p < 0.05) of correlations between mean lake ρ λ to chl-a or turbidity using OWTs vs. global applications. The chl-a retrieval algorithms were validated using ten-fold cross validation and the predictive performance was measured by the root mean squared error (RMSE) as follows: whereŷ i is observed as chl-a and y i is the predicted value. To compare between different groups of varying sample size and different scales of input chl-a, the RMSE were normalized as follows: where σ is the standard deviation of the input chl-a. The root mean log squared error (RMSLE) was calculated as follows: Predictive performance was also measured by the mean absolute error (MAE), calculated as follows: The median absolute percentage error (MAPE) was calculated as follows: Bias was calculated as follows: The MAE, RMSLE, MAE, and bias were calculated using the "metrics" R package [79], while the MAPE was calculated using the "MLmetrics" package [80]. To showcase the application of the OWT and chl-a retrieval algorithms, a testing image was used (Landsat 8 OLI, 15 August 2021, path = 17, row = 29), where per pixel OWT and modelled chl-a are shown.

Identification of OWTs
The number of OWTs was determined using a three-point piecewise regression in R, where the total within sum of squares was calculated using normalized Chl:T and ρλ in the visible-N bands, and which identified three breaks (k = 3, 7, and 11). The first, k = 3, represents too few potential OWTs, while k = 11 resulted in clusters with too few samples for the development of regressions. To maximize the number of OWTs and maintain reasonable sample sizes, k = 7 was identified as the optimal number.
The unsupervised hierarchical clustering method defined which of the lakes belonged to which OWT ( Figure 2). Based on lake surface water chemistry (Table 1, Figure 3), OWT-E h had the highest Chl:T (median = 6.7) with high chl-a (median = 13.7 µg L −1 ) and low turbidity (median = 1.9 NTU) measurements. While the Chl:T ratio was high, the lakes were relatively dark compared to OWT-A h , -B h , and -C h , but brighter in the B band compared to OWT-D h , -F h , and -G h (Figure 4a). OWT-A h had the lowest Chl:T (median = 0.5) with low chl-a (median = 4.0 µg L −1 ) and high turbidity (median = 7.8 NTU) measurements. While optically bright in the B, G, and R bands, OWT-A h had low ρλ(N) in ranges similar to the optically dark lakes (Figure 4a). OWTs-B h and -C h had moderately high Chl:T (median = 4.8 and 4.5, respectively) with a high chl-a (median = 33.6 µg L −1 and 20.2 µg L −1 , respectively) and high turbidity (median = 6.7 NTU and 5.0 NTU, respectively) measurements. OWT-C h returned the highest ρλ of any OWT, with significantly higher N ρλ. Both OWTs-B h and -C h had equally high chl-a and turbidity measurements, with OWT-C h displaying the greatest variance in its distribution compared to any other OWT ( Figure 4a). OWT-D h had a low Chl:T (median = 1.1) with low chl-a (median = 1.3 µg L −1 ) and low turbidity (median = 1.7 NTU) measurements. OWT-D h remained optically dark throughout all four visible-N bands, with little variation in ρλ ( Figure 4a). OWTs-F h and -G h had moderately low Chl:T (median = 2.5 and 3.0, respectively) with low chl-a (median = 3.00 µg L −1 and 2.95 µg L −1 , respectively) and low turbidity (median = 1.2 and 1.0 NTU, respectively) measurements. OWT-G h exhibited the lowest ρλ with the lowest reflectances in the G and R bands. While OWT-F h shows an even distribution of chl-a and turbidity, OWT-G h has slightly higher chl-a relative to turbidity. Table 1. Summary statistics of ground-based chl-a and turbidity within each OWT for two different methods: unsupervised hierarchical clustering based on reflectance and water chemistry, and supervised quadratic discriminant analysis (QDA), trained using the hierarchical classes and the associated mean lake TOA reflectance (ρλ) per band (B = Blue, G = Green, R = Red, N = near infrared).     The supervised QDA method provided a reasonably accurate prediction of OW with a testing accuracy of 75.4% ( Figure 5a) and a total accuracy of 80.1% ( Figure 5 OWTs-Bq, -Cq, -Dq, -Eq, and -Gq were reasonably predicted (i.e., ≥60.0%), but OWTs-Aq a -Fq were poorly predicted (≤25%) (Figure 5a). The order of brightness, as represented mean ρλ(B, G, R, N) in OWTs, identified through the supervised QDA method and t unsupervised hierarchical clustering method, were the same. OWT-Cq was the bright  The supervised QDA method provided a reasonably accurate prediction of OWTs, with a testing accuracy of 75.4% ( Figure 5a) and a total accuracy of 80.1% (Figure 5b). OWTs-B q , -C q , -D q , -E q , and -G q were reasonably predicted (i.e., ≥60.0%), but OWTs-A q and -F q were poorly predicted (≤25%) (Figure 5a). The order of brightness, as represented by mean ρλ(B, G, R, N) in OWTs, identified through the supervised QDA method and the unsupervised hierarchical clustering method, were the same. OWT-C q was the brightest (mean ρλ(B, G, R, N) = 0.061); followed by OWTs-A q (mean ρλ(B, G, R, N) = 0.041) and OWTs-B q (mean ρλ(B, G, R, N) = 0.040); then by OWT-E q (mean ρλ(B, G, R, N) = 0.028), OWT-F q (mean ρλ(B, G, R, N) = 0.027), and OWT-D q (mean ρλ(B, G, R, N) = 0.022); and finally by OWT-G q , which was the darkest (mean ρλ(B, G, R, N) = 0.017).

OWT Chl-a Retrieval Performance
In the absence of OWT separation, global algorithms demonstrated moderate to poor performances for chl-a retrieval. The [(R/B) × (R/N)] algorithm exhibited the highest r 2 (0.52, p < 0.05, n = 147) with the seventh lowest NRMSE and the fourth lowest MAPE out of 82 algorithms in total. Of the 10 algorithms with the highest r 2 , all included the use of the R band, while 70%, 30%, and 30% included N, G, and B bands, respectively (Table 2), 70% included both the R and N bands, and 40% included a ratio of R and N. Compared with the best performing global algorithm, algorithm performances improved when separated into OWTs using unsupervised hierarchical clustering and supervised QDA. The algorithm performances were similar between OWTs identified by the two methods (Table 3), except for OWTs-C (higher r 2 in OWT-C q ), -D (higher r 2 in OWT-D h ), -E (higher r 2 in OWT-E h ), and -F (higher r 2 in OWT-F q ). 2.5 and 3.0, respectively) with low chl-a (median = 2.9 μg L and 3.0 μg L , resp and low turbidity (median = 1.2 and 1.0, NTU, respectively) measurements. Whe ing the placement of QDA OWTs in the chl-a vs. turbidity plot (Figure 4b), simil to the hierarchical clustering were found, with the greatest difference in OWT-D was misclassified with OWTs-Fq and OWTs-Gq.   Based on Chl:T, OWTs-A h and OWTs-D h represent lakes where the chl-a remains relatively low despite higher relative turbidity measurements. In both OWT-A h and OWTs-D h , [G × (B + G + R)] and [N/G] had the highest r 2 (1.00, p < 0.05, n = 4; 0.97, p < 0.05, n = 9, respectively); however, due to the small sample size after outlier removal, the algorithm with the next highest r 2 , [B/R], and [G/N] was preferable (r 2 = 0.91, p < 0.05, n = 9; r 2 = 0.91, p < 0.05, n = 13, respectively). Due to the small sample sizes (n < 10), most error metrics (RMSE, RMSLE, NRMSE, and MAE) were not calculated; however, [B/R] and [G/N] had the 23rd and the 3rd lowest MAPE respectively. In both OWTs, of the ten algorithms with the highest r 2 , N was included in the largest number. In OWT-A q , and OWTs-D q [(R/G) × N] and [R] had the highest r 2 (0.91, p < 0.05, n = 5; 0.55, p < 0.05, n = 10, respectively). [(R/G) × N] and [R] had the 7th and 15th lowest MAPE respectively. Of the ten algorithms with the highest r 2 , N was included in the largest number in OWT-A q , while G was included in the largest number in OWT-D q .
Based on Chl:T, OWTs-B h and OWTs-C h represent eutrophic lakes where both chl-a and turbidity measurements were relatively high. In both OWT-B h and OWTs-C h , [B/G] and [(B/G) × (R/G)] had the highest r 2 (0.82, p < 0.05, n = 23; 0.26, p = 0.08, n = 13, respectively), with 2nd and 18th lowest NRMSE and the 7th and 1st lowest MAPE, respectively. Of the ten algorithms with the highest r 2 , B and N were included in the largest number in OWT-B h , with G the most frequent in OWT-C h . Algorithm performances in OWT-C h , however, were consistently poor. In both OWT-B q and OWTs-C q , [(B/G)] and [(B × N)] had the highest r 2 (0.85, p < 0.05, n = 26; 0.59, p < 0.05, n = 10), with the 2nd and 1st lowest NRMSE and MAPE respectively. Of the ten algorithms with the highest r 2 , B, G, and R bands were included in the largest number in OWT-B q , while B was included in the largest number in OWT-C q . While OWT-C q does provide an improvement over OWT-C h , [(B × N)] was the only algorithm with an adequate performance as the next highest r 2 was found for [(B/G) × (R/G)] (r 2 = 0.33, p < 0.05, n = 16).
Based on Chl:T, OWT-E h represents lakes where the turbidity measurements remains relatively low, despite the higher relative chl-a. In OWT-E h , [B/N] had the highest r 2 (0.77, p < 0.05, n = 36), with the 4th lowest NRMSE and the 2nd lowest MAPE. Of the ten algorithms with the highest r 2 , N was included in the largest number. Conversely, in OWT-E q , [(B/N)] had the highest r 2 (0.47, p < 0.05, n = 56) with the lowest NRMSE and the 8th lowest MAPE. Of the ten algorithms with the highest r 2 , B was included in the largest number.
Based on Chl:T, OWTs-F h , and OWTs-G h represent oligotrophic or mesotrophic lakes, where both chl-a and turbidity measurements are relatively low. In both OWT-F h and OWTs-G h , [(G/R) × N] and [(B/G) × (R/N)] had the highest r 2 (0.78, p < 0.05, n = 27; 0.69, p < 0.05, n = 21, respectively), with 12th and 1st lowest NRMSE, and the 4th and 44th lowest MAPE respectively. Of the ten algorithms with the highest r 2 , N was included in the largest number in both OWTs. In OWTs-F q and -G q , [G × (B+G+R)] and [(B/G) × (R/N)] had the highest r 2 (0.98, p < 0.05, n = 13; 0.68, p < 0.05, n = 24, respectively), with the 1st and 4th lowest NRMSE and the 4th and 29th lowest MAPE respectively. Of the ten algorithms with the highest r 2 , R was included in the largest number in OWT-F q , while N was included in the largest number in OWT-G q .
The described methods identified seven OWTs with unique water chemistry compositions, varying spectral curves, and levels of brightness. The supervised classification returned similar OWTs with similar algorithm performances. In comparison with the global algorithms, separation into OWTs provided significantly improved retrieval accuracy for six of the seven OWTs. Scatter and validation plots for OWTs are presented in Figures 7 and 8, respectively, for unsupervised hierarchical clustering, and in Figures 9 and 10, respectively, for supervised QDA. Full algorithm performance metrics for hierarchical clustering and QDA can be found in Tables S2 and S3, respectively. The application of the OWTs and the subsequent modelling of chl-a can be seen in Figure 11, where a variety of OWTs were retrieved per pixel for several lakes in central-eastern Ontario.

Discussion
This study sought to determine the following: whether Landsat-derived ρλ have t capacity to differentiate OWTs with unique spectral signatures and water chemistry d tributions; whether OWT-specific algorithms improved chl-a retrieval accuracy compar with that of a global algorithm. Given the limited number of Landsat's broad radiomet

Discussion
This study sought to determine the following: whether Landsat-derived ρλ have the capacity to differentiate OWTs with unique spectral signatures and water chemistry distri-butions; whether OWT-specific algorithms improved chl-a retrieval accuracy compared with that of a global algorithm. Given the limited number of Landsat's broad radiometric bands, a unsupervised classifier was developed using ρλ in the visible-N bands, guided by Chl:T to produce seven OWTs with both unique spectral signatures and unique water chemistry profiles. A supervised classifier was trained using the guided unsupervised OWTs and applied to lakes where lake surface water chemistry was unknown. The supervised classifier provided reasonably accurate classification results, returning similar chl-a retrieval algorithm performances compared to the guided unsupervised classifier.

Identifying OWTs
The guided, unsupervised classifier differentiated lakes as optically bright (OWTs-A h , -B h , and -C h ) and optically dark (OWTs-D h , -E h , -F h , and -G h ) (Figure 2). However, this classifier also defined OWTs with unique water chemistry distributions. The optically bright lakes had distinct spectral curves, mostly differentiated by Chl:T and the observed ρλ in the N band (Figure 3). Among the optically bright lakes, OWT-A h represented lakes where ρλ was high with low chl-a. Despite the low biomass, turbidity remained high along with a greater increase in ρλ in the R band and a smaller increase in the N, indicating a potential for non-algal particle dominance in this OWT [33,81]. OWTs-B h and -C h represented turbid lakes, as there was a relatively equal ratio of B-G and R-N ρλ. OWT-B h exhibited notably higher ρλ in the G and R bands compared with OWTs-D h to -G h . The increased absorption in the R band due to chl-a was countered by the increase in non-algal particulate scatter, as is often seen in turbid waters. OWT-C h exhibited much higher ρλ in the N band compared to other OWTs. Additionally, OWT-C h represented a much wider range of Chl:T values (Figure 3f). Exploration of the metadata showed that the OWT-C h lakes had the smallest surface area of all OWTs (median = 75.6 ha), suggesting that these lakes may have exhibited high ρλ(N) due to shallow emergent vegetation or shoreline contamination. The optically bright lakes returned significantly brighter G and R bands relative to the B and N bands when compared to the optically dark lakes (with the exception of the N band for OWT-C h ).
The optically dark lakes had similar spectral curves, mostly differentiated by the level of brightness (Figure 2). Among the optically dark lakes, OWT-D h represented oligotrophic or mesotrophic lakes with low Chl:T where the spectral curve does not replicate that of OWT-A h , which is likely a result of low chl-a and turbidity measurements where water absorption would dominate the spectra. OWT-E h represented mesotrophic or eutrophic lakes with high Chl:T and low ρλ in the R band, relative to other bands, as a result of chl-a absorbance [10]. Lakes in which there is a significant spike in ρλ in the N band relative to R suggest that most of the signal is a result of algal particles [81]. Non-algal particles are a significant contributor to backscatter at all wavelengths, but the contribution decreases at higher wavelengths, while algal particles increase backscatter at higher wavelengths [81]. OWTs-F h and -G h represented oligotrophic or mesotrophic lakes with low chl-a and turbidity measurements. OWT-F h represented a more even mix of chl-a and turbidity (i.e., the lakes were closer to the 1:1 line in Figure 4), and resembled the spectral shape of OWT-B h , though optically darker. OWT-G h had slightly lower relative turbidity and, therefore, more closely resembled the spectra of OWT-E h , though optically darker. For lakes classified as optically dark, the B band returned the highest mean lake ρλ, G the second highest, and R the lowest, with a slight increase in the N. The high B band ρλ was likely due to water as the algal particles remained low [48,82]. Typically, N should remain the lowest observed mean lake ρλ; however, due to the atmospheric correction of only Rayleigh scatter used in this study, a higher proportion of observed visible radiance (B, G, and R bands) was removed compared with that of radiance in the N band.
While the guided unsupervised classifier differentiated OWTs based on varying magnitudes of brightness and distinct lake surface water chemistry, it required the water chemistry to be known. The application of the chl-a retrieval algorithm would be used when in situ chl-a and turbidity are unknown; therefore, the supervised classifier is required.
The supervised classifier would need to accurately return similar OWTs compared to that of the guided unsupervised classifier, where each OWT returns similar spectra and water chemistry information.
As with the unsupervised classifier, the supervised classifier (QDA) differentiated lakes as optically bright (OWTs-A q , -B q , and -C q ) and optically dark (OWTs-D q , -E q , -F q , and -G q ) (Figure 2). The QDA accurately defined the optically bright and dark lakes when comparing the magnitudes of brightness observed (Table 1). OWTs with unique water chemistry distributions were also observed when comparing the Chl:T value of each QDAderived OWT ( Figure 6) to those derived by the unsupervised classifier ( Figure 3). OWT specific classification errors do occur particularly for lakes with a low Chla:T, as OWTs-A q and -D q returned low classification accuracy. The difficulty in defining OWTs with a low Chla:T may be due to the high variability in the observed ρλ for the visible bands (Figure 3), as the composition of potential non-algal particles (e.g., white vs. red clays) can greatly affect the visible spectra. OWT-F h had also returned poor classification accuracy, often misclassified as OWT-E q . The misclassification tended to occur in mesotrophic lakes where chl-a was high. Despite these issues, all other OWTs (i.e., OWTs-B q , -C q , -E q , -G q ) returned high classification accuracy, indicating the supervised classifier is capable of defining OWTs when using Landsat-derived ρλ. The application of Landsat for chl-a retrieval in mixed waters is limited due to its broad radiometric bands [83,84], and this limitation extends to the identification of OWTs. Landsat has the capacity to resolve the difference between optically bright and dark signals with a varying range of Chl:T at the lake level (using mean ρλ in the entire waterbody); however, further resolution is unlikely (i.e., differentiating sediment from detritus material, differentiating algal taxonomy). Additionally, dissolved and particulate matter will increase backscatter and subsequent observed ρλ at visible wavelengths, depending on the composition and concentration [33,85]. The minimal difference in the observed spectra of these lakes is potentially due to the low signal-noise ratio of the Landsat satellite series (particularly with Landsat 5 TM and 7 ETM+), in which small incremental changes in water properties are not likely to be observed in the spectra of dark lakes [12,86]. To define the Chl:T range among varying levels of brightness, the application of lake surface water chemistry parameters in guiding the classification of OWTs offers an improvement when using only Landsat observed ρλ. While in situ spectroradiometers, hyperspectral imagers, and multispectral satellites have a higher number of visible-N bands that may provide more accurate results, the methods outlined in this paper are to be employed when such data are not available.

OWT Chl-a Retrieval Performance
Eighty-two chl-a retrieval algorithms were tested for each OWT to determine which algorithm performed best. Algorithms performed at varying levels in each OWT, with some patterns observed in the types of bands used. The best performing algorithms using the supervised classifier (i.e., OWTs-A q , -B q , etc.) and the guided unsupervised classifier (i.e., OWTs-A h , -B h , etc.) were then compared.
OWTs-A h and -D h represented a low Chl:T, where OWT-A h was optically brighter and consisted of higher turbidity measurements. Both OWTs returned high r 2 and low overall error; however, some of these fits were inflated due to small sample sizes after outliers were removed. As the chl-a signal was relatively low despite the high brightness observed, a low correlation was expected. The high r 2 with algorithms using B and G bands were likely false positives due to the high reflectivity of potential non-algal particles at shorter wavelengths, particularly when chl-a is low [33]. Due to the classification errors with both QDA-derived OWTs (particularly OWT-A q ), the best performing algorithms as indicated by r 2 did not match well. The best performing algorithms frequently utilized the R and N bands for OWT-A q and the G and R bands for OWT-D q , which is expected for turbid waters. While the performance as measured by r 2 did not provide a good match for OWT-D q , other error metrics such as NRMSE provided a slightly better match, where the same algorithms derived from unsupervised and supervised classifiers had similar retrieval errors.
OWTs-B h and -C h represented eutrophic lakes, where both chl-a and turbidity measurements are high relative to the training data distribution. For optically complex and turbid lakes, an R-N ratio is traditionally used [35][36][37]. According to Gitelson [39], this ratio should capture the R edge to N transition (~700 nm), which is currently not possible with Landsat; however, N bands have been used in past studies as an alternative [87]. The best performing algorithms in both OWTs typically used B and G bands, with the best performing algorithms in OWT-B h also commonly including the N band. Both OWTs returned algorithms using a B-G ratio, which is commonly used for oligotrophic waters due to increased water column penetration, lack of non-algal particles increasing scatter, and highest ρλ in the G band [88]. It is possible that the influence of non-algal particles in OWT-B h was driving the observed r 2 . OWT-C h returned a very poor performance compared to OWTs-B h , which may reflect the small median lake size, potential emergent vegetation, or shoreline contamination. OWT-B h provided some algorithms with an adequate performance that would be expected to provide a better chl-a signal in turbid waters, such as [(R/B) × (R/N)], which exhibited the lowest NRMSE and used the R-N ratio (r 2 = 0.77, p < 0.05). As the supervised classification accuracy was high, both OWT-B q and -C q provided similar algorithm results.
OWT-E h represented lakes with a high Chl:T, where the turbidity was relatively low given higher relative chl-a. The lakes are considered optically dark, a result of low turbidity, where the signals may be influenced by a lack of non-algal particles increasing ρλ in the B band (due to water reflectance) and decreasing ρλ in the G and R bands; moreover, other factors such as DOM, which typically increases absorption at shorter wavelengths, may not be present as well [89]. The spectra therefore resemble those of other optically dark OWTs, although the brightest of the dark OWTs on average. Algorithms with lower NRMSE use the G-B ratio and the R-N ratio which are commonly used chl-a retrieval metrics [9]. OWT-E q had returned highly similar algorithms albeit with far poorer performance metrics.
OWTs-F h and -G h represented oligotrophic and mesotrophic lakes, where both chl-a and turbidity measurements were low relative to the training data distribution. While the lake surface water chemistry values were low, there was a relatively even distribution of chl-a and turbidity measurements. The best performing algorithms for both OWTs were suited to retrieving chl-a in turbid mixed lakes, with OWT-F h using a G-R ratio and OWT-G h using both B-G and R-N ratios. A G-R ratio was used for chl-a retrieval in turbid lakes for other studies similar to the R-N edge, as both implement a maximal absorption and reflectance peaks for chl-a [30]. When classified using the QDA method, similar algorithm performances were found in OWT-G q in which the best performing algorithm as the same as in -G h , while OWT-E q does suffer from misclassification, particularly with OWT-F h . The misclassification of OWT-E q with -F h may explain the improved performance of OWT-F q , which, as a result, covered a much larger range of chl-a measurements (Table 1), in which higher chl-a often has a stronger observable signal when using Landsat.

Comparison of Global Algorithms to OWTs
Optically bright lakes exhibited unique algorithm performances, while optically dark lakes returned similar performances with the same algorithms ( Figure S1). All OWTs provided unique algorithm performances in comparison to the global models. OWTs consistently had improved retrieval accuracy and lower error (RMSE, NRMSE, RMSLE, and MAE) compared with those of the global algorithms, with the exception of OWT-C h ( Table 3). Instances of global algorithm performance exceeding that of an OWT may also be a result of the following assumptions and methods established within this study. This study used mean ρλ in each lake to identify a singular OWT; however, multiple water types can exist within one lake due to differences in morphology, weather, and land use [47]. The use of a mean ρλ may help in reducing noise in observed ρλ, improving the linear regression at the global level. The use of a mean ρλ, however, may also reduce the capacity for classifiers to define unique spectra, as seen in the optically dark lakes. Lakes-especially large lakes-may represent more than one OWT due to spatial (e.g., multiple lake basins, local point sources of detritus, nutrient, or sediment runoffs; Figure 11) and temporal (e.g., shifts in water chemistry due to precipitation, lake mixing, or algal growth events) factors. Therefore, the separation of OWTs may not provide significant chl-a retrieval performance over that of a global model for lakes that exhibit multiple optical signals prior to outlier removal. When these lakes are placed into OWTs and used in a regression, the variability they introduce is more statistically impactful on the correlation when the sample size is smaller; therefore, the global algorithm is less affected by the variability introduced by this method. Additional variability will also be introduced due to the effects of atmospheric aerosol contribution.
This study made use of simple empirical algorithms such as band ratios and combinations. Bio-optical models [90], such as water colour simulator (WASI) [91], have shown promising results for chl-a retrieval in optically complex waters [92]. However, these physics-based models require knowledge of the absorption and backscatter of IOPs, which were not available in public water quality data records and were, therefore, not employed in this study. Additionally, various bio-optical models require specifically centred bands which are not provided for by Landsat and require spectral calibration using in situ reflectances [93]. Alternative empirical methods such as machine learning, Empirical Orthogonal Function (EOF) analysis, and line-height algorithms options may also provide improvement to chl-a retrieval in optically complex waters [7,90,91,94]. Machine learning methods such as artificial neural networks require significant training data for accurate results [95]. The separation of data into OWTs limits the available training and testing data; therefore, a machine learning approach was not appropriate for this study. EOF is a type of principle component analysis that is not commonly used for chl-a retrieval but has shown potential in some studies [96]. Line-height algorithms typically use chl-a fluorescence peaks at which Landsat bands are not centred. New methods such as colour space transformations have been applied to improve chl-a retrieval [97,98] by converting a multiband RGB to a hyper-hue-saturation-intensity image [99]. While this study looked to improve upon traditional band algorithms, colour space transformation may be an optimal method to use in future studies. Future studies may also look to integrate externally derived OWTs using more refined techniques [47,100] to improve upon OWT identification in Landsat imagery.

Conclusions
There has been an increase in the number of algal bloom reports in lakes, for which remote sensing retrieval of chl-a for small inland waters is needed to develop a predictive understanding of algal bloom occurrence. Landsat provides the largest historical image record of any sensor and has a long history of chl-a retrieval. This study showed that a guided OWT classification system using Landsat normalized ρλ and Chl:T to define OWTs provided significant improvements in chl-a retrieval algorithms. Seven OWTs based on ρλ in Landsat visible-N bands and on Chl:T were able to retrieve chl-a for inland lakes ranging from eutrophic to oligotrophic, and from turbid to clear, with significantly improved accuracy compared to a global chl-a retrieval algorithms. The separation of lakes into OWTs did improve chl-a retrieval, including in areas where algal blooms occur within less frequently studied, but equally important, lakes. Future work should focus on seeing if the performance of Landsat data for identifying OWTs and predicting chl-a could be further improved by improving supervised classification techniques, and by implementing additional water chemistry data that may help in further differentiating distinct OWTs.