Multiple Images Improve Lake CDOM Estimation: Building Better Landsat 8 Empirical Algorithms across Southern Canada

: Coloured dissolved organic matter (CDOM) is an important water property for lake management. Remote sensing using empirical algorithms has been used to estimate CDOM, with previous studies relying on coordinated ﬁeld campaigns that coincided with satellite overpass. However, this requirement reduces the maximum possible sample size for model calibration. New satellites and advances in cloud computing platforms offer opportunities to revisit assumptions about methods used for empirical algorithm calibration. Here, we explore the opportunities and limits of using median values of Landsat 8 satellite images across southern Canada to estimate CDOM. We compare models created using an expansive view of satellite image availability with those emphasizing a tight timing between the date of ﬁeld sampling and the date of satellite overpass. Models trained on median band values from across multiple summer seasons performed better (adjusted R 2 = 0.70, N = 233) than models for which imagery was constrained to a 30-day time window (adjusted R 2 = 0.45). Model ﬁt improved rapidly when incorporating more images, producing a model at a national scale that performed comparably to others found in more limited spatial extents. This research indicated that dense satellite imagery holds new promise for understanding relationships between in situ CDOM and satellite reﬂectance data across large areas.


Introduction
Coloured dissolved organic matter (CDOM) is the optically measurable part of dissolved organic matter in water. It is optically characterized by its spectral absorption coefficient, aCDOM, at a reference wavelength (e.g., 440 nm). It is now understood to be a water property that partially controls the composition and functioning of freshwater ecosystems and regulates their responses to environmental change [1][2][3]. For example, CDOM's effects on aquatic ecology include impacts on: (1) light and thermal regimes that can influence primary productivity and aquatic species habitat [4][5][6][7][8][9][10], (2) the pH and alkalinity of water bodies among other chemical and photochemical processes [11,12], (3) the bioavailability and toxicity of contaminants in water by forming chemical complexes with metals [13,14], and (4) water quality for human use and consumption due to the high cost of purifying water with high CDOM concentrations [5,15,16]. Understanding CDOM levels is therefore important for the monitoring and management of freshwater resources.
There is limited in situ CDOM data collected relative to its importance, even though CDOM is easily measured in the laboratory. In Canada, for example, there is no CDOM data collected by publicly available water-quality programs. In comparison, other optically active water-quality variables are more often routinely sampled, even though CDOM has Research Council of Canada (NSERC) Canadian Lake Pulse Network [47]. We used random forest models to identify meaningful bands and band ratios to use for empirical model development. We then explored the impact of median filtering, comparing models made with images within a short time window of field sampling (30 days) to a series of broader temporal spans. Next, we estimated the optimal number of images used in median filtering to generate the best model fit. Finally, we assessed how the model performed in a separate set of lakes for which there was no clear imagery available within 30 days. The results will be relevant to remote-sensing specialists who want to better understand the key sources of variation in fit between sampled CDOM and future imagery, and for limnologists planning field-sampling strategies that will best use data from Landsat 8.

Study Site
Canada is a lake-rich country with a vast number of lakes distributed around its territory: it is estimated that there are over 900,000 lakes greater than 10 hectares that account for approximately 7% of the surface area of the country [20]. These lakes are distributed throughout the landscape, but not uniformly. Lake formation is influenced by geology and availability of water, and as such there is a higher density of lakes in northern and eastern Canada [20,48]. Many lakes, especially those located in northern Canada, are largely inaccessible and have never been sampled. As such, this study focused on lakes sampled in the southern portion of the country where in situ samples were available for calibration.

In Situ Data Collection
The data for this study was collected by the NSERC Canadian Lake Pulse Network, a research partnership between academia and government designed to provide Canada's first national-scale assessment of lake health [47]. Sampled lakes were selected in a semirandom sampling design with ecozone, human impact index, and lake size as stratifying factors. Sampling occurred during the period of maximum summer lake stratification (late June to mid-September) to eliminate seasonal effects. During these sampling campaigns, the Lake Pulse teams measured approximately 100 water-quality parameters, including CDOM absorption spectra from 664 lakes. Although Lake Pulse field crews sampled lakes across all 10 Canadian provinces and 2 territories (Northwest Territories and Yukon), we only used data that coincided with the ecozones in the southern half of Canada (n = 592). This was firstly because few samples were taken in the far north of Canada, and secondly because surface reflectance imagery for Landsat 8 data is only reliable at latitudes below 65 • [49].
Sampling procedures for field and laboratory analyses followed standard limnological practices [50]. For CDOM, a single sample per lake was collected from the epilimnion. Water for CDOM analysis was filtered through a 0.45 µm filter and stored in the dark at 4 • C in 60 mL Amber Boston Round Glass Bottles until the time of processing in the lab. Before lab analysis, samples were refiltered with 0.2 µm filters. CDOM was determined from absorbance measurements at 440 nm (a 440 ) using a Perkins-Elmer lambda 650 spectrophotometer using the dual-beam mode through 5 or 10 cm quartz cuvettes against a sample of deionised water. Absorbance was converted to Napierian absorption coefficients [51] using: a 440 = 2.303 * Absa 440 /l, where a 440 is the absorption coefficient, 2.303 is the natural logarithm of 10, Absa 440 is absorbance, and l is the cell path length of the cuvette (m). Additionally, the a CDOM spectrums were corrected for temperature differences between the deionized water and lake samples [52]. CDOM values are reported as a 440 (m −1 ).

Image Collection and Processing
We used Google Earth Engine (GEE) [53] to obtain Landsat 8 Tier 1 Surface Reflectance images from the U.S. Geological Survey (USGS) [49]. These images have been atmospherically corrected using the Land Surface Reflectance Code (LaSRC) [25]. The LaSRC algorithm is optimized for land rather than aquatic systems, but the product is nonetheless successfully used for lake-monitoring applications for the estimation of CDOM [17,54,55].
We extracted mean surface reflectance values from a 50 m buffer around each sample location using images with cloud cover ≤30%. Specifically, we extracted data in bands B1-B5 (ultra-blue, blue, green, red, near-infrared). We filtered the image data for each of the extracted buffer zones using the bit quality assessment (BQA) band [56,57] in order to remove cloud and cloud shadow. The image data for each of the extracted buffer zones were also filtered so that only images with a BQA band value of 324, signaling cloud-free water, were included. The remaining data were then filtered to remove any null values that occurred due to cloud and cloud shadow masking.
Of the lakes sampled for CDOM between 2017 and 2019 by the NSERC Canadian Lake Pulse Network, we identified those that were at least 10 ha large with observations 20 m or greater from shore (when including the 50 m buffer). For 329 of the lakes, we identified clear summer imagery within 30 days of the sample date (between 20 June-20 September). Of the 329 lakes, we randomly identified 233 of these lake samples for model calibration, leaving 96 for model validation (Figure 1). Calibration and validation lakes were distributed throughout the southern half of Canada and had a similar range of CDOM(a 440 ). For the calibration dataset, CDOM(a 440 ) values ranged between 0.05 and 11.5 m −1 , whereas for the validation dataset, values ranged between 0.06 and 10.5 m −1 . An additional 128 lakes were sampled by Pulse that did not have imagery within 30 days of the sample date. They were retained for subsequent use as an illustration of how this approach can be extended to larger numbers of lakes than in other approaches using narrow time windows. These lakes had estimated average lake depths that ranged from 1-143 m and estimated size that ranged from 0.1-964 km 2 [20].

Predictor Selection
We began by testing multiple time windows using a unique image. We tested time windows of ±1, ±3, ±7, and ±30 days. For these narrower time windows, there were very few sample points available for algorithm development and testing (17,50, and 101 lakes, respectively), which would not be applicable to a continental scale approach. Of the 592 NSERC Canadian Lake Pulse Network lakes, 329 (233 calibration, 96 validation) were sampled within a 30-day window of a clear image. We used these lakes as the basis to explore the effect of incorporating more imagery into the analysis, while leaving other factors fixed.
We used the random forest technique to select relevant variables using image data from ± 30 days of the in situ collection date (Thirty-day Window Nearest). We created random forests of potential explanatory band combinations using the VSURF package in R [58,59] and assessed the most significant combinations. We used the calibration dataset of measured CDOM(a 440 ) values as the dependent variable, and the surface reflectance values for Landsat 8 OLI bands B1-B5 and band ratio permutations from the Same Month Single Image temporal window as the independent variables (26 terms total). We used variables identified by the "prediction step" [58] and identified the pair of terms that produced the highest adjusted R 2 and lowest root mean squared error (RMSE) to develop the models. To ensure that there were no spatial boundaries that needed to be considered for the algorithm calibration process, we mapped and graphed residuals to identify any spatial clustering of extreme residual values.

Predictor Selection
We began by testing multiple time windows using a unique image. We tested windows of ±1, ±3, ±7, and ±30 days. For these narrower time windows, there were few sample points available for algorithm development and testing (17,50, and 101 l respectively), which would not be applicable to a continental scale approach. Of th NSERC Canadian Lake Pulse Network lakes, 329 (233 calibration, 96 validation) sampled within a 30-day window of a clear image. We used these lakes as the ba explore the effect of incorporating more imagery into the analysis, while leaving factors fixed.
We used the random forest technique to select relevant variables using image from ± 30 days of the in situ collection date (Thirty-day Window Nearest). We cr random forests of potential explanatory band combinations using the VSURF packa R [58,59] and assessed the most significant combinations. We used the calibration da

Time Windows
For each lake in the calibration and validation datasets, there was a substantial number of images in the Google Earth Engine catalog that could potentially improve the explanatory power of imagery for estimating CDOM in Canadian lakes. For each lake in the calibration and validation datasets, we identified images that fit the following time constraints: • For both the calibration and validation datasets, we compared the model fit for the various time windows using model adjusted R 2 , RMSE, which is the square root of the mean of squares of all errors [60]; mean absolute error (MAE), which is the mean of the absolute values of the individual prediction errors [61]; and statistical bias [62], which computes the average amount of error using the metrics package [62] in the R software [63].
where CDOM(a 440 ), sensor is using Landsat 8 SR data. MAE= 0 equals a perfect fit.

Effects of Adding Imagery
To understand the potential effect of increasing the amount of imagery considered when creating a regression relationship, we identified 118 (of 233) lakes from the calibration dataset that had 20 or more clear summer images over the 2013 to 2019 period. We iteratively selected images from this 20-image pool, without regard for the proximity of the images to the sample collection date and used them to create regressions using the same variables (B3/B4, B2) as those used for the "Thirty-day Window Nearest" progression. We then determined the asymptote of the distribution by using the Stats package [63] to implement the nls() function.

Extra Lakes Considered when the Time Window Was Expanded
We included lakes that would not have been considered in studies limited to only a single tight timeframe for image selection. These were 128 lakes for which there were no clear image matches within 30 days of in situ sampling and satellite overpass; CDOM(a 440 ) values ranged between 0.03 and 10.25 m −1 (Figure 1). We assessed model performance for two time windows: (1) Same Summer Median, and (2) All Summers Median.

Overall Workflow
The overall workflow of the methods section can be seen in Figure 2. Landsat 8 surface reflectance imagery was accessed in GEE, and band values for B1-B5 were extracted. These band values for a single clear image within a 30-day window were used in conjunction with the in situ CDOM(a 440 ) samples for the random forest using VSURF for band selection. At this stage we examined how using various time windows using a median filter impacted the results. We then explored the potential effect of increasing the amount of imagery using the median filter. Finally, we looked at an extra set of lakes in order to see how the model performed in lakes that had neither been used for model calibration or validation. tion with the in situ CDOM(a440) samples for the random forest using VSURF for band selection. At this stage we examined how using various time windows using a median filter impacted the results. We then explored the potential effect of increasing the amount of imagery using the median filter. Finally, we looked at an extra set of lakes in order to see how the model performed in lakes that had neither been used for model calibration or validation.

CDOM Model Results
Based on the results of the random forest using VSURF, we identified the best twotermed model as: where ln(CDOM(a440)) is the natural logarithm of the CDOM absorbance coefficient measured at 440 nm for a given sample location; coefficients a, b, and c were fit to the calibration data by regression analysis; and B3, B4, and B2 represent Landsat 8′s green, red, and blue bands, respectively. The first term (B3/B4) explained most of the variance observed, and was improved slightly by including the second term (B2) ( Figure S1). Random Forest runs on data with other time windows produced similar variable choices.

Using More Than One Image for Model Development
Given the results of our assessment of spatial variability in regression coefficients, we created a global regression model with the "Thirty-day Window Nearest" time window (ln(CDOM(a440)) = 3.65 − 2.91 * ln(B3/B4) − 0.41 * ln(B2), p-value < 2.2 * 10 −16 , adjusted R 2 = 0.47). Models built from very narrow time windows, which included only lakes with satellite overpass within one, three, or seven days, covered dramatically fewer lakes and

CDOM Model Results
Based on the results of the random forest using VSURF, we identified the best twotermed model as: where ln(CDOM(a 440 )) is the natural logarithm of the CDOM absorbance coefficient measured at 440 nm for a given sample location; coefficients a, b, and c were fit to the calibration data by regression analysis; and B3, B4, and B2 represent Landsat 8's green, red, and blue bands, respectively. The first term (B3/B4) explained most of the variance observed, and was improved slightly by including the second term (B2) ( Figure S1). Random Forest runs on data with other time windows produced similar variable choices.

Using More Than One Image for Model Development
Given the results of our assessment of spatial variability in regression coefficients, we created a global regression model with the "Thirty-day Window Nearest" time window (ln(CDOM(a 440 )) = 3.65 − 2.91 * ln(B3/B4) − 0.41 * ln(B2), p-value < 2.2 * 10 −16 , adjusted R 2 = 0.47). Models built from very narrow time windows, which included only lakes with satellite overpass within one, three, or seven days, covered dramatically fewer lakes and offered no improvement in accuracy metrics from the 30-day window model made from a single image (Table S1). We mapped the standardised regression residuals to assess if there were any spatial patterns in statistically significant residual values ( Figure 3). Most standardised regression residuals fell within 1.96 standard deviations of the mean for this study, with little apparent spatial pattern in how positive and negative they were throughout the country. offered no improvement in accuracy metrics from the 30-day window model made from a single image (Table S1). We mapped the standardised regression residuals to assess if there were any spatial patterns in statistically significant residual values (Figure 3). Most standardised regression residuals fell within 1.96 standard deviations of the mean for this study, with little apparent spatial pattern in how positive and negative they were throughout the country. Because our spatial assessment indicated no clear geographical boundaries for algorithm coefficients, we proceeded using a global algorithm using all of the data to test three methods for estimating the satellite's response for model calibration. Calibration regression models for the median filters for the four time window lengths are shown in Table 1. Model fit significantly improved when increasing data input by taking median image values from one month (adjusted R 2 = 0.46) to same year (adjusted R 2 = 0.63), and then again to input from all years (adjusted R 2 = 0.70), with the predictions tightening around the 1:1 line (Figure 4). RMSE also decreased significantly with increasing time window sizes for each of the median filters. Model coefficients (slope) also decreased with larger time window size when taking median image values, with low CDOM(a440) values tending to be overestimated and larger CDOM(a440) values underestimated in models using smaller time windows (Figure 4). Because our spatial assessment indicated no clear geographical boundaries for algorithm coefficients, we proceeded using a global algorithm using all of the data to test three methods for estimating the satellite's response for model calibration. Calibration regression models for the median filters for the four time window lengths are shown in Table 1. Model fit significantly improved when increasing data input by taking median image values from one month (adjusted R 2 = 0.46) to same year (adjusted R 2 = 0.63), and then again to input from all years (adjusted R 2 = 0.70), with the predictions tightening around the 1:1 line (Figure 4). RMSE also decreased significantly with increasing time window sizes for each of the median filters. Model coefficients (slope) also decreased with larger time window size when taking median image values, with low CDOM(a 440 ) values tending to be overestimated and larger CDOM(a 440 ) values underestimated in models using smaller time windows (Figure 4).

Model Validation
As in the calibration results, the quality of the CDOM(a440) prediction in the validation lake set grew quickly with more imagery (Figure 5). For each time window, the fit in the

Model Validation
As in the calibration results, the quality of the CDOM(a 440 ) prediction in the validation lake set grew quickly with more imagery (Figure 5). For each time window, the fit in the validation set was quite similar to that in the calibration set-e.g., adjusted R 2 = 0.70 for the all-images model in the calibration set versus adjusted R 2 = 0.66 in the validation set. The results showed ( Figure 5) that the models effectively predicted values of CDOM > ln(0) (CDOM > 1) for the various time windows, while low values of CDOM(a 440 ) were consistently overestimated. This overestimation was somewhat reduced in models using data from all years. validation set was quite similar to that in the calibration set-e.g., adjusted R 2 = 0.70 for the all-images model in the calibration set versus adjusted R 2 = 0.66 in the validation set. The results showed ( Figure 5) that the models effectively predicted values of CDOM > ln(0) (CDOM > 1) for the various time windows, while low values of CDOM(a440) were consistently overestimated. This overestimation was somewhat reduced in models using data from all years.

Effects of Adding Imagery
Increased amounts of imagery improved the ability to predict field-gathered CDOM(a440) values ( Figure 6). In 118 of the 233 calibration lakes across Canada for which at least 20 clear images were available, better and more consistent regression models were formed as more imagery was added. Considering the same set of 118 lakes, selecting more images from any summer date in the Landsat 8 record systematically improved the regression model between imagery and CDOM ( Figure 6). Using a carefully curated single image nearest in time to the sample date, the model had adjusted R 2 = 0.45; that model could be considerably improved in the imagery-rich lake subset by selecting any three summer images and taking the median ratio, raising the expected adjusted R 2 to about

Effects of Adding Imagery
Increased amounts of imagery improved the ability to predict field-gathered CDOM(a 440 ) values ( Figure 6). In 118 of the 233 calibration lakes across Canada for which at least 20 clear images were available, better and more consistent regression models were formed as more imagery was added. Considering the same set of 118 lakes, selecting more images from any summer date in the Landsat 8 record systematically improved the regression model between imagery and CDOM ( Figure 6). Using a carefully curated single image nearest in time to the sample date, the model had adjusted R 2 = 0.45; that model could be considerably improved in the imagery-rich lake subset by selecting any three summer images and taking the median ratio, raising the expected adjusted R 2 to about 0.57. Notably, models formed by taking any three randomly selected images were all better than the model with one carefully curated image. The effect of including more imagery leveled off around 12 images, with an expected upper bound asymptote near an adjusted R 2 of 0.74 ( Figure 6). 0.57. Notably, models formed by taking any three randomly selected images were all better than the model with one carefully curated image. The effect of including more imagery leveled off around 12 images, with an expected upper bound asymptote near an adjusted R 2 of 0.74 ( Figure 6). Figure 6. Overall adjusted R 2 over number of images used for CDOM(a440) estimation. Grey points represent adjusted R 2 values for a given number of images; violin plots show overall densities of the data for a given number of images; the dashed line represents the estimated asymptote; the black star represents adjusted R 2 using a single image closest in time to when in situ sampling took place.

Extra Lakes Considered When the Time Window Was Expanded
In such models that do not require tight timing of field and satellite data, substantially more lakes can be included in a research study. In this dataset, 128 lakes had no clear image within a 30-day window, which would exclude them from model calibration/validation in other approaches. Using the median of band ratios from the rest of the Landsat 8 satellite record, these 128 lakes broadly fit the pattern of the calibration and validation lakes, clustering around the 1:1 line-though with somewhat more scatter in the set of extra lakes (Figure 7). The scatter seemed to be greater for low values of CDOM(a440) in particular. However, by including these lakes, we were able to increase the number of lakes under consideration from 329 to 457, equating to a nearly 40% increase. This demonstrated that future studies could include lakes without matches within a 30-day window for model calibration and validation. Figure 6. Overall adjusted R 2 over number of images used for CDOM(a 440 ) estimation. Grey points represent adjusted R 2 values for a given number of images; violin plots show overall densities of the data for a given number of images; the dashed line represents the estimated asymptote; the black star represents adjusted R 2 using a single image closest in time to when in situ sampling took place.

Extra Lakes Considered When the Time Window Was Expanded
In such models that do not require tight timing of field and satellite data, substantially more lakes can be included in a research study. In this dataset, 128 lakes had no clear image within a 30-day window, which would exclude them from model calibration/validation in other approaches. Using the median of band ratios from the rest of the Landsat 8 satellite record, these 128 lakes broadly fit the pattern of the calibration and validation lakes, clustering around the 1:1 line-though with somewhat more scatter in the set of extra lakes (Figure 7). The scatter seemed to be greater for low values of CDOM(a 440 ) in particular. However, by including these lakes, we were able to increase the number of lakes under consideration from 329 to 457, equating to a nearly 40% increase. This demonstrated that future studies could include lakes without matches within a 30-day window for model calibration and validation. Remote Sens. 2021, 13, x FOR PEER REVIEW 12 of 17

Discussion
This study explored the creation of models for predicting field-sampled CDOM in a fixed set of lakes. Gathering and incorporating information from more Landsat 8 images improved our ability to estimate CDOM levels in hundreds of lakes. What was found was that, for a large number of varied lakes across Canada, the median of multiple images was a powerful and robust noise-dampening approach to model development. We were surprised at the strength of adding more images for a large number of lakes across a large portion of southern Canada. In effect, one could select any four summer images and be virtually guaranteed of a model that was superior to one created from a single, curated image near the field sample date. The use of an ensemble of images, even an ensemble of modest size, greatly aided CDOM estimation across a very large area.
Given the time, effort, and expense of gathering field data for CDOM model fits, and the relatively small proportion of Canada's nearly 1 million lakes larger than 10 ha [20] that have been sampled, field data is a rare resource, and its use should be maximized. In addition to improving the fit of CDOM models, the median method expands the number of lakes that can be considered. Without the constraint of matching cloud-free imagery to a costly field campaign, we were able to considerably expand the scope of analysis. An effective fit in a large number of lakes at a continental scale is especially useful given lingering challenges to transferring or extrapolating models made from a small number of lakes [30,55,64]. In that light, the predictive power of the preferred model in this study can be considered good (adjusted R 2 = 0.70, RMSE = 0.54 for the "All Summers Median" model). This is especially the case for CDOM, which is a challenging water property to estimate at broad scales in inland waters [26,65,66].
We found no spatial pattern in residuals of the model fit, indicating that there were no spatial boundaries that needed to be respected for the algorithm calibration process. This was unexpected, given the expanse of the study area and the wide range of lake types encompassed in the study. In previous studies, algorithms are usually locally calibrated either for individual lakes or regions of similar lakes [23,31,43,67] to consider the variability of lake types encompassed in a study. Therefore, algorithm calibration has mainly been tested across a more limited range of lake optical types. However, Deutsch et al. [35] similarly found that a single global algorithm could be used to estimate Secchi disk depth for

Discussion
This study explored the creation of models for predicting field-sampled CDOM in a fixed set of lakes. Gathering and incorporating information from more Landsat 8 images improved our ability to estimate CDOM levels in hundreds of lakes. What was found was that, for a large number of varied lakes across Canada, the median of multiple images was a powerful and robust noise-dampening approach to model development. We were surprised at the strength of adding more images for a large number of lakes across a large portion of southern Canada. In effect, one could select any four summer images and be virtually guaranteed of a model that was superior to one created from a single, curated image near the field sample date. The use of an ensemble of images, even an ensemble of modest size, greatly aided CDOM estimation across a very large area.
Given the time, effort, and expense of gathering field data for CDOM model fits, and the relatively small proportion of Canada's nearly 1 million lakes larger than 10 ha [20] that have been sampled, field data is a rare resource, and its use should be maximized. In addition to improving the fit of CDOM models, the median method expands the number of lakes that can be considered. Without the constraint of matching cloud-free imagery to a costly field campaign, we were able to considerably expand the scope of analysis. An effective fit in a large number of lakes at a continental scale is especially useful given lingering challenges to transferring or extrapolating models made from a small number of lakes [30,55,64]. In that light, the predictive power of the preferred model in this study can be considered good (adjusted R 2 = 0.70, RMSE = 0.54 for the "All Summers Median" model). This is especially the case for CDOM, which is a challenging water property to estimate at broad scales in inland waters [26,65,66].
We found no spatial pattern in residuals of the model fit, indicating that there were no spatial boundaries that needed to be respected for the algorithm calibration process. This was unexpected, given the expanse of the study area and the wide range of lake types encompassed in the study. In previous studies, algorithms are usually locally calibrated either for individual lakes or regions of similar lakes [23,31,43,67] to consider the variability of lake types encompassed in a study. Therefore, algorithm calibration has mainly been tested across a more limited range of lake optical types. However, Deutsch et al. [35] similarly found that a single global algorithm could be used to estimate Secchi disk depth for a large dataset of lakes spread across southern Canada. However, note that both the calibration and validation datasets lacked high CDOM lakes (>11.5 and 10.5 m −1 , respectively). Therefore, models developed in this study are applicable only to lakes with low to moderate CDOM levels. Additionally, the model proposed tended to overpredict CDOM(a 440 ) values in lakes with CDOM(a 440 ) < 1 m −1 , but this trend was partially mitigated using the "All Summers Median" time window. For future research concerned with pinpointing CDOM concentration in low-CDOM lakes, additional band ratios or other information could be considered.
In this study, we found that taking median satellite image values using increasingly wide time windows consistently resulted in improved algorithm fit (Figure 3). One likely reason we saw this improvement is that repeat satellite data over a given location is inherently variable due to atmospheric influences [68]. By adding more data to the analysis and assessing median band and band ratio data, it is possible that we removed atmospheric artifacts that were not already filtered out through atmospheric correction and data cleaning steps. Therefore, it is possible that taking median satellite image values over wider time windows helped us better observe the true Landsat 8 B3/B4 and B2 surface reflectance over a given water pixel.
We expect that the effectiveness of using the median will be strongest when image-toimage variability is greater than in-lake and among-lake CDOM variability. Each of the remote lakes in this study was visited only once, with a sample taken at a single point in each lake, making it difficult to quantify in-lake CDOM temporal variability. Because the method averages image values from a potentially long period of time, it is not intended to track changes in CDOM over short time scales. That said, many studies have found CDOM to be relatively stable when examining within-season timescales [26,28,69,70]. Other works have found that, at least in some lakes, CDOM can fluctuate over short time windows [17,71]. At decadal time scales, CDOM and related DOC levels have been found to be increasing in freshwaters across boreal and temperate regions in North America and Europe [1,2,72], which has been linked to the decrease of acid rain as a result of the emissions regulations that have been put in place in Europe and North America [2,73]. However, for the specific set of lakes in this study that were sampled from June until September, this variability was apparently not sufficient to limit the effectiveness of median satellite image values over time windows as large as seven years (2013-2019). It is possible that this method would not be applicable during other seasons of the year, when water conditions within lakes may change more rapidly [26].
Despite the apparent dampening effect of the median of a group of images, there were still sources of error and uncertainty that could limit performance in results seen here. First, we did not actively seek out region-specific models at this stage; it is possible that regional models could have improved the fit, at least for some lakes. Second, there may well be other sources of colour than CDOM, particularly in more eutrophic conditions. This might account for some of the variability in higher-CDOM conditions, and might be separately treated in future work. Third, for any given lake, there are additional sources of variability not explored here: CDOM concentration changes throughout a summer and across years, and may vary substantially in different parts of a given lake. Finally, though we tried to create a model that could be used to make broad estimates in a wide range of Canadian lakes, it is worth noting explicitly that only a small fraction of Canada's vast lake resource was sampled; the model may not be applicable to all lakes in southern Canada.
Future directions for this work could explore how using colour space transformations and optical water properties might improve CDOM(a 440 ) estimates. Transformations to hue-saturation-intensity (HSI) is a simple colour transformation that can be used to develop colour-based image-processing techniques. Niroumand-Jadidi et al. [74] found that colour space transformations showed benefits for estimating water-quality parameters including CDOM. Another possible direction to improve water-quality estimates would be to develop algorithms specifically based on the optical properties of a given waterbody. Various hierarchical, partitional, and hybrid clustering techniques have been used to successfully classify remote sensing reflectance (Rrs) into groups [75][76][77]. Since the NSERC Canadian Lake Pulse Network collected data on various optical water properties as well as in situ reflectance data, classification based on optical types could be explored. Specifically, total suspended sediments (TSS); algal biomass, measured as chlorophyll a (chl-a); and coloured dissolved organic matter (CDOM) data were collected.
Future work could also add other satellite sensors such as Sentinel-2 and Landsat 9, scheduled for launch in autumn 2021. Water-quality parameters have already been successfully estimated using workflows that harmonised Landsat 8 OLI and Sentinel-2 MSI imagery products in Google Earth Engine [17,42], suggesting possible joint Landsat/Sentinel models. By employing harmonised imagery from multiple sensors, revisit time would be reduced, which could considerably increase the chances of obtaining a high number of clear summer images.

Conclusions
While creating high-quality models of coloured dissolved organic matter (CDOM) across a lake-rich region, this study demonstrated the benefit, at least in these lakes, of adding additional imagery to sharpen the signal derived from remote sensing. Additionally, this study effectively expanded the number of lakes that were available for consideration: using a larger set of freely available imagery, we were able to make plausible estimates of CDOM in 40% more lakes than if we had used narrower time windows. As we consider projecting results across very large areas, such as continental scales as seen in the study, the ability to consolidate information from a variety of dates provides an opportunity that is lacking if we were to only use data from a single narrowly timed image.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/rs13183615/s1, Figure S1. Variable importance as identified by random forest using VSURF; Table S1. No meaningful improvement in model fit was seen when narrower time windows were considered. Data Availability Statement: Data from the NSERC Canadian Lake Pulse Network are available from the corresponding author only after permissions have been granted from the Lake Pulse Network. With Network permission, graph data, as well as a non-log-transformed view of the scatterplot, are available from the authors.