Spatiotemporal Variability in Phytoplankton Bloom Phenology in Eastern Canadian Lakes Related to Physiographic, Morphologic, and Climatic Drivers

Phytoplankton bloom monitoring in freshwaters is a challenging task, particularly when biomass is dominated by buoyant cyanobacterial communities that present complex spatiotemporal patterns. Increases in bloom frequency or intensity and their earlier onset in spring were shown to be linked to multiple anthropogenic disturbances, including climate change. The aim of the present study was to describe the phenology of phytoplankton blooms and its potential link with morphological, physiographic, anthropogenic, and climatic characteristics of the lakes and their watershed. The spatiotemporal dynamics of near-surface blooms were studied on 580 lakes in southern Quebec (Eastern Canada) over a 17-year period by analyzing chlorophyll-a concentrations gathered from MODIS (Moderate Resolution Imaging Spectroradiometer) satellite images. Results show a significant increase by 23% in bloom frequency across all studied lakes between 2000 and 2016. The first blooms of the year appeared increasingly early over this period but only by 3 days (median date changing from 6 June to 3 June). Results also indicate that high biomass values are often reached, but the problem is seldom extended to the entire lake surface. The canonical correlation analysis between phenological variables and environmental variables shows that higher frequency and intensity of phytoplankton blooms and earlier onset date occurred for smaller watersheds and higher degree-days, lake surface area, and proportion of urban zones. This study provides a regional picture of lake trophic state over a wide variety of lacustrine environments in Quebec, a detailed phenology allowing to go beyond local biomass assessments, and the first steps on the development of an approach exploiting regional trends for local pattern assessments.


Introduction
The marked increase in freshwater algal blooms is of major interest to governments and public health agencies responsible for maintaining the ecological services provided by these systems. Cyanobacteria threaten the ecological integrity of some of the world's most important lake environments, including Lake Erie [1], Lake Ontario [2], Lake Taihu [3], Lake Okeechobee [4], and Lake Victoria [5]. Their increasing frequency and sustained presence affect the structure and functioning of aquatic food webs [6], limit recreational activities [7], and threaten drinking water sources [8]. Monitoring phytoplankton blooms remains difficult and onerous, particularly because their spatial and temporal distribution is highly variable when dominated by buoyant cyanobacteria [9,10]. A database of historical Chl-a concentrations was generated from satellite data acquired by the MODIS sensor (bands 1-7) located on NASA's Earth Observation System Terra platform, at a temporal frequency of 1 day. The spatial resolution of the bands 3-7 was refined from 500 m to 250 m using a spatial resolution downscaling approach developed by Trishchenko et al. [47]. This approach was validated using high resolution spatial data (Landsat ETM+ at 30 m) demonstrating that the radiometric properties of the downscaled bands were not altered. Image pre-treatment, including downscaling, projection, and atmospheric correction, was achieved using an automated procedure developed by the Canadian Center for Remote Sensing [47]. An estimation algorithm for Chl-a concentrations based on ensemble methods [48] was then applied to all MODIS images extracted. This algorithm was specifically developed for inland waters and performed well on databases with Chl-a ranging from low to high concentrations (see Appendix A). In order to reduce the uncertainty related to the heterogeneous elements on lakeshores (e.g., build environments, lake bottom), mixed pixels (water-land boundary) were removed over a 250 m band by applying a ground mask. This procedure generated a composite image formed by the minima of reflectance in the near infrared on images captured between May and October of 2000 to 2016. This avoids interference by macrophytes, since the reflectance of water pixels with phytoplankton will be inferior to that of the pixels occupied by macrophytes, which are more permanent elements in the littoral zone during the open-water season. In order to have enough pixels to adequately represent each water body, only lakes with a minimum area of 3.5 km 2 were considered for this study. Areas affected by haze or cloud cover were then removed using a cloud mask [49] specifically developed for inland waters (lakes, rivers, estuaries). Only lakes having less than 25% cloud cover for a given image were selected. Overall, Chl-a concentrations were extracted from 580 lakes in southern Quebec between May and October of the years 2000 to 2016, for a total of 1572 images.
An operational biomass threshold of 10 µg Chl-a L −1 was chosen to define the onset of a bloom and characterize its phenology on the studied lakes. It corresponds to the lower end of the eutrophic lake class (8-25 µg L −1 ) and to the decision threshold for recreational water level 1 as established by A database of historical Chl-a concentrations was generated from satellite data acquired by the MODIS sensor (bands 1-7) located on NASA's Earth Observation System Terra platform, at a temporal frequency of 1 day. The spatial resolution of the bands 3-7 was refined from 500 m to 250 m using a spatial resolution downscaling approach developed by Trishchenko et al. [47]. This approach was validated using high resolution spatial data (Landsat ETM+ at 30 m) demonstrating that the radiometric properties of the downscaled bands were not altered. Image pre-treatment, including downscaling, projection, and atmospheric correction, was achieved using an automated procedure developed by the Canadian Center for Remote Sensing [47]. An estimation algorithm for Chl-a concentrations based on ensemble methods [48] was then applied to all MODIS images extracted. This algorithm was specifically developed for inland waters and performed well on databases with Chl-a ranging from low to high concentrations (see Appendix A). In order to reduce the uncertainty related to the heterogeneous elements on lakeshores (e.g., build environments, lake bottom), mixed pixels (water-land boundary) were removed over a 250 m band by applying a ground mask. This procedure generated a composite image formed by the minima of reflectance in the near infrared on images captured between May and October of 2000 to 2016. This avoids interference by macrophytes, since the reflectance of water pixels with phytoplankton will be inferior to that of the pixels occupied by macrophytes, which are more permanent elements in the littoral zone during the open-water season. In order to have enough pixels to adequately represent each water body, only lakes with a minimum area of 3.5 km 2 were considered for this study. Areas affected by haze or cloud cover were then removed using a cloud mask [49] specifically developed for inland waters (lakes, rivers, estuaries). Only lakes having less than 25% cloud cover for a given image were selected. Overall, Chl-a concentrations were extracted from 580 lakes in southern Quebec between May and October of the years 2000 to 2016, for a total of 1572 images.
An operational biomass threshold of 10 µg Chl-a L −1 was chosen to define the onset of a bloom and characterize its phenology on the studied lakes. It corresponds to the lower end of the eutrophic lake class (8-25 µg L −1 ) and to the decision threshold for recreational water level 1 as established by the World Health Organization [50] and the government of Quebec [51]. When level 1 threshold is exceeded and biomass becomes dominated by cyanobacteria, it is considered a risk of minor health effects (irritation and allergies). This threshold is already used by watershed-based organizations, water management officers and municipalities.
For each studied year, phenological variables were established as follows: (1) the frequency, which is the number of days when Chl-a concentrations remained above the threshold, (2) the intensity, which is the maximum concentration of Chl-a detected during a bloom, (3) the relative area, which is the maximum area occupied by a bloom normalized by the lake area, (4) the onset date, and (5) the end date, which are the first and last days of the year when a bloom is detected, and (6) the duration, which is the number of days between the onset date and the end date. Determination of the end date and duration of blooms is challenging because the studied region is frequently covered by clouds during the fall, significantly reducing the number of MODIS images available for this period. This is especially true during the month of October, for which there was on average half as many MODIS images without full cloud cover than for months between May and September. Therefore, end date and duration were discarded from the study. The retained variables describe what can be considered as an annual-based phenology, compiling days with less than 25% cloud cover and for which remotely-sensed Chl-a was above the established threshold, for any given pixel.
A geo-referenced database of the morphological, physiographic, and climatic characteristics of the watershed of each studied lake was established. The boundaries and morphological descriptors of the watersheds (area, slope) were calculated from the Canadian Digital Elevation Model [52] with a spatial resolution of approximately 30 m. Climate data from North American Regional Reanalyses [53], with a spatial resolution of approximately 32 km, were used. The cumulative degree-days ( • C day) were calculated by summing the recorded degrees ( • C) each day above 20 • C, a value that has been considered a threshold for cyanobacterial growth [54,55]. Even though the remote sensing approach used here is not specific to cyanobacterial biomass, this climate proxy is considered valid. Land use data (at 40 m spatial resolution) and agricultural and ecumene data (at 25 m spatial resolution) were provided by Natural Resources Canada [56,57]. The environmental indicators were considered stationary over the period 2000-2016. A total of 27 environmental variables were extracted for each lake and their watershed. The variables with the highest correlation to phytoplankton bloom phenology were selected (see Section 3.2) for statistical analysis (Table 1). The spatial variability in phenological data is presented according to the latitude or longitude of the concerned lakes, and was statistically tested using the Kruskal and Wallis [58] test by randomly generating a subsample of lakes (by longitude or latitude) to ensure the independence of variables (runs test [59]). Temporal trend tests were conducted on median and annual extreme values (5th or 95th percentile; normality test [60]). Given the size and diversity of the generated data sets and the complexity of the interactions governing bloom phenology, analyses were carried out using canonical methods. The aim was to explore all possible correlations between phenological and environmental variables without using one group of variables to justify the other. The canonical correlation analysis (CCA) allows to simultaneously analyze two groups of variables by quantifying their association. The objects (lakes) under study are described by two sets of quantitative descriptors: the first set X 1 of p phenological descriptors, and the second set X 2 of q environmental descriptors. Linear combinations U i = A i X 1 and V i = B i X 2 (A i and B i are parameters; i = 1, . . . , K) of each set of descriptors are calculated in such a way that the canonical correlation between U i and V i is the highest possible. The first pair of canonical variables i = 1 is the pair of linear combinations U 1 and V 1 , which maximizes the equation's correlation. The second pair of canonical variables i = 2 is the pair of linear combinations U 2 and V 2 which maximizes the correlation of this equation and which is not correlated with the first pair of canonical variables. This process is repeated until K pairs of canonical variables are obtained, such as K = min (p, q). The significance of canonical correlations was tested using Bartlett's approximate chi-squared statistic [61,62]. The interpretation of canonical variables was based on the identification of: (1) standardized canonical coefficients A i and B i , (2) structure coefficients R (U i ,X 1p ) and R (V i , X 2q ) , and (3) canonical communality coefficients h 2 . The contribution of original variables to a given canonical variables was estimated from standardized canonical coefficients A i and B i . These weights are generated to maximize the canonical correlation R i , and are thus similar to the weights of a regression. Standardized coefficients assess the importance of one variable in relation to the others, and thus reflect their contribution to the canonical correlation. Structure coefficients were also calculated to evaluate the importance of a given unrelated variable. These coefficients correspond to the correlations between the original variables and the canonical variables R. The correlations (when squared) indicate the proportion of variance linearly shared by a given original variable with the canonical variable. Note that R correlations are not affected by the standardization of the original variables. Finally, the canonical communality coefficients (h 2 ) correspond to the sum of the squared R of all the canonical variables interpreted in the analysis. They provide information on the proportion of variance of a variable that is explained by the set of canonical variables used in the analysis. Variables with low values (<45%) are generally omitted from the analysis [63]. Although multicolinearity between variables does not present any analytical difficulties when using a CCA, it can complicate the interpretation of the results by blurring the origin of the observed effects [64]. The combined use of standardized and structure coefficients is therefore recommended since the latter are not affected by multicollinearity, and informs us on the potential contribution of the observed variables to the development of canonical variables [65]. Satellite data treatment and statistical analyses were computed using Matlab software (R2018b). Figure 2 shows the distribution histograms of the phenological variables. The median frequency of blooms was 15 days per year for all the lakes and years considered (Figure 2a). Among all lakes, the median intensity (maximum Chl-a concentration for a given lake) was 110 µg L −1 (2160 µg L −1 for the 95th percentile; Figure 2b), and the median surface area (maximum relative surface area) was 19% (Figure 2c). The distribution shape is similar across phenological variables and shows a left-sided asymmetry. However, the shape of the onset date distribution is left truncated (Figure 2d). This is because mapping of Chl-a using satellite imagery does not begin until mid-May of each year to ensure that the lakes further north are completely thawed out. Thus, the median date of the first algal bloom is 4 June on the available dataset. This date could be slightly earlier with a more complete distribution of this phenological variable (i.e., including lakes that thawed before 15 May).

Frequency of Blooms
Bloom frequency shows a clear spatial pattern across the entire studied territory (Figure 4a,b). Phytoplankton blooms were more frequent west of the area, and this frequency tended to drop systematically from west to east (Kruskal-Wallis test (KW): = 934; p < 0.001). A systematic increase was also apparent from north to south (KW: = 433; p < 0.001). Hence, regions located in southwest Quebec had much more frequent blooms. In terms of temporal patterns, the median number of days with blooms increased by 23% between 2000 and 2016 (significant at the 10% threshold, p = 0.091; Figure 4). There was also a significant increase in high bloom frequencies; for example, the 95th percentile of the number of days with blooms increased by 24% between 2000 and 2016 (significant at the 5% threshold, p = 0.043; not presented). Conversely, smaller occurrence frequencies (5th percentile) did not show any significant trend over time. This generally concerns lakes located in underdeveloped regions, northeast of the studied area.

Frequency of Blooms
Bloom frequency shows a clear spatial pattern across the entire studied territory (Figure 4a,b). Phytoplankton blooms were more frequent west of the area, and this frequency tended to drop systematically from west to east (Kruskal-Wallis test (KW): χ 2 = 934; p < 0.001). A systematic increase was also apparent from north to south (KW: χ 2 = 433; p < 0.001). Hence, regions located in southwest Quebec had much more frequent blooms. In terms of temporal patterns, the median number of days with blooms increased by 23% between 2000 and 2016 (significant at the 10% threshold, p = 0.091; Figure 4). There was also a significant increase in high bloom frequencies; for example, the 95th percentile of the number of days with blooms increased by 24% between 2000 and 2016 (significant at the 5% threshold, p = 0.043; not presented). Conversely, smaller occurrence frequencies (5th percentile) did not show any significant trend over time. This generally concerns lakes located in underdeveloped regions, northeast of the studied area.

Intensity of Blooms
Although not marked as the frequency of occurrence, bloom intensity also showed a similar spatial pattern on the studied territory ( Figure 5), representing an increase from east to west (KW: = 199; p < 0.001) and from north to south (KW: = 78; p < 0.001). No temporal trends of bloom intensity were detected between 2000 and 2016 (median, p = 0.220).

Surface Area of Blooms
The relative surface area of blooms showed a significant spatial pattern that differed from its frequency or intensity ( Figure 6). The relative surface area significantly increased on lakes located between 74° W and 75° W (KW: = 512; p < 0.01). It also increased from north to south, with blooms on lakes located between 45° N and 49° N showing a surface area twice as large as the lakes located between 50° N and 52° N (KW: = 198; p < 0.01). This spatial trend was similar when the four fluvial lakes were removed from the dataset. No significant temporal trend in median surface area was observed between 2000 and 2016 (p = 0.1836). However, events characterized by large surface areas (95th percentile) significantly increased from 50 to 55% between 2000 and 2016 (p = 0.025).

Intensity of Blooms
Although not marked as the frequency of occurrence, bloom intensity also showed a similar spatial pattern on the studied territory ( Figure 5), representing an increase from east to west (KW: χ 2 = 199; p < 0.001) and from north to south (KW: χ 2 = 78; p < 0.001). No temporal trends of bloom intensity were detected between 2000 and 2016 (median, p = 0.220).

Surface Area of Blooms
The relative surface area of blooms showed a significant spatial pattern that differed from its frequency or intensity ( Figure 6). The relative surface area significantly increased on lakes located between 74 • W and 75 • W (KW: χ 2 = 512; p < 0.01). It also increased from north to south, with blooms on lakes located between 45 • N and 49 • N showing a surface area twice as large as the lakes located between 50 • N and 52 • N (KW: χ 2 = 198; p < 0.01). This spatial trend was similar when the four fluvial lakes were removed from the dataset. No significant temporal trend in median surface area was observed between 2000 and 2016 (p = 0.1836). However, events characterized by large surface areas (95th percentile) significantly increased from 50 to 55% between 2000 and 2016 (p = 0.025). Environments 2020, 7, x FOR PEER REVIEW 9 of 26

Onset Date of Blooms
The date of the first algal bloom was earlier on lakes located west of the studied area ( Figure 7). The median date of the first event was May 21 at 80 • W, while the first event occurred on 25 June at 66 • W (KW: χ 2 = 828; p < 0.01). The first bloom also occurred increasingly early on lakes located further south of the territory (KW: χ 2 = 508; p < 0.01); for example, the median date of the first event was 21 May at 45 • N, and 21 June at 52 • N. For 58% of the lakes, the onset of blooms occurred before the beginning of July, while only 17 lakes had their bloom onset date later than 1 September (Figure 3d). Moreover, the first bloom occurred earlier as years progressed (p = 0.0646; Figure 7c); the median date for all lakes of the studied territory moved from 6 June to 3 June . The 95th percentile of the onset date (i.e., on the lakes with the latest onset date) also occurred earlier (yet not significant at a threshold of 10%, p = 0.1065), changing from 14 July to 1 July between 2000 and 2016. These lakes are typically located north of the studied area, with very little agricultural and urban development. Lakes characterized by the 5th percentile of the onset date, typically located southwest of the studied area in highly urbanized and agricultural sectors did not show any significant trend, the date remaining between May 18 and 19 from 2000 to 2016 (p = 0.826).

Onset Date of Blooms
The date of the first algal bloom was earlier on lakes located west of the studied area ( Figure 7). The median date of the first event was May 21 at 80° W, while the first event occurred on June 25 at 66° W (KW: = 828; p < 0.01). The first bloom also occurred increasingly early on lakes located further south of the territory (KW: = 508; p < 0.01); for example, the median date of the first event was May 21 at 45° N, and June 27 at 52° N. For 58% of the lakes, the onset of blooms occurred before the beginning of July, while only 17 lakes had their bloom onset date later than September 1 ( Figure  3d). Moreover, the first bloom occurred earlier as years progressed (p = 0.0646; Figure 7c); the median date for all lakes of the studied territory moved from June 6 to June 3. The 95th percentile of the onset date (i.e., on the lakes with the latest onset date) also occurred earlier (yet not significant at a threshold of 10%, p = 0.1065), changing from July 14 to July 1 between 2000 and 2016. These lakes are typically located north of the studied area, with very little agricultural and urban development. Lakes characterized by the 5th percentile of the onset date, typically located southwest of the studied area in highly urbanized and agricultural sectors did not show any significant trend, the date remaining between May 18 and 19 from 2000 to 2016 (p = 0.826).

Phenological Trends of Missisquoi Bay and Lake Brome
The phenological trends of Missisquoi Bay (Lake Champlain; Figure 8) and Lake Brome ( Figure  9) are presented since they have been extensively studied in the past given the persistence of cyanobacterial blooms on those lakes in the last decades. The annual occurrence frequency increased significantly on Missisquoi Bay, where the number of days above the Chl-a threshold increased from 25 in 2000 to 47 in 2016 (t = 2.71; p = 0.02). The intensity and surface area of these bloom events were important, with maximum Chl-a concentrations reaching 1315 μg L on average every year, and surface area reaching 62%. However, the onset date of blooms over Missisquoi Bay did not show

Phenological Trends of Missisquoi Bay and Lake Brome
The phenological trends of Missisquoi Bay (Lake Champlain; Figure 8) and Lake Brome (Figure 9) are presented since they have been extensively studied in the past given the persistence of cyanobacterial blooms on those lakes in the last decades. The annual occurrence frequency increased significantly on Missisquoi Bay, where the number of days above the Chl-a threshold increased from 25 in 2000 to 47 in 2016 (t = 2.71; p = 0.02). The intensity and surface area of these bloom events were important, with maximum Chl-a concentrations reaching 1315 µg L −1 on average every year, and surface area reaching 62%. However, the onset date of blooms over Missisquoi Bay did not show any significant temporal trend. With regard to Lake Brome, the annual frequency also showed a significant increase from 23 days in 2000 to 30 days in 2016 (t = 2.32; p = 0.03). Although the intensity (219 µg L −1 on average) was lower than that of Missisquoi Bay, the bloom was covering the entire surface of the lake (maximum annual surface area reaching 100%) at least once every year over the studied period except in 2010 (97% of surface area). Although not significant, the first annual bloom occurred increasingly early in the spring on this lake (in average, from 28 May in 2000 to 19 May in 2016). This trend may be confirmed in the future.
Environments 2020, 7, x FOR PEER REVIEW 11 of 26 any significant temporal trend. With regard to Lake Brome, the annual frequency also showed a significant increase from 23 days in 2000 to 30 days in 2016 (t = 2.32; p = 0.03). Although the intensity (219 μg L on average) was lower than that of Missisquoi Bay, the bloom was covering the entire surface of the lake (maximum annual surface area reaching 100%) at least once every year over the studied period except in 2010 (97% of surface area). Although not significant, the first annual bloom occurred increasingly early in the spring on this lake (in average, from 28 May in 2000 to 19 May in 2016). This trend may be confirmed in the future.

Correlation Analysis
Results of the Pearson correlation analyses within and between the phenological and environmental variables are presented in Appendixes B and C. These analyses allowed to reduce the number of environmental variables in order to eliminate information redundancy and to select variables most correlated to bloom phenology. Results indicate that the frequency of events and the onset date show a strong correlation with degree-days, while bloom intensity is correlated to water body morphology and water surface temperature. However, the bloom extent did not show any significant correlation with the environmental variables included in this study.

Correlation Analysis
Results of the Pearson correlation analyses within and between the phenological and environmental variables are presented in Appendices B and C. These analyses allowed to reduce the number of environmental variables in order to eliminate information redundancy and to select variables most correlated to bloom phenology. Results indicate that the frequency of events and the onset date show a strong correlation with degree-days, while bloom intensity is correlated to water body morphology and water surface temperature. However, the bloom extent did not show any significant correlation with the environmental variables included in this study.

Canonical Correlation Analysis
Results from the CCA applied between the environmental variables and phytoplankton bloom phenology show high for the first two pairs of canonical variables (0.77 and 0.71), which considerably dropped for the last canonical correlations (0.23 and 0.13). The Bartlett test has shown that these canonical correlations are significant at threshold = 0.01 (Table 2). Based on these values, only results of the first two pairs of canonical variables were interpreted. Table 3 presents the standardized canonical coefficients, structure coefficients, and communality coefficients. Communality coefficients where h ≥ 45% are highlighted to identify the variables most useful to the canonical model development.

Canonical Correlation Analysis
Results from the CCA applied between the environmental variables and phytoplankton bloom phenology show high R i for the first two pairs of canonical variables (0.77 and 0.71), which considerably dropped for the last canonical correlations (0.23 and 0.13). The Bartlett test has shown that these canonical correlations are significant at threshold α = 0.01 (Table 2). Based on these R i values, only results of the first two pairs of canonical variables were interpreted. Table 3 presents the standardized canonical coefficients, structure coefficients, and communality coefficients. Communality coefficients where h 2 ≥ 45% are highlighted to identify the variables most useful to the canonical model development.

Phenological Variables
The phenological variable most contributing to the correlation between U 1 was V 1 were the frequency of occurrence (a 1,1 = −0.54), followed by the onset date (a 1,4 = 0.41), the intensity (a 1,2 = −0.20), and the extent of blooms (a 1,3 = −0.06). Variables with the highest level of relevance to the model development appear in the same order: the frequency of occurrence (R (V 1 ,x 1,1 ) = − 0.94), the onset date (R (V 1 ,x 1,4 ) = 0.89), the intensity (R (V 1 ,x 1,2 ) = − 0.49), and the extent (R (V 1 ,x 13 ) = − 0.46). Hence, bloom phenology was mainly characterized by the frequency and onset date, followed by the intensity also having a significant contribution. These variables each provide complementary information, since they do not show strong collinearity (r < 0.7, Appendix B). The structure coefficients of frequency, intensity, and extent are inversely related to the structure coefficients of the onset date.

Discussion
This study presents the spatiotemporal dynamics of phytoplankton blooms over 580 lakes in southern Quebec between 2000 and 2016, and their potential relationships with physiographic, morphological, and climatic descriptors prevailing on the lakes and their watersheds. The results demonstrate realistic and expected trends, which validate the usefulness of this approach to study the response of lake trophic state to historical and future changes in land use and climate. For instance, the data show the expected increase in the magnitude of phytoplankton blooms (expressed by frequency, surface area, or intensity) from north to south and east to west, as well as an earlier onset date in the southern and western regions of the studied region. These spatial trends had been observed over the last decades by water quality monitoring services (MELCC) [66][67][68], with blooms typically located in sectors of highly developed areas.
A particularly interesting result is the temporal increase of blooms observed between 2000 and 2016 in Quebec lakes, invalidating the hypothesis that the increasing trend would simply be related to the greater attention given to the phenomenon. This trend has also been observed by Winter et al. [69] from 1994 to 2009 on lakes of Ontario, and Ho et al. [42] over a 30-year period on 71 large lakes across the planet. The recent review by Huisman et al. [70] particularly describes the overall increase in frequency, intensity, and duration of cyanobacterial blooms observed on lakes globally. For example, this was demonstrated from the analysis of cyanobacterial pigments in sediment cores from over one hundred lakes of Northern America and Europe [71]. Our study allowed to quantify phenological trends over time and with respect to landscape characteristics, promoting predictive models such as the one put forward by Cremona et al. [17], who developed a cyanobacterial biomass prediction model with respect to regional climatic variables and hydrological indicators. They found that cyanobacteria biomass will increase from 2% to 10% in future decades.

Phenological Trends
During the period under study, phytoplankton biomass exceeded the threshold of 10 µg Chl-a L −1 during 15 days per year (May to October) on average for all lakes studied. This result is rather conservative, as only days with a cloud cover under 25% of observable surface area were included in the database, while we can assume that many additional cloudy days would add to this [27]. The areal extent of the blooms reached 19% of the overall lake surface on average (47% for the 95th percentile). Hence, most bloom events were rather restricted in terms of surface coverage. A bloom is qualified as very limited when the surface area remains under 25% (Sylvie Blais from the government of Quebec, pers. comm.). Our results show that all studied lakes had at least one pixel reaching a biomass of 10 µg Chl-a L −1 or above at least once over the studied period, while this proportion drops to 12% of the lakes for blooms (biomass > 10 µg L −1 ) covering more than 50% of the lake area. Therefore, high biomass values are often reached, but the problem is seldom extended to the entire lake surface area.
High annual bloom frequencies were mostly observed on lakes located in highly developed sectors with heavy urbanization and agricultural use. The impact of land use (and its associated inputs of nutrients) appears to play an undeniable role on algal bloom phenology; this relationship was validated by the CCA showing that the relative proportion of urban areas and population ecumene were significant in controlling the frequency, intensity, and onset date. These results were also obtained by Weber et al. [72] showing a significant relationship between percent forest and cyanobacteria cell densities for 771 waterbodies in Georgia, USA. The bloom areal extent also varied spatially, with a significant increase between longitudes 74 • W and 75 • W, a corridor with intense seasonal resort operations stretching southwest of Montreal. However, this regional increase was partly caused by the phenology of two fluvial lakes (see Appendix D), Lake Des-Deux-Montagnes and Lake Saint-François, showing extended blooms (on average 35% of the lake area) of low biomass (median values below 15 µg L −1 for days exceeding the threshold). Nevertheless, this spatial trend in bloom areal extent still existed when removing fluvial lakes from the database.
An increase by 23% in bloom frequency was observed between 2000 and 2016 (all lakes). It was mostly observed on lakes with moderate to high frequencies (i.e., between 15 to 28 days of blooms per year on average). While the median areal extent did not increase significantly over the studied period, events covering a large fraction of the lake (95th percentile) showed a significant increase over the years. Hence, lakes with large and frequent blooms, for instance Missisquoi Bay north of Lake Champlain and Lake Brome, are the ones most clearly showing a rising trend over the studied period. These two lakes have largely been studied due to the intensity of blooms occurring there and the resulting socio-economic issues, for example on drinking water quality [10,[73][74][75]. The rising importance of blooms on these lakes, often composed of buoyant cyanobacteria, has been associated to land-use changes (nutrient inputs) and increasing water temperature [76].
The first blooms occurred 3 days earlier in 2016 than in 2000 on average over the studied territory, although the accuracy of onset date estimates is influenced by the amount of missing data. Other studies have shown bloom onset date becoming earlier over time on Lake Taihu in China (subtropical climate), but at a much faster rate (~10 days earlier per year between 1998 and 2009, [38,77]). Interestingly, our results show that the onset date was particularly getting earlier (by almost 2 weeks) on lakes located in the northern part of the studied territory, but this specific trend is not significant. Direct impacts of increased air temperatures include an extended summer season, higher surface water temperatures, and intensified thermal stratification of lacustrine environments [78]. These conditions can stimulate the growth of phytoplankton communities in eutrophic environments [79] and particularly of cyanobacteria [70]. Therefore, these effects may be more apparent in northern regions where other anthropogenic factors are not acting, although the observed trends need to be substantiated.

Links to Climate and Environmental Physiography
It is not simple to identify the causes of the rising trend in bloom frequency in a context of population expansion occurring in parallel to global warming. Partitioning the causes will need the development of a regional model allowing to test various scenarios for a given water body with respect to the specific conditions prevailing on its watershed. The spatiotemporal trends in bloom onset date, development, maintenance, and decline observed on the studied territory present linkages with the lake's morphological characteristics, watershed's physiographic characteristics, and prevailing climatic conditions, as shown from the canonical analysis. It allowed to highlight the characteristics underpinning the regional dynamics of phytoplankton blooms.
The frequency and onset date of blooms are the phenological variables most strongly linked to the environmental characteristics prevailing on the lakes and their watershed. The most significant environmental variables are the lake area, watershed area, settlement, and degree-days. Ultimately, the input of nutrients to lakes will be larger in urbanized and agricultural sectors southwest of the territory where lakes are more exposed to non-point source pollution (typically related to agriculture) and point source pollution (e.g., sewage systems, domestic, or industrial wastes). On the other hand, since the studied territory covers about 600,000 km 2 , climate is very likely to play a role on phytoplankton bloom phenology. For example, on the studied territory, there is a difference of 37 days in the onset date of phytoplankton bloom. Zhang et al. [38] showed that the onset date and duration of blooms are strongly related to climate (temperature, sunshine hours, and global radiation). The southwestern part of the territory is therefore offering more favorable conditions to phytoplankton growth, particularly when nutrients are abundant (Huisman et al [70] and references therein).
Results indicate that the urbanized area is the physiographic variable best explaining bloom frequency and intensity. This relationship has been evoked in other studies mentioning that key forcing factors for the development of blooms include modifications resulting from anthropogenic activities such as contaminants from effluent and stormwater discharges, natural resource extraction and agricultural runoff [80,81]. Interestingly, the CCA results indicate that urbanization (settlement and population ecumene) better explains the spatiotemporal variability of phytoplankton blooms than agricultural variables (cropland and agriculture ecumene), but both variables are linked on this relative occupancy scale. In the studied region, urban area varied extensively (0-64% of the lakes drainage basin) as well as farming area (0-55%). Farming has often been raised as a major controlling factor on bloom phenology through its influence on P and N loads.
The impact of lake surface area on frequency, intensity, and onset date revealed in the present study (larger lakes presenting higher frequency, intensity, and earlier onset date) has yet to be explained, although lake morphology is of definite importance in the development of cyanobacteria. For instance, dominance of filamentous species is observed in shallow lakes while colony-forming species dominate deeper lakes [82]. Others have also shown the influence of hydrologic retention time on the establishment of blooms and their composition [83,84]. Since larger lakes generally tend to have longer retention times, this factor could explain the relationships revealed by the CCA. However, we cannot exclude that the frequency of events could increase with the observable lake surface area (by the remote sensor), increasing the probability of detecting a bloom.
Degree-days is the climatic variable best explaining bloom phenology, followed by total annual precipitation. However, annual or seasonal water temperatures (Table 3 and Appendix C) do not show any clear relationships with the phenological variables. Although several temperature descriptors have often been related to phytoplankton growth, including atmospheric temperature [19,38], water temperature [85], hours of sunlight [38], and degree-days [86,87], it is the latter descriptor that most directly impact the growth of ectotherm organisms [88]. For instance, Ralston et al. [87] used degree-days to assess inter-annual variability in the onset date of algal blooms, their development and date of decline in the Nauset estuary, and proposed this metric as an efficient warning indicator. Trombetta et al. [89] also pointed out that water temperature is a key factor controlling the phenology and community structure of phytoplankton blooms in a Mediterranean shallow coastal system. The recent study by Ho and Michalak [90], exploiting over twelve hundred summertime lake observations from across the continental U.S., showed that summer temperature drives total phytoplankton abundance, while the length of summer is linked to cyanobacterial abundance. The impact of precipitations on bloom frequency, intensity, and onset date suggested by the present study CCA is also discussed by Ho and Michalak [90], who are evoking the effect of increased nutrient runoff on bloom development, while precipitations could rise flushing rates and slow down growth, confusing the relationships.

Conclusions
This study provides a regional portrait of lake trophic state over a wide variety of lacustrine environments in Quebec, and a detailed phenology allowing to go beyond simple biomass assessments at the scale of a lake or locally. It offers an approach to characterize bloom phenology on lakes that are providing ecological services for instance, and eventually to forecast how this phenomenon could evolve in response to changes in climate or land use. This approach is of particular interest in a country with over a million lakes across its territory [91]. While the statistical approach used here does not provide definitive mechanistic linkages, it is useful for identifying potential mechanisms driving bloom phenology at the regional scale. The increasing frequency of bloom events observed over the study period invalidates the hypothesis that the rising trend would simply be due to the greater amount of attention given to the phenomenon. Remote sensing data allows to assess bloom phenology with much more details and inform on specific features. For example, results indicate that high biomass values are often reached but the problem is seldom extended to the entire lake surface. Moreover, studying blooms over a territory 600,000 km 2 allowed to determine a difference in bloom onset dates of 37 days, which could be exploited in a space-for-time substitution approach to assess the response of lakes to future climate conditions.
The analysis of bloom phenology on 580 lakes and the associated environmental conditions also allowed to identify key factors explaining the spatiotemporal patterns: degree-days, land use, and the morphology of lakes and their watersheds. For instance, the procedure adopted here based on remote sensing provides near-real time biomass estimation to implement management plans in a recreational context. Remote sensing offers a great potential to locate where phytoplankton blooms initiate and how they evolve over space and time, and exploit regional relationships to study local patterns. This will be done through the development of a statistical model examining the impact of climate or land use on the occurrence of phytoplankton blooms, which will be presented in a sister paper. As spatial resolution improves and images become more accessible, this approach will represent an efficient tool to identify priority areas and strategies in restoration plans.

Appendix A. Uncertainties on the Estimation of Pytoplankton Biomass and Global Trends
Naturally, there is an uncertainty regarding the Chl-a concentration values obtained using the ensemble-based model developed by El Alem et al. [48]. This model had a determination coefficient of 0.93 (relative RMSE = 50%, relative BIAS = −27%, relative NASH = 0.70) when it was tested on an independent database containing a group of lakes with relatively low biomass in the southern part of the studied territory, and where Chl-a concentrations had been measured in-situ [48]. Using a cross-validation approach, the model performed better when it was used to estimate high Chl-a concentrations (R 2 = 0.98, RMSEr = 15%, BIASr = −2%, NASHr = 0.95) than at the initialization of blooms (R 2 = 0.77, RMSEr = 37%, BIASr = −8%, and NASHr = 0.70). The model calibration did not include fluvial environments, where the hydrological dynamics and optically active components are different from those of lacustrine environments, and where there is a greater amount of suspended inorganic matter. This could cause an estimation bias regarding Chl-a concentrations when non-algae particles are proportionally concentrated, although a qualitative evaluation of this type of interference by El Alem et al. [48] seems to indicate that the estimation model behaved correctly in the presence of a high inorganic content in Lake Huron. That is why the results of lakes Des-Deux-Montagnes, Saint-Louis, Saint-François, and Saint-Pierre, located along the St. Lawrence River, were presented for information purposes only. The inclusion of these lakes does not affect our global results, since the spatiotemporal trend analysis and the CCA were also carried out without including these lakes, and the results remained the same. In Lake Des-Deux-Montagnes, bordering the city of Montréal, the median summer concentration of Chl-a was 4.8 µg L −1 in 2000 according to our results, while in situ measurements taken close to the lake in 1997-1998 varied between 2 and 5 µg L −1 [66]. Given the little amount of data available to validate the estimations obtained for these lakes, it is difficult to ensure of the accuracy of the concentrations, further justifying the need to develop this type of approach when aiming to describe the phenological history of phytoplankton blooms using remote sensing.
Lastly, the percentage of missing data due to a large cloud cover (over 75% of the lake's surface area) or to geometric distortion of MODIS images was 68% at 45 • N for the present study, which gradually increased with latitude, reaching 80% of missing data at 52 • N ( Figure A1). The number of sunlight hours in Quebec [92,93] fully explains the spatial variability of missing data. Certain studies have shown that missing data can affect the accuracy of phenological variables, particularly on phytoplankton bloom frequency and onset date. For example, Cole et al. [94] found bloom onset date errors in ocean environments of the order of 2 to 3 days with 10% of missing data, and 15 to 30 days with 80% missing data when using images from the SeaWiFS sensor. The results obtained in the present study obviously carry an uncertainty related to cloud cover, which is admittedly difficult to quantify. Nonetheless, satellite databases considerably increase the precision and accuracy of spatiotemporal modelling of phytoplankton blooms comparatively to in situ sampling, thus representing a fair trade-off. Figure A1. Mean percentage (%) of missing data (between 2000 and 2016) related to cloud cover covering more than 75% of the lake for a given lake, or having geometric distortion of the MODIS image. The mean percentage of days with missing data is shown on the bases of months, lake surface area and location (latitude and longitude).  Figure A1. Mean percentage (%) of missing data (between 2000 and 2016) related to cloud cover covering more than 75% of the lake for a given lake, or having geometric distortion of the MODIS image. The mean percentage of days with missing data is shown on the bases of months, lake surface area and location (latitude and longitude).