How Accurate Is an Unmanned Aerial Vehicle Data-Based Model Applied on Satellite Imagery for Chlorophyll-a Estimation in Freshwater Bodies?

: Optical sensors are increasingly sought to estimate the amount of chlorophyll a (chl_a) in freshwater bodies. Most, whether empirical or semi-empirical, are data-oriented. Two main limitations are often encountered in the development of such models. The availability of data needed for model calibration, validation, and testing and the locality of the model developed— the majority need a re-parameterization from lake to lake. An Unmanned aerial vehicle (UAV) data-based model for chl_a estimation is developed in this work and tested on Sentinel-2 imagery without any re-parametrization. The Ensemble-based system (EBS) algorithm was used to train the model. The leave-one-out cross validation technique was applied to evaluate the EBS, at a local scale, where results were satisfactory (R 2 = Nash = 0.94 and RMSE = 5.6 µ g chl_a L − 1 ). A blind database (collected over 89 lakes) was used to challenge the EBS’ Sentine-2-derived chl_a estimates at a regional scale. Results were relatively less good, yet satisfactory (R 2 = 0.85, RMSE= 2.4 µ g chl_a L − 1 , and Nash = 0.79). However, the EBS has shown some failure to correctly retrieve chl_a concentration in highly turbid waterbodies. This particularity nonetheless does not affect EBS performance, since turbid waters can easily be pre-recognized and masked before the chl_a modeling.


Introduction
Man-made pollution towards the environment combined with global warming have made algal blooms in freshwater bodies more recurrent, intense, and harmful more than ever. When they are massive, algal blooms can be a real threat to human and animal health [1]. Many readers make the common mistake of labelling massive algal blooms, known as HAB (Harmful algal blooms), as cyanobacteria. Three main species of algae can massively grow and turn into HAB: cyanobacteria, dinoflagellates, and diatoms [2]. Dinoflagellates-based or diatoms blooms are not as harmful to citizens as cyanobacteriabased blooms, but their massive growth is as degrading to the freshwater quality and to the entire aquatic ecosystem as cyanobacteria blooms [3]. Additionally, Long et al. [4] have reported a positive relationship between the biomass of phytoplankton as well as the concentration of their main pigment, Chlorophyll-a (chl_a), and the biomass of cyanobacteria. For these reasons, the World Health Organization (WHO) has listed chl_a as a mandatory parameter to measure in freshwater guidance level for cyanobacteriarelated risk (https://www.who.int/water_sanitation_health/bathing/srwe1-chap8.pdf, accessed on 4 February 2021). However, punctual networks of in situ measurements and

Study Area
Located between latitudes 44 • to 50 • north and longitudes 67 • to 80 • west, the area covered by this study represents Quebec priority watersheds and is composed of several freshwater bodies spread all over southern Quebec (Figure 1). This area encompasses approximately 2285 freshwater bodies that are visible at the spatial resolution of the Sentinel-2. These are very diverse in terms of their intrinsic bio-optical characteristics because they belong to different eco-environmental systems, which are not only governed by different physiological and ecological factors, but by anthropogenic action as well. This diversity is an interesting challenge to the EBS in order to test its robustness in different aquatic environments.

Materials
Three lakes (Brome, Champlain, and Magog) were chosen for this study (zoomed frame in Figure 1). Brome Lake is located southeast of Mount Brome and is the source of the Yamaska River. The watershed is 187 km 2 and the surface area is 15 km 2 . The length and width are 6 and 5 km, respectively, and the mean and maximum depths are 6 and 13 m, respectively. Lake Champlain is located in the Lake Champlain Valley, on the border of the states of Vermont and New York, its northern end being in the south of the province of Quebec. The watershed is 23,720 km 2 and the surface area is 1269 km 2 . The length and width are 201 and 23 km, respectively, and the mean and maximum depths are 20 and 122 m, respectively. Lake Magog is part of the large Saint-François River watershed. The watershed is 1956 km 2 and the surface area is 11 km 2 . The length and width are of 11 and 2 km, respectively, and the mean and maximum depths are of 9 and 19 m, respectively. These lakes belong to different trophic statues (oligotrophic (Magog) mesotrophic (Brome), and eutrophic (Champlain)). This was important for building a database of chl_a in situ measurements as wide as possible, since it is an essential criterion to develop a robust empirical model. The choice of these lakes was based on their historical (2001 to 2008) in situ chl_a measurements, collected by different departments of the Government of Quebec Province, which highlighted that all these lakes have experienced algae blooms episodes, but with significant differences, in terms of frequency and duration ( Figure 2). In this analysis, concentrations higher than 10 µg chl_a L −1 were taken as a bloom episode as stated by the WHO (https://www.who.int/water_sanitation_health/bathing/srwe1 but with significant differences, in terms of frequency and duration ( Figure 2). In this analysis, concentrations higher than 10 µg chl_a L −1 were taken as a bloom episode as stated by the WHO (https://www.who.int/water_sanitation_health/bathing/srwe1-chap8.pdf, accessed on 4 February 2021).   Remote Sens. 2021, 13, x FOR PEER REVIEW 4 of 28 but with significant differences, in terms of frequency and duration ( Figure 2). In this analysis, concentrations higher than 10 µg chl_a L −1 were taken as a bloom episode as stated by the WHO (https://www.who.int/water_sanitation_health/bathing/srwe1-chap8.pdf, accessed on 4 February 2021).

Data Compilation and Pre-Processing
In situ measurements, collected for the model calibration, were sampled in summer 2017 (18 to 20 July and 4 to 16 August) using the same experimental protocol. First, measurements were performed on a single transect composed of eighteen 20 × 20 m 2 cells Remote Sens. 2021, 13, 1134 5 of 27 (Sentinel-2 spatial resolution) covering a distance of 340 m. Samples were collected at the center of each cell. Second, as soon as the sampling process was completed, a drone boarded with visible/near-infrared hyperspectral cameras flew over the same eighteen 20 × 20 m 2 cells for image recording.
3.1. Calibration Database 3.1.1. Collection of Data from the Lake Nine samples for chl_a, pheophytin-a, and Secchi disk measurements were collected (one out of two cells of the eighteen), five samples for dissolved organic carbon (one out of four cells of the eighteen) measurements were collected, and one sample for total phosphorus and total nitrogen measurements (one cell of the eighteen) was taken at each field campaign (FC). A 2 L bottle of water was sampled at each cell center, followed by a water depth measurement using a 2.5 cm black/white Secchi disk. A part of this bottle was filtered for the chl_a (25 mm diameter GF/F filter). A known volume of water with enough chl_a was deposited on the filter for subsequent chl_a extraction. Since chl_a can be easily degraded by sunlight, it was important to prevent the water from being exposed to light. Therefore, it was directly collected in opaque containers. Thereby, as soon as filtration was completed, filters were conserved in a freezer (between 0 and 2 • C) for laboratory analysis. The rest of the water was frozen (−18 • C) for analysis of other parameters (colored dissolved matters, dissolved organic carbon, phosphorus, and nitrogen). Figure 3 illustrates the histogram of the calibration database (N = 41), from which it is clear that the collected in situ chl_a cover a relatively wide range of concentrations with a minimum of 2.16 µg chl_a L −1 (Magog Lake) and a maximum of 77.04 µg chl_a L −1 (Champlain Lake). Table 1 illustrates the results of all the analyzed water quality parameters in this project. These results highlighted that the three pre-selected lakes belong to different trophic levels ranging from oligo-mesotrophic (Magog Lake) to eutrophic (Champlain Lake) based on the chl_a, transparency, and total phosphorus values following the classification proposed by the Environmental Ministry of Quebec ( Figure 4).

Data Compilation and Pre-Processing
In situ measurements, collected for the model calibration, were sampled in summer 2017 (18 to 20 July and 4 to 16 August) using the same experimental protocol. First, measurements were performed on a single transect composed of eighteen 20 × 20 m 2 cells (Sentinel-2 spatial resolution) covering a distance of 340 m. Samples were collected at the center of each cell. Second, as soon as the sampling process was completed, a drone boarded with visible/near-infrared hyperspectral cameras flew over the same eighteen 20 × 20 m 2 cells for image recording.

Calibration Database
3.1.1. Collection of Data from the Lake Nine samples for chl_a, pheophytin-a, and Secchi disk measurements were collected (one out of two cells of the eighteen), five samples for dissolved organic carbon (one out of four cells of the eighteen) measurements were collected, and one sample for total phosphorus and total nitrogen measurements (one cell of the eighteen) was taken at each field campaign (FC). A 2 L bottle of water was sampled at each cell center, followed by a water depth measurement using a 2.5 cm black/white Secchi disk. A part of this bottle was filtered for the chl_a (25 mm diameter GF/F filter). A known volume of water with enough chl_a was deposited on the filter for subsequent chl_a extraction. Since chl_a can be easily degraded by sunlight, it was important to prevent the water from being exposed to light. Therefore, it was directly collected in opaque containers. Thereby, as soon as filtration was completed, filters were conserved in a freezer (between 0 and 2 °C) for laboratory analysis. The rest of the water was frozen (−18 °C) for analysis of other parameters (colored dissolved matters, dissolved organic carbon, phosphorus, and nitrogen). Figure 3 illustrates the histogram of the calibration database (N = 41), from which it is clear that the collected in situ chl_a cover a relatively wide range of concentrations with a minimum of 2.16 µg chl_a L −1 (Magog Lake) and a maximum of 77.04 µg chl_a L −1 (Champlain Lake). Table 1 illustrates the results of all the analyzed water quality parameters in this project. These results highlighted that the three pre-selected lakes belong to different trophic levels ranging from oligo-mesotrophic (Magog Lake) to eutrophic (Champlain Lake) based on the chl_a, transparency, and total phosphorus values following the classification proposed by the Environmental Ministry of Quebec ( Figure 4).   The imaging engine used in this project is composed of a hexacopter drone loaded with hyperspectral cameras: Pika II (covering wavelengths from 400 to 900 nm) and Pika NIR (covering from 900 to 1700 nm). The spectral resolution of the Pika II is 3 nm, whilst it is 6 nm for the Pika NIR. Since the payload of the hexacopter drone is about 10 kg, it was necessary to make two overflights (a first with Pika II and a second with the Pika NIR) for each field campaign. The hexacopter drone was placed at 100 m altitude allowing image recording with a spatial resolution of 3 cm. However, since the goal is to develop a model applicable to Sentinel-2 data sensors, during the geo-rectification, the outputs were up-scaled to 20 m spatial resolution.
The Pika Hyperspectral sensor initially records the sunlight radiance in digital numbers (DN). Equation (1) was thus used to convert the DN to spectral radiance.

Spectral radiance = Gain × DN + BIAS
(1) Figure 5 shows results of the radiances collected over the nine sampling points of the three lakes in each FC. Each plot illustrates the average ± the standard deviation of the radiance recorded over in situ measurements. As can be seen, the largest variation is observed for FC 2 of Champlain Lake and the lowest is observed for FC 2 of Magog Lake. These observations highly concord with in situ chl_a measurements, where the maximum of the variation was also observed for FC 2 of Champlain Lake and the minimum of the variation was observed for FC 2 of Magog Lake ( Figure 3).   The imaging engine used in this project is composed of a hexacopter drone loaded with hyperspectral cameras: Pika II (covering wavelengths from 400 to 900 nm) and Pika NIR (covering from 900 to 1700 nm). The spectral resolution of the Pika II is 3 nm, whilst it is 6 nm for the Pika NIR. Since the payload of the hexacopter drone is about 10 kg, it was necessary to make two overflights (a first with Pika II and a second with the Pika NIR) for each field campaign. The hexacopter drone was placed at 100 m altitude allowing image recording with a spatial resolution of 3 cm. However, since the goal is to develop a model applicable to Sentinel-2 data sensors, during the geo-rectification, the outputs were up-scaled to 20 m spatial resolution.
The Pika Hyperspectral sensor initially records the sunlight radiance in digital numbers (DN). Equation (1) was thus used to convert the DN to spectral radiance.
Spectral radiance = Gain × DN + BIAS (1) Figure 5 shows results of the radiances collected over the nine sampling points of the three lakes in each FC. Each plot illustrates the average ± the standard deviation of the radiance recorded over in situ measurements. As can be seen, the largest variation is observed for FC 2 of Champlain Lake and the lowest is observed for FC 2 of Magog Lake. These observations highly concord with in situ chl_a measurements, where the maximum of the variation was also observed for FC 2 of Champlain Lake and the minimum of the variation was observed for FC 2 of Magog Lake ( Figure 3). Hyperspectral radiance collected over the in situ measurements. Rad, Avrg, and Std refer, respectively, to radiance, average, and standard deviation. FC1 and FC2 refer, respectively, to field campaigns 1 and 2.

Sentinel-2 Bands Simulation
The simulation of Sentinel-2 bands was performed based on its Spectral response curves (SRC) using Equation (2), inspired by the work of Blonski, et al. [34]. These SRC store information about the measured spectral responses and about the spectral width of each band. Since Sentinel-2A and -2B are twin sensors, their SRC are almost identical. Figure 6 shows the results of the simulation. The black lines represent the hyperspectral radiance recorded by the drone, the red dots represent the simulated radiance for each band using its corresponding SRC, and the colored lines are the plots of the Sentinel-2 SRC. It is important to highlight that the plotted hyperspectral radiances and their simulations are the averages picked up over the in situ measurements for each FC. From Figure 6, it is clear that Sentinel-2 bands are simulated without any loss of information, as all the simulations (red dots) are located on the initially signal recorded (black lines).
where: _ is the simulated radiance of band i; ℎ _ is the hyperspectral radiance of band i; SRC is the curve spectral response of band i, where ∑ SRC = 1. Figure 5. Hyperspectral radiance collected over the in situ measurements. Rad, Avrg, and Std refer, respectively, to radiance, average, and standard deviation. FC1 and FC2 refer, respectively, to field campaigns 1 and 2.

Sentinel-2 Bands Simulation
The simulation of Sentinel-2 bands was performed based on its Spectral response curves (SRC) using Equation (2), inspired by the work of Blonski, et al. [34]. These SRC store information about the measured spectral responses and about the spectral width of each band. Since Sentinel-2A and -2B are twin sensors, their SRC are almost identical. Figure 6 shows the results of the simulation. The black lines represent the hyperspectral radiance recorded by the drone, the red dots represent the simulated radiance for each band using its corresponding SRC, and the colored lines are the plots of the Sentinel-2 SRC. It is important to highlight that the plotted hyperspectral radiances and their simulations are the averages picked up over the in situ measurements for each FC. From Figure 6, it is clear that Sentinel-2 bands are simulated without any loss of information, as all the simulations (red dots) are located on the initially signal recorded (black lines). where: RAD_sim i is the simulated radiance of band i; hyp_RAD i is the hyperspectral radiance of band i; SRC i is the curve spectral response of band i, where ∑ SRC i = 1.  Figure 6. Averages of the hyperspectral radiance (black lines) and the multispectral radiance simulated (red dots) for the 13 Sentinel-2 bands in the function of their Spectral response curves (SRC). Figure 6. Averages of the hyperspectral radiance (black lines) and the multispectral radiance simulated (red dots) for the 13 Sentinel-2 bands in the function of their Spectral response curves (SRC).

Simulated Bands Reflectance
In remote sensing applications, particularly in a multi-temporal context, it is better to use the reflectance instead of the radiance. In fact, the radiance is the spectral flux that reaches the satellite sensor determined both per unit of area and solid angle, whilst the reflectance is equal to the radiance divided by the irradiance and by other parameters for geometric considerations (Equation (3)). This eliminates the dependency on the instrument observing it or on the irradiance received. It is an intrinsic property of the surface, which only depends on illumination and observation angles. Thus, several advantages can be attributed to the use of the reflectance: Its values are usually comprised between 0 and 1; It is easy to compare the reflectance from one spectral band to another; It is possible to compare the reflectance directly at different times of the year or of the day, since it is already corrected for variations of earth-sun distance and solar zenith angle (https://labo.obs-mip.fr/multitemp/les-grandeurs-radiometriqueseclairement-luminance-reflectance/, accessed on 4 February 2021).
Equation (3), known as the COST algorithm (Cosine of the sun zenith angle), inspired by the work of Moran, et al. [35], was used to estimate the reflectance of the simulated bands. It is an enhanced version of the DOS algorithm (Dark object subtraction). Indeed, in addition to the additive effect correction, such as the one used for the DOS algorithm, COST uses an additional correction for transmittance along the path from the ground toward the sensor (TAUz). The correction of this module, which is part of the multiplicative effect of the atmosphere, is made by the computation of the cosine of the solar zenith angle. According to Chavez Jr [36], the cosine of the solar zenith angle is a good approximation of TAUz.
where: L λ is spectral radiance at the sensor's aperture; L haze is the upwelling atmospheric spectral radiance scattered in the direction of the sensor entrance pupil and within the sensor's field of view W × m −2 × sr −1 × µ − 1 ; d is the earth-drone distance; ESUN λ is the mean solar exo-atmospheric irradiance; θ λ is the approximation of the atmospheric transmittance terms; θ S is the solar zenith angle in degrees. Figure 7 shows the results of the reflectance computed using the COST algorithm. The results presented here are averages of the reflectance picked up over the sampling points with their corresponding in situ chl_a measurements. It is important here to point out that the reflectance obtained are typical to case-2 water bodies, as the ratio of blue to green bands is clearly lower than 1.0 [37]. In addition, it reveals a relatively important signal backscattering, in the red-edge part of the spectrum, that is only perceived for high values of chl_a (Champlain Lake). This behavior is typical to water highly ladened with chl_a, which is the case of the Champlain Lake (>25 µg chl_a L −1 ).

In Situ Measurements
The Voluntary lakes monitoring network (VLMN) is a multiple partnership organization (environmental ministry, municipalities, university researchers, and water associations) that aims to: 1) acquire data to establish the trophic level of a large number of lakes and follow their evolution over time; 2) detect lakes showing signs of eutrophication and degradation; 3) educate, raise awareness, support, and inform local residents' associations and other participants; and 4) draw a global portrait of the situation of resort lakes in Quebec. The VLMN has been operational since 2004, and since then, it has collected several water quality parameters (chl_a, water transparency, phosphorus, dissolved organic carbon, etc.), each year, over many inland freshwater bodies in Quebec. These data were used to validate the developed EBS using images acquired by the Sentinel-2A and -2B sensors.
A database of approximately 1891 samples was collected by the VLMN between the years 2015 and 2017. However, this number has been reduced to 94, spread over 89 inland waterbodies, due to three main reasons: 1) the number of cross-over dates (sampling and passage of satellite) was very low; 2) samples outside the Sentinel-2A and -2B tiles; and 3) the presence of haze or cloud. Table 2 summarizes the results of chl_a concentrations that have been used to test the performance of the EBS using a completely blind database. For 2015, it was not possible to cross-over any dates; thereby data collected from this year were not used in this validation process. The minimum measured value was 0.7 µg chl_a L −1 and the maximum was 41.8 µg chl_a L −1 . Unlike the calibration database, most of the validation chl_a concentrations are centered at around 5 µg chl_a L −1 (Figure 8), meaning that most of the tested waterbodies belong to the oligotrophic level.

In Situ Measurements
The Voluntary lakes monitoring network (VLMN) is a multiple partnership organization (environmental ministry, municipalities, university researchers, and water associations) that aims to: (1) acquire data to establish the trophic level of a large number of lakes and follow their evolution over time; (2) detect lakes showing signs of eutrophication and degradation; (3) educate, raise awareness, support, and inform local residents' associations and other participants; and (4) draw a global portrait of the situation of resort lakes in Quebec. The VLMN has been operational since 2004, and since then, it has collected several water quality parameters (chl_a, water transparency, phosphorus, dissolved organic carbon, etc.), each year, over many inland freshwater bodies in Quebec. These data were used to validate the developed EBS using images acquired by the Sentinel-2A and -2B sensors.
A database of approximately 1891 samples was collected by the VLMN between the years 2015 and 2017. However, this number has been reduced to 94, spread over 89 inland waterbodies, due to three main reasons: 1) the number of cross-over dates (sampling and passage of satellite) was very low; 2) samples outside the Sentinel-2A and -2B tiles; and 3) the presence of haze or cloud. Table 2 summarizes the results of chl_a concentrations that have been used to test the performance of the EBS using a completely blind database. For 2015, it was not possible to cross-over any dates; thereby data collected from this year were not used in this validation process. The minimum measured value was 0.7 µg chl_a L −1 and the maximum was 41.8 µg chl_a L −1 . Unlike the calibration database, most of the validation chl_a concentrations are centered at around 5 µg chl_a L −1 (Figure 8), meaning that most of the tested waterbodies belong to the oligotrophic level.

Remote Sensing Data
The remotely sensed data used in this project were obtained from Sentinel-2 Level 1 available on the Copernicus server. The Sentinel-2 sensor is boarded on two platforms (A and B) belonging to the European Space Agency. The Sentinel-2A was launched first (June 23, 2015). Two years later, the Sentinel-2B followed it (March 7, 2017). It records across a quite wide range of the spectrum covering 13 spectral bands (443-2190 nm), with a swath width of 290 km, and spatial resolutions of 10 m (for the visible bands), 20 m (red edge to shortwave of infrared (SWIR)), and 60 m (coastal blue and the atmospheric correction bands). For this study, only cross-over dates of Sentinel-2A and -2B passage to the VLMN in situ measurements were downloaded and pre-processed (Table 2). In order to increase the number of Sentinel-2 bands, the 10 m spatial resolution bands were upscaled to 20 m for a total of 9 bands covering parts from the blue (497 nm) to SWIR (1610 nm).

. Atmospheric Correction of Sentinel-2 Images
Sentinel-2 Level 1 is radiometrically corrected and offers the Top of Atmosphere (ToA) reflectance. This product is, however, still affected by atmospheric effects and needs to be corrected. The Sentinel-2 bands were corrected, to the Bottom of Atmosphere (BoA) reflectance, using an adapted algorithm of the COST model. In fact, contrariwise to the simulated bands, where the term of Equation (3) was insignificant (because images were recorded at 100 m altitude), this term is very significant for the Sentinel-2 bands and it has been necessary to correct it. The correction of this term is usually performed using the dark object subtraction technique [36]. However, when this technique was tested, it was perceived that the corrected reflectance of Sentinel-2 images was still influenced by the atmosphere ( Figure 9A); the magnitude of the COST reflectance is significantly higher from those of the drone image ( ) ( Figure 9A plot)).

Remote Sensing Data
The remotely sensed data used in this project were obtained from Sentinel-2 Level 1 available on the Copernicus server. The Sentinel-2 sensor is boarded on two platforms (A and B) belonging to the European Space Agency. The Sentinel-2A was launched first (23 June 2015). Two years later, the Sentinel-2B followed it (7 March 2017). It records across a quite wide range of the spectrum covering 13 spectral bands (443-2190 nm), with a swath width of 290 km, and spatial resolutions of 10 m (for the visible bands), 20 m (red edge to shortwave of infrared (SWIR)), and 60 m (coastal blue and the atmospheric correction bands). For this study, only cross-over dates of Sentinel-2A and -2B passage to the VLMN in situ measurements were downloaded and pre-processed (Table 2). In order to increase the number of Sentinel-2 bands, the 10 m spatial resolution bands were upscaled to 20 m for a total of 9 bands covering parts from the blue (497 nm) to SWIR (1610 nm).

Atmospheric Correction of Sentinel-2 Images
Sentinel-2 Level 1 is radiometrically corrected and offers the Top of Atmosphere (ToA) reflectance. This product is, however, still affected by atmospheric effects and needs to be corrected. The Sentinel-2 bands were corrected, to the Bottom of Atmosphere (BoA) reflectance, using an adapted algorithm of the COST model. In fact, contrariwise to the simulated bands, where the L haze term of Equation (3) was insignificant (because images were recorded at 100 m altitude), this term is very significant for the Sentinel-2 bands and it has been necessary to correct it. The correction of this term is usually performed using the dark object subtraction technique [36]. However, when this technique was tested, it was perceived that the corrected reflectance of Sentinel-2 images was still influenced by the atmosphere ( Figure 9A); the magnitude of the COST reflectance is significantly higher from those of the drone image (Img Drone ) ( Figure 9A plot)).  Reflectance at the ToA and at the BoA using COST and of a Sentinel-2 image (11 August 2017) are compared to reflectance of (14th August 2017) of Magog Lake and results are presented in Figure 9A. Results demonstrate that reflectance using the approximates the best the reflectance of the and are confirmed by the plot in Figure 9B, where the R 2 = 1.0 ( and (a modified version of COST) reflectance are so close that they are superimposed in the Figure 9A plot). In fact, identifying the dark pixel using the standard COST model has some issues in an automated mode. In this study, a novel technique was developed using the radiance of the in order to fix this issue. Indeed, the most important term to be estimated in Equation (3) is the . To reach a good estimation of the BoA reflectance, this term has to be the closest to the radiance of oligotrophic water bodies, since the absorption of the sunlight in these kinds of waterbodies is very high and, consequently, they are good samples of the dark pixels ground entities.
As per in situ measurements results (Table 1), Lake Magog can be classified as an oligotrophic water body. The radiance averages of each simulated band of this lake were used as a reference radiance to corrected Sentinel-2 images for the atmosphere effects, as presented in the Figure 10. , in this figure, correspond to the radiance of the dark object (i.e., the minimal radiance corresponding to 1% reflectance) of the Sentinel-2 image that is used to compute the . The was simply the difference between the and . Thus, the correction of the was based, this time, on a reference that is known to be a "real" dark object.
Computing the for each band of the Sentinel-2 image using the reference radiances yielded good results when applied manually (at a local-scale). On the other hand, at a regional-scale, the results significantly dropped. In fact, the main issue with this technique is that, from one band to another, the of the dark pixel changes, and consequently, the intrinsic relationships between Sentinel-2 bands were destroyed. To fix this issue, once the Sentinel-2 dark pixel is identified for the blue band (since it is the most affected by the atmosphere), radiance coefficients are computed based on the radiance value of this pixel. These radiance coefficients are equal to the ratio of each band to the blue (Equation (4)). Thereby, the intrinsic behavior of the dark pixel is kept unchanged. Contrariwise to the first technique, solely the blue band radiance is used as a reference to correct for the (s) of the rest of the Sentinel-2 bands. Thus, by adding these modifications to Equation (3), it took the form of Equation (5). Reflectance at the ToA and at the BoA using COST and COST Mod of a Sentinel-2 image (11 August 2017) are compared to reflectance of Img Drone (14th August 2017) of Magog Lake and results are presented in Figure 9A. Results demonstrate that reflectance using the COST Mod approximates the best the reflectance of the Img Drone and are confirmed by the plot in Figure 9B, where the R 2 = 1.0 (Img Drone and COST Mod (a modified version of COST) reflectance are so close that they are superimposed in the Figure 9A plot). In fact, identifying the dark pixel using the standard COST model has some issues in an automated mode. In this study, a novel technique was developed using the radiance of the Img Drone in order to fix this issue. Indeed, the most important term to be estimated in Equation (3) is the L haze . To reach a good estimation of the BoA reflectance, this term has to be the closest to the radiance of oligotrophic water bodies, since the absorption of the sunlight in these kinds of waterbodies is very high and, consequently, they are good samples of the dark pixels ground entities.
As per in situ measurements results (Table 1), Lake Magog can be classified as an oligotrophic water body. The radiance averages of each simulated band of this lake were used as a reference radiance to corrected Sentinel-2 images for the atmosphere effects, as presented in the Figure 10. L λ , in this figure, correspond to the radiance of the dark object (i.e., the minimal radiance corresponding to 1% reflectance) of the Sentinel-2 image that is used to compute the L haze . The L haze was simply the difference between the L λ and L Re f . Thus, the correction of the L haze was based, this time, on a reference that is known to be a "real" dark object.
Computing the L haze for each band of the Sentinel-2 image using the reference radiances yielded good results when applied manually (at a local-scale). On the other hand, at a regional-scale, the results significantly dropped. In fact, the main issue with this technique is that, from one band to another, the L λ of the dark pixel changes, and consequently, the intrinsic relationships between Sentinel-2 bands were destroyed. To fix this issue, once the Sentinel-2 dark pixel is identified for the blue band (since it is the most affected by the atmosphere), radiance coefficients are computed based on the radiance value of this pixel. These radiance coefficients are equal to the ratio of each band to the blue (Equation (4)). Thereby, the intrinsic behavior of the dark pixel is kept unchanged. Contrariwise to the first technique, solely the blue band radiance is used as a reference to correct for the L haze (s) of the rest of the Sentinel-2 bands. Thus, by adding these modifications to Equation (3), it took the form of Equation (5). where: L(λ) DP is the radiance of the dark pixel for a given band (λ); L(blue) DP is the radiance of the dark pixel blue band.
where: L λ is spectral radiance at the sensor's aperture; 0.0023 is the averaged radiance recorded at the camera boarded on the drone; Coe f λ is the correction coefficients as described in Equation (4); d is the earth-sun distance in astronomical units; ESUN λ is the mean solar exo-atmospheric irradiance; θ λ is the approximation of the atmospheric transmittance terms; θ S is the solar zenith angle in degrees.
where: ( ) is the radiance of the dark pixel for a given band (λ); ( ) is the radiance of the dark pixel blue band.
where: is spectral radiance at the sensor's aperture; 0.0023 is the averaged radiance recorded at the camera boarded on the drone; is the correction coefficients as described in Equation (4); is the earth-sun distance in astronomical units; is the mean solar exo-atmospheric irradiance; is the approximation of the atmospheric transmittance terms; is the solar zenith angle in degrees. The spectral reflectance presented in the red frame of Figure 11 illustrates the result of the algorithm based only on the blue band radiance as a reference, while the spectral reflectance presented at the blue frame illustrates the result of the algorithm based on all bands radiance as a reference. It is clear that the reflectance along the spectrum (from the blue to NIR) picked up from the image corrected with the blue band radiance-based is closer to the known case-1 water spectral behavior. On the other hand, correcting independently the for each Sentinel-2 band leads to destroying of the intrinsic physical relationships along the spectrum within the image. This fact is more tangible in the case of waters since the returned signal is initially too small and any slight over-or under-estimation of the could change the initial spectral behavior, as shown in the blue frame of Figure 11. The spectral reflectance presented in the red frame of Figure 11 illustrates the result of the COST Mod algorithm based only on the blue band radiance as a reference, while the spectral reflectance presented at the blue frame illustrates the result of the COST Mod algorithm based on all bands radiance as a reference. It is clear that the reflectance along the spectrum (from the blue to NIR) picked up from the image corrected with the COST Mod blue band radiance-based is closer to the known case-1 water spectral behavior. On the other hand, correcting independently the L haze for each Sentinel-2 band leads to destroying of the intrinsic physical relationships along the spectrum within the image. This fact is more tangible in the case of waters since the returned signal is initially too small and any slight over-or under-estimation of the L haze could change the initial spectral behavior, as shown in the blue frame of Figure 11. Remote Sens. 2021, 13, x FOR PEER REVIEW 14 of 28 Figure 11. Comparison of spectral reflectance of Sentinel-2 bands picked up over a water pixel (yellow star) using: the modified version of COST blue band radiance-based (blue frame) and the modified version of COST all bands radiancebased (red frame).

Methodological Approach
The flowchart presented in Figure 12 summarizes the methodological approach of this study. Two main steps were followed during this process: calibration, where two field campaigns were performed over three lakes. This step aims to collect in situ water samples and to record hyperspectral images using a drone, simultaneously. Water samples were analyzed in the laboratory for chl_a and other water properties, whilst drone images underwent several pre-processing phases:  Geometric and radiometric (Equation (1)) corrections;  Upscaling to 20 m spatial resolution (equal to that of Sentinel-2);  Sentinel-2 spectral bands simulation (Equation (2));  Simulated bands reflectance computation (Equation (3)).
Once these steps were finalized, it was a question to calibrate an ensemble-based system (EBS) to estimate chl_a concentration and to evaluate its performance locally using the Leave-one-out cross-validation (LOOCV) technique (EBS development will be more detailed in the next section). Once the EBS reached an acceptable rate of errors, this closes the first step of the methodological approach. The second part focused on validating and testing the robustness of the EBS at a regional scale. Thus, the EBS was challenged over several inland waterbodies belonging to different geo-environmental regions ( Figure 1). Hence, it was a question of collecting chl_a in situ measurements from several environmental organizations, particularly the VLMN, qualified here as a blind database. Cross-over dates of Sentinel-2 images to in situ samples were downloaded and pre-processed to the BoA reflectance (Equation (5)). Sentinel-2-based chl_a concentration products were, as a final step, compared to in situ chl_a collected by the VLMN. Figure 11. Comparison of spectral reflectance of Sentinel-2 bands picked up over a water pixel (yellow star) using: the modified version of COST blue band radiance-based (blue frame) and the modified version of COST all bands radiance-based (red frame).

Methodological Approach
The flowchart presented in Figure 12 summarizes the methodological approach of this study. Two main steps were followed during this process: calibration, where two field campaigns were performed over three lakes. This step aims to collect in situ water samples and to record hyperspectral images using a drone, simultaneously. Water samples were analyzed in the laboratory for chl_a and other water properties, whilst drone images underwent several pre-processing phases: Geometric and radiometric (Equation (1)) corrections; Upscaling to 20 m spatial resolution (equal to that of Sentinel-2); Sentinel-2 spectral bands simulation (Equation (2)); Simulated bands reflectance computation (Equation (3)).
Once these steps were finalized, it was a question to calibrate an ensemble-based system (EBS) to estimate chl_a concentration and to evaluate its performance locally using the Leave-one-out cross-validation (LOOCV) technique (EBS development will be more detailed in the next section). Once the EBS reached an acceptable rate of errors, this closes the first step of the methodological approach. The second part focused on validating and testing the robustness of the EBS at a regional scale. Thus, the EBS was challenged over several inland waterbodies belonging to different geo-environmental regions ( Figure 1). Hence, it was a question of collecting chl_a in situ measurements from several environmental organizations, particularly the VLMN, qualified here as a blind database. Cross-over dates of Sentinel-2 images to in situ samples were downloaded and pre-processed to the BoA reflectance (Equation (5)). Sentinel-2-based chl_a concentration products were, as a final step, compared to in situ chl_a collected by the VLMN. ens. 2021, 13, x FOR PEER REVIEW 15 of 28

Ensemble-Based System
Seeking additional opinions before making a decision is part of human nature. The general concept behind EBS algorithms is based on this principle. In fact, an EBS is composed of several individual elements (classifiers and/or estimators). Combining the decision of these individual elements decreases modeling errors in contrast to an estimation based on a unique model that can be erroneous and consequently, increases the error [23]. Two main keys are required for building a robust EBS: 1) reaching the highest possible diversity between its individual elements and 2) finding the best combination rule of individual elements so that correct decisions are amplified and the incorrect ones are canceled out [23]. This technique is principally used in sensitive fields such as medical care [38][39][40], financial businesses [41][42][43], and recently was used to remote estimate chl_a using MODIS data [15].
On the other hand, the WHO has established the value of 10 µg chl_a L −1 as the first alert level to freshwater bodies use for recreational activities [44]. In this context, many studies have demonstrated that the returned signature from oligotrophic waterbodies is different compared to that returned from mesotrophic or eutrophic waterbodies [45][46][47], meaning that sensitive spectral regions to water bodies highly laden in chl_a are not necessarily the same for water bodies moderately or poorly laden in chl_a concentrations. As a consequence, using a specific model to each trophic level could enhance the quality of chl_a estimates than using a unique model for all trophic levels [48]. Taking into account the above statements, it was decided to develop a hybrid EBS composed of an Ensemblebased classifier (EBC) and an Ensemble-based estimator (EBE).

Ensemble-Based System
Seeking additional opinions before making a decision is part of human nature. The general concept behind EBS algorithms is based on this principle. In fact, an EBS is composed of several individual elements (classifiers and/or estimators). Combining the decision of these individual elements decreases modeling errors in contrast to an estimation based on a unique model that can be erroneous and consequently, increases the error [23]. Two main keys are required for building a robust EBS: 1) reaching the highest possible diversity between its individual elements and 2) finding the best combination rule of individual elements so that correct decisions are amplified and the incorrect ones are canceled out [23]. This technique is principally used in sensitive fields such as medical care [38][39][40], financial businesses [41][42][43], and recently was used to remote estimate chl_a using MODIS data [15].
On the other hand, the WHO has established the value of 10 µg chl_a L −1 as the first alert level to freshwater bodies use for recreational activities [44]. In this context, many studies have demonstrated that the returned signature from oligotrophic waterbodies is different compared to that returned from mesotrophic or eutrophic waterbodies [45][46][47], meaning that sensitive spectral regions to water bodies highly laden in chl_a are not necessarily the same for water bodies moderately or poorly laden in chl_a concentrations. As a consequence, using a specific model to each trophic level could enhance the quality of chl_a estimates than using a unique model for all trophic levels [48]. Taking into account the above statements, it was decided to develop a hybrid EBS composed of an Ensemble-based classifier (EBC) and an Ensemble-based estimator (EBE).

Statistical Indices for the Ensemble-Based System Assessment
Performance evaluation of the model in both cases (using the LOOCV technique or using the blind dataset) was performed using five statistical indices (Coefficient of determination (R 2 ), bias, Root mean squared errors (RMSE), the Nash criterion (Nash), and the relative errors). The Nash criterion evaluates the performance by comparing the estimated values to the in situ measurements average, producing a result that ranges between −∞ and 1.0 (inclusive). A negative Nash means that it would be better to use the in situ measurements' average than the model estimates, whereas values between 0.0 and 1.0 are generally viewed as acceptable levels of performance, and model performance is satisfactory for values higher than 0.8; the model is perfect when Nash = 1.0. The mathematical Equations (6)-(10) of the indices are as follows: where: n is the database size, M and Es are measured and estimated values, and M and Es are averages of measured and estimated values.

The Ensemble-Based Classifier Development
Two levels in chl_a loads (> 10 µg chl_a L −1 representing mesotrophic and eutrophic waterbodies and < 10 µg chl_a L −1 representing oligotrophic waterbodies) were used in this study. For discrimination purposes, the Classification and regression tree (CART) algorithm was applied [49]. The CART algorithm has selected R λ 4 − R λ 5 as the best discrimination index. However, such kinds of algorithms are known to be local and unstable [50]. In fact, the proposed decisions tree by the CART is optimal for the used training database, but not unique. A simple modification within this database can lead to radical changes in the decisions tree. An interesting way to control this uncertainty is to quantify it and to take it into consideration when estimating chl_a concentrations. The control of the CART uncertainty was possible by using the bagging algorithm (n-resampling with replacement; n bag was set to 25,000 iterations in this study) based on the preselected discrimination index. This allowed a random vector (v) to be generated composed by thousands of thresholds.
Normally, all these thresholds generated should be implied in the classification process, but this will surely demand a huge running time as images are composed of hundreds of thousands of pixels. Thus, the Gaussian quadrature method [51] was used for optimizing the runtime. The purpose of this method is to summarize a given distribution in some optimal points (referred to here as optimal thresholds) of this distribution that approximates it the best. Thereby, based on the random vector, it was possible to establish a probability distribution characterized by a mean (µ) and a variance (σ). These two statistical moments have served after that to develop the EBC (beige frame in Figure 13) composed of three (arbitrary choice) Optimal thresholds (Opt Thr ) that are the Lower, Upper, and Nominal thresholds, based on the Gaussian quadrature method, using the following equations: where µ(v) and σ(v) are the mean and variance of the random vector, respectively.

The Ensemble-Based Estimator Development
Based on the EBC results, it was possible to develop an EBE. Indeed, by means of the Nominal threshold (Nominal Thr ), Lower threshold (Lower Thr ), and Upper threshold (U pper Thr ), it is possible to subdivide the calibration database into six (3Opt Thr × 2Classes) sub-classes and four modeling spaces ( Figure 13). Each sub-class is composed of a dataset of chl_a measurements that are used to train a specific estimator (known as expert in the machine learning field). The subdivision was made as follows. Three datasets specific to oligotrophic waterbodies are composed of chl_a samples where the values of R λ 4 − R λ 5 (the discrimination index) are, respectively, inferior to the Nominal Thr , Lower Thr , and U pper Thr (blue arrows in Figure 13), referred as Exp OT ↓ i in this study. Additionally, three datasets specific to mesotrophic and eutrophic waterbodies are composed of chl_a samples where the values of R λ 4 − R λ 5 are, respectively, superior to the Nominal Thr , Lower Thr , and U pper Thr (red arrows in Figure 13), referred to as Exp OT ↑ i . These sets of experts are the individual elements of the EBE (blue frame in Figure 13). It is, however, important to point out that it was not possible to train six different experts in this study. As shown in Figure 13, there are no chl_a samples located in the transition zone (area between dashed green (Lower Thr ) and blue (U pper Thr ) lines). As a consequence, it was possible to train only a unique expert for each class instead of three each (calibration plots in Figure 13).

The Hybrid Ensemble-Based System for Chlorophyll-a Modeling
Estimation of the chl_a using this hybrid EBS is made on two steps: (1) determination of the Modeling space (MS) by means of the EBC and (2) the estimation of the chl_a concentration using the corresponding expert to the pre-identified MS. For the proposed approach, 4 MSs are identified as illustrated in the blue frame of Figure 13. The estimation of chl_a using this approach is weighted average-based. Weighting coefficients (ω), 1/6 for the Lower Thr and U pper Thr and 2/3 for the Nominal Thr , are computed from the Gaussian quadrature method, as demonstrated by Tørvi and Hertzberg [51]. Experts' weights of chl_a estimates change depending on the MS. However, the sum of the used weights in each case is always equal to 1.0. Thus, chl_a estimation using the ESB is performed as it is described in Figure 14. Even if it was not possible to train three different experts per sub-class, as is supposed to be the case, we have deliberately detailed the calculation process, as if this were the case, for those interested for their future work. Remote Sens. 2021, 13, x FOR PEER REVIEW 19 of 28 1 Figure 14. Schematic algorithm of chl_a estimation by using the ensemble-based system.

Calibration of the Experts
Once the calibration database was subdivided, it was a question training the different experts. To do this, all possible combinations of two-and three-band normalized indices were computed, for each dataset, and integrated in a stepwise regression. For mesotrophic and eutrophic waterbodies, three spectral indices were selected by the stepwise as the best estimator to retrieve chl_a and were able to explain 93% (Figure 13) of its variance (Equation (14)). For oligotrophic waterbodies, solely a unique spectral index was selected by the stepwise as the best estimator to retrieve chl_a and was able to explain 47% (Figure 13) of its variance (Equation (15)). As stated above, it was only possible to calibrate two experts in this work. This is due to a lack of moderate chl_a values in the initial database, notably between 5 and 15 µg chl_a L −1 (Figure 3). Since the discrimination threshold used here was 10 µg chl_a L −1 (as proposed by the WHO), there was a significant gap in the classification results within the transition area between the two classes (dashed lines in Figure 13) and consequently, only two datasets were set instead of six.
Based on the stepwise regression results, the expert designed to estimate chl_a of mesotrophic and eutrophic waterbodies is trained by six different spectral regions ranging from the blue to NIR. This spectral richness led to the development of a strong function of calibration in these blooming conditions. The green, red, and NIR parts of the spectrum have already been used, by many researchers, to estimate the chl_a in mesotrophic and eutrophic water bodies [52,53]. The contribution of the blue band in retrieving chl_a in these blooming conditions was not expected. However, it is important to take into consideration that the blue band was used in a three-band normalized index including the red and NIR bands. This combination of bands has, somehow, highlighted a significant relationship between this spectral index and the chl_a. On the other hand, the expert designed to estimate chl_a of oligotrophic waterbodies is trained by a simple function using a two-band normalized index. The used bands are located in the green and red parts of the spectrum. This result is coherent with other work results that have related the estimation of chl_a to its bio-optical activity that is more correlated to the visible parts of the spectrum [54].

Cross-Validation: Local Evaluation
In order to evaluate the local performance of the developed approach, the crossvalidation technique was used. This technique consists of removing a sample (or a set of samples (known as k-fold cross-validation)) from the calibration database, to use the remaining samples to train a new model, and to estimate the chl_a concentration of the removed sample with the trained new model (known as LOOCV). This exercise is re-done for the whole calibration database. It is, however, important to underline that this technique was adapted for the EBS. In fact, each removed sample was first classified by means of EBC. Thereafter, the estimation of the chl_a concentration was done using the corresponding EBE-expert to the classification result. With this technique, it was possible to compare observed to estimated chl_a concentrations using the pre-mentioned statistical indices (Equations (6)-(10)). Figure 15 shows the LOOCV result of chl_a estimates. It is important to point out that the above processing and results are based only on the simulated Sentinel-2 reflectance recorded by the drone. The R 2 and Nash of 0.94% underscore the robustness of the EBS. The accuracy and exactitude of the EBS can be considered as high (RMSE = 5.6 µg chl_a L −1 ) to very high (bias almost null). These high performances are also supported by the distribution of the dots around Line 1:1 that emphasize the absence of any under-or overestimation, particularly at the extremities ( Figure 15A). The above findings are confirmed by the residuals plot ( Figure 15B) result, where the relative errors range between −0.8% and 0.6% and are distributed around 0%, highlighting the absence of any form of trends. Additionally, it is perceived that the relative error is reduced by almost 60% for values higher than 40 µg chl_a L −1 . This was somewhat expected since the expert designed to estimate chl_a concentration in these blooming conditions was able to explain up to 93% the chl_a variance ( Figure 13). whole calibration database. It is, however, important to underline that this technique was adapted for the EBS. In fact, each removed sample was first classified by means of EBC. Thereafter, the estimation of the chl_a concentration was done using the corresponding EBE-expert to the classification result. With this technique, it was possible to compare observed to estimated chl_a concentrations using the pre-mentioned statistical indices (Equations (6)-(10)). Figure 15 shows the LOOCV result of chl_a estimates. It is important to point out that the above processing and results are based only on the simulated Sentinel-2 reflectance recorded by the drone. The R 2 and Nash of 0.94% underscore the robustness of the EBS. The accuracy and exactitude of the EBS can be considered as high (RMSE = 5.6 µg chl_a L −1 ) to very high (bias almost null). These high performances are also supported by the distribution of the dots around Line 1:1 that emphasize the absence of any under-or overestimation, particularly at the extremities ( Figure 15A). The above findings are confirmed by the residuals plot ( Figure 15B) result, where the relative errors range between −0.8% and 0.6% and are distributed around 0%, highlighting the absence of any form of trends. Additionally, it is perceived that the relative error is reduced by almost 60% for values higher than 40 µg chl_a L −1 . This was somewhat expected since the expert designed to estimate chl_a concentration in these blooming conditions was able to explain up to 93% the chl_a variance ( Figure 13).  Figure 16 shows the results of EBS evaluation using the blind databases. The performance of the EBS can be considered as low (R 2 = 0.31) to very low (Nash = −1.95). However, considering the results of the bias (−1.1 µg chl_a L −1 ) and the RMSE (8.8 µg chl_a L −1 ), it can be concluded that in most cases, the estimation of chl_a is accurate and exact ( Figure  16A). These conclusions also emerge in the residuals plot where almost all the residues are plotted around 0%, except for some specific cases ( Figure 16B). Even if the Nash is negative, meaning that the average of the measured chl_a concentrations is more accurate than the chl_a modeling using the EBS, the low rate of the RMSE and residuals distribution around 0% with relative errors comprised between −2% and 2%, in most cases, encouraged us to investigate further in our research. In fact, based on the Figure 16B, only four estimates are tainted with a high error (relative error < −3 %). When these last estimates were traced in the blind database, they all belonged to inland waterbodies affected by very high turbidity (dashed arrows and circles in Figure 16). (The fourth sample was on another Lake, to lighten the paper, this last case was not illustrated.)

Validation with the Blind Dataset: Regional Evaluation
When these four samples were removed from the blind database, the results increased significantly (Figure 17). A R 2 of 0.85 and a Nash of 0.79 underline the good performance and the robustness of the EBS to retrieve the chl_a using data recorded from 5.6. Validation with the Blind Dataset: Regional Evaluation Figure 16 shows the results of EBS evaluation using the blind databases. The performance of the EBS can be considered as low (R 2 = 0.31) to very low (Nash = −1.95). However, considering the results of the bias (−1.1 µg chl_a L −1 ) and the RMSE (8.8 µg chl_a L −1 ), it can be concluded that in most cases, the estimation of chl_a is accurate and exact ( Figure 16A). These conclusions also emerge in the residuals plot where almost all the residues are plotted around 0%, except for some specific cases ( Figure 16B). Even if the Nash is negative, meaning that the average of the measured chl_a concentrations is more accurate than the chl_a modeling using the EBS, the low rate of the RMSE and residuals distribution around 0% with relative errors comprised between −2% and 2%, in most cases, encouraged us to investigate further in our research. In fact, based on the Figure 16B, only four estimates are tainted with a high error (relative error < −3%). When these last estimates were traced in the blind database, they all belonged to inland waterbodies affected by very high turbidity (dashed arrows and circles in Figure 16). (The fourth sample was on another Lake, to lighten the paper, this last case was not illustrated.) When these four samples were removed from the blind database, the results increased significantly (Figure 17). A R 2 of 0.85 and a Nash of 0.79 underline the good performance and the robustness of the EBS to retrieve the chl_a using data recorded from Sentinel-2A and -2B sensors. The RMSE dropped to 2.4 µg chl_a L −1 and the bias is null ( Figure 17A). These two last statistical results highlight the high exactitude and accuracy of the EBS estimates. It is interesting to point out that chl_a concentrations of the blind database are very low (97% are inferior to 10 µg chl_a L −1 and 95% are inferior to 5 µg chl_a L −1 (Figure 8)). Chl_a modeling errors are known to be high in such conditions, as the signal-to-noise ratio is often high, because of the low signal magnitude returned by the chl_a that is almost equal to water backscattering and to the retuned signal from other bio-optical components in the water column (total suspended solids, colored dissolved organic matters, etc.). Despite these facts, chl_a modeling error rates using the EBS on Sentinel-2 data remained low with relative errors comprised between −1% and 1% for values higher than 5 µg chl_a L −1 and relative errors between −2.5% and 1% for values lower than 5 µg chl_a L −1 ( Figure 17B). of the EBS estimates. It is interesting to point out that chl_a concentrations of the blind database are very low (97% are inferior to 10 µg chl_a L −1 and 95% are inferior to 5 µg chl_a L −1 (Figure 8)). Chl_a modeling errors are known to be high in such conditions, as the signal-to-noise ratio is often high, because of the low signal magnitude returned by the chl_a that is almost equal to water backscattering and to the retuned signal from other bio-optical components in the water column (total suspended solids, colored dissolved organic matters, etc.). Despite these facts, chl_a modeling error rates using the EBS on Sentinel-2 data remained low with relative errors comprised between −1% and 1% for values higher than 5 µg chl_a L −1 and relative errors between −2.5% and 1% for values lower than 5 µg chl_a L −1 ( Figure 17B).   Figure 18 shows three examples of practical applicability of the EBS on Sentinel-2 images. The blue frame illustrates a zoomed section of Lake Saint-Jean with clear water partially covered by a thin haze. The outputs of chl_a concentrations over this lake are below 5 µg chl_a L −1 even in the hazy section. This clearly demonstrates that the chl_a estimated by the EBS are not affected by the thin haze. The yellow frame illustrates a database are very low (97% are inferior to 10 µg chl_a L −1 and 95% are inferior to 5 µg chl_a L −1 (Figure 8)). Chl_a modeling errors are known to be high in such conditions, as the signal-to-noise ratio is often high, because of the low signal magnitude returned by the chl_a that is almost equal to water backscattering and to the retuned signal from other bio-optical components in the water column (total suspended solids, colored dissolved organic matters, etc.). Despite these facts, chl_a modeling error rates using the EBS on Sentinel-2 data remained low with relative errors comprised between −1% and 1% for values higher than 5 µg chl_a L −1 and relative errors between −2.5% and 1% for values lower than 5 µg chl_a L −1 ( Figure 17B).   Figure 18 shows three examples of practical applicability of the EBS on Sentinel-2 images. The blue frame illustrates a zoomed section of Lake Saint-Jean with clear water partially covered by a thin haze. The outputs of chl_a concentrations over this lake are below 5 µg chl_a L −1 even in the hazy section. This clearly demonstrates that the chl_a estimated by the EBS are not affected by the thin haze. The yellow frame illustrates a  Figure 18 shows three examples of practical applicability of the EBS on Sentinel-2 images. The blue frame illustrates a zoomed section of Lake Saint-Jean with clear water partially covered by a thin haze. The outputs of chl_a concentrations over this lake are below 5 µg chl_a L −1 even in the hazy section. This clearly demonstrates that the chl_a estimated by the EBS are not affected by the thin haze. The yellow frame illustrates a zoomed section of Lake Saint-Pierre. This water body is known by its high loads of suspended solids. In fact, according to the environmental Ministry of Quebec, the annual average suspended solids (https://www.environnement.gouv.qc.ca/eau/Atlas_interactif/ evolution/pdf/05300004_TTEND_MES.pdf, accessed on 4 February 2021) load in Lake Saint-Pierre for the year 2017 was approximately 163 mg/L, whilst the annual average chl_a (https://www.environnement.gouv.qc.ca/eau/Atlas_interactif/evolution/pdf/05 300004_TTEND_CHLO.pdf, accessed on 4 February 2021) was about 5.85 µg/L 1 . Estimates derived from the EBS of chl_a for Lake Saint-Pierre are between 0.1 and 5 µg/L 1 , as shown in Figure 18. This is a good demonstration that the EBS is not affected by high loads of suspended solids in inland waterbodies. Finally, Missisquoi Bay of Lake Champlain experienced a huge harmful algae bloom on October 10, 2017 captured by the Sentinel-2 sensor. Estimates from the derived EBS of chl_a copy the spatial distribution of this bloom well, while in areas where the bloom did not occur, chl_a estimates have remained very low. The three examples above highlight the applicability of the EBS, with a high level of reliability, for monitoring harmful algal blooms using Sentinel-2 data. Lake Saint-Pierre for the year 2017 was approximately 163 mg/L, whilst the annual average chl_a (https://www.environnement.gouv.qc.ca/eau/Atlas_interactif/evolution/pdf/05300004_TTEND_CHLO.pdf, accessed on 4 February 2021) was about 5.85 µg/L 1 . Estimates derived from the EBS of chl_a for Lake Saint-Pierre are between 0.1 and 5 µg/L 1 , as shown in Figure 18. This is a good demonstration that the EBS is not affected by high loads of suspended solids in inland waterbodies. Finally, Missisquoi Bay of Lake Champlain experienced a huge harmful algae bloom on October 10, 2017 captured by the Sentinel-2 sensor. Estimates from the derived EBS of chl_a copy the spatial distribution of this bloom well, while in areas where the bloom did not occur, chl_a estimates have remained very low. The three examples above highlight the applicability of the EBS, with a high level of reliability, for monitoring harmful algal blooms using Sentinel-2 data.

Spatial Distribution Assessment of the Ensemble-Based System
. Figure 18. Spatial distribution of chl_a concentration derived from Sentinel-2 on three different tiles. The yellow frame is a zoomed part of the 18 TXS tile on Lake Saint-Pierre, the red frame is a zoomed part of the 18 TXQ tile on Missisquoi Bay of Lake Champlain, and the blue frame is a zoomed part of the 18 UYU tile on Lake Saint-Jean.

Challenges, Advantages, and Limitations of Chlorophyll-a Monitoring Using UAVs and Satellites Data
The quality of surface water is affected by climate and human activity and affects climate, biological diversity, and human well-being. For the last few decades, efforts by many scientists have been made to monitor water quality, notably chl_a concentration, using remote sensing data. In most cases, the developed models yielded success within the calibration sites, but their estimates were mostly inaccurate outside these sites. The big challenge of this study was thus to achieve acceptable accuracy rates at a local (within the calibration sites) and a regional (outside the calibration sites) scale. This goal was partially accomplished by using one of the most promising techniques in machine learning-the EBS. On the one side, modeling quality, locally and regionally, indeed yielded satisfying rates of accuracy (higher than 85%). On the other side, at both scales, it has been perceived that errors increase with low chl_a concentrations. This fact was more tangible in the case Figure 18. Spatial distribution of chl_a concentration derived from Sentinel-2 on three different tiles. The yellow frame is a zoomed part of the 18 TXS tile on Lake Saint-Pierre, the red frame is a zoomed part of the 18 TXQ tile on Missisquoi Bay of Lake Champlain, and the blue frame is a zoomed part of the 18 UYU tile on Lake Saint-Jean.

Challenges, Advantages, and Limitations of Chlorophyll-a Monitoring Using UAVs and Satellites Data
The quality of surface water is affected by climate and human activity and affects climate, biological diversity, and human well-being. For the last few decades, efforts by many scientists have been made to monitor water quality, notably chl_a concentration, using remote sensing data. In most cases, the developed models yielded success within the calibration sites, but their estimates were mostly inaccurate outside these sites. The big challenge of this study was thus to achieve acceptable accuracy rates at a local (within the calibration sites) and a regional (outside the calibration sites) scale. This goal was partially accomplished by using one of the most promising techniques in machine learning-the EBS. On the one side, modeling quality, locally and regionally, indeed yielded satisfying rates of accuracy (higher than 85%). On the other side, at both scales, it has been perceived that errors increase with low chl_a concentrations. This fact was more tangible in the case of validation with the blind database. Many researchers link this high rate of error to the atmosphere effect and our results support this assumption as the relative errors were lower using data collected by the drone (100 m altitude) than those collected by Sentinel-2 data (786 km altitude). In addition, the atmospheric model used (COST) in this study is based on the assumption of at-water leaving reflectance of dark object being equal to 1%. A new selection method of the dark object, based on the hyperspectral image, was proposed and yielded higher accuracy than the standard one. However, the small image stripe does not cover a large part of the Sentinel-2 images. This fact could be a limiting factor, decreasing the accuracy of the atmospheric correction in regions further away than those covered by the hyperspectral image. More investigations must be conducted, in this field, to achieve more accurate estimates of the BoA at-water reflectance.
The developed hybrid EBS has demonstrated its ability to model chl_a using the drone-simulated Sentinel-2 data and using data recorded by the two Sentinel-2 sensors. This is a double added-value to the EBS use. In fact, it is possible, using the same algorithm, to use two sources of data to monitor algae bloom. Indeed, Sentinel-2 sensor data-based chl_a products could be used, as stated above, on a regional-scale for spatial monitoring of algae blooms as well as for their temporal recurrence. Water managers and stakeholders are constantly looking for such monitoring tools for their decisions-making. On the other hand, UAVs are nowadays affordable and easy-to-use (almost fully autopiloted) more than ever. Furthermore, cameras such as the Tetracam MicroMCA-global shutter model have the ability to be set by personalized wavelength filters that can cover the same exact spectral regions as used to estimate the chl_a using the EBS. Thereby, it is possible to develop a low-cost engine for chl_a retrieval for local and personalized use anytime and anywhere. This offers great flexibility of use (time, space, and sensor characteristics), which could also be of great help to managers and stakeholders in specific crises. Through the approach developed, it is now possible to harness the strength of these two complementary technologies for the benefit of better-quality management of freshwater bodies.
The transposable application capacity (from UAVs towards satellites data and vice versa) could open up new ways of exploiting previous databases (known as Spatial database (SB) in remote sensing) to train models for the future generation of sensors and use their very first data delivered with an acceptable modeling rate without any delay necessary to collect a new SB specific to this sensor. Indeed, thanks to the SB already pre-collected, it would be possible to simulate the bands of the new sensor and to train a new model, based on machine or deep learning algorithms, that will be compatible with the images acquired by this new sensor. Additionally, the performance of already trained and operational models could be enhanced by means of the proposed approach. In fact, since most operational models are data-oriented, retraining them with their original SB merged with an SB built from simulated bands (specific to their bandwidth and spectral regions) will certainly help improve their modeling performance.
However, it is also important to underline the limitations of these technologies. Cloud and haze are the big limiting-factors to use data from satellite sensors such as Sentinel-2. The temporal frequency could also, at some level, be a limiting factor to monitor algae blooms using Sentinel-2 data. In contrast, UAVs flights are more flexible in terms of temporal and space use, but the spatial coverage is very restrained compared to satellite data. Climate conditions, especially wind, are a big limitation for UAV use. In fact, depending on the model, there is always a wind use-threshold. Above that, it is not recommended to takeoff. On the other side, in the case of a cloudy sky, the radiance recorded by the camera boarded on the UAVs is significantly affected by cloud shadow. It is often recommended to record image scenes in a uniform climate condition. With respect to EBS development, the accuracy of the modeling could be higher if more data were available, especially for concentrations between 10 and 20 µg chl_a L −1 . This middle class was underrepresented in the calibration database and led to the training of only one expert, instead of three, in each trophic class. Averaging estimates of three experts would undoubtedly lead to greater accuracy in the modeling of chl_a. Finally, it should be noted that the EBS has shown a failure to estimate chl_a in waters affected by high turbidity. However, it can still be applied in an operational mode, since these waters are known to highly reflect in the red than in the green part of the spectrum. By using this information, it is possible to mask waters highly affected by turbidity before proceeding to chl_a modeling.

Conclusions
A hybrid EBS for chl_a estimates using Sentinel-2 simulated bands was developed and tested on freshwater bodies in southern Quebec using data recorded by Sentienl-2A and 2B sensors. Several innovative aspects were developed in this paper: (1) development of a hybrid EBS composed of EBC and EBE; (2) calibration of EBS using Sentinel-2 simulated bands; (3) testing the EBS using data recorded by Sentinel-2A and 2B; and (4) the use of the Gaussian quadrature function for time running optimization. Combining these aspects has led to the development of a robust hybrid EBS to monitor chl_a concentrations in freshwater bodies at local and regional scales.
Two validation techniques were used to assess the performance of the proposed method: (1) the leave-one-out cross-validation, as a local evaluation of the EBS, and (2) validation with blind datasets (in situ and remotely), as a regional evaluation. The leaveone-out cross-validation results were satisfactory (R 2 = Nash = 0.94). The hybrid EBS showed a relatively negative bias (−0.2 µg chl_a L −1 ) and the errors were relatively low (RMSE = 5.6 µg chl_a L −1 ). Validation with the blind datasets highlighted some failures of the EBS in estimating the chl_a in waters affected by high turbidity. Modeling the chl_a in these ecosystems is highly tainted by errors (Nash < 0). However, when the EBS was tested on waters not affected by high turbidity, the results significantly increased (R 2 = 0.85, Nash = 0.79, RMSE = 2.4 µg chl_a L −1 ). Additionally, it has been perceived that the relative errors were comprised between −2.5% and 1% and have decreased by almost 60% when estimating chl_a values higher than 40 µg chl_a L −1 . This is probably due to the reduced number of calibration data, notably medium values of chl_a, which were under-represented and to the higher signal-to-noise ratio for at-water leaving reflectance in oligotrophic waters.
The proposed method allowed the development of a robust approach to estimate chl_a in freshwaters using either Sentinel-2 data simulated from a hyperspectral camera boarded on a drone or data recorded by the Sentinel-2A and 2B sensors. Assessment of the spatial distribution pointed out that the modelling of chl_a using EBS is not affected by thin haze or suspended solids and is very sensitive to algal blooms. Overall, the performance of the hybrid EBS was satisfactory, except for water affected by high turbidity. Despite this, EBS could still be used in an operational mode, in a complementary way using both technologies, since turbid water can be easily identified and masked before the chl_a modeling. Such a tool could be of great assistance to water managers and stakeholders for better water quality assessment, particularly in areas with abundant lakes like the province of Quebec and Canada. Data Availability Statement: All data collected, pre-processed, processed, or analyzed during this study are included in this work.