AlgaeMAp: Algae Bloom Monitoring Application for Inland Waters in Latin America

: Due to increasing algae bloom occurrence and water degradation on a global scale, there is a demand for water quality monitoring systems based on remote sensing imagery. This paper describes the scientiﬁc, theoretical, and methodological background for creating a cloud-computing interface on Google Earth Engine (GEE) which allows end-users to access algae bloom related products with high spatial (30 m) and temporal (~5 day) resolution. The proposed methodology uses Sentinel-2 images corrected for atmospheric and sun-glint effects to generate an image collection of the Normalized Difference Chlorophyll-a Index (NDCI) for the entire time-series. NDCI is used to estimate both Chl-a concentration, based on a non-linear ﬁtting model, and Trophic State Index (TSI), based on a tree-decision model classiﬁcation into ﬁve classes. Once the Chl-a and TSI algorithms had been calibrated and validated they were implemented in GEE as an Earth Engine App, entitled Algae Bloom Monitoring Application (AlgaeMAp). AlgaeMAp is the ﬁrst online platform built within the GEE platform that offers high spatial resolution of water quality parameters. The App beneﬁts from the huge processing capability of GEE that allows any user with internet access to easily extract detailed spatial (30 m) and long temporal Chl-a and TSI information (from August 2015 and with images every 5 days) throughout the most important reservoirs in the State of S ã o Paulo/Brazil. The application will be adapted to extend to other relevant areas in Latin America.


Introduction
Human activities on a global scale have significantly contributed to the degradation of the water quality of inland aquatic systems by increasing their nutrient levels. Particularly, reservoirs are under high pressure due to the increasing water demand for urban areas, including irrigation, industrial use, and energy production, while still needing to maintain their ecological function [1][2][3]. The quasi-lentic nature of reservoirs leads to a higher phosphorus accumulation, which may trigger phytoplankton production, abundance, and frequency of algae blooms [4]. Algae bloom (AB) is a rapid increase or accumulation in the population of algae, characterized by the blue-green water coloration caused by algae's pigments, that can cause serious consequences to human health and aquatic biogeochemistry due to the production of toxins [5,6]. The increase in aquatic ecosystems' productivity has been monitored by a series of different Trophic State Indexes (TSIs) based on information regarding the nutrient input (generally, phosphorus), water transparency (usually Secchi depth), and chlorophyll concentration (a proxy for phytoplankton biomass) [7]. The advantage of using Chl-a for AB monitoring is that it can be estimated with satellite data and, therefore, used to systematically monitor phytoplankton abundance at a relatively low cost. The interaction between light and photoactive pigments in phytoplankton cells, such as Chlorophyll-a (Chl-a), enables the detection of algae bloom from space-borne imagery [8]. For instance, the Normalized Difference Chlorophyll Index (NDCI) has been used to detect phytoplankton presence by using a ratio of near-infrared (705 nm-maximum phytoplankton reflectance sensitivity) and red bands (665 nm-high Chl-a absorption peak) [9]. In phytoplankton-rich waters around the globe, NDCI has shown to detect a wide Chl-a concentration range when using Sentinel-2 imagery [8,[10][11][12]. Watanabe et al. [8] observed that NDCI performed better than other algorithms (2-band, 3-band, and slope) for Chl-a retrieval in the Tietê cascade system, which is the study area of this research.
The development of algorithms using remote sensing for Chl-a estimation relies largely on in situ Chl-a measurements. Unfortunately, the lack of in situ data, in Latin American countries for example, limits water quality management due to the scarce information on the current status of surface waters. This is because most of the monitoring programs provide water quality data at a low frequency (monthly or bi-monthly). Alternatively, the spatial coverage and increased frequency of image acquisition make satellite remote sensing a key tool for AB monitoring programs. Recently, new processing capabilities have used remote sensing historical information, such as Sentinel-2, to track seasonal patterns and degradation tendencies [4,13,14]. The Google Earth Engine (GEE), for instance, offers high performance cloud computing resources to process large amounts of geospatial datasets [15,16]. According to Hirt et al. [17], monitoring water resources using GEE is promising since it offers access to a large image database of several satellites (Sentinel and Landsat programs) thus permitting temporal analysis over large areas. Recent studies have used GEE to map Chl-a concentration and its temporal dynamics [18][19][20][21], showing its ability to monitor phytoplankton abundance and TSI using satellite images.
To meet the need to incorporate satellite remote sensing into the TSI monitoring system in Latin American (LA) inland waters, several researchers from several countries are developing Algae Bloom (AB) mapping tools under the support of GEE and Earth Observation Data Science (EO) [22]. This research aims to develop a user-friendly application (GEE App) to retrieve algae bloom-related products towards a monitoring system for inland waters in Latin America. More specifically, the objective is to calibrate and validate predictive algorithms for Chlorophyll-a and Trophic State Index classes using both in situ data and Sentinel-2/MSI data available in Google Earth Engine. Once parametrized, these predictive algorithms were implemented in the GEE App to compute the spatial and temporal variation of Chl-a concentration and TSI classes. This paper gives the theoretical and methodological background supporting the tool's development and reports the first results of their application to the monitoring of Tietê River Basin, located in the State of São Paulo, Brazil ( Figure 1).  Table 1), (c) Detail of Guarapiranga and Billings Reservoirs. Table 1. Information about the Chl-a concentration range, date range, and number of water samples used in this research for the Chl-a algorithm and TSI decision tree. The dataset is composed of CETESB sample stations from ID1 to ID 25, and samples acquired during field work conducted on 7 November 2020 (ID [26][27][28][29][30][31]. Geographical coordinates in decimal degrees (Datum: WGS/1984).  Table 1), (c) Detail of Guarapiranga and Billings Reservoirs.

Materials and Methods
The proposed methodology uses Sentinel-2 images corrected for atmospheric and sun-glint effects to generate an image collection of the NDCI [9] for the entire time-series (August 2015 to present) of a given area. NDCI data retrieved from the imagery are then compared with Chl-a measured in situ. NDCI is used to estimate both Chl-a concentration, based on a non-linear fitting model, and TSI, based on a tree-decision model. The TSI tree decision classifies every pixel into five levels of Trophic State Index (Oligo, Meso, Eutrophic, Super, and Hypereutrophic) [23].

Tietê River Basin
The AB monitoring tools were first developed for selected reservoirs of the Tietê River Basin, located in the State of São Paulo (Figure 1), which is home to more than 40 million inhabitants and is responsible for more than 24% of the total water demand in Brazilian urban areas. A variety of reservoirs provides water for the urban areas within the State of São Paulo. The Cantareira system is the most important water supply for the São Paulo capital, while the Tietê Cascade System Reservoir (which comprises six reservoirs in its cascade system) with a downstream length of 1100 km provides water to fishery activities, navigation, hydroelectricity, industrial use, agriculture, and domestic demands for the interior of the State. However, the rapid urbanization and industrialization process has led to high nitrogen and phosphorus concentrations in surface waters and consequently degradation and eutrophication of the water quality [24]. For example, the Billings reservoir, located in the upstream Tietê basin, faces serious water pollution problems due to the increase in urban population without a concurrent increase in sewage or solid waste collection and treatment systems. As a result, high TSI levels were often observed throughout the past years [8] along the Tietê River expressed by Chl-a concentration ranging from 1 to over 500 µg/L, suitable for the proposed application because it covers a large variety of trophic levels, including AB conditions.

Sentinel-2 MSI Imagery Processing
The first step of the AB monitoring tool (Figure 2) is the atmospheric correction since water applications demand reliable surface reflectance measurements. To circumvent the lack of Sentinel 2 Level 2 (surface reflectance) for the entire time series (2015 onwards), the best option is to use the SIAC (Satellite Invariant Atmospheric Correction) implemented on GEE. The SIAC method uses coarse-resolution data derived from MODIS (Moderate Resolution Imaging Spectroradiometer) MCD43A3 datasets (at 500 m of spatial resolution) to describe surface anisotropy and CAMS (Copernicus Atmospheric Monitoring Service) to estimate atmospheric parameters. MODIS data is mapped to TOA (Top-of-Atmosphere) to be compared against data with coarser spatial resolution, and CAMS data is used to obtain atmospheric parameters solving an inversion problem. More detailed information about SIAC and its implementation in GEE can be obtained in Yin et al. [25] and in Song et al. [26]. detailed information about SIAC and its implementation in GEE can be obtained in Yin et al. [25] and in Song et al. [26]. After atmospheric correction, the sun glint effect was corrected by subtracting the B12 (2190 nm) from Bands 2-11 ( Figure 2b). This method assumes that the remaining signal (after atmospheric correction) in the SWIR band is related to the air-water interface specular reflectance (sun and sky glint). The water signal in these wavelengths is assumed to be negligible due to the extremely high water absorption [27,28]. This method was previously used for glint correction in Brazilian waters with successful results [29][30][31][32]. In order to evaluate both atmospheric and sun-glint correction of a Sentinel-2 image, in situ radiometric data were collected in November 2020 in the Billings Reservoir. A detailed description of this evaluation can be found in Appendix A.
The next step is to mask out the pixels with clouds by applying the Sentinel-2: Cloud Probability product available on GEE (Figure 2c). This is a product created by using the Sentinel Hub's cloud detector (s2cloudless python package), which provides automated cloud detection for Sentinel-2 imagery. Only pixels with a cloud probability lower than 10% were included in the following steps.
For the water mask, the Joint Research Centre (JRC) Water Mask product available on GEE was used [33] to include water masses that are permanent all year long to avoid adjacency effects and seasonally flooded wetlands (Figure 2d). After all the pre-processing steps were completed, NDCIsat was calculated (Figure 2e) as follows [9]: For Sentinel-2, the central wavelengths for Surface Reflectance at 705 nm (SR705) and at 665 nm (SR665) correspond to Band 5 (red edge) and Band 4 (red), respectively. The main limitation of using SIAC within GEE is that when the user wants to display the results on the map after SIAC's correction, GEE reaches its maximum computing capacity. This limitation prevents the user from performing long time-series analysis of large water After atmospheric correction, the sun glint effect was corrected by subtracting the B12 (2190 nm) from Bands 2-11 ( Figure 2b). This method assumes that the remaining signal (after atmospheric correction) in the SWIR band is related to the air-water interface specular reflectance (sun and sky glint). The water signal in these wavelengths is assumed to be negligible due to the extremely high water absorption [27,28]. This method was previously used for glint correction in Brazilian waters with successful results [29][30][31][32]. In order to evaluate both atmospheric and sun-glint correction of a Sentinel-2 image, in situ radiometric data were collected in November 2020 in the Billings Reservoir. A detailed description of this evaluation can be found in Appendix A.
The next step is to mask out the pixels with clouds by applying the Sentinel-2: Cloud Probability product available on GEE (Figure 2c). This is a product created by using the Sentinel Hub's cloud detector (s2cloudless python package), which provides automated cloud detection for Sentinel-2 imagery. Only pixels with a cloud probability lower than 10% were included in the following steps.
For the water mask, the Joint Research Centre (JRC) Water Mask product available on GEE was used [33] to include water masses that are permanent all year long to avoid adjacency effects and seasonally flooded wetlands (Figure 2d). After all the pre-processing steps were completed, NDCI sat was calculated (Figure 2e) as follows [9]: NDCI sat = (SR 705 -SR 665 )/(SR 705 + SR 665 ) For Sentinel-2, the central wavelengths for Surface Reflectance at 705 nm (SR 705 ) and at 665 nm (SR 665 ) correspond to Band 5 (red edge) and Band 4 (red), respectively. The main limitation of using SIAC within GEE is that when the user wants to display the results on the map after SIAC's correction, GEE reaches its maximum computing capacity. This limitation prevents the user from performing long time-series analysis of large water bodies. To overcome this issue, a new NDCI sat image collection has been created (resampled to 30 m) and uploaded into the GEE user's asset to be used directly on the GEE platform ( Figure 2f). By doing this, the user can perform spatial and temporal analysis of NDCI sat and derived products over large areas and long periods in a few seconds. Figure 2g,h refers to the calibration and validation of Chl-a and TSI algorithms, respectively, described as follows.

Chl-a In Situ Data
In situ Chlorophyll-a measurements were provided by the São Paulo State Environmental Company (CETESB), which monitors reservoirs bi-monthly at defined sample stations [23]. Considering the limited number of Chlorophyll-a in situ samples available on the same day as the satellite image acquisition, different time windows were tested to gather a larger number of samples (see Appendix B). After testing the algorithm's output varying time windows from 0 to ±3 days, that with ±2 days has shown to be the best balance between the number of samples, accuracy, and precision for both Chl-a estimation and TSI classification. This time window has been frequently reported in the literature for the application of satellite images in lentic aquatic systems [34].
The two-days window matchups correspond to 130 samples of 25 stations ( Figure 1 and Table 1) acquired from August 2015 and November 2020. Those samples are not evenly distributed amongst the in situ stations (IDs) since the frequency of Chl-a measurements varies among them-for example, Billings Reservoir's stations ID21-25 are more likely to fall within the ±2 days window, having a larger number of samples throughout the study period. In addition, some fieldwork was conducted in the Billings Reservoir (in November 2020) during which six Chl-a concentration samples were incorporated to the calibration/validation dataset used to develop the Chl-a model (summing to a total of 136 pairs of Chl-a and NDCI sat samples, Table 1).

Algorithm for Chl-a
All the match-ups between in situ Chl-a data and NDCI sat (N = 136, for a ± 2 days' time window) were used to calibrate the Chl-a algorithm by Monte Carlo (MC) cross validation technique, with 10,000 rounds [35]. At each run of Monte Carlo, 70% of the samples were applied for calibration and the remaining for validation ( Figure 2g). After testing two non-linear fitting models (Polynomial 2nd order and power-law fitting curves), the power-law function was selected for the Chl-a modeling as the polynomial fitting showed a parabolic trend causing the estimated Chl-a to increase as NDCI decreased (for NDCI lower than −0.2). A total of 10,000 runs were used for Monte Carlo validation and two fitting statistics were computed: Coefficient of Determination, R 2 , and Mean Absolute Percentage Error, MAPE.
where n is the number of predictions, y i is the observed value,ŷ is the predicted value, and y is the mean value of y i . The results for fitting statistics as well as the model parameters for the 10,000 runs were calculated considering the mode of each statistic.

Decision Tree for TSI and Algae Bloom Classification
The Trophic State Index can be classified into six classes according to Chl-a concentration, ranging from Ultra-oligotrophic to Hyper-eutrophic (Table 2). For this study, all the 136 measured Chl-a samples were first categorized into measured TSI levels as ground truth following the thresholds in Table 2. The NDCI sat distribution for each TSI class was submitted to a paired T-Test (p value = 0.05) between the classes, where the null hypothesis is that there is no significant difference between the two classes being tested. The NDCI sat distributions between all classes were statistically different, except for Ultra-oligotrophic (not statistically different from Oligotrophic, p-value > 0.05) which was then reclassified into Oligotrophic. A decision tree based on MC technique was applied to determine the NDCI sat threshold that classifies into five TSI levels ( Figure 2h). To detect the algae bloom (AB), the criterion was to consider either Super-eutrophic or Hyper-eutrophic levels (i.e., Chl-a > 30.55 µg/L) as algae bloom condition. The output is binary classification, where NDCI sat is categorized either as non-bloom (Oligo, Meso and Eutrophic levels) or AB condition (Super and Hypereutrophic levels). Table 2. Chl-a concentration thresholds for Trophic State Levels according to CETESB [23]. * In this research, Ultra-oligotrophic class was combined with Oligotrophic because paired T-test indicated no significant difference between them. ** Considered Algae Bloom condition for this research.

Trophic State
Chlorophyll-a (µg/L) For both TSI and AB detection, confusion matrices were used to evaluate the classification performances in terms of accuracy and f1-score. All the calibration and validation processing were conducted in Python 3.6.

GEE App
The Chl-a algorithm (Section 2.4.1) and TSI decision-tree (Section 2.4.2) were implemented in GEE in order to compute Chl-a, TSI, and AB maps from the stored NDCI sat collection (August 2015 to March 2021). To make these products easily accessible to endusers, a GEE App with interactive functions has been implemented. The user can select the Region of Interest (ROI), date range, and the type of analysis: either single image or temporal analysis. For single image analysis, the user can select an image from a day of the year to visualize RGB, NDCI sat , Chl-a, TSI, and/or AB images. For the temporal analysis, a couple of charts will pop-up when the user defines a geometry on the Map showing Chl-a temporal average and the relative TSI area. In addition, new map layers are added with mean, minimum, maximum maps for Chl-a, TSI, and AB data (Figure 2i).

Results
The intermediate outputs of an image processing workflow from Original S2 Level 1 to NDCI images are shown in Figure 3. The original image (Figure 3a) was first cloud masked ( Figure 3b) and then submitted to atmospheric and glint correction followed by water mask application (Figure 3c). NDCI sat was calculated and exported to the NDCI-daily collection with a 30 m resolution (Figure 3d). The evaluation of the atmospheric correction method applied in this study is available in Appendix A.
After processing the entire Sentinel-2 imagery database for the Tietê River Basin on a daily basis and adding them into the GEE Asset with 30 m resolution, the NDCI daily collection of more than 1500 images from August 2015 to February 2021 summed up to 10 GB.  After processing the entire Sentinel-2 imagery database for the Tietê River Basin on a daily basis and adding them into the GEE Asset with 30 m resolution, the NDCI daily collection of more than 1500 images from August 2015 to February 2021 summed up to 10 GB.

Chl-a Algorithm
As a result of Monte Carlo simulation, the validated power-law fitting curve (N = 30%) assembles the global power-law fitting curve (N = 136) rendering a robust predictive model (Figure 4a).
Chl-a = 23.44 × (NDCIsat +1) ^ 7.95 (4) The final model presented a R 2 of 0.86 and MAPE close to 90% (Figure 4a). The low accuracy indicated by MAPE value could be related to the fact that the proposed algorithm presented higher uncertainty for low Chl-a values (<5 µg/L). As Chl-a increases, the algorithm accurately estimates Chl-a for concentration between 10 and 70 µg/L, as observed in Figure 4c. On the other hand, the prediction's errors increase significantly above 70 µg/L [Chl-a], according to the model's residual values (Figure 4d). Overall, the algorithm performs satisfactorily well on estimating medium to high Chl-a (>10 µg/L), a concentration range where AB often occurs.

Chl-a Algorithm
As a result of Monte Carlo simulation, the validated power-law fitting curve (N = 30%) assembles the global power-law fitting curve (N = 136) rendering a robust predictive model (Figure 4a). Chl

TSI Classification Tree
The paired T-test confirmed that the NDCIsat distribution is different among each TSI class (Figure 5a). The NDCIsat thresholds (nodes) were based on the decision tree defined after MC iteration (Figure 5b  The final model presented a R 2 of 0.86 and MAPE close to 90% (Figure 4a). The low accuracy indicated by MAPE value could be related to the fact that the proposed algorithm presented higher uncertainty for low Chl-a values (<5 µg/L). As Chl-a increases, the algorithm accurately estimates Chl-a for concentration between 10 and 70 µg/L, as observed in Figure 4c. On the other hand, the prediction's errors increase significantly above 70 µg/L [Chl-a], according to the model's residual values (Figure 4d). Overall, the algorithm performs satisfactorily well on estimating medium to high Chl-a (>10 µg/L), a concentration range where AB often occurs.

TSI Classification Tree
The paired T-test confirmed that the NDCI sat distribution is different among each TSI class (Figure 5a). The NDCI sat thresholds (nodes) were based on the decision tree defined after MC iteration (Figure 5b  The weighted overall f1-score, which accounts for both precision and recall, is 0.71 (Figure 6a). Oligo and Hypertrophic classes show a high f1-score (>0.78), indicating that the tree classification performs well on the extremes (very low and very high) of Chl-a concentration range. For the intermediate classes, however, the f1-score decreases to 0.72 for Super-eutrophic, 0.61 for Meso, and 0.50 for Eutrophic class. The Eutrophic class shows a low precision (0.39) due mostly to the underestimation of Super-eutrophic class, i.e., the tree decision classifies some Super-eutrophic samples (n = 10) as Eutrophic, and to the overestimation of Meso-eutrophic samples classified as Eutrophic class (n = 4). Overall, the classifier shows a satisfactory performance allowing the user to detect increasing levels of eutrophication.
For AB (Super and Hyper-eutrophic) vs. non-bloom (Oligo, Meso, and Eutrophic) classification, the overall f1-score is 0.90 (Figure 6b). Most of the errors are due to the misclassification of bloom (true label) as non-bloom (predicted label), which means the under-estimation of bloom condition of around 14%. On the other hand, for the pixels classified as bloom with 96% of precision, only two samples identified as bloom out of 57 were mistaken.  For AB (Super and Hyper-eutrophic) vs. non-bloom (Oligo, Meso, and Eutrophic) classification, the overall f1-score is 0.90 (Figure 6b). Most of the errors are due to the misclassification of bloom (true label) as non-bloom (predicted label), which means the under-estimation of bloom condition of around 14%. On the other hand, for the pixels classified as bloom with 96% of precision, only two samples identified as bloom out of 57 were mistaken.

App Interface and Functionalities
Once the Chl-a and TSI algorithms had been calibrated and validated, they were implemented in GEE as an Earth Engine App (Figure 7), entitled Algae Bloom Monitoring Application (AlgaeMAp). Within the App, the user can pick a region of interest, display

App Interface and Functionalities
Once the Chl-a and TSI algorithms had been calibrated and validated, they were implemented in GEE as an Earth Engine App (Figure 7), entitled Algae Bloom Monitoring Application (AlgaeMAp). Within the App, the user can pick a region of interest, display maps, create temporal charts, and visualize spatial stats, such as min, max, and mean values for each pixel given a date range.  A few examples extracted from the Tietê River Basin are shown to illustrate the App functionalities by taking the user's perspective: Single image analysis-After selecting a ROI (either basins or reservoirs), individual (or single) images can be displayed as RGB and/or derived products (Chl-a, TSI, bloom) A few examples extracted from the Tietê River Basin are shown to illustrate the App functionalities by taking the user's perspective: Single image analysis-After selecting a ROI (either basins or reservoirs), individual (or single) images can be displayed as RGB and/or derived products (Chl-a, TSI, bloom) by selecting the desired date (Figure 7c). After selecting the date, the user can click on the "Display Image" button ( Figure 7c) to show the Chl-a, TSI, and Bloom images (Figure 8). For every date selected, the user can download the displayed layers (NDCI sat , Chl-a, TSI, and AB) as tiff files (Figure 7i). Additionally, the user can click on a given pixel to show the Chl-a, NDCI sat , and geographic location in the Inspector tool (Figure 7h). (a) Select the ROI (Region of Interest), either Basins or Reservoirs; (b) Indicate geographic coordinates to add a geographical point, in decimal degrees.; (c) Select single images to display RGB (432, Original Level 1), Chl-a, NDCI, TSI, or Bloom product; (d) For temporal analysis, select the year and month interval of interest; (e) Calculate time-series to generate Spatial Stats Maps such as min, max, mean, and bloom occurrence (%); (f) Write a buffer and scale before drawing a geometry to create time-series charts; (g) Select the geometry and draw it on the map and select the Check-boxes to show time-series charts; (h) Inspect the image pixel value; (i) Download the processed images; (j) Legend for image interpretation.
A few examples extracted from the Tietê River Basin are shown to illustrate the App functionalities by taking the user's perspective: Single image analysis-After selecting a ROI (either basins or reservoirs), individual (or single) images can be displayed as RGB and/or derived products (Chl-a, TSI, bloom) by selecting the desired date (Figure 7c). After selecting the date, the user can click on the "Display Image" button ( Figure 7c) to show the Chl-a, TSI, and Bloom images (Figure 8). For every date selected, the user can download the displayed layers (NDCIsat, Chl-a, TSI, and AB) as tiff files (Figure 7i). Additionally, the user can click on a given pixel to show the Chl-a, NDCIsat, and geographic location in the Inspector tool (Figure 7h). Time-series spatial stats-After selecting the ROI, the user can also analyze time changes in variables such as Chl-a and NDCIsat (temporal mean, min, max) and Bloom Frequency (Figure 9). For that, the user must select the date range (year and month) and click on the ''Calculate Time-Series button'' (Figure 7e). This method allows the user to choose a specific t interval and monthly range (to analyze seasonality). These statistical Time-series spatial stats-After selecting the ROI, the user can also analyze time changes in variables such as Chl-a and NDCI sat (temporal mean, min, max) and Bloom Frequency (Figure 9). For that, the user must select the date range (year and month) and click on the ''Calculate Time-Series button" (Figure 7e). This method allows the user to choose a specific t interval and monthly range (to analyze seasonality). These statistical maps are added as new layers into the Map Panel and can be visualized and downloaded separately (Figure 9). Similar to the Single image analysis, clicking on a specific pixel on the map in the Time-series section will display the temporal variables in the Inspector (Figure 7h).  Time-series charts-After selecting the date range, the App can also be used to plot temporal information from an area of interest, which can be a rectangle, polygon, or point ( Figure 7f). First, the user must choose one of these three available geometries and draw Time-series charts-After selecting the date range, the App can also be used to plot temporal information from an area of interest, which can be a rectangle, polygon, or point ( Figure 7f). First, the user must choose one of these three available geometries and draw it on the map. Then the user must click on the desired information, which can be "Show mean Chl-a", "Show mean NDCI", "Show monthly TSI", "Show daily TSI", and "Show Video". For the "Show mean Chl-a" and "Show mean NDCI" buttons, the App displays temporal variation of spatial mean Chl-a and NDCI (if the geometry is a rectangle or a polygon). When the user selects a point, the Chl-a and NDCI sat temporal information becomes specific to an individual pixel. As a result, the Chl-a and NDCI sat time series can be analyzed for a given geographic coordinate of interest (decimal longitude and latitude, WGS 84), which is often the case when you have in situ data from a field campaign. For example, Figure 10a shows Chl-a time-series for one of the CETESB's monitoring sites indicated by the coordinates in Figure 7. Furthermore, it is possible to identify, within the chart, every date with an image, which can be then displayed (Figure 8c). For TSI, there are two options to visualize the information: relative TSI area aggregated by month ("Show monthly TSI") or absolute daily TSI area ("Show daily TSI"). Monthly aggregation gives a more general TSI condition with less gaps on the water body over space and time, being more appropriate for long-term and regular monitoring. Alternatively, for a more specific evaluation of TSI, the user can select aggregation by day which shows the absolute number of pixels for a given eutrophication class. Figure 10c,d shows the TSI temporal trend aggregated by day and month within the rectangle presented in Figure 7 (covering the entire Billings reservoir). Additionally, the user can visualize a video of Chl-a maps over time when selecting "Show Video" (Figure 10b).

Discussion
The monitoring of Chl-a concentration in inland waters using remote sensing data Furthermore, it is possible to identify, within the chart, every date with an image, which can be then displayed (Figure 8c). For TSI, there are two options to visualize the information: relative TSI area aggregated by month ("Show monthly TSI") or absolute daily TSI area ("Show daily TSI"). Monthly aggregation gives a more general TSI condition with less gaps on the water body over space and time, being more appropriate for long-term and regular monitoring. Alternatively, for a more specific evaluation of TSI, the user can select aggregation by day which shows the absolute number of pixels for a given eutrophication class. Figure 10c,d shows the TSI temporal trend aggregated by day and month within the rectangle presented in Figure 7 (covering the entire Billings reservoir). Additionally, the user can visualize a video of Chl-a maps over time when selecting "Show Video" (Figure 10b).

Discussion
The monitoring of Chl-a concentration in inland waters using remote sensing data has been improved in the last years due to the new generation of earth observation sensors [10]. A successful example is the MSI sensor aboard Sentinel-2A-B. This sensor covers the lack of spectral (13 bands), spatial (10-20 m), and temporal (5 days) resolutions required for inland water bodies monitoring [10,13,36,37]. Moreover, the MSI sensor's spectral resolution allows for Chl-a concentration in those waters to be detected more accurately than in other sensors such as Landsat-8 [38]. This accuracy is likely due to there being more bands in the near-infrared region (705, 740 nm), which allows the detection of the spectral features associated to phytoplankton blooms (e.g., high Surface Reflectance signal at 705 nm). Therefore, these spectral features could be enhanced using indexes such as the NDCI [9]. NDCI is related to Chl-a as it is the normalized difference between the red-edge band (705 nm) and the red band (660 nm) [39]. As Chl-a concentration increases, the signal at 705 nm will increase (reflectance peak), and the signal at 660 nm will decrease due to the high Chl-a absorption at this band [9]. NDCI and band ratios between red-edge and the red band were widely evaluated in literature and presented successful results [9,36,[40][41][42][43][44].
In this study, the use of NDCI sat for estimating Chl-a concentration in the Tietê River Basin confirms its applicability for algae blooms monitoring. It was possible to develop a robust power-law function to predict Chl-a based on NDCI sat values obtained from the Sentinel-2 images. The results based on our validation procedure using Monte Carlo simulation demonstrated that the model had good performance for Chl-a retrieval (mode of R 2 = 0.86). Further, the NDCI sat separated the trophic state (global accuracy of 0.70) and the bloom occurrence (global accuracy of 0.90). In a previous NDCI sat application on MERIS bands, Mishra [9] used a quadratic function to estimate Chl-a from the NDCI sat (Chl-a < 30 µgL −1 ), with RMSE of 1.43 µgL −1 . In a eutrophic reservoir in Tietê River Basin (Barra Bonita), Watanabe et al. [43] obtained an R 2 of 0.80 and a Mean Absolute Error (MAPE) of 39.44% for an NDCI algorithm, using in situ radiometric measurements. In a more global application for a wider range of Chl-a concentrations (0-1000 µgL −1 ), Pahlevan et al. [37] reported that NDCI obtained from in situ radiometric data estimated Chl-a with errors of approximately 49%. The NDCI was compared with other algorithms and outperformed almost all, apart from a machine-learning algorithm. Using Sentinel-2 imagery, Watanabe et al. [45] observed that NDCI performed better than other algorithms (2-band, 3-band, and slope) for Chl-a retrieval in the Tietê cascade system (Chl-a betweeñ 0 to up to 800 µgL −1 ), with MAPE values of 48%. The use of NDCI for Chl-a estimates in eutrophic environments (~40 to up to 225 µgL −1 ) was also demonstrated by Tavares et al. [46] in two Brazilian Lagoons (RMSE = 25.96 µgL −1 ). Additionally, NDCI has been used for algal bloom monitoring in locations with the absence of in situ data due to the COVID-19 lockdown [12,47], and also to provide near-real-time bloom alerts to environmental and governmental agencies [10,11].
The performance of NDCI sat algorithm for Chl-a retrieval using satellite data is dependent on the atmospheric correction. In Google Earth Engine, bottom of atmosphere (BOA) data was only available using the Sen2Cor method and only after 2019 for the South America region. The Sen2Cor method that generates the BOA data for MSI imagery in GEE also has uncertainties for water applications, as it is designed for land applications [48]. Therefore, atmosphere correction methods suitable for inland water application need evaluation to increase their accuracy over water bodies. The SIAC method evaluated in this study presented a satisfactory performance for Surface Reflectance retrieval (Pearson R = 0.96, MAPE = 48%) when all bands were compared and, specifically, for bands used in the NDCI sat calculation (MAPE = 27.23% and 8.92% for bands 4 and 5, respectively). Similar to other AC methods, SIAC applies the well-known 6SV RT model, a long-term MODIS product, as well as fast processing in the GEE platform. Moreover, the auxiliary products have been tested for atmospheric correction of medium spatial resolution [49][50][51], and the results are promising in terms of quality and reproducibility. In fact, the satisfactory agreement between in situ surface reflectance and satellite reflectance corroborates the use of SIAC for water quality applications, particularly for NDCI sat calculation. The advantage is that SIAC has been implemented into GEE allowing atmospheric correction via cloud computing services, which is extremely fast when compared to traditional personal computing methods.
Furthermore, we observed that glint correction reduced Surface Reflectance data errors (a reduction by, approximately, 30%). The application of glint correction using the SWIR band of Sentinel-2 was also demonstrated by other studies in Brazilian Inland waters [29,31,35]. The comparison between in situ calculated NDCI and NDCI sat also presented likely results (Pearson R = 0.94, MAPE = 47.4%). The higher MAPE values for NDCI could be attributed to uncertain propagation in band-ratio algorithms [52], and a slight offset between in situ measured and NDCI sat . However, as the algorithms were developed using NDCI sat on a wide temporal scale, this difference should not impact the final algorithm. The time delay used in this study was set at ±2 days, which could be a source of uncertainty if a bloom occurs between these two measurements [53,54]. However, other studies pointed out that if the data was carefully checked, the time delay could be extended up to three days [29,35,45,55].
For the TSI, previous studies [29,55] usually estimate trophic levels after deriving Chl-a concentration from satellite imagery, which often propagates the Chl-a algorithm's uncertainties. Alternatively, we propose the use of NDCI sat , which is independent of any predictive Chl-a algorithm, to derive TSI levels. The proposed decision tree separates extreme classes well, such as Oligotrophic and Hypereutrophic, but some misclassifications were observed within the intermediate levels. It is important to note that both Chl-a and TSI algorithms are results of the first attempt at a regional approach. The plan is to improve these algorithms, having NDCI sat as the predictive index so it can be extended to other relevant regions in Latin America ( Figure 11).
Overall, the AlgaeMAp application system has shown to be useful to provide remote sensing water quality algorithms to end-users. The App is easy to use and offers valuable information about the current and historical Chl-a, TSI, and bloom state of a given water body. Different programs have been developing platforms to monitor reservoir Chl-a and trophic state using remote sensing [56][57][58]. The CyanoLakes company, for example, monitors current and historical Chl-a in reservoirs using the Ocean and Land Colour Instrument (OLCI) on Sentinel-3 (with 300 m spatial resolution) and provides the information in a web platform [58]. Similarly, the company EOMap also provides web portals where the user can extract spatial and temporal data of Chl-a, suspended matter, Chl-a, harmful algae bloom, and trophic state classification [56]. In addition to private companies, public agencies are also realizing the great benefit of providing water quality parameters in web platforms. The Brazilian National Water Agency (in Portuguese ANA) has been developing a remote sensing web platform called HidroSat, which provides Chl-a and turbidity information using MODIS TERRA e AQUA [59]. However, the platform provides the historical information in a graph for just one fixed point, a method that prevents spatial analysis. In the same way, the United States Environmental Protection Agency (EPA) has been developing an App that will inform cyanobacteria algal blooms on a national scale throughout the USA using Sentinel 3 information [57].
To the extent of the authors' knowledge, our developed AlgaeMAp is the first online platform built within the GEE platform to offer high spatial resolution of water quality parameters. As a result, the App benefits from the huge processing capability of the GEE platform that allows any user with internet access to easily extract detailed spatial (30 m) and long temporal Chl-a and TSI information (from August 2015 and with images every five days) over the most important reservoirs in São Paulo. AlgaeMAp uses the GEE platform to access Sentinel 2 images which are uploaded approximately one day after the satellite acquisition date, allowing the App to serve as an alarm system as well.
Considering that the São Paulo State Environmental Quality Company currently relies on in situ data to monitor reservoirs in the most important economic region in Brazil, the App satellite information might be a huge advance in water governance if incorporated by water agencies. The AlgaeMAp aims to connect space technology to water decision end-users to improve water management in Latin America.
final algorithm. The time delay used in this study was set at ±2 days, which could be a source of uncertainty if a bloom occurs between these two measurements [53,54]. However, other studies pointed out that if the data was carefully checked, the time delay could be extended up to three days [29,35,45,55].
For the TSI, previous studies [29,55] usually estimate trophic levels after deriving Chl-a concentration from satellite imagery, which often propagates the Chl-a algorithm's uncertainties. Alternatively, we propose the use of NDCIsat, which is independent of any predictive Chl-a algorithm, to derive TSI levels. The proposed decision tree separates extreme classes well, such as Oligotrophic and Hypereutrophic, but some misclassifications were observed within the intermediate levels. It is important to note that both Chl-a and TSI algorithms are results of the first attempt at a regional approach. The plan is to improve these algorithms, having NDCIsat as the predictive index so it can be extended to other relevant regions in Latin America (Figure 11). Overall, the AlgaeMAp application system has shown to be useful to provide remote sensing water quality algorithms to end-users. The App is easy to use and offers valuable information about the current and historical Chl-a, TSI, and bloom state of a given water body. Different programs have been developing platforms to monitor reservoir Chl-a and trophic state using remote sensing [56][57][58]. The CyanoLakes company, for example, monitors current and historical Chl-a in reservoirs using the Ocean and Land Colour Instrument (OLCI) on Sentinel-3 (with 300 m spatial resolution) and provides the information in a web platform [58]. Similarly, the company EOMap also provides web portals where the user can extract spatial and temporal data of Chl-a, suspended matter, Figure 11. AlgaeMAp intends to extend to other relevant water bodies, such as Primavera Dam, Itaipu Dam, and coastal lagoons (Mirim and Patos) in the Southern South America. Example of average NDCI for March 2020.

Conclusions
In this article, we demonstrate the development of an algae bloom monitoring system that uses cloud computing (GEE) to process all Sentinel-2 imagery, generate the NDCI sat collection, and provide quick access to the Chl-a and TSI (Trophic State Index) levels for any water body within the study area (Tietê River Basin, Figure 1). More importantly, all this information has been implemented into an Earth Engine App that allows personalized evaluation of a given water mass that can either be chosen from a list or drawn by the user on the map panel. The AlgaeMAp is currently under development and a demonstration version can be accessed from GEE App gallery Available online: https://felipellobo.users. earthengine.app/view/algaemapv10 (accessed on 8 July 2021). The user is able to define date range, ROI (Region of Interest), time-series charts (either NDCI or Chl-a) and TSI area charts (% class area), and save these plots as texts or image formats.
For the project's following activities, we plan to extend the NDCI sat collection to other important South America regions, such as Uruguay, Argentina, and north-eastern Brazil. However, this extension depends on in situ data for algorithm validation, which could be accomplished with more in situ chlorophyll data provided by the users and collaborators. Finally, this project represents the state-of-art Application of Remote Sensing and Cloud Computing to provide algae bloom alerts in the Latin America regions, helping Authors would also like to than Juan Torres-Batllo (EO Data Science) for crucial help in Google Earth Engine App implementation. Authors are in debt to Felipe Nincao, Rejane Paulino, and Rogério Flores Júnior (INPE's Postgraduate Course) for help with in situ measurements in Billings Reservoir.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
For an evaluation of a Sentinel-2 images after atmospheric and sun glint correction, radiometric data was collected in the Billings Reservoir (November 2020). Three intercalibrated TriOS-RAMES (Hyperspectral Radiance and Irradiance Sensors) spectroradiometers, operating in the 350-950 nm spectral range, were used to collect radiometric data in 22 sample sites ( Figure A1a, Chl-a concentration was only determined for six of these sample sites). At each site, the instruments measured, simultaneously, the downwelling irradiance above the water surface, Water-Leaving Radiance, and Sky Radiance to derive Surface Reflectance-SR (full radiometric processing described in [30]). Then SR values were used to simulate MSI/Sentinel-2 spectral bands using their Spectral Response Function [60].
where SR λ j is the simulated band at central wavelength j, SRF (λ) is the spectral response function for each wavelength i, and SR(λ i ) is the measurement for each wavelength i. The simulated bands were used to calculate NDCI (Equation (1)), named NDCI in situ, which were compared to the NDCI sat derived from Sentinel-2/MSI . In total, 22 radiometric samples were used to evaluate the Sentinel-2 processing of atmospheric correction, glint correction and NDCI of one image acquired on 7 November 2020 (Tile: T23KLP). The evaluation of the SIAC's surface reflectance output using in situ surface reflectance as ground truth ( Figure A1)  When in situ measured NDCI (NDCI in situ ) is plotted against Sentinel-2 derived NDCI (NDCI sat ), a high positive correlation is observed (R = 0.93. N = 22). NDCI sat consistently shows lower values, which is expected, because NDCI sat integrates a larger area (30 m × 30 m) than those measured by in situ radiometers.

Appendix B
To define a predictive model with the best balance between the number of samples, accuracy, as well as precision for both Chl-a estimation and TSI classification, several tests were taken by varying time windows from 0 to ±3 days. According to Table A1, the algorithm`s outputs with ±2 days has shown to be more appropriated for this application because it comprises a wide Chl-a interval, large number of samples (n = 136), and yet, high determination coefficient value (R 2 = 0.86). Table A1. Performance of power-law algorithm by varying the time window between satellite and in situ samples from 0 to ± 3 days. a and b refer to the algorithm's parameters (Chl-a = a * (NDCIsat +1) ^ b). The model chosen for this research is ± 2 days indicated with *. For the TSI tree decision and AB classification, similar conclusions can be withdrawn. Overall accuracy for TSI classification when using a ± 2 days window is approximately the same as using ±1 day window but with 50% more samples (Table A2). For AB classification, overall accuracy is similar for all tested windows (>0.89). When using only samples taken on the same day (0 day), the paired T-tested amongst the classes did not reject the null hypothesis which is that there is no significant difference between them preventing procession with the tree decision testing.

Appendix B
To define a predictive model with the best balance between the number of samples, accuracy, as well as precision for both Chl-a estimation and TSI classification, several tests were taken by varying time windows from 0 to ±3 days. According to Table A1, the algorithm's outputs with ±2 days has shown to be more appropriated for this application because it comprises a wide Chl-a interval, large number of samples (n = 136), and yet, high determination coefficient value (R 2 = 0.86). Table A1. Performance of power-law algorithm by varying the time window between satellite and in situ samples from 0 to ± 3 days. a and b refer to the algorithm's parameters (Chl-a = a * (NDCI sat +1)ˆb). The model chosen for this research is ± 2 days indicated with *. For the TSI tree decision and AB classification, similar conclusions can be withdrawn. Overall accuracy for TSI classification when using a ± 2 days window is approximately the same as using ±1 day window but with 50% more samples (Table A2). For AB classification, overall accuracy is similar for all tested windows (>0.89). When using only samples taken on the same day (0 day), the paired T-tested amongst the classes did not reject the null hypothesis which is that there is no significant difference between them preventing procession with the tree decision testing. Table A2. Accuracy performance of Trophic State Index (TSI) and Algae Bloom (AB) classification by varying the time window between satellite and in situ samples from 0 to ± 3 days. Paired T-test between classes rejects the null hypothesis (H 0 ) when there is a statistical difference between them (p-value < 0.05). Nodes correspond to NDCI thresholds depicted in Figure 6. The model chosen for this research is ± 2 days indicated with *.