Estimating Chlorophyll- a of Inland Water Bodies in Greece Based on Landsat Data

: Assessing chlorophyll- a (Chl- a ) pigments in complex inland water systems is of key importance as this parameter constitutes a major ecosystem integrity indicator. In this study, a methodological framework is proposed for quantifying Chl- a pigments using Earth observation (EO) data from Landsat 7 Enhanced Thematic Mapper Plus (ETM + ) and 8 Operational Land Imager (OLI) sensors. The ﬁrst step of the methodology involves the implementation of stepwise multiple regression (MLR) analysis of the available Chl- a dataset. Then, principal component analysis (PCA) is performed to explore Greek lakes’ potential interrelationships based on their Chl- a values in conjunction with certain criteria: their characteristics (artiﬁcial / natural), typology, and climatic type. Additionally, parameters such as seasonal water sampling and the date di ﬀ erence between sampling and satellite overpass are taken into consideration. Next, is implemented a stepwise multiple regression analysis among di ﬀ erent groups of cases, formed by the criteria indicated from the PCA itself. This e ﬀ ort aimed at exploring di ﬀ erent remote sensing-derived Chl- a algorithms for various types of lakes. The practical use of the proposed approach was evaluated in a total of 50 lake water bodies (natural and artiﬁcial) from 2013–2018, constituting the National Lake Network Monitoring of Greece in the context of the Water Framework Directive (WFD). All in all, the results evidenced the suitability of Landsat data when used with the proposed technique to estimate log-transformed Chl- a . The proposed scheme resulted in the development of models separately for natural ( R = 0.78; RMSE = 1.3 µ g / L) and artiﬁcial lakes ( R = 0.76; RMSE = 1.29 µ g / L), while the model developed without criteria proved weaker ( R = 0.65; RMSE = 1.85 µ g / L) in comparison to the other ones examined. The methodological framework proposed herein can be used as a useful resource toward a continuous monitoring and assessment of lake water quality, supporting sustainable water resources management.


Introduction
Accumulating passive exploitation of natural resources, improper land-use practices, and irregular development activities in lake basins undermine various significant functions of water resources [1]. Surface water provides exceptional financial benefits, regarding water supply (quantity and quality), fisheries, agriculture, wildlife resources, and recreation and tourism opportunities (Ramsar Information Paper no. 1 2007 [2]). The need for sustainable management of water bodies highlights the fact that water resources are not inexhaustible and have limited resistance under anthropogenic pressures such as ongoing drainage, conversion, and pollution.
Hence, one of the most significant aspects for the sustainable management of water bodies-lakes in particular-is the constant monitoring of their quality, as well as of their watersheds [3]. Water quality parameters comprising physical, chemical, and biological properties are conventionally measured by collecting samples from the field and then analyzing those samples in the laboratory. Although in situ monitoring provides high accuracy, it is a time-consuming procedure, and it cannot ensure a simultaneous water quality dataset on a regional or greater scale [3][4][5]. Furthermore, traditional point sampling methods are not capable of detecting the spatial or temporal variations in water quality, as required in extensive assessment and management of water bodies. On the other hand, geoinformation and especially geospatial technologies provide a promising direction in that respect. In particular, the combined use of Earth observation (EO) and geographical information systems (GIS) allows monitoring in an efficient and robust way lake parameters over variable spatial scales, including lakes that are otherwise inaccessible [6,7].
Various EO instruments mounted on either airborne or satellite platforms acquire spectral information and measure the energy from the water's surface at different wavelengths [3]. The most commonly used approach to inland water remote sensing involves fitting a standard linear regression between spectral band/band ratio values and temporally coincident in situ water quality measurements [8]. Visible, near-infrared (NIR), and shortwave infrared (SWIR) bands of the solar spectrum are usually used by many researchers to acquire powerful correlations-through empirical approaches-among water column reflectance values and constituents in different water bodies [3,[9][10][11][12][13][14]. EO data from several satellite and airborne sensors such as Satellite Pour l' Observation de la Terre (SPOT), Moderate Resolution Imaging Spectroradiometer (MODIS), Earth Observing 1-Hyperion, and Medium Resolution Imaging Spectrometer (MERIS) were used for chlorophyll-a (Chl-a) estimation [15][16][17][18][19]. Nonetheless, it was revealed that Landsat seems to be more appropriate and widely used for Chl-a assessment due to its temporal coverage, spatial resolution, and easy accessibility [3].
Chl-a is the major indicator of trophic state and considered as one of the top water pollution indices related to public health, eutrophication, and deterioration of ecosystem habitat. In Case 2 waters (i.e., inland and coastal waters), the optical properties are measured based on a compound of dissolved organic matter, dead organic and inorganic particulate matter, and phytoplankton (Chl-a). Therefore, determination of Chl-a concentration is much more complex and less accurate [3,[20][21][22]. Oligotrophic to mesotrophic waterbodies with low biomass present a Chl-a spectrum characterized by a sun-induced fluorescence peak centered at 680 nm [8,23], while eutrophic waterbodies (high biomass) present a florescence signal which is masked by absorption features and backscatter peaks around 665 nm and 710 nm, respectively [8,24]. The ratio between these two wavelengths is widely used to quantify Chl-a concentrations with satisfactory accuracy [8,25].
Many studies have so far focused on monitoring eutrophication or trophic state through Chl-a concentration retrieval in Greek lakes. For example, Peppa et al. [26] applied Chl-a detection algorithms in Lake Pamvotis using Sentinel-2 data, Markogianni et al. [27] developed empirical Chl-a quantitative models based on Landsat 5 images in the brackish urban shallow Koumoundourou lake, while Markogianni et al. [14] investigated the suitability of the OLI instrument on board the Landsat 8 satellite platform in accurately estimating Chl-a in the largest freshwater body of Greece (Trichonis Lake). In another study, Kontopoulou et al. [28] tried to exploit the Water Framework Directive (WFD) dataset and measure Chl-a concentrations by using Landsat 8 data in 11 (six natural; five artificial) of the 50 lakes comprising the national sampling network.
In this study, we discuss the utility of remotely sensed techniques in the qualitative assessment of 50 lake water bodies, particularly of Chl-a concentrations, derived from the WFD (2000/60/EC) monitoring network for lakes in Greece. WFD provides a scheme for the conservation and improvement of inland, ground, and coastal waters' ecological status and aims to harmonize European legislation on water. Thus, pan-European hydromorphological, physicochemical, and biological datasets are used to determine ecological status of surface waters (Article 8) in order to assure and further improve future water quality and quantity [29].
In purview of the above, this study proposes a methodological framework that aims to provide Chl-a in Case 2 complex inland waters of Greece by generating accurate quantitative models with EO data from the Landsat 7 ETM+ and 8 OLI satellite series. The methodology applied initially includes the implementation of stepwise multiple regression (MLR) analysis of the whole available Chl-a dataset with the basic aim of exploring its potential to establish robust Chl-a quantitative algorithms, regardless of lake characteristics. Then, principal component analysis (PCA) is performed to highlight the most significant parameters (artificial/natural, WFD typology, water sampling season, and climatic type) affecting Chl-a values. This procedure was considered valuable for the next step, involving the execution of multiple stepwise MLR analyses-based on PCA results-among different groups of cases. This effort aimed at exploring different remote sensing-derived Chl-a algorithms for various types of lakes according to the most significant lake characteristics. The practical use of the proposed approach was evaluated in a total of 50 lake water bodies (natural and artificial) from 2013-2018, constituting the National Lake Network Monitoring of Greece in the context of the Water Framework Directive (WFD).

Study Area
The National Monitoring Network of Waters in Greek lakes, according to the Joint Ministerial Decision 140384/2011, is implemented by the Goulandris Natural History Museum, Greek Biotope/Wetland Centre (EKBY). The network consists of 50 lake water bodies, natural and artificial.
At the majority of the lakes, only one sampling station is detected, except for trans-boundary lakes (Megali Prespa, Mikri Prespa, and Doirani), where two sampling stations are located (Table 1; Figure 1). From the total of 53 sampling sites, there are 27 surveillance and 26 operational ones. Surveillance stations operate in water bodies of good status, for a certain period of time (one year), while operational stations are monitored on a monthly or seasonal basis, in water bodies which fail to achieve good status.   Table 1).

In Situ Data
In this study, Chl-a concentrations, measured from 2013 to 2018 (summer, autumn, and spring) throughout the studied lake stations, were acquired, while the Chl-a concentrations (in µg/L) were determined spectrophotometrically (Method 10200 H [32]). Those data were retrieved from the EKBY site (Goulandris Natural History Museum, Greek Biotope/Wetland Centre; http://biodiversity-info.gr/ index.php/el/lakes-data#!IMGP4731), where more details about the sampling periods, stations, and variables measured can be found. Apart from the Chl-a measurements, some basic characteristics of the 50 studied lakes were also considered, such as whether they are natural or artificial, their typology according to the Water Framework Directive (Table 2), and the climatic type according to the Köppen-Geiger climate classification (Table 3) [33]. The determination of the Mediterranean lake types is based on the 2013/480/EU decision (Table 2), while the typology of each lake was retrieved from the respective reports, acquired from the Environment and Energy Ministry's website (http://wfdver.ypeka.gr/).  Landsat 8 satellite was launched in 2013 and bears two sensors, the OLI and the Thermal Infrared Sensor (TIR). It also includes a narrower near-infrared band and a 12-bit radiometric resolution compared to the 8 bits of previous Landsat satellites [19,34]. The main aim was to match as many records from the available Chl-a dataset as possible with Landsat 8 images. In cases where no Landsat 8 images were available or were cloud-covered, Landsat 7 ETM+ images were used. Landsat 7, launched in 1999, introduced the ETM+, whereby its analysis was similar to Thematic Mapper (TM) except for a 60-m thermal and a new 15-m panchromatic band [35]. Since 2003, Landsat 7 has a sensor deficiency where the scan line corrector (SCL) is off and, even though those images are characterized by black line gaps [36], its radiometric and geometric analyses remain undisturbed [37].
In order to cover the 50 lakes throughout Greece and the different sampling dates, 2013-2016 and 2018 time series of 296 Landsat imageries (102 L7 ETM+, 194 L8 OLI) were downloaded from the USGS (United States Geological Survey) data center (https://earthexplorer.usgs.gov/). Images from both sensors reside in Landsat Collection 1 Level-1 category data products. Furthermore, the mean time window between the satellite overpass and the in situ measurements was about ±15 days. More information about the bandwidths and spatial resolution of the aforementioned sensors can be found at NASA's official website (https://landsat.gsfc.nasa.gov/).

Satellite Data Pre-Processing
Pre-processing was carried out for the Landsat images using the semi-automatic classification plugin (SCP) of the free and open-source cross-platform desktop geographic information system, Q-GIS v. 3.6.3-Noosa [38]. Images of both Landsat 7 ETM+ and Landsat 8 sensors were subject to the following pre-processing steps: 1.
Conversion of images from digital numbers (DN) to the physical measure of top-of-atmosphere reflectance (TOA).

2.
Atmospheric correction using the DOS1 method (Dark Object Subtraction 1; image-based technique), which was applied to all bands except for thermal ones. More information about each correction including the theoretical background can be found in the SCP Documentation Release 6.2.0.1 [39]. To ensure the use of only cloud-free pixels over the sampled lakes, the Cloud Masking QGIS plugin (https://smbyc.github.io/CloudMasking) was used. By using this tool, clouds, cloud shadow, cirrus, aerosols, and ice/snow were masked for all Landsat images using the combination of the Fmask and Blue Band processes.
The pre-processing procedure of L7 images also included the retrieval of data that coincided with the aforementioned black diagonal stripes. By visual checking, sampling sites covered by those stripes were recognized and, by employing GIS operations and focal statistics, the mean value within a circle of seven cells around it for each input cell location was calculated. This radius was the most adequate among several trials. Then, by applying the Con and IsNull functions, only the cells that had no values were replaced. In cases where parts of sampled lakes were cloud-covered, the SetNull and IsNull functions were combined with the cloud mask calculated in earlier stages to remove the cloud-biased pixels.

Basic Statistics and PCA
Basic statistical analysis among the Chl-a datasets of 2013, 2014, 2015, 2016, and 2018 was carried out including the calculation of mean, median, standard deviation, and min/max. Based on the available lake characteristics (Table 1), a factor analysis was used with the Varimax-rotated principal component analysis (PCA) as a factor extraction method to interpret the major patterns of Chl-a variation within the whole dataset. Furthermore, PCA's basic aim was to explore and indicate the presence of inter-correlations among Chl-a concentrations, lake characteristics, climatic type, typology, and seasonal sampling and further indicate which were the most significant parameters affecting Chl-a concentrations in the studied lakes. The seasons were defined as follows: summer (June, July, and August), autumn (September, October, and November), and spring (March, April, and May). The Chl-a concentration distribution was grouped and categorized by the most significant variables/criteria that PCA indicated, using box-plot diagrams. PCA was performed in SPSS Statistical Package (v. 24.0).

Developing Relationships between Landsat and Chlorophyll-a Data
There are several band combinations and transformations proposed in the relevant literature for establishing relationships between the Landsat reflectance data (independent) and Chl-a, log (Chl-a), and ln (Chl-a) data [14]. Matthews [40] reported that the wavelength areas where water reflects and scatters the majority of the entering solar radiation are those that are mostly used for the monitoring of water quality constituents. These wavelengths incorporate the water-leaving reflectance in visible and NIR wavelengths of the electromagnetic spectrum. Thus, to compare Chl-a concentrations and EO data, visible (blue, green, and red), NIR, and SWIR bands were used, while aerosol, cirrus, panchromatic, and TIRS thermal spectral bands were excluded. In addition, Landsat band ratios, additions, subtractions, and log and ln transformations were added to the analysis to establish accurate and reliable estimation algorithms. Based on the respective literature, more than 75 available band transformations/combinations were developed, including spectral indices such as enhanced vegetation index (EVI) [41], normalized vegetation index (NRVI) [42], normalized difference water index (NDWI) [43], modified normalized difference water index (MNDWI) [44], green normalized difference vegetation index (GNDVI) [45], normalized difference vegetation index (NDVI) [46], and surface algal bloom index (SABI) [47]. As a first step, the main objective was to distinguish the highest important predictors among the aforementioned band transformations/combinations by conducting a correlation analysis among them and the Chl-a, log (Chl-a), and ln (Chl-a) concentrations. Taking into account only the correlations significant at the 0.01 level, a threshold value of Spearman r was set to ±0.4 (moderate relationship [48]). Variables that presented an r value equal or greater than the aforementioned value were selected. Then, those variables/predictors were further inserted-combined in various ways-in numerous stepwise MLRs while examining for statistical performance and residuals. Further criteria consisting of multi-collinearity, tolerance factor, and variance inflation factor (VIF) were applied and checked to a subset of optimal models in order to further compare them and select the simplest models with higher accuracy (higher R 2 ). To develop a robust Chl-a algorithm, stepwise MLR analyses were evaluated in the context of two scenarios, as also illustrated in Figure 2. First scenario: The basic aim of this scenario was the exploration of the potential of stepwise MLR analysis to establish robust Chl-a quantitative algorithms. The training dataset was randomly established and included 80% of the whole available dataset (481 out of 565 Chl-a measurements, regardless of lake characteristics; Figure 2). This methodology is widely used in remote sensing of inland water quality and attempted to develop an algorithm that can be applied in multiple types of lakes.
Second scenario: This scenario aimed at exploring different remote sensing-derived Chl-a algorithms for various types of lakes according to the most significant lake characteristics resulting from the PCA of Chl-a values, lake climatic type, typology, and seasonal sampling. PCA-indicated parameters were combined in all possible ways, forming the various modeling cases examined to highlight the most optimal model. In this scenario, the various training datasets were subsets of the whole dataset of in situ Chl-a values, defined by the respective combination of the PCA-derived criteria. Each training dataset was then used in a separate MLR analysis ( Figure 2). As an additional criterion, a confined time window of ±5 days was used, defined as the difference between field measurements and satellite overpass. Kloiber et al. [49] observed that in situ measurements within one day of the satellite image date resulted in the best calibrations; however, a great number of ground observations with a longer time difference balanced some of the loss of accuracy. Hence, since the available dataset of in situ Chl-a measurements is quite large, the ±5-day window was determined presuming that, during this period, the water quality usually does not exhibit large and rapid water quality fluctuations.
During the analyses which included both scenarios mentioned above, it was ensured that the separate training and validation datasets contained representative samples of the data (80% and 20%, respectively) from the initial dataset. Training datasets of both scenarios comprised the in situ Chl-a measurements of 2013-2016.

Validation Approach
The qualitative capability of the derived algorithms was evaluated by regression analysis in two ways. Algorithms derived from the MLR based on the first scenario (random dataset) were applied in Landsat images associated with the respective validation dataset (20%) and in images of the year 2018 ( Figure 2). The algorithms resulting from the second scenario's training datasets (criteria) were applied in available Landsat imageries concerning the Chl-a of each respective validation dataset (20%) and the in situ Chl-a values of 2018. After applying several models on the available images, the optimal ones were verified based on Spearman's r correlation coefficient, mean residual value (e), and Root-Mean-Square Error (RMSE) statistical indices (Figure 2).

Statistical Analyses
Total measurements of in situ Chl-a concentrations of years 2013-2016 and 2018 were 702, including all sampling campaigns in all lakes, while the most measurements were presented in 2015 (Table 4). Sampling campaigns of 2014 indicated the highest range of Chl-a values (1026.7 µg/L), while, in the remaining years, the annual ranges were significantly lower. The minimum Chl-a value was 0.22 µg/L for years 2013 and 2014, whereas this value increased over the years. Despite this fact, mean Chl-a value remained similar between different years, indicating no special deterioration trend in Greek lake water quality. Furthermore, datasets of all years were skewed to the right with low values, while the kurtosis of all years was described as leptokurtic (fat tails).  Performance of factor analysis (using principal component analysis (PCA) as the extraction method) included five variables/criteria, e.g., sampling season, in situ Chl-a concentrations, lake nature (natural/artificial), typology, and climatic type. Factor analysis was implemented to obtain an indication of underlying common factors (components) that explain the interrelationships among those aforementioned variables. The analysis initially extracted five components (Table 5). Finally, only the first three components with eigenvalues greater than one were retained (as those which represent a real underlying factor) in the extraction sums of squared loadings. The percentage of the total variance explained by the three components (calculated after the implementation of the Varimax rotation method) was 35.4%, 21.9%, and 20.8%, cumulatively explaining 78% of the total variance. Based on rotated component matrix results (Table 6), communalities (summation of the squared loadings in all components) of studied variables are further discussed. Thus, 53% of Chl-a variance was explained by the second component, which also explained 57% of the sampling season variance. The first component explained 10% of Chl-a variance, which also explained 81.5% of the variance of lake characteristics (natural/artificial) and 79% of the variance of lake typology. The third component explained only 5% of the Chl-a variance and 90% of the climatic type variance. Taking into account those results, the variables that mostly contributed to and affected the variance of Chl-a concentration were the lake characteristics (natural/artificial) and typology, followed by the sampling season. Concentrations of Chl-a of natural lakes were found notably higher in comparison to artificial lakes, while the highest measured values were detected in artificial lakes during the summer and autumn months (outlier values) but during autumn in natural lakes (Figure 3a). In general, the seasonality of Chl-a concentrations was more evident in natural rather than in artificial lakes. The highest discrepancies were detected in natural lakes (Zazari and Voulkaria lakes, July 2014) followed by autumn sampling campaigns (e.g., Zazari lake, September 2014).  Table 1.
Measured Chl-a concentrations in both artificial and natural lakes presented a greater range during summer, while most outliers of artificial lakes were also illustrated during summer (Kerkini August 2014, August 2016 and Karla reservoirs, July 2014) and autumn seasons (Kerkini reservoir, October 2015). Spring also presented significant differences between natural and artificial lakes, where median value of the latter was evidently decreased.
The same Chl-a pattern is illustrated in a different way by the second boxplot ( Figure 3b). As far as the Chl-a distribution based on the lake typology is concerned, the highest values were detected at natural GR-VSNL and GR-SNL shallow (Zazari lake) and very shallow lakes (Voulkaria lake). These cases were followed by values measured in deep natural lakes GR-DNL (Yliki lake), while the lowest Chl-a values were detected in artificial shallow reservoirs (Pournari II and Stratos reservoirs).

Relationships between Landsat and Chlorophyll-a Data
Blue, green, red, NIR, and SWIR Landsat 8 and Landsat 7 ETM+ bands and more than 75 different band transformations and indices were used for investigating the most suitable relationship to estimate Chl-a concentrations throughout 50 Greek lakes.
Correlation analysis among all available variables and Chl-a, log (Chl-a), and ln (Chl-a) concentrations resulted in Spearman r values that ranged from −0.6 to +0.6. Based on the correlation matrix, the highest important predictors that met the criterion of the set threshold value of Spearman (r) ±0.40 in relation to log (Chl-a) were the following: blue/green, green/blue, blue/red, red/blue, red/green, red/SWIR1, log (blue/green), log (blue/red), log blue/log green, log blue/log red, (blue − red)/green, ln blue/ln green, ln blue/ln red, ln green/ln blue, ln red/ln blue, ln red/ln SWIR1, and ln red/ln SWIR2. Those variables/predictors were further inserted in several combinations into numerous stepwise linear regressions. As the number of the generated models was quite large, we only included only the most significant models from a statistical point of view.
Based on the first scenario, MLR analysis among the indicated variables generated several models with good performance. Based on tests on statistical significance of the bi coefficient of the independent factors (t-test with p-values less than 0.05) and on tests for multicollinearity (variance inflation factor (VIF) with values greater than one and less than 10 and tolerance higher than 0.1), the following model was finally selected as satisfactory (Equation (1); Table 7). log Chla = 3.599 − 0.63 × blue red − 2.183 × ln red ln swir2 .
(1) Descriptive statistics results obtained for the training and validation datasets (regarding log Chl-a) suggested that there was no discrepancy in the mean (training: 0.8; validation: 0.79) and standard deviation (training: 0.69; validation: 0.68) values between the two groups. To ensure statistically significant results, an independent t-test for the mean values was also implemented. Based on these results (t-value 0.276, p = 0.783), it is assumed that there is no difference in the group means.
Random selection of Chl-a measurements and MLR analysis (first scenario, 1A model) yielded a model accompanied by a value of coefficient of determination equal to 0.43, including the band ratios ln red/ln SWIR2 and blue/red ( Table 7). The standard error of the estimate was equal to 0.53 and the collinearity statistics were considered acceptable.
Concerning the second scenario, MLR analysis conducted among the selected predictors and the various combinations of the criteria (Table 8) indicated by the PCA generated several models. Examination of the aforementioned statistical indices highlighted two models (Table 9; Equations (2) and (3)). It should also be noted that the various criteria combinations always included the values with Chl-a concentrations lower than 500 µg/L in the examined dataset (based on the normal Quantile-Quantile (Q-Q) plot of Chl-a concentrations and outlier analysis) and mean depth greater than 5 m to surely avoid the bottom reflectance noise [50]. The criterion of the time window of ±5 days was applied in cases with statistically significant results to examine and explore any further improvement. The 13th criterion case (Table 8) aimed to establish accurate algorithms considering only the mean depth differentiation of the lakes by including the shallow reservoirs (<15 m), as well as shallow (3-9 m) and some deep (>9 m) natural lakes.
with 125 selected cases, developed based on the criteria Chl-a concentration <500 µg/L, mean depth >5 m, artificial lakes, all seasons, and date difference between sampling and satellite overpass ±5 days.
The optimal Chl-a models developed based on the second scenario incorporated the band ratios ln red/ln SWIR2, blue/green, ln red/ln blue, red/green, and ln red/ln SWIR1, while the coefficient of determination was equal to 0.57 (2B) and 0.6 (2A), and the standard error of the estimate was 0.29 (2B) and 0.4 (2A). Collinearity statistics (tolerance and VIF) of the coefficients were also considered acceptable. In general, the highest beta coefficient values accompanied the band ratios red/green, blue/red, and blue/green, while Durbin-Watson's statistic test indicated an absence of autocorrelation, especially in the residuals of models developed in the second scenario.

Regression Model Validation
Developed regression models were validated for both scenarios, while validation datasets were different for each model. Concerning the first scenario's analysis (1A), the model was applied in Landsat 8 and 7 ETM+ images connected to the remaining 20% of the respective validation dataset (84 out of 565 measurements; first validation) and then for the second validation by using images and all available Chl-a measurements of the year 2018 (N = 136; Table 10). Models 2A and 2B were validated based on 20% of the remaining dataset characterized by the set criteria (2A, N = 21 out of 106 measurements; 2B, N = 31 out of 156 measurements; Table 10) and, subsequently, the in situ Chl-a values of 2018 were used for the second validation process (2A, N = 28; 2B, N = 41). More specifically, the 2A model was validated by using the measurements concerning the natural lakes, while, for the validation of 2B model, measurements of artificial lakes with a sampling/satellite date difference of ±5 days, regardless of the sampling season, were used. Table 10. Statistical indices used to validate the selected algorithms (correlation significant ** at the 0.01 level (two-tailed) and * at the 0.05 level (two-tailed)). RMSE-root-mean-square error. The first scenario yielded a model (1A) that, despite being characterized by a quite low coefficient of determination (0.43), was accompanied by two validation processes with high Spearman (r) values (0.69 and 0.76, significant at the 0.01 level) and quite large validation datasets (N = 84 and 136). Additionally, the differences among the mean in situ and satellite-derived values in both validations were not high (Figure 4a), while mean residual ( Figure 5) and RMSE values were quite low for both validation processes (RMSE first validation: 1.95 µg/L; RMSE second validation: 1.74 µg/L; Table 10). Hence, this specific model could be included as one of the proposed Chl-a quantitative algorithms.
Concerning the models resulting from the second scenario and their validations, 2A and 2B models were those that were mostly proposed. Model 2A was derived from a regression analysis with R 2 equal to 0.6 and Spearman values equal to 0.78 (first validation) and 0.7 (second validation), respectively. Mean in situ and satellite log Chl-a values were almost identical (Figure 4b Summarizing the information derived from the validation process, the most optimal Chl-a assessment model throughout the WFD Greek lakes when no information was available about their characteristics was 1A. When the studied lakes were natural, the Chl-a model became more complex (2A), incorporating three band ratios; when artificial lakes were the case, then model 2B was proposed with the condition that the date difference between field measurements and satellite overpass ranged from −5 to +5 days.

Discussion
WFD application in Greece concerning lake waterbodies yielded a significant Chl-a dataset including years 2013-2016 and 2018. Statistical analyses of Chl-a measurements indicated that natural lakes presented notably higher concentrations in relation to artificial. Those results indicated a degraded water quality of natural lakes compared to artificial with some lakes being characterized as vulnerable to eutrophication. Dense algal blooms, provoking high turbidity, are a frequent indicator of lake eutrophication [51]. Independent studies on natural lakes demonstrated a strong correlation between Chl-a and total phosphorus concentrations [52][53][54]. Canfield [54] predicted total phosphorus concentrations and trophic states (by measuring Chl-a and Secchi depths) in natural and artificial lakes through the development of empirical phosphorus models. Canfield [54] concluded that, while total phosphorus concentrations can be predicted equally well in natural and artificial lakes, prediction of lake trophic state was less reliable in artificial lakes. Relationships of Chl-a concentrations to total phosphorus and Secchi disc transparency were less precise in artificial lakes than natural lakes, while this difference was attributed to non-algal turbidities. Non-algal turbidities were detected in the artificial lakes and recorded as important water clarity determinants for many of this type of lake [55]. Based on this hypothesis, phosphorus concentrations in studied lakes may be examined to ascertain the drivers of decreased water clarity and levels of Chl-a concentrations in both types of Greek lakes.
Harmonization of Landsat 7 ETM+ and 8 OLI images yielded three Chl-a qualitative models including the ratios of blue to green and red, red to green and blue, and the ln-transformed bands SWIR1 and SWIR2. According to Barrett and Frazier [56], ratios between either chlorophyll absorption bands (red and blue) or chlorophyll reflectance bands (green and NIR) with either of the two SWIR bands highlight the part of the spectrum influenced by chlorophyll. Thus, modeled values are better correlated with actual in situ ones.
MLR analysis using the total amount of the available Chl-a dataset resulted as a quite reliable assessment model (R = 0.65). Next, the most significant parameters affecting the variance of Chl-a concentrations in studied lakes were explored, and the outcome implied the lake characteristics (natural/artificial) and typology, followed by the sampling season. The final models were separately developed for natural (R = 0.78) and artificial lakes (R = 0.76), with the latter being accompanied by a date difference between in situ and satellite data ranging ±5 days. Those models were proven to be better-based on statistical indices concerning the validation process-in relation to the one yielded from the total amount of the dataset. This superiority highlights the significance of the information acquisition concerning the studied areas.
Results of this study are in accordance with others similar studies published exploring properties of inland waters using either Landsat or other EO spaceborne sensors. Gholizadeh et al. [3] conducted a detailed review on water quality parameters that are widely estimated using remote sensing techniques. As the authors noted, most Chl-a assessment models use a wavelength near 675 nm and 700 nm. Many researchers developed empirical Chl-a algorithms using various but basically common image bands. Nas et al. [15] used the visible near-infrared (VNIR) and the shortwave infrared (SWIR) of Terra/ASTER for Chl-a mapping, presenting an R 2 value of 0.86. Zhang and Han [16] used the coastal, blue, red, and green Landsat 8 OLI bands to map Chl-a concentrations in Laizhou Bay, while Kim et al. [17] utilized the blue, NIR, and the ratio of blue to red Landsat 8 OLI bands to measure Chl-a in the Fjord of Svalbard, in the Arctic Sea, with an R 2 value of 0.6. Lim and Choi [18] also used Landsat-8/OLI in order to monitor the water quality of Nakdong River in Korea and presented high correlations among Chl-a and OLI bands, especially the green and NIR bands and the band ratio of NIR to green (Pearson's correlation coefficients of −0.7, 0.71, and −0.64, respectively). Bonansea et al. [19] tried generating a different Chl-a model for different Landsat sensors (5 TM, 7 ETM+, and 8 OLI) in the largest artificial reservoir in Córdoba province (Rio Tercero, Argentina). Overall, they observed that each Landsat sensor can be used to estimate Chl-a in the reservoir, while the best model for the TM sensor included a combination of green, red, and NIR bands, as well as the ratio green/red (R 2 = 0.92), and that for the ETM+ sensor (R 2 = 0.91) included the green and SWIR-1 bands and the ratio red/green.
While several satellite sensors can be used for Chl-a determination, mapping Chl-a in Case 2 waters is a complicated task since the optical properties are measured based on a compound of dissolved organic matter, dead organic and inorganic particulate matter, and phytoplankton (Chl-a). Therefore, Chl-a determination is characterized by less accuracy as these constituents are not statistically correlated. Taking this shortcoming into account, we tried to use spectral band ratios which decrease irradiance and atmospheric biases in the sensor signal [20] and more than one band, since the scattering and absorption of Chl-a are then better studied [57]. Furthermore, according to Kloiber et al. [49], all significant band combinations for chlorophyll include at least one of the shortwave infrared bands; thus, SWIR bands were incorporated into this study's analysis in order to produce optimal assessment Chl-a models.
A major difference in relation to aforementioned studies is that the study area used in the present study incorporated 50 different lake systems throughout Greece covering a broad geographic area and a wide range of limnological conditions, while the majority of the respective literature focused on regional scales and discrete inland water bodies. Large spatial scales require greater computational potential; thus, the release of the Google Earth Engine platform dramatically increased the scale at which earth observation research can take place [8]. An example of this extended study area is the research by Lin et al. [58] who combined in situ Chl-a data from 1157 lakes (2007) with Landsat data and developed a well-validated lake national model (RMSE = 34.9 µg/L), by using machine learning algorithms built into the Google Earth Engine. Another example regarding Greece is the study by Kontopoulou et al. [28] who used Landsat 8 images and WFD Chl-a and turbidity datasets concerning 11 lakes for the years 2013-2015. They conducted regression analysis by using Matlab scripts and also examined the effect of the time difference between satellite and field data. In that study, an R 2 of 0.78 (log Chl-a, n = 168) was reported for a time window 0-15 days, while R 2 reached 0.8 (n = 39) for a narrower three-day time difference.
Since Chl-a concentrations in lakes cannot be accurately determined due to aforementioned restrictions, we consider that the proposed empirical models are reliable and should be applicable to most natural and artificial lakes within Greece. One limitation of empirical models is their restriction to confident assessments only within the range and setting of the input data. This restriction limits their application across spatiotemporal domains [8], a risk which was restrained to a large extent since training datasets of this study included the majority of Greek lakes and three sampling seasons.
However, the general applicability and potential limitations of this approach were not thoroughly addressed; hence, further improvement will be explored as soon as the latest WFD datasets are released.
Furthermore, there are clearly some factors that should be taken into consideration in this study, affecting the accuracy of Chl-a quantification in Greek studied lakes. 1.
An uncertainty of accuracy regarding the location of sampling points. Sampling in lakes requires special attention as winds and other external factors (e.g., season, lake depth and changes in water level, ease of proximity) contribute to potential transpositions of sampling sites.

2.
The implementation of the DOS1 atmospheric correction method was not validated in order to assure that atmosphere biases were completely removed. However, this method is widely used by the EO community and proved useful when no atmospheric measurements are available.

3.
Optimal models were applied to lake surfaces accrued from lake shapefiles, acquired from the Environment and Energy Ministry's website. Since no classification between land and water was conducted, there is the possibility that some pixels, covering land, hindered Chl-a quantification with great accuracy.
All in all, it should be noted that EO is recommended to be combined with conventional in situ water sampling in order to achieve great assessment accuracy. Such a synergistic approach, in conjunction with cooperation among government and scientists, contributes to increased data retrieval, with obtained knowledge of the lake water quality, and by extension to better protection and pollution mitigation measures.

Conclusions
In this study, a methodological framework was proposed for quantifying Chl-a pigments using Earth observation (EO) data from Landsat 7 ETM+ and 8 OLI sensors. Its practical use was evaluated in a total of 50 lake water bodies (natural and artificial) from 2013-2018, constituting the National Lake Network Monitoring of Greece in the context of the Water Framework Directive (WFD).
Use of geoinformation technologies, such as EO and GIS, in combination with conventional field surveying and spatial data analysis methods, is the most efficient way forward for monitoring water quality parameters in lakes. Application of WFD in Greece resulted in a significant dataset of various water quality parameters concerning, in this case, 50 lake water bodies with different morphological characteristics and other properties. The integration of spectral information from two Landsat sensors and statistical analyses employing principle component analysis and stepwise MLR analyses yielded statistically significant results. Optimal models were developed in this study, separately for natural and artificial lakes, and they increased the feasibility of Chl-a assessment with high accuracy. The majority of the respective literature focused on discrete inland water bodies, reporting the most accurate and statistically significant models. Monitoring of water quality in large spatial scales, however, as in this study, may result in sustainable water resource management even though the models may be statistically weaker.
Since Chl-a is the major indicator of trophic state, considered as one of the top water pollution indices related to eutrophication, this study supports WFD application concerning the perpetual water quality monitoring of Greek lakes. WFD application throughout Europe aims at the monitoring of hydromorphological, physicochemical, and biological data to assess the ecological status of surface waters. Those data include optically active constituents of water that interact with light (e.g., Chl-a) and can be measured using remote sensing, as well as those that lack optical properties. Some examples are pH, dissolved oxygen, and nutrient concentrations. Monitoring of those properties, characterized by a low signal-to-noise ratio, by using geoinformation technologies-in particular EO and GIS-is a challenging task, and this motivated us to pursue it in the near future, exploiting the WFD monitoring results.