Inland waters such as lakes, reservoirs, and rivers are important water resources; they regulate climate and hydrological flows; support soil formation, nutrient cycling, and pollination; enable food production and water supply; and provide aesthetic conditions, cultural services, and recreation [1]. Therefore, their protection is vital and their water quality must be assured. Based on a water body’s intended purpose, the parameters of water quality must achieve certain standards. Monitoring these parameters allows for the detection of sudden harmful changes and establishes opportunities for implementing preventive and restorative measures to recover healthy conditions.
Conventional methods for monitoring water quality, so-called point sampling methods, determine water quality indicators by collecting samples directly in the field and analyzing them in a laboratory. However, these traditional methods have important constraints. First, in-situ sampling is laborious and requires extensive time to cover large areas, which increases costs. Furthermore, investigating the spatial and temporal trends of water quality parameters in large water bodies is not feasible due to limited sample points, which do not accurately represent the complete status of the water surface [2]. Moreover, the topography can play an important role in restricting access to some areas of water bodies, and errors may still exist in field and laboratory measurements.
Among the conventional parameters, Secchi Disk Depth (SDD) is a common measurement of water transparency, which can be evaluated using the approach developed by Pietro Angelo Secchi [3]. In this method, a black and white disk is lowered into the water column until it is no longer visible; the depth at which it disappears is recorded as the SDD, in meters (m). Additionally, SDD is inversely related to the average amount of organic and inorganic materials along the water column [3] and is a practical indicator of trophic conditions [4]. SDD is employed to study relative nutrient loads and particle contents as well as to visually track the flow of suspended detritus and the displacement of sediment influxes from tributary streams and rivers. In a eutrophication process, excess nutrients cause an overgrowth of algae and other aquatic plants. The decomposition of the remains of these plants depletes oxygen from the water, causing oxygen-dependent life to decline. Fertilizers from fields, human sewage, and animal wastes are the main sources of such nutrient loads. Inland waters with high eutrophication are characterized by poor water quality, which can potentially threaten human health and constrain usage [8]. Turbidity in a water body is caused by suspended chemical and biological particles via scattering and absorption of light. This water quality parameter has implications for both water safety and aesthetics regarding drinking-water supplies [9]. Turbidity is measured with an electronic turbidimeter, which requires water samples, and is reported in nephelometric turbidity units (NTU).
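The link between SDD and trophic condition described above can be illustrated with Carlson's trophic state index, defined for Secchi depth as TSI = 10 (6 − log2 SD). This is only a minimal sketch: the text does not state which index or class limits are used, so the function names and the class boundaries below are assumptions chosen for illustration (the boundaries are commonly quoted approximate values).

```python
import math

def carlson_tsi_from_sdd(sdd_m):
    """Carlson trophic state index from Secchi Disk Depth given in meters."""
    if sdd_m <= 0:
        raise ValueError("SDD must be a positive depth in meters")
    # Each halving of transparency raises the index by 10 units.
    return 10.0 * (6.0 - math.log2(sdd_m))

def trophic_class(tsi):
    """Map a TSI value to a class label.

    The limits are assumed, commonly quoted values, not taken from the source.
    """
    if tsi < 40:
        return "oligotrophic"
    if tsi < 50:
        return "mesotrophic"
    if tsi < 70:
        return "eutrophic"
    return "hypereutrophic"

# Illustrative depths only (meters); these are not measurements from the study.
for depth in (5.0, 3.0, 1.0, 0.25):
    tsi = carlson_tsi_from_sdd(depth)
    print(f"SDD = {depth} m, TSI = {tsi:.1f}, class = {trophic_class(tsi)}")
```

Because the index is logarithmic, small absolute changes in SDD matter most in already turbid water, which is one reason SDD is a sensitive indicator in eutrophic reservoirs.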
Through Remote Sensing (RS), it is possible to acquire information from the Earth’s surface. This can be achieved over different scales, regions, and periods of time. Information concerning inland waters can be utilized to retrieve physical and biochemical parameters of the water using the spectral reflectance measured by RS sensors in several bands of the electromagnetic spectrum. This procedure has helped to develop water quality monitoring with RS in recent decades. The successful history of water quality monitoring applications has been detailed in studies by Dekker [10], Cheng [6], Odermatt [11], Matthews [12] and Hansen [13]. During its period of operation (2002–2012) and beyond, the Medium Resolution Imaging Spectrometer (MERIS) provided by the European Space Agency (ESA) has been successfully employed to monitor inland waters [11], and its archives are considered a rich source of data for water research [16]. MERIS has outstanding advantages for monitoring water quality, including a full spatial resolution of 260 × 290 m, 15 visible (VIS) and near-infrared (NIR) bands, as well as an extensive web-enabled image archive (2002–2012) [17]. MERIS also enables temporal analysis applications with its three-day temporal resolution.
The current ESA satellites Sentinel-2 and Sentinel-3 have been in operation since 2015, so research on earlier events must rely on archived imagery from prior sensors. The use of MERIS for inland water quality analysis increased during the years before the launch of the Sentinels (2010–2015) and decreased after 2015. During those years, MERIS was an important source of data for inland water quality research considering the quantity of available sensors (Figure 1).
Currently, archived data from MERIS contain valuable information in many fields that has yet to be processed, as is the case for applications in inland water quality. This situation calls for further research to increase the limited number of studies analyzing the complete MERIS imagery, which would lead to a broader scope and better representation of study cases. Despite the advantages of monitoring water quality using RS techniques, these methodologies are not yet broadly applied by water resources and policy managers [18]. Further research using RS with field measurements is therefore necessary to demonstrate its benefits and potential in protecting water resources and promptly detecting potential hazards.
To estimate water quality parameters, various approaches have been developed based on the relationship between RS reflectance and optical characteristics of water constituents. In general, these methods can be broadly divided into empirical, semi-analytical and machine learning methods [16], with further sub-classifications among them. Empirical methods employ bands and band ratios as predictors to establish statistical relationships with in-situ measurements. Frequently, several combinations of input values are evaluated by comparing error metrics in search of the best fit. The result is a regression algorithm that can be applied to the images of the study area and dates of interest to estimate spatial and temporal variations in water quality parameters. This approach is, to some extent, easily applicable when there are enough in-situ and RS data; however, its application is limited to the studied water body and cannot be generalized to other regions due to the variations of atmosphere and water composition [21]. If an empirical method selects bands or band ratios based on knowledge of the physical characteristics of water components that may affect specific wavelengths, then it is classified as a semi-empirical method. On the other hand, analytical approaches use knowledge of the physics of light. They define the specific and necessary parameters of a model on the basis of the optical properties of the water and atmosphere, also known as inherent optical properties (IOPs). The modelling process produces theoretical absorption and backscattering values which can be separated to estimate optically active water quality constituents using an inverse equation [22]. Semi-analytical approaches additionally use in-situ measurements to define the parameters of the inverse equation and to reduce the difficulty of modelling complex waters. These models can derive several water quality parameters simultaneously [23] and can be applied to regions other than the original study area. However, their use requires various large spectral datasets for training and computing, as well as considerable fieldwork in the regional context to develop robust algorithms [16].
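The empirical workflow described above (fit a regression for each candidate band or band ratio, then keep the one with the lowest error) can be sketched as follows. This is an illustrative sketch only: the band names, the reflectance values, and the SDD values are made up, the helper names are hypothetical, and a simple one-predictor least-squares line stands in for whatever regression form a given study uses.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Raises ZeroDivisionError if all x values are identical.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(observed, predicted):
    """Root-mean-square error between two equally long sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def best_predictor(candidates, measured):
    """Fit one line per candidate and return (name, rmse, slope, intercept) of the best."""
    results = {}
    for name, values in candidates.items():
        slope, intercept = fit_line(values, measured)
        fitted = [intercept + slope * v for v in values]
        results[name] = (rmse(measured, fitted), slope, intercept)
    best = min(results, key=lambda name: results[name][0])
    return (best,) + results[best]

# Made-up reflectances for two hypothetical bands at four stations.
band_a = [0.010, 0.020, 0.030, 0.040]
band_b = [0.020, 0.025, 0.020, 0.016]
sdd_m = [2.6, 2.36, 1.8, 1.0]  # illustrative SDD values, not study data

candidates = {
    "band_a": band_a,
    "band_b": band_b,
    "band_a/band_b": [a / b for a, b in zip(band_a, band_b)],
}
name, error, slope, intercept = best_predictor(candidates, sdd_m)
print(name, round(error, 4), round(slope, 3), round(intercept, 3))
```

Selecting the predictor on the same points used for fitting favors over-fitted candidates, so in practice the comparison would be made on held-out or cross-validated errors.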
Machine learning (ML) techniques were introduced in the RS field to handle the complex association between RS data and water constituents that parametric regression models, such as least-squares or multiple regression, struggle to capture [27]. A standard regression procedure is linear regression (LR), a statistical method for modelling the relationship between two continuous numerical variables. It can be classified as an empirical approach in the water quality modelling field or as a basic ML algorithm by data analysts. In this paper, we treat LR as an ML approach for further comparison. Another widely applied algorithm is Support Vector Regression (SVR) [29], a supervised learning method trained with labeled data. Like the support vector machines (SVM) used for classification, the SVR algorithm includes the C hyperparameter and the kernel trick. It is useful with a limited number of samples because of its good generalization ability. Also common, the random forest is an adaptable procedure useful for classification and regression (RFR). It fits decision trees on subsets of the data and averages their predictions to enhance predictive capacity, control over-fitting, and handle large datasets. RFR has been applied in several RS applications including water resources [32]. A more recent method for the estimation of biophysical parameters, Gaussian Processes Regression (GPR) [34], provides a Bayesian approach to learning regression problems using kernels [34]. It has lately been applied to the retrieval of water quality parameters from remotely sensed data with high performance in its estimations [36]. When spectral field measurements are lacking, the modelling process in ML algorithms can be implemented with less data and different assumptions for the training stage in comparison with radiative transfer models [39]. For water quality studies, ML approaches that analyze the MERIS imagery of lakes and reservoirs completely and intensively are scarce, owing to the recent development of these algorithms and the earlier operating timeframe of MERIS. Thus, studies using novel algorithms could take greater advantage of the legacy of this sensor and increase the usage of such a rich source of data.
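To make the kernel-based Bayesian idea behind GPR concrete, the following sketch computes the posterior mean of a Gaussian process with a squared-exponential (RBF) kernel for a single predictor. It is a simplified illustration under stated assumptions: the hyperparameters are fixed by hand rather than optimized, the prior mean is the sample mean of the targets, the input and target values are made up, and all function names are hypothetical. It does not reproduce the configuration used in the study.

```python
import math

def rbf(a, b, length_scale):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                low[i][j] = (matrix[i][j] - s) / low[j][j]
    return low

def solve_with_cholesky(low, rhs):
    """Solve (low @ low^T) x = rhs by forward and backward substitution."""
    n = len(rhs)
    z = [0.0] * n
    for i in range(n):
        z[i] = (rhs[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

def gpr_fit(xs, ys, noise=1e-2, length_scale=0.5):
    """Precompute the weights alpha = (K + noise * I)^-1 (y - mean)."""
    mean_y = sum(ys) / len(ys)
    gram = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
             for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    alpha = solve_with_cholesky(cholesky(gram), [y - mean_y for y in ys])
    return list(xs), alpha, mean_y, length_scale

def gpr_predict(model, x_new):
    """Posterior mean at x_new: mean + k(x_new, X) @ alpha."""
    xs, alpha, mean_y, length_scale = model
    return mean_y + sum(rbf(x_new, xi, length_scale) * a for xi, a in zip(xs, alpha))

# Made-up band-ratio inputs and Turbidity targets (NTU), for illustration only.
ratios = [0.5, 1.0, 1.5, 2.0, 2.5]
turbidity = [2.0, 3.5, 6.0, 9.0, 11.5]
model = gpr_fit(ratios, turbidity)
print(round(gpr_predict(model, 1.25), 2))
```

Far from the training inputs the kernel weights vanish and the prediction falls back to the prior mean, which is a reminder that such models interpolate well but extrapolate cautiously.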
The Valle de Bravo reservoir in central Mexico is a multipurpose waterbody that provides drinking water to the metropolitan area of Mexico City. It is also the most important reservoir in the country for recreational activities such as tourism, fishing, and sailing [40]. Most previous research in Valle de Bravo is limited by its reliance on conventional measuring methods. These constraints affect the temporal and spatial scale of the studies, owing to scarce measuring stations or the impossibility of continuous sampling campaigns given their time and cost demands. In the last two decades, studies by Olvera-Viascan [41], Ramirez-Garcia [43], Nandini [44] and Figueroa-Sanchez [45] analyzed the reservoir and expressed concern about its trophic state. Some authors ultimately offered strategies for improving the reservoir’s water quality and reducing the presence of toxic cyanobacteria, identifying scarce wastewater management, agricultural runoff, and factors in the surrounding ecosystems as the main contributors to the degradation of water quality. In Mexico, there is a national water quality monitoring program under the “Sistema Nacional de Información del Agua” (SINA) with around 5000 measurement stations distributed across the inland waters of the country, five of which are fixed stations located in Valle de Bravo. However, these five stations and the measured water quality parameters are likely insufficient for accurately representing the spatial and temporal scale of harmful events in the water, especially in cases of eutrophication or harmful algae. Moreover, the measured parameters are limited to those used to control wastewater pollution, such as biochemical oxygen demand (BOD), chemical oxygen demand (COD), total suspended solids (TSS) and fecal coliforms.
The major installation of monitoring stations began in 2012, which indicates that there are no comprehensive water quality data for the reservoir prior to this time. As a result of the limited monitoring capacity in the reservoir, there is an increasing demand for continuous monitoring of water quality parameters in the region, especially for such important reservoirs, which supply drinking water to large urban areas where millions of people reside. Furthermore, a lack of knowledge of the water quality conditions may persist for the years prior to the establishment of monitoring programs. Similar limitations are likely present in transition and developing economies, either because they lack extensive survey networks or because these networks were implemented recently and therefore no earlier data can be acquired. Standard procedures that help to overcome these limitations are needed, and they are of particular benefit for such regions to improve their water quality monitoring capacity. One way to overcome these restrictions is to use available resources in combination with current analysis techniques. This helps to clarify the variations of inland water quality in recent years, together with the role of natural and anthropogenic hazards in the deterioration of water quality. In this way, overall conclusions about water quality can be reached even in the absence of extensive field or surface spectral measurements.
Concerns about the quality and quantity of the water supply for the urban region of Mexico City were raised during the previous decade [46], and supply restrictions are still commonly applied today. As Valle de Bravo is the most important drinking water source for the region, its protection and continuous monitoring are an essential duty. Understanding the disruptive events that occurred in previous years may lead to a clearer comprehension of the current situation and help to avoid a recurrence of earlier threats. To contribute to such needs in the region, this paper analyzes the variations of water quality parameters in the Valle de Bravo reservoir over a period of 11 years, prior to the launch of the current sensors used for water quality monitoring. Water quality measurements from sampling campaigns conducted in 2010 and RS data from matchup MERIS imagery are used as input for ML algorithms. From the analysis, the best model is selected and applied to the complete MERIS data archives (2002–2012) to examine the spatial and temporal variations of water quality. This could contribute to future research on the water quality of lakes and reservoirs where limited monitoring is implemented but the resources to extend its investigation exist. The main objectives of this research are, first, the development and evaluation of a methodology based on ML approaches using MERIS spectral data and water quality data physically measured in Valle de Bravo and, second, the analysis of the spatial and temporal dynamics of the water quality in the reservoir during the entire MERIS operation timeframe (11 years), which also complements the scarce number of studies taking advantage of the complete MERIS imagery. In addition, as ML techniques are commonly based on different assumptions, a further and continuous evaluation of their predictive capacity is necessary to determine which approach may be better suited to evaluate and map water quality in the region using MERIS data.
Finally, this study also contributes to increasing the use of ML techniques, which have only recently been implemented, in the analysis of water quality parameters in lakes and reservoirs. The results of this work will complement the existing literature on water quality evaluation in the reservoir. To the best of our knowledge, no comprehensive integration of in-situ water quality measurements and RS techniques has yet been implemented to monitor water quality in this region over such a long period or using ML approaches. This study aims to fill this research gap for the intended water quality parameters. The findings of this work are expected to provide guidance to policy makers on incorporating satellite RS into national in-situ water quality control programs.
Utilizing the remote sensing reflectance from MERIS data and in-situ collected samples, this study developed and validated machine learning algorithms to estimate the water quality parameters of Secchi Disk Depth (SDD) and Turbidity for Valle de Bravo reservoir in central Mexico from 2002 to 2012. Using dataset 1 (DS1), the models performed well for the estimation of both water quality parameters, with satisfactory cross-validation results and with GPR slightly outperforming the others (SDD: R² = 0.81; RMSE = 0.15, Turbidity: R² = 0.86; RMSE = 0.95), followed by LR, SVM and RFR. This reinforces the value of continued analysis of archived MERIS imagery. The results obtained confirm that ML algorithms are useful approaches for retrieving water quality parameters from RS data.
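For readers unfamiliar with how cross-validated R² and RMSE figures of this kind are obtained, the sketch below runs a leave-one-out cross-validation for a one-predictor linear model. Several points are assumptions made for illustration: the text does not specify the cross-validation scheme, R² is computed here as the coefficient of determination (one of several conventions), the data are made up, and the function names are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def leave_one_out_predictions(xs, ys):
    """Predict each sample with a model trained on all the other samples."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predictions.append(intercept + slope * xs[i])
    return predictions

def rmse(observed, predicted):
    """Root-mean-square error, in the units of the observed variable."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Made-up predictor values and SDD targets (m), for illustration only.
predictor = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
sdd_m = [2.6, 2.1, 1.7, 1.2, 0.9, 0.3]
held_out = leave_one_out_predictions(predictor, sdd_m)
print(round(r_squared(sdd_m, held_out), 3), round(rmse(sdd_m, held_out), 3))
```

Because every prediction comes from a model that never saw the corresponding sample, the resulting scores are usually lower, and more honest, than scores computed on the training data.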
The temporal analysis suggests that the droughts of 2006 and 2009 were detrimental to the water quality of the reservoir. The seasonal fluctuations showed unusual behavior during 2006–2009, which contributed to lower values in 2010. The water transparency measured with SDD showed low values (≈1 m) during these periods. The Turbidity estimations confirmed this behavior with high values (≈12 NTU) during the same years. The suggested classification indicated an evolution from an initial trophic stage in 2002–2005 to an intermittent hypertrophic one during 2006–2008 and 2010, before a slight recovery to trophic status during 2011–2012. The water patterns also suggest that periods with low SDD and high Turbidity coincide with the rainy months (June–October); thus, runoff from surrounding areas could have influenced transparency owing to the loads of suspended materials and dissolved solids. Conversely, high SDD and low Turbidity were observed in the dry seasons.
The methodology applied in this study yielded results that were consistent with independent evaluations, supporting the idea that RS techniques are powerful tools for overcoming limited resources when planning water quality monitoring programs, even across long time periods. The synchrony of field measurements and sensor acquisitions is of major importance. Ideally, same-day in-situ data are preferred for the validation of satellite products. Regarding this study, it is necessary to consider that greater uncertainties in the results may be present due to the difference of ±2 days between field and satellite data collection. To avoid this, it is recommended to conduct continuous field measurements and use sensors with sufficient temporal resolution.
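The ±2 day matchup criterion mentioned above can be expressed as a simple date filter. The sketch below pairs each field sampling date with the nearest image acquisition inside the window; the dates are hypothetical, the function name is illustrative, and the default window is the only value taken from the text.

```python
from datetime import date

def find_matchups(sample_dates, image_dates, max_offset_days=2):
    """Pair each sampling date with the nearest image within the time window.

    Returns a list of (sample_date, image_date, offset_days) tuples, where
    offset_days is positive when the image was acquired after the sampling.
    Sampling dates without any image inside the window are dropped.
    """
    matchups = []
    for sampled in sample_dates:
        in_window = [img for img in image_dates
                     if abs((img - sampled).days) <= max_offset_days]
        if in_window:
            nearest = min(in_window, key=lambda img: abs((img - sampled).days))
            matchups.append((sampled, nearest, (nearest - sampled).days))
    return matchups

# Hypothetical dates, for illustration only.
samples = [date(2010, 3, 10), date(2010, 6, 15)]
images = [date(2010, 3, 12), date(2010, 3, 20), date(2010, 6, 20)]
pairs = find_matchups(samples, images)
for sampled, image, offset in pairs:
    print(sampled.isoformat(), image.isoformat(), offset)
```

Keeping the signed offset alongside each pair makes it possible to check afterwards whether larger time differences are associated with larger retrieval errors.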
Local water quality monitoring systems are present in different countries of the world to periodically analyze the state of inland waters. Such systems have great potential for integration with RS techniques. This combination could allow extensive spatial and temporal analysis on a greater scale. Scheduled field campaigns paired with the image acquisition dates of the respective satellites could be useful for data calibration, training, and validation. The continuous measurements of water quality parameters could serve as a constant source of field data. RS resources such as the MERIS archives offer valuable data and an important opportunity to contribute to the understanding of how diverse events influence inland waters. The study of data acquired from sensors such as MERIS is essential for understanding the water quality of lakes and reservoirs in the last two decades. The current operational satellites, particularly Sentinel-2 and Sentinel-3, are the natural successors of ENVISAT and its MERIS sensor; however, any extensive analysis of an inland water body covering the first 20 years of the 2000s would require MERIS data for broader monitoring. For the case of Valle de Bravo, this study could serve as a basis for further monitoring using Sentinel data to investigate the reservoir’s evolution during the remaining 8 years of that period. The full exploration of the usefulness and performance of MERIS in monitoring inland water quality would benefit the development and improved use of successor satellite missions and sensors, i.e., Sentinel-3 with its OLCI instrument, which continues the capability of the MERIS instrument. The validated good performance of the water quality parameters estimated from MERIS data in this study provides confidence in combining MERIS and successor satellite missions to achieve longer-term monitoring of inland water quality.
ML regression models are useful methods for retrieving water quality parameters for the first decade of the century using MERIS imagery, particularly in inland waters of special importance for human health, as seen in the encouraging accuracies obtained. Future work will focus on (i) gathering datasets from the national in-situ water quality monitoring system, (ii) processing spectral datasets from current sensors such as Landsat 8 OLI or the Sentinel satellites, (iii) extending the analysis to inland waters of the region where most of the water quality remains uncertain over long time scales, (iv) assessing new approaches such as variations of linear regression (ridge linear regression, radius neighbor regression, elastic net regression) or trees (gradient boost regression trees) and (v) estimating other important water quality parameters such as Chl, CDOM, TSS or nutrients. All of the above aim to contribute to the knowledge of the water quality status and trophic state of inland waters in regions that have not been previously studied using remote sensing techniques.