1. Introduction
Palic is a town located in the city of Subotica, Serbia. Its location and environment, including the vast vegetation and Lake Palic, make it an important tourist attraction. Near Palic is another village, Hajdukovo, where we can find another critical wetland complex, the Nature Reserve Lake Ludas. These two water bodies are considered very important, both as natural habitats and as tourist attractions. On this basis, they are included in systematic monitoring systems that provide information to the population regarding water quality.
The degradation of these aquatic systems is manifested through eutrophication, resulting in changes to water quality, increased nutrient content, reduced oxygen levels, and other physical, chemical, and biological alterations to the water. Lake Palic is annually affected by an algal bloom, resulting in unfavorable recreational conditions for tourists and the general population. The causes of these algal blooms are found in the conditions surrounding Lake Palic, such as the inflow of treated wastewater, the lake being surrounded by agricultural land subject to pesticides, illegal wastewater discharge into the lake, and the effects of environmental conditions (high temperatures, lack of mixing, etc.). Developing a functional yet sustainable monitoring and management system within these water bodies depends on the ability to source sufficient data to make informed decisions. The current monitoring practice relies on field measurements, which require a certain number of qualified personnel to carry out sampling, preserve the samples, transport them to the laboratory, and analyze them, resulting in a single piece of point data. The obtained results provide information about the evaluated parameter only at the sampling location at the given time. Considering the complexity of this process, it is easy to understand that this approach to monitoring is not sustainable if we wish to increase the temporal or spatial measurement frequency. The first parameter considered for this sustainable monitoring approach was Chlorophyll-a, as it is regarded as the primary indicator of the trophic status of the environment [
1].
The latest technological development enables us to use remote sensing as a tool for acquiring some of these data without being physically present at the location. Utilizing satellite images, we can correlate pixel information with changes in Chlorophyll-a concentration. Orbital sensors can identify the interaction of solar radiation with water and its various constituents [
1]. Careful evaluation of the spectral signatures of the materials present in the given aquatic environment, compared with the spectral signature of water, and consideration of how they interact can provide information on the trophic state, the water quality, and the presence, and even the amount of certain constituents in the considered water body.
Water has low solar irradiance at wavelengths shorter than approximately 400 nm and increasing light absorption for wavelengths longer than 850 nm. Consequently, the analysis of water quality by remote sensing is typically conducted within this interval (400 to 850 nm) [
1]. The signal that is detected by remote sensing originates from the top layer of the water column. According to Dekker [
2], the spectral reflectance of turbid coastal and almost all inland water is a function of absorption by algal pigments, detritus, and humus at short wavelengths, and pure water at long wavelengths. Scattering is caused by water molecules at short wavelengths, backscattering from particles, and absorption by humus and tripton (non-algal particulate matter). Absorption occurs at longer wavelengths of algal pigments. By investigating the spectral signature of Chlorophyll-a, one can notice strong absorbance of light in the areas near 650 nm and 680 nm, and reflectance peaks at around 560 nm and 700 nm [3,4,5]. Although these intervals will vary depending on Chlorophyll-a concentrations and the presence of other optically active parameters [4], notable areas of peak values will remain evident. However, this interrelation of environmental factors influencing the optical characteristics of waters indicates the necessity of incorporating a thorough analysis of site-specific traits when establishing models for estimating Chlorophyll-a concentrations.
Other applications of remote sensing in the monitoring and management of aquatic systems include ocean color monitoring [
3]. This information is essential for estimating rates of change of organic carbon pools associated with detrital and living materials, as well as colored dissolved organic matter [
2]. Further applications include monitoring sediment erosion, deposition, general water quality, and flow discharge by studying suspended and dissolved river material. This information can be extracted by retrieving the absorption and backscattering characteristics of the water, as well as the ocean surface reflectance. Remote sensing also allows differentiation of deep (where the bottom does not influence the reflectance values) and shallow waters. Remote sensing is applicable for climate change monitoring by identifying deep ocean heat content and predicting future warming patterns. Since satellites cannot directly measure temperatures in deep waters, this is achieved indirectly by measuring the sea surface height, which is then correlated with the deep-water temperatures. This is because it is known that warming below the surface causes thermal expansion.
Another simple application is monitoring the turbidity of water bodies. The process is very similar to the one employed for Chlorophyll-a estimation. The reasoning behind our choice of Chlorophyll-a in our study is that we had more measurements of Chlorophyll-a compared to turbidity measurements. Temporal analysis of satellite images provides information about changes in the aquatic environment over the years, which can reveal information about its expansion or contraction. Similarly, we can identify changes in river dynamics, such as the shifting of river shores and deltas, by watching for changes in their position.
When selecting a model for Chlorophyll-a identification, the goal is to choose wavelengths that have the lowest influence on both absorption and backscattering, thereby allowing for maximal sensitivity to the contribution from Chlorophyll-a. One such model considered by Wesley [
3] is the three-band NIR model, in which the red region is selected because it shows maximal absorption by Chlorophyll-a and no significant absorption by other constituents present.
The application of various color ratios for estimating chlorophyll in coastal surface waters was investigated by the authors Yen-Hoong Gin et al. [
6]. In general, these models rely on known behavioral characteristics of the matter present in the studied water body. Hence, by understanding the spectral characteristics of present optically active parameters, we can establish a correlation that will allow us to emphasize these characteristics for the parameter of interest, simultaneously reducing the impact of other parameters [
7]. Depending on the classification of the considered water body as Case I or Case II waters, different parameters will influence the optical characteristics of the water [
8]. In Case I waters, phytoplankton dominate, and in these water bodies, establishing a correlation between Chlorophyll-a concentration and reflectance values proved to be relatively straightforward. On the other hand, Case II waters, which have a significant content of total suspended matter (TSM) and colored dissolved organic matter (CDOM), in addition to the presence of Chlorophyll-a, require more refined correlations between reflectance values and optically active parameters [
8]. The popularity of algorithms based on reflectance values in the red and NIR regions stems from the fact that non-algal particles and dissolved organic matter exhibit notably lower absorption compared to the green and blue regions.
The Sentinel-2 mission is frequently selected for research purposes in this area because of its short revisit time [
7,
9]. A study conducted by Barraza-Moraga et al. [
10] considers the application of Sentinel-2 satellites for estimating Chlorophyll-a concentrations. They included both L1C and L2A level products in their analysis. Viso-Vázquez et al. also used Sentinel-2, specifically L1C-level data, for the remote detection of Cyanobacterial blooms and Chlorophyll-a [
11]. They complemented their investigation by calculating various environmental indices and defining their correlation with the in situ data. Bramich et al. [
12] also considered the application of Sentinel-2 data for Chlorophyll-a detection using an improved version of a semi-analytical algorithm. The model was evaluated by comparison with field data using a wide range of Chlorophyll-a concentrations. Considering the promising results reported by various authors, we also opted for L1C-level data from the Sentinel-2 mission.
The method for translating satellite information into Chlorophyll-a concentrations was determined after considering various research papers [1,8,11,13,14,15,16,17]. For example, a study published by Gitelson and Merzlyak [
13] focused on identifying spectral indices that would increase sensitivity to chlorophyll concentrations. They found maximum sensitivity to Chlorophyll-a at 550–560 nm and 700–710 nm, showing a good correlation for a wide range of Chlorophyll-a concentrations. In their work [
13], the authors demonstrated that the inverse reflectance at 550 nm and 700 nm is proportional to Chlorophyll-a concentrations. Indices computed using these reflectance values, such as R750/R550 and R750/R700, also exhibit direct proportionality to Chlorophyll-a concentrations. The authors tested these indices for a broader range of concentration values using several datasets and obtained promising results. Another extensive review of environmental indices is presented by Xue and Su [
13]. Additional possibilities for remote sensing in environmental monitoring are considered in [
15] by Batina and Krtalić. The latter manuscript considers the potential of remote sensing in estimating concentrations of parameters without defined optical properties, such as total nitrogen, total phosphorus, or dissolved oxygen. The authors argue that these parameters are often examined by relying on statistical correlations with optically active parameters, such as turbidity, transparency, water temperature, salinity, and electrical conductivity. There are numerous studies aimed at developing correlations between these two groups of parameters, as defining these could simplify regular water quality monitoring [
16]. A study presented by Aranha et al. in [
1] used Sentinel-2 data to estimate Chlorophyll-a concentrations. The authors employed L1C satellite images to monitor reservoirs located in Northeast Brazil. The model they used was a three-band reflectance model, in which the first reflectance had maximal absorption of Chlorophyll-a, and the second reflectance was invariant to Chlorophyll-a concentrations but influenced by other water constituents. Namely, if the first two reflectance values are influenced by tripton and CDOM, then the absorption by these two optically active parameters can be removed. The backscattering effect of tripton particles was an additional issue, and it was addressed by selecting the third spectral band, which was not influenced by the water constituents. Instead, it was predominantly affected by the absorption of pure water and the backscattering of tripton.
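A three-band model of the kind described above is commonly written as (1/R1 − 1/R2) × R3. The sketch below is a minimal illustration of that form only. The function name and the sample reflectance values are hypothetical, and the assignment of the three reflectances to specific Sentinel-2 bands is deliberately left open because it is not stated here.

```python
def three_band_index(r1: float, r2: float, r3: float) -> float:
    """Three-band index (1/r1 - 1/r2) * r3.

    r1: reflectance in the band where Chlorophyll-a absorption is maximal
    r2: reflectance in a band insensitive to Chlorophyll-a but still
        affected by tripton and CDOM absorption
    r3: reflectance in a band dominated by pure-water absorption and
        tripton backscattering
    """
    if r1 <= 0 or r2 <= 0:
        raise ValueError("reflectances r1 and r2 must be positive")
    # The difference of reciprocals cancels the shared tripton/CDOM absorption;
    # multiplying by r3 compensates for particle backscattering.
    return (1.0 / r1 - 1.0 / r2) * r3


# Hypothetical reflectance values, for illustration only.
index = three_band_index(r1=0.02, r2=0.04, r3=0.03)
print(f"three-band index = {index:.3f}")
```

The index is then related to measured Chlorophyll-a concentrations by regression, so its absolute value has no meaning until it is calibrated for a given water body.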
The aim of this study was to establish a sustainable monitoring system for Lakes Palic and Ludas, capable of providing systematic and reliable information regarding water quality, specifically Chlorophyll-a concentrations. To this end, we employed remote sensing to establish a correlation between satellite images and Chlorophyll-a concentrations at these locations, which could later be used to estimate Chlorophyll-a concentrations from satellite data. Building on the research and experience of other authors summarized above, we used the interrelation between spectral indices and Chlorophyll-a concentrations to derive equations for reliable Chlorophyll-a prediction models. The intended monitoring system relies on readily available satellite data and occasional field measurements that verify the values extracted from the images.
3. Results and Analysis
3.1. Evaluation of the Environmental Indices
The correlation coefficients were determined for each lake segment separately, and they are given in Table 2. The first row shows the number of images, labeled N, used to derive the Chlorophyll-a model equation for the given location. The amount of data used at sites N1, N4izl, and Ls differs notably from that at all the other locations. Other differences in the number of data points were caused by cloudy images that could not be used for the study and by missing measurements.
The intensity of the color in the table corresponds to the intensity of the correlation for the specific parameter. The color red is used to represent negative correlations, and blue represents positive correlations. There is a noticeable pull toward stronger coloring on the right side of the table, indicating a better correlation in these areas. The following section discusses the results for each area separately.
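The per-location screening of indices can be sketched as follows. This is a minimal illustration, assuming that the index with the largest absolute Pearson coefficient is the one carried forward. The index names and all numbers are hypothetical placeholders, not values from Table 2.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical measurements and index values at one sampling location.
chl_a = [10.0, 20.0, 30.0, 40.0]
indices = {
    "index_A": [1.0, 2.0, 3.0, 4.0],
    "index_B": [4.0, 3.0, 2.5, 1.0],
}

correlations = {name: pearson_r(values, chl_a) for name, values in indices.items()}
# Rank by absolute value so that strong negative correlations are not discarded.
best_index = max(correlations, key=lambda name: abs(correlations[name]))
print(best_index, round(correlations[best_index], 3))
```

Ranking by absolute value matters here because a strongly negative coefficient is as useful for regression as a strongly positive one.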
N1 is the first part of Lake Palic, which receives its water supply from the treated water discharged by the WWTPS. The correlation in N1 is very low for all the indices, less than 0.2. A likely reason is that this water body, with most of its physical, chemical, and biological characteristics, is artificially created by the water expelled from the WWTPS. The estimated area of N1 is roughly 41,108.5 m2, while the average daily discharge from the WWTPS at that time amounted to approximately 28,000 m3/d. Taking the average depth as 2 m, the stored volume is about 82,000 m3, so the water in the last lagoon of the first sector is completely exchanged within roughly 3 days. This can explain why it was not possible to develop a correlation in this area: there is insufficient time for natural processes to establish a relationship between the Chlorophyll-a concentration and its physical manifestation. In the absence of such a relationship, we did not develop an equation for estimating Chlorophyll-a concentrations in this sector, as it would not have yielded reasonable results.
Furthermore, treated wastewater typically contains various constituents (e.g., nutrients such as phosphate and nitrate, suspended particles, colored dissolved organic matter (CDOM), and chemicals) that influence the absorption and scattering of light. Consequently, the retrieved spectral signature will be influenced by these factors. Specifically, treated wastewater will elevate the nutrient content in the lake and can contribute to algal blooms and unrealistically high Chlorophyll-a concentrations. Increased CDOM, also introduced to the lake via the expelled wastewater, can lead to absorption in the 350–500 nm range, mimicking the presence of Chlorophyll-a. Additional issues can be caused by suspended solids introduced into the water, which increase reflectance in the red-NIR range, thereby disabling proper model establishment. A further problem arises from the fact that treated wastewater is continuously introduced to Lake Palic at relatively high rates, resulting in these changes occurring quickly. They cannot be modeled by a model relying on natural processes occurring at their own pace.
As previously discussed, the residence time in the first two sectors is relatively short, and this affects the water quality, causing rapid changes. If the estimated residence time is 3 days, then even a five-day satellite revisit time may be too long to detect important changes. Additionally, rapid flushing can prevent the accumulation of nutrients and other constituents, leading to a misinterpretation of the actual situation.
The next part of Lake Palic is the second sector, labeled N2. The results here show a similar tendency to the previous sector. A brief water balance check of the second sector was also conducted. The approximate area of this sector is 50,878 m2. Taking the average depth as 2 m and the average discharge from the WWTPS as 28,000 m3/d, the water in this sector is replaced approximately every 3.6 days. Once again, it is reasonable to assume that this is one reason behind the non-existent correlation. Another cause is that the expelled wastewater contains various constituents that impact the spectral signature of the considered water body. Because no correlation could be found between measurements and environmental indices, there was no rationale for designing Chlorophyll-a estimation equations for this sector. One fact should be noted: as we move away from the discharge point of the WWTPS, our water balance assumptions become less reliable due to the increase in external factors that may influence the water balance (e.g., evaporation, groundwater, external unregistered inflow, etc.).
The next sector is N3, the outflow from the third sector of Lake Palic. Although it still displays a limited correlation between measured Chlorophyll-a concentrations and satellite images, the results are notably improved. In this sector, we already see correlation coefficients of over 0.3 and 0.4, which can be taken as indicators of the water body reaching a more natural state and the diminishing influence of the WWTPS. The area of this sector can be assumed to be 813,246 m2, with an average depth of 2 m. The discharge from the WWTPS is 28,000 m3/d, suggesting that the water is replaced every 58 days. This time may be sufficient to allow the water body to develop a natural balance within its ecosystem, thus resulting in natural relationships between Chlorophyll-a concentration and physical appearance. Although the retention time in this sector is considerably longer, the best R2-adjusted value attained with simple linear regression was only 0.119 for NPCI; thus, we saw no point in further consideration of this sector.
The fourth sector of Lake Palic, N4, has two sampling locations: one in the middle of the lake and one at the outflow end (Figure 1). These two locations were considered as two independent sites, as there are times when they display significant differences in various water quality parameters. The correlation coefficients for the point in the middle of the lake (N4sr) are quite high, the highest being −0.807 for the simple ratio index 740/705. Given this strong correlation with the measured Chlorophyll-a data, this index was selected for the equation model design.
N4izl is the outflow from Lake Palic. The overall correlation between the environmental indices and measurements is more diverse, but there are more indices with higher correlation coefficients than in the previous case. The highest correlation magnitude is 0.825, achieved with MI2. Consequently, at N4izl, the developed equations rely on this index. The approximate surface area of the entire fourth sector is 3,596,946 m2, with an average depth of 2 m; the retention time in this sector is approximately 257 days, which can be considered sufficient to establish a natural lake environment. The best correlation was found with GI; therefore, this index was selected for the creation of the simple linear regression.
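The residence-time estimates quoted for the four sectors of Lake Palic follow from dividing the stored volume (area × mean depth) by the daily discharge. The sketch below reproduces that arithmetic under the same simplifying assumptions used in the text: a uniform depth of 2 m, the treated discharge as the only inflow, and complete mixing. The function and variable names are illustrative.

```python
DISCHARGE_M3_PER_DAY = 28_000.0   # average treated-water discharge
MEAN_DEPTH_M = 2.0                # assumed uniform depth for every sector

# Approximate sector surface areas in square metres, as quoted in the text.
SECTOR_AREAS_M2 = {
    "N1": 41_108.5,
    "N2": 50_878.0,
    "N3": 813_246.0,
    "N4": 3_596_946.0,
}


def residence_time_days(area_m2: float, depth_m: float, discharge_m3_per_day: float) -> float:
    """Time needed for the inflow to replace the stored volume once."""
    if discharge_m3_per_day <= 0:
        raise ValueError("discharge must be positive")
    return area_m2 * depth_m / discharge_m3_per_day


residence = {
    sector: residence_time_days(area, MEAN_DEPTH_M, DISCHARGE_M3_PER_DAY)
    for sector, area in SECTOR_AREAS_M2.items()
}

for sector, days in residence.items():
    print(f"{sector}: {days:.1f} days")
```

The two upstream sectors turn over in less than a five-day satellite revisit interval, whereas the third and fourth sectors retain water for roughly two months and more than eight months, respectively.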
Interestingly, although Lake Ludas is not physically divided into sectors, the water quality data, specifically the Chlorophyll-a concentrations, show conspicuous distinctions, as do the correlation coefficients. Here, the results for the middle and southern parts of the lake yielded higher correlation values. This could be explained by the mixed external influences on the northern part of the lake. The Palic–Ludas channel enters Lake Ludas in its northern part. The water quality of this channel is questionable, as it draws water from Lake Palic, flowing through Bloody Lake, while also collecting any existing unregulated sewage discharge. Furthermore, there are a lot of reeds in this portion of the lake. Finally, water sampling was conducted near the shore in the northern part of the lake, and in open water at the two other locations. All of these facts could be responsible for the divergence in water quality and its physical manifestation. For the north point, labeled Ls, the highest correlation coefficient, 0.559, was attained with GI; thus, this index was used for model development. In the middle of the lake, marked Lsr, the highest Pearson coefficient was 0.737, with OSAVI, which was subsequently used to create the Chlorophyll-a equation. Lastly, in the southern part of the lake, Lj, the most significant correlation between the measured Chlorophyll-a concentrations and environmental indices was displayed for 3BM, which is why this index was selected for the development of the Chlorophyll-a equation for the south.
3.2. Chlorophyll-a Prediction Equations
As already mentioned, the equations for estimating Chlorophyll-a concentrations were developed for each sampling point individually, relying on two approaches. The first approach was to create a simple linear regression model using the environmental index with the highest correlation coefficient. The second approach was to develop a model that estimates Chlorophyll-a values based on two indices: the one from the first model, and another that yielded the highest adjusted R2 value when added to the model.
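The two-step procedure can be sketched as follows: fit a simple linear regression on the first index, then try each remaining index as a second predictor and keep the one that gives the highest adjusted R2. This is an illustration in plain Python with ordinary least squares, not the authors' implementation. The index names and data are hypothetical.

```python
def _mean(values):
    return sum(values) / len(values)


def _s(a, b):
    """Sum of products of deviations from the means."""
    ma, mb = _mean(a), _mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b))


def adjusted_r2(y, y_hat, n_predictors):
    n = len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    r2 = 1.0 - ss_res / _s(y, y)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


def fit_one(x, y):
    """Least-squares fit of y = b0 + b1 * x."""
    b1 = _s(x, y) / _s(x, x)
    return _mean(y) - b1 * _mean(x), b1


def fit_two(x1, x2, y):
    """Least-squares fit of y = b0 + b1 * x1 + b2 * x2 via the normal equations."""
    s11, s22, s12 = _s(x1, x1), _s(x2, x2), _s(x1, x2)
    s1y, s2y = _s(x1, y), _s(x2, y)
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return _mean(y) - b1 * _mean(x1) - b2 * _mean(x2), b1, b2


def select_second_index(indices, first, y):
    """Return the index that maximises adjusted R2 when added to `first`."""
    best_name, best_score = None, float("-inf")
    for name, values in indices.items():
        if name == first:
            continue
        b0, b1, b2 = fit_two(indices[first], values, y)
        y_hat = [b0 + b1 * u + b2 * v for u, v in zip(indices[first], values)]
        score = adjusted_r2(y, y_hat, n_predictors=2)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical index values and Chlorophyll-a measurements.
indices = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "B": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "C": [5.0, 3.0, 6.0, 1.0, 2.0, 4.0],
}
chl_a = [7.0, 9.0, 15.0, 17.0, 23.0, 25.0]

b0, b1 = fit_one(indices["A"], chl_a)
slr_adj_r2 = adjusted_r2(chl_a, [b0 + b1 * u for u in indices["A"]], n_predictors=1)
second, mr_adj_r2 = select_second_index(indices, "A", chl_a)
print(f"SLR adj. R2 = {slr_adj_r2:.3f}; second index = {second}; MR adj. R2 = {mr_adj_r2:.3f}")
```

Adjusted R2 is used for the comparison because plain R2 never decreases when a predictor is added, which would make every two-index model look at least as good as the one-index model.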
The resulting equations are presented in Table 3. The equations for N1, N2, and N3 are omitted due to poor correlation with the measurements. Table 3 also includes results from the model’s performance evaluation and normality analysis of the developed models, including information on the number of data points used to establish the specific equation, R2, adjusted R2 values, RMSE, and statistics W and p.
The results are in accordance with the correlation coefficients. The lowest R2 and adjusted R2 are found in the north part of Lake Ludas, Lsev, where R2adj = 0.291 for the simple linear regression and R2adj = 0.385 for the multivariate regression. These values are slightly higher in the middle and southern parts of Lake Ludas. The highest values are present at the outflow of Lake Palic, where R2adj = 0.667 for the simple linear regression and R2adj = 0.677 for the multivariate regression.
The higher RMSE values at this location also indicate poor model performance at Lsev. Finally, the statistics from the Shapiro–Wilk Test also showed that the model at Lsev is not reliable. The sample does not follow the normal distribution, and the conclusions extracted from the analysis are questionable. Although the results at the other locations satisfy the conditions set for the Shapiro–Wilk Test, location Lj also suggests that the developed model may be debatable. The highest statistical values are computed at N4izl.
Further assessment of normality was conducted using the Q-Q plots shown in Figure 3 for Lake Palic and Figure 4 for Lake Ludas. The images on the left show the plots for the simple linear regression, and those on the right show the plots for the multivariate regression. The results for Lake Palic generally follow a straight line, with approximately half of the points lying above and the other half below the line. The most significant deviation is noted for the multivariate model at N4sr, as shown in Figure 3a on the right.
Figure 3.
Q-Q plot for the established models in Lake Palic.
The Q-Q plots for Lake Ludas show greater dispersion. Although the points are more scattered around the straight line, they are relatively uniformly distributed above and below it.
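For readers unfamiliar with Q-Q plots, the sketch below shows one common way to compute the plotted points from model residuals. It assumes standardized residuals and plotting positions of (i − 0.5)/n, a widespread convention that is not necessarily the one used for Figures 3 and 4. The residual values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Pair theoretical normal quantiles with sorted standardized residuals."""
    n = len(residuals)
    if n < 2:
        raise ValueError("need at least two residuals")
    mu, sigma = mean(residuals), stdev(residuals)
    if sigma == 0:
        raise ValueError("residuals have zero variance")
    observed = sorted((r - mu) / sigma for r in residuals)
    # Plotting positions (i - 0.5) / n; other conventions exist.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


# Hypothetical regression residuals.
points = qq_points([-2.0, -1.0, 0.0, 1.0, 2.0])
for theoretical, observed in points:
    print(f"{theoretical:+.3f}  {observed:+.3f}")
```

Points lying close to the 1:1 line indicate approximately normal residuals, while systematic curvature or heavy scatter suggests a departure from normality.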
The verification was conducted in two modes due to the significant differences in the number of available measurements. Accordingly, locations with fewer measurements were modified to include data from 2021 for the model development, leaving only data from 2022 for the model verification. On the other hand, locations with monthly available data were modeled using measurements only up to 2020, leaving data from 2021 and 2022 for verification.
The verification was carried out by downloading the images from the relevant days and processing them to extract normalized water-leaving reflectance. These values were used to calculate the environmental indices required for determining Chlorophyll-a values. The estimated values were compared to the measured data.
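The two verification modes differ only in the last year used for calibration. A minimal sketch of such a split, followed by an RMSE comparison of measured and estimated values, is given below. The record layout, the function names, and all numbers are hypothetical.

```python
from math import sqrt


def split_by_year(records, last_calibration_year):
    """Calibration set: years up to and including the given year; the rest verifies."""
    calibration = [r for r in records if r["year"] <= last_calibration_year]
    verification = [r for r in records if r["year"] > last_calibration_year]
    return calibration, verification


def rmse(measured, estimated):
    """Root-mean-square error between paired measured and estimated values."""
    if len(measured) != len(estimated) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    return sqrt(sum((m - e) ** 2 for m, e in zip(measured, estimated)) / len(measured))


# Hypothetical Chlorophyll-a records for one location.
records = [
    {"year": 2019, "measured": 50.0, "estimated": 46.0},
    {"year": 2020, "measured": 80.0, "estimated": 83.0},
    {"year": 2021, "measured": 60.0, "estimated": 57.0},
    {"year": 2022, "measured": 90.0, "estimated": 94.0},
]

# Locations with monthly data: calibrate up to 2020, verify on 2021 and 2022.
# Locations with fewer measurements would use last_calibration_year=2021 instead.
calibration, verification = split_by_year(records, last_calibration_year=2020)
verification_rmse = rmse(
    [r["measured"] for r in verification],
    [r["estimated"] for r in verification],
)
print(f"verification RMSE = {verification_rmse:.2f}")
```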
Figure 4.
Q-Q plot for the established models in Lake Ludas.
3.3. Verification of the Chlorophyll-a Prediction Equations
Figure 5 shows the comparison of measured Chlorophyll-a concentrations with the data calculated using the two developed models for Lake Palic, location N4sr. The dashed line presents the measured values, while the two other lines mark the simple linear regression model and the multivariate model. As this location has a smaller number of measured data points, we used the measurements up to 2021 for model establishment, reserving 2022 for verification. Nevertheless, simple visual inspection of the compared values shows a relatively good agreement. Both models can follow the obvious annual oscillations in Chlorophyll-a concentrations. Furthermore, the deviation of specific values between measured and computed data is satisfactory for both models.
The results presented in Figure 6 compare the measured and calculated values at the outflow from Lake Palic, N4izl. The figure displays all the measurements together with the model estimates for the period from 2016 to 2022. Better agreement is noted before the verification period. During verification (2021 and 2022), the models follow the tendency of the measured data but fail to reconstruct the extremes.
Figure 7 presents the results at the north part of Lake Ludas, Lsev. The annual oscillations are not as clearly visible here as they were in the two previous examples. The models are able neither to reproduce nor to predict the erratic variations of the measured values. Even the average differences between measured and computed values are much greater here.
The models developed for Lsr yielded more reliable results (Figure 8). Both equations were able to reproduce the annual Chlorophyll-a cycles. Furthermore, the offset between measured and calculated values is much smaller than at Lsev. This is the first case where notable differences are observed between the results produced by the simple linear regression (SLR) model and the multivariate regression (MR) model. The SLR model reproduced the base values more accurately than MR, while MR appears to describe the peaks in Chlorophyll-a concentrations more accurately.
The results obtained for Lj (Figure 9) suggest that the established equations are not very reliable for predicting Chlorophyll-a concentrations. Although a slight similarity is notable due to the presence of annual cycles, the intensity of these changes and their shape do not match. Additionally, we can notice a more pronounced difference between the measured and calculated values.
4. Discussion
This research aimed to establish a sustainable monitoring system for Chlorophyll-a concentrations in Lakes Palic and Ludas. Hence, we evaluated the available measurements and compared the sampling times and locations with satellite images to develop equations connecting spectral reflectance values with Chlorophyll-a concentrations. Most of the measurements were used for model development, while a small portion of the data was retained for model verification.
To achieve better agreement between the measured and computed data, we first evaluated the measurements across the lakes. This evaluation revealed significant changes in water quality, from the inflow in the first sector of Lake Palic to the southern part of Lake Ludas. Accordingly, we treated each of these segments separately, aiming to achieve better-performing models.
The results were evaluated in conjunction with existing knowledge of the external conditions. Hence, the first three sectors of Lake Palic were excluded from the study because of extremely low Pearson correlation coefficients. This is attributed to the heavy influence of treated outflow from the WWTPS, which very quickly replenishes the water in these three sectors, preventing them from establishing a naturally balanced ecosystem. A further reason for the low correlations is that various constituents present in the expelled wastewater influence the spectral signature of the water body.
The remaining locations were modeled using two types of equations: a simple linear regression and a multivariate regression based on two variables. There are no significant differences in the overall performance of these two model types. The models were able to follow annual tendencies but had limited accuracy. These deviations have several causes, discussed in the following paragraphs.
Even though the revisit time for Sentinel-2 is only 5 days, a delay between sampling and image acquisition was often unavoidable. On numerous occasions, the images from the targeted dates were of low quality; hence, we needed to use an image from another date, thereby increasing the time gap between sampling and image acquisition. This time gap increases model discrepancies, because the reflectance and the measurement no longer describe the same state of the water.
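The matching of sampling dates to usable images can be sketched as a nearest-date search with an upper limit on the acceptable gap. The limit used below (`max_gap_days=3`) is a hypothetical value chosen for illustration; no threshold is stated in the text. The dates are likewise made up.

```python
from datetime import date


def nearest_usable_image(sampling_date, image_dates, max_gap_days):
    """Return the usable image date closest to the sampling date, or None.

    image_dates should contain only images that passed quality screening
    (for example, cloud-free over the sampling location).
    """
    candidates = [
        d for d in image_dates
        if abs((d - sampling_date).days) <= max_gap_days
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda d: abs((d - sampling_date).days))


# Hypothetical acquisition dates of good-quality images.
usable_images = [date(2021, 7, 9), date(2021, 7, 14), date(2021, 7, 19)]

match = nearest_usable_image(date(2021, 7, 15), usable_images, max_gap_days=3)
no_match = nearest_usable_image(date(2021, 1, 10), usable_images, max_gap_days=3)
print(match, no_match)
```

Recording the gap for each matched pair makes it possible to check afterwards whether larger gaps coincide with larger model errors.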
The pixel size was 10 m, which is considered a high resolution. However, considering the size of the lakes, variations in water quality within a 10 m × 10 m area can still be significant and can lead to reduced accuracy.
Another issue we faced with pixel information was in situations in which the specific pixel matching the sampling location was not available, due to cloud coverage or poor visibility issues. Consequently, we opted for another pixel, but this led to an increased spatial gap between the sampling location and the utilized pixel, again reducing the model’s accuracy.
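Choosing a substitute pixel can be sketched as a search for the nearest unmasked pixel within a small window, reporting the resulting spatial offset so that it can be taken into account later. The grid, the use of `None` for masked pixels, and the search radius are assumptions made for this illustration. The 10 m pixel size is the one given in the text.

```python
PIXEL_SIZE_M = 10.0


def nearest_valid_pixel(grid, row, col, max_radius):
    """Find the closest unmasked pixel to (row, col) within max_radius pixels.

    grid is a list of rows; masked pixels (cloud, poor visibility) are None.
    Returns (row, col, offset_in_metres) or None if nothing usable is found.
    """
    best = None
    for r in range(max(0, row - max_radius), min(len(grid), row + max_radius + 1)):
        for c in range(max(0, col - max_radius), min(len(grid[r]), col + max_radius + 1)):
            if grid[r][c] is None:
                continue
            dist_sq = (r - row) ** 2 + (c - col) ** 2
            if best is None or dist_sq < best[0]:
                best = (dist_sq, r, c)
    if best is None:
        return None
    dist_sq, r, c = best
    return r, c, (dist_sq ** 0.5) * PIXEL_SIZE_M


# Hypothetical reflectance grid; the sampling location is the masked centre pixel.
grid = [
    [0.021, None, 0.025],
    [None, None, 0.030],
    [0.019, None, None],
]
substitute = nearest_valid_pixel(grid, row=1, col=1, max_radius=1)
print(substitute)
```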
An additional factor influencing the model’s accuracy is the unequal number of data points in all four seasons. It is evident that the winter images often had very low quality and were not suitable for inclusion in the model development. This will limit the model’s ability to forecast Chlorophyll-a values in winter accurately. This specific issue can only be resolved by incorporating field measurements of reflectance via an alternative method, as reduced visibility is common during the wintertime. To enable the models to forecast seasonal variability, it is essential to include time series analysis covering annual trends in the calibration procedure. We used Chlorophyll-a measurements conducted over a time interval of 5 or 6 years, depending on the location. The remaining measurements, taken over 1 or 2 years, were used to verify the developed model. During this time interval, we obtained data for various parts of the year, feeding the model with information for all seasons. This is important as it allows the model to predict Chlorophyll-a values under various conditions. Additional possibilities to improve the modeling of seasonal variability include expanding the model to include environmental parameters that influence Chlorophyll-a values or further developing the model to incorporate machine learning techniques.
Finally, the limited availability of data also restricts the reliability of the developed models. Although the examined period spans 7 years, the amount of usable data was small. Measurements were typically omitted because they coincided with poor-quality satellite images, which meant we often had to remove a couple of months from a given year.