The typical methodology for investigating water quality involves collecting water samples at various locations and analyzing them in the laboratory. Although this approach can provide accurate assessments of water bodies of limited area, it is time-consuming, expensive, and difficult to apply over large areas. Moreover, because the results are point measurements, they do not necessarily reflect the quality of the whole site [1].
Remote sensing techniques offer an alternative to in situ monitoring of water quality in lakes, dikes, and reservoirs. This application is possible only because of the presence of optically active components in the water: each component absorbs and backscatters incident light in a characteristic pattern, which allows sensor systems to identify it. Among water quality parameters, suspended inorganic sediments, chlorophyll-a, and dissolved organic matter are the main agents of absorption and scattering of electromagnetic radiation in a water body [3].
These components are directly related to the quality of the aquatic ecosystem and its surroundings. For example, total suspended solids (TSS), which represent the total amount of inorganic or organic particles drifting or floating in water [5], may be related to water pollution, since these particles can transport and store various pollutants, and to erosive processes in a river basin (resulting in the silting of major rivers and reservoirs) [4]. TSS concentration is often related to total primary production and to heavy-metal and micro-pollutant fluxes, and in many turbid regions it is directly linked to sediment transport problems and to the light available for primary production [6].
An indirect measurement of TSS in water bodies via remote sensing can compensate for deficiencies in manual water quality monitoring because it is fast and allows continuous monitoring of large areas [2]. Most published studies on TSS prediction from remote sensing use spectral data retrieved from satellite images. Because of its medium spatial resolution (30 m), Landsat is one of the most commonly used satellites in water remote sensing studies, as in Qun et al. [2], Kong et al. [8], Din et al. [9], and Amanollahi et al. [10]. Song et al. [6] also tested medium spatial resolution images from IRS-P6 (Indian Remote Sensing Satellite). In addition, Wang et al. [7], Breuning et al. [11], and Moridnejad et al. [12] used MODIS (Moderate Resolution Imaging Spectroradiometer) images, and Campbell et al. [3] used MERIS (Medium Resolution Imaging Spectrometer) images; these sensors have low spatial resolutions (250 and 300 m, respectively) and are limited to large water bodies.
Thus, although remote sensing is a powerful technique for monitoring environmental and seasonal changes, and its capacity to monitor water resources has increased in recent decades owing to the improved quality and availability of satellite imagery [13], the medium resolution of the most common commercial satellites may not be adequate for analyzing small water bodies [1]. In this case, aerial images obtained with unmanned aerial vehicles (UAVs) have shown good results for monitoring small water bodies and are promising because their high spatial resolution provides greater detail and they allow frequent monitoring [14].
Although some applications of UAVs for monitoring water quality parameters, such as chlorophyll-a [1], organic matter [18], and suspended solids [1], have been demonstrated in the literature, few studies have focused on this application. For suspended solids monitoring, for example, Veronez et al. [18] and Saenz et al. [19] used regression analyses between TSS values measured in the laboratory and the UAV responses in the visible and near infrared (NIR) regions to generate their prediction models. Saenz et al. [19] explored relationships with individual bands and band combinations (for example, NIR-red), whereas Veronez et al. [18] related TSS to spectral indices such as the normalized difference vegetation index (NDVI) and the normalized difference water index (NDWI). Although both studies reported positive results, modeling these parameters in complex environments is not always possible through regression analysis.
Given these limitations of techniques such as regression analysis, research is needed to improve inland water monitoring techniques by integrating the technologies currently developed and available on the market. Among the modern techniques that can support remote sensing-based water monitoring is artificial intelligence, in particular neural networks.
Approaches involving neural networks are promising in remote sensing and in the development of water quality models because they can be more sensitive and robust than traditional regression techniques and can capture both linear and non-linear relationships between the parameters involved [8]. However, the artificial neural network (ANN) approaches for water bodies reported in the literature mainly use satellite imagery of low [12] to medium [8] spatial resolution.
We found no studies applying artificial neural networks to high spatial resolution images obtained with UAVs, and this study intends to fill this gap. The aim of this article is to use remote sensing technologies to evaluate water quality by identifying an alternative method for monitoring and quantifying the concentration of suspended solids in water, through the correlation between UAV images and limnological data using regression analysis (RA) and artificial neural networks (ANN). Furthermore, this study aims to contribute to the development of spatial and temporal water quality monitoring techniques based on modern remote sensing tools and artificial intelligence.
The manuscript is structured as follows: Section 2 describes the field site, the data acquisition, and the subsequent analyses; Section 3 presents and discusses the results on TSS concentration, the regression models, and the ANN; and Section 4 presents our conclusions, including the importance of the study and directions for its continuation.
2. Materials and Methods
The proposed method consists of the following steps: GNSS (Global Navigation Satellite System) data acquisition, water sampling and laboratory analysis, UAV overflight and image processing, extraction of pixel values from the UAV images, regression analysis, and training and testing of the ANN. The flowchart of the proposed method is depicted in Figure 1 and detailed in the following subsections.
2.1. Field Site
The study site was the lake on the Unisinos University campus, in the state of Rio Grande do Sul, southern Brazil (Figure 2). The lake is artificial, with an area of approximately 0.025 km² and a maximum depth of 4 m. Although small, it lies at the lowest altitude of the campus and is formed by rainwater drainage collected at the university, so it contains several inorganic and organic compounds, in the form of suspended solids or organic matter, carried by rainwater runoff [18].
The lake and its surroundings also form an ecosystem for several animal species, such as ducks, geese, and other birds, as well as a wide diversity of fish. Because it is on a university campus, the area has several buildings, paved areas, and heavy circulation of people and cars. However, the campus also has several vegetated areas, mainly around the lake, as can be seen in Figure 2.
Studies addressing the applicability of remote sensing to the monitoring of water bodies have already been carried out in this area. Guimarães et al. [17] used spectral data collected in the field and UAV images to model chlorophyll-a concentration. Veronez et al. [18] applied neural networks to UAV images to estimate Landsat 8 OLI satellite bands and correlated these with data on suspended solids and dissolved organic matter.
Studies characterizing this lake, the behavior of its limnological variables, and their relationships with remote sensing variables are important because they serve as pilot studies for larger water bodies.
2.2. Data Acquisition
We performed two field campaigns, in March 2016 and March 2017, during the transition between summer and fall. Each collection was carried out in a single day, and we ensured that the climatic conditions on both days were similar: average temperatures between 22 and 24 °C, southeasterly winds of 0.4 m·s⁻¹, and no precipitation on the days of the collections.
On each of these days, the UAV overflight and the in situ collection of water samples took place together, so that the two sets of information could be compared as representative of the same lake conditions. In addition, possible temporal variations from one year to the next can be evaluated in the collected data and compared with the values predicted by the RA and ANN methods.
We selected 21 sampling points spatially distributed over the lake, as shown in Figure 3, at which surface water samples (up to 0.5 m depth) were collected for the laboratory determination of suspended solids using the gravimetric method described in the Standard Methods for the Examination of Water and Wastewater [22].
The images were taken with a SenseFly Swinglet CAM UAV (SenseFly Parrot Group, Cheseaux-sur-Lausanne, Switzerland) carrying a 16-megapixel Canon ELPH 110HS camera (Canon U.S.A., Inc., New York, NY, United States) that was factory-modified to capture the NIR band instead of the red band. The images therefore have three channels: near infrared (NIR), green (G), and blue (B).
In addition to the water sampling points, we established six ground control points (GCPs) within the flight coverage area and surveyed them with GNSS using the RTK (real time kinematic) method; their positions were later used to georeference the images.
The UAV images were processed in the Pix4D software, version 2.1 (Pix4D S.A., Lausanne, Switzerland), where they were orthorectified and georeferenced using SIRGAS 2000 (Geocentric Reference System for the Americas) as the reference system, in UTM (Universal Transverse Mercator) zone 22S. We generated orthophotos with a pixel size of 5 cm × 5 cm.
2.3. Data Analysis
To relate the water quality data to the remote sensing data, we plotted the sampling points on the orthophotos generated from the UAV overflights and extracted the pixel values at each point for the NIR, G, and B channels. Of the 42 values collected over the two years, four were discarded because the points were located in shaded areas of the image, leaving a sample of 38 points for the analysis.
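The pixel-value extraction step can be sketched as follows. This is a minimal illustration, not the processing chain used in the study: it assumes the orthophoto has been loaded as a NumPy array of shape (rows, cols, 3) with band order NIR, G, B (the NIR replacing the camera's red channel), that it is north-up with a known upper-left corner (x0, y0) in UTM coordinates, and that sampling points are given as (easting, northing) pairs. The function name, the coordinates, and the pixel values below are hypothetical. Shaded points, which the study screened out by inspecting the images, are not handled here.

```python
import numpy as np


def extract_pixel_values(ortho, x0, y0, pixel_size, points):
    """Return the (NIR, G, B) values of the pixel under each (easting, northing) point.

    Assumes a north-up orthophoto whose upper-left corner is at (x0, y0), with
    square pixels of side `pixel_size` in metres (0.05 for 5 cm orthophotos).
    """
    values = []
    for easting, northing in points:
        col = int(np.floor((easting - x0) / pixel_size))
        row = int(np.floor((y0 - northing) / pixel_size))  # row index grows southwards
        if not (0 <= row < ortho.shape[0] and 0 <= col < ortho.shape[1]):
            raise ValueError(f"point ({easting}, {northing}) lies outside the orthophoto")
        values.append(ortho[row, col, :])
    return np.array(values)


# Hypothetical 100 x 100 pixel orthophoto (band order NIR, G, B) and one sampling point.
ortho = np.zeros((100, 100, 3), dtype=np.uint8)
ortho[10, 20] = (120, 80, 60)
x0, y0 = 480000.0, 6700000.0  # hypothetical upper-left corner, UTM metres
points = [(x0 + 20 * 0.05 + 0.01, y0 - 10 * 0.05 - 0.01)]
values = extract_pixel_values(ortho, x0, y0, 0.05, points)
```

In practice, a GIS library would read the geotransform from the georeferenced orthophoto instead of the hard-coded corner used here.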
We used these data to predict the suspended solids concentration in Lake Unisinos using linear and non-linear regression analysis (RA) and an artificial neural network (ANN). The aim of this step was to identify a model that quantifies the concentration of suspended solids in the water from remote sensing information. We evaluated the performance of the models with two statistical metrics: the coefficient of determination (R²) and the root mean square error (RMSE).
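For reference, the two metrics can be computed as in the sketch below. It assumes the common definition R² = 1 − SS_res/SS_tot (some studies instead report the squared Pearson correlation between observed and predicted values, which coincides with it only for least-squares linear fits with an intercept), and an RMSE expressed in the units of the observations (mg/L for TSS). The example values are hypothetical.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination, R² = 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error, in the units of the observations."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Hypothetical observed and predicted TSS values (mg/L).
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]
print(r_squared(observed, predicted), rmse(observed, predicted))
```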
We investigated linear and non-linear regression models; the non-linear functions considered were exponential, logarithmic, quadratic, and power (with exponents ranging from −1 to 1). Because different concentrations of suspended solids produce different responses at each wavelength, the independent variables of the RA models were each channel individually (NIR, G, and B) and band operations (sums, differences, and ratios) that highlight the spectral characteristics of the compounds [6]. Thus, in addition to simple regressions with individual covariates, we considered multiple regressions with two or more independent variables, taking care to avoid dependence among the covariates.
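One way to fit the simple regression forms listed above is sketched below with NumPy. This is an illustrative assumption, not the authors' software: the exponential and power models are fitted by the common log-linearization (so they minimize error in log space rather than in mg/L), and the power model is fitted as y = a·x^b without constraining b to the [−1, 1] range mentioned in the text. The covariate `x` could be a single band or a band operation such as G/NIR; the synthetic data below are hypothetical.

```python
import numpy as np


def fit_simple_models(x, y):
    """Fit several one-covariate models and return {name: (predict_fn, r2)}.

    Exponential and power models are fitted on log-transformed values, which
    requires y > 0; logarithmic and power models also require x > 0.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def r2(pred):
        return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

    models = {}
    # Linear: y = b0 + b1 x
    b1, b0 = np.polyfit(x, y, 1)
    models["linear"] = lambda v, b0=b0, b1=b1: b0 + b1 * v
    # Quadratic: y = c0 + c1 x + c2 x^2
    c = np.polyfit(x, y, 2)
    models["quadratic"] = lambda v, c=c: np.polyval(c, v)
    # Logarithmic: y = l0 + l1 ln(x)
    l1, l0 = np.polyfit(np.log(x), y, 1)
    models["logarithmic"] = lambda v, l0=l0, l1=l1: l0 + l1 * np.log(v)
    # Exponential: ln(y) = ln(a) + b x  ->  y = a exp(b x)
    e1, e0 = np.polyfit(x, np.log(y), 1)
    models["exponential"] = lambda v, e0=e0, e1=e1: np.exp(e0) * np.exp(e1 * v)
    # Power: ln(y) = ln(a) + b ln(x)  ->  y = a x^b
    p1, p0 = np.polyfit(np.log(x), np.log(y), 1)
    models["power"] = lambda v, p0=p0, p1=p1: np.exp(p0) * v ** p1

    return {name: (f, r2(f(x))) for name, f in models.items()}


# Hypothetical covariate values (e.g. a band ratio) and a synthetic linear response.
x = np.linspace(0.5, 2.0, 20)
results = fit_simple_models(x, 5.0 + 3.0 * x)
for name, (_, r2_value) in sorted(results.items(), key=lambda kv: -kv[1][1]):
    print(f"{name:12s} R² = {r2_value:.3f}")
```

Multiple regressions with two or more covariates can be fitted in the same spirit with a design matrix and `numpy.linalg.lstsq`.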
Considering the sample size of this experiment and the probabilistic assumptions of this class of models, all 38 observations were used for RA modeling. After fitting, the usual residual checks for normality (Gaussian distribution), independence, and homoscedasticity were performed. If the estimated model was adequate, a cross-validation step could be performed, either leaving one observation out of the fit at a time or, when the sample size is large, reserving part of the sample.
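The leave-one-out variant of cross-validation mentioned above can be sketched for an ordinary least squares model as follows. The covariate matrix is hypothetical (for instance, columns for B and G/NIR), an intercept column is added inside the function, and the synthetic data are not the study's measurements; each observation is predicted by a model fitted to the remaining n − 1 observations.

```python
import numpy as np


def loocv_predictions(X, y):
    """Leave-one-out predictions for an ordinary least squares model with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # prepend the intercept column
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # drop observation i from the fit
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        preds[i] = A[i] @ coef
    return preds


# Hypothetical data: 38 observations with two covariates and an exact linear response.
rng = np.random.default_rng(0)
X = rng.uniform(size=(38, 2))
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1]
preds = loocv_predictions(X, y)
```

The R² and RMSE of `preds` against `y` then give out-of-sample estimates of model performance.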
For TSS prediction with the ANN, a distribution-free method, the modeling involved two steps: training the network and then testing it with a data set different from the one used for training. We used 80% of the collected data for training and 20% for testing, with the split defined randomly [23].
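A random 80/20 split of the sample can be sketched as below. The text does not state how fractional counts were rounded, so rounding to the nearest integer is an assumption here; with the 38 points of this study, it would give 30 training and 8 test observations. The function name and seed are illustrative.

```python
import numpy as np


def train_test_split_indices(n, train_fraction=0.8, seed=None):
    """Randomly split indices 0..n-1 into disjoint training and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_train = int(round(train_fraction * n))
    return order[:n_train], order[n_train:]


train_idx, test_idx = train_test_split_indices(38, seed=42)
print(len(train_idx), len(test_idx))
```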
Because the objective was to create an ANN capable of retrieving the concentration of suspended solids in the water from the bands of the modified Canon sensor on board the UAV, we used the normalized values of the NIR, G, and B channels as ANN inputs and the TSS concentration at the same sampling point as the output. We used a feed-forward backpropagation network, a type commonly used in remote sensing studies [8].
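The text does not specify the normalization scheme. A minimal sketch, assuming per-band min-max scaling to [0, 1] with the minimum and maximum taken from the training data only (so that test data do not leak into the scaling), is shown below. `BandScaler` is a hypothetical helper, not a library class, and the band values are illustrative.

```python
import numpy as np


class BandScaler:
    """Per-column min-max scaling to [0, 1], fitted on training data."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.min_ = X.min(axis=0)
        span = X.max(axis=0) - self.min_
        # Constant columns would divide by zero; leave them centred at 0 instead.
        self.span_ = np.where(span == 0, 1.0, span)
        return self

    def transform(self, X):
        return (np.asarray(X, dtype=float) - self.min_) / self.span_


# Hypothetical (NIR, G, B) training values; the B column is constant here.
scaler = BandScaler().fit([[0, 10, 5], [10, 20, 5]])
Z = scaler.transform([[0, 10, 5], [10, 20, 5], [5, 15, 5]])
```

Values outside the training range map outside [0, 1], which is acceptable for a tanh or linear network but worth monitoring when the model is applied to new images.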
During the training phase, we carried out several tests to find the best ANN topology for this study, choosing the network that provided the highest correlation coefficient and the lowest mean square error in training and testing. We tested different numbers of neurons (from 5 to 20) in a single hidden layer, three activation functions (sigmoid, hyperbolic tangent, and linear), and different numbers of training cycles.
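To make the network structure concrete, the sketch below implements a 3-h-1 feed-forward network with hyperbolic tangent hidden units trained by backpropagation in NumPy. Several details are assumptions because the text does not report them: a linear output neuron, full-batch gradient descent on the mean squared error, the learning rate, and the weight initialization. It is an illustration of the technique, not the software or training algorithm used in the study; the synthetic data are hypothetical.

```python
import numpy as np


def train_mlp(X, y, n_hidden=7, epochs=300, lr=0.05, seed=0):
    """Train a feed-forward network (inputs -> tanh hidden layer -> linear output)
    by full-batch gradient descent with backpropagation on the MSE.

    Returns (params, history), where history holds the MSE at each epoch.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    n, n_in = X.shape
    W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_hidden, 1))
    b2 = np.zeros(1)
    history = []
    for _ in range(epochs):
        # Forward pass
        H = np.tanh(X @ W1 + b1)  # hidden activations, shape (n, n_hidden)
        out = H @ W2 + b2  # linear output, shape (n, 1)
        err = out - y
        history.append(float(np.mean(err ** 2)))
        # Backward pass: gradients of the mean squared error
        d_out = 2.0 * err / n
        gW2 = H.T @ d_out
        gb2 = d_out.sum(axis=0)
        d_hidden = (d_out @ W2.T) * (1.0 - H ** 2)  # tanh'(z) = 1 - tanh(z)^2
        gW1 = X.T @ d_hidden
        gb1 = d_hidden.sum(axis=0)
        # Gradient-descent update
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1, W2, b2), history


def predict(params, X):
    W1, b1, W2, b2 = params
    return (np.tanh(np.asarray(X, dtype=float) @ W1 + b1) @ W2 + b2).ravel()


# Hypothetical normalized inputs (NIR, G, B) and a synthetic non-linear target.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(30, 3))
y = 0.2 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + 0.1 * X[:, 2] ** 2
params, history = train_mlp(X, y, n_hidden=7, epochs=300)
```

A topology search like the one described can loop over `n_hidden` from 5 to 20 and several values of `epochs`, scoring each trained network with R² and RMSE on the training and test sets.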
3. Results and Discussion
The results of the laboratory analyses were consistent with prior knowledge of water quality in the study area and provided the basis for analyzing the spatial behavior of these parameters, which was later compared with the UAV images. Table 1 shows the descriptive statistics of the total suspended solids (TSS) analyzed in this research for March 2016 and 2017.
The data in Table 1 show that the characteristics of the lake differed between the two collections, which was confirmed by the Wilcoxon test at the 95% confidence level. TSS concentration decreased from 2016 to 2017, as seen in the means, medians, maximum, and minimum values in Table 1.
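The text does not state which Wilcoxon variant was used; for two independent samples such as the 2016 and 2017 collections, the rank-sum (Mann-Whitney) form is the usual choice, and that assumption is made here. The pure-Python sketch below uses the normal approximation with average ranks for ties, without tie or continuity corrections, so its p-values are approximate; `scipy.stats.mannwhitneyu` provides exact and corrected versions. The sample values are hypothetical.

```python
import math


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U statistic for sample a, two-sided p-value).
    """
    combined = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1 (ties)
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    n1, n2 = len(a), len(b)
    r1 = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided: 2 * (1 - Phi(|z|))
    return u1, p


# Hypothetical TSS samples (mg/L) from two campaigns.
u, p = rank_sum_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(u, round(p, 4))
```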
This difference in TSS concentration, although small, can be explained by rainfall: although it did not rain on the sampling days in 2016 or 2017, there was rainfall in the week before the 2016 collection (85 mm, according to the experimental climatological station at Unisinos University), which did not occur in 2017. Allen et al. [24] point out that in impermeable urban areas, rainwater runoff collects pollutants and sediments from these surfaces and transports them to the nearest waterways. Because the lake receives rainwater drainage from the university campus, various compounds are expected to be carried into it during rainy periods, increasing, for example, the concentration of suspended solids.
The first cartographic products obtained from the 2016 and 2017 UAV overflights and image processing were the orthophotos of the area, shown in Figure 4.
We evaluated the simple and multiple, linear and non-linear RA models described in the previous section; however, most of the results were unsatisfactory. Table 2 shows the best results obtained in these analyses. Although not shown, residual analyses were performed to check the error assumptions.
According to Table 2, the best simple regression fits were obtained for the NIR and G/NIR variables, in agreement with Song et al. [6] and Amanollahi et al. [10], although both studies obtained better results than ours (0.7 for Amanollahi et al. [10] and above 0.9 for Song et al. [6]). For multiple linear regression, the combination of B and G/NIR gave the best result.
Although the regression models in Table 2 were statistically significant, their low R² values indicate that the RA models were not suitable for TSS retrieval in the study area. This can be explained by the optical complexity of the study waters, such that traditional regression techniques could not capture the relationships between the bands of the collected images and TSS concentration.
ANN can be an effective way to improve the accuracy of TSS predictions. Kong et al. [8] emphasize that ANN models assign different weights to each network input and thus take full advantage of the TSS characteristics contained in the different bands.
We trained several neural networks with different topologies. Table 3 presents those that achieved a coefficient of determination (R²) greater than 0.5 in the training step, together with their topologies, activation functions, numbers of epochs, and training times.
According to Table 3, the topology with the smallest RMSE and the highest coefficient of determination was 3-7-1, with the hyperbolic tangent as the activation function and 300 training cycles. Thus, the adopted ANN was a feed-forward backpropagation network with three inputs (NIR, G, and B), seven neurons in a single hidden layer, and one output (TSS). As usual, the training time depended on the number of epochs, but it was not a problem in our experiment because of the small sample size.
Table 4 presents the results of the training and testing steps for this best ANN, and Figure 5 compares the data measured in the laboratory with those estimated by the ANN.
The ANN training stage resulted in an R² of 0.84 and an RMSE of 1.33, whereas testing yielded 0.57 and 2.97, respectively. As shown in Table 4 and Figure 5, when all the data collected in this study were used as ANN inputs, the R² was 0.75 and the RMSE was 1.81.
As expected, using an ANN instead of the simple and multiple, linear and non-linear RA models investigated markedly improved the prediction of suspended solids in the study area.
Although several studies show good results using regression methods to predict TSS [2], others, such as Song et al. [6], Amanollahi et al. [10], Moridnejad et al. [12], and Wu et al. [25], compared the two methodologies (RA and ANN) and found that the ANN predicted the data better, indicating the capacity of neural networks to model more complex, non-linear relationships between the parameters. Only Kong et al. [8] reported that an ANN did not outperform regression methods for TSS prediction in their study area.
Din et al. [9] used statistical correlation analysis only to support the choice of Landsat 8 OLI bands as ANN inputs. The authors also included the short-wave infrared bands (SWIR-1 and SWIR-2) as inputs, which is uncommon in ANN studies predicting water quality parameters, since these usually exploit only the visible and near infrared regions [6].
Although the approaches from the literature cited above resemble ours in comparing RA and ANN for TSS prediction, our study differs in using high spatial resolution UAV images rather than the low or medium spatial resolution satellite images of the other studies. Our method therefore yields spatially more detailed TSS predictions, owing to the small pixel size of the UAV images (5 cm, compared with 30 m for Landsat, for example), and produces high-resolution TSS monitoring maps.
Finally, the ANN model was used to predict TSS concentration over the whole lake from the NIR, G, and B values of the 2016 and 2017 UAV images. The resulting TSS maps for Lake Unisinos are shown in Figure 6.
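Applying a fitted model to every pixel amounts to reshaping the orthophoto into a table of (NIR, G, B) rows, predicting, and reshaping back. The sketch below assumes a band order of NIR, G, B, a `scale` function that reproduces the input normalization used in training, and an optional boolean water mask that excludes land and shade (the text does not describe how non-water pixels were handled, so the mask is an assumption). The toy scaling and model in the usage example are hypothetical. At 5 cm resolution, the 0.025 km² lake corresponds to about 10 million pixels, so processing in chunks may be preferable on limited memory.

```python
import numpy as np


def predict_tss_map(ortho, scale, predict_fn, water_mask=None):
    """Apply a per-pixel TSS model to an orthophoto.

    ortho: array (rows, cols, 3) with bands NIR, G, B.
    scale: maps an (n, 3) array of raw band values to model inputs.
    predict_fn: maps an (n, 3) array of scaled inputs to n TSS values.
    water_mask: optional boolean (rows, cols) array; pixels outside it become NaN.
    """
    rows, cols, n_bands = ortho.shape
    flat = ortho.reshape(-1, n_bands).astype(float)
    tss = np.asarray(predict_fn(scale(flat)), dtype=float).reshape(rows, cols)
    if water_mask is not None:
        tss = np.where(water_mask, tss, np.nan)
    return tss


# Hypothetical 2 x 2 image, toy scaling, and toy model (not the trained ANN).
ortho = np.zeros((2, 2, 3), dtype=np.uint8)
ortho[0, 0] = (255, 255, 255)
mask = np.array([[True, True], [True, False]])
tss = predict_tss_map(ortho, lambda X: X / 255.0, lambda X: 10.0 + X.sum(axis=1), mask)
```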
Figure 6 shows higher concentrations of suspended solids in the 2016 sampling than in the 2017 one, as already indicated in Table 1. The data set showed a statistically significant difference between the two years, and the spatial distribution of this difference is evident in Figure 6. The highest TSS concentrations in 2016 were in the lower central region of the lake, whereas in 2017 they were near the lower-right margin; in 2017, much of the center of the lake shows the minimum TSS values. This difference in spatial distribution, mainly the higher TSS concentrations in 2016, is consistent with the in situ water samples and is also explained by the rain in the week before the 2016 field collection. Figure 6 also shows TSS concentrations on the same range (9.33 to 23.75 mg/L) for both years. To verify whether the statistical characteristics of the predicted data remain close to those of the observed data, Figure 7 shows box plots of the observed and predicted TSS concentrations for March 2016 and 2017.
Figure 7 shows the similarity between the observed and predicted distributions, even though the sample is not large. According to the Wilcoxon test, which is appropriate for asymmetric distributions, the null hypothesis of equal medians for observed and predicted TSS was not rejected at the 5% significance level (p-value = 0.74). The same test showed that the statistically significant difference between the years found for the observed TSS values was maintained for the predicted TSS (p-value = 0.0007). The observed means of 16.27 and 13.65 mg/L (Table 1) became 15.72 and 12.51 mg/L for the predicted TSS in March 2016 and 2017, respectively. The coefficients of variation also remained similar, 17.51% and 22.56%, close to the 19.77% and 23.65% presented in Table 1.
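The coefficient of variation used in this comparison is the standard deviation divided by the mean, expressed as a percentage. The sketch below assumes the sample standard deviation (n − 1 denominator), since the text does not specify which form was used; the values are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Coefficient of variation in percent: sample standard deviation / mean * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


print(coefficient_of_variation([1.0, 2.0, 3.0]))
```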
Although the results of this study confirm that the concentration of TSS can be predicted from remote sensing data with an ANN, this methodology is new and still under development, and it has limitations that should be considered.
For example, since each water body has its own hydraulic, physical, chemical, and biological characteristics, which are related to its surroundings and to the regional climate, the model proposed in this study was trained for the conditions of the study area. It is therefore necessary to develop regional models adapted to each area of interest. Other authors, such as Kong et al. [8] and Chen et al. [26], also point out the absence of a standard model for different regions.
Regarding temporal variation, the field samplings were carried out in March of both years; because of the seasonal variation of TSS (not considered in this study), a single model trained with data from only one season may not be sufficient to predict values throughout the year. Moreover, in addition to seasonal variation, other changes can occur in natural environments over the years. If the environmental characteristics change, we cannot currently assert that the trained neural network will remain able to predict TSS in the long term. Thus, TSS monitoring via remote sensing does not eliminate the need for periodic laboratory analyses. For instance, if the predictions exhibit unexpected behavior, such as an increasing trend, new TSS and spectral data can be collected to check whether the change in TSS is real or whether the neural network needs to be updated to current conditions.
Finally, although studies such as this serve as pilot studies for larger water bodies, adaptations are needed before the method can be applied to them: over lakes and dams, large homogeneous water surfaces make it difficult to generate orthophotos and other products with the Structure from Motion (SfM) technique. One way to minimize this problem would be to fly at higher altitudes, which facilitates the identification of homologous points in the images for orthophoto generation, at the cost of lower image resolution.
The limitations presented indicate that this research needs to be continued. Nevertheless, the results demonstrated in this article should encourage replications of this method in other water bodies so that more communities can benefit from them. This can be done through UAV overflights with RGB and NIR cameras, correct processing of the acquired images, reliable collection of water quality data, and the establishment of an ANN with suitable parameters for the variable of interest, which can be TSS, as in this study, or, for example, chlorophyll-a or organic matter.
The prediction of TSS in water bodies from images acquired using a UAV and processed via an ANN should benefit managers, professionals, and researchers linked to the management and control of water resources by presenting a method for the dynamic and spatial monitoring of water quality problems, such as the presence of suspended solids.
The use of UAVs in water quality mapping is a promising tool because it alleviates issues of conventional in situ monitoring, such as insufficient data and high time and money costs, and of modeling via orbital remote sensing, such as low spatial and temporal resolutions. By analyzing the response collected by the UAV sensor in the visible and near infrared regions, it was possible to model the concentration of optically active compounds, such as suspended solids, and to generate maps that allowed their temporal monitoring and spatial analysis at the study site.
We emphasize the applicability of artificial intelligence, through artificial neural networks, to the modeling of suspended solids in complex aquatic environments, where simpler analyses, such as the regression models presented in this study, may not be sufficient. Using an ANN instead of RA substantially improved the quality of the generated models, with R² rising from 0.20 (RA) to 0.75 (ANN).
However, although the model accurately predicted suspended solids concentrations consistent with the statistical features of the values observed in situ, its use is limited to the study area where the ANN was trained and calibrated, and adaptations are required for use in other environments.
The presented results are important for two main reasons. First, although regression methods have been used in remote sensing applications, they may not be adequate for capturing the linear and/or non-linear relationships of interest. Second, they show that the use of UAVs to map water bodies, together with neural networks to analyze the results, is a promising approach with the potential to assist in monitoring the quality of these environments. Thus, we intend to continue monitoring total suspended solids concentrations in Lake Unisinos by performing new UAV overflights of the area and applying the neural network to the newly collected data.
We also emphasize the need to continue the research in order to improve the generated model, as well as to consider the interference of other optically active compounds, such as chlorophyll and organic matter, in the spectral response of water, and consequently, in the neural network generated.