Assessment of Water Quality Parameters Using Temporal Remote Sensing Spectral Reflectance in Arid Environments, Saudi Arabia

Remote sensing applications are essential to water resources management for watershed characterization, particularly when mega basins are under investigation. Water quality parameters support decisions about the further use of water based on its quality. In the current study, chlorophyll a concentration, nitrate concentration, and water turbidity were used to estimate water quality in the dam lake of Wadi Baysh, Saudi Arabia. In situ water quality parameters were collected daily over 2 years (2017–2018) from the water treatment station located within the dam vicinity and were tested against the corresponding remotely sensed water quality parameters. Remote sensing data were acquired from the Sentinel-2 sensor of the European Space Agency (ESA) at the satellite's temporal resolution. The data were pre-processed and then processed to estimate the maximum chlorophyll index (MCI), the green normalized difference vegetation index (GNDVI), and the normalized difference turbidity index (NDTI). Zonal statistics were used to improve the regression analysis between the spatial data estimated from the remote sensing images and the nonspatial data collected from the water treatment plant. Results showed varying correlation coefficients between the ground truth data and the corresponding indices derived from remote sensing data. Actual chlorophyll a concentration correlated strongly with the estimated mean MCI values (R² = 0.96), actual nitrate concentration with the estimated mean GNDVI values (R² = 0.94), and actual water turbidity with the estimated mean NDTI values (R² = 0.94). The findings support the use of Sentinel-2 data to estimate water quality parameters in arid environments.


Introduction
Many changes take place in a water body when flowing water comes to rest at the lowest elevation point on land. Flowing water carries heat from one location to the next, so every drainage channel that recharges a natural reservoir or an artificial dam affects the water at its destination [1]. Moreover, thermal changes in the water body are associated with chemical changes that influence the life cycles of organisms in the lagoon. Again, small streams and other channels that end at

Study Area Description
Baysh Dam is a gravity dam on Wadi Baysh, approximately 35 km upstream (east) of Baysh in the Jizan Region of southwestern Saudi Arabia (Figure 1). The dam serves several purposes, including flood control, irrigation, and groundwater recharge. It was constructed between 2003 and 2009 and is owned and operated by the Ministry of Water and Electricity. The dam is 120 m high from its foundation level, and no sunlight reaches the bottom of the dam lake. The total reservoir capacity is 192 million cubic meters. Such a reservoir normally requires a long time to clear the effects of organic material: sinking organic matter consumes oxygen in the water, and undesirable gases such as carbon dioxide and methane are then released into the dam water. This process can take a decade or so, and in the tropics it may take many decades or even centuries for most of the organic matter to decompose.

Water Sample Collection
Water samples were collected regularly in the middle of each climatic season. From 2017 to 2018, following the complete random sampling (CRS) technique [12], a total of 120 water samples were collected and transferred to the laboratory for the designated colorimetric test [13], nitrate concentration (mg/L) test [14], and turbidity (NTU) test [15]. For the restricted part of the lake, water quality parameters were obtained from the water treatment plant located in the vicinity of the dam lake.

Remote Sensing Data
Routine collection of Sentinel-2 data began in January 2017 and continued until the end of December 2018 at a 16-day interval, yielding a total of 52 images. The Sentinel-2 MultiSpectral Instrument acquires 13 spectral bands: the visible bands (VI) and a near-infrared band at 10 m resolution, the vegetation red-edge (VRE) and short-wave infrared (SWIR) bands at 20 m resolution, and three atmospheric bands (coastal aerosol, water vapor, and cirrus) at 60 m resolution. Three remotely sensed indices were derived to represent three water quality parameters: the maximum chlorophyll index (MCI), the green normalized difference vegetation index (GNDVI), and the normalized difference turbidity index (NDTI).

Maximum Chlorophyll Index
The maximum chlorophyll index (MCI) measures the height of the reflectance in band B5 (705 nm) above a linear baseline drawn between bands B4 (665 nm) and B6 (740 nm) [16,17]. The MCI for floating vegetation and inland water bodies is estimated using the algorithm of Matthews et al. [18], considering top-of-atmosphere (TOA) conditions:

$$\mathrm{MCI} = R_{rs}(705) - R_{rs}(665) - \left[R_{rs}(740) - R_{rs}(665)\right]\frac{705 - 665}{740 - 665}$$

where $R_{rs}$ can be obtained from field measurements as follows [19]:

$$R_{rs}(\lambda) = \frac{L_w(\lambda) - \rho\,L_{sky}(\lambda)}{\pi\,L_p(\lambda)/R_p}$$

where $R_p$ is the reflectance of the standard reflectance panel, $L_w(\lambda)$ is the water-viewing radiance, $L_{sky}(\lambda)$ is the sky-measured radiance, $\rho$ is the air-water interface reflectance, and $L_p(\lambda)$ is the radiance measured from the reference panel.
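The MCI baseline-height calculation (B5 at 705 nm above a straight line between B4 at 665 nm and B6 at 740 nm) and the panel-based $R_{rs}$ conversion can be sketched per pixel in Python. This is a minimal illustration under stated assumptions, not the processing chain used in the study: inputs are plain floats for one pixel, the function names are hypothetical, and the air-water interface reflectance ρ must be supplied by the user because no value is given here.

```python
import math


def mci(r665, r705, r740, w4=665.0, w5=705.0, w6=740.0):
    """Maximum chlorophyll index: height of the B5 value above the B4-B6 baseline.

    r665, r705, r740 are reflectances in Sentinel-2 bands B4, B5 and B6;
    w4, w5, w6 are the band centre wavelengths (nm) used to place the baseline.
    """
    # Linear interpolation of the B4-B6 baseline at the B5 wavelength
    baseline = r665 + (r740 - r665) * (w5 - w4) / (w6 - w4)
    return r705 - baseline


def remote_sensing_reflectance(l_w, l_sky, l_p, r_p, rho):
    """Above-water R_rs from radiometer readings over water, sky and a reference panel.

    l_w: water-viewing radiance, l_sky: sky radiance, l_p: panel radiance,
    r_p: panel reflectance, rho: air-water interface reflectance (user-supplied).
    """
    # Downwelling irradiance inferred from a Lambertian reference panel
    e_d = math.pi * l_p / r_p
    # Remove skylight reflected at the water surface, then normalise by irradiance
    return (l_w - rho * l_sky) / e_d
```

For example, `mci(0.02, 0.05, 0.02)` returns about 0.03, because the baseline is flat at 0.02 and the B5 value sits 0.03 above it; a B5 value lying exactly on the baseline gives an MCI of zero.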

Green Normalized Difference Vegetation Index
The green normalized difference vegetation index (GNDVI) derives from two-band normalized-difference combinations originally built around the red-edge region of the spectrum [20]. GNDVI is very sensitive to changes in chlorophyll content, which is closely related to the nitrogen content of the dam lake. The nitrogen index [21] applied the normalized difference vegetation index form to detect nitrogen content using the near-infrared (NIR) band and the red-edge reflectance at 690–710 nm:

$$\frac{\mathrm{NIR} - R_{690\text{--}710\,\mathrm{nm}}}{\mathrm{NIR} + R_{690\text{--}710\,\mathrm{nm}}}$$

Subsequent studies found that using the green band to detect nitrogen content was more effective than the standard vegetation index. Therefore, GNDVI is computed as follows [22]:

$$\mathrm{GNDVI} = \frac{\mathrm{NIR} - G_{540\text{--}570\,\mathrm{nm}}}{\mathrm{NIR} + G_{540\text{--}570\,\mathrm{nm}}}$$

The wavelength used in this equation was shifted to the green region in order to obtain clearer results from satellite images [22]. Here, NIR is the near-infrared band of Sentinel-2 and $G_{540\text{--}570\,\mathrm{nm}}$ is the green reflectance (Sentinel-2 band B3, centred near 560 nm).
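GNDVI is a normalized difference of the NIR and green reflectances, so it can be sketched per pixel in a few lines of Python. This is a minimal illustration, not the study's processing chain: the function names are hypothetical, and the inputs are assumed to be reflectances already extracted for the NIR band and the green band (Sentinel-2 B3).

```python
def normalized_difference(a, b):
    """Generic normalized difference (a - b) / (a + b), in [-1, 1] for non-negative inputs."""
    total = a + b
    if total == 0:
        raise ValueError("a + b is zero; the index is undefined for this pixel")
    return (a - b) / total


def gndvi(nir, green):
    """Green NDVI: NIR reflectance contrasted with green reflectance (540-570 nm)."""
    return normalized_difference(nir, green)


# Applying the index to a few hypothetical pixel values
nir_pixels = [0.30, 0.12, 0.05]
green_pixels = [0.10, 0.08, 0.05]
gndvi_pixels = [gndvi(n, g) for n, g in zip(nir_pixels, green_pixels)]
```

Raising an error for a zero denominator, rather than silently returning a value, makes masked or no-data pixels visible instead of letting them bias later zonal statistics.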

Normalized Difference Turbidity Index
Lacaux et al. [23] developed an algorithm to estimate water turbidity from remote sensing data, specifically for ponds and inland waters, as follows:

$$\mathrm{NDTI} = \frac{R - G}{R + G}$$

where R is the red band of Sentinel-2 and G is the green band of Sentinel-2.
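The NDTI of Lacaux et al. is a normalized difference of the red and green reflectances. A minimal per-pixel Python sketch follows; the function name and the sample pixel values are hypothetical, and the inputs are assumed to be Sentinel-2 red (B4) and green (B3) reflectances for the same pixel.

```python
def ndti(red, green):
    """Normalized difference turbidity index: (R - G) / (R + G)."""
    total = red + green
    if total == 0:
        raise ValueError("red + green is zero; NDTI is undefined for this pixel")
    return (red - green) / total


# Hypothetical pixels: a clearer pixel (green > red) and a more turbid one (red > green)
clear_pixel = ndti(red=0.03, green=0.06)
turbid_pixel = ndti(red=0.08, green=0.06)
```

Suspended sediment generally raises red reflectance relative to green, so higher NDTI values are usually read as more turbid water; in this sketch `clear_pixel` is negative and `turbid_pixel` is positive.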

Data Normalization and Regression Analysis
To establish a regression analysis between the actual and estimated water quality parameters, data normalization is an essential step to remove the units from the two datasets. Data normalization was performed following [24]. The basic equation for Pearson's correlation is defined as follows [25]:

$$r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^{2} - \left(\sum x\right)^{2}\right]\left[N\sum y^{2} - \left(\sum y\right)^{2}\right]}}$$

where N is the number of pairs of scores, ∑xy is the sum of the products of paired scores, ∑x is the sum of x scores, ∑y is the sum of y scores, ∑x² is the sum of squared x scores, and ∑y² is the sum of squared y scores. The regression analyses are intended to examine the relationships between the actual and the remotely sensed estimated water quality parameters. Therefore, the actual parameters are plotted against the estimated parameters, and root mean square error (RMSE) values are used to identify the best fit. RMSE is obtained as follows [26]:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(P_i - O_i\right)^{2}}$$

where $P_i$ is the predicted value, $O_i$ is the observed value, and N is the number of paired values.
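Pearson's r, computed from the raw sums N, ∑xy, ∑x, ∑y, ∑x² and ∑y² defined in this section, and the RMSE between predicted values P and observed values O can be sketched in plain Python. The function names are hypothetical; in practice a library routine such as `scipy.stats.pearsonr` would typically be used instead of hand-written sums.

```python
import math


def pearson_r(x, y):
    """Pearson's r from the raw sums N, sum(xy), sum(x), sum(y), sum(x^2), sum(y^2)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def rmse(predicted, observed):
    """Root mean square error between paired predicted (P) and observed (O) values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be paired and non-empty")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))
```

Because r is unchanged by positive linear rescaling of either variable, normalizing the two datasets affects RMSE but not the Pearson coefficient.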
Zonal statistics were computed in the ArcGIS environment, yielding four statistic types for the input raster (mean, P90 "majority", maximum, and minimum values), which were used in the regression analysis to identify the best fit.
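A minimal stand-in for the zonal statistics step is sketched below, assuming the index raster and a lake mask are available as equally shaped lists of rows. The function names are hypothetical and this is not the ArcGIS tool itself: P90 is computed here as the 90th percentile by linear interpolation between sorted values, which is one of several percentile conventions, and no modal ("majority") value is computed.

```python
import math


def percentile(values, q):
    """q-th percentile (0-100) by linear interpolation between sorted values."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = math.floor(pos)
    upper = math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def zonal_stats(raster, zone_mask, nodata=None):
    """Mean, P90, maximum and minimum of raster cells inside a zone.

    raster and zone_mask are equally shaped lists of rows; cells where the mask
    is truthy (and the value is not nodata) belong to the zone.
    """
    values = [
        value
        for raster_row, mask_row in zip(raster, zone_mask)
        for value, inside in zip(raster_row, mask_row)
        if inside and value != nodata
    ]
    if not values:
        raise ValueError("zone contains no valid cells")
    return {
        "mean": sum(values) / len(values),
        "p90": percentile(values, 90),
        "max": max(values),
        "min": min(values),
    }
```

Applied once per image date, this yields four candidate time series per index, one for each statistic type, that can each be regressed against the in situ measurements.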

Results and Discussion
Multiple empirical regression analyses were performed to evaluate the relationships between the actual water quality parameter concentrations collected in situ and the corresponding water quality parameters estimated from reflectance values in the remote sensing data.
Because the dam authority measures water quality parameters daily for routine analysis, there was no time difference between in situ sampling and remote sensing data acquisition.
Statistical analyses, including calculations of the average, maximum, and minimum values and linear and nonlinear regressions, were performed. Pearson correlation analysis was used to investigate the strength of the association between the two variables through the correlation coefficient (r). Associations were reported as significant (p < 0.05) or not significant (p > 0.05) using a t-test, which provides evidence of an association between the two variables.
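The t-test for a Pearson coefficient from N paired observations uses t = r·√(N − 2)/√(1 − r²) with N − 2 degrees of freedom. The sketch below computes only the statistic; the p-value would come from a Student's t table or a statistics library (for example, `scipy.stats.pearsonr` returns r together with a two-sided p-value). The function name is hypothetical.

```python
import math


def pearson_t_statistic(r, n):
    """t statistic for H0: rho = 0, given Pearson r from n paired observations (df = n - 2)."""
    if n < 3:
        raise ValueError("need at least three pairs")
    if abs(r) >= 1:
        raise ValueError("t is undefined for |r| = 1")
    return r * math.sqrt((n - 2) / (1 - r * r))
```

As a worked example, r = 0.6 with N = 10 gives t ≈ 2.12 on 8 degrees of freedom, below the two-tailed 5% critical value of about 2.306, so that association would be reported as not significant (p > 0.05) despite a moderate r.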
Statistical analyses were performed using the mean values of the in situ measurements against the mean, P90 (majority), maximum, and minimum values of the remote sensing data to evaluate the consistency of the linear and nonlinear regressions. The strength of association between the variables was then examined with Pearson correlation, with p < 0.05 indicating a significant association and p > 0.05 no significant association. Figures 2–4 show the linear regression analyses of MCI, GNDVI, and NDTI, respectively. The regression results showed that mean pixel values gave the most coherent association between the actual and the remotely sensed water quality parameters for each index investigated (MCI, GNDVI, and NDTI). The RMSE values in Table 1 confirm the strong agreement between the mean in situ water quality measurements and the values derived from remote sensing data, based on the summary of the fit analysis [27,28]. Multivariate analysis shows a strong correlation between the mean in situ readings and the mean image values of the estimated MCI, GNDVI, and NDTI (Figures 5–7). Table 2 gives the correlation coefficients of the water quality parameters under investigation. The minimum and maximum pixel values did not show a strong association with the in situ measurements.
This weak relationship arises because the minimum and maximum pixel values were treated as anomalies within the analysis range [29].
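Comparing the mean, P90, maximum, and minimum zonal statistics amounts to fitting each candidate series against the in situ means and keeping the best fit. A minimal ordinary least-squares sketch with R² follows; the function names are hypothetical, and the candidate series names in the usage note are illustrative only. For a simple linear fit, R² equals the square of Pearson's r.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = intercept + slope * x, with its R^2."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination from residual and total sums of squares
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot


def best_statistic(in_situ, candidates):
    """Name of the candidate index series with the highest R^2 against in_situ."""
    return max(candidates, key=lambda name: linear_fit(candidates[name], in_situ)[2])
```

Here `candidates` would map names such as "mean" or "p90" to per-date zonal statistic series aligned with the in situ means, for example `best_statistic(in_situ_means, {"mean": mean_series, "max": max_series})`.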
The in situ surface water samples collected under different climatic conditions did not differ significantly from the subsurface measurements taken by the dam authority, which may indicate that the investigated water quality parameters had reached saturation levels [30]. Consequently, the water is classified as low quality and cannot be used safely or directly [31].
The temporal analysis of the estimated remotely sensed indices supported regression stability [7], given the robust linear agreement between the actual and estimated water quality parameters examined in this study.
While Van Wagtendonk et al. [32] failed to establish a strong association using Landsat data, the red-edge bands of Sentinel-2 have proved reliable for estimating water quality parameters in inland water [33].
According to Gitelson et al. [34], Odermatt et al. [35], and Vesali et al. [36], chlorophyll concentration under turbid water conditions was satisfactorily estimated using several models based on accurate spectral measurements. However, developing complex models appears to be difficult because of the broad bandwidths of Operational Land Imager (OLI) data, which calls for a different method applicable to OLI data.
Indeed, complex models for chlorophyll estimation based on Landsat OLI data are hard to develop because of the broad OLI bandwidths in the near-infrared and thermal infrared regions [37]. Previously, Walthall et al. [38] and Adam et al. [39] reported weak associations between middle-infrared reflectance and chlorophyll concentrations in water using Landsat ETM+ data, which was explained by the limited penetration of Landsat ETM+ bands into deep water [40,41].
Estimation of N concentration in agricultural crops and its spatial distribution has been the goal of several scholarly works because of its importance to soil fertilization [42,43]. However, estimation of dissolved N is confined to a limited number of Earth observation sensors equipped with red-edge bands [44].
Regression analysis between the actual N concentration and the estimated index was performed on four zonal statistic types. The mean and P90 values were the most consistent with the actual data, with correlation coefficients of 0.9699 and 0.9596, respectively, while the maximum and minimum values were less representative of the actual N concentration. An increase in dissolved N in the dam lake is a sign of pollution pressure on water quality at the catchment scale [45,46]. However, the local authority monitors N concentration only at the outflow, disregarding the spatial distribution of the pollutant, which can be assessed using remote sensing data [47,48].
Water turbidity, as a sign of sedimentation processes, was initially studied by Carpenter [49] using Landsat Thematic Mapper data. The algorithm was subsequently adapted to the changed central wavelengths of more recent sensors, as Doxaran [50] reported using Satellite Probatoire de l'Observation de la Terre (SPOT) images. The pair of bands at 645 and 850 nm proved effective for detecting suspended particulate matter and thus turbidity, especially in inland water [51]. Similarly, the central wavelengths of the Sentinel-2 bands in the near-infrared region cover the bands designated for suspended particulate detection, making the sensor capable of estimating water turbidity precisely [52,53].

Conclusions
Routine monitoring of water quality parameters is costly and requires constant laboratory supplies and effort. The implemented methodologies and comprehensive assessments answered the questions concerning the feasibility of using a linear empirical approach to estimate the designated water quality parameters from temporal remote sensing data. Chlorophyll, nitrogen content, and water turbidity were successfully estimated using remote sensing data acquired from Sentinel-2. The red-edge bands are the key feature of the Sentinel-2 sensor for reliably estimating the addressed water quality parameters. Moreover, the mean values of the raster data showed a high correlation with the actual data from the laboratory analyses. Therefore, a consistent empirical model could be determined.