Surface Water Quality Assessment through Remote Sensing Based on the Box–Cox Transformation and Linear Regression

: A methodology to estimate surface water quality using remote sensing is presented based on Landsat satellite imagery and in situ measurements taken every six months at four separate sampling locations in a tropical reservoir from 2015 to 2019. The remote sensing methodology uses the Box–Cox transformation model to normalize data on three water quality parameters: total organic carbon (TOC), total dissolved solids (TDS), and chlorophyll a (Chl-a). After the Box–Cox transformation, a mathematical model was generated for every parameter using multiple linear regression to correlate normalized data and spectral reﬂectance from Landsat 8 imagery. Then, signiﬁcant testing was conducted to discard spectral bands that did not show a statistically signiﬁcant response ( α = 0.05) from the different water quality models. The r 2 values achieved for TOC, TDS, and Chl-a water quality models after the band discrimination process were found 0.926, 0.875, and 0.810, respectively, achieving a fair ﬁtting to real water quality data measurements. Finally, a comparison between estimated and measured water quality values not previously used for model development was carried out to validate these models. In this validation process, a good ﬁt of 98% and 93% was obtained for TDS and TOC, respectively, whereas an acceptable ﬁt of 81% was obtained for Chl-a. This study proposes an interesting alternative for ordered and standardized steps applied to generate mathematical models for the estimation of TOC, TDS, and Chl-a based on water quality parameters measured in the ﬁeld and using satellite images.


Introduction
Surface water quality monitoring is essential to assess the impacts of anthropogenic activities and natural phenomena [1,2], but it is labor-intensive, time-consuming, and costly [3].Several studies have demonstrated that using remote sensing for water quality evaluation has significant advantages for surface water quality monitoring [4][5][6][7].Remote sensing for water quality evaluation is based on measuring the radiance emerging from the water related to electromagnetic radiation that interacts with both suspended and dissolved matter through absorptive, refractive, and scattering mechanisms [8,9].Specific imagery bands are required to measure water quality parameters, but the remotely sensed reflectance may be influenced by external conditions, such as atmospheric and air-water interface effects, illumination conditions, or instrument characteristics [10].To avoid these interferences, some authors have suggested observing the scattering and absorption characteristics of optically active constituents (OACs) to obtain accurate inherent optical properties [11].However, identifying specific wavelengths for water quality estimation is a complex task because water constituents absorb and scatter light across the entire visible spectral range, which complicates their estimation from optical measurements [10].
The most employed methodologies to estimate surface water quality uses empirical approaches with multispectral sensors [12].Water quality modeling using remote sensing is often carried out using normalized difference indices and spectral band ratios.Normalization can remove brightness variations, reducing the influence of atmospheric, and air-water surface effects [13][14][15].Other water quality studies using remote sensing are based on water quality parameters and spectral reflectance multiple regression [16][17][18][19][20] consisting of obtaining correlations between water leaving radiance (Lw) and several optically active parameters such as chlorophyll-a, total suspended solids, and turbidity.Despite some studies having achieved satisfying results using broadband sensors, others report less accurate results because of the presence of suspended material in turbid and/or eutrophic water bodies [21].
Although significant advancements in mathematical models for surface water quality using reflectance values from satellite imagery are given, improving existing models using multiple linear regression to estimate water quality using Landsat 8 imagery remains an interesting pending research task.Recently, Sharaf El Din and Zhang [22] have proposed a regression-based technique to estimate surface water quality parameters using Landsat 8 OLI imagery.They propose a stepwise regression (SWR) to minimize the number of predictor variables and to maximize the precision of the water quality estimation.Highly accurate results were achieved when using the Landsat 8-based-SWR approach (r 2 > 85%).
The present study proposes a series of ordered and standardized steps to generate mathematical models from a multiple linear regression analysis.The multiple regression analysis was carried out between the reflectance values of Landsat 8 images and the normalized concentrations of water quality parameters.Normalizing water quality data was used to eliminate the effects of certain errors that may be still present after the dataset validation procedures, such as outliers, censored values, seasonality trends, or serial correlations, and that can affect the accuracy of the data, making it more consistent, reliable, and suitable for further processing and analysis.
Some studies have normalized the dataset, but they were not used for the estimation of water quality from satellite imagery.For instance, Feng [23] developed normalized water quality indexes using band combination, and Qi et al. [24] normalized reflectance data for water quality estimation.Asadollahfardi et al. [25] suggested using the Box-Cox transformation for the normalization of water quality data.
The methodology proposes the use of the Box-Cox transformation to normalize data on three water quality parameters: total organic carbon (TOC), total dissolved solids (TDS), and chlorophyll a (Chl-a).After the Box-Cox transformation, a mathematical model was generated for every parameter using multiple linear regression between normalized data and spectral reflectance from Landsat 8 imagery.Using the proposed methodology, a surface water quality assessment was carried out in the Adolfo López Mateos (ALM) reservoir in Culiacan, Mexico.These mathematical models could be considered a crucial tool for decision-making since they could be used to estimate water quality during periods when field monitoring is not conducted.

Study Area
The ALM reservoir (Culiacan, Mexico) is in the Humaya River basin (25°05′25″, 25°20′15″ North, 107°33′00″, 107°15′00″ West) at 186.5 m above the sea level (m.a.s.l.).The ALM dam is 105.5 m high, and 765 m long, considered one of the main sources of water for agricultural irrigation, power generation, fishing, and tourist activities [26][27][28], covering 11,354 ha and ranked tenth in Mexico according to its storage capacity [29] (Figure 1).The Humaya River basin is characterized by a warm humid climate toward the center and south, with summer rains.From the center toward the north, the climate is semi-warm sub-humid.The average annual temperature is 24.5 °C and the mean annual rainfall is 698.9 mm per year.The basin has a mountainous geography, with deep canyons, low mountains, highlands, and large plateaus with ravines.The basin elevation varies between 150 and 2300 m.a.s.l.[30].
The predominant vegetation is tropical deciduous forest, with small areas of pineoak and pine forests toward the northwestern part [26].Since high productivity is observed in the basin, a large proportion of the land is intended for agricultural and livestock activities [31].According to Sanhouse-Garcia et al. [30] and Monjardin et al. [32], the Humaya River basin is affected by both natural (fires and frequent frosts) and anthropogenic (deforestation) factors that cause continuous and rapid changes in land use and aquatic ecosystems.

Methodology
Figure 2 sketches the methodology for water quality parameters estimation using satellite imagery, which was carried out in three phases: (i) processing Landsat 8 sensor imagery and reflectance data extraction, (ii) obtaining water quality data, and (iii) developing mathematical models for water quality parameters.The Humaya River basin is characterized by a warm humid climate toward the center and south, with summer rains.From the center toward the north, the climate is semi-warm sub-humid.The average annual temperature is 24.5 • C and the mean annual rainfall is 698.9 mm per year.The basin has a mountainous geography, with deep canyons, low mountains, highlands, and large plateaus with ravines.The basin elevation varies between 150 and 2300 m.a.s.l.[30].
The predominant vegetation is tropical deciduous forest, with small areas of pine-oak and pine forests toward the northwestern part [26].Since high productivity is observed in the basin, a large proportion of the land is intended for agricultural and livestock activities [31].According to Sanhouse-Garcia et al. [30] and Monjardin et al. [32], the Humaya River basin is affected by both natural (fires and frequent frosts) and anthropogenic (deforestation) factors that cause continuous and rapid changes in land use and aquatic ecosystems.

Methodology
Figure 2 sketches the methodology for water quality parameters estimation using satellite imagery, which was carried out in three phases: (i) processing Landsat 8 sensor imagery and reflectance data extraction, (ii) obtaining water quality data, and (iii) developing mathematical models for water quality parameters.

Satellite Imagery Acquisition
Satellite imagery was obtained through the United States Geological Survey database [33] and coincided with the dates of water quality monitoring campaigns.GeoTiff level 1 (L1T) images from Landsat 8 were used.The L1T images are terrain-corrected; hence, these images already provide a radiometric and geodetic accuracy in a cartographic projection UTM (Universal Transversal of Mercator), referenced in WGS84 (Word Geodetic System 1984).The images correspond to Path 32, Row 43 of the Landsat 8 sensor, which covered the reservoir surface during the study period (January 2015 to June 2019).Table 1 shows the acquisition dates of images from Landsat 8. Landsat-8 imagery at level 1T was rescaled to the top of atmosphere (TOA) reflectance using radiometric rescaling coefficients [22,33].This radiometric rescaling was

Satellite Imagery Acquisition
Satellite imagery was obtained through the United States Geological Survey database [33] and coincided with the dates of water quality monitoring campaigns.GeoTiff level 1 (L1T) images from Landsat 8 were used.The L1T images are terrain-corrected; hence, these images already provide a radiometric and geodetic accuracy in a cartographic projection UTM (Universal Transversal of Mercator), referenced in WGS84 (Word Geodetic System 1984).The images correspond to Path 32, Row 43 of the Landsat 8 sensor, which covered the reservoir surface during the study period (January 2015 to June 2019).Table 1 shows the acquisition dates of images from Landsat 8. Landsat-8 imagery at level 1T was rescaled to the top of atmosphere (TOA) reflectance using radiometric rescaling coefficients [22,33].This radiometric rescaling was performed in QGIS software based on TOA reflectance (Equations ( 1) and ( 2)) and using the semiautomatic classification plug-in [34].
where ρ * is the TOA planetary reflectance, without correction for solar angle; M ρ is the band-specific multiplicative rescaling factor from the metadata; Q cal is the quantized and calibrated standard product pixel values (DN); and A ρ is the band-specific additive rescaling factor from the metadata.Since the reflectance obtained from the Landsat 8 data is not corrected for the solar zenith angle, the provided reflectance is generally too low and this error increases at high latitudes and in the cold season [22].Therefore, a TOA reflectance correction for the solar zenith angle was performed using Equation (2).
where ρ is the TOA planetary reflectance; θ SE is the local sun elevation angle: θ SZ local solar zenith angle; θ SZ = 90 • − θ SE .Atmospheric correction processes were carried out using dark object subtraction (DOS).The basic assumption of the DOS method is that within the image some pixels are in complete shadow and their radiances measured at the satellite are due to atmospheric scattering (path radiance), selecting the spectral-band haze values that are correlated to each other [35].This process was performed in QGIS by using Equations ( 3)-( 7) [22].
where ρ sur f ace is the surface reflectance; L λ is the spectral radiance at the sensor's aperture; L P is the path radiance due to atmospheric effects; d is the Earth-Sun distance in astronomical units; T V is the atmospheric transmittance in the viewing direction; E Sunλ is the mean solar radiation; T Z is the atmospheric transmittance in the illumination direction; E down is the downwelling diffuse irradiance; L λmin is the radiance values correspond to the minimum pixel values; L DO1% is the radiance of dark object; M L is the radiance band-specific multiplicative rescaling factor; DN min is the minimum pixel value; and A L is the radiance band-specific additive rescaling factor.

Reflectance Data Extraction
The study area was delimited from satellite imagery with a polygon mask using QGIS software and a semi-automatic extraction utility.Then, reflectance data of corrected bands were extracted using the QGIS point sampling tools [36].Landsat 8 has eleven bands, but in this study, the extraction process was carried out only for the bands B1, B2, B3, B4, B5, B6, and B7.Since the TOC, TDS, and Chl-a are known as optically active parameters and their spectral responses are mainly in visible and near-infrared domains, B9 (cirrus band), B10, and B11 (infrared thermal bands) were excluded [37,38].The B8 panchromatic band was also excluded from the extraction process since this band combines blue (B2), green (B3), and red (B4) bands with a greater spatial resolution and does not contain any additional wavelength-specific information.

Water Quality Monitoring
Water quality is monitored at the ALM reservoir every six months through sampling campaigns at four sampling sites by Mexico's National Water Commission (CONAGUA).CONAGUA is responsible for implementing processes to guarantee quality assurance/control (QA/QC).Hence, the sampling, transportation, and preservation of samples meet the appropriate Mexican standards, and the samples are analyzed in triplicate by an accredited laboratory, based on international standard methods for water analysis [39].Official water quality information from CONAGUA has been used to assess water quality by other studies [26,28,40].In this study, water quality data from 2015 to 2019 were used.To demonstrate the appropriateness of the proposed methodology, three water quality parameters were evaluated: total dissolved solids (TDS), chlorophyll (Chl-a), and total organic carbon (TOC).TOC, TDS, and Chl-a were measured based on APHA Methods 5310, 2510, and 10200H, respectively [39].Past studies have shown that these parameters respond to the energy spectrum changes of reflected solar radiation from waterbodies [41,42].

Box-Cox Transformation of Water Quality Parameters
The Box-Cox transformation is a statistical technique to stabilize the variance of a certain dataset and ensure normal distribution of deviations around the model [43].The main goal of data normalization is to adjust values to a common scale, achieving a standardized data format, which may facilitate comparison and analysis.
The Box-Cox transformation was only applied to the water quality parameters.For transformation, every data is raised to the λ 1 power after changing it to a certain amount λ 2 (often equal to 0).These transformations could be square roots, logarithms, reciprocals, and/or other common transformations (Table 2) [43].Hence, the Box-Cox transformation (Equation ( 3)) is defined as a continuous function that varies as a function of power (λ) [44].
where y' is the normalized water quality parameter, y is the originally measured water quality data, λ 1 and λ 2 are values that, when substituted in Equation ( 3), the standard deviation of y' will be zero.

Power Transformation Description
Inverse square root

Reciprocal
Note: * Note that as λ 1 → 0 , the power transformation approaches a logarithm.
The Statgraphics Centurion XVI software was used to perform the Box-Cox transformation of water quality data.Initially, the λ 1 values of 2, 1, 0.5, 0.33, 0, −0.5, and 1 shown in Table 2 were investigated to determine which, if any, is most suitable.The software was used to solve for the optimum value of λ 1 using maximum likelihood estimation.Once the Box-Cox transformation was performed, the normality of the data was evaluated using the Kolmogorov-Smirnov goodness of fit test.

Multiple Linear Regression
Multiple linear regression was carried out to correlate normalized water quality data and reflectance values from Landsat 8 imagery, where a fitted (or estimated) value is calculated using Equation (4) [45]: where y i is the estimated Box-Cox normalized value, x 1 , x 2 , . . ., x i are Landsat 8 imagery bands, b 0 is the intercept when all the predictors x 1 , x 2 , . . ., x i are all zero, b 1 , b 2 , . . ., b i are the linear regression coefficients obtained from the fitted values and ε is a random error corresponding to the n observations that are also assumed to be uncorrelated random variables [46].

Model Performance Evaluation
Two indicators were selected to evaluate model performance in estimating water quality: the coefficient of determination (r 2 ) and the root-mean-square error (RMSE).Equation ( 5) was used to estimate r 2 , which is a number between 0 and 1 that measures how well a model estimates an outcome [47]: where y i is the measured water quality parameter, y i is the average water quality parameter, ŷi is the estimated water quality values, and n is the number of available data.RMSE statistically assesses differences between values observed and estimated by the model, the higher the RMSE, the greater the difference between estimated and observed values.RMSE was calculated using Equation (6) [47,48]:

Multiple Linear Regression Significance Testing
The significance of individual regression coefficients in the multiple linear regression model was carried out using a t-test.This test measures the contribution of an independent variable while the remaining variables are still included in the model.For the model the significance of the variable x 1 is evaluated while controlling for the presence of the variables x 2 , . . ., x i (i.e., the model To determine whether x 1 , x 2 , . . ., x i variables are useful predictors in this model, the following null and alternative hypotheses were tested: To carry out this hypothesis test, a p-value was obtained for all coefficients in the model.Each p-value is based on a t-statistic calculated as: Water 2023, 15, 2606 where s bi is the standard error of the regression coefficient b i , calculated using Equation ( 8): The p-value is then compared with a significance level (α = 0.05).This critical value is typically set for hypothesis testing.

Water Quality Model Validation
A model validation procedure was performed comparing estimated and measured water quality values not previously used for model development.In this study, the data used for model validation were 25% of the total field data for the 2015-2017 period and the total field data for 2018, as shown in Figure 2. RMSE and r 2 were used to estimate the models' fitness to field water quality measurements.

Water Quality Mapping
Estimated water quality parameters were used to generate simplified models resulting from the significant testing and validation process.These models were represented spatially and temporally using GIS tools (raster calculator) and employing QGIS software for TOC, TDS, and Chl-a estimation.

Water Quality from Field Sampling
Data of water quality parameters (TOC, TDS, and Chl-a) measured in the field are summarized in Figure 3. Figure 3a shows that TOC has similar spatial distribution in the reservoir remaining within the 3.4 to 5.4 mg/L range, with a slight increase in 2017.These results agree with values reported by Zhou et al. [49], where TOC concentrations of around 2.5 mg/L were found in a northeast China reservoir.Few studies have been carried out on the organic matter in the study area.Gonzalez-Farias [50] (2006) reported that the mean concentration of particulate organic carbon in the Culiacan River was 1.73 mg C/L.TDS also showed slight spatial variation (Figure 3b) where 92.4,93.8, and 117.8 mg/L mean concentrations were observed in 2015, 2016, and 2017, respectively.These TDS concentrations were similar to those reported in other reservoirs located close to the ALM reservoir, such as Huites (124 mg/L), José Ortiz (137 mg/L), and Miguel Hidalgo (132 mg/L) reservoirs [51].Chl-a values in the ALM reservoir ranged from 0.1 to 6.1 mg/m 3 .These values are within the reported by Fregoso-López et al. [52] who found a maximum Chl-a concentration of 3.4 mg/m 3 and a minimum concentration of 0.3 mg/m 3 in the Miguel Hidalgo y Costilla reservoir located in El Fuerte, Sinaloa.A model validation procedure was performed comparing estimated and measured water quality values not previously used for model development.In this study, the data used for model validation were 25% of the total field data for the 2015-2017 period and the total field data for 2018, as shown in Figure 2. RMSE and r 2 were used to estimate the models' fitness to field water quality measurements.

Water Quality Mapping
Estimated water quality parameters were used to generate simplified models resulting from the significant testing and validation process.These models were represented spatially and temporally using GIS tools (raster calculator) and employing QGIS software for TOC, TDS, and Chl-a estimation.

Water Quality from Field Sampling
Data of water quality parameters (TOC, TDS, and Chl-a) measured in the field are summarized in Figure 3. Figure 3a shows that TOC has similar spatial distribution in the reservoir remaining within the 3.4 to 5.4 mg/L range, with a slight increase in 2017.These results agree with values reported by Zhou et al. [49], where TOC concentrations of around 2.5 mg/L were found in a northeast China reservoir.Few studies have been carried out on the organic matter in the study area.Gonzalez-Farias [50] (2006) reported that the mean concentration of particulate organic carbon in the Culiacan River was 1.73 mg C/L.TDS also showed slight spatial variation (Figure 3b) where 92.4,93.8, and 117.8 mg/L mean concentrations were observed in 2015, 2016, and 2017, respectively.These TDS concentrations were similar to those reported in other reservoirs located close to the ALM reservoir, such as Huites (124 mg/L), José Ortiz (137 mg/L), and Miguel Hidalgo (132 mg/L) reservoirs [51].Chl-a values in the ALM reservoir ranged from 0.1 to 6.1 mg/m 3 .These values are within the reported by Fregoso-López et al. [52] who found a maximum Chl-a concentration of 3.4 mg/m 3 and a minimum concentration of 0.3 mg/m 3 in the Miguel Hidalgo y Costilla reservoir located in El Fuerte, Sinaloa. Figure 3 also shows that Chl-a levels in 2015 and 2016 were higher than those observed in 2017.This behavior is contrary to the behavior observed for TOC and TDS, where the highest concentration was found in 2017 when Chl-a was significantly reduced.Normally, there should be a correlation between these parameters because an excess of Figure 3 also shows that Chl-a levels in 2015 and 2016 were higher than those observed in 2017.This behavior is contrary to the behavior observed for TOC and TDS, where the highest concentration was found in 2017 when Chl-a was significantly reduced.Normally, there should be a correlation between these parameters because an excess of nitrogen and phosphorus in the water can lead to an overgrowth of algae or phytoplankton, resulting in higher Chl-a levels and an increase in organic matter and dissolved solids [53].However, if the nutrients are depleted faster than they are being replenished, the algae eventually die and a decrease in Chl-a levels can be observed, and the decomposition of the excessive organic matter produced by the algae can increase TOC and TDS levels [54].

Box-Cox Transformation
Table 3 shows the resulting algorithms (a three-year normalized equation, from January 2015 to December 2017) obtained using the Box-Cox transformation for the water quality parameters studied.The mathematical models provided in Table 3 were rearranged for their later use to transform the estimated values to their original water quality units.The models showed a good fit, with r 2 values greater than 0.85 in all cases.Table 4 shows the results of the Kolmogorov-Smirnov goodness of fit test using the normalized water quality parameters.According to the Kolmogorov one-sample statistic (Dn) values and their respective p-values, the Box-Cox transformation was a good tool to normalize the water quality parameters.Note: p-value > 0.05 suggests that there is not sufficient evidence to conclude that the data is not normally distributed.

Multiple Linear Regression Modeling and Discriminant Analysis
Multiple linear regression was performed between normalized water quality parameters and band reflectance values extracted from satellite imagery, using B1, B2, B3, B4, B5, B6, and B7 Landsat 8 bands.In this method, values from the Box-Cox transformation were considered dependent variables whereas band reflectance values from imagery captured by the sensor were considered independent variables.Table 5 shows the multiple linear regression models for the water quality parameters of the ALM reservoir.The multiple regression models showed fair fitting with r 2 greater than 0.80, considered satisfactory compared with the results of other empirical models used to estimate water quality through remote sensing [55,56].
A discriminant analysis was then performed to reduce the number of bands used in each model.A Student's t-test was carried out to assess whether each band has a significant effect on the water quality variables.p-values greater than or equal to 0.05 were considered not significant [57].After these variables were eliminated, the hypothesis test was performed again using the simplified model, and the cycle was repeated until the model included only significant (p < 0.05) independent variables.Table 6 shows the band discrimination process for TOC.As shown, six iterations were used to eliminate non-significant bands without significantly altering the fitting of the model.Table 7 shows the different p-value for TOC model parameters in each iteration.The initial mathematical model without band elimination (iteration 1) is the same as presented in Table 3.The student's t-test showed that the highest p-value was observed for B4 (Table 7), so this band was eliminated, and another hypothesis test was then performed generating a simplified model (Table 7, iteration 2) with a similar fit to the original.The band elimination process from TOC was repeated until only bands with a p-value less than 0.05 were included (this value is the established significance level for hypothesis tests).Through this process, B4, B5, B7, B6, and B3 were discriminated against for the TOC model.Despite the many bands removed, the simplified model presented a similar r 2 value compared to the initial one.This same methodology was carried out for each of the water quality parameters considered in this study and the results of the simplified models are shown in Table 8.  Figure 4a shows the estimated and measured TDS values in the ALM reservoir.The final TDS model accuracy (r 2 = 0.875; RMSE = 3.2613) can be considered satisfactory showing a better fit compared to other studies [58].This could be attributed to the bands used for TDS estimation since low model accuracies have been reported in several studies that have only used B3, B4, and B5 bands (530 to 890 nm) of Landsat 8 [59,60].According to Zhao et al. [61] (2020), the B3-B5 wavelength range (530-890 nm) can be used to characterize whether the water body contains phytoplankton chlorophyll (560-590 nm), cyanobacteria (620 nm), phycocyanin (650 nm), algae chlorophyll (675 nm), and suspended inorganic matter (810 nm).However, in this study, the discriminant analysis demonstrated that TDS estimation should be carried out using the bands B1, B2, B3, B4, B5, and B6 of Landsat 8.The use of a wider wavelength range could explain the satisfactory fit obtained since higher dissolved content of inorganic and organic substances could be detected, such as the colored dissolved organic matter (CDOM) (420-555 nm).Our results agree with Maliki et al. [62], who successfully predicted the TDS of surface water in Bangladesh using Landsat 8 OLI and multiple linear models (r 2 = 0.95).
Chl-a is the most studied water quality parameter through remote sensing.Results for Chl-a showed a lower fit than TOC and TDS probably because of the normalization of water quality data.Several limitations were observed when Chl-a was normalized using the Box-Cox transformation, which generated the lowest r 2 value (see Table 3) in comparison with TOC and TDS, likely because Chl-a is a biological parameter showing exponential growth.In addition, Chl-a is more susceptible to seasonal variations related to physical, chemical, and climatic factors [63,64].Mohsen et al. [65] used a multi-linear regression technique for the estimation of Chla through remote sensing using Landsat 7 bands B1 and B3 at Lake Burullus, Egypt, obtaining r 2 = 0.86 (RMSE = 34.6).Bohn et al. [66] reported r 2 = 0.83 estimating Chl-a using Landsat 7 bands B3 and B4.The accuracy of these models was similar to the results of this study (r 2 = 0.81, RMSE = 3.1267) (Figure 4c).In the Chl-a model, some bands (such as B2 and B7) appear in the final model generated and do not appear in other studies, such as the one performed by Bohn et al. [66].This is because these studies estimate Chl-a by calculating predetermined indices such as the normalized difference vegetation index (NDVI), normalized area vegetation index (NAVI), enhanced vegetation index (EVI), and ratio vegetation index (RVI).
The results obtained in this study can be considered low compared to those reported by Tyler et al. [67], (r 2 = 0.95) for a linear mixture model used to estimate Chl-a in Lake Balaton, Hungary, using Landsat TM imagery.The accuracy of the water quality models can be improved by removing image interferences.For instance, in this study, the DOS atmospheric correction method was used which assumes that there are dark targets in the image, such as water and dense vegetation.But when the water body is turbid, such as the reservoir in this study, the reflection of water in the near-infrared band is close to 0, which leads to uncertainties of the atmospheric correction over water [68].Other atmospheric correction methods have been proven to be effective for turbid waters, such as AC-OLITE [69,70], ACIX-Aqua [71,72], iCOR [73], POLYMER [74], or MDM [68].Thus, the performance of these algorithms on the regression models should be investigated in depth in further studies.

Model Validation
The remaining 25% of field water quality data (randomly selected data not previously used for model development) during the 2015-2017 period and 100% of the data obtained from January 2018 to June 2019 were used to validate the simplified models, showing a fair fit between estimated and observed data.The estimated and observed water quality data for the model development and validation is shown in Table S1.A good fit of 93% and 81% was obtained for TOC and Chl-a, respectively (Figure 5b,c).TDS Chl-a is the most studied water quality parameter through remote sensing.Results for Chl-a showed a lower fit than TOC and TDS probably because of the normalization of water quality data.Several limitations were observed when Chl-a was normalized using the Box-Cox transformation, which generated the lowest r 2 value (see Table 3) in comparison with TOC and TDS, likely because Chl-a is a biological parameter showing exponential growth.In addition, Chl-a is more susceptible to seasonal variations related to physical, chemical, and climatic factors [63,64].
Mohsen et al. [65] used a multi-linear regression technique for the estimation of Chl-a through remote sensing using Landsat 7 bands B1 and B3 at Lake Burullus, Egypt, obtaining r 2 = 0.86 (RMSE = 34.6).Bohn et al. [66] reported r 2 = 0.83 estimating Chl-a using Landsat 7 bands B3 and B4.The accuracy of these models was similar to the results of this study (r 2 = 0.81, RMSE = 3.1267) (Figure 4c).In the Chl-a model, some bands (such as B2 and B7) appear in the final model generated and do not appear in other studies, such as the one performed by Bohn et al. [66].This is because these studies estimate Chl-a by calculating predetermined indices such as the normalized difference vegetation index (NDVI), normalized area vegetation index (NAVI), enhanced vegetation index (EVI), and ratio vegetation index (RVI).
The results obtained in this study can be considered low compared to those reported by Tyler et al. [67], (r 2 = 0.95) for a linear mixture model used to estimate Chl-a in Lake Balaton, Hungary, using Landsat TM imagery.The accuracy of the water quality models can be improved by removing image interferences.For instance, in this study, the DOS atmospheric correction method was used which assumes that there are dark targets in the image, such as water and dense vegetation.But when the water body is turbid, such as the reservoir in this study, the reflection of water in the near-infrared band is close to 0, which leads to uncertainties of the atmospheric correction over water [68].Other atmospheric correction methods have been proven to be effective for turbid waters, such as ACOLITE [69,70], ACIX-Aqua [71,72], iCOR [73], POLYMER [74], or MDM [68].Thus, the performance of these algorithms on the regression models should be investigated in depth in further studies.

Model Validation
The remaining 25% of field water quality data (randomly selected data not previously used for model development) during the 2015-2017 period and 100% of the data obtained from January 2018 to June 2019 were used to validate the simplified models, showing a fair fit between estimated and observed data.The estimated and observed water quality data for the model development and validation is shown in Table S1.A good fit of 93% and 81% was obtained for TOC and Chl-a, respectively (Figure 5b,c).TDS showed a good adjustment with 98% (Figure 5a).Therefore, this study suggests the high feasibility to develop mathematical models based on water quality parameters measured in the field and using satellite images.
Water 2023, 15, x FOR PEER REVIEW 13 of 19 showed a good adjustment with 98% (Figure 5a).Therefore, this study suggests the high feasibility to develop mathematical models based on water quality parameters measured in the field and using satellite images.

Spatial and Temporal Distribution of Water Quality Parameters from Optimized Models
Simplified models were used to evaluate the spatial and temporal distribution of water quality parameters (TOC, TDS, and Chl-a) in the study area (Figures 6 and 7). Figure 6 shows the temporal behavior of TOC, TDS, and Chl-a in the ALM reservoir.In this figure, a time series comparison between the observed and estimated water quality values is shown.Only two observations are shown per year because water quality monitoring was carried out semi-annually.These observations represent the mean value of the four sample sites in the reservoir.
The similarity between the values estimated using simplified models derived from remote sensing and field measurements was explained using the RMSE and coefficient of determination (r 2 ) indicators.A very low RMSE value was obtained when TOC observed and estimated concentrations were contrasted (Figure 6b).This figure also shows a fair estimation for TDS and Chl-a from the optimized models and satellite imagery (Figure 6a,c).The r 2 values obtained for TDS and Chl-a were higher than 0.81, which indicated a low variation between the observed and estimated water quality parameters.
This study estimated TOC, TDS, and Chl-a in the ALM reservoir on a bimonthly basis, despite the water quality information was available every six months.One of the main problems with empirical models is that they can generate unreliable results when applied at sites where they were not generated or on dates different from those used for their generation.The results demonstrated that the estimated water quality data agreed with the data observed in the ALM reservoir.These models were validated and suggest the feasibility of using Landsat imagery to estimate TDS, TOC, and Chl-a, which can be used as a decision-support tool for water quality management and policy analysis.
Figure 7 shows the spatial behavior of the TOC, TDS, and Chl-a through time (using the ISO 8601 date format YYYY-MM-DD).In this figure, a linear color gradient was used based on the RGB color model, where red and blue colors correspond to the highest and lowest concentrations of the water quality parameters depicted, respectively.Figure 7b shows that the TOC concentration in the ALM reservoir is higher during September and October, corresponding to the rainy season, associated with the entry of organic matter into the waterbody by runoff.Similarly, TOC concentrations of the nearshore area were higher than those within the ALM reservoir.According to this figure, the maximum TOC content occurred in the ALM upper reaches during May, when the water level in the reservoir is very low.Therefore, TOC behavior in the ALM reservoir is highly related to the biogeochemical processes of organic carbon.Hence, continuous monitoring of TOC using    TDS showed a slight increase during the rainy season as runoff incorporates minera salts into the reservoir (Figure 7a).Chl-a showed a slight increase in the first months o the year (Figure 7c) probably related to the lentic regime of the reservoir and the absence of rain.The spatial water quality variation observed in this study corresponds to wate    The similarity between the values estimated using simplified models derived from remote sensing and field measurements was explained using the RMSE and coefficient of determination (r 2 ) indicators.A very low RMSE value was obtained when TOC observed and estimated concentrations were contrasted (Figure 6b).This figure also shows a fair estimation for TDS and Chl-a from the optimized models and satellite imagery (Figure 6a,c).The r 2 values obtained for TDS and Chl-a were higher than 0.81, which indicated a low variation between the observed and estimated water quality parameters.
This study estimated TOC, TDS, and Chl-a in the ALM reservoir on a bimonthly basis, despite the water quality information was available every six months.One of the main problems with empirical models is that they can generate unreliable results when applied at sites where they were not generated or on dates different from those used for their generation.The results demonstrated that the estimated water quality data agreed with the data observed in the ALM reservoir.These models were validated and suggest the feasibility of using Landsat imagery to estimate TDS, TOC, and Chl-a, which can be used as a decision-support tool for water quality management and policy analysis.
Figure 7 shows the spatial behavior of the TOC, TDS, and Chl-a through time (using the ISO 8601 date format YYYY-MM-DD).In this figure, a linear color gradient was used based on the RGB color model, where red and blue colors correspond to the highest and lowest concentrations of the water quality parameters depicted, respectively.Figure 7b shows that the TOC concentration in the ALM reservoir is higher during September and October, corresponding to the rainy season, associated with the entry of organic matter into the waterbody by runoff.Similarly, TOC concentrations of the nearshore area were higher than those within the ALM reservoir.According to this figure, the maximum TOC content occurred in the ALM upper reaches during May, when the water level in the reservoir is very low.Therefore, TOC behavior in the ALM reservoir is highly related to the biogeochemical processes of organic carbon.Hence, continuous monitoring of TOC using remote sensing could provide a quantitative basis for the estimation of carbon dioxide emission and sediment accumulation.
TDS showed a slight increase during the rainy season as runoff incorporates mineral salts into the reservoir (Figure 7a).Chl-a showed a slight increase in the first months of the year (Figure 7c) probably related to the lentic regime of the reservoir and the absence of rain.The spatial water quality variation observed in this study corresponds to water characteristics observed in waterbodies located in tropical regions [26,27,40,75].

Conclusions
This study proposed a methodology to estimate water quality parameters using satellite images.We proposed a methodology for band selection, discrimination, and water quality modeling based on ordered and standardized steps.However, it is important to highlight that this methodology was only validated for TOC, TDS, and Chl-a in the ALM reservoir in Mexico.Further studies should be focused on obtaining data from other water bodies to verify whether the methodology could be generalized.
The Box-Cox normalization proved to be effective in normalizing field water quality data, which was then used to find an optimal relationship with reflectance data from satellite bands.The models proposed were found robust since high coefficient of determination (r 2 ) values were obtained for the different water quality parameters estimated at the different stages (model development, discrimination, and validation).The obtained models were then used to estimate water quality parameters during periods where field monitoring was not conducted, which represents a crucial tool for decision-making.
This study provides an economical and effective alternative to monitor the water quality of a large water body in a short time based on a standardized repetitive basis.The methodology provides the spatial and temporal behavior of surface water quality, which can be used for water resources management.In this sense, this tool could contribute to improving the monitoring frameworks in many developing countries in the world, which

Figure 2 .
Figure 2. Proposed methodology to estimate water quality using satellite imagery.

Figure 2 .
Figure 2. Proposed methodology to estimate water quality using satellite imagery.

Figure 3 .
Figure 3. Descriptive analysis of the spatial and temporal distribution of water quality parameters (a) TOC, (b) TDS, and (c) Chl-a in the studied reservoir.

Figure 3 .
Figure 3. Descriptive analysis of the spatial and temporal distribution of water quality parameters (a) TOC, (b) TDS, and (c) Chl-a in the studied reservoir.

Figure 4 .
Figure 4.Estimated and observed values for (a) TOC, (b) TDS, and (c) Chl-a using the simplified models.

Figure 4 .
Figure 4.Estimated and observed values for (a) TOC, (b) TDS, and (c) Chl-a using the simplified models.

Figure 5 .
Figure 5. Validation of TDS (a), TOC (b), and Chl-a (c) with randomly selected data not previously used for model development.

Figure 5 .
Figure 5. Validation of TDS (a), TOC (b), and Chl-a (c) with randomly selected data not previously used for model development.

3. 5 .
Spatial and Temporal Distribution of Water Quality Parameters from Optimized ModelsSimplified models were used to evaluate the spatial and temporal distribution of water quality parameters (TOC, TDS, and Chl-a) in the study area (Figures6 and 7).
Figure 6 shows the temporal behavior of TOC, TDS, and Chl-a in the ALM reservoir.In this figure, a time series comparison between the observed and estimated water quality values is shown.Only two observations are shown per year because water quality monitoring was carried out semi-annually.These observations represent the mean value of the four sample sites in the reservoir.Water 2023, 15, x FOR PEER REVIEW 14 of 19 remote sensing could provide a quantitative basis for the estimation of carbon dioxide emission and sediment accumulation.

Figure 6 .
Figure 6.Temporal estimation of the estimated and observed values TOC (a), TDS (b), and Chl-a (c).

Figure 6 .
Figure 6.Temporal estimation of the estimated and observed values TOC (a), TDS (b), and Chl-a (c).

Figure 7 .
Figure 7. Maps generated from the simplified model for the estimation of water quality parameters.Figure 7. Maps generated from the simplified model for the estimation of water quality parameters.

Figure 7 .
Figure 7. Maps generated from the simplified model for the estimation of water quality parameters.Figure 7. Maps generated from the simplified model for the estimation of water quality parameters.

Table 1 .
Dates of acquisition of satellite images.

Table 1 .
Dates of acquisition of satellite images.

Table 3 .
Mathematical models and r 2 values obtained from the Box-Cox transformation.

Table 4 .
Kolmogorov-Smirnov test for the normalized water quality parameters.

Table 5 .
Multiple linear regression models for the water quality parameters of the ALM reservoir based on the reflectance values of the Landsat 8 satellite images.

Table 7 .
Statistical analysis for the discrimination of variables (bands).

Table 8 .
Final models after variable discrimination for the different parameters tested.