A GIS Methodology to Determine the Critical Regions for Mitigating Eutrophication in Large Territories: The Case of Jalisco, Mexico

: Inadequate management practices for solid waste and wastewater are some of the main causes of eutrophication globally, especially in regions where intensive livestock, agricultural, and industrial activities are coupled with inexistent or ineffective waste and wastewater treatment infrastructure. In this study, a methodological approach is presented to spatially assess the trophic state of large territories based on public water quality databases. The trophic state index (TSI) includes total nitrogen, total phosphorus, chlorophyll A, chemical oxygen demand, and Secchi disk depth values as water quality indicators. A geographical information system (GIS) was used to manage the spatiotemporal attributes of the water quality data, in addition to spatially displaying the results of TSI calculations. As a case study, this methodological approach was applied to determine the critical regions for mitigating eutrophication in the state of Jalisco, Mexico. Although a decreasing trend was observed for the TSI values over time for most subbasins (2012–2019), a tendency for extreme hypereutrophication was observed in some regions, such as the Guadalajara metropolitan area and the Altos region, which are of high economic relevance at the state level. A correlation analysis was performed between the TSI parameters and rainfall measurements for all subbasins under analysis, which suggested a tendency for nutrient wash-off during the rainy seasons for most subbasins; however, further research is needed to quantify the real impacts of rainfall by including other variables such as elevation and slope. The relationships between the water quality indicators and land cover were also explored. The GIS methodology proposed in this study can be used to spatially assess the trophic state of large regions over time, taking advantage of available water quality databases. This will enable the efﬁcient development and implementation of public policies to assess and mitigate the eutrophication of water sources, as well as the efﬁcient allocation of resources for critical regions. Further studies should focus on applying integrated approaches combining on-site monitoring data, remote sensing data, and machine learning algorithms to spatially evaluate the trophic state of territories.


Introduction
Global concerns about eutrophication have been rising in the past decades [1]. Eutrophication is defined as an accelerated and abnormal growth of autotrophic organisms, such as plants and algae, in a defined aquatic ecosystem due to nutrient enrichment, mainly phosphorus and nitrogen. Phosphorus can be loaded as particulate or as dissolved matter in the form of pentavalent phosphorus structures, which are then converted to orthophosphates and eventually consumed by autotrophic organisms [2]. Nitrogen is discharged into water bodies in the form of organic and inorganic compounds, depending on natural and anthropogenic factors surrounding the area. A positive correlation has been demonstrated between inorganic nitrogen (nitrates, nitrites, and ammonia) and orthophosphates on one hand and eutrophic levels on the other [3]. As a result of microalgae and cyanobacteria blooms caused by unbalanced nutrient enrichment, dissolved oxygen consumption increases, creating anoxic conditions [1]. This in turn can lead to ecological imbalances in eutrophic bodies of water, which can harm the local wildlife [4,5] by depleting the dissolved oxygen that endemic species need to survive.
To address eutrophication, several models have been developed to better understand the problem and to develop local solutions to ensure an adequate management of surface water sources [6][7][8][9]. Trophic state indices (TSI) were created as a means to quantify, report, rank, and communicate the trophic nature of bodies of water [10]. TSIs can be useful to assess the trophic state either at specific monitoring sites or complete lakes and streams based on a scale from 0 to 100 with discrete steps of ten; however, the variability of the environmental context around the globe has led to the creation of multiple TSI methodologies, where authors choose mathematical models, ranges, and parameters that best suit their research objectives and their study area [11].
Total nitrogen (TN) and total phosphorus (TP) are water quality parameters that play a crucial role in the eutrophication in continental waters and must be included in TSI models. In Mexico, the Federal Rights Law established maximum limits for water pollutants to preserve aquatic life. For freshwater ecosystems, a maximum TN limit is not specified; however, the ammoniacal nitrogen maximum concentration is set at 0.060 mg L −1 and TP must be under 0.050 mg L −1 [12]. In the United States, the Clean Water Act entitles every state to set their own internal limits for pollutants; however, the US Environmental Protection Agency overviews them and has the authority to set limits if the state fails to do so in accordance with the Clean Water Act's objectives [13]. For example, the state of California has published a set of values and ranges for TN and TP, depending on the region and hydrologic unit. These values vary widely, ranging from 0.1 to 10 mg mL −1 of TN and from 0.005 to 0.300 mg mL −1 , respectively [14]. The concentration of chlorophyll A (Chl-A) has been widely used as an effective parameter to estimate phytoplankton biomass due to the strong correlation between them. On the other hand, Secchi disk depth (SDD) is a measure of turbidity, which is also correlated with the trophic state of a body of water, because increased phytoplankton concentrations reduce the ability of light to penetrate deeper into the water. Consequently, Chl-A and SDD are commonly included in TSI models [7,15]. Furthermore, these parameters along with nutrient concentrations (TN and TP) have been combined in more robust models for trophic state evaluation [6,16,17]. Researchers rely on their experience or the available information to assess the TSI of a specific body of water. TSI models such as the one proposed by Carlson [7] only considered phosphorus and Chl-A concentrations to evaluate trophic states. These models, however, work under the assumption that eutrophic growth is only affected by TP concentration. On the other hand, Xu et al. [6] proposed a more complex model considering the TN concentration as well as chemical oxygen demand (COD) and SDD as parameters to assess TSI, assuming that water sources could be both phosphorus-and nitrogen-limited.
In Mexico, cultural eutrophication is a multifactorial problem that is spread throughout the country. Waste discharges from livestock, as well as agricultural and industrial activities, are the main sources of nutrients entering aquatic ecosystems [18]. In this study, the state of Jalisco was chosen to test the methodological approach. Jalisco is the main livestock producer in Mexico, with one of the strongest economies, where eutrophication has been reported as a consequence of poor management of livestock farming and of agricultural waste [8,[19][20][21][22][23][24][25]. According to Ramírez and Rodríguez [26], the livestock farming industry is still growing in the region, which is driving a shift in land use from natural ecosystems to animal production surfaces. As an example, studies on the Verde River have shown that nutrient loads received in the Verde River basin come mainly from the livestock industry and wastewater discharges located close to the mouth of the river basin [20]. To assess the trophic state of water sources in Jalisco, Membrillo-Abad et al. [8] used multispectral satellite images to estimate Chl-A and SDD in Chapala Lake. On the other hand, Jayme-Torres and Hansen [20] only considered TN and TP when assessing the trophic state in the Verde River Basin. Other authors have evaluated only individual water quality parameters in water sources in Jalisco, some of which are related to eutrophication [27]. The inconsistency hinders not only the comparison of these water sources regarding their trophic status, but also the creation of efficient strategies to mitigate the effects of eutrophication in a prioritized manner.
Worldwide, new strategies involving remote sensing and machine learning have been coupled with geographical information system (GIS) to monitor eutrophication in real time. Ho et al. [28] conducted a global study on phytoplankton blooms using satellite images of three decades. Hu et al. [29] developed a methodological approach to remotely monitor the TSI of shallow lakes in a subtropical region (China) using a GIS to process Landsat-8 images with 30 m pixel resolution. This approach proved to overcome the high costs associated with monitoring the TSI parameters in large regions due to laboratory and in situ water quality analyses. Nevertheless, approaches relying on satellite imagery may have several drawbacks when implemented for other regions. Firstly, the 30 m resolution of the Landsat-8 images was proven to be well suited for the assessment of shallow lakes, but not necessarily for other superficial water sources, such as rivers. Additionally, climatological factors, such as precipitation and cloudiness, may complicate gathering useful data during the rainy season, and consequently the seasonal variations of the TSI may be difficult to assess. Finally, the model was validated by comparing it with sampling data made by the TSI and measured using Carlson's model, which might not be well suited for every region because of the environmental assumptions it is based on. While this methodological approach was proven to be useful to estimate a TSI in a large subtropical region, it cannot be generalized for other regions until those limitations are overcome. Another important effort was made by Zhu and Mao [30] to estimate a five-parameter TSI (TN, TP, COD, Chl-A, and SDD) using machine learning algorithms. These authors also employed remote sensing and included environmental factors, such as water temperature, in the GIS.
In the present study, we aim to develop a standardized GIS methodology to spatially assess the eutrophication conditions in large territories over time through the estimation of a TSI model including total nitrogen (TN), total phosphorus (TP), chlorophyll-a concentration (Chl-A), chemical oxygen demand (COD), and Secchi disk depth (SDD). Because it is common to find significant quantities of official water quality data in government archives that have never been properly analyzed [31], the proposed methodology uses official water quality databases to determine the critical regions for mitigating eutrophication in large territories. The methodology involves (i) extraction from the water quality databases and processing of water quality data with spatiotemporal attributes to calculate the TSI values, (ii) the spatiotemporal interpolation of water quality data to handle the missing values in the dataset, and (iii) the calculation of the average trophic state index at the subbasin level. Furthermore, correlation coefficients are determined to assess the correlation between the TSI indicators (TN, TP, COD, Chl-A, and SDD) and precipitation and land cover, respectively. As a case study, the presented methodology is applied for the detection of hypereutrophic and extreme hypereutrophic subbasins in the State of Jalisco, Mexico, to allow optimal allocation of financial resources for remedial action plans. The reproducibility of this methodology is possible for other study areas worldwide where there are water quality monitoring networks. Satellite remote sensing and machine learning algorithms are powerful tools that may be used in future studies in combination with on-site monitoring data to evaluate the trophic states of water bodies.

Materials and Methods
The summarized methodology for this study is shown in Figure 1.

Materials and Methods
The summarized methodology for this study is shown in Figure 1.

Study Site and Water Quality Data
The area under study was the state of Jalisco, located in the western region of Mexico. It has a total population of 8,348,151 inhabitants [32] and an area of 78.5 thousand km 2 [19]. This region is made up of 85 subbasins that belong to seven hydrological regions across the state: Lerma-Chapala-Santiago (RH12), Huicicla (RH13), Ameca (RH14), Coast

Study Site and Water Quality Data
The area under study was the state of Jalisco, located in the western region of Mexico. It has a total population of 8,348,151 inhabitants [32] and an area of 78.5 thousand km 2 [19]. This region is made up of 85 subbasins that belong to seven hydrological regions across the state: Lerma-Chapala-Santiago (RH12), Huicicla (RH13), Ameca (RH14), Coast of Jalisco  of Jalisco (RH15), Armería-Coahuayana (RH16), Balsas (RH18), and El Salado (RH37) (Figure 2). The complete list of subbasins considered in this study is shown in Table S1 in the Supplementary Materials. Jalisco is considered a subtropical region. The mean annual temperature and precipitation are 20.5 °C and 850 mm, respectively [33]. Tropical and subtropical water sources share similar characteristics that make them more prone to eutrophication. The steep slopes of Jalisco also promote the development of eutrophication throughout its hydrographic networks [34]. As reported by Mohamed et al. [35], the range of slopes within Jalisco can vary from 0% to more than 40%, so there are regions that are more susceptible to hydric erosion and nutrient runoff than others. According to INEGI [36], the main soil types for the area of study are phaeozem (22%), regosol (18%), leptosol (17%), luvisol (12%), cambisol (10%), vertisol (7%), and other soil types (14%).
Water quality data from 2012 to 2019 were obtained from the National Water Commission in Mexico (CONAGUA) [37], which manages, regulates, and protects national waters in Mexico. Since 2012, the national network of water quality monitoring in Mexico has monitored and recorded periodic measurements in predefined monitoring sites all over the country. At the time of this study, only measurements from 2012 to 2019 had been made available to the public. In Jalisco, this database resulted from the periodical in situ monitoring of several water quality indicators on 313 predefined sampling sites across the state ( Figure 2). All the measurements include their geographical coordinates (latitude Jalisco is considered a subtropical region. The mean annual temperature and precipitation are 20.5 • C and 850 mm, respectively [33]. Tropical and subtropical water sources share similar characteristics that make them more prone to eutrophication. The steep slopes of Jalisco also promote the development of eutrophication throughout its hydrographic networks [34]. As reported by Mohamed et al. [35], the range of slopes within Jalisco can vary from 0% to more than 40%, so there are regions that are more susceptible to hydric erosion and nutrient runoff than others. According to INEGI [36], the main soil types for the area of study are phaeozem (22%), regosol (18%), leptosol (17%), luvisol (12%), cambisol (10%), vertisol (7%), and other soil types (14%).
Water quality data from 2012 to 2019 were obtained from the National Water Commission in Mexico (CONAGUA) [37], which manages, regulates, and protects national waters in Mexico. Since 2012, the national network of water quality monitoring in Mexico has monitored and recorded periodic measurements in predefined monitoring sites all over the country. At the time of this study, only measurements from 2012 to 2019 had been made available to the public. In Jalisco, this database resulted from the periodical in situ monitoring of several water quality indicators on 313 predefined sampling sites across the state ( Figure 2). All the measurements include their geographical coordinates (latitude and Sustainability 2021, 13, 8029 6 of 21 longitude) and the recollection date. A total of 7274 observations were obtained during the timeframe for five water quality parameters: TN, TP, Chl-A, COD, and SDD; however, the monitoring sites are not distributed evenly across the state and most of the locations are within the Lerma-Santiago hydrographic network. As a result, only 43 of the 85 subbasins in Jalisco were assessed in this study (Table S1 in the Supplementary Materials) due to a lack of monitoring sites within the remaining 42 subbasins.
Land cover data were retrieved from the public repository managed by the National Forestry Commission of Mexico (CONAFOR) [38]. The original categories were reclassified based on the reclassification system proposed by Gebhardt et al. [39]. The surface of Jalisco is covered by agriculture (29.8%), temperate forests (29.8%), tropical forests (25.5%), grasslands (8.9%), water (2.1%), human settlements (2.1%), scrublands (1.2%), bare soil (0.4%), other vegetation (0.2%), and wetlands (<0.1%), as shown in Figure 3. Additionally, the land cover distribution is presented for every subbasin in Jalisco in Table S2  and longitude) and the recollection date. A total of 7274 observations were obtained during the timeframe for five water quality parameters: TN, TP, Chl-A, COD, and SDD; however, the monitoring sites are not distributed evenly across the state and most of the locations are within the Lerma-Santiago hydrographic network. As a result, only 43 of the 85 subbasins in Jalisco were assessed in this study (Table S1 in the Supplementary Materials) due to a lack of monitoring sites within the remaining 42 subbasins. Land cover data were retrieved from the public repository managed by the National Forestry Commission of Mexico (CONAFOR) [38]. The original categories were reclassified based on the reclassification system proposed by Gebhardt et al. [39]. The surface of Jalisco is covered by agriculture (29.8%), temperate forests (29.8%), tropical forests (25.5%), grasslands (8.9%), water (2.1%), human settlements (2.1%), scrublands (1.2%), bare soil (0.4%), other vegetation (0.2%), and wetlands (<0.1%), as shown in Figure 3. Additionally, the land cover distribution is presented for every subbasin in Jalisco in Table  S2 in the Supplementary Materials.

Data Preprocessing
Values labeled as below the limit of detection (LD) were set at LD/(2 (1/2) ), which is a common substitution method used to minimize the error for data from skewed and centered distributions [40]. The medcouple (MC), which is a measurement of distribution skewness, was computed for every parameter in each monitoring site using the statsmodels 0.10.1 module for Python 3.7.4. The MC was then used to determine the most adequate method to detect and remove outliers based on the methodology proposed by Casillas-García et al. [31]. For symmetrical data distributions (|MC| < 0.4), the maximum normalized residual test [41] was applied for outlier detection, whereas for skewed distributions (|MC| ≥ 0.4) the modified lower and upper limits were computed as shown in Equations (1) and (2) if MC ≥ 0: and Equations (3) and (4) were used if MC < 0:

Modi f ied upper limit
where Q 1 and Q 3 stand for the first and third quartiles and IQR is the interquartile range. The observations that were above the modified upper limits or below the modified lower limits were then classified as outliers [42] and removed from the dataset. Further, all removed outliers were replaced with the values predicted by the supervised machine learning models described in Section 2.4. Monthly precipitation totals from 2012 to 2019 were obtained from official records for all weather stations in Jalisco [43]. The precipitation records from 2012 to 2019 were assigned to a water quality monitoring sample as an extra column on the water quality database considering the total precipitation recorded at the closest weather station at the time water quality was monitored. The equation used to estimate the distances from the weather stations to any given monitoring site is shown in Equation (5): where D ij = distance between weather station i and water quality monitoring station j; x i = longitude of the weather station i; x j = longitude of the water quality monitoring station j; y i = latitude of the weather station i; y j = latitude of the water quality monitoring station j.
The manager module geopandas 0.8.0 from Python was employed to compile the shapefiles with the geographical delimitations of every basin and subbasin within Jalisco's territory [44]. Geographical operators were used to merge the two databases so that each monitoring site would correspond to a specific subbasin.

TSI Model Application
The TN/TP ratio was computed to determine the limiting nutrient for every single observation in time and space. The TN/TP showed that the water sources in the state are limited by phosphorus (TN/TP > 22.6) only for 11.8% of the observations (Figure 4), so TP-limited eutrophication could not be assumed for this geographic region. Accordingly, the TSI scores were determined according to the ranges proposed by Xu et al. [6], which are shown in Table 1.   In contrast to Carlson's model, which takes SDD, TP, and Chl-A concentrations to estimate the TSI of a region, the model proposed by Xu et al. [6] applies TN, TP, Chl-A, COD, and SDD for TSI estimation. The later model considers that TN concentration might be a limiting factor for water sources. Additionally, COD and Chl-A concentrations, as well as water transparency, measured as SDD, have been demonstrated to be efficient indicators of eutrophication in a water body [45].

Spatiotemporal Regression to Handle Missing Values
Spatial interpolation was performed to handle the missing values in the dataset. The Python module sklearn 0.21.3 was employed to perform the k-nearest neighbor (kNN) regression algorithm for the statistical estimation of the missing values. Then, the data  In contrast to Carlson's model, which takes SDD, TP, and Chl-A concentrations to estimate the TSI of a region, the model proposed by Xu et al. [6] applies TN, TP, Chl-A, COD, and SDD for TSI estimation. The later model considers that TN concentration might be a limiting factor for water sources. Additionally, COD and Chl-A concentrations, as well as water transparency, measured as SDD, have been demonstrated to be efficient indicators of eutrophication in a water body [45].

Spatiotemporal Regression to Handle Missing Values
Spatial interpolation was performed to handle the missing values in the dataset. The Python module sklearn 0.21.3 was employed to perform the k-nearest neighbor (kNN) regression algorithm for the statistical estimation of the missing values. Then, the data were split into training and testing sets, whereby all the data points were randomly distributed so that 80% of them corresponded to the training set and 20% of the data were used to test Sustainability 2021, 13, 8029 9 of 21 the estimations. Through an iterative process, several neighbor numbers were used for the estimation-from k = 1 up to the rounded value of the square root of the total number of data points, which has been used as a rule of thumb for kNN regression [46]. For the spatial interpolation, kNN weights were assigned using two methodologies: kNN with uniform weights and inverse distance weighting kNN (IDW-kNN). Then, five regressions were made for both weighting methodologies for TN, TP, Chl-A, COD, and SDD, namely a spatial regression, attribute regression, mixed regression (spatial + attribute), temporal regression, and spatiotemporal regression.
After training all the models, the testing dataset was used to evaluate the best model to estimate missing data. Normalized root mean squared errors (NRMSE) and determination coefficients (R 2 ) were computed for every single combination of k-neighbors, regression type, and weighting method. The treatment with the lowest NRMSE and highest R 2 for each parameter in the testing dataset was selected to estimate missing data in the original water quality dataset.

Average Trophic State Index at Subbasin Level
After assigning every sampling site to a specific subbasin and predicting the missing values in the original dataset, descriptive statistics were computed for the water quality parameters used in the TSI model. Central tendency statistics were needed to generate the average TSI for every parameter in each subbasin. After the mean value was computed for each TSI parameter in every subbasin, mean values were normalized into a 0 to 100 score by interpolating the average value in accordance with the ranges shown in Table 1. Then, the average TSI per subbasin was assessed as shown in the Equation (6), where TSI k stands for the trophic state index relative for each of the TSI variables: Finally, we selected two timeframes to compare and visualize the temporal trends or changes in the TSI scores at subbasin, basin, and hydrographic network levels. For this, we compared the measurements from the first biennium in the original dataset (2012-2013) with the last biennium (2018-2019).

Correlation between the TSI Parameters and Precipitation and Land Use
The distributions of all five water quality parameters were tested for normality using the Kolmogrov-Smirnov statistic for goodness-of-fit. For each parameter, normality was assessed considering data from all monitoring sites. After rejection of normality (p < 0.05), Spearman's rank non-parametric correlation coefficients were computed using the scipy 1.3.1 module in Python. This correlation procedure is used to compute correlation coefficients between ranked data that come from non-normally distributed data. For each subbasin with data, Spearman's correlation coefficients were determined to assess the correlation between the monthly precipitation rate and TN, TP, COD, Chl-A, and SDD, respectively. Additionally, the land cover categories (agriculture, bare soil, grassland, human settlements, scrubland, temperate forest, tropical forest, water, wetland, and other vegetation) were correlated independently with the five TSI parameters (TN, TP, Chl-A, COD, and SDD) using Spearman's rank correlation at the state level.
The scipy algorithm used to compute the Spearman's correlation coefficients has two output matrices to evaluate the sign of the correlation (positive or negative), the strength of the correlations, and the significance. Non-significant correlations (p < 0.05) were treated as zero correlation for visualization purposes and correlations were classified as significant or non-significant. All significant correlations were classified as perfect positivenegative, strong positive-negative, moderate positive-negative, or weak positive-negative, according to their coefficients' ranges, as stated by Akoglu [47].

Spatiotemporal IDW-kNN Regression
After computing the NRMSE and R 2 of all models tested for the five TSI parameters, the best regression model was selected to estimate all the missing values in the complete dataset, as shown in Table 2. The IDW-kNN regression showed a better performance for estimating new data than the uniform kNN regression. This was expected, because IDW-kNN assigns weights depending on the distance of every neighbor to the estimation point, where closer points have a greater impact on the outcome [6]. An IDW-kNN regression with spatial attributes minimized the NRMSE for TN and TP; on the other hand, a combination of spatial and temporal attributes resulted in better estimators with lower NRMSE for Chl-A, COD, and SDD. This suggests that for TN, TP, Chl-A, COD, and SDD, the observed value has a strong relationship with its location, mainly because of specific environmental and anthropogenic factors such as weather, land use, and precipitation. Additionally, for Chl-A, COD, and SDD, the temporal factor also suggests that eutrophication could be a function of time, where several policies and seasonal factors might change the water quality of a specific water source. There are several approaches and methodologies for predicting spatial data through interpolation and classification. Wong et al. [48] evaluated four of these interpolation methodologies for air quality parameters over large regions in the United States: spatial averaging (SA), kNN, IDW, and kriging. Their results showed no statistically significant differences (p < 0.05) between the values predicted by the SA, kNN, and IDW models in large regions with low monitor density (where monitoring sites are on average further than 10 miles from each other), whereas for regions with a higher monitor density, better results were achieved by the SA, kNN, and kriging models. In the case of Jalisco, there are subbasins with both low and high monitor density levels. Even in regions with high monitor density, the IDW-kNN displayed better estimations than uniform kNN regression. The criteria used to determine the best estimation model was the minimization of the NRMSE in the test dataset for each TSI parameter. The IDW-kNN regression has a fixed number of neighbors, as opposed to the traditional IDW regression [48], where a radial distance from the observation is defined instead of a number of neighbors. By keeping the number of neighbors constant, we can reduce the variance shown in spatial interpolation methods for observations located in high monitor density regions [48], without having to dynamically adjust the interpolation radius for every observation.

Average TSI
The average TSI values for every subbasin and parameter in this study are shown in Table 3. Additionally, a geographical representation of the resulting average TSI for every subbasin can be seen in Figure 5 for the two selected timeframes (2012-2013 and 2018-2019). A decrease in the TSI score can be observed from 2012 to 2019 in 34 out of 43 subbasins. Zapotlán Lake (subbasin code RH12De, as shown in Table S1 in the Supplementary Materials), Tecomala River (RH13Aa), and Bolaños Alto River (RH12Kc) displayed the highest TSI decrease from 2012 to 2019 ( Figure 5A,B). Despite the drop in TSI over time, no published information was found regarding any actions or programs to improve water quality in these subbasins between these two timeframes. Precipitation levels in 2018-2019 were significantly lower than those in 2012-2013, so the precipitation effect needed to be evaluated thoroughly in those areas. On the other hand, the opposite effect was shown for the La Laja River (RH12Ef) and the Bolaños River-Huaynamota River (RH12Fa) subbasins. A rise in the TSI scores coincided with an increase in precipitation levels in the closest monitoring stations at the time of the measurements, suggesting that eutrophication is highly linked to the hydrological cycle in these subbasins. and near the mountain chain and the coastline. Anthropogenic factors are buffered in these less populated areas. On the contrary, the highest average TSI values were located around the Guadalajara metropolitan area, in the central region of the state, and in the Altos region, located in the northeast of Jalisco. The metropolitan area is the most densely populated locality within the state and has a high proportion of the industrial activities in Jalisco. The higher average TSI scores within the northeastern region can be explained by the excessive livestock farming [49] and agricultural activities [50] (Figure 3). In general, TSI scores in Jalisco decreased over the study timeframe; however, the same methodological approach must be used to continue evaluating changes in the TSI over a greater period to assess the effectiveness of the local public policies implemented by local authorities. Average TSI scores were found at the basin ( Figure 5C) and hydrographic network ( Figure 5D) levels. Only the Verde Grande River (RH12I) and Santiago- The subbasins with the lowest average TSI scores were found in the south of Jalisco and near the mountain chain and the coastline. Anthropogenic factors are buffered in these less populated areas. On the contrary, the highest average TSI values were located around the Guadalajara metropolitan area, in the central region of the state, and in the Altos region, located in the northeast of Jalisco. The metropolitan area is the most densely populated locality within the state and has a high proportion of the industrial activities in Jalisco. The higher average TSI scores within the northeastern region can be explained by the excessive livestock farming [49] and agricultural activities [50] (Figure 3).
In general, TSI scores in Jalisco decreased over the study timeframe; however, the same methodological approach must be used to continue evaluating changes in the TSI over a greater period to assess the effectiveness of the local public policies implemented by local authorities. Average TSI scores were found at the basin ( Figure 5C) and hydrographic network ( Figure 5D) levels. Only the Verde Grande River (RH12I) and Santiago-Aguamilpa River (RH12F) basins showed increasing TSI trends over time. RH12F is influenced by the Grande Santiago River and the Verde Grande River, which have been referred to as two of the most polluted basins in the entire Lerma-Santiago (RH12) hydrographic network and are considered relevant monitoring sites for governmental agencies and researchers [51].
Additionally, based on average TSI distributions at hydrographic network levels, RH12 had the highest averages over both periods ( Figure 5D).
The critical subbasins were determined based on the average TSI scores from 2018 to 2019. Eighteen subbasins (41.9%) were classified as extremely hypereutrophic (TSI > 80) and hypereutrophic subbasins (70 ≤ TSI < 80) were also frequent (27.9%), as shown in the summary of the average TSI classification distribution in Table 4. The four most critical subbasins are shown in detail in Figure 6, where the main water sources are graphed and the monitoring sites are represented with dots colored according to TSI score. The Corona River-Verde River subbasin had the highest TSI score ( Figure 6A). We can see that the monitoring points located in Lake Cajititlán increase the average TSI of the whole subbasin. Lake Cajititlán has been described as one of the most polluted water reservoirs in the country, with critical eutrophication levels reported over the years [21,25,52,53]. The San Marcos Lake subbasin ( Figure 6B) had an average TSI of 91.6. A closer look at the monitoring stations shows that the Hurtado Dam was responsible for the high average TSI score in this subbasin. The Hurtado Dam suffered an aggressive spillover of molasses in 2013 [54]. This event dramatically increased nutrient levels in the dam, resulting in massive ecocide of local wildlife due to reduced levels of dissolved oxygen. Five years after the event, the average TSI shows us that the dam had not fully recovered from this anthropogenic disaster. On the other hand, the Lagos River ( Figure 6C) and Verde River-Santa Rosa Dam ( Figure 6D) are the other two most critical subbasins according to their average TSI. The Lagos River subbasin surrounds the municipality of San Juan de los Lagos, which has the largest livestock population in the state. The higher TSI scores in this region are related to poor management of livestock residues. Finally, measurements from the Verde River-Santa Rosa Dam subbasin are well distributed along three bodies of water: the Santa Rosa Dam, the Santiago River, and the Verde River. The Verde and Santiago Rivers' poor water quality and risk of eutrophication have been extensively documented [20,31]. The main causes of eutrophication in these regions include excessive nutrient loads due to the massive livestock populations, lack of regulation in fertilizer use for agriculture, and inefficient wastewater treatment programs [20,55,56]. Additionally, the Verde River-Santa Rosa Dam subbasin is located north of the Guadalajara Metropolitan Area, which has one of the highest human settlement proportions in the region. According to land cover data, this subbasin has the second largest proportion of human settlements in Jalisco (Table S2 in the Supplementary Materials). This land cover category represents 11.9% of the subbasin's surface. Tahiru et al. [57] conducted a study to evaluate the effects of land cover changes on water quality in a tropical region (Ghana). Their results showed a positive significant correlation between increasing human settlement and water turbidity and total ammonia. In this study, ammonia was shown to be positively correlated with agricultural land, which covers 27.4% of the Verde River-Santa Rosa Dam subbasin's surface. Eutrophication has become an ongoing subject for researchers in different locations across the state of Jalisco, such as in Chapala Lake [8,22], Verde River [20,55], and Lake Cajititlán [21,25,52,53], among others; however, little effort has been made to standardize the use of a specific methodology to evaluate eutrophication in Jalisco's freshwater bodies. While eutrophication has been a relevant matter for discussion in the area, a methodological approach to consistently assess the trophic state of surface waters has not been proposed; hence, it might be hard for public and private stakeholders to allocate financial resources and prioritize remedial action plans to improve water quality.
This work addresses the previous issue by proposing a methodological approach for data wrangling to properly transform water quality databases from large territories, so that the five-parameter TSI model can be applied in a standardized manner. In Mexico, the five selected parameters considered in the TSI model are included in the national water quality monitoring program of the Mexican federal government. A standardized methodology, as proposed here, would enable any user to analyze public databases to objectively compare a TSI in almost any region in the country and to perform further analyses as needed. Additionally, most countries monitor the same water quality parameters as a regular task. Eutrophication has become an ongoing subject for researchers in different locations across the state of Jalisco, such as in Chapala Lake [8,22], Verde River [20,55], and Lake Cajititlán [21,25,52,53], among others; however, little effort has been made to standardize the use of a specific methodology to evaluate eutrophication in Jalisco's freshwater bodies. While eutrophication has been a relevant matter for discussion in the area, a methodological approach to consistently assess the trophic state of surface waters has not been proposed; hence, it might be hard for public and private stakeholders to allocate financial resources and prioritize remedial action plans to improve water quality.
This work addresses the previous issue by proposing a methodological approach for data wrangling to properly transform water quality databases from large territories, so that the five-parameter TSI model can be applied in a standardized manner. In Mexico, the five selected parameters considered in the TSI model are included in the national water quality monitoring program of the Mexican federal government. A standardized methodology, as proposed here, would enable any user to analyze public databases to objectively compare a TSI in almost any region in the country and to perform further analyses as needed. Additionally, most countries monitor the same water quality parameters as a regular task.

Precipitation
In contrast to the individual distribution tests performed for each monitoring station, where both centered and skewed distributions were seen for all five TSI parameters, normality was rejected when considering values from all monitoring sites (p < 0.05); thus, Spearman's rank, non-parametric, and correlation coefficients were computed between TN, TP, Chl-A, COD, and SDD and precipitation for every subbasin. Correlation significance was not rejected for 39.53%, 60.47%, 16.28%, 37.21%, and 39.53% of subbasins with available data for TN, TP, Chl-A, COD, and SDD, respectively (p < 0.05) (Figure 7). Correlations were further classified as positive or negative and as perfect, strong, moderate, or weak. For TN, TP, Chl-A, and COD, positive correlations mean that TSI increases proportionally with an increase in precipitation levels, suggesting a nutrient wash-off from soil to continental water bodies in those subbasins. A negative correlation for these parameters shows the opposite effect-a drop in the TSI proportionally to an increase in precipitation levels; nutrient dilution is implied in this scenario for those subbasins. On the other hand, SDD is inversely proportional to TSI, meaning that a higher SDD is related with lower turbidity in the body of water, and vice versa. For SDD, negative correlations suggest a wash-off effect, while positive correlations suggest a dilution effect in the subbasin.
The correlation analysis between precipitation and TN showed that 27.9% of the subbasins had positive correlations with precipitation and 9.3% had negative correlations. The TP correlations with precipitation were positive for 32.6% of the subbasins, while 16.3% were negative. Chl-A had positive correlations with precipitation in 11.6% of the subbasins and zero negative correlations. Also, COD showed a positive correlation with precipitation in 27.9% of the subbasins and 7.0% negative correlations. Finally, SDD showed 34.9% of the subbasins to be positively correlated with precipitation without any subbasin being negatively correlated with precipitation.
The correlation analyses suggested a tendency for higher eutrophic levels during the rainy seasons. This may have been a result of nutrients being washed-off the land into lakes and rivers. On the other hand, in a few subbasins rainfall had a dilution effect due to an increase of water levels, where nutrient wash-off was not significant [18]. A higher wash-off effect indicates that there is a strong anthropogenic impact, mainly due to livestock, agricultural, and industrial activities, in the development of eutrophication in most subbasins, as suggested by several authors [19,20,56]. The correlation analysis showed that significant correlations (p < 0.05), suggesting nutrient wash-off, were often located in areas with strong primary and industrial activities. Flores López et al. [58] described the relationship between inadequate livestock and agricultural waste management practices and higher nutrient concentrations in water bodies located in the Altos region of Jalisco. Rainfall promotes the run-off of nutrients present in the soil due to fertilizers and livestock manure. Run-off is a multidimensional process that depends on several environmental, edaphic, and anthropogenic variables. Because of rain, nutrients in the soil's surface are carried out in a laminar flow, following the slope of the terrain until the flow reaches a channel or a stream; however, increments in precipitation increase the velocity of the flow, resulting in a turbulent flow. Such flow types can carry more particles with nutrients to the watershed areas [58]. The direction of the slope will define where the nutrients will end up. For these reasons, the methodological approach will benefit from correlation analyses that consider the slope of the terrain surrounding the watersheds, as well as other variables relevant for eutrophication, such as elevation, and amount of waste generation. Sustainability 2021, 13, x FOR PEER REVIEW 16 of 21  In the Colotlán River subbasin, in contrast, a significant negative correlation for TN, TP, and COD showed an evident dilution effect with increasing precipitation in areas with low anthropogenic input far from overindustrialized regions and agroindustrial clusters. The only relevant exception was the Lagos River subbasin, which had weak negative correlations for TP, COD, and SDD. Further analyses must be performed to fully understand the relationships between environmental factors and eutrophication. Furthermore, the correlations for subbasins RH12Kc, RH12De, RH12Ef, and RH12Fa showed a strong tendency for nutrient wash-off for some parameters. This suggests that the respective average TSI drop or increase from 2012 to 2019 was due to insufficiently distributed sampling throughout the entire years, including both rain and drought seasons.

Land Cover
The land cover distribution was measured as the ratio of the area of a specific class of land cover in a subbasin to the subbasin's total area. Relationships between the water quality indicators and land cover were explored using Spearman's rank correlation test. A correlation matrix was constructed between the five TSI parameters and the ten land cover classes ( Figure 8). Four out of the ten land cover classes showed positive significant correlations with TN (agriculture, human settlements, and water), TP (agriculture and human settlements), Chl-A (agriculture), COD (agriculture, human settlements, and water), and SDD (agriculture, human settlements, water, and wetlands). Moderate positive correlations were found between agricultural land cover and human settlements and the TSI parameters, suggesting that higher eutrophication levels are found in subbasins with higher proportions of land dedicated to agriculture and urban development. Similarly, Manssour and Al-Mufti [59] reported the influence of agriculture and human activities on TN, TP, and COD concentrations.
Two land cover categories (temperate forest and tropical forest) were negatively correlated with TN (temperate forest and tropical forest), TP (temperate forest), COD (temperate forest and tropical forest), and SDD (temperate forest and tropical forest), as shown in Figure 8. Similarly to our study, Rodríguez-Gallego et al. [60] assessed the effects of Four out of the ten land cover classes showed positive significant correlations with TN (agriculture, human settlements, and water), TP (agriculture and human settlements), Chl-A (agriculture), COD (agriculture, human settlements, and water), and SDD (agriculture, human settlements, water, and wetlands). Moderate positive correlations were found between agricultural land cover and human settlements and the TSI parameters, suggesting that higher eutrophication levels are found in subbasins with higher proportions of land dedicated to agriculture and urban development. Similarly, Manssour and Al-Mufti [59] reported the influence of agriculture and human activities on TN, TP, and COD concentrations.
Two land cover categories (temperate forest and tropical forest) were negatively correlated with TN (temperate forest and tropical forest), TP (temperate forest), COD (temperate forest and tropical forest), and SDD (temperate forest and tropical forest), as shown in Figure 8. Similarly to our study, Rodríguez-Gallego et al. [60] assessed the effects of land uses and land cover on eutrophication indicators. These authors concluded that forested cover is correlated with lower values of the eutrophication indicators. Although temperate and tropical forests cover more than 55% of the territory of Jalisco, increase in agricultural land and human settlements have been reported for the state, similarly to the national trend [61].

Conclusions
Our results indicate that eutrophic levels are critical in most of the study area, according to the TN, TP, Chl-A, COD, and SDD measurements for 8 years. These results can be used to rank areas by their TSI so that the state and local governments can direct their efforts to improve the trophic state at a subbasin level in a more competent manner. This would result in a deeper understanding of the specific eutrophic development in every monitoring station and would enable the clustering of stations and regions with similar behavior so that public policies can develop management programs.
In this study, a methodological approach was proposed to evaluate the TSI at a subbasin level based on water quality databases published by governmental dependencies. A GIS was used for spatiotemporal interpolation. The methodology implemented here uses available data monitored on-site by public agencies; however, on-site monitoring is expensive, and as a result there are regions in the world where water quality monitoring is infrequent or limited to a small percentage of the territory. As previously mentioned, satellite remote sensing is now used to monitor and evaluate the trophic states of water bodies over a broader range with faster speeds and lower costs; thus, further studies should combine on-site monitoring data with remote sensing data to integrate more complete databases and to develop a more complete picture of the trophic status, especially in large territories without a comprehensive water quality monitoring program. Furthermore, machine learning approaches can be implemented to evaluate the trophic state index scores of territories that are not regularly monitored.