A Methodology to Model Environmental Preferences of Ept Taxa in the Machangara River Basin (ecuador)

Rivers have been frequently assessed based on the presence of the Ephemeroptera— Plecoptera—Trichoptera (EPT) taxa in order to determine the water quality status and develop conservation programs. This research evaluates the abiotic preferences of three families of the EPT taxa Baetidae, Leptoceridae and Perlidae in the Machangara River Basin located in the southern Andes of Ecuador. With this objective, using generalized linear models (GLMs), we analyzed the relation between the probability of occurrence of these pollution-sensitive macroinvertebrates families and physicochemical water quality conditions. The explanatory variables of the constructed GLMs differed substantially among the taxa, as did the preference range of the common predictors. In total, eight variables had a substantial influence on the outcomes of the three models. For choosing the best predictors of each studied taxa and for evaluation of the accuracy of its models, the Akaike information criterion (AIC) was used. The results indicated that the GLMs can be applied to predict either the presence or the absence of the invertebrate taxa and moreover, to clarify the relation to the environmental conditions of the stream. In this manner, these modeling tools can help to determine key variables for river restoration and protection management.


Introduction
The composition of the benthic macroinvertebrate communities can reflect the ecological quality of surface waters over time as pollution induces systematic shifts in community composition [1,2].Biological monitoring to assess river water health has been in use for more than a century.Nevertheless, in developing countries, this procedure was introduced and subsequently developed only recently [3].With biological samples, it is possible to predict the average values of chemical parameters over a period of time, when their accruing effects have been more pronounced in the biota [2].
For evaluation of the ecological water quality of rivers and lakes, some metrics based on taxonomic macroinvertebrate community composition have been developed, such as % Ephemeroptera-Plecoptera-Trichoptera (% EPT), % scrapers, taxa richness and Biological Monitoring Working Party (BMWP) [1,4].The latter index, which was developed in England and has been adapted to tropical countries, is a procedure to determine the water quality classes based on the score of sensitivity to organic pollution of the taxa found in water bodies [5][6][7].The EPT taxa are often used because these families are the most sensitive orders and their richness is related to water quality, in particular, mainly to oxygen deficiency and environmental degradation [8][9][10].
Few studies have been conducted to understand the environmental preferences of macrobenthos in the high altitude Andes, in which some parameters have been identified as possible predictors of the presence of taxa.For instance, Ríos-Touma, et al. [11] found a relation between macroinvertebrate community composition and flow stability velocity and their variation in both dry and rainy seasons.Jacobsen and Marín [12] principally analyzed the presence of the EPT taxa in relation to temperature and oxygen saturation.The same authors, in other research performed in Ecuador, found that the relation between the EPT taxa with the total of invertebrate fauna diminishes with the altitude [13].Other research in the Andes indicated that the effect of organic pollution on macrobenthos was more evident during the dry season in comparison to the rainy season [14].However, other abiotic conditions that could have an effect on the environmental preference of macroinvertebrates, especially with regard to the sensitivity of EPT taxa were not investigated in these studies.
Prediction of the distribution of species based on biotic environmental conditions has been done through the modeling of running water.These models are now recognized as the core of predictive ecology and are powerful tools in conservation planning.The approach of these models is to predict the presence/absence or abundance of a species in relation to physicochemical variables and biotic and abiotic conditions collected in a specific habitat [15,16].Predicting the composition and distribution of macrobenthos communities in rivers is not a simple exercise due to the non-linear behavior of the ecosystem and the complexity of biotic and abiotic variables [1,2,17].For ecological modeling, techniques such as artificial neural networks (ANNs), Bayesian belief networks (BBNs), classification and regression trees (CTs and RTs), genetic algorithms (GAs), logistic regressions (LRs) and support-vector machines (SVMs) have been used [16,18].The developed models can be a comprehensive way to assess, with high reliability, the impact of an anthropogenic source in rivers, which can be helpful for decision support systems in river basin management [1].
In this article, we aim to determine which physicochemical parameters are related to the occurrence of three sensitive families, one of each order of the EPT taxa, in the Machangara River Basin in Ecuador.To do so, environmental and biological variables were collected, which were the basis of the construction of three generalized linear models (GLMs).Furthermore, we discuss the ecological relation between the selected predictors and their corresponding taxon, as well as the potential application and restriction of these models.

River Basin
The Machangara River is an Andean mountain river, which is an affluent of the Cuenca River.It is about 37 km [19] in length and at the end of its path, crosses the city of Cuenca, located in the southern Province of Azuay in Ecuador (Figure 1).The estimated population in Cuenca in 2015 was about 370,000 inhabitants [20].It is also the main urban center in the study area.The Cuenca River Basin is part of the Hydrographic Demarcation Santiago, one of the Amazon Effluents.
This study focuses on the Basin of the Machangara River and its sub catchments.The study area is about 325 km 2 , of which 79.1% is forest protected by the Ecuadorian Government, 8.9% is a mosaic between urban area, pastures and crops, 7.1% is used as pastures, 2.4% is urban area, 1.1% is water bodies and 1.4% is bare soil (Figure 2A).This basin is regulated all year long by the presence Water 2017, 9, 195 3 of 31 of the two hydroelectric power plants, with their respective dams, Labrado and Chanlud, situated in the upper area of the catchment, upstream from Cuenca City.Water is extracted from the catchment basin for use primarily as a supply of drinking water, agriculture irrigation, and to a lesser extent, for industrial use.The altitude of the basin varies from 2440 to 4420 m above sea level (MASL) and its mean altitude is 3557 MASL.The average annual rainfall in the basin varies between 877 mm and 363 mm per year, while the average annual temperature fluctuates between 16.3 • C and 9.0 • C in the lower and upper areas respectively [21,22].Two seasons are present during the year, the rainy from the middle of February until the beginning of July and the dry season during the rest of the year.The average flow of the Machangara River measured from 1964 to 2010, before discharging into the Tomebamba River, was 8.4 m 3 •s −1 , the average minimum was 5.3 m 3 •s −1 in August and the average maximum being 14.6 m 3 •s −1 in May [23], cf. Figure A1 in Appendix for monthly averages.Despite the combined sewage system in Cuenca, poor water quality results were obtained along parts where the river flows through the city.This is mainly due to some sewage networks and industrial pollution points that are discharging in certain locations along the river and its tributaries that are affecting the water quality of these streams [19,24].Moreover, discharges from combined sewer overflows (CSOs) events, when wet-weather flows exceed the sewage treatment plant capacity, as well as the surface water outfalls (SWO), cause the degradation of physicochemical and biological quality [8,[25][26][27].Similarly, pollution from agricultural and livestock runoffs transport polluted water into the rivers [28].

Data Collection
The dataset used in this research was collected and measured once during the rainy season in February and March 2012, while the validation data set was sampled in the last half of July 2015 in dry season.The study considered 33 sampling locations (Figures 2 and A2), whose altitudes varied from 2451 to 3428 MASL.In each point, 16 physicochemical, hydraulic, microbiological and biological variables were measured (Table 1).The locations were chosen along the catchment according to land use (Figure 2A).Despite the combined sewage system in Cuenca, poor water quality results were obtained along parts where the river flows through the city.This is mainly due to some sewage networks and industrial pollution points that are discharging in certain locations along the river and its tributaries that are affecting the water quality of these streams [19,24].Moreover, discharges from combined sewer overflows (CSOs) events, when wet-weather flows exceed the sewage treatment plant capacity, as well as the surface water outfalls (SWO), cause the degradation of physicochemical and biological quality [8,[25][26][27].Similarly, pollution from agricultural and livestock runoffs transport polluted water into the rivers [28].

Data Collection
The dataset used in this research was collected and measured once during the rainy season in February and March 2012, while the validation data set was sampled in the last half of July 2015 in dry season.The study considered 33 sampling locations (Figures 2 and A2), whose altitudes varied from 2451 to 3428 MASL.In each point, 16 physicochemical, hydraulic, microbiological and biological variables were measured (Table 1).The locations were chosen along the catchment according to land use (Figure 2A).The samples of benthic macroinvertebrates were taken from the river and their tributaries by using the kick-sampling procedure.This method is applied by shuffling the feet walking backwards  The samples of benthic macroinvertebrates were taken from the river and their tributaries by using the kick-sampling procedure.This method is applied by shuffling the feet walking backwards against a current while holding a standard net (inlet area 575 cm 2 , mesh size 900 µm, depth 27.5 cm) Water 2017, 9, 195 5 of 31 against the current for six minutes in an area of 5 m 2 [29][30][31].The sampling was performed in a stretch of about 10-20 m of length, in different aquatic habitat.Thirty-six taxa of macroinvertebrates were identified up to family level with the help of a stereoscope and specific reference materials [32][33][34].At each sampling location, to assess the water quality, the Biological Monitoring Working Party score adapted to Colombia (BMWP-Col) was calculated [7,33,35] (Figures 2B and A2).The sensitivity score ranges from one (for very tolerant taxa) to 10 (for the most sensitive families).The BMWP-Col, which is measured from one to 120, gives a five water quality classification in function of the sum of sensitive scores obtained in each location: bad (≤15), deficient (16)(17)(18)(19)(20)(21)(22)(23)(24)(25)(26)(27)(28)(29)(30)(31)(32)(33)(34)(35), moderate , good (61-99), very good (100-120) [33,35].

Model Species
Three macroinvertebrate families present in the river basin, which are pollution-sensitive based on the BMWP-Col, were selected.These taxa were Baetidae (Ephemeroptera), Leptoceridae (Trichoptera) and Perlidae (Plecoptera).Baetidae belong to the mayfly family and their tolerance score (TS) to pollution is 6.Leptoceridae are part of the caddisflies family and have a TS equal to 8. Perlidae are part of the stoneflies family and have a TS of 10.

Model Development, Selection, Validation and Optimization
In this research, the presence/absence of three sensitive macroinvertebrate taxa, described above, was associated with a physicochemical water quality condition, using a generalized linear model (GLM) with binomial adjustment.The presence or absence of a species, in connection with explanatory variables such as environmental conditions, has been frequently modeled as binary response using GLM [36][37][38].The GLMs were developed in order to conceptualize the environmental preferences of macroinvertebrates.Three different GLMs were constructed, thus one model for each family that can be used as input for the environmental suitability quantifications.
For starting the data exploration and analysis, boxplots linked to the presence and absence of a family in relation to the amplitude of physicochemical variables were constructed [36].
Collinearity between explanatory variables were computed with a non-parametric Spearman rank correlation coefficient, which is regularly used in ecology because this parameter makes no assumptions about linearity between the two variables [36,39].Variables are not omitted for the construction of a model, when the Spearman correlation coefficient is lower than 0.80, since this means that no strong correlation is detected and no redundant variables are identified [36,40,41].
In a GLM, the distribution of the response variable Y i also expresses the mean and variance of Y i .As a result, the expected mean and variance of Y i are given by: Where, π i signifies the probability that taxa i are present and (1 − π i ) is the probability that they are absent.The models were composed using Equation (1), in which X ni means the explanatory variables, while α and β i are regression parameters [36].
For the assessment of the goodness-of-fit and choice of models, the Akaike information criterion (AIC) was analyzed.AIC is a statistical performance criterium that is a trade-off between model complexity and model accuracy criteria.Thus, the best model, which has the lowest value of AIC, tends to fit closest to reality [42].
The effects of the input variables (i.e., abiotic variables) on model performance were assessed in order to identify which ones have influence on the presence or absence of a family.This was done by consecutively eliminating the least important input variables that had the highest p-value.When beneficial for model performance, previously removed variables were introduced again [43], in seeking the lowest AIC.This stepwise input variable selection procedure in the construction process of the three models was tested to find the lowest AIC.Each model had its own explanatory variables with which the presence or absence of an individual taxon was predicted.
For each of the developed models, the adjusted R 2 was reported, which is a likelihood ratio index in a logistic regression model that is analogous with the R 2 used in multiple linear regression techniques [44].All GLMs were fitted with an independent correlation matrix and all statistical tests were performed at the 5% significance level.The GLMs and the boxplots were constructed in R [45].
Finally, the model was validated with information collected in 14 points of the same basin in June of 2015 (dry season).The accuracy of the validation set for each GLM was calculated as the adjusted R 2 , which was determined as a relation between the correct predictions divided by the total points analyzed.

Data Exploration
The flow had a wide variation with minimal values lower than 0.1 m 3 •s −1 in small streams located in the upper area of the basin and maximum values of 23.1 m 3 •s −1 in the lowest areas of the catchment, close to the city.The flow velocity, due to the mountainous terrain and the slope difference, varies from a very slow flow (0.07 m•s −1 ) in flat areas to high velocities (1.84 m•s −1 ) in steep lands (Figure A3A).The water temperature was colder (9.1 • C) in the upper mountain regions than in the lower regions (13.4 • C) (Figure A3A).The pollution from anthropogenic origin measured as BOD 5 (Figure A3A), COD (Figure A3B) and fecal coliforms (Figure A3A) was low.The maximum concentration of these pollutants was 13 and 46 mg•L −1 and 5.4 × 10 5 MPN.100 mL −1 respectively.Values that were in the range of an analysis done by Esquivel, et al. [24], with more than 200 physicochemical samples taken on a quarterly basis by ETAPA-EP from 1984 to 2006.Because the fecal coliforms exhibited a wide variation, the analysis was executed on a logarithmic scale.For the most part, the pH remained in the basic zone, with three points with a level less than 7.The minimum value of this parameter was 6.33, while the maximum value was 8.36 (Figure A3).The maximum value of the color was 40 HU, which was mainly vegetable in origin, due to the presence of moorland, wetlands and native vegetation (Figure A3B).The conductivity in general was low, with values from 13.2 to 238 µS•cm −1 (Figure A4C).The dissolved oxygen (DO) was high with a minimum of 6.7 mg•L −1 (Figure A3C).This could be due to the high re-aeration capacity because of the high flow velocity and the low depth of the water in the river.When the Machangara River passes through the urban area, the maximum values of total solids (Figure A4A), and turbidity (Figure A4B) were registered: respectively at 190 mg•L −1 and 48.2 NTU.In general, the river had low depth, with only one place in the flat area located in the city, where this value reached more than 1.0 m (Figure A4D).Other pollutants such as the organic nitrogen (Figure A3C), the Nitrate/Nitrite, the Ammonia Nitrogen and the Phosphates (Figure A4D) had low concentrations, with maximum values of 6.55 mg•L −1 , 0.70 mg•L −1 , 0.40 mg•L −1 and 0.55 mg•L −1 respectively.These points were registered where the land was used for pasture or crops.An overview of the measured physicochemical, microbiological and biological variables can be seen in Table 1.
Poor biological water quality was registered in the stretches after the dams of the two hydroelectric plants.The main cause of poor water quality here is hydropeaking.By contrast, good water quality results were found in the upstream locations of the basin where there is a low human presence.While, eight places located in the high land natural forest had moderate water quality, two locations in the reach flowing through the city registered poor biological water quality.
Regarding the fluctuations of the abiotic variables and their relation to the BMWP-Col, we concluded that, when the flow velocity in the studied rivers was higher than 1.3 m•s −1 , the poorest biological water quality was found (i.e., BMWP-class V: bad) (Figure 3C).The highest concentration of the organic pollutants BOD 5 (Figure A5A), COD (Figure 3A) and fecal coliforms (Figure A5B) was registered closest to the city, which agreed with the poorest biological quality (class V).High flow velocities and increased levels of organic matter were detected in only one point close to the city, leading both parameters to register low biological water quality.In relation to the temperature, the biological water quality tended to be better with lower temperatures (Figure 3D).It could be because in elevated regions, the anthropogenic pollution is lower and a protected area is present.The pH (Figure A5C), the color (Figure A5D) and the conductivity (Figure 3B) do not have a visible impact on the BMWP-Class.
Water 2017, 9, 195 7 of 30 (Figure A5C), the color (Figure A5D) and the conductivity (Figure 3B) do not have a visible impact on the BMWP-Class.When the BMWP-Col is analyzed with the selected EPT taxa, we determined that Baetidae was present in all biological classes (Figure 4A).While, Leptoceridae was found when the BMWP-Col varied from class I to IV (Figure 4B) and Perlidae was present in the highest biological classes (i.e., Class I, II and III) (Figure 4C).When the BMWP-Col is analyzed with the selected EPT taxa, we determined that Baetidae was present in all biological classes (Figure 4A).While, Leptoceridae was found when the BMWP-Col varied from class I to IV (Figure 4B) and Perlidae was present in the highest biological classes (i.e., Class I, II and III) (Figure 4C).

Correlation Analysis
Regarding the collinearity analysis (Tables 2 and A1), no variables were omitted for the construction of the three models.Four cases in which the absolute value of correlation coefficient is greater than 0.50 but less than 0.80, were detected.For the Leptoceridae model, two values higher than 0.50 were found, COD with BOD 5 (r = 0.61, p < 0.001) and Log Fecal Coliforms with BOD 5 (r = 0.57, p < 0.001).In the Baetidae model, only one occurrence of the correlation coefficient higher than 0.50 was established, color with COD (r = 0.63, p ≤ 0.001).One of the high values of correlation coefficient is color with BOD 5 (r = 0.51, p = 0.002), which is not in relation to any model.When the BMWP-Col is analyzed with the selected EPT taxa, we determined that Baetidae was present in all biological classes (Figure 4A).While, Leptoceridae was found when the BMWP-Col varied from class I to IV (Figure 4B) and Perlidae was present in the highest biological classes (i.e., Class I, II and III) (Figure 4C).

Logistic Regression Models
Of the 16 physicochemical and microbiological variables measured in the Machangara River, only eight revealed a relation with the presence or absence in the three GLMs of Baetidae, Leptoceridae and Perlidae.These variables were flow velocity, water temperature, pH, color, conductivity, biological oxygen demand five (BOD 5 ), chemical oxygen demand (COD) and fecal coliforms (Table 3, Figures A6-A8).However, the range of preferred physicochemical conditions differed between families in relation to their specific variables.The regression coefficients of Equation (1), p-values and goodness-of-fit measurements of the aforementioned models are presented in Tables 3 and A2.For each of the families, the model selection procedure is summarized in Tables A3-A5.For the family of Baetidae (TS: 6), the adjusted R 2 calculated for their GLM was around 60% and the AIC value was equal to 22.5, which was the lowest of the three constructed models for the three families.These taxa were present in 27 of the 33 sampled points.The effect of conductivity (p = 0.04), temperature (p = 0.07), color (p < 0.10) and COD (p = 0.10) were associated with the probability of the occurrence of Baetidae according to the constructed GLM (Table 3).Based on the binomial model and the constructed boxplots, the likelihood of the existence of Baetidae increases with lower temperatures (Figures A9A and A6A).The effect of the color was similar, i.e., with a minor value, their possibility of presence increased (Figures 5A and A6B).Concerning conductivity, Baetidae were likely to occur at higher values (Figures A9C and A6D) compared with the other two analyzed taxa.Nevertheless, the maximum measured concentration of this variable was 238 µS•cm −1 , which was relatively low.A higher probability of the existence of Baetidae was associated to the measured range (2 to 46 mg•L −1 ) of COD concentration, (Figures A9B and A6C).The maximum registered value of this parameter corresponded to water with relatively low pollution.The plots of the residuals vs. fitted and scale-location of the model can be seen in Figure A12A,B respectively.
the constructed boxplots, the likelihood of the existence of Baetidae increases with lower temperatures (Figures A9A and A6A).The effect of the color was similar, i.e., with a minor value, their possibility of presence increased (Figures 5A and A6B).Concerning conductivity, Baetidae were likely to occur at higher values (Figures A9C and A6D) compared with the other two analyzed taxa.Nevertheless, the maximum measured concentration of this variable was 238 μS•cm −1 , which was relatively low.A higher probability of the existence of Baetidae was associated to the measured range (2 to 46 mg•L −1 ) of COD concentration, (Figures A9B and A6C).The maximum registered value of this parameter corresponded to water with relatively low pollution.The plots of the residuals vs. fitted and scalelocation of the model can be seen in Figure A12A,B respectively.
Regarding Leptoceridae (TS: 8), their GLM had the highest adjusted R 2 of the three analyzed models for the three taxa, with a value of 67.6% and the AIC value was equal to 26.5.This family was present in six of the 33 sampled points.The pH (p = 0.03), flow velocity (p = 0.04), fecal coliforms (p = 0.04) expressed in log scale, COD (p = 0.05), BOD5 (p = 0.06) and conductivity (p = 0.06) are the variables of the GLM related to the probability of the occurrence of this family (Table 3).The boxplots of Leptoceridae and their GLM reveals that the probability of the presence of these taxa is higher with low values of fecal coliforms (Figures A10A and A7A), low concentrations of COD (Figures A10B and  A7E), and BOD5 (Figures A10C and A7C).The probability of the existence of the aforementioned taxa is higher with neutral pH (Figures A10D and A7D) and when the flow velocity is lower than one meter per second (Figures 5C and A7B).These taxa tend to be present when the conductivity is low (Figures 5D and A7F).The plots of the residuals vs. fitted and scale-location of the model can be seen in Figures A12C,D    Regarding Leptoceridae (TS: 8), their GLM had the highest adjusted R 2 of the three analyzed models for the three taxa, with a value of 67.6% and the AIC value was equal to 26.5.This family was present in six of the 33 sampled points.The pH (p = 0.03), flow velocity (p = 0.04), fecal coliforms (p = 0.04) expressed in log scale, COD (p = 0.05), BOD 5 (p = 0.06) and conductivity (p = 0.06) are the variables of the GLM related to the probability of the occurrence of this family (Table 3).The boxplots Water 2017, 9, 195 10 of 31 of Leptoceridae and their GLM reveals that the probability of the presence of these taxa is higher with low values of fecal coliforms (Figures A10A and A7A), low concentrations of COD (Figures A10B  and A7E), and BOD 5 (Figures A10C and A7C).The probability of the existence of the aforementioned taxa is higher with neutral pH (Figures A10D and A7D) and when the flow velocity is lower than one meter per second (Figures 5C and A7B).These taxa tend to be present when the conductivity is low (Figures 5D and A7F).The plots of the residuals vs. fitted and scale-location of the model can be seen in Figure A12C,D respectively.
The Perlidae (TS: 10) GLM model was characterized by the lowest adjusted R 2 value (43.5%) and the highest AIC value (34.5) of the three constructed models for the three families.These taxa were present in 12 of the 33 sampled points.The probability of the occurrence of Perlidae in relation to the constructed GLM is associated to fecal coliforms (p = 0.02) expressed in log scale, temperature (p = 0.03), conductivity (p = 0.04) and flow velocity (p = 0.17) (Table 3).Regarding the binomial model of Perlidae and their boxplots, the possibility of the presence of this family is higher with less concentration of fecal coliforms (Figures 5B and A8B).When the temperature (Figures A11A and  A8A) was lower, the opportunity of occurrence of the aforementioned taxa increased.Perlidae also prefer flow velocities with values less than 1.5 m•s −1 (Figures A11B and A8D) and conductivity below 180 µS•cm −1 (Figures A11C and A8C).The plots of the residuals vs. fitted and scale-location of the model can be seen in Figure A12E,F respectively.
The studied families: Baetidae, Leptoceridae and Perlidae, had one common variable, the conductivity, which always was below 250 µS•cm −1 .COD was a conjoint variable between Baetidae and Leptoceridae, albeit the amplitude of this predictor was lower when the sensitivity of the taxa was higher.Leptoceridae and Perlidae had two other mutual variables: fecal coliforms and flow velocity.For the first variable, the range was minor when the sensitivity of the taxa was greater, while for the second predictor, the value in both cases was less than 1.4 m•s −1 .A common variable between the GLMs of Baetidae and Perlidae was the temperature, which was inferior when the sensitivity of the taxa was superior.
When the GLMs were validated with an independent data set, the accuracy of the models measured as adjusted R 2 was 86% for Baetidae, 86% for Leptoceridae and 43% for Perlidae.

Analysis of the Chosen EPT Taxa in Relation with the BMWP-Col
Although the average values of measured pollutants were low in eight points located by the forest in the upper area of the basin, the biological class given for BMWP-Col was moderate.Similar findings were found in high land streams in the Andes in South America and in the Rwenzori Mountains in Africa [12,46,47].This could be due to environmental stress caused by natural factors such as the disturbance of stream sites due to poor habitat conditions, the impact of heavy rains in the rainy season [12], and heavy shading, or due to hydropeaking as a result of dam operation.Other causes could be the water physicochemical characteristics in the forest sites, such as low conductivity, low turbidity, high dissolved oxygen concentration, which in combination with heavy shading could influence the low primary production [48].
The presence of Baetidae in all the BMWP-Col classes could be due to the fact that the maximum measured concentration of BOD 5 was 13 mg•L −1 , a value that is relatively low, and the minimum DO concentration was high with a value of 6.65 mg•L −1 .This family was found in Thailand with concentrations of BOD 5 , DO and conductivity of 7 mg•L −1 , 1.1 mg•L −1 and 377 µS•cm −1 respectively.In Ghana, these taxa were present with a BOD 5 of 2 mg•L −1 , DO of 4.1 mg•L −1 and a conductivity of 1250 µS•cm −1 [10], while in Turkey, Baetidae were found when BOD 5 was 8.8 mg•L −1 [49].
When Leptoceridae were present, the BMWP-Col class varied from I to IV, despite low pollution registered, with a maximum BOD 5 of 0.60 mg•L −1 and a minimum DO of 6.7 mg•L −1 .Perlidae were present when the BMWP-Col was registered from Class I to III, the minimum DO was 7.0 mg•L −1 and the maximum BOD 5 was 0.50 mg•L −1 .Other factors such as disturbances of stream caused by the rainy season, or by the daily operation of the two dams, could have induced the lower BMWP-Col class.

Analysis of the Explanatory Variables in Relation to Response Variables
Our results demonstrate that GLMs can be used to select physicochemical variables that best predict the presence of the EPT taxa.These orders were chosen because those models describing environmental preference conditions of sensitive families are more reliable and valid than models for non-sensitive taxa [50].For example, Forio, et al. [50] showed that the model constructed to predict the occurrence of the Leptophlebiidae (TS: 10) taxa, which belong to the Ephemeroptera order, was reliable, while, for the prediction of the Chironomidae (TS: 2), which belong to the Diptera order, the model was not reliable.
Conductivity was a common variable between the three analyzed taxa (Baetidae, Leptoceridae and Perlidae).This variable represents the natural mineral content of the water from both inorganic and organic origin [51].In the first case, inert material is dragged because of precipitation and local geology, as well as inorganic pollutants from different anthropogenic sources that are leached or discharged into water bodies.However, in the second case, the organic origin is mainly due to wastewater discharges [43,52].In agreement with the models developed in this research, conductivity has been established as one of the most critical variables to predict macroinvertebrates presence in rivers, in both tropical and temperate countries [38,51,53].In this way, the caddisflies and stoneflies orders, from which Leptoceridae and Perlidae originate respectively, were reported to be present when the conductivity was relatively low [54,55].Furthermore, D'Heygere, et al. [51] indicated that the wide range of variation of this variable expresses the high diversity existing between rivers and their stretches.
Similarly, Dissolved Oxygen (DO) is a variable that in most cases is present to predict the occurrence of macroinvertebrates [51,56,57].Nonetheless, DO was not a variable of the three GLMs in this research, an observation that was not expected in our research hypothesis.The main cause could be that the lowest measured concentration of DO in the field was 6.65 mg•L −1 (>85% oxygen saturation), a value that was enough for the presence of Baetidae.For the occurrence of Leptoceridae and Perlidae, higher concentrations of DO were necessary, values that were greater when the family was more sensitive.That is, for Leptoceridae, the minimum DO concentration was 6.67 mg•L −1 , while for Perlidae the lowest concentration of this variable was 7.0 mg•L −1 .Additionally, mountain rivers are likely to maintain relatively high oxygen saturation due to their high velocity and turbulence.Consequently, these rivers have better aeration and less sedimentation of oxygen-consuming materials thus being more resistant to organic pollution [14].Regarding Dissolved Oxygen, Connolly, et al. [58] mentioned that mayflies populations declined dramatically when the DO levels were less than 20% of saturation.Moreover, a finding in the highlands of Ecuador consistent with our analysis established that no EPT taxa were present in tributaries when the oxygen saturation was lower than 80% [13].
Flow velocity is another important variable that has been analyzed by some authors.In this research, the aforementioned variable is part of the two predictable models for Leptoceridae and Perlidae whereas it is not a factor for the Baetidae model.Interestingly, Holguin-Gonzalez, et al. [57] reported the flow velocity as a predictor of the presence of macroinvertebrates in Colombia.In the same way, Ríos-Touma, et al. [11] in their research, in the high altitude tropical streams in Ecuador, found that flow velocity had influence in abundance, in communities such as Leptoceridae.However, Ríos-Touma, et al. [11] reported that this variable is less important for macrobenthos with faster development times such as Baetidae, which is less susceptible to hydraulic disturbance.A similar finding in Zimbabwe for the Trichoptera order, showed a strong relation between their occurrence and abiotic parameters such as flow velocity and average depth [59].
The temperature variation between high altitudes despite short distances to reach them is considerable.In addition, the hourly and daily temperatures at one altitude fluctuate substantially in the tropical Andes Mountains [11].Furthermore, due to lower water temperature, the solubility of oxygen increases leading to elevated DO concentrations in the water.At higher elevations, however, the atmospheric partial pressure of oxygen diminishes with ascension in altitude, countering the effects of lower temperature [14].The aforementioned temperature is part of two GLMs (Baetidae and Perlidae) and it is negatively correlated with the possible presence of these two taxa.A similar correlation was found for the prediction of the EPT taxa in the tropical Andean region of Bolivia [12] or for the macrobenthos assemblage in California [60].
The significant variables of the three GLMs that were related to organic pollution were COD, BOD 5 and fecal coliforms.The first variable was in relation to the possible presence of Baetidae and Leptoceridae, while BOD 5 was only associated with the probable occurrence of Leptoceridae and fecal coliforms related to the possible existence of Baetidae and Perlidae.Outcomes were consistent with Jacobsen [14], who specified that the effect of organic pollution on the macrobenthos in Ecuadorian high land tributaries was the same as rivers at higher latitudes.In this way, Jacobsen [14] and Ríos-Touma, et al. [11] reported in their research in the Andes of Ecuador that Plecoptera, in which Perlidae originates, was present in pristine conditions and unpolluted places.With regard to Leptoceridae, the previously mentioned authors described that this family was present in only slightly polluted places.While, with regard to Baetidae, Jacobsen [14] expressed that these taxa were not found in severely polluted streams.Macrobenthos are known to be affected by organic pollution for two reasons: the first is dissolved oxygen reduction and the second is due to alteration of the substratum and loss of available food sources [61].Furthermore, a study performed in the Itambi River, located in the northern Andes of Ecuador, evaluated the impact of organic pollution.The study concluded that the number of macroinvertebrate species was reduced when the concentration of BOD 5 increased with a subsequent decrease in DO and vice versa [62].In the samples subject of this research, it was found that when Perlidae were present the maximum COD and BOD 5 were equal to 14 mg•L −1 and 0.5 mg•L −1 respectively.The relatively low concentration of COD and BOD 5 , could be the main reason these variables were not present in the Perlidae Model.In either case, when one of the three families was present, the maximum concentration of COD was equal to 46 mg•L −1 , an amount registered for Baetidae.For more sensitive taxa, lower values of COD were registered for their presence.These findings are in line for the EPT taxa, which are considered indicators of relatively clean water due to their sensitivity to pollution [2,4,12].
Two other variables, color and pH, had respective relationships to Baetidae or Leptoceridae.On the one hand, color had a negative correlation with the model to predict the probable presence of Baetidae.The range of this predictor, when these taxa were found, varied from 0 to 27 Hazen units (HU).The color in natural water consisted mainly of the generic humic and fulvic fraction of dissolved organics [63], and its intensity could be increased in direct relation to the amount of precipitation and runoff [64].The humic substances are organic acids and their accumulation and dissolution in water are associated with a reduction of pH [47,65].The pH also had a negative correlation as a possible predictor of the presence of Leptoceridae, which was found when pH was between 6.33 and 8. Similar values (4.6 to 7.9) were found when Leptoceridae were present in research done in the high-altitude streams in Uganda [46].In addition, the composition and abundance of macroinvertabrate taxa have been shown to have a relationship with pH [66] and subtle differences in this parameter may explain the differences in macrobenthos assemblages [65].The potential of hydrogen has been used to predict the presence of some families of macrobenthos, such as Baetidae, Hydroptilidae and Simuliidae [38,46,67].
The application of the three GLMs to find the probable presence of the three families studied (i.e., Baetidae, Leptoceridae and Perlidae) is defined by the measured range of the predictors found in this research.Regarding this topic, some authors have written that the preferential conditions of a family varies from place to place, in connection with abiotic settings [12,38].Hence, the three constructed models could not be simulated in different amplitude of the predictors that were evaluated in this investigation.

Model Performance
Three GLMs for three different EPT taxa (i.e., Baetidae, Leptoceridae and Perlidae) were built in this study using a generalized linear model.GLMs have the advantage of directly establishing and reporting the relative importance of each variable in searching for the biotic integrity [68].Furthermore, a stepwise discriminant procedure to select the most significant variables based on the AIC selection criteria applied in this research, has been shown to be effective in the prediction of species distribution [69].However, the variables chosen for the developed models were not necessarily the only ones that were important.Variables that were not selected could be due to a correlation with another variable or with a set of variables [1], or that they were less important for the model than variables that were selected.
Because the data sample size (33 points) was relatively small (<100 points) [70,71], the binary data that represents their presence and absence was applied for the construction of the GLMs.With regard to small sample sizes, Stockwell and Peterson [71] concluded that the accuracy of samples of ten points was 90% and when the size increased to 50 points their accuracy was near maximal.In addition, logistic regression models that were applied to different small sample sizes showed a stable goodness-of-fit in their predictable capability [37,72].Despite the fact that the basin is regulated and taking into account the samples were only taken in the rainy season, binary construction was applied for the model development.This kind of development based on one season gives the best results, however, it does not allow for insights in to seasonal variations related to the abundance of macrobenthos [38].Moreover, the probability of the existence of taxa based on their presence or absence was estimated with a GLM with binomial adjustment, which is more suitable and often used for these types of predictions [36,73,74].
The suitability of the developed models in the first instance is in the range in which the eight explanatory variables were measured (Figures A3 and A4).For Baetidae, the presence or absence of this taxa as a result of their GLM, could be inferred in lower or higher values of color and temperature.However, we do not recommend extrapolating for the possible presence of this family when the concentrations are greater than the measured range for the COD (46 mg•L −1 ) and conductivity (238 µS•cm −1 ).The presence or absence of Leptoceridae according to the developed GLM, could be extrapolated by flow velocity, conductivity, COD, BOD 5 and pH.Nonetheless, we do not advise to infer the possible presence of this family when fecal coliforms are higher than the measured range (1.7 × 10 5 MPN.100 mL −1 ).Regarding the Perlidae model, the temperature and fecal coliforms could be extrapolated in smaller and greater measured values for determination of their presence or absence.However, this model must not be extrapolated without prior determination of the presence of this taxa in higher values of flow velocity (1.30 m•s −1 ) and conductivity (177.9 µS•cm −1 ).
According to Mac Nally [75], the typical R 2 found in 100 cases analyzed was around 50% with p > 0.5 and R 2 value diminished to 25%, when only the significant variables were retained in the models.The author mentioned also specified that much simpler models with less variables produced typical R 2 around 30% with p < 0.001.The goodness-of-fit of the three GLMs, which was also measured with the correlation coefficient adjusted R 2 , has shown that the Leptoceridae model had the highest value (67.6%), followed by the Baetidae (60%) and the Perlidae (43.5%)GLMs, a common adjustment range found in ecological models.When the accuracy of the model was assessed with an independent validation set, the adjusted R 2 was similar for both the Baetidae and the Leptoceridae (86%), while for Perlidae it was the lowest with an analogous value that was obtained with the training data set.In the latter value, it is not clear why this was the lowest.

Conclusions
Three generalized linear models (GLMs) were built in order to understand the physicochemical water quality variables that determine the relation to the presence of three selected EPT taxa.Eight variables, identified during the stepwise selection procedure, showed a clear relation to the probable occurrence of the analyzed families.Each taxon had its own explanatory variables, of which conductivity was a unique common term between the GLMs of the three families.The fit of the GLMs was measured with the Akaike information criterion (AIC), the adjusted R 2 , as well as, an independent validation data set.The adjusted R 2 varied from 43.5% to 67.6%, values that are common for ecological models.Therefore, the GLMs could be used as tools to predict changes in the biological quality of the Machangara River.
primarily as a supply of drinking water, agriculture irrigation, and to a lesser extent, for industrial use.The altitude of the basin varies from 2440 to 4420 m above sea level (MASL) and its mean altitude is 3557 MASL.The average annual rainfall in the basin varies between 877 mm and 363 mm per year, while the average annual temperature fluctuates between 16.3 °C and 9.0 °C in the lower and upper areas respectively[21,22].Two seasons are present during the year, the rainy from the middle of February until the beginning of July and the dry season during the rest of the year.The average flow of the Machangara River measured from 1964 to 2010, before discharging into the Tomebamba River, was 8.4 m 3 •s −1 , the average minimum was 5.3 m 3 •s −1 in August and the average maximum being 14.6 m 3 •s −1 in May[23], cf.FigureA1in Appendix for monthly averages.

Figure 1 .
Figure 1.Location of the Machangara Basin in Ecuador.

Figure 1 .
Figure 1.Location of the Machangara Basin in Ecuador.

Figure 2 .
Figure 2. Sampled site locations with (A) land use; (B) Biological Monitoring Working Party score adapted to Colombia (BMWP-Col) qualification.

Figure 2 .
Figure 2. Sampled site locations with (A) land use; (B) Biological Monitoring Working Party score adapted to Colombia (BMWP-Col) qualification.

Figure 3 .
Figure 3. Boxplots of the BMWP-Col class with the main explanatory variables of the three models: (A) chemical oxygen demand (COD); (B) conductivity; (C) flow velocity and (D) temperature.

Figure 4 .
Figure 4. Bar plots of the BMWP-Col class with the number of the samples in which the chosen macrobenthos were present or absent: (A) Baetidae; (B) Leptoceridae and (C) Perlidae.

Figure 3 .
Figure 3. Boxplots of the BMWP-Col class with the main explanatory variables of the three models: (A) chemical oxygen demand (COD); (B) conductivity; (C) flow velocity and (D) temperature.

Figure 4 .
Figure 4. Bar plots of the BMWP-Col class with the number of the samples in which the chosen macrobenthos were present or absent: (A) Baetidae; (B) Leptoceridae and (C) Perlidae.

Figure 4 .
Figure 4. Bar plots of the BMWP-Col class with the number of the samples in which the chosen macrobenthos were present or absent: (A) Baetidae; (B) Leptoceridae and (C) Perlidae. respectively.

Figure 5 .
Figure 5.The probability of chosen taxa being present in relation to an explanatory variable: (A) Baetidae with color; (B) Perlidae with log fecal coliforms; (C) Leptoceridae with flow velocity; (D) Leptoceridae with conductivity.(The points in the curves indicate extrapolation outside the observed physicochemical variables range).

Figure 5 .
Figure 5.The probability of chosen taxa being present in relation to an explanatory variable: (A) Baetidae with color; (B) Perlidae with log fecal coliforms; (C) Leptoceridae with flow velocity; (D) Leptoceridae with conductivity.(The points in the curves indicate extrapolation outside the observed physicochemical variables range).

Figure A5 .
Figure A5.Boxplots of the BMWP-Col with the others explanatory variables of the three models: BOD5 (A); Log fecal coliforms (B); pH (C) and color (D).

Figure A5 .
Figure A5.Boxplots of the BMWP-Col with the others explanatory variables of the three models: BOD5 (A); Log fecal coliforms (B); pH (C) and color (D).

Figure A5 .
Figure A5.Boxplots of the BMWP-Col with the others explanatory variables of the three models: BOD 5 (A); Log fecal coliforms (B); pH (C) and color (D).

Figure A6 .
Figure A6.Boxplots showing the presence or absence of Baetidae with its explanatory variables: (A) temperature; (B) color; (C) COD and (D) conductivity.

Figure A8 .
Figure A8.Boxplots showing the presence or absence of Perlidae with its explanatory variables (A) temperature; (B) fecal coliforms expressed in log scale; (C) conductivity; (D) flow velocity.

Figure A9 .
Figure A9.The probability of Baetidae being present in relation to (A) temperature; (B) COD; (C) conductivity.(The blue points in the curves indicate extrapolation outside the observed physicochemical variables range).

Figure A9 . 30 Figure A9 .
Figure A9.The probability of Baetidae being present in relation to (A) temperature; (B) COD; (C) conductivity.(The blue points in the curves indicate extrapolation outside the observed physicochemical variables range).

Figure A10 .
Figure A10.The probability of Leptoceridae being present in relation to (A) log fecal coliforms; (B) COD; (C) BOD5; (D) pH.(The brown points in the curves indicate extrapolation outside the observed physicochemical variables) range.

Figure A11 .
Figure A11.The probability of Perlidae being present in relation to: (A) temperature; (B) flow velocity; (C) conductivity.(The red points in the curves indicate extrapolation outside the observed physicochemical variables range).

Figure A10 . 30 Figure A10 .
Figure A10.The probability of Leptoceridae being present in relation to (A) log fecal coliforms; (B) COD; (C) BOD 5 ; (D) pH.(The brown points in the curves indicate extrapolation outside the observed physicochemical variables) range.

Figure A11 .
Figure A11.The probability of Perlidae being present in relation to: (A) temperature; (B) flow velocity; (C) conductivity.(The red points in the curves indicate extrapolation outside the observed physicochemical variables range).

Figure A11 .
Figure A11.The probability of Perlidae being present in relation to: (A) temperature; (B) flow velocity; (C) conductivity.(The red points in the curves indicate extrapolation outside the observed physicochemical variables range).

Table 1 .
Summary of the physical, chemical and microbiological data collected in the Machangara Basin in Ecuador based on 33 samples in 2012.

Table 1 .
Summary of the physical, chemical and microbiological data collected in the Machangara Basin in Ecuador based on 33 samples in 2012.

Table 2 .
Spearman correlation coefficients of the explanatory variable used to construct the generalized linear models (GLMs).

Table 3 .
Regression parameters: p-values and goodness-of-fit measurements of the models for predicting the presence of Baetidae, Leptoceridae and Perlidae.

Table A1 .
Spearman correlation p-values of the explanatory variable used to construct the generalized linear models (GLMs).

Table A2 .
Regression parameters: Standard Error and z value of the models for predicting the presence of Baetidae, Leptoceridae and Perlidae.

Table A1 .
Spearman correlation p-values of the explanatory variable used to construct the generalized linear models (GLMs).

Table A2 .
Regression parameters: Standard Error and z value of the models for predicting the presence of Baetidae, Leptoceridae and Perlidae.

Table A3 .
Process of model development for Baetidae: p-values, AIC and adjusted R 2 .