Evaluation of Seasonal and Spatial Variations in Water Quality and Identification of Potential Sources of Pollution Using Multivariate Statistical Techniques for Lake Hawassa Watershed, Ethiopia

Abstract: The magnitude of pollution in Lake Hawassa has been exacerbated by population growth and economic development in the city of Hawassa. The lake is hydrologically closed and retains the pollutants entering it. This study therefore aimed to examine seasonal and spatial variations in the water quality of Lake Hawassa Watershed (LHW) and to identify possible sources of pollution using multivariate statistical techniques. Water and effluent samples from LHW were collected monthly at 19 monitoring stations during the dry and wet seasons and analyzed for 19 physicochemical parameters. Multivariate statistical techniques (MVST) were used to investigate the influence of anthropogenic intervention on the physicochemical characteristics of water quality at the monitoring stations. Through cluster analysis (CA), all 19 monitoring stations were spatially grouped, based on pollution index, into two statistically significant clusters for the dry and wet seasons, designated as moderately polluted (MP) and highly polluted (HP). According to the study results, the rivers and Lake Hawassa were moderately polluted (MP), while point sources (industry, hospitals, and hotels) were highly polluted (HP). Discriminant analysis (DA) was used to identify the most critical parameters for studying the spatial variations, and seven significant parameters were extracted (electrical conductivity (EC), dissolved oxygen (DO), chemical oxygen demand (COD), total nitrogen (TN), total phosphorus (TP), sodium ion (Na+), and potassium ion (K+)), whose spatial variance distinguishes the pollution condition of the groups obtained using CA. Principal component analysis (PCA) was used to qualitatively determine the potential sources contributing to LHW pollution. In addition, three factors determining pollution levels were identified, explaining 70.5% and 72.5% of the total variance during the dry and wet seasons, respectively.
Various sources of pollution are prevalent in the LHW, including urban runoff, industrial discharges, diffuse sources from agricultural land use, and livestock. A correlation matrix was prepared for each season using the physicochemical parameters. In conclusion, effective management of point and non-point source pollution is imperative to improve the quality of domestic, industrial, livestock, and agricultural runoff and thereby reduce the pollutants entering the lake. In this regard, proper municipal and industrial wastewater treatment should be complemented by stringent management, in particular the comprehensive application of technologies such as fertilizer management, ecological ditches, constructed wetlands, and buffer strips. Furthermore, the application of indigenous aeration practices, such as the use of drop structures at critical locations, would help improve water quality in the lake watershed.


Introduction
Studies have shown that urban, agricultural, and industrial discharges have a direct effect on surface water quality. Urban wastewater causes fecal contamination of surface waters, and urban stormwater runoff, which contains large amounts of fecal microbes, also affects surface water quality [1]. Surface water bodies are vital natural resources that are vulnerable to pollution. The contaminants are chemical, physical, and biological constituents resulting from anthropogenic activities and are of great environmental concern [2]. Surface water bodies are extensively used as the major sources of water for domestic, non-domestic, industrial, and irrigation purposes. Therefore, monitoring and assessment of water bodies is imperative to obtain reliable information on water quality for effective management [3]. Anthropogenic uses of the water bodies in the study basin can degrade the quality of surface water and impair its usability as a potable water supply or for industry, agriculture, recreation, or other purposes. Hence, regular monitoring of the water quality of the rivers and the lake is indispensable [4,5]. The most affected river stretches are those that flow through densely populated urban areas where there is no adequate sanitation. Upstream rural areas are mainly affected by pollutants from non-point sources such as agricultural runoff, whereas urban areas are polluted by point sources, sewage discharges, urban runoff, and pollutants from upstream areas [6,7].
Studies have shown that some lakes and wetlands around the world have disappeared or are showing changes in their ecosystems. Furthermore, factors such as intensive land use for urbanization and agriculture have had a significant impact on the hydrology, ecology, and ecosystem services of lakes, which has eventually led to a decline in lake levels [8].
In addition, pollutants have long been a concern, as their accumulation can have serious effects on fauna, flora, and human health when the huge amount of urban and industrial wastewater reaches the shores [9]. Lake Hawassa is located near the city of Hawassa and is surrounded by agricultural land, industries and residential areas. Therefore, it is susceptible to a variety of pollutants that enter the lake directly or indirectly. On the other hand, the Lake Hawassa Watershed is experiencing rapid land cover change, and natural resources have overwhelmingly diminished. The lake is hydrologically closed and has no apparent outlet, so all pollutants entering the lake are retained. As a result, the lake faces numerous problems, and the water quality deteriorates over time, threatening biodiversity [10].
Significant industrialization, augmented by rapid urbanization and increasing economic development, has increased the extent of pollution [11]. The pollution is mainly from non-point sources caused by urban and agricultural runoff, overgrazing, deforestation, soil erosion, land development, and industrial effluents. This leads to numerous environmental concerns that have resulted in substantial hydrological disturbances. The main factories in the study area are a ceramics factory, a flour mill, a cement products factory, the Moha soft drink factory, BGI (the St. George Brewery factory), the Etabs soap factory, the industrial park in Hawassa, and other small-scale industries. They are virtually all concentrated along the main road, which is close to the shallow swamp, and discharge their effluents into the lake through streams. On the other hand, deforestation and irrigation of the land have caused the drying up of Lake Cheleleka by reducing the streamflow [12].
Various studies have been conducted to examine water quality in the LHW catchment and identify sources of pollution. Teshome [11] investigated the eastern catchment of Lake Hawassa Watershed to assess the seasonal water quality and its suitability for the designated uses. The findings revealed that the rivers in the eastern part of Lake Hawassa Watershed are suitable for agriculture and livestock but unsuitable for aquatic life, and that the lake is hypereutrophic.
Amare [13] investigated the primary sources of non-point source pollution and their relative contribution in Lake Hawassa Watershed using the Annualized Agricultural Non-Point Source (AnnAGNPS) model. The pollutant-loading model revealed that non-point source pollutants originating from agricultural lands and associated with deleterious anthropogenic activities are responsible for the water quality impairment of Lake Hawassa. Point sources have also been determined to be the source of numerous pollutants in the lake ecosystem if the effluent control system put in place is unsuitable [14].
Kebede [15] studied the impact of land cover changes on water quality and streamflow in Lake Hawassa Watershed and concluded that, with respect to the parameters studied, water quality in the upper watershed of the three rivers was better than in the lower sections of the catchment, which might be correlated with the observed land use.
A study conducted by Lencha et al. [16] at Lake Hawassa revealed that most of the population, including the inner part of the city, are using latrines. Larger buildings have conventional flushing systems but without any wastewater treatment. Furthermore, industrial and commercial point sources are known to discharge their effluents into streams or rivers that end up in the Lake. In addition, Hawassa Industrial Park and Referral Hospital discharge their effluents directly into the lake. This is a threat to the people who rely on rivers, streams, and the lake for domestic and other purposes and to the survival of aquatic life as well.
To sum up, some water quality studies have been conducted in either the eastern or the western catchment of Lake Hawassa, while others have been carried out only at Lake Hawassa itself. Nonetheless, no water quality study has sufficiently connected agricultural and urban land use with the pollution level of the watershed in order to identify the sources of pollution. The previous studies mainly relied on random monitoring and data from the literature and focused on only a few water quality parameters, which cannot reflect the whole picture of water quality in the watershed. Additionally, some previous studies obtained contradictory findings. Meanwhile, urbanization, industrialization, commercial activities, and population growth are increasing rapidly, which could increase sewage and effluent production. Through monitoring data, consistent data analysis, and homogenization of parameters, this study aimed to (1) statistically analyze multiple-parameter data by using principal component analysis (PCA), cluster analysis (CA), and discriminant analysis (DA); (2) investigate the broad-spectrum variation in the parameters of LHW; and (3) cluster monitoring stations with similar characteristics and identify potential sources of pollution in LHW.

Study Area
Lake Hawassa Watershed (LHW) is located 275 km from the capital Addis Ababa, in the capital of Sidama regional state, on the main road leading to Nairobi, Kenya via Moyale. LHW has a total area of 1431 km² and lies between 6°45′ and 7°15′ N latitude and 38°15′ and 38°45′ E longitude (Figure 1). LHW comprises five sub-watersheds [17].
The watershed is known for its flat plains and dissected undulating landscape, with elevation ranging from 1571 to 2962 m above sea level [18]. The area comprises mountains and low-lying areas, with a wide flat wetland called Cheleleka. Perennial rivers and streams on the north and northeast sides of the catchment and runoff from the east wall feed Cheleleka. The Tikur-Wuha sub-basin has a single tributary, the Tikur-Wuha, which flows into Lake Hawassa. In this lake system, there is no surface outflow; water leaves the lake only by evaporation and abstraction, so the catchment can be considered hydrologically closed [15]. The climate of the Hawassa sub-basin is sub-humid and distinctly seasonal. The months from April to October are wet and humid, and the main rainy season is between July and September, with a mean annual precipitation of about 955 mm. The mean minimum precipitation is 17.8 mm in December (dry season) and the mean maximum precipitation is 119.8 mm in August (rainy season) [19].

Sampling and Monitoring Parameters
The monitoring sites and sampling strategy were planned to cover a wide range of factors contributing to the water quality of the river, taking into account tributaries and point sources whose effluents end up in the lake and have a substantial impact on the water quality of the lake. The criteria for selecting monitoring points were hydrological, with confluence of sub-basins having distinct characteristics and land use types, with the intention of transferring parameters to unmonitored sub-basins. Furthermore, factors such as availability of point and non-point sources, land use type, and urban and wastewater drains were considered in the selection of monitoring sites.
Hence, a total of nineteen (19) monitoring stations were selected (Table 1 and Figure 1). Four (4) monitoring sites were selected purposively at the Wesha, Hallo, Wedessa, and Tikur-Wuha river mouths of the respective sub-watersheds.

Eleven (11) monitoring sites were distributed evenly along the entire course of Lake Hawassa for water quality monitoring. Three (3) monitoring sites were selected near the industrial disposal site, and one (1) was at the health care center.
The monitoring sites in the Tikur-Wuha catchment were Wesha River (MS1), Hallo River (MS2), and Wedessa River (MS3), which are located in the upstream part of Lake Hawassa, where agricultural runoff from the catchment flows directly or through its tributaries into the Cheleleka wetland. The three rivers were purposively selected based on their size and spatial location to represent their respective sub-basins. Monitoring station 6 (MS6) is a critical area with mostly fresh water where factories discharge their effluent into the Tikur-Wuha River, which eventually flows into Lake Hawassa. This is an area where river inputs to the lake are high. Monitoring sites for point sources were selected from the industries in the catchment that directly or indirectly feed Lake Hawassa. The selected sites were the St. George Brewery factory, BGI (MS4), and the Moha soft drink factory (MS5), whose effluents discharge into the Cheleleka wetland and eventually enter Lake Hawassa via the Tikur-Wuha River, as well as the Referral Hospital (MS15) and Hawassa Industrial Park (MS19), which discharge their effluents directly into Lake Hawassa.
The monitoring stations for Lake Hawassa were selected based on the presence of major pollution sources in the lake, existence of point sources, health facilities, industrial effluent emission sites, availability of boating and recreational activities, presence of service rendering facilities such as Haile and Lewi resorts, fish market (Amora-Gedel and Gudumale), and also the central part of the lake where the disturbance is minimum.
For this purpose, eight (8) monitoring sites were selected in the eastern part (northeast to southeast) of the lake and designated as MS7, MS8, MS9, MS10, MS11, MS12, MS13, and MS14.
The other three (3) monitoring sites were located on the western (northwest to southwest) side of the lake and were designated as MS16 for the local village Ali-Girma site (opposite Haile Resort), MS17 for the Sima site, which is on the side opposite Mount Tabor, and MS18 for the Dore-Bafana Betemengist site. In this part of the lake, although there is no point source pollution, there is considerable anthropogenic activity in the form of non-point source pollution from recreational activities, agricultural runoff, and animal waste.
The physicochemical water quality parameters at the selected sites were analyzed from May 2020 to January 2021 to capture seasonal variation. Sample collection for the wet season was event-based, i.e., samples were collected after rainfall events. The coordinates of each sampling station were determined using GNSS.
Composite samples were collected in pre-cleaned 2 L polyethylene plastic bottles for the different parameters (sterilized glass bottles were used for biochemical oxygen demand (BOD) and chemical oxygen demand (COD) analyses). The bottles were washed with concentrated nitric acid and distilled water before sample collection and thoroughly rinsed with sample water during collection to avoid possible contamination. The water samples were aseptically handled, labelled, preserved in sterile glass bottles, stored in a cooler (Mobicool v30 AC/DC, Germany) and ice box, and transported for analysis to the laboratories of Hawassa University Environmental Engineering, the Addis Ababa City Government Environmental Protection and Green Development Commission, and the Engineering Corporation of Oromiya. The collection, handling, preservation, and treatment of the water samples followed the standard methods for the examination of water and wastewater of the American Public Health Association [20], and all the parameters are presented with their respective analytical methods and instruments in Table 2. The monitored parameters included, among others, magnesium ion (Mg2+), sodium ion (Na+), potassium ion (K+), calcium ion (Ca2+), and suspended solids (SS).

Un-Ionized Ammonia Determination from Total Ammonium Nitrogen (TAN)
The un-ionized free ammonia was calculated from the total ammonium nitrogen by the mass action law in its logarithmic form (Equation (1)). The pKa as a function of temperature was taken from Emerson et al. [21]:

$$pK_a = 0.09018 + \frac{2729.92}{T_k}$$

where $T_k$ is the temperature in kelvin (273 + °C).
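The calculation can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: it assumes the commonly used Emerson et al. expression pKa = 0.09018 + 2729.92/T_k and the rearranged mass-action relation NH3 = TAN / (1 + 10^(pKa − pH)). The function names and the input values are hypothetical.

```python
import math


def pka_ammonia(temp_c: float) -> float:
    """pKa of the NH4+/NH3 equilibrium (assumed Emerson et al. form)."""
    t_k = 273.0 + temp_c  # the text defines T_k as 273 + degrees Celsius
    return 0.09018 + 2729.92 / t_k


def unionized_fraction(ph: float, temp_c: float) -> float:
    """Fraction of total ammonia nitrogen present as un-ionized NH3."""
    return 1.0 / (1.0 + 10.0 ** (pka_ammonia(temp_c) - ph))


def unionized_ammonia(tan_mg_l: float, ph: float, temp_c: float) -> float:
    """Un-ionized NH3-N (mg/L) from total ammonium nitrogen (TAN, mg/L)."""
    return tan_mg_l * unionized_fraction(ph, temp_c)


# Hypothetical sample: TAN = 1.0 mg/L at pH 8.2 and 25 degrees Celsius.
nh3 = unionized_ammonia(tan_mg_l=1.0, ph=8.2, temp_c=25.0)
print(f"pKa = {pka_ammonia(25.0):.3f}, NH3-N = {nh3:.3f} mg/L")
```

Because the un-ionized fraction rises steeply with pH and temperature, the same TAN concentration can correspond to very different NH3−N values at different stations and seasons.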

Multivariate Statistical Techniques
Multivariate statistical techniques (MVST) are a valuable tool for efficiently estimating the spatio-temporal variability in a watershed and the influence of human intervention on the physicochemical parameters at monitoring stations [22]. In addition, MVST such as cluster analysis (CA), discriminant analysis (DA), and PCA/factor analysis can be applied to interpret complex databases and offer better visualization of water quality in the studied watershed [23]. PCA, CA, and DA are used to determine the primary relationships among the measured physicochemical parameters; the experimental data were standardized to the Z-scale to avoid inaccurate grouping caused by the large differences in the magnitude of the variables [5,24–26].
XLSTAT 2016 (Addinsoft, New York, USA), Microsoft Excel 2016, and IBM SPSS 25 for Windows (Statistical Package for the Social Sciences) were used together to perform the statistical analysis.

Data Treatment and Multivariate Statistical Methods
PCA is sensitive to outliers, missing data, and poor linear correlation among variables due to insufficient assigned variables. Thus, missing data and outliers in the monitored water quality data need to be treated before executing multivariate statistical analysis. An outlying value might reflect a real shift in the value of an observation that arises from non-random causes. In this study, outliers were detected using Grubbs' test [40] in XLSTAT 2016. Data collection and analysis were conducted with great care to minimize the amount of missing data. However, some missing data are inevitable, and these were handled by multiple imputation of missing values using Markov Chain Monte Carlo (MCMC) [41].
The raw water quality parameters were standardized to a mean of 0 and a variance of 1 using Z-scale transformation to ensure that the different variables were equally weighted in the statistical analyses [36]. The suitability of the data for factor analysis was then checked using the Kaiser-Meyer-Olkin (KMO) and Bartlett's sphericity tests to determine whether the measured variables could be factorized efficiently. KMO is a measure of sampling adequacy, which indicates the proportion of variance that is likely attributable to the underlying factors. Generally, the KMO index ought to be greater than 0.5 for satisfactory factor analysis. When the KMO index is close to 1, PCA of the variables is suitable; when it is close to 0, PCA is not appropriate. In this study, the KMO had a value of 0.68. Bartlett's test of sphericity tests whether the correlation matrix is an identity matrix, i.e., whether the variables are unrelated. The significance level, reported as 0 in this study (less than 0.05), indicates that there are significant relationships among the variables.
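The Z-scale transformation described above can be illustrated with a short sketch. The EC and DO values below are hypothetical and serve only to show that variables with very different magnitudes end up on a common scale; the use of the sample standard deviation (n − 1) is an assumption, since the text does not state which form was used.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize one variable to mean 0 and (sample) variance 1."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1); an assumption
    return [(v - m) / s for v in values]


# Hypothetical measurements at five stations.
ec_us_cm = [820.0, 760.0, 905.0, 1310.0, 2450.0]  # electrical conductivity
do_mg_l = [6.0, 5.4, 4.2, 3.1, 2.0]               # dissolved oxygen

ec_z = z_scores(ec_us_cm)
do_z = z_scores(do_mg_l)

for raw, z in zip(ec_us_cm, ec_z):
    print(f"EC {raw:7.1f} -> z = {z:+.2f}")
```

After the transformation, EC (hundreds to thousands of µS/cm) and DO (a few mg/L) contribute with equal weight to distance-based and correlation-based analyses.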

Principal Component (PCs)/Factor Analysis (FA)
PCA reduces the dimensionality of the data set by explaining the correlations amongst a large number of variables in terms of a smaller number of underlying factors without losing much information [42,43]. The loadings of the original variables on the PCs are their correlation coefficients with the PCs. The PCs' formula was taken from [33,36]:

$$y_{in} = z_{i1}x_{1n} + z_{i2}x_{2n} + \cdots + z_{im}x_{mn}$$

where z is the component loading, y is the component score, x is the measured value of a variable, i is the component number, n is the sample number, and m is the total number of variables. Meanwhile, FA attempts to extract a lower-dimensional linear structure from the data set and extracts a new group of variables known as varifactors (VFs) via rotation of the PCA axes. In FA, the basic concept is borrowed from [33,36]:

$$y_{nm} = z_{n1}p_{1m} + z_{n2}p_{2m} + \cdots + z_{nr}p_{rm} + e_{nm}$$

where y is the measured value of the variable, z refers to the factor loading, p is the factor score, m is the sample number, n is the variable number, r is the total number of factors, and e is the residual term accounting for errors or other sources of variation. In this study, PCA was employed for qualitative determination of pollution sources.
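As a rough illustration of how a first principal component and its share of the total variance are obtained, the sketch below builds the correlation matrix of three standardized variables and extracts its leading eigenvalue by power iteration. This is a simplified stand-in for the statistical packages used in the study; the data values and all function names are hypothetical, and only the first component is computed.

```python
import math
from statistics import mean, stdev


def standardize(column):
    """Z-scale one variable (mean 0, sample standard deviation 1)."""
    m, s = mean(column), stdev(column)
    return [(v - m) / s for v in column]


def correlation_matrix(columns):
    """Pearson correlation matrix; `columns` holds one list per variable."""
    z = [standardize(c) for c in columns]
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z]
            for zi in z]


def mat_vec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def first_component(corr, iterations=1000):
    """Leading eigenvalue and unit eigenvector of `corr` (power iteration)."""
    # Asymmetric start vector, so it is unlikely to be orthogonal to the
    # leading eigenvector.
    vec = [1.0 / (i + 1) for i in range(len(corr))]
    for _ in range(iterations):
        nxt = mat_vec(corr, vec)
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    eigenvalue = sum(a * b for a, b in zip(vec, mat_vec(corr, vec)))
    return eigenvalue, vec


# Hypothetical values for six stations (mg/L).
cod = [96.0, 89.0, 133.0, 113.0, 399.0, 354.0]
bod = [20.0, 7.0, 28.0, 19.0, 116.0, 112.0]
do_ = [3.5, 6.0, 4.2, 4.4, 2.0, 2.3]

corr = correlation_matrix([cod, bod, do_])
eigenvalue, direction = first_component(corr)
# Loadings are the correlations of each variable with the component.
loadings = [math.sqrt(eigenvalue) * d for d in direction]
explained = eigenvalue / len(corr)  # share of total variance
print(f"PC1 explains {100 * explained:.1f}% of the variance")
print("loadings (COD, BOD, DO):", [round(v, 2) for v in loadings])
```

In this toy data set COD and BOD load on the component with the opposite sign to DO, which is the kind of pattern that is usually read as an organic pollution factor.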

Discriminant Analysis
DA was used for discriminating between and among groups by applying discriminating variables. These variables measure characteristics on which the groups are expected to differ [44]. DA applies a linear equation, as in regression analysis, to raw data with prior knowledge of the membership of objects in particular clusters and provides a statistical classification of samples, expressed in the following equation [43,45]:

$$f(G_i) = K_i + \sum_{j=1}^{n} W_{ij}P_{ij}$$

where $K_i$ is a constant specific to each particular group, i indexes the groups (G), n is the number of parameters used in group classification, and $W_{ij}$ is the weight coefficient assigned by DA to the specific parameter ($P_{ij}$). Independent variables are entered into DA either all together or stepwise, using both backward and forward approaches. In the first approach of variable entry, the discriminant function is calculated by entering all the independent variables at once. This approach is used when there are a limited number of independent variables, in the interest of discovering how well certain variables perform as discriminants in the absence of others. The stepwise method, on the other hand, involves entering the independent variables into the discriminant function (DF) one at a time. The order of entry is based on relative importance: variables with greater discriminant weights with respect to the clusters are entered first [46].
In this study, standard, forward, and backward stepwise approaches of DA were applied to each matrix of the primary data. In the forward stepwise mode, discriminant function analysis (DFA) variables were added stepwise until no significant change occurred, while in the backward stepwise mode, variables were removed starting from least significant until a significant change occurred. For this purpose, two groups obtained from CA were selected for spatial evaluations [35].
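The way a fitted discriminant function is used for classification can be sketched as follows. The example only evaluates functions of the form f(G_i) = K_i + Σ W_ij P_ij and assigns a sample to the group with the highest score; it does not fit the coefficients or perform stepwise selection. The constants and weights are purely hypothetical and are not the coefficients obtained in this study.

```python
def discriminant_score(constant, weights, params):
    """f(G_i) = K_i + sum_j W_ij * P_ij for one group."""
    return constant + sum(w * params[name] for name, w in weights.items())


# Hypothetical classification functions for the two CA groups.
FUNCTIONS = {
    "MP": {"K": -4.0, "W": {"EC": 0.002, "DO": 1.2, "COD": 0.005}},
    "HP": {"K": -12.0, "W": {"EC": 0.006, "DO": 0.3, "COD": 0.03}},
}


def classify(params):
    """Assign the sample to the group with the largest function value."""
    scores = {
        group: discriminant_score(f["K"], f["W"], params)
        for group, f in FUNCTIONS.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical samples: a lake-like station and an effluent-like station.
lake_sample = {"EC": 800.0, "DO": 4.2, "COD": 130.0}
effluent_sample = {"EC": 2500.0, "DO": 2.0, "COD": 400.0}

print(classify(lake_sample))
print(classify(effluent_sample))
```

Comparing the predicted group with the group assigned by CA for every station gives the classification matrix that is typically reported for DA.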

Pollution Index (PI)
Pollution index (PI) is a simple technique to examine surface water quality and was applied by the Taiwan EPA. The parameters employed to determine PI, namely DO, BOD, SS, and NH3−N, were each classified into four index scores (Table 3), and PI was computed using the equation formulated by [47,48]. In particular, PI is the arithmetic mean of the index values of these water quality parameters. PI classifies water quality into four categories: (0-2) for good or non-polluted, (2-3) for slightly polluted, (3-6) for moderately polluted, and (>6) for highly polluted. Anthropogenic activities have been associated with water quality degradation [47,49].
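A minimal sketch of the PI calculation is given below. Table 3 is not reproduced in this section, so the score thresholds are an assumption based on the commonly cited Taiwan EPA River Pollution Index (scores of 1, 3, 6, and 10) and should be checked against Table 3; the treatment of values lying exactly on a class boundary is also an assumption. The sample values are hypothetical.

```python
def score_do(do_mg_l):
    """Index score for dissolved oxygen (higher DO is better)."""
    if do_mg_l >= 6.5:
        return 1
    if do_mg_l >= 4.6:
        return 3
    if do_mg_l >= 2.0:
        return 6
    return 10


def score_upper(value, limits):
    """Index score for a parameter where lower values are better."""
    for limit, score in zip(limits, (1, 3, 6)):
        if value <= limit:
            return score
    return 10


# Assumed upper limits (mg/L) for scores 1, 3, and 6; verify against Table 3.
LIMITS = {"BOD": (3.0, 4.9, 15.0), "SS": (20.0, 49.0, 100.0),
          "NH3N": (0.5, 0.99, 3.0)}


def pollution_index(do, bod, ss, nh3n):
    """Arithmetic mean of the four index scores."""
    scores = [score_do(do), score_upper(bod, LIMITS["BOD"]),
              score_upper(ss, LIMITS["SS"]), score_upper(nh3n, LIMITS["NH3N"])]
    return sum(scores) / len(scores)


def pollution_class(pi):
    if pi <= 2:
        return "non-polluted"
    if pi <= 3:
        return "slightly polluted"
    if pi <= 6:
        return "moderately polluted"
    return "highly polluted"


# Hypothetical river-like and effluent-like samples.
pi_river = pollution_index(do=6.0, bod=6.9, ss=30.0, nh3n=0.03)
pi_effluent = pollution_index(do=2.0, bod=116.2, ss=150.0, nh3n=4.72)
print(pi_river, pollution_class(pi_river))
print(pi_effluent, pollution_class(pi_effluent))
```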

Cluster Analysis
Hierarchical agglomerative CA was carried out on the normalized data set using Ward's approach, where Euclidean distances were used as the measure of similarity among samples, and a distance was represented by the difference between analytical values. In hierarchical clustering, clusters are formed sequentially, with lower-level clusters being merged into successively higher ones [23,45,50–52]. In cluster analysis, cases are classified into classes based on the similarity between two samples, which is usually given by the Euclidean distance between the analytical values of the two samples. The squared Euclidean distance can be calculated by [53]:

$$d^2(Q_i, Q_k) = \sum_{j=1}^{p} \left(X_{ij} - X_{kj}\right)^2$$

where $Q_i$ is the ith object, $X_{ij}$ is the value of the jth variable of the ith object, $Q_k$ is a second object, and p is the number of variables.
The dendrogram provides a visual summary of the clustering process to classify a sample of entities into a smaller number of mutually exclusive groups on the basis of multivariate similarities among entities [33].
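The clustering step can be illustrated with a small, self-contained sketch of agglomerative clustering under Ward's criterion, in which the two clusters whose merger causes the smallest increase in within-cluster sum of squared Euclidean distances are joined at each step. The station labels are borrowed from the study only for readability; the standardized values are hypothetical, and a statistical package would normally be used for real data and for drawing the dendrogram.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two observation vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors):
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if a and b are merged."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * squared_euclidean(centroid(a), centroid(b))


def ward_clusters(stations, n_clusters):
    """Merge stations (label -> vector) until n_clusters remain."""
    clusters = [[label] for label in stations]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost([stations[s] for s in clusters[i]],
                                 [stations[s] for s in clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical standardized (COD, DO) values for five stations.
stations = {
    "MS1": (-0.6, 0.8), "MS2": (-0.7, 1.0), "MS7": (-0.4, 0.3),
    "MS4": (1.5, -1.2), "MS5": (1.3, -1.0),
}
groups = ward_clusters(stations, n_clusters=2)
print(groups)
```

Stopping the merging at two clusters mirrors the two-group (MP and HP) partition discussed in the study, although the grouping here reflects only the invented values.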
Therefore, CA, DA, PCA, and pollution index were applied in this study to identify the underlying interrelationship among the parameters and monitoring stations. CA was applied based on prior knowledge of monitoring stations and the results of DA and pollution index to accurately cluster monitoring stations. PCA was employed to qualitatively identify pollution sources and the type of contaminants contributing to pollution.

Correlation Matrix Evaluation and Seasonal Variation
Correlation coefficients are used to portray the relationship among variables and to measure the strength of association between pairs of water quality variables [54,55]. Correlation analysis measures the closeness of the relationship between the identified dependent and independent variables. Correlation coefficients that are close to −1 or +1 demonstrate a strong linear correlation between x and y. The correlation between the parameters is referred to as strong from (+0.8 to 1.0) or (−0.8 to −1.0), moderate from (+0.5 to 0.8) or (−0.5 to −0.8), and weak from (+0.0 to 0.5) or (−0.0 to −0.5) [56]. Where the correlation coefficient between two variables is zero, there is no linear correlation between them at a significance level of p < 0.05 [57]. In this study, a correlation matrix was constructed for each of the dry and wet seasons using the physicochemical parameters. Pearson's correlation coefficient (r) was determined from the correlation matrix to identify the highly correlated and interrelated water quality parameters. To test the significance of each pair of parameters, the p-value was determined.
A strong positive correlation was found between PO4−P and TN, with r = 0.825 at p < 0.005, moderate positive correlations were found between PO4−P and COD, BOD, TP, and temperature (r = 0.712, r = 0.709, r = 0.730, r = 0.602, r = 0.594; p < 0.05), and a moderate negative correlation was observed between PO4−P and DO values (r = −0.793; p < 0.05). No statistically significant correlation was found between pH and NO3−N and the rest of the parameters of LHW (p > 0.05).
A strong positive correlation was found between PO4−P and TP, with r = 0.921 at p < 0.005, moderate positive correlations were found for PO4−P with BOD, COD, TP, Na+, and temperature (r = 0.749, r = 0.647, r = 0.680, r = 0.76; p < 0.05), and a moderate negative correlation was found between PO4−P and DO values (r = −0.626; p < 0.05) (Table 5).
Table 5. Correlation matrix Pearson (r) and alpha (p) values for dry season.
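The correlation step can be sketched as follows: Pearson's r is computed for a pair of variables and then labelled with the strength classes quoted above. The temperature and DO values are hypothetical, the assignment of the boundary values 0.5 and 0.8 to the higher class is an assumption, and the p-value (which requires the t-distribution) is not computed here.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strength(r):
    """Label |r| as strong (>= 0.8), moderate (>= 0.5), or weak."""
    magnitude = abs(r)
    if magnitude >= 0.8:
        return "strong"
    if magnitude >= 0.5:
        return "moderate"
    return "weak"


# Hypothetical paired observations at six stations.
temperature_c = [21.5, 22.0, 23.4, 24.1, 25.0, 26.2]
do_mg_l = [6.1, 5.8, 4.9, 4.4, 3.6, 2.9]

r = pearson_r(temperature_c, do_mg_l)
print(f"r = {r:.3f} ({strength(r)} {'negative' if r < 0 else 'positive'})")
```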

The pH of the rivers was 7.4 (7.1 to 7.6) in the dry season and 8.2 (7.5 to 8.7) in the wet season, and the pH of the lake was 8.2 (7.3 to 8.9) in the dry season and 8.5 (7.5 to 9) in the wet season. The pH of point sources was 8.3 (7.1 to 9) in the dry season and 8.3 (8.1 to 8.7) in the wet season. The recommended pH as per the standard for drinking, irrigation, and aquatic life is 6.5-8.6, and the pH of LHW was within the accepted limit (Table 6).
The NO3−N concentration of the rivers was 0.5 mg/L, that of Lake Hawassa was 1.4 mg/L, and that of point sources was 1.5 mg/L in the dry season. In the wet season, the NO3−N concentration was 0.7, 1.9, and 1.9 mg/L for rivers, Lake Hawassa, and point sources, respectively. The value of NO3−N increases in the rainy season due to the contribution of agricultural runoff and the use of fertilizers. The PO4−P concentration of the rivers was 6.5 mg/L, that of Lake Hawassa was 3.3 mg/L, and that of point sources was 43.8 mg/L in the dry season. In the wet season, the PO4−P concentration was 7.4, 2.9, and 25.7 mg/L for rivers, Lake Hawassa, and point sources, respectively (Table 6).
Similarly, Gebre-Mariam [58] reported that Ethiopian Rift Valley lakes generally have lower EC values in the rainy season than in the dry season, due to dilution by rain coupled with minimal evaporation rates during the rainy season.
The TN (TP) of the rivers was 8 (0.12) mg/L in the dry season and 5 (0.26) mg/L in the wet season, and the TN (TP) of the lake was 5.3 (0.2) mg/L in the dry season and 5.2 (0.6) mg/L in the wet season. Hence, there is an obvious increase of TN in the rivers and Lake Hawassa when temperature increases, due to lower dilution and a greater agricultural contribution from upstream irrigation, whereas TP in the rivers and Lake Hawassa increases in the wet season due to greater agricultural, rural, and urban runoff. The TN (TP) from point sources was 31.8 (7.2) mg/L in the dry season and 13.9 (5.4) mg/L in the wet season. This shows that the TN (TP) of point sources increases significantly with increasing temperature due to lower dilution. The NH3−N of the rivers was 0.2 mg/L, that of Lake Hawassa was 0.83 mg/L, and that of point sources was 4.72 mg/L in the dry season. In the wet season, the NH3−N values were 0.03, 0.71, and 3.6 mg/L for rivers, Lake Hawassa, and point sources, respectively. The decrease in NH3−N level in the rainy season might be due to the dilution effect (Table 6).
The positive correlation between temperature and TN, TP, EC, TDS, NH3−N, and PO4−P indicates that the concentration of nutrients increases as the temperature increases (dry period). It also suggests that the major contributors of nutrients were the point sources, which release a relatively higher amount of pollutants than the agricultural and other sources, as these values are lower during the wet season due to the dilution effect. However, the increase in nutrient (NO3−N) concentration in the rivers and Lake Hawassa in the wet season might be due to the increased contribution of agricultural runoff and the use of fertilizers.
Sodium, calcium, magnesium, and potassium concentrations of the rivers were 49. (Table 6). A decrease in ions was observed when the temperature decreased in the study area. This can be ascribed to the discharge of industrial and domestic effluents, which contribute large amounts of alkaline ions to the river system, as the conductivity depends mainly on the ion concentration in surface water [52]. The natural range of sodium ions in water and soil is so low that their presence can indicate river pollution caused by human activities. Calcium is added to water from soil, industrial wastes, and natural resources. Magnesium is an essential nutrient required for numerous biochemical and physiological functions [59].
The TDS of water generally increases with the level of dissolved pollutants (such as nitrate, ammonium, and phosphate). The conductivity of ions in water depends on water temperature, as ions move faster when water is warm. Hence, conductivity increases when water has a higher temperature [60]. In addition, Taylor et al. [61] pointed out a strong relationship between these variables or ions, such as nitrate, ammonium, and phosphate, and stated that high EC values indicate high concentrations of soluble salts. There are strong correlations between EC and TDS, as evidenced by an increase in conductivity as the concentration of all dissolved constituents increases [62] (Table 6).
The BOD (COD) of the rivers was 19.7 (96.5) mg/L in the dry season and 6.9 (89.4) mg/L in the wet season, and the BOD (COD) of the lake was 28.1 (133.3) mg/L in the dry season and 19.1 (112.9) mg/L in the wet season. The BOD (COD) concentrations for point sources were 116.2 (398.6) mg/L in the dry season and 111.6 (353.7) mg/L in the wet season (Table 6). The DO of the rivers was 3.5 mg/L in the dry season and 6 mg/L in the wet season, and the DO of the lake was 4.2 mg/L in the dry season and 4.4 mg/L in the wet season. The DO of point sources was 2 mg/L in the dry season and 2.3 mg/L in the wet season (Table 6).
The DO of the rivers in the dry season and of Lake Hawassa was well below the standard value. This indicates that the discharge of industrial and domestic effluents has resulted in serious organic pollution of these water bodies, as the decrease in DO was mainly caused by the decomposition of organic compounds. Moreover, an extremely low DO content usually indicates the degradation of an aquatic system [63].
DO showed a negative correlation with most parameters in both dry and rainy seasons, indicating that DO decreases as the other water quality parameters increase. This could explain the temporal variations, as more oxygen was available for reaction with the pollutants, especially metals and organic pollutants, during dry seasons. Additionally, the temporal variation in the water quality of LHW was affected by DO. DO was strongly correlated with organic matter, nutrients, and metals; thus, seasonal variation should be considered when DO is used as an indicator to evaluate surface water quality. Low DO is primarily the result of excessive algal growth caused by nutrients: as the algae die and decompose, the process consumes dissolved oxygen, which may leave insufficient DO for fish and other aquatic life. Temperature was significantly correlated with water quality parameters such as EC, TDS, TP, PO 4 −P, and DO in both seasons. Temperature had a significant negative correlation with DO in the dry and wet seasons, indicating that when water temperature increases, the metabolic rate of microorganisms also increases, and the amount of DO in the water decreases. This might be because faster biodegradation of organic matter during dry seasons can effectively improve water quality. The solubility of oxygen is also inversely related to temperature: warmer water becomes saturated at a lower oxygen concentration and hence holds less DO during the dry season. Singh et al. [32] likewise observed an inverse relationship between temperature and DO in natural processes, as water can hold less DO with increasing temperature.
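The correlation analysis discussed above can be illustrated with a minimal sketch of the Pearson coefficient. The function name `pearson_r` and the paired readings are hypothetical and are not taken from Table 6; the sketch only shows how an inverse temperature-DO relationship appears as a coefficient close to −1.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    covariance_sum = sum(a * b for a, b in zip(dx, dy))
    return covariance_sum / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Hypothetical paired readings (illustration only, NOT values from Table 6).
temperature_c = [18.0, 20.0, 22.0, 24.0, 26.0]
do_mg_per_l = [6.0, 5.6, 5.1, 4.4, 4.0]

r_temp_do = pearson_r(temperature_c, do_mg_per_l)
print(f"r(temperature, DO) = {r_temp_do:.3f}")
```

A coefficient alone does not establish significance; in practice it would be paired with a p-value for the number of samples available.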

Pollution Index (PI)
The mean pollution index of the rivers in the lake watershed was 4.5 in the dry season and 3.3 in the wet season, indicating a moderately polluted condition of the rivers. The PI of Lake Hawassa was 5 in both the dry and rainy seasons, indicating that the lake was moderately polluted. Anthropogenic activities were causing deterioration of the water quality of the rivers and Lake Hawassa, and the overall status of the water quality is moderately polluted. The PI for the point sources was calculated for comparison purposes; they were found to be highly polluted, with PI values of 6.8 and 7.3 for the wet and dry seasons, respectively (Table 7).

Cluster Analysis Spatial and Temporal Similarities
Cluster analysis was applied to determine whether the monitoring stations had similar characteristics in terms of water quality parameters. It was applied to the water quality data set to group comparable monitoring sites (spatial variability) spread over the watershed. Results from CA display high homogeneity within clusters and high heterogeneity between clusters [64]. Hierarchical agglomerative CA was carried out on the normalized data set employing Ward's method, using Euclidean distances as a measure of similarity. Ward's method uses an analysis of variance approach to evaluate the distances between clusters, attempting to minimize the sum of squares of any two clusters that can be formed at each step. The clusters are formed sequentially, beginning with the most similar pair of objects and forming higher-level clusters step by step, and the result is displayed as a dendrogram [2,65].
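The clustering procedure described above (z-score normalization, Euclidean distance, Ward linkage, and a cut into two groups) can be sketched as follows. This is a minimal illustration using SciPy, not the software used in the study; the helper name `ward_clusters`, the station labels, and the concentrations are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def ward_clusters(data, n_clusters=2):
    """Z-score each parameter (column), build a Ward tree, and cut it into n_clusters."""
    data = np.asarray(data, dtype=float)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    tree = linkage(z, method="ward", metric="euclidean")
    return fcluster(tree, t=n_clusters, criterion="maxclust")

# Hypothetical stations x parameters (EC, BOD, DO); NOT values from Table 6.
stations = ["R1", "R2", "L1", "L2", "P1", "P2"]
values = [
    [300, 15, 5.5],
    [350, 20, 5.0],
    [800, 28, 4.2],
    [820, 25, 4.4],
    [2500, 115, 2.0],
    [2700, 120, 2.3],
]

labels = ward_clusters(values, n_clusters=2)
for name, label in zip(stations, labels):
    print(name, "-> cluster", label)
```

The linkage matrix returned by `linkage` can also be passed to `scipy.cluster.hierarchy.dendrogram` to draw a tree of the kind shown in Figure 2.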
The dendrogram presents a visual summary of the clustering processes and provides a map of the groups with a dramatic reduction in the dimensionality of the original records [2,5,32,43,44]. The CA grouped all 19 monitoring stations into two statistically significant clusters for the dry and wet seasons in LHW, and the dendrogram displays the grouping of stations for the wet and dry seasons, as shown in Figure 2. For both the dry and wet seasons, monitoring stations from most of the upstream watershed, from the eastern and western sides of the lake, and from the center of Lake Hawassa were grouped in Cluster 1. Stations in this cluster consist of rivers and Lake Hawassa; they are MS1-MS3, MS6-MS14, and MS16-MS18. This cluster received pollution from point sources and non-point sources, consisting of animal waste and runoff. It is characterized by a moderate anthropogenic effect and labelled as moderately polluted. Cluster 2 comprises four monitoring stations in the middle part of the LHW: MS4, MS5, MS15, and MS19. Four point sources, specifically the BGI, Pepsi Factory, Referral Hospital, and Industrial Park monitoring stations, were assigned to this cluster. Consequently, this cluster is characterized by comparatively heavy pollution.

Within Cluster 1, the pollution sources for monitoring stations MS1-MS3 were mainly anthropogenic activities from non-point pollution sources such as agricultural and sewage pollution, whereas pollution sources for monitoring stations MS6 (Tikur-Wuha River) and Lake Hawassa (MS7-MS14, MS16-MS18) were mainly industrial pollution, dispersed point sources, agricultural pollution, urban runoff, and sewage pollution.
Owing to their relative sources, all stations in this cluster were rivers and lakes, suggesting that clustering is reasonable for both dry and wet seasons.
The spatial trend of water quality was generally driven by anthropogenic activities from point and non-point sources of pollution, especially anthropogenic activities with respect to pollutant loading and land use.

Discriminant Analysis
Discriminant analysis (DA) was used to evaluate the spatial variations in water quality and to distinguish the most critical parameters in relation to variations between clusters. Both the standard and stepwise modes were applied to the primary data by dividing them into wet and dry seasons, and the two spatial groups resulting from CA were used in DA. In this case, the WQ parameters were treated as independent variables, while the clusters were considered as dependent variables. The confusion matrices (CM) showed that 100%, 100%, and 100% of the data points were correctly classified in the standard, forward stepwise, and backward stepwise modes for both dry and wet seasons, respectively (Table 8). The standard DA mode builds discriminant functions (DFs) using eighteen parameters, whereas only three and seven parameters were needed to distinguish between the two pollution groups in the forward stepwise and backward stepwise modes, respectively, for both dry and wet seasons. In forward stepwise mode, most of the parameters (turbidity, TDS, pH, NH 3 −N, NO 3 −N, PO 4 −P, DO, COD, NO 2 −N, TN, TP, temperature, Mg 2+ , Ca 2+ , and K + ) were insignificant variables contributing little variation, and they were deleted in the further process. The three significant variables that distinguished the two pollution groups with 100% correct assignation in the forward stepwise mode were EC, BOD, and Na + . The backward stepwise mode deleted the least significant variables and identified seven significant variables: EC, DO, COD, TN, TP, Na + , and K + . These seven parameters, which gave 100% correct assignation, were the critical parameters for distinguishing the two pollution groups. This implies that the expected spatial variation in water quality can be explained sufficiently using the variables EC, DO, COD, TN, TP, Na + , and K + .
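The percentage of correctly classified stations reported from a confusion matrix can be computed as in the following minimal sketch. The cluster sizes (15 MP and 4 HP stations) follow the CA grouping given in the text; the predicted labels and the helper names are hypothetical and only illustrate the arithmetic, not the output of the study's DA.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows are actual groups, columns are groups assigned by the classifier."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

def percent_correct(matrix):
    """Share of observations on the diagonal, as a percentage."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total

labels = ["MP", "HP"]
actual = ["MP"] * 15 + ["HP"] * 4      # cluster membership from the CA
perfect = list(actual)                 # hypothetical: every station assigned correctly
one_error = ["MP"] * 15 + ["MP"] + ["HP"] * 3  # hypothetical: one HP station misassigned

cm_perfect = confusion_matrix(actual, perfect, labels)
cm_one_error = confusion_matrix(actual, one_error, labels)
print(cm_perfect, percent_correct(cm_perfect))
print(cm_one_error, round(percent_correct(cm_one_error), 1))
```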
Wilks' lambda shows that the discriminant distribution is skewed towards high concentrations.
On the other hand, the standard DA functions were constructed using eighteen parameters, of which three and four parameters were used for forward stepwise mode and backward stepwise mode, respectively, for the wet season. In forward stepwise mode, the pollutants that were found to be insignificant variables and had less variation in terms of their spatial distribution were deleted in the further process, and the three significant variables that distinguished the two pollution groups with 84.5% correct assignment were EC, Na + , and COD. The backward stepwise mode deleted the least significant variables and identified two significant variables: EC and Ca 2+ . These two parameters were the critical parameters for distinguishing the two pollution groups with 87.5% correct assignation (Table 8). This implies that the spatial water quality variation can be sufficiently explained by using the variables EC, Na + , COD, and Ca 2+ , with the Wilks' lambda value showing that the discriminatory distribution is skewed toward high concentration, as shown in Figure 3.
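For readers unfamiliar with Wilks' lambda, the statistic is the ratio of the determinant of the within-group sum-of-squares matrix to that of the total sum-of-squares matrix; values near 0 indicate well-separated groups and values near 1 indicate little separation. The sketch below is a generic NumPy illustration of that textbook definition, with hypothetical EC and Na + values; it is not the computation behind Table 8 or Figure 3.

```python
import numpy as np

def wilks_lambda(x, groups):
    """Wilks' lambda = det(W) / det(T) for observations x and group labels."""
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    centered = x - x.mean(axis=0)
    total = centered.T @ centered          # total sums of squares and cross-products
    within = np.zeros_like(total)
    for g in np.unique(groups):
        xg = x[groups == g]
        dg = xg - xg.mean(axis=0)
        within += dg.T @ dg                # pooled within-group matrix
    return float(np.linalg.det(within) / np.linalg.det(total))

# Hypothetical (EC, Na+) observations; NOT values from the study.
obs = [
    [400, 30], [450, 38], [500, 33], [550, 41],          # moderately polluted
    [2000, 150], [2300, 160], [2100, 175], [2500, 168],  # highly polluted
]
lam_separated = wilks_lambda(obs, ["MP"] * 4 + ["HP"] * 4)
lam_mixed = wilks_lambda(obs, ["A", "B"] * 4)  # arbitrary grouping for contrast
print(lam_separated, lam_mixed)
```

The arbitrary grouping mixes both pollution levels in each group, so its lambda is much closer to 1 than that of the true grouping.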

Pollution Source Identification of Monitored Variables
Principal Component Analysis
PCA was applied to the normalized data and identified three principal components (PCs) using the Kaiser criterion [66] and loadings higher than 0.5. Scree plots are widely used to identify the number of PCs to be retained to understand the underlying data structure [26]. Based on the scree plot and the eigenvalue >1 criterion, three factors were chosen as principal factors. Components with eigenvalues lower than 1 were removed due to their low significance [67].
In this study, the scree plot (Figure 4) shows the eigenvalues sorted from large to small as a function of the number of PCs. The plot shows a pronounced change in slope after the third eigenvalue; hence, three components were retained (Table 9), and the components after the third PC (Figure 4a,b) were discarded. The scree plot was thus used to determine the number of PCs to be retained in order to understand the underlying data structure [25]. Consequently, a new set of data is obtained that may explain the variation of the data set with fewer variables.
Moreover, scree plots are used to visually evaluate which components or factors elucidate the maximum variability in the data.
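The retention rule described above (eigenvalues of the correlation matrix greater than 1) can be sketched with NumPy as follows. The data are synthetic and seeded only for reproducibility, the function name `kaiser_pca` is hypothetical, and the loadings are unrotated, whereas the study reports varimax-rotated factors; the sketch therefore illustrates the criterion, not the published loadings.

```python
import numpy as np

def kaiser_pca(data):
    """PCA on the correlation matrix, keeping components with eigenvalue > 1."""
    data = np.asarray(data, dtype=float)
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]           # sort from large to small
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    n_retained = int(np.sum(eigenvalues > 1.0))     # Kaiser criterion
    explained = eigenvalues / eigenvalues.sum()     # fraction of total variance
    loadings = eigenvectors[:, :n_retained] * np.sqrt(eigenvalues[:n_retained])
    return eigenvalues, n_retained, explained, loadings

# Synthetic data: 19 "stations" x 6 "parameters" driven by 2 latent sources.
rng = np.random.default_rng(0)
latent = rng.normal(size=(19, 2))
mixing = rng.normal(size=(2, 6))
data = latent @ mixing + 0.3 * rng.normal(size=(19, 6))

eigenvalues, n_retained, explained, loadings = kaiser_pca(data)
print("eigenvalues:", np.round(eigenvalues, 3))
print("retained components:", n_retained)
print("cumulative variance:", np.round(np.cumsum(explained), 3))
```

Plotting `eigenvalues` against the component number gives the scree plot; the elbow and the eigenvalue >1 rule are then compared, as in the text.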
Figure 4. Scree plots (panels a and b).

Table 9. Matrix of factor loadings calculated based on water quality parameters measured in the period from May to January in the Lake Hawassa Watershed and factor loadings of variables on the first three PCs extracted by using eigenvalue for both wet (a) and dry (b) seasons.

The PCA results, which include the loadings (the contribution of each original variable to the new one), are summarized in Table 9. The FA in LHW extracted three factors by retaining the PCs through varimax rotation that explained 72.5% of the total variance for the wet season. An eigenvalue provides a measure of the importance of a factor, and factors having the highest eigenvalues are the most significant. Eigenvalues of 1.0 or more are considered significant. Liu et al. [26] additionally categorized the factor loadings as 'strong', 'moderate', and 'weak', corresponding to absolute loading values of >0.75, 0.75-0.50, and 0.50-0.30, respectively.
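The loading categories of Liu et al. [26] can be expressed as a small helper, sketched below. Two details are assumptions rather than statements from the source: a loading of exactly 0.50 is treated as 'moderate', and loadings below 0.30 are given the label 'negligible'. The example loadings are those quoted for the wet season in this section.

```python
def loading_strength(loading):
    """Classify a factor loading by its absolute value (after Liu et al. [26])."""
    magnitude = abs(loading)
    if magnitude > 0.75:
        return "strong"
    if magnitude >= 0.50:      # assumption: the shared boundary 0.50 counts as moderate
        return "moderate"
    if magnitude >= 0.30:
        return "weak"
    return "negligible"        # assumption: label for loadings below 0.30

wet_season_examples = {
    "TDS (F1)": 0.974,
    "K+ (F1)": 0.477,
    "DO (F1)": -0.842,
    "Mg2+ (F2)": -0.654,
    "NH3-N (F2)": 0.516,
}
for name, value in wet_season_examples.items():
    print(f"{name}: {value:+.3f} -> {loading_strength(value)}")
```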
The first factor (F1), accounting for 46.8% of the total variance, showed strong positive loadings of TDS, EC, PO 4 −P, BOD, COD, TP, TN, Na + , and temperature with factor loadings of 0.974, 0.978, 0.871, 0.811, 0.784, 0.793, 0.898, 0.812, 0.825, and 0.832, respectively; a weak positive loading of K + (0.477); and a strong negative loading of DO (−0.842) (Table 9). The high positive loading of temperature and the high negative loading of DO might suggest the impact of seasonal variation, as temperature is inversely related to DO. The positive loadings of BOD and COD signify biodegradation of organic matter, which negatively affects the DO of water bodies. F1 thus clearly represents pollution by organic matter (BOD and COD) and nutrients, with oxygen depletion as a consequence. When the temperature of water bodies decreases, the biodegradation of organic matter decreases, and the solubility of oxygen in the water increases. Similar reports of high concentrations of BOD and COD exist elsewhere [42,44,45]. Similarly, the strong negative DO loading indicates that DO is consumed by the degradation of organic matter in rivers and lakes, which can drive them toward anaerobic conditions. F1 showed strongly positive loadings for both COD and BOD, while the loading for DO was strongly negative. This indicates a group of purely organic pollution indicator parameters from industrial effluents, domestic discharges, and livestock affecting water bodies [23,27,51].
High nutrient loadings of factors such as TN and TP represent pollution from point and non-point sources from industrial setup, agriculture areas, domestic sewage, and urban runoff. The high loading of metals demonstrates the influences of industrial effluents and agriculture activities. Phosphorus and nitrogen can originate from point sources such as sewage pollution, industrial facilities and livestock, as well as from non-point sources, mainly from agricultural activities, runoff from rural and urban areas, soil erosion, and livestock. These results are consistent with findings of other reports elsewhere [27,68]. The strongly positive loadings of Na + and weak positive loadings of K + are likely due to industrial effluents discharged into the river Tikur-Wuha and Lake Hawassa. Reports also indicate that the sources of Na + and K + might be domestic sources, fertilizers, and residential waste in addition to industrial effluents [69]. During field observation, it was found that the major industries are discharging their treated and untreated effluents directly into the Tikur-Wuha River and the lake during the rainy period when the flow rate is high, resulting in high dilution, but during the dry period, the dilution effect is lower and consequent pollution is higher.
On the other hand, the strong loadings of TN and TP in F1 suggest higher contribution from point sources in industry and non-point sources such as agricultural land use, urban drainage, and residential areas during the rainy season. In general, these factors are symbolic of a blended source of contamination, encompassing industrial discharges, urban runoff, and agricultural land use. The results are in agreement with those of other studies [5,24,67,69]. Hence, they can be considered as the contamination index for surface water [44,45].
The second factor (F2) explained 13.4% of the total variance. It had a moderately negative loading of Mg 2+ and Ca 2+ (−0.654, −0.627) and a moderately positive loading of NH 3 -N (0.516). This factor's moderately negative loading of Mg 2+ and Ca 2+ is likely to originate from industrial wastewater discharged into the Tikur-Wuha River and Lake Hawassa, usually from carbonate minerals, which are naturally present in the soils of the Lake Hawassa watershed. This factor is more pronounced at monitoring stations affected by point sources, agricultural lands, and rural and urban runoff, such as MS3 in the upper catchment, MS19 in the middle section (point source), and MS8, MS11, MS16, and MS18 monitoring stations on both eastern and western sides of Lake Hawassa.
A moderately positive loading of NH 3 −N (0.7) indicates biodegradation of organic matter. This variable is primarily from runoff, with high loading of solids and wastes from point sources of pollution from domestic and industrial areas. Furthermore, NH 3 −N is triggered by the decomposition of organic matter, indicating the discharge of domestic sewage to surface water. Studies elsewhere have showed comparable results [42,44,45,69,70].
The third factor (F3), explaining 12.3% of the total variance, had a moderately negative loading for pH (−0.710), suggesting the dominance of physical reactions by aquatic plants and natural weathering of the basin, possibly due to industrial impact from different sources [22]. It had weak positive loading of turbidity (0.452), moderate negative loading of NO 2 −N (−0.620), and moderate positive loading of NO 3 −N (0.507). NO 3 -N may additionally have derived from agricultural areas in the region, where inorganic nitrogen fertilizers are in common use and the role of domestic waste is strong, and hence, this component can be best explained by a "nutrient" factor representing influences from nonpoint sources such as agricultural runoff and the domestic pollution factor. The reports of Yilma et al. [35] in Ethiopia and Zhang et al. [27] elsewhere were comparable with this result. This factor is typical of the monitoring stations in the middle section including point sources and eastern and western sides of Lake Hawassa (MS4, MS10, and MS17), where domestic sewage, industrial effluents, and agricultural runoff are predominant.
The FA in LHW extracted three factors by retaining the PCs through varimax rotation that explained 70.5% of the total variance for the dry season. The first factor (F1), accounting for 45.7% of the total variance, showed strong positive loadings of TDS, EC, PO 4 −P, BOD, TP, Na + , and temperature, with factor loadings of 0.962, 0.961, 0.830, 0.796, 0.897, 0.783, and 0.973, respectively; moderate positive loadings of K + , COD, and TN (0.572, 0.721, 0.724); and a strong negative loading of DO (−0.847). The strong positive loading of temperature and the strong negative loading of DO might suggest the impact of seasonal variations. The strong and moderate positive loadings of BOD and COD signify biodegradation of organic matter, which negatively affects the DO of water bodies. F1 thus clearly represents pollution by organic matter (BOD and COD) and nutrients, with oxygen depletion as a consequence. High temperature increases biodegradation and reduces the solubility of oxygen in the water. This PC was correlated with COD and BOD 5 , indicating a group of purely organic pollution indicator parameters from uncontrolled domestic discharges caused by rapid urbanization and from industrial effluents. Biodegradation of organic matter alters the concentrations of BOD and dissolved oxygen in water [23,27,51].
A high loading of nutrients represents pollution from industrial setup and domestic wastewater. High loading of metals demonstrates the influences of industrial discharges. Phosphorus and nitrogen may originate from point sources such as sewage pollution, agricultural runoff in the upper stream due to irrigation, industrial facilities, and livestock. Consequently, this component is more likely to be explained by the combination of domestic pollution factors and industrial factors. Strongly positive loading of Na + and moderate positive loadings of K + are likely to originate from industrial effluents discharged directly into the Tikur-Wuha River and Lake Hawassa. These results are also supported by similar findings obtained elsewhere [27,69].
This factor is more pronounced at monitoring stations in the upper catchment (MS1 and MS3), monitoring stations in the middle section including point sources (MS4, MS5, MS15 and MS19), Tikur-Wuha River (MS6), and monitoring stations from both eastern and western sides of Lake Hawassa (MS9, MS10, MS14, MS16, and MS17), where domestic sewage, industrial effluents, and agricultural activities are predominant. The major industries discharge their treated and untreated effluents directly into Tikur-Wuha River and the lake during the dry period when the flow is low, which might lead to higher pollution. On the other hand, the strong loadings of TN and TP at F1 suggest a higher contribution of point sources from industrial facilities and agricultural runoff in the upper stream due to irrigation. Generally, these factors suggest a blended source of contamination encompassing municipal and industrial point source and livestock. This result is also confirmed by other studies [5,23,33,67,69]. Hence, it can be considered to be the contamination index for surface water [44,45].
The second factor (F2) explained 16% of the total variance and had a strong negative loading of turbidity (−0.781), moderate negative loadings of NO 2 −N and Mg 2+ (−0.567, −0.531), and moderate positive loadings of NO 3 −N and Ca 2+ (0.599, 0.524). NO 3 −N could be mainly from point sources, and the role of domestic waste is also strong. Hence, this component can be explained by the "nutrient" factor, which represents influences from non-point sources such as the domestic pollution factor [24,27,32,35,66,69]. A moderately positive loading of K + and a moderately negative loading of Mg 2+ in this factor likely originate from industrial discharges into the Tikur-Wuha River and Lake Hawassa. This PC is more influenced by industrial discharges and is more pronounced at monitoring stations in the LHW where industry is predominant. This factor is more pronounced at monitoring stations in the upper catchment (MS2) and on the eastern and western sides of Lake Hawassa (MS11, MS12, MS13, and MS18), where domestic, industrial, and agricultural activities are predominant in the upper stream due to irrigation.
The third factor (F3), explaining 8.8% of the total variance, had a strong positive loading of pH (0.775), suggesting the dominance of physical reactions by aquatic plants and natural weathering of the basin, and attributed to industrial impact from different sources [22]. A moderate positive loading of NH 3 −N (0.7) indicates the biodegradation of organic matter causing concentrations of waterborne factors such as NH 3 −N. This variable originated primarily from wastes from point sources of pollution from domestic and industrial areas. Furthermore, NH 3 −N is triggered by organic matter decomposition, indicating the discharge of domestic sewage to surface water. Reports elsewhere support the findings of this study [42,44,45,70]. This factor is more pronounced in monitoring stations on the eastern side of Lake Hawassa (MS7 and MS8), where domestic sewage, industrial effluents, and agricultural activities are prevalent.
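As a quick arithmetic check, the variance shares reported for the three factors can be accumulated per season and compared with the totals quoted in the text (72.5% for the wet season and 70.5% for the dry season). The dictionary names below are illustrative; the percentages are those stated in this section.

```python
from itertools import accumulate

# Percentage of total variance explained by each factor, as reported in the text.
wet_season = {"F1": 46.8, "F2": 13.4, "F3": 12.3}
dry_season = {"F1": 45.7, "F2": 16.0, "F3": 8.8}

def cumulative_variance(shares):
    """Running total of explained variance, rounded to one decimal place."""
    return [round(total, 1) for total in accumulate(shares.values())]

wet_cumulative = cumulative_variance(wet_season)
dry_cumulative = cumulative_variance(dry_season)
print("wet:", wet_cumulative)
print("dry:", dry_cumulative)
```

The final entries of the two lists reproduce the reported totals of 72.5% and 70.5%.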
The bi-plots of PCs on the key parameters TDS, EC, PO 4 −P, DO, BOD, COD, TN, TP, temperature, Na + , K + , turbidity, NO 2 −N, NO 3 −N, Mg 2+ , and Ca 2+ that characterize monitoring stations from rivers in the upper and middle catchment, point sources in the middle catchment, and the eastern and western sides of Lake Hawassa are presented in Figure 5a,b for the dry and wet seasons. The average values of EC, TDS, BOD, COD, Na + , K + , Mg 2+ , Ca 2+ , and NH 3 −N of point sources were considerably higher than those of rivers in the upper and middle catchment (MS1-MS3 and MS6) and Lake Hawassa (MS7-MS14, MS16 and MS18) (Table 6). In addition, NO 3 −N, NO 2 −N, TN, TP, and PO 4 −P were the main parameters characterizing the stated monitoring sites in both seasons. These stations predominantly include rural areas, urban and peri-urban areas, and industrial sites from which domestic sewage, urban runoff, and effluents are discharged into the lake. Furthermore, the influence of agricultural activities in the upper catchment and the Tikur-Wuha River feeding the lake was evident. The results of this investigation were comparable to the findings of the studies conducted by Tibebe et al. [71] and Meshesha et al. [72] on Lake Ziway. In particular, higher EC and TDS values were recorded for similar monitoring stations in both seasons (Table 6). In an aquatic environment, EC is used to categorize the pollution status of surface waters, and an increase in conductivity indicates the presence of dissolved ions that can affect aquatic life and water quality [73].

Total Nitrogen to Total Phosphorus (TN:TP) Ratio
The TN:TP ratio in lakes and reservoirs is a key indicator, as it shows which of these nutrients is in excess or limiting to growth, and it was used to estimate the nutrient limitation in the lake. According to Smith [74], blue-green algae (cyanobacteria) tend to dominate in a lake when the TN:TP ratio is less than 29 and tend to be rare when TN:TP > 29. On the other hand, Fisher et al. [75] used a more conservative TN:TP ratio: a ratio > 20 indicates phosphorus limitation, a ratio < 10 indicates nitrogen limitation, and a ratio of 10 to 16 indicates that either phosphorus or nitrogen (or both) limits growth. The estimated ratio for Lake Hawassa was 31, which is higher than both thresholds (20 and 29), suggesting that cyanobacteria dominance in the lake is likely to be rare. The TN:TP ratio > 20 in Lake Hawassa indicates that phytoplankton growth in the lake might be phosphorus-limited.
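The two sets of thresholds quoted above can be applied with a short helper, sketched below. The function names are hypothetical; the text does not classify ratios between 16 and 20 or a ratio of exactly 29, so the sketch reports the former as unclassified and, as an assumption, treats the latter as 'rare'.

```python
def fisher_limitation(tn_tp):
    """Nutrient limitation after the thresholds attributed to Fisher et al. [75]."""
    if tn_tp > 20:
        return "phosphorus-limited"
    if tn_tp < 10:
        return "nitrogen-limited"
    if tn_tp <= 16:
        return "phosphorus- or nitrogen-limited (or both)"
    return "not classified in the text"  # ratios above 16 and up to 20

def smith_cyanobacteria(tn_tp):
    """Expected cyanobacteria status after the threshold attributed to Smith [74]."""
    # Assumption: a ratio of exactly 29 is grouped with the 'rare' case.
    return "tend to dominate" if tn_tp < 29 else "tend to be rare"

lake_hawassa_ratio = 31  # estimated TN:TP ratio reported in the text
print(fisher_limitation(lake_hawassa_ratio))
print(smith_cyanobacteria(lake_hawassa_ratio))
```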

Conclusions
Multivariate statistical techniques help researchers to scrutinize the relationships between parameters in a broader fashion by applying different approaches such as cluster analysis, correlation, factor analysis, discriminant analysis, and multiple regressions to determine the association between dependent and independent variables. They reduce the dimensionality of data so that the whole picture can be visualized more easily than looking at specific cases allows. Furthermore, multivariate techniques provide powerful significance testing compared to univariate techniques. Despite their various merits, the results of multivariate statistical modeling are not easy to interpret and require a large data set to get meaningful results due to the high standard errors. In particular, PCA/FA is likely to lose information if PCs or factors are not chosen judiciously.
This study was conducted to evaluate seasonal and spatial variations in water quality and to identify potential sources of pollution using multivariate statistical techniques for the Lake Hawassa Watershed. The results of this study show that the condition of Lake Hawassa Watershed was classified into moderately and highly polluted categories in both dry and wet seasons. In data-limited developing countries such as Ethiopia, it is especially difficult to identify possible sources of certain contaminants, as this requires frequently monitored water quality data, which are often not available. To address this serious problem, this study applied MVST. Multivariate statistics were used to perform temporal and spatial assessment of surface water quality to reduce the number of monitoring stations and chemical parameters in LHW. In this study, we used Pearson correlation, PCA/FA, CA, and DA to evaluate spatial and temporal variance in surface water quality.
CA grouped the monitoring stations into two statistically significant clusters for the dry and wet seasons, labelled MP and HP, using PI. Accordingly, this resulted in a dendrogram with two clusters for the dry and wet seasons. The findings of the study revealed that rivers in the upstream and middle portion of the lake watershed and Lake Hawassa were moderately polluted (MP), while point sources (industries, hospitals, and hotels) in the middle of the LHW were found to be highly polluted (HP).
DA was used to identify the most critical parameters to investigate the spatial variations and extracted seven significant parameters: EC, DO, COD, TN, TP, Na+, and K+, with spatial variance to distinguish the pollution statuses of the groups obtained using CA.
PCA/FA techniques helped to identify the potential sources of water quality degradation. This study comprehensively analyzed the water quality of LHW and identified three significant sources responsible for pollution of Lake Hawassa Watershed in dry and wet seasons affecting the water quality. Accordingly, the pollution is due to mixed sources including point sources such as municipal and industrial effluents, natural processes, livestock, urban runoff, and non-point sources from agricultural activities.
Poor industrial effluent management combined with non-point sources from agriculture and urban runoff contribute significantly to the pollution of Lake Hawassa. Discharge of industrial effluents into the surface water system is the largest point source of anthropogenic pollution. Diffuse sources that contribute enormously to LHW come from agricultural activities, i.e., intensive farming and livestock (F1, F2, and F3).
We conclude that effective management of point and non-point source pollution is imperative to reduce pollutant inputs into the lake from domestic, industrial, livestock, and agricultural sources. Stringent management that requires a comprehensive application of technologies such as fertilizer management, ecological ditches, constructed wetlands, and buffer strips should complement proper municipal and industrial wastewater treatment set-up.
Furthermore, application of indigenous aeration practices such as the use of drop structures at critical locations would help improve water quality in the lake watershed.