Quantifying the Information Content of a Water Quality Monitoring Network Using Principal Component Analysis: A Case Study of the Freiberger Mulde River Basin, Germany

Abstract: Although river water quality monitoring (WQM) networks play an important role in water management, their effectiveness is rarely evaluated. This study aims to evaluate and optimize water quality variables and monitoring sites to explain the spatial and temporal variation of water quality in rivers, using principal component analysis (PCA). A complex water quality dataset from the Freiberger Mulde (FM) river basin in Saxony, Germany, comprising 23 water quality (WQ) parameters monitored at 151 monitoring sites from 2006 to 2016, was analyzed. The results showed that the water quality of the FM river basin is mainly impacted by weathering processes, historical mining and industrial activities, agriculture, and municipal discharges. Monitoring 14 critical parameters, namely boron, calcium, chloride, potassium, sulphate, total inorganic carbon, fluoride, arsenic, zinc, nickel, temperature, oxygen, total organic carbon, and manganese, could explain 75.1% of the water quality variability. Both sampling location and sampling time influenced water quality: mineral contents varied mainly between locations, whereas organic matter and oxygen content varied mainly with the month of sampling. The most critical monitoring sites were located in the vicinity of the city of Freiberg, and July and September were the most significant sampling months. In terms of cost-effectiveness, monitoring more parameters at fewer sites would be more economical than the opposite practice. This study illustrates a simple yet reliable approach to support water managers in identifying optimum monitoring strategies based on existing monitoring data when monitoring costs need to be reduced. It also establishes a simple framework for quantifying the cost-effectiveness of monitoring networks based on PCA results for the FM river basin.


Introduction
Rivers are the main inland freshwater source for domestic, industrial, and agricultural purposes [1]. As a result of the deleterious effects of human activities and population growth, about one-third of the river stretches in Latin America, Africa, and Asia have been affected by severe pathogen contamination, and one-seventh by organic pollution [2]. Natural processes such as precipitation, erosion, and weathering of crustal materials can also contribute to the impairment of

Study Area
Freiberger Mulde (FM) is a 124-km long siliceous river with a catchment area of 2985 km² [30]. It is a headstream and one of the three main tributaries of the Mulde River, which is one of the important western tributaries of the Elbe River in Germany. Rising in the Ore Mountains in the Czech Republic and flowing northwest, the FM river has historically been polluted with heavy metals from both geogenic sources and human activities, especially ore mining [31]. Even today, the river basin is still considered a major source of heavy metals to the Elbe River [30].
The monitoring program for surface water in the FM river basin is carried out in the context of the Water Framework Directive (WFD) and aims at collecting data for a status assessment of biological, chemical, and physicochemical water quality elements. A total of 463 water quality parameters have been monitored in the FM river basin since 1999, including general physicochemical parameters, industrial pollutants, pesticides, herbicides, and pharmaceuticals. The monitoring network in the FM river basin comprises 364 measuring points: 27 on the main stem of the FM River and 337 on the tributaries of its river network (Figure 1).

Data Selection and Preparation
The monitoring data for the FM river basin, which have been collected by the Free State of Saxony since 1999 and are freely accessible on its water quality database platform, were used for this research [32]. Preparing the dataset for multivariate statistical analysis consisted of selecting water quality variables and monitoring stations while minimizing missing values. The water quality parameters considered in the current analysis include chemical and physicochemical elements that explain catchment processes, such as the influence of both the drainage basin and local environmental conditions. For this reason, long-term and frequently monitored parameters were prioritized. Parameters with a data availability of less than 30% or with more than 15% censored data (concentrations below the detection limits and/or below the quantitation limits of the analytical methods) were excluded. Censored records were replaced by half of the detection limit and/or quantitation limit. According to the United States Environmental Protection Agency [33], this percentage of censored data is acceptable for a substitution method. The soluble concentrations in total water samples were used for the analysis. Maps of the river basin and monitoring stations were obtained from Saxony's open-access geodata portal [34].
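The screening rules above can be expressed as a small filter. The following is a minimal Python sketch, not the authors' code: the function name `screen_parameter` and the record layout are hypothetical, data availability is assumed to be the share of sampling events with a result, and the censored share is assumed to be computed over the available results.

```python
from typing import Optional

# One record per sampling event: None if the parameter was not measured,
# (value, False) for a quantified result, or (limit, True) for a result
# reported below the detection/quantitation limit `limit`.
Record = Optional[tuple[float, bool]]

def screen_parameter(records: list[Record],
                     min_availability: float = 0.30,
                     max_censored: float = 0.15) -> Optional[list[Optional[float]]]:
    """Return the cleaned series, or None if the parameter fails the screening."""
    measured = [r for r in records if r is not None]
    if not measured or len(measured) / len(records) < min_availability:
        return None                                   # too little data
    n_censored = sum(1 for _, is_censored in measured if is_censored)
    if n_censored / len(measured) > max_censored:
        return None                                   # too many results below the limits
    # Replace each censored result by half of its reported limit.
    return [None if r is None else (r[0] / 2 if r[1] else r[0]) for r in records]

# Hypothetical series: 7 of 8 events measured, 1 of 7 results censored (about 14%).
series = [(0.8, False), None, (0.1, True), (1.2, False),
          (0.9, False), (1.1, False), (0.7, False), (1.0, False)]
cleaned = screen_parameter(series)  # the censored 0.1 becomes 0.05
```

A parameter is dropped as a whole when it fails either rule, which mirrors the exclusion step described in the text; the substitution is applied only to parameters that pass.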
The selection of monitoring stations was based on the availability of monitoring data. In this study, we considered the monitoring stations that had at least three years of monitoring data and 12 sampling events (Figure 1). The monitoring years did not need to be consecutive, because the observations were assumed to be independent and identically distributed.
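The station criterion can likewise be sketched in a few lines of Python. This is an illustration under stated assumptions: a sampling event is taken to be a distinct (station, date) pair, and "three years of monitoring data" is read as three distinct calendar years, not necessarily consecutive; `select_stations` is a hypothetical name.

```python
from collections import defaultdict
from datetime import date

def select_stations(events: list[tuple[str, date]],
                    min_years: int = 3, min_events: int = 12) -> set[str]:
    """Keep stations with at least `min_events` distinct sampling dates spread over
    at least `min_years` distinct calendar years (years need not be consecutive)."""
    dates: dict[str, set[date]] = defaultdict(set)
    for station, day in events:
        dates[station].add(day)
    return {station for station, days in dates.items()
            if len(days) >= min_events and len({d.year for d in days}) >= min_years}

# Hypothetical stations: "A" has 12 events in three non-consecutive years,
# "B" has 24 events but only two years, "C" has three years but only 9 events.
events = [("A", date(y, m, 1)) for y in (2006, 2010, 2015) for m in (1, 4, 7, 10)]
events += [("B", date(y, m, 1)) for y in (2006, 2007) for m in range(1, 13)]
events += [("C", date(y, m, 1)) for y in (2006, 2007, 2008) for m in (3, 6, 9)]
selected = select_stations(events)
```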

Principal Component Analysis
Principal component analysis (PCA) is a popular multivariate statistical technique used for dimension reduction [35]. PCA identifies the most meaningful variables describing the whole dataset and reduces the data with a minimum loss of the original information [1]. PCA transforms the original variables into new, uncorrelated variables called principal components (PCs) [36]. The calculation of the PCs is given in Abdi and Williams [19]. In this study, PCA is based on the correlation matrix. When the variables are highly correlated, the first few principal components may be sufficient to describe most of the variability of the dataset [37]. The importance of a component is reflected by its eigenvalue, and PCs with eigenvalues less than one are commonly ignored [1,22,38]. To strengthen the interpretation, the PCs with eigenvalues greater than one are subjected to varimax rotation, which generates rotated components (RCs). RCs further simplify the data structure obtained from PCA [1,22]. Varimax rotation drives each loading toward either a large or a near-zero value, so that each variable loads strongly on as few components as possible, which makes the significant variables easy to interpret [24]. Because the rotation is performed within the subspace of the retained components, the variance is redistributed among them: the first rotated component explains less variance than the first principal component, but the total variance explained by the retained components remains the same after rotation [19].
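To make the steps concrete (correlation-matrix PCA, the eigenvalue-greater-than-one rule, and varimax rotation), here is a minimal NumPy sketch. It is not the R/psych workflow used in the study; the function names are hypothetical, and the varimax routine is a common SVD-based formulation with optional row (Kaiser) normalization, similar in spirit to R's `stats::varimax`, so rotated loadings may differ from psych output in sign or in small numerical details.

```python
import numpy as np

def varimax(L: np.ndarray, normalize: bool = True,
            max_iter: int = 500, tol: float = 1e-6) -> np.ndarray:
    """Orthogonal varimax rotation of a (variables x components) loading matrix."""
    p, k = L.shape
    # Optional Kaiser normalization: scale each row to unit length before rotating.
    h = np.sqrt((L ** 2).sum(axis=1, keepdims=True)) if normalize else np.ones((p, 1))
    A = L / h
    T = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        AR = A @ T
        # Gradient of the varimax criterion; SVD gives the nearest orthogonal rotation.
        u, s, vt = np.linalg.svd(A.T @ (AR ** 3 - AR * ((AR ** 2).sum(axis=0) / p)))
        T = u @ vt
        if s.sum() < objective * (1 + tol):
            break
        objective = s.sum()
    return (A @ T) * h

def pca_kaiser_varimax(X: np.ndarray):
    """PCA on the correlation matrix, Kaiser criterion, then varimax rotation."""
    R = np.corrcoef(X, rowvar=False)             # PCA based on the correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)         # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    k = int((eigvals > 1.0).sum())               # keep PCs with eigenvalue > 1
    loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])  # variable-component correlations
    rotated = varimax(loadings)
    explained = eigvals / eigvals.sum()          # share of total variance per PC
    return eigvals, explained, loadings, rotated
```

Because the rotation is orthogonal, each variable's communality (row sum of squared loadings) and the total variance explained by the retained components are unchanged; only their distribution across components shifts, which is why a first rotated component can explain less than the first PC. The share explained by each rotated component can be read off as `(rotated ** 2).sum(axis=0) / X.shape[1]`.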
In PCA, the loadings are the correlations between a variable and a component, which estimate the information they share. For interpretation, variables with absolute loadings greater than or equal to 0.7 are considered strongly correlated, those from 0.5 to 0.7 moderately correlated, and those less than 0.5 weakly correlated with the component [1,22]. In other words, the larger the loading, the more important the variable is in explaining the component. Factor scores are the coordinates of the projections of the observations onto the components. The importance of an observation for a component is given by the ratio of the squared factor score of this observation to the sum of the squared factor scores of all observations on that component [19]. This ratio is called the contribution of the observation to the component. The equations for loadings, factor scores, and observation contributions are detailed in Abdi and Williams [19]. For a given component, the contributions of all observations sum to 1; thus, the larger the contribution, the more the observation helps to explain the component [19]. In this study, the observation contributions (in percent) are used to quantify the importance of the monitoring sites in explaining the spatial variability and the importance of the monitoring months in explaining the temporal variability of water quality. On each component, the contribution of a monitoring site is calculated as the sum of the contributions of all observations at that site during the whole monitoring period. Similarly, the contribution of a monitoring month is calculated as the sum of the contributions of all observations at all sites in that month. The variance explained by a monitoring site on a given component is quantified as the product of its contribution and the variance explained by that component.
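The contribution of an observation, and its aggregation to sites or months, can be sketched as follows. This assumes a complete, already transformed data matrix `X` (observations × variables) and uses hypothetical function names; FactoMineR reports the same quantities, though its scaling conventions for the scores may differ (the contributions, being ratios, do not depend on that scaling).

```python
import numpy as np

def observation_contributions(X: np.ndarray):
    """PCA on the correlation matrix of X (observations x variables).

    Returns sorted eigenvalues, factor scores F, and contributions ctr,
    where ctr[i, l] = F[i, l]**2 / sum_i F[i, l]**2 (each column sums to 1).
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)       # z-scores; Z.T @ Z / n equals R
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]               # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    F = Z @ eigvecs                                  # factor scores (projections)
    ctr = F ** 2 / (F ** 2).sum(axis=0)
    return eigvals, F, ctr

def group_contributions(ctr: np.ndarray, labels) -> tuple[np.ndarray, np.ndarray]:
    """Sum observation contributions per label, e.g. a site ID or a calendar month."""
    labels = np.asarray(labels)
    groups = np.unique(labels)
    return groups, np.vstack([ctr[labels == g].sum(axis=0) for g in groups])

# Variance explained by each group on each component, as a fraction of total variance:
#     group_ctr * (eigvals / eigvals.sum())
```

Summing the group-level variances over all groups recovers each component's share of the total variance, which is the property the site and month analyses rely on.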
PCA was carried out in R, and varimax rotation was implemented with the R package psych [39]. The factor scores, loadings, and observation contributions can be extracted directly with the R package FactoMineR [40]. Maps of the PCA results were produced with QGIS (version 2.18.16) [41].

Data Screening and Descriptive Statistics
A thorough review of the existing dataset revealed that sample collection was scheduled routinely and was not intended to capture any specific event. Among the monitoring sites on small tributaries, some were discontinued in 2006 and others were added after 2007. Only the main river and large tributaries such as the Zschopau, Flöha, Gimlitz, Pockau, and Hüttenbach have been monitored long enough to provide continuous data series from 1999 to 2016. This could be a result of the WFD implementation in Germany, where one of the WFD-mandated deadlines for "setting up networks and putting them into operation" was December 2006 [42]. For this reason, the monitoring period considered in this study is restricted to 2006-2016. Although more than 80 water quality parameters were screened, only 23 parameters were selected based on the criteria described in Section 2.2. After the initial screening, the selected database included 7541 sampling events covering 23 parameters at 151 monitoring sites over a period of 11 years (2006 to 2016). A summary of descriptive statistics with the percentages of censored data is presented in Table 1. The monitoring sites cover the large streams of Freiberger Mulde, Zschopau, Große Striegis, Flöha, Bobritzsch, and Aschbach, as well as 75 other smaller tributaries. Most of the parameters do not follow a normal distribution and show high standard deviations and skewness, with the exceptions of oxygen and temperature (Table 1). For principal component analysis, non-normal data were log-transformed and then standardized to zero mean and unit variance to avoid misclassification arising from the different scales and units of the monitored variables.
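A sketch of this preprocessing step follows. It assumes that "non-normal" is decided by a sample-skewness threshold and that a base-10 logarithm is used; the study does not state either choice, so both `skew_threshold=1.0` and `np.log10` are assumptions, and `preprocess` is a hypothetical name. Only strictly positive columns are log-transformed in this sketch.

```python
import numpy as np

def preprocess(X: np.ndarray, skew_threshold: float = 1.0) -> np.ndarray:
    """Log10-transform strongly skewed, strictly positive columns, then z-standardize all columns.

    The skewness rule and the log base are assumptions made for this sketch.
    """
    X = np.asarray(X, dtype=float).copy()
    for j in range(X.shape[1]):
        col = X[:, j]
        skew = np.mean((col - col.mean()) ** 3) / col.std() ** 3   # sample skewness
        if abs(skew) > skew_threshold and np.all(col > 0):
            X[:, j] = np.log10(col)
    # Zero mean and unit variance so that units and scales do not dominate the PCA.
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
```

Standardizing after the transformation is what makes a covariance-based PCA of the output equivalent to the correlation-based PCA described in the methods.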

Characterized Water Quality Parameters and Sources Identification Based on Factor Loadings
The results of the principal component analysis for the data matrix (7541 observations × 23 variables) of the FM river basin are shown in Table 2. Five principal components (PCs) have eigenvalues greater than one, together explaining 75.1% of the water quality variability (Figure 2). PC1 is strongly and positively correlated with HCO₃⁻, Ca²⁺, Cl⁻, Mg²⁺, K⁺, Na⁺, SO₄²⁻, boron, and TIC and moderately correlated with NO₃⁻ and TON. These ions may have multiple origins: rainfall, weathering of silicate and carbonate minerals, dissolution of minerals contained in some sedimentary rocks, or leaching from the soil surface during rainstorms [43]. The first PC represents the weathering process and explains 37.6% of the total water quality variability in the river basin. The second component accounts for 12.9% of the observed variability and has a strong negative loading on zinc and moderate loadings on nickel, fluoride, and arsenic. These trace metals and anions enter river waters naturally through the weathering of minerals and anthropogenically through industrial effluents and non-point pollution sources [44]. Taking into account the historical mining activities in the FM river basin [30], the major sources of PC2 are likely related to abandoned mines in the Ore Mountains. The third component explains 11.9% of the total variance and has strong negative loadings on dissolved organic carbon (DOC) and total organic carbon (TOC) and moderate positive loadings on turbidity and NO₃⁻. PC3 represents organic matter, which could originate from the natural decomposition of organic material as well as from anthropogenic activities, including agriculture and domestic wastewater discharges [45]. PC4 shows the inverse relationship between temperature and oxygen, representing seasonal effects, and explains 7.4% of the variance.
PC5 accounts for only 5.3% of the data variability and does not show a strong or moderate correlation with any variable. To strengthen the interpretation, varimax rotation was applied to the first five principal components, which resulted in a new set of loadings (Table 2). For the first four components, PCA and varimax-rotated PCA give the same interpretation of the hidden factors that affect the water quality variability of the FM river basin. The first rotated component (RC1) is linked to the major ions and total inorganic carbon but explains only 34.9% instead of 37.6% of the data variability. RC2 also relates to mining activities and weathering processes; with strong loadings on arsenic and fluoride and a moderate loading on zinc, it explains only 11.2% of the total variance. RC3 conveys the same strong correlation with organic carbon and turbidity and accounts for 11.4% of the observed variability. RC4 shows the seasonal effects, with a strong positive loading on temperature and a negative loading on oxygen, and explains 9% of the total variance. Only on the fifth component does the varimax rotation reveal a strong loading of manganese and moderate loadings of nickel and zinc, explaining 8.7% of the total variance. Therefore, the sources of RC5 could also be weathering processes and historic mining activities. Combining the results from PCA and varimax rotation, the major sources of surface water quality variation in the FM river include weathering processes, mining activities, agriculture, seasonality, and wastewater effluents. For the first five components, 75.1% of the variance can be explained by 18 parameters: HCO₃⁻, Ca²⁺, Cl⁻, Mg²⁺, K⁺, Na⁺, SO₄²⁻, boron, TIC, arsenic, zinc, nickel, fluoride, DOC, TOC, temperature, oxygen, and manganese. Notably, PCA alone does not give a substantial data reduction, since 18 of the 23 parameters (more than 78%) are still needed to explain 75.1% of the data variation.
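Since PCA is based on the correlation matrix of 23 standardized variables, the total variance equals 23 and each eigenvalue equals 23 times the component's share of variance. The short calculation below, which uses only the percentages reported above, shows that all five retained components pass the eigenvalue-greater-than-one rule and that a component needs more than 100/23 ≈ 4.35% of the variance to be retained. The function name is hypothetical.

```python
def implied_eigenvalues(shares_pct: dict[str, float], n_vars: int) -> dict[str, float]:
    """Eigenvalue = share of variance x number of standardized variables (trace of R)."""
    return {pc: pct / 100 * n_vars for pc, pct in shares_pct.items()}

reported = {"PC1": 37.6, "PC2": 12.9, "PC3": 11.9, "PC4": 7.4, "PC5": 5.3}
eig = implied_eigenvalues(reported, n_vars=23)
kaiser_cutoff_pct = 100 / 23          # minimum share of variance for an eigenvalue > 1
cumulative_pct = sum(reported.values())
for pc, lam in eig.items():
    print(f"{pc}: eigenvalue ~ {lam:.2f}, retained: {lam > 1}")
```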
PCA does not explicitly account for the redundancy of correlated variables. To further reduce the number of monitoring parameters, Pearson's correlation coefficients for all 23 parameters were computed for the entire monitoring period (Figure 3). If the correlation coefficient is between 0.9 and 1 (or between −0.9 and −1), the two variables are highly correlated and can be represented by a linear relationship. Thus, for each pair of variables with a correlation coefficient greater than 0.9, one variable could be discarded to reduce the redundancy of information, e.g., Cl⁻–Na⁺ (0.92), HCO₃⁻–TIC (0.96), Ca²⁺–Mg²⁺ (0.92), and DOC–TOC (0.94), keeping the variable with the higher loading on the principal component. Consequently, combining the PCA results and the Pearson correlation analysis, four parameters (Na⁺, HCO₃⁻, Mg²⁺, and DOC) can be further discarded. As a result, 14 variables (boron, calcium, chloride, potassium, sulphate, TIC, fluoride, arsenic, zinc, nickel, temperature, oxygen, TOC, and manganese) explain 75.1% of the total variance and should therefore remain under observation.
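The pruning rule can be sketched as a greedy pass over variable pairs, from the most to the least correlated. The correlations in the usage example are the four coefficients quoted above, but the loadings are hypothetical placeholders (Table 2 is not reproduced here), chosen only to show the mechanics; processing pairs in order of decreasing |r| is also an assumption, and `prune_redundant` is a hypothetical name.

```python
import numpy as np

def prune_redundant(names: list[str], R: np.ndarray, loading: dict[str, float],
                    threshold: float = 0.9) -> list[str]:
    """Drop one variable from each pair with |r| > threshold, keeping the one
    with the larger absolute loading; pairs are visited from the highest |r| down."""
    pairs = sorted(((abs(R[i, j]), names[i], names[j])
                    for i in range(len(names)) for j in range(i + 1, len(names))),
                   reverse=True)
    kept = set(names)
    for r, a, b in pairs:
        if r > threshold and a in kept and b in kept:
            kept.discard(a if abs(loading[a]) < abs(loading[b]) else b)
    return [n for n in names if n in kept]

# Usage with the four coefficients quoted in the text; the loadings are hypothetical.
names = ["Cl", "Na", "HCO3", "TIC", "Ca", "Mg", "DOC", "TOC"]
R = np.eye(len(names))
for a, b, r in [("Cl", "Na", 0.92), ("HCO3", "TIC", 0.96), ("Ca", "Mg", 0.92), ("DOC", "TOC", 0.94)]:
    i, j = names.index(a), names.index(b)
    R[i, j] = R[j, i] = r
loading = {"Cl": 0.90, "Na": 0.88, "HCO3": 0.85, "TIC": 0.87,
           "Ca": 0.92, "Mg": 0.89, "DOC": -0.88, "TOC": -0.90}
retained = prune_redundant(names, R, loading)
```

Comparing absolute loadings matters for the organic carbon pair, whose loadings on PC3 are negative.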

Spatial and Temporal Variability of Water Quality Based on the Contribution of Observations
Variation in water quality is captured by sampling points (geographical or pollution effects) and sampling months (seasonal effects) [20]. In this study, the contributions of monitoring sites were used to visualize the spatial variation of water quality in the FM river basin (Figure 4). For a given component, the site contributions (in percent) sum to 100, and the variances explained by the individual monitoring sites sum to the variance explained by that component. A contribution of 1% is recommended as the threshold to decide whether a monitoring site is critical on a specific component. Accordingly, 25 monitoring sites contribute more than 1% to PC1. They are located mostly on the upstream tributaries (16 sites) and partly on the FM river and its small first-order streams (9 sites). These monitoring sites account for 23% of the 37.6% of variance explained by the first component. The weathering process is therefore spatially dependent and plays an important role in explaining the water quality variance in the upstream and mainstream parts of the FM river basin. The second component shows higher contributions in three places: Wilisch (in the upper west of the river basin) and the Roter Graben and Münzbach streams, which are close to Freiberg city, where abandoned mines and heavy industries are located. In contrast, the contributions to the organic matter component (PC3) are homogeneously distributed across the river basin, with only minor variations between most sites. Notably, the site with the highest contribution to PC3 (10.1%) is located on the Lampertsbach, a 5.6-km long stream that runs through the populated area of Cranzahl and is connected to the Sehma river. The site with the second-highest contribution to PC3 is on the Schwarze Pockau, whose organic matter derives from bog water in the Ore Mountains.
Like PC3, the fourth component (PC4) is evenly distributed among the monitoring sites, with a maximum contribution of only 1.75% at a site on the Zschopau river, again showing that temperature and oxygen vary only slightly in space. The most relevant sites for the fifth component (PC5) are located on the Roter Graben and Münzbach streams, both of which are affected by mining discharges from industry in Freiberg city.
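Applying the 1% threshold to one component could look like the following sketch, with hypothetical site IDs and contributions. For orientation, with 151 sites an equal share would be 100/151 ≈ 0.66% per site, so the threshold flags sites contributing about 1.5 times the average. The critical sites' share of the component's contributions, multiplied by the component's variance share, gives the variance they explain (in the same way as the 23% of 37.6% reported for PC1). `critical_sites` is a hypothetical name.

```python
def critical_sites(site_ctr: dict[str, float], component_var: float,
                   threshold: float = 1.0) -> tuple[list[str], float]:
    """Return sites whose contribution (%) to one component exceeds `threshold`,
    ranked by contribution, and the variance (% of total) those sites explain.

    site_ctr: contribution of every site to the component, in percent (sums to 100)
    component_var: variance explained by the component, in percent (e.g. 37.6 for PC1)
    """
    critical = {site: c for site, c in site_ctr.items() if c > threshold}
    ranked = sorted(critical, key=critical.get, reverse=True)
    explained = sum(critical.values()) / 100 * component_var
    return ranked, explained

# Hypothetical contributions of five sites to PC1 (they sum to 100).
contrib_pc1 = {"site_A": 60.0, "site_B": 30.0, "site_C": 9.5, "site_D": 0.3, "site_E": 0.2}
sites, var_pct = critical_sites(contrib_pc1, component_var=37.6)
```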
The temporal variability of each principal component was analyzed using monthly contributions, calculated by summing the contributions of all observations made in each calendar month. For each component, the monthly contributions sum to 100%; if every month contributed equally (about 8.3%), the sampling month would have a negligible impact on that component. Fluctuating contributions across months indicate that the component is subject to temporal variation, with higher contributions pointing to the influence of the corresponding period. Figure 5 shows the contributions of the 12 calendar months (January to December) over the entire monitoring period for the first five components. The contribution to the first component (PC1) remains almost constant over the 12 months, with a minor shift during the cold season from November to March, potentially because the rate of chemical weathering decreases with decreasing temperature [46]. This also indicates that the sampling month has only a minor influence on the mineral contents in the FM river basin. For the second component, contributions are lower during the warm period and follow a peculiar alternating pattern, with low contributions in even months and high contributions in odd months. Although the reason for this oscillation remains unclear, the observations suggest that the monitoring schedule favored odd months over even months. The fluctuations in components three and four are quite similar: in PC3, the maximum contribution of organic matter is observed in July (12.1%), which is about twice the contribution of April (5.6%) and December (5.9%). The higher contribution of organic matter from June to September could be related to lower and more variable flow in summer.
In PC4, the extreme warm (July to September) and cold months (January to March) play a bigger role than the milder months of April, May, and October in explaining the variation. The fifth component resembles the patterns of the first and second components, indicating less seasonal variation of manganese in the FM river basin.
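The monthly analysis can be sketched by summing per-observation contributions by calendar month and comparing the result with the uniform value of 100/12 ≈ 8.33%. The function names and the toy data in the usage example are hypothetical.

```python
from collections import defaultdict
from datetime import date

def monthly_contributions(obs_ctr: list[float], dates: list[date]) -> dict[int, float]:
    """Sum per-observation contributions to one component (fractions summing to 1)
    by calendar month, returned in percent for months 1..12."""
    totals: dict[int, float] = defaultdict(float)
    for ctr, day in zip(obs_ctr, dates):
        totals[day.month] += 100 * ctr
    return {month: totals[month] for month in range(1, 13)}

def deviation_from_uniform(profile: dict[int, float]) -> dict[int, float]:
    """Positive values mark months contributing more than an equal 100/12 share."""
    uniform = 100 / 12
    return {month: pct - uniform for month, pct in profile.items()}

# Hypothetical: 24 observations (one per month in 2006 and 2007), all contributing equally,
# which is the "no seasonal influence" reference case.
days = [date(y, m, 15) for y in (2006, 2007) for m in range(1, 13)]
profile = monthly_contributions([1 / 24] * 24, days)
```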
Based on the contributions of observations from PCA, the mineral contents (major ions) in the FM river basin are mainly influenced by sampling location rather than seasonality. In contrast, sampling months play a more important role than sampling locations in explaining the variation of organic matter, temperature, and oxygen. Both sampling time and location influence the variation of heavy metals in the FM river basin. Temporally, July and September contribute the most and December the least in explaining the data variability. As an implication, monitoring strategies should focus more on the warmer months to capture most of the water quality variability in the FM river basin. The influence of discharge variability on concentration variability should be studied further. Spatially, the areas close to Freiberg city and in the upper west of the river basin are hotspots of heavy metals and mineral contents in explaining the water quality variations.

Cost-Effectiveness of Proposed Water Quality Monitoring Network Based on PCA Results
Although principal component analysis helped to identify the critical factors, variables, and monitoring sites that explain the water quality variability, this information does not by itself provide a criterion to decide whether the proposed variables and sites represent the optimum options. This section addresses this problem by quantifying the cost-effectiveness of the monitoring network based on the PCA results. According to Harmancioglu et al. [10], one way of measuring the benefits of a monitoring practice is the information conveyed by the collected data. This study assumes that the "effectiveness" or "information" of a monitoring network corresponds to the water quality variance derived from the collected monitoring data. The information is therefore equivalent to the variance explained by the principal components: for example, if only the strong-loading parameters of the first component (Ca²⁺, Cl⁻, Mg²⁺, K⁺, Na⁺, SO₄²⁻, boron, and TIC) are monitored at all monitoring sites, only 37.6% of the water quality variability is preserved. Cumulatively, if all 10 strong-loading variables on PC1 and PC2 are monitored (at all 151 monitoring sites), the monitoring network retains 50.5% of its information. Depending on the monitoring requirements, water managers can select the parameters to observe for specific components accordingly.
Monitoring costs in the state of Saxony are program-based, and prices for individual parameters are not available. Therefore, we estimated the monitoring costs based on the 2019 service price list of Brandenburg, a German state neighboring Saxony [47]. These estimates include the costs of transportation (for an average of 10 monitoring sites per day), sampling, and laboratory analysis. Detailed prices are given in Table 3. If only laboratory costs were considered, monitoring organic matter (PC3), temperature and oxygen (PC4), and inorganic contents (PC1) would be more economical than monitoring heavy metals (PC2 and PC5), with the percentages of information achieved per euro being 0.71, 0.68, 0.58, and 0.19, respectively. Furthermore, monitoring all 23 variables appeared to be less economical than monitoring the 14 critical variables of the first five components.
Table 3 notes: * price for all the listed parameters in one analysis; † price assumed to be equivalent to that of TOC.
Each monitoring site contributes differently to each component in explaining the total variance. The variance explained by monitoring site i on component j, denoted $V_{i,j}$, is calculated as:

$$V_{i,j} = C_{i,j} \times V_j$$

where $C_{i,j}$ is the contribution of monitoring site i to component j and $V_j$ is the variance explained by the j-th component. The variance explained by the monitoring sites on each component is given in Annex 1. Sampling and monitoring costs are estimated for one monitoring event according to the monitoring variables and the number of monitoring sites. To quantify the cost at different levels of information achieved, the monitoring costs were estimated for five scenarios: PC1, monitoring six variables strongly correlated with PC1 (Ca²⁺, Cl⁻, K⁺, SO₄²⁻, boron, and TIC), which retains 37.6% of the information; PC1,2, monitoring 10 variables strongly correlated with PC1 and PC2 (Ca²⁺, Cl⁻, K⁺, SO₄²⁻, boron, TIC, fluoride, arsenic, zinc, and nickel), which retains 50.5%; PC1,3,4, monitoring nine variables strongly correlated with PC1, PC3, and PC4 (Ca²⁺, Cl⁻, K⁺, SO₄²⁻, boron, TIC, TOC, temperature, and oxygen), which retains 56.9%; PC1-5, monitoring 14 variables correlated with the first five components (Ca²⁺, Cl⁻, K⁺, SO₄²⁻, boron, TIC, fluoride, arsenic, zinc, nickel, TOC, oxygen, temperature, and manganese), which retains 75.1%; and All PC, monitoring all 23 variables, which retains 100% of the information.
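Building a cost-information curve for one scenario can be sketched as follows. Assumptions not stated in the study: contributions are given in percent, a site's information for a scenario is the sum of its explained variance (contribution × component variance) over the components that the scenario's variables cover, sites are ranked separately for each scenario, and the cost per site and event is constant for a given variable set. The function name and the toy numbers are hypothetical.

```python
def info_cost_curve(site_ctr: dict[str, dict[str, float]],
                    comp_var: dict[str, float],
                    components: list[str],
                    cost_per_site: float) -> list[tuple[int, float, float]]:
    """Cumulative (number of sites, information %, cost) for one monitoring scenario.

    site_ctr:      contribution (%) of each site to each component; sums to 100 per component
    comp_var:      variance explained by each component (% of total variance)
    components:    components covered by the scenario's variables, e.g. ["PC1", "PC3", "PC4"]
    cost_per_site: assumed constant cost per site and event for the scenario's variables
    """
    # Variance explained by each site on the scenario's components.
    site_var = {s: sum(ctr[j] / 100 * comp_var[j] for j in components)
                for s, ctr in site_ctr.items()}
    ranked = sorted(site_var, key=site_var.get, reverse=True)  # most informative sites first
    curve, info = [], 0.0
    for n, site in enumerate(ranked, start=1):
        info += site_var[site]
        curve.append((n, info, n * cost_per_site))
    return curve

# Hypothetical three-site example for a PC1,2-type scenario.
toy_ctr = {"A": {"PC1": 50, "PC2": 10}, "B": {"PC1": 30, "PC2": 60}, "C": {"PC1": 20, "PC2": 30}}
curve = info_cost_curve(toy_ctr, {"PC1": 37.6, "PC2": 12.9}, ["PC1", "PC2"], cost_per_site=175.0)
```

Under the linear-cost assumption, the figures reported in this section are roughly consistent: 51,507/151 ≈ 341 euro and 24,616/72 ≈ 342 euro per site and event for all 23 variables.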
Figure 6 shows an adaptation of the cost-effectiveness plane illustrating the information and costs of different monitoring options. To calculate the cumulative variance, the monitoring sites are sorted in descending order of their contributions to the variance explained (given in Appendix A). In our case, the cost-effectiveness plane consists of four quadrants: high information-low cost, high information-high cost, low information-high cost, and low information-low cost. Five strategies of variable selection according to the variance explained by the principal components are displayed in the same diagram (Figure 6). The current monitoring practice of 23 variables at 151 monitoring sites gives 100% of the information on data variability at an estimated 51,507 euro per monitoring event (equivalent to 100% cost). Reducing the number of monitoring sites or WQ variables decreases both the information achieved and the monitoring costs. For example, monitoring the six variables of PC1 (Ca²⁺, Cl⁻, K⁺, SO₄²⁻, boron, and TIC) at 151 sites would cost 17,487 euro (~34% of the total cost) but would give only 37.6% of the information. Monitoring the 10 variables of PC1 and PC2 (Ca²⁺, Cl⁻, K⁺, SO₄²⁻, boron, TIC, fluoride, arsenic, zinc, and nickel) at 151 sites would explain 50.5% of the information at a cost of 26,411 euro (~51.3% of the total cost). The PC1 curve lies entirely in the low information-low cost quadrant, while the PC1,2 curve exceeds 50% of the cost at 148 sites. Combining three components (PC1, 3, and 4) with cost-effective variables explains 57% of the data variability at a cost of 21,669 euro for all sites, which provides more information at less cost than the combination of PC1 and PC2. The high laboratory costs of heavy metals make the cost of the PC1,2 curve increase faster than the information added, compared with the PC1 and PC1,3,4 curves.
Although the PC1-5 and All PC curves extend into three quadrants, it is more effective to monitor more variables at fewer sites than the opposite practice. For example, to achieve 75.1% of the information, measuring 14 variables at 151 sites would cost 34,837 euro, whereas monitoring all variables at 72 sites would provide the same amount of information for only 24,616 euro. Notably, the strategies with all parameters (All PC), main parameters (PC1-5), and cost-effective parameters (PC1,3,4) perform similarly up to about 45% of the information, although they differ in their emphasis on the number of sites versus the number of parameters. Other options with different levels of cost and information can be compared easily using the ranking of monitoring sites in Annex 1 and the price list in Table 3.
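The euro figures above can be converted into shares of the current cost (51,507 euro per event) with a short calculation; the option labels are shorthand introduced here. The calculation also shows that the difference between the two 75.1% options can be expressed either against the current total cost (about 20 percentage points) or against the 14-variable option (about 29%).

```python
# Cost per monitoring event reported in the text (euro) and information retained (%).
TOTAL_COST = 51_507  # current practice: 23 variables at 151 sites

OPTIONS = {
    "PC1, 6 variables, 151 sites": (37.6, 17_487),
    "PC1,2, 10 variables, 151 sites": (50.5, 26_411),
    "PC1,3,4, 9 variables, 151 sites": (56.9, 21_669),
    "PC1-5, 14 variables, 151 sites": (75.1, 34_837),
    "All PC, 23 variables, 72 sites": (75.1, 24_616),
}

def cost_shares(options: dict[str, tuple[float, int]],
                total: float = TOTAL_COST) -> dict[str, tuple[float, float]]:
    """Map each option to (information %, cost as % of the current total)."""
    return {name: (info, 100 * cost / total) for name, (info, cost) in options.items()}

shares = cost_shares(OPTIONS)
for name, (info, cost_pct) in shares.items():
    print(f"{name}: {info:.1f}% information for {cost_pct:.1f}% of current cost")

# Saving of the 72-site option: in percentage points of the current total,
# and relative to the cost of the 14-variable option.
pp_saving = shares["PC1-5, 14 variables, 151 sites"][1] - shares["All PC, 23 variables, 72 sites"][1]
rel_saving = 100 * (34_837 - 24_616) / 34_837
```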
The most challenging aspect of this approach is the selection of the representative variables for the principal components. In this study, strong-loading variables (loadings > 0.7) with low mutual correlation (coefficients < 0.9) were selected; ignoring the other variables inevitably loses part of the variance explained. In addition, PCA requires input without missing data; thus, this approach is data-dependent and only applicable when a decision must be made to remove monitoring sites or WQ parameters. Quantifying information as the variance explained limits the objective of the monitoring network to determining changes in water quality, without considering other specific objectives such as trend detection and compliance monitoring. Finally, the cost estimate was simplified to one monitoring event without considering other fixed and operational costs of a monitoring program. To address these limitations and provide a more effective monitoring network design, future research should consider the quantification of multi-objective monitoring networks (data quality, information accuracy, statistical methods, monitoring costs, stakeholder views, social factors, etc.) and of monitoring frequencies.

Conclusions
This study demonstrates the usefulness of principal component analysis (PCA) in analyzing complex datasets to support water quality management in rivers. PCA proved useful for analyzing 11 years of irregular monitoring data from the Freiberger Mulde (FM) river basin, comprising 23 water quality parameters and 151 monitoring sites. A combination of PCA and Pearson's correlation analysis identified 14 critical parameters that explain 75.1% of the data variability in the river basin. Weathering processes, historical mining, wastewater discharges, and seasonality are the main causes of river water quality variability. The contributions calculated from factor scores are very insightful for interpreting the spatial and temporal sources of water quality variation. For instance, heavy metals are influenced by both sampling location and sampling time. Specifically, the Wilisch (in the upper west of the river basin) and the Roter Graben and Münzbach streams close to Freiberg city appear to be the best selections for monitoring heavy metals. Continued monitoring of these significant sites is recommended to guarantee the continuity of effective water quality monitoring in the future. The mineral contents play an important role in explaining the water quality variations of the FM river basin and are influenced more by sampling location than by sampling month. In contrast, the variation of organic matter, oxygen, and temperature depends more on the sampling month than on the sampling location, with July and September contributing the highest variability in water quality. Temporally, the five major factors explaining the water quality of the FM river basin vary the most in July and September and the least in December; hence, future monitoring schemes should concentrate more on the warmer months.
This work establishes a simple framework for quantifying the cost-effectiveness of monitoring networks based on PCA results for the FM river basin. Under the current monitoring-intensive conditions, preserving monitoring variables rather than sites appears to be more economical than the opposite practice. To achieve 75% of the variance, it is recommended to monitor 23 parameters at 72 monitoring sites rather than 14 parameters at 151 monitoring sites; the first option reduces the cost by about 20 percentage points of the current total monitoring cost (from about 68% to 48%) compared with the second option. The choice of variable selection strategy becomes more important when substantial cost reductions are required. Up to 40% of the information can be retained for less than 15% of the current costs, with 21 sites and all variables, 31 sites and the main variables (PCs 1-5), or 50 sites and the more economical variables (PCs 1, 3, and 4). This approach is restricted to quantifying the basin-wide variability of water quality based on the previously established water quality variables and sampling sites. Monitoring frequencies still need to be quantified to fully assess the effectiveness of the monitoring network. Monitoring often aims to assess the state or development of a water body, and objectives such as trend detection or compliance assessment require evaluation criteria other than the information explained. The monitoring costs in this study were estimated based only on laboratory, transportation, and sampling costs, but the costs of the whole monitoring program can easily be incorporated into the presented approach if data on the monitoring period, frequencies, and other costs (logistics, personnel, maintenance, etc.) are available. This approach may support water managers and practitioners in selecting the optimum monitoring sites and variables through a rational understanding of the dynamic sources of water quality when monitoring costs need to be reduced.

Conflicts of Interest:
The authors declare no conflicts of interest.
Table notes (site contributions, Annex 1): values in bold show the important sites on the principal components. PC3 (organic matter) and PC4 (seasonality) show only a minor dependence on the sampling locations, with the variance distributed quite homogeneously among the sites. On the first five components, the contribution of a monitoring site can be calculated as the quotient of its variance and the total variance explained by the component.