This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Concerns about the water quality in Yuan-Yang Lake (YYL), a shallow, subtropical alpine lake located in north-central Taiwan, have been rapidly increasing due to natural and anthropogenic pollution. To understand the underlying physical and chemical processes as well as their associated spatial distribution in YYL, this study analyzes fourteen physico-chemical water quality parameters recorded at eight sampling stations during 2008–2010 by using multivariate statistical techniques and a geostatistical method. Hierarchical cluster analysis (CA) is first applied to distinguish three general water quality patterns among the stations, followed by principal component analysis (PCA) and factor analysis (FA) to extract and recognize the major underlying factors contributing to the variations among the water quality measures. The spatial distribution of the identified major contributing factors is obtained by using a kriging method. Results show that four principal components

Water quality is the main factor controlling healthy and diseased states in both humans and animals. Surface water quality is an essential component of the natural environment and a matter of serious concern today. Variations in water quality essentially reflect a combination of anthropogenic and natural contributions. In general, anthropogenic discharges constitute a constant source of pollution, whereas surface runoff is a seasonal phenomenon affected by climate within the water catchment basin [

Many investigations have been conducted on anthropogenic contaminants of ecosystems [

The application of different multivariate statistical techniques, such as principal component analysis (PCA), factor analysis (FA), cluster analysis (CA), and discriminant analysis (DA), assists in the interpretation of complex data matrices for a better understanding of the water quality and ecological characteristics of a study area. These techniques help identify possible sources that affect water environmental systems and offer a valuable tool for reliable management of water resources as well as rapid solutions to pollution problems [

Geostatistical mapping is based on field observations. Because field surveys are limited by the cost of sampling, only sparse observation data are generally available. Geostatistical mapping or further analysis requires the assessment of exhaustive attribute values for an entire study area. Geostatistical mapping techniques have been widely applied to different fields including water quality in bays [

The objective of the present study was to analyze 14 physico-chemical water quality parameters in water samples collected on a monthly basis from 2008 to 2010 in a subtropical alpine lake (Yuan-Yang Lake) in Taiwan. The data matrix obtained from field measurements was subjected to the CA, PCA, and FA techniques, as well as geostatistical mapping, to evaluate the similarities between sampling stations and to ascertain the important contributions of nutrient sources among water quality parameters in the alpine lake.

The Long-Term Ecological Research (LTER) program is one of the core projects of the Global Change and Terrestrial Ecosystem program (GCTE), which is under the umbrella of the International Geosphere-Biosphere Program (IGBP). An understanding of ecological processes and of mechanisms leading to ecologically tragic events is particularly important for the sustainability of Taiwan Island. To meet such a requirement, the LTER project was initiated in 1992 on the island. Yuan-Yang Lake (YYL) is one of the six LTER sites and the only site associated with a mountain lake ecosystem in Taiwan. YYL, a small (3.6 ha) and shallow (4.5 m maximum depth) lake in a mountainous catchment 1,730 m above sea level, is located in the northeastern region of Taiwan (24°35′ N, 121°24′E) (

The steep watersheds are dominated by pristine Taiwan false cypress [

The sampling network, comprising eight measurement stations, was designed to cover a wide range of key locations accounting for inflow and outflow (

Water temperatures were measured through the water column at 0.5 m increments using a thermistor chain (Templine, Apprise Technologies, Inc. Duluth, MN, USA). Wind speed was measured 1 m above the lake by an anemometer (model 03001, R.M. Young, Traverse, MI, USA). Precipitation, air temperature and downwelling photosynthetically active radiation (PAR) were measured at a land-based meteorological station approximately 1 km away from the lake. Variation in water levels was measured using a submersible pressure transmitter [PS 9800(1), Instrumentation Northwest, Kirkland, WA, USA] deployed at the lake shore (

The pH, turbidity, and Secchi depth were measured ^{3} samples through a glass fiber filter. The filter paper itself was used for the analysis. The filter was ground up in 90% acetone solution, and a fluorometer was used to read the light transmission, which in turn was used to calculate the concentration of chlorophyll a. TSS and nutrient concentrations were analyzed using the US EPA standard method 160.1 [

CA is an unsupervised pattern recognition method that divides a large number of cases into smaller groups or clusters based on the characteristics they possess. The resulting clusters of objects should exhibit high internal (within-cluster) homogeneity and high external (between-cluster) heterogeneity. Hierarchical CA is the most common approach; it starts with each case in a separate cluster and joins the clusters together step by step until only one cluster remains, and it is typically illustrated by a dendrogram (tree diagram). The dendrogram provides a visual summary of the clustering process, presenting a picture of the groups and their proximity, with a dramatic reduction in the dimensionality of the original data. The Euclidean distance usually provides the similarity between two samples, and a distance can be represented by the difference between analytical values from the samples. In the present study, hierarchical CA was applied to the standardized data using Ward’s method, with Euclidean distance as a measure of similarity. Ward’s method applies an analysis of variance approach to assess the distances between clusters, minimizing the sum of squares of any two clusters that can be formed at each step. The spatial variability of water quality in the lake was determined from hierarchical CA using the linkage distance [
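As a minimal sketch of this clustering step, the Python snippet below standardizes a station-by-parameter matrix and builds a Ward/Euclidean tree with SciPy. The `demo` matrix and the helper name `ward_clusters` are hypothetical placeholders chosen for illustration; they are neither the YYL measurements nor the software used in the study.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def ward_clusters(data, n_clusters):
    """Standardize columns, then cut a Ward/Euclidean tree into n_clusters groups."""
    data = np.asarray(data, dtype=float)
    # z-score each parameter so that measurement units do not dominate the distance
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # agglomerative clustering: every row starts as its own cluster
    tree = linkage(z, method="ward", metric="euclidean")
    # cut the tree so that at most n_clusters groups remain
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return tree, labels


# Hypothetical 8 stations x 3 parameters (illustrative values only)
demo = np.array([
    [0.6, 5.4, 4.2],   # station 1
    [0.9, 5.9, 4.6],   # station 2
    [1.8, 3.8, 3.0],   # station 3
    [1.7, 3.2, 2.8],   # station 4
    [1.8, 4.1, 3.1],   # station 5
    [2.6, 3.4, 7.4],   # station 6
    [2.5, 3.9, 7.8],   # station 7
    [1.8, 3.6, 2.9],   # station 8
])
tree, labels = ward_clusters(demo, 3)
print(labels)
```

The returned `tree` can be passed to `scipy.cluster.hierarchy.dendrogram` to draw the tree diagram, with the vertical axis showing the linkage distance at which clusters merge.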

Principal component analysis is a data analysis method focused on a particular collection of variables, $x_1, \ldots, x_p$. Consider the form of the first principal component. The score for individual $i$ on the first component, $y_{i1}$, uses weights $a_{11}, \ldots, a_{p1}$ in the linear combination:

$$y_{i1} = a_{11}x_{i1} + a_{21}x_{i2} + \cdots + a_{p1}x_{ip}$$

where the weights are chosen so that the variance of $y_1$ is as large as possible subject to the condition that $a_{11}^2 + \cdots + a_{p1}^2 = 1$. The second principal component is another linear combination of the same variables, with weights $a_{j2}$ chosen so that its variance is the maximal, subject to the conditions that $\mathrm{corr}(y_1, y_2) = 0$ and that $a_{12}^2 + \cdots + a_{p2}^2 = 1$. The criterion of summarizing the information in
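The constraints described above (unit-length weight vectors, maximal variance, uncorrelated component scores) are what an eigendecomposition of the correlation matrix delivers. The sketch below illustrates this with NumPy on synthetic data; the function name `pca_correlation` and the random `demo` matrix are assumptions made for the example, not part of the study.

```python
import numpy as np


def pca_correlation(data):
    """PCA on standardized variables via the eigenvectors of the correlation matrix."""
    data = np.asarray(data, dtype=float)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # reorder: largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = z @ eigvecs                         # column k holds the scores on component k+1
    return eigvals, eigvecs, scores


# Synthetic data: 50 cases x 4 variables, with two variables made to correlate
rng = np.random.default_rng(0)
demo = rng.normal(size=(50, 4))
demo[:, 1] += 0.8 * demo[:, 0]

eigvals, weights, scores = pca_correlation(demo)
print(eigvals)                       # variance carried by each component
print((weights ** 2).sum(axis=0))    # each weight vector has unit length
```

Each column of `weights` is one set of component weights, and the variance of each column of `scores` equals the corresponding eigenvalue.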

FA follows PCA. FA focuses on reducing the contribution of less significant variables to further simplify the data structure coming from PCA. This can be achieved by rotating the axes defined by PCA according to well-established rules and constructing new variables, also called varifactors (VFs). PCA of the normalized variables was performed to extract significant PCs and to further reduce the contribution of variables with minor significance; these PCs were subjected to varimax rotation (raw), generating VFs [
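For readers unfamiliar with the rotation step, the following is a minimal sketch of raw (unnormalized) varimax using the common SVD-based iteration. It is one standard way to compute the rotation and is not necessarily the routine used in the study; the loading matrix `unrotated` is a hypothetical example built by mixing a known simple structure.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Raw varimax rotation of a (variables x factors) loading matrix."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)                      # accumulated orthogonal rotation
    previous = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        # gradient of the varimax criterion with respect to the rotation
        grad = L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt                     # nearest orthogonal matrix to the gradient
        current = s.sum()
        if previous > 0 and current < previous * (1 + tol):
            break                      # criterion no longer improving
        previous = current
    return L @ R, R


def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    return (np.asarray(loadings) ** 2).var(axis=0).sum()


# Hypothetical loadings: a simple two-factor structure rotated by 30 degrees
simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                   [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])
theta = np.deg2rad(30)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
unrotated = simple @ mix

rotated, R = varimax(unrotated)
print(np.round(rotated, 3))
```

Because the rotation is orthogonal, the communality of every variable (the row sum of squared loadings) is unchanged; only the distribution of loadings across factors becomes easier to interpret.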

The FA can be written as:

Geostatistical mapping can be defined as the analytical production of maps by using field observations, auxiliary information and a computer program that generates predictions. Isotropic semivariograms are estimated to characterize the relationship between general spatial dependence and distance among the observations. Different semivariogram models, e.g., exponential and Gaussian models, nested with nugget effects, are selected separately for the different principal components and factor scores. The optimal parameters for the semivariogram models are calculated by the weighted least squares method [
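The nested models mentioned here can be written down compactly. The sketch below defines nugget-plus-exponential and nugget-plus-Gaussian semivariograms and a weighted least squares objective of the kind an optimizer would minimize. Two points are assumptions and not taken from the paper: the "practical range" convention (factor 3 in the exponent) and the weights $N(h)/\gamma(h)^2$. The lag distances and pair counts are hypothetical.

```python
import numpy as np


def nested_variogram(h, nugget, sill, range_m, kind="exponential"):
    """Nugget + exponential (or Gaussian) semivariogram; gamma(0) = 0 by definition."""
    h = np.asarray(h, dtype=float)
    if kind == "exponential":
        structure = 1.0 - np.exp(-3.0 * h / range_m)          # assumed convention
    elif kind == "gaussian":
        structure = 1.0 - np.exp(-3.0 * (h / range_m) ** 2)   # assumed convention
    else:
        raise ValueError("kind must be 'exponential' or 'gaussian'")
    return np.where(h > 0, nugget + sill * structure, 0.0)


def weighted_sse(params, lags, gamma_hat, n_pairs, kind="exponential"):
    """Weighted least squares misfit between experimental and modeled semivariances."""
    nugget, sill, range_m = params
    model = nested_variogram(lags, nugget, sill, range_m, kind)
    weights = n_pairs / model ** 2     # assumed weighting: more pairs, more trust
    return float(np.sum(weights * (gamma_hat - model) ** 2))


# Hypothetical experimental variogram generated from a known model
lags = np.array([25.0, 50.0, 100.0, 150.0, 200.0])     # lag distances in meters
n_pairs = np.array([4, 8, 8, 5, 3])                     # pairs per lag (illustrative)
true_params = (0.031, 0.466, 287.106)
gamma_hat = nested_variogram(lags, *true_params)

loss_true = weighted_sse(true_params, lags, gamma_hat, n_pairs)
loss_off = weighted_sse((0.1, 0.3, 150.0), lags, gamma_hat, n_pairs)
print(loss_true, loss_off)
```

In practice, `weighted_sse` would be handed to a numerical minimizer with the experimental semivariances computed from the station data, and the model family with the smaller misfit would be retained for each component or factor.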

The measured results of 14 physico-chemical water quality parameters at eight sampling stations from August 2008 to June 2010 in the YYL are presented in

Cluster analysis was applied to identify groups of similar sampling stations. It produced a dendrogram (

Stations 1 and 2 form cluster 2, which comprises the shallow area. Stations 3, 4, 5, and 8 form cluster 1, which corresponds to the middle water depth. Stations 6 and 7 belong to the deep zone, which constitutes cluster 3. The results show that the CA technique is useful for classifying lake waters; hence, the number of sampling sites and the associated cost can be reduced in future monitoring plans. There are other reports [

Pattern recognition of correlations among the 14 parameters was best summarized by PCA/FA. The Bartlett test was used on the data set to examine the suitability of these data for PCA/FA. In this study, because the variables were standardized, the covariance matrix coincided with the correlation matrix which was presented in _{4}-N, TN, TSS, Chl-a, and so on. Negative correlations were revealed between some variables such as DO, Temp, NH_{4}-N, TN, Chl-a, Turb, and so on. The correlation coefficients were very useful because they numerically represent the similarity between pairs of water quality variables. This also indicated that PCA could successfully reduce the dimensionality of the original data set. Therefore, factor analysis of the present data set further reduced the contribution of less significant variables obtained from PCA.
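Assuming the test referred to here is Bartlett's test of sphericity, its statistic can be computed directly from the correlation matrix $R$ of $n$ samples and $p$ variables as $\chi^2 = -\left(n - 1 - \tfrac{2p + 5}{6}\right)\ln|R|$ with $p(p-1)/2$ degrees of freedom. The sketch below applies it to synthetic data; the function name and the data are illustrative assumptions.

```python
import numpy as np


def bartlett_sphericity(data):
    """Return Bartlett's sphericity statistic and its degrees of freedom."""
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    # slogdet is numerically safer than log(det(...)) for near-singular matrices
    _, logdet = np.linalg.slogdet(corr)
    statistic = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return statistic, dof


# Synthetic check: independent columns versus columns with induced correlation
rng = np.random.default_rng(1)
independent = rng.normal(size=(200, 4))
correlated = independent.copy()
correlated[:, 1] += 0.8 * correlated[:, 0]

stat_ind, dof = bartlett_sphericity(independent)
stat_cor, _ = bartlett_sphericity(correlated)
print(dof, stat_ind, stat_cor)
```

A large statistic relative to the chi-square distribution with `dof` degrees of freedom indicates that the variables are sufficiently correlated for PCA/FA to be worthwhile.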

The Scree plot (shown in

Liu _{4}-N, TSS, Chl-a, and Turb (turbidity). Because NH_{4}-N is a nutrient source for chlorophyll a growth, VF1 represented the nitrogen source. VF2, which explained 18.08% of total variance, had moderate loadings on R (rainfall) and pH (positive) and on WS (wind speed) and TN (negative), and represented meteorological factors. VF3, explaining 11.02% of total variance, had moderate loadings on SD and Turb (positive) and on Ke (negative). This factor represented the contribution of turbidity effects in the water column. VF4, explaining 9.54% of total variance, had a moderate positive loading on NO_{3}-N and a moderate negative loading on water temperature, and represented the nitrate factor. The results revealed that FA/PCA can serve as an important means to identify the main factors affecting water quality in the alpine lake.
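The variance percentages quoted for the four varifactors can be cross-checked against the eigenvalues listed in the loading table. The short sketch below assumes that each percentage is the eigenvalue divided by the number of standardized variables (14); the small differences from the reported values are consistent with the eigenvalues having been rounded to two decimals.

```python
# Eigenvalues and reported percentages as listed in the loading table
eigenvalues = [3.76, 2.53, 1.54, 1.34]
reported = [26.89, 18.08, 11.02, 9.54]
n_variables = 14   # each standardized variable contributes unit variance

shares = [100.0 * ev / n_variables for ev in eigenvalues]
cumulative = sum(shares)

for k, (share, rep) in enumerate(zip(shares, reported), start=1):
    print(f"VF{k}: computed {share:.2f}%, reported {rep:.2f}%")
print(f"cumulative: {cumulative:.2f}%")
```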

Geostatistical techniques were used to map the principal components and factor scores over the study area. Because of the long period between observation campaigns, the temporal correlation among the observations is assumed to be negligible in this analysis.
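To make the mapping step concrete, the following is a minimal ordinary kriging sketch in NumPy. The nugget, sill, and range defaults are the PC1 values from the variogram table (Nugget[0.031] + Exponential[0.466, 287.106]); the factor 3 in the exponent (practical range convention), the station coordinates, and the scores are hypothetical assumptions made for illustration only.

```python
import numpy as np


def exponential_variogram(h, nugget=0.031, sill=0.466, range_m=287.106):
    """Nugget + exponential semivariogram with gamma(0) = 0 (range convention assumed)."""
    h = np.asarray(h, dtype=float)
    return np.where(h > 0, nugget + sill * (1.0 - np.exp(-3.0 * h / range_m)), 0.0)


def ordinary_kriging(coords, values, targets, variogram):
    """Predict values at targets; returns (predictions, kriging variances)."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    targets = np.asarray(targets, dtype=float)
    n = len(coords)
    # left-hand side: station-to-station semivariances, bordered by the
    # unbiasedness constraint (weights sum to one)
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = variogram(np.linalg.norm(coords[:, None] - coords[None, :], axis=-1))
    lhs[n, n] = 0.0
    # right-hand side: station-to-target semivariances, one column per target
    rhs = np.ones((n + 1, len(targets)))
    rhs[:n] = variogram(np.linalg.norm(coords[:, None] - targets[None, :], axis=-1))
    solution = np.linalg.solve(lhs, rhs)   # rows 0..n-1: weights, row n: Lagrange multiplier
    prediction = solution[:n].T @ values
    variance = np.sum(solution * rhs, axis=0)
    return prediction, variance


# Hypothetical station coordinates (meters) and component scores
stations = np.array([[20, 30], [60, 40], [110, 80], [150, 70],
                     [130, 120], [190, 140], [220, 110], [90, 130]], dtype=float)
scores = np.array([1.2, 0.9, -0.1, -0.3, 0.0, -0.8, -0.7, -0.2])
grid = np.array([[100.0, 100.0], [200.0, 120.0]])

pred, var = ordinary_kriging(stations, scores, grid, exponential_variogram)
pred_at_data, var_at_data = ordinary_kriging(stations, scores, stations, exponential_variogram)
print(pred, var)
```

Evaluating the predictor on a dense grid of target points yields the kind of spatial distribution map shown in the figures, and the kriging variance indicates where the eight stations constrain the map least.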

The spatial distribution of the PCs and FAs can vary over time. Our analysis shows that the spatial distributions of the PCs (or FAs) of the observations collected in the same month are generally similar, whereas the spatial distributions of PCs obtained in different months can be distinct. This variability can result from meteorological conditions and physico-chemical characteristics.

The general characteristics can be seen in

Water quality data collected from eight monitoring stations located around the subtropical alpine Yuan-Yang Lake in Taiwan have been examined by unsupervised pattern recognition (CA) and display methods (PCA/FA) to yield correlations between variables and water quality similarity in the lake. Cluster analysis confirmed the existence of three types of water quality (

This study was supported by the National Science Council and Academia Sinica, Taiwan, under grant numbers NSC-96-2628-E-239-012-MY3 and AS-98-TP-B06, respectively. The financial support is highly appreciated.

Location of Yuan-Yang Lake (YYL) in Taiwan and eight measurement stations in YYL.

Dendrogram of cluster analysis for sampling stations according to water quality parameters of YYL.

Scree plot of the characteristic roots (eigenvalues) of principal component analysis.

The experimental and modeled variograms of PC1 and FA1.

Variograms in time for PC1 and FA1.

Spatial distribution of

Spatial distribution of the second principal component by the ordinary kriging method on the measured data of February 14, 2009.

Results of water quality parameters at eight sampling stations in the YYL.

Parameter | Abbreviation | Station 1 | Station 2 | Station 3 | Station 4 | Station 5 | Station 6 | Station 7 | Station 8 |
---|---|---|---|---|---|---|---|---|---|
Temperature (°C) | Temp | 12.4 ± 2.88 | 13.63 ± 3.80 | 14.30 ± 3.41 | 14.41 ± 3.66 | 14.67 ± 3.62 | 13.86 ± 3.24 | 13.83 ± 3.49 | 14.47 ± 3.75 |
Dissolved Oxygen (mg/L) | DO | 5.82 ± 0.89 | 6.49 ± 0.92 | 6.85 ± 0.75 | 6.79 ± 1.08 | 6.57 ± 1.26 | 6.11 ± 1.40 | 6.01 ± 1.30 | 6.78 ± 0.82 |
Secchi Depth (m) | SD | 0.65 ± 0.12 | 0.86 ± 0.14 | 1.79 ± 0.39 | 1.69 ± 0.40 | 1.79 ± 0.44 | 1.95 ± 0.41 | 1.92 ± 0.39 | 1.84 ± 0.36 |
Total Phosphorus (mg/L) | TP | 0.011 ± 0.005 | 0.014 ± 0.008 | 0.012 ± 0.006 | 0.011 ± 0.006 | 0.009 ± 0.004 | 0.009 ± 0.004 | 0.010 ± 0.004 | 0.009 ± 0.003 |
Total Nitrogen (mg/L) | TN | 0.528 ± 0.169 | 0.544 ± 0.219 | 0.452 ± 0.196 | 0.427 ± 0.115 | 0.432 ± 0.144 | 0.454 ± 0.184 | 0.448 ± 0.166 | 0.422 ± 0.169 |
Ammonium Nitrogen (mg/L) | NH_{4}-N | 0.080 ± 0.112 | 0.078 ± 0.057 | 0.074 ± 0.039 | 0.051 ± 0.037 | 0.055 ± 0.031 | 0.097 ± 0.085 | 0.100 ± 0.102 | 0.077 ± 0.089 |
Nitrate Nitrogen (mg/L) | NO_{3}-N | 0.111 ± 0.053 | 0.071 ± 0.038 | 0.083 ± 0.045 | 0.092 ± 0.042 | 0.091 ± 0.044 | 0.097 ± 0.044 | 0.095 ± 0.041 | 0.097 ± 0.045 |
Total Suspended Solids (mg/L) | TSS | 5.38 ±.02 | 5.87 ± 3.88 | 3.79 ± 2.73 | 3.19 ± 1.95 | 4.18 ± 2.84 | 3.44 ± 3.07 | 3.90 ± 3.73 | 3.57 ± 2.74 |
Turbidity (NTU) | Turb | 14.10 ± 7.60 | 16.24 ± 7.31 | 15.18 ± 6.36 | 15.25 ± 6.95 | 16.14 ± 7.72 | 18.23 ± 7.81 | 18.52 ± 8.65 | 15.83 ± 6.45 |
Chlorophyll a | Chl-a | 4.20 ± 3.44 | 7.33 ± 6.68 | 4.50 ± 3.17 | 3.49 ± 2.05 | 3.11 ± 1.98 | 6.39 ± 5.68 | 7.78 ± 10.14 | 3.83 ± 2.35 |
pH (pH unit) | pH | 5.89 ± 0.43 | 6.30 ± 0.45 | 6.42 ± 0.39 | 6.43 ± 0.29 | 6.49 ± 0.38 | 6.41 ± 0.29 | 6.48 ± 0.32 | 6.48 ± 0.34 |
Light attenuation coefficient (m^{−1}) | Ke | 4.78 ± 2.52 | 4.87 ± 2.48 | 2.68 ± 1.17 | 2.58 ± 1.30 | 2.67 ± 1.27 | 2.84 ± 1.26 | 2.37 ± 0.87 | 4.35 ± 1.97 |
Wind Speed (m/s) | WS | 0.744 ± 0.182 | 0.744 ± 0.182 | 0.744 ± 0.182 | 0.744 ± 0.182 | 0.744 ± 0.182 | 0.744 ± 0.182 | 0.744 ± 0.182 | 0.744 ± 0.182 |
Rainfall (mm) | R | 4.318 ± 7.048 | 4.318 ± 7.048 | 4.318 ± 7.048 | 4.318 ± 7.048 | 4.318 ± 7.048 | 4.318 ± 7.048 | 4.318 ± 7.048 | 4.318 ± 7.048 |

Note: Values represent mean ± standard deviation.

Correlation matrix of water quality parameters of YYL.

Parameter | Temp | DO | WS | R | SD | TP | NH_{4}-N | NO_{3}-N | TN | TSS | Chl-a | Turb | pH | Ke |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Temp | 1 | | | | | | | | | | | | | |
DO | −0.38 | 1 | | | | | | | | | | | | |
WS | −0.07 | −0.04 | 1 | | | | | | | | | | | |
R | 0.1 | 0.1 | −0.77 | 1 | | | | | | | | | | |
SD | −0.12 | 0.26 | −0.02 | 0.02 | 1 | | | | | | | | | |
TP | 0.24 | −0.15 | −0.32 | −0.26 | −0.27 | 1 | | | | | | | | |
NH_{4}-N | 0.30 | −0.27 | −0.25 | −0.32 | −0.18 | 0.37 | 1 | | | | | | | |
NO_{3}-N | −0.26 | −0.10 | 0.21 | 0.22 | 0.13 | −0.28 | 0.04 | 1 | | | | | | |
TN | 0.26 | −0.46 | 0.17 | −0.21 | −0.23 | 0.24 | 0.35 | 0.16 | 1 | | | | | |
TSS | 0.15 | −0.23 | −0.33 | −0.32 | −0.18 | 0.51 | 0.37 | 0.12 | 0.25 | 1 | | | | |
Chl-a | 0.17 | −0.34 | −0.34 | 0.28 | −0.14 | 0.39 | 0.48 | −0.10 | 0.27 | 0.59 | 1 | | | |
Turb | 0.36 | −0.48 | −0.18 | −0.20 | 0.01 | −0.14 | 0.55 | 0.16 | 0.36 | 0.29 | 0.42 | 1 | | |
pH | 0.02 | 0.36 | −0.19 | 0.17 | 0.38 | −0.05 | −0.11 | −0.30 | −0.37 | −0.18 | −0.09 | −0.18 | 1 | |
Ke | −0.11 | −0.15 | −0.01 | 0.14 | −0.38 | 0.04 | −0.08 | 0.07 | 0.24 | 0.11 | 0.13 | −0.08 | −0.36 | 1 |

Values are statistically significant at p < 0.01;

values are statistically significant at p < 0.05.

Loading of 14 parameters on significant VFs for water quality data set.

Parameter | VF1 | VF2 | VF3 | VF4 |
---|---|---|---|---|
Temp | 0.465 | 0.038 | 0.309 | −0.623 |
DO | −0.582 | 0.437 | −0.205 | 0.171 |
WS | −0.409 | −0.696 | 0.201 | −0.237 |
R | 0.383 | 0.735 | −0.105 | 0.218 |
SD | −0.367 | 0.330 | 0.581 | 0.254 |
TP | 0.610 | 0.224 | −0.309 | −0.218 |
NH_{4}-N | 0.718 | 0.096 | 0.252 | 0.051 |
NO_{3}-N | −0.043 | −0.460 | 0.299 | 0.704 |
TN | 0.536 | −0.543 | 0.118 | −0.105 |
TSS | 0.698 | 0.111 | −0.163 | 0.310 |
Chl-a | 0.737 | 0.133 | −0.047 | 0.148 |
Turb | 0.655 | −0.067 | 0.533 | 0.110 |
pH | −0.314 | 0.649 | 0.217 | −0.214 |
Ke | 0.162 | −0.429 | −0.627 | 0.132 |
Eigenvalue | 3.76 | 2.53 | 1.54 | 1.34 |
Percentage of total variance | 26.89 | 18.08 | 11.02 | 9.54 |
Cumulative percentage of variance | 26.89 | 44.96 | 55.98 | 65.52 |

Variogram models used for spatial mapping.

Variable | Variogram model |
---|---|
PC1 | Nugget[0.031] + Exponential[0.466, 287.106] |
PC2 | Nugget[0.007] + Gaussian[5.004, 1116.3] |
PC3 | Nugget[0.036] + Gaussian[1.443, 259.137] |
PC4 | Nugget[0.018] + Gaussian[0.305, 215.136] |
FA1 | Nugget[0.038] + Exponential[0.157, 65.983] |
FA2 | Nugget[0.003] + Exponential[0.080, 185.810] |
FA3 | Nugget[0.010] + Gaussian[2.056, 383.165] |
FA4 | Nugget[0.010] + Gaussian[0.409, 288.627] |

Note: The notation Nugget[$c_1$] + Exponential (or Gaussian)[$c_2$, $a_2$] denotes the nested model of a nugget effect of sill $c_1$ and an exponential (or Gaussian) model of sill $c_2$ and range $a_2$ in meters. PC: Principal Component; FA: Factor Analysis.