Assessment of Surface Water Quality Using Multivariate Analysis: Case Study of the Crati River, Italy

The water vulnerability of the Crati River (Calabria, Italy) was assessed by applying chemometric methods to a large number of analytical parameters. The study used a data set collected in the years 2015–2016, recording 30 physical–chemical and geological parameters at 25 sampling points, measured both for water and for sediments. The processing of the data by principal component analysis (PCA) highlighted the influence of the components most responsible for pollution. The accumulation of heavy metals in the water was detected only in two samples near the source of the river. By contrast, their concentration values in the sediments exceeded the legal limit at several sites, probably due to their proximity to urban areas. In this case, high concentrations of chromium, mercury and nickel were detected both at the mouth of the river and along the valley. Lead was detected in only one sediment sample. The multivariate analysis techniques proved to be very useful for characterizing the areas surrounding a river course and for facilitating the development of a risk map to monitor health risks to the local population.


Introduction
Environmental stability is often modified by man to meet his needs, but in many cases, this behavior involves the pollution of air, water and soil with a deterioration in the quality of life. The quality of surface waters is a useful indication of the status of a territory, reflecting the effects of human activities on natural ecosystems [1]. Urbanization, industries, land use, modern agricultural practices and animal husbandry, alone or in synergy, can affect aquatic ecosystems and, if these alterations are not limited, the natural balance of ecosystems can be irreversibly compromised.
In order to minimize the negative impact of human activities on aquatic systems, the development of rapid and reliable systems for monitoring water quality is a research priority in this field [2]. The main target for assessing the environmental status of rivers, lakes, groundwater and coastal waters is the regular detection of water pollutants and the causes of their presence [3,4].
The absence or inadequate treatment of urban waste water causes the release of organic substances, bacteria and compounds containing nitrogen or phosphorus into sewers and, in turn, into rivers [5].
This negligence and the waste from industrial and agricultural activities are the main sources of water pollution, with significant repercussions on the health of plants, animals and humans.
In the present study, many physical-chemical and geological parameters were recorded during a monitoring program from September 2015 to March 2016 on the waters of the Crati River and its main tributaries, in order to prepare a risk map of the study area [6].
In recent years, the development of new analytical methodologies has allowed for the processing of physical and chemical data at a speed unthinkable a few decades ago and has made it possible to solve complex analytical systems through multivariate analysis, which allows for the simultaneous use of all the variables involved. Statistical information for each variable is extracted from the collected experimental data, identifying the correlations between the variables or any outlier data. In most cases, a mathematical model is built that predicts the quantitative values of a variable (the response) from the values of a set of variables (the predictors) measured for known samples. Such methods are called regression methods. In a second phase, the model can undergo an optimization process, in which the number of variables or samples can be modified and the data that appear discordant with the rest (outliers) are eliminated. This model must necessarily be subjected to a validation process, consisting of a series of statistical and applicative tests to ascertain the effectiveness of the model in predicting new samples with unknown composition.
In this work, chemometric techniques have been applied to data matrices in order to develop a multivariate model capable of correlating environmental conditions with health and quality of life [7][8][9][10][11][12][13][14][15][16]. Data processing was performed using dedicated software, capable of managing the complexity of the system and simultaneously assessing the impact of single analytical parameters. Multivariate analysis was able to select and consider the parameters containing the most representative information, thus providing a highly reliable response on the quality of the water and the surrounding environment [17][18][19]. The results of this investigation can be very useful for developing a risk map of the studied area as an important reference document for potential epidemiological studies.

Study Area
The Crati River is in Calabria, a region of southern Italy. The studied area is bounded by a watershed from the Tyrrhenian Sea to the Ionian Sea (CS) toward the south and by that of the rivers Noce, Lao, Crati and other small basins toward the north. The territory is situated between the Tyrrhenian coast and the peak of Coccovello mountain and extends to the other mountain peaks of Rocca Rossa, Murgia del Principe, Sirino and Zaccana, up to the Grattaculo. The area reaches the Pollino promontory, where Dolcedorme (2271 m above sea level) is the highest peak in the area, followed by the peaks of Timpone Rotondella and Timpone Neviera, and finally descends to the Ionian coast. This area is completely surrounded by the sea.
The studied region shows a predominantly mountainous orographic configuration with a very harsh mountain range, often with deep valleys and steep slopes. In the south of the region, the Sila upland has a considerable extension and is characterized by large shelves of around 1300 m, including the most important peaks of Botte Donato (1929 m above sea level), Montenero (1881 m), Volpintesta (1730 m) and Gariglione (1785 m).

Sampling
The monitored sites were included in the geographic information system, WGS84 World Geodetic System 1984, which describes the main characteristics of the water: name, basin code and geographic coordinates [20,21]. According to this georeferencing system, the Italian territory is located in zone 33 N (12° to 18° equator parallels). Table 1 lists the east and north geographic coordinates of the sampling sites monitored in this study. The sampling sites were distributed along the Crati River as follows: four at the river head (samples Crati13, Crati15, Crati16, Crati17), seven located downstream of the purifier (samples Crati2, Crati3, Crati4, Crati7, Crati8, Crati9, Crati10) and 14 at the mouth of the river (samples Crati1, Busento, Cardone, Campagnano, Emoli, Arente, Settimo, Annea, Galatrella, Finita, Turbolo, Duglia, Mucone, Coscile). These sampling sites correspond to the long-term monitoring sites of the Regional Agency for the Environment of Calabria and were therefore selected according to the protocol detailed in the Dlsg N. 60/2000 [22]. The geographical area and the sampling sites are reported in Figure 1.
The data were collected from September 2015 to July 2016. Sampling was carried out according to Italian standards [23]. All water samples were collected in sterile plastic and glass bottles for the analysis of inorganic and organic compounds, respectively, and stored at 4 °C until analysis. These samples were analyzed without treatment. The sediment sampling was carried out at points located far from drains and the riverside by means of a cylindrical core, drilling to a depth of about 10 cm by vertical sinking. Each sediment surface sample weighing 1 kg was sealed in a transparent plastic bag to minimize sample contamination [24]. The sediment samples, still wet, were weighed, homogenized and transferred to a glass container. Subsequently, the samples were dried in a stove at a constant temperature of 105 °C until a constant weight, which was recorded as dry weight. Finally, the samples were ground by crushing any agglomerates and sieved through a 2-mm net. For the measurement of radioactivity, the sediment samples were placed in a 1000 mL Marinelli beaker.
Water 2020, 12, x FOR PEER REVIEW
The analyses of pesticides and heavy metals and the measurement of radioactivity were performed within 48 h of sampling. The analytical parameters were averages of six determinations. All the relative standard deviation values fell within the range 1.16-5.03%, demonstrating both the low uncertainty and the robustness of the analytical methods applied.
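The relative standard deviation quoted above can be illustrated with a short sketch. The replicate values below are hypothetical and are not taken from the study; the function name is also an assumption introduced here for illustration. The sketch assumes that the RSD is computed as the sample standard deviation divided by the mean, expressed as a percentage.

```python
import statistics


def relative_standard_deviation(values):
    """Return the RSD (%) of replicate measurements: 100 * sample SD / |mean|."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("RSD is undefined for a zero mean")
    return 100.0 * statistics.stdev(values) / abs(mean)


# Hypothetical six replicate determinations of one parameter (arbitrary units).
replicates = [10.2, 10.5, 9.9, 10.1, 10.4, 10.0]

mean_value = statistics.fmean(replicates)   # value reported for the parameter
rsd_percent = relative_standard_deviation(replicates)

print(f"mean = {mean_value:.2f}, RSD = {rsd_percent:.2f}%")
```

With six replicates per parameter, the mean is the value entered in the data matrix and the RSD serves as a quick check on measurement precision.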

Instruments and Software
Analytical determinations were carried out at the Physics Laboratory and Water Unit of the Regional Environment Agency of Calabria, Department of Cosenza, Italy. The concentration of pesticides was determined by GC-MS using the Agilent 5975C GC system coupled with the Agilent 5973 MS detector. The concentration of heavy metals was determined with the Shimadzu UV-160 atomic absorption spectrophotometer at wavelengths of 220 and 882 nm in a 1-cm quartz cell and with an Agilent 7500 for inductively coupled plasma spectrometry using argon gas.
The Unscrambler X 10.5 ® software package (Camo Process As., Oslo, Norway), equipped with several statistical packages, supported the application of multivariate algorithms. It also allowed for the optimization of the calibration models and the development of validation procedures.

Multivariate Analysis
The application of chemometric techniques is very useful for describing many variables in an analytical system and for defining possible relationships between them. Principal Component Analysis (PCA) is one of the most important data reduction methods for a multivariate data set [17][18][19]. It is characterized by the ability to reduce the dimensionality of a data matrix, while retaining most of the original information. A linear combination is applied to transform the original variables (X) into a limited number of new principal components (PCs):

$$X = t_1 p_1^{T} + t_2 p_2^{T} + \dots + t_n p_n^{T} + E$$

where $t_n$ are the score values, $p_n$ are the loading values and $E$ is the residual matrix. This chemometric approach defines the minimum number of PCs capable of describing the total sum of squares of the data matrix. The object classes are defined by scores and loadings. The scores contain all the information concerning the objects (experiment, sample, etc.) and correspond to their projection in the space of the principal components. The loadings are instead the projections of the variables (x, y and z) in the PC space.
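The decomposition into scores, loadings and a residual matrix can be sketched in a few lines of plain Python. The source does not state which algorithm the software used, so the NIPALS iteration below is an assumption chosen for illustration, and the small data matrix is hypothetical. The function name `nipals_pca` and its parameters are likewise introduced here and do not come from the study.

```python
import math


def nipals_pca(X, n_components, tol=1e-12, max_iter=1000):
    """Split mean-centered X into scores t_k, loadings p_k and a residual E."""
    n_rows, n_cols = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n_rows for j in range(n_cols)]
    # E starts as the centered data and is deflated after each component.
    E = [[row[j] - means[j] for j in range(n_cols)] for row in X]
    scores, loadings = [], []
    for _ in range(n_components):
        # Start from the column with the largest remaining sum of squares.
        start = max(range(n_cols), key=lambda j: sum(r[j] ** 2 for r in E))
        t = [r[start] for r in E]
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            # Loading: projection of the columns of E on t, scaled to unit length.
            p = [sum(E[i][j] * t[i] for i in range(n_rows)) / tt
                 for j in range(n_cols)]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]
            # Score: projection of each object (row) on the loading vector.
            t_new = [sum(r[j] * p[j] for j in range(n_cols)) for r in E]
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change <= tol * sum(v * v for v in t):
                break
        scores.append(t)
        loadings.append(p)
        # Deflation: remove the modeled part t * p^T from the residual.
        E = [[E[i][j] - t[i] * p[j] for j in range(n_cols)]
             for i in range(n_rows)]
    return means, scores, loadings, E


# Hypothetical matrix: 5 objects (rows) x 3 variables (columns).
X = [[2.0, 4.1, 1.0],
     [3.0, 6.2, 0.8],
     [4.0, 7.9, 1.3],
     [5.0, 10.1, 0.9],
     [6.0, 12.0, 1.1]]

means, scores, loadings, E = nipals_pca(X, n_components=2)

# Fraction of the total sum of squares described by each component.
total_ss = sum((X[i][j] - means[j]) ** 2 for i in range(5) for j in range(3))
explained = [sum(v * v for v in t) / total_ss for t in scores]
print([round(e, 4) for e in explained])
```

Adding the column means, the products of scores and loadings, and the residual returns the original matrix, which is the content of the decomposition described above. The explained fractions correspond to the explained variance (EV) values reported for each PC.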
Scores and loadings can be represented on a bi-plot. This is a two-dimensional scatter plot or a score map for two specific components (PCs), with the X-loadings displayed on the same plot. It enables the simultaneous interpretation of sample properties and variable relationships.

Data Sets
Two data sets were built, one for water samples and one for sediment samples. Each set was built using 30 analytical parameters per site, for a total of 750 values. First, a selection of the variables carrying the most useful information was made. This selection represents a critical step that must be carefully considered, because the exclusion of important variables can lead to misleading results in building the model. The amount of relevant information does not necessarily increase when more variables are included; indeed, random noise may even increase. Figure 2 shows the bar plot of the loadings, describing the different contributions of each variable in building the principal components. The importance of the different variables for the components is evident here. The variance value was adopted as a discriminating criterion for selecting the parameters, discarding those showing a relative standard deviation (RSD) value of less than 10%. The raw data were normalized before applying PCA by using the weighted standard deviation procedure, in order to bring values measured on different scales onto a common scale. Tables 2 and 3 list the values of the selected parameters (means of six measurements) from the analysis of water and sediment samples, respectively.
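The selection and normalization steps can be sketched as follows. This is a minimal illustration under stated assumptions: the RSD of each parameter is computed across sampling sites, and the "weighted standard deviation procedure" is taken to mean mean-centering followed by division by the standard deviation (autoscaling), which the source does not spell out. The parameter names and values are hypothetical.

```python
import statistics


def select_and_autoscale(data, min_rsd_percent=10.0):
    """Drop low-variation parameters, then scale the rest to zero mean, unit SD.

    `data` maps a parameter name to its list of values across sampling sites.
    """
    scaled = {}
    for name, values in data.items():
        mean = statistics.fmean(values)
        sd = statistics.stdev(values)
        if sd == 0:
            continue  # a constant parameter carries no information
        rsd = float("inf") if mean == 0 else 100.0 * sd / abs(mean)
        if rsd < min_rsd_percent:
            continue  # too little variation between sites
        # Weight each value by 1/SD after centering (assumed procedure).
        scaled[name] = [(v - mean) / sd for v in values]
    return scaled


# Hypothetical values for three parameters at five sites.
raw = {
    "pH": [7.9, 8.0, 8.1, 7.8, 8.0],
    "nitrate": [1.2, 3.4, 0.8, 5.1, 2.2],
    "nickel": [2.0, 9.0, 4.0, 15.0, 6.0],
}

selected = select_and_autoscale(raw)
print(sorted(selected))
```

In this example the pH column varies by less than 10% between sites and is discarded, while the other two parameters are kept and placed on a common scale so that neither dominates the PCA merely because of its units.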

Results
The PCA multivariate approach was applied to the data sets, considering the sampling sites as objects and the measured parameters as variables. The algorithm decomposed the original variables into principal components (PCs). The original input matrix (matrix X) was thus transformed into the matrix of the multivariate model, which consisted of two new matrices: the scores, containing all the information concerning the objects (experiment, sample, etc.), and the loadings, which were the projections of the variables (x, y and z) in the PC space. Figures 3 and 4 show the bi-plot graphs, which represent scores and loadings for the water and sediment samples, respectively. The graphs show how the sampling sites could be grouped according to the variables considered.

Discussion
The bi-plots reported in Figures 3 and 4 enable the identification of the main characteristics of the sampling points through a simultaneous evaluation of the chemical-physical parameters. This allows for a clearer and more immediate evaluation with respect to the interpretation of the data of Tables 2 and 3, which is usually difficult to carry out because the variables are evaluated one at a time.
The analysis of the data pattern through chemometric processing identified different groups (clusters) of samples with distributions reflecting the geographical position. The distance between the samples is the criterion usually adopted to establish their similarity: two samples close to each other in PC space are more similar than samples that lie far apart. From the viewpoint of the variables, their distances and distribution in the PC space help to highlight direct or inverse correlations. Variables in the same quadrant are directly correlated, while variables in diagonally opposed quadrants are negatively correlated.
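The two reading rules for the bi-plot can be sketched numerically. The scores and loadings below are hypothetical, and the site and variable names are placeholders. Using the cosine of the angle between loading vectors as an indicator of correlation is an assumption made here for illustration; it is only as reliable as the share of variance described by the two plotted PCs.

```python
import math


def nearest_site(site, scores):
    """Return the site closest to `site` in the PC1/PC2 score plane."""
    others = (name for name in scores if name != site)
    return min(others, key=lambda name: math.dist(scores[site], scores[name]))


def loading_cosine(a, b):
    """Cosine of the angle between two loading vectors in the PC1/PC2 plane."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Hypothetical (PC1, PC2) scores for four sampling sites.
site_scores = {
    "site_A": (2.1, 0.4),
    "site_B": (1.9, 0.6),
    "site_C": (-1.5, -2.0),
    "site_D": (-2.2, 1.1),
}

# Hypothetical (PC1, PC2) loadings for three variables.
variable_loadings = {
    "var_x": (0.6, 0.5),    # same quadrant as var_y
    "var_y": (0.7, 0.3),
    "var_z": (-0.5, -0.6),  # diagonally opposite quadrant
}

closest_to_a = nearest_site("site_A", site_scores)
cos_xy = loading_cosine(variable_loadings["var_x"], variable_loadings["var_y"])
cos_xz = loading_cosine(variable_loadings["var_x"], variable_loadings["var_z"])
print(closest_to_a, round(cos_xy, 2), round(cos_xz, 2))
```

A short Euclidean distance between two sites suggests similar chemical profiles, a positive cosine between two loadings suggests a direct correlation, and a negative cosine suggests an inverse one.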
According to the results represented in both bi-plots, a risk map can be elaborated, as reported in Figure 5. The water and sediment withdrawal sites were colored according to the values of the PC1 and PC2 scores [31]. PC1 in water monitoring (29.0% of explained variance, EV) showed higher positive loading values for nitrogen, phosphorus and arsenic content, identifying a corresponding cluster of samples in the risk map. Samples near the river source and the main tributary channel have similar characteristics regarding ammonium, total nitrogen, chromium and nickel, which remain at relatively lower values than in the other sampling points. For the withdrawals along the Crati Valley (Crati 4, 7, 8, 9 and 10), similar values are obtained for the arsenic, total nitrogen, ammonia, phosphorus and nickel parameters, with values above average (Figure 5). PC2 (19% EV) grouped the samples collected at the mouth of the river (Crati 1, 2, 3, Coscile and Turbolo), which have similar concentrations of total chromium, selenium and boron. It should be noted that, in this territorial area, the tributaries have similar characteristics to the sites near the river mouth, probably because the effluent withdrawals were also made downstream and can therefore be enriched or contaminated by the same pollutants. This type of pollution could be due to anthropogenic activities in the sampling sites studied. In fact, in this area, it was possible to detect the presence of construction sites, agricultural crops and landfills.
The analysis of sediment showed less correlation between the samples and geographic location. However, through PCA and the graphical reports (bi-plots), it was possible to outline a distribution of the withdrawal points on the risk map. In particular, PC1 (49% EV) contained information relating to cadmium, zinc, copper, iron, aluminum, nickel and chromium, with a cluster of samples characterized by high concentrations of these parameters in Crati 2, 3, 15, 16, 17 and Busento. PC2 (17% EV) described the cluster of the samples Arente, Cardone, Crati 8, 9 and 15 as having higher values of mercury, arsenic and lead. The Crati 8 and Cardone sites had high values of mercury and Crati 9 a concentration of lead above the allowed limits. The analyzed data showed that the limits established by the environmental quality standards reported in the Decree 260 of 08/11/2010 were exceeded only with regard to the analysis of metals.
Many metals are dangerous because they enter the human body unnoticed through various vectors and tend to accumulate. Polluted surface waters are among the main carriers of these contaminants. In general, the values that exceed the limits established by current legislation are due to the anthropogenic activities located around the sampling sites: construction sites, agricultural crops and, in particular, industrial waste landfills, without excluding organic crops.
As highlighted in the present work, multivariate analysis can be very useful in the production of a risk map that highlights the most critical areas and provides indications to the territorial control offices.

Conclusions
The monitoring of the Crati River in southern Italy was carried out for a period of 7 months by analyzing the waters and sediments of 25 sampling sites distributed along the river and its main tributaries. Two data sets were built, each including 750 values, recording 30 analytical parameters per site and processed with multivariate methods. These procedures first allowed the selection of the parameters carrying the most useful information. The selected data were processed by principal component analysis with the aim of finding the relationship between the chemical composition of the sampling sites and the different geographical locations. The multivariate analysis allowed the clustering of the samples and the elaboration of a risk map. The results showed that, in the sites close to the river source, there was a concentration above the allowed limits of chromium in the water and nickel in the sediments, probably due to the presence of construction material. In the Crati Valley, however, where most of the inhabited centers are concentrated and therefore the presence of waste and landfills is massive, there are high levels of mercury, nickel and lead, both in water and in sediments. The sites near the river mouth had chromium and nickel concentrations higher than the norm in both matrices analyzed. The chemometric models have proven to be valid for the characterization of the environmental matrices. They are a useful tool to objectively and reliably describe the quality level of water and sediment matrices and to highlight the changes that can prove dangerous for the local population. The predictive capabilities of the model provide a valid means for routine analysis, at the service of the offices that manage and control water and intervention strategies.
In fact, the obtained results allowed us to produce a risk map able to provide information to better assess the risk of pollution and provide assistance to communities to help them act to reduce risks to human health and the environment.