Assessing the Groundwater Quality in the Liwa Area, the United Arab Emirates

In recent years, groundwater quality has raised major concerns worldwide because groundwater is a limited source of water for drinking, agricultural, and industrial use. While the suitability of the groundwater of the Liwa aquifer (Abu Dhabi Emirate) for agricultural use has been partially studied, not all water parameters have been taken into account. Therefore, in this paper, we study 42 concentration series of 19 groundwater parameters. We test the hypothesis that the water parameter series recorded at different locations are similar and group the samples into clusters. The main parameters that determine the differences between the clusters are identified by Principal Component Analysis (PCA). Finally, we use a quality index to assess the water suitability for drinking. The conclusions emphasize the necessity of using more than one technique to evaluate water quality for different purposes and to cross-validate the results.


Introduction
All over the world, billions of people suffer from water scarcity because less than 1% of the world's water is fresh and accessible [1]. The remediation of polluted aquifers is generally difficult and, in many cases, not possible. Therefore, water quality is an essential research topic for water resources scientists [2][3][4][5][6][7][8][9][10][11][12], both as a first step toward informed measures for keeping water clean and as a warning for reducing its pollution [12].
MIKE-II, QUASAR, QUAL2E, CE-QUALW2, SIMCAT, and TOMCAT are water quality models that provide comprehensive modeling of water quality conditions in river systems [13]. They were developed for particular purposes, and none of them is best in all situations. For assessing the surface water quality, researchers [14][15][16][17][18][19] introduced water quality indices (WQI), the best known being CCMEWQI [16]. A review of the most important ones, covering their composition and structure and comparing them, is provided in [14]. Other approaches use univariate statistical methods, such as time series analysis, for describing the temporal evolution of some water parameters [6][7][8][10]. Bhat et al. [8] and Ioele et al. [20] employed multivariate statistical tools. ANOVA is used for evaluating the differences among the series of water quality variables recorded at the study sites. Cluster analysis (CA) allows selecting the groups of sites with similar characteristics (concentrations, pH) of water parameters. Principal Component Analysis (PCA) leads to the detection of the main water parameters that influence water quality [20]. Gad et al. [21] combined a drinking water quality index and four pollution indices, principal component analysis (PCA), partial least squares regression (PLSR), and stepwise multiple linear regression (SMLR) to evaluate the water quality for drinking purposes in the Nile Delta.
Assessing the groundwater quality became a study topic in 1968, when the notion of "groundwater vulnerability" was introduced by Margat [22]. The definitions of this concept [22][23][24] aim at capturing the interaction between a contaminant applied in the soil vicinity (or on its surface) and the aquifer during the pollutant's transportation by rainwater. The physicochemical reactions and their effects on the groundwater depend on the hydrogeological conditions and on the pollutant's characteristics, quantity, and exposure time [25][26][27].
For assessing the groundwater suitability for drinking or agricultural use at different locations, scientists [56][57][58][59] use indices like the Chloro-alkaline index (CAI), Saturation index (SI), Sodium Adsorption Ratio (SAR), Residual Sodium Carbonate (RSC), Kelley's ratio, and magnesium hazard. A single WQI (the single-factor pollution index (I), the Nemerow pollution index (NI), the heavy metal evaluation index (HEI), the degree of contamination (Cd)) or geostatistical methods can also be used to emphasize the groundwater pollution at a regional scale [60,61].
In this article, we propose a combined methodology for studying the groundwater characteristics at a regional scale. Firstly, we test the similarity of the series of water parameters (collected at different sites); then we cluster the sites with the same characteristics and perform the Principal Component Analysis (PCA) for extracting the significant components. Next, we assess the suitability of water for drinking using a water quality index. Finally, we compare the obtained results with those reported in the literature and conclude.

Study Area
Abu Dhabi Emirate, the largest emirate of the United Arab Emirates, is situated along the Arabian Gulf, between 22.5° and 25° north latitude and 51° and 55° east longitude. The study area, Liwa, belongs to Abu Dhabi Emirate. Water samples were collected in the northern part of the Liwa Crescent, between Madinat Zayed and Meziyrah. The distribution of the drilling wells is presented in Figure 1, whereas their coordinates are given in Table 1.
The mean monthly temperature in the region is between 20 °C and 35 °C, with minima between 13 °C and 29 °C and maxima in the interval 31 °C-48 °C. The average humidity varies from 59% to 68%, with a maximum of about 79%. The highest monthly average precipitation values recorded from 2003 to 2017 were 16 mm (in December) and 10 mm (in January), with no precipitation from May to October.
In the Liwa area, continental and shallow-water marine sedimentary rocks were deposited from the Cambrian to the Quaternary. Liwa's aquifer lithology is composed of two essential stratigraphic units. The first one has a thickness between 100 m and 150 m and is a Quaternary unit formed by Holocene and Pleistocene aeolian fine to medium sands and interdunal deposits. The second one, with a thickness of over 350 m, is a Tertiary unit formed by mudstones, evaporites, and clastics of Miocene age [62]. The shallow aquifer formation consists of sand and sandstone of variable thickness, underlain by siltstone, claystone, and evaporites. In the Liwa area, the altitude of the groundwater level is between 60 m and 107 m a.m.s.l. (above mean sea level). The shape of the groundwater table is concave down, elongated from east to west. Its top is situated approximately 25 km north of Mezairaa [63]. The gradient of the groundwater table is less than 0.5 m/km in the east-west direction, 0.5 m/km in the southern part, and more than 1 m/km in the northern region.
The principal aquifer of the Western Region, situated in the northern Liwa area, consists of the upper subunit of the Quaternary sediments. To the west, the aquifer extends to the Sabkha Matti area, while to the east, it borders the gravel plains located at the foot of the Oman Mountains. The average thickness of the principal aquifer varies between 30 m and 50 m. Overlying sand dunes, forming a thick unsaturated zone, cover the aquifer. The lower subunit of the Quaternary sediments represents a fully saturated aquitard, situated above the aquiclude consisting of the Tertiary Lower Fars unit [62,63]. A schematic description of the Liwa aquifer and a well cross-section are shown in Figure 2.

Experimental Study
For the present study, we collected groundwater samples from 41 wells situated in the Liwa zone in March 2018 and stored them in polyethylene bottles of 1 L capacity. We followed the methods of the American Public Health Association for the samples' preservation and analysis [64].
pH, electrical conductivity (EC), and total dissolved solids (TDS) were determined at the sampling sites using a pH-meter, a portable EC-meter, and a TDS-meter (Hanna Instruments, Ann Arbor, MI, USA). The sodium (Na⁺), potassium (K⁺), magnesium (Mg²⁺), and calcium (Ca²⁺) ions were determined by atomic absorption spectrophotometry (AAS), while the carbonate and bicarbonate were analyzed by volumetric methods. Sulfate (SO₄²⁻) was estimated by the colorimetric and turbidimetric methods. The nitrate concentration was measured by ion chromatography. Trace elements (Cd, Cr, Zn, Pb, Cu, Ni, Mn) were determined by an Inductively Coupled Plasma spectrophotometer (ICP-OES, Agilent, CA, USA). The results of the chemical analyses can be found in [62].

Methodology
The statistical analysis performed on the series of water parameters consisted of the following.

 Determine the variability of the water parameters at different locations
To this end, the basic statistics, histograms, and boxplots of the series were examined, and the series values were compared with the maximum admissible limits.
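As a rough illustration of this step, the sketch below computes basic statistics, the coefficient of variation, and the exceedance of an admissible limit for one series. It is a minimal Python sketch, not the authors' code; the function name `summarize` and the sample values are hypothetical, and only the 200 mg/L sodium limit is taken from the text.

```python
import statistics

def summarize(values, limit=None):
    """Return basic statistics for one water-parameter series.

    `limit` is an optional maximum admissible value for the parameter.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)            # sample standard deviation
    summary = {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "sd": sd,
        "cv_percent": 100 * sd / mean,       # coefficient of variation
    }
    if limit is not None:
        # Share of samples above the limit and the largest exceedance factor.
        summary["share_above_limit"] = sum(v > limit for v in values) / len(values)
        summary["max_exceedance_factor"] = max(values) / limit
    return summary

# Made-up sodium-like concentrations (mg/L), NOT the Liwa measurements.
sodium = [640.0, 1200.0, 2500.0, 3900.0, 5300.0]
result = summarize(sodium, limit=200.0)
print(result)
```

The exceedance factor is the kind of ratio reported later when concentrations are said to exceed a limit by a given number of times.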

Study of the similarity of the series collected at different sites
For the rest of the study, except for the computation of the water quality index, the data were standardized by dividing the concentration of each element by the maximum concentration of that element. Then, to test the hypothesis that there is no statistically significant difference between the water elements in the samples collected at different study sites, the Kruskal-Wallis nonparametric test [65] was performed at a 5% significance level. If the null hypothesis was rejected, the Dunn post hoc test was applied to determine the pairs of dissimilar samples [66].
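The standardization and the Kruskal-Wallis statistic can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: the data are made up, the helper names are hypothetical, and the p-value uses the closed form exp(-H/2), which is exact for the chi-square approximation only when three groups are compared (two degrees of freedom). The Dunn post hoc step is not shown. For more groups, a library routine such as `scipy.stats.kruskal` would be the practical choice.

```python
import math

def standardize_by_max(series):
    """Divide every value by the series maximum, as described in the text."""
    peak = max(series)
    return [v / peak for v in series]

def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1          # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the usual tie correction."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * h - 3 * (n_total + 1)
    # Tie correction: 1 - sum(t^3 - t) / (N^3 - N) over sets of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    return h / (1 - ties / (n_total ** 3 - n_total))

# Made-up data: one row per parameter, one column per site (three sites).
raw = {
    "EC": [10.0, 20.0, 40.0],
    "Na": [2.0, 8.0, 10.0],
    "Cl": [3.0, 12.0, 30.0],
    "TDS": [30.0, 60.0, 100.0],
}
standardized = {name: standardize_by_max(row) for name, row in raw.items()}
sites = [list(col) for col in zip(*standardized.values())]  # one group per site

h = kruskal_wallis_h(sites)
p_value = math.exp(-h / 2)   # chi-square survival function for df = 2 only
print(f"H = {h:.4f}, p = {p_value:.4f}, reject H0 at 5%: {p_value < 0.05}")
```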

Perform data clustering
The classification of the series into homogeneous groups within which the patterns are the same was done using the k-means clustering algorithm [67]. This algorithm determines groups of data series based on an error-minimization criterion, computed from the distances of the instances to their representative values. The stages of the k-means algorithm are summarized below [68][69][70].
Let $Z = (z_1, z_2, \ldots, z_m)$, with $z_i \in \mathbb{R}^n$, $i = 1, \ldots, m$, be the data matrix that contains the concentration series measured at each location (column $j$ is composed of the data from site $j$).
(1) Firstly, the number of clusters, $k$, is selected.
This number is either introduced by the user or computed based on different algorithms. Among the methods used to determine the optimal number of clusters, the most popular are the elbow, the silhouette, and the gap statistic methods [71]. In this article, we employed the facilities of the R software, in particular the NbClust package, which provides 30 indices for the detection of the optimal $k$. The best number of clusters is selected according to the majority rule [72].
(2) The clusters' centroids $\vartheta_1, \ldots, \vartheta_k \in \mathbb{R}^n$ are initialized, and the distances between the data points and the cluster centers are computed. Each point is assigned to the cluster whose center is closest to it.
(3) The new clusters' centers are determined; the procedure restarts from (2) and runs until no data point can be reassigned to another cluster. Then, stop.
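Steps (2) and (3) can be sketched in a few lines of Python. This is an illustrative implementation, not the R routine used in the study. It assumes Euclidean distance and seeds the centroids with the first k points, because the text does not specify the initialization. The points are made up, and the function name `kmeans` is hypothetical.

```python
import math

def kmeans(points, k, max_iter=100):
    """Plain Lloyd-style k-means; returns (labels, centroids)."""
    # Assumption: the first k points seed the centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Step (2): assign each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:      # no point changed cluster: stop
            break
        labels = new_labels
        # Step (3): move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:               # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Made-up standardized values for five sites and two parameters.
points = [(0.10, 0.20), (0.90, 0.80), (0.15, 0.25), (0.85, 0.90), (0.20, 0.10)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

In practice, several random initializations are usually tried, since the result of k-means can depend on the starting centroids.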

Perform the Principal Component Analysis (PCA) [73][74][75]
In hydrological studies, PCA is a tool for finding the water quality parameters that describe the processes governing the groundwater chemistry and for extracting valuable information using only the significant variables. PCA is a multivariate statistical procedure designed to reduce the number of variables by replacing them with a few artificial ones (Principal Components, PCs). These PCs are uncorrelated factors that explain most of the studied phenomenon and account for a significant amount of the variance. There are many criteria for extracting the principal components; the most used are the Cattell scree plot, the Kaiser criterion, and the explained variance criterion. The PCs that account for the highest variability are emphasized on the scree plot [74]. The Kaiser criterion [76] selects those PCs that correspond to eigenvalues greater than 1 [77]. Here, we used both the scree plot and the Kaiser criterion for selecting the PCs.
The biplot shows the contributions of the variables to the PCs. The loadings emphasize the correlations between the input variables and the factors.
For a deeper insight into PCA and its implementation in R software, the reader may see [74,75,78,79].
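The Kaiser criterion and the explained-variance computation can be illustrated as below. This is a small Python sketch that starts from eigenvalues already obtained from a correlation matrix; the eigenvalues are hypothetical, not those of the study, and the function name `kaiser_selection` is an assumption made for illustration.

```python
def kaiser_selection(eigenvalues):
    """Keep PCs whose eigenvalue exceeds 1 and report explained variance."""
    total = sum(eigenvalues)
    # 1-based indices of the PCs retained by the Kaiser criterion.
    kept = [i + 1 for i, ev in enumerate(eigenvalues) if ev > 1.0]
    # Percentage of the total variance explained by each PC.
    explained = [100 * ev / total for ev in eigenvalues]
    # Percentage explained jointly by the retained PCs.
    cumulative = sum(explained[i - 1] for i in kept)
    return kept, explained, cumulative

# Hypothetical eigenvalues of a 6-variable correlation matrix (they sum to 6).
eigenvalues = [2.7, 1.5, 1.1, 0.4, 0.2, 0.1]
kept, explained, cumulative = kaiser_selection(eigenvalues)
print(kept, [round(e, 1) for e in explained], round(cumulative, 1))
```

For standardized variables the eigenvalues sum to the number of variables, so an eigenvalue above 1 marks a component that carries more variance than a single original variable.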

Assessing the suitability of water for drinking
For this aim, we used a weighted arithmetic Water Quality Index (WQI) [80][81][82]. The WQI is built as follows:
(1) Choose the water parameters used in the computation.
(2) Compute the quality rating $q_i$ for each parameter by
$$q_i = 100 \cdot \frac{C_i - C_{ideal}}{S_i - C_{ideal}},$$
where $C_i$ is the actual concentration of parameter $i$ in the sample, $S_i$ is the admissible value, and $C_{ideal} = 0$ for all parameters except pH, for which $C_{ideal} = 7$.
(3) Compute the weight of each water parameter $i$ by
$$w_i = \frac{1/S_i}{\sum_{j=1}^{n} 1/S_j},$$
where $n$ is the number of water parameters taken into consideration.
(4) Compute the WQI by
$$WQI = \frac{\sum_{i=1}^{n} w_i q_i}{\sum_{i=1}^{n} w_i}.$$

Results and Discussion
Table 2 contains the basic statistics of the study data series. High amplitudes of the registered values are found for almost all study elements, with very high standard deviations for EC, TDS, Na⁺, Cl⁻, Ca²⁺, and Mg²⁺. The higher the standard deviation, the higher the variation of the values of a series about the mean. The coefficients of variation show high variability of the Zn, Mn, Cd, and Ni concentrations by comparison to the other series.
The boxplots of the data series (Figure 3) show the existence of some outliers for the pH, K⁺, SO₄²⁻, EC, HCO₃⁻, Cd, Cu, Mn, Ni, Pb, and Zn series. In the presence of outliers, the standard deviation becomes high. This is the situation, for example, for the EC series, which presents very high outliers (3003 and 2995 µS/cm) compared to the other values, significantly increasing the standard deviation.
Still, the pH remains within the admissible limits, its values being between 6.19 and 7.19. Also, the groundwater is not contaminated with Cd, Cr, Cu, Mn, Ni, and Zn, the maximum concentrations of these elements being well below the admissible limits (0.003, 0.05, 2, 0.5, 0.07, and 3 mg/L, respectively). Only samples 9 and 11 contain Pb in concentrations (0.1117 and 0.01229 mg/L) greater than the prescribed limit (0.01 mg/L).
Comparing the determined values of the water parameters with the WHO's drinking water standards [57,83] shows that the Na⁺ and chloride concentrations exceed the admissible potability limits (200 mg/L and 250 mg/L, respectively) by factors between 3.19 and 26.51 (for Na⁺) and between 4.13 and 48.14 (for chlorides). In the aquifer system, sodium is mainly derived from the dissolution of salt minerals and silicate weathering [84]. EC has 80% of values above the permissible limit for drinkable water (1000 µS/cm). The variation of the EC values could be explained by rock-water interaction and anthropogenic influences, like agricultural run-off and wastewater discharges [57]. TDS has 80.5% of values above 600 mg/L, with 35.7% of water samples being in the category of highly undrinkable water (>1000 mg/L).
The concentrations of Ca²⁺ exceed the permissible limit (75 mg/L) by factors between 1.39 and 16.60. The Ca²⁺ content in water depends on the solubility of CaCO₃ and CaSO₄ [85].
Figure 4 presents the histograms, showing asymmetric and non-uniform distributions of the water parameters. The Zn, Cd, Mn, and Co series present the highest right-skewness, whereas Na⁺ and K⁺ have the lowest asymmetry.
The Kruskal-Wallis test rejected the null hypothesis that the data series obtained at different sites have the same distribution. The post hoc Dunn test emphasized the dissimilarities between the samples, listed in Table 3. The entries of the table are read as follows. For example, in the first column and the fourth row, "4; 26, 27" means that Sample 4 does not have the same distribution as Sample 26 or as Sample 27. On the other hand, in the first column and the 12th row, "12; -" means that there is no dissimilarity between Sample 12 and any of the samples with a label greater than 12. Since each pair is listed only once in the table, the dissimilarities with the samples labelled with a number smaller than 12 are already displayed in the previous rows (and first column); in this example, these are 6, 7, 9, and 11.
To classify the water samples based on their chemical composition, clustering was performed. The best number of clusters was found to be 3. The clusters are presented in Figure 5. There is a clear separation between the clusters, as expected for a good clustering. Comparing the results from Table 4 and Figure 5, one can remark their concordance. For example, 6, 7, 9, and 11 (situated
The sites in the first cluster are mainly situated along communication roads, so the effect of pollution is higher. The water in this cluster is highly undrinkable: TDS > 1000 ppm, the sodium content is at least 12 times higher than the admissible value (200 mg/L), and the chloride content is more than 20 times higher than the upper limit (250 mg/L).
For the elements in the second cluster, pH is between 6.12 and 6.92, EC is in the interval 2465-3003 µS/cm, and TDS is between 1287 and 1585 ppm (higher than the values determined in the samples from the first cluster). The sodium and potassium concentrations in the samples from the second cluster are generally higher than those in the samples from the first one: the average values in the second (first) cluster are 3870 (3733) mg/L and 13.24 (10.74) mg/L, respectively. The same is true for the chlorides, for which an average of 8630 (7277) mg/L is detected in the second (first) cluster.
Significant differences are found between the average values of carbonate, bicarbonate, calcium, and magnesium, which are 87.6, 32.94, 1125, and 587 mg/L for the second cluster, whereas for the first one, they are 50.18, 106.59, 856, and 349 mg/L, respectively.
The mean concentrations of Cu and Zn (Pb) are 3 and 2.5 times higher (4 times smaller), respectively, in the second cluster by comparison to the first one.
The in the second cluster than in the third one. The nitrate, sulfate, and Ni concentrations are the highest in the samples from the second cluster and the lowest in the last one.
The next step was to perform PCA for selecting the main components that determine the water quality. The scree plot (Figure 6) was used for selecting the principal components (PCs). Figure 6 shows that the first two components have the highest contribution to explaining the variance (38.2% and 13.8%). Table 4 shows the summary of the PCA (standard deviation, proportion of variance, and cumulative proportion) for the first eight PCs. Generally, the selected principal components are those whose corresponding eigenvalues are greater than 1. In this case, the first five components have eigenvalues greater than one and explain 73.76% of the variance (which is an acceptable percentage). Since the percentage of explained variance is high, we keep only PC1-PC5.
The contributions of the variables to the first two PCs are presented in the biplot (Figure 7). The distances from the variables to the origin represent the variables' quality on the factor map. The pairs (pH and Ni) and (CO₃²⁻ and HCO₃⁻) are strongly negatively correlated (since the segments represented in Figure 7 are opposite). Sites 7, 6, 9, and 11 (Figure 7, top left-hand side) are opposite to 26 and 27 (Figure 7, top right-hand side), reconfirming the classification of the four sites (7, 6, 9, and 11) in a separate cluster. The same is true for sites 7, 6, 9, and 11 on the one hand and 14 on the other.
The variables whose distances to the origin are high are well represented on the factor map. In our case, the most significant contributions to PC1 are those of HCO₃⁻, CO₃²⁻, Cu, and Pb; to PC2, those of Mg²⁺, Na⁺, K⁺, and TDS (Figures 7 and 8, the first two bar charts); and to PC3, those of Cr, Mn, Cd, Zn, Pb, and CO₃²⁻ (Figure 8, the last bar chart). The elements of the eigenvectors are called PC loadings [75]. The factor loading associated with each variable in a given PC is the correlation between the original variable and the factor. The significant variables are those with loadings greater than 5%.
In Figure 8, the horizontal dashed line represents the 5% contribution to a PC. Nine elements have a contribution greater than 5% to PC1, six to PC2, and six to PC3. Figure 9 contains the quality of representation of the variables on the factor map. It is another way to summarize the contributions of each element to the first five PCs. In our study, the significant ones are Mg²⁺ and Na⁺ (on PC1), carbonates (on PC2), followed by the heavy metals: Cr (with the loading 0.52 on PC3), Mn (with the loading 0.509 on PC4), and Zn (with the loading 0.740 on PC5). Thus, we expect that they will differentiate the water samples. The high influence of HCO₃⁻ on PC1 could be explained by the recharge due to precipitation [86].
Finally, the values of the WQI index for all the sites are presented in Table 5. The columns of this
Based on the WQI 1 (WQI 2), the quality of water for drinking can be classified as follows:
• for the samples from the first cluster, 70.6% (88.2%)-excellent, 29.41% (11.8%)-good;
• for the samples from the second cluster, 25% (25%)-excellent, 25% (25%)-good, 50% (50%)-poor;
• for the samples from the third cluster, 70% (70%)-excellent, 25% (25%)-good, 5% (5%)-poor.
After removing the concentrations of Cr, Mn, and Zn from the analysis, the percentage of samples in the first cluster classified as Excellent increased from 70.6% to 88.2%. The classification into the categories Excellent, Good, and Poor remains the same for all the samples in the second and third clusters when Cr, Mn, and Zn are removed from the analysis. Therefore, we can conclude that these three elements have a significant influence on water quality.
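The two-scenario comparison (all parameters versus a reduced set) can be sketched as follows. This is a minimal Python illustration of the standard weighted arithmetic WQI, which is assumed here to match the variant used in the study. The sample concentrations are made up, the function name `wqi` is hypothetical, and the limits for Cd, Cr, Mn, Zn, and Pb are the admissible values quoted earlier in the text.

```python
def wqi(concentrations, standards, ideal=None):
    """Weighted arithmetic WQI for one sample.

    concentrations, standards: dicts keyed by parameter name.
    ideal: optional dict of ideal values (0 unless given, e.g. 7 for pH).
    """
    ideal = ideal or {}
    inv_sum = sum(1.0 / s for s in standards.values())
    score = 0.0
    for name, s in standards.items():
        c_ideal = ideal.get(name, 0.0)
        q = 100.0 * (concentrations[name] - c_ideal) / (s - c_ideal)  # rating
        w = (1.0 / s) / inv_sum      # weights sum to 1 over the parameters
        score += w * q
    return score

# Admissible limits (mg/L) quoted in the text for these elements.
limits = {"Cd": 0.003, "Cr": 0.05, "Mn": 0.5, "Zn": 3.0, "Pb": 0.01}
# Made-up concentrations for one sample (mg/L), NOT measured values.
sample = {"Cd": 0.001, "Cr": 0.01, "Mn": 0.05, "Zn": 0.3, "Pb": 0.004}

wqi_all = wqi(sample, limits)                                  # scenario 1
reduced = {k: v for k, v in limits.items() if k not in {"Cr", "Mn", "Zn"}}
wqi_reduced = wqi(sample, reduced)                             # scenario 2
print(round(wqi_all, 2), round(wqi_reduced, 2))
```

Because the weights are proportional to 1/S_i, parameters with strict limits dominate the index, so dropping parameters changes the index through the renormalized weights as well as through the removed ratings.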

Conclusions
In this article, we proposed an integrated approach for water quality evaluation and applied it to the data series containing the groundwater parameters measured in the Liwa area, the United Arab Emirates.
Firstly, the similarity of the water samples was determined using statistical tests. The Dunn post hoc test identified the pairs of samples that are not similar. The k-means algorithm was then used to determine the groups of samples with the same characteristics. We found three clusters, determined by the hydrogeological structure of the region and the anthropogenic activity. The clustering result is concordant with the dissimilarity test. This means that series that were found to be similar belong to the same cluster, while the dissimilar ones belong to different clusters.
The PCA shows that only five components out of 19 (analyzed) could be used to describe the water quality. The heavy metals have a significant influence on the first five PCs, so human activities impact the water quality. The samples included in the second cluster are the most polluted because they are extracted from places with heavy traffic and agricultural land use.
WQI was computed in two scenarios: taking into account all the elements and removing three of them. In both cases, the water quality was mainly excellent and good for the samples belonging to the first and third clusters. In the second scenario, the percentage of samples included in the category