Clustering Groundwater Level Time Series of the Exploited Almonte-Marismas Aquifer in Southwest Spain

: Groundwater resources are regularly the principal water supply in semiarid and arid climate areas. However, groundwater levels (GWL) in semiarid aquifers are suffering a general decrease because of anthropic exploitation of aquifers and the repercussions of climate change. Effective groundwater management strategies require a deep characterization of GWL fluctuations, in order to identify individual behaviors and triggering factors. In September 2019, the Guadalquivir River Basin Authority (CHG) declared that there was over-exploitation in three of the five groundwater bodies of the Almonte-Marismas aquifer, Southwest Spain. For that reason, it is critical to understand GWL dynamics in this aquifer before the new Spanish Water Resources Management Plans (2021 – 2027) are developed. The application of GWL series clustering in hydrogeology has grown over the past few years, as it is an extraordinary tool that promptly provides a GWL classification; each group can be related to different responses of a complex aquifer under any external change. In this work, GWL time series from 160 piezometers were analyzed for the period 1975 to 2016 and, after data pre-processing, 24 piezometers were selected for clustering with k-means (static) and time series (dynamic) clustering techniques. Six and seven groups (k) were chosen to apply k-means. Six characterized types of hydrodynamic behaviors were obtained with time series clustering (TSC). Number of clusters were related to diverse affections of water exploitation depending on soil uses and hydrogeological spatial distribution parameters. TSC enabled us to distinguish local areas with high hydrodynamic disturbance and to highlight a quantitative drop of GWL during the studied period.


Introduction
The increase of groundwater exploitation for intensive agricultural irrigation contributes to transformations in the natural hydrodynamics of a given area. Several studies have shown strong changes in natural groundwater flow in over-exploited areas [1][2][3]. Climate change drives increasing modifications in groundwater uses. In many cases, the decrease of groundwater levels (GWL) involves ecological degradation, especially in wetland areas [4,5]. Effective groundwater management strategies require a deep characterization of GWL fluctuations, in order to identify individual behaviors and triggering factors.
Doñana Natural Space, located in the southwest of Spain, is one of the largest protected wetlands in Europe [6]. The study of historical piezometric databases could be a particularly challenging work in the Doñana wetland area because of heterogeneity in the records and a high number of missing values. While data were formerly obtained manually, piezometers have recently evolved to be part of an automated system, continuously recording GWL fluctuations. This means that the size of the dataset has increased, making the management and interpretation of the GWL data an important issue.
Clustering has frequently been applied in hydrology, e.g., in hydrochemistry studies to distinguish hydrochemical groups of water [7]. Besides this, it has been used to classify GWL hydrographs [8], to study the changes in GWL produced after an earthquake in Japan [9] and to evaluate subsurface flow patterns [10]. Recently clustering was also used to study hydrochemical impacts on groundwater by altering the interaction between groundwater and surface water [11] and to evaluate the missing data of monitoring wells [12].
There are many clustering techniques but k-means is one of the most popular, unsupervised algorithms applied in the geosciences and, also, in hydrogeology [13]. The aim of k-means is to divide n observations into k groups. It is relatively simple to implement, scales to log database and guarantees convergence [14]. Nevertheless, the number of clusters, k, has to be pre-defined and the method is very sensitive to outliers. In addition, k-means does not consider the temporal correlation structure of the time series, which is an important limitation to detect changes in temporal patterns in different groups. To solve this limitation and handle dynamic data, time series clustering (TSC) techniques have been developed, modifying the similarity definition of the data or the prototype extraction function to an appropriate one [15].
In September 2019, the Guadalquivir Basin Authority (CHG in Spanish acronym) activated the administrative process to declare three of the five groundwater bodies that comprise the Almonte-Marismas aquifer to be at risk of not reaching a good quantitative state. For that reason, it is critical to understand GWL dynamics in this aquifer before the new Spanish Water Resources Management Plans (2021-2027) can be developed. Qualitative studies of the historical GWL data in the Almonte-Marismas aquifer exist [16][17][18] but they examined the data by visual inspection and none of them give a clear categorization of GWL behavior. In general, to analyze the characteristics and summarize the patterns of the complex fluctuation of GWL, it is important to rely on objective classifications [19].
The consequences of climate change, such as increasing evaporation or more extreme hydrological events (e.g., droughts), aggravate the anthropic pressure, posing a serious threat to the Almonte-Marismas aquifer's hydrogeological state [20]. The over-exploitation of the area is attributed to uncontrolled, unsustainable and, in many cases, illegal growth of intensively irrigated agriculture over recent decades [21]. Unregulated and illegal pumping must be taken into account in groundwater management evaluations [22]. When long GWL time series are available, one way to evaluate the agricultural expansion effect on the aquifer dynamics is a deep study of the GWL by using clustering tools.
In the present work, the aim is to apply clustering methodologies to improve the understanding of the Almonte-Marismas aquifer dynamics and obtain information that will be useful to develop sustainable groundwater resource management plans. For that purpose, monthly GWL from 1975 to 2016 were clustered using three different methods: visual observation (as a reference for a qualitative analysis), k-means [23] and TSC [15]. To the best of the authors' knowledge, this is the first time that TSC has been applied to a medium size GWL database with a 41 year time series.

Study Site
The Almonte-Marismas aquifer, in Southwest Spain, is located at the margins of the mouth of the Guadalquivir River (Figure 1a). This aquifer hosts the Doñana marshlands, which was flooded by groundwater and surface water and declared to be RAMSAR wetlands in 1982 [24]. At the same time, Doñana water supports the socio-economic activities developed on the area, such as agriculture, industry, mining and tourism. The topography of the region falls from approximately 150 m above sea level (masl) in the north, to less than 1 masl in the marshland area on the coast to the south. In geological terms, it is composed of a complex sedimentary system of sand and gravels with some clay lenses, basal sands, silts and aeolian sands [26,27] (Figures 1a and 1b). The average precipitation between 1975 and 2016 was approximately 550 mm/year and the average annual temperature was 25 °C. The maximum daily precipitation was registered at 168.2 mm in November 2001. From June to September the maximum temperature was 45 °C and the minimum temperature in winter months was 6 °C. The sub-humid Mediterranean climate with an Atlantic influence results in high seasonality [24]. The main recharge sources are rainfall, irrigation return flow and lateral inflow from the northeast. The average recharge is 200 hm 3 /year, primarily during spring and autumn [24]. The main groundwater flow runs from northwest to southeast, recharging mainly from the northern and southwestern areas and discharging into the Atlantic Ocean onto the marshland, rivers and streams that flow from the higher regions in the north (Figure 1c). There is no control over how much water is withdrawn for agriculture but there are estimated increases in groundwater abstraction for irrigation, from 11 hm 3 /year at the beginning of the 1970s to 80 hm 3 /year in the mid-1990s, continuing to increase up to more than 130 hm 3 /year, currently [28].
The Almonte-Marismas aquifer's complex sedimentary origins, ecological richness and groundwater over-exploitation are three challenging factors for groundwater resource management decisions [27,29]. The combined effect of agricultural activities, industry and tourism have added to climate change and led to a general decrease of GWL, changing the hydrodynamics in the area [22]. As a result, several groundwater-dependent temporal ponds have disappeared [30].

Data Collection and Pre-Treatment
Groundwater monitoring has historically been carried out in the Almonte-Marismas aquifer since 1903. Unfortunately, this GWL series does not retain continuity. Piezometers were first installed in 1970 [24], making it possible to have an entire database with 160 points (blue dots in Figure 1c) with different degrees of temporal continuity. Data from these piezometers (with less than 80% missing values (MV)), have been collected by the Spanish Geological Survey (IGME) and the CHG and have been used in the present work (yellow dots in Figure 1c). The period of time chosen, from October 1975 to September 2016, follows a hydrological year and registers the most data recorded. Agricultural activity grew considerably in the mid-1990s. Consequently, in order to analyze the evolution of the aquifer dynamics due to anthropic actions, the study period was subdivided into two sub-periods of twenty and twenty-one hydrological years: (1) October 1975-September 1995 and (2) October 1995-September 2016.
The measurements, as an absolute elevation above or under sea level, have been homogenized to a monthly time step, taking the average of all data recorded within each month. In addition, outliers and duplicates were discarded.
To study GWL responses to precipitation, rainfall data were taken from the Almonte-Acebuche rain gauge (Figure 1c), which is maintained by the AEMET (Spanish State Meteorological Agency). Cumulative deviation from the mean monthly rainfall (CDR) was calculated to represent precipitation variations in both sub-periods. Figure 2 shows the whole approach that was followed in this study. To define a starting point (and before applying more advanced statistical techniques), visual classification of the piezometric time series was performed. This exercise is essential to developing a global understanding of the time series. MV in the GWL time series have been infilled using all the available information (160 piezometers) and the missForest algorithm [31]. MissForest is a non-parametric, iterative imputation method, based on the random forest algorithm [32]. The advantage of missForest is that it does not make any assumptions about the distribution of data or imputation models [33]. It uses an iterative imputation scheme by training a random forest on the observed values of each variable in a first step. Then, it predicts the missing values and proceeds iteratively until the stopping criterion is met or the user-specified maximum number of iterations is reached. The algorithm continuously updates the imputed matrix variables, assessing its performance between iterations. This assessment is done by considering the difference between the previous and the new imputation results. As soon as this difference increases for the first time, the algorithm stops and keeps the last imputation matrix. The Normalized Root Mean Square Error (NRMSE; Equation (1)) has been obtained to evaluate the performance of the method: where is the complete data matrix, is the imputed data matrix, and are, respectively, the short notation for the empirical mean and variance computed over the continuous missing values only. Good performance leads to a value of NRMSE close to 0 and bad performance results in a value approaching 1.
To perform a consistent GWL clustering it was decided to only use piezometers with a low percentage of MV in both study sub-periods (i.e., October 1975-September 1995 and October 1995-September 2016). The average MV percentage for the entire study period was T = 51.5%. For the first and second sub-periods the average MV percentages were P1 = 64.1% and P2 = 39.5%, respectively. That is the reason why a more restrictive MV threshold was applied in order to select which piezometers were used in the cluster analysis in the first sub-period, compared to the second one. The following expressions (relating P1 and P2 to T (Equations (2) and (3))) were used to decide which MV thresholds define what piezometers from the 160 available data points remained for the cluster analysis: where r1 = P1/T and r2 = P2/T. Substituting P1, P2, r1 and r2 for their corresponding values in the equations yields an MV1 threshold of 48.4% for the first sub-period and an MV2 threshold of 69.8% for the second sub-period. Rounding up these figures, the piezometers that have less than 50% MV for the first sub-period and 70% MV for the second sub-period have been kept, resulting in 24 piezometers in total (yellow dots in Figure 1c). The general statistics for these 24 piezometers can be seen in Appendix A, Table A1.
To avoid the scaling effects of GWL values while clustering, the imputed monthly piezometric series were standardized. This step was carried out before visual and k-means clustering and it is already implemented in the algorithm used to perform TSC.

K-Means Clustering
K-means clustering is one of most common methods for exploratory data analysis. This technique can identify groups in the dataset that show similar patterns or equal trends. Specifically, k-means divides up the data into pre-defined, non-overlapping groups, where every data point belongs to only one group or cluster. First of all, the random partition method randomly assigns a cluster to each observation and then the algorithm proceeds to the update step, computing the initial mean to be the centroid of the cluster's randomly assigned points.
The standard algorithm of k-means [34] (Equation (4)), minimizes the sum of squares of the distance between data points within a cluster and the centroid of that cluster: In the present case study, is the data vector with monthly measurements corresponding to piezometer , is the set of piezometers that belongs to cluster and µ is the centroid, computed as the mean value of the data vector belonging to cluster . Each data vector ( ) is assigned to a cluster when its distance to the centroid, ( ), is at a minimum.
The appropriate number of clusters is given by the point where the sedimentation graph changes the trend, drawing an "elbow" [35]. The sedimentation graph plots a clustering score as a function of the number of clusters and the elbow method is a heuristic method for interpreting and validating consistency. The score is generally a measure of the input data on the k-means objective function (e.g., sum of ( ) across clusters). The k-means algorithm was run with the R package factoextra [36].

Time Series Clustering (TSC)
The TSC method considers time series distribution. As with the k-means method, it has a stochastic nature due to its random start. TSC was run with the R package dtwclust [15].
In order to classify all piezometers, partitional clustering [37] was selected to create partitions. The specification of the number of clusters (k) is tedious work that must be performed before starting the algorithm. The optimal k was identified by using internal clustering validity indices (CVIs). Cluster validity is a set of techniques that have been developed to find the best k without any previous class information. Based on similar works [38,39], three internal CVI indexes were used to choose k: Dunn (D), Davies-Bouldin (DB) and Calinski-Harabasz (CH). D is a ratio-type index and its cohesion is calculated by the distance to its nearest neighbor and the separation by the maximum cluster diameter [40]. DB is one of the most popular indices and it was based on the distance between centroids [41]. CH cohesion is estimated based on the distance from the points in a cluster to its centroid [42].
Depending on the distance of the disturbance and the aquifer characteristics, GWL responses can be registered with a lag of days or months from one location to another. Considering this singularity of GWL behavior, the most appropriate distance to measure GWL time series is Shape-Based Distance (SBD) [15]. SBD warps the time series and it is based on the cross-correlation with coefficient normalization ( ) and shape extraction as a prototype function (Equation (5)) [37]: where ‖ ‖ 2 denotes the standard L2 norm of the series and SBD( , ) values range between 0 (perfect similarity) and 2 (unequal time series).

Imputation Piezometry and Visual Classification
Monthly imputation of the 41-year time series for the 24 piezometers with the missForest algorithm resulted in an NRMSE of 0.05, indicating a good performance of the imputation scheme. This small error can mainly be explained by the lack of uneven piezometer variations at a monthly scale, which is present in other kinds of time series (e.g., temperature), and by the high number of piezometers available for imputation. Table 1 shows how many piezometers integrate every cluster for each sub-period. A layer corresponding to a rough spatial distribution of the groundwater extractions in the two different subperiods [25,43] is also included in Figure 3 (in grey). To ease the interpretation, the raw piezometric time evolution for each visual cluster is shown in Figure 4      It is difficult to compare our GWL classes (Figures 3 and 4) with the ones obtained from the 15 year series through visual inspection by UPC [16] and, in addition, those published in Manzano et al. [44]. These authors only showed GWL evolution at seven points within the same study area but without trying to find analogous piezometric evolution at different points.
Visual analysis found four different groups for both sub-periods. All visual cluster groups have a different spatial distribution when comparing the first and second sub-periods studied (Figures 3a  and 3b). The main difference between the two sub-periods is that the clustering spatial distribution is more irregular in the second one. This could be attributed to an increase in the number of pumping wells (grey zones in Figure 3), in addition to climate change effects that could reduce the recharge rate [45].
At the end of the first sub-period there is a generally descending trend of GWL and a maximum average local oscillation of 5 m (Figure 4). The series that comprise groups V1.2 and V1.3 have piezometers with GWL between 0 and 80 masl, while V1.1 and V1.4 have a lower range, between −10 and 30 masl. This reflects the fact that series with visually similar GWL oscillations are located in different areas of the aquifer (Figure 3a). All of the groups exhibit time variations that may be due to extractions, as well as the recharge-discharge pattern characteristics of the hydrological cycle (e.g., compared with the CDR line). Visually, it looks like V1.1 and V1.3 are highly affected by pumping due to their cyclical oscillation over a year, ranging from 2 to 6 m. Group V1.4 seems to be located in an area exploited in the first half of the sub-period, but not in the second period. Until 1990, cluster V1.2 fluctuations followed the same temporal evolution that Custodio [4] assigned to undisturbed GWL. Furthermore, the reduced influence of anthropic actions is corroborated by the fact that piezometry has the same trend as the CDR curve ( Figure 4). Among the four cluster groups, V1.4 presents the highest average level decrease of around 7 m (from 1975 to 1995). The main feature revealed by these results is that, between 1975 and 1995, anthropic impacts were already affecting the aquifer as GWL was not recuperating after the pumping cycle.
For the second sub-period, Figure 5 shows that the order of the groups from highest to lowest inter-annual oscillation is V2.2, V2.1, V2.3 and V2.4. All groups have a descending trend but at varying degrees. The V2.4 graph ( Figure 5) shows the same trend as the CDR and the smallest decrease (of up to 2 m) reflects low anthropic intervention. V2.1, V2.2 and V2.3 have GWL between −10 and 40 masl and GWL in V2.4 is between 20 to 90 masl because of the altitude at the different locations. As in the first study sub-period, the wide GWL ranges are a consequence of the classification criteria (GWL behavior) and not just because of location or average values. For example, V2.4 presents a greater GWL range because two piezometers (yellow dots in Figure 3b) are located at the highest topographical area and groundwater table. V2.3 is the group with the highest drawdown in this second sub-period, up to 6 m on average. Observed drawdowns are in accordance with the piezometric descent that was estimated for the whole of the Almonte-Marismas aquifer between 1994 and 2012 (Olías-Álvarez and Rodríguez-Rodríguez [46]); this ranged from 2 to 6 m. Cluster V.2.1 shows high fluctuations of up to 4 m, attributed to pumping effects. However, time variations are similar to the ones seen in CDR, which also shows the climate response. This feature could be explained by merging climate oscillations and pumping effects seen at the locations in this cluster V2.1 (Figure 3).
Visual cluster results allow us to have an overview of the piezometer responses along each subperiod. In addition, this approach is a qualitative baseline for evaluating and comparing the results of the mathematical methods.

K-Means Clustering
As explained in the Section 2, when applying k-means clustering, the number of clusters k has been defined with the aid of the sedimentation graph ( Figure 6). Both sedimentation curves for first (dark blue) and second (red) study sub-periods change their trend at the point which indicates the optimal number of clusters (i.e., the elbow). Both curves show smoothly decreasing trends and, given that the maximum number of clusters is 24, points around the second inflexion point in each period were tested and a final value of k chosen as a trade-off between complexity and representativeness. Consequently, k = 6 and k = 7 were taken to be the optimal number of clusters to run the k-means algorithm for the first and the second study sub-periods, respectively ( Figure 6). Initially, it can be interpreted that the k-means method distinguishes more GWL variation groups that were found by visual classification. However, two and three groups are formed by just one piezometer in each subperiod. These clusters with a single piezometer correspond to singular time series variations that kmeans cannot group. As a visual classification, k-means has more spatial continuity in the cluster groups in the first sub-period than in the second period (Figures 3c and 3d). For both sub-periods and each k-means cluster, Figures 7 and 8 plot the raw piezometric series (grey), and the average raw series (colored). In the first sub-period, K1.3 is the group with the highest GWL range (Figure 7) due to the sparse spatial distribution of the piezometers. For example, there is one point located in the piezometric dome (Figure 1c), where important recharge rates take place. K1.3 is almost the same group as V1.2, with just one piezometer more and the same trend as the CDR. These groups of piezometers do not show fluctuations affected by direct pumping, but a decreasing trend can be seen since 1990. Groups K1.2, K1.5 and K1.6 have GWL between −10 and 30 masl and they correspond to piezometers located in the pumping areas with high anthropic changes in the natural hydrodynamics. K1.2 and K1.5 ( Figure 7) have an average time series that can correspond to V1.3 ( Figure 4) and K1.6 ( Figure 7) corresponds to V1.1 and V1.4 ( Figure 4). These groups are affected by groundwater extractions and are differently classified when using visual or k-means clustering. K1.5 and K1.6 average series show clear cycles that are attributed to the effects of groundwater extraction, as well as an important, descending trend of up to 8 m, on average ( Figure 7). Spatially, they are located in an area of high agricultural development (Figure 3c). Finally, both groups K1.1 and K1.4 are composed of a single piezometer (Table 1), located in regions not affected by irrigation pumping during this first subperiod (Figure 3c).  Table 1). K2.1 is the same piezometer as K1.1. It presents a different GWL pattern than the other piezometers (with small oscillations from October 1984) and an important drawdown at the beginning of 2009 (Figures 7 and 8). These abrupt variations suggest that there could be errors in the data acquisition from this piezometer. K2.4 temporal behavior indicates that aquifer exploitation started at the beginning of 2007 near this point and this could be the reason why the GWL behavior does not fit with other groups. The opposite behavior was seen in K2.6, with smooth variations during the second half of the sub-period. K2.3 does not show a clearly descending trend, with piezometers in very different areas of the aquifer (Figure 3d) and appearing as a smoothed signal of the CDR. The average series of K2.2, K2.5 and K2.7 have a decreasing tendency (Figure 8) due to the growing groundwater extractions in the nearby areas (Figure 3d). In particular, the K2.2 group presents the most pronounced drawdowns of up to 8 m, and mainly comprises piezometers located in central areas of the aquifer. It appears to correspond to an area that began to be exploited later than zone K2.5 but, as piezometry suggests, with a greater intensity reflected by a faster descent since the year 2000. The average oscillation patterns are more regular for clusters K2.5 and K2.7; this can be explained by irrigation and geological heterogeneity.
When we compare k-means results with visual classification however, some k-means groups represent a merge of hydrogeological behavior due to exploitation for irrigation of different types of crops (e.g., berries, citrus) or other uses (e.g., drinking water supply), it could be seen in the different patterns of K2.2, K2.5 and K2.7 (Figure 8). On the other hand, groups K1.3 and K2.3 have strong similitudes with the CDR graph. Similar k-means results were found by Boomfield et al. [8] with a 30-year monthly GWL series. These were in a limestone and chalk aquifer located in the UK. However, they highlight the relationship between GWL and geological characteristics for 74 points, spatially distributed across 2000 km 2 . In the present study (and by comparing cluster spatial locations (Figure 3) with the geological distribution (Figure 1a)), it was observed that GWL are not always dependent on geology. However, cluster groups are in agreement with the findings in Boomfield et al. [8], regarding the identification of anthropogenically impacted groundwater hydrographs showing declining trends (e.g., clusters K1.6 or K2.2). Figure 9 shows the CVI values for a different number of clusters in both sub-periods. Following the criteria explained in Section 2, k was set to 7 in both sub-periods; D, DB and CH present an increase (marked with a red line in Figure 9). Only one of the clusters in the second sub-period had just one piezometer. For the first sub-period, the TSC map ( Figure 3e) reveals a spatial homogeneity for clusters T1.2, T1.4 and T1.5, which are located in the north and center of the aquifer. The average raw series of groups T1.3 and T1.4 ( Figure 10) do not seem to have been directly influenced by agricultural pumping, due to the shape and size of their oscillations and they follow the CDR. They correspond to clusters V1.2 and K1.3, with the same decreasing trend from 1990 (Figures 4, 7 and 10). The T1.1 average time series is very similar to V1.1 and T1.7 to V1.4, both are affected by pumping cycles (Figures 4 and 10). The rest of the groups (T1.2, T1.5 and T1.6) discriminate between different patterns that were not detected by k-means or visual classification. All of them merge annual hydrological cycles and irrigation extractions.

Time Series Clustering
In the second sub-period, the spatial distribution of TSC groups is not homogeneous (Figure 3f). As in the former cluster methods, and regarding cluster series graphs (Figure 11), the general trend in the second sub-period is a GWL decrease. More specifically, T2.2, T2.4, and T2.7 have an outstanding average decrease between 2 and 10 m. Only T2.1 is composed of one piezometer ( Figure  11, Table 1). T2.3 and T2.5 have similar average series and are not spatially adjacent to the irrigation areas (Figure 3f) but are closely related to the CDR. T2.2, T2.4, T2.6 and T2.7 reveal the annual oscillations associated with extractions, with average amplitudes of approximately 4 m (Figure 11). It is difficult to find correlations between these four TSC groups in the second sub-period and the visual clusters. This can be explained by the TSC's ability to integrate different temporal hydrodynamic variations such as increasing pumping, evolution in the types of crops [47] and seasonality of GWL oscillations [48].  Comparing k-means results with TSC, the clusters reveal some similarities. For example, for the first sub-period, the location and mean graphs of pairs K1.2-T1.2 and K1.5-T1.5 are equal but K1.6-T1.7 only share the mean series shape (Figures 3, 7 and 10). That is the case in the second sub-period for pairs K2.2-T2.2 and K2.3-T2.3 (Figures 8 and 11). For the second sub-period, the spatial distribution of the results is quite different when both methods are compared. K-means is based on the Euclidean distance between point coordinates in the feature space, thus distance is evaluated in terms of the difference between the GWL of the piezometers at the same time coordinate. Basically, k-means works by clustering together piezometers showing the least distant GWL in the complete time period vector but it does not take into account how the GWL change along the different time series. On the other hand, TSC uses the shape-based distance (SBD) that accounts for the crosscorrelation between GWL series. This allows for the detecting of lagged responses in different piezometers which is, in turn, more closely related to the hydrodynamic characteristics [49]. TSC warps the series in order to fit the oscillation curves. This is confirmed by the results, for example K1.4 is composed of only one piezometer (104140047, Appendix A, Table A1). This point is far away from the other 23 points and is located in the northern part of the aquifer with a higher topographic elevation (Figure 1c). On the other hand, the same piezometer is included with another three piezometers in the T1.4 group. The average mean of T1.4 GWL series is similar to K1.4 but smoothed out. This means that there are four points with similar patterns but k-means could not detect it (Figures 7 and 10). Furthermore, unlike most k-means clusters, that show relatively constant GWL trends (Figures 7 and 8), several TSC clusters (T1.1, T1.6, T1.7, T2.2, T2.4 and T2.7), present clearly decreasing patterns (Figures 10 and 11).
In the second sub-period, there are at least two groups with similar GWL behavior to the CDR time series. As aquifer storage is depleted because of exploitation, the rainfall dependency of the GWL response increases [50]. For the second sub-period, the mean graphs of the cluster groups ( Figure 11) are more similar to CDR than the first sub-period results ( Figure 10) TSC was also successfully applied to a 0.2 km 2 pre-alpine Swiss shallow aquifer to investigate subsurface hydrological connectivity [10]. The authors found six groundwater response clusters over a 19 month study period. As is shown in the present work, they found that relative groundwater levels in the wells in each cluster were significantly different, in terms of amplitude and the rate of recession. Otherwise, GWL ranges in the Almonte-Marismas aquifer are quite dissimilar between piezometers belonging to the same cluster.
Rinderer et al. [10] studied a small, shallow aquifer, which is difficult to compare with the high GWL ranges of the bigger and heterogeneous Almonte-Marismas aquifer. As these authors state, grouping the groundwater dynamics into a fewer number of clusters is a gross simplification as there are obviously more types of groundwater responses. However, this clustering is a necessary step in synthetizing site-specific groundwater dynamics to general groundwater behavior. For example, TSC clusters could be used to define the over-exploitation zones in the Almonte-Marismas aquifer. Another application could be the selection of a number of key piezometers to focus the calibration process of the existing Almonte-Marismas groundwater mathematical model [25]. This will reduce the dimension of the optimization problem and, consequently, the computation time.
The most important result from TSC is the six characterized types of hydrodynamic behavior (T2.1 is a singularity) that are now present in the aquifer ( Figure 11). Visually, the mean piezometric series of T2.2 is similar to that of T2.7 and the same happens with T2.4 and T2.6. The area around T2.4 is being excavated for the gravels that are under the marshlands, while gravels that are under the aeolian sands are being extracted from the area around T2.6. The different hydrogeological parameters in both cluster locations explain the differentiation in these GWL time series that cannot be seen with a qualitative comparison. In addition, the descending trends seen in T2.2 and T2.7 (from the beginning of the 2000s) are the result of uncontrolled increases in the berry fields in the northwestern part of the aquifer. This development has been denunciated by WWF on several occasions [51]. On the other hand, clusters T2.3 and T2.5 are less anthropically influenced but are more affected by recharge fluctuations.
On a global scale, GWL time series clustering can contribute to the identification and separation of spatiotemporal processes associated with both climate and human disturbances. This separation is often tackled through complex transfer function-noise models [52,53]. Similar aquifer exploitation regimes in different countries or regions could then be clustered together and provide the basis to build time series classification models based on labeled time series resulting from the clustering analysis. In turn, this would support the development of global actions for sustainable groundwater management.

Conclusions
The main goals of this work were to study the historical piezometry of the Almonte-Marismas Aquifer, to improve the knowledge about hydrodynamics and to prove that anthropic activities have changed GWL trends in the system. It was possible to reach these aims by analyzing the historic piezometric database (for a period of over 41 hydrological years), applying two robust clustering methods, comparing the results and highlighting the advantages and disadvantages of both. Temporal changes in the variables that affect GWL (recharge, water exploitation, climate change, surface/groundwater interactions) could help to explain the visual, k-means and TSC results. The study period was divided in two in order to search for important piezometric changes due to the growing demand of irrigation fields. The differences observed between the GWL clusters for the first and the second sub-periods reflect the influence of the socioeconomic and anthropic development on the natural regime. The relationship between CDR and GWL decreases with anthropic exploitation, making aquifer storage more vulnerable to the effects of climate change but the response of GWL to rainfall events becomes quicker when the aquifer is over-exploited. The natural system exhibits complex hydrological behavior, highlighting the need to adapt water policy and management to every single case.
Visual clustering offers a first approach to characterizing the behavior of the hydrographs, considering expert judgement and providing a basis for the validation of mathematical methods. However, in the second study sub-period, it was observed that the combination of extractions (in terms of quantity and time of year) produces a merging of responses in which there is no clear spatial correspondence. More specifically, k-means methodology allows the distinguishing of singular GWL oscillations, assigning piezometers to classes with only one GWL point. This permits the identification of errors in data measurements or temporal changes in the external factors that does not occur in other areas of the aquifer. In areas not affected by groundwater extractions, k-means classification gives similar results to visual classification.
Our findings confirm that TSC offers better results for studying time series distribution because the members of the same group reproduce the same pattern. K-means considers time (average monthly GWL as attributes for each piezometer) but not the correlation structure of the time series. This is an important limitation to detecting changes in temporal patterns in different groups. TSC considers the time series distribution and the results show consistent distribution of GWL responses. Further studies will facilitate the introduction of cluster results, in order to optimize and reduce the amount of input data for mathematical groundwater flow models, contributing to the speeding up and simplifying of the modelling process. Also, the obtained results could offer a way to zone the effects of uncontrolled exploitation wells by estimating the extent of the effects of water extractions. The new characterization of the Doñana piezometry, based on a long GWL series and TSC objective criteria, proves the important impact that uncontrolled extraction has had over a period of decades.

Acknowledgments:
We are grateful for the support given by Fernando Ruíz Bermudo and Antonio Sánchez de la Nieta from the Spanish Geological Survey (IGME) for the acquisition and maintenance of the piezometer database; the Guadalquivir River Basin Authority (CHG) for the piezometric data supplied; and the Spanish State Meteorological Agency, the Biological Station of Doñana and Junta de Andalucía for the meteorological data provided.

Conflicts of Interest:
The authors declare no conflict of interest.