Carbamazepine Levels Related to the Demographic Indicators in Groundwater of Densely Populated Area

: Consumption of pharmaceuticals by people is growing. Carbamazepine (CBZ) is an extensively used anti-epileptic drug that is recalcitrant to degradation. As a result, CBZ has been widely detected in the aquatic ecosystem due to its daily consumption and drainage in sewage systems. Leakages from sewage networks and septic tanks may represent one of the main sources of CBZ in groundwater. In this study, CBZ concentrations in groundwater and their correlations with the demographic structure of the population were investigated in the densely populated Milan urban area. Seventy-six demographic variables were retrieved from the Italian Population and Housing census. Twenty-one groundwater samples were collected from unconﬁned and semi-conﬁned aquifers of the Milan area and the concentration of CBZ was measured. Groundwater CBZ levels in both aquifers were associated with the demographic data within a circular buffer with a radius of 1.5 km. All data were analyzed using a multivariate statistical approach. The results showed a signiﬁcant association ( p < 0.05) between CBZ concentrations and speciﬁc demographic segments of the population. Higher CBZ concentrations were found to be associated with the population aged 70 years and over (aging index), and with families having children aged under 5 years (family index). In addition, the divorce index was correlated with the high concentration of CBZ, whereas the educated and sexagenarian population showed a negative correlation. Our results indicated that the contamination of CBZ follows the same pattern in unconﬁned and semi-conﬁned aquifers, which are used for drinking water purposes in Milan area. Therefore, changing the CBZ consumption pattern or replacing CBZ with other drugs may strongly inﬂuence groundwater contamination of the investigated area.


Introduction
Pollution from pharmaceuticals observed in the aquatic environment, and its potential risk for humans and the wider environment, have recently received significant attention among the global scientific community [1][2][3]. After pharmaceuticals are injected into the human body, they are excreted through the metabolism via urination [4,5], and then enter sewage networks or septic tanks. In many cases, wastewater treatment plans are not capable of effectively removing pharmaceuticals, and small concentrations of these compounds (ng/L) have been detected in the effluent of sewage treatment plants [6][7][8] or septic systems [9,10].
Pharmaceuticals mainly enter the groundwater through human activities such as sewage leakages [11,12]. Recently, several medical products from various classes have been found in groundwater [13][14][15][16][17][18][19]. Generally, pharmaceuticals in groundwater are regarded as emerging groundwater contaminants, and their long-term potential risk has been studied Water 2021, 13, 2539 2 of 18 less often than in surface water [16,20]. Exchanges between aquifers, rivers, and sewage networks can cause the contamination of shallow and deep groundwater [16].
Most studies of emerging groundwater contamination have focused on locations near the effluent discharge points of wastewater treatment plants (WWTPs) [21][22][23] or close to septic systems' effluent [7,[24][25][26][27][28][29][30]. Moreover, the potential for pharmaceuticals to reach and contaminate the groundwater depends also on geology and hydrology [31], the location of the water table [32], population density [23], demographic characteristic [33] and drug consumption patterns [34]. However, few studies have investigated the factors that affect drugs entering the sewage stream and reaching the groundwater resource [35]. Vatovec et al. [33] investigated the effect of demographic changes and people's drug disposal practices on the volume of different pharmaceuticals entering the aquatic environment via wastewater discharges. Their results indicated that older people use larger volumes and different types of pharmaceuticals than the younger population, and this is reflected in wastewater concentrations. Moreover, scholars have noted that excretion of drugs from the bodies of consumers impacts the observed concentrations of pharmaceuticals in wastewater samples significantly more than the direct disposal of unused medications by consumers [33,36,37]. These studies show the importance of demographic information to assess the impact of the observed concentrations of pharmaceuticals in water samples.
Furthermore, several studies concerning the removal of pharmaceuticals from water have shown that both conventional drinking water and wastewater treatments have limitations in effectively removing pharmaceuticals, and thus the development of advanced treatment steps with high implementation costs are essential [38,39]. However, the results of some studies proved that advanced water treatments are not always a robust treatment for the removal of some pharmaceuticals, such as carbamazepine, from wastewater or drinking water, in which residual quantities may remain [1,40]. Therefore, understanding how the population within an urban wastewater agglomeration uses pharmaceuticals, which enter the wastewater or drinking water treatment plants, may provide important insights to enhance the control of the volume of these compounds [41].
In addition, hospital wastewaters contain hazardous pharmaceuticals and can be another source of pharmaceutical pollution in surface water and groundwater. Hence, hospital wastewater should be treated before being released to sewage systems and the aquatic environment. In Italy, analysis shows that five regions (Lombardia, Lazio, Emilia, Romagna, Veneto and Piemonte) produce more than 50 percent of the total medical waste [42]. However, some studies indicated that, although all hospital wastes are treated before their release to the environment, they are not completely degraded in the wastewater treatment plans and some specific pollutants (ng/L) are therefore discharged into the aquatic environment [43].
Traces of pharmaceuticals in drinking water have been reported in a few studies [44,45], and the influence of these compounds on the environment is poorly understood. Most of the emerging contaminants, including pharmaceuticals, are not currently included in the legislation for drinking water, such as the Guideline for Drinking-water Quality (WHO, 2011) and the Drinking water Directive 98/83/EC (EC, 1998). This is not because of their lesser importance, but mostly because there is not enough information on their toxicity and the impact factors that increase the concentration of these contaminants in drinking water sources [44]. As mentioned, the presence of pharmaceuticals in groundwater is harmful for the aquatic environment, but these contaminants may be an even greater problem where groundwater is used for drinking water purposes, such as in Milan city.
Studies have assessed the occurrence and fate of pharmaceutical contaminants in the groundwater of Milan, and the associated environmental risks [8,44,46,47]. These studies revealed the contamination of the upper layers of the groundwater and of the drinking water network of Milan city. Riva et al. [44] reported that carbamazepine was the most detected contaminant in Milan's drinking water network due to the fact that the water supply is drawn from groundwater, which may be polluted by leakages from the sewage network. Carbamazepine (CBZ) is an established drug for the control of grand mal and psychomotor epilepsy, and treatment of trigeminal neuralgia and bipolar depression [22]. Until the early 1990s, anti-epileptic drugs (e.g., CBZ) have been increasingly prescribed for indications other than epileptic disorders [48,49]. For instance, in Milan and, more generally, Italy, CBZ is ranked as the second-most prescribed drug due to its wide use in the treatment of conditions other than epilepsy, including migraines, neuropathic pain, bipolar disorders, and anxiety. Therefore, there is high demand for this drug in Italy [49,50].
In terms of environmental contamination, CBZ is only degraded very little, if at all, in wastewater treatment facilities, so remains in the environment for a long time [51]. In many assessments, CBZ was used as a stable indicator of untreated or treated sewage [22,[52][53][54]. Heberer [5] believes that under recharge conditions, CBZ can leach through the subsoil into groundwater aquifers. Moreover, Nakada et al. [23] found that there is a positive correlation between the CBZ concentration in groundwater and population density in Japan. In Italy, numerous assessments have reported the contamination of groundwater because of the sewage network leakage and septic tanks [44,[55][56][57].
In consideration of the fact that human discharge is the main source of CBZ in sewage networks, and sewage network leakages and domestic septic tanks may be important sources of CBZ pollution in the groundwater of the Milan area, in this study we investigated the demographic structure of drug consumers in the Milan area and its correlation with the groundwater contamination using a multivariate statistical technique. Although studies have examined the drinking water and groundwater contamination in Milan, as previously explained, no prior study has investigated the relationship between population density and demographic characteristics, and pharmaceutical contamination in groundwater. In this study, twenty-one groundwater samples were collected from the unconfined aquifer (A) and semi-confined aquifer (B) of the Milan urban area in 2011. Concurrently, 76 demographic and social structure variables were retrieved from the Population and Housing census database of the Italian National Institute of Statistics in 2011 [58], with the aim to identify population patterns statistically associated with specific CBZ levels in groundwater. The results of this research can provide insight into the source of CBZ contamination and the potential impacts on groundwater aquifers that serve as drinking water sources.

Study Area
The city of Milan (Lombardy Northern Italy) is located in the northern margin of the alluvial plain of the Po River ( Figure 1a). The Milan urban area is extremely dense in terms of urban, industrial, and agricultural development. This area covers 181.7 km 2 with a population density of 7693 inhabitants/km 2 , which is among the highest in Italy. The total pipeline length of the sewage system in the Milan area is 1446 km, with a flow rate of 280 million m 3 year −1 [59]. However, in rural areas, not all households are connected to sewage systems, and septic tanks are used; even when rural households are connected to sewage systems, they may have not drained their old septic tanks [57].

Hydrogeological Study
According to the hydrogeological structure of the study area, the thickness of the water-bearing alluvial deposits is more than 250 m [60]. The Milan metropolitan area aquifer system is composed of three main aquifers ( Figure 1a): unconfined (A) aquifer, 45-50 m thick; the underlying semi-confined (B) aquifer, 50 to 150 m thick; and the deep confined aquifer (C) [60]. The unconfined aquifer A consists of a coarse lithology, mainly comprising gravel with a sandy matrix, and is underlined by a clay-silt layer of variable thickness (Figure 1b), which shows a good continuity in the southern portion of the study area, and disappears moving northward [61]. This layer separates aquifer A from the semi-confined aquifer group B (50-150 m thickness), which is mainly formed by sand and sandy gravels (Figure 1b) [61][62][63]. Although the unconfined aquifer A is monitored periodically for contamination, it is not used for drinking water purposes. By contrast, deeper layers in aquifer B are used as sources for the drinking water distributed in Milan city [44]. In this study, the groundwater samples were collected from aquifers A and B.
comprising gravel with a sandy matrix, and is underlined by a clay-silt layer of variable thickness (Figure 1b), which shows a good continuity in the southern portion of the study area, and disappears moving northward [61]. This layer separates aquifer A from the semi-confined aquifer group B (50-150 m thickness), which is mainly formed by sand and sandy gravels (Figure 1b) [61][62][63]. Although the unconfined aquifer A is monitored periodically for contamination, it is not used for drinking water purposes. By contrast, deeper layers in aquifer B are used as sources for the drinking water distributed in Milan city [44]. In this study, the groundwater samples were collected from aquifers A and B.

Groundwater Samples
Groundwater samples were collected in Milan from different aquifers. Thirty-two samples were collected in November 2010 from the first layer of the unconfined aquifer A (1-19 m deep), and twenty-one samples were collected in June 2011 from depths of between 71 and 170 m within the semi-confined aquifer B, which is used as the main drinking water source by the city ( Table 1). The samples were collected from the wells before the potabilization treatment, and were collected in glass or PP 1000 mL bottles. All samples were stored in the dark at −20 °C until analysis. Figure 2 shows the location of the sampling stations.

Groundwater Samples
Groundwater samples were collected in Milan from different aquifers. Thirty-two samples were collected in November 2010 from the first layer of the unconfined aquifer A (1-19 m deep), and twenty-one samples were collected in June 2011 from depths of between 71 and 170 m within the semi-confined aquifer B, which is used as the main drinking water source by the city ( Table 1). The samples were collected from the wells before the potabilization treatment, and were collected in glass or PP 1000 mL bottles. All samples were stored in the dark at −20 • C until analysis. Figure 2 shows the location of the sampling stations.

Analytical Method
Carbamazepine (CBZ) was measured using solid phase extraction (SPE) and liquid chromatography tandem mass spectrometry analysis (LC-MS/MS) according to analytical methods developed in previous studies of Castiglioni et al. [8,46]. Briefly, 500 mL of sample was filtered using glass microfiber filters GF/A 1.6 µm (Whatman, Kent, UK) and mixed cellulose ester microfiber filters ME25 0.45 µm (Whatman, Kent, UK), and solid-phase extracted by Oasis MCX 60 mg cartridges (Waters Corp., Milford, MA, USA). Elution was undertaken with 2 mL of methanol and 2 mL of 2% ammonia solution in methanol. Eluates were dried under a gentle nitrogen stream and redissolved in 200 uL of MilliQ water.
Mass spectrometric analysis was undertaken using an API5500 triple quadrupole equipped with a turbo ion spray source (AB-Sciex, Thornhill, ON, Canada) in positive ionization modes, and quantification was performed by isotopic dilution using the labelled internal standard carbamazepine d10. Recovery was checked by triplicate analyses of spiked samples, and was confirmed to be over 90% with a variability (relative standard deviation) lower than 10%, as reported in previous publications [8,46]. The limit of quantification was calculated directly from chromatograms as the concentration giving peaks for which the signal-to-noise ratio was 10, and for carbamazepine was 0.2 ng/L. Procedural and instrumental blanks were used within each analytical batch to check for potential contamination. More details can be found in Section S1.1 of Supplementary Materials.

Analytical Method
Carbamazepine (CBZ) was measured using solid phase extraction (SPE) and liquid chromatography tandem mass spectrometry analysis (LC-MS/MS) according to analytical methods developed in previous studies of Castiglioni et al. [8,46]. Briefly, 500 mL of sample was filtered using glass microfiber filters GF/A 1.6 µ m (Whatman, Kent, UK) and mixed cellulose ester microfiber filters ME25 0.45 µ m (Whatman, Kent, UK), and solidphase extracted by Oasis MCX 60 mg cartridges (Waters Corp., Milford, MA, USA). Elution was undertaken with 2 mL of methanol and 2 mL of 2% ammonia solution in methanol. Eluates were dried under a gentle nitrogen stream and redissolved in 200 uL of MilliQ water.
Mass spectrometric analysis was undertaken using an API5500 triple quadrupole equipped with a turbo ion spray source (AB-Sciex, Thornhill, ON, Canada) in positive ionization modes, and quantification was performed by isotopic dilution using the labelled internal standard carbamazepine d10. Recovery was checked by triplicate analyses of spiked samples, and was confirmed to be over 90% with a variability (relative standard deviation) lower than 10%, as reported in previous publications [8,46]. The limit of quantification was calculated directly from chromatograms as the concentration giving peaks for which the signal-to-noise ratio was 10, and for carbamazepine was 0.2 ng/L. Procedural and instrumental blanks were used within each analytical batch to check for potential contamination. More details can be found in Section S1.1 of Supplementary Materials.

Demographic Variables
The population census data (2011) of the Milan municipality area were retrieved fro the archive of the Italian National Institute of Statistics [58]. Seventy-six demographic v iables, such as the total resident population, gender (male/female), age groups, mari status, and education, were included this dataset and are shown in Section S1.2 of Su plementary Materials. The map in Figure 3a shows the distribution of the sub-census un for which the census data were available. Pre-processing of demographic and groundw ter datasets was performed through QGIS tools (version 3.12). To correlate the dem graphic and social structure data with the concentration of CBZ in groundwater, a circu buffer with a radius of 1.5 km was created around each groundwater station at a set d tance. Within the circular buffer, the information about the population was associat with the groundwater CBZ level in both aquifers (Figure 3b; Table 1). It is relevant to no that the circular buffer was first applied around the groundwater monitoring stations aquifer B, and the corresponding groundwater stations of aquifer A were then select ( Figure 3b). Therefore, only 21 of the 32 groundwater samples of aquifer A, located in t circular buffer, were selected and analyzed (Figure 3b). The new dataset was created cluding the demographic information of each buffer and its CBZ level in both aquife More information about the population density in each circular buffer is presented in S tion S2.1 of Supplementary Materials.

Demographic Variables
The population census data (2011) of the Milan municipality area were retrieved from the archive of the Italian National Institute of Statistics [58]. Seventy-six demographic variables, such as the total resident population, gender (male/female), age groups, marital status, and education, were included this dataset and are shown in Section S1.2 of Supplementary Materials. The map in Figure 3a shows the distribution of the subcensus units for which the census data were available. Pre-processing of demographic and groundwater datasets was performed through QGIS tools (version 3.12). To correlate the demographic and social structure data with the concentration of CBZ in groundwater, a circular buffer with a radius of 1.5 km was created around each groundwater station at a set distance. Within the circular buffer, the information about the population was associated with the groundwater CBZ level in both aquifers ( Figure 3b and Table 1). It is relevant to note that the circular buffer was first applied around the groundwater monitoring stations of aquifer B, and the corresponding groundwater stations of aquifer A were then selected ( Figure 3b). Therefore, only 21 of the 32 groundwater samples of aquifer A, located in the circular buffer, were selected and analyzed (Figure 3b). The new dataset was created including the demographic information of each buffer and its CBZ level in both aquifers. More information about the population density in each circular buffer is presented in Section S2.

Statistical Analysis
Factor analysis (FA) and cluster analysis (CL) are statistical methods that indicate respective associations between variables and samples [65,66]. All the statistical computations were performed using SPSS Statistics Software version 26.0. FA was performed based on the correlation matrix between demographic variables [67]. Factor analysis was obtained through a preliminary principal component analysis (PCA), which was used to reduce the contribution of the less significant parameters within each varifactor. FA was used to extract the eigenvalues from the correlation matrix of the original variances. The number of factors to be retained was chosen based on the "eigenvalue greater than 1" criterion, meaning that all the factors that explained less than the variance of one of the

Statistical Analysis
Factor analysis (FA) and cluster analysis (CL) are statistical methods that indicate respective associations between variables and samples [65,66]. All the statistical computations were performed using SPSS Statistics Software version 26.0. FA was performed based on the correlation matrix between demographic variables [67]. Factor analysis was obtained through a preliminary principal component analysis (PCA), which was used to reduce the contribution of the less significant parameters within each varifactor. FA was used to extract the eigenvalues from the correlation matrix of the original variances. The number of factors to be retained was chosen based on the "eigenvalue greater than 1" criterion, meaning that all the factors that explained less than the variance of one of the original variables were discarded. A varimax rotation was applied to the extracted principal components. Following the PCA/FA, cluster analysis was applied to the extracted varifactors. Applying a cluster analysis based on the FA-extracted varifactors allowed us to identify geographical clusters that showed interesting common demographic profiles.
Cluster analysis is an unsupervised pattern recognition technique that classifies a set of objects into categories or clusters based on their nearness or similarity [68]. We applied a hierarchical cluster analysis (HCA), in which the Euclidean distance between samples was used as a measure of similarity and Ward's agglomerative method based on squared Euclidean distances was used as a grouping mechanism [69]. In our assessment, the HCA was run over the extracted varifactors derived from factor analysis to obtain demographic clusters, and was also applied to CBZ data in both aquifer A and B to group the concentration values into CBZ clusters (Figure 4) varifactors. Applying a cluster analysis based on the FA-extracted varifactors allowed us to identify geographical clusters that showed interesting common demographic profiles.
Cluster analysis is an unsupervised pattern recognition technique that classifies a set of objects into categories or clusters based on their nearness or similarity [68]. We applied a hierarchical cluster analysis (HCA), in which the Euclidean distance between samples was used as a measure of similarity and Ward's agglomerative method based on squared Euclidean distances was used as a grouping mechanism [69]. In our assessment, the HCA was run over the extracted varifactors derived from factor analysis to obtain demographic clusters, and was also applied to CBZ data in both aquifer A and B to group the concentration values into CBZ clusters (Figure 4) The chi-square test was also applied to assess the association of CBZ clusters in both aquifers with demographic clusters. The result was significant if the p-value was equal to or less than the alpha level (0.05). Moreover, Pearson correlation analysis was used to assess the correlation between the demographic clusters and the concentration of CBZ. Finally, the paired sample t-test was used to test if there was a significant difference between the averages of the CBZ concentrations in aquifers A and B. Figure 4 shows a simplified flow diagram for the methodology of this research.  The chi-square test was also applied to assess the association of CBZ clusters in both aquifers with demographic clusters. The result was significant if the p-value was equal to or less than the alpha level (0.05). Moreover, Pearson correlation analysis was used to assess the correlation between the demographic clusters and the concentration of CBZ. Finally, the paired sample t-test was used to test if there was a significant difference between the averages of the CBZ concentrations in aquifers A and B. Figure 4 shows a simplified flow diagram for the methodology of this research.

Factor Analysis Applied to the Demographic Variables
Factor analysis was applied to the 76 demographic variables (P1 to P76) in order to reduce the dimensionality. The description of 76 demographic variables is presented in Section S1.2 of Supplementary Materials. Eight varifactors were considered in factor analysis accounting for over 88% of the total variance ( Table 2). The factor loadings matrix of the PCA/FA analysis is provided in Supplementary Materials Section S2.2. The factor loadings of the original variables with the extracted varifactors (presented in Table S3 of the Supplementary Materials) indicate loadings were found that were higher than 0.5 and lower than −0.5. These loadings allowed identification of the most meaningful parameters within each varifactor. The first varifactor (F1), explaining 29.1% of the total variance (Table 2), showed a positive loading with the general information of the population, such as gender, age, marital status, and children aged less than 5 (Table S3). Because this factor represented the presence of families within the population, it was named the "family index". The second varifactor accounted for about 19.45% of the total variance and was loaded by the age class of people older than 70 years, and was named the "ageing index". The third extracted varifactor (F3) explained about 15% of the total variance and accounted for the variables that were mostly related to the population age classes between 10-24 and 50-59 years old. Thus, it was interpreted as the "adolescents and quinquagenarian index". Factor four (F4) accounted for 7% of the total variance and was interpreted as the "divorce index" because it was correlated with the number of people who were divorced. Factor five (F5) accounted for the "illiteracy index", explaining 6.47% of the total variance. The sixth varifactor (F6) was loaded by the population with an age between 60 and 69 and was named the "sexagenarian index", explaining 5% of the total variance. Factor seven (F7) accounted for children of 5-9 years, so was considered the "child index", explaining 3.60% of the total variance. Finally, factor eight (F8) accounted for about 2.6% of the total variance and correlated with the number of people with a university degree, and was thus interpreted as the "education index". These eight demographic indicators were used to describe the socio-demographic context in the Milan area, which is relatively heterogeneous in terms of age, education, employment, etc.

Hierarchical Cluster Analysis (HCA) Applied to Eight Factor Loadings
HCA was applied to the eight varifactors (F1-F8) previously extracted through factor analysis for analysis of the similarities in terms of the demographic indictors at the different monitoring stations. HCA outlined six demographic clusters (CLs). In Figure 5, bar charts show the cluster characteristics. Bars show the average of the eight demographic indicators across the six clusters. Based on Figure 5, the six demographic clusters (CLs) have the following characteristics: indicators below the average, with the exceptions of the child index and education index, which are slightly above the average; cluster 4 (CL4) has the majority of varifactors above the average of the overall sample, particularly illiteracy, family, divorce and child indexes; and cluster 5 (CL5) is dominated by the aging index. However, the illiterate index and the family index are slightly above the average of the overall sample. Cluster 6 (CL6) is dominated by the education index, whereas all other indicators, with the exception of the illiterate index, are above the average of the overall sample.

Statistical Analysis of CBZ Levels in Aquifer B
To reduce the effect of heterogeneity, hierarchical clustering was applied to the 21 measuring points of CBZ in the second aquifer (B). Cluster analysis helped to group the groundwater samples according to their similarities in terms of CBZ levels, as illustrated in the dendrogram ( Figure 6). The dendrogram illustrates the distance between cases or groups that are clustered in a particular step. Dendrograms can generally provide information about the appropriate number of clusters to retain. The horizontal axis of the dendrogram represents the distance or dissimilarity between clusters, and the vertical axis represents the groundwater (GW) monitoring stations and clusters. The dendrogram suggests grouping the 21 groundwater monitoring stations of aquifer B into five clusters (CLBs). Cluster 1 (CLB1) includes the stations of B1, B14, and B15, with CBZ concentrations lower than 4 ng/L; cluster 2 (CLB2) includes the stations B17, B19, B8, B5, and B20, with CBZ values in the range of 7-11 ng/L. Cluster 3 (CLB3) consists of stations B4, B6, B3, and B16, with CBZ concentrations around 16 ng/L. Cluster 4 (CLB4) contains the stations B2, B9, B18, B21, B10, and B11, with CBZ values between 12 and 15 ng/L, and cluster 5 (CLB5) includes the stations B12, B7, and B13, with the highest levels of CBZ above 22 Cluster 1 (CL1) shows some of the characteristics above the average of the overall sample (e.g., family, aging, and adolescent indexes), whereas the other four demographic varifactors are all below the average; cluster 2 (CL2) is characterized by all the varifactors and indicators below the average of the overall sample; cluster 3 (CL3) presents all the indicators below the average, with the exceptions of the child index and education index, which are slightly above the average; cluster 4 (CL4) has the majority of varifactors above the average of the overall sample, particularly illiteracy, family, divorce and child indexes; and cluster 5 (CL5) is dominated by the aging index. However, the illiterate index and the family index are slightly above the average of the overall sample. Cluster 6 (CL6) is dominated by the education index, whereas all other indicators, with the exception of the illiterate index, are above the average of the overall sample.

Statistical Analysis of CBZ Levels in Aquifer B
To reduce the effect of heterogeneity, hierarchical clustering was applied to the 21 measuring points of CBZ in the second aquifer (B). Cluster analysis helped to group the groundwater samples according to their similarities in terms of CBZ levels, as illustrated in the dendrogram ( Figure 6). The dendrogram illustrates the distance between cases or groups that are clustered in a particular step. Dendrograms can generally provide information about the appropriate number of clusters to retain. The horizontal axis of the dendrogram represents the distance or dissimilarity between clusters, and the vertical axis represents the groundwater (GW) monitoring stations and clusters. The dendrogram suggests grouping the 21 groundwater monitoring stations of aquifer B into five clusters (CLBs). Cluster 1 (CLB1) includes the stations of B1, B14, and B15, with CBZ concentrations lower than 4 ng/L; cluster 2 (CLB2) includes the stations B17, B19, B8, B5, and B20, with CBZ values in the range of 7-11 ng/L. Cluster 3 (CLB3) consists of stations B4, B6, B3, and B16, with CBZ concentrations around 16 ng/L. Cluster 4 (CLB4) contains the stations B2, B9, B18, B21, B10, and B11, with CBZ values between 12 and 15 ng/L, and cluster 5 (CLB5) includes the stations B12, B7, and B13, with the highest levels of CBZ above 22 ng/L (Table 1). More information about the classifications based on cluster analysis and descriptive statistics of each cluster can be found in Supplementary Materials Tables S4 and S5. descriptive statistics of each cluster can be found in Supplementary Materials Tables S4  and S5.
The CBZ concentration clusters in aquifer B (CLBs) were cross-tabulated with the demographic clusters (CLs) explained in Section 3.2, and their statistical association was tested using the chi-square test. Chi-square results showed that the CBZ clusters and the demographic clusters were significantly associated (χ 2 = 158; p-value = 0.00001). In particular, the clusters (CLBs) with higher CBZ levels were found to be associated with a higher presence of older people. For a better understanding of the relationship between the demographic indicators and CBZ levels, a correlation analysis was applied to the dataset aggregating the demographic indexes by the CBZ clusters. The results showed that there was a positive significant relationship between family index, aging index, divorce index, and CBZ levels in groundwater. By contrast, the sexagenarian index showed a negative significant correlation with the concentration of CBZ in aquifer B. The correlation matrix is presented in Table 3. Although some correlations are weak, it appears reasonable to conclude that some factors such as age may affect the population consumption of CBZ, and therefore its concentration in the groundwater. Higher CBZ levels may be the result of a higher consumption of CBZ among the elderly aged over 70 years, or adults who have children aged under 5 years. Moreover, there is a positive significant correlation between the high level of CBZ and the divorce index. Other indicators did not show a clear relationship with CBZ pollution in the mentioned aquifer. The CBZ concentration clusters in aquifer B (CLBs) were cross-tabulated with the demographic clusters (CLs) explained in Section 3.2, and their statistical association was tested using the chi-square test. Chi-square results showed that the CBZ clusters and the demographic clusters were significantly associated (χ 2 = 158; p-value = 0.00001). In particular, the clusters (CLBs) with higher CBZ levels were found to be associated with a higher presence of older people.
For a better understanding of the relationship between the demographic indicators and CBZ levels, a correlation analysis was applied to the dataset aggregating the demographic indexes by the CBZ clusters. The results showed that there was a positive significant relationship between family index, aging index, divorce index, and CBZ levels in groundwater. By contrast, the sexagenarian index showed a negative significant correlation with the concentration of CBZ in aquifer B. The correlation matrix is presented in Table 3. Although some correlations are weak, it appears reasonable to conclude that some factors such as age may affect the population consumption of CBZ, and therefore its concentration in the groundwater. Higher CBZ levels may be the result of a higher consumption of CBZ among the elderly aged over 70 years, or adults who have children aged under 5 years. Moreover, there is a positive significant correlation between the high level of CBZ and the divorce index. Other indicators did not show a clear relationship with CBZ pollution in the mentioned aquifer.

Statistical Analysis of CBZ Levels in Aquifer A
Hierarchical cluster analysis (HCA) was also applied to the groundwater samples of the first aquifer where CBZ concentrations varied between 1.53 and 98.16 ng/L. The HCA dendrogram suggested grouping the monitoring stations into four clusters (CLAs). Cluster 1 (CLA1) indicates the stations A11, A13, A2, A3, A5, and A6, with concentrations between 13.6 and 25.10 ng/L. Cluster 2 (CLA2) includes samples A1, A14, A18, A19, A20, and A8, with CBZ values in the range from 1.53 to 9.4 ng/L. Cluster 3 (CLA3) shows the highest concentration of CBZ between 65.03 and 98.16 ng/L, and consists of stations A12, A17, A21, and A7, and cluster 4 (CLA4) represents stations A10, A15, A16, A4, and A9 with CBZ values from 34.57 to 46.6 ng/L. The box-and-whisker plot (Figure 7) shows the CBZ concentrations in different clusters. Table S6 in Supplementary Materials reports the summary statistics of each CBZ cluster in aquifer A. The chi-square test was applied to determine whether there was an association between CBZ clusters in aquifer A (CLAs) and demographic clusters (CLs). The result allowed us to reject the null hypothesis (χ 2 : 278; p < 0.01) and to conclude that there was an association between CBZ clusters in aquifer A and demographic clusters.

Statistical Analysis of CBZ Levels in Aquifer A
Hierarchical cluster analysis (HCA) was also applied to the groundwater samples of the first aquifer where CBZ concentrations varied between 1.53 and 98.16 ng/L. The HCA dendrogram suggested grouping the monitoring stations into four clusters (CLAs). Cluster 1 (CLA1) indicates the stations A11, A13, A2, A3, A5, and A6, with concentrations between 13.6 and 25.10 ng/L. Cluster 2 (CLA2) includes samples A1, A14, A18, A19, A20, and A8, with CBZ values in the range from 1.53 to 9.4 ng/L. Cluster 3 (CLA3) shows the highest concentration of CBZ between 65.03 and 98.16 ng/L, and consists of stations A12, A17, A21, and A7, and cluster 4 (CLA4) represents stations A10, A15, A16, A4, and A9 with CBZ values from 34.57 to 46.6 ng/L. The box-and-whisker plot (Figure 7) shows the CBZ concentrations in different clusters. Table S6 in Supplementary Materials reports the summary statistics of each CBZ cluster in aquifer A. The chi-square test was applied to determine whether there was an association between CBZ clusters in aquifer A (CLAs) and demographic clusters (CLs). The result allowed us to reject the null hypothesis (χ 2 : 278; p < 0.01) and to conclude that there was an association between CBZ clusters in aquifer A and demographic clusters.  The Pearson correlation analysis was applied to the dataset aggregating the demographic indexes by the CBZ clusters of aquifer A, and the results are shown in Table 4. The correlation analysis demonstrated that there was a significant positive relationship between high concentrations of CBZ in aquifer A and the family index, aging index, and divorce index. Moreover, the inverse correlation between the sexagenarian index and education index showed that people who are aged between 60 and 69 years, and educated people, tend to use smaller quantities of CBZ compared to the other demographic groups.
The correlation between the demographic indexes and CBZ levels of aquifer A was higher than that of aquifer B, in general, but the two followed the same trend. In order to determine whether CBZ levels are higher in the first aquifer than in the second aquifer, a paired sample t-test was applied. The results showed that there was a significant difference between CBZ levels in aquifer A and aquifer B (t = 54.3; p-value < 0.0001), meaning that, on average, the concentration of CBZ in aquifer A was 16.8 ng/L higher than the CBZ concentration in aquifer B (95% confidence interval ranging from 16.18 to 17.4). This result is consistent with the hypothesis that CBZ may originate from leakages from the sewage network, which generally lays at few meters from the ground surface. Moreover, a chi-square test was used to test the association of CBZ clusters in the first aquifer (A) with the CBZ clusters of the second aquifer (B), and the association was found to be significant (χ 2 = 21.97; p-value < 0.0001). Thus, we can reject the null hypothesis that asserts the two aquifers were independent of each other in terms of CBZ contamination. Finally, Pearson's correlation analysis was applied to the CBZ levels in both aquifers A and B, and revealed a positive and highly significant correlation (r = 0.44; p < 0.00009), suggesting the possibility of CBZ leaching from the upper aquifer (A) to the lower aquifer (B).

Discussion
The significant amount of evidence of the presence of pharmaceutical residues in freshwater suggests that current policy approaches are inadequate for the protection of water quality and freshwater ecosystems. Current policies are mostly reactive (when the risks are proven), applied on an individual substance basis (i.e., through individual EQS), and resource intensive, whereas the diffuse pollution remains largely unmonitored and unregulated [70]. There is a need to change the implementation of current policies from reactive to proactive according to the following strategies (OECD) [71]: (i) improve monitoring and reporting on the occurrence, fate, toxicity, human health, and ecological risks of pharmaceutical residues to lay the groundwork for pollution reduction policies; (ii) implement source-directed approaches, such as the sustainable design and procurement of pharmaceuticals, to prevent the release of residues in water bodies; (iii) introduce the use of orientated approaches, such as disease prevention, improved diagnostics, and restrictions on pharmaceuticals with high environmental risk, to reduce inappropriate and excessive consumption; and (iv) implement end-of-pipe measures, such as advanced wastewater treatment, public collection schemes for unused pharmaceuticals, and education campaigns to safely dispose of and remove pharmaceutical residues.
Consistent with these strategies, the main objective of this study was to evaluate the correlation of groundwater CBZ levels in different aquifers with demographic indicators within the Milan urban area. Despite the fact that CBZ is a well-known and ubiquitous pollutant, very little is known about its source and impacts. Thus, to assess whether the overall population acts as a homogeneous CBZ source, we investigated the demographic characteristics in the Milan urban area at a sub-municipal scale.
The 76 demographic variables derived from the census data of the Milan municipality were reduced to a smaller number of more effective indicators using factor and cluster analyses. The results clearly showed that the level of CBZ in groundwater is not homogeneous throughout the territory and is strongly related to the population composition of the sub-areas. Moreover, the concentrations of CBZ in the groundwater samples of the first aquifer were 16.8 ng/L higher than those of the second aquifer, and the contamination followed the same pattern in both the unconfined and semi-confined aquifers. This finding is consistent with the hypothesis that CBZ derives from sewage leakages occurring at soil depths of a few meters below the surface, before leaking to the lower aquifers.
The results also indicated that, although the population overall is the main source of pharmaceutical contamination in urbanized areas, the different consumption profiles of the population demographic segments are clearly reflected in the occurrence and loads of CBZ measured in groundwater. Higher concentrations of CBZ in both aquifers were found to be associated with different age classes and social composition. In particular, the highest levels of CBZ in groundwater were found to be correlated with a higher presence of people aged over 70 years. This is consistent with the findings of Hauser et al. [72], who found that the incidence of epilepsy is increasing in those over 60 years of age. In addition, many studies showed that CBZ is used frequently in the elderly population [73,74], and elderly patients who use CBZ and are considered stable are unlikely to switch to another anti-epileptic drug and use it for the remainder of their life [75].
The family index (parents who live with their children aged under 5 years) was another indicator influencing the CBZ concentration in groundwater of both aquifers. According to some research, CBZ is also heavily utilized for treatment of epilepsy in children [76][77][78]. Recently, the Italian Medicines Agency (AIFA) [79] reported that CBZ is widely consumed by children aged between 2 and 7 years as an anti-epileptic drug and is among the top 20 pharmaceuticals sold in Italy.
The results also indicated that, when the education index was higher, the consumption of CBZ decreased and lower concentrations of CBZ were detected in the groundwater of aquifer A. By contrast, there was a strong positive relationship between the divorce index and the high level of CBZ in groundwater. Some experts have noted that CBZ, as a tricyclic drug, can produce a sensation of euphoria, which is noted as a side effect of long-term use and can lead to addiction and overdose [80,81]. Thus, overuse of this drug is not only addictive, but may also be harmful for the aquatic environment. As a result, because the responses to substance usage differ in line with, in particular, age and level of education, as noted by Xu et al. [82], increasing social awareness may be necessary for controlling the usage of CBZ.
It should also be noted that this analysis has some limitations. Although the census data used in this study are consistent with the temporal scale of CBZ monitoring measurements, the measured CBZ concentrations may also derive from the former demographic composition and previous drug consumption. A pre-existing CBZ monitoring campaign, which was not available, may have allowed us to assess whether past drug use was more correlated to the CBZ levels we measured. In addition, groundwater flux dynamics may also have contributed to changing the CBZ concentration pattern.
CBZ is a well-proven drug with a long history of human consumption. Thus, in consideration of the long history of CBZ uses, it may be assumed that the time scales of both the local groundwater and the local demographic dynamics are not significantly different, which may explain the correlations we found. We believe these results provide further insights into the source and impacts of CBZ and may support more efficient management actions in terms of implementing both a use-orientated approach and end-of pipe-measures.

Conclusions
In line with current strategies for the management of pharmaceutical residues, this study contributes to the understanding of the source and impacts of carbamazepine groundwater pollution in a large urban area. The main finding of the study is the evidence that, although the population as a whole is the main source of pharmaceuticals in urban areas, the local demographic characteristics and the specific consumption patterns are clearly reflected by CBZ levels in groundwater. In summary:

•
Demographic information can provide a better qualitative understanding of CBZ contamination in the groundwater of Milan city. • CBZ concentrations in groundwater were higher in the area with a higher presence of the aging population over 70 years old.

•
The consumption of CBZ in children less than 5 years old living with their parents may also significantly influence the amount of CBZ in groundwater.
• In addition to age classes, social characteristics (i.e., level of education, marital status) may also influence CBZ levels. • CBZ levels present the same pattern in both the upper aquifer (A) and the lower aquifer (B) in the Milan urban area, supporting the hypothesis that CBZ may derive from sewage leakages and then leak to the lower aquifers.

•
Given that the semi-confined aquifer (B) is used for drinking water purposes in the Milan area, the contamination of groundwater may have implications for human health, and supports the need for specific water treatments. A better understanding of the source and impact of CBZ may support more efficient management in terms of both better drug use and more cost-efficient treatment.