Deriving Natural Background Levels of Arsenic at the Meso-Scale Using Site-Speciﬁc Datasets: An Unorthodox Method

: Arsenic is found in groundwater above regulatory limits in many countries and its origin is often from natural sources, making the deﬁnition of Natural Background Levels (NBLs) crucial. NBL is commonly assessed based on either dedicated small-scale monitoring campaigns or large-scale national/regional groundwater monitoring networks that may not grab local-scale heterogeneities. An alternative method is represented by site-speciﬁc monitoring networks in contaminated/polluted sites under remediation. As a main drawback, groundwater quality at these sites is affected by human activities. This paper explores the potential for groundwater data from an assemblage of site-speciﬁc datasets of contaminated/polluted sites to deﬁne NBLs of arsenic (As) at the meso-scale (order of 1000 km 2 ). Common procedures for the assessment of human inﬂuence cannot be applied to this type of dataset due to limited data homogeneity. Thus, an “unorthodox” method is applied involving the deﬁnition of a consistent working dataset followed by a statistical identiﬁcation and critical analysis of the outliers. The study was conducted in a highly anthropized area (Ferrara, N Italy), where As concentrations often exceed national threshold limits in a shallow aquifer. The results show that site-speciﬁc datasets, if properly pre-treated, are an effective alternative for the derivation of NBLs when regional monitoring networks fail to catch local-scale variability.


Introduction
Heavy metals and metalloids affect the quality of groundwater in many parts of the world [1] and are by far the most abundant group of contaminants and pollutants affecting European groundwaters [2]. Among them, arsenic is a well-known threat to human health and ecosystems due to its toxicity and carcinogenicity. Arsenic is found in groundwater above local regulatory limits in many countries [3]. Its origin is often from natural sources, such as arsenic-bearing minerals occurring in sediments and rocks, with release to groundwater driven by certain geochemical conditions that favor the As mobilization [4]. For this reason, the assessment of the Natural Background Level (NBL) of arsenic in groundwater is crucial, especially in urbanized and industrialized areas, where natural arsenic pollution should be distinguished from contamination caused or triggered by human activities in order to set proper remediation goals [5]. Moreover, the definition of NBLs supports the correct management of groundwater resources by highlighting potential issues related to chronic human exposure to naturally occurring arsenic.
Essential for the definition of NBLs, besides methods and protocols, is the availability of a set of hydrogeochemical data representative of the pristine groundwater composition, including the concentration of the pollutant/contaminant of interest and major ions, as well as the physicochemical parameters of water. The higher the quality, quantity, and homogeneous distribution of the data, the more accurate the NBL definition and the disentanglement of processes that cause pollutant release in groundwater [6][7][8]. The dataset is most often acquired with dedicated monitoring campaigns in pre-existing or new boreholes and springs (feasible for relatively small areas in the order of 100 km 2 ) [9][10][11][12][13] or from groundwater quality monitoring networks implemented at the national level [14][15][16][17][18] or, as for Italy, at the scale of administrative regions [8,[19][20][21][22][23][24]. In many European countries, national or regional groundwater quality networks were developed in the last decades to fulfill the requirement of the European Groundwater Directive (2006/118/EC) for a systematic assessment of the chemical status of groundwater bodies. Some authors recently pointed out that the definition of arsenic NBLs at the scale of the groundwater body may provide unreliable results, due to local-scale geological and geochemical variability influencing As release to groundwater [25,26]. To overcome this limitation, new approaches have been proposed, representing an improvement of the two commonly used methodologies (i.e., the preselection and component separation methods) defined by the EU BRIDGE research project (Background cRiteria for the IDentification of Groundwater thrEsholds), that provided guidelines aimed at harmonizing the methods for estimating NBLs at the European level [27]. The preselection approach involves the use of indicator chemical species, such as NO 3 , NH 4 , Cl, and K [15,28,29], to identify samples with most likely anthropogenic influences [14,19,[30][31][32][33][34]. The component separation approach involves the subdivision of the working dataset into normally and log-normally distributed populations, considering the latter as representative of the natural background [8,20,[35][36][37]. The new approaches mentioned above involve the combination of the preselection or component separation methods with geostatistical tools that take into account the actual distribution of the contaminant of concern and its correlation with other environmental parameters (e.g., indicator kriging [21,23,[38][39][40] or object-oriented statistics [24]). These approaches allow for a spatial enhancement of the high-quality information provided by national or regional monitoring networks by producing maps of NBLs or associated probability of exceedance, instead of assigning a single background value or a range of values for the whole groundwater body. However, the regional-scale monitoring networks upon which the geostatistical approaches are based remain unable to grab the potential spatial complexity caused by local-scale natural heterogeneities, since they can only provide point data with relatively large mesh (i.e., a few km).
A pervasive and extensive source of information on groundwater chemical composition is related to contaminated/polluted sites under remediation. These sites are commonly found in urbanized and industrialized areas and are generally managed through site-specific monitoring networks made of densely spaced piezometers that are typically monitored over a timespan of years to decades [41,42]. Notwithstanding the large amount of monitoring data associated with contaminated and polluted sites, an obvious drawback towards the use of these data for the definition of NBLs is that groundwater quality at these sites is most likely affected by human activities to some degree. Some authors recently put effort into distinguishing between natural (or, in a broad sense, background) and anthropogenic arsenic and other compounds in the aquifers below urban landfills [43,44] or tried to identify unimpacted monitoring locations within the network of a large industrial complex to assess background concentrations [45]. These authors performed thorough analyses at the scale of individual contaminated sites using very well-informed databases. To the best of our knowledge, the potential for an aggregation of monitoring networks of sites under remediation to assess NBLs at a meso-scale (i.e., areas in the order of 1000 km 2 ) has never been explored in the literature. Given the large availability of this type of data in urbanized and industrialized settings, their potential is worth investigating at least for assessing NBLs of compounds of likely natural origin, being infrequently associated with common human activities, such as arsenic [4].
In the case of groundwater quality data collected from an assemblage of site-specific monitoring networks whose original scope was different from NBL definition, data preprocessing is of pivotal importance [8]. The first issue that should be dealt with is the variability of sample collection and analysis methods, since each site-specific network is managed independently, possibly following different strategies. This variability necessarily reflects on concentration data that should be critically evaluated in order to identify a working dataset as consistent as possible. The second major issue is the identification of possible anthropic impacts on NBLs, involving the contaminant of concern. The most common strategies recently used to distinguish anthropic from natural contributions (i.e., preselection and component separation) hardly apply to assemblages of site-specific datasets for the following reasons: (1) the preselection approach barely operates because the monitoring of sites under remediation generally involves the sole analysis of specific contaminants and pollutants, excluding some major ions which are frequently used as indicator species (e.g., NO 3 , NH 4 , Cl, and K); (2) the component separation can be inappropriate because different sites could be subjected to different geochemical processes, leading to different types of data distribution and generating, on the whole, a multi-modal distribution, for which the assumptions "lognormal component = natural background dataset" and "normal component = human-influenced dataset" hardly apply. In addition, natural background populations are often not lognormal [8,46,47]. With the inapplicability of such standardized procedures, careful statistical identification and critical analysis of the outliers of the compound of concern for NBLs can be a reasonable way to eliminate the most likely anthropic inputs [48].
This paper aims at assessing the potential for publicly available groundwater quality data from sites under remediation to define NBLs of As at the meso-scale (i.e., in the order of 1000 km 2 ). Since standardized procedures, such as the preselection and component separation cannot be applied to a dataset formed by an assemblage of site-specific datasets, an "unorthodox" method is here applied, involving statistical identification and critical analysis of the outliers, based on the conceptual model elaborated for the investigated system. This study was performed in a highly anthropized area (Ferrara, in the Po Plain, northern Italy), where As concentrations exceeding the national threshold limits (10 µg/L [49]) are often detected in the shallowest aquifer of a complex multilayered system. This shallow aquifer is made of alluvial or coastal deposits locally enriched in peat, which is well known to drive As release in groundwater in the Po Plain [50,51] and worldwide [3]. Thus, a potential issue of surficial natural pollution subsists in the area, that must be addressed carefully. In the study area, the shallow aquifer is monitored by an institutional regional network dedicated to groundwater quality monitoring, which consists of only 18 monitoring points. The lack of an adequate number of monitoring points from the regional monitoring network motivated the use of an assemblage of site-specific monitoring networks (leading to a total number of 980 monitoring points). This approach can be potentially applied to other anthropized areas worldwide, where many site-specific monitoring networks exist and the regional monitoring is not able to grab the existing hydrogeochemical heterogeneity.

Conceptual Model of the Investigated System
The investigated area is located in the southeastern sector of the lower Po River Plain and corresponds to the administrative Province of Ferrara (Emilia Romagna Region, northern Italy). It covers an area of about 2600 km 2 bounded by the Po River to the north, the Adriatic Sea to the east, and the Reno River to the south ( Figure 1). The subsurface stratigraphy of the study area consists of sub-horizontal alternations of coarse-grained (sand) and fine-grained (silt and clay) sediment bodies, tens of m thick, down to around 200 m below ground surface (bgs) that form the Pliocene-Pleistocene infill of the Po Plain foreland basin [52]. From a hydrogeological standpoint, such configuration corresponds to a number of vertically stacked aquifer-aquitard systems [53]. Our research is focused on the shallowest 10 m bgs, hosting heterogeneous Holocene deposits that constitute the shallowest aquifer system, known as "A0" [54]. Above the Pleistocene-Holocene boundary, the A0 system is made up of poorly interconnected sandy ribbons and lenses (aquifer) encased into a silty or clayey matrix (aquitard) [55]. Facies associations in the A0 aquifer system are part of three distinct depositional systems ( Figure 1). The western sector is occupied by alluvial plain deposits supplied by the Po River and other Northern Apennine river systems south of Ferrara. Several facies associations have been recognized along a detailed cross-section cutting this sector [56]: fluvial-channel sands, crevasse splay and levee sand-silt alternations, and poorly-drained to well-drained floodplain silts and clays (thorough descriptions of facies associations are provided in the cited literature). The central sector corresponds to a delta plain depositional system mainly fed by the Po River. It is characterized by thick, laterally extensive mud-prone facies associations, including swamp, salt marsh, and lagoon/bay clays and silts. Distributary-channel sands and related overbank deposits form isolated lens-shaped bodies [57,58]. A peculiar feature of this sector is the abundance of organic matter, especially in fine-grained swamp and salt marsh facies showing frequent peat intercalations, several dm thick. Vegetal material, such as plant remains, plant debris, or roots, is typically associated with peat layers. The easternmost sector is a coastal plain facing the Adriatic Sea that consists almost exclusively of coastal sands, interpreted as beach-ridge or delta front facies associations [55]. Unlike the other two sectors, fine-grained deposits in the shallow subsurface of the coastal area are highly subordinate and correspond to thin, surficial paludal deposits that accumulated in topographic lows between individual sand ridges.
The aquifer is unconfined to semiconfined depending on the stratigraphic architecture and thickness of the encasing fine-grained deposits. Horizontal groundwater flow directions, hydraulic conductivity, and hydraulic gradients are extremely variable due to aquifer heterogeneity and complex sand-body geometry. The hydraulic head is close to the ground surface (max depth of about 1 m bgs) [59,60] and is generally dominant with respect to the deeper confined aquifers. Recharge of A0 is almost exclusively vertical, by rainfall or irrigation [61]. These features make the aquifer vulnerable to surface sources of pollution.
As for the most part of the Po River Plain, the Ferrara area is highly impacted by farming, with 70% of cultivated lands, and by industrial and urban activities occupying 8% of the territory [62][63][64], causing, in many cases, the deterioration of shallow groundwater quality. Among many organic and inorganic pollutants, arsenic is often detected in the A0 groundwater at concentrations exceeding the Italian regulatory limit of 10 µg/L. The study area hosts 45 sites under remediation supervised by local public authorities, i.e., the former Environmental Office of the administrative Province of Ferrara and the Regional Agency for Prevention, Environment, and Energy of Emilia-Romagna (ARPAE). These sites are listed in the Supplementary Material (SM) and their locations are shown in Figure 1. The sites are impacted by a number of different anthropic activities, such as gas stations (14 sites), urban (six sites), and inert (three sites) waste landfills, sugar factories (five sites), one large petrochemical complex, and other smaller industrial plants or various activities (16 sites) causing groundwater contamination. To the best of our knowledge, no direct anthropogenic inputs of As to soils and groundwater were registered in these sites. Thus, As occurrence is likely related to mobilization from sediments due to the reductive dissolution of Fe and Mn oxyhydroxides driven by the oxidation of organic matter (OM). The main source of OM in this aquifer is buried peat, as previously observed in other sectors of the Po Plain [65][66][67] and in similar fluvial systems around the world [3], however, additional anthropogenic sources of OM (e.g., hydrocarbons or landfill leachate spills) cannot be excluded. More specifically, some spills/leaks of hydrocarbons were reported for a sugar factory (site S02) and most likely occurred in the petrochemical complex (site S45).
Eh values between +150 and −385 mV, with a prevalence of negative values around -200 mV, have been previously detected in A0 all around the investigated area [68][69][70][71][72], confirming the occurrence of reducing conditions. The occurrence of the peat-rich facies in the central sector is expected to favor much higher arsenic concentrations in groundwater with respect to the western and eastern sectors.

Available Dataset
Hydrochemical data of the 45 sites were collected from a public registry of sites under remediation. The registry was managed by the former Environmental Office of the administrative Province of Ferrara until 2015 and is currently handled by ARPAE. Each site is equipped with 1 to 463 piezometers screened in the A0 aquifer, with variable screen depths ranging within 1.5 and 10 m bgs. For this study, arsenic concentrations were retrieved from 980 piezometers distributed in the 45 sites, over a variable timespan between 1997 and 2015, for a total of 4305 As values. Concentration analyses were performed either by private labs or by the public ARPAE labs, following a variety of sampling and analytical protocols. In a few instances (126 samples), reference samples were counteranalyzed by two different labs. Although details on sampling and analytical protocols were often unavailable, samples have been labeled as "filtered at 45 µm" or "unfiltered" when the information was retrievable. In some cases (363 samples), duplicate filtered and unfiltered samples were analyzed by the same lab. Data on major water ions and the specific contaminants affecting the sites were often incomplete or unavailable. When feasible, the location of the piezometer with respect to the source of contamination was defined based on groundwater flow directions provided by site-specific reports. Piezometers unequivocally located up-gradient to the source or labeled as "blanks" by site investigators were considered not affected by the site-specific anthropic contamination. When the location with respect to the source of contamination was unknown, piezometers were labeled as "internal", i.e., within the boundaries of the site as defined for remediation purposes, or "external". Internal piezometers were considered plausibly affected by the site-specific contamination due to likely proximity to the source of contamination. Concerning the regional monitoring network, managed by ARPAE and aimed at supporting the periodic determination of the chemical status of groundwater bodies, As concentrations were available from the 18 monitoring points tapping aquifer A0 (locations in Figure 1

Data Pre-Processing and Calculation of Natural Background Levels of Arsenic
Since the entire database of As concentrations in groundwater was derived from the registry of sites under remediation, where some indirect anthropogenic influences on As concentrations (i.e., hydrocarbons and/or landfill leachate spills/leaks) cannot be excluded, the data were pre-processed prior to NBL calculation, following a slightly unconventional (unorthodox) procedure aimed at minimizing data shortcomings caused by the variability of sampling and analytical techniques and/or possible anthropic interferences. This procedure is described step-by-step in the following subsections and involves: (1) treatment of non-detects, (2) data quality assessment and preparation of the working dataset, (3) identification and treatment of outliers, and (4) calculation of NBL. The common procedure of considering a charge balance error below 10% as the criterium for evaluating data quality [27] could not be applied in the present study since the full set of major ions analyses were not available in the site-specific datasets. Instead, data quality was evaluated through the analysis of duplicate filtered/unfiltered samples and counter-analyzed (public and private labs) samples (i.e., the only information available for a significant number of samples), as described in detail in Section 2.3.2.

Treatment of Non-Detects
Non-detects (i.e., samples with an As concentration lower than the limit of detection-0-LOD) are 14.8% of the total As dataset. The literature suggests assigning a value of LOD/2 to the non-detects when these are a small proportion (<15%) of the total number of observations, without the need for further analyses [73]. The use of different sampling and analytical protocols in the considered dataset produced nine different LODs between 0.01 and 5 µg/L. To avoid a fabricated spatial and temporal variability of As concentrations, the lowest LOD that had a significant recurrence in the database (>5% of the non-detects) was selected, corresponding to 0.01 µg/L (23.1% of total non-detects), and its half value (0.005 µg/L) was assigned to all the non-detects.

Data Quality Assessment and Preparation of the Working Dataset
Different sampling strategies and sample pre-treatment and preservation may induce significant variability in As concentration measures, much larger than that expected from the application of different analytical methods, as long as the analyses follow standard protocols validated under national accreditation bodies [74]. In particular, field filtration of samples was reported to influence the measured concentration of several trace elements in water [75]. Filtered samples generally cause less variability of groundwater As concentration in a well compared to unfiltered samples [76].
The dataset of this study was split into three different categories based on field filtration of the sample: filtered, unfiltered, and unknown. In the last category, the information on field filtering was unavailable and the associated As data could not be considered for further observations on data quality. Duplicate filtered/unfiltered samples were employed to compare the first two categories. The alignment with respect to the 1:1 line of the two types of data was checked and the nonparametric Mann-Whitney U test [77] was applied to assess whether the data of the two groups were ascribable to the same statistical population. Counter-analyzed samples were used to assess the variability within each of the two categories. Scatter plots were employed to check whether the couples of counter-analyzed data were affected by analytical variability.
The aim of such comparisons was to assess data quality in order to identify a consistent, reliable, and robust working dataset of As concentration to be considered in the further steps.

Identification and Treatment of Outliers
The identification of anthropogenic inputs that may have affected the composition of shallow groundwater is crucial in the current study given the human-impacted nature of the considered database. The most widely used approaches at the EU level for the identification of such impacts are pre-selection and component separation [8,27]. These approaches could not be applied in the present case study for the reasons discussed in Section 1.
Therefore, potential impacts deriving from human activities were assessed here through a careful analysis of temporal and spatial outliers of As concentration informed by the knowledge on the conceptual model described in Section 2.1. The idea behind this choice is that impulsive anthropogenic influences may generate one or more outliers within the time series of a monitoring point (temporal outliers), whereas constantly impacted monitoring points may constitute an outlier with respect to the surrounding unaffected monitoring points (spatial outliers).
As a first step, a temporal analysis of the outliers was performed at each monitoring point. The low number of As data available for each piezometer (max six) did not allow for the statistical identification of the outliers. Instead, the interquartile range (IQR) approach was employed to analyze the temporal series [78]. The upper outliers (C) were defined as follows [79]: where IQR corresponds to the difference between the 75th percentile (Q3) and 25th percentile (Q1) of the series. The lower outliers (i.e., < Q1-1.5 × IQR) were not considered, since they are not of interest in the definition of NBLs, being uninformative of possible anthropic impacts. After the exclusion of temporal outliers, each series was tested for the occurrence of a temporal trend using the nonparametric Mann-Kendall test [80,81], since the occurrence of such trends could be a further indication of anthropic influences [82]. Eventually, the outliers were critically analyzed on the basis of the site conceptual model, with a focus on the geology and location of the monitoring points with respect to the source of contamination and groundwater flow, to discern between natural or anthropic anomalies. The latter were excluded from the database prior to NBL calculation. As a general rule, upper outliers with arsenic concentrations ≤5 µg/L, i.e., half of the national regulatory limit, were maintained in the working database, since they were considered unaffected by relevant anthropogenic influences [83]. A similar procedure was followed for the identification of spatial outliers. In this case, a representative As concentration was considered for each piezometer corresponding to the median value of the temporal series [25] and the outliers were searched within each site, i.e., the IQR was calculated from the median values of each piezometer at a site. The use of the median as a representative value of a temporal series is suggested in several protocols for the estimation of NBLs [27,82], although it may lead to an underestimation of NBLs when the peak values of the series are unequivocally related to natural processes [83]. However, the median value was considered as a precautionary choice in the current research because the presence of anthropogenic influences on the extreme As values of some piezometers cannot be definitively excluded by the sole analysis of outliers.

NBL Calculation
For the definition of the NBLs of As in the A0 groundwater, the pre-processed database was split into three sub-populations corresponding to the three depositional systems identified in the conceptual model of Section 2.1. Indeed, the literature for the Po Plain area shows evidence of a strong control of the local geology (i.e., presence/absence of natural buried organic matter) on natural As release processes operating in the shallow aquifers [65]. The distinct groups of data were first checked with the Mann-Whitney U test to assess whether they could be statistically ascribed to different populations. Then, the NBL was calculated as the 90th percentile of the distribution of each distinct population in agreement with the BRIDGE protocol [27]. The choice of the 90th percentile (instead of higher suggested percentiles, e.g., 95th [84] or 97.7th [27]) aimed at mitigating the intrinsic uncertainty associated with possible undetected anthropic influences in the peak values of the distribution.

Comparison with NBLs Calculated from the Regional Monitoring Network
As a final step, the NBLs calculated with the methodology presented above, i.e., from the assemblage of site-specific datasets, were compared with the NBLs calculated using the data from the 18 points of the regional groundwater quality monitoring network of the same study area (see Section 2.2). Assuming, reasonably, that the regional monitoring points are located out of contaminated/polluted sites (in order to determine a chemical status representative of the whole groundwater body), the NBL was calculated as the 90th percentile of the median values calculated for each monitoring point omitting preliminary analyses on possible anthropic influences. The choice of the 90th percentile, rather than higher percentiles (e.g., 95th or 97th), is motivated by two reasons: (1) to have comparable NBL results (the same percentile used) between the regional dataset and the site-specific dataset, since the 90th percentile was used for the latter; (2) to adopt a precautionary approach: although we can reasonably assume that the regional monitoring points are unaffected by anthropogenic influences, we cannot definitely exclude it.

Identification of a Higher Quality Working Dataset
Statistical parameters for the total As dataset and for the sub-datasets made on the basis of the filtration procedure, i.e., filtered or unfiltered (see Section 2.3.2), are shown in Table 1 (the complete database is presented in the SM). Arsenic concentrations measured from unfiltered samples are generally higher than those from filtered samples, with mean values of 34.3 and 18.7 µg/L, respectively, and similar medians of 3.8 and 3.6 µg/L. The standard deviation is much higher for the unfiltered samples (304.7, compared to 55.7 of filtered data), suggesting higher variability within the unfiltered group. The scatter plot comparing filtered and unfiltered analyses from double sampling ( Figure 2) shows a rather good alignment for many samples but, at the same time, a poor correlation for a significant number of samples, with unfiltered samples often providing higher concentrations. A reason for that may be the occurrence of solid particulates in unfiltered samples, comprising Fe-oxides [85] on which some As may be adsorbed.  The counter-analyzed As concentrations ( Figure 3) show good alignment on the 1:1 line between the public and private labs in the case of filtered samples, even if very few data were available for the comparison (11 samples). On the contrary, the unfiltered data, available in a larger number (115 samples), indicate a much larger variability. In most cases, the unfiltered concentration from the private labs is lower than that of the public lab. This highlights the poor reliability of the unfiltered data. The Mann-Whitney U test [77] applied to the two groups of filtered and unfiltered data from double samples rejects the null hypothesis that the two groups pertain to the same population, with a p-value of 6.8 × 10 −10 . This result confirms that filtered and unfiltered data should be considered separately for the calculation of NBLs.
Given the higher variability and uncertainty (i.e., lower data quality) that appears to be associated with the unfiltered samples, only the dataset of As concentrations from filtered samples was considered as the working dataset for the further steps of NBL calculation. Samples for which the information on filtration was missing were also excluded, since these may be affected by the same limitations as for unfiltered data. These choices cause a significant decrease in the number of available sites (from 45 to 25), monitoring points (from 980 to 392), and samples (from 4305 to 815; see Figure 1 and the SM for details on the availability of filtered samples). On the other hand, the reliability of the final NBLs is expected to increase since filtered concentrations represent higher quality data, as demonstrated by the above observations. Also, the exclusive use of filtered data allows defining background values that are consistent with national guidelines [84], which suggests field filtration of samples for the analysis of metals and metalloids.

Identification and Removal of Outliers Likely Representing Anthropogenic Influences
A total of 21 arsenic concentration values from 5 out of 25 sites (S02, S06, S11, S15, and S45) were identified as outliers from the IQR of the time series of each monitoring point ( Table 2). Each outlier was critically evaluated, according to the conceptual model of the site, to assess whether the anomaly was most likely related to natural or anthropic causes. Six outliers were identified in the piezometers of site S02, a sugar factory in which some hydrocarbons spills/leaks were reported (see Section 2.1). The anomalous values ranged between 8.8 and 594.0 µg/L, pertaining to six different piezometers classified as "internal" and thus plausibly affected by the hydrocarbon contamination at the site. Moreover, the site is located in the western sector of the study area, which is made up of fluvial deposits that are expected to contain only rare peat deposits and vegetal material. For these reasons, the six outliers were considered to be the effect of anthropogenic influences on As concentrations and were deleted from the working database. The five outliers of site S06, another sugar factory in the western sector, were identified in five piezometers whose location with respect to the source of contamination is unknown. The outliers ranged from 2.4 to 66.0 µg/L. Two of the five outliers had As <5 µg/L and thus could not be related to relevant anthropogenic impacts (see Section 2.3.3). In the same site, four piezometers labeled as blanks showed As concentrations up to 40.2 µg/L, comparable to those of the outliers. For this reason, the outliers of this site were considered as natural anomalies and kept in the working database. The same criterion was applied to site S11, a small engine company in the western sector, showing three outliers ranging between 7.8 and 29.5 µg/L in three piezometers with unknown location, whereas blank piezometers at the same site had higher concentrations (up to 123.0 µg/L). Site S15 is a disused municipal landfill located in the central sector, made of deltaic deposits enriched in peats. The four outliers of this site range from 15.0 to 76.9 µg/L and pertain to four piezometers, three of which were downgradient to the source of contamination and one upgradient. The values were kept in the working database since the anomalous concentrations downgradient to the source (15.0 to 51.0 µg/L) were lower than those from the upgradient piezometer (76.9 µg/L) and other blank samples from the same site (up to 78.0 µg/L). The last site that showed anomalously high concentrations was S45, a large petrochemical complex located in the western sector. The three outliers (ranging from 11.6 to 27.8 µg/L) detected in three piezometers were cautiously excluded from the working database since, at this site, some hydrocarbons spills/leaks most likely occurred. The Mann-Kendall test [80,81] applied to all the time series with sample size ≥3 (the minimum to let the test working) did not allow for detecting any significant temporal trend. This may be partly due to the small number of values available for each series, i.e., up to six, which is below the recommended minimum sample size of 10 [86][87][88], possibly spanned over at least 5−10 years [89,90], to obtain a statistically acceptable trend identification. Notwithstanding the above limitation, the absence of temporal trends agrees with the hypothesized mainly natural origin of arsenic in groundwater in the study area (Section 2.1).
For the analysis of spatial outliers, the median value of the time series, cleaned from the temporal outliers likely representing anthropogenic influences, was calculated for each piezometer. Spatial outliers were identified in 13 sites (S01, S02, S06, S11, S15, S18, S20, S23, S26, S28, S35, S40, and S45), in a variable number of piezometers, between 1 and 10 ( Table 3). The outliers found in sites S06, S11, and S15 (ranging from 9.6 to 73.7 µg/L) were kept in the working database since the highest concentrations were detected in blank piezometers, and thus all values can be considered to reflect natural As contents. Sites S26, S28, S35, and S40 showed one to two spatial outliers. These four sites are located in the central (deltaic) sector. The values identified as outliers in these sites, ranging from 5.0 to 27.3 µg/L, were lower than the values observed in other sites in the same sector that did not display spatial outliers, such as 95.0 µg/L in site S17, or 95.0, 157.0, and 181.0 µg/L in site S21, or 90.6 µg/L in S34. Thus, the spatial outliers in sites S26, S28, S35, and S40 were kept in the working database because they were considered to be consistent with the natural background. Site S18 in the eastern sector is a former sugar factory showing six spatial outliers with the maximum value of 18.5 µg/L. This value is lower than the values calculated in other sites of the eastern sector for blank piezometers, e.g., 73.7 µg/L in site S15. For this reason, the outliers of site S18 were kept in the working database. One outlier was identified in each of the sites S01 and S20 in the central sector. In both cases, the anomaly is clearly related to one single sample of the temporal series, that the IQR criterium was unable to detect due to the small size (two samples) of the time series. Here, the choice was to delete these two anomalous samples, in order to increase data consistency within the two sites. The same reasoning was applied to the outlier in site S23 in the western sector, which was related to one anomalous value in the temporal series. The outliers of sites S02 and S45 in the western sector (3 and 10 values, respectively) were cautiously deleted from the database because some anthropogenic influences on As concentrations may have occurred in these two sites (see Section 2.1).

Derivation of Natural Background Levels
The population of median arsenic concentrations purged of temporal and spatial outliers was employed for the calculation of NBLs. Data were split into three sub-populations corresponding to three distinct (western, central, and eastern) geological sectors. These populations are represented in the probability plots of Figure 4. The range of median concentrations is larger in the central sub-population (0.005 to 181.0 µg/L) compared to the western and eastern sub-populations that show similar ranges between 0.005 and 108 µg/L and between 0.005 and 73.7 µg/L, respectively. The probability plots of Figure 4 show unimodal asymmetric distributions in the eastern and western sectors, whereas a bimodal distribution characterizes the central sector. All the three distributions have one or two extreme values which, following outlier analysis, can be considered as natural hot spots. The upper segment of the bimodal distribution of the central sector, identifiable between 40 and 100 µg/L, is missing in the plots of the other two sectors and likely represents, together with the two hot spots at 157 and 181 µg/L, the effect of stronger natural arsenic release triggered by the local occurrence of peaty layers and vegetal material in the shallow subsurface. In the eastern and western sectors, there is also evidence of natural mobilization of arsenic, reaching values above the regulatory limit (10 µg/L) in several instances. However, the lower ranges and different shapes of the distributions compared to the central sector suggest the occurrence of conditions less favorable to arsenic mobilization in these two sectors, consistently with the conceptual model of Section 2.1. A similar shape of data distribution ( Figure 4) and results of a Mann-Whitney U test (p-value of 4.03 × 10 −6 ) suggest that the eastern and western datasets pertain to the same population, confirming the lithologic similarities (paucity of buried OM), whereas the data from the central sector represent a distinct population.
For these reasons, two NBLs were calculated for the study area, one pertaining to the central sector and corresponding to the 90th percentile of the distribution of the central dataset, whereas a second one was calculated for the eastern and western sectors, as the 90th percentile of the distribution of the two combined datasets. The two resulting NBLs correspond to 68 and 21 µg/L, respectively, confirming the serious issue of arsenic in shallow groundwater in the central sector.

Comparison with NBLs Derived from the Regional Monitoring Network
The 18 monitoring points of the regional network ( Figure 1) were divided into two populations, corresponding to the eastern plus western sectors and the central sector, consistently with observations in Section 3.3. Thus, two NBLs were calculated following the methodology described in Section 2.3.5. The NBL for the eastern and western sectors was calculated on 15 out of the total 18 monitoring points of the network and was equal to 21 µg/L. The same value was obtained using the aggregation of site-specific datasets. The NBL for the central sector was calculated on 3 out of 18 monitoring points and was 6 µg/L, slightly below the regulatory limit of 10 µg/L and one order of magnitude below the NBL of 68 µg/L obtained from the aggregation of site-specific datasets.
The fact that distinct NBLs obtained from the regional network and from the assemblage of site-specific datasets had the same value for the eastern and western sectors (21 µg/L) suggests the following considerations: (1) the NBL value is robust, since it was obtained by two different and independent datasets; (2) the regional network is dense enough to catch the natural geochemical heterogeneity influencing As distribution in these two sectors; (3) the use of an aggregation of site-specific datasets derived from sites under remediation represents a valid alternative for calculating NBLs, after proper depuration from anthropogenic influences.
On the other hand, the significantly different NBLs obtained in the central zone (6 µg/L from the regional network and 68 µg/L from the assemblage of site-specific datasets) highlights two important aspects: (1) the inadequacy of the regional network to grab the geochemical heterogeneity of this sector, which led to an unreliably low NBL value. Such a value may lead to erroneous evaluations of potentially contaminated sites, attributing high As concentrations found in this sector to anthropogenic sources instead of natural sources; (2) the importance and the validity of the methodology proposed here, involving the use of site-specific datasets to fill the gap left by regional monitoring networks, which proved to be unable to catch the local-scale heterogeneity.

Conclusions
The present study involved the derivation of NBLs for groundwater As at the mesoscale (2600 km 2 , corresponding to the administrative Province of Ferrara in the Po Plain, N Italy) for the shallow aquifer, using an aggregation of site-specific datasets collected from a public registry of sites under remediation. The use of this kind of data was motivated by the lack of an adequate number of monitoring points in the regional groundwater quality network able to grab the local-scale hydrochemical heterogeneity.
In the study area, the NBLs obtained for As were: • 68 µg/L in the central sector, characterized by the abundance of buried OM, and thus, by a stronger potential for release of As to groundwater due to the reductive dissolution mechanism; • 21 µg/L in the eastern and western sectors, characterized by a lower content of buried OM.
More general conclusions of this study are: • Site-specific datasets can represent a cost-effective source of data useful for the derivation of NBLs, when regional monitoring networks fail to catch local-scale variability; however, the main disadvantage of using an assemblage of site-specific datasets is limited data quality, due to the likely application of different sampling and analytical methodologies; • The lack of complete information on major ions and specific pollutants/contaminants for the sites, which prevents the application of conventional methodologies (e.g., pre-selection), can be overcome through a critical analysis of outliers that allows identification of possible anthropogenic influences; the analysis of outliers, however, must be supported by a robust conceptual model of each site, which must contain a description of the site (a) geology, (b) hydrogeology (type and depth of the aquifer involved, groundwater flow direction), and (c) contamination/pollution (chemical species and compounds involved, location of monitoring points with respect to the contamination/pollution sources); • The design of regional monitoring networks of groundwater quality must consider local-scale geological heterogeneities that can generate local and high natural concentrations of particular chemical species, such as arsenic, in order to avoid the calculation of unreliably low NBL values that might lead to erroneous evaluation of potentially contaminated sites.
The approach presented in this study is based on a local case study, but it can be reproduced in other areas worldwide where the abundance of different site-specific datasets can counterbalance the limitation of regional monitoring networks in detecting local-scale heterogeneity.