Next Article in Journal
Spatio-Temporal Behavior of Land Surface Temperatures (LSTs) in Central Chile, Using Terra MODIS Images
Previous Article in Journal
Influence of Slope Aspect and Vegetation on the Soil Moisture Response to Snowmelt in the German Alps
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Geostatistical Analysis of Groundwater Data in a Mining Area in Greece

School of Mineral Resources Engineering, Technical University of Crete, 73100 Chania, Crete, Greece
*
Author to whom correspondence should be addressed.
Hydrology 2024, 11(7), 102; https://doi.org/10.3390/hydrology11070102
Submission received: 14 April 2024 / Revised: 17 June 2024 / Accepted: 20 June 2024 / Published: 11 July 2024

Abstract

:
Geostatistical prediction methods are increasingly used in earth sciences and engineering to improve upon our knowledge of attributes in space and time. During mining activities, it is very important to have an estimate of any contamination of the soil and groundwater in the area for environmental reasons and to guide the reclamation once mining operations are finished. In this paper, we present the geostatistical analysis of the water content in certain pollutants (Cd and Mn) in a group of mines in Northern Greece. The monitoring points that were studied are 62. The aim of this work is to create a contamination prediction map that better represents the values of Cd and Mn, which is challenging based on the small sample size. The correlation between Cd and Mn concentration in the groundwater is investigated during the preliminary analysis of the data. The logarithm of the data values was used, and after removing a linear trend, the variogram parameters were estimated. In order to create the necessary maps of contamination, we employed the method of ordinary Kriging (OK) and inversed the transformations using bias correction to adjust the results for the inverse transform. Cross-validation shows promising results ( ρ = 65 % for Cd and ρ = 52 % for Mn, RMSE = 25.9 ppb for Cd and RMSE = 25.1 ppm for Mn). As part of this work, the Spartan Variogram model was compared with the other models and was found to perform better for the data of Mn.

1. Introduction

Underground water plays a key role in the hydrological cycle, and it is a very valuable resource for rivers and wetlands, especially during times of drought [1]. As a result, the quality of the water must be taken into consideration. Groundwater contamination is the result of polluted water infiltrating through the soil and secondary porosity (fractures) in the rock and eventually reaching the groundwater [2]. Because the water moves slowly through the soil, the impact on human activity is relatively long-term. Thus, pollution that occurred decades ago may still be dangerous presently for groundwater quality [1]. The contamination of groundwater is a serious hazard because the pollutants may find their way into local agriculture products, either from the plants or animals, thus monitoring the concentration is of paramount importance. High concentrations of Cd can contribute to kidney damage or bone fragility and increase the chance of cancer. Exposure to high concentrations of Mn can contribute to neurological disorders and cognitive impairments in both humans and animals.
Mining has an impact on the quality of groundwater. The extent of the impact depends on a number of factors, such as the formation of the rock being mined, the type of soil, the technology used in mining, and the environmental commitment of each company [3]. The mining process has expanded significantly in the past decades. Therefore, mining waste has multiplied, especially in opencast mines where excavations include large amounts of waste materials. Heavy metal pollution of water is caused when elements such as Cadmium, Lead, Arsenic, etc., come into contact with water. Groundwater contamination in heavy metals can affect the inhabited areas around the mine, not just the workers [4].
Years of experience show that the measures employed are occasionally not adequate to remove all contaminants completely and that pollution sources, even if partially removed, continue to release pollutants. It is, therefore, preferable to prevent or reduce the risk of pollution rather than to deal with the consequences, and it is well-advised to have reliable estimations of the pollution in the entire area. Thus, the use of geostatistics in such environmental studies is both useful and important in order to guide the techniques used to reduce the pollution level [5].
These reasons motivated our study of the potential contamination of groundwater in a mining area in Greece by the heavy metals Cd and Mn.
Similar works were performed by [6], who used ordinary Kriging to analyze the spatial variability of pollutants in groundwater and [7], who used ordinary Kriging (OK) to support groundwater quality management. Another application of OK by [8] developed quality maps of pH, P, K, Ca and Mg in five Kentucky counties. Furthermore, Shi et al., 2007 [9] analyzed 665 soil samples of Hg, Pd, Cu, As, Cr, and Cd. They applied OK and lognormal Kriging in order to create a map of their spatial patterns. Li et al., 2017 [10] studied the spatial distribution and transport characteristics of heavy metals around a mine area in central China using OK.
Several recent studies on water quality monitoring were published recently, employing geostatistics. Barkat et al., 2021 [11] assess the quality of groundwater both for drinking and irrigation purposes, using OK for the water quality index (WQI). Konaté et al., 2022 [12] applied statistical and geostatistical methods to assess the problem of the quality of groundwater in the gold zone of Baraka, Republic of Guinea. Also, in 2022, Ngounouno Ayiwouo et al., 2022 [13] used OK to trace metals (Fe, Mn, Cr, Ni, Cu, Zn and Pb) in surface sediments along the Lom River in the gold mining area of Gankombol. Last but not least, in 2020, Jianfei et al., 2020 [14] searched for potential toxic elements in soils using absolute principal component score/multiple linear regression (APCS/MLR), positive matrix factorization (PMF) and geostatistics in a mining industry in Eastern China. The negative impact of mines in a river system in Australia is discussed by [15], especially regarding the mobility of heavy elements. Zhang et al., 2023 [16] used OK to investigate the distribution of heavy metal contaminants in a mining area in China. Zhu et al., 2019 [17] used OK to estimate the distribution of pollutants in a mining area in southeastern China. Both works concluded that the distribution of pollutants was influenced by both natural factors (such as topography and soil properties) and anthropogenic factors (such as mining activities).
Compared to these works, the present paper focuses specifically on the estimation of water content in certain pollutants (Cd and Mn) and provides comparable and improved cross-validation metrics.
In the case of the mines of Northern Greece, the dataset and analysis present certain challenges that need to be overcome. The sample size is relatively small and the data locations are mainly clustered around the three mines Another issue is the presence of two dumping grounds for the tailings, areas where the concentration of pollutants is artificially increased in contrast with the underlying statistical stationarity assumption.
The analysis that follows is divided into four main stages: statistical analysis, trend investigation and removal, variography and ordinary Kriging. The flowchart of the procedure can be seen in Figure 1. Discussion of the results and conclusions follows the geostatistical analysis.

2. Notation and General Principles

In the following analysis of the mine area dataset, we will make use of the following terms and principles. Random variable X is a function whose domain of definition is a sampling frame Ω . The range of values is a set of real numbers x. The capital letter X will be used for the random variable while the lower case letter x denotes a specific value of variable X. The probability must be calculable for each value x [18].
In geostatistics, the term of the random field is commonly used. A random field is a random function, the value of which in every point in space is a different random variable. For simplicity, we can consider a random field as the collection of random variables in space.
In geostatistics a random field is considered to be composed of a deterministic part called the trend M x ( s ) and a stochastic part called Fluctuation X ( s ) . The covariance function c x ( s 1 , s 2 ) shows how much the fluctuation of the point s 1 depends on the point s 2 . The function c x ( s 1 , s 2 ) of a random field X(s) is defined in Equation (1)
c x ( s 1 , s 2 ) = E [ X ( s 1 ) · X ( s 2 ) ] ,
where E [ X ] symbolizes the mathematical expectation, as in
E [ X ] = x f ( x ) d x .
In statistical homogeneous and isotropic fields, the key parameters of the covariance function are the variance σ x 2 ( s ) , the nugget effect c 0 = c x ( 0 ) and correlation length (h). Variance is the measure of the variation in the amplitude of the field. The correlation range h normalizes the distance; r / h and defines the interval within which interdependence exists. The nugget effect ( c 0 ) in the covariance model is a constant value for all distances greater than zero. It is affected by the volume of sampling, decreasing in value as the volume increases. It is related to the intrinsic randomness of the regionalized variable [19].
The Mean Squared Error (MSE) is a measure of how close an estimation is to the data. It is the square of the error of the estimation ( ϵ = X ^ X , where X ^ is the estimation). The greater the MSE, the worse the fit of the model. Another validation measure calculated is the Root Mean Squared Error (RMSE), which is the square root of the Mean Square Error. More than one validation measure is needed for the assessment of the quality and functionality of a geostatistical estimation [18,20].

2.1. Trend

The average values of the data are often related to the position in which they are located (spatially dependent) making the random field X non-stationary. When the dependence of those random variables is fully known it can be described by deterministic functions [19,21]. Such a deterministic function describes the trend.
A random field X is considered to be composed of the deterministic trend and the fluctuation as shown in
X ( s ) = M x ( s ) + X ( s ) ,
where X ( s 1 ) = X ( s 1 ) M x ( s 1 ) is the fluctuation of the random field around the s 1 point [18,22].
Significant trends in the data should be removed in order to preserve stationarity [23]. A trend can often be described as a linear or polynomial equation. For example, the first-order (linear) trend can be described by:
M x ( s ) = a 0 + a 1 s p + a 2 s q ,
where ( s p , s q ) are the coordinates of a point s .

2.2. Variogram

The variogram of a random field is a measure of the spatial dependence between pairs of points. It is defined according to the following function:
γ x ( s , r ) = 1 2 E [ { X ( s + r ) X ( s ) } 2 ] ,
where r is the distance vector and E [ X ( s ) ] is the mathematical expectation (the expected value) of the random field X at the location s [24].
In static homogeneous fields, the covariance and variogram functions are closely related [25,26]. Either one can be used for the calculation of spatial dependence as shown in
γ x ( r ) = σ x 2 c x ( r ) .
The empirical variogram equation of (5) cannot calculate the variogram in any distance in space. It is thus required to fit a theoretical variogram model in order to have an approximation of the variogram value for all distances [20].
Different variogram models have been tested in this research [19]. One of the prevalent models selected after this analysis was the spherical variogram model presented in Equation (7). This is a widely used model in spatial analysis that captures spatial correlation up to the correlation range h, beyond which correlation becomes negligible.
γ x ( r ) = σ x 2 1.5 r h 0.5 r h 3 , r h σ x 2 r h
The other model employed after the selection process was the Spartan Spatial Random Field (SSRF) model, detailed in Section Spartan Spatial Random Field.

Spartan Spatial Random Field

A novel covariance model, the Spartan Spatial Random Field (SSRF) was tested for the data in this analysis. The SSRF model is given by [27]
C ( r ) = η 0 2 π Δ e | r | ξ β 2 sin ( | r | ξ β 1 ) | r | ξ | η 1 | < 2 , η 0 8 π e | r | ξ η 1 = 2 , η 0 4 π Δ e | r | ξ ω 1 e | r | ξ ω 2 1 ( ω 1 ω 2 ) | r | ξ η 1 > 2 .
In Equation (8), η 0 is the scale factor that determines the magnitude of the fluctuations, η 1 is called the rigidity coefficient, and ξ is a normalization parameter that determines the range of spatial correlations. The remaining coefficients are given by β 1 , 2 = | 2 η 1 | 1 / 2 , ω 1 , 2 = η 1 Δ 2 1 / 2 , and Δ = | η 1 2 4 | 1 / 2 . The range h of the Spartan model is determined by both η 1 and ξ . The SSRF model was shown to perform competitively in interpolation studies that involved coal mine reserves [21] and groundwater level [28] data. Furthermore, the SSRF model, thanks to its three parameters ( η 0 , η 1 , ξ ), can better adapt to a smaller amount of data [27].

2.3. Kriging

Random field estimation attempts to predict the value of the field in locations for which no measurements have been made. The most widespread methods of linear interpolation belong to the family of Kriging [26,29]. Kriging methods are best linear unbiased estimators when the covariance (or variogram) of the field is known [26].
In all Kriging methods, the predicted value of the study point is obtained by a weighted linear combination of the values of neighboring points. The weights are determined by minimizing the variance of error ϵ (see Section 2) using the covariance or variogram matrix [20,22].
In Kriging, a neighborhood ω ( u ) is often employed. Neighborhood ω ( u ) is around the prediction point u , which includes n ( u ) N points from s i ( i = 1 , , N ) . The size of the neighborhood is determined by the range h of the variogram as data outside ω ( u ) are uncorrelated with the value of X ( u ) .
Kriging methods are the best linear unbiased estimators (BLUE) [23]. Along with the lowest variance for the prediction error, Kriging methods provide the uncertainty of the estimation for each predicted location. For this study, ordinary Kriging (OK) is used. OK considers the trend as an unknown value that is constant within the neighborhood. In our case, the second-order trend has been removed but we opted to use OK instead of simple Kriging on the fluctuations in order to stabilize the residuals [30].
The prediction of the field at the study point u is expressed by the linear combination shown in Equation (9).
X ^ ( u ) = α = 1 n ( u ) λ α X ( s α )
where λ α refers to the Kriging linear weights i = 1 n ( u ) λ i = 1 . The weights are calculated with the help of the augmented covariance or variogram matrix C X from Equation (10)
C X ( s ) Λ = C X ( u ) ,
with Λ = { λ 1 , , λ n ( u ) , μ } T , μ being the lagrance coefficient and C X ( s ) , C X ( u ) the augmented variogram matrices shown in Equations (11) and (12) below:
C X ( s ) = γ X ( 0 ) γ X ( s 1 s n ) 1 γ X ( s 2 s 1 ) γ X ( s 2 s n ) 1 γ X ( s n s 1 ) γ X ( 0 ) 1 1 1 0
C X ( u ) = γ X ( s 1 u ) γ X ( s 2 u ) γ X ( s n u ) 1
One of the benefits of the Kriging method compared to deterministic methods is that the variance of the prediction can be calculated. If the data are close to the normal distribution, the Kriging variance σ OK 2 corresponds to a normal distribution of values around the predicted value. Thus, for Gaussian data, Kriging methods do not provide just the estimated value but the entire distribution allowing for confidence intervals of the prediction [18,19,31]. Thus, OK can provide a measure of the uncertainty of the estimation. For OK, the variance of the prediction is given by
σ OK 2 ( u ) = i = 1 n ( u ) λ i γ X ( u , s i ) μ .
The logarithm of the concentrations of the elements in the investigated dataset is not sufficiently close to the normal distribution as shown in Table 1. Thus, the Kriging variance σ OK 2 does not correspond to a normal distribution around the predicted value X ( u ) .

3. Study Area and Prelimenary Analysis

The study area contains three mines, with the locations shown in Figure 2. The coordinates of the axes (easting and northing) have been adjusted so that the value of 0 on each axis corresponds to the minimum observed coordinate value. i.e., the easting of the westernmost sampling location equals X = 0 and the northing of the southmost sampling location equals Y = 0 . Distances are in meters. One of the mines is currently inactive and the company is trying to reduce the environmental footprint of the mines in the region [32].
The area is geologically composed of various metamorphic and igneous rocks; biotite gneisses with intercalations of marble horizons are located in the area, and granodiorites and acid to intermediate sub-volcanic rock intrusions into the gneisses cover small areas. The study area is characterized hydrogeologically as semi-permeable, and its vertical profile comprises three hydrostratigraphic units. Secondary porosity (fractures) is responsible for hydrological connectivity. In certain areas, the rocks exhibit strong fragmentation which increases with depth, thus creating pathways for slow groundwater flow. Government and company studies in the area have shown that the water reservoirs are connected [32].
Our study was carried out by using geostatistical methods to determine the spatial variability of the quality of underground water in the area of these three mines.
The data are comprised of concentrations of Cd and Mn pollutants in the water samples at the 62 locations shown in Figure 2. As the data did not follow the normal distribution, the logarithmic transform was applied in order for the distribution of the data values to be closer to the normal distribution. When using logarithms of the data or other non-linear transformations, care must be taken to investigate whether the transformation leads to biased estimations, and if necessary use bias correction methods similar to those presented in [19]. The value of Kriging prediction is not affected by the distribution of the data. However, Kriging variance depends on the distribution, making it unreliable in skewed data sets. Furthermore, the potential for error is expected to be greater at a prediction point surrounded by data with very different values from a point surrounded by data with a smoother variation in local values [31,33]. Thus, a distribution closer to the normal distribution is desired.

3.1. Data Statistics

Table 1 shows the basic statistics of the Cd and Mn elements after logarithmic transformations. The small number of the dataset did not allow for any attempts at declustering.
The histograms representing the data of each element and the correlation between them are presented in Figure 3. The correlation between the values of Cd and Mn is 54% in the entire area. Thus, there is a medium to strong correlation between the values of Cd and Mn in space. After the logarithmic transformation, the data are still skewed. However, as shown in Table 1 and the histograms of Figure 3, the logarithm values have a range of 3.07 to 5.57 for Cd and 4.67 to 5.67 for Mn. Compared to the unaltered range of 0.05 ppb to 261.65 ppb for Cd and 1.60 ppm to 289.46 ppm for Mn, Kriging smoothing is expected to affect the predictions less dramatically, producing lower errors [33].

3.2. Trend Analysis

As mentioned in Section 2.1, existing trends within the data can lead to spatial dependencies in the data and thus non-stationarity. Preliminary investigation with a linear trend model (first-order polynomial trend) has shown that a low-order polynomial was not sufficient for adequate removal of the trend. Thus, a second-order polynomial was removed as shown in Equation (14).
M x = a 0 + a 1 s p + a 2 s q + a 3 s p 2 + a 4 s q 2 + a 5 s p s q
where the polynomial coefficients are shown in Table 2. s p , s q are the coordinates for s .
We decided to estimate the trend and its coefficients through multi-linear regression [33]. The correlation coefficient with second-order trend for the element Cd is equal to 45.70%, while in Mn is equal to 63.76%. As a rule of thumb, a trend with a correlation over 20% should be removed from the data or it will cause issues in the model estimations of the stochastic part of the random field (the fluctuation X ) [21,24].

3.3. Variogram Models for Cd, Mn

In order to select the variogram model for each investigated element, the fluctuations in the data X ( s ) have been used to estimate the empirical variogram at 10 lags (Cd) and 11 lags (Mn). Different variogram models have been tested for this purpose. We used least squares error to fit the various theoretical models (exponential, Gaussian, spherical, SSRF) to the empirical variogram and extract the model parameters [24].
The spherical variogram model of Equation (7) was found to provide the best fit by returning the lowest least squares error for the element of Cd. The spherical variogram is fit to the experimental variogram as shown in Figure 4a,b. The range h is the maximum distance of spatial dependence and σ 2 is the variance.
For Cd (Figure 4a), the range h is just 731 m, which means points are correlated in a distance of 731 m. As such, reliable measurements of Cd are possible only at a short distance from the boreholes. The variance for Cd is σ x 2 = 6.67 .
For the Mn element, shown in Figure 4c, the results are more promising since the distance of the dependence is larger at h = 2984 m. The other parameters of the Spartan Variogram model are: η 0 = 583.1 , η 1 = 1.92 . The Spherical Variogram is presented in Figure 4b for comparison. The parameters of the Spherical variogram are h = 1424 m, σ x 2 = 8.30 .

4. Spatial Estimation Results

This section summarizes the results of the spatial estimation using OK for the two elements. First, OK was employed to predict the values of the logarithms and then inverse transform was performed to back-transform the prediction maps to ppm (Mn) or ppb (Cd) [22,25].

4.1. Kriging of the Logarithm

The logarithm of the data was decomposed in the deterministic trend M X ( s ) and the stochastic fluctuation X ( s ) . OK was performed on the fluctuation of the logarithms of the concentration to obtain predictions X ^ ( u ) . The trend for the prediction points M X ( u ) was calculated based on the trend coefficients of Table 2. The trend was added to the prediction of the fluctuation. The sum of the fluctuation and the trend gives the prediction of the logarithm of the concentration for the two elements as X ^ ( u ) = M X ( u ) + X ^ ( u ) [18,25]. Figure 5a,b shows the predicted values and Figure 5c, which are unitless variables as they are the logarithm of the concentration; Figure 5d shows the standard deviation σ OK of the predictions.
Because the range h of the variogram for Mn is higher ( h = 1424 m for Mn instead of h = 731 m for Cd), OK neighborhoods are larger for Mn which can be noticed in the maps of Figure 5a,b. In order for the value X ( u ) of any grid point to be estimated reliably, at least three sample points must be included in the neighborhood. The values at grid locations without enough neighbors are not estimated.
From the logarithmic Kriging map of Figure 5a,b, it can be seen that very high content of both Cd and Mn is found in a few areas, which rapidly increase in value (hotspots). Those areas for Cd are in the center of mine C and in the northwest and eastern sectors of mine B. The hotspots for Mn are in the eastern parts of mine C and in the northeastern parts of mine B. furthermore, Cd presents slightly higher values than Mn. The presence of such hotspots in the logarithm indicates that values of random field X are not independent of the coordinates despite removing the trend. Such hotspots do not denote long-term dependencies, as the trend removal has already removed them. The presence of these hotspots undermines the stationary hypothesis. Thus, the quality of the prediction close to such hotspots is subject to the influence of these hotspots. Kriging Error Standard Deviation ( σ OK ) represents the uncertainty of the estimate. There seems to be high uncertainty of the estimations further away from data locations, which is expected given the range of the variogram and the sparsity of the data.

4.2. Inverse Transformation

After predicting the fluctuations with OK and adding back the trend, the logarithm transformation was inverted. The first step of the inversion is
X ^ ( u ) = exp ( X ^ * ( u ) ) ,
where X ^ * ( u ) is the estimation of the logarithm of the concentration in the prediction point u . In the case of the presence of bias, a bias correction factor B can be used in Equation (15) as
X ^ ( u ) = B exp ( X ^ * ( u ) ) .
In the case of the log-transform, the Bias correction B is different for each prediction point. B ( u ) is given by the following equation [19,34]:
B ( u ) = e 1 2 σ O K 2 ( u ) .
The results of the inverse transformation for both elements can be seen in Figure 6a,b.
The Kriging maps for both Cd and Mn show mostly low concentrations, except for a couple of small areas and the areas at the edge of mine B and C. For mine C, there is one area with a high concentration of Cd. For mine B, there are two areas with high concentrations of Cd and three for Mn. These areas are the dumping grounds where the tailings from mining activities are deposited according to the mining company operating in the area. The high values at the eastern end of mine C and mine B are neighborhood artifacts, influenced by the trend and a few high values in their neighborhood.

4.3. Cross-Validation

To assess the estimation, leave-one-out cross-validation (LOOCV) was performed for both Cd and Mn [25]. LOOCV serially removes one point from the data and uses the variogram parameters to predict its value with OK. The predicted values for each point are then compared with the known field value at this point. The histogram of the LOOCV errors (after the inverse transform) can be seen in Figure 7 and Figure 8 for both elements. Most errors have low values, but for both elements, there is an outlier—the dumping ground for the tailings of the exploitation. Histograms in Figure 7 and Figure 8 show the absence of bias from the estimations.
Table 3 summarizes the validation measures for the LOOCV of both elements. For Mn, Table 3 includes the validation measures for the Spherical and SSRF variogram. For both elements, the value of Root Mean Square Error (RMSE) is significantly higher than the Mean Absolute Error (MAE). The divergence between RMSE and MAE is caused by the couple of outliers in each element. The value of Maximum Absolute Error (MaxAE) shows the highest absolute value (negative or positive). Mn shows ρ = 0.519 ( ρ = 0.467 for Spherical model) which is modest correlation. Cd performs better with ρ = 0.647 , which is a strong correlation.
According to the LOOCV results, the most accurate results are obtained on Cd, although estimation for the concentration of both elements was satisfactory with the exception of the outliers. These outliers are dumping grounds for the tailings. Thus, the field is not statistically homogeneous even after the removal of a second-order trend. As such, when LOOCV removes an outlier, the value of their location, estimated by much lower values in the vicinity, would be significantly lower. The outliers can be seen in Figure 6a,b.

5. Discussion

There are several works in the literature that have used geostatistical prediction methods to estimate contamination in mining sites. We investigated the correlation between Cd and Mn and used the logarithm of the data values to estimate the variogram parameters and compared widely used variogram models (spherical, exponential, Gaussian) with the SSRF variogram model. We employed OK to create contamination prediction maps that better represent the values of Cd and Mn. The cross-validation outcomes were encouraging, demonstrating that the prediction maps are representative.
The choice to remove a second-order polynomial trend was reached after preliminary analysis showed there was a significant trend in the data. After removing the first-order polynomial trend, variogram analysis showed that a significant trend remained in the data. The variogram would not reach a sill, which, according to [19,33], represents the non-stationarity introduced by trend. Thus, a second-order trend was removed. The trend’s coefficient was estimated using multilinear regression [22].
The cross-validation results of Table 3 show promising results with a correlation of 0.65 for Cd and 0.52 for Mn, and root mean squared error (RMSE) of 25.9 ppb for Cd and 25.1 ppm for Mn. These results indicate that the prediction maps based on the best-fit model parameters are reasonably accurate. It should be noted that without the Bias correction, the errors were more significant for both elements. For the element of Cd, without bias correction, MaxAE = 194 ppb, mean error = −5 ppb, and ρ = 62.2 % . Mn displayed less bias (ME = 1.7 ppm) but the improvement was still noticeable. Without bias correction, the validation measures for Mn were MaxAE = −182 ppm, RMSE = 32.6 ppm and ρ = 51.0 % . This noticeable improvement for both elements showcases the benefits of the bias correction of Equation (17).
The exception for both elements, locations where cross-validation did not perform well are locations near the dumping grounds of the tailings from the mining activities. LOOCV predicts the value of location based on surrounding values. Thus, significantly high concentrations that are non-stationary, have this effect, as it is impossible for Kriging methods to predict a value of 400 ppm for Mn or 200 ppb for Cd based on surrounding values of 2 to 40 ppm, resulting in high errors in such locations. However, the prediction map incorporates all data, including the outliers; thus, predictions around the location of the tailings are accurate. The quick drop in the concentration of the pollutant in drill-holes away from the tailing areas shows the extend of the influence of the dumping grounds in space. Thus, such methods can be useful for monitoring and managing contamination during mining activities.
The SSRF variogram detailed in Section Spartan Spatial Random Field was tested for both Cd and Mn. While the Spherical was found to give a better fit and better LOOCV results for Cd, the SSRF variogram showed significant improvement for the Mn predictions as shown in Table 3. It is also worth mentioning that while the spherical variogram predictions for both Cd and Mn returned right-skewed errors, with the highest error being an underestimation of the concentration in the dumping grounds, the SSRF for Mn (see Figure 8 is more symmetrical. The highest error was again an underestimation of the value at the dumping grounds.
Another issue encountered in this analysis was the small amount of available data. It has been shown in studies [27] that the SSRF variogram can better adapt to a smaller amount of data. Despite the small sample size, which includes outliers (the dumping grounds), SSRF was able to give a fair prediction for Mn concentration, showing that it can capture the complex spatial dependencies well despite the small sample size.

6. Conclusions

In this study, spatial analysis with OK is used to present the contamination of groundwater in two heavy elements, Cd and Mn, for an area of mining activity. High concentrations of Cd and Mn in the water increase the risk of environmental damage and impact the health of humans and animals in the area. The systematic monitoring that the local mining company performs, helps to observe any contamination and take appropriate measures if necessary. The spatial distribution of the pollutants can assist companies in focusing their attention on areas that have high concentrations. The range of the variogram can help to define the extent of the spread of the contamination. Namely, one of the locations used as a dumping ground for tailings has been announced as the place for a restoration project by local authorities to protect the environment and the surrounding area.
Overall, the use of geostatistical prediction methods in mining and environmental management is a well-established area of research. The present paper contributes to this area by presenting a case study of the application of geostatistical analysis to estimate water content in certain pollutants in a group of mines in Northern Greece.
The data values were transformed (logarithm) to better approach the normal distribution. The data are scattered and divided among three mines in the area. A second-order trend has been removed. However, issues with field stationarity persist because of the presence of dumping grounds with higher concentrations. According to LOOCV results (see Table 3), the most accurate results are obtained on Cd although estimation for the concentration of both elements was satisfactory with the exception of dumping grounds, which are considered outliers.
The results presented in the paper demonstrate the successful use of geostatistical prediction methods to estimate the water content of certain pollutants. The cross-validation results show promising results with a correlation of 0.65 for Cd and 0.52 for Mn, and root mean squared error (RMSE) of 25.9 ppb for Cd and 25.1 ppm for Mn. These results indicate that the prediction maps, based on the best-fit model parameters, are reasonably accurate with the exception of a few artifacts at the eastern edge of the mines. Thus, such methods can be useful for monitoring and managing contamination during mining activities. From the various variogram models tested, the SSRF model was shown to significantly improve the predictions for Mn compared to the spherical model.
The authors used the original Matlab code to apply the method of OK to create contamination prediction maps that better represent the values of Cd and Mn, which are important contaminants that need to be monitored during mining activities.
As a direction for future studies, the uncertainty of the predictions could be further investigated. As mentioned in Section 4.2 the logarithmic transformation presents challenges. By means of kernel-based Gaussian anamorphosis [31,35], the probability distribution of the sample can be transformed to the normal distribution. Thus, the prediction uncertainty could be assessed using conditional simulations [21,25,31]. To further explore the uncertainty introduced by the estimation of the trend, universal Kriging could be applied to the data [18,22].

Author Contributions

Conceptualization, E.A.V.; methodology, E.A.V. and A.P.; software, E.A.V., A.P. and E.D.; validation, E.D., E.A.V. and A.P.; formal analysis, E.A.V., A.P., E.D. and E.S.; investigation, E.A.V., A.P. and E.D.; data curation, E.A.V., A.P. and E.D.; writing—original draft preparation, E.A.V., A.P., E.D. and E.S.; writing—review and editing, E.A.V., A.P., E.D. and E.S.; visualization, E.A.V., A.P. and E.D.; supervision, E.A.V.; project administration, E.A.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Junk, W.J.; An, S.; Finlayson, C.; Gopal, B.; Květ, J.; Mitchell, S.A.; Mitsch, W.J.; Robarts, R.D. Current state of knowledge regarding the world’s wetlands and their future under global climate change: A synthesis. Aquat. Sci. 2013, 75, 151–167. [Google Scholar] [CrossRef]
  2. Geetha, A.; Palanisamy, P.; Sivakumar, P.; Kumar, P.G.; Sujatha, M. Assessment of underground water contamination and effect of textile effluents on Noyyal River basin in and around Tiruppur Town, Tamilnadu. J. Chem. 2008, 5, 696–705. [Google Scholar] [CrossRef]
  3. Spitz, K.; Trudinger, J. Mining and the Environment: From Ore to Metal; CRC Press: Boca Raton, FL, USA, 2019. [Google Scholar]
  4. Council, N.R. Hardrock Mining on Federal Lands; The National Academies Press: Washington, DC, USA, 1999. [Google Scholar] [CrossRef]
  5. Gounot, A.M. Microbial oxidation and reduction of manganese: Consequences in groundwater and applications. FEMS Microbiol. Rev. 1994, 14, 339–349. [Google Scholar] [CrossRef]
  6. Arslan, H. Spatial and temporal mapping of groundwater salinity using ordinary Kriging and indicator Kriging: The case of Bafra Plain, Turkey. Agric. Water Manag. 2012, 113, 57–63. [Google Scholar] [CrossRef]
  7. Kostopoulou, E. Applicability of ordinary Kriging modeling techniques for filling satellite data gaps in support of coastal management. Model. Earth Syst. Environ. 2021, 7, 1145–1158. [Google Scholar] [CrossRef]
  8. Mueller, T.; Pusuluri, N.; Mathias, K.; Cornelius, P.; Barnhisel, R.; Shearer, S. Map quality for ordinary Kriging and inverse distance weighted interpolation. Soil Sci. Soc. Am. J. 2004, 68, 2042–2047. [Google Scholar] [CrossRef]
  9. Shi, J.; Wang, H.; Xu, J.; Wu, J.; Liu, X.; Zhu, H.; Yu, C. Spatial distribution of heavy metals in soils: A case study of Changxing, China. Environ. Geol. 2007, 52, 1–10. [Google Scholar] [CrossRef]
  10. Li, X.; Yang, H.; Zhang, C.; Zeng, G.; Liu, Y.; Xu, W.; Wu, Y.; Lan, S. Spatial distribution and transport characteristics of heavy metals around an antimony mine area in central China. Chemosphere 2017, 170, 17–24. [Google Scholar] [CrossRef] [PubMed]
  11. Barkat, A.; Bouaicha, F.; Bouteraa, O.; Mester, T.; Ata, B.; Balla, D.; Rahal, Z.; Szabó, G. Assessment of complex terminal groundwater aquifer for different use of Oued Souf Valley (Algeria) using multivariate statistical methods, geostatistical modeling, and water quality index. Water 2021, 13, 1609. [Google Scholar] [CrossRef]
  12. Konaté, A.A.; Konaté, S.F.; Zaheer, M.; Kwame, A.G.; Oularé, F. Geostatistical analysis of hydrogeological data of the Baraka gold zone, prefecture of Siguiri, Republic of Guinea. Environ. Earth Sci. 2022, 81, 232. [Google Scholar] [CrossRef]
  13. Ngounouno Ayiwouo, M.; Ngueyep Mambou, L.L.; Boroh, W.A.; Kingni, S.T.; Ngounouno, I. Spatial variability of trace metals in sediments along the Lom River in the gold mining area of Gankombol (Adamawa Cameroon) using geostatistical modeling methods. Model. Earth Syst. Environ. 2022, 9, 313–329. [Google Scholar] [CrossRef]
  14. Jianfei, C.; Chunfang, L.; Lixia, Z.; Quanyuan, W.; Jianshu, L. Source apportionment of potentially toxic elements in soils using APCS/MLR, PMF and geostatistics in a typical industrial and mining city in Eastern China. PLoS ONE 2020, 15, e0238513. [Google Scholar] [CrossRef]
  15. Johnston, S.G.; Bennett, W.W.; Doriean, N.; Hockmann, K.; Karimian, N.; Burton, E.D. Antimony and arsenic speciation, redox-cycling and contrasting mobility in a mining-impacted river system. Sci. Total Environ. 2020, 710, 136354. [Google Scholar] [CrossRef]
  16. Zhang, Y.; Song, B.; Zhou, Z. Pollution assessment and source apportionment of heavy metals in soil from lead-zinc mining areas of south China. J. Environ. Chem. Eng. 2023, 11, 109320. [Google Scholar] [CrossRef]
  17. Zhu, X.; Cao, L.; Liang, Y. Spatial distribution and risk assessment of heavy metals inside and outside a typical lead-zinc mine in southeastern China. Environ. Sci. Pollut. Res. 2019, 26, 26265–26275. [Google Scholar] [CrossRef]
  18. Wackernagel, H. Multivariate Geostatistics: An Introduction with Applications; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2003. [Google Scholar]
  19. Hristopulos, D.T. Random Fields for Spatial Data Modeling: A Primer for Scientists and Engineers; Springer Nature: Dordrecht, The Netherlands, 2020. [Google Scholar] [CrossRef]
  20. Olea, R.A. Geostatistics for Engineers and Earth Scientists; Springer Science & Business Media: New York, NY, USA, 2012. [Google Scholar]
  21. Pavlides, A.; Hristopulos, D.; Roumpos, C.; Agioutantis, Z. Spatial modeling of lignite energy reserves for exploitation planning and quality control. Energy 2015, 93, 1906–1917. [Google Scholar] [CrossRef]
  22. Christakos, G. Random Field Models in Earth Sciences; Academic Press: San Diego, CA, USA, 1992. [Google Scholar]
  23. Cressie, N. Spatial Statistics; John Wiley and Sons: New York, NY, USA, 1993. [Google Scholar]
  24. Olea, R. A six-step practical approach to semivariogram modeling. Stoch. Environ. Res. Risk Assess. 2006, 20, 307–318. [Google Scholar] [CrossRef]
  25. Chilès, J.P.; Delfiner, P. Geostatistics: Modeling Spatial Uncertainty; Wiley: Hoboken, NJ, USA, 2012. [Google Scholar]
  26. Cressie, N. The origins of Kriging. Math. Geol. 1990, 22, 239–252. [Google Scholar] [CrossRef]
  27. Hristopulos, D.; Elogne, S. Computationally Efficient Spatial Interpolators Based on Spartan Spatial Random Fields. IEEE Trans. Signal Process. 2009, 57, 3475–3487. [Google Scholar] [CrossRef]
  28. Varouchakis, E.A.; Hristopulos, D.T. Improvement of groundwater level prediction in sparsely gauged basins using physical laws and local geographic features as auxiliary variables. Adv. Water Resour. 2013, 52, 34–49. [Google Scholar] [CrossRef]
  29. Krige, D.G. A statistical approach to some basic mine valuation problems on the Witwatersrand. J. Chem. Metall. Miner. Soc. S. Afr. 1951, 52, 119–139. [Google Scholar]
  30. Hristopulos, D.T.; Pavlides, A.; Agou, V.D.; Gkafa, P. Stochastic local interaction model: An alternative to Kriging for massive datasets. Math. Geosci. 2021, 53, 1907–1949. [Google Scholar] [CrossRef]
  31. Pavlides, A.; Agou, V.D.; Hristopulos, D.T. Non-parametric kernel-based estimation and simulation of precipitation amount. J. Hydrol. 2022, 612, 127988. [Google Scholar] [CrossRef]
  32. ENVECO. Environmental Impact Assessment Report for Mining and Metallurgy Operations, Appendix 2, Map 9-2; ENVECO: Athens, Greece, 2010. (In Greek) [Google Scholar]
  33. Goovaerts, P. Geostatistics for Natural Resources Evaluation; Oxford University Press: New York, NY, USA, 1997. [Google Scholar]
  34. Schabenberger, O.; Gotway, C.A. Statistical Methods for Spatial Data Analysis; Chapman and Hall/CRC: Boca Raton, FL, USA, 2017. [Google Scholar]
  35. Agou, V.D.; Pavlides, A.; Hristopulos, D.T. Spatial Modeling of Precipitation Based on Data-Driven Warping of Gaussian Processes. Entropy 2022, 24, 321. [Google Scholar] [CrossRef]
Figure 1. Flowchart of the procedure.
Figure 1. Flowchart of the procedure.
Hydrology 11 00102 g001
Figure 2. Map of drill holes in three mines. Mine A (blue O), mine B (back X) and mine C (red +). Coordinates are in meters.
Figure 2. Map of drill holes in three mines. Mine A (blue O), mine B (back X) and mine C (red +). Coordinates are in meters.
Hydrology 11 00102 g002
Figure 3. Histograms for the logarithm-transformed data and correlations of Cd and Mn. The red lines on the histogram represent the normal distribution fit to the data and the red lines at the scatterplots represent the 1-to-1 line.
Figure 3. Histograms for the logarithm-transformed data and correlations of Cd and Mn. The red lines on the histogram represent the normal distribution fit to the data and the red lines at the scatterplots represent the 1-to-1 line.
Hydrology 11 00102 g003
Figure 4. Element variograms. Distance is in meters.
Figure 4. Element variograms. Distance is in meters.
Hydrology 11 00102 g004
Figure 5. Prediction maps of the logarithmic concentration (a,b); standard deviation of the prediction error for the logarithm of the elements (c,d).
Figure 5. Prediction maps of the logarithmic concentration (a,b); standard deviation of the prediction error for the logarithm of the elements (c,d).
Hydrology 11 00102 g005
Figure 6. Kriging map of estimated values after the inverse transformation. Values are in ppb for Cd and ppm for Mn.
Figure 6. Kriging map of estimated values after the inverse transformation. Values are in ppb for Cd and ppm for Mn.
Hydrology 11 00102 g006
Figure 7. Cross-validation errors for Cd.
Figure 7. Cross-validation errors for Cd.
Hydrology 11 00102 g007
Figure 8. Cross-validation errors for Mn using the SSRF variogram.
Figure 8. Cross-validation errors for Mn using the SSRF variogram.
Hydrology 11 00102 g008
Table 1. Statistics parameters Cd and Mn after logarithmic transformation. S.dev stands for the standard deviation. Units would be in log(ppb) for Cd and log(ppm) for Mn.
Table 1. Statistics parameters Cd and Mn after logarithmic transformation. S.dev stands for the standard deviation. Units would be in log(ppb) for Cd and log(ppm) for Mn.
StatisticCdMn
Min−3.0764−4.6721
Max5.56735.6682
Mean−0.7959−0.8834
Median−1.8638−1.2817
S.dev2.37362.6253
Kurtosis3.51672.7944
Samples6262
Table 2. Trend polynomial coefficients.
Table 2. Trend polynomial coefficients.
CoefficientCdMn
a 0 3.0818−3.6910
a 1 0.3254 10 3 −0.3054 10 3
a 2 0.3973 10 3 0.2183 10 3
a 3 0.3495 10 7 0.4360 10 7
a 4 0.2154 10 7 −0.1857 10 7
a 5 0.1743 10 7 0.3264 10 7
Table 3. Validation measures in variance and trend. RMSE: Root Mean Square Error, ME: mean error, MaxAE: Maximum Absolute Error, ρ : correlation coefficient.
Table 3. Validation measures in variance and trend. RMSE: Root Mean Square Error, ME: mean error, MaxAE: Maximum Absolute Error, ρ : correlation coefficient.
CdMn SpheMn SSRF
ME−0.3 ppb−4.2 ppm−0.3 ppm
MAE8.3 ppb11.7 ppm7.4 ppm
RMSE25.9 ppb39.2 ppm25.1 ppm
MaxAE166.8 ppb226.7 ppm147.9 ppm
ρ 64.7%46.7%51.9%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Diamantopoulou, E.; Pavlides, A.; Steiakakis, E.; Varouchakis, E.A. Geostatistical Analysis of Groundwater Data in a Mining Area in Greece. Hydrology 2024, 11, 102. https://doi.org/10.3390/hydrology11070102

AMA Style

Diamantopoulou E, Pavlides A, Steiakakis E, Varouchakis EA. Geostatistical Analysis of Groundwater Data in a Mining Area in Greece. Hydrology. 2024; 11(7):102. https://doi.org/10.3390/hydrology11070102

Chicago/Turabian Style

Diamantopoulou, E., A. Pavlides, E. Steiakakis, and E. A. Varouchakis. 2024. "Geostatistical Analysis of Groundwater Data in a Mining Area in Greece" Hydrology 11, no. 7: 102. https://doi.org/10.3390/hydrology11070102

APA Style

Diamantopoulou, E., Pavlides, A., Steiakakis, E., & Varouchakis, E. A. (2024). Geostatistical Analysis of Groundwater Data in a Mining Area in Greece. Hydrology, 11(7), 102. https://doi.org/10.3390/hydrology11070102

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop