Optimization of a Groundwater Monitoring Network for a Sustainable Development of the Maheshwaram Catchment, India

Groundwater is one of the most valuable resources for drinking water and irrigation in the Maheshwaram Catchment, Central India, where most of the local population depends on it for agricultural activities. An increasing demand for irrigation and growing concern about potential water contamination make the implementation of a systematic groundwater-quality monitoring program in the region imperative. Nonetheless, limited funding and resources emphasize the need for a representative but cost-effective sampling strategy. In this context, field observations were combined with a geostatistical analysis to define an optimized monitoring network able to provide sufficient and non-redundant information on key hydrochemical parameters. A factor analysis was used to evaluate the interrelationships among variables, and made it possible to reduce the original dataset to a new configuration of monitoring points still able to capture the spatial variability in the groundwater quality of the basin. The approach is useful for maximizing data collection and contributes to better management of resource allocation under budget constraints.


Introduction
Industrial production and agriculture are major threats to water resources in the semi-arid regions of India. In recent years groundwater provided water to irrigate approximately 27 million hectares (60% of the country's irrigated land) against 21 million hectares supplied by surface water [1]. Between 1970 and 1994, the extent of groundwater-irrigated land in India increased by 105%, whilst land irrigated by surface waters showed an increase of only 28% [2]. At many localities, groundwater constitutes the only supply for drinking purposes. The dramatic increase in groundwater usage clearly shows that competition for water is on the rise. Furthermore, climate change and population growth in years to come are expected to place additional pressure on the groundwater system. Planning and adequate management of the resources are crucial not only to meet the demand, but also to protect the water quality of stressed regions.
The first step of any sustainable management is to understand the underlying physical processes and to derive the corresponding mathematical formulations [3]. In the Maheshwaram Catchment, there is a large number of irrigation bores useful for monitoring purposes. However, constraints on budget, equipment, and human resources mean that only a limited number of these bores can be used for investigation. Selecting the location of the points to be monitored involves practical and technical considerations but, to a certain extent, the process is subject to bias and uncertainty. In this regard, geostatistical predictions are being increasingly used to address the imperfect knowledge of attributes that fluctuate over large areas [4]. The accuracy of the data retrieved and of the subsequent predictions is especially dependent on a reliable optimization of the monitoring network. An optimal monitoring network should provide sufficient and non-redundant information to fully understand the spatial behavior of the monitored variables. Several statistical methods can be applied to approach the problem. According to [5], these methods can be classified as simulations, variance-based techniques, and probability. Essentially, the difference among them lies in the formulation of the objective function to be optimized. The variance reduction method is widely accepted as a reliable tool in optimization problems [2]. In addition, it usually requires fewer iterations to obtain the same accuracy on the estimates. The technique utilizes a unique property of geostatistical estimation by which the variance of the estimation error depends only on the structure of the selected parameter and not on the measured values at additional points. This enables one to select a new observation point and analyze its effect prior to making a measurement [6]. Due to these advantages, the variance reduction technique was selected for the present exercise. The uncertainty associated with a given monitoring network may be determined by the variance of estimation obtained by kriging interpolation. In a given area, this uncertainty is specific to the particular distribution of monitoring wells. Changes in the number or location of wells will be directly reflected in the level of accuracy of the estimations. Thus, the variance of the estimation error is used as the objective function.
This paper describes the use of geostatistical techniques on groundwater-quality data in the Maheshwaram Catchment, India, in order to determine an optimal monitoring network for the area. The approach is essentially supported by a principal component analysis (PCA) along with kriging interpolation. Findings from the study are useful to reduce the existing dataset whilst retaining the relationships originally present. More importantly, the optimized network constitutes a cost-effective alternative for designing future sampling programs, and allows for a better management of the available resources under budgetary constraints.

Theoretical Considerations
Principal component analysis (PCA) is a procedure for finding hypothetical variables which account for the majority of the variance in a multi-dimensional dataset [7]. This can be achieved by transforming the variables under study into a new set of variables, the principal components (PCs). Thus, the goal when using PCA is to determine a few linear combinations of the original variables that can be used to summarize the dataset without losing much information [8]. The mathematics behind the procedure is explained in most statistics textbooks and will not be presented here. However, it is worth noting that the analysis calculates new variables from the original variables in an attempt to detect similarities among the original data. The method allows for the identification of homogeneous subgroups that better describe the system behavior [9].
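As an illustration of the transformation described above, the following minimal sketch computes principal components from the correlation matrix of a small synthetic dataset. The function name `pca_correlation` and the synthetic data are assumptions made for the example; they do not reproduce the hydrochemical dataset or the software used in the study.

```python
import numpy as np

def pca_correlation(data):
    """PCA on the correlation matrix (rows = samples, columns = variables)."""
    data = np.asarray(data, dtype=float)
    # Standardize each variable so the analysis works on correlations.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    # eigh returns eigenvalues in ascending order; sort them descending.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()   # fraction of variance per component
    scores = z @ eigvecs                  # observations projected onto the PCs
    return eigvals, eigvecs, explained, scores

# Synthetic stand-in for a water-quality table: 61 samples, 5 variables
# driven by two hidden factors plus one independent variable (assumption).
rng = np.random.default_rng(0)
base = rng.normal(size=(61, 2))
data = np.column_stack([
    base[:, 0],
    2.0 * base[:, 0] + 0.1 * rng.normal(size=61),
    base[:, 1],
    -base[:, 1] + 0.1 * rng.normal(size=61),
    rng.normal(size=61),
])
eigvals, eigvecs, explained, scores = pca_correlation(data)
print(np.round(np.cumsum(explained), 3))  # cumulative variance explained
```

Each column of `scores` is one new variable; its sample variance equals the corresponding eigenvalue, which is why a few leading components can summarize most of the dataset.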
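Quantification of the spatio-temporal variability of a dataset can be achieved by the use of variograms. A generalized formula to calculate a variogram from a set of scattered data can be written as follows [10]:

$$\gamma(\bar{d}, \bar{\theta}) = \frac{1}{2 N_d} \sum_{i=1}^{N_d} \left[ Z(x_i + \hat{d}_i, \hat{\theta}_i) - Z(x_i) \right]^2 \qquad (1)$$

where

$$d - \Delta d \le \hat{d}_i \le d + \Delta d, \qquad \theta - \Delta\theta \le \hat{\theta}_i \le \theta + \Delta\theta \qquad (2)$$

$$\bar{d} = \frac{1}{N_d} \sum_{i=1}^{N_d} \hat{d}_i, \qquad \bar{\theta} = \frac{1}{N_d} \sum_{i=1}^{N_d} \hat{\theta}_i \qquad (3)$$

Here $Z(x_i)$ is a multivariate random variable, $d$ and $\theta$ correspond to the initially selected lag and direction of the variogram, and $\Delta d$ and $\Delta\theta$ are the tolerances on the lag and direction, respectively. $\hat{d}_i$ and $\hat{\theta}_i$ are the separation and direction of the i-th pair of points, and the components $\bar{d}$ and $\bar{\theta}$ are the actual lag and direction for the corresponding calculated variogram. $N_d$ is the number of pairs for a particular lag and direction. Equation (3) avoids the rounding-off error of pre-decided lags (only multiples of the initial lag are taken in conventional cases) and of the direction. If the data are collected on a regular grid, and $\Delta d$ is assumed to be zero, Equations (1) and (3) simplify and depend only on $\theta$.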
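The calculation of an experimental variogram with tolerances on lag and direction can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `experimental_variogram`, its parameters, and the four-point transect are hypothetical, directions are treated as axial (0 to π), and the sketch reports the mean separation actually observed in each lag class instead of the nominal lag.

```python
import numpy as np

def experimental_variogram(coords, values, lag, n_lags, lag_tol,
                           theta=0.0, theta_tol=np.pi / 2):
    """Return (mean actual lag, semivariance, number of pairs) per lag class.

    theta and theta_tol are in radians; the default tolerance of pi/2
    accepts every direction (omnidirectional variogram).
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)       # every pair counted once
    delta = coords[i] - coords[j]
    dist = np.hypot(delta[:, 0], delta[:, 1])
    ang = np.arctan2(delta[:, 1], delta[:, 0]) % np.pi
    ang_diff = np.abs(ang - theta)
    ang_diff = np.minimum(ang_diff, np.pi - ang_diff)  # wrap axial angles
    in_dir = ang_diff <= theta_tol
    sqdiff = (values[i] - values[j]) ** 2
    results = []
    for k in range(1, n_lags + 1):
        mask = in_dir & (np.abs(dist - k * lag) <= lag_tol)
        n_pairs = int(mask.sum())
        if n_pairs == 0:
            continue                                # no pairs in this class
        gamma = sqdiff[mask].sum() / (2.0 * n_pairs)
        results.append((float(dist[mask].mean()), float(gamma), n_pairs))
    return results

# Four points on an east-west transect with unit spacing (made-up values).
coords = [[0, 0], [1, 0], [2, 0], [3, 0]]
values = [1.0, 3.0, 2.0, 5.0]
for mean_lag, gamma, n_pairs in experimental_variogram(coords, values, 1.0, 2, 0.25):
    print(mean_lag, round(gamma, 4), n_pairs)
```

For this transect the first lag class contains three pairs with squared differences 4, 1 and 9, so the semivariance is 14/6; the second class contains two pairs and gives 5/4.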
On the other hand, kriging is a method for linear optimum unbiased interpolation with a minimum mean interpolation error [11]. Kriging is only one technique among many others for the interpolation of a variable. However, it presents a number of advantages since it considers: (i) the number and spatial configuration of observation points; (ii) the position of the data points; (iii) the distance between the data points with respect to the area of interest; and (iv) the spatial continuity of the interpolated variable [12]. These advantages and its wide application in hydrogeological problems led us to select this method for the present study.
In short, kriging is a method of weighted averaging of the observed values of a property Z within a neighborhood V, from measured values $z(x_i)$ of the property at n sites $x_i$, i = 1, 2, 3, ..., n. Estimates can be made over a block B by:

$$\hat{z}(B) = \sum_{i=1}^{n} \lambda_i \, z(x_i) \qquad (4)$$

where $\lambda_i$ are the weights associated with the sampling points. To ensure that the estimates are unbiased, the sum of the weights $\lambda_i$ must be 1. Therefore,

$$\sum_{i=1}^{n} \lambda_i = 1 \qquad (5)$$

The estimation variance for $\hat{z}(B)$ is given by:

$$\sigma^2(B) = 2 \sum_{i=1}^{n} \lambda_i \, \bar{\gamma}(x_i, B) - \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i \lambda_j \, \gamma(x_i, x_j) - \bar{\gamma}(B, B) \qquad (6)$$

In this case, $\gamma(x_i, x_j)$ is the semi-variance between the i-th and the j-th sampling points; $\bar{\gamma}(x_i, B)$ is the average semi-variance between the block B and the i-th sampling point; and $\bar{\gamma}(B, B)$ is the average semi-variance within the block B (i.e., the block variance). The estimation variance is minimized consistent with Equation (5) when:

$$\sum_{j=1}^{n} \lambda_j \, \gamma(x_i, x_j) + \mu = \bar{\gamma}(x_i, B) \quad \text{for all } i \qquad (7)$$

A Lagrange multiplier, $\mu$, is introduced to achieve minimization. The weights are found by solving these kriging equations, and then they are inserted into Equation (4). The kriging estimation variance is obtained from the solution by:

$$\sigma_K^2(B) = \sum_{i=1}^{n} \lambda_i \, \bar{\gamma}(x_i, B) + \mu - \bar{\gamma}(B, B) \qquad (8)$$

The estimation variance for simple kriging equations as described above depends on the configuration of the observations in relation to the point or block to be estimated, but not on the observed values themselves. This can be exploited in designing sampling schemes for spatially heterogeneous variables, as a measure to determine the distance between sampling locations, and in the optimization of monitoring networks.
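The property that the kriging variance depends only on the well configuration and the variogram can be illustrated with a short sketch. It solves the kriging system with a Lagrange multiplier for a single target point instead of a block (so the within-block term vanishes), and it assumes a spherical variogram model with made-up parameters; `spherical` and `ordinary_kriging` are hypothetical names, not functions from the study.

```python
import numpy as np

def spherical(h, nugget=0.0, sill=1.0, a=1000.0):
    """Spherical variogram model (assumed parameters; a is the range)."""
    h = np.asarray(h, dtype=float)
    g = np.where(h < a,
                 nugget + (sill - nugget) * (1.5 * h / a - 0.5 * (h / a) ** 3),
                 sill)
    return np.where(h == 0.0, 0.0, g)   # gamma(0) = 0 by definition

def ordinary_kriging(coords, target, variogram):
    """Weights, Lagrange multiplier and kriging variance at one target point."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    # Left-hand side: semivariances between wells, bordered by ones
    # to enforce the constraint that the weights sum to one.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(d)
    A[n, n] = 0.0
    # Right-hand side: semivariances between each well and the target.
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(coords - np.asarray(target, float), axis=1))
    sol = np.linalg.solve(A, b)
    weights, mu = sol[:n], sol[n]
    variance = float(weights @ b[:n] + mu)
    return weights, mu, variance

# Four hypothetical wells on the corners of a 500 m square.
wells = [[0, 0], [500, 0], [0, 500], [500, 500]]
weights, mu, variance = ordinary_kriging(wells, [250, 250], spherical)
print(np.round(weights, 3), round(variance, 4))
```

No measured values enter the calculation, so the effect of adding or removing a well can be assessed before any sample is collected.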

Study Area
The Maheshwaram catchment is located approximately 30 km south of Hyderabad, in the Ranga Reddy district of Andhra Pradesh, India (Figure 1). The area is a typical granitic terrain extending over about 60 km².
The topography is flat to gently undulating, with elevations between 590 and 670 m above mean sea level. The climate is classified as semi-arid, with a mean annual precipitation in the order of 750 mm, falling mainly during the monsoon season between June and September. There are no perennial streams. On a regional scale, groundwater flows from SW to NE. Geologically, the area is dominated by medium- to coarse-grained Archean granites, commonly intruded by quartz and dolerite dykes of several generations. The rocks have undergone a variable degree of weathering, and the weathered layer is usually 15 m to 20 m thick. An underlying fractured zone extends up to 50 m below ground level (mbgl) (Figure 2). Crystalline basement aquifers can be divided into several compartments that together constitute the aquifer but are characterized by distinct hydrogeological properties [13,14]. In the study area, these compartments can be described as: (i) the upper zone, which consists of weathered and decayed rocks of clayey-sandy composition; its hydraulic conductivity is usually low, but its water-retention capacity can be significant; (ii) the intermediate fissured zone (FZ), characterized by horizontal fractures that diminish in density with depth and by a considerable number of vertical fractures and fissures that act as preferential pathways; this zone has higher values of hydraulic conductivity; and (iii) the underlying unaltered rock, usually of low permeability and limited storage capacity.

Methods
A PCA was carried out on groundwater-quality data from 61 bores scattered throughout the Maheshwaram catchment (Figure 3). Data collection involved both field and laboratory work. Standard parameters such as electrical conductivity (EC) and pH were measured in situ. Samples were taken after the field parameters had stabilized. Water was filtered in the field and stored in previously rinsed 500 mL bottles for chemical tests. A subset of samples was acidified with HNO₃ for cation analyses.

Results and Discussion
A correlation matrix was used to apply the PCA to the water-quality data. Following the methodology of [15], components with an eigenvalue less than 1 were eliminated. Thus, only the first three components were extracted for the analysis. The initial factor solution was then rotated by the varimax rotation technique [16], in order to obtain new variables (i.e., principal components or principal axes). As indicated by their cumulative percentage of variance, the three extracted components accounted for 62% of the entire dataset variance (Appendix I).
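The two steps described above, retaining components with eigenvalues greater than 1 and rotating the retained loadings, can be sketched as follows. The code is an illustration under stated assumptions: the data are synthetic, and `varimax` is a commonly used SVD-based iteration for the varimax criterion, which may differ in detail from the routine used in the study.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation; returns rotated loadings and rotation matrix."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        LR = L @ R
        # Gradient of the varimax criterion with respect to the rotation.
        grad = L.T @ (LR ** 3 - LR @ np.diag((LR ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt                      # nearest orthogonal matrix
        crit = s.sum()
        if crit_old != 0.0 and crit / crit_old < 1.0 + tol:
            break                       # criterion no longer improving
        crit_old = crit
    return L @ R, R

# Synthetic stand-in data: 61 samples, 6 variables, two hidden factors.
rng = np.random.default_rng(42)
f = rng.normal(size=(61, 2))
data = np.column_stack([
    f[:, 0] + 0.3 * rng.normal(size=61),
    f[:, 0] + 0.3 * rng.normal(size=61),
    f[:, 0] + 0.3 * rng.normal(size=61),
    f[:, 1] + 0.3 * rng.normal(size=61),
    f[:, 1] + 0.3 * rng.normal(size=61),
    f[:, 1] + 0.3 * rng.normal(size=61),
])
corr = np.corrcoef(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

n_keep = int(np.sum(eigvals > 1.0))     # retain eigenvalues greater than 1
loadings = eigvecs[:, :n_keep] * np.sqrt(eigvals[:n_keep])
rotated, R = varimax(loadings)
print(n_keep, np.round(rotated, 2))
```

Because the rotation is orthogonal, the communality of each variable (the row sum of squared loadings) is unchanged; only the distribution of loadings across factors changes, which usually makes the factors easier to interpret.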
The first factor explained 37% of the total variance. It was characterized by very high loadings of TDS and EC, high loadings of SO₄²⁻ and Cl⁻, and moderate loadings of Na⁺ and total hardness. These results suggest that the above-mentioned ions are the main solutes in groundwater, and are thus responsible for the elevated dissolved solids and conductivity values present in the system. Leakage of agricultural products probably dominates the input of these chemicals into groundwater.
In contrast, factors 2 and 3 together account for nearly 25% of the total variance. Factor 2 is characterized by high loadings of pH, alkalinity, and F⁻, whilst factor 3 is dominated by Ca²⁺, Fe²⁺, and Mg²⁺, with moderate loadings of K⁺ and hardness. It is hypothesized that these elements are derived from natural processes rather than anthropogenic activities.
The three principal components of the PCA (PC₁, PC₂, PC₃) were used to establish three new variables (X₁, X₂, X₃), which project the n observations onto the first three principal components. These new variables constituted the basis for optimizing the monitoring network with a reduced number of interrelated variables. The spatial variability of these new variables over the Maheshwaram watershed was defined by calculating their experimental and theoretical variograms (Appendix II). Before any simulation or optimization model is implemented, its consistency with the original data must be verified [17]. Thus, a cross validation test was carried out to ensure that the variogram represents the true variability of the parameter and is able to reproduce the measured values. In the test criteria, z is the observed value of the parameter under study, z* is its estimated value, and σ is the standard deviation of the estimation error.
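A leave-one-out cross validation of a variogram model can be sketched as below. The sketch masks each observation in turn, estimates it by kriging from the remaining points, and summarizes the errors z - z* and the reduced errors (z - z*)/σ. The exponential variogram, its parameters, and the synthetic data are assumptions for illustration, and the summary statistics shown (mean error near 0, mean squared reduced error near 1, bounded reduced errors) are commonly quoted checks, not necessarily the exact criteria used in the study.

```python
import numpy as np

def exponential(h, sill=1.0, a=500.0):
    """Exponential variogram model (assumed parameters)."""
    return sill * (1.0 - np.exp(-np.asarray(h, dtype=float) / a))

def krige_point(coords, values, target, variogram):
    """Kriging estimate and variance at one target point."""
    n = len(values)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(d)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(coords - target, axis=1))
    sol = np.linalg.solve(A, b)
    estimate = float(sol[:n] @ values)
    variance = float(sol[:n] @ b[:n] + sol[n])
    return estimate, variance

def cross_validate(coords, values, variogram):
    """Mask each point, re-estimate it, and return errors and std deviations."""
    n = len(values)
    err = np.empty(n)
    sd = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        est, var = krige_point(coords[keep], values[keep], coords[i], variogram)
        err[i] = values[i] - est
        sd[i] = np.sqrt(max(var, 0.0))
    return err, sd

# Synthetic wells and values (assumption): a gentle trend plus noise.
rng = np.random.default_rng(7)
coords = rng.uniform(0.0, 2000.0, size=(25, 2))
values = 0.002 * coords[:, 0] + rng.normal(scale=0.3, size=25)

err, sd = cross_validate(coords, values, exponential)
reduced = err / sd
mean_error = err.mean()                 # ideally close to 0
msre = np.mean(reduced ** 2)            # ideally close to 1
max_reduced = np.max(np.abs(reduced))   # large values flag outliers
print(round(mean_error, 3), round(msre, 3), round(max_reduced, 3))
```

If the statistics are far from their target values, the variogram model is adjusted and the test repeated before the model is used for estimation.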
The cross validation was performed by masking a specified value from the dataset and then estimating it from the remaining values by means of the variograms. Results must satisfy Equation (11); otherwise, the variogram outcomes cannot be considered plausible. Having established the adequacy of the variograms, the kriging procedure was employed to estimate the standard error across the area of investigation. The watershed was divided into a uniform grid of 883 square cells with sides of 250 m, and a cut-off value of 0.5 was established for the first variable. Values of 0.7 and 1 were considered for the second and third variables, respectively. Points were removed one at a time to see their effect on the error function. Points whose removal resulted in an increase of the estimation error were kept in the monitoring network. In contrast, boreholes that did not affect the error value were permanently eliminated. This procedure was repeated for each of the three variables, to finally derive an optimized monitoring network through a combination of points. The spatial distribution of redundant wells is depicted in Figure 4. The original dataset and the optimized monitoring network produced similar solutions, which leads to the conclusion that a reduction in the number of observation points does not compromise the quality and resolution of the collected samples if the network distribution is properly designed (Table 1).
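The point-removal procedure can be sketched as a simple loop over the wells. In this illustration a well is treated as redundant when removing it raises the mean kriging standard deviation over the grid by no more than a small tolerance relative to the full network. The tolerance `tol`, the exponential variogram, and the synthetic well locations are assumptions; in particular, `tol` is a hypothetical stand-in and does not correspond to the cut-off values quoted in the text.

```python
import numpy as np

def exponential(h, sill=1.0, a=800.0):
    """Exponential variogram model (assumed parameters)."""
    return sill * (1.0 - np.exp(-np.asarray(h, dtype=float) / a))

def mean_kriging_sd(wells, grid, variogram):
    """Mean kriging standard deviation over all grid nodes."""
    n = len(wells)
    d = np.linalg.norm(wells[:, None, :] - wells[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(d)
    A[n, n] = 0.0
    # One right-hand-side column per grid node, solved in a single call.
    d0 = np.linalg.norm(wells[:, None, :] - grid[None, :, :], axis=2)
    B = np.ones((n + 1, len(grid)))
    B[:n] = variogram(d0)
    sol = np.linalg.solve(A, B)
    var = np.sum(sol[:n] * B[:n], axis=0) + sol[n]
    return float(np.sqrt(np.clip(var, 0.0, None)).mean())

def prune_network(wells, grid, variogram, tol):
    """Drop wells whose removal keeps the mean SD within tol of the full network."""
    base = mean_kriging_sd(wells, grid, variogram)
    keep = list(range(len(wells)))
    removed = []
    for i in range(len(wells)):
        trial = [k for k in keep if k != i]
        if len(trial) < 3:
            break                        # keep a minimal network
        if mean_kriging_sd(wells[trial], grid, variogram) - base <= tol:
            keep = trial                 # well i adds little information
            removed.append(i)
    return keep, removed, base

# Synthetic network (assumption): 20 random wells plus 3 near-duplicates.
rng = np.random.default_rng(1)
wells = rng.uniform(0.0, 2000.0, size=(20, 2))
wells = np.vstack([wells, wells[:3] + 20.0])
gx, gy = np.meshgrid(np.arange(125.0, 2000.0, 250.0),
                     np.arange(125.0, 2000.0, 250.0))
grid = np.column_stack([gx.ravel(), gy.ravel()])   # 250 m grid nodes

tol = 0.005
keep, removed, base_msd = prune_network(wells, grid, exponential, tol)
final_msd = mean_kriging_sd(wells[keep], grid, exponential)
print(len(keep), len(removed), round(base_msd, 4), round(final_msd, 4))
```

Comparing every trial against the full-network value bounds the total loss of accuracy by `tol`, at the cost of making the result depend on the order in which wells are tested.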
As a last validation, the final monitoring network was examined against each of the 15 initial variables. The verification procedure does not aim to prove the correctness of the model but to ensure the absence of systematic errors [11,18]. This was carried out by performing a variographic analysis of all the initial parameters. Cross validation tests were performed and then followed by ordinary kriging estimation of each variable using the original observation points and the optimized monitoring points. Estimations were carried out on 250 m grids for both the original and the final monitoring network. Subsequently, the mean standard deviation of the estimation error was calculated. Results of the analysis confirm the redundancy of 13 points for each of the individual initial parameters (Table 2).

Summary and Conclusions
Identification of aquifer parameters by direct observations is a challenging task. The distribution of measurement points depends on a number of factors such as aquifer characteristics, terrain conditions, and availability of resources. Heterogeneous conditions usually require a dense monitoring network, which in most cases is not economically feasible. As a consequence, complex groundwater systems must be analyzed from only a handful of observations. It is clear that the accuracy of the inferences made is strongly linked to the arrangement of the observation points. A correct well distribution needs to capture a representative set of aquifer properties that can be extrapolated to unsampled locations. Experience and technical knowledge play a major role in the process but, to a certain extent, selecting the well locations is inherently subjective. Therefore, recent years have witnessed an increase in the use of geostatistical techniques to quantitatively evaluate and optimize observation networks. In this context, a PCA was applied to water-quality data of the Maheshwaram catchment, India, to optimize the groundwater monitoring network in the region. Kriging interpolation provided an insight into the uncertainty of the distribution. Results indicated that 13 out of a total of 61 bores are redundant and can therefore be disregarded in future sampling rounds. A comparison between the interpolation error of the original dataset and that of the optimized distribution showed a negligible difference. This indicates that a reduction in the number of monitoring wells will not incur a considerable loss of detail in data collected in the future. In view of this, it is concluded that an efficient sampling network should not be defined solely on intuition or qualitative assessments, as they can be misleading. Through a simple exercise, the present study demonstrated that an adequate configuration of observation points is still able to accurately capture the spatial variability of groundwater characteristics, while maximizing the use of the allocated resources. The continuous development of more user-friendly software and the reduction in computational effort suggest that the use of geostatistical tools will increase in the future, allowing hydrogeologists to better reach an appropriate trade-off between the density of data and the investment demanded to collect it.

Figure 1. Location of the study area.

Figure 2. Schematic cross section of the Maheshwaram Catchment.

Table 1. Optimal solution for the monitoring network at the Maheshwaram Catchment.
*Redundant monitoring wells in bold italics; MSD: mean standard deviation.

Table 2. Statistics comparison for the original and optimized monitoring networks.