Automated Mapping of Water Table for Cranberry Subirrigation Management: Comparison of Three Spatial Interpolation Methods

: In this paper we ﬁrst compare three different methods of spatial interpolation, i.e., inverse distance weighting (IDW), thin plate splines (TPS), and kriging on weekly water table depth (WTD) measurements from 80 observation wells in two cranberry farms (Farm A and Farm B) located in Québec, Canada. We use the leave-one-out cross-validation approach to assess the performance of the methods. Second, we evaluate the inﬂuence of the density of measurement points over the interpolation error for the cited methods. Third, we assess the performance of drainage systems and their impacts on crop productivity as a result of cumulative rainfall. Results along with practical considerations show that TPS is the best interpolator for WTD and this superiority is maintained and further demonstrated through a sensitivity analysis of the methods to spatial sampling density, i.e., partitioning the data into subsets of 25, 50, and 75% of the dataset. However, the random approach for selecting these subsets shows an unexpected result; that is, the interpolation methods exhibit a higher performance in terms of the Pearson correlation ( r ) for the 25% data subset at Farm B. Meanwhile, the cumulative precipitation over a three-day period, the maximum time required to return the soil matric potential to the optimal value after a major rainfall event, had a steady inﬂuence on WTD and thus crop productivity in the studied farms. This inﬂuence is more apparent for Farm A, but a rather random effect is noted for Farm B. This study presents a water-management-based strategy that mitigates the supplementary cost and effort for sensor deployment in water table monitoring for cranberry production. It is therefore of practical interest to cranberry growers and decision-makers who aim to maximize yields through water-management-oriented strategies.


Introduction
Integrated groundwater resource management is a viable approach for sustaining the activities of the industrial, agricultural, and domestic sectors. The rapid expansions of these sectors impose a growing impact on groundwater quality and quantity that has received international attention [1]. On top of this fundamental concern, climate change and increased interannual variability, as a result of alterations in the spatio-temporal distribution of precipitation and atmospheric temperature, threaten groundwater resources and lead to an overall decrease in groundwater level [2][3][4].
The province of Québec in Canada comprises about 3.4 million hectares of agricultural land that heavily depend on groundwater to meet the current food demand [5]. This agricultural landscape supports cranberry production, Quebec being the second largest producer in the world behind the state of Wisconsin [6], and requires the extraction of large volumes of water for plant development [7], management of soil moisture, harvest, and winter flooding, as well as protection against frost [8,9]. In general, the annual water requirement for cranberry production is about 2200 mm/yr [8].
Despite the high water demand for cranberry production, the crop itself is sensitive to soil water tension variations during the fruit-growing season [10]. For instance, [11] found that hypoxic stress can be responsible for up to 50% in yield reduction when the water table is 25 cm below the soil surface. They further reported an 8% decrease in cranberry yield for each 10 cm decrease in water table depth, starting from 60 cm below the soil surface down to 120 cm. This has heightened growers' interest in fostering drainage and irrigation management practices, conserving water resources while providing optimal yields, and hence establishing sustainable water management practices in cranberry production [12].
Understanding spatio-temporal variations in groundwater levels is a prerequisite for sustainable water supply and use [13][14][15]. However, the density of water table monitoring networks is not always adequate to accurately represent groundwater levels. Meanwhile, increasing sampling at fine spatio-temporal resolutions adds a heavy pecuniary load to environmental studies budgets, which require drilling wells or installing piezometers [16]. Alternative techniques that provide insights on unsampled areas, without much supplementary cost and effort, are therefore important. A variety of methods offer the opportunity to interpolate variables in space and time, and make optimal use of measurements. Among the interpolation methods are inverse distance weighting (IDW), thin plate splines (TPS), and kriging (KRG), which are widely used for estimating hydrological and soil properties, providing unbiased estimates with minimum variance [13,14,[16][17][18][19][20][21][22]. A detailed review of the applications of spatial interpolation methods in the environmental sciences is presented in [23].
IDW is a deterministic method that estimates values at unknown points based on weights assigned to values at known points. Weights are calculated as the inverse of the distance between each known point and the unknown location raised to a power. This implies that closer points have a greater influence on the estimated value than farther ones and the higher the exponent is, the greater the influence. The resulting interpolated surface is smoothed out when the weighting exponent is greater than zero [24]. The main limitations of this method are the arbitrary selection of the weighting exponent and the non-consideration of the sampling scheme in the method [24]. The TPS method is also a deterministic method but with a local stochastic component; it offers the possibility for the interpolated surface to pass exactly through the data points or to predict points that output a more continuous surface [24]. Kriging is a probabilistic method that applies a fitted variogram to compute the weights necessary for the interpolation process. A caveat of this method is that, when working with large datasets, it requires large computational resources [16].
The development of spatio-temporal variations in groundwater levels offers the opportunity to control the water table depth of cranberry fields which, in return, has the potential for increasing yield without decreasing the quality of the fruit, and whilst avoiding hypoxic stress in the root zone [11,25]. For instance, Caron et al. [10] suggested that, in order to optimize cranberry yield without excess water, the soil matric potential must be maintained between −4 and −7.5 kPa. The application of above-mentioned interpolation methods to cranberry yields can provide insights into the spatial relationships between cranberry yields and water table depth (WTD). This is a necessary step for developing a precision agriculture system for cranberries [26]. Thus, the objectives of this study were (1) to compare the aforementioned spatial interpolation methods and to select the one with the best performance, (2) to test the influence of the network density on method performance, and (3) to evaluate the performance of two drainage systems and their impacts on crop productivity as a result of rainfall events. The paper presents a water-management-based strategy that mitigates the supplementary cost and effort for sensor deployment in water table monitoring for cranberry production. It is therefore of practical interest to cranberry growers and decision-makers who aim to maximize yields through water-management-oriented strategies.

Study Sites
We selected two cranberry farms in the Canadian province of Québec in which to conduct the experimental study. They are both located on the south shore of the St. Lawrence River lowlands; the first farm (Farm A) at (46 • 14 N, 72 • 02 W) and the second farm (Farm B) at (46 • 18 N, 71 • 53 W). We show the geographical positions of the farms in Figure 1. Each farm comprises several fields with a surface area of approximately 1.8 ha (i.e., 40 m-wide × 450 m-long) where the "Pilgrim" cranberry variety is cultivated. These fields are constructed by excavating and replacing the top 30 cm of soil of each field with sand, a technique necessary to create favourable conditions for crop growth and production. The fields are also equipped with a subsurface drainage system consisting of four drainage pipes spaced by approximately 10 m and installed at an average depth of 65 cm below the soil surface. The drains flow out into a control box at one end of each farm, which maintains the water table at around 60 cm below the soil surface, providing a soil matric potential between −7.5 and −4 kPa. This ensures that the cranberries are adequately supplied with water.

Water Table Depth
We measured the water table level manually with a dipper-T water level meter (Heron Instruments Inc., ON, Canada) on a weekly basis during the growing seasons of 2017, 2018 and 2019 at 80 observation wells as depicted in Figure 1. These observation wells were made of PVC pipes inserted vertically and at a specific spacing with respect to subsurface drains. Each field contained between 1 and 4 wells installed at strategic positions to capture the range of possible WTDs in cranberry fields [27]. The ground surface of the fields, which is almost horizontal, was used as the reference for the measurements.
The WTDs in cranberry fields are maintained within an optimal range [11] by applying irrigation typically between two and three times per week [10]. As a consequence, the weekly measurements taken in this study do not capture the temporal drawdown between applications of irrigation. Thus, the study focuses on the spatial correlation in the WTD data and, accordingly, we treat every time step separately, i.e., as an independent set of spatial data. In addition, the water management at the farm level during the growing season is a factor that masks the seasonality in the dataset. Therefore, we did not consider investigating seasonal trends.

Precipitation
Precipitation was measured every day with a rain gauge (model WatchDog 1120) from Spectrum Technology, Inc. (IL, USA) connected to a ST-4 Hortau data logger (Lévis, QC, Canada) installed on each farm. Daily precipitation was converted to three-day cumulative antecedent precipitation. The choice of the three-day period is consistent with the maximum time after a major rainfall event required to return the soil matric potential to the optimal value of −4.0 kPa (i.e., WTD = 40 cm) to avoid hypoxic conditions in the root zone as reported by Pelletier et al. [11]. Hypoxic conditions resulting from inadequate drainage, particularly in periods of intense rainfall events, negatively influence cranberry production. These influences include root and fruit rot diseases and reduced root development, fruit retention, and productivity.

Yields
Cranberry yields were sampled at systematic locations in staggered rows in both farms at the end of the 2017, 2018, and 2019 cropping seasons (mid-October), a few days prior to harvesting by growers. We used a 0.09-m 2 wood square to collect the cranberries at each location. They were weighed with a 0.1 g precision scale and the yield was estimated in kg/ha for each location. Because of the limited spatial extent of our yield sampling and the expected influence of water table depth on cranberry yield, we linked each yield sample with the nearest observation well using the Distance Matrix tool of the open source Geographic Information System named QGIS [28]. The tool generates a table or matrix of yield sampling locations and observation well identifiers along with distance (m). Then, we processed the table to associate each well with the set of nearest yield sampling locations, and thus the estimated yield for each location. Finally, we computed the average yield with the corresponding observation well to explore relationships or trends between three-day cumulative precipitation, cranberry yields, and WTDs.

Inverse Distance Weighting (IDW)
As mentioned above, with this method, the weights given to observations at known locations are inversely proportional to the distance between these points and the location for which unknown values are estimated. These values are determined using Equation (1): where x i is the location with known values and x 0 , the unsampled locations, and w(x i ) is the weight assigned with the known value Z(x i ) measured at x i . w(x i ) is a function of the Euclidean distance between x i and x 0 . It is thus obtained using the power law of the distance between x i and x 0 , as in Equation (2): where d is the distance between x i and x 0 and p is a parameter that confers a greater influence to locations x i that are closer to x 0 and translates how rapidly this influence decreases as the distance becomes larger. Equation (2) also indicates that when x 0 coincides with x i , the weight is infinite; consequently the prediction at such a location is equal to the known value. Hence, the method is called an exact interpolator. In this study, we used three different values of p (i.e., 0.5, 2, and 4).

Thin Plate Splines (TPS)
Thin plate splines (TPS) is a type of spline interpolation method, which consists of applying mathematical functions with continuous derivatives up to a certain order to fit the observed data. The resulting curve represents the minimized surface curvature of a thin metal sheet "riveted" at the observed points, i.e., a smoothed surface profile that coincides with the observed points. This is realized by minimizing the error and the bending energy as described below.
The value Z(x i ) at each location x i is estimated using Equation (3): where f (x i ) and (x i ), respectively, represent an unknown fitting function and random errors.
The fitting function f is obtained by minimizing its bending energy given by Equation (4): where λ is a smoothing parameter estimated through a cross-validation procedure and J( f ) is the measure of the smoothness of f (x i ). The latter, given by Equation (5), is determined through a bidimensional integration of the biharmonic function ∇ 4 f (x, y).
In our study, this deterministic interpolation method was performed using the Tps function from the fields package [29] of the R programming language [30]. This function also includes a parameter for setting the smoothing parameter λ.

Kriging
Kriging is a family of interpolation methods that, similar to IDW, assigns weights to observed values in predicting values at non-observed locations. The weights are computed using a semi-variance function based on the distance between the sampled points and unsampled locations, as well as on the spatial distribution (i.e., direction) of the sampled points. Thus, kriging is known to be more appropriate when the data present some spatial association or directional bias [31]. Kriging is also considered as a type of auto-regression with spatial correlation between the residuals of sampled points considering two at a time.
Assuming that the mean value (constant or variable) of the whole domain is known, the predicted valuesẐ(x 0 ) at unsampled locations x 0 are given by Equation (6): where m(x 0 ) and m(x i ) are, respectively, the expected values ofẐ(x 0 ) and Z(x i ); w i denotes the kriging weights assigned to the sampled points x i . m(x i ) is estimated by minimizing the error variance of the kriging estimator given by Equation (7): The kriging weights are estimated using a variogram model of the residuals as given by Equation (8).
where γ is the semi-variance and N is the number of pairs of sampled points separated by the distance or lag h. We tested two methods of the kriging family, i.e., kriging with external drift (KED) and regression-kriging (RKG) [32]. We used the automatic fitting routine from the automap package developed by Hiemstra et al. [33], which is based on the gstat package [34] of the R programming language [30].

Evaluation Criteria of Method Performance
We determined the performance of the interpolation methods with the leave-one-out cross-validation procedure, also called jack-knifing. We used several criteria that reflect different evaluation properties such as the mean error (ME; Equation (9)), mean absolute error (MAE; Equation (10)), mean squared error (MSE; Equation (11)), root mean squared error (RMSE; Equation (12)), and correlation between observed and simulated (Cor o bs s im), to evaluate the cross-validation performance.

Sensitivity of Interpolation Methods to Spatial Sampling Density
To study the sensitivity of the interpolation methods to spatial sampling density, we randomly partitioned the data for each farm into three subsets containing 25, 50, and 75% of the initial data points. For each subset, we used two groups of replicates consisting of ten (10) and twenty (20) replicates. We performed the Kruskal-Wallis test [35] to assess the statistical significance of the MAE between subsets.

Performance of the Studied Interpolation Methods
The performance of the interpolation methods for both farms are presented in Table 1 using the evaluation criteria specified above. For instance, the TPS method outperforms the other methods in terms of the Pearson's correlation coefficient (0.38); KED performs better in terms of ME (0.27 cm); and IDW (p = 2) and RKG perform equally better in terms of MAE (11.89 cm) for Farm A. For Farm B, IDW (p = 4) is the dominant method (MAE = 0.28 cm and Pearson's correlation coefficient = 0.60) and TPS the second best (MAE = 0.32 cm and Pearson's correlation coefficient = 0.59) with a negligible difference in MAE (0.05 cm) compared to IDW (p = 4). This difference between TPS and IDW (p = 4) for Farm B can partially be attributed to the fact that IDW performs better on regular spaced data [23], which is the case for Farm B shown in Figure 1. In addition, the spatial clustering of the sampled data points in Farm B is a factor that may further explain these results [23]. IDW (p = 0.5) is the lowest performing method with the smallest Pearson's correlation coefficient and a relatively larger MAE.
The metrics indicate that the choice of an interpolation method for these cranberry farms is not obvious, as no single method exhibits a clear superiority. Therefore, additional considerations, such as the practicality of the best performing methods (i.e., TPS, IDW (p = 2 and p = 4), RKG, KED), in addition to the trade-off in terms of potential yield loss or gain if one or another of these methods is used, should guide the choice. In line with these considerations, it is worth mentioning that the kriging methods might have produced better results compared to those presented in Table 1 by further optimizing the automatically fitted semivariograms. However, the fine-tuning process requires visual inspection of the results and the method is computationally intensive. These two factors are not practical in a precision irrigation context, which requires the automated generation of WTD maps through a web-based decision support system. Therefore, we discard the kriging methods from the subsequent discussion. Table 1 shows that the respective maximum differences or variations between MAEs (1.56 cm) and RMSEs (2.60 cm) of the (remaining) best performing methods are small compared to the threshold of water depth variation (10 cm) that has a considerable impact on yield (8% as loss rate) reported by Pelletier et al. [11]. With respect to the threshold, these variations suggest that there may be a negligible impact on yield if one of the best performing methods is used in irrigation planning decisions. The variations also mean the consideration of the method that produces the smallest MAE to RMSE variation compared to the 10 cm threshold could be a strategy to single out a method. With this consideration, TPS, being among the two best performing methods and with the lowest MAE to RMSE variation, is selected as the interpolation method most suitable for mapping water table depths in these cranberry fields. In addition, as we shall see later, TPS is less sensitive to the spatial density of the observation points compared to the other methods.
Our results agree with those from some previous studies that compared interpolation methods for groundwater depth [14]. However, they disagree with other previous results that suggested kriging is the best interpolator [13,16,36,37]. A plausible explanation for the disagreement with previous studies is the statistical characteristics (i.e., variance, mean) along with the spatial distribution of the data points [23,36,38]. For instance, cranberry fields are intensely managed (i.e., drained and irrigated) to maintain the water table within an optimal range to favour plant productivity [11]. The drainage/recharge cycles impose a limit on the WTD statistics (i.e., range, mean, variance) in cranberry fields as opposed to the WTD of natural aquifers studied by Sadat-Noori et al. [37] and Yao et al. [36]. Furthermore, Ohmer et al. [39] reported that the performance of interpolation methods may well depend on physical characteristics such as type of aquifer and topography of the study area. Therefore, the agreement and disagreement may be also be explained by the fact that cranberry fields are made of anthropogenic soils that comprise natural or constructed sandy and peat soils. Another factor that may influence the results is the position of the observation wells located at different distances (e.g., mid-spacing, quarter-spacing) with respect to the drain pipes within a field. The need to measure the water table depth or drawdown at different distances between the drains introduces some kind of regularity in the positions of the wells which may well influence the results. In addition, the degree of clogging of the drain pipes could influence the water table behaviour not only within a specific field but also between fields. Another result is that for each farm, IDW (p = 2) and RKG produce similar values for MAE and the Pearson's correlation coefficient (Farm A: MAE = 11.89 cm, correlation coefficient = 0.29; Farm B: MAE = 0.32 cm, correlation coefficient = 0.59). In such a situation, the precision of interpolation methods and the smoothness of the interpolated domain often dictate the choice of the best interpolation method [40]. In addition, we noticed that the semivariograms differ between the two farms and WTD survey dates. In this regard, Farm B shows spatial dependence with a time-varying linear fitted model with a very large range while Farm B shows an almost exponential fitted model with a time-varying range between 150 and 400 m. The automatic procedures for fitting the semivariograms are not optimal for our data and mostly require a lot of tuning.
Specific results pertaining to Farm A and presented in Figure 2 show that MAE values (up to 20 cm) are larger for the 2017 growing season data compared to the other years. This is explained by some management problems in the farm and some high intensity rainfall events during 2017. Using the TPS as the best performing interpolation method, we developed maps of the spatial distributions of WTDs for Farms A and B. These results are presented in Figures 4 and 5 for Farm A and Farm B, respectively.

Effects of Spatial Sampling Density on Interpolation Metrics
Figures 6 and 7 present boxplots of Pearson's correlation coefficient, and MAE resulting from the application of each interpolation method to the data subsets with 20 replicates for Farm A and Farm B, respectively. In addition, we performed the Kruskal-Wallis test [35] to assess the level of significance of the differences in MAE per interpolation method. The results indicate that the interpolation metrics are sensitive to the spatial density sampling scheme and the MAEs between the subsets are significantly different (p < 0.05). The highest Pearson's correlation coefficient is achieved with the 75% subset, followed by the 50% subset for Farm A, but with the 25% subset followed by the 50% subset at Farm B for each interpolation method. We need to highlight that the later result (i.e., for Farm B) persists for both groups of replicates, i.e., ten (10) and twenty (20). Furthermore, TPS still maintains its dominance over the other methods.
For Farm A, MAE values vary from around 5 cm to about 30 cm with a median close to 12.50 cm. In addition, we observed that for all interpolation methods, the narrowest interquartile range box is obtained for the 75% subset followed by the 50% subset.  For Farm B, we observed MAE values from 0.1 to 0.7 cm. Contrary to Farm A, we observed that for all interpolation methods, with the exception of the KED, the narrowest interquartile range box is obtained for the 75% subset followed by the 50% subset.

Water Table and Cranberry Yield Response to Precipitation
We investigated the performance or behaviour of the drainage systems at both sites under the influence of cumulative precipitation over a period of three days preceding the date of weekly water table depth measurements. The analysis was complemented by considering the farms' productivity. For this, we used the average annual yield associated with the closest observation wells in our analysis, and we present in Figure 8 boxplot summarizing the statistics of the sampled yield data. The boxplot shows that Farm A is more productive overall than Farm B. In addition, Farms A and B were at their lowest productivity stages in 2017 and 2019, respectively. Furthermore, both farms were most productive for 2018, with sampled yields ranging from about 3000 to 46,000 kg/ha and from 4000 to 43,000 kg/ha for Farm A and Farm B, respectively. Figures 9 and 10 depict the relationships on an annual basis between three-day cumulative precipitation, water table depth, and cranberry yields for Farms A and B, respectively. It can be observed that the water table depth for Farm A (the most productive farm), is controlled within a more or less consistent range from roughly 50 to 75 cm compared to Farm B (the least productive), which has a range from about 50 down to 120 cm. As mentioned before, this anomaly can be attributed to some farm management problems and under-performing drainage systems. These results are in overall agreement with the findings of Pelletier et al. [11] for similar cranberry fields.  It is not clear that cumulative precipitation over a three-day period has a steady influence on water table depths. However, this impact is more apparent for Farm A with some water table depths between 12 and 50 cm with respect to the soil surface as the three-day cumulative precipitation increases. For Farm B (Figure 10), the impact is rather unclear. As mentioned previously, this may be a consequence of management problems and poorly-performing drainage systems. Another possible explanation is that, in some cases, the larger fraction of the rainfall over the three-day period is measured on the first or second day. Accordingly, the drainage systems may have sufficient time to evacuate the excess water. Therefore, finer temporal-resolution measurements are necessary to shed light on this dependency.

Conclusions
The performance of three spatial interpolation methods were assessed on weekly water table depth measurements collected at 80 observation wells installed in two individual cranberry farms located in Québec, Canada. We also evaluated the influence of measurement point density over the interpolation errors for the cited methods by randomly withdrawing ten replicates of 25, 50, and 75% of the original datasets. For these two steps, the results indicate that thin plate splines (TPS) method had the best performance. In addition, we noticed a sensitivity of TPS to the density of the sampling points. In a third and last step, we assessed the performance of the farms' drainage systems and their impacts on crop productivity as a result of cumulative rainfall over a period of three days preceding the water table depth measurements. The results indicate no clear-cut trend. Indeed, such an impact was more apparent for one farm, but appeared as a random effect for the other. Finer temporal-resolution measurements (e.g., sub-daily data) would be necessary to shed light on this dependency.
The development of alternative sampling techniques that provide insights on unsampled areas is a viable strategy for mitigating supplementary cost and effort for sensor deployment. It is therefore of practical interest to cranberry growers and decision-makers who aim to maximize yields through water-management-oriented strategies. In future work, we plan to explore the possibility of using machine learning methods such as random forest to generate the data subsets used to measure the sensitivity of the methods to sampling density. In this vein, this effort would help in more accurately identifying those subsets that are more likely to provide results that approximate the results of the original dataset. An anticipated practical application for such an outcome is that the process of planning sensor deployment to measure water tables would be conducted in a more educated fashion. In return, cost and labour for dense deployment of sensors in cranberry fields would be optimal.
Funding: This research was supported by a grant (RDCPJ 477937-14 -Gestion intégrée des ressources en eau dans la production de canneberges) from the Natural Sciences and Engineering Research Council (NSERC) of Canada. We acknowledge the contribution of CentrEau | Quebec Water Management Research Centre in the form a postdoctoral scholarship to the first author through its New Research Projects Grant program.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: