Estimation of Daily Water Table Level with Bimonthly Measurements in Restored Ombrotrophic Peatland

Daily measurements of the water table depth are sometimes needed to evaluate the influence of seasonal water stress on Sphagnum recolonization in restored ombrotrophic peatlands. However, continuous water table measurements are often scarce due to high costs and, as a result, water table depth is more commonly measured manually on a bimonthly basis, with daily logs in only a few reference wells. A literature review identified six potential methods to estimate daily water table depth with bimonthly records and daily measurements from a reference well. A new estimation method based on time series decomposition (TSD) is also presented. TSD and the six identified methods were compared with the water table records of an experimental peatland site with a controlled water table regime located in Eastern Canada. The TSD method was the best performing method (R² = 0.95, RMSE = 2.48 cm and the lowest AIC), followed by the general linear method (R² = 0.92, RMSE = 3.10 cm) and the support vector machines method (R² = 0.91, RMSE = 3.24 cm). To estimate daily values, the TSD method, like the six traditional methods, requires daily data from a reference well. However, the TSD method requires neither training nor parameter estimation. For the TSD method, changing the measurement frequency to weekly measurements decreases the RMSE by 16% (2.08 cm); monthly measurements increase the RMSE by 13% (2.80 cm).


Introduction
Hydrological monitoring in ombrotrophic peatlands is used to understand the effect of the water table depth fluctuation on vegetation structure [1], which is mostly composed of Sphagnum species. Spatial changes in the water table depth, which is associated with water availability, drive changes in the species composition of biotic assemblages [2]. Water table fluctuation also influences decomposition, microbial activity [3], and greenhouse gas emissions [4]. Water availability is important information for peatland management because it influences gas diffusion rates, redox status, nutrient availability and cycling and species composition and diversity [5][6][7][8][9], and it is important for water resource management, flooding and stream water quality [9]. There is evidence that demonstrates its potential use in predicting primary production and surface vegetation growth, mainly Sphagnum, for moss cultivation [10], even in a forestry or peatland post-extraction context [11,12].
The presence of water in ombrotrophic peatland influences peat hydrophysical properties such as water storage capacity [13], hydraulic conductivity due to the surface subsidence [1,14] and drainable porosity [14,15]. It is then a domino effect since any alteration of the hydrological regime can significantly affect peatland vegetation [16,17]. This paper is divided into seven sections allowing us to answer our research question: Is it possible to estimate the daily fluctuation of the water table depth from weekly or bimonthly manual measurements? This paper begins with a brief overview of the methods identified as potentially useful for estimating daily water table depth with bimonthly measurements and daily measurements of some reference wells. With the limitations of the identified methods, a new estimation method based on the time series decomposition (TSD) is described in the third section. Section 4 describes the methodology for the evaluation of the set of estimation methods. Finally, Sections 5-7 present and discuss the results.

Estimation Methods
The estimation of water table depth in restored peatlands as a function of weather data, hydrological characteristics of the peatland (e.g., peat decomposition, depth of peat) and even past records of the depth of the water table has remained a difficult topic [6,14,40,41]. As bimonthly manual measurements of water table depth generally co-occur with daily instrumented measurements in a few selected wells (reference wells), some methods can be used to estimate the daily water table level in wells with infrequent water table records that are close to reference wells with daily records. These methods are classified into two groups: physical-based and data-driven methods. Physical-based methods are widely used for the description of hydrological phenomena in peatlands [7,40,41,42]. However, they do have practical limitations [43]. For example, a physical-based modelling approach requires an adequate and accurate definition of aquifer parameters to describe the soil subsurface spatial variability [44][45][46]. Typically, this information is difficult to obtain because of cost and time constraints [47,48].
Data-driven empirical methods are likely to provide useful results without costly calibration times [43]. The identified data-driven estimation methods are grouped into three types: linear methods, nonlinear methods and regression trees. The first category includes the general linear model (GLM). The second category comprises the k-nearest neighbours (KNN) method, which is based on a similarity measure (distance) between data. Finally, for the regression trees, there are four estimation methods: support vector machines (SVM), decision tree (TREE), random forest (RF) and adaptive boosting (ADABOOST). The next sections present summaries of these methods. For all methods, the following terms must be defined a priori: (1) The estimated wells are wells for which only occasional measurements (weekly, bimonthly or monthly) were made and for which daily water table values are of interest to identify periods of water stress; the reference well has daily or hourly records of the water table level and is normally located near the estimated well. (2) The data-driven methods are calibrated with the infrequent data from the reference well and the estimated well. Each method then uses the daily water table data of the reference well to estimate the daily water table values for the estimated well.

General Linear Model (GLM)
The general linear model is the most widely used method because of its easy implementation [49]. The GLM assumes a linear relationship between the independent variables (water table depth of reference wells) and the dependent variable (water table depth of estimated well).
Brown et al. [10] used a linear regression between weekly logged estimated wells and daily logged reference wells to estimate daily water table values, obtaining a coefficient of determination (R²) of 0.55. The dataset was then used to calculate the optimal range of days for gross ecosystem productivity calculation.
Sustainability 2021, 13, 5474
By employing linear regression, the estimation procedure is simple and easy to understand. Authors including Lu et al. [50] and Choubin and Malekian [51] argued that linear models are appropriate to model simple systems characterized by linear relationships between the hydrological observations. The use of a linear model (simple or multivariate) assumes that the process in question follows a normal distribution. Linear regression remains in common use, even though groundwater flow and most hydrological processes are commonly considered nonlinear [52].
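As an illustration of the linear approach described above, the following minimal sketch fits an ordinary least-squares line to paired infrequent readings and applies it to the daily series of the reference well. The function names (`fit_line`, `glm_daily_estimate`) and the depth values are hypothetical and chosen only for illustration; they are not taken from the study.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for paired observations."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def glm_daily_estimate(ref_infrequent, est_infrequent, ref_daily):
    """Calibrate on infrequent paired readings, then apply to the daily reference series."""
    slope, intercept = fit_line(ref_infrequent, est_infrequent)
    return [intercept + slope * h for h in ref_daily]

# Hypothetical water table depths (cm), for illustration only.
ref_infrequent = [10.0, 14.0, 18.0, 12.0]   # reference well on measurement days
est_infrequent = [13.0, 19.0, 25.0, 16.0]   # estimated well on the same days
ref_daily = [10.0, 11.0, 13.0, 14.0]        # daily record of the reference well

slope, intercept = fit_line(ref_infrequent, est_infrequent)
estimates = glm_daily_estimate(ref_infrequent, est_infrequent, ref_daily)
```

With only two fitted parameters, the model is easy to interpret, but any nonlinear relation between the two wells is ignored.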

k-Nearest Neighbours (KNN)
Proposed by Cover and Hart [53], this method does not learn any discriminative function from the training data, but stores the training data set by groupings. It is based on the selected distance measurement and the number of neighbours, k. The k-nearest neighbours (KNN) algorithm selects the k nearest samples in the feature space (N-dimensional space, where N is the number of features); each sample is equivalent to one vote and an attribute (label) is assigned by majority vote. In the case of having a reference well associated with an estimated well, the observations can be organized in a 2-D space (x for the water table depth of the reference well and y for the water table depth of the estimated well). The interpolation for x_i is made within the range of observed water table depth for the reference well, x_min to x_max, and the estimated value ŷ_i is the average of the observed values closest to x_i.
The number of k-neighbours may be specified a priori in the cross-validation. It is advisable to use the square root of the number of observations in the calibration set [54]. This algorithm has two peculiarities: it is a memory-based approach, so it adapts immediately to new training data, and it is sensitive to the local structure of the data [55], since the closest neighbours could have more weight in the average calculation. Moderasi and Araghinejad [56] and Sakizadeh and Mirzaei [57] report successful cases of groundwater classification, with accuracy above 90%.
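The averaging of the closest observations can be sketched as follows. This is a minimal, unweighted 1-D version under stated assumptions: the distance is the absolute difference in reference-well depth, the default k follows the square-root rule of thumb mentioned above, and the function name `knn_estimate` and the data are hypothetical.

```python
import math

def knn_estimate(x_train, y_train, x_query, k=None):
    """Average the y values of the k training points whose x is closest to x_query."""
    if k is None:
        # Rule of thumb from the text: k close to sqrt(number of calibration observations).
        k = max(1, round(math.sqrt(len(x_train))))
    k = min(k, len(x_train))
    nearest = sorted(zip(x_train, y_train), key=lambda p: abs(p[0] - x_query))[:k]
    return sum(y for _, y in nearest) / k

# Hypothetical paired depths (cm): x = reference well, y = estimated well.
x_train = [5.0, 10.0, 15.0, 20.0]
y_train = [8.0, 12.0, 18.0, 26.0]
estimate = knn_estimate(x_train, y_train, x_query=11.0)  # k defaults to 2 here
```

Because the estimate is an average of stored observations, it can never fall outside the range of the calibration values of the estimated well.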

Support Vector Machines (SVM)
This algorithm was originally developed for classification problems [58]. However, it aroused special interest years after its emergence because, despite being a linear machine, it can be implemented on nonlinear class boundaries [59]. The objective of the support vector machines (SVM) method is to construct a hyperplane to classify the data points (data from the reference well and the estimated well) in the feature space [59]. The selection criterion to draw the hyperplane is to maximize the margin. The margin is defined as the distance between the hyperplane of separation and the training points that are close to the hyperplane. These points are also called support vectors. The SVM method has been developed to be applied to nonlinear problems using few support vectors [55,60]. Although authors including Zhao et al. [61] and Rahman et al. [62] report satisfactory evaluation statistics (R² greater than 0.93) for groundwater level forecasting, its interpretability is low.

Decision-Tree-Based Models: Regression Tree (TREE) and Random Forest (RF)
Like the SVM method, a division of the data set is performed in different and nonoverlapping regions with shared characteristics. Decision-tree-based models represent a suitable solution for applications on small-sized datasets [63]. Regression tree (TREE) represents a set of restrictions or conditions which are hierarchically organized and successively applied from root to a terminal node or leaf of the tree [64,65]. In practice, this can produce a very deep tree with many nodes, which produces overfitting. A good option is to prune the tree, i.e., adjust its maximum depth [55]. The induction of the decision-tree-based method involves (a) selecting optimal splitting of the dependent variable into binary pieces, where the child nodes are "purer" than the parent node, and (b) searching through all candidate splits to find the optimal split that minimizes the impurity of the resulting tree [63,64]. Decision-tree-based models allow the presentation of more understandable results. They can model nonlinear phenomena and do not need prior statistical assumptions, elimination of outliers or data transformation [66].
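The splitting and pruning ideas described above can be illustrated with a small 1-D regression tree. This is a sketch under stated assumptions: the impurity is the sum of squared deviations from the node mean, candidate thresholds are midpoints between consecutive distinct x values, and pruning is represented only by a maximum depth. The names `sse`, `build_tree` and `predict` and the data are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (impurity of a node)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(x, y, max_depth):
    """Grow a 1-D regression tree; a leaf stores the mean of its observations."""
    xs = sorted(set(x))
    if max_depth == 0 or len(xs) < 2:
        return sum(y) / len(y)
    best_cost, best_thr = None, None
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2  # candidate split between two distinct x values
        left = [yi for xi, yi in zip(x, y) if xi <= thr]
        right = [yi for xi, yi in zip(x, y) if xi > thr]
        cost = sse(left) + sse(right)
        if best_cost is None or cost < best_cost:
            best_cost, best_thr = cost, thr
    left = [(xi, yi) for xi, yi in zip(x, y) if xi <= best_thr]
    right = [(xi, yi) for xi, yi in zip(x, y) if xi > best_thr]
    return (best_thr,
            build_tree([p[0] for p in left], [p[1] for p in left], max_depth - 1),
            build_tree([p[0] for p in right], [p[1] for p in right], max_depth - 1))

def predict(tree, x_query):
    """Follow the splits from the root down to a leaf value."""
    while isinstance(tree, tuple):
        thr, left, right = tree
        tree = left if x_query <= thr else right
    return tree

# Hypothetical paired depths (cm): x = reference well, y = estimated well.
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [5.0, 5.0, 5.0, 20.0, 20.0, 20.0]
tree = build_tree(x, y, max_depth=1)
```

Predictions from such a tree are piecewise constant, so estimates change in steps as the reference-well depth crosses a split threshold.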
The random forest (RF) method combines multiple decision-tree-based models to produce repeated predictions of the same phenomenon [63]. RF is a relatively new machine learning technique [67]. The idea behind RF is to average multiple blocks to create a more robust model that has better generalization performance and is less susceptible to overfitting its training set [55,63]. The multiple blocks are also called deep decision trees, which individually suffer from high variance. RF is a popular approach due to its high precision and capability to handle a large number of input variables [68,69]. The number of trees and the number of features to be used at each split are the parameters to be determined during training [70]. There are also two other parameters to be established for RF training: the random state to control the random number generator used, and the minimum number of observations at the terminal nodes of the tree [63]. This methodology reports the best results (R² greater than 0.9) in hydrology applications (e.g., groundwater pollution and groundwater forecasting) in comparison to the set of data-driven methods explained before (R² between 0.5 and 0.9) [66,69,71,72].

Adaptive Boosting (ADABOOST)
The original idea of adaptive boosting was formulated by Robert E. Schapire [73] in 1990 and it became one of the most widely used ensemble methods in the following years [74,75]. The concept behind boosting is to focus on training samples that are difficult to classify [76], i.e., to let classifiers (called weak classifiers or weak learners) learn from poorly classified training samples to improve overall performance [55]. If the performance of each weak classifier is slightly better than random guessing, the final model can be shown to converge to strong learning. ADABOOST is adaptive in the sense that subsequent weak classifiers are tweaked in favour of those samples misclassified by previous classifiers. To maximize the predictive accuracy of ADABOOST, the following parameters must be defined [77]: the learning algorithm used to train the weak models (base estimator), the number of models to iteratively train (number of estimators) and the contribution of each model to the weights (learning rate).

Closing Considerations About Data-Driven Methods
All the above methods (general linear model-GLM, k-nearest neighbours-KNN, support vector machines-SVM, decision-tree-based methods-TREE, random forest-RF, and adaptive boosting-ADABOOST) have a common factor. Their performance in modelling groundwater hydrology requires a long time series of hydrological data to be trained [78], and these methods present overfitting during the training step [72]. Data-driven methods are sensitive to input measurements and all the previous methods use the same approach: the calibration data (infrequent data from the reference well and the estimated well) are split into two datasets, a training set and a validation set. Those methods are used to generate estimates even for the observed data, which is counterintuitive. Furthermore, discrepancies in the input data may be attributed to measurement errors, systematic bias, geographical distance between the sampling points or a combination of the above factors [79]. These observation uncertainties lead to a decrease in the accuracy of the prediction or even a problematic interpretation of the results. The latter can be even more problematic for models where the interpretability is low. To counteract the precision challenges of the six data-driven methods presented, it is recommended to consider a regional sensitivity analysis [80] and a physical background concept [61], where the observed regime types are considered in the estimation.

Time Series Decomposition (TSD) Method: The New Proposed Method
The time series decomposition (TSD) method, the new proposed method, defines the behaviour of the water table as the result of a local component (mainly drainage and irrigation) and a regional component (mainly precipitation and evapotranspiration). The intention remains to estimate the daily water table depth, in this case, as the sum of the regional and local components. The local component can be captured by a few measurements and the regional component can be captured from daily measurements in a few wells. The principle is shown in Figure 1a with real water table observations on a restored site with a controlled water table in Eastern Canada in 2017. The water table depths in the two wells are different but show similar patterns. If the observed period is split into 2-week periods (a normal frequency of water table observation), the trend of the water table depth for each period is defined by the water table depth at the beginning and the end of each period (Figure 1b). The differences between the two trends are caused by local management, mainly due to drainage and irrigation. For each well, the difference between the observed water table depth and its trend (Figure 1c) represents the daily fluctuation of the water table from the trend. Even though the two wells show different water table depths, the fluctuation from the trend is very similar and represents the influence of precipitation and evapotranspiration, which are regional in nature. This decomposition of water table fluctuations corresponds to two components: a deterministic component (trend) and an irregular component (daily fluctuation from the trend), which also includes the stationary processes [81]. Although this principle has been explored for discrete and continuous description of physical phenomena [82], how to find the functions that represent these two components remains unclear. Splitting the observed interval into time elements (called periods) and decomposing the water table depth into a trend and a fluctuation component are the basis of the proposed method.
For each period, the daily estimate of the water table depth in a well with infrequent measurements is the sum of the trend of the water table observed in that well and the daily fluctuation component derived from a reference well, in this case a nearby well with daily observations. As shown in Figure 1c, the daily fluctuation from the trend is nearly the same for the estimated well and the reference well. The water table depth of the estimated well can therefore be expressed by Equation (1):

$$h^{e}(t) = \lambda^{e}(t) + \rho^{r}(t) \qquad (1)$$

where h(t) is the daily water table depth, λ(t) refers to the trend component and ρ(t) is the daily fluctuation from the trend. Superscripts e and r represent the values for the estimated and the reference well, respectively.

The first step for this method is to divide the time scale into periods (time elements), bounded by the infrequent measurements (nodes). Then, the method determines the trend component for the estimated and the reference wells. For a period, the trend component (λ) can be described with the shape functions of a 1-D finite element (Equation (2)):

$$\lambda^{e}(t) = h_{1}\,\psi_{1}(t) + h_{2}\,\psi_{2}(t) \qquad (2)$$

where λ^e(t) refers to the trend component for a specific period, h_1 and h_2 are the observed values of the water table depth at the beginning and the end of the period, respectively, and ψ_1 and ψ_2 are called the partitions of unity. They are functions of t and are calculated by Equations (3) and (4):

$$\psi_{1}(t) = \frac{t_{2} - t}{t_{2} - t_{1}} \qquad (3)$$

$$\psi_{2}(t) = \frac{t - t_{1}}{t_{2} - t_{1}} \qquad (4)$$

where t_1 and t_2 are the times at the beginning and the end of each period. For the reference well, the superscript e is replaced by superscript r in Equation (2). Subsequently, the daily fluctuation component (ρ^r) is deduced with Equation (5):

$$\rho^{r}(t) = h^{r}(t) - \lambda^{r}(t) \qquad (5)$$

where h^r(t) is the observed water table depth in the reference well. The procedure described above is computed for each of the observed periods.
If more than one reference well is available, the estimation of the daily water table depth is made according to Equation (6) as an average over the reference wells:

$$h^{e}(t) = \lambda^{e}(t) + \frac{1}{m}\sum_{j=1}^{m} \rho^{r}_{j}(t) \qquad (6)$$

where ρ^r_j(t) is the daily fluctuation from the trend in reference well j and m is the number of reference wells used for the estimation.
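The TSD procedure can be sketched in a few lines. The sketch below assumes that the trend within a period is the linear interpolation between the two bounding measurements, that the fluctuation is the reference-well depth minus its own trend, and that several reference wells are combined by averaging their fluctuations. The function names (`trend`, `tsd_estimate`) and the data are hypothetical.

```python
def trend(t, t1, t2, h1, h2):
    """Linear trend between two nodes, using the two 1-D shape functions."""
    psi1 = (t2 - t) / (t2 - t1)
    psi2 = (t - t1) / (t2 - t1)
    return h1 * psi1 + h2 * psi2

def tsd_estimate(node_days, est_nodes, ref_daily_list):
    """Estimate daily depths for one well.

    node_days: sorted day indices of the infrequent measurements
    est_nodes: depths measured in the estimated well on node_days
    ref_daily_list: one daily series per reference well, indexed by day
    """
    m = len(ref_daily_list)
    estimates = {}
    periods = zip(zip(node_days, node_days[1:]), zip(est_nodes, est_nodes[1:]))
    for (t1, t2), (h1, h2) in periods:
        for t in range(t1, t2 + 1):
            # Mean fluctuation of the reference wells around their own trend.
            rho = sum(ref[t] - trend(t, t1, t2, ref[t1], ref[t2])
                      for ref in ref_daily_list) / m
            estimates[t] = trend(t, t1, t2, h1, h2) + rho
    return [estimates[t] for t in range(node_days[0], node_days[-1] + 1)]

# Hypothetical depths (cm): one reference well with daily data, and an
# estimated well measured only on days 0, 3 and 6.
ref = [10.0, 12.0, 11.0, 14.0, 12.0, 9.0, 8.0]
est = tsd_estimate([0, 3, 6], [20.0, 22.0, 15.0], [ref])
```

By construction, the estimate reproduces the manual measurements exactly on the measurement days, since the fluctuation of the reference well around its trend is zero at the nodes. No parameter is fitted at any point.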

Requirements for Testing Methods
To test the capacity of the previously described methods to estimate the daily water table depth, a site with the largest possible number of wells with daily water table measurements is required. Within the database for this project, the largest number of wells with daily water table depth observations was 30 wells, over a 2-year observation period. Each of these wells is treated in turn as infrequently sampled, retaining only bimonthly measurements. For each estimation method, the estimation model is trained with the bimonthly water table data extracted from each well and the associated reference well. Finally, the daily estimates of the water table depth are made with each method and are then compared with the daily water table observations. The procedure is carried out for each well located on the site.
(Table 1). Sphagnum moss was reintroduced over the five basins in 2013 according to an adaptation of the Moss Layer Transfer Technique [11,83].
The basins were located at the edge of an industrial bog on slightly decomposed peat (H3-H5 on the von Post scale, mean peat depth of 1.6 m). The section was in a slight topographic depression and it was surrounded on the northwest by an adjacent natural peat bog and on the southeast by a peat extraction field. Among the five basins, three had a peripheral channel (PC-NI, PC-20, PC-10) and two had a central channel (CC-10, CC-20).
Basins were irrigated with water coming from a sedimentation pond, which collected the drainage waters of the surrounding peat extraction fields, except for basin PC-NI, which only received rainfall. A pumping system fed the irrigation channels in each basin. The water level in channels was monitored by ultrasonic sensors installed at the dam, and when the water level was lower than the target level, the pumping system was activated to feed the channel. The maximum water level in a channel was controlled by the height of a dam, which was a wooden sluice gate that blocked the water flow and increased the water level upstream of the dam. This increase caused a favourable hydraulic gradient for groundwater flow within the peat for rewetting.

Water Table Depth Monitoring
Water  Figure 3). The data loggers were placed inside the 30 wells of the site to simultaneously record pressure and temperature. The wells were made of 2 in diameter PVC pipe. The wells were installed at a depth of approximately 70 cm using an auger. The wells had nylon stockings on the outer surface to prevent the entry of solids in suspension. All measurements were corrected with the air pressure Barologger Gold-Model 3001 (Solinst Canada Ltd., Georgetown, ON, Canada, accuracy: ±0.1 kPa) with the Solinst Levelogger Series software.
The daily value of the water table depth was estimated from hourly measurements as the average of the maximum value (h_max,i) and the minimum value (h_min,i) recorded during each day i of the growing season (Equation (7)):

$$h_{i} = \frac{h_{\max,i} + h_{\min,i}}{2} \qquad (7)$$
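The reduction from hourly to daily values can be sketched as follows, assuming a regular logging interval with a fixed number of readings per day. The function names and the `readings_per_day` parameter are hypothetical.

```python
def daily_midrange(depths_one_day):
    """Daily value as the mean of the daily maximum and minimum readings."""
    return (max(depths_one_day) + min(depths_one_day)) / 2

def daily_series(hourly_depths, readings_per_day=24):
    """Split a regular logger record into days and reduce each day to one value."""
    return [daily_midrange(hourly_depths[i:i + readings_per_day])
            for i in range(0, len(hourly_depths), readings_per_day)]

# Hypothetical record with two readings per day, for brevity (cm).
daily = daily_series([10.0, 14.0, 9.0, 11.0], readings_per_day=2)
```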

Bimonthly Measurements
Bimonthly measurements were extracted from the daily values of water table depth for each well. This time interval was chosen because it is the frequency with which field measurements are normally made. A total of 11 measurements were chosen per year, and the data of two years (2016 and 2017) were used. In other terms, a dataset of 660 bimonthly water table observations was assumed to be taken manually (22 measurements for each of the 30 wells). The water table records from level loggers were verified with the manual measurements taken on field outings. The reference of these measurements was the peat surface, which was levelled to obtain zero slope. The relative position of the wells was recorded with a Prismless Total Station-Model TC905 (Leica Geosystems AG, Heerbrugg, Switzerland). The extracted bimonthly values were considered the infrequent measurements of the water table depth. The observed daily values were used to evaluate the performance of the estimation methods. For each well, the reference well was chosen as the nearest well, which was generally not farther than 3.5 m.
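The extraction of infrequent values from a daily record can be sketched as below. The text does not state how the 11 measurement days were selected each year, so the even spacing used here is an assumption, as is the function name `extract_nodes`.

```python
def extract_nodes(daily, n_nodes):
    """Pick n_nodes evenly spaced readings, always including the first and last day."""
    last = len(daily) - 1
    days = [round(i * last / (n_nodes - 1)) for i in range(n_nodes)]
    return days, [daily[d] for d in days]

# Hypothetical 151-day season; the depth values are placeholders (cm).
season = [float(d % 7) for d in range(151)]
days, values = extract_nodes(season, n_nodes=11)
```

Under this assumption, a 151-day season sampled 11 times gives one measurement every 15 days.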
The method architectures of GLM, KNN, SVM, TREE, RF and ADABOOST are shown in Figures A1 and A2 of Appendix A.
To avoid model overfit and errors in out-of-sample estimations, leave-P-out cross-validation [88] was used to determine method hyperparameters for the KNN, SVM, TREE, RF and ADABOOST methods. For these cases, P was set to 2, so predictions were tested on all distinct samples of size P = 2, while the remaining n−2 samples formed the training dataset in each iteration. In calibration, 10 resamples of the training dataset were generated for each hyperparameter value to be assessed. A model was fit using each resampled dataset and used to predict the remaining observations. Minimizing the training and testing root-mean-square error (RMSE, Equation (8)) was the criterion for the selection of the hyperparameter. Table 2 shows the different hyperparameters used for the training of the methods, the estimated parameters for regression and the number of estimated parameters (k). For all methods except TSD, which does not require any training, the bimonthly values dataset was divided by the random split function (train_test_split) from the scikit-learn Python library [88]. The result of the split was two datasets: a training subset, which contained 80% of the data randomly selected, and a test subset with the remaining 20%. After the training and calibration of each method, daily estimates were generated and compared with the daily data originally observed. The test dataset was used to assess the generalization ability of the trained model.
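The leave-P-out idea can be illustrated with a short, library-free sketch that selects the number of neighbours for a simple KNN regressor. It is an illustration under stated assumptions, not the study's pipeline: only the held-out RMSE is minimized, no resampling is performed, and the names `knn_predict`, `leave_p_out_rmse` and `select_k` are hypothetical.

```python
from itertools import combinations
import math

def knn_predict(x_train, y_train, x_query, k):
    """Unweighted mean of the k nearest training observations."""
    nearest = sorted(zip(x_train, y_train), key=lambda p: abs(p[0] - x_query))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def leave_p_out_rmse(x, y, k, p=2):
    """RMSE over every held-out subset of size p (leave-P-out cross-validation)."""
    sq_errors = []
    for held_out in combinations(range(len(x)), p):
        train = [i for i in range(len(x)) if i not in held_out]
        x_tr = [x[i] for i in train]
        y_tr = [y[i] for i in train]
        for i in held_out:
            sq_errors.append((y[i] - knn_predict(x_tr, y_tr, x[i], k)) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))

def select_k(x, y, candidates, p=2):
    """Return the candidate k with the lowest leave-P-out RMSE."""
    return min(candidates, key=lambda k: leave_p_out_rmse(x, y, k, p))

# Hypothetical paired depths (cm): x = reference well, y = estimated well.
x = [4.0, 6.0, 9.0, 11.0, 15.0, 18.0, 21.0, 25.0]
y = [7.0, 8.5, 12.0, 13.0, 19.0, 21.5, 26.0, 30.0]
best_k = select_k(x, y, candidates=[1, 2, 3])
n_splits = math.comb(22, 2)  # number of held-out pairs for 22 observations
```

The number of splits grows quickly with the sample size: 22 observations already give 231 distinct held-out pairs for P = 2.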

Data Analysis and Method Performance
To quantify the degree of correspondence between the daily estimated and daily observed data, four criteria were considered: the coefficient of determination (R²), the root-mean-square error (RMSE), the Nash-Sutcliffe coefficient (NS), and the Akaike information criterion (AIC). These coefficients were calculated according to Equations (8)-(11), where h_i is the observed water table depth, ĥ_i is the estimated water table depth from each method, h̄ is the mean of the observed water table depth, N is the number of observations and k is the number of estimated parameters. The best fit between simulated and observed data shows an RMSE closer to zero, a lower AIC, and NS and R² closer to one. In this study, the RMSE and NS statistics are used to measure the method performance for forecasting water table depth and the AIC is used to compare the performance of methods regarding accuracy and complexity, whereas R² is used to analyze the linear regression goodness of fit between observed and estimated data. Moreover, the intercept and gradient of this regression should be close to zero and one, respectively; departures from these values indicate over- or under-prediction.
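For readers who want to reproduce such criteria, the sketch below implements commonly used forms of the four statistics. These forms are assumptions and may differ in detail from the expressions used in this study: R² is taken as the squared Pearson correlation, and the AIC uses the least-squares form N·ln(MSE) + 2k. All function names and data are hypothetical.

```python
import math

def rmse(obs, est):
    """Root-mean-square error between observed and estimated values."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))

def nash_sutcliffe(obs, est):
    """One minus the ratio of the error sum of squares to the observed variance sum."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - e) ** 2 for o, e in zip(obs, est))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - sse / sst

def r_squared(obs, est):
    """Squared Pearson correlation between observed and estimated values (assumed form)."""
    mo, me = sum(obs) / len(obs), sum(est) / len(est)
    cov = sum((o - mo) * (e - me) for o, e in zip(obs, est))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_e = sum((e - me) ** 2 for e in est)
    return cov ** 2 / (var_o * var_e)

def aic(obs, est, k):
    """Least-squares AIC, N*ln(MSE) + 2k (assumed form; other variants exist)."""
    n = len(obs)
    mse = sum((o - e) ** 2 for o, e in zip(obs, est)) / n
    return n * math.log(mse) + 2 * k

# Hypothetical observed and estimated depths (cm).
obs = [10.0, 12.0, 14.0, 16.0]
est = [11.0, 12.0, 15.0, 16.0]
scores = (rmse(obs, est), nash_sutcliffe(obs, est), r_squared(obs, est), aic(obs, est, k=2))
```

Note that NS penalizes bias, whereas the squared correlation does not, which is why the intercept and gradient of the observed-versus-estimated regression are examined alongside R².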

Impact on a Practical Application: Sum of Daily Deficit of Water Table Depth
The daily estimates of the water table depth can be used to quantify the annual water stress due to fluctuations of the water table depth in restored bogs. For this publication, the sum of the daily deficit of water table deeper than 15 cm (SDW₁₅) was used to study the error generated by daily estimates from the different methods on the computation of this indicator. This sum is computed for each well via Equation (12).
SDW₁₅ values were computed with the data from the 151 days of observation (20 May to 18 October) for both years (2016 and 2017). The SDW₁₅ from estimated and observed water table depths were compared using the same performance criteria as in the previous section.
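A minimal sketch of such an indicator is given below. It assumes that depth is expressed in centimetres, positive downward from the peat surface, and that the daily deficit is the amount by which the depth exceeds the 15 cm threshold; the exact form of Equation (12) may differ. The function name `sum_daily_deficit` is hypothetical.

```python
def sum_daily_deficit(daily_depths_cm, threshold_cm=15.0):
    """Sum over days of the depth in excess of the threshold, in cm·days."""
    return sum(max(d - threshold_cm, 0.0) for d in daily_depths_cm)

# Hypothetical daily depths (cm below the surface).
sdw15 = sum_daily_deficit([10.0, 15.0, 18.0, 22.5])
```

Days on which the water table is at or above the threshold contribute nothing, so the indicator grows only during periods of deep water table.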

Water Table Observations Statistics
As expected, the different water management systems (basins) resulted in variability of observed water table conditions (Table 3). For basins with a target water table depth of 10 cm (PC-10 and CC-10), the water table depth remained close to the surface for both years, with PC-10 having the least variation and a stable level. The basin without any control (PC-NI) was the treatment with the greatest variation in the water table depth. There are significant differences between the water tables observed between basins, except for PC-NI and PC-20.

Methods Performance
The results of the method performance evaluation are presented in the Taylor diagram (Figure 4) and in Table 4. The azimuth angle in the Taylor diagram represents the correlation coefficient (R, dashed lines), the radial distance represents the standard deviation of the estimated water table depth (SD, solid lines) and the semicircles centred at the "Observed" marker represent the root-mean-square error (RMSE, dash-dotted lines). Considering those performance metrics, the seven methods had an overall acceptable performance (R² greater than 0.80). The TSD method offers the best performance (R = 0.97, R² = 0.95 and RMSE = 2.48 cm). The GLM and SVM methods show similar performance (for GLM R = 0.96, R² = 0.92 and RMSE = 3.10 cm; for SVM R = 0.95, R² = 0.91 and RMSE = 3.24 cm). Finally, KNN, RF, ADABOOST and TREE were the lowest performing methods.
The accuracy and simplicity of the methods are also evaluated using the Akaike information criterion (AIC), which favours models with the lowest RMSE (accuracy) and with the minimum number of estimated parameters (simplicity). The model with the lowest AIC value is preferred, which in this case is the TSD method, with an AIC of 7628 (Table 4).
The accuracy of the TSD method, followed by the GLM and SVM methods, for daily water table depth estimation can also be observed in specific cases (Figure 5). Figure 5 (blue lines) presents a near-surface and stable water table depth (basin PC-10, an observation well approximately 1 m from the irrigation channel). Figure 5 (red lines) shows a more unstable and deeper water table (basin PC-NI, an observation well in the middle of the non-irrigated basin). In both cases, the estimates with the highest coefficient of determination were those made with the TSD, GLM and SVM methods. The different water management systems do not affect the order of the best-performing methods. When the variation of water table depth was larger, as in the well in basin PC-NI (Figure 5, red lines), the estimates were poor for the ADABOOST, TREE and RF methods, which show abrupt changes in water table depth that do not match the observed values. As evidence of this lower performance, the RMSE for these methods was higher than in the other cases (RMSE = 7.7 cm for ADABOOST, 7.2 cm for TREE and 4.8 cm for RF). When the variation of water table depth was minor, the model estimates were generally good (RMSE less than 4.5 cm).

Impact on the Computed Daily Indicator
To estimate the impact of the error produced by the different methods on the computation of the cumulative daily indicator, Equation (12) was used with the observed data and with the estimates from each of the seven methods. Table 5 shows the SDW15 computed for the six observation wells in each basin. The data show variability within each basin and between basins. Three groups were identified according to the Nemenyi multiple non-parametric comparison test. The first group comprises the wells where the water table depth remained, most of the time, above or close to 15 cm (a small SDW15 value, less than 260 cm·days). This group consists of the wells in basins PC-10 and CC-10. The second group comprises the wells with a high SDW15 value (greater than 1200 cm·days), which means that the water table was repeatedly below 15 cm and/or reached greater depths. This group consists of the wells located in basins PC-NI and PC-20. Finally, the third group consists of the wells in CC-20, which fall between the two previous groups.
Note to Table 5: means from observed data followed by different letters indicate differences, according to the Nemenyi nonparametric test.
Table 6 presents the performance criteria of the different methods for estimating SDW15. As expected, computing SDW15 from the water table depths estimated by the TSD method gives the best performance, with the highest R2 and the lowest RMSE. An RMSE of 131 cm·days is quite low relative to the range of computed SDW15 values (Table 5).
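The indicator can be sketched in a few lines of Python. Equation (12) is not reproduced in this part of the text, so the form used here is an assumption inferred from the description above: depth is measured positive downward, and each day contributes the amount (in cm) by which the water table lies deeper than 15 cm, which yields a value in cm·days. The function name `sdw` and its `threshold_cm` parameter are hypothetical.

```python
import numpy as np

def sdw(daily_depth_cm, threshold_cm=15.0):
    """Cumulative depth of the water table beyond a threshold (cm·days).

    Assumed form of the indicator: sum over days of max(0, depth - threshold),
    with depth measured positive downward from the surface.
    """
    depth = np.asarray(daily_depth_cm, dtype=float)
    excess = np.clip(depth - threshold_cm, 0.0, None)  # days above the threshold add 0
    return float(np.sum(excess))

# Hypothetical four-day record (cm below surface), for illustration only.
example_sdw15 = sdw([10.0, 15.0, 20.0, 30.0])
```

Under this reading, a well whose water table stays within 15 cm of the surface accumulates an SDW15 near zero, matching the first group described above.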

Selection of the Reference Well
For the evaluation of the different methods, the reference well was chosen as the nearest well (not farther than 3.5 m). The selection of the reference well (e.g., based on distance) can have an impact on the estimation performance. To test the impact of this selection (notably on the RMSE), the TSD method was applied in two cases:
• A reference well within the basin: one well was randomly selected per basin as the reference well, and the water table depths for the remaining five wells in the basin were re-estimated. This was done for every basin;
• A reference well within another basin: the same reference wells as in the previous case were used, but the daily water table depths were estimated for the wells of all basins. The procedure was repeated for each reference well, and each repetition is identified as a run in Table 7.
Variations in RMSE were calculated and are presented in Table 7, including the case using the nearest well. Changing from the nearest well to a random reference well in the same basin increased the RMSE by 12% (2.48 to 2.77 cm); changing to a random reference well in another basin increased it by 39% (2.48 to 3.45 cm) on average over five repetitions (runs). The increase caused by choosing a random reference well in another basin is expected because those other basins do not have the same water management or hydraulic network type, which may influence the daily fluctuation. Therefore, it is preferable that the reference well be chosen from wells belonging to the same basin. Table 7 also shows that the basins with higher water table depth fluctuation (PC-NI, CC-20, PC-20) show larger RMSE. Figure 6 shows that the error between observed and estimated water table level (hi − ĥi) is not influenced by the distance to the reference well belonging to the same basin.
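The percentage increases quoted for the two reference-well cases follow directly from the reported RMSE values. A short Python check, in which the helper name `relative_change` is introduced only for this illustration:

```python
def relative_change(baseline, value):
    """Percentage change of `value` relative to `baseline`."""
    return 100.0 * (value - baseline) / baseline

nearest_rmse = 2.48       # cm, nearest reference well
same_basin_rmse = 2.77    # cm, random reference well in the same basin
other_basin_rmse = 3.45   # cm, random reference well in another basin

same_basin_change = relative_change(nearest_rmse, same_basin_rmse)
other_basin_change = relative_change(nearest_rmse, other_basin_rmse)
print(f"{same_basin_change:.0f}% / {other_basin_change:.0f}%")  # prints 12% / 39%
```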

Measurement Frequency
The measurement frequency could influence the performance of the estimation methods. In addition to the bimonthly extraction, the same estimation procedure was repeated with the TSD method using weekly and monthly measurement frequencies. As shown in Table 8, the correlation coefficient increases when the measurement frequency is higher. Changing the measurement frequency to weekly measurements decreases the RMSE by 16% (2.08 cm); monthly measurements increase the RMSE by 13% (2.80 cm). The TSD method can be used with monthly measurements, but it leads to higher error.
According to the performance criteria shown in Figure 4 and Table 4, TSD yielded the lowest RMSE (a measure of the residual variance between observed and estimated data), the lowest AIC and the highest R2. The predictive accuracy of TSD is higher for two reasons:

• First, TSD uses an appropriate methodological principle. It estimates the daily water table depth as the result of a local component and a regional component, which is observed in real data (Figure 1). This type of method considers regional sensitivity [80] and uses a physical concept, which is advisable [61]. Moreover, the TSD method keeps the known data (bimonthly observations) for the estimated well. The other methods generate new data, even for the observed data, which is counterintuitive;
• Second, this method considers the time series properties, which the other methods do not. Time series data show day-to-day autocorrelation, which the TSD method captures through the trend component. The other methods treat daily data as independent. Furthermore, the TSD method also captures the local impact of daily phenomena (precipitation, irrigation) through the daily fluctuation around the trend of the reference well, without any additional step. Therefore, the estimated water table hydrograph is more realistic than those obtained by the remaining methods (Figure 5).
The TSD method is also interesting because it does not require any training and it is easy to implement. It can even be used for short observation periods. The testing of this method at other restored ombrotrophic peatlands will be of interest for generalization.
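The principle described above can be sketched in Python. This is one plausible reading of the description, not the paper's exact formulation, which is given in its methods section and may differ. The sketch assumes that the trend within each period is a straight line between consecutive manual readings and that the daily fluctuation of the reference well is added unchanged to the trend of the estimated well. The function name `tsd_estimate` and all variable names are hypothetical.

```python
import numpy as np

def tsd_estimate(obs_days, obs_depth, ref_daily):
    """Estimate a daily series from sparse readings plus a daily reference well.

    obs_days  : increasing day indices of the manual readings
    obs_depth : water table depth of the estimated well on those days (cm)
    ref_daily : daily water table depth of the reference well (cm)
    """
    obs_days = np.asarray(obs_days, dtype=int)
    obs_depth = np.asarray(obs_depth, dtype=float)
    ref_daily = np.asarray(ref_daily, dtype=float)
    days = np.arange(ref_daily.size)

    # Trend of the estimated well: assumed linear between consecutive readings.
    local_trend = np.interp(days, obs_days, obs_depth)
    # Trend of the reference well over the same periods.
    ref_trend = np.interp(days, obs_days, ref_daily[obs_days])
    # Daily fluctuation of the reference well around its own trend.
    fluctuation = ref_daily - ref_trend

    return local_trend + fluctuation

# Synthetic reference record and three hypothetical manual readings.
reference = 20.0 + 3.0 * np.sin(np.arange(31) / 4.0)
reading_days = [0, 15, 30]
reading_depths = [10.0, 12.0, 11.0]
estimate = tsd_estimate(reading_days, reading_depths, reference)
```

In this sketch the fluctuation term is zero on the days of the manual readings, so the estimated series passes through the known observations, and no parameter is fitted, which mirrors the two properties the text attributes to TSD.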

Estimation Performance by the Range of Water Table Depth Variation
Regardless of the estimation method, the wells with less variation of the water table level (SD less than 6 cm, as in the example in Figure 5, blue lines) show a lower R2 than the wells with greater variation of the water table level (SD greater than 8 cm, as in the example in Figure 5, red lines). However, the RMSE is lower for the wells with less variation of the observed water table. Because the range of variation is smaller, the denominator [∑(hi²) − (1/N) ∑(ĥi²)] in the calculation of R2 (Equation (8)) becomes smaller. This causes R2 to decrease, even though the RMSE is low.
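A small synthetic example makes this effect concrete. The Python sketch below applies the same estimation error to a stable well and to a strongly fluctuating well, then compares R2 and RMSE. All series are invented for illustration, and R2 is computed as the squared Pearson correlation, which is an assumption because Equation (8) is not reproduced here.

```python
import numpy as np

def r2_and_rmse(observed, estimated):
    """Squared Pearson correlation and RMSE between two series."""
    r = np.corrcoef(observed, estimated)[0, 1]
    rmse = np.sqrt(np.mean((observed - estimated) ** 2))
    return float(r ** 2), float(rmse)

days = np.arange(150)
# Identical synthetic estimation error (cm) applied to both wells.
error = 2.0 * np.sin(0.9 * days + 1.0)

# Synthetic observed hydrographs (cm below surface).
stable = 10.0 + 3.0 * np.sin(2 * np.pi * days / 50)      # SD of about 2 cm
unstable = 30.0 + 12.0 * np.sin(2 * np.pi * days / 50)   # SD of about 8.5 cm

r2_stable, rmse_stable = r2_and_rmse(stable, stable + error)
r2_unstable, rmse_unstable = r2_and_rmse(unstable, unstable + error)
```

With the error held equal, the RMSE is the same for both wells while R2 is clearly lower for the stable well, so a low R2 at a low-variation well does not by itself indicate a poor estimate.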

Estimation of Daily Indicator
According to the computed SDW15 values (Table 6), TSD is the method that gives the lowest RMSE. The performance ranking of the methods for estimating SDW15 follows an order similar to that for estimating the daily water table depth, which is not surprising. Because SDW15 is a sum, the probable error accumulates according to the square root of the number of measurements [90], in this case 151 measurements. The probable error of the SDW15 based on the estimates is expressed by Equation (13), where N is the total number of daily estimations and n is the number of bimonthly measurements. The probable error therefore decreases as more real measurements of the water table depth are made (in this case, bimonthly measurements). For this reason, TSD is an interesting estimation method, as the coefficient (N − n) can be reduced. For the other methods (such as SVM and GLM), n is 0 because the bimonthly measurements are used in the training stage and are not kept in the generated testing dataset. Since both (N − 0) and the RMSE are higher than in the TSD case, the probable error of the sum increases greatly.
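The argument can be illustrated numerically. Equation (13) is not reproduced in this text, so the sketch below assumes a root-sum-of-squares form in which each estimated day contributes one RMSE-sized error and each measured day contributes none, giving RMSE·√(N − n). This form, the function name `probable_error_of_sum` and the value n = 10 are assumptions for illustration only. The RMSE values are the daily water table RMSEs reported above for TSD and GLM.

```python
import math

def probable_error_of_sum(rmse_cm, n_daily, n_measured):
    """Assumed form of the probable error of a sum: RMSE * sqrt(N - n).

    Days with a real measurement are taken to contribute no estimation error.
    """
    return rmse_cm * math.sqrt(n_daily - n_measured)

N_DAYS = 151     # number of daily values, as stated in the text
N_MANUAL = 10    # hypothetical number of bimonthly readings kept by TSD

tsd_error = probable_error_of_sum(2.48, N_DAYS, N_MANUAL)  # TSD keeps the readings
glm_error = probable_error_of_sum(3.10, N_DAYS, 0)         # n = 0 for trained methods
```

Under these assumptions, the TSD error is smaller for two reasons at once: a lower daily RMSE and fewer estimated days.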

Choice of the Reference Well
Fluctuations in the water table depth are influenced by water inputs and outputs (precipitation, irrigation and evapotranspiration) and, above all, by the configuration and management of the hydraulic network of channels [14,23,26,30]. This explains why reference wells belonging to the same basin give a lower RMSE than reference wells belonging to a different basin. The reference well should preferably belong to the same hydraulic network and management regime as the estimated well. Table 7 also shows that in some basins (PC-NI, PC-20 and PC-10), the randomly selected reference well within the basin gave a lower RMSE than the original case (the nearest well). This suggests that the nearest well may not be the best choice and that other selection strategies may yield better results. This must be further investigated.

Conclusions
This paper identifies six methods from the literature (GLM, SVM, RF, KNN, ADABOOST and TREE) for estimating daily water table depth with bimonthly measurements and daily measurements from some reference wells. It also presents a new method (time series decomposition, TSD), which divides the time series into periods and, for each period, determines a trend component and a daily fluctuation component. These methods were used to estimate the daily water table depths over two years at a site with five Sphagnum cultivation basins, each with six observation wells. The TSD method was the best-performing method (R2 = 0.95, RMSE = 2.48 cm, NS = 0.95 and the lowest AIC), followed by GLM (R2 = 0.92, RMSE = 3.10 cm, NS = 0.92) and SVM (R2 = 0.91, RMSE = 3.24 cm, NS = 0.91).
The methods evaluated allow the computation of SDW15, a way of quantifying daily water stress. This indicator varies according to the location of the well and the basin type, with computed values between 0 and 2860 cm·days. The TSD method is the best method for computing SDW15 (R2 = 0.98, RMSE = 131 cm·days, NS = 0.98), which is not surprising.
The TSD method was also tested with weekly and monthly measurement frequency. Changing the measurement frequency to weekly measurements decreases the RMSE by 16% (2.08 cm) and monthly measurements increase the RMSE by 13% (2.80 cm) in comparison to bimonthly measurements (RMSE 2.48 cm).
It is preferable to choose the reference well from within the same hydraulic network and management regime. The distance from the reference well has no impact on the RMSE. The selection strategies for the reference well need further investigation. Further data collection would be of interest to test the performance of the TSD method at other sites and under other water table fluctuation regimes.
Figure A1. Machine learning architecture for the GLM, SVM, KNN and ADABOOST methods. The input layer is composed of the bimonthly water table levels of the estimated well (he) and the reference well (hr). The hidden layer depends on the method, as explained in Section 2.