Comparison of Four Methods for Vertical Extrapolation of Soil Moisture Contents from Surface to Deep Layers in an Alpine Area

The accurate estimation of moisture content in deep soil layers is usually difficult due to the associated costs, strong spatiotemporal variability, and nonlinear relationship between surface and deep moisture content, especially in alpine areas (where complications include extreme heterogeneity and freeze-thaw processes). In an effort to identify the optimal method for this purpose, this study used measurements of soil moisture content at three depths (4, 10, and 20 cm) in the upper parts of the Babao River basin in the Qilian Mountains, Northwest China. These measurements were collected in the HiWATER (Heihe watershed allied telemetry experimental research) program to test four vertical extrapolation methods: exponential filtering (ExpF), linear regression (LR), support vector regression (SVR), and the application of a type of artificial neural network, the radial basis function (RBF). SVR provided the best predictions, in terms of the lowest root mean squared error and mean absolute error values, for the 10 and 20 cm layers from surface layer (4 cm) measurements. However, the data also confirmed that freeze-thawing is an important process in the study area, which makes the infiltration process more complex and highly variable over time. Thus, we compared the vertical extrapolation methods’ performance in each of the four periods with differing infiltration characteristics and found significant among-period differences in each case. However, SVR consistently provided the best estimates, and all methods provided better estimates for the 10 cm layer than for the 20 cm layer.


Introduction
Soil moisture plays a crucial role in the climate system, particularly in water and energy exchanges between the land surface and the atmosphere [1–3], and in a myriad of environmental and ecological processes [4]. Inter alia, it strongly affects the distribution of precipitation by modulating processes including runoff, infiltration and surface storage, plant growth, and microbial population dynamics [5]. Thus, accurate monitoring or prediction of soil moisture content (SMC), especially in deep soil layers, is essential in land and agricultural management [6–10], particularly wherever water is not reliably abundant. However, this is far from straightforward, because SMC is affected by numerous factors, including soil properties, climate, land cover, and both biophysical and topographical surface characteristics; hence it is highly variable in space and time, especially at large scales [11–13].
There are many robust ways to collect surface soil moisture data, including portable sensors, cosmic-ray neutron probes, and ground-penetrating radar for fine-scale measurements [14,15], and various remote sensing techniques for larger-scale measurements [16,17]. Monitoring networks can also provide accurate, simultaneous, in situ measurements of soil moisture at different depths, which can be compiled in databases such as the North American Soil Moisture Database [5,18]. However, this requires […] major soil moisture anomalies prior to spring also putatively affect summer climate. The mechanism can be explained as follows: the longer the soil remains frozen, the more readily soil moisture anomalies can persist into spring, favoring wetter anomalies in spring soil moisture [48,58]. Thus, there are clear needs to understand the distribution of SMC in the focal area and to develop robust vertical extrapolation methods for estimating the moisture contents of deep soil layers from those of surface layers.

Study Area
The study area consisted of the upper part of the basin of the Babao River, a tributary of the Heihe River, in Northwest China [59]. The basin has an area of 2450 km² (latitude 37°43′–38°20′ N, longitude 100°05′–101°09′ E), an elevation ranging from 2339 to 4947 m, and an annual average temperature of −1 °C. The local climate can be characterized as semiarid and alpine cold, with 300–500 mm of annual precipitation associated with the southwest monsoon. Both temperature and precipitation vary significantly with elevation because of the steep topography and large variation in elevation [60]. The mean annual precipitation increases from about 250 mm in the low-mountain or hill zone to about 500 mm in the high-mountain zone, which has an elevation ranging from 2000 to 5500 m a.s.l. [60]. The spring temperature lapse rate is −0.48 °C/100 m, and the summer lapse rate is about −0.46 °C/100 m [61]. Vegetation in the basin is predominantly mountain forest and grassland, with extensive shrub meadow coverage. Due to the high variations in climate (including temperature and precipitation), topography, vegetation cover, SMC, and other pedological variables, there is remarkable spatial heterogeneity in the ecohydrological parameters of the Babao River basin [62]. Clearly, this heterogeneity must be thoroughly addressed to describe the ecohydrological processes and interactions adequately. To support such descriptions, there are four national meteorological stations in or near the Babao watershed. In addition, a program called HiWATER (http://westdc.westgis.ac.cn/) [59], including a "Waternet" (a wireless sensor network for monitoring soil moisture, land surface temperature, and precipitation), was established to observe changes in important hydrological factors in the area [63]. Positions of the Waternet points and the meteorological station in the study area are shown in Figure 1.

Datasets
Soil moisture and temperature data have been collected since 30 June 2013 at the 40 Waternet points distributed in the Babao watershed [63], at a 5 min frequency and at three depths (4, 10, and 20 cm), using Hydra Probe II sensors (Stevens Water Monitoring System, Inc., Portland, OR, USA). Many data points are missing, due to sensors being lost, damaged, or lacking power. Thus, data from 27 of the Waternet points were used, at a daily temporal scale, from 1 August 2013 to 12 September 2014. Here, soil moisture is the volumetric moisture content of unfrozen soil, measured in cm³/cm³. Daily precipitation records were obtained from the four national meteorological stations: Qilian, within the study area, and Minle, Tuole, and Yeniugou, located around it.

Exponential Filtering (ExpF)
Exponential filtering (ExpF) is a semi-empirical method, initially derived from a simple water balance equation [30] and subsequently presented in a recursive form [64] in the following equations:

$$SM_{t_n} = SM_{t_{n-1}} + K_n\left(x_n - SM_{t_{n-1}}\right)$$

$$K_n = \frac{K_{n-1}}{K_{n-1} + e^{-(t_n - t_{n-1})/T}}$$

where T is the optimal characteristic decay time, $SM_{t_n}$ (cm³/cm³) is the SMC in a focal deep layer at time $t_n$, $x_n$ is the surface soil moisture at time $t_n$, and $K_n$ is the gain at time $t_n$ (with $K_1 = 1$). As previously noted, T is insensitive to many factors [5]. In this study, the optimal T value for each depth was found by minimizing the root mean squared error (RMSE) and mean absolute error (MAE). The results are shown in Section 4.1.1.
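The recursive ExpF formulation can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation; the function name `exp_filter`, the use of days as the time unit, and initializing the deep-layer estimate with the first surface value (which is what a starting gain of 1 implies) are assumptions made for the example.

```python
import math


def exp_filter(times, surface, T):
    """Estimate deep-layer SMC from surface SMC with a recursive exponential filter.

    times   -- observation times t_n (days, assumed increasing)
    surface -- surface-layer (4 cm) soil moisture x_n, in cm3/cm3
    T       -- characteristic decay time, in the same unit as `times`
    """
    if not times or len(times) != len(surface):
        raise ValueError("times and surface must be non-empty and of equal length")
    if T <= 0:
        raise ValueError("T must be positive")

    gain = 1.0          # K_1 = 1, so the first estimate equals x_1
    sm = surface[0]     # SM_{t_1}
    estimates = [sm]
    for n in range(1, len(times)):
        dt = times[n] - times[n - 1]
        # K_n = K_{n-1} / (K_{n-1} + exp(-(t_n - t_{n-1}) / T))
        gain = gain / (gain + math.exp(-dt / T))
        # SM_{t_n} = SM_{t_{n-1}} + K_n * (x_n - SM_{t_{n-1}})
        sm = sm + gain * (surface[n] - sm)
        estimates.append(sm)
    return estimates


# Usage with hypothetical daily surface readings (cm3/cm3).
days = [0, 1, 2, 3, 4]
x_4cm = [0.20, 0.35, 0.30, 0.25, 0.22]
print(exp_filter(days, x_4cm, T=6.0))
```

A larger T makes each gain smaller, so the deep-layer series responds more slowly and smoothly to surface changes; a very small T makes the estimate track the surface layer almost exactly.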

Support Vector Regression (SVR)
SVR is the application of SVM learning to complex regression problems [65]. It is based on structural risk minimization, in contrast to the empirical risk minimization (minimizing error on a known set of training data) that is the general aim of statistical learning systems such as ANNs [66]. Structural risk minimization seeks good generalization capacity by minimizing an upper bound on the generalization error rather than the training error alone [41,67]. Fundamentally, SVR involves nonlinear mapping of the primary data into a higher-dimensional feature space, where linear regression is performed, with a kernel function used to compute inner products in that space [41,68–70]. Several types of kernels can be used (e.g., polynomial, linear, or sigmoid), but the RBF kernel reportedly provides the best performance [71–74]. To identify the optimal choice in this study, we compared the performance of SVR with several kernels. The results are shown in Section 4.1.2.
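To make the kernel comparison concrete, the four kernel functions can be written out as below. The parameterizations (`gamma`, `coef0`, `degree`) follow the common LIBSVM convention, and the default values are placeholders, not the values used in this study. In practice an SVR implementation such as scikit-learn's `sklearn.svm.SVR`, whose `kernel` argument accepts `"linear"`, `"poly"`, `"rbf"`, and `"sigmoid"`, evaluates these internally; the plain-Python versions here only show what each kernel computes.

```python
import math


def linear_kernel(u, v):
    """k(u, v) = <u, v>"""
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(u, v, gamma=1.0, coef0=1.0, degree=3):
    """k(u, v) = (gamma * <u, v> + coef0) ** degree"""
    return (gamma * linear_kernel(u, v) + coef0) ** degree


def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    """k(u, v) = tanh(gamma * <u, v> + coef0)"""
    return math.tanh(gamma * linear_kernel(u, v) + coef0)


def rbf_kernel(u, v, gamma=1.0):
    """k(u, v) = exp(-gamma * ||u - v||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


KERNELS = {
    "linear": linear_kernel,
    "poly": polynomial_kernel,
    "sigmoid": sigmoid_kernel,
    "rbf": rbf_kernel,
}


def gram_matrix(samples, kernel):
    """Pairwise kernel values K[i][j] = k(samples[i], samples[j])."""
    return [[kernel(a, b) for b in samples] for a in samples]


# Hypothetical surface-moisture inputs, each a one-element feature vector.
x = [[0.12], [0.25], [0.31]]
for name, k in KERNELS.items():
    print(name, gram_matrix(x, k))
```

The Gram matrix is what the SVR optimization actually works with, so the mapping into the higher-dimensional feature space never has to be computed explicitly.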

Radial Basis Function (RBF)
ANNs have received a great deal of attention in recent years due to their ability to overcome non-linearity and multicollinearity problems, and numerous types have been developed [75–79]. The RBF network is a multi-layer feed-forward ANN whose architecture and training algorithms are simpler and clearer than those of back-propagation networks [80,81]. Furthermore, it can be trained more quickly than a multilayer perceptron network [82]. Thus, in this study, the RBF network was selected as a representative form of ANN. It typically has a feed-forward structure with three layers: the input layer, the hidden layer (with a non-linear RBF activation function), and the linear output layer. Each layer may have one or more neural elements. This type of network has self-organizing characteristics that allow adaptive determination of the hidden neurons during training [83]. Each input neuron is connected to all hidden neurons, and the hidden and output neurons are connected by a set of weights [79]. In this study, to estimate SMC values in the deep layer, the inputs (x) were the measured SMC values for the surface layer and the deep layer during the training period. These values were used to form the input vector $S = [x_{training}, SM_{training}]^T$ and to determine the weights $\lambda^{(i)}$ to apply in formula [5]. Here, a Gaussian function was selected for the RBF network. Thus, the response of the ith hidden neuron to an input S is given by:

$$\phi\left(\lVert S - c^{(i)} \rVert\right) = \exp\left(-\frac{\lVert S - c^{(i)} \rVert^2}{2\sigma^2}\right)$$

where φ(·) is the Gaussian activation function, ‖·‖ denotes the Euclidean distance, $c^{(i)}$ is the center of the ith neuron in the hidden layer, and σ is the width of the hidden neuron, which can be computed as [84]:

$$\sigma = \frac{d_{max}}{\sqrt{2T}}$$

where $d_{max}$ is the maximum distance between the centers of the hidden neurons and T is the number of hidden neurons.
Finally, the RBF model is established as in the following formula: where $x_{training}$ is the surface soil moisture and $SM_{training}$ is the measured soil moisture in the deep layer during the training period, both in cm³/cm³. Then, the surface soil moisture at the selected prediction times is input into the trained RBF model to obtain predicted soil moisture values for the deep layer. When the RBF network is applied in MATLAB, its performance is influenced by the spread (variance); the results obtained with spreads ranging from 0.0001 to 0.5 are shown in Section 4.1.3.

Linear Regression (LR)
Linear regression was used to construct site-specific linear relationships between SMC in the near-surface (4 cm) layer and at deeper depths using the following formula [5]:

$$SM_{t_n} = a\,x_n + b$$

where $SM_{t_n}$ and $x_n$ are as above, a is the regression coefficient, and b is a constant.
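Fitting a and b can be sketched with the standard ordinary least-squares estimates; the function names are hypothetical, and least squares is assumed here because the text does not state the fitting procedure.

```python
def fit_linear(x, sm):
    """Least-squares estimates of a (slope) and b (intercept) in SM = a * x + b."""
    n = len(x)
    if n != len(sm) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_sm = sum(sm) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("surface values are constant; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_sm) for xi, yi in zip(x, sm))
    a = sxy / sxx
    b = mean_sm - a * mean_x
    return a, b


def predict_linear(a, b, x):
    """Apply SM = a * x + b to new surface values."""
    return [a * xi + b for xi in x]


# Hypothetical training pairs for one site: 4 cm vs 10 cm SMC (cm3/cm3).
x_train = [0.18, 0.22, 0.27, 0.31, 0.35]
sm_train = [0.20, 0.23, 0.27, 0.30, 0.33]
a, b = fit_linear(x_train, sm_train)
print(predict_linear(a, b, [0.24, 0.29]))
```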

Calibrating Methods
We used the MAE and RMSE to evaluate the moisture contents in deep soil layers derived using the four methods [85,86], with the following formulas:

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|SM_{t_i} - SM^{*}_{t_i}\right|$$

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(SM_{t_i} - SM^{*}_{t_i}\right)^2}$$

where $SM_{t_i}$ (cm³/cm³) is the observed soil moisture, $SM^{*}_{t_i}$ (cm³/cm³) is the value predicted by one of the four extrapolation methods, and n is the number of data points during a focal period.
Both MAE and RMSE quantify errors between predicted and observed values. By definition, both measures are non-negative and equal zero only if the predicted values are identical to the observed values [85–87]. MAE weights all errors equally and thus reflects the typical magnitude of the error between predicted and measured values, whereas RMSE, by squaring the errors, gives more weight to large deviations from the true values.
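The two metrics can be computed as below; this is a minimal sketch with hypothetical function names, assuming the observed and predicted series are already aligned day by day.

```python
import math


def _check(observed, predicted):
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and of equal length")


def mae(observed, predicted):
    """Mean absolute error: mean of |SM - SM*|."""
    _check(observed, predicted)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean squared error: square root of the mean of (SM - SM*)^2."""
    _check(observed, predicted)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


# Hypothetical observed and predicted 10 cm SMC (cm3/cm3).
obs = [0.25, 0.30, 0.28]
pred = [0.24, 0.33, 0.28]
print(mae(obs, pred), rmse(obs, pred))
```

RMSE is always at least as large as MAE, and the two are equal only when all errors have the same magnitude, which is why a gap between them signals a few large errors.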

Calibration of Three Methods in the Alpine Region
For all the tested methods, with the exception of LR, there were important parameters that strongly affected their performance, such as the T parameter in ExpF, which has been reported to be insensitive to location and season [5]. The kernels in SVR and the spread in RBF also affected the performance of the respective methods.
Thus, a key step was to identify the best parameters for the ExpF, SVR, and RBF methods in order to calibrate the functions for use in the alpine study area. For this, cross-validation was applied with 20 repetitions, in each case using a third of the dataset (randomly extracted) for training, and a third as the calibration set. The performance of the ExpF, SVR, and RBF methods with different parameters was then evaluated, as described in the following sections.
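The repeated random-split procedure can be sketched as follows. `repeated_random_splits` is a hypothetical helper, the seed is arbitrary, and leaving the last third of each shuffle unused is an assumption, because the text specifies only the training and calibration thirds.

```python
import random


def repeated_random_splits(n_samples, repetitions=20, seed=0):
    """Yield (train_idx, calib_idx) pairs, each holding a random third of the samples.

    The remaining third of each shuffle is left out, because the text does not
    say how (or whether) it was used; this is an assumption of the sketch.
    """
    rng = random.Random(seed)
    third = n_samples // 3
    for _ in range(repetitions):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield sorted(idx[:third]), sorted(idx[third:2 * third])


# Usage: for each candidate parameter, fit on train_idx and score RMSE/MAE on
# calib_idx in every repetition, then average the 20 scores.
first_train, first_calib = next(repeated_random_splits(300))
print(len(first_train), len(first_calib))
```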

Calibration of ExpF
Previous studies have shown that the decay time (T) is affected by soil depth or thickness [5,24]. The optimal value increases with soil depth, but also varies significantly among areas. Thus, we sought the optimal T for the 10 and 20 cm depths in the study area. Figure 2 presents the results obtained with T ranging from 0.01 to 30, indicating the values that minimized RMSE and MAE for SMC at each depth. As in previous studies, the effects of changes in T on the method's performance depended on the layer, and performance was poorer for the 20 cm layer than for the 10 cm layer when T exceeded 3. For both layers, performance initially improved and then deteriorated as T increased, and, for data collected from all stations, the optimal decay times were 6 and 0.5 for the 10 and 20 cm layers, respectively.
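The grid search over T can be sketched as below. The helpers are hypothetical re-implementations (a compact copy of the recursive ExpF filter plus RMSE and MAE), the grid only spans the 0.01 to 30 range mentioned in the text, and the synthetic series is illustrative, not study data.

```python
import math


def exp_filter(times, surface, T):
    """Compact recursive exponential filter (gain starts at 1)."""
    gain, sm = 1.0, surface[0]
    out = [sm]
    for n in range(1, len(times)):
        gain = gain / (gain + math.exp(-(times[n] - times[n - 1]) / T))
        sm += gain * (surface[n] - sm)
        out.append(sm)
    return out


def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def search_decay_time(times, surface, deep_obs, t_grid):
    """Return (best T by RMSE, best T by MAE, table of (T, RMSE, MAE))."""
    table = []
    for T in t_grid:
        est = exp_filter(times, surface, T)
        table.append((T, rmse(deep_obs, est), mae(deep_obs, est)))
    best_rmse = min(table, key=lambda row: row[1])[0]
    best_mae = min(table, key=lambda row: row[2])[0]
    return best_rmse, best_mae, table


# Synthetic check (hypothetical data): a deep series generated with T = 4
# should be recovered when 4 is on the grid.
days = list(range(60))
x_4cm = [0.25 + 0.1 * math.sin(d / 3.0) for d in days]
deep = exp_filter(days, x_4cm, 4.0)
grid = [0.01, 0.5, 1, 2, 3, 4, 6, 10, 20, 30]
print(search_decay_time(days, x_4cm, deep, grid)[:2])
```

Reporting the RMSE-optimal and MAE-optimal T separately makes it visible when the two criteria disagree.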

Calibration of the SVR Function
The SVR method's performance strongly depends on the kernel, which is incorporated to simplify computation of the inner products of the transformed data in the feature space [73]. Determining the kernel and error penalty parameters for SVR is very problem-dependent in practice. As previously mentioned, four types of kernels can be used. Thus, we compared the performance of the method with each of these four types for the 10 and 20 cm layers; the results are shown in Figure 3. Minimization of RMSE and MAE at each depth showed that the optimal kernel function for both layers was the RBF, which is in accordance with many previous studies [71,73,88]. Thus, the RBF was selected as the kernel for the SVR method in the subsequent analysis.

RBF Calibration
The spread was one of the key factors affecting the RBF method's performance. Use of spread constants that are too large and too small resulted in underfitting and overfitting, respectively, due to the excessive and insufficient influence of general trends relative to finer-scale variations [89]. Thus, we compared the RBF method's performance with different spreads and sought the optimal spread parameters for the study area. Figure 4 shows the RMSE and MAE values obtained with spreads ranging from 0.0001 to 0.5 for the two soil layers. The method's performance for both layers first improved, then deteriorated as the spread increased, and the minimum RMSE and MAE were obtained with 0.01 and 0.05 spread values for the 10 and 20 cm layers, respectively. Thus, these values were selected as the optima.

Comparison of the Methods for the Entire Period
The performance of the four vertical extrapolation methods for predicting SMC in the 10 and 20 cm layers during the same test period (21 February to 21 August 2014), using the same training period (1 August 2013 to 20 February 2014) so that the comparison was valid, is shown in Figure 5. Both the MAE and RMSE values indicated that the SVR method consistently provided the best SMC estimates for both soil layers, and that all methods provided better estimates for the 10 cm layer than for the 20 cm layer. In addition, the ExpF estimates changed most smoothly over time, while those of the other three methods showed similar, more substantial fluctuations.

Sub-Division of the Study Period to Assess Freeze-Thaw Effects
Numerous factors affect soil moisture contents at different scales in every area [88,90]. For the focal environment and scale here, the most important variables were deemed to be soil temperature, the normalized difference vegetation index (NDVI), and precipitation, all of which significantly affect SMC according to previous studies [52,91–95]. Figure 6a shows observed changes in daily moisture contents in the three soil layers and precipitation in and around the study area. As shown in Figure 6a

Figure 6b shows the daily SMC of the three layers and NDVI in each month during the study period. The SMC generally increased with increases in NDVI. However, in November, the NDVI was 0.203, close to the lowest recorded value in the study, but the SMC was still relatively high. Similarly, in both April and May, the NDVI values were still low, but the soil SMC values were among the highest in the study period.
Comparison of the measurements of SMC and soil temperature in the three layers clearly showed that soil temperature also affected SMC (Figure 6c). The coefficients of correlation between SMC in the three layers and precipitation, soil temperature, and NDVI showed that the correlations with soil temperature were much larger than the others (Table 1), indicating that soil temperature had significant impacts on SMC. When the temperature was below a threshold (around 0 °C), SMC quickly declined to a low level, whereas above the threshold it rapidly increased to a higher level. This can be largely attributed to freeze-thawing of the soil, which affects SMC by changing the soil water phase (Wang et al., 2020). As soils freeze and their liquid water content declines, a strong gradient in matric potential develops, which drives water toward the freezing front [5,93]. When the soil is completely frozen, a small amount of water is still present in a "supercooled" state [96], and when the soil rapidly thaws in spring, SMC quickly increases.

Comparison of the Four Methods' Performance for Estimating SMC in the Four Sub-Periods
As described in Section 4.3.1, the methods' performance significantly varied with time, which was apparently due to shifts in SMC between liquid and ice states. Thus, freeze-thaw effects on the performance of each extrapolation method were assessed in more detail. MAE and RMSE values obtained for SMC in the 10 and 20 cm layers during the four mentioned sub-periods clearly confirmed that all four methods' performance varied among these sub-periods (Figure 7).

For estimating SMC in the 10 cm layer, the ExpF method's performance was best for period 3, followed by periods 2, 4, and 1, while the other three methods' performance was best for period 3, followed by periods 1, 4, and 2. In addition, the ExpF method provided substantially better estimates than the other methods for period 2, but substantially worse estimates for period 1. For estimating SMC in the 20 cm layer, the ExpF method's performance was best for period 3, followed by periods 2, 4, and 1, while the other three methods' performance was best for period 3, followed by periods 1, 4, and 2. The LR method provided particularly poor estimates for SMC in this layer during period 2.

Comparison of the Methods for Estimating SMC in Different Layers
As previously mentioned, there were clear between-depth differences in the performance of each extrapolation method, both over the entire period and during the four sub-periods. Thus, coefficients of correlation between surface (4 cm) SMC measurements and those in the 10 and 20 cm layers were calculated; as shown in Table 2, all were significant at the p < 0.05 level. However, correlation coefficients were significantly larger for the 10 cm layer than for the 20 cm layer in periods 1 and 3. In addition, coefficient of variation (CV) values for the 20 cm layer increased across the four periods in the order 2 < 3 < 1 < 4, while the corresponding order for the 10 cm layer was 3 < 1 < 4 < 2. CV values were also larger for the 20 cm layer than for the 10 cm layer in every period except period 2.
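Per-period CV values and their ranking can be computed as sketched below. The helper names are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics


def coefficient_of_variation(values):
    """CV = standard deviation / mean (sample standard deviation assumed)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return statistics.stdev(values) / mean


def rank_periods_by_cv(smc_by_period):
    """Return {period: CV} and the periods ordered from lowest to highest CV."""
    cvs = {p: coefficient_of_variation(v) for p, v in smc_by_period.items()}
    return cvs, sorted(cvs, key=cvs.get)


# Hypothetical daily 20 cm SMC values (cm3/cm3) grouped by sub-period.
smc_20cm = {
    1: [0.31, 0.30, 0.29, 0.32],
    2: [0.28, 0.22, 0.15, 0.10],
    3: [0.08, 0.08, 0.09, 0.08],
    4: [0.10, 0.18, 0.27, 0.31],
}
cvs, order = rank_periods_by_cv(smc_20cm)
print(cvs, order)
```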

Effects of Environmental Factors on Method Performance
As described in Section 4.2, four periods with clearly differing SMC regimes were discerned during the total study period, when SMC was relatively high and stable, rapidly decreasing, low and stable, and rapidly increasing (designated periods 1-4, respectively). Changes in temperature were the main reasons for variations in periods 2, 3, and 4.
In a previous study, temperature clearly affected the vertical extrapolation methods' performance and was therefore used to divide the annual SMC cycle in Oklahoma, USA, into two periods: warm and cold [5]. The cited authors found that the vertical extrapolation methods they tested provided better estimates of SMC in the warm season than in the cold season, at least for the 60–75 cm layer. The focal area in this study is much more mountainous and much colder than Oklahoma; thus, soil moisture is much more variable and freeze-thawing is a much more important process [97]. There are clearly four differing seasonal periods, and the differences among them substantially affect the methods' performance, even for estimating SMC in a shallow layer (ca. 10 cm), as noted above. When freezing begins, SMC rapidly decreases and, once most of the soil moisture has been converted from liquid water to ice, remains stable until thawing begins in spring; SMC then quickly increases and stabilizes again once almost all the ice has changed back to liquid water. Furthermore, freeze-thawing also affects infiltration [98], thus causing major fluctuations in SMC and in the extrapolation methods' performance.

Mechanistic Reasons for Differences in the Methods' Performance
A comparison of the results obtained with the four extrapolation methods showed that the SVR method provided the best estimates of SMC in both the 10 and 20 cm layers during the total study period, followed by the ExpF, RBF, and LR methods. There are three clear mechanistic reasons for the differences in their performance. First, the relationship between surface and deep SMC is non-linear [6–8]. Thus, the LR method (the only linear approach tested) could not estimate deep SMC from surface measurements as accurately as the non-linear methods, as shown by this study and that of Zhang et al. (2017) [5]. Second, unlike the other two non-linear methods, the ExpF function extrapolates deep SMC directly from surface soil moisture measurements without using training data to learn the surface–deep relationship [30]. Thus, it should perform much better when the vertical infiltration of soil moisture is relatively fast, or the interaction between surface and deep soil moisture is rapid, than in other cases. A previous study found that it performs better than ANN methods because it better captures the relative variability of, and correlation between, SMC at different depths [5]. We also found that it performed better than the RBF method. However, infiltration patterns in the focal alpine area are very complex, and because the ExpF method assumes that absolute water contents are similar among layers, it performed less well than the SVR method over the entire period. Third, the SVR and RBF methods are very similar: both predict values with functions fitted to training data by minimizing differences between observed and fitted values.
However, SVR is superior to the RBF method, partly because SVR is based on structural risk minimization, minimizing an upper bound on the generalization error, whereas the RBF method is based on empirical risk minimization, i.e., minimizing the training error. Moreover, SVR provides a globally optimal solution and its parameter selection is less complex, whereas the RBF method is prone to higher uncertainty during the fitting process and less readily reaches optimal solutions in many cases [44,99], especially over longer forecasting horizons [100]. Because its optimization goal is to minimize the structural rather than the empirical risk, SVR has excellent generalization ability and has become one of the most commonly used and effective methods. It can provide much better results than other algorithms with small training sets, reducing requirements on dataset size and data distribution by maximizing margins between classes and thereby acquiring a structural description of log data distributions. In summary, due to the complexity of vertical infiltration processes, non-linear methods are intrinsically more suitable than linear methods; SVR performs better than the other two tested non-linear methods (RBF and ExpF); and SVR is both much faster and performs significantly better than the RBF method.

Effects of the Freeze-Thaw Process on Method Performance
As already mentioned, the performance of the four extrapolation methods for estimating SMC significantly differed among the four identified periods. For the 10 cm layer, the ExpF method provided the best estimates in period 3, followed by periods 2, 4, and 1, while the performance of the other three methods was maximal in period 3 followed by periods 1, 2, and 4. For the 20 cm layer, the LR method provided the best estimates in period 1, followed by periods 3, 4, and 2, while the corresponding sequence for the other methods was periods 2, 1, 3, and 4. There are three main reasons for these variations. First, differences in SMC variability among the periods affected the methods' performance.
The CV values were lowest in period 3, followed by periods 1, 4, and 2 for the 10 cm layer, while the corresponding order for the 20 cm layer was 2, 1, 3, and 4 (Table 2). This matters because, as previous authors have noted, increases in soil moisture variability tend to impair the performance of calibration methods [101,102]. Second, there are clear mechanistic reasons for the differences in the methods' performance. Unlike the other methods, the ExpF method predicted the SMC for a given moment (day) from values at the previous moment: both the measured surface SMC and the predicted deep SMC at the preceding moment determined the deep SMC at a given moment. Thus, the variations in CV values among periods did not significantly affect the ExpF method's performance. Third, the LR method's performance across periods departed substantially from the trend in CV values, largely because of the complex, non-linear vertical movement of soil moisture. The responses of deeper layers to changes in the surface layer lagged by depth-dependent times; for the 10 cm layer, the correlation coefficients between SMC at a given moment and surface SMC at the preceding moment during periods 1 to 4 were 0.772, 0.835, 0.930, and 0.684, respectively, all significant at the p < 0.05 level. With increasing depth, this correlation weakened and eventually disappeared. SVR and RBF are both self-learning methods that learn the relationship between SMC in the surface soil and in deeper layers at different moments from training data. As a result, their performance was not significantly correlated with the CV values.
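The two mechanisms discussed above can be sketched in Python. For ExpF, the sketch assumes the recursive formulation of the exponential filter described by Albergel et al. (2008), in which the deep-layer estimate at day t_n is updated from the previous estimate and the current surface observation with a gain that depends on a characteristic time length T. The study's exact variant, any rescaling of the output to volumetric SMC, and its value of T are not given in this section, so the function names and T are hypothetical. The second helper computes the lag-1 Pearson correlation between deep SMC on day t and surface SMC on day t − 1, the statistic quoted for the 10 cm layer.

```python
import math


def exponential_filter(surface_smc, times, T):
    """Recursive exponential filter for a deep-layer soil water index (assumed form).

    surface_smc: surface-layer SMC (e.g. 4 cm) in time order
    times: observation times in days, strictly increasing
    T: characteristic time length in days (hypothetical tuning parameter)
    """
    if not surface_smc:
        return []
    estimates = [surface_smc[0]]  # initialise with the first surface value
    gain = 1.0                    # K_1 = 1
    for n in range(1, len(surface_smc)):
        dt = times[n] - times[n - 1]
        # K_n = K_{n-1} / (K_{n-1} + exp(-dt / T))
        gain = gain / (gain + math.exp(-dt / T))
        prev = estimates[-1]
        # SWI_n = SWI_{n-1} + K_n * (ms(t_n) - SWI_{n-1})
        estimates.append(prev + gain * (surface_smc[n] - prev))
    return estimates


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def lagged_correlation(surface, deep, lag=1):
    """Correlate deep SMC on day t with surface SMC on day t - lag."""
    if len(surface) != len(deep):
        raise ValueError("series must be aligned day by day")
    return pearson(surface[:len(surface) - lag], deep[lag:])
```

Fitting T (for example by minimizing RMSE against deep-layer observations) is left out; the recursion is shown only to make explicit why each ExpF estimate depends on the previous estimate rather than on the current surface value alone.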

Performance of the Extrapolation Methods at Different Depths
In accordance with findings by Zhang et al. (2017) [5], all four methods provided consistently better estimates of SMC in the 10 cm layer than in the 20 cm layer, both over the whole study period and within each of the four periods. SMC also had a smaller CV in the 10 cm layer than in the 20 cm layer, both for the entire period (0.471 and 0.527, respectively) and for each of the four sub-periods. The CV is widely used to describe the variability of geographical variables because it is dimensionless. A higher CV indicates greater relative variability in soil moisture, and the performance of calibration methods tends to decline as this variability increases [101,102], which explains why all four extrapolation methods performed better for the 10 cm layer than for the 20 cm layer.
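The CV used above is the standard deviation divided by the mean. A minimal Python sketch follows; whether the study used the sample or the population standard deviation is not stated in this section, so the `sample` switch is an assumption, and the helper names are hypothetical.

```python
import statistics


def coefficient_of_variation(values, sample=True):
    """CV = standard deviation / mean (dimensionless).

    sample=True uses the n-1 (sample) standard deviation,
    sample=False the population standard deviation (assumption: either may apply).
    """
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return sd / mean


def cv_by_period(smc_by_period, sample=True):
    """Compute the CV separately for each period, e.g. {1: [...], 2: [...]}."""
    return {period: coefficient_of_variation(vals, sample)
            for period, vals in smc_by_period.items()}


def rank_by_cv(smc_by_period, sample=True):
    """Return period labels ordered from lowest to highest CV."""
    cvs = cv_by_period(smc_by_period, sample)
    return sorted(cvs, key=cvs.get)
```

Applied to each layer's per-period SMC series, `rank_by_cv` produces an ordering of the kind reported in Table 2.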

Conclusions
In the present study, we used daily records of soil moisture, soil temperature, and precipitation in an alpine area of the Qilian Mountains to assess four extrapolation methods (SVR, ExpF, RBF, and LR) for predicting SMC in the 10 and 20 cm soil layers from surface soil measurements, taking into account the variance of soil moisture and the effects of freeze-thaw processes. There are three main conclusions:
1. The freeze-thaw process significantly affected SMC and the performance of the extrapolation methods, as it greatly increased the complexity and temporal heterogeneity of infiltration processes. Environmental factors (e.g., soil temperature, precipitation, and NDVI) were also clearly correlated with soil moisture. Both the correlation analysis and the extrapolation methods' performance showed that the freeze-thaw process caused major fluctuations in SMC. Thus, the influence of the freeze-thaw process should be considered when applying extrapolation methods in an alpine area;
2. SVR can adapt to the variance in the data by training its function through self-learning, even with small datasets. Thus, SVR is the most suitable of the tested methods for extrapolating SMC values, particularly in areas with complex environmental factors and soil moisture movements;
3. The performance of the extrapolation methods was correlated with the variability of soil moisture in the focal layers; thus, all extrapolation methods performed better for the 10 cm layer than for the 20 cm layer.
Overall, SVR was the best of the four tested methods for extrapolating SMC in alpine areas, where freeze-thawing is an important factor that increases the complexity of infiltration processes. The advantage of SVR is most evident when datasets are small. Furthermore, soil temperature was a very important factor affecting the extrapolation methods' performance.