In Search of a Soil Moisture Content Simulation Model: Mechanistic and Data Mining Approach Based on TDR Method Results

Soil moisture content simulation models have continuously been an important research objective. In particular, the comparisons of the performance of different model types deserve proper attention. Therefore, the quality of selected physically-based and statistical models was analyzed utilizing the data from the Time Domain Reflectometry technique. An E-Test measurement system was applied with the reflectogram interpreted into soil volumetric moisture content by proper calibration equations. The gathered data facilitated to calibrate the physical model of Deardorff and establish parameters of: support vector machines, multivariate adaptive regression spline, and boosted trees model. The general likelihood uncertainty estimation revealed the sensitivity of individual model parameters. As it was assumed, a simple structure of statistical models was achieved but no direct physical interpretation of their parameters, contrary to a physically-based method. The TDR technique proved useful for the calibration of different soil moisture models and a satisfactory quality for their future exploitation.


Introduction
The importance of soil moisture as an environmental variable is evident from the viewpoint of its role in the hydrological cycle. Soil moisture impacts rainfall-runoff processes, infiltration, groundwater recharge rates, and constrains evapotranspiration as well as photosynthesis. To some extent, it governs water and energy exchange between the land, plants and the atmosphere [1][2][3][4], but embraces multiscale feedbacks as well [5]. The importance of soil moisture was stressed particularly in respect of the partitioning between the sensible and latent heat fluxes, and consequently for the temperature of the surface and the lower atmosphere [6,7]. From the viewpoint of mass and energy transfer, the storage and availability of water is crucial for plant growth, and its shortages are considered to be serious limitations to agricultural production. In recent years, the on-going climate changes have resulted in atmospheric and hydrological droughts that give rise to soil and physiological droughts affecting harvest yields and water supplies. The observed rise in air temperature along with precipitation shortages occur more frequently all over the world, e.g., in Europe and Poland, and affect the security of soil water resources and crop yields.
Since soil moisture content plays a significant role to satisfy water requirements not only of agricultural produce, but also in other numerous respects, the TDR method has become highly developed and applied worldwide for soil moisture estimation [8][9][10][11][12], being available for other materials as well [13][14][15]. That method involves electrical permittivity as an indirect parameter of soil moisture content estimation. Electrical permittivity ε [−] is a measure of matter molecule behavior in the case of external, variable electrical field application [16]. The values of that parameter for individual phases of porous media are the following: air-1 [−], soil solids 1-15 [−], water-80 [−]. The occurring difference between the permittivities of individual phases is the base for dielectrical methods application, including TDR.
Based on the data from the TDR method, the knowledge on temporal variability of soil moisture content in different hydrological conditions, in particular, becomes indispensable for producing environmental forecasts and improving their predictions [17]. It is also of significance to a number of applications, e.g., management practices that may currently employ soil moisture data all over the world. This is also the basis to acknowledge crop plant growth issues but also an effort to better understand and predict soil water storage in short time increments, e.g., on a daily basis.
In this work, we focused on the performance of various models for soil moisture content simulation, taking advantage of the measurement data obtained by the TDR method. In addition to the physically-based models, the literature review [18] shows that potential is offered also by data mining methods [19]. They facilitate finding relatively simple models that involve easily obtainable hydrometeorological variables, such as air relative humidity and temperature, precipitation, or others, in order to simulate the moisture content of the shallow soil layers.
Among all relevant models, artificial neural networks, support vector machines, Random Forests, boosted trees, and k-nearest neighbor should be included [20]. In these methods, at the training stage, the model structure is formulated based on historical data on different input variables. This becomes decisive for the obtainable quality of model simulations, which is usually indicated by the values of mean absolute and relative errors. The main advantage is the non-complex and flexible model structure which was was attempted in this work for the selected statistical and physical models. Additionally, the volumetric soil water contents, recorded by a selected type of sensor, were studied to prove their reliability for the calibration of different soil moisture mathematical models [21].

Materials and Methods
The analyses involved an algorithm for the selection of the soil moisture content forecast model. Two mathematical models were taken into consideration: a statistical one and a physical model by Deardorff [6,22] which was especially developed and dedicated for shallow soil layers moisture, shaped by daily weather variations. The comparison of the models is a base for their further exploitation in the field of water resources management and related decision-making. The sense of comparative studies and resultant model selection would facilitate to approach either water shortages or excess (i.e., drought or flood), and to mitigate environmental and ecological threats (not overlooking economic issues). With reference to the above-mentioned problems, an appropriate scheme was proposed ( Figure 1).
The considerations involved the observation data included: precipitation, air relative humidity, and air temperature and wind speed, to elaborate soil moisture forecast models through data mining methods. The calculated potential evapotranspiration and precipitation were input to a physical model. The main advantage of statistical methods, in comparison to the physical approach, is a much more simple model structure. In a statistical model, the independent variables which significantly influence the phenomenon are distinguished in the first instance. From a practical point of view, a compromise is established between the model complexity, usefulness, and exploitation possibilities. Hence, the analyses performed herein embraced a couple of data mining methods so that the proper model for soil moisture forecast could be found. In reference to the statistical model, the creation of the physical model (mechanistic one) seems to be more complicated. This is due to variability of weather and soil conditions as described by a number of relevant equations [23,24], and the one by Deardorff in particular, which requires the employment of numerical methods as well as proper differential schemes to be solved. Owing to the existence of considerable interactions between the coefficients of the Deardorff equation there arise problems in their identification that hampers its usefulness.
In order to acknowledge existing constraints in the identification of Deardorff equation parameters a probabilistic estimation was realized through coupled General Likelihood Uncertainty Estimation (GLUE) and Global Sensitivity Analysis (GSA) methodology [25]. Such an approach facilitated to reach reliable distributions of the calibrated parameters in the Deardorff equation, that governed the range of simulation results (95% confidence level).
Based on the determined models (a physical and a statistical one), the simulations of the soil moisture content were performed for the adopted input values. The aforementioned expected model accuracy, resulting from measurement data, would delineate the aftermath of the latter decisions regarding water storage, testing of irrigation efficiency or drainage and irrigation practices, for instance. It would take effect further in evaluation of ecosystem services or ecosystem potential, bringing about uncertainties in the decision-making process.

Measurements of Meteorological Parameters
The research was performed for the southern district of Warsaw (Ursynów district) within the compounds of Warsaw University of Life Sciences ( Figure 2). That area is characterized by a high anthropopressure degree and a high coverage by impervious areas, e.g., the presence of large areas of housing, sidewalks, car parking lots and main streets, that contribute to a fast generation of overland flow. However, the percentage of permeable surface includes city parks, and extensive green and recreational areas, which causes water storage in the soil to be essential for the sake of city adaptation to a changing climate. This means there is a need for proper models for the soil moisture content that employ easily obtainable hydrometeorological variables (Table 1). In this respect, the indispensable data range originated from the WULS Ursynów meteorological station (λE 21 • 02 52" ϕN 52 • 09 38", 102.5 m a.s.l.) was used. Daily precipitation acted as model input for soil moisture simulations, as well as air temperature, relative humidity, and windspeed, that became also indispensable for potential evapotranspiration estimates according to FAO procedure [26] on hourly time step by the Penman-Monteith equation: . Hourly evapotranspiration data were then averaged to obtain mean daily evapotranspiration.

Measurements of Soil Moisture Content
The selected research site within the WULS campus (the meteorological station) was subject to measurements of the volumetric soil moisture content by the TDR method. The geological setting of that site exhibited generally sandy loams up to 1.9 m depth [27] underlain by sandy clay higher than 10 m in thickness, along with deep ground water table, reaching 11 m beneath the land elevation. The conducted survey revealed soil layers to the depth of 0.2 m, for which moisture was monitored by TDR probes, containing 65% of sand (diameter 2-0.05 mm, including: 24% of coarse sand (1-0.5mm), 16% of medium (0.5-0.25 mm), 13% of fine sand (0.25-0.1 mm) and 12% of very fine sand (0.1-0.05 mm)). The estimated fraction of silt equaled 24% (diam. 0.05-0.002 mm), along with 11% of coarse silt (0.05-0.02 mm) and 13% of fine silt (0.02-0.002 mm), and finally the fraction of clay (diam. < 0.002 mm), reaching 11% [28]. Soil particle size composition was made available by laboratory sieves, that offered diameters from 1 mm to 0.05 mm. However, silt and clay fractions were found through hydrometer (areometric) analyses [29].
Field Operated Multimeter (FOM, E-Test, Lublin, Poland) was used for the soil moisture content measurements, equipped with electromagnetic impulse generator of a falling and rising time equal to about 3 × 10 −10 s (300 ps). The generated impulse propagated along the concentric cable (6 m long) of a constant and invariable dielectric permittivity [30]. Then, the impedance changes in the cable-probe system caused the signal reflections, that were next used to determine the time of propagation along the measurement rods (100 mm long, parallel, stainless, 2 mm in diameter with a spacing of 16 mm). The first reflection (the initial one) took place at the cable-probe interface, and the second occurred at the end of the probe. Those reflections were intercepted by the FOM meter in the form of a so-called reflectogram, and automatically interpreted into electrical permittivity and then into soil volumetric moisture content with the use of calibration equation by Malicki [31]. The reliability of the applied equation was proved through comparisons to gravimetric method for sixty soils all over the area of Poland for a range of soil densities. Determination coefficient for the compared variables was equal to 0.98 and the absolute, average error of moisture content reached the value of 3%. The TDR probes, such as FP/mts, were applied here, that also made it possible to record the soil temperature by AD592CN sensor. They facilitated the measurements performed at two soil depths: 10 and 20 cm at the abovementioned meteorological station, where the land surface was covered by standard, mown grass of 10 cm height. All data were recorded for vegetation periods (01.04-30.09) of 2011, 2013 and 2014.

Data Mining Methods for Soil Moisture Content Simulation
A literature search [32] revealed a range of data mining methods that can be applied for the sake of environmental processes simulation. An advantage is the simple structure of the models and relatively small number of parameters or coefficients that require identification in a proper manner. On the other hand, a number of cases are thought to employ models of a complex structure with numerous parameters subject to calibration. In principle, more complex models should provide better match of the modeled and observed variables, i.e., higher accuracy. It should be stressed, however, there is no actual rule, since many cases proved relatively simple models to offer high or acceptable accuracy of the results not very different from the complex ones.
Modeling of processes occurring in the natural environment frequently employs multivariate regression methods; however, there exists linear relationships between dependent and independent variables that are found to restrict application possibilities. In response to existing limitations, the required refinements led to MARS (Multivariate Adaptive Regression Spline) determination. The main advantages of MARS lie in its capacity to capture the intrinsic, complicated data mapping in high-dimensional data patterns and produce simpler, easier-to-interpret models, as well as its ability to perform analysis on parameter relative significance. In addition to the complete involvement of the predictors in MARS procedure, there are also all observations of the independent variable being analyzed. Moreover, their variability range becomes divided into intervals where the interval limits are derived from the so-called knots, which, in turn, constitute the threshold values (t) [33]. In this manner, by comparing the values of the independent variable (x m ) with the threshold ones (t m ), the sign of the relationship y = f(x i ) can be determined. The threshold values are utilized to distinguish the values of the independent variable assuming the base functions such as β i ·max (0, X − t). The general form of the MARS model is achieved through summing the base functions and their products as follows: where k is the number of independent variables, M is the number of components of the created model, m = 1, 2, 3, . . . , M, and β m is the weight values. In order to acknowledge the fact of measurement data falling into clusters, for instance, the procedure for data partitioning was identified through proper rules dependent on the values of independent variables. That finding was successfully implemented in the method of regression and classification trees. However, due to emerging problems with those models' stability, the modification was imposed to replace a single tree with the forest (parallel connected trees-Random Forest) or a serial layout (boosted trees algorithm). The adopted solutions provided better predictive capabilities of the regression tree. Both Random Forest and boosted trees structure require identification of the number of trees, which should not be higher than 300 in practice [34]. On the other hand, an alternative approach to MARS, RF and BT is offered by the neural network method (ANN), in which model structure and the number of estimated coefficients is higher than in the above-mentioned models. Nonetheless, one of the frequently applied ANN models is the multilayer perceptron (MLP) that consists of three layers: an entrance layer, a so-called hidden layer, and the output layer. The neurons of the individual layers are connected by means of proper weights, which are estimated at the stage of model learning. Because of the significant influence of initial values of the weights on the simulation results, and also problems with determining the global minimum of the analyzed functional relationships, a proper modification of MLP was introduced. If no amendments were introduced, possible limitations to the predictive capabilities of that model may occur, which cause the results of MARS, BF and BT to become comparable or even more precise than the ones achieved by MLP. In the method of SVM (support vectors method) the shortcoming of the neural networks (MLP) was eliminated by replacing the so-called hidden neurons with Kernel functions that facilitated to define the conditions of a global minimum occurrence. The above modification considerably influenced the predictive capabilities of SVM in comparison to MLP. However, in particular the SVM requires the identification of both: the insensitivity threshold (ε) and capacity (C) for the coefficients of namely the Kernel core (α i ), stressing that the cases of limited data quantity may affect the optimum values of the coefficients, and finally diminish the predictive capabilities of the model.
The determination of MLP models involved the number of neurons within the range from j to 2j + 1. In order to optimize the model structure for the adopted number of neurons, there was the need to analyze the proper, so-called activation functions in the hidden and the output layer (assuming particular function types: linear, exponential). Neuron number and activation functions were determined through consecutive iterations, leading to initial model structure. In the MARS method, the recursive algorithm was implemented both for model building and for independent variables selection, while SVM adopted the Gaussian form of Kernel functions, and the values of C, ε, and α were obtained through consecutive substitutions method. The GP (genetic programming) models were derived from mathematic operators such as +, −, /, and basically relied on the initial number of individuals n = 200, the generations N = 300, with the assumed mutation probability P m = 0.25 and the crossing probability P c at the level of 0.65.

Deardorff Equation
In order to describe relationships between meteorological parameters and soil moisture, the quasi-physical model was utilized, based on the concept presented by Deardorff [6]. The choice of that methodology was conditioned by simplicity aspects and appropriateness to represent diurnal variation influence on water regime of the shallow soil system. According to original assumptions, the variation of near-surface volumetric soil water content can be approached through mass exchange between the atmosphere and two distinguished soil layers: upper and lower. The first one, namely the surface layer, constitutes the transitional zone of the soil moisture controlled by daily weather variations to a large extent. Its thickness is assumed to be 10 cm. In the second layer, the moisture content changes seem to be visibly lower. The Deardorff model incorporates then only two time-dependent variables: the soil-surficial moisture content, for a first layer, and the bulk moisture content of the second one: where w 1 , w 2 denotes the soil water content (-) at upper and lower layer, respectively, h 1 and h 2 are layer thickness (m), P is precipitation rate (m/s), τ is diurnal time period (1 day expressed in seconds), C 1 , C 2 are dimensionless proportionality coefficients, E g is the bare-soil evaporation (m/s), µ is the constant used to account for the moisture transport below a lower layer and w max is the soil field capacity. The C 1 and C 2 parameter values rely on the soil moisture and texture as well as the functional relationships reported by Noilhan and Planton [35]. In the Deardorff approach the C 2 parameter is treated as a constant and C 1 becomes a piecewise function of effective saturation (C 1 = f(w 1 /w max )). The present model follows the original Deardorff formulation; however, the authors adopted another type of function for C 1 . Eventually, its value is derived against effective saturation, but as a power function of a single parameter: where m denotes the model parameter. That modification was imposed to reduce the number of parameters, as a piecewise, original relationship requires two more. Assuming m < 0, the proposed equation preserved the decreasing character of the C 1 coefficient with saturation. The exponential form of that equation also showed better agreement with an experimental set proposed by Noilhan and Planton [35]. With Equation (5), the dependency given with the data points for all three soil classes, sand, loam and clay, was successfully reproduced, and the coefficient of a linear correlation became higher than 0.95. The soil moisture transport was not involved in the Deardorff model as it was limited to a time period of a few weeks only. Here, to extend the time scale, the sink term was added to Equation (4), that permitted taking account of the moisture transport outside the model's domain. That term was utilized as linear relationship, applied before in the similar studies [36]. The bare-soil evaporation rate (Eg) was linked to the potential evaporation (Ep) with the following Equation (7): where w s ≈ 0.75 w max . The solutions of Equations (4) and (5) were achieved through explicit Runge-Kutta procedure, offered by the ode23 solver as the component of the Matlab computation software [37]. The variables m, C 2 , µ and w max were considered to be model parameters.

Coupled GSA-GLUE Methodology
The basics of the GSA-GLUE technique lie in the probabilistic identification of parameters according to original GLUE framework. A common practice is to search for one deterministic parameter set that provides the closest match of the model-simulated and measured values. Probabilistic approach facilitates the case of models with a considerable number of parameters since the inverse modeling may lead here to many equivalent solutions. The existence of different parameter combinations, that allow to sufficiently represent the process dynamics, results from lack of reliable data that provide ambiguous parameter values. Therefore, the solution is offered by adopting probabilistic parameter identification. In this case the existence of only one optimum combination of parameter values is dismissed on account of random distribution, which properties are to be determined in the model identification process. Probabilistic description of parameter space in the GLUE method provides also for the description of model uncertainty itself, being attributable to insufficient data number, but also imperfect representation of the real process (e.g., inaccurate data, inadequate mathematical description, simplifying assumptions) [38]. The problem of GLUE parameter estimation was formulated as Bayesian analysis: where X is the vector of model parameters [m C µ w max ], M(X) is a priori distribution of parameters, w is the vector of measured soil moisture contents, M(w|X) is the reliability function, M(X|w) is resultant distribution a posteriori, and M(w) is the scaling factor (such that ΣM(w|X) = 1). A priori distribution M(X) reflects initial assumptions on parameters' variability. In the case of models of surface runoff either soil moisture content or the structure of the parameters' distribution is not known, but only their range or allowable limits resulting from their physical interpretation. A constant character of the parameter distribution was adopted for soil moisture content modeling herein, which is a frequent case in such or similar applications [38]. Moreover, GLUE parameter identification embraced the transformation of a priori distribution to a posteriori one through reliability function as conditioning the probability of parameter combination due to the fitting quality of the modeled and measured values. In the analyses herein the following form of the reliability function was applied: where p is scaling parameter, k is a shape factor used to control a variance of function value, and V(·) is the variance r-averaged squared deviation of modeled and simulated values: where w m,k is the measured soil moisture content, and w s,k is the simulated soil moisture content. According to GLUE procedure assumptions Equation (8) is solved by means of the Monte Carlo method. In the first stage, the parameter sample is created based on the adopted a priori distribution. Then, the model is launched for either combination of parameters, that forms the following vector, X = [x 1 , x 2 ,..., x i ], and based on the calculated and measured soil moisture contents the reliability function values (w) and resultant a posteriori distribution of parameters are determined.
The supplementary element to the GLUE method was introduced as GSA (global sensitivity analysis). It facilitated to quantify the impact of parameter variance on the variance of model results, and finally to distinguish those parameters that significantly influence the model uncertainty. GSA belongs to variance-based methods and relies on variance decomposition, as follows: where Y is the model output and E(Y|X i = x i *) and V(Y|X i = x i *) is the model expected value and variance, respectively, for the i-th parameter X i to x i *.
The functions E(·) and V(·) are defined for Xi parameter that takes different values in its variability range. Once a particular sensitive parameter has been revealed, the output variance E[V(Y|X i = x i *)] takes relatively small values for different X i since the component V(Y|X i = x i *) does not involve the variability of that sensitive parameter. The significance of a parameter, on the other hand, will be manifested through high variance values of V[E|X i = x i *] on the condition that X i parameter changes along with output expected value. The model output variance is described by the second component of Equation (11) and is related with the direct impact of the X i parameter variance. That was the base to propose the model sensitivity measure to i-th parameter in GSA procedure, the so-called first order sensitivity indicator (S i ): Variance decomposition according to Equation (11) can be executed against all other parameters, (X -i = x -i *) than only on X i , as follows: In Equation (13), in the same manner as (11), the first component describes part of the model variance, being a direct effect of the parameters X -i on the model output Y. The second component, however, represents the influence of Xi parameter being in interaction with other parameters (X -i ) [38]. On this basis, the total sensitivity index is defined (S Ti ) for the model against X i parameter as follows: The general problem of sensitivity indexes determination by Equations (12) and (14) is the necessity to know the variance of the parameters. A solution is offered by the coupled sensitivity analysis and parameter identification in GSA-GLUE. Sensitivity analysis is performed on the index that takes the same form as the reliability function in the GLUE method and enables to analyze the model variance as identified with the uncertainty herein. Variance decomposition is realized by means of Monte Carlo technique with the adoption of proper structure of a sample to determine the values of parameter ensemble X i = x i * and X -i = x -i , according to Equations (11) and (13). Most frequently, the properly prepared sample is utilized both for GLUE parameter identification and sensitivity analysis.
In this paper the probabilistic identification of Deardorff model parameters m, C 2 , µ, and w max was accomplished by coupled GLUE-GSA methodology. Constant (a priori) distributions of the parameters were assumed according to literature data [24]. Parameter ranges are given in Table 2, and based on their values as well as assumed distributions the coupled GLUE-GSA analyses were performed aiming at the estimation of first order sensitivity indexes (S i ) as well as total sensitivity (S Ti ). Table 2. Assumed parameter ranges (m, C 2 , µ, w max ).

Variables
Unit Min Max Based on the above-mentioned analyses, a multidimensional distribution of the parameters m, C 2 , µ, and w max were determined so as to describe the reliability of the Deardorff model. That model was calibrated on soil moisture content data for the first vegetation season (first period) and validated on the available dataset for the second and third period, so as to test the accuracy of the model in different meteorological conditions.

Aspects of Soil Moisture Content Estimation
Since soil moisture content may play a role as an indicator in decision-making processes (e.g., catchment management planning, water resources distribution and allocation) the offered accuracy of the relevant, applied mathematical model seems to be crucial. Taking final decisions is directly dependent on the achieved simulation results (y mod ).
The measure of the decisions' reliability and their potential effects can be expressed as follows: where y mes is measured soil moisture content, y mod is modeled soil moisture content, and Rs(t) is a function of either potential ecological or economic effects based on discrepancies between the modeled and observed variables (data). The Rs(t) function takes different values depending on meteorological conditions (e.g., the season of the year, vegetation period, drought, etc.).
In exploitation practice two cases may occur, whether the modeled values are higher than measured ones or vice versa: Rs(t) d < 0 and Rs(t) g > 0, respectively. In the first case, the decision-making needs to acknowledge that the managed amount of water is actually lower than the theoretical, calculated one (y mod ). This can be a constraint on, e.g., drainage-irrigation scheduling or other solutions (retention, temporary water damming) in order to maintain a required soil moisture content. As a consequence, the environmental or ecological problems may occur related with the nature of the analyzed catchment and the discussed, relevant management issues. The value of Rs(t) g > 0 indicates additional soil water reserves that were not taken into account in the calculation phase. However, in this case the foreseen water requirements in the catchment can be treated as satisfied. Existing water excess, on the other hand, could be subject to reservoir retention or energy production. Whatever the solution, usually dependent on the existing infrastructure status, the proper maintenance of catchment resources would require modifying Equation (15) as follows: where θ is the function for the simulations of the soil moisture content dependent on meteorological conditions (in the situation of a deep groundwater table). Two models were adopted here: a mechanistic one, κ = 1, and a statistical one, κ = 2, which are chosen to execute the simulation due to prevailing local conditions in such a manner that the available expert knowledge helps to minimize the differences between the model and observations. At last, the analyses of predictive capabilities of the models were performed in order to reveal the periods of simulation results over-or underestimation, or if the measured data shows considerable mismatch against the modeled variables. In order to achieve that aim the so-called "heat maps" were prepared so as to classify modeling errors on a properly assumed scale.

Coupled GLUE-GSA Method
Based on (C 2 , m, µ, w max ) parameter variability ranges the simulations of their numerical values were performed by means of Monte Carlo procedure in the first instance. Obtained vectors (C 2 , m, µ, w max ) for the measured values (the values of precipitation and evapotranspiration) were incorporated into the Deardorff equation and then the probabilistic solution of the soil moisture content w 1 and w 2 was achieved. Taking into consideration the measurement data and the results achieved through Equations (11) and (13), the first order (S i ) and total (S Ti ) sensitivities were calculated (Figure 3). At the same time, the reliability functions for the consecutive, calibrated parameters of the Deardorff equation were determined and are given in Figure 4.
Taking advantage of a posteriori parameter distributions, the 95% confidence ranges were also determined for soil moisture contents w 1 and w 2 throughout the measurement period ( Figure 5).   The data given in Figure 3 proves the most considerable influence of the C2 parameter on the agreement between the modeled and observed values. It is made evident by the highest values of the first order sensitivity (S C2 = 0.358) and the total sensitivity (S TC2 = 0.548) in comparison to other parameters' sensitivities. Lesser influence on calibration results was shown by µ parameter which is supported by S µ and S Tµ values. Among all the analyzed parameters of the Deardorff model, the minor impact of m parameter was noticed. In this case, the values of S i and S Ti were equal to 0.019 and 0.168, respectively. Data shown in Figure 4 prove that the increase of p 1 from 0 to 0.2 leads to the average value of reliability function range from 0.75 to 0.85. However, the case of p2 and p3 proves that the increase of those parameters' values leads to decline of mean values of the reliability function and, moreover, the increase of p4 value considerably influences the agreement of the model and observations. It is also a fact that the increase from p3 = 0.02 to p3 = 0.08 caused the reliability function to range from 0.52 to 0.85. A considerable influence of ν parameter on reliability function was also noted, as the increase from ν = 2·10 −8 to10 −7 results in that function value to range from 0.02 to 0.93. Further increase of ν to 2·10 −7 causes the decrease of reliability function value to 0.82. The existence of strong interactions between consecutive calibrated parameters (p1, p2, p3, p4, ν) is indicated by the achieved reliability function ranges. Calibration results of the Deardorff model (soil moisture content w 1 , w 2 ) by the coupled GLUE-GSA method for the analyzed second period are given in Figures 5 and 6. The results show w 1 (average soil moisture content in layer1) to be underestimated by as much as 0.03 maximum in comparison to measured values for the period 0-85 days of the first calibration stage. However, the soil moisture content at 0.20 m (w 2 ) was underestimated by 0.03 on average for the whole period of 0-85 days, where the daily average precipitation was equal to 1.64 mm and the total amounted to 143.1 mm. The next period, encompassing days 89-120, showed the moisture contents w 1 and w 2 to be overestimated in comparison with the measurements, and involved more intense daily average precipitation of 5.77 mm and a total of 369.3 mm. However, the period between day 120 and 180 exhibited w 1 values to be overestimated by 0.08 maximum, and the observed difference between the model and observations decreased within the period, leading to their final equality. It was also noted that the simulated soil moisture content values in the second layer (w 2 ) became lower than the observed ones after the 170th day. The second calibration phase (2014) was subject to considerably smaller errors. The achieved results pointed at predictive capabilities of the model to be dependent on dynamic meteorological conditions. This was the reason for the realization of additional soil moisture content simulations (w 1 , w 2 ) aimed at detailed analyses of model predictive capabilities in order to select the optimum calculation method.

Statistical Models for Soil Moisture Content Forecast
Based on Section 2.3 and the presented relevant principles for model building by statistical methods BT, RF, MLP, MARS, and SVM, the simulations of soil moisture content were realized for both layers (w 1 , w 2 ). The calculations involved numerous combinations of independent variables, describing precipitation totals (P), air temperature (T, T sr ), and humidity (Wilg, Wilg sr ). Tables 3 and 4   In the case of BT and RF models, elaborated for w 1 and w 2 forecasting, the obtained number of trees ranged from 40 to 200, which means the models were not over-learned. For the MLP method the number of neurons ranged from 2 to 6, indicating that their maximum value (2·j + 1) was not exceeded. The proper activation functions in the hidden layer and the output layer were most frequently in the form of hyperbolic tangent and linear relationships. Moreover, the models achieved through the SVM method yielded the C parameter range 100-900 and α = 0.10-0.80, respectively.
All in all, the data given in Tables 3 and 4 proved that precipitation, air temperature, and humidity influenced the soil moisture content most significantly. The smallest forecast errors for w 1 and w 2 , taking into consideration only the precipitation totals, were obtained for average daily rainfall on a weekly basis. What is more, the achieved results point at more considerable impact of air temperature than humidity on soil moisture content, which is made evident by the estimated error values (MAE, MAPE). It points at a major influence of meteorological conditions which is also reflected by other research [33]. Among all analyzed methods, the smallest forecast errors of w 1 and w 2 were obtained by the MARS approach. More considerable errors were found for the SVM, BT and RF models. The analyzed combinations of independent variables provided the smallest error values of w 1 and w 2 forecasting for the P sr , Wilg sr , and T sr (weekly averaged values) combination set.
All the above-mentioned models served to find the independent variables, which exhibited the best match of the modeled and observed values. Those variables seem to prove the soil moisture content to be conditioned by the dynamics of precipitation phenomena and meteorological conditions, first of all. Other impacts, such as soil type and its physical properties, may be found as important and cannot be omitted. Although these other impacts were not deeply considered herein, they deserve more insight in a future application of elaborated models. However, additional validation of these models in different precipitation and meteorological conditions should be treated as the most urgent issue.

Simulations of Soil Moisture Content by Deardorff and Statistical Models-Heat Maps
Taking into consideration the simulation results by statistical models on one hand, and the physical model results on the other hand, additional calculations of soil moisture content (w 1 , w 2 ) were realized for vegetation seasons of 2013 and 2014 (calibration stages). This aimed to provide some insight into the models' performance in slightly different development of hydrological and meteorological conditions (e.g., temporarily variable precipitation seasons). The achieved results, given in Figures 7-10, prove that from all the applied methods (Deardorff model, BT, RF, MLP, SVM and MARS method) the best match of the modeled and the observed soil moisture contents was provided by Deardorff equation (physical one, Equations (4) and (5)). For most of the analyzed periods, the outcome of all applied models shows the overestimation of the simulated values in comparison to measurement data. This fact can be treated as a limitation in respect of the utilization of these models for soil moisture control. In realistic exploitation conditions, the soil moisture content would be lower than the mode result, which can lead to water shortages at the stage of decision making.
Based on the performed calculations by statistical models (RF, BT, SVM, MLP, MARS), and also by the Deardorff model, it was found that soil moisture contents (w 1 , w 2 ) for no-rain periods were overestimated by 10% (SVM, MLP) or even by 15% (BT, RF, MARS, Figures 7 and 8). The same tendency was revealed for 2014 in the second part of the analyzed period-mainly in September. The highest error values, exceeding 5%, were noticed for the BT method. Relatively small errors of w 1 and w 2 forecast were achieved by RF, BT, SVM, MLP, and MARS determined by the choice of independent variables. It was also noticeable that the simulated soil moisture content at the depth of 0.10 m (w 1 ) in the case of intense precipitation (period between 30th and approximately 90th day) was locally lowered even by 20% (Figure 8) by the applied statistical models: MLP, SVM, MARS, and RF.
Simulation results of the soil moisture content for the depth of 0.20 m (w 2 ), taking into account the single precipitation events, tended to be lower by about 5 to 10% in the adopted statistical models (Figure 7). In particular, this pertains to the period from the 30th to the 90th day for which the moisture contents were lower by as much as 5 to 10% in comparison with the observations (Figure 8).
Such values of the modeled soil moisture content result from the applied calculation procedures (models) and rely on the selection of the independent variables for the analyzed case study (P sr , Wilg sr , and T sr ). Only the computations for MLP and MARS for 2013 showed that for single precipitation events the w 2 values (moisture contents for the second layer) became overestimated by not more than 5%. However, for 2014, the period between the 30th and 90th day exhibited visibly lower simulated moisture contents for most of the precipitation events. On the other hand, taking into consideration such precipitation rates that are not of a more intensive character (e.g., the rates of p < (2-3)·10 −7 m/s), the simulated values of the soil moisture content were underestimated by no more than 5%. Only in the case of lower precipitation intensity was a tendency of overestimated modeling results observed, especially for RF, MARS and SVM method (Figures 9 and 10). Moreover, the application of elaborated statistical models through RF, BT, MLP, SVM, and MARS (Figures 7 and 9) found that the simulated soil moisture contents w 1 and w 2 (both layers), preceded by a 23-day no-rain period in particular, were underestimated by about 15% maximum for SVM and MLP, 12% by MARS and RF, and 10% by BT. It was also noted, the second layer 0.20-m (w 2 ) moisture contents were underestimated by as much as 5 to 10% by all the above-mentioned methods, relying on the choice of independent variables. It was found for the BT method, in particular, that for the period from July to August 2014 the soil moisture content was overestimated by 5 to 10%. Furthermore, the results provided by RF, SVM, MLP and MARS for the analyzed period, preceded by rainfall events of daily intensity ranging from (0.1-1)·10 −7 m/s, tended to be overestimated by as much as 5%. In summary, the predictive behavior of all the elaborated statistical models in respect of the soil moisture contents seem to become strictly dependent on the character (particularly the actual values) of the input data (P sr , Wilg sr , and T sr ).

Summary and Conclusions
The case of physically-based models results in more straightforward estimations of the consecutive, independent variables' influence on the simulation results. It involves particular equations which make it possible to analyze and prove the impact of selected variables on the process dynamics. Statistical models, however, show more difficulties in this respect. All of the models that were approached in this work, as well as the coefficients describing their structure, encountered difficulties in their physical interpretation, but this was found only by this particular study and cannot be generalized. It should be mentioned that statistical models involved the interpolation of soil moisture contents on the basis of the data, which was utilized at the stage of learning and testing. The data used for calculations may involve identical or similar combinations of the independent variables which, in turn, influence the results of the forecasts. Physical models can avoid such problems. However, taking into account the fact that the Deardorff model overestimated the soil moisture contents, potential problems in decision-making may arise, affecting the determined goals that were suggested by this particular study, such as water reservoirs and hydraulic structures operation. It is based on the assumption that there is more water stored in the soil than there actually is, which can be considered a shortcoming in the management of water resources in a hypothetical area. This may lead to potential negative ecological and economic consequences from the viewpoint of decision-making and planning, therefore the need arises to introduce, for instance, safety factors. The value of that factor, assumed to be lower than one, would reduce the actual amount of water resulting from the simulations.
Nonetheless, the statistical models applied to TDR data (soil moisture contents) proved to have both a simple structure and interpretation character, showing considerable exploitation potential. The physically-based Deardorff model required additional assumptions on the formal relationships between the parameter values. Eventually, the measurement data gathered by TDR technique facilitated the uncertainty analysis on different types of models. Evidence was found for sufficient reproduction of the soil moisture contents for two layers (at 10 cm and 20 cm depth) of a loamy soil at a particular sampling point through a selection of statistical models and a physical one. Even if the proposed methods can be considered as universal, their future development and utilization requires calibration on a range of data coming from similar research sites.