High-Resolution Estimation of Soil Saturated Hydraulic Conductivity via Upscaling and Karhunen–Loève Expansion within DREAM ( ZS )

: Quantification of the soil hydraulic conductivity is key to the study of water flow and solute transport in unsaturated soils. Rapid advances in measurement technology have provided a large number of observations at different scales, offering unprecedented opportunities and challenges for the estimation of hydraulic parameters. This paper proposes an inverse estimation method for downscaling of observations on coarse scales to estimate hydraulic parameters on high-resolution scales. Due to the significant spatial heterogeneity, the inversion faces the problems of dynamics-based integration of data at different scales, model uncertainty due to hundreds and thousands of parameters, and computational consumption due to the large number of forward simulations. To overcome these problems, this paper uses an efficient Bayesian optimization DREAM ( ZS ) as an inverse framework, and incorporates an analytical upscaling method and Karhunen–Loève (KL) expansion to infer finer-scale saturated hydraulic conductivity distribution conditioned on coarse-scale measurements. The efficient upscaling method is used to link measurements and hydraulic parameters at different scales, and Karhunen–Loève (KL) expansion is incorporated to greatly reduce the dimension of the parameter to be estimated. To further improve the efficiency of the inversion, a locally one-dimensional (LOD) algorithm is used to solve the multidimensional water flow model at coarse scales. The proposed inverse model is applied in a series of numerical experiments to demonstrate its applicability and effectiveness under different flow boundary conditions, different levels of ratio between coarse-and fine-scale grids, different densities of observation points, and different degrees of statistic heterogeneity of soil mediums.


Introduction
Soil hydraulic conductivity is essential information for soil water flow and solute transport simulations to represent hydrological processes such as water storage, release, and transfer in the soil, and is therefore a prerequisite and key link to ensure model effectiveness.Saturated hydraulic conductivity is often spatially heterogeneous and scaledependent [1,2].Uncertainty in ground permeability can result in considerable alterations to the response of the porous medium [3,4].And the simulation process is often timeconsuming and labor-intensive to obtain directly in field or laboratory experiments subject to scale constraints [5,6].In recent years, numerical inversion methods based on field observations have been increasingly used to estimate the hydraulic conductivity of soils [7][8][9][10][11][12][13][14][15][16].
We note that most of the existing studies focus on the inversion of hydraulic parameters in homogeneous or stratified soils using observations at the same scale.However, strong spatial variability often requires us to obtain the spatial distribution of soil hydraulic parameters at a high resolution.Particularly, as the observations are obtained from multiple scales with different means of observation, it is necessary to integrate the dynamic observations at different scales to obtain the hydraulic parameters at the required scales.Thus, the problem boils down to integrating observations at different scales to invert the spatial distribution of soil hydraulic parameters at high resolution.This problem faces two difficulties: first, how to integrate observations on different scales for the inversion of hydraulic parameters on the scale of interest.Here, it is necessary to establish a kinetic relationship between observations on multiple scales and the hydraulic parameters on the scale of interest.The other is how to solve the problem of efficiency and uncertainty of the inversion simulation due to the huge parameter dimensions and the consequent large number of forward simulation calculations.
This study uses Bayesian optimization DREAM (ZS) as an inverse framework, and incorporates an efficient upscaling method and Karhunen-Loève (KL) expansion to infer finer-scale saturated hydraulic conductivity distribution conditioned on coarse-scale measurements.First, the DREAM (ZS) algorithm is used as a inverse framework to construct the posterior distribution of saturated hydraulic conductivity field in high-dimension soils conditioned on coarse-scale observations.The DREAM (ZS) is an improved Markov chain Monte Carlo (MCMC) method proposed by Vrugt et al. [17][18][19][20], which greatly improves the convergence speed of the inverse model compared to the traditional MCMC method.Compared to the basic DREAM (DiffeRential Evolution Adaptive Metropolis) algorithm, it refers to the previous sampling state to adjust the current process in generating each chain candidate, which aims to reduce the number of parallel chains and accelerate the convergence of high-dimensional problems, and has been successfully applied to research in many fields [21][22][23][24][25].
Then, the upscaling method proposed by Li et al. [26] is used to link observations and hydraulic parameters, which may come from different scales.This method is based on the Kirchhoff transformation and the two-convergence theory to propose the macroscopic description of the Richards equation and the effective constitutive relations, rather than equivalent or upscaled flow parameters, as is the case with many parameter averaging approaches such as the stream tube approach [e.g., [27][28][29]] and the scaleway approach [e.g., [30,31]].And compared with a number of multiscale numerical methods to calculate the effective constitutive relations, which are formed and solved numerically, the proposed upscaling approaches is based on analytical formulations, which are preferred in practical applications because they are efficient in computation and convenient to use.Simultaneously, within the Bayesian inversion, the using of upscaling method avoids solving the model at fine scales, thus improving the computational efficiency of the forward models.In addition, the local one-dimension (LOD) [32] is used to solve the multidimensional Richards equation to further improve the efficiency of the inversion process.
Within the DREAM (ZS) framework, the incorporation of the upscaling method can efficiently integrate the hydraulic parameters and state variables at different scales, but it still has to face the problem of model uncertainty due to the large number of parameters and thus the consequent problems of convergence and efficiency of inversions.This study employs KL expansion to reduce the parameter dimension and thus the uncertainty of the inversion simulation.Karhunen-Loève (KL) expansion is a technique used in data analysis to represent a data set in a more efficient way by decomposing the data into a set of simpler, uncorrelated functions capturing the most significant variations within the data.In Bayesian inference, KL expansion has been an effective approach to represent the random field and is successfully applied in various fields (e.g., [33][34][35][36][37][38]). Through KL expansion, thousands of spatially distributed hydraulic parameters can be reduced to dozens of quantities, which greatly reduces the number of predicted parameters and thus the uncertainty of the inverse simulation.
Therefore, in the framework of Bayesian inference DREAM (ZS) , combined with the upscaling algorithm and KL expansion, we propose an efficient and effective inversion method to downscale the observations on coarse scales to obtain higher-resolution saturated hydraulic conductivity.The applicability and effectiveness of the proposed method are investigated through a series of numerical examples.Different flow boundary conditions, including constant head boundaries, variable flux boundaries, and mixed boundaries, are considered for two-dimensional soil profiles.We consider soil heterogeneity, characterized by statistical spatial variability include variance, correlation length, degree of anisotropy.We set different levels of mesh coarsening, i.e., inversion of saturated hydraulic conductivity field on fine scales based on different coarse-scale data, to explore the effects of different levels of mesh coarsening on the inverse results.And we investigate the effect of a smaller number of data on the inverse simulation by reducing the number of observation points in the data.Overall, the proposed methodology is validated in a comprehensive manner through a large number of numerical experiments.
The paper is structured as follows.Section 2 describes our numerical inverse model, revisits the upscaling method and KL expansion, and introduces the DREAM (ZS) to solve the inverse problem.Section 3 describes the case setup for numerical simulation, including the selection of boundary conditions for water flow, characterization of soil heterogeneity, scaling between coarse and fine scales, and the setup of observation point density.Section 4 provides a thorough assessment of our method's performance under different model settings.Section 5 summarizes the key findings of this paper.

Methodologies
This study is concerned with the reconstruction of the spatial distribution of soil saturated hydraulic conductivity at finer scales based on pressure heads at larger scales.Hydraulic parameters and dynamical data at different scales are integrated within the DREAM (ZS) inference framework, incorporating the upscaling method and the KL expansion.The forward model is to solve the macroscopic Richards equation with homogenized soil hydraulic properties using the LOD method.Therefore, this section describes the forward models, including the Richards equation with soil properties, the upscaling method, and KL expansion, and then briefly introduces the DREAM (ZS) method.

Soil Water Flow Modeling
This study considers estimation of saturated hydraulic conductivity in two-dimensional soil profiles.The Richards equation is the most prevalent model to describe soil water flow, as follows in Equation (1): where h is the pressure head, m; t is the time, d; C(h) is the change of water content with head, m 2 m −3 ; K(h) is the hydraulic conductivity corresponding to the pressure head, m d −1 ; and θ is the water content, m 3 m −3 .The van Genuchten model [39,40] is used to inscribe the soil hydraulic properties, including the water content curve θ(h) and the hydraulic conductivity function K(h): where α, β, γ are curve empirical coefficients, α is the reciprocal of the inlet value, m −1 , β is the pore size distribution coefficient, , θ s is the saturated water content, m 3 m −3 , and θ r is the residual water content, m 3 m −3 .

Upscaling Method
The upscaling method based on numerical homogenization theory is computationally efficient and convenient to use because of its explicit expression [26].The macroscopic model of the Richards equation is as follows: where θ* , h* , K* , D s *, α* are the equivalent coefficients in the homogenized Richards equation.

KL Expansion
In order to solve the uncertainty problem caused by a large number of parameter inversions, this study uses KL expansion to reduce the dimension of the parameter space.This powerful technique decomposes a random field into a series of uncorrelated random variables and deterministic coefficients.The key advantage of KL expansion lies in its ability to represent a complex random field with just a few parameters, leading to low estimation errors and broad applicability.Assume that the two-dimensional random field of saturated hydraulic conductivity K s (x, ω) has a mean value of K s (x, ω), and that all spatial points on E K s (x, ω) − K s (x, ω) 2 all exist; then, its KL expansion is described as Equation (9): where x is the spatial point coordinates; ξ i (ω) is a random field generated using a Gaussian distribution, where ω is the parameter randomness and in this paper the value is all the same in different samples for control variable; { f n (x)} is the eigenfunction series, and {λ n } is the series of eigenvalues.{ f n (x)} and {λ n } are the solution to Fredholm integral equation of the second kind.
where Ω is the region of integration.The eigenfunction can be decomposed into a linear combination of M orthogonal basis functions.
In the generation of a random field through the use of KL expansion, a set of M (the truncation order of KL expansion) mutually independent random variables, denoted by ξ i (ω) , is produced.The eigenfunctions and eigenvalue series are obtained through the solution of the integral equations, { f n (x)}, {λ n }.Once the integral equation has been solved to yield the eigenfunction and eigenvalue series, the value of the Ks random field is generated by Equation ( 9), thereby enabling the K s field to be represented by a small number of M [36].
Figure 1 depicts the magnitude of the eigenvalues for the first 50 orders.It can be seen from the figure that the eigenvalue size decreases rapidly as the order increases, and the eigenvalue sizes of the lower orders are much higher than the higher-order values as a percentage of the eigenvalue size.Therefore, a small truncation order M can be used to approximate the original complete eigenvalue, which can greatly reduce the number of parameters and also ensure the accuracy of the overall KL expansion on behalf of the random field, which greatly reduces the number of parameters required for the inversion.In this paper, the truncation number of the eigenvalues used is M = 30, which contains more than 99% of the total eigenvalues' size.

Bayesian Inference with DREAM (ZS)
In the inversion process, suppose that the model to compute is f (u), which contains a series of continuously distributed parameters to be solved, u, and a series of observations, Y obs .According to the Bayesian formula, we can obtain Equation (11).
where p(u|Y obs ) is the posterior density probability function of u, p(Y obs |u) is the likelihood function, and p(Y obs ) is the probability of the observation arising from the value of the observation, which is a constant independent of u.
In the usual inversion process of MCMC sampling, if the proposed distribution is poorly correlated with the target distribution, a large number of proposed samples are rejected because they fall outside the posterior distribution, which leads to the need for a much higher number of iterations and greatly prolongs the overall inversion computation time.Therefore, an effective parametric inversion algorithm, DREAM (ZS) , is used in the inversion process of this paper.DREAM (ZS) creates the jump vector in DREAM from the past states of the joint chains [19,20].Instead of generating candidate samples ζ i based only on the samples in the current row of the matrix X, all the samples in the current and past state matrices Z of the chain are used, the equation for obtaining the c candidate sample ζ i in the i chain is Equation (12).
where: χ i c−1 is the c − 1 sample on the i Markov chain; η is a normally distributed random number with mean 0 and standard deviation b * ; e η is a uniformly distributed random number in the interval [−b, b]; g(ψ, η ′ ) is a scale factor, the value of which is determined by the parameters ψ and η ′ , ψ is the number of parallel chains for generating candidate samples, and η ′ is the number of parameters being updated in candidate sample ζ i ; Z r 1 (j) and Z r 2 (k) are the r 1 (j) and r 2 (k) samples randomly selected from the matrix Z, r 1 (j In this way, the inversion problem at high latitudes can be solved with a smaller number of parallel Markov chains and shorter chain lengths compared to the traditional MCMC algorithm. The DREAM (ZS) V1.0 software can be downloaded from the https://faculty.sites.uci.edu/jasper/ (accessed on 15 May 2024).

Setup for Downscaling Simulations
In this study, numerical experiments are performed with artificially generated dynamic data in two-dimensional heterogeneous soil profiles.Soil water pressure heads are generated based on the spatially distributed hydraulic properties and then treated as observations by adding Gaussian errors as perturbations.The proposed inverse model is demonstrated in a series of numerical experiments where three different boundary conditions of the soil water flow, four levels of scale variances between the fine and the coarse scales, various combinations of soil heterogeneity including four degrees of variances, four degrees of correlation lengths, and four degrees of anisotropy ratios are considered to give a comprehensive applicability and validity of the methodology.

Parameter Settings for Soil Profiles
The study area is set up as a 5 m × 5 m vertical profile.We choose loamy soil with moderate soil quality, and its hydraulic parameters are from the USDA (United States Department of Agriculture) for different soil texture categories: saturated water content θ s = 0.4, residual water content θ r = 0.07, a pore size distribution index n = 1.1, and the reference value of saturated hydraulic conductivity K s is 0.28 m d −1 .Here, the proposed inverse method is performed in heterogeneous soil profiles, with the two most variable parameters (K s and shape parameter characterizing the pore distribution α) exhibiting spatially variable.The "true" two-dimensional K s fields are generated using KL expansion, with this reference as the mean.α is considered to be the spatially distributed field associated with K s , that is, α(x, z) = K s (x, z).The spatial heterogeneity is quantified by statistical parameters including the spatial variation variance σ, the size of characteristic length λ (λ x and λ z in the x and z directions, m) and the anisotropic ratios (λ z /λ x ).These parameters are set as follows: σ = 0.5, 1.0, 1.2, 2.0; when investigating the effect of different characteristic lengths on the simulation results, we use an isotropic medium, but with varying characteristic lengths, that is, λ x = λ z = 4, 8, 12, 16 m; when investigating the effect of the degree of anisotropy on the simulation results, we use an anisotropic medium by taking the characteristic length in the x-direction to be 4, but in the z-direction to be 16, 64, and 180, which gives us four degrees of anisotropy, i.e., v = 1, 4, 16, 45.

Settings for the Fine-and Coarse-Scale Grids
The scales of the observed and needed parameters often do not coincide; here, we opt to downscale from the observed data on the coarse scale to obtain the hydraulic parameters on the fine scale.The fine-scale grid in this paper is set as 128 × 128 in the 5 m × 5 m study domain, on which the saturated hydraulic conductivity field will be estimated, that is, 16,384 parameters will be estimated.In order to explore the effect of upscaling size on the inversion results, the coarse-scale grid on which the observations are obtained is set to 128 × 128 (the same as the fine-scale), 64 × 64, 32 × 32, 16 × 16, and 8 × 8.The saturated hydraulic conductivity will estimated via the inverse model conditioned on observations from one of these coarse scales.

Settings for the Observations
The observations are generated based on the hydraulic properties.The initial condition is set as pressure head is −10 m.The total simulation time is 120 d, and the observation interval is 3 d.In order to investigate the influence of boundary conditions and on the inversion results, we set up three models with three different specific boundary conditions as follows: constant pressure head boundary (Model A in Figure 2a), variable flux boundary (Model B in Figure 2b), and lateral flow boundary (Model C in Figure 2c); other parameters not mentioned can be found in Figure 2. In the case of determining the coarse-scale and fine-scale grids, we want to use as few observations as possible to obtain reasonable parameter estimates, so we set up a set of different numbers of observations to investigate their effects on the inverse results.This investigation is conducted in the case of a 32 × 32 coarse-scale grid, which is a trade-off between efficiency and accuracy addressed in a later section.In the case of a coarse-scale grid of 32 × 32, if there are observations on all grids there will be 1024 observations, here we gradually and randomly reduce the number of observations by taking N equal to 1024, 974, 424, 124, 74, 24, 14, 9, 4 for nine cases.
Totally, three boundary conditions and different magnitudes of decline in scale, together with different degrees of soil heterogeneity, yield to the 66 numerical experiments in this study.The combinations are shown in Figure 3.
In order to measure the performance of the inverse method on these experiments, the normalized RMSE is used as a measure of agreement between observed and modeled values: where M, denotes the total number of soil water pressure head measurements or spatially distributed saturated hydraulic parameters; Y sim and Y obs denote the model simulated and observed value; Y max and Y min signify the maximum and minimum soil water pressure head measurement, respectively.

Presentation of Case Results
In this section we demonstrate the estimation results of saturated hydraulic conductivity by the inverse method.The results are shown here using model A as an example case, with a coarse-scale grid of 32 × 32, and spatial variability parameters σ = 1.0, 4 illustrates the trends of the likelihood function of the parameters to be estimated as the number of Markov chain iterations increases.The three chains show a similar convergence process, rapidly decreasing and gradually reaching the reference value of −1.4194 × 10 5 during the 50,000 iterations.Given the similar convergence behavior of the three chains, we arithmetically average these chains and then plotted the averaged evolution of the estimated saturated hydraulic conductivity, as shown in Figure 5 with iteration number I =100, 500, 1000, 2000, 4000, 8000, 15,000, 30,000, 50,000.It can be seen that as the iteration number increases, the estimated and true values of the parameters match better and better, illustrated visually in the scatter plots and also in the increasingly lower values of NRMSE marked in the upper right corner.

Effects of the Coarse-and Fine-Scale Grid Ratios
We use the case of Model A with σ = 1.0, λ x = λ z = 4 m but different coarse-scale grids to illustrate the effects of the coarse-and fine-scale grid ratios on the inverse modeling.The inversely estimated saturated hydraulic conductivity fields under different coarse-scale grid are illustrated in Figure 6, and values of NRMSE are also indicated in the figure .As can be seen from the figure, all cases show good simulation results, and as the coarse-scale grid gets closer to the fine-scale grid, the simulated values match the true values more and more closely.We also show in Figure 7 the temporal and spatial distribution of soil water potential, corresponding to modeled and true saturated hydraulic conductivity fields.Overall, the estimated hydraulic conductivity is able to simulate the spatial and temporal variability of the soil water potential very well in the coarse-and fine-scale setup scenarios, as also shown by the very low NRMSE values marked in the figures.Table 1 further gives the time spent on the inversion and the values of the evaluation metrics for different coarse-scale grids.As can be seen, when fine-scale grid is determined as 128 × 128, the greater the difference between the coarse and fine-scales, the lower the cost of the observation and the less the inverse simulation consumes, but also the lower the accuracy of the reconstruction of the fine-scale saturated hydraulic conductivity.Therefore, we need to make a trade-off between accuracy and cost.Through the above analyses, in the research situation of this paper, the coarse grid taking the size of 32 × 32 to 64 × 64 are more appropriate choices, which can guarantee a certain computational accuracy at the same time, the computational efficiency will be increased to 2 to 6 times of the original, which is why the other calculations of this paper set the size of its coarse grid to 32 × 32.Due to limitations on the length of the paper, the results related to model B and C are not represented graphically, but list metrics in Table 2 only.The spatial distribution of the K s field is expressed using three statistic parameters: spatial variation variance, characteristic length and anisotropy ratio.This section conducts numerical simulations in either isotropic (λ = λ x = λ z = 8, 12, 16 m) or anisotropic (λ x = 4 m, λ z = 16, 64, 180 m, that is, v = 4, 16, 45) soil profiles with σ = 0.5, 1.2, 2.0.
The inverse results under different soil heterogeneity conditions are illustrated in Figure 8.Because previous figures has shown the case with σ = 1.0, λ x = λ z = 4 m, here, results only under other conditions are illustrated.The quantitative metrics are listed in Table 3 using NRMSE.From the figure, it can be observed that overall the inversion effect has a tendency to become better as the spatial variability of the K s field becomes smaller.

Impact of Observation Site Density
In the case of a coarse-scale grid of 32 × 32 (1024 observations), we gradually and randomly reduce the number of observations to obtain nine cases of N = 1024, 974, 424, 124, 74, 24, 14, 9, 4. Figure 9 depicts the plots of the simulated values of the K s field against the reference values for these cases with the boundary condition denoted Model A. It can be observed that the values of NRMSE are stable around a small value for all the cases where the N value of the total number of actual observation points is above 74.That is, when the number of observations is greater than or equal to 7% of the total grid, a very satisfactory agreement between the estimated and true values is obtained, and when the number of observations is less than 7% of the total grid, the gap between the simulated and real values gradually becomes larger.We also conduct the investigation with other two boundary conditions denoted Model B and C, and the corresponding results are shown in Table 4.

Conclusions
In this paper, for the spatially high-resolution soil saturated hydraulic conductivity, a multiscale dynamic data integration method is proposed to obtain these parameters on finer scales based on the inversion of dynamic observations on coarse scales.The method uses Bayesian statistical inference as the inversion framework, combines an analytical upscaling method and KL expansion, constructs a posteriori distributions of saturated hydraulic parameters based on observations on coarse scales, and performs sampling simulations of the posteriori distributions to reconstruct the parameter fields at high resolution.The method provides an effective solution to the computational efficiency, scale integration, and parameter uncertainty problems of multiscale inversions, and is demonstrated and validated in a series of numerical experiments.The numerical results show that the proposed method has good applicability for different boundary conditions, and is very tolerant to large-scale variability in soil heterogeneity.The performance of the proposed method corresponding to the coarse-and fine-scale grid setup indicated that the method can be used to integrate data from different coarse scales as well as coarse scales that differ significantly from fine scales.

Figure 1 .
Figure 1.Variation in eigenvalues' size with order in KL expansion.

Figure 2 .
Figure 2. Schematic diagrams of the 2D section models with different boundary conditions.(a) Model A. (b) Model B. (c) Model C.

Figure 3 .
Figure 3. Parameter and model compositions for each algorithm.

Figure 4 .
Figure 4.The change in LogLikelihood with the iteration of the Markov chains.

Figure 5 .
Figure 5.The change of K s with the iteration of the Markov chains.

Figure 6 .
Figure 6.Comparison between the simulated and reference value of K s field (effects of the coarseand fine-scale grid ratios).

Figure 9 .
Figure 9.Comparison between the simulated and reference value of K s field (impact of observation site density).

Table 1 .
The simulation time with K s -NRMSE at different scales in Model A.

Table 2 .
Variation in the NRMSE with the scale size for the simulated and the reference K s

Table 3 .
Variation in the NRMSE with the number of observation points for the simulated K s and the reference K s

Table 4 .
Variation in the NRMSE with the spatial distribution for the simulated K s and the reference K s .