An Exponential Filter Model-Based Root-Zone Soil Moisture Estimation Methodology from Multiple Datasets

Abstract: Modern smart agriculture initiatives place growing demands on soil moisture (SM) monitoring over large agricultural areas. Remote sensing techniques enable high-resolution surface SM (SSM) estimation at large scales but provide no root zone SM (RZSM) information, so deriving RZSM from SSM has long been a research focus. Data assimilation methods are promising for RZSM estimation and have produced numerous assimilated reanalysis datasets, e.g., ERA5 and the latest Soil Moisture Active Passive (SMAP) L4 SM product. However, data latency and heavy computation during data collection and processing often inhibit further applications. This work proposes a rapid scheme for estimating RZSM with short latency and low computational cost, based on the Exponential Filter (EF) method. The EF model, with its single parameter T, was first calibrated and validated using the SSM and RZSM of the ERA5 reanalysis dataset, yielding a map of the optimum T for each grid. Then, the fast-updating SMAP L3 SSM product, together with the scale-matched optimum T, was used as input to the EF model to estimate the RZSM of each grid. The scheme was tested over the central and eastern agricultural areas of China using a dense monitoring network of 796 SM observation sites, which covers various land uses as well as meteorological and hydrological conditions. The calibrated optimum T increased from the eastern humid areas to the western semi-humid areas, a pattern with a plausible physical explanation. Furthermore, all the estimated RZSMs captured the temporal-spatial variations of RZSM well and reflected its seasonal changes. Overall, the estimation scheme proved to be a desirable alternative for estimating RZSM over large agricultural areas.


Introduction
Soil moisture (SM) plays a key role in the hydrologic cycle by controlling the partitioning of rainfall among surface runoff, infiltration, and groundwater recharge, as well as the evapotranspiration rate from bare and vegetated areas [1,2]. Hence, a good understanding of SM information is essential to improve weather forecasting [3], drought and flood predictions [4][5][6][7], agricultural water resources management [8], and climate change investigations [9]. Traditional methods of obtaining SM information rely on station-based instruments, including the gravimetric method [10], time domain reflectometry (TDR), gamma ray scanners, and neutron probes [11]; at regional to global scales, their data quality depends on the sampling frequency and distribution density of the measuring stations. Newly arising remote sensing (RS) techniques can continually estimate SM at a large scale.

This work aimed to test the applicability of the EF method in estimating RZSM from the SMAP L3 SSM dataset, and to establish a scheme suited for RZSM estimation with short latency and high computational efficiency over large agricultural areas. To this end, the EF model was calibrated and validated using the ERA5 reanalysis SM dataset over the central and eastern agricultural areas of China, and the optimum parameter T of each grid was taken as the value giving the largest Nash-Sutcliffe Efficiency (NSE) between the ERA5-derived and ERA5-provided RZSM. The ERA5-derived RZSM was then evaluated against the observed RZSM series in different agricultural zonings, including its spatial-temporal and seasonal performance from site to regional scale. The calibrated optimum T values were interpolated to a grid matching the spatial resolution of SMAP L3, and on this basis the optimum T value together with the SMAP L3 SSM was used as input to the EF model to estimate the RZSM of each grid. The performance of the SMAP L3-derived RZSM series was also evaluated against the observations.
Further, the applicability of the proposed calculating scheme was discussed, as well as the implications for modern agricultural water management.

Study Area
The central and eastern agricultural areas of China (105°14′E~131°31′E, 28°8′N~46°2′N), including 13 major food-producing provinces covering an area of 1.77 million km², were chosen as the study area (Figure 1). The cultivated area of the study area reaches ~45% of the total cultivated area in China, and the grain output accounts for ~55% of the total national output. The study area is divided into four agricultural areas according to China's nine major agricultural regions: I. Middle-lower Yangtze Plain; II. Huang-Huai-Hai Plain; III. Northeast China Plain; and IV. Loess Plateau. Some key soil properties (e.g., texture, sand/silt/clay fraction, and bulk density) for topsoil (0-30 cm) within the four agricultural areas are listed in Table 1. In the relatively humid areas, i.e., the Middle-lower Yangtze Plain and the Northeast China Plain, soil texture is Medium/Fine, with a higher clay fraction, lower sand fraction, and lower bulk density. In contrast, the relatively arid areas, particularly the Loess Plateau, show the highest sand fraction, lowest clay fraction, and largest bulk density.

Table 1. Key physical soil properties for topsoil (0-30 cm) in four agricultural zoning areas. (Note that only three simplified textural classes were used: Coarse-, Medium-, and Fine-textured, due to the scale (1:5 million) of the Soil Map of the World, and that numbers in the brackets of the last column indicate the average value of bulk density).

Land uses of those areas mainly include Cropland (CRO), Forestland (FOR), Grassland (GRA), Mixed land (MIX), and Barren land (BAR), based on the Remote Sensing Monitoring Data on Land Use of China generated in 2015. The land use data were downloaded from the Resources and Environment Data Cloud Platform (http://www.resdc.cn (accessed on 6 July 2021)) of the Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences.

Soil moisture, particularly in the root zone, is closely related to crop growth in those agricultural areas, especially in the North China Plain, which has long been troubled by a serious water shortage due to extensive irrigation [41]. Therefore, root zone SM information is critically needed for drought forecasting, agricultural irrigation guidance, and water resources management within those areas [7,42].

In Situ SM Data
The in situ, ground station-based SM observations are fundamental datasets for understanding the true SM dynamics. A total of 796 SM monitoring stations are distributed in the study area, with the densest coverage in the Huang-Huai-Hai Plain (region II) (Figure 1). There are 419, 48, 39, 288, and 2 stations located in the Cropland, Forestland, Grassland, Mixed land, and Barren land areas, respectively. Each station adopted the gravimetric method for SM measurement [10], which is generally recognized as the standard method. At all sites, four soil samples in the 0-10 cm, 10-30 cm, and 30-50 cm layers below the surface were collected by a soil auger, and 3 parallel soil samples were analyzed for each soil sample. The soil samples were dried at 105 °C for 8 h in the oven and weighed after cooling. In terms of error control, the allowable error for parallel determination did not exceed 1% when SM was less than 5%, and did not exceed 2% when SM was higher than 40%. The in situ SM record runs from 1 January 2006 to 31 November 2017, with sampling on the 1st, 11th, and 21st day of each month at 00:00 UTC.
According to Yang et al. [43], SM observations three times per month can represent the SM distribution within a month well, and they reflect the changes of ground SM with local precipitation and evapotranspiration. Additionally, the in situ SM observations are given in gravimetric units (kg/kg), while most satellite/reanalysis SM datasets are given in volumetric units (m³/m³); thus, the first step is to unify the units of the different datasets. The soil bulk density data used for unit conversion are from the Harmonized World Soil Database (HWSD), for which the data source for China is the 1:1,000,000 soil data provided by the Institute of Soil Science, Chinese Academy of Sciences, for the second national land survey. Specific descriptions of the calculating formulas can be found in our previous work [43].
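The unit conversion described above can be illustrated with a short sketch. The exact formulas are given in [43] and are not reproduced here; the code below assumes the commonly used relation in which volumetric SM equals gravimetric SM multiplied by the ratio of soil bulk density to water density. The function name and the bulk density value are hypothetical and serve only as an illustration.

```python
WATER_DENSITY_G_CM3 = 1.0  # density of water in g/cm^3 (assumed constant)


def gravimetric_to_volumetric(theta_g, bulk_density_g_cm3):
    """Convert gravimetric SM (kg/kg) to volumetric SM (m^3/m^3).

    Assumed relation (not taken from the paper):
        theta_v = theta_g * bulk_density / water_density
    """
    if theta_g < 0 or bulk_density_g_cm3 <= 0:
        raise ValueError("theta_g must be >= 0 and bulk density must be > 0")
    return theta_g * bulk_density_g_cm3 / WATER_DENSITY_G_CM3


# Illustrative values only: 0.20 kg/kg with a bulk density of 1.35 g/cm^3.
theta_v = gravimetric_to_volumetric(0.20, 1.35)
```

In practice the bulk density would be looked up per station from the HWSD layer mentioned above rather than passed as a single constant.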

Satellite/Reanalysis SM Datasets
Based on the evaluation of multi-source surface SM datasets over the central and eastern agricultural areas of China [43], several satellite products and reanalysis datasets were shown to have better spatial-temporal performance and higher estimation accuracy. In particular, the ERA5 reanalysis dataset produced by the European Centre for Medium-Range Weather Forecasts (ECMWF) has proven to be the most promising alternative for SM estimation [33]. The ERA5 SM dataset covers the period from 1981 to the present and provides four-layer soil moisture data with high spatial-temporal resolution (i.e., 1-hourly and 0.1°). The high-quality ERA5 dataset is available 2-3 months behind the present, and the latency of the preliminary dataset can be kept within ~5 days.
The Soil Moisture Active Passive (SMAP) mission is an orbiting observatory that measures the amount of water in the surface soil; it was launched in January 2015 and started operation in April 2015. The enhanced level 3 radiometer global daily 9 km EASE-Grid surface soil moisture products generated from SMAP were chosen as the input to the EF model in this work, covering the period from 1 April 2015 to 31 December 2017 [44]. Meanwhile, the level 4 global 3-hourly 9 km EASE-Grid surface and root zone soil moisture analysis update products (at depths of 0-100 cm), which are assimilated and derived from the Catchment Land Surface Model of SMAP [45], were added for comparison with the EF-simulated root zone soil moisture. All SMAP L3 and L4 soil moisture products were downloaded from the NASA Distributed Active Archive Center (DAAC) at the National Snow and Ice Data Center (NSIDC). It should be noted that only the SMAP L4 SM at 00:00 UTC was chosen, to match the observed SM time series (Figure S2). A summary of the ERA5 and SMAP soil moisture datasets is given in Table 2.

Methodology
Exponential Filter Method. According to Wagner et al. [36], water fluxes between the surface layer and the root zone layer are assumed to be proportional to the difference in volumetric SM between the two layers. The water balance equation for the root zone SM can then be simplified as

$$L \frac{d\theta_R(t)}{dt} = C \left[ \theta_S(t) - \theta_R(t) \right] \qquad (1)$$

where θ_S is the surface SM (cm³/cm³); θ_R is the root zone SM (cm³/cm³); L is the depth of the root zone layer (cm); and C is the pseudo diffusivity coefficient (cm/d), which is not only closely related to soil properties, but also influenced by plant species and meteorological conditions. Defining T = L/C gives the following analytical solution of Equation (1):

$$\theta_R(t) = \frac{1}{T} \int_{-\infty}^{t} \theta_S(\tau) \exp\left( -\frac{t - \tau}{T} \right) d\tau \qquad (2)$$

where T is the characteristic time length (d), which increases with the layer depth L and decreases with the pseudo diffusivity coefficient C. The parameter T is usually used as the time scale of SM variation, and is a multi-factor comprehensive parameter that affects the dynamic change of soil moisture [40]. Note that although transpiration is not included in this simplified water balance model, and soil hydraulic conductivity is assumed constant, the method is still a useful tool for estimating profile SM content because the influence of past measurements weakens with increasing time lag [36]. Further, the discrete form of Equation (2) is defined as follows (using the standardized SM data):

$$\vartheta_R(t_n) = \frac{\sum_{i=1}^{n} \vartheta_s(t_i) \exp\left( -\frac{t_n - t_i}{T} \right)}{\sum_{i=1}^{n} \exp\left( -\frac{t_n - t_i}{T} \right)} \qquad (3)$$

where ϑ_R is the standardized root zone SM and ϑ_s is the standardized surface SM. Herein, the standardization uses min-max normalization. Albergel et al. [37] also proposed an iterative form of Equation (3) for more convenient calculation:

$$\vartheta_R(t_n) = \vartheta_R(t_{n-1}) + K_n \left[ \vartheta_s(t_n) - \vartheta_R(t_{n-1}) \right] \qquad (4)$$

where the gain K at time t_n (K_n) is defined as follows:

$$K_n = \frac{K_{n-1}}{K_{n-1} + \exp\left( -\frac{t_n - t_{n-1}}{T} \right)} \qquad (5)$$

The initial values are given as K_1 = 1 and ϑ_R(t_1) = ϑ_s(t_1).
It is noteworthy that the scheme in Equations (4) and (5) is explicitly recursive and is therefore, in theory, prone to accumulating estimation errors. However, the root zone SM at time t_n (i.e., ϑ_R(t_n)) is estimated from the surface SM at the corresponding time as well, not merely from the estimated value at t_{n−1} (i.e., ϑ_R(t_{n−1})), which largely reduces the accumulated estimation errors.
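A minimal sketch of the recursive exponential filter may help make the calculation concrete. It assumes the recursion in its commonly written form, i.e., the gain is updated as K_n = K_{n−1} / (K_{n−1} + exp(−(t_n − t_{n−1})/T)) and the root zone value is nudged toward the current surface value by K_n, starting from K_1 = 1 and ϑ_R(t_1) = ϑ_s(t_1). The function names, the sample times, and the SSM values are hypothetical. A direct weighted-average version is included only to cross-check the recursion.

```python
import math


def min_max_normalize(series):
    """Scale a series to [0, 1] (min-max normalization)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("constant series cannot be min-max normalized")
    return [(x - lo) / (hi - lo) for x in series]


def exponential_filter(times, ssm, T):
    """Recursive exponential filter.

    times: observation times in days, strictly increasing
    ssm:   standardized surface SM at those times
    T:     characteristic time length in days (> 0)
    """
    if len(times) != len(ssm) or not ssm:
        raise ValueError("times and ssm must be non-empty and equally long")
    if T <= 0:
        raise ValueError("T must be positive")
    gain = 1.0          # K_1 = 1
    rzsm = [ssm[0]]     # initial root zone value equals the surface value
    for n in range(1, len(ssm)):
        decay = math.exp(-(times[n] - times[n - 1]) / T)
        gain = gain / (gain + decay)                        # gain update
        rzsm.append(rzsm[-1] + gain * (ssm[n] - rzsm[-1]))  # state update
    return rzsm


def exponential_filter_direct(times, ssm, T):
    """Exponentially weighted average of all past SSM values (cross-check)."""
    out = []
    for n in range(len(ssm)):
        w = [math.exp(-(times[n] - times[i]) / T) for i in range(n + 1)]
        out.append(sum(wi * si for wi, si in zip(w, ssm)) / sum(w))
    return out


# Hypothetical, irregularly sampled surface SM (m^3/m^3); illustrative only.
times = [0, 1, 2, 4, 5, 8, 9, 10]
raw_ssm = [0.18, 0.31, 0.27, 0.22, 0.35, 0.20, 0.19, 0.28]
ssm = min_max_normalize(raw_ssm)
rzsm = exponential_filter(times, ssm, T=3.0)
```

Because the recursion only needs the previous gain and the previous estimate, a new SSM value can be folded in without reprocessing the full history, which is what makes the method attractive for low-latency use.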
Estimation Scheme. An estimation scheme is proposed here to outline the steps for estimating RZSM from satellite surface SM based on the EF model (Figure 2). The scheme involves two key steps.

The first step (i) was to calibrate and validate the EF model using the surface and root zone SM data of ERA5. The first 2/3 of the ERA5 SM series was used as the calibration period (2006-2013) and the remaining 1/3 as the validation period (2014-2017). Specifically, the ERA5 SSM dataset (first layer, 0-7 cm) was first used in the EF model to simulate the RZSM in the range of T ( ) for each 10 km grid. Then, the Nash-Sutcliffe Efficiency (NSE) was computed between the second layer of ERA5 SM (7-28 cm) and all the simulated RZSMs, to determine the optimum T of each grid. On this basis, an optimum T map at 10 km grids was constructed, together with the simulated RZSM series corresponding to the optimum T of each grid (hereinafter referred to as ERA5-derived RZSM).

The second step (ii) was to derive the RZSM from the SMAP L3 surface SM dataset using the EF model established in step (i). First, the optimum T map (~10 km) obtained in step (i) was interpolated to the scale of the SMAP L3 SSM dataset (~9 km) using inverse distance weighted (IDW) interpolation, giving a new optimum T map at 9 km grids. Then, the EF model was driven by the matched optimum T and the SMAP L3 surface SM data to estimate the RZSM at each 9 km grid (hereinafter referred to as SMAP L3-derived RZSM). Lastly, all RZSMs estimated by the EF model, as well as the ERA5- and SMAP-provided RZSMs, were compared and evaluated against the ground-observed SM series, including their spatial-temporal and seasonal performance.
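Step (i) amounts to a one-dimensional search over T for each grid cell. The sketch below illustrates this under stated assumptions: the candidate T values (1 to 20 days in 1-day steps) are hypothetical because the range used in this work is not reproduced here, the search is a plain exhaustive scan, and the input series are synthetic. The reference series is generated with a known T so that the search can be checked.

```python
import math


def exponential_filter(times, ssm, T):
    """Recursive exponential filter with the gain initialised to 1."""
    gain, rzsm = 1.0, [ssm[0]]
    for n in range(1, len(ssm)):
        gain = gain / (gain + math.exp(-(times[n] - times[n - 1]) / T))
        rzsm.append(rzsm[-1] + gain * (ssm[n] - rzsm[-1]))
    return rzsm


def nse(reference, simulated):
    """Nash-Sutcliffe Efficiency of a simulated series against a reference."""
    mean_ref = sum(reference) / len(reference)
    num = sum((r - s) ** 2 for r, s in zip(reference, simulated))
    den = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - num / den


def calibrate_T(times, ssm, reference_rzsm, candidates):
    """Return (best_T, best_nse) over the candidate T values for one grid."""
    scored = [
        (nse(reference_rzsm, exponential_filter(times, ssm, T)), T)
        for T in candidates
    ]
    best_nse, best_T = max(scored)
    return best_T, best_nse


# Synthetic daily series for one grid cell (illustrative only).
times = list(range(120))
ssm = [0.5 + 0.4 * math.sin(0.3 * d) for d in times]
reference = exponential_filter(times, ssm, 5.0)   # stands in for ERA5 layer 2
candidates = [float(T) for T in range(1, 21)]     # hypothetical search range
best_T, best_nse = calibrate_T(times, ssm, reference, candidates)
```

With real data the search would run on the calibration period only, and the selected T would then be scored on the validation period.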
Performance Metrics. Three statistical metrics, the Nash-Sutcliffe Efficiency (NSE), Relative Error (RE), and Root Mean Squared Error (RMSE), were adopted to evaluate the performance of the EF model. A higher NSE and lower RE and RMSE indicate better performance of the EF model in simulating RZSM. NSE, RE, and RMSE were calculated according to the following formulas:

$$NSE = 1 - \frac{\sum_{t=t_1}^{t_n} \left( \theta_{ERA5}^{t} - \theta_{sim}^{t} \right)^2}{\sum_{t=t_1}^{t_n} \left( \theta_{ERA5}^{t} - \overline{\theta}_{ERA5} \right)^2} \qquad (6)$$

where t_1 and t_n are the start and end times of the RZSM series, θ^t_ERA5 and θ^t_sim are the ERA5-provided and EF-simulated RZSM at time t, and \overline{θ}_ERA5 is the average of the ERA5-provided RZSM series.
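$$RE = \frac{\overline{\theta}_{sim} - \overline{\theta}_{ERA5}}{\overline{\theta}_{ERA5}} \times 100\% \qquad (7)$$

where \overline{θ}_sim and \overline{θ}_ERA5 are the averages of the model-simulated and ERA5-provided RZSM series.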
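$$RMSE = \sqrt{\frac{1}{N} \sum_{t=t_1}^{t_n} \left( \theta_{sim}^{t} - \theta_{ERA5}^{t} \right)^2} \qquad (8)$$

Similarly, θ^t_ERA5 and θ^t_sim indicate the ERA5-provided and model-simulated RZSM at time t, and N is the number of time steps between t_1 and t_n.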
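The three metrics can be sketched in a few lines. NSE and RMSE follow their standard definitions. The RE form used here, the difference of the two series means divided by the reference mean and expressed in percent, is an assumption inferred from the variable definitions above and may differ in detail from the authors' formula. Function names and sample values are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def nse(reference, simulated):
    """Nash-Sutcliffe Efficiency (1 is a perfect match)."""
    ref_mean = mean(reference)
    num = sum((r - s) ** 2 for r, s in zip(reference, simulated))
    den = sum((r - ref_mean) ** 2 for r in reference)
    return 1.0 - num / den


def relative_error(reference, simulated):
    """Relative error of the series means, in percent (assumed form)."""
    return (mean(simulated) - mean(reference)) / mean(reference) * 100.0


def rmse(reference, simulated):
    """Root mean squared error, in the units of the inputs."""
    squared = [(s - r) ** 2 for r, s in zip(reference, simulated)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative RZSM values (m^3/m^3), not taken from the datasets.
era5 = [0.20, 0.30, 0.40]
sim = [0.25, 0.30, 0.35]
scores = (nse(era5, sim), relative_error(era5, sim), rmse(era5, sim))
```

For these sample values the simulated series has the right mean but a damped amplitude, so RE is essentially zero while NSE and RMSE still register the mismatch.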
Moreover, the Correlation Coefficient (CC), p value, bias, and RMSE were chosen to quantify the difference between the model-simulated RZSM and the ground observations. The larger the CC and the lower the bias and RMSE relative to the observations, the higher the estimation accuracy of the model-simulated RZSM. RMSE was calculated using the same formula as Equation (8), and the Pearson correlation coefficient was computed using the following formula:

$$CC = \frac{\sum \left( \theta_{sim} - \overline{\theta}_{sim} \right) \left( \theta_{obs} - \overline{\theta}_{obs} \right)}{\sqrt{\sum \left( \theta_{sim} - \overline{\theta}_{sim} \right)^2 \sum \left( \theta_{obs} - \overline{\theta}_{obs} \right)^2}} \qquad (9)$$

where θ_sim is the model-simulated RZSM value (m³/m³), θ_obs is the in situ ground-observed RZSM value (m³/m³), and \overline{θ}_sim and \overline{θ}_obs are the averages of the model-simulated and observed RZSM series. The former average minus the latter was taken as the bias:

$$bias = \overline{\theta}_{sim} - \overline{\theta}_{obs} \qquad (10)$$

The Hausdorff distance was used to evaluate the trajectory similarity of annual RZSM distributions, and was computed with the following formula [46,47]:

$$D_H(L_i, L_j) = \max \left\{ \sup_{a \in L_i} \inf_{b \in L_j} \lVert a - b \rVert, \; \sup_{b \in L_j} \inf_{a \in L_i} \lVert a - b \rVert \right\} \qquad (11)$$

where ‖·‖ denotes the distance norm between points on line L_i and line L_j.
Usually, the Hausdorff distance reflects the maximum mismatch between two trajectories: the larger D_H, the greater the mismatch between the two trajectories, and vice versa. Specifically, the larger the Hausdorff distance between the observed RZSM and a provided or model-simulated RZSM dataset, the worse that dataset reflects the annual RZSM changes.
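The site-level comparison metrics can be sketched as follows. For trajectories sampled at a finite set of points, the supremum and infimum in the Hausdorff distance reduce to a maximum and a minimum over the sample points, which is what the code computes. How the trajectories are parameterized in this work is not specified here, so representing each trajectory as a list of (time index, standardized value) points with a Euclidean norm is an assumption, and all names and values are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pearson_cc(sim, obs):
    """Pearson correlation coefficient between two equally long series."""
    ms, mo = mean(sim), mean(obs)
    cov = sum((s - ms) * (o - mo) for s, o in zip(sim, obs))
    var_s = sum((s - ms) ** 2 for s in sim)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov / math.sqrt(var_s * var_o)


def bias(sim, obs):
    """Mean of the simulated series minus mean of the observed series."""
    return mean(sim) - mean(obs)


def hausdorff_distance(points_a, points_b):
    """Symmetric Hausdorff distance between two finite point sets."""
    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(directed(points_a, points_b), directed(points_b, points_a))


# Illustrative trajectories as (time index, standardized RZSM) points.
observed = [(0, 0.0), (1, 0.0)]
estimated = [(0, 1.0), (1, 1.0)]
d_h = hausdorff_distance(observed, estimated)
```

The p value attached to CC is not computed here; a statistics library would normally supply it.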

Distribution of Optimum T Parameter
The EF model was calibrated and validated using the ERA5 SSM and RZSM datasets for each 10 km grid, giving the distribution map of the optimum T parameter (T_opt) (Figure 3). In the upper left box diagram, T_opt for all grids was in the range of [2,4], with an average of ~3 days. Spatially, T_opt showed an increasing trend from the eastern humid areas to the western semi-humid areas: the minimum T_opt occurred in the southern Middle-lower Yangtze Plain and the northeastern Northeast China Plain (humid areas with annual rainfall >800 mm), with values of 1-3 days, while larger T_opt was found in the semi-humid (Huang-Huai-Hai Plain) and semi-arid (Loess Plateau) areas, with most values between 4 and 10 days. As mentioned above, T_opt is a comprehensive parameter affected by multiple factors: it is related not only to the climatic conditions and vegetation of the region, but also to the soil properties. For instance, a fine texture with low sand content and high clay content affects the movement and retention of soil water, which is reflected in T_opt.

Overall Performances
For all 10 km grids, the performance of the EF model was evaluated using the three statistical metrics, i.e., NSE, RE, and RMSE. Overall, the EF model showed good performance in both the calibration and validation periods, as presented by the violin diagrams (Figure 4).

Evaluation on ERA5-Derived RZSM against In Situ Observations
In this section, the ERA5-derived RZSM using the EF model was evaluated against the in situ observed RZSM (10-30 cm). Specifically, the spatial-temporal comparisons between ERA5-derived and observed RZSM were performed at a regional scale. Furthermore, a quantitative evaluation was carried out by interpolating ERA5-derived RZSM dataset from grid to each observation station.

Spatial Comparison between Observed and ERA5-Derived SM
Compared with the in situ observed RZSM, both the ERA5-provided RZSM and the ERA5-derived RZSM from the EF model capture the spatial variations of RZSM to a large extent (Figure 5): the spatial distribution of RZSM was effectively estimated in the study area, despite overestimation in some marginal areas, e.g., the southern Middle-lower Yangtze Plain. Meanwhile, the ERA5-derived RZSM showed similar spatial performance to the ERA5-provided RZSM itself, and even presented higher estimation accuracy against the in situ observed RZSM in some semi-humid and semi-arid areas. Despite the slight difference between the depths of 7-28 cm and 10-30 cm, soil moisture does not vary much between the two depths; thus, the in situ RZSM observations at 10-30 cm can serve as an approximate "ground reference" for evaluating the ERA5-provided and ERA5-derived RZSMs at 7-28 cm.

Temporal Comparison among Different Agricultural Zoning Areas
The temporal changes of the ERA5-derived RZSM from the EF model were compared with the in situ observed RZSM in the Middle-lower Yangtze Plain, Huang-Huai-Hai Plain, Northeast China Plain, and Loess Plateau, respectively, by computing four statistical metrics (i.e., CC, bias, RMSE, and NSE) between the two RZSM series (Figure 6). For all agricultural areas, the EF-simulated RZSM captured the temporal changes of the in situ observed RZSM well, as shown by the large CC (all >0.7) between the simulated RZSM and the observations. Meanwhile, the bias and RMSE between the ERA5-derived and observed RZSM varied among agricultural zonings: they were smallest for the Huang-Huai-Hai Plain (bias: −0.003 m³/m³; RMSE: 0.03 m³/m³) and relatively larger for the other areas (bias: 0.047~0.079 m³/m³; RMSE: 0.06~0.08 m³/m³). NSE was low before bias correction, but higher NSE (0.37~0.70) was obtained after removing the bias of the ERA5-derived and ERA5-provided RZSM against the in situ observed RZSM. In addition, the temporal changes of the ERA5-derived RZSM were close to those of the ERA5-provided RZSM, which was further confirmed by the similar performance metrics (Figure 6), indicating that the two RZSM series performed similarly in depicting RZSM changes over time in the different agricultural areas.
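The effect of bias removal on NSE can be illustrated with a small example. The bias-correction procedure used in this work is not detailed in this section; the sketch below assumes the simplest variant, subtracting the mean difference between the simulated and observed series, and uses made-up values. Function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def nse(observed, simulated):
    """Nash-Sutcliffe Efficiency of a simulated series against observations."""
    obs_mean = mean(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - num / den


def remove_mean_bias(observed, simulated):
    """Shift the simulated series so that its mean matches the observed mean.

    Assumed correction: a constant offset equal to the mean bias.
    """
    offset = mean(simulated) - mean(observed)
    return [s - offset for s in simulated]


# Illustrative RZSM values (m^3/m^3): right dynamics, wet offset of 0.08.
obs = [0.20, 0.25, 0.30, 0.25]
sim = [0.29, 0.32, 0.39, 0.32]

nse_before = nse(obs, sim)
nse_after = nse(obs, remove_mean_bias(obs, sim))
```

A constant offset inflates the squared-error term in NSE even when the simulated dynamics are right, which is why a series with a high CC can still have a negative NSE before correction.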

Quantitative Comparison against Ground Observation Sites
The ERA5-derived RZSM at the ~10 km grid was interpolated to each observation station and then quantitatively evaluated against the in situ observed RZSM series. The overall statistics for all stations and the regional statistics within the four agricultural zoning areas are presented as violin diagrams (Figure 7). The average CC between the ERA5-derived RZSM and the ground-observed RZSM series was in the range of 0.42-0.6, fluctuating around 0.5, and the corresponding p values were below 0.05, as shown by the first quartile Q1 lying under the red short-dotted line (p = 0.05). Among the regions, the humid Middle-lower Yangtze Plain showed the largest CC with the observed RZSM series (average CC = 0.60) and the lowest p value (far below the line of p = 0.05), while the semi-arid Loess Plateau showed the lowest CC (0.29-0.53, average CC = 0.42). The bias and RMSE between the ERA5-derived RZSM and the ground-observed RZSM series were in the range of −0.01-0.04 m³/m³ and 0.07-0.09 m³/m³, respectively, indicating that the estimation error of the ERA5-derived RZSM was small compared with the in situ measured RZSM. In comparison, the RZSM simulated by the EF model attained an estimation accuracy as good as that of the ERA5-provided RZSM (Figure S1).

Evaluation of Root-Zone SM Estimated from SMAP L3 Surface SM
This section derives the RZSM from the SMAP L3 SSM using the ERA5-based EF model established above, and then evaluates the SMAP L3-derived RZSM against the in situ observed RZSM.
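The scale-matching step that precedes this derivation, interpolating the optimum T map from the ~10 km grid to the ~9 km SMAP grid by inverse distance weighting, can be sketched as below. The power of 2, the use of all supplied neighbors without a search radius, and planar coordinates in km are assumptions, since the IDW settings are not given here. The function name and the T values are hypothetical.

```python
import math


def idw_interpolate(known_points, known_values, target, power=2.0):
    """Inverse distance weighted estimate at `target`.

    known_points: list of (x, y) coordinates of source grid centres
    known_values: optimum T at those centres
    target:       (x, y) coordinate of the destination grid centre
    power:        distance exponent (2 is a common default, assumed here)
    """
    num = 0.0
    den = 0.0
    for (x, y), value in zip(known_points, known_values):
        dist = math.hypot(target[0] - x, target[1] - y)
        if dist == 0.0:
            return value        # target coincides with a source centre
        weight = 1.0 / dist ** power
        num += weight * value
        den += weight
    return num / den


# Four neighbouring 10 km cell centres (km) with illustrative T values (days).
centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
t_opt = [2.0, 4.0, 3.0, 5.0]

# A destination cell centre that is equidistant from all four sources.
t_at_target = idw_interpolate(centres, t_opt, (5.0, 5.0))
```

The interpolated T of each 9 km cell would then be passed, together with that cell's SMAP L3 SSM series, to the exponential filter.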

Temporal Comparison between Observed and SMAP L3-Derived RZSM
Four statistical metrics (i.e., CC, bias, RMSE, and NSE) against the observed RZSM series were computed to evaluate the temporal performance of the SMAP L3-derived RZSM (Table 3). Overall, the SMAP L3-derived RZSM in all agricultural areas captured the temporal changes of the in situ observed RZSM well, as shown by the large CC (0.53~0.82, p < 0.001) between the simulated RZSM and the observations. Furthermore, the bias and RMSE between the SMAP L3-derived and observed RZSM varied among agricultural zonings: they were small for the Middle-lower Yangtze Plain and the Northeast China Plain (humid areas) but large for the Huang-Huai-Hai Plain and the Loess Plateau (semi-humid and semi-arid areas). This is consistent with earlier conclusions that the EF model performs better under humid conditions [14]. Meanwhile, low NSEs (a few of them negative) were found for all agricultural zonings before bias correction; however, larger NSE (0.35~0.48) was attained once the bias of the SMAP L3-derived RZSM relative to the observed RZSM was corrected. Compared with the SMAP L4-provided RZSM product assimilated using the EnKF method, the SMAP L3-derived RZSM from the EF model had similar performance metrics for all agricultural areas.

Table 3. Performance metrics of SMAP L3-derived and L4-provided RZSMs against the in situ ground observations in five different agricultural zoning areas. (Herein, all statistical metrics were calculated after bias correction, and ** indicates significant correlation at p < 0.001).

Accuracy Evaluation Using Ground Observation Sites
Similarly, the SMAP L3-derived RZSM at the 9 km grid was interpolated to each observation station and then quantitatively evaluated using the in situ observed RZSM series. The box diagrams show the overall statistics for all stations and the regional statistics within the four agricultural zoning areas (Figure 8).

Moreover, the CC, bias, and RMSE relative to the observed RZSM series showed similar spatial distributions for the SMAP L3-derived and the SMAP L4-provided (after interpolation) RZSM within the whole evaluated area (Figure 9). In particular, the SMAP L3-derived RZSM even outperformed the SMAP L4-provided product in the central part of the Northeast China Plain (SMAP L3-derived CC: 0.49; SMAP L4-provided CC: 0.41), where the latter had no coverage at that time. Additionally, one typical area located in the northern part of the Haihe River basin in China was zoomed in from the spatial error map (Figures 9 and 10). Larger estimation errors and uncertainties were found in Cropland areas, where the underlying surface is strongly disturbed by complex human activities, e.g., groundwater extraction and agricultural irrigation, which may largely influence the supply and transport conditions of soil water.

Figure 10. Spatial distribution of bias and RMSE among different land uses zoomed from the red rectangle area in Figure 9. (Note that the zoomed area is mainly the northern part of Haihe River basin in China).

Seasonality of Estimated RZSMs
The seasonality of all estimated RZSMs was further compared with the in situ observations in the Huang-Huai-Hai Plain (Figure 11). The Huang-Huai-Hai Plain was chosen as the typical case mainly because it has the densest network of SM observation stations, which can best represent the ground reference of SM dynamics [43]. The Hausdorff distance and the Pearson CC were computed as performance metrics of how well the different RZSM versions reflect the seasonality. Among these RZSM trajectories, the ERA5-derived and ERA5-provided RZSMs showed similar performance in reflecting seasonal changes of RZSM, with similar CCs of 0.96 and 0.95 (p < 0.001) with the observed RZSM series. The Hausdorff distances of the ERA5-derived and ERA5-provided RZSMs from the observed RZSM trajectory, 0.36 (H_1) and 0.33 (H_2), respectively, also confirmed their similar performance. Furthermore, the SMAP L3-derived RZSM performed similarly to the SMAP L4-provided RZSM in reflecting the seasonal RZSM distribution: the CCs of the SMAP L3-derived and SMAP L4-provided RZSM with the observed trajectory reached 0.74 and 0.77 (p < 0.001), and the corresponding Hausdorff distances were 1.5 (H_3) and 1.06 (H_4), respectively.

Figure 11. Seasonality of soil moisture distribution among different RZSMs estimated from multisource SM datasets. (Herein H_1, H_2, H_3, and H_4 indicate the Hausdorff distance of ERA5-derived, ERA5-provided, SMAP L3-derived, and SMAP L4-provided RZSM against the observed RZSM trajectories, respectively, and note that all data series were standardized using the Z-score method).
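Building a seasonal trajectory before computing these metrics can be sketched as follows. Averaging all samples by calendar month and standardizing with the population standard deviation are assumptions, since the exact aggregation and Z-score variant are not stated here. The record layout, function names, and values are hypothetical.

```python
import math
from collections import defaultdict


def monthly_climatology(records):
    """Average RZSM by calendar month.

    records: iterable of (month, value) pairs with month in 1..12
    returns: dict mapping month -> mean value, in ascending month order
    """
    buckets = defaultdict(list)
    for month, value in records:
        buckets[month].append(value)
    return {m: sum(v) / len(v) for m, v in sorted(buckets.items())}


def z_score(series):
    """Standardize a series to zero mean and unit (population) variance."""
    mu = sum(series) / len(series)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in series) / len(series))
    if sigma == 0.0:
        raise ValueError("constant series cannot be standardized")
    return [(x - mu) / sigma for x in series]


# Illustrative samples taken on the 1st, 11th, and 21st of three months.
records = [
    (1, 0.20), (1, 0.22), (1, 0.21),
    (2, 0.24), (2, 0.26), (2, 0.25),
    (3, 0.30), (3, 0.28), (3, 0.29),
]
climatology = monthly_climatology(records)
trajectory = z_score(list(climatology.values()))
```

Standardizing each dataset separately puts the trajectories on a common scale, so the Hausdorff distance compares seasonal shape and timing rather than absolute moisture level.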

Discussion
This work proposed a rapid scheme for estimating RZSM with short latency and low computational cost, and tested it over the central and eastern agricultural areas of China. The scheme involves two main calculating procedures. The EF model calibrated and validated in the first step attained as good a performance in estimating RZSM as the ERA5 reanalysis product itself. Firstly, the calibrated optimum parameter T (T_opt) showed a reasonable distribution across wet-dry conditions, with an increasing trend from the eastern humid areas (1-3 days) to the western semi-humid areas (4-10 days) (Figure 3). The single parameter T is simple to calibrate; however, giving it a physically based explanation has long remained a key problem in previous studies [37,40,48]. From the model perspective, the larger the parameter T, the smaller K_n, meaning that the RZSM at t_n (ϑ_R(t_n)) relies less on the surface SM at t_n (ϑ_s(t_n)) and more on the RZSM at the previous time (ϑ_R(t_{n−1})), and vice versa (as shown in Equations (4) and (5)). Parameter T can be regarded as a "pseudodiffusivity" that controls how much of the SSM infiltrates into the root zone layers and how long it is retained [39]. Physically, in humid areas such as the Middle-lower Yangtze Plain (annual rainfall >800 mm), where runoff is generated by the excess storage mechanism, root-zone soil moisture is closely related to surface soil moisture because the surface layer controls how much rainfall passes to deeper soil; in relatively arid areas, where the excess infiltration mechanism often dominates, deeper-layer soil moisture depends only weakly on changes in surface soil moisture. Consistently, the spatial distribution of T_opt from the EF model reflected the varying hydraulic properties of the underlying surface well.
Such relationships between the surface and root zone soil moisture among different wet-dry areas were also consistent with the previous report based on the China Ecosystem Research Network [49]. The good reflection of hydrological and meteorological conditions by model parameters may further strengthen the physical mechanism of EF methods to some extent [19].
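The dependence of the gain on T noted above can be checked numerically. Under regular sampling with interval Δt, the gain recursion K_n = K_{n−1} / (K_{n−1} + exp(−Δt/T)) has the fixed point K = 1 − exp(−Δt/T), so a larger T gives a smaller limiting gain. The sketch below assumes daily sampling (Δt = 1 day), which is an illustrative choice, and the function names are hypothetical.

```python
import math


def gain_sequence(T, dt=1.0, steps=200):
    """Gain K_n for n = 1..steps under regular sampling with interval dt."""
    decay = math.exp(-dt / T)
    gain = 1.0                 # K_1 = 1
    gains = [gain]
    for _ in range(steps - 1):
        gain = gain / (gain + decay)
        gains.append(gain)
    return gains


def limiting_gain(T, dt=1.0):
    """Fixed point of the gain recursion for regular sampling."""
    return 1.0 - math.exp(-dt / T)


# T values near the two ends of the calibrated range, in days.
gain_humid = gain_sequence(T=2.0)[-1]    # approaches about 0.39
gain_arid = gain_sequence(T=10.0)[-1]    # approaches about 0.10
```

With T = 2 days roughly 39% of each new surface value enters the root zone estimate, against roughly 10% with T = 10 days, which matches the weaker surface to root zone coupling described for the drier regions.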
Secondly, the applicability of this calibration approach was demonstrated by the overall performance indexes as well as by the comparison against the in situ observed RZSM. On the one hand, the EF model for all calculating grids showed high NSE (average NSE: 0.82, 0.78), and low RE (~10%) and RMSE (~0.08 m³/m³) in both the calibration and validation periods (Figure 4), which indicated that calibration and validation using the ERA5-provided SSM and RZSM datasets were suitable for the EF model in the study areas. The applicability of calibrating EF methods with reanalysis datasets has also been shown by other studies [39]. On the other hand, compared with the in situ observed RZSM series at 10-30 cm depth, the ERA5-derived RZSM (7-28 cm) from the EF model captured the temporal-spatial variations of RZSM in the different agricultural zonings, as shown by the large CC (all >0.7), low bias (|bias| < 0.08 m³/m³) and RMSE (all <0.08 m³/m³), as well as the high NSE (0.37~0.61) between the simulated and observed RZSM series (Figures 5 and 6). The quantitative evaluation at each observation site further verified the good estimation accuracy of the ERA5-derived RZSM, with average CC, bias, and RMSE in the ranges of 0.42-0.6, 0.01-0.04 m³/m³, and 0.07-0.09 m³/m³, respectively (Figure 7). Moreover, the ERA5-derived RZSM reflected the seasonal changes of the in situ observed RZSM well, as shown by its short Hausdorff distance from the observed trajectory (H = 0.36) (Figure 11). Good correspondence between simulated RZSM and observations at ~30 cm depth was also found in previous studies [16,39,50,51]. It should be noted that all the above evaluation results for the ERA5-derived RZSM were also compared with those for the ERA5-provided RZSM itself, and the two RZSM series performed similarly in all aspects.
Overall, the RZSM simulated by the EF model could be a desirable alternative to the ERA5-provided RZSM, considering its efficiency in rapidly obtaining RZSM, which avoids the longer data latency and large computation costs of ERA5 in land surface models [25].
A reliable surface soil moisture input that is accessible with short latency (ideally in real time) is the critical requirement for EF methods [16,37]. This work adopted the fast-updating SMAP L3 SSM product (assured latency within 50 h) as the input to the EF model. Compared with the in situ observed RZSM, the SMAP L3-derived RZSM from the EF model performed well in capturing the temporal RZSM changes over all agricultural areas (Table 3); meanwhile, the quantitative evaluation at each observation site also confirmed the good estimation accuracy of the SMAP L3-derived RZSM (Figure 8). More importantly, the SMAP L3-derived RZSM showed as good an estimation accuracy as the SMAP L4-provided RZSM dataset, as well as comparable performance in capturing temporal RZSM changes and reflecting the seasonality of RZSM (Figure 8). The SMAP L3-derived RZSM was slightly inferior to the SMAP L4 product in some agricultural areas (e.g., the Middle-lower Yangtze Plain), which is likely due to the inherent discrepancy between the surface SM (0-5 cm) provided by the two products [7,39]. The SMAP L4 RZSM dataset is an assimilation product merged from the lower-level SMAP data using the EnKF technique [22,32]; it considers the uncertainties of both the model product and the satellite observations, and is therefore found to be superior to model or satellite data alone [52]. Nevertheless, the SMAP L4-provided products do not cover the full area (Figure S2), and in some specific areas, e.g., the central part of the Northeast China Plain, the SMAP L3-derived RZSM even outperformed the interpolated SMAP L4-provided RZSM (Figure 9). Additionally, complex human activities in Cropland areas, e.g., crop planting and irrigation, may cause larger estimation errors and uncertainties because they influence the supply, transport, and conduction conditions of soil water.
It should be noted that the RZSM depth of SMAP L4 is defined as the top 1 m of the soil column [32]; however, for agricultural areas the 10-30 cm soil layer is usually of primary concern because of its importance for crop growth [12,53]. According to Raza et al. [12], most temperate crop roots grow in the uppermost ~15 cm of the soil, and 61-78% of the root biomass of various crops is found in the top 30 cm. In this respect, SMAP L3-derived RZSM by the EF model could provide more direct information than SMAP L4 RZSM for the 10-30 cm crop root zone. Considering also the data latency of the SMAP L4 RZSM product (latency of the various input data plus processing time; assured latency within seven days) and the high computation costs of the EnKF assimilation technique in process-based models [25,54], the estimation scheme proposed in this work might be a more convenient and rapid approach for estimating RZSM over large agricultural areas [16].
In sum, the proposed scheme for estimating root zone soil moisture with the EF method was initially driven by the surface and root-zone SM datasets of ERA5, and it performed as well as the ERA5-provided RZSM itself in various aspects. The near-real-time SMAP L3 SSM input, together with the good estimation accuracy of SMAP L3-derived RZSM, makes the EF-based scheme a desirable alternative for estimating RZSM with short data latency and small computation. Under the smart agriculture initiative, such a scheme can make effective use of the latest released satellite SM products and provide timely guidance for agricultural water management, e.g., drought monitoring for crop growth.
It should be kept in mind that, although calibrating only one parameter T saves much computation, the physical interpretation of T needs further consideration. Parameter T has been found to be related to all physical processes and properties affecting soil moisture dynamics, e.g., evapotranspiration, the hydraulic characteristics of the soil, texture, density, thickness, and the number of soil layers [40]. Accounting for these influences when calibrating T for each grid in the future may shed more light on the physical mechanisms of the EF model, and may help to improve the poorer performance of the EF model in relatively arid areas compared with humid areas. In addition, bias correction or multi-source data fusion can enhance the quality of the input data to the EF model, which may further raise the estimation accuracy of RZSM.
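The single-parameter calibration discussed here can be illustrated with a simple search over candidate T values for one grid cell, keeping the value with the largest NSE against a reference RZSM series. The sketch below is hedged accordingly: the exhaustive search over a user-supplied candidate list, the recursive filter form, and the names `exponential_filter`, `nse`, and `calibrate_T` are assumptions made for illustration and are not taken from the calibration code of this work.

```python
import math


def exponential_filter(times, ssm, T):
    """Recursive exponential filter (assumed formulation, see lead-in)."""
    filtered = [ssm[0]]
    gain = 1.0
    for n in range(1, len(times)):
        gain = gain / (gain + math.exp(-(times[n] - times[n - 1]) / T))
        filtered.append(filtered[-1] + gain * (ssm[n] - filtered[-1]))
    return filtered


def nse(sim, ref):
    """Nash-Sutcliffe efficiency of sim against a reference series."""
    mean_ref = sum(ref) / len(ref)
    sq_err = sum((s - r) ** 2 for s, r in zip(sim, ref))
    spread = sum((r - mean_ref) ** 2 for r in ref)
    return 1.0 - sq_err / spread


def calibrate_T(times, ssm, rzsm_ref, candidates):
    """Return (best_T, best_nse) over the candidate T values (in days).

    Hypothetical helper: one grid cell, exhaustive search, first best kept.
    """
    best_T, best_score = None, -math.inf
    for T in candidates:
        score = nse(exponential_filter(times, ssm, T), rzsm_ref)
        if score > best_score:
            best_T, best_score = T, score
    return best_T, best_score
```

Repeating `calibrate_T` for every grid cell would yield a map of optimum T. Since each cell is independent and each evaluation is a single pass over the series, the cost grows linearly with the number of cells, candidates, and time steps.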

Conclusions
This work established a rapid estimation scheme suited for estimating root zone soil moisture with short latency and small computations over the central and eastern agricultural areas of China. First, the EF model was calibrated and validated using the surface and root-zone SM of the ERA5 reanalysis dataset, to obtain the optimum parameter T for each grid based on the largest NSE between the ERA5-derived RZSM by the EF model and the ERA5-provided RZSM. Second, the ERA5-derived RZSM by the EF model was compared with the observed RZSM series in different agricultural areas, including spatial-temporal and seasonal performances from site to regional scale. Finally, a scale-matched optimum T value together with SMAP L3 SSM was used as input to the EF model to retrieve the RZSM estimate for each grid, and the performance of the SMAP L3-derived RZSM series was also quantitatively evaluated against the in situ observations. The major conclusions of this work are as follows:
1. The calibrated optimum parameter T showed an increasing trend from the eastern humid areas (1-3 days) to the western semi-humid areas (4-10 days), which is in line with the mechanism of local runoff generation, verifying the physical mechanism of the EF model to some extent;
2. The applicability of the calibration approach using the ERA5 SSM and RZSM dataset was demonstrated: (1) the EF model in all calculating grids showed high NSE (NSE: 0.82, 0.78), and low RE (RE: ~10% m³/m³) and RMSE (~0.08 m³/m³) in both the calibration and validation periods; (2) EF-simulated RZSM could capture the temporal-spatial and seasonal variations of RZSM by comparison with the in situ observed RZSM series among different agricultural zonings, as shown by the large CC (all >0.7), low bias (|bias| < 0.08 m³/m³) and RMSE (all <0.08 m³/m³), as well as the high NSE (0.37-0.61) between the simulated and observed RZSM series;
3. The SMAP L3-derived RZSM by the EF model performed well in capturing the temporal RZSM changes over all agricultural areas. Moreover, the quantitative evaluation at each observation site also confirmed the good estimation accuracy of SMAP-derived RZSM. SMAP L3-derived RZSM even outperformed the interpolated SMAP L4-provided RZSM in some specific areas;
4. The fast-updating SMAP L3 SSM product makes the proposed estimation scheme a desirable alternative for estimating RZSM with short data latency and high computational efficiency. Such an estimation scheme presents a distinct advantage in agricultural water management under the modern smart agriculture initiative.
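The evaluation statistics quoted in the conclusions above (CC, bias, RMSE, and NSE) can be computed for a pair of simulated and observed RZSM series as sketched below. This is an illustrative sketch under stated assumptions: CC is taken as the Pearson correlation coefficient and bias as the mean of simulated minus observed values, which may differ in detail from the definitions used in this work, and the function name `evaluate` is hypothetical.

```python
import math


def evaluate(sim, obs):
    """Return CC, bias, RMSE and NSE of sim against obs (equal-length lists).

    Assumed definitions: Pearson CC; bias = mean(sim) - mean(obs).
    """
    n = len(obs)
    mean_s = sum(sim) / n
    mean_o = sum(obs) / n
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs))
    var_s = sum((s - mean_s) ** 2 for s in sim)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    sq_err = sum((s - o) ** 2 for s, o in zip(sim, obs))
    return {
        "CC": cov / math.sqrt(var_s * var_o),
        "bias": mean_s - mean_o,          # same units as the inputs (m3/m3)
        "RMSE": math.sqrt(sq_err / n),    # same units as the inputs (m3/m3)
        "NSE": 1.0 - sq_err / var_o,      # 1 is a perfect match
    }
```

A constant offset between the two series leaves CC at 1 but lowers NSE, which is one reason the correlation and efficiency values reported for the same comparison can differ noticeably.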
Supplementary Materials: The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/rs14081785/s1, Figure S1: Comparison of ERA5-provided and -derived RZSM series against the in situ observed RZSM for all observation sites of the study area; Figure S2: Covering scale of SMAP L3-derived (a) and SMAP L4-provided (b) RZSM datasets at the same time (at 00:00 UTC) within the study area.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.