#### 2.2.1. Methods Description

(a) Cumulative Distribution Function (CDF) Matching

The CDF matching approach has been widely used to remove systematic differences between two series, for example to reduce bias in satellite-observed SSM [31,32,33,48]. The method can also be used to transfer data between different areas [25] and to upscale point measurements [26]. To implement CDF matching, the first step is to rank the reference data and the data to be scaled. Second, the differences between the corresponding values of the two ranked datasets are calculated, and a linear regression is fitted segment by segment between these differences and the ranked data to be scaled. Lastly, the resulting CDF matching parameters are used to scale the data within each segment [49]. The CDF curve of the scaled data then closely follows that of the reference data, which indicates that the systematic difference between the original observation and the reference data has been removed. It is worth noting that, among four conventional scaling methods (linear regression, linear rescaling, MIN/MAX correction, and CDF matching), CDF matching shows the best performance [48], and it requires only one year of data for calibration [33].
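The three steps above can be sketched in a few lines. This is a minimal illustration rather than the implementation used in the study: the function names and the number of segments are hypothetical, and linear interpolation between percentile nodes stands in for the segment-by-segment regression.

```python
import numpy as np


def fit_cdf_matching(source, reference, n_segments=10):
    """Return segment nodes of the source data and the correction at each node.

    n_segments is an illustrative choice, not a value taken from the text.
    """
    source = np.asarray(source, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Step 1: rank both series by evaluating them at common percentiles.
    percentiles = np.linspace(0.0, 100.0, n_segments + 1)
    src_nodes = np.percentile(source, percentiles)
    ref_nodes = np.percentile(reference, percentiles)
    # Step 2: difference between corresponding ranked values.
    return src_nodes, ref_nodes - src_nodes


def apply_cdf_matching(values, src_nodes, diff_nodes):
    """Step 3: add the piecewise-linear correction to the data to be scaled."""
    values = np.asarray(values, dtype=float)
    # np.interp is linear inside each segment and holds the end values
    # constant outside the calibrated range (assumes increasing src_nodes).
    return values + np.interp(values, src_nodes, diff_nodes)
```

Fitting on a calibration year and calling `apply_cdf_matching` on later data mirrors the one-year calibration mentioned above; values outside the calibrated range receive the correction of the nearest end node.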

(b) Least Squares Method

The objective of the least squares method is to determine the optimal weights for merging two or three independent datasets through weighted averaging. It is one of the most widely used data assimilation methods and has been applied in numerous studies since it took its current form. It has been used to blend remotely sensed and model-simulated SSM products with or without in-situ data constraints [50].

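When the target product is merged as a linear combination of the single products, the merging equation can be expressed as:

$$S{M}_{m}={\omega}_{a}S{M}_{a}+{\omega}_{b}S{M}_{b}+{\omega}_{c}S{M}_{c}$$

where ${\omega}_{a}$, ${\omega}_{b}$, ${\omega}_{c}$ are the relative weights of datasets a, b, c; $S{M}_{a}$, $S{M}_{b}$, $S{M}_{c}$ are the corresponding single products; and $S{M}_{m}$ is the target merged product. The merged product is unbiased and optimal if the optimal weights sum to one, which acts as a constraint on the error variance minimization problem. Accordingly, the weights ${\omega}_{a}$, ${\omega}_{b}$, ${\omega}_{c}$ that minimize the error variance of $S{M}_{m}$ can be calculated from the relative error variances ${\sigma}_{a}^{2}$, ${\sigma}_{b}^{2}$, ${\sigma}_{c}^{2}$. With mutually independent errors and ${\omega}_{b}=1-{\omega}_{a}-{\omega}_{c}$, the error variance of $S{M}_{m}$ to be minimized can be expressed as:

$${\sigma}_{m}^{2}={\omega}_{a}^{2}{\sigma}_{a}^{2}+{\left(1-{\omega}_{a}-{\omega}_{c}\right)}^{2}{\sigma}_{b}^{2}+{\omega}_{c}^{2}{\sigma}_{c}^{2}$$

Setting $\partial {\sigma}_{m}^{2}/\partial {\omega}_{a}=0$ and $\partial {\sigma}_{m}^{2}/\partial {\omega}_{c}=0$, the equations that determine the relative weights from the relative errors are:

$${\omega}_{a}=\frac{{\sigma}_{b}^{2}{\sigma}_{c}^{2}}{{\sigma}_{a}^{2}{\sigma}_{b}^{2}+{\sigma}_{a}^{2}{\sigma}_{c}^{2}+{\sigma}_{b}^{2}{\sigma}_{c}^{2}},\quad {\omega}_{b}=\frac{{\sigma}_{a}^{2}{\sigma}_{c}^{2}}{{\sigma}_{a}^{2}{\sigma}_{b}^{2}+{\sigma}_{a}^{2}{\sigma}_{c}^{2}+{\sigma}_{b}^{2}{\sigma}_{c}^{2}},\quad {\omega}_{c}=\frac{{\sigma}_{a}^{2}{\sigma}_{b}^{2}}{{\sigma}_{a}^{2}{\sigma}_{b}^{2}+{\sigma}_{a}^{2}{\sigma}_{c}^{2}+{\sigma}_{b}^{2}{\sigma}_{c}^{2}}$$

The method also applies when only two datasets are available, in which case the weights are:

$${\omega}_{a}=\frac{{\sigma}_{b}^{2}}{{\sigma}_{a}^{2}+{\sigma}_{b}^{2}},\quad {\omega}_{b}=\frac{{\sigma}_{a}^{2}}{{\sigma}_{a}^{2}+{\sigma}_{b}^{2}}$$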
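For mutually independent errors, minimizing the merged error variance under the sum-to-one constraint gives weights proportional to the inverse error variances, for two or three datasets alike. The sketch below illustrates that closed form; the function names are hypothetical, and the independence of errors is an assumption of the sketch.

```python
import numpy as np


def least_squares_weights(error_variances):
    """Weights that sum to one and minimize the merged error variance.

    Accepts two or three positive error variances (independent errors assumed).
    """
    var = np.asarray(error_variances, dtype=float)
    if np.any(var <= 0):
        raise ValueError("error variances must be positive")
    inverse = 1.0 / var
    return inverse / inverse.sum()


def merged_error_variance(error_variances):
    """Error variance of the weighted average built with the weights above."""
    var = np.asarray(error_variances, dtype=float)
    weights = least_squares_weights(var)
    return float(np.sum(weights ** 2 * var))
```

The merged error variance is never larger than the smallest input error variance, which is why the weighted average is preferred over selecting the single best product.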

(c) Triple Collocation Analysis

Triple collocation was used in this research to determine the relative errors of each input dataset. It is an error estimation method that can estimate the random error variances and systematic biases of different datasets without a reliable reference dataset. It improves the accuracy of calibration and validation compared with the dual comparisons that were widely used before. To implement the triple collocation method, three independent datasets are used jointly to determine the relative errors [51]. The error variance of each dataset can be expressed as:

$${\sigma}_{{\epsilon}_{a}}^{2}={\sigma}_{a}^{2}-\frac{{\sigma}_{a,b}\,{\sigma}_{a,c}}{{\sigma}_{b,c}},\quad {\sigma}_{{\epsilon}_{b}}^{2}={\sigma}_{b}^{2}-\frac{{\sigma}_{a,b}\,{\sigma}_{b,c}}{{\sigma}_{a,c}},\quad {\sigma}_{{\epsilon}_{c}}^{2}={\sigma}_{c}^{2}-\frac{{\sigma}_{a,c}\,{\sigma}_{b,c}}{{\sigma}_{a,b}}$$

where ${\sigma}_{a}^{2}$, ${\sigma}_{b}^{2}$, ${\sigma}_{c}^{2}$ are the data variances; ${\sigma}_{{\epsilon}_{a}}^{2}$, ${\sigma}_{{\epsilon}_{b}}^{2}$, ${\sigma}_{{\epsilon}_{c}}^{2}$ are the error variances; and ${\sigma}_{a,b}$, ${\sigma}_{b,c}$, ${\sigma}_{a,c}$ are the data covariances.
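A minimal sketch of the covariance form of triple collocation follows, in which each error variance is the data variance minus the product of the two covariances involving that dataset divided by the covariance of the other two. The function name is hypothetical, and the sketch assumes three collocated, gap-free series with mutually uncorrelated errors.

```python
import numpy as np


def triple_collocation_error_variances(a, b, c):
    """Estimate the random error variance of three collocated series."""
    data = np.vstack([a, b, c]).astype(float)
    cov = np.cov(data)  # 3 x 3 sample covariance matrix, one row per dataset
    var_a, var_b, var_c = np.diag(cov)
    cov_ab, cov_ac, cov_bc = cov[0, 1], cov[0, 2], cov[1, 2]
    err_a = var_a - cov_ab * cov_ac / cov_bc
    err_b = var_b - cov_ab * cov_bc / cov_ac
    err_c = var_c - cov_ac * cov_bc / cov_ab
    return err_a, err_b, err_c
```

With short or weakly correlated series the estimates can turn out negative, which is one way an invalid error estimate shows up in practice.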

(d) Soil Moisture Analytical Relationship (SMAR) Model

The SMAR model is a physically based, two-layer infiltration model that estimates the RZSM (the second soil layer) from the SSM (the first soil layer) time series:

$${S}_{2}\left({t}_{j}\right)={S}_{w2}+\left({S}_{2}\left({t}_{j-1}\right)-{S}_{w2}\right){e}^{-a\left({t}_{j}-{t}_{j-1}\right)}+\left(1-{S}_{w2}\right)\,b\,y\left({t}_{j}\right)\left({t}_{j}-{t}_{j-1}\right)$$

where ${S}_{2}$ [-] is the relative saturation of the second soil layer, ${S}_{w2}$ [-] is the wilting point of the second soil layer, and $y\left({t}_{j}\right)$ [-] represents the fraction of soil water infiltrating from the top layer to the lower layer. Coefficients a and b are defined as:

$$a=\frac{{V}_{2}}{\left(1-{S}_{w2}\right){n}_{2}{Z}_{r2}},\quad b=\frac{{n}_{1}{Z}_{r1}}{\left(1-{S}_{w2}\right){n}_{2}{Z}_{r2}}$$

where ${V}_{2}$ [$L{T}^{-1}$] is the soil water loss (evapotranspiration and percolation) coefficient, ${n}_{i}$ [-] is the porosity of the ${i}^{th}$ soil layer, and ${Z}_{ri}$ [L] is the depth of the ${i}^{th}$ soil layer.

As it is an analytical method, the parameters of the SMAR model were mathematically derived under several assumptions. First, compared with infiltration, capillary rise and lateral flow are negligible in the water mass exchange between the two layers. Second, the water exchange starts immediately and ends within one day when the soil water in the first layer exceeds field capacity, which implies infinite permeability. Third, the soil water loss of the second layer decreases linearly from a relatively humid condition (excluding truly humid conditions, where the water loss function is significantly non-linear) to the wilting point [24,29,52].
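As a rough illustration of how such a two-layer scheme is stepped through time, the sketch below uses the commonly published SMAR update, S2(t_j) = s_w2 + (S2(t_j-1) - s_w2) exp(-a Δt) + (1 - s_w2) b y(t_j) Δt, with a = V2 / ((1 - s_w2) n2 Zr2) and b = n1 Zr1 / ((1 - s_w2) n2 Zr2). Several points are assumptions of the sketch rather than statements from the text: the infiltration term y is taken as the surface saturation in excess of a field capacity threshold `s_c1`, the time step is constant, the result is clipped to [0, 1], and all names are hypothetical.

```python
import numpy as np


def smar_root_zone(s1, s_c1, s_w2, n1, n2, z_r1, z_r2, v2, s2_init, dt=1.0):
    """Step a SMAR-style update through a surface saturation series s1.

    Depths and v2 * dt must share the same length unit (for example mm).
    """
    s1 = np.asarray(s1, dtype=float)
    denom = (1.0 - s_w2) * n2 * z_r2
    a = v2 / denom            # normalized loss coefficient of layer 2
    b = n1 * z_r1 / denom     # ratio of layer storages
    # Assumption: infiltration is the excess of s1 over field capacity.
    y = np.maximum(s1 - s_c1, 0.0)
    s2 = np.empty_like(s1)
    s2[0] = s2_init
    for j in range(1, len(s1)):
        decay = (s2[j - 1] - s_w2) * np.exp(-a * dt)
        s2[j] = s_w2 + decay + (1.0 - s_w2) * b * y[j] * dt
    # Guard added in this sketch: keep relative saturation within [0, 1].
    return np.clip(s2, 0.0, 1.0)
```

In a dry spell the second layer relaxes exponentially toward the wilting point, while a wet surface pulse above `s_c1` raises it in a single step, which reflects the instantaneous exchange assumed above.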

#### 2.2.2. Processing Procedures

The central processing procedures include satellite data merging, SSM data blending, and RZSM generation (the flowchart of the processing procedure is presented in Figure 2). The division into the calibration period, the product period, and the validation period was based on data availability.

(a) Satellite Data Merging

Satellite data merging aims to merge all available passive microwave observations into one PASSIVE product. The ESA-CCI PASSIVE product [44,45,46] was merged in the same way using the TRMM Microwave Imager, WindSat Radiometer, AMSR-E, AMSR2, and SMOS, but without SMAP; it is therefore used here for comparison. As for active microwave observations, the ESA-CCI merged ACTIVE product was used directly, because it was merged from Advanced Scatterometer (Metop) observations in the same way as would be applied in this research. Nevertheless, the approach in this research differs from ESA-CCI in that the in-situ SSM climatology is taken into account before the final SSM blending.

● Rescaling using CDF matching

Differences in sensor specifications, particularly in microwave frequency and spatial resolution, result in different absolute SSM values from AMSR2, SMOS, SMAP, and AMSR-E. The datasets therefore need to be scaled to a common climatology using the CDF matching method. The climatology of AMSR-E was selected as the reference, because the AMSR-E SSM retrievals were identified as more accurate than other passive products due to the relatively low microwave frequency and high spatio-temporal resolution of the sensor [33].

● Error characterization using triple collocation

Error characterization aims to obtain the relative errors (stationary average random errors) of each dataset using triple collocation analysis. The triple collocation analysis was performed for those pixels where two or three datasets overlapped and were uncorrelated [53]. Furthermore, the uncertainty of the target product (the PASSIVE product) can also be determined from the error variances of the single products.

● Optimal weight calculation using the least-squares method and weighted averaging

The scaled satellite data were merged on a pixel basis using a weighted average that considers the error properties of each dataset. The optimal weights for the weighted average were determined with the least-squares method from the relative error variances of all input datasets over each specific merging period. The merging method works in both the three-dataset and the two-dataset cases, provided that the relative errors of each dataset have been determined. However, for specific locations, triple collocation analysis does not yield valid error estimates. In such cases, the weights were equally distributed amongst the available datasets.
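The pixel-wise merging with its equal-weight fallback can be sketched as follows. This is an illustration under stated assumptions, not the processing chain of the study: the function name is hypothetical, missing observations are encoded as NaN, and an error estimate is treated as valid only when it is finite and positive.

```python
import numpy as np


def merge_pixelwise(datasets, error_variances):
    """Merge k gridded datasets of shape (k, ny, nx) with inverse-variance weights.

    Pixels without valid error estimates for every available dataset fall
    back to equal weights amongst the available datasets.
    """
    data = np.asarray(datasets, dtype=float)
    err = np.asarray(error_variances, dtype=float)
    available = np.isfinite(data)
    valid_err = np.isfinite(err) & (err > 0)
    # Least-squares weights are usable if every available dataset has a valid error.
    use_ls = np.all(valid_err | ~available, axis=0)
    inverse = np.where(valid_err, 1.0 / np.where(valid_err, err, 1.0), 0.0)
    raw = np.where(use_ls, inverse, 1.0)   # equal weights where the estimate failed
    raw = np.where(available, raw, 0.0)    # missing datasets get zero weight
    total = raw.sum(axis=0)
    weights = raw / np.where(total > 0, total, 1.0)
    merged = np.sum(weights * np.where(available, data, 0.0), axis=0)
    return np.where(total > 0, merged, np.nan), weights
```

Normalizing over the available datasets only means that the same routine covers the three-dataset case, the two-dataset case, and pixels observed by a single sensor.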

(b) SSM Data Blending

● Rescaling using the CDF matching method

First, as the reference dataset, the ERA-Interim SSM products over the Calibration Period were scaled to the in-situ data climatology using CDF matching, which also yields the seasonal CDF matching parameters for the climatology scaling of the ERA-Interim data over the Product Period (see Figure 2). The seasons are defined as Monsoon (May–October), Transition1 (April), Winter (December–March), and Transition2 (November). Subsequently, the ERA-Interim SSM data over the Product Period were scaled using the seasonal CDF matching parameters.

Next, another CDF matching was performed to scale each merged satellite dataset (the PASSIVE and ACTIVE products generated in the satellite data merging step) to the climatology of the in-situ scaled ERA-Interim data over the Product Period.
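Seasonal parameters require the series to be split by season before each CDF matching fit. A small helper for the season definition given above could look like the following; the labels for the two transition months and the function names are choices made for this sketch.

```python
from collections import defaultdict
from datetime import date

# Month-to-season lookup following the definition in the text.
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter", 3: "Winter",
    4: "Transition1",
    5: "Monsoon", 6: "Monsoon", 7: "Monsoon",
    8: "Monsoon", 9: "Monsoon", 10: "Monsoon",
    11: "Transition2",
}


def season_of(day):
    """Return the season label for a datetime.date."""
    return SEASON_BY_MONTH[day.month]


def split_by_season(days, values):
    """Group values by season so that each group can be fitted separately."""
    groups = defaultdict(list)
    for day, value in zip(days, values):
        groups[season_of(day)].append(value)
    return dict(groups)
```

Each group can then be passed to a separate CDF matching fit, and the resulting parameter sets applied to the matching season of the Product Period.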

● Error characterization using triple collocation

The relative errors among the scaled PASSIVE, ACTIVE, and ERA-Interim products were calculated using the triple collocation method, which uses three collocated datasets to constrain the relative error variances without a manually chosen reference.

● Optimal weight calculation using the least-squares method and weighted averaging

Similar to the satellite data merging, a weighted average was used to merge the scaled PASSIVE, ACTIVE, and ERA-Interim products over the Product Period, and the optimal weights were obtained from the relative errors using the least-squares method.

(c) Deriving RZSM

● Depth Scaling Using CDF Matching

The CDF matching method was used to calculate RZSM from the blended SSM data. The Tibet-Obs in-situ measured SSM and RZSM datasets [1] over the Calibration Period were used to generate the depth scaling parameters. Note that the RZSM refers to the mean value of the SM in the 0–80 cm depth:

$${\theta}_{p}=\frac{{\sum}_{i}{\theta}_{i}{L}_{i}}{{\sum}_{i}{L}_{i}}$$

where $i$ is the index of the soil layer; ${\theta}_{p}$ [L$^{3}$L$^{-3}$] is the RZSM; ${L}_{i}$ [L] is the soil layer depth; and ${\theta}_{i}$ [L$^{3}$L$^{-3}$] is the soil moisture in the ${i}^{th}$ soil layer. The specific Tibet-Obs product used in this research includes soil moisture data at depths of 5 cm, 10 cm, 20 cm, 40 cm, and 80 cm.
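The profile mean is a depth-weighted average of the layer values. The sketch below shows the calculation; the function name is hypothetical, and the layer thicknesses in the usage note are an illustrative assignment rather than values stated in the text.

```python
import numpy as np


def profile_mean_soil_moisture(theta, layer_depths):
    """Depth-weighted mean soil moisture over the profile.

    theta has one entry (or one row of a time series) per soil layer;
    layer_depths holds the thickness L_i assigned to each layer.
    """
    theta = np.asarray(theta, dtype=float)
    depths = np.asarray(layer_depths, dtype=float)
    # Weighted average over the layer axis: sum(theta_i * L_i) / sum(L_i).
    return np.average(theta, axis=0, weights=depths)
```

For example, if each sensor at 5, 10, 20, 40, and 80 cm is assumed to represent the layer that ends at its depth, the thicknesses would be 5, 5, 10, 20, and 40 cm, which add up to the 0–80 cm profile.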

The scaling parameters of the fifth-order polynomial fitting function were generated from the SSM and the differences between the corresponding values of the ranked SSM and RZSM. Gao et al. [30] identified the fifth-order polynomial as the optimal choice based on a pre-analysis to define the fitting function for depth scaling. The scaling parameters were used to estimate the predicted difference between SSM and RZSM over the Product Period; the predicted RZSM was then generated from the blended SSM and the predicted differences.
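The depth scaling step can be sketched with a polynomial fit on the ranked data. The function names are hypothetical, and the sign convention (difference taken as SSM minus RZSM) is an assumption of this sketch, since the text does not state it.

```python
import numpy as np


def fit_depth_scaling(ssm, rzsm, order=5):
    """Fit a polynomial to the ranked SSM-minus-RZSM differences."""
    ssm_ranked = np.sort(np.asarray(ssm, dtype=float))
    rzsm_ranked = np.sort(np.asarray(rzsm, dtype=float))
    # Assumed sign convention: difference = ranked SSM - ranked RZSM.
    differences = ssm_ranked - rzsm_ranked
    return np.polyfit(ssm_ranked, differences, order)


def predict_rzsm(ssm, coefficients):
    """Predict RZSM by removing the fitted difference from the SSM."""
    ssm = np.asarray(ssm, dtype=float)
    return ssm - np.polyval(coefficients, ssm)
```

A high-order polynomial can behave erratically outside the SSM range seen during calibration, so predictions for blended SSM values beyond that range deserve a separate check.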

● RZSM Estimation Using the SMAR Model

In the four in-situ measurement networks of Tibet-Obs (Ali and Shiquanhe in an arid region, Naqu in a semi-arid region, and Maqu in a sub-humid region), the SMAR model parameters were derived mathematically over the one-year Calibration Period using the SSM and RZSM data (the same datasets as those used for depth scaling with CDF matching). The parameters were then applied over the entire Product Period.