Error Decomposition of Remote Sensing Soil Moisture Products Based on the Triple-Collocation Method Introducing an Unbiased Reference Dataset: A Case Study on the Tibetan Plateau

: Remote sensing (RS) soil moisture (SM) products have been widely used in various environmental studies. Understanding the error structure of data is necessary to properly apply RS SM products in trend and variation analysis and data fusion. However, a spatially continuous assessment of RS SM datasets is impeded by the limited spatial distribution of ground-based observations. As an alternative, the RS apparent thermal inertia (ATI) data related to the SM are transformed into SM values to expand the validation space. To obtain error components, the ATI-based SM along with the Soil Moisture Active Passive Mission (SMAP) and Advanced Microwave Scanning Radiometer 2 (AMSR2) SM are applied with the triple-collocation (TC) method to evaluate the RS SM data regarding random errors and amplitude variances at the regional scale. When the ATI-based SM is regarded as the reference data, the amplitude biases of the other two datasets are determined. The mean bias is also estimated by calculating the mean value di ﬀ erence between the ATI-based and validated RS SM. The results show that the ATI-based SM is a reliable source of reference data that, when combined with the TC method, can correctly estimate the error structure of RS SM datasets in wide space, promoting the reasonable application and calibration of RS SM datasets.


Introduction
Soil moisture (SM) plays an important role in the climate system and influences water, energy, and carbon cycles [1][2][3]. Long-term and large-scale SM monitoring can promote research on SM-climate interactions [4]. With the growth of research interest on the relationship between SM and climate in recent years, the demand for SM observations has increased [5]. Satellite remote sensing (RS) technology has been considered a principal way to provide long time series of SM datasets with 25-40 km of spatial resolution at the global scale [6]. In recent decades, numerous satellite sensors have enriched current SM datasets in real time. Due to several factors, e.g., platform design, instrument configuration, and retrieval algorithm, different RS SM datasets have their own characteristics [7], which are expressed in their systematic and random errors, and the systematic error consists of mean bias (first-order) and amplitude bias (second-order). Therefore, understanding the error structure of RS SM datasets is beneficial for mining valuable information, e.g., in data fusion studies [8,9].

Study Area and Data
The study area is situated on the Tibetan Plateau (TP), which has an average elevation of approximately 4000 m. The temperature increases from west to east, and precipitation shows a decreasing trend from southeast to northwest. According to the International Geosphere-Biosphere Programme (IGBP) classification scheme, the TP is mainly covered by grasslands and bare (less than 10% vegetation), accounting for more than 90% of the total area.
There are a number of observation sites on the TP, which is beneficial to the study on the error decomposition of remotely sensed SM products. As shown in Figure 1, the SM observation networks are deployed in Ngari, Naqu, Maqu, and Babao [22,31]. The differences in the temporal variation among four observation networks are shown in Figure 2. The in-situ measurements obtained at a depth of 5 cm from May 2015 to September 2015 are adopted. The Moderate Resolution Imaging Spectroradiometer (MODIS) albedo (MCD43B3) and land surface temperature (LST; MOD11A1 and MYD11A1) data are used to calculate the apparent thermal inertia (ATI) with a spatial resolution of 1 km, which is transformed into SM values by building the relationship with SM measurements from Ngari and Maqu networks separately covering bare areas and grasslands. The ATI-based SM is considered the reference data, and its applicability on the TP is validated using the relative SM truth estimated using the Voronoi diagram method and the modeled surface SM with 0.25 • × 0.25 spatial resolution from the Global Land Data Assimilation System (GLDAS) [32] of the National Aeronautics  [34] as experimental datasets combined with the ATI-based SM are adopted to perform error decomposition based on the TC method.

Error Decomposition
Validation of RS SM products is generally based on temporal information. The error structure of temporal data can be characterized by systematic and random errors, and the systematic error has   [34] as experimental datasets combined with the ATI-based SM are adopted to perform error decomposition based on the TC method.

Error Decomposition
Validation of RS SM products is generally based on temporal information. The error structure of temporal data can be characterized by systematic and random errors, and the systematic error has

Error Decomposition
Validation of RS SM products is generally based on temporal information. The error structure of temporal data can be characterized by systematic and random errors, and the systematic error Remote Sens. 2020, 12, 3087 4 of 12 has usually been represented as the biases of the temporal mean and amplitude range in previous validation cases. Therefore, the time series of RS SM data S i can consist of three components: where β i is the temporal mean value, α i is the multiplicative factor of the true amplitude process θ with E(θ) = 0, and ε i is the additive zero-mean random noise. The error of S i can be partitioned via the root-mean-square error (RMSE) between S i and the reference data: where var(·) computes the temporal variance. The last two items are the variances of the random errors. It is assumed that the random errors are not cross correlated with each other in Equation (2) or with the other terms in Equation (1). If S j is the reference data, error metrics of each component of S i can be expressed as follows [35]: where β ij is the temporal mean bias of S j and α ij is the amplitude bias of S i with α j = 1. The random error of S i is measured using the standard deviation of random error σ ε i . The uncertainty of amplitude α i θ is expressed as the RMSE between α i θ and α j θ.
In this paper, the error structures of AMSR2 and SMAP are analyzed using the TC method to estimate the parameters var(ε i ), var(α i θ), and α in combination with a reference dataset [36]: where σ 2 ε is the same as var(ε) in Equation (2) and C ij computes the covariance between S i and S j . The product term of C ij represents the variances of the amplitudes α i θ: When S 1 is treated as the reference data, the multiplicative factors of the other two temporal datasets are estimated as follows: The multiplicative factors of S 2 and S 3 representing the amplitude biases can be solved without being affected by random errors.

Determination of the Reference Dataset
The key to error decomposition based on the TC method is to select a reliable reference dataset. In-situ measurements have always been considered as reference data but have a limited spatial extent. To obtain the error structure of RS SM products in wide space, the extra ancillary data need be introduced. Optical satellite sensors with thermal infrared channels have received a great deal of attention for representing regional SM at the fine spatial scale. In this paper, the remotely sensed ATI Remote Sens. 2020, 12, 3087 5 of 12 related to the surface water status is considered ancillary data to capture the spatial heterogeneity of SM and is computed with the MODIS albedo and LST products as follows [37]: where A is the solar correction factor, ω is the surface albedo derived from the MODIS albedo product, and ∆T is the maximum daily amplitude of the LST calculated with the MODIS LST product. To extend the in-situ measurement information to a wider space, an empirical relationship between the in-situ SM measurements θ in−situ and the ATI τ can be formulated as follows: where η represents the modeling residuals. The relationship function f (·) is used to transform the ATI to SM. The ATI-based SM is resampled by averaging the SM obtained from Equation (8) within pixels of 0.25 • .

Obtaining the Soil Moisture Reference Dataset
The MODIS albedo and daily LST products are applied to calculate the daily ATI from May 1, 2015, to September 30, 2015 with Equation (7). The temporal ATI mean values at each pixel represent the finer spatial distribution of the SM versus the ground-based observations (Figure 3a). To spatially extend the measurement information of the ground-based sites, the relationship model between the ATI and SM is built using in-situ observations and their corresponding ATI at pixels from May 2015 to September 2015. A monotonically increasing natural logarithm function is fitted with a goodness of fit value of 0.732 (Figure 3c), which is employed to transform the ATI data into SM values at a spatial resolution of 1 km with Equation (8). To match the RS SM products in space, the transformed SM is resampled to 25 km as reference data, which represent the fact that SM values are higher on the southeast of the TP where the South Asian monsoon leads to a great deal of water vapor in the summer (Figure 3b). Table 1 shows the validation result of applicability of ATI-based SM in the Naqu and Babao regions. The average biases of the mean values of the temporal ATI-based SM relative to those of the ground-based observations are 0.006 cm 3 /cm 3 , meaning that the ATI-based SM has fine unbiasedness on the expectation. The amplitude bias of the ATI-based SM is estimated using the TC method along with the relative truth and GLDAS data. The closer that the α value is to 1, the smaller the amplitude bias of the ATI-based SM is. Therefore, the ATI-based SM has excellent systematic unbiasedness, showing a high α value and low temporal mean bias, meaning that it is feasible to spatially expand the information of limited ground-based observations by building the relationship with the remotely sensed data for obtaining reference data. In addition, the high systematic bias of the GLDAS SM in validation regions indicates that it is risky to evaluate other RS SM products by referencing the nonvalidated modeled data.

Error Decomposition
The random errors are partitioned from three collocated datasets (Figure 4), and there are larger random errors in the southeast of the TP, where the SM values are high. The averaged random errors of the ATI-based, SMAP, and AMSR2 SM are 0.034 cm 3 /cm 3 , 0.022 cm 3 /cm 3 , and 0.044 cm 3 /cm 3 ,

Error Decomposition
The random errors are partitioned from three collocated datasets (Figure 4), and there are larger random errors in the southeast of the TP, where the SM values are high. The averaged random errors of the ATI-based, SMAP, and AMSR2 SM are 0.034 cm 3 /cm 3 , 0.022 cm 3 /cm 3 , and 0.044 cm 3 /cm 3 , respectively. Compared with the other two datasets, the SMAP SM observed by the microwave L-band with a high sensitivity to the SM has the smallest random error. respectively. Compared with the other two datasets, the SMAP SM observed by the microwave Lband with a high sensitivity to the SM has the smallest random error.  (5) and (6), where the multiplicative factor of the ATI-based SM data is equal to 1.0. The systematic errors of the RS SM data are shown in Figure 5, and the averaged absolute values (Abs) of the temporal mean biases on the TP are 0.114 cm 3 /cm 3 for the SMAP SM and 0.067 cm 3 /cm 3 for the AMSR2 SM, accounting for 80.70% and 41.90%, respectively, of their total errors. The averaged RMSEs of the amplitudes of both SMAP and AMSR2 SM are 0.014 cm 3 /cm 3 and 0.024 cm 3 /cm 3 , respectively. The evaluation of the amplitude avoids the influences of random errors. The amplitude uncertainty of the AMSR2 SM is higher than that of the SMAP SM on the southeastern TP, and the same is true for the random errors. The systematic error generally includes the biases of the temporal mean and amplitude values. The mean values β i can be calculated with E(S i ), and since the random errors are already estimated, only the uncertainty of the amplitude is unknown in Equation (2) and can be calculated by subtracting the temporal mean bias and random errors from the total error RMSE(S i , S j ). The uncertainty of the amplitude can also be estimated with Equations (5) and (6), where the multiplicative factor of the ATI-based SM data is equal to 1.0. The systematic errors of the RS SM data are shown in Figure 5, and the averaged absolute values (Abs) of the temporal mean biases on the TP are 0.114 cm 3 /cm 3 for the SMAP SM and 0.067 cm 3 /cm 3 for the AMSR2 SM, accounting for 80.70% and 41.90%, respectively, of their total errors. The averaged RMSEs of the amplitudes of both SMAP and AMSR2 SM are 0.014 cm 3 /cm 3 and 0.024 cm 3 /cm 3 , respectively. The evaluation of the amplitude avoids the influences of random errors. The amplitude uncertainty of the AMSR2 SM is higher than that of the SMAP SM on the southeastern TP, and the same is true for the random errors.  Figure 6 shows the biases of both temporal mean and amplitude calculated by referencing the ATI-based SM. The temporal mean of the SMAP SM in nearly the entire region is underestimated. The temporal mean of the AMSR2 SM in 72.4% of study area is overestimated. The temporal mean of both remotely sensed SM products shows high biases deviating by approximately 0.20 cm 3 /cm 3 on the southeastern TP, where there are high SM values according to Figure 3b, indicating the low retrieval accuracy of the SMAP and AMSR2 SM for the temporal mean in the region with high SM values.

Analysis of Systematic Errors
The amplitude bias of the SMAP and AMSR2 SM can be evaluated by the multiplicative factor i  in Equation (1). When the ATI-based SM is regarded as reference data, the i  of the SMAP and AMSR2 SM can be calculated with Equation (6), avoiding the influences of random errors, which is different from the traditional estimation of the parameter i  through establishing a linear model with ordinary least squares (OLS). As shown in Figure 6, the amplitude of the AMSR2 SM is obviously amplified in 83.5% of the study area. Combined with the previous analysis, this finding shows that the AMSR2 SM has large systematic and random errors in the region with high SM values. The amplitude of the SMAP SM is reduced on the northern TP but amplified on the southern TP. Unlike the AMSR2 SM, the accuracy in the amplitude of the SMAP SM is not significantly affected by high SM. The systematic errors of both remotely sensed products have spatial stratification heterogeneity, indicating that it is possible to use a small number of statistical models for the calibration of the systematic error.  Figure 6 shows the biases of both temporal mean and amplitude calculated by referencing the ATI-based SM. The temporal mean of the SMAP SM in nearly the entire region is underestimated. The temporal mean of the AMSR2 SM in 72.4% of study area is overestimated. The temporal mean of both remotely sensed SM products shows high biases deviating by approximately 0.20 cm 3 /cm 3 on the southeastern TP, where there are high SM values according to Figure 3b, indicating the low retrieval accuracy of the SMAP and AMSR2 SM for the temporal mean in the region with high SM values.

Analysis of Systematic Errors
The amplitude bias of the SMAP and AMSR2 SM can be evaluated by the multiplicative factor α i in Equation (1). When the ATI-based SM is regarded as reference data, the α i of the SMAP and AMSR2 SM can be calculated with Equation (6), avoiding the influences of random errors, which is different from the traditional estimation of the parameter α i through establishing a linear model with ordinary least squares (OLS). As shown in Figure 6, the amplitude of the AMSR2 SM is obviously amplified in 83.5% of the study area. Combined with the previous analysis, this finding shows that the AMSR2 SM has large systematic and random errors in the region with high SM values. The amplitude of the SMAP SM is reduced on the northern TP but amplified on the southern TP. Unlike the AMSR2 SM, the accuracy in the amplitude of the SMAP SM is not significantly affected by high SM. The systematic errors of both remotely sensed products have spatial stratification heterogeneity, indicating that it is possible to use a small number of statistical models for the calibration of the systematic error. Remote Sens. 2018, 10, x FOR PEER REVIEW 9 of 12 Figure 6. Absolute biases of both temporal mean and amplitude.

Ground-based Validation of Error Decomposition
The averaged ground-based observations from the Babao and Naqu regions as the relative truth, along with the ATI-based, SMAP and AMSR2 SM (Figure 7), are adopted to analyze the feasibility of the TC method regarding the error decomposition of RS SM products. The ATI-based SM as reference data is expected to be systematically unbiased, which can be validated through a comparison with the relative truth. The differences in the mean value and amplitude between the ATI-based SM and the relative truth are small (Table 2), which is caused by the high correlation between the ATI and SM, represented by a correlation coefficient of 0.856. The ATI-based SM suitably inherits the systematic component of the ground-based observation. Compared with relative truth, the amplitude biases of the ATI-based SM shown in Tables 1 and 2, introducing modeled and remotely sensed SM data into the TC method, respectively, are almost the same, demonstrating that the TC method for estimating the amplitude variance is stable. Though the samples mentioned by Zwieback et al. are not adequate [38], the error decompositions of RS SM products in Table 2 are reasonable combining with Figure 7. Therefore, it is feasible to transform RS ancillary data that are closely related to the SM as reference data to analyze the error structure of RS SM datasets with the TC method. However, if the ATI-based SM is directly employed to evaluate the total error of RS SM datasets, the random error of the ATI-based SM as a redundant item in Equation (2) should be removed to improve the accuracy of the RMSE index.

Ground-based Validation of Error Decomposition
The averaged ground-based observations from the Babao and Naqu regions as the relative truth, along with the ATI-based, SMAP and AMSR2 SM (Figure 7), are adopted to analyze the feasibility of the TC method regarding the error decomposition of RS SM products. The ATI-based SM as reference data is expected to be systematically unbiased, which can be validated through a comparison with the relative truth. The differences in the mean value and amplitude between the ATI-based SM and the relative truth are small (Table 2), which is caused by the high correlation between the ATI and SM, represented by a correlation coefficient of 0.856. The ATI-based SM suitably inherits the systematic component of the ground-based observation. Compared with relative truth, the amplitude biases of the ATI-based SM shown in Tables 1 and 2, introducing modeled and remotely sensed SM data into the TC method, respectively, are almost the same, demonstrating that the TC method for estimating the amplitude variance is stable. Though the samples mentioned by Zwieback et al. are not adequate [38], the error decompositions of RS SM products in Table 2 are reasonable combining with Figure 7. Therefore, it is feasible to transform RS ancillary data that are closely related to the SM as reference data to analyze the error structure of RS SM datasets with the TC method. However, if the ATI-based SM is directly employed to evaluate the total error of RS SM datasets, the random error of the ATI-based SM as a redundant item in Equation (2) should be removed to improve the accuracy of the RMSE index.  [38], the error decompositions of RS SM products in Table 2 are reasonable combining with Figure 7. Therefore, it is feasible to transform RS ancillary data that are closely related to the SM as reference data to analyze the error structure of RS SM datasets with the TC method. However, if the ATI-based SM is directly employed to evaluate the total error of RS SM datasets, the random error of the ATI-based SM as a redundant item in Equation (2) should be removed to improve the accuracy of the RMSE index.

Conclusions
The validation of RS SM products is an important precondition of their application. The accuracy of datasets can be improved by analyzing the error structure. To decompose the errors of RS SM datasets, the temporal SM data are expressed by the sum of three components, including the temporal mean value, the amplitude, and the random error. The first two components reflect the systematic changes at the annual and seasonal scales, respectively. The objective of this study was to obtain the biases of the temporal mean and amplitude values and the variance of the random error.
To estimate the systematic biases of datasets, a reliable reference dataset is required. Ground-based measurements are commonly used as reference data in validation work; however, they are limited to a specific geographic space. The remotely sensed ATI data related to the SM are transformed into reference data to spatially extend the evaluation of RS SM datasets. To accurately assess the systematic biases of different RS SM datasets, the transformed reference data should be systematically unbiased depending on the correlation between the SM and the RS ancillary variable. It is meaningful to develop a reliable RS ancillary variable that is closely related to the SM to promote validation studies of RS SM datasets.
In this paper, the ATI-based SM as reference data combined with the TC method was applied to obtain the error structure of the SMAP and AMSR2 SM. The mean biases of the SMAP and AMSR2 SM are universal on the TP, with a large mean bias on the southeastern TP, where there is a high SM value. The amplitude bias and random error of the AMSR2 SM are more significant than those of the SMAP SM in regions with high SM values. The SMAP and AMSR2 SM cannot always retain their own advantages regarding the systematic or the random error at each pixel. Therefore, accurately determining the error structure can provide a strong basis to better understand the accuracy of RS SM datasets, which is beneficial to validation and calibration studies of RS SM datasets.

Conflicts of Interest:
The authors declare no conflict of interest.