Mapping High Spatiotemporal-Resolution Soil Moisture by Upscaling Sparse Ground-Based Observations Using a Bayesian Linear Regression Method for Comparison with Microwave Remotely Sensed Soil Moisture Products

: In recent decades, microwave remote sensing (RS) has been used to measure soil moisture (SM). Long-term and large-scale RS SM datasets derived from various microwave sensors have been used in environmental ﬁelds. Understanding the accuracies of RS SM products is essential for their proper applications. However, due to the mismatched spatial scale between the ground-based and RS observations, the truth at the pixel scale may not be accurately represented by ground-based observations, especially when the spatial density of in situ measurements is low. Because ground-based observations are often sparsely distributed, temporal upscaling was adopted to transform a few in situ measurements into SM values at a pixel scale of 1 km by introducing the temperature vegetation dryness index (TVDI) related to SM. The upscaled SM showed high consistency with in situ SM observations and could accurately capture rainfall events. The upscaled SM was considered as the reference data to evaluate RS SM products at different spatial scales. In regard to the validation results, in addition to the correlation coefﬁcient (R) of the Soil Moisture Active Passive (SMAP) SM being slightly lower than that of the Climate Change Initiative (CCI) SM, SMAP had the best performance in terms of the root-mean-square error (RMSE), unbiased RMSE and bias, followed by the CCI. The Soil Moisture and Ocean Salinity (SMOS) products were in worse agreement with the upscaled SM and were inferior to the R value of the X-band SM of the Advanced Microwave Scanning Radiometer 2 (AMSR2). In conclusion, in the study area, the SMAP and CCI SM are more reliable, although both products were underestimated by 0.060 cm 3 cm − 3 and 0.077 cm 3 cm − 3, respectively. If the biases are corrected, then the improved SMAP with an RMSE of 0.043 cm 3 cm − 3 and the CCI with an RMSE of 0.039 cm 3 cm − 3 will hopefully reach the application requirement for an accuracy with an RMSE less than 0.040 cm 3 cm − 3 .


Introduction
Soil moisture (SM) has been recognized as one of the essential climate variables (ECVs) [1]. It plays an important role in land-atmosphere interactions and has a great impact in numerous research fields, such as hydrological studies [2,3], numerical weather forecasting [4,5], agricultural management [6,7], and climate change [8,9]. As the demand for long-term and large-scale SM data in recent years has increased [9], numerous microwave remote sensing (RS) SM products at the global scale have been produced based on different instruments and retrieval algorithms [10]. These differences cause various performance characteristics, which may lead to inconsistent trends and variability in SM. Therefore, understanding the errors in RS SM datasets is beneficial to ensuring their valid applications in a variety of studies.
The validation of RS SM products aims to provide their quality by estimating error through comparison with reference data assumed to be the truth [11]. The reference data can be chosen from RS SM products, modeled datasets, representative SM transformed by an ancillary variable related to SM, e.g., the apparent thermal inertia (ATI) [12], the soil evaporation efficiency (SEE) [13] and terrain data [14], and ground-based observations. Due to the different SM representations of reference data, the evaluation results are inconsistent [15]. Thus, the key to validation activity is to obtain reliable reference data.
Ground-based observations are commonly considered to be highly accurate in assessing the performances of RS SM products. However, the validation activity based on ground-based observations is limited to performing at only a few large-scale pixels. Facing the disadvantage of ground validation, the modeled SM, e.g., reanalysis data produced by the European Centre for Medium-Range Weather Forecasts (ECMWF) [16] and the Global Land Data Assimilation System (GLDAS) [17], as reference data can provide spatially continuous performance metrics for RS SM products [18][19][20]. However, representativeness errors may still possibly exist; these errors are caused by several factors, such as the fuzzy expression of parameter heterogeneity information on the sub-grid or uncertainty of atmosphere driver data, meaning that it is difficult to quantify the systemic bias and random error of modeled SM [21]. As an alternative to reference data, an RS SM dataset is used to directly compare with other RS SM products for temporal consistency assessment [22] and triple collocation analysis (TCA) on random error [23][24][25][26][27]. Due to the different sensitivities to the SM signal caused by the microwave band, observation angle and polarization mode, the performance on time series consistency is limited. Moreover, the retrieval algorithms for various RS SM products may employ similar ancillary data, such as vegetation, surface temperature and roughness, leading to an unreliable assessment of random error due to the correlated error when the TCA method is performed [28]. To complete an accurate evaluation at the regional or global scale, a representative SM dataset as reference data is estimated by transforming RS ancillary data into SM values by building the relationship between ground-based SM and ancillary data [29]. The representative SM, along with a land surface model (LSM) dataset and an RS SM dataset, are introduced into the TCA method to obtain the error structure of each RS SM product, including the temporal mean bias, amplitude bias, and random error [30]. This reference dataset expands ground-based observation information across a wide spatial extent. The evaluation method based on the representative SM can realize pixel-by-pixel validation in the large-scale region and reduces the accuracy requirement for reference data, demanding only systemic unbiasedness.
The RS modeled and representative SM as reference data first need accuracy assessments using the ground-based observations. Though the spatial distribution is limited, the ground-based observation is always considered the most reliable reference data. A large number of ground-based networks have been built to support the validation of RS SM products around the world [31], such as SCAN (https://www.wcc.nrcs.usda.gov/scan/), SNOTEL(https://www.wcc.nrcs.usda.gov/snow/) and USCRN [32] in North America, Oznet [33] in Australia, DAHRA [34] in Africa, TERENO [35], REMEDHUS (http://campus. usal.es/~hidrus/), SMOSMANIA [36], in Europe FMI (http://fmiarc.fmi.fi/), and in China Tibet-Obs [37]. However, most SM networks are sparse, showing that only a single or very few SM sites are located on a coarse pixel scale (25 km-40 km), leading to the poor ability to capture the spatio-temporal heterogeneity of SM [38], unless an effective upscaling algorithm can be implemented. The multi-point average without any prior information of SM as an empirical method is usually employed but is not applicable when the sample size is small. For upscaling sparse ground-based observations, the upscaling algorithm based on multiple temporal ground-based observations is adopted, and the extra information e.g., representative SM or intensive field campaign measurements [39,40], is used to describe the spatio-temporal trend of SM in the upscaling process. The key for the temporal upscaling method is to determine the weighted coefficient of each temporal observation. To avoid the overfitting problem, Bayesian linear regression [41,42] and ridge regression [43] are used to solve weighted coefficients by introducing a regularization parameter. Currently, the upscaling scheme based temporal observations is the available way to transform sparse in situ measurements into pixel-scale values.
In this paper, the validation activity at the microwave pixel scale was performed based on ground-based observations with low sampling density. To accurately assess RS SM products, a temporal upscaling algorithm was adopted to map daily SM with a 1 km resolution by introducing the representative SM transformed by the temperature vegetation dryness index (TVDI). The upscaling results can capture the spatial heterogeneity of SM within the coarse RS pixel and were then resampled to match the spatial scale of RS SM products. Finally, multiple RS SM products, including the Soil Moisture Active Passive (SMAP), Soil Moisture and Ocean Salinity (SMOS), Advanced Microwave Scanning Radiometer 2 (AMSR2) and Climate Change Initiative (CCI) SMs, were evaluated.

Study Area
The study area is situated in the Babao River basin, which is located in the eastern branch of the upper reach of the Heihe River basin, northwestern China ( Figure 1). The study area has an area of 2495 km 2 , and elevation ranges from 2640 m to 5000 m. Alpine meadows are distributed in the elevation range from 2600 m to 3800 m and become sparser at elevations exceeding 3800 m; forest is dotted near 3100 m. The mean annual air temperature is approximately 0.7 • C, and the soil is unfrozen from May to September and primarily consists of sand (27-45%), silty (43-63%), and clay (10-14%) (10.3972/heihe.023.2013.db). The annual rainfall is approximately 400 mm and has an increasing trend from the northwest to southeast. Rainfall frequently occurs in summer and enhances the spatial heterogeneity of SM. As a whole, the SM on shady slopes is higher than that on sunny slopes, and the soil is wetter in the mountains than in the valleys [29].

Data
In the study area, there are numerous ground-based observations from the Ecological and Hydrological Wireless Sensor Network (EHWSN) [44] and two automatic meteorological stations (AMSs) including A'rou and Jingyangling [45], which are used for the evaluation of remotely sensed SM products. As shown in Figure 1, SM data at a depth of 4 cm observed by 16 EHWSN nodes were upscaled for mapping daily SM data with a 1 km spatial resolution in the unfrozen season from 2015 to 2017. Due to the strong heterogeneity of SM, extra ancillary data related to SM were introduced into the upscaling algorithm. The TVDI, as an ancillary variable, was calculated using the normalized difference vegetation index product (NDVI; MYD13A2) and land surface temperature (LST; MYD11A1) derived from the Moderate Resolution Imaging Spectroradiometer (MODIS). The upscaled SM was validated using the 4-cm SM observations from 11 EHWSN nodes. The daily precipitation data from two AMSs were used to detect the response of the upscaled SM to rainfall events.
The upscaled SM was resampled to match the spatial scales of the RS SM products, including the SMAP, SMOS, AMSR2, and CCI SMs. In this paper, the validated SMAP SM products were SMAP_L3_SM_P [46] and SMAP_L3_SM_E [47]. Both AMSR2 SM products were produced by the Japanese Aerospace Exploration Agency (JAXA) [48] and the National Aeronautics and Space Administration (NASA) [49,50]. The SMOS SM products used in this study included SMOS Level 3 (L3) [51] and SMOS-IC [52], respectively, produced by the Centre Aval de Traitement des Données SMOS (CATDS) and both the Institut National de la Recherche Agronomique (INRA) and the Centre d'Etudes Spatiales de la BIOsphère (CESBIO). One filter was taken into consideration to exclude unreliable SMOS-L3 data according to the probability of radio frequency interference (RFI) exceeding 0.2 (RFI_Prob > 0.2). The CCI SM product merges SM retrievals from multiple active and passive microwave sensors [53,54]. Different versions of the CCI SM products were compared with each other in this study. Each RS SM product with the ascending and descending data was reprocessed as a new single dataset by averaging both orbit data. To avoid influences of frozen soil and snow on validation accuracies of RS SM products, RS SM products in the unfrozen season (from May to September) were selected to be compared with the upscaled SM. Additional information on the RS SM products is shown in Table 1. The most direct way to evaluate the coarse-scale SM is to upscale ground-based observations by calculating their mean. The validation uncertainty is affected by the spatial observation density. In this study, the EHWSN shows a low sampling density, which makes it difficult to meet the validation requirement of large-scale RS SM products by averaging multipoint observations [55]. An upscaling algorithm based on temporal data was employed to solve upscaling spatially sparse observations.
The upscaled SM θ p t on the pth pixel from time t 1 and t M can be represented as follows: β p 0 is the constant term, and β p N is the weighted coefficient assigned to temporal observations of the Nth EHWSN node. D obs t is the observation matrix: where θ obs,Nth t M is the observation value of the Nth EHWSN node at time t M , and the upscaling problem becomes solved for the coefficient vector β p . Generally, the ordinary least squares (OLS) method is used to determine β p by minimizing the cost function J: where θ p,sample t is a vector consisting of available samples during the upscaling period t. Samples can be estimated by transforming the remotely sensed ancillary data related to SM, which is explicitly explained in Section 2.3.2 below. However, the correlation between observation column vectors in the matrix D obs t can result in as overfitting problem, which may generate a relationship model with low prediction power. To alleviate this problem, a regularization term is added to Equation (3), and then a new cost function is adopted as follows: where C = σ 2 I, and σ is the standard deviation of θ p,sample t . α is the regularization parameter. In Equation (4), parameters α and β p are unknown and can be simultaneously determined by a Bayesian method [41]. The θ p t in Equation (1) is estimated using the following expression:θ p t = D obs tβ p (5) whereβ p is the estimator of β p .θ p t is the final upscaling result at the pth pixel from time t 1 to t M . The upscaled SM located at other pixels can be obtained by repeating the above process.

Determination of Samples of the Upscaled SM
The key to performing the Bayesian method for estimating the weighted coefficient is to obtain samples of the upscaled SM θ p,sample t ; thus, extra information needs to be introduced into the upscaling process. In this paper, a simplified land surface dryness index (TVDI) based on the LST-NDVI space is used to capture the spatiotemporal changes in the upscaled SM, and the TVDI is defined as shown in Figure 2. In the LST-NDVI space, the following formula defines the TVDI to obtain the soil wetness status located at the coordinate [NDVI, LST]: where LST is the observed land surface temperature derived from the MODIS LST product. As shown in Figure 2, LST min is the minimum land surface temperature indicating the wet edge. LST max can be estimated using the linear regression model with the NDVI variable in Figure 2. The NDVI values can be obtained from the MODIS NDVI product. To obtain samples of the upscaled SM, an empirical relationship between the TVDI and ground-based SM measurements can be formulated as follows: where θ insitu represents in situ SM measurements. τ tvdi is the TVDI at pixels corresponding to in situ measurements. The regression function f (·) is used to transform the TVDI to SM as sample values in the upscaling process. η is the regression residual.

Performance Metrics
The root-mean-square error (RMSE) is always used to directly quantify the difference between temporal datasets. The RMSE defines the sum of systemic and random errors. Due to the representativeness error, a systemic bias may occur between in situ observations and the upscaled SM in the time series. To improve the validation activities between data at different spatial scales, a statistical index, the unbiased RMSE, is adopted by removing the temporal mean of the data. The correlation coefficient (R) is used to evaluate the fluctuation consistency between both time series data. When validation activities are performed at the same spatial scale, the temporal bias is also calculated. The performance metrics are defined as follows: 11) where N is the number of data. θ n and θ n represent the nth validated data and reference data, respectively, with mean values θ and θ. R indicates the correlation coefficient. Bias is the absolute bias between the temporal data.

Obtaining Samples of the Upscaled SM
The MODIS NDVI and LST products are used to establish the LST-NDVI space during the unfrozen season. To obtain comparable TVDI values among different days in a year, it is feasible to plot the annual LST-NDVI scatterplot [57]. To determine the parameters of dry and wet edges, the range of NDVI values (x-axis in Figure 3) is first split into several equal intervals, and then the minimum and maximum LST values in each interval are extracted from the LST-NDVI space. Finally, as shown in Figure 3, LST min is estimated by averaging the minimum LST value in each interval, and LST max is determined by fitting points with maximum LST values. The points located outside dry and wet edges as outliers are not calculated. From 2015 to 2017, the wet edges were almost the same, but the differences among annual dry edges were obvious, and in 2016, there was a maximum dry edge due to the high LST value. By analyzing meteorological data from station A'rou, it was also proven that the near-surface mean air temperature in the unfrozen season of 2016, with a value of 9.62 • C, was higher than those with values of 8.21 • C and 8.98 • in 2015 and 2017, respectively. Based on the estimated dry and wet edges, the daily TVDI values were calculated according to Equation (6). Figure 4 shows the spatial distribution of the TVDI over several days. Due to cloud cover, the TVDI data had a mass of missing pixels, but the spatiotemporal changes in the dry and wet conditions of soil could be captured. There were significant seasonal variations in the TVDI data from May to September each year. The dry index was higher on the lowland area around the Babao River from northwest to southeast than that in the mountains and was lower on the south slope than on the north slope, indicating that the spatial distribution of the TVDI was in accordance with that of the SM in the Babao River basin, as described in Section 2.1. The SM observations from the 16 upscaled EHWSN nodes were used to build relationships with the TVDI values at the pixels spatially corresponding to ground-based sites. As shown in Figure 5, a monotonically decreasing linear function was fitted according to Equation (7), showing a goodness of determination coefficient indicating the high correlation between the TVDI and SM. Then, the linear regression model was used to transform the TVDI into samples of the upscaled SM.

Upscaling Ground-Based SM Observations
The TVDI-based SM as samples can describe the temporal variability of soil moisture at the pth upscaled pixel to solve Equation (4) for the weighted coefficient vector β p , and then the upscaled SM can be estimated as a linear combination of the EHWSN observations based on Equation (5). The daily upscaled SM maps, with a spatial resolution of 1 km, are shown in Figure 6, and compared with the TVDI shown in Figure 4, the spatiotemporal trend of the upscaled results was consistent with that of the TVDI, indicating that the correlation between the ancillary data and SM was the crucial factor affecting the upscaling accuracy. The upscaled results had a detailed spatiotemporal distribution relative to the TVDI. Although the ancillary data had numerous gaps due to cloud cover, the EHWSN observations with temporal continuity could fill the upscaled SM at the missing pixels based on constant weighted coefficients in the upscaling time window. In addition, the upscaled SM process in the Babao River basin from August to September 2013 was performed by Kang et al. [29] using the upscaling method used in this paper. There was a similar spatial trend between the previous results and the upscaled SM in this study, showing higher SM in the mountains than in the wide valleys.

Validation of the Upscaled SM
Rainfall is a primary factor causing spatiotemporal changes in SM. The accuracy of upscaled SM can be indirectly evaluated by detecting the response of SM to rainfall. In Figure 7, the ground-based SM and rainfall observations from the AMSs, including stations A'rou and Jingyangling, were compared with the upscaled results in terms of the consistency of temporal changes. However, not all rainfall events caused daily SM to increase, and this result was related to the amount of rainfall. According to the AMS observations, 193 rainfall events led to an increase in in situ SM, and 85% of these events harmonized with the changes in the upscaled SM. For rainfall with more than 10 mm, of which there were 57 occurrences, only four rainfall events contradicted the changes in the upscaled SM. For heavy rain events (25.0-49.9 mm) or rainstorms (50.0-99.9 mm), the upscaled SM increased at different levels.
The EHWSN observations with missing data were employed to evaluate the upscaled results (see Figure 8). Due to the spatial scale mismatch between the in situ observations and the upscaled SM, the representativeness error led to the mean and amplitude bias of the upscaled results relative to the point-scale EHWSN observations, and the maximum mean bias reached 0.21 cm 3 cm −3 . Thus, the ubRMSE index was used to improve the performance metric of the upscaled SM. There were no obvious differences among the mean annual ubRMSEs, which had an approximate value of 0.03 cm 3 cm −3 . The R was also an insensitive index to the mean bias and was adopted to evaluate the consistency of the temporal change between the ground-based and upscaled SM values. As shown in Table 2, overall, the larger the correlation coefficient was, the smaller the ubRMSE value. The averaged R index was 0.73, representing a high correlation. Only the WSN-37 observations, where soil was dry, had a low correlation with the upscaled SM because of the low representativeness of ground-based observations in terms of amplitude change.

Comparisons between the Upscaled SM and Microwave Remotely Sensed SM Products
Generally, ground-based SM observations are considered reliable reference data to evaluate RS SM products. However, the point and RS observations were spatially mismatched, but this problem could be solved by upscaling multipoint observations. In this paper, the upscaled SM with a spatial resolution of 1 km could meet the validation requirement of RS SM products at different spatial scales, and the process was performed by resampling the upscaled SM values to match the spatial resolutions of the RS SM products.    The darker the color, the larger the statistical value is.
As shown in Figure 9, SMAP products including SMAP_L3_SM_P and SMAP _L3_SM_P_E (the SMAP_L3_SM_P_E product was first aggregated to 36 km) were compared with the upscaled SM at the spatial scale of 36 km. The accuracies of both SMAP products on the ubRMSE and R indexes were almost the same, with average statistical values of 0.028 cm 3 cm −3 and 0.610, respectively. The difference in their accuracy values was mainly reflected in the bias index, indicating that the SMAP_L3_SM_P and SMAP_L3_SM_P_E products were underestimated by 0.066 cm 3 cm −3 and 0.054 cm 3 cm −3 (see Table 3), respectively. The larger underestimation occurred in May and June because the modeled surface temperature showing obviously seasonal changes with low values in the same months was adopted in the SMAP SM retrieval. As a whole, the disaggregated SMAP_L3_SM_P_E product preserved the original brightness temperature used in the retrieval of the SMAP_L3_SM_P product. Thus, both SMAP products had similar accuracies, which was also proven using ground-based observation networks at the global scale in the study of Chan et al. [58]. Like SMAP, SMOS carrying sensors with the same observation band (L-band) have attracted increasing attention due to their better capacity to penetrate vegetation canopies. In the study area, as shown in Figure 10, the SMOS-L3 and SMOS-IC products did not yield similar skills to the SMAP, especially in terms of the R indexes, which showed low consistencies between the upscaled SM and SMOS products and had values of 0.310 for SMOS-IC and 0.128 for SMOS-L3. The SMOS products had a mean anomaly bias of 0.093 cm 3 cm −3 represented by the ubRMSE index, which was approximately three times that of the SMAP products. The mean biases of the SMOS products, with -0.053 cm 3 cm −3 for SMOS-IC and -0.004 cm 3 cm −3 for SMOS-L3, were smaller than those of the SMAP products. The SMOS-IC product had a retrieval algorithm similar to that of SMOS-L3, but several simplifications, such as rejecting some ancillary data, were implemented. Compared with SMOS-L3, the accuracy of SMOS-IC on the RMSE index was not obviously improved. However, the averaged anomaly accuracies of SMOS-IC on the R and ubRMSE indexes were higher than those of SMOS-L3, and the R values still represented a weak correlation. The darker the color, the larger the statistical value is. The AMSR2 products have obvious overestimation relative to the upscaled SM in terms of bias, with a value of 0.077 cm 3 cm −3 for AMSR2_JAXA and 0.107 cm 3 cm −3 for AMSR2_NASA_X. Compared with AMSR2_NASA_X with an ubRMSE of 0.086 cm 3 cm −3 , the AMSR2_JAXA product showed a smaller temporal fluctuation range with a lower ubRMSE of 0.074 cm 3 cm −3 (see Figure 11). Due to the larger bias and ubRMSE values, the total error of AMSR2_NASA_X, with an averaged RMSE value of 0.137 cm 3 cm −3 , exceeded that of AMSR2_JAXA, with an averaged RMSE value of 0.107 cm 3 cm −3 . However, with regard to describing the temporal trend of the upscaled SM, AMSR2_NASA_X, with an averaged R value of 0.514, it was superior to AMSR2_JAXA, with an averaged R value of 0.067, meaning that AMSR2_NASA_X had stronger applicability than AMSR2_JAXA in the study area due to its better consistency with the SM signal. In addition, the R value of AMSR2_NASA_X was higher than that of SMOS, which deviates from the theoretical fact that the L-band is more sensitive to SM than the X-band. It is possible that the underlying surface in this study area is beneficial to LPRM SM retrievals. This phenomenon has also occurred in other validation activities, e.g., in Yee et al. [59] and Ma et al. [60].
Through the above analysis, RS SM products with different observation bands and retrieval algorithms showed various performances, and a single product did not always have the best performance on different evaluation indexes. The CCI SM product merging multiple single RS SM datasets was shown to improve the accuracy of satellite-based SM. In this paper, the CCI SM combining active and passive SM products, with two different versions, were compared, as shown in Figure 12. Compared with SMAP with better skills based on the above analysis, the performance of the CCI showed a slightly higher mean R value of 0.612. For other performance metrics of the CCI, including an average RMSE of 0.086 cm 3 cm −3 , an average ubRMSE of 0.037 cm 3 cm −3 and an average bias of -0.077 cm 3 cm −3 , the accuracies of the CCI were inferior to those of SMAP. The accuracy differences between the CCI SM products version 4.7 and 5.2 were not obvious because all RS SM products merged into different versions of CCI products were rescaled based on the same SM simulation dataset. Therefore, although the SMAP SM was introduced into the latest version of the CCI SM, the accuracy of the CCI had not been significantly changed due to attenuated satellite observation features in the rescaling process.

Discussion
The accuracy of the upscaled SM is affected by both sides including the representative SM that is the sample of the upscaled SM and the performance of the upscaling algorithm. The sample decides the spatio-temporal trend of the upscaled SM, thus an ancillary data highly related to SM is needed. The ATI and SEE are also recognized as the ancillary data to monitor the spatio-temporal variability of SM besides the TVDI. However, in current studies, the correlations between SM and these three variables are not compared with each other. The sample obtained by transforming any ancillary data may have system bias and random error and can be applied in the upscaling processing when there is no system error. Kang et al. [30] have proven that the sample has the low mean and amplitude bias in this study area but random error still exists. To reduce the influence of random error on the upscaled SM, an upscaling algorithm with good prediction performance should be employed. Though the traditional ordinary least square (OLS) regression method can build the relationship of the ground-based observation and the upscaled SM, random error is also fitted leading to the regression model with poor prediction (overfitting). In this study, the upscaling method introduces a regularization parameter to solve the overfitting problem. Kang et al. [43] found that the introduction of the regularization parameter could reduce the uncertainty of the upscaled SM.
The necessity to upscale ground-based observations is a matter of concern in this study. The single-point and multi-point arithmetic average validation are commonly used to assess SM retrieval values at their corresponding pixels. However, the representativeness of ground-based observations to the truth at the pixel scale are generally neglected due to the unknown truth. In this study, the representative error of ground-based observations can be estimated when the upscaled SM at difference scales is considered the relative truth. As shown in Figure 13, the RMSE index is used to measure the representative error, whose maximum and mean reach more than 0.300 cm 3 cm −3 and 0.092 cm 3 cm −3 , respectively, meaning that the single-point validation is unreliable. In addition, for a certain pixel, the most representative site may not locate within the pixel zone but rather away from the pixel. Therefore, it is justified that the upscaling algorithm in this study employs all ground-based observations for each upscaled pixel. Figure 13. Root-mean-square errors (RMSEs) between in situ measurements and the upscaled SM at multiple pixel scales. The gradient color represents spatial distances between the ground-based sites and the center points of pixels. The asterisk means that the site locates within the pixel zone. The arithmetic mean values are obtained by averaging the ground-based observations derived from multiple sites within the same pixel area. The RMSEs between arithmetic mean values and their corresponding upscaled SM at different pixel scales are shown in the extreme righthand column.
The multi-point average value is closer to the relative truth than the single-point observation, but the representative error having an averaged RMSE of 0.059 cm 3 cm −3 still needs to be considered. Due to different representativeness of sites, the representative error of the multi-point arithmetic mean does not always reduce with the increase of the number of sites, e.g., 4-site average with a RMSE value of 0.028 cm 3 cm −3 for Grid No.3 of 36 km and 2-site average with a RMSE value of 0.053 cm 3 cm −3 for Grid No.1 of 36 km, indicating that the assignation of equal weight coefficients to sites is not the optimal scheme, especially when the spatial distribution of ground-based sites within a pixel zone is sparse. The upscaling algorithm can give the weighted average of ground-based observations for the upscaled SM at each pixel showing an obvious advantage on upscaling sparse ground-based observations.

Conclusions
Ground-based observations are considered reliable reference data in the validation activities of RS products. For microwave RS SM products, however, the spatial distribution of ground-based sites generally shows a low density relative to the RS SM pixel (25-40 km), and the mismatched scale between point and RS observations may cause representativeness errors in the performance metrics. To better understand the accuracies of microwave RS SM products, a temporal upscaling algorithm was employed to transform sparse point-scale observations into pixel-scale values. In the upscaling process, the TVDI, which is strongly related to SM, was introduced to capture the spatial heterogeneity of SM. The upscaled SM with a 1 km resolution in the study area was first evaluated using in situ SM measurements and showed high performance in terms of temporal consistency and amplitude accuracy. Approximately 85% of rainfall events could be captured by the upscaled SM. Compared with previous research in the same study area, the upscaled SM in this paper had a similar spatial trend. Therefore, it was feasible to obtain the reference data using the upscaling algorithm in this study. Finally, the upscaled SM with 1 km resolution was regarded as reference data to validate RS SM products at the same spatial scale.
To meet the validation requirement at different spatial scales, the high resolution of the upscaled SM was resampled to match the spatial scales of various RS SM products. In the study area, the L-band and combined CCI SM were significantly underestimated, but the AMSR2 SM was overestimated. In addition to the R value of the CCI being slightly higher than that of SMAP, the performance of SMAP was the best of all validated RS SM products, followed by that of CCI, AMSR2_NASA_X, SMOS-IC, SMOS-L3, and AMSR2_JAXA. The accuracy of SMOS did not reach the theoretical expectation in this study, although the retrieval algorithm was developed for SMOS. SMAP and CCI had better performance, but they still could not meet the application requirement for an accuracy of less than 0.04 cm 3 cm −3 .
This validation result is applicable only in this study area and cannot answer the whole accuracy of an RS SM product; however, the regional validation results can explain the error characteristic under some underlying surfaces. To understand the global accuracy, an RS SM product was considered as reference data used to evaluate other values in previous research, but this approach is risky because no single RS SM product always has the best performance at different spatial pixels, a result proven in this and previous studies. Thus, a pixel-by-pixel validation scheme is urgently needed to accurately understand the overall accuracies of RS SM products to promote their applications in various fields.

Conflicts of Interest:
The authors declare no conflict of interest.