An Approach for Downscaling SMAP Soil Moisture by Combining Sentinel-1 SAR and MODIS Data

Abstract: A method is proposed for producing downscaled soil moisture active passive (SMAP) soil moisture (SM) data by combining optical/infrared data with synthetic aperture radar (SAR) data based on the random forest (RF) model. The method leverages the sensitivity of active microwaves to surface SM and the triangle/trapezium feature space among vegetation indexes (VIs), land surface temperature (LST), and SM. First, five RF architectures (RF1–RF5) were trained and tested at 9 km. Second, the RF1–RF5 models were compared and evaluated against in situ SM measurements. Third, two SMAP-Sentinel active–passive SM products were compared at 3 km and 1 km using in situ SM measurements. Fourth, the RF5 model simulations were compared with the SMAP L2_SM_SP product based on the optional algorithm at 3 km and 1 km resolutions. The results showed that downscaling SM through the synergistic use of optical/infrared data and the backscatter at vertical–vertical (VV) polarization was feasible in semi-arid areas with relatively low vegetation cover. The RF5 model, with backscatter and more parameters from optical/infrared data, performed best among the five RF models and was satisfactory at both 3 km and 1 km. Compared with L2_SM_SP, RF5 performed better at 1 km. The input variables in decreasing order of importance were backscatter, LST, VIs, and topographic factors over the entire study area. The low vegetation cover conditions probably amplified the importance of the backscatter and LST. A sufficient number of VIs can enhance the adaptability of RF models to different vegetation conditions.

SM can be obtained from station-based measurements, data assimilation products based on land surface models, and remote sensing monitoring data. In situ SM observations cannot represent the SM status at the regional scale. In areas with sparsely distributed stations, data assimilation products by means of station-based measurements cannot fully reflect the spatial and temporal variations of surface SM [12,13]. Among the remote sensing technologies, microwave remote sensing can penetrate the soil surface to directly detect the surface SM content by utilizing the large difference in dielectric properties between dry soil and liquid water. Passive microwaves can generate accurate surface SM estimates because they are less affected by vegetation, soil surface roughness, topography, and water content [14,15]. At present, the soil moisture active passive (SMAP) [14] and soil moisture and ocean the range of SM values to reduce the error caused by soil surface roughness [56]. Although only a single vegetation index (VI) and air temperature were used in the above SM retrieval, the results indicated that the machine learning method with SAR data, VIs, and LST as input variables might be effective for downscaling coarse SM data. Qiu et al. investigated the suitability of different VIs for the parameterization of vegetation water content in SAR-based SM retrievals [57]. This study indicated that the combination of different VIs might be necessary to express different vegetation conditions. Additionally, Amazirh et al. [58] constructed a relationship model among radar backscatter, NDVI, and LST by polynomial fitting for the disaggregation of MODIS LST data in low vegetation cover areas. Therefore, the machine learning models combining optical/infrared data and SAR data may be feasible for downscaling passive microwave SM.
In light of the above insights, we proposed an RF model combining SAR data and optical/infrared data for downscaling passive microwave SM. In this study, the input variables were from SAR data, optical/infrared data, and topography data, and the output variable was SM. The VV polarization backscatter may be an important input variable due to the sensitivity of SAR data to surface SM, especially in bare soil and low vegetation cover conditions. The VIs can explain the contribution of vegetation to SAR data. Additionally, the triangular/trapezoidal feature space among LST, VIs, and SM can also contribute to the relationship model. Although some RF models that are mainly dependent on the LST/VIs feature space have been developed, the feature space is relatively unstable in low vegetation cover conditions [53,59,60]. The surface elevation and slope, which are related to land surface energy and surface backscatter, were also used for constructing the models [38,53]. In the training process, the output variable was from 9 km enhanced SMAP SM data, and the input variables were from 9 km aggregated SAR data, optical data, and topography data. We assumed that the relationship models were spatial-scale independent [30,36,37,53,58,61,62]. The RF models trained at a 9 km resolution were then directly applied to the input variables at 3 km and 1 km resolutions to obtain the corresponding 3 km and 1 km SM data. In the present study, three sets of comparative tests (RF1 vs. RF2, RF1 vs. RF3, and RF4 vs. RF5) were first established to show the effects of backscatter at VV polarization, elevation, and other auxiliary input variables (additional VIs and slope), respectively. RF1 and RF3 were mainly based on the triangular/trapezoidal feature space among LST, NDVI, and SM. Second, the 3 km and 1 km downscaled SM from RF1-RF5 were compared using in situ SM. Third, the two SMAP-Sentinel active-passive SM products were compared using in situ SM measurements at 3 km or 1 km. Fourth, the RF5 model simulations were compared with the SMAP L2_SM_SP product based on the optional algorithm at 3 km and 1 km.

Study Area and In Situ Soil Moisture (SM) Measurements
In this study, an area in northern China was selected as the study area (Figure 1). The elevation is highly variable, ranging from approximately 27 to 3455 m. The terrain and landforms are complex, with mountains, plateaus, and basins. The region has a temperate continental climate with little rainfall. Most of the area is located in a semi-arid region, and the SM content is usually low, especially under the colder and drier conditions of winter. Vast dry land and complex, scattered land cover types including dry land, woodland, and sandy land are present in the eastern part of the study area. Thus, fine-resolution SM monitoring is particularly important. In situ SM measurements from four representative stations in agricultural regions were collected for the evaluation; these stations were covered by more overlapping observations of the Sentinel-1 and moderate-resolution imaging spectroradiometer (MODIS) satellites. The stations (53676, 53685, 53780, and 53788), operated by the China Meteorological Administration (CMA), monitor SM content at depths of 10, 20, and 40 cm at 8:00 a.m. and 3:00 p.m. local time. In this study, we evaluated the remote sensing SM data against the in situ SM measured at 3:00 p.m. The land cover type of stations 53676, 53685, and 53780 is dry land, and the land cover type of station 53788 is shrubland (Table 1). The four stations are at a high elevation with a moderately steep slope.
Due to the lack of SM monitoring stations at 5 cm in China, SM values at 10 cm were used instead, as in past research [41,63,64]. Although the measurement depths are inconsistent, there is a strong correlation between the SM values of the two continuous soil layers [13]. In theory, the measuring depth inconsistency should have little effect on the correlation between the SMAP SM and in situ SM values.

Soil Moisture Active Passive (SMAP) SM Products
The SMAP satellite was launched by the National Aeronautics and Space Administration (NASA) in January 2015, and it is operated within a 685 km near-polar orbit [14]. The satellite, with ascending (6:00 p.m.) and descending (6:00 a.m.) modes and carrying an L-band radar (1.26 GHz) and radiometer (1.41 GHz), is dedicated to monitoring global surface SM and freeze-thaw states. SMAP can cover the entire globe in 2-3 days with three spatial resolutions of 3, 9, and 36 km. SMAP SM products are defined as L2, L3, and L4, where L2 is semi-orbit data, L3 is a composite of daily SM estimates, and L4 is the model assimilation product. Unfortunately, the L-band radar became inoperable in July 2015, and only the L-band radiometer is still operating, providing radiometer-based SM data at a coarse resolution (36 km). To restore the finer spatial resolution of SMAP products, the Backus-Gilbert interpolation technique was applied to the antenna temperature in the original SMAP L1B TB product [29] to generate a SMAP-enhanced TB product on a 9 km Equal-Area Scalable Earth version 2 (EASEv2) grid; then, the 9 km enhanced passive SM product (L3_SM_P_E) was obtained [65]. Next, Sentinel-1 C-band SAR backscatter data were used as background values to disaggregate the 9 km SMAP TB and SM data (L3_SM_P_E) and generate the 3 km/1 km L2_SM_SP based on the baseline algorithm (L2_SM_SP baseline) and the L2_SM_SP based on the optional algorithm (L2_SM_SP optional) [66]. Both L2_SM_SP baseline and L2_SM_SP optional are SMAP-Sentinel active-passive L2 SM products (L2_SM_SP).
In this study, 9 km SMAP L3_SM_P_E ascending data (Version 2) and 3 km/1 km L2_SM_SP data (Version 2) from 1 August, 2017 to 1 August, 2018 were downloaded from NASA's Earth Observing System Data and Information System (EOSDIS) [67]. The 9 km SMAP L3_SM_P_E ascending data were used in this study because they have more similar observation times to the available Sentinel-1 data in most areas of northern China.

Sentinel-1 Data
Sentinel-1 is operated in a polar sun-synchronous orbit of 693 km, and it carries a C-band (5.405 GHz) SAR. The similar orbit configurations of the Sentinel-1 and SMAP satellites are key to the synergy of SMAP with Sentinel-1, and these satellites have adequately overlapping image scenes and minimal overpass time differences.
In this study area, Sentinel-1A interferometric wide swath (IW) ground range detected (GRD) observations with a pixel spacing of 10 m × 10 m in ascending mode were used. The 136 Sentinel-1A images with VV and vertical-horizontal (VH) polarizations between 1 August, 2017 and 1 August, 2018 were acquired from the Copernicus Open Access Hub [68]. The processing of the Sentinel-1 SAR data was performed in the ENVI SARscape 5.5 Basic Module and included speckle filtering (5 × 5 refined Lee filter [69]), geocoding, radiometric calibration, and normalization. The local incidence angle (LIA) from a 30 m resolution SRTM DSM was used to compute the scattering area for radiometric calibration. Radiometric normalization was performed with a semi-empirical method to correct the effects of unequal incidence angles, and all Sentinel-1A images were normalized to an incidence angle of 40 degrees. In the semi-empirical method, a linear regression between the cosine of the LIA and the backscattering coefficient in logarithmic form was performed. Finally, the backscattering coefficients at VV polarization ($\sigma^0_{vv}$) in linear units were averaged within 9 km, 3 km, and 1 km grids in the WGS 1984 coordinate system [26,56]. The 9, 3, and 1 km aggregated Sentinel-1 SAR data were used to build the model and assess its accuracy.
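The normalization and aggregation steps described above can be illustrated with a minimal sketch. The text only states that a linear regression between the cosine of the LIA and the backscatter in dB was used; how the fitted slope is applied to shift each value to 40 degrees is an assumption made here, and the function names (`fit_line`, `normalize_to_reference`, `mean_backscatter_linear`) are hypothetical rather than SARscape routines.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def normalize_to_reference(sigma_db, lia_deg, ref_deg=40.0):
    """Shift dB backscatter along the fitted cos(LIA) trend to a reference angle.

    Assumption: the correction subtracts slope * (cos(LIA) - cos(ref)).
    """
    cos_lia = [math.cos(math.radians(t)) for t in lia_deg]
    _, slope = fit_line(cos_lia, sigma_db)
    cos_ref = math.cos(math.radians(ref_deg))
    return [s - slope * (c - cos_ref) for s, c in zip(sigma_db, cos_lia)]

def mean_backscatter_linear(sigma_db):
    """Average backscatter in linear power units rather than in dB."""
    linear = [10.0 ** (s / 10.0) for s in sigma_db]
    return sum(linear) / len(linear)
```

Averaging is done after converting dB to linear units because the mean of dB values is not the dB value of the mean power.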

Moderate-resolution Imaging Spectroradiometer (MODIS) Products
The MODIS instruments aboard the Terra and Aqua satellites operate in sun-synchronous polar orbits. For this study, MODIS products for 1 August, 2017 through 1 August, 2018, including the 1 km daily LST product (MYD11A1), the 1 km 16-day NDVI and enhanced vegetation index (EVI) product (MOD13A2), the 500 m 8-day leaf area index (LAI) product (MCD15A2H), and the 500 m 16-day albedo (ALB) product (MCD43A3), were downloaded from NASA's EOSDIS [50,67,70–72]. In addition, the 500 m 8-day surface reflectance product (MOD09A1) was downloaded to calculate the normalized difference water index (NDWI) [73]. The LST data were measured during the ascending mode of the Aqua satellite (1:30 p.m.). The overpass times of the Aqua ascending data (1:30 p.m.) and SMAP ascending data (6:00 p.m.) are similar; thus, errors caused by rainfall and changes in surface temperature can be minimized. Furthermore, the daytime LST, which exhibits large variations, is sensitive to surface SM [47,53,74]. Finally, cloud-free pixels in the MODIS images were averaged over 9 km, 3 km, and 1 km grids, and LST data below 0 °C were removed.
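As an illustration of the final aggregation step, the sketch below masks cloudy and sub-zero LST pixels and then averages the remaining pixels within each coarse grid cell. The order of operations (masking before averaging), the use of `None` for missing pixels, and the helper names are assumptions for this example, not details given in the text.

```python
def mask_lst(lst_celsius, cloud_free):
    """Return a copy of the LST grid with cloudy or sub-zero pixels set to None."""
    return [
        [v if ok and v is not None and v >= 0.0 else None
         for v, ok in zip(row_values, row_flags)]
        for row_values, row_flags in zip(lst_celsius, cloud_free)
    ]

def aggregate_block_mean(grid, factor):
    """Average factor x factor blocks, ignoring None; an empty block yields None."""
    n_rows = len(grid) // factor
    n_cols = len(grid[0]) // factor
    out = []
    for i in range(n_rows):
        out_row = []
        for j in range(n_cols):
            values = [
                grid[r][c]
                for r in range(i * factor, (i + 1) * factor)
                for c in range(j * factor, (j + 1) * factor)
                if grid[r][c] is not None
            ]
            out_row.append(sum(values) / len(values) if values else None)
        out.append(out_row)
    return out
```

For example, a 1 km grid would be aggregated with `factor=3` for the 3 km grid and `factor=9` for the 9 km grid, assuming the grids are aligned.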

Other Geospatial Data
A DSM, land cover data, precipitation data, and a soil map were also used. The SRTM DSM with a 30 m spatial resolution was downloaded from the United States Geological Survey (USGS) Earth Explorer [75]. The land cover data with a 1 km resolution, namely, WESTDC_Land_Cover_Products1.0, were acquired from the Cold and Arid Regions Science Data Centre at Lanzhou [76,77]. The precipitation data were provided by NASA's Global Precipitation Measurement (GPM) mission [78].

RF-Based Downscaling Method
The random forest (RF) technique [79] is a widely used machine learning model for solving classification, regression, and other tasks. RF is an ensemble learning model that leverages multiple weak classifiers (decision trees) to improve the generalization and reduce over-fitting phenomena. In a regression, the mean predicted values of all independent decision trees are regarded as the RF model outputs. The adaptive, randomized, and decorrelated features make RF suitable for complex and highly non-linear relationship models. RF is simple and flexible, and it is less affected than other models by hyper-parameters. Previous studies have also shown that the RF model is effective in complex non-linear fitting and that it can represent a feasible SM downscaling model [39,53].
Downscaling SMAP SM utilizes detailed information from fine-spatial-resolution auxiliary data. Figure 2 shows the general process of downscaling SMAP SM, and the text description is given above (Section 1). Surface SM from the 9 km SMAP L3_SM_P_E is the target variable. The $\sigma^0_{vv}$ from Sentinel-1 SAR data; the VIs, LST, and albedo (ALB) from MODIS optical/infrared data; and the elevation and slope from topographic data are all input variables.
The 9 km SM data can provide a more detailed SM distribution and more adequate SM values than the 36 km SMAP SM data [80–82]. Although the SMAP SM products that meet the unbiased root mean square error (ubRMSE) requirement of the SMAP mission (0.04 cm³/cm³) have a satisfactory accuracy, the products are still affected by many complex factors (e.g., topography, vegetation, surface temperature, and radio frequency interference (RFI)). At the same time, RF is a suitable model because it is less affected by sample errors and has a strong ability to resist over-fitting.
The $\sigma^0_{vv}$ from Sentinel-1 SAR data, the common parameters (i.e., NDVI, EVI, NDWI [73], LAI, LST, and ALB) from MODIS optical/infrared data, and the topographic data (i.e., elevation and slope) are all input variables. First, C-band radar can penetrate the soil surface to detect the surface SM content directly. Compared with VH polarization, VV polarization is more sensitive to SM and less affected by vegetation and soil surface roughness [56]. Furthermore, VIs based on optical data can describe the contribution and attenuation of vegetation in the total backscattering coefficients from Sentinel-1 SAR data [56]. The LST, which reflects the SM state, can constrain the range of SM values to reduce the error caused by surface roughness. Second, the triangular/trapezoidal feature space of the VIs (i.e., NDVI, EVI, NDWI, and LAI) and LST is also a pivotal physical basis for downscaling SM [70]. In addition, surface SM has an impact on ALB [42]. Third, surface elevation and slope, which are associated with land surface energy and surface backscatter, are also used as input variables because of the wide range and variation of surface elevations and slopes in this study area [23]. The slope data were calculated from elevation using ENVI software.
Nevertheless, it is necessary to outline the limitations of the RF models in this study. First, the depths of the soil layer contributing to the L-band radiometer signal (~5 cm) and the C-band SAR backscatter signal (~2 cm) are different [14,20,36]. The penetration depths of microwaves mainly depend on microwave frequency, soil texture, and SM content [83]. Nonetheless, Sentinel-1 SAR data may provide valuable information for downscaling coarse-resolution SMAP SM [26,32]. Second, the MODIS daily LST values could be significantly different from LST values at the time of SMAP acquisition. SM status may be different at the two acquisition times depending on the soil type [36]. Additionally, the sensing depths of the SMAP L-band for bare soil and the MODIS thermal infrared band are ~5 cm and ~1 mm, respectively [14,36]. The thermal regime of the two depths may be quite different. Nonetheless, MODIS LST data may provide valuable information for downscaling coarse-resolution SMAP SM [36,39,56,58].
Remote Sens. 2019, 11, x FOR PEER REVIEW 7 of 20
Therefore, the RF1-RF5 models were established for three sets of comparative tests. RF1 vs. RF2, RF1 vs. RF3, and RF4 vs. RF5 were used to show the effects of $\sigma^0_{vv}$, elevation, and other auxiliary input variables (additional VIs and slope), respectively. Equations (1)-(5) show the five RF models:
$$\mathrm{SM} = f_1(\mathrm{LST}, \mathrm{NDVI}) \tag{1}$$
$$\mathrm{SM} = f_2(\mathrm{LST}, \mathrm{NDVI}, \sigma^0_{vv}) \tag{2}$$
$$\mathrm{SM} = f_3(\mathrm{LST}, \mathrm{NDVI}, \mathrm{elevation}) \tag{3}$$
$$\mathrm{SM} = f_4(\mathrm{LST}, \mathrm{NDVI}, \mathrm{elevation}, \sigma^0_{vv}) \tag{4}$$
$$\mathrm{SM} = f_5(\mathrm{LST}, \mathrm{NDVI}, \mathrm{EVI}, \mathrm{NDWI}, \mathrm{LAI}, \mathrm{ALB}, \mathrm{elevation}, \mathrm{slope}, \sigma^0_{vv}) \tag{5}$$
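The input sets of the five models and the three comparative tests can be written down compactly, as in the sketch below. The variable names (for example `sigma0_vv`) and the helpers `added_inputs` and `design_matrix` are hypothetical labels chosen for this illustration; the RF1-RF3 input sets follow the descriptions given in the text.

```python
# Input variables of each RF model, keyed by model name (names are illustrative).
RF_INPUTS = {
    "RF1": ["LST", "NDVI"],
    "RF2": ["LST", "NDVI", "sigma0_vv"],
    "RF3": ["LST", "NDVI", "elevation"],
    "RF4": ["LST", "NDVI", "elevation", "sigma0_vv"],
    "RF5": ["LST", "NDVI", "EVI", "NDWI", "LAI", "ALB",
            "elevation", "slope", "sigma0_vv"],
}

def added_inputs(base, extended):
    """List the variables that the extended model uses and the base model does not."""
    return [v for v in RF_INPUTS[extended] if v not in RF_INPUTS[base]]

def design_matrix(samples, model):
    """Select the columns of one model from a list of per-sample dictionaries."""
    return [[sample[name] for name in RF_INPUTS[model]] for sample in samples]
```

Because the same column selection can be applied to the 9 km, 3 km, and 1 km inputs, a model fitted on 9 km rows can be given 3 km or 1 km rows unchanged, which is the scale-independence assumption stated in the Introduction.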

Evaluation of the Random Forest (RF) Models at 9 km Resolution
In this study, one year (1 August, 2017 to 1 August, 2018) of available overlapping SMAP, MODIS, and Sentinel-1 acquisitions in the entire study area were used to evaluate the RF1-RF5 models at 9 km. In the model training and testing process, there were 14,629 sets of samples with a 9 km resolution, and a 10-fold cross-validation method was adopted for parameter adjustment to ensure the robustness and generalizability of the RF models. In other words, the 14,629 sets of samples were randomly and evenly divided into 10 parts, and then the training and testing were circulated 10 times with nine parts as the training dataset and the remaining part as the testing dataset. In addition, the number of trees was set to approximately 120 by the trial-and-error method.
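The 10-fold procedure described above can be sketched as follows. This is a generic illustration of random, roughly even 10-fold splitting, not the authors' code; the function name `k_fold_indices` and the fixed `seed` are assumptions.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k random, near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Taking every k-th index gives folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

With 14,629 samples, each sample appears in exactly one testing fold, so the predictions collected over the ten rounds cover the entire dataset.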
First, the importance of the different variables was analyzed based on the range and distribution of these samples at 9 km (Figure 3). Importance was scored by the percentage increase in mean squared error (%IncMSE), which indicates the increase in mean squared error (MSE) when one variable is set to a random number. Overall, according to the scores, the input variables in decreasing order of importance were $\sigma^0_{vv}$, LST, VIs, and topographic factors. Among the VIs, NDVI and LAI were more important than NDWI and EVI.
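The idea behind %IncMSE can be illustrated with a simplified permutation test: shuffle one input variable, predict again, and report the relative growth of the MSE. Implementations such as the R randomForest package compute the score on out-of-bag samples; the version below is a simplified sketch on an arbitrary sample set, and `predict`, `rows`, and `column` are hypothetical names.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def percent_increase_mse(predict, rows, y_true, column, seed=0):
    """Percentage increase in MSE after randomly permuting one input column.

    predict: callable mapping one sample dictionary to a predicted SM value.
    rows:    list of sample dictionaries holding the input variables.
    """
    base = mse(y_true, [predict(r) for r in rows])
    if base == 0.0:
        raise ValueError("baseline MSE is zero; relative increase is undefined")
    shuffled = [r[column] for r in rows]
    random.Random(seed).shuffle(shuffled)
    permuted_rows = [dict(r, **{column: v}) for r, v in zip(rows, shuffled)]
    permuted = mse(y_true, [predict(r) for r in permuted_rows])
    return 100.0 * (permuted - base) / base
```

A variable the model does not rely on yields a score near zero, whereas a variable that drives the prediction yields a large positive score.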
Second, the performances of RF1-RF5 generally agreed with the variable importance (Figure 4). Because the 10-fold cross-validation method was adopted, the entire dataset with a 9 km resolution (14,629 sets of samples) was used as testing data for the evaluation of the RF models. For the entire testing dataset, the RF1-RF5 models captured SM variations with R values of approximately 0.855, 0.912, 0.901, 0.931, and 0.955, and ubRMSE values of approximately 0.023, 0.018, 0.019, 0.016, and 0.013 cm³/cm³, respectively. Among the RF models, RF5, with $\sigma^0_{vv}$ and more input variables from the optical/infrared data, performed best. The RF2 model (with input variables of NDVI, LST, and $\sigma^0_{vv}$) exhibited significant improvement over the RF1 model (with input variables of NDVI and LST). The elevation can optimize the model according to the difference between RF3 and RF1. The comparison between RF4 and RF5 indicates that the RF model with more VIs and topographic information can increase the accuracy further to obtain a more robust RF model. In addition, the slopes were all less than 1, which is similar to [84].
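For reference, the two scores reported here, R and ubRMSE, can be computed as in the following sketch. It uses the standard definitions (Pearson correlation, and RMSE after removing the mean bias); the function names are illustrative.

```python
import math

def bias(pred, obs):
    """Mean difference between predictions and observations."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)

def rmse(pred, obs):
    """Root mean square error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def ubrmse(pred, obs):
    """Unbiased RMSE: RMSE of the differences after subtracting the mean bias."""
    b = bias(pred, obs)
    return math.sqrt(sum(((p - o) - b) ** 2 for p, o in zip(pred, obs)) / len(obs))

def pearson_r(x, y):
    """Pearson correlation coefficient R."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

The three error measures are linked by ubRMSE² = RMSE² − bias², so a constant offset between the estimates and the reference changes the RMSE but not the ubRMSE.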

Evaluation of the Downscaled SM
To compare the RF1-RF5 models, the downscaled SM estimates at 3 km and 1 km spatial resolutions were evaluated against the point-scale observations at four stations. The overlapping Sentinel-1 and cloud-free MODIS data between 1 August, 2017 and 1 August, 2018 were used to calculate 3 km and 1 km SM over the four stations. Figure 5 and Tables 2 and 3 show the assessment of the 3 km and 1 km downscaled SM data from RF1-RF5, respectively. The performance of RF1 in Figure 5a indicated that a slight underestimation would occur over dry land. Although relatively large spatial heterogeneity was observed over station 53788 (Figure 1c), the performance of the 3 km and 1 km SM estimates over station 53788 was satisfactory. According to the difference between RF1 and RF2, the added input variable ($\sigma^0_{vv}$) increased the average R from 0.51 to 0.69 and decreased the corresponding average ubRMSE from 0.033 cm³/cm³ to 0.023 cm³/cm³ (Tables 2 and 3). The elevation can slightly increase the accuracy of the downscaled SM according to the difference between RF3 and RF1. The RF5 model, with more VIs and slope data, performed better than the RF4 model; therefore, adequate land surface parameters from optical/infrared data are needed to obtain a robust non-linear relationship. In general, the RF5 model, with $\sigma^0_{vv}$ and several common land surface parameters from optical/infrared data, performed most stably at both the 3 km and 1 km spatial resolutions.
Although the RF models exhibited satisfactory performance according to the evaluation based on the testing dataset at 9 km, the accuracy of the downscaled SM at 3 and 1 km may decrease. In general, the errors of downscaled SM mainly arise from three factors: inherent uncertainty in the original coarse SM data, model fitting error, and spatial differences among the coarse SM data, fine-resolution SM data, and point-scale SM measurements [51].

Evaluation of SMAP SM Products
To compare the performance of the two L2_SM_SP products at 3 km or 1 km, the overlapping observations of L2_SM_SP baseline and L2_SM_SP optional between 1 August, 2017 and 1 August, 2018 were evaluated against in situ SM measurements at four sites (Figure 6 and Table 4). The closest SMAP data in ascending/descending modes were spatially matched with Sentinel-1 SAR data in the ascending mode to generate L2_SM_SP. Although the two fine-resolution L2_SM_SP products can monitor the SM variations, their performances were different. The L2_SM_SP optional product showed a moderate performance (3 km L2_SM_SP optional product with an average R of 0.68 and an average ubRMSE of 0.033 cm³/cm³, and 1 km L2_SM_SP optional product with an average R of 0.43 and an average ubRMSE of 0.046 cm³/cm³), whereas the L2_SM_SP baseline product performed worse (3 km L2_SM_SP baseline product with an average R of 0.53 and an average ubRMSE of 0.052 cm³/cm³, and 1 km L2_SM_SP baseline product with an average R of 0.29 and an average ubRMSE of 0.074 cm³/cm³). Clearly, L2_SM_SP optional performed significantly better than L2_SM_SP baseline. Overall, although the combinations of low-spatial-resolution SM/TB data and high-spatial-resolution SAR data can achieve higher-resolution SM retrievals, these gains came at the cost of degradation in the temporal statistics of the disaggregated TB and retrieved SM data [66]. Additionally, note that more data were missing in the L2_SM_SP baseline product than in the L2_SM_SP optional product.

Comparison between the Downscaled SM and SMAP SM
The RF1-RF5 models and the SMAP L2_SM_SP products were first evaluated separately so that more of the in situ SM measurements within the one-year period could be used to assess each of them. According to the above results, RF5 performed best among the RF models, and L2_SM_SP optional performed better than L2_SM_SP baseline. We therefore further compared the downscaled SM estimates from RF5 with the L2_SM_SP optional product. Although only cloud-free overlapping SMAP, MODIS, and Sentinel-1 scenes were available, the comparison was still meaningful.
First, a temporal comparison between the downscaled SM from RF5 and the SMAP L2_SM_SP optional data at 3 km and 1 km resolutions was conducted (Figures 7 and 8, and Table 5). The overlapping observations of L2_SM_SP optional and the RF5 SM estimates at 3 km or 1 km were extracted for this comparison. As shown in Figure 8 and Table 5, the 3 km SM estimates and the 3 km L2_SM_SP optional product performed comparably, with similar R and ubRMSE values. More outliers were observed in the 1 km L2_SM_SP optional product than in the 1 km SM based on RF5, likely caused by SAR data (Figure 7 and Table 5). In addition, the effect of noise related to SAR data appeared to become more pronounced as the spatial resolution increased (Figure 7). The fine resolution of the L2_SM_SP optional product came at the cost of degradation in the temporal statistics of the disaggregated SM data [66]. Overall, although optical/infrared data can only indirectly reflect surface SM values, RF5, which is less affected by radar-related noise, can generate SM data with stable performance at both 3 km and 1 km. Compared with L2_SM_SP optional, RF5 was superior at 1 km. Note that the 1 km L2_SM_SP optional product was almost uncorrelated with the in situ SM data (Figure 8 and Table 5) because the overlapping SM observations were mainly distributed in the dry season, and the effect of vegetation and surface roughness is stronger at low SM content [85].

Remote Sens. 2019, 11, x FOR PEER REVIEW
Figure 8. Scatterplots of SM estimates from RF5 and L2_SM_SP optional product (3 km and 1 km) vs. in situ SM at the four SM stations.
Second, a spatial comparison between the SM based on RF5 and L2_SM_SP optional product was conducted. Only the comparison at the 1 km resolution is shown due to the similar performances at 3 km and 1 km. The 9 km L3_SM_P_E product is shown as a reference. Two regions under different conditions were selected, namely, regions A and B (Figure 9). Region A mainly has a relatively flat terrain covered with sand and grass as well as mountains covered with forest in the southeast. Region B is mountainous with interlaced dry land and forests. Additionally, according to the performance of RF1-RF5 in regions A and B, RF1 and RF2 also exhibited relatively significant differences from RF5. The SM values from RF3 and RF4 showed similar spatial distributions to those from RF1 and RF2, respectively.
The downscaled SM based on the RF models and L2_SM_SP optional product were compared at 1 km in region A (Figure 10). The SM from the RF models exhibited smoother SM spatial distribution and fewer outliers than that from L2_SM_SP optional. The flat terrain, low vegetation cover, and 9 km SMAP SM data of good quality may contribute to the good performance of the RF models in region A. The parameters that account for the effects of vegetation/roughness and heterogeneity within the coarse resolution are from a relationship between SMAP radiometer TB and Sentinel-1 SAR data [66]. The quality flag of 1 km disaggregated TB observations at vertical (V) polarization can offer some explanations for L2_SM_SP optional (Figure 11). Although the TB data at V polarization had acceptable quality, L2_SM_SP optional appears to be of poor quality due to noise related to radar measurements. Furthermore, compared with RF5, RF1 and RF2 without sufficient VIs clearly underestimated the SM values in areas covered with forest (Figure 10). Overall, RF5 exhibited a smoother and better SM spatial distribution than L2_SM_SP optional and other RF models at 1 km in region A.
Quality flag values for Figure 11 (0: disaggregated TB data has acceptable quality; 1: unable to disaggregate TB data into cells; 16: significant levels of RFI were detected, and the TB data was repaired because of the effects of RFI; 17: significant levels of RFI were detected, and unable to disaggregate TB data into cells; 25: significant levels of RFI were detected, unable to disaggregate TB data into cells, and some V polarization TB input used for SM retrieval were questionable or of poor quality).
The downscaled SM data based on the RF models and the L2_SM_SP optional product were compared at 1 km in region B (Figure 10). Compared with the L2_SM_SP optional product, the RF models showed a smoother SM spatial distribution and fewer missing values. Large numbers of missing values in the 1 km L2_SM_SP optional product occurred in region B because the TB data could not be disaggregated into 1 km resolution cells, probably due to poor-quality SMAP TB observations heavily contaminated by RFI (the cells flagged with 17 and 25 in Figure 11b) and variations in the DSM (the cells flagged with 1 in Figure 11b). Furthermore, although the TB data were repaired for the effect of RFI in the cells flagged with 16, many SM values were flagged as anomalously high in the retrieval quality flag of the L2_SM_SP optional product. Additionally, RF1 without σ⁰_vv slightly underestimated SM values in dry land, which was consistent with the underestimation of RF1 in Figure 5a. Overall, RF5 exhibited a smoother and better SM spatial distribution than L2_SM_SP optional and the other RF models at 1 km in region B.
Overall, RF5 with σ⁰_vv and a sufficient number of VIs appeared to be superior to the other RF models. RF5 was comparable with L2_SM_SP optional at 3 km and produced more accurate SM values in the time series and smoother SM spatial distributions than the L2_SM_SP optional data at 1 km. RFI, noise related to radar measurements, and variations in the DSM appear to have a greater impact on L2_SM_SP optional than on the RF models, leading to a large number of missing values.
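The flag values discussed for Figure 11 (0, 1, 16, 17, and 25) behave like sums of individual bits, so a cell's flag can be decoded into its component conditions. The sketch below assumes a bit layout inferred only from those listed values (1 = cannot disaggregate, 8 = questionable V-polarization TB input, 16 = significant RFI); it is an illustration and not the official SMAP bit table, and the names `FLAG_BITS` and `decode_flag` are hypothetical.

```python
# Bit meanings inferred from the flag values reported for Figure 11.
# ASSUMPTION: this mapping is illustrative, not the official product definition.
FLAG_BITS = {
    1: "unable to disaggregate TB data into cells",
    8: "some V-polarization TB input questionable or of poor quality",
    16: "significant levels of RFI detected",
}

def decode_flag(value):
    """Return the conditions whose bits are set in an integer quality flag."""
    return [label for bit, label in sorted(FLAG_BITS.items()) if value & bit]

def has_acceptable_quality(value):
    """A flag of 0 means the disaggregated TB data has acceptable quality."""
    return value == 0

for flag in (0, 1, 16, 17, 25):
    print(flag, decode_flag(flag))
```

Under this assumed layout, 17 decodes to RFI plus failed disaggregation and 25 adds the questionable TB input, which matches the descriptions given for those values.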

Discussion
Different methods for surface SM mapping perform differently. This section discusses the characteristics of the RF-based downscaling method with optical/infrared and Sentinel-1 SAR data and the discrepancies between the RF models and the L2_SM_SP products.

Analysis of Input Variables in the RF Models
The variable importance scores, the test at the 9 km resolution, and the evaluation of the 3 km and 1 km downscaled SM products indicate that the input variables in decreasing order of importance are σ⁰_vv, LST, VIs, and topographic factors. First, σ⁰_vv plays a more significant role than the other variables because active microwaves can penetrate the soil surface to detect the surface SM content directly. The better performance of RF2 versus RF1 in the time series (Section 3.2) and the underestimation of RF1 without σ⁰_vv in dry land (Figure 10b) indicate the high sensitivity of σ⁰_vv to surface SM. Additionally, the low vegetation cover conditions and bare soil probably amplified the importance of the backscatter. σ⁰_vv is well suited to northern China, with its vast areas of low or sparse vegetation. The RF model from Zhao et al. [53], which depends on the LST/VIs feature space, was proposed for relatively high vegetation cover conditions, and the importance scores of VIs in their study were very high. However, the feature space performed relatively poorly in areas with NDVI values ranging from 0 to 0.30 [59,60], and the low sensitivity of LST to surface SM in winter would reduce the accuracy of the downscaled SM [53]. Thus, σ⁰_vv, with its high sensitivity to surface SM, plays a more significant role than the other variables in this study. Amazirh et al. used σ⁰_vv instead of SM proxies from optical/infrared data to express the SM variability for the disaggregation of LST in areas of low vegetation cover [58]. Overall, σ⁰_vv may be able to serve as an input variable for downscaling coarse SM data.
Second, LST plays a crucial role because it constrains the combination of SAR data and VIs and underpins the LST/VIs feature space. Although the combination of SAR data and VIs has been used for constructing non-linear models for SM retrievals, an error is introduced when vegetation type and surface roughness are ignored. Many combinations of characteristics (e.g., surface SM and surface roughness) contribute to the same backscattering coefficient, and LST seems to be an effective constraint on SM conditions, which is consistent with [56]. Additionally, LST is very sensitive to surface SM in this study area because the SM status is the main variable controlling the LST variability in areas of low vegetation cover [58]. Thus, LST is a pivotal feature in large regions with low vegetation cover.
Third, VIs play an important role because they can compensate for the influence of vegetation on the total backscattering coefficients and the LST/VIs feature space [56]. The suitability of different VIs for the explanation of vegetation information varies [57]. Among the VIs, both LAI and NDVI exhibit remarkable importance (Figure 3), which indicates that LAI represents vegetation cover about as well as NDVI [47,53]. EVI and NDWI are less important, likely because these two variables are more sensitive in areas of high vegetation cover. EVI is usually used to compensate for the easily saturated NDVI, and the NDWI represents the vegetation water content. Thus, the RF models without appropriate VIs underestimate the SM in forest areas and are less stable than the RF5 model (Figure 10a).
Fourth, the effects of the elevation and slope were weak when compared with the effects of the other input variables (Figure 3), possibly because the differences in elevation and slope were not large for most of this study area and the value ranges at the 9 km resolution narrowed because of the smoothing effect caused by spatial aggregation. The RF3 model performed slightly better than RF1 (Tables 2 and 3), which is consistent with the findings for variable importance.
In short, the input variables in decreasing order of importance are σ⁰_vv, LST, VIs, and topographic factors. The backscatter at VV polarization and LST appear to be pivotal in areas with low vegetation cover. Appropriate VIs can enhance the adaptability of RF models to different vegetation cover levels. The RF5 model can obtain a more stable performance due to its inclusion of σ⁰_vv and a sufficient number of VIs.
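The smoothing effect of spatial aggregation mentioned above can be illustrated with a simple block average. This is a minimal sketch, assuming aggregation by the arithmetic mean of non-overlapping blocks and grid dimensions that are exact multiples of the block size; the slope grid and the function names are hypothetical, and the aggregation actually used in the study may differ.

```python
def block_mean(grid, factor):
    """Aggregate a 2-D list by averaging non-overlapping factor x factor blocks.

    Assumes the grid dimensions are exact multiples of `factor`.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [grid[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out

def value_range(grid):
    """Difference between the largest and smallest cell values."""
    flat = [v for row in grid for v in row]
    return max(flat) - min(flat)

# Hypothetical fine-resolution slope grid in degrees (illustration only).
slope_fine = [
    [0, 2, 10, 30],
    [1, 3, 12, 28],
    [0, 1, 2, 4],
    [2, 1, 3, 3],
]
slope_coarse = block_mean(slope_fine, 2)

print(slope_coarse)                                       # [[1.5, 20.0], [1.0, 3.0]]
print(value_range(slope_fine), value_range(slope_coarse))  # 30 19.0
```

The coarse grid spans a narrower range than the fine grid, so a model trained on the aggregated values sees less topographic contrast than exists at the finer resolutions.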

Analysis of the Differences between RF Models and L2_SM_SP Products
The discrepancies between the RF models and the L2_SM_SP products mainly involve the noise associated with SAR data, uncertain auxiliary data, and dependence on the original SMAP SM; each is discussed below.
First, the L2_SM_SP products are more affected by the noise associated with SAR data than the RF models, which is likely attributable to the complex surface scattering conditions caused by vegetation and surface roughness. The SM and TB data from the 9 km L3_SM_P_E product were disaggregated using the Sentinel-1 SAR backscatter with VV and VH polarizations to generate the L2_SM_SP optional and L2_SM_SP baseline data, respectively. Common parameters account for heterogeneity and the effects of vegetation/roughness in the baseline and optional approaches, and are derived from a relationship between SMAP radiometer TB and Sentinel-1 SAR data [66]. The complex surface scattering conditions caused by vegetation and surface roughness can introduce more uncertainty into the L2_SM_SP products, especially at a 1 km spatial resolution. Rainfall events, cultivation activities, and the inconsistent observation times of the SMAP and Sentinel-1 satellites can introduce additional outliers into the L2_SM_SP data [20,86]. The RF5 model is less affected by noise related to SAR data because it combines the backscatter with other land surface features (e.g., VIs and LST), and it is capable of generating 3 km and 1 km SM data with satisfactory performance.
Second, more uncertain auxiliary data are used for generating L2_SM_SP baseline, which reduces its accuracy. The auxiliary data (e.g., approximately 25 km soil surface temperature (SST) and coarse soil texture data) have resolutions coarser than 3 km [66]. China is a mountainous country with complex topography. Additionally, the four in situ stations are in a basin surrounded by mountains with relatively complex topography, and the land surface parameters may present complex heterogeneity. Thus, L2_SM_SP baseline may exhibit more uncertainty in temporal and spatial variations than L2_SM_SP optional. In these situations, the SM data from the synergistic use of fine-resolution SAR and MODIS data and L2_SM_SP optional without coarse auxiliary data may be superior to L2_SM_SP baseline.
Third, the performance of the L2_SM_SP data is more dependent on the original SMAP SM than that of the RF models. The L2_SM_SP data are directly generated from the original SMAP SM/TB data, whereas the RF-based downscaled SM data are calculated using Sentinel-1 SAR data, MODIS data, and the trained RF models. Poor-quality SMAP TB data and noise associated with SAR data can lead to a large number of missing values and many outliers in the L2_SM_SP products. RFI has a significant impact on SMAP TB observations because the L-band is severely contaminated by RFI in China [87,88]. The RF-based downscaling method seems to be suitable, especially for areas easily contaminated by RFI. Additionally, the SMAP SM data with the closest observation time to the Sentinel-1 SAR data were matched with the SAR data; thus, the inconsistent observation times of the SMAP and Sentinel-1 satellites have an impact on L2_SM_SP.
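The closest-time matching described above, and the observation-time gap it leaves, can be sketched as follows. This is a minimal illustration, assuming each acquisition is represented by a single timestamp; the function name and all dates are hypothetical and are not taken from the L2_SM_SP processing chain.

```python
from datetime import datetime

def closest_overpass(sar_time, smap_times):
    """Return the SMAP overpass time nearest to a Sentinel-1 acquisition time."""
    return min(smap_times, key=lambda t: abs(t - sar_time))

# Hypothetical acquisition times (illustration only).
sar_time = datetime(2018, 3, 10, 10, 20)
smap_times = [
    datetime(2018, 3, 9, 18, 0),
    datetime(2018, 3, 10, 6, 0),
    datetime(2018, 3, 11, 6, 0),
]

match = closest_overpass(sar_time, smap_times)
gap_hours = abs(match - sar_time).total_seconds() / 3600

print(match, round(gap_hours, 2))
```

Even the nearest overpass in this example is several hours away from the SAR acquisition, so rainfall or irrigation between the two observations would appear as a mismatch between the radiometer and radar signals.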
In general, compared with L2_SM_SP, the RF5 model may be less affected by noise related to SAR data and by the accuracy of the original SMAP SM data; it is comparable at 3 km and superior at 1 km. The L2_SM_SP optional data are better than the L2_SM_SP baseline data, which are calculated with low-resolution auxiliary data.

Conclusions
A method was presented to produce downscaled SMAP SM data by combining optical/infrared data with SAR data based on the RF technique. The method leverages the sensitivity of C-band radar backscatter to SM and the triangular/trapezoidal feature space of the VIs and LST.
In this study, five architectures (RF1-RF5) were first trained and tested at a 9 km resolution. SM from the 9 km SMAP L3_SM_P_E product was the target variable. Second, the model simulations from RF1-RF5 were compared using the in situ SM. RF5, with σ⁰_vv and a sufficient number of VIs, performed best among the five RF models in the time series. Third, the two SMAP-Sentinel active-passive products at 3 km and 1 km were compared using in situ SM measurements. The L2_SM_SP optional product performed significantly better than the L2_SM_SP baseline product, possibly because coarse-resolution auxiliary data were used in L2_SM_SP baseline. Fourth, the RF5 model and L2_SM_SP optional were further compared at 3 km and 1 km. The SM data from RF5 were comparable with L2_SM_SP optional at 3 km and showed more satisfactory accuracy at temporal and spatial scales than L2_SM_SP optional at 1 km.
In conclusion, the synergy of the Sentinel-1 backscatter signal and MODIS optical/infrared data can provide reliable temporal and spatial variability in semi-arid areas with relatively low vegetation cover. The input variables in decreasing order of importance were σ⁰_vv, LST, VIs, and topographic factors over the entire study area. The low vegetation cover conditions probably increased the importance of the radar backscatter and LST. The presence of sufficient VIs can enhance the adaptability of RF models to different vegetation covers. The RF5 model, with σ⁰_vv and sufficient VIs, could obtain stable performance at both 3 km and 1 km. Compared with L2_SM_SP, RF5 was superior at the 1 km resolution.