Sensitive Feature Evaluation for Soil Moisture Retrieval Based on Multi ‐ Source Remote Sensing Data with Few In ‐ Situ Measurements: A Case Study of the Continental U.S.

: Soil moisture (SM) plays an important role for understanding Earth’s land and near ‐ sur ‐ face atmosphere interactions. Existing studies rarely considered using multi ‐ source data and their sensitiveness to SM retrieval with few in ‐ situ measurements. To solve this issue, we designed a SM retrieval method (Multi ‐ MDA ‐ RF) using random forest (RF) based on 29 features derived from pas ‐ sive microwave remote sensing data, optical remote sensing data, land surface models (LSMs), and other auxiliary data. To evaluate the importance of different features to SM retrieval, we first com ‐ pared 10 filter or embedded type feature selection methods with sequential forward selection (SFS). Then, RF was employed to establish a nonlinear relationship between the in ‐ situ SM measurements from sparse network stations and the optimal feature subset. The experiments were conducted in the continental U.S. (CONUS) using in ‐ situ measurements during August 2015, with only 5225 training samples covering the selected feature subset. The experimental results show that mean de ‐ crease accuracy (MDA) is better than other feature selection methods, and Multi ‐ MDA ‐ RF outper ‐ forms the back ‐ propagation neural network (BPNN) and generalized regression neural network (GRNN), with the R and unbiased root ‐ mean ‐ square error (ubRMSE) values being 0.93 and 0.032 cm 3 /cm 3 , respectively. In comparison with other SM products, Multi ‐ MDA ‐ RF is more accurate and can well capture the SM spatial dynamics. DOY, latitude, MODIS_NDVI, longitude, and ERA_SR. The five selected features include geographic location (latitude, longitude), time (DOY), surface parameter (ERA_SR), and vegetation parameter (MODIS_NDVI). These factors are closely related to SM retrieval, which also shows that the feature selection method is scientific in this work.


Introduction
Soil moisture (SM) is usually defined as a volume of water stored within the unsaturated zone [1,2], and surface (0-5 cm) SM is an important variable associated with global terrestrial water, energy, and carbon cycles [3,4]. Therefore, it is necessary to obtain accurate and timely SM data.
Although traditional in-situ SM acquiring methods can provide accurate data each day, they only obtain scattered and limited point data [5]. In addition, it is impractical to deploy intensive monitoring stations around the world. Therefore, the in-situ measurements cannot well describe the spatial variability at large scale, especially when the measurements are sparse. Satellite microwave remote sensing is a more advocated method with the advantages of large-scale observation and high temporal resolution, which was proven to be a more effective way to estimate SM. Microwave remote sensing data, optical remote sensing data, and land surface models (LSMs) all provide products to estimate SM values.
For the microwave remote sensing data, several satellites were launched in the past two decades, carrying on board radiometer (passive), radar (active), or both sensors in various frequency bands with different spatial and temporal resolutions. Compared with the active microwave products, passive microwave products have lower spatial resolutions, but the revisit periods are shorter. Therefore, passive microwave products are more conducive to large-scale SM retrieval and obtain timely SM values. Microwave remote sensing products also have different bands. Various low frequencies (X, C, and L bands) have been used to detect SM content [6]. Some multi-band sensors mainly include Advanced Microwave Scanning Radiometer-Earth Observing System (AMSR-E), the Advanced Microwave Scanning Radiometer 2 (AMSR2), the Advanced Scatterometer (ASCAT), and Fengyun-3B Microwave Radiation Imager (FY-3B), etc. L-band radiometers and radars mainly include Soil Moisture and Ocean Salinity (SMOS) and the Soil Moisture Active Passive (SMAP), which are more widely used these years since the L band is considered as the optimal band for SM monitoring due to its strong penetration ability of both soil and vegetable [7].
For the optical remote sensing data, related studies mainly used Moderate Resolution Imaging Spectroradiometer (MODIS), which provides many land products related to SM variations [8][9][10][11][12][13][14]. For example, Cui et al. [9] reconstructed the FY-3B SM product based on three MODIS products including 16-day NDVI, daily land surface temperature, and 16day albedo. Kim et al. [8] used the MODIS-based EVI product as an indication for the vegetation condition at each site to assess the remote sensing SM products.
As for the LSMs, there are the European Center for Medium-range Weather Forecasts Re-Analysis Interim (ERA-Interim) [5,[15][16][17], Global Land Data Assimilation System (GLDAS) [18][19][20][21][22], and Modern Era Retrospective Analysis for Research and Applications (MERRA) [23][24][25], which are also beneficial to estimating SM values. For example, Qu et al. [22] used the GLDAS Noah Land Surface Model L4 data as the reference to estimate SM values. Ge et al. [5] used the ERA-Interim product to train the deep convolutional neural network and neural network, the results suggest that the simulated SM values and the ERA-Interim SM agree relatively well at a global scale.
SM retrieval methods can use those multi-source data to estimate SM values, and different methods can be categorized into two classes: physical-driven models and datadriven models. Traditional SM retrieval methods are based on different physical models [26][27][28][29][30][31][32][33][34][35][36][37]. Although these physical models can estimate SM values by fitting geophysical parameters and in-situ measurements, they are lack of extendibility and flexibility. On the one hand, the model parameters are usually directly obtained from limited measurement values, which cannot extend to large areas. On the other hand, the complexity of physical models makes it difficult to flexibly construct the relationship between physical parameters and SM values.
Some SM retrieval methods are equipped with machine learning (ML) algorithms. ML-based models can handle a large amount of nonlinear data and flexibly combine information from multiple sources without explicit physical relationships. According to the usage of training samples, they can be divided into three types. The first type uses LSMs as the reference for the SM retrieval model [5,38]. The advantage of this method is that it is more suitable for large-scale SM retrieval since the LSMs have global space-time coverage, inducing sufficient training samples. The disadvantage is that each LSM has some uncertainties due to the model parameter estimation errors, which can be transferred to the SM retrieval model. The second type uses the satellite SM products as the training data [9,22,39]. This method has been widely used in SM downscaling studies to obtain high spatial resolution SM. Although high spatial resolution SM products can enhance ecological and hydrological applications, the accuracy is relatively low compared with in-situ measurements. The third type uses in-situ measurements as the reference. Xu et al. [40] designed a method based on generalized regression neural network (GRNN) to train SMAP products using in-situ measurements from five networks, and they found that GRNN has a good potential for retrieving SM. Eroglu et al. [41] used the in-situ measurements as a reference based on artificial neural network, which was capable of generating sub-daily and high-resolution SM predictions. The advantage of this method is that the in-situ measurements are closer to the true value than using LSMs or satellite SM products as the reference, which is beneficial to improving the accuracy for SM retrieval. However, it is hard to obtain large-scale, adequate, and evenly distributed in-situ SM measurements. In addition, we should consider the problem of spatial matching between in-situ point measurements and satellite remote sensing data.
SM retrieval is a complex process, which depends on many interactive factors, such as soil texture, soil structure, the organic matter content, surface roughness, topography, and vegetation coverage [42,43]. It poses great challenges for accurate SM retrieval when facing with multi-source data, especially for multiple feature selection and fusion based on those factors. ML-based models have big potentials for SM retrieval considering those challenges. Among most of existing ML-based studies, the involved features are mainly from a single-source, and they are usually selected based on prior experience [5,9,22,40,44]. The features may contain noisy or redundant information without feature selection, which may decrease accuracy and increase computational cost [45]. Actually, feature selection methods can help us to understand the impact of different features on retrieval accuracy, which can also help to reduce noisy or redundant information, and avoid over-fitting in the SM retrieval model. To the best of our knowledge, very few MLbased studies have considered feature selection in SM retrieval based on multi-source data.
In this study, we designed a novel SM retrieval method (Multi-MDA-RF) using random forest (RF) based on 29 features derived from passive microwave remote sensing data, optical remote sensing data, LSMs, and other auxiliary data. The Multi-MDA-RF model is examined in a serial of comprehensive experiments: (1) we compared 10 filter or embedded type feature selection methods combined with sequential forward selection (SFS) to find the optimal feature subset for SM retrieval; (2) we analyzed the impacts of RF parameters on accuracy, including mtry and ntree; (3) our model was compared to back-propagation neural network (BPNN) and GRNN; (4) our product was compared with five popular SM products, including SMAP, AMSR2, SMOS, FY-3B, and ERA-Interim; (5) we analyzed the applicability of our model using in situ measurements from seven networks; (6) we visually inspected our product on three U.S. states with similar latitudes in the east, central, and west of CONUS; (7) we resampled the optimal features according to the lowest spatial resolution, to improve the spatial resolution of the input features and obtain a higher spatial resolution product.
It is worth noting that, some studies [40,44,46] are closely related to our work. However, there are some essential differences. Firstly, we considered multi-source data and generated 29 features as the inputs of the SM retrieval model, nevertheless other studies mainly used a single microwave remote sensing product. Secondly, the importance of different features was evaluated by using 10 feature selection methods, while existing studies did not exploit feature selection in SM retrieval. Thirdly, we used fewer training samples (i.e., a total of 5225), which was around one-third to a half of that reported in other studies, whereas achieving more accurate results with R = 0.93, which was around 0.03-0.26 higher than other studies. Finally, we produced a SM product with higher spatial resolution of 0.125°, which is around one-third of that reported in other studies. In this context, the main contribution and novelty of our work lie in that we proposed a novel Multi-MDA-RF model for SM retrieval, where multi-source data and 29 features were used as the inputs, and the importance of different features are evaluated. In addition, fewer training samples were used to produce more accurate results with higher spatial resolution. To the best of our knowledge, the proposed model is unique in the literature.
The rest of this paper is organized as follows. Section 2 describes the study area and the multi-source data used for the SM retrieval model. Section 3 illustrates the feature selection methods and presents the Multi-MDA-RF procedure for SM estimation. Section 4 reports the experimental results with a comprehensive comparison. Section 5 compares our model with other published state-of-the-art methods. Section 6 provides a summary and puts forward an outlook for future work.

Study Area
The study area is the continental U.S. (CONUS), which borders on the North Atlantic and North Pacific [44]. The terrain of CONUS is complex and diverse. The east is composed of hills and low mountains, and the center is a vast plain. The west is the most complex area, consisting of the Colorado Plateau, Wyoming Plateau, Columbia Plateau, Grand Canyon, the Sierra Nevada Mountains, and the Cascade Mountains. There are many rivers and lakes in this area. The central and eastern regions have the largest freshwater lakes in the world, which is known as the North American Mediterranean. The climate of CONUS is mostly temperate and subtropical. The southeast is a subtropical climate zone with an average annual rainfall of 1500 mm, which is relatively humid. The climate in the western plateau is dry with large temperature differences, and the annual average rainfall is below 500 mm.

In-Situ Measurements
CONUS is an ideal study area for SM retrieval, because it has large-scale and longterm SM observation networks covering almost the entire continent. These abundant insitu measurements are evenly distributed, which is conducive to the construction of SM retrieval models.
The in-situ measurements come from seven networks, which are spread across the whole CONUS: the Cosmic-ray Soil Moisture Observing System (COSMOS) [47][48][49], the Interactive Roaring Fork Observation Network (iRON) [50], PBO_H2O [51], the Soil Climate Analysis Network (SCAN) [52,53], the Snow Telemetry (SNOTEL) [54], the Soil moisture Sensing Controller and oPtimal Estimator (SOILSCAPE) [55], and the U.S. Climate Reference Network (USCRN) [56]. All in-situ measurements can be downloaded from the International Soil Moisture Network (ISMN) (http://ismn.geo.tuwien.ac.at/ accessed on 20 February 2021). ISMN has assembled over 50 operational and experimental SM networks around the world [57,58], providing uniform data format and pre-processing quality flags for global SM database [45]. Since the remote sensing satellite can only detect the surface SM, the in-situ measurements should be screened by retaining the surface SM measurements with a depth of 0-10 cm for COSMOS and 0-5 cm for others. The spatial distribution and the characteristics of the in-situ measurements are presented in Figure 1 and Table 1, respectively. The sites (total) in Table 1 refer to the total number provided by ISMN, and the sites (used) refer to the remaining number after the screening of depth.

Multi-Source Microwave Remote Sensing Data
National Aeronautics and Space Administration (NASA) launched the SMAP satellite in January 2015 to monitor global SM and landscape freeze-thaw conditions. On March 31, 2015, the L-band (1.41G Hz) radiometer was used to continuously collect scientific data [59]. Its nominal incident angle is 40° and it can achieve global coverage every 2-3 days. SMAP carries a radiometer and a radar to provide active and passive microwave remote sensing data at the same time. However, the radar stopped working after about three months of operation [60]. The ascending and descending overpasses of SMAP satellite are at 6 p.m. and 6 a.m., respectively, which are synchronized with the sun. As the thermal equilibrium of vegetation canopy and near-surface soil increases with temperature, SM retrieval is more stable in the early morning [7]. Therefore, we use SMAP Level-3 radiometer global daily 36-km EASEv2-grid soil moisture (SPL3SMP) descending overpass data, including brightness temperatures at vertical and horizontal polarization (SMAP_TBV, SMAP_TBH), the 4th Stokes' parameters (SMAP_TB4) [38], surface temperature (SMAP_Ts), vegetation water content (SMAP_VWC), albedo (SMAP_albedo), landcover classification (SMAP_landcover), latitude, and longitude. SMAP are available from https://nsidc.org/data/SPL3SMP(accessed on 15 January 2021) AMSR2 was launched by the Japan Aerospace Exploration Agency (JAXA) on May 18, 2012, and its first scientific observation began on July 3, 2012. As the successor of AMSR-E, AMSR2 continues to provide observations similar to AMSR-E. The orbit and basic settings of AMSR2 are consistent with AMSR-E. The ascending overpass of AMSR2 is 1:30 p.m., and the descending overpass was 1:30 a.m. [61]. We used the descending overpass, because it was closer to the early morning. We used AMSR2 Level-2 product with a spatial resolution of 25 km, including C-band 36GHz brightness temperatures at vertical and horizontal polarization (AMSR2_TBV, AMSR2_TBH), C-band surface temperature (AMSR2_Ts), C-band optical depth (AMSR2_optx), and X-band optical depth (AMSR2_optc). AMSR2 are available from https://search.earthdata.nasa.gov/ (accessed on 15 January 2021) and https://suzaku.eorc.jaxa.jp/. (accessed on 15 January 2021) SMOS was developed by the European Space Agency (ESA), which is the first polar orbit L-band radiometer. It was successfully launched on November 2, 2009, and has become one of the most important satellites for monitoring the global water cycle [62]. SMOS observations cover the globe approximately every 3 days with the ascending overpass at 6 a.m. and the descending overpass at 6 p.m., respectively. We used SMOS L3 ascending overpass observations with a spatial resolution of 25 km, including H and V polarization brightness temperature data (SMOS_TBV, SMOS_TBH), and optical depth (SMOS_opt). SMOS data can be obtained from https://smos-diss.eo.esa.int/oads/access/. (accessed on 20 February 2021) FY-3B was successfully launched by the China National Space Administration (CNSA) on November 5, 2010, and the data service ceased on June 2020. It was equipped with a passive microwave radiometer called Microwave Radiation Imager (MWRI) [39]. The specific characteristics of all of the above microwave remote sensing products are reported in Table 2. Table 2. The characteristics of the microwave remote sensing products.

Microwave Remote
Sensing Product Band Spatial Resolution (km)

Auxiliary Data
MODIS was developed by NASA to understand global climate changes. We used three 0.05° MODIS products, including MOD13C2, MOD11C3 and MCD12C1. MOD13C2 provides monthly global Normalized Difference Vegetation Index (MODIS_NDVI). MOD11C3 provides monthly night surface temperature (MODIS_Ts). MCD12C1 is a global landcover classification product (MODIS_landcover). MODIS data are available from https://modis.gsfc.nasa.gov/. (accessed on 13 December 2020) ERA-Interim is a global atmospheric reanalysis product provided by the Mediumrange Weather Forecasts (ECMWF) [5]. The time coverage of the product ranged from January 1979 to August 2019. We used surface roughness (ERA_SR), surface temperature (ERA_Ts), and albedo (ERA_albedo) with a spatial resolution of 0.125° at 6 a.m. from this product. ERA-Interim can be obtained from http://apps.ecmwf.int/datasets/.(accessed on 13 December 2020) The 30-m Global Land Cover Dataset (GlobeLand30) is produced by National Geomatics of China [63]. The data covers the land area of 80° N to 80° S, which consisted of 10 land cover types. GlobeLand30 are available from http://kmap.ckcest.cn/. (accessed on 13 December 2020) DEM data used in this experiment come from Shuttle Radar Topography Mission (SRTM) with a spatial resolution of 90 m. SRTM started in February 2000, and it covers an area of more than 119 million square kilometers between 60° N and 56° S. SRTM data can be obtained from http://srtm.csi.cgiar.org/ (accessed on 13 December 2020) The soil texture data come from the 1 km Harmonized World Soil Database version 1.2 (HWSD), which contains 12 soil textures. HWSD can be obtained from http://www.fao.org/. (accessed on 13 December 2020) The day of year (DOY) should also be considered as a necessary feature to reflect time variations.

Methodology
In this work, we designed a novel Multi-MDA-RF method to estimate SM values in CONUS. The presented method includes multi-feature generation, feature evaluation, and RF. Firstly, we matched multi-source data and in-situ measurements spatially and temporally to generate multiple features. Secondly, we evaluated these features by using various feature selection methods. According to the feature importance ranking, the optimal subset was obtained by using SFS. Then, RF was employed to establish a nonlinear relationship between in-situ SM measurements and the optimal feature subset. Finally, the SM retrieval model was evaluated from many aspects in terms of different models, products, in-situ measurements, and U.S. states. A flowchart of this work is exhibited in Figure 2, and the details are described as follows.

Multi-Feature Generation
For in-situ measurements, PBO_H2O records SM data once at 12 p.m. every day, and the other networks record data per hour. We used the measurements recorded at 6 a.m. for all networks except for PBO_H2O, which was in accordance with the time of satellite observations. In order to ensure the authenticity of in-situ SM measurements, we only selected the data with a quality mark "G (Good)".
In order to comprehensively consider the variables related to SM retrieval, 29 multisource features from passive microwave remote sensing data (SMAP, AMSR2, SMOS, FY-3B), optical remote sensing data (MODIS), LSM (ERA-Interim), and some other auxiliary data (GlobeLand30, SRTM, HWSD, DOY) were generated, as listed in Table 3. These features included brightness temperature data, surface parameters, vegetation parameters, soil texture, land use classification, geographical location, time, which are commonly used features in SM retrieval. There are some repetitive variables, which may be based on different criteria or methods, such as brightness temperature data from different passive microwave remote sensing data. We used feature selection to remove the redundant features among them, and then selected the features that were more suitable for SM retrieval.
For the multi-source microwave remote sensing data and the auxiliary data, they were converted to the same projection coordinate system of SMAP and resampled to a spatial resolution of 36 km.
Then, the multi-source data should be matched with in-situ measurements, spatially and temporally. In order to relieve the scale difference between in-situ points and satellite pixels, the in-situ measurements within a multi-source data grid were averaged, which was adopted in many previous studies [5,64]. After the point-surface matching, 410 spatially isolated sites were available.

Sensitive Feature Evaluation Methods
The feature selection algorithms can be divided into filter, wrapper, and embedded type methods. The filter methods measure feature importance based on different indicators, which are independent of the adopted predictor. The wrapper methods train the predictor using a subset of features, and then add or remove a feature based on a selected criterion. The embedded methods obtain the feature importance in the model training process [65].
In this experiment, we used 10 filter or embedded type feature selection methods to obtain the importance ranking of each feature. Then, we combined these methods with SFS to find the optimal feature subset. The filter type methods included regression ReliefF (RReliefF), F-test, neighborhood component analysis (NCA), Laplacian score (LS), Pearson correlation coefficient (PCCs), and maximal information coefficient (MIC). The embedded methods included mean decrease impurity (MDI), mean decrease accuracy (MDA), Lasso, and the feature optimization of Gaussian process regression (GPR) model. [66], which is very powerful in estimating the quality of features [67,68]. RReliefF penalizes the input features that give different values to neighbors with the same response values, and rewards the input features that give different values to neighbors with different response values. We used the 29 features as the input data and the in-situ measurements as the response values. The algorithm selects a random observation and finds the k-nearest observations to it. Then, the weight of SM features can be calculated as follows:

RReliefF: RReliefF is inspired from Relief
where is the weight of having different values for the response y, is the weight of having different values for the feature r, ^ is the weight of having different response values and different values for the feature r, m is the number of iterations. The importance ranking of each feature can be obtained according to this weight.
2. F-test: F-test is a statistical test by calculating the f-score of each feature [69]. We examined the importance of each feature individually using F-test, which calculates the values of f-score as follows, and the features were ranked based on f-scores.
score log (2) where p is the p values between features and in-situ measurements.
3. NCA: a novel nearest neighbor-based feature selection method was proposed by [70]. This feature selection method performs feature selection with regularization to learn feature weights for minimization of an objective function that measures the average leave-one-out regression loss over the training data. The objective function of minimization is as follows: 1 where n is the number of observations, li is the distance between the in-situ measurements and y, wr the feature weight, λ is the regularization parameter, p is the average accuracy.
4. S: Laplacian score is a feature selection algorithm introduced by [71]. The locality preserving power for each feature was reflected by calculating the Laplacian score. Then, we can rank features using the Laplacian scores computed as follows: where ri is the -th feature, Dg is the degree matrix, and L is the Laplacian matrix.
5. PCCs: Pearson correlation coefficient is a simple method that can help to understand the relationship between features and response variables. This method measures the linear correlation between variables. The value range of the result is (-1,1), where "-1" represents the complete negative correlation, "+1" represents the complete positive correlation, and "0" represents no linear correlation. The feature with the larger absolute value of the correlation is considered more important. 6. MIC: MIC is a powerful measure for relevance [72]. It is used to measure the degree of correlation between two variables r and y, and is often used in feature selection of machine learning. MIC can eliminate the feature with less information, so as to make the variable used in model more representative. MIC between the feature r and the response values y can be computed as follows: MIC ; max ; log min , where I (r; y) is the mutual information between r and y, a, b,is the number of grids in r and y directions, and B is a variable, approximately set to the 0.6th power of the amount of data.

The Embedded Methods
1. MDI: RF based feature selection methods can be divided into MDI and MDA [73]. MDI computes feature importance for tree by summing changes in the mean squared error (MSE) due to splits on every feature and dividing the sum by the number of branch nodes. The importance of each feature segmentation is as follows: where Ri is the MSE of each node, Nbranch is the total number of nodes.
2. MDA: MDA quantifies variable importance by measuring the change in prediction accuracy when the values of the variable are randomly permuted [74]. The importance of the feature r is then calculated using the following equation: where is the average differences of the features, is the standard deviation of the features.
3. Lasso: this method trains a linear regression model with Lasso regularization. For a given value of λ, a nonnegative parameter, Lasso solves the problem: where N is the number of observations, yi is the response at observation i, ri is the i-th feature, a vector of length p at observation i, λ is a nonnegative regularization parameter corresponding to one value of Lambda, and the parameters and are a scalar and a vector of length p, respectively. 4. GPR: This method is a feature selection method of GPR model [75]. It trains a GPR model and finds the predictor weights by taking the exponential of the negative i learned length scales. Then, we can normalize the weights and obtain the importance ranking.

Sequential Forward Selection (SFS)
SFS is a bottom-up search procedure [76], in which the features are added to an empty set in the specified order until the addition of further features does not decrease the criterion. The criterion used in this experiment was the root mean square error (RMSE), and Pearson correlation coefficient (R) serves as a reference. The order of forward input features was the importance ranking of each feature selection method.

The Random Forest (RF) Method
RF was developed by [77], which is a popular method in Applied Statistics field to solve classification and regression problems using multiple decision trees. One advantage of RF is that it has powerful generalization performance by using multiple regression trees, which is beneficial to reducing the variability of the model. Another advantage of RF is that it does not require complex parameter adjustments since it only has two parameters: the number of trees (ntree) and the number of features (mtry). RF selects features from the entire set using replacement sampling to establish the decision tree. To model the relationship between SM values and sparse network stations, a set of training inputoutput pairs should be given. The input variables are the optimal subset selected from the 29 features, and the output variables are the in-situ SM measurements. Once the SM retrieval model is trained, we then can estimate SM values by feeding the new samples into the model.

Evaluation Method
Four commonly used error metrics including RMSE, the mean bias (bias), R, and the unbiased root mean square error (ubRMSE) were used to evaluate the performance between the SM products and the in-situ SM measurements [78]. Those error metrics are defined as follows: where x is the SM product, y is the in-situ measurements, and indicate the standard deviation of x and y, respectively.

Experimental Settings
For validation set, there were a total of 1999 samples covering all 29 features. We split the validation set into training (60%) and test (40%). We further screened the training set after selecting the optimal feature subset, and there were 5225 training samples left. For the two parameters of RF, mtry and ntree in the RF model were experimentally set to 4 and 100, respectively, for feature selection. All experimental results were reported by averaging the outputs of 20 independent runs in terms of randomly initializing the training set. Note that our experiments were carried on a personal computer (Intel Core 2.40 GHz processor with 8 GB random access memory). The software implementation was performed using MATLAB (The MathWorks Inc., Natick, MA, USA).

Selection of Sensitive Features
Initially, 29 features from SMAP, AMSR2, SMOS, FY-3B, ERA-interim, MODIS, GlobeLand30, SRTM, HWSD and DOY were used. In this experiment, 10 filter and embedded type feature selection methods with SFS were considered. Different feature selection methods could obtain the importance weights for each feature, and the weights were normalized to 0-1 for a fair comparison. The corresponding feature importance of each feature selection method is shown in Figure 3. The abscissa is 28 features, except DOY, and the ordinate is the stacked importance. Figure 3 shows that the feature importance calculated by different feature selection methods are discrepant. Some features gained high importance in a variety of feature selection methods, such as latitude and MODIS_NDVI. These features may be more sensitive to SM. Some features are less important after integrating various feature selection methods, such as SMAP_Tb4. It may reduce the accuracy of SM retrieval and increase the uncertainty.
Generally, the curves obtained by embedded type methods are smoother than that of filter type methods, which is due to the fact that the filter type methods are independent of the adopted predictor. The overall trends of different feature selection methods are consistent, i.e., the accuracies gradually increase to the maximum value and then decrease, which verifies that too many features will lead to accuracy reduction in the SM retrieval models. Specifically, among the filter type methods, RReliefF achieves a minimum RMSE value of 0.0502 cm 3 /cm 3 , and LS achieves a maximum R value of 0.8431. Whereas, among the embedded type methods, all feature selection methods are better than RS, which shows the effectiveness of feature selection. MDA obtained a minimum RMSE value of 0.0499 cm 3 /cm 3 and a maximum R value of 0.8457 with the top 5 features. Compared with all 29 features as input, R value increased by 0.0527 and RMSE value decreased by 0.0085 cm 3 /cm 3 , indicating that feature selection is worthy of attention in SM retrieval problems.
As we can see, MDA is the best feature selection method since it achieves the smallest RMSE and highest R values. Therefore, we chose to use the optimal features selected by MDA. In order to observe the results more clearly and to find the optimal K value, we drew the step diagrams of RMSE and R values obtained by MDA in Figure 6. Our retrieval model achieved the best accuracy when K = 5. Therefore, the optimal feature subset was determined as DOY, latitude, MODIS_NDVI, longitude, and ERA_SR. The five selected features include geographic location (latitude, longitude), time (DOY), surface parameter (ERA_SR), and vegetation parameter (MODIS_NDVI). These factors are closely related to SM retrieval, which also shows that the feature selection method is scientific in this work.

Parameters Selection for Multi-MDA-RF
In order to analyze the impacts of parameters on SM retrieval accuracy, we tested two parameters, including mtry and ntree and determined their optimum values. In this experiment, we experientially set mtry to (1, 2, ..., 5), and set ntree to (100, 200, ..., 1000). The variations of R and RMSE values against mtry and ntree parameters are depicted in Figure  7 and Figure 8, respectively. It can be seen that the best results appear when mtry = 4 and ntree = 800. To sum up, the mtry parameter is set to 4 and the ntree parameter is set to 800 in the following experiments.

Generalization Performance Analysis
In general, the performance could be better when we use more training samples since the model will be well-trained with sufficient prior knowledge. In this experiment, we tested the generalization performance in our model under different numbers of training samples. To this end, the ratio of training samples was set to (0.1, 0.2, ..., 0.9). As shown in Figure 9, the accuracy increases as the ratio of training samples also increases, which confirms our assumption. In the following experiments, the ratio of training samples is set to 0.7, to obtain better results and retain sufficient validation data.

Evaluation of Different Retrieval Models
Based on the above experiments, the proposed Multi-MDA-RF model is determined. In order to give a comprehensive and reliable analysis of our model, BPNN and GRNN are chosen for comparison with the same training samples. The results of different models are summarized in Table 5. It can be observed that Multi-MDA-RF obtains much higher accuracy than BPNN and GRNN, with R (ubRMSE) values improved by 0.30 (0.039 cm 3 /cm 3 ) and 0.19 (0.029 cm 3 /cm 3 ) for BPNN and GRNN, respectively. For the operation time, our model is reasonable with a time cost of 37 s. However, GRNN takes the longest time among the three models. Although BPNN has a shorter calculation time, its accuracy is lower than the other models.
We then drew box plots in Figure 10 to analyze the stabilities of different models. More intuitively, it can be seen that the results of our model are more compact, indicating that it is more stable than the other two models. Among the three models, BPNN is the least accurate and the most unstable one. SM retrieval maps based on the three models are generated in Figure 11, which displays the mean SM maps during August 2015 for different models and the in-situ measurements. From a spatial perspective, all three models show low SM values in the west, with an increase toward the east, which agrees with climate of CONUS. However, both BPNN and GRNN underestimate SM values, especially in the center of CONUS. The result of our model illustrated in Figure 11c is wetter than BPNN and GRNN models, which is well matched with the spatial pattern of the insitu measurements.

Evaluation of Different SM Products
In this experiment, Multi-MDA-RF is compared with other five SM products including SMAP, AMSR2, SMOS, FY-3B, and ERA-Interim. We used 1577 points of spatiotemporal matching to draw scatter plots as shown in Figure 12. According to the scatter plots, Multi-MDA-RF matches the 1:1 line better than the other products, indicating that our product has the highest accuracy. Most SM values of SMAP, SMOS, and FY-3B agree with the in-situ measurements, but there are still some deviation values that make the accuracy relatively low. AMSR2 is less accurate with the R value of 0.29, and its SM values deviate from the 1:1 line. Most SM values of ERA-Interim are concentrated between 0.1 and 0.3 cm 3 /cm 3 , which are inconsistent with the in-situ measurements. Note that the satellite data used in our model are not screened, and we did not conduct quality control for all of the considered products for a fair comparison, which results in the deviations of the other products. According to the mean SM maps (Figure 13), our product can well capture the spatial dynamics and outperform the other products. The results of SMAP and SMOS are not "good", and their SM estimates involve too many black pixels near the continental margins and water systems, where the water bodies or wet soil may lead to the poor performance. AMSR2 seems to be the worst with a lot of black pixels and it overestimates the SM values, especially in densely vegetated areas. This may be because C-band satellites do not have capacity to penetrate vegetation as well as L-band satellites [79]. Note that, those black pixels in Figure 13b-d represent that the corresponding SM values are greater than 0.5 cm 3 /cm 3 , which are considered as outliers [80]. Therefore, SMAP, SMOS, and AMSR2 need conducting carefully quality control to avoid errors in practical applications. FY-3B is also very poor, and there are a lot of gaps in the map. The gaps in satellite-based SM products are intrinsic due to satellite orbits and retrieval algorithms [9]. The data gaps in CONUS accounts for about 25%, which makes the application value of FY-3B extremely limited. In addition, microwave remote sensing products are easily affected by Radio-Frequency Interference (RFI), which may be another reason for the low accuracies of these products. ERA-Interim overestimates the SM values in the east and underestimates the SM values in the west of CONUS. Its estimations are too smooth to reflect variations since the distribution of SM values cannot be so uniform in practice. This may be because ERAinterim is a reanalysis product from the data assimilation system, the product itself has certain deviations.
In this context, our product is more consistent with the in-situ measurements, which means it outperforms the other compared products, including official satellites and LSM. In addition, the obtained scatter plot and the map of our product are fairly good, demonstrating it is able to accurately describe the relationship between the selected features and in-situ measurements. Therefore, Multi-MDA-RF showed the best predictive ability and can well capture spatial variations in SM even without quality control. To further evaluate the performance of the proposed method, we selected two typical products, including SMAP and ERA-Interim using 410 spatially isolated sites. Figure 14 shows the monthly mean difference between Multi-MDA-RF, SMAP, and ERA-Interim with in-situ measurements. It can be seen that our product has the smallest difference, i.e., most of the points are closer to the 0 line, indicating the good accuracy and potential of Multi-MDA-RF in SM retrieval.

Evaluation of Different SM Networks
In order to evaluate the predictive power of the Multi-MDA-RF model over each network, Table 6 and Figure 15 show the specific evaluations at different networks. It can be observed that our model performs slightly better at SCAN and USCRN than the other five networks. This is because the two networks have sufficient, widespread and uniform insitu measurements, which are ideal SM observation networks. Table 6 shows that the R values of PBO_H2O and SOILSCAPE are lower than the others. This may be due to the fact that most of the sites of these two networks are located in rugged areas of the west CONUS. In addition, PBO_H2O only records SM data at 12 p.m. once a day. However, the satellite data used in the experiment are close to 6 a.m., which may be another reason for the low correlation of this network. Box plots in Figure 15 show that the results of SCAN, SNOTEL and USCRN are compact and stable. However, the results of COSMOS, iRON, and SOILSCAPE are unstable. This is mainly due to the fact that only a few sites of these three networks participate in model training, resulting in unstable prediction results.
Radar plot is usually used for comprehensive analysis of multiple indicators, which has the advantages of integrity, clarity and intuition. Figure 16 shows the performance of different models at seven networks. The results show that the performance of Multi-MDA-RF is better than the other models, which means that our model is more adaptable at different networks. In addition, BPNN and GRNN have low accuracy in predicting SM values at iRON, which may be because the two models are not dominant when there are few training samples. Figure 17 shows the performance of different SM products at seven networks. Multi-MDA-RF still performs best at all networks. The other products show low R values at COSMOS except for our product, which further indicates that Multi-MDA-RF has a wider range of applications. The RMSE values of AMSR2 are much higher than the other products at all networks, because there are many abnormal values in this product.

Evaluation of Different U.S. States
To evaluate the spatial variations between different states, we compared the SM values from three U.S. states with similar latitudes, i.e., Utah in the west, Kansas in the central, and Virginia in the east of CONUS. As shown in Figure 18, the topographies of the three areas are different. Utah is mainly covered with forest and grassland with high altitude. Kansas is mainly cultivated land and grassland, and the terrain is gentle. Virginia is mainly covered with forests, hills and low mountains. Table 7 shows the evaluation results of three selected U.S. states. The R value of Utah is the highest, which may be due to the richer in-situ measurements in the western region. Kansas has the lowest RMSE value, this is because the terrain of the central region is flat, which is conducive to the construction of SM retrieval model. The performance of Virginia is not very good, because the eastern region has less in-situ measurements and the dense vegetation cover is unfavorable for SM retrieval.

Producing High Resolution SM Map
Although our product has potential for SM retrieval, a SM product with relatively low spatial resolution is inadequate to be applied to practical use in some fields. Therefore, in this experiment, we used higher spatial resolution features as the input data to retrain the proposed Multi-MDA-RF model, and obtain a higher spatial resolution SM product.
In previous experiments, we resampled all of the features to the same spatial resolution of 36 km. After feature selection, the selected five features could be resampled according to the lowest resolution among the features, i.e., 0.125° of ERA-Interim product. The evaluation results of the retraining model are measured as R = 0.94, bias = 0.000 cm 3 /cm 3 , RMSE = 0.033 cm 3 /cm 3 , ubRMSE = 0.033 cm 3 /cm 3 , which means that the SM map with high spatial resolution still maintain high accuracy. Figure 19 shows a comparison of the lower and the higher resolution SM maps. It can be found that the SM map with higher spatial resolution can well represent the global and local variations with more clear details.

Discussion
The comparison of SM products shows that satellite products and LSM products have great uncertainty. Existing studies used these products as reference to train SM retrieval models, which will inevitably bring great uncertainty to the results [5,38,81]. The proposed model has greatly reduced the uncertainty from several aspects. Firstly, we used the in-situ measurements as the reference, which was beneficial to yielding more accurate results. Secondly, we conducted sensitive analysis of 29 features generated from multisource data. Thirdly, we equipped the RF model with feature selection for SM retrieval, which had higher generalization performance with limited training samples. Previous comprehensive experiments have validated the ability of our model in terms of weakening uncertainty. For example, our model is more stable compared to BPNN and GRNN according to the box plots. Our product is closer to the in-situ measurements compared to other products according to the scatter plots, SM maps, and difference diagram. Our method equally performs best at different networks according to the radar plots.
To demonstrate the advantages of Multi-MDA-RF in terms of generalization performance, we compared several other studies in the same study area in Table 8. Firstly, it can be found that our method uses the most abundant features from multi-source products. However, the input data of other studies are all from a single source except auxiliary products. Most of the experiments have not conducted feature selection, while our experiment optimizes sensitive features. Although Senyurek et al. [45] also selected features, they considered fewer features, which may be not comprehensive enough. Secondly, we used fewer training samples (5225), which shows that our product also can be used in the areas with insufficient in-situ measurements. Thirdly, we achieve higher accuracies with the R value of 0.93 and the ubRMSE value of 0.032 cm 3 /cm 3 , indicating that our product has more potential to predict SM values. Fourthly, we produced two SM maps with different spatial resolutions and provide the highest spatial resolution in the analogous studies. The 0.125° map also has high accuracy and more clear details.

Conclusions
In this paper, we proposed a novel Multi-MDA-RF method for the estimation of highquality SM values in CONUS during the August 2015. The core content of this method is to select sensitive features from 29 multi-source SM variables before training the SM retrieval model. Ten various feature selection methods were compared, and MDA shows the highest accuracy when K = 5. Then, RF was employed to establish a nonlinear relationship between the optimal feature subset and the in-situ measurements with fewer training samples (i.e., a total of 5225). Compared with BPNN and GRNN, our model is more stable and achieves higher accuracy with the R value of 0.93 and ubRMSE value of 0.032 cm 3 /cm 3 . Compared with other SM products, our product is more consistent with the in-situ meas-urements, and it shows the best predictive ability and can well capture SM spatial dynamics. The evaluation of different SM networks indicates that SCAN and USCRN are ideal SM observation networks, and our method is more adaptable at different networks compared with other models and products. The evaluation of different U.S. states shows that the flat area with rich in-situ measurements is more suitable for the implementation of SM retrieval work. We also produced a SM product with higher spatial resolution of 0.125°, which can well represent the global and local variations with more clear details.
To conclude, the Multi-MDA-RF method used in this study shows great potential in estimating reliable regional SM values. The ideology of our work may be extended to SM retrievals from other meaningful SM features and applied in other geographical regions in the world. Future work will focus on more comprehensive SM feature selection, quality control of the in-situ data, and more intensive monthly data. We can also consider the incorporation of some state-of-the-art deep learning techniques to replace RF.