Using FengYun-3C VSM Data and Multivariate Models to Estimate Land Surface Soil Moisture

: Land surface soil moisture (SM) monitoring is crucial for global water cycle and agricultural dryness research. The FengYun-3C Microwave Radiation Imager (FY-3C / MWRI) collects various Earth geophysical parameters, and the FY-3C / MWRI SM product (FY-3C VSM) has been widely applied to determine regional-scale surface SM contents. The FY-3C VSM retrieval accuracy in di ﬀ erent seasons was evaluated by calculating the root mean square error (RMSE), unbiased RMSE (ubRMSE), mean absolute error (MAE), and correlation coe ﬃ cient (R) values between the retrieved and measured SM. A lower accuracy in July (RMSE = 0.164 cm 3 / cm 3 , ubRMSE = 0.130 cm 3 / cm 3 , and MAE = 0.120 cm 3 / cm 3 ) than in the other months was found due to the impacts of vegetation and climate variations. To show a detailed relationship between SM and multiple factors, including vegetation coverage, location, and elevation, quantile regression (QR) models were used to calculate the correlations at di ﬀ erent quantiles. Except for the elevation at the 0.9 quantile, the QR models of the measured SM with the FY-3C VSM, MODIS NDVI, latitude, and longitude at each quantile all passed the signiﬁcance test at the 0.005 level. Thus, the MODIS NDVI, latitude, and longitude were selected for error correction during the surface SM retrieval process using FY-3C VSM. Multivariate linear regression (MLR) and multivariate back-propagation neural network (MBPNN) models with di ﬀ erent numbers of input variables were built to improve the SM monitoring results. The MBPNN model with three inputs (MBPNN-3) achieved the highest R (0.871) and lowest RMSE (0.034 cm 3 / cm 3 ), MAE (0.026 cm 3 / cm 3 ), and mean relative error (MRE) (20.7%) values, which were better than those of the MLR models with one, two, or three independent variables (MLR-1, -2, -3) and those of the MBPNN models with one or two inputs (MBPNN-1, -2). Then, the MBPNN-3 model was applied to generate the regional SM in the United States from January 2019 to October 2019. The estimated SM images were more consistent with the measured SM than the FY-3C VSM. This work indicated that combining FY-3C VSM data with the MBPNN-3 model could provide precise and reliable SM monitoring results. variation patterns of the SM estimations using the MBPNN-3 model were more consistent with the in situ SM measurements than those using the FY-3C VSM products. The over- and underestimation issues were reduced in the estimated SM images throughout the study area, especially the overestimation problem in the western and northeastern parts, indicating that the MBPNN-3 model with the FY-3C VSM and supplementary information (vegetation and location) as inputs was applicable for providing reliable SM estimations. Dry areas with SM values lower than 0.20 cm 3 /cm 3 were found in the southwestern part, and the proportion of dry areas gradually increased from January to July and slightly decreased from August to October. The wet areas with SM values greater than 0.30 cm 3 /cm 3 were mainly located in the mideastern region, and the proportions of wet area during January to July were slightly greater than those from July to October.


Introduction
Soil moisture (SM) is a key component in the global water cycle, heat transfer, and energy exchange processes [1]. Traditional site-based methods for the acquisition of information on surface SM and other physical properties are time-consuming, expensive, and challenging to achieve at the national or global scale [2,3]. In recent decades, satellite remote sensing technology has greatly developed, and a variety of remotely sensed data have been extensively employed to retrieve and map soil water content changes over large scales and regular temporal intervals [4,5]. In recent years, microwave remote sensing has demonstrated the potential to effectively measure spatial and temporal SM variations over large areas due to its ability to penetrate clouds, rain and the atmosphere and its sensitivity to the dielectric constant [6,7]. Thus, microwave-based remote sensing has generally been considered the most promising approach for monitoring land surface SM [8].
There are two forms of microwave remote sensing, active and passive, depending on the operation mode and sensor type [6]. Numerous studies have indicated the direct and strong correlation between the surface SM content and signals collected by active and passive microwave sensors [9][10][11][12]. In general, passive microwave remote sensing observes the Earth's surface at a shorter revisit interval than active microwave remote sensing. Additionally, passive sensors have a wider variety of mature algorithms for the retrieval of quantitative SM data, which makes it feasible to measure SM dynamics over large areas [3,13,14]. Currently, a large number of microwave-based satellite missions, such as the L-band passive microwave instruments aboard the Soil Moisture Active Passive (SMAP) satellite launched by the National Aeronautics and Space Administration (NASA) in 2015, the SM and Ocean Salinity (SMOS) satellite launched by the European Space Agency (ESA) in 2009, the Advanced Microwave Scanning Radiometer for Earth observation science (AMSR-E) and the Advanced Microwave Scanning Radiometer-2 (AMSR-2) onboard the EOS-Aqua satellite, and China's Microwave Radiometer Imager (MWRI) onboard the FengYun-3 (FY-3) series satellites, have been frequently employed for global SM monitoring [8,[15][16][17][18].
Although passive microwave remote sensing is an effective approach for providing regional SM estimates under all weather conditions, its monitoring accuracy is likely to be affected by land surface features, such as vegetation coverage and roughness. Many researchers have proposed solutions to correct for the effects of vegetation and roughness on surface emissivity by assimilating multisource data [19], developing a microwave-based vegetation index [7], applying an iterative algorithm [20], and enhancing the radiation transfer equation [14]. Choi and Hur [20] downscaled the AMSR-E SM products from a spatial resolution of 25 km to 1 km, and then combined the AMSR-E and Moderate-resolution Imaging Spectroradiometer (MODIS) data to monitor SM in Korea from May 1 to September 30, 2007. The disaggregated SM products achieved a better statistical error between the ground-measured and AMSR-E SM values, and the integration of AMSR-E and MODIS products appeared to be an effective method for improving microwave SM products, with the minimum root mean square error (RMSE) values decreasing from 0.131 to 0.059, and the maximum RMSE values decreasing from 0.179 to 0.171. Liu et al. [19] produced a merged SM product by combining AMSR-E and Advanced Scatterometer (ASCAT) data, and the results showed that the proposed methodology had the potential to rescale and improve the accuracy of SM retrievals. Shi et al. [14] proposed a dual-polarization inversion technique to minimize the effects of surface roughness, and the developed technique has been proven to be useful for achieving more accurate global SM estimations using the available satellite instruments. Wang et al. [7] developed a microwave vegetation water index (MVWI) by combining the vegetation water content (VWC) index and a vegetation structure parameter, making it possible to evaluate the VWC along with the SM.
A number of researchers have evaluated the relationships among multisource data to enhance SM products [21,22]. De Jeu et al. [21] researched the distribution of correlations between the SMOS and AMSR-E products using the classic linear regression model, and high correlations were found between the two products in moderately vegetated areas, while the correlations were low in densely vegetated regions. Ma and Liu [22] used a back-propagation (BP) neural network model to analyze the potential of using optical vegetation information and a combination of polarizations (HH and HV) for SM inversion. The results indicated that HV was more strongly related to SM than HH, and the addition of optical vegetation information to microwave remote sensing data for SM retrieval increased the average R value by more than 0.05.
Among these soil moisture products, the FY-3/MWRI SM (FY-3 VSM) is an official SM product released by the National Satellite Meteorological Center (NSMC) of China and has been widely used to record the soil moisture content in the top layer with a spatial resolution of 25 km [18,23]. The FY-3 VSM products are retrieved from observations via the MWRI onboard FY-3 series satellites, including FY-3A, FY-3B, FY-3C, and FY-3D. Generally, the FY-3C VSM product is considered to have a better quality than FY-3A and FY-3B, and longer available time series data than FY-3D. Zhu et al. [23] assessed the performance of FY-3C VSM over cropland in Henan Province, China, during 2016, and the results indicated that the SM variation patterns of the FY-3C VSM retrievals were in good agreement with the measured SM. However, the FY-3C VSM tended to overestimate or underestimate the SM during the year. To further improve the performance of the FY-3C VSM, various algorithms and models should be established and applied to enhance the FY-3C VSM product and achieve more accurate SM estimations.
This paper aimed to improve the accuracy of surface SM (0-5 cm) estimates by combining multisource data, including satellite data, optical vegetation information, elevation, and location data. For this purpose, the FY-3C VSM product and MODIS normalized difference vegetation index (NDVI) time series for the months from January 2019 to October 2019 were reprojected and resampled. The international soil moisture network (ISMN) measurement database was employed as the reference SM data. Then, a quantile regression (QR) model was used to analyze the detailed correlation between the surface SM and multiple factors. Multivariate models, including multivariate linear regression (MLR), and multivariate BP neural network (MBPNN), were developed to correct for the effects of vegetation coverage and enhance the monitoring of SM over large regions. Additionally, the SM images retrieved from FY-3C VSM, which were estimated by the best model and interpolated from the ISMN, were compared, and statistical parameters were calculated to further validate the feasibility of using the proposed method to enhance satellite SM products.

Multisource Data
2.1.1. Data from the ISMN SM measurements are a valuable resource for assuring the consistent and precise quality of various satellite-derived SM products [24]. The ISMN is an international cooperation that has provided and maintained globally available in situ SM measurements at various depths since 1952 [25]. Currently, the ISMN contains information from more than 60 networks and over 2500 stations around North America, Europe, Asia, Australia, Africa, and South America [26].
Given the widespread distribution and large number of ground-based SM observations in the U.S. (Figure 1), this region (67 • 03 W-124 • 76 W and 25 • 24 N-49 • 39 N) was selected as the study area in this paper. Most parts of the study area have continental and subtropical climates, with average temperatures in January and July ranging from −3 • C and 24 • C in the northern part to 11 • C and 28 • C in the southern part, respectively. The average annual rainfall in the area decreases from 1500 mm over the central plain and Pacific coasts to 500 mm over the western plateau, where the highest elevation is 4187 m.
Approximately 700 SM sites operated by three SM networks, i.e., the U.S. Climate Reference Network (USCRN), Snow Telemetry (SNOTEL), and Soil-Climate Analysis Network (SCAN), are located in the study area ( Figure 1). The hourly data sets at 0.05-m depth from January 1 at 00:00 to October 31 at 23:00 in 2019 were collected and generated as the actual SM in this study. The sampling interval of the original data was one hour, and the time interval of the measurements was converted into one month using the averaging method. Then, the empirical Bayes kriging (EBK) method and ArcMap software were employed for spatial interpolation and to achieve the monthly SM over the study area ( Figure 2). Here, the output size of the cell was set as 0.27756061° to assure the same spatial resolution as the microwave remote sensing data. The mean prediction error and RMSE of the spatial interpolation using EBK were 0.0025 cm 3 /cm 3 and 0.096 cm 3 /cm 3 , respectively, which indicated high accuracy between the ground-based and interpolated SM in the study area. According to Figure 2, the SM content in the midwestern part was significantly lower than that in the eastern and northwestern parts of this region, which is basically consistent with the patterns of rainfall variations and climate types.

FY-3C VSM
FY-3 series satellites, including FY-3A, FY-3B, FY-3C, and FY-3D, are China's second generation of polar-orbiting meteorological satellites and are operated by the China Meteorological Administration (CMA). These satellites are equipped with various instruments in the optical, infrared, and microwave bands and have become valuable data sources for global climate predictions, weather forecasting and surface SM measurements over the past decade [27,28]. The MWRI onboard FY-3 includes vertical and horizontal polarization modes (V/H) in the 10.65-GHz, 18.7-GHz, 23.8-GHz, 36.5-GHz, and 89.0-GHz frequency bands [29]. The characteristics, such as the bandwidth, sensitivity, calibration accuracy, scanning mode, and instantaneous field of view (IFOV), are shown in Table 1 [23]. Among all of the frequency bands, the channels in the 10.65 GHz band are capable of penetrating clouds and rain and are mainly used to obtain global surface sea temperature (SST), wind speed, SM content, and other geophysical parameters under all weather conditions.  Then, the empirical Bayes kriging (EBK) method and ArcMap software were employed for spatial interpolation and to achieve the monthly SM over the study area ( Figure 2). Here, the output size of the cell was set as 0.27756061 • to assure the same spatial resolution as the microwave remote sensing data. The mean prediction error and RMSE of the spatial interpolation using EBK were 0.0025 cm 3 /cm 3 and 0.096 cm 3 /cm 3 , respectively, which indicated high accuracy between the ground-based and interpolated SM in the study area. According to Figure 2, the SM content in the midwestern part was significantly lower than that in the eastern and northwestern parts of this region, which is basically consistent with the patterns of rainfall variations and climate types. Then, the empirical Bayes kriging (EBK) method and ArcMap software were employed for spatial interpolation and to achieve the monthly SM over the study area ( Figure 2). Here, the output size of the cell was set as 0.27756061° to assure the same spatial resolution as the microwave remote sensing data. The mean prediction error and RMSE of the spatial interpolation using EBK were 0.0025 cm 3 /cm 3 and 0.096 cm 3 /cm 3 , respectively, which indicated high accuracy between the ground-based and interpolated SM in the study area. According to Figure 2, the SM content in the midwestern part was significantly lower than that in the eastern and northwestern parts of this region, which is basically consistent with the patterns of rainfall variations and climate types. . These satellites are equipped with various instruments in the optical, infrared, and microwave bands and have become valuable data sources for global climate predictions, weather forecasting and surface SM measurements over the past decade [27,28]. The MWRI onboard FY-3 includes vertical and horizontal polarization modes (V/H) in the 10.65-GHz, 18.7-GHz, 23.8-GHz, 36.5-GHz, and 89.0-GHz frequency bands [29]. The characteristics, such as the bandwidth, sensitivity, calibration accuracy, scanning mode, and instantaneous field of view (IFOV), are shown in Table 1 [23]. Among all of the frequency bands, the channels in the 10.65 GHz band are capable of penetrating clouds and rain and are mainly used to obtain global surface sea temperature (SST), wind speed, SM content, and other geophysical parameters under all weather conditions.  . These satellites are equipped with various instruments in the optical, infrared, and microwave bands and have become valuable data sources for global climate predictions, weather forecasting and surface SM measurements over the past decade [27,28]. The MWRI onboard FY-3 includes vertical and horizontal polarization modes (V/H) in the 10.65-GHz, 18.7-GHz, 23.8-GHz, 36.5-GHz, and 89.0-GHz frequency bands [29]. The characteristics, such as the bandwidth, sensitivity, calibration accuracy, scanning mode, and instantaneous field of view (IFOV), are shown in Table 1 [23]. Among all of the frequency bands, the channels in the 10.65 GHz band are capable of penetrating clouds and rain and are mainly used to obtain global surface sea temperature (SST), wind speed, SM content, and other geophysical parameters under all weather conditions. FY-3C was launched in September 2013 and has been recognized as the first official service satellite and a suitable replacement for FY-3A as the new forenoon satellite [30]. FY-3C VSM products are retrieved from the L1 brightness temperature (BT) data collected by MWRI using a new inversion technique, i.e., the Q p model, which was built using advanced integral equation (AIE) model simulations of microwave emissions [14]. Compared with the single channel SM retrieval algorithm, both the vertical and horizontal polarizations of the X-band (10.65 GHz) were applied in the Q p model to decrease the effects of roughness and vegetation coverage [23]. Also, the pixels covered with snow, frozen soil, and coastal waters were removed from the FY-3C VSM retrievals. These products are classified into ascending (A) and descending (D) products at daily, 10-day, and monthly time intervals, and have been widely used to detect the global land surface soil volumetric water content (cm 3 /cm 3 ) with a spatial resolution of 25 km. In this study, the monthly FY-3C VSM products (FY3C_MWRIX_GBAL_L3_VSM.HDF) data from January to October 2019 were downloaded from the website of the NSMC (satellite.nsmc.org.cn/) and reprojected from the equal-area scalable earth (EASE) grid 2.0 to the WGS-84 grid.

MODIS NDVI
The NDVI products derived from MODIS are the most mature and widely used source for monitoring vegetation status and estimating soil water contents at the regional scale as supplemental data in many previous studies [20,23]. Here, 14 tiles (h08v04, h08v05, h08v06, h09v04, h09v05, h09v06, h10v04, h10v05, h10v06, h11v04, h11v05, h12v04, h12v05, and h13v04) of monthly MODIS NDVI products (MOD13A3, v006) from the Terra satellite from January 2019 to October 2019 were downloaded from the NASA website (https://ladsweb.modaps.eosdis.nasa.gov/). The MODIS NDVI products were reprojected from sinusoidal to WGS-84, and their spatial resolutions were resampled from 250 m to 25 km to be consistent with the FY-3C VSM.
Apart from the FY-3C VSM and MODIS NDVI, the elevation was derived from the GMTED2010 elevation dataset and resampled to 25 km as well. Then, the values of ISMN, FY-3C VSM, MODIS NDVI, elevation, latitude, and longitude of the study area were extracted pixel by pixel for multivariate correlation analysis and comprehensive monitoring of SM.

Quantile Regression Model
A key objective of this paper was to indicate the detailed relationship between SM measurements and remotely sensed surface features and other potential impact factors, such as vegetation and Remote Sens. 2020, 12, 1038 6 of 20 location information. In previous studies, regression models were extensively applied to quantitatively analyze the relationships among various factors. However, most regression models were conducted based on the correlation between dependent and independent variables at the mean level. Therefore, the information reflected by those models is limited [31]. In addition, the distribution characteristics of random errors are strictly required when ordinary least squares (OLS) method is used for regression analysis. Thus, the linear assumption of least squares may be too harsh when applying the OLS to some practical problems [32].
The quantile is the proportion of dependent variables which are lower than a certain element in the sorted array. For example, the quantile 0.1 represents for the first 10 elements in the array with 100 elements arranged from the lowest to the highest, which makes it possible for quantiles to represent any location in the distribution. QR theory was developed by Koenker and Bassett [33] as a tool for overcoming the limitations of classic OLS and comprehensively reflecting the conditional distribution characteristics under different quantiles. The QR model was built based on the conditional quantile of dependent variables, which makes it capable of describing the impacts of independent variables on the shapes and variation ranges of the conditional distributions of dependent variables. A number of researchers have applied QR in many fields, including economics, medicine, education, and social and environmental sciences, to achieve steady and robust regression results [34,35].
Compared with the OLS-based model, the QR model provides an in-depth view of the relationship between independent and dependent variables [34]. Let the distribution function of the random variable Y be F(y) = P(Y ≤ y). The quantile function of Y at the τ (0< τ <1) quantile can be defined as The loss function, which is a piecewise function (Figure 3), is defined as where I(·) is the indicative function.
Remote Sens. 2020, 12, x FOR PEER REVIEW 6 of 19 quantitatively analyze the relationships among various factors. However, most regression models were conducted based on the correlation between dependent and independent variables at the mean level. Therefore, the information reflected by those models is limited [31]. In addition, the distribution characteristics of random errors are strictly required when ordinary least squares (OLS) method is used for regression analysis. Thus, the linear assumption of least squares may be too harsh when applying the OLS to some practical problems [32]. The quantile is the proportion of dependent variables which are lower than a certain element in the sorted array. For example, the quantile 0.1 represents for the first 10 elements in the array with 100 elements arranged from the lowest to the highest, which makes it possible for quantiles to represent any location in the distribution. QR theory was developed by Koenker and Bassett [33] as a tool for overcoming the limitations of classic OLS and comprehensively reflecting the conditional distribution characteristics under different quantiles. The QR model was built based on the conditional quantile of dependent variables, which makes it capable of describing the impacts of independent variables on the shapes and variation ranges of the conditional distributions of dependent variables. A number of researchers have applied QR in many fields, including economics, medicine, education, and social and environmental sciences, to achieve steady and robust regression results [34,35].
Compared with the OLS-based model, the QR model provides an in-depth view of the relationship between independent and dependent variables [34]. Let the distribution function of the random variable Y be F(y) = P(Y ≤ y). The quantile function of Y at the τ (0< τ <1) quantile can be defined as The loss function, which is a piecewise function (Figure 3), is defined as where I(·) is the indicative function. Let z equal ŷ y − , and ρτ(z) can be represented by Then, the τth QR of y (ŷ ) can be estimated by minimizing the expected loss: Assume that the linear regression equation is described as where Y is the dependent variable, X' is the transpose of independent X, β is the regression coefficient, Let z equal y −ŷ, and ρ τ (z) can be represented by Then, the τth QR of y (ŷ) can be estimated by minimizing the expected loss: Assume that the linear regression equation is described as where Y is the dependent variable, X' is the transpose of independent X, β is the regression coefficient, and ε is the error term. The regression coefficient β in a conditional QR is estimated by minimizing the sum of absolute deviation [32]: Currently, the simplex algorithm, interior point algorithm, and smoothing algorithm are commonly used to estimate the regression efficiency β(τ). Among those algorithms, the smoothing algorithm is suitable for processing datasets with a large number of observations and variables and was thus applied in this study. Then, the τth (0 < τ < 1) quantile function of y can be represented by where β(τ) is the solution to the minimizing problem in Equation (6). Thus, many regression lines at different quantiles can be obtained to examine the behavior of the response variable beyond the average of the Gaussian distribution [36].

Back-Propagation Neural Network Model
Neural networks (NNs) are flexible and intelligent computing frameworks for solving large-scale and nonlinear problems over a broad range of applications, such as time series data forecasting and soil property estimation [3,22,37,38]. The back-propagation neural network (BPNN), developed by Rumelhart et al. [39] in 1985, has been recognized as the most mature and widespread method in the field of NNs and usually demonstrates superior simulation capability [5,40]. The learning process of the BPNN consists of forward propagation of the data stream in the "input layer→hidden layers→output layer" direction and back propagation of the error signal in the "output layer→hidden layers→input layer" direction, as shown in Figure 4. and ε is the error term. The regression coefficient β in a conditional QR is estimated by minimizing the sum of absolute deviation [32]: Currently, the simplex algorithm, interior point algorithm, and smoothing algorithm are commonly used to estimate the regression efficiency β(τ). Among those algorithms, the smoothing algorithm is suitable for processing datasets with a large number of observations and variables and was thus applied in this study. Then, the τth (0 < τ < 1) quantile function of y can be represented by where β(τ) is the solution to the minimizing problem in Equation (6). Thus, many regression lines at different quantiles can be obtained to examine the behavior of the response variable beyond the average of the Gaussian distribution [36].

Back-propagation Neural Network Model
Neural networks (NNs) are flexible and intelligent computing frameworks for solving largescale and nonlinear problems over a broad range of applications, such as time series data forecasting and soil property estimation [3,22,37,38]. The back-propagation neural network (BPNN), developed by Rumelhart et al. [39] in 1985, has been recognized as the most mature and widespread method in the field of NNs and usually demonstrates superior simulation capability [5,40]. The learning process of the BPNN consists of forward propagation of the data stream in the "input layer→hidden layers→output layer" direction and back propagation of the error signal in the "output layer→hidden layers→input layer" direction, as shown in Figure 4. Suppose a three-layer BPNN has n inputs, m neurons in the hidden layer, and one output ( Figure  4). The weights and thresholds from the input to hidden layer are wjk and θk (j = 1, 2, …, n; k = 1, 2, …, m), and the weights and thresholds from the hidden to output layer are wk (k = 1, 2, …, m) and θ, respectively. Then, the kth unit of the hidden layer (yk) and the output of the output layer (b) can be given as Equations (8) and (9), respectively.
where f represents the sigmoid function, which is described as Suppose a three-layer BPNN has n inputs, m neurons in the hidden layer, and one output (Figure 4). The weights and thresholds from the input to hidden layer are w jk and θ k (j = 1, 2, . . . , n; k = 1, 2, . . . , m), and the weights and thresholds from the hidden to output layer are w k (k = 1, 2, . . . , m) and θ, respectively. Then, the kth unit of the hidden layer (y k ) and the output of the output layer (b) can be given as Equations (8) and (9), respectively.
where f represents the sigmoid function, which is described as The mean square error (e) between the actual output (b) and target vector (T) is calculated as follows: where N represents the number of samples in the target vector.
The weights and thresholds are continuously adjusted to meet the error limit during the BP process of the error signal. In general, the gradient descent with momentum (GDM) method is helpful to speed up the learning process and prevent it from being trapped in poor local minima. Thus, this method has been widely applied as the training method in BPNNs [38,39]. The training efficiency and accuracy are highly relevant to the initial setting of parameters, such as the learning rate (η), iterations (t), and momentum term (α), when the GDM method is used in the training process of BPNN [5]. The regulating rules of weights and thresholds are shown in Equations (12)- (15). After the adjustment of weights and thresholds, the well-trained BPNN is used to map SM based on the inputs provided.
In this study, three multivariate BPNN (MBPNN) models with one, two, and three input variables were established using the MATLAB Neural Network Toolbox for estimating SM in the U.S. For this purpose, the FY-3C VSM and MODIS NDVI, together with the location information, were generated and extracted as the input variables of the MBPNN models. The measured SM from the ISMN data was set as the target vector in these models. The "newff" function was used to design a BPNN, the "sim" function was used to calculate the output of the network, and the "mse" function was applied to calculate the performance errors. Additionally, the GDM method was employed to train the samples, and the learning rate, momentum coefficient, iteration and error limit were set as 0.05, 0.95, 10,000, and 0.00001, respectively, based on the comparison experiments. Of the samples, 80% and 20% were set as the training and testing sets, respectively, and the model yielded the best training and testing accuracies to simulate the SM values of each pixel.

Linear Regression Model
The linear regression model has been extensively used for interpretation of the change in the dependent variable using one or more independent variables in various fields, and it can be defined as where x 1 , x 2 , . . . , x n are independent variables, Y is the dependent variable, β 0 is a constant, β i (i = 1, 2, . . . , n) are coefficients of x i (i = 1, 2, . . . , n), and ε is the residual error. In this study, three linear regression models (MLR-1, MLR-2, and MLR-3) with 1~3 influence factors as independent variables and ground-based SM from ISMN as the dependent variable were established for SM estimation. The types of independent variables in each MLR model and inputs in each MBPNN model are listed in Table 2.

Statistical Indicators
To evaluate the accuracies of FY-3C VSM products and the monitored SM using multivariate models, five statistical indicators, including the correlation coefficient (R), RMSE, unbiased RMSE (ubRMSE), mean absolute error (MAE), and mean relative error (MRE), were adopted in this study.
The R and MRE represent the relative accuracy between the FY-3C VSM retrievals or the estimated SM and the SM measurements, respectively. To evaluate the FY-3C VSM accuracy, R and MRE can be defined by Equations (17) and (18).
where VSM i represents the FY-3C VSM retrievals (cm 3 /cm 3 ), SM i represents the SM measurements (cm 3 /cm 3 ), VSM represents the average of FY-3C VSM retrievals (cm 3 /cm 3 ) across all pixels in the study area, SM indicates the average of SM measurements (cm 3 /cm 3 ), and N represents the total number of valid samples and varies for each month. The RMSE, ubRMSE, and MAE evaluate the absolute difference of FY-3C VSM product or SM estimations from the measured SM via the ISMN, which can be calculated by the following equations.
where MD is the abbreviation of the mean deviation, which can be calculated as

Comparison between FY-3C VSM and Measured SM
To assess the consistency between the FY-3C VSM and the measured SM in different seasons, pixel values were extracted from FY-3C VSM images and the measured SM from the ISMN in January, April, July, and October 2019. Considering the periodic change of SM curves between different rows, the pixels in one cycle were representative and should be selected to describe the over-and underestimation issues of FY-3C VSM directly. Here, 150 pixels were selected randomly from the central rows with the latitude ranging from 32.19 • N to 32.74 • N in each month for comparison between the FY-3C VSM product and measured SM. The variation curves of FY-3C VSM (red line) and measured SM (green line) are displayed in Figure 5. Generally, the varying patterns of the FY-3C retrievals were basically consistent with the in situ SM values in most samples during the four months, especially in January (Figure 5a), whose pixel values from both sources were distributed between 0.10 cm 3 /cm 3 and 0.45 cm 3 /cm 3 . In February 2019 (Figure 5b), the FY-3C VSM sample values from 10 to 50 were basically lower than the SM measurements, and the sample values from 70 to 130 were mainly higher than the measured SM values. It was obvious that the FY-3C satellite retrievals in July 2019 (Figure 5c) overestimated the SM in almost of the samples, and the estimation error was significantly greater than that in the other months, which was possibly due to the potential impacts caused by factors such as deep and widespread vegetation coverage in summer. The curves for October 2019 (Figure 5d) indicated a similar situation to that in July, although the proportion and degree of overestimation were slightly decreased compared with those in July. To further evaluate the vegetation impact on the FY-3C VSM retrieval accuracy in July, the SM deviations between the FY-3C VSM and the measured SM were calculated and compared to the MODIS NDVI by plotting the variation curves and evaluating the correlation between the two sources. The plot could be found in the Appendix A, where the curves of the MODIS NDVI (red line) and the SM deviations (green line) showed a similar variation pattern and the R value was 0.666, which indicated a strong correlation between the NDVI and SM deviation.
To further analyze the SM monitoring accuracy using the FY-3C VSM product, the values of R, RMSE, ubRMSE, and MAE between the FY-3C VSM and the in situ SM measurements in different months were calculated pixel by pixel, as shown in Table 3. All valid pixels from the FY-3C VSM products in January (4464), April (8870), July (7446), and October (8852) were included in the statistical error metrics in Table 3. The values of R were distributed between 0.467 and 0.621, indicating a moderate or strong correlation between FY-3C VSM and the actual SM in all seasons. The SM estimation accuracies in January (RMSE = 0.116 cm 3 /cm 3 , ubRMSE = 0.113 cm 3 /cm 3 , and MAE = 0.092 cm 3 /cm 3 ) and April (RMSE = 0.114 cm 3 /cm 3 , ubRMSE = 0.112 cm 3 /cm 3 , and MAE = 0.094 cm 3 /cm 3 ) were significantly higher than those in July (RMSE = 0.164 cm 3 /cm 3 , ubRMSE = 0.130 cm 3 /cm 3 , and MAE = 0.120 cm 3 /cm 3 ) and October (RMSE = 0.147 cm 3 /cm 3 , ubRMSE = 0.121 cm 3 /cm 3 , and MAE = 0.113 cm 3 /cm 3 ). Among these months, despite the high correlation between the estimated and actual SM, the SM inversion in July yielded the worst accuracy due to overestimation. degree of overestimation were slightly decreased compared with those in July. To further evaluate the vegetation impact on the FY-3C VSM retrieval accuracy in July, the SM deviations between the FY-3C VSM and the measured SM were calculated and compared to the MODIS NDVI by plotting the variation curves and evaluating the correlation between the two sources. The plot could be found in the Appendix A, where the curves of the MODIS NDVI (red line) and the SM deviations (green line) showed a similar variation pattern and the R value was 0.666, which indicated a strong correlation between the NDVI and SM deviation. To further analyze the SM monitoring accuracy using the FY-3C VSM product, the values of R, RMSE, ubRMSE, and MAE between the FY-3C VSM and the in situ SM measurements in different months were calculated pixel by pixel, as shown in Table 3. All valid pixels from the FY-3C VSM products in January (4464), April (8870), July (7446), and October (8852) were included in the statistical error metrics in Table 3. The values of R were distributed between 0.467 and 0.621, indicating a moderate or strong correlation between FY-3C VSM and the actual SM in all seasons. The SM estimation accuracies in January (RMSE = 0.116 cm 3 /cm 3 , ubRMSE = 0.113 cm 3 /cm 3 , and MAE = 0.092 cm 3 /cm 3 ) and April (RMSE = 0.114 cm 3 /cm 3 , ubRMSE = 0.112 cm 3 /cm 3 , and MAE = 0.094 cm 3 /cm 3 ) were significantly higher than those in July (RMSE = 0.164 cm 3 /cm 3 , ubRMSE = 0.130 cm 3 /cm 3 , and MAE = 0.120 cm 3 /cm 3 ) and October (RMSE = 0.147 cm 3 /cm 3 , ubRMSE = 0.121 cm 3 /cm 3 , and MAE 3 3

Relationship between Multiple Variables and SM Measurements
By evaluating of the FY-3C VSM and multiple related factors, such as vegetation, slope, and location, against the measured SM using the QR model, we aimed to correct for bias and improve the SM inversion accuracy that was described in Section 3.1.
In Figure 6, the dependent variable, i.e., the measured SM, was plotted against quantiles from 0.05 to 0.95 for five independent variables, including the FY-3C VSM retrievals, MODIS NDVI, latitude, longitude, and elevation, and an intercept using the "quantreg" package in R (https://www.R-project. org/). The plots depicted the variation pattern of coefficient estimates and 95% confidence intervals of all independent variables and intercept. According to Figure 6, the FY-3C VSM had a positive correlation with the measured SM at all quantiles, and the QR coefficient estimates increased from 0.01 in the lower quantile to 0.22 in the upper quantile. The estimate from the OLS method was constant at approximately 0.14, which suggested that the estimated SM using OLS was likely to be overestimated in areas with low soil water contents (dry) and underestimated in areas with high soil water contents (wet). Apart from the FY-3C VSM, the MODIS NDVI had negative estimates at quantiles from 0 to 0.2 and positive estimates at quantiles from 0.2 to 0.95. The coefficient estimates showed an increasing trend before reaching their peak (0.06) at quantile 0.8 and then slightly decreased to 0.04. The latitude and elevation had both negative (quantiles 0.8-0.95) and positive (0.05-0.8) effects on SM, and the latitude was more significant than elevation at all quantiles. The longitude had a positive correlation with SM at all quantiles, and its coefficient estimates decreased from the lower to upper quantiles.
The QR models were created at five quantiles (0.1, 0.3, 0.5, 0.7, and 0.9), and their coefficient estimates and significant P values were calculated, as presented in Table 4. Table 4 indicates that the linear relationship between the measured SM and the abovementioned multiple independent variables varied at different quantiles. For FY-3C VSM, latitude and longitude, the p values were lower than 0.001 at all quantiles, which indicated a high and stable correlation between the SM measurements and those input variables. At the 0.1 quantile, the larger p value was found in the NDVI, which indicated a lower correlation between the NDVI and the ground-based SM measurements than between that and the other variables. The p values of elevation at quantiles 0.7 (0.001) and 0.9 (0.992) were greater than those of the other inputs, indicating a lower correlation between elevation and measured SM at high quantiles. All of the regression models passed the significance test at the 0.005 level except for the elevation at the 0.9 quantile. Thus, the FY-3C VSM, MODIS NDVI, latitude, and longitude were selected as the input variables for improving the accuracy of SM monitoring. estimates at quantiles from 0 to 0.2 and positive estimates at quantiles from 0.2 to 0.95. The coefficient estimates showed an increasing trend before reaching their peak (0.06) at quantile 0.8 and then slightly decreased to 0.04. The latitude and elevation had both negative (quantiles 0.8-0.95) and positive (0.05-0.8) effects on SM, and the latitude was more significant than elevation at all quantiles. The longitude had a positive correlation with SM at all quantiles, and its coefficient estimates decreased from the lower to upper quantiles. The QR models were created at five quantiles (0.1, 0.3, 0.5, 0.7, and 0.9), and their coefficient estimates and significant P values were calculated, as presented in Table 4. Table 4 indicates that the linear relationship between the measured SM and the abovementioned multiple independent variables varied at different quantiles. For FY-3C VSM, latitude and longitude, the P values were lower than 0.001 at all quantiles, which indicated a high and stable correlation between the SM measurements and those input variables. At the 0.1 quantile, the larger P value was found in the NDVI, which indicated a lower correlation between the NDVI and the ground-based SM measurements than between that and the other variables. The P values of elevation at quantiles 0.7 (0.001) and 0.9 (0.992) were greater than those of the other inputs, indicating a lower correlation between elevation and measured SM at high quantiles. All of the regression models passed the significance test at the 0.005 level except for the elevation at the 0.9 quantile. Thus, the FY-3C VSM, MODIS NDVI, latitude, and longitude were selected as the input variables for improving the accuracy of SM monitoring.

Multivariate Models for SM Estimation
The issue with SM overestimation was relatively more serious in summer (July) among all of the seasons, as demonstrated in Section 3.1. Six multivariate models (MLR-1, MLR-2, MLR-3, MBPNN-1, MBPNN-2, and MBPNN-3) were built and compared to improve the SM estimation accuracy in July. The R, RMSE, MAE, and MRE values between the actual and estimated SM using those models were calculated pixel-by-pixel and are shown in Table 5. It is obvious that the performances of the models with two (MLR-2 and MBPNN-2) or three (MLR-3 and MBPNN-3) independent or input variables were improved compared with the models with only FY-3C VSM (MLR-1 and MBPNN-1) as the independent or input variable. The MLR-1 and MLR-2 models obtained higher monitoring accuracy than the MBPNN-1 and MBPNN-2 models, which was possibly due to the disadvantages of the BPNN, such as the limited sample number, overfitting, and trivial linear case inclusion. The addition of location information significantly enhanced the accuracy of SM estimates in models MLR-3 and MBPNN-3, and the performance of MBPNN-3 (R = 0.871, RMSE = 0.034 cm 3 /cm 3 , and MAE = 0.026 cm 3 /cm 3 , MRE = 20.7%) was the best among all of the models, followed by MLR-3 (R = 0.694, RMSE = 0.047 cm 3 /cm 3 , MAE = 0.035 cm 3 /cm 3 , and MRE = 27.3%).
To further validate the accuracy of the SM estimations using MLR-3 and MBPNN-3, the distribution of absolute error and relative error between the actual and estimated SM in the pixels are depicted in Figure 7. Generally, MBPNN-3 demonstrated better performance than MLR-3, with lower median values of the absolute (0.00 cm 3 /cm 3 ) and relative (14.0%) errors than MLR-3. Additionally, the distribution ranges of the MLR-3 errors between the 25% (Q1) and 75% (Q3) quantiles, which were [−0.02, 0.02] for absolute errors and [11.0%, 28.0%] for relative errors, were both wider and overall larger than those for the MBPNN-3, whose absolute and relative errors between Q1 and Q3 were distributed between [−0.01, 0.02] and [7.5%, 25.0%], respectively. The results indicate that the multivariate nonlinear model is a suitable and promising approach for achieving SM estimation results with high accuracy.

Regional SM Monitoring Results
Here, the MBPNN-3 model was applied to estimate the regional SM in the study area from January to October 2019, as shown in the middle column of Figure 8. The images in the left and right columns are the FY-3C VSM retrievals and the measured SM from the ISMN. In general, the FY-3C VSM products overestimated the SM in the eastern part and underestimated the SM in the midwestern part of the U.S. during the 10 months. It was evident that the spatiotemporal variation patterns of the SM estimations using the MBPNN-3 model were more consistent with the in situ SM measurements than those using the FY-3C VSM products. The over-and underestimation issues were reduced in the estimated SM images throughout the study area, especially the overestimation problem in the western and northeastern parts, indicating that the MBPNN-3 model with the FY-3C VSM and supplementary information (vegetation and location) as inputs was applicable for providing reliable SM estimations. Dry areas with SM values lower than 0.20 cm 3 /cm 3 were found in the southwestern part, and the proportion of dry areas gradually increased from January to July and slightly decreased from August to October. The wet areas with SM values greater than 0.30 cm 3 /cm 3 were mainly located in the mideastern region, and the proportions of wet area during January to July were slightly greater than those from July to October.

Regional SM Monitoring Results
Here, the MBPNN-3 model was applied to estimate the regional SM in the study area from January to October 2019, as shown in the middle column of Figure 8. The images in the left and right columns are the FY-3C VSM retrievals and the measured SM from the ISMN. In general, the FY-3C VSM products overestimated the SM in the eastern part and underestimated the SM in the midwestern part of the U.S. during the 10 months. It was evident that the spatiotemporal variation patterns of the SM estimations using the MBPNN-3 model were more consistent with the in situ SM measurements than those using the FY-3C VSM products. The over-and underestimation issues were reduced in the estimated SM images throughout the study area, especially the overestimation problem in the western and northeastern parts, indicating that the MBPNN-3 model with the FY-3C VSM and supplementary information (vegetation and location) as inputs was applicable for providing reliable SM estimations. Dry areas with SM values lower than 0.20 cm 3 /cm 3 were found in the southwestern part, and the proportion of dry areas gradually increased from January to July and slightly decreased from August to October. The wet areas with SM values greater than 0.30 cm 3 /cm 3 were mainly located in the mideastern region, and the proportions of wet area during January to July were slightly greater than those from July to October.

Accuracy Assessment between the Estimated and Actual SM
To further validate the SM estimation accuracy, the R, RMSE, MAE, and MRE values between the estimated and actual SM, as well as the number of samples in each month from January to October 2019, were calculated, as shown in Table 6. The R values in each month were distributed between 0.787 and 0.943, and the RMSE, MAE, and MRE values in each month in 2019 were all lower than 0.063 cm 3 /cm 3 , 0.054 cm 3 /cm 3 , and 46.7%, respectively, indicating that the SM estimation accuracy in the U.S. was generally high during all months. The SM estimation accuracy varied in different months. The highest SM estimation accuracy was found in February (R = 0.943, RMSE = 0.030 cm 3 /cm 3 , MAE = 0.022 cm 3 /cm 3 , and MRE = 10.4%), followed by January (R = 0.913, RMSE = 0.040 cm 3 /cm 3 , MAE = 0.031 cm 3 /cm 3 , and MRE = 15.4%). The worst accuracy was found in September (R = 0.862, RMSE = 0.063 cm 3 /cm 3 , MAE = 0.054 cm 3 /cm 3 , and MRE = 46.7%), followed by October and August. In conclusion, the established MBPNN-3 model yielded accurate and acceptable SM estimations in all seasons and can be employed to improve the monitoring accuracy of regional surface soil moisture.

Discussion
The enhancement of the FY-3C VSM product using SM measurements and multivariate models

Accuracy Assessment between the Estimated and Actual SM
To further validate the SM estimation accuracy, the R, RMSE, MAE, and MRE values between the estimated and actual SM, as well as the number of samples in each month from January to October 2019, were calculated, as shown in Table 6. The R values in each month were distributed between 0.787 and 0.943, and the RMSE, MAE, and MRE values in each month in 2019 were all lower than 0.063 cm 3 /cm 3 , 0.054 cm 3 /cm 3 , and 46.7%, respectively, indicating that the SM estimation accuracy in the U.S. was generally high during all months. The SM estimation accuracy varied in different months. The highest SM estimation accuracy was found in February (R = 0.943, RMSE = 0.030 cm 3 /cm 3 , MAE = 0.022 cm 3 /cm 3 , and MRE = 10.4%), followed by January (R = 0.913, RMSE = 0.040 cm 3 /cm 3 , MAE = 0.031 cm 3 /cm 3 , and MRE = 15.4%). The worst accuracy was found in September (R = 0.862, RMSE = 0.063 cm 3 /cm 3 , MAE = 0.054 cm 3 /cm 3 , and MRE = 46.7%), followed by October and August. In conclusion, the established MBPNN-3 model yielded accurate and acceptable SM estimations in all seasons and can be employed to improve the monitoring accuracy of regional surface soil moisture.

Discussion
The enhancement of the FY-3C VSM product using SM measurements and multivariate models improved its feasibility for land surface SM monitoring. In this study, the in situ SM measurements from ISMN played an important role during processes such as the retrieval accuracy evaluation of FY-3C VSM, the relationship analysis of various input variables, the establishment of multivariate models, and the validation of SM estimation accuracy. A large number of representative stations in the U.S. made it possible to map the SM measurements across the whole study area and adequate training and testing samples for the MLR and MBPNN models. Therefore, widespread and accurate SM measurements are crucial and indispensable data sources for enhancing satellite products. The interpolation method allows the spatial resolution consistency between the satellite and in situ SM data. However, the SM prediction accuracy needs to be further improved.
Although the established MBPNN-3 model with FY-3C VSM, MODIS NDVI, latitude, and longitude as inputs enhanced the SM retrievals from the pure satellite products, the estimation accuracy varied in different seasons due to the seasonal variations in land surface features and the impacts of other factors. The seasonal differences in the relationships between SM and vegetation coverage and in the climate distribution should be studied to further improve the accuracy of SM estimations in future research.

Conclusions
Assessing the consistency between the FY-3C VSM and measured SM indicated an issue with overestimation due to the impacts caused by factors such as vegetation. Thus, a comprehensive evaluation of multiple factors is necessary to improve the SM inversion accuracy from the FY-3C VSM. In this case, QR models were applied to provide a detailed depiction of the relationship between SM and multisource factors, including the FY-3C VSM, MODIS NDVI, elevation, latitude, and longitude. The results showed that the FY-3C VSM and longitude had a positive correlation with SM in all quantiles, and the MODIS NDVI, latitude, and elevation had both positive and negative effects in the different quantiles. The coefficient estimates of those factors all passed the significance test at the 0.005 level at each quantile between [0.1, 0.9], except for the elevation at quantile 0.9.
In this study, six multivariate models, including MLR-1, MLR-2, MLR-3 MBPNN-1, MBPNN-2, and MBPNN-3, were employed to estimate regional SM in July. The assessment of model performance showed that the SM estimation accuracy was improved with the increase in the number of independent or input variables for both linear and nonlinear models. The MBPNN-3 achieved the highest SM estimation accuracy and was thus used to estimate SM in the U.S. from January to October 2019. The FY-3C VSM retrievals, estimated SM using MBPNN-3 and the measured SM images, were compared, and the R values and estimation errors between the actual SM measurements and SM estimations in each month were calculated. The results showed that the spatiotemporal characteristics of the estimated SM using MBPNN-3 were more consistent with the actual SM than those estimated using the FY-3C VSM retrievals. The values of R were distributed between 0.787 and 0.943, and the RMSE, MAE, and MRE values were less than 0.063 cm 3 /cm 3 , 0.054 cm 3 /cm 3 , and 46.7%, respectively, which demonstrated the feasibility of using the MBPNN-3 model to improve the SM monitoring accuracy at the regional scale.

Conflicts of Interest:
The authors declare no conflicts of interest.

Appendix A
See Figure A1 here.