Rebuilding long time series global soil moisture products using the neural network adopting the microwave vegetation index

This study presents a back propagation neural network (BPNN) method to rebuild a global, long-term soil moisture (SM) series, adopting the microwave vegetation index (MVI). The data used include Soil Moisture and Ocean Salinity (SMOS) Level 3 soil moisture (SMOSL3sm) data, and Advanced Microwave Scanning Radiometer-Earth Observing System (AMSR-E) and Advanced Microwave Scanning Radiometer 2 (AMSR2) Level 3 brightness temperature (TB) data and L3 SM products. The BPNNs on each grid were trained over July 2010–June 2011 and the entire year of 2013, with SMOSL3sm as the training target. The inputs were the reflectivities (Rs) of the C/X/Ku/Ka/Q bands and the MVI derived from AMSR-E/AMSR2 TB data, where the MVI is used to correct for vegetation effects. The training accuracy of the networks was evaluated by comparing the soil moisture produced by the BPNNs (NNsm hereafter) with SMOSL3sm during the training period, in terms of the correlation coefficient (CC), bias (Bias), and root mean square error (RMSE). Good global results were obtained, with CC = 0.67, RMSE = 0.055 m³/m³, and Bias = −0.0005 m³/m³, particularly over Australia, the Central USA, and Central Asia. With the networks trained over each pixel, a global, long-term soil moisture time series for 2003–2015 was built using AMSR-E TB from 2003 to 2011 and AMSR2 TB from 2012 to 2015. The NNsm products were then evaluated against in situ SM observations from all SCAN (Soil Climate Analysis Network) sites (SCANsm). The results show that NNsm agrees well with the in situ data and captures the temporal dynamics of in situ SM, with CC = 0.52, RMSE = 0.084 m³/m³, and Bias = −0.002 m³/m³. We also evaluated the accuracy of NNsm by comparing it with the AMSR-E/AMSR2 SM products and with the results of a regression method. In conclusion, this study provides a promising BPNN method that adopts the MVI to rebuild a long-term SM time series, which could provide useful insights for the future Water Cycle Observation Mission (WCOM).


Introduction
Land surface soil moisture (SM), the water stored in the upper soil layer, is a key variable for understanding the energy and water cycles of the Earth system, and is therefore an important parameter in climate, hydrology, and environmental studies [1][2][3][4]. SM plays a crucial role in a large number of applications, including numerical weather prediction, disaster monitoring, crop yield prediction, flood and drought damage estimation, water resources management, greenhouse gas accounting, civil protection, and epidemiological modeling of waterborne diseases [5][6][7][8]. As a key parameter in the water cycle, SM has been endorsed by the Global Climate Observing System (GCOS) as one of 50 Essential Climate Variables (ECVs), which are required to support the work of the United Nations Framework Convention on Climate Change (UNFCCC) and the Intergovernmental Panel on Climate Change (IPCC). Because many of the applications mentioned above require a soil moisture record that spans a longer period than the lifetime of a single sensor, both current and historical SM observations are needed, and a long time series SM product must be built. Existing SM data have various resolutions and accuracies and lack spatially and temporally consistent long time series. They therefore cannot effectively support the study of the water cycle response mechanism to global climate change, which has been identified as one of the scientific objectives of the new Chinese satellite mission, WCOM [9,10]. Spatially and temporally consistent long time series SM products are thus necessary to answer scientific questions in the study of the water cycle and climate change.
Both active and passive microwave remote sensing systems can estimate SM through the observation of backscatter signals and brightness temperatures (TBs), and passive microwaves at low frequencies are especially suitable. The theoretical basis of microwave remote sensing of soil moisture is that the very large difference between the dielectric constants of dry soil and liquid water leads to a very large contrast between wet soil and dry soil [11,12]. Several satellite microwave radiometers operating at the C band and at higher frequencies can be used to estimate global soil moisture [13][14][15], including the Scanning Multichannel Microwave Radiometer (SMMR), the Special Sensor Microwave/Imager (SSM/I), the Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (TMI), WindSat, AMSR-E/AMSR2, and FY-3 (FengYun 3) from China. However, these radiometers are easily affected by vegetation attenuation, are insensitive to soil moisture under moderate vegetation (water content greater than ~3 kg/m²), and their measurements only represent the top 1 cm of soil [11]. Since the launch of SMOS by the European Space Agency (ESA), which operates at a low frequency (L-band), several similar satellites have been launched, including Aquarius and the Soil Moisture Active Passive (SMAP) mission [1,6,16,17]. All of them provide global observations in the L-band: SMOS provides TB observations at multiple incidence angles, while the SMAP mission and Aquarius/SAC-D were designed to provide global measurements of both L-band TB and backscatter. However, the SMAP radar stopped working in July 2015. The L-band has advantages for soil moisture retrieval because it can penetrate the atmosphere and vegetation cover (up to ~5 kg/m² water content), and its TB represents the upper 5 cm of soil [1].
In order to make full use of the observations accumulated by different sensors and to build a long time series dataset, multisource soil moisture data from historical and existing sensors should be merged over a long period. Liu et al. [18] presented an approach for combining passive and active soil moisture. Their research spanned a soil moisture observation period starting in 1979, and is of great significance for our basic understanding of soil moisture in the water, energy, and carbon cycles. Depending on whether the payload configurations are similar or different, the solutions for rebuilding products can be divided into two types: (1) Cross calibration. For different sensors with similar payload configurations, cross calibration can be used. Such sensors include SMMR, SSM/I, TMI, WindSat, and AMSR-E/AMSR2, with multi-frequency bands ranging from the C, X, Ku, and Ka bands to higher frequencies; they also include low-frequency (L-band) sensors such as SMOS, Aquarius, and the SMAP radiometer. In this case, long time series products are reconstructed by cross calibrating the different sensors, unifying the algorithm, and applying the same algorithm to the observations of all sensors. (2) Taking the most credible retrieval products (such as SMOS products) as the standard reference with which to train other data. This is the appropriate option for sensors operating at different frequencies, such as AMSR-E, with the C, X, Ku, and Ka bands, and SMOS, with the L band.
The NN method is an effective nonlinear modeling method. It has been widely applied in remote sensing, including soil moisture retrieval. The general idea is to build relationships between input data (TB/backscatter, SM) and target SMs (in situ SM, model SM, or satellite SM) through NN training, and then to retrieve SM using the trained NNs. At small scales, such as the watershed scale, NNs have been used to retrieve soil moisture with in situ measurements as the reference and airborne/satellite observations as input, including synthetic aperture radar (SAR) [19][20][21][22][23][24][25] and radiometer observations [26][27][28][29][30], or both active and passive data [31,32]. Another approach is to use a model SM as the reference to train NNs (models such as the National Centers for Environmental Prediction (NCEP), the European Centre for Medium-Range Weather Forecasts (ECMWF), and the Global Land Data Assimilation System (GLDAS)), and to add auxiliary data to the input layer (Normalized Difference Vegetation Index (NDVI), land surface temperature (LST), precipitation (PRC)) to obtain better retrieval results [33][34][35]. These studies laid the foundations for NN applications in SM retrieval. In recent years, some researchers have used NN methods to rebuild long time series global SM products. Lu et al. [36] reconstructed a time series soil moisture dataset using SMOS and AMSR2 soil moisture products to train the NN, with daily TB, NDVI, LST, PRC, and DEM information as input data. However, that study concentrated on the Heihe River Basin, and its time series covered only one year (2012) because of the limitations of AMSR2 TBs. In order to develop longer time series products, based on the ESA-funded SM fusion study program, de Jeu et al. [37,38] designed three approaches to retrieve global time series soil moisture datasets for the 2003-2013 period, with SMOS and AMSR-E datasets over the June 2010-September 2011 period. Their goal was to integrate SMOS into a consistent soil moisture climate record. The first approach was based on statistical regression, which was adopted by Al-Yaari et al. [39]. The authors retrieved global soil moisture for the 2003-2011 period using SMOSL3sm and AMSR-E TB observations as the training dataset. The second approach is an NN method developed by Rodriguez-Fernandez et al. [40], using ECMWF SM predictions and SMOSL3sm products as the reference to train the NNs, with SMOS TB, ASCAT backscatter (σ), NDVI, and soil texture information as input data. Rodriguez-Fernandez et al. [41][42][43] compared different combinations of input data (TB, NDVI, texture) and obtained the best TB configuration for training the NN. Additionally, they analyzed the contribution of auxiliary data to the accuracy of SM retrieval. The third method, called Land Parameter Retrieval Model (LPRM) fusion, was derived by van der Schalie et al. [44]. Their research updated the roughness parameterization and optimized the AMSR-E LPRM parameters for the C- and X-bands to match SMOS retrievals.
In our study, we adopted the second solution to rebuild long time series SM products, using a BPNN method with SMOSL3sm products as the reference. The main target of this study was to develop a reconstruction approach to obtain long time series global soil moisture datasets, on the basis of previous studies. The initial purpose of this study was to find a way to refine the former observations with the future WCOM soil moisture products, which mainly rely on L-S-C band observations. Therefore, the best option is to use SMOS observations to train the former AMSR-E/AMSR2 data. The data fusion of multi-source satellite observations lies outside the aim of this study. Among Aquarius, SMAP, and SMOS, we selected SMOS SM as the standard reference for SM products. SMOS has provided multi-angular global microwave TB observations from 2010 until the present, and its SM products have a high accuracy, with a mission target of 0.04 m³/m³. A great deal of research has evaluated SMOS SM products and demonstrated their high accuracy [45][46][47][48][49][50][51][52][53][54][55][56]. To obtain a long time series, we chose AMSR-E/AMSR2 data as the training dataset because the two sensors have the same configuration and constitute a continuous time series, with AMSR-E lasting from 2003 to 2011 and AMSR2 lasting from 2012 to the present. In contrast to previous studies, we used the Microwave Vegetation Index (MVI) as an input in order to provide information on the effects of vegetation. Additionally, this study serves as part of the pre-feasibility studies of the scientific objectives of WCOM. We will combine the modeled and WCOM-observed soil moisture to produce an even better product, and then use it to refine the former observations (including SMOS, AMSR-E/AMSR2, etc.).
The rest of this paper is organized as follows. Section 2 presents the different data used in the study and the BPNN method used to retrieve soil moisture. Section 3 shows the results of the BPNN and evaluates NNsm against SCAN in situ SM observations. Section 4 then discusses the accuracy and advantages of NNsm in comparison to satellite SM products and to the datasets produced by other methods. Finally, Section 5 provides conclusions.

SMOS Data
The ESA SMOS satellite has a sun-synchronous polar orbit with 06:00/18:00 equator overpass times for the ascending and descending orbits. It has provided global multi-angular TB observations at the 1.41 GHz L-band from 2010 until the present, with a resolution of 35-50 km, and the mission target for the accuracy of SM products is 0.04 m³/m³ [57].
The SMOSL3sm ascending product (at 06:00 local time) was selected in our study, corresponding to the AMSR-E/AMSR2 descending nighttime data (at 01:30 local time). SMOSL3sm daily products are provided by CATDS (Centre Aval de Traitement des Données) and can be freely downloaded from the CATDS website for the period from 16 January 2010 to the present. The SMOSL3sm daily product is gridded on the 25 km resolution global EASE (Equal Area Scalable Earth) grid, with a global size of 586 rows × 1383 columns. The product is distributed in NetCDF, which is a user-friendly format. The SMOSL3sm product represents the soil moisture of the top 5 cm of the soil layer.
SMOSL3sm was used in this study in the BPNN training period, and consisted of two years of data: from 1 July 2010 to 30 June 2011, and the entire year of 2013. The first data period coincides with the AMSR-E data below, and the second data period coincides with the AMSR2 data below.

AMSR-E Data
AMSR-E is one of six sensors onboard the NASA Aqua satellite, which was launched on 4 May 2002. AMSR-E stopped operations on 4 October 2011. It had 13:30/01:30 equator-crossing times for the ascending and descending orbits, with a one- to two-day repeat coverage. The AMSR-E sensor is a passive microwave radiometer operating at six frequencies, ranging from 6.925 GHz to 89.0 GHz (6.9 GHz, 10.7 GHz, 18.7 GHz, 23.8 GHz, 36.5 GHz, and 89.0 GHz). Both horizontally and vertically polarized radiation are measured at each frequency, with an incidence angle of 55°. The ground spatial resolution at the nadir is 75 × 45 km for the 6.925 GHz channel (C-band), and the footprints of the higher frequencies are much smaller than that of the C-band.
The AMSR-E L3 TB data, and the AMSR2 L3 TB data described in Section 2.1.3, are descending nighttime data (at 01:30 local time), because at night surface temperatures (Ts) are more stable than during the daytime, and the vegetation temperature is closer to the soil temperature [58]. AMSR-E L3 TB data are provided by the National Snow and Ice Data Center (NSIDC), and can be downloaded free of charge from the NSIDC. These data are provided in EASE-Grid projections [59,60] (global cylindrical) at a 25-km resolution, which is the same as the SMOSL3sm data. The dataset has global spatial coverage and daily temporal resolution, and its coverage begins on 19 June 2002 and ends on 27 September 2011. The TB data (in tenths of Kelvin) are stored as two-byte unsigned integers.
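The storage convention described above (two-byte unsigned integers in tenths of Kelvin) can be unpacked as in the minimal sketch below. The function name `decode_tb`, the little-endian byte order, and the use of a raw value of 0 as the missing-data flag are assumptions made for illustration only; they are not stated in the text and should be checked against the product documentation.

```python
import struct

def decode_tb(raw_bytes, byteorder="<", missing_raw=0):
    """Convert packed two-byte unsigned integers (tenths of Kelvin) to Kelvin.

    byteorder:   '<' little-endian (ASSUMED), '>' big-endian.
    missing_raw: raw count treated as "no data" (ASSUMED to be 0).
    Returns a list of floats, with None where the raw count is missing.
    """
    n = len(raw_bytes) // 2
    counts = struct.unpack(f"{byteorder}{n}H", raw_bytes[: 2 * n])
    return [None if c == missing_raw else c / 10.0 for c in counts]
```

For instance, a raw count of 2731 corresponds to 273.1 K.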
We used two kinds of AMSR-E L3 SM products. One is the standard AMSR-E product, derived by the NSIDC, which can be downloaded from [61]. The other is based on the Land Parameter Retrieval Model (LPRM) algorithm, has a 0.25° × 0.25° resolution, and can be downloaded freely [62].
In this study, the AMSR-E L3 TB data used in the BPNN training period ranged from 1 July 2010 to 30 June 2011, which coincides with the aforementioned first period of SMOSL3sm data. The AMSR-E L3 TB data used in the simulation period span from 1 January 2003 to 27 September 2011. The AMSR-E L3 SM products compared with NNsm cover the same period as the AMSR-E L3 TB data.

AMSR2 Data
The AMSR2 sensor, onboard the Global Change Observation Mission-Water (GCOM-W) satellite, is operated by the Japan Aerospace Exploration Agency (JAXA). The aim of this satellite is to observe changes in water circulation. As AMSR2 is a follow-up mission to AMSR-E/Aqua, its sensor and satellite configurations are almost the same as those of AMSR-E/Aqua, described in the previous section.
AMSR2 L3 TB is produced by JAXA, and has an equi-rectangular (EQR) projection. The dataset has global spatial coverage and daily temporal resolution; the EQR grid has a spacing of 0.25°, giving 1440 × 720 pixels, and coverage began on 3 July 2012 and continues today. Data can be accessed freely from the GCOM-W1 Data Providing Service.
We used two kinds of AMSR2 L3 SM products. One is the standard AMSR2 product from JAXA, based on the lookup table method, which can be obtained from [63]. The other is based on the Land Parameter Retrieval Model (LPRM) algorithm, with a 0.25° × 0.25° resolution, and can be downloaded from [64].
The AMSR2 L3 TB data used for training in this study covered the entire year of 2013, which coincides with the aforementioned second period of SMOSL3sm data. The AMSR2 L3 TB data used in the simulation period spanned from 1 January 2013 to 31 December 2015. In order to collocate with the SMOS data, the AMSR2 data on the 1440 × 720 EQR grid were interpolated to the SMOS 1383 × 586 EASE grid. The AMSR2 L3 SM products used in this study ranged from 2013 to 2015.
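The collocation step can be illustrated with a nearest-neighbour sketch. This is not the interpolation scheme used in the study, which the text does not specify. The sketch assumes the original global cylindrical EASE-Grid at 25 km (Earth radius 6371.228 km, nominal cell size 25.067525 km, standard parallel 30°), and it assumes that the EQR array starts at 90°N, 0°E and runs southward and eastward. All function names are hypothetical.

```python
import math

R_KM = 6371.228          # EASE-Grid sphere radius (assumed original EASE-Grid)
C_KM = 25.067525         # nominal cell size of the 25 km grid (assumed)
NCOLS, NROWS = 1383, 586 # EASE global grid size
COS30 = math.cos(math.radians(30.0))

def ease_row_to_lat(row):
    """Latitude (degrees) of the centre of an EASE global-grid row."""
    s0 = (NROWS - 1) / 2.0
    return math.degrees(math.asin(-(row - s0) * C_KM * COS30 / R_KM))

def ease_col_to_lon(col):
    """Longitude (degrees, -180..180) of the centre of an EASE global-grid column."""
    r0 = (NCOLS - 1) / 2.0
    return math.degrees((col - r0) * C_KM / (R_KM * COS30))

def eqr_index(lat, lon, step=0.25):
    """Row/column of the EQR cell containing (lat, lon).

    ASSUMED layout: row 0 starts at 90N going south, column 0 starts at 0E going east.
    """
    i = min(int((90.0 - lat) / step), 719)
    j = int((lon % 360.0) / step) % 1440
    return i, j

def regrid_nearest(eqr):
    """Resample a 720 x 1440 EQR array (list of rows) onto the 586 x 1383 EASE grid.

    Both grids are separable in latitude and longitude, so the row and column
    lookups are computed once and reused for every cell.
    """
    rows = [eqr_index(ease_row_to_lat(r), 0.0)[0] for r in range(NROWS)]
    cols = [eqr_index(0.0, ease_col_to_lon(c))[1] for c in range(NCOLS)]
    return [[eqr[i][j] for j in cols] for i in rows]
```

Nearest-neighbour sampling is chosen here only because it is simple and never mixes valid and missing pixels; a bilinear or area-weighted scheme would be an equally plausible reading of "interpolated".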

In Situ Network Data
To evaluate the NNsm time series, we used in situ data from the Soil Climate Analysis Network (SCAN) [65,66]. SCAN sites are comprehensive automated weather stations that provide agricultural producers and resource managers with hourly data to monitor soil and weather conditions. Each SCAN installation measures soil moisture and temperature at depths of 5, 10, 20, and 50 cm below the ground surface. This database is available from 1 January 1996 until the present, and provides long time series of in situ soil moisture observations that meet our needs. It is an essential resource for the geoscientific community for validating and improving global satellite observations and land surface models.
The sampling interval of SCAN is one hour; we used the daily average soil moisture at a depth of 5 cm, corresponding to the measurement depth of the L-band. We used the 206 sites that provide effective data, out of a total of 232 SCAN sites across the United States. We selected 15 representative SCAN sites for more detailed analysis, which cover a wide variety of land cover types, soil types, and climates [49]. The characteristics of the 15 selected sites are shown in Table 1.

The BPNN Method
A neural network (NN) is a powerful nonlinear regression method. The BPNN, one of the most commonly used NNs, is a type of multilayer feedforward network that is trained by error back propagation. Its learning rule is the steepest descent method, which adjusts the network weights and thresholds by error back propagation in order to minimize the sum of squared errors of the network.
The structure of the BPNN model includes the input layer, the hidden layer, and the output layer. Our research adopted a three-layer network model with a single hidden layer. We used Matlab software to train and verify the BPNN. After several tests, we selected the appropriate configuration and built the following network:
$$\mathrm{net} = \mathrm{newff}\left(P,\ T,\ 7,\ \{\text{'tansig'},\ \text{'purelin'}\},\ \text{'trainlm'}\right) \tag{1}$$
where P is the input vector; T is the output vector; 7 is the number of neuron nodes in the hidden layer, which was determined by starting from a small number of nodes and increasing it over several training tests until the learning error no longer decreased significantly; 'tansig' is the node transfer function of the hidden layer, a hyperbolic tangent sigmoid function that accepts arbitrary input and produces output ranging from −1 to 1, unlike linear transfer functions such as 'purelin'; 'purelin' is the transfer function of the output layer; and 'trainlm' is the training function, which uses the Levenberg-Marquardt algorithm to estimate the training weights.
The input layer includes the effective surface reflectivities (Rs) of each channel of AMSR-E/AMSR2 and the microwave vegetation index (MVI), and the output layer is the SMOSL3sm product. As key input variables, we chose the Rs of the C/X/Ku/Ka/Q bands and the MVI derived from AMSR-E/AMSR2 TBs, rather than the TBs from each channel of AMSR-E/AMSR2. Figure 1 shows the flow chart of the BPNN method.
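To make the network structure concrete, the following is a minimal pure-Python sketch of a three-layer network with seven hyperbolic-tangent hidden nodes and a linear output node, trained by error back propagation. It is not the Matlab configuration used in the study: it uses plain per-sample gradient descent rather than the Levenberg-Marquardt algorithm, and the function names, learning rate, and initial weight range are illustrative assumptions.

```python
import math
import random

def train_bpnn(X, y, n_hidden=7, lr=0.05, epochs=200, seed=0):
    """Train a one-hidden-layer network (tanh hidden nodes, linear output).

    X: list of input vectors, y: list of target values.
    Uses per-sample gradient descent on the squared error (an assumption;
    the study used Levenberg-Marquardt). Returns (params, mse_history).
    """
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    history = []
    for _ in range(epochs):
        sse = 0.0
        for xi, ti in zip(X, y):
            # Forward pass: tanh hidden layer, linear output node.
            h = [math.tanh(sum(w * v for w, v in zip(W1[j], xi)) + b1[j])
                 for j in range(n_hidden)]
            out = sum(W2[j] * h[j] for j in range(n_hidden)) + b2
            err = out - ti
            sse += err * err
            # Backward pass: propagate the output error to both layers.
            for j in range(n_hidden):
                grad_h = err * W2[j] * (1.0 - h[j] ** 2)  # uses W2[j] before its update
                W2[j] -= lr * err * h[j]
                b1[j] -= lr * grad_h
                for k in range(n_in):
                    W1[j][k] -= lr * grad_h * xi[k]
            b2 -= lr * err
        history.append(sse / len(X))
    return (W1, b1, W2, b2), history

def predict(params, xi):
    """Evaluate the trained network for one input vector."""
    W1, b1, W2, b2 = params
    h = [math.tanh(sum(w * v for w, v in zip(row, xi)) + b) for row, b in zip(W1, b1)]
    return sum(w * v for w, v in zip(W2, h)) + b2
```

In the per-grid setting described in the text, `X` would hold the daily input vectors for one grid cell and `y` the matching SMOSL3sm values, with one such network trained per cell.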

Input Layer Selection
For the selection of the input layer, we start with the radiative transfer process of land surfaces. Different soil moisture contents, from dry soil to wet soil, lead to different soil dielectric constants, so the corresponding soil emissivities/reflectivities vary. The soil signal is then affected by vegetation and the atmosphere before it is observed by microwave sensors. The main factors influencing passive microwave retrieval of soil moisture are vegetation coverage and surface roughness. First, we adopt Rs to represent the final surface signals, which can be derived from the TB signals and the surface temperature (Ts):
$$R_{p,i} = 1 - \frac{TB_{p,i}}{T_s}$$
where p is the polarization and i represents the five frequencies of AMSR-E/AMSR2. The surface temperature (Ts) is calculated with an empirical formula from the V-polarized TB at 36.5 GHz [67].
Vegetation has a significant influence on total surface emissivities. The training target of the BPNN is soil moisture; in order to optimize the training relationship, we introduced the microwave vegetation index (MVI) [68] as an input for vegetation effect correction. MVI is only affected by vegetation and expresses vegetation characteristics well. Previous studies have shown that MVI is related to vegetation properties and can be used for soil moisture retrieval through analytical [69] or iterative solutions [70]. In comparison to optical vegetation indices, such as NDVI, the Ratio Vegetation Index (RVI), and the Leaf Area Index (LAI), MVI has the following advantages: (1) it is not affected by weather and lighting conditions; (2) it does not saturate easily in thickly vegetated areas; and (3) it includes information about both the leafy and woody parts of vegetation, due to its greater penetration and sensitivity, while NDVI mainly responds to a thin layer of the canopy. Compared to other microwave vegetation indices, such as the microwave polarization difference temperature (MPDT), the microwave polarization difference index (MPDI), and the microwave emissivity difference vegetation index (EDVI) [71][72][73][74], MVI can remove the influence of soil background signals and depends only on vegetation properties. MVI can be derived from AMSR-E/AMSR2 TBs [68]:
$$\mathrm{MVI}(f_1, f_2) = \frac{TB_v(f_2) - TB_h(f_2)}{TB_v(f_1) - TB_h(f_1)} = b(f_1, f_2)\,\frac{V_t(f_2)}{V_t(f_1)}$$
where f1 and f2 are two adjacent frequencies; b is the linear coefficient of emissivity for f1 and f2; and Vt is the vegetation transmission component, which is related to temperature and the overall vegetation transmissivity effect. Because it uses the ratio of polarization differences, MVI is polarization independent and is not affected by ground surface emission signals. In our study, we use the TBs of the C-band and X-band of AMSR-E/AMSR2 to calculate MVI.
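The input variables described above can be sketched in a few lines. The reflectivity follows from treating TB/Ts as the effective emissivity, and the MVI is written as the ratio of polarization differences at two adjacent frequencies. The coefficients in `surface_temperature` (1.11 and −15.2) are an assumption: they are a commonly used linear Ka-band relation and are not given in the text, so they should be replaced by the formula of reference [67]. All function names and TB values are hypothetical.

```python
def surface_temperature(tb_36v):
    """Empirical surface temperature (K) from the 36.5 GHz V-pol TB (K).

    ASSUMPTION: linear form with coefficients 1.11 and -15.2, used here only
    as a placeholder for the empirical formula cited in the text.
    """
    return 1.11 * tb_36v - 15.2

def reflectivity(tb, ts):
    """Effective surface reflectivity R = 1 - TB / Ts for one channel."""
    return 1.0 - tb / ts

def mvi(tb_v_f1, tb_h_f1, tb_v_f2, tb_h_f2):
    """MVI as the ratio of polarization differences at adjacent frequencies f1 < f2."""
    return (tb_v_f2 - tb_h_f2) / (tb_v_f1 - tb_h_f1)

def input_vector(tb, low="C", high="X", ts_channel=("Ka", "V")):
    """Build [R for every channel..., MVI] from a {(band, pol): TB} mapping.

    The band labels and the channel ordering are illustrative assumptions.
    """
    ts = surface_temperature(tb[ts_channel])
    rs = [reflectivity(tb[key], ts) for key in sorted(tb)]
    return rs + [mvi(tb[(low, "V")], tb[(low, "H")], tb[(high, "V")], tb[(high, "H")])]
```

With hypothetical C-band TBs of 260 K (V) and 240 K (H) and X-band TBs of 262 K (V) and 250 K (H), the MVI is 12/20 = 0.6.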

BPNN Training
The BPNNs in Equation (1) were trained in order to determine the best relationship between the input and the output. In the training phase, we took the Rs of the C/X/Ku/Ka/Q bands and the MVI from AMSR-E/AMSR2 TB data as input, and used two years of SMOSL3sm data as the reference output: from 1 July 2010 to 30 June 2011, and the entire year of 2013, which coincide with the AMSR-E data and the AMSR2 data, respectively. Thus, the BPNNs were trained on each grid over the two-year period (July 2010-June 2011, and 2013). Grids with fewer than 50 matching AMSR-SMOS measurements (N) were excluded.
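The per-grid matching and the N < 50 exclusion can be sketched as follows. The data layout (a dictionary keyed by grid cell, holding day-aligned lists of input vectors and SMOS targets, with None or NaN marking missing values) and the function names are assumptions made for illustration.

```python
import math

MIN_PAIRS = 50  # grids with fewer matching measurements are excluded (from the text)

def matched_pairs(inputs, targets):
    """Keep the days on which both the input vector and the SMOS target are valid."""
    pairs = []
    for x, t in zip(inputs, targets):
        if x is None or t is None:
            continue
        if math.isnan(t) or any(math.isnan(v) for v in x):
            continue
        pairs.append((x, t))
    return pairs

def trainable_cells(cells, min_pairs=MIN_PAIRS):
    """cells: {(row, col): (inputs, targets)}.

    Returns {(row, col): pairs} for the cells that have at least min_pairs matches.
    """
    selected = {}
    for key, (inputs, targets) in cells.items():
        pairs = matched_pairs(inputs, targets)
        if len(pairs) >= min_pairs:
            selected[key] = pairs
    return selected
```

Each entry of the returned dictionary would then be passed to a separate network, since one BPNN is trained per grid.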

Development of the NNsm Product
In the simulation phase, the input layer Rs and MVIs are derived from TBs; thus, we considered the time coverage of the AMSR-E and AMSR2 TBs. Based on the BPNNs trained in the previous step, we developed a long-term SM dataset over 2003-2015 (except 2012) on each grid, with available AMSR-E TB from 1 January 2003 to September 2011 and AMSR2 TB from 2013 to 2015. Normally, soil moisture is not retrieved over frozen soils; thus, before the simulation, we preprocessed the TB and Ts data, removing those data where Ts was lower than 273.15 K.
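The simulation step with the frozen-soil screening can be sketched as below. The function name, the record layout, and `predict_fn` (a stand-in for the trained network of one grid cell) are hypothetical.

```python
FREEZING_K = 273.15  # threshold from the text: data with Ts below this are removed

def simulate_series(days, predict_fn, freezing_k=FREEZING_K):
    """Apply a trained per-grid network to a daily series.

    days:       list of (date, ts_kelvin, input_vector) tuples.
    predict_fn: callable mapping an input vector to soil moisture
                (hypothetical stand-in for the trained BPNN).
    Returns a list of (date, soil_moisture or None); None marks days that were
    skipped because Ts is missing or below the freezing threshold.
    """
    series = []
    for date, ts, x in days:
        if ts is None or ts < freezing_k:
            series.append((date, None))
        else:
            series.append((date, predict_fn(x)))
    return series
```

Keeping skipped days as None, rather than dropping them, preserves a regular daily time axis for later comparison with in situ data.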

Evaluation of NNsm Product
Before applying the NNsm product, we first evaluated the performance of the BPNN algorithm. Because SMOSL3sm is the training target, we compared NNsm with SMOSL3sm during the BPNN training period. To evaluate the long-term NNsm products, we compared them with SCAN in situ observations. The evaluation used the correlation coefficient (CC), the root mean square error (RMSE), and the bias (Bias), defined as follows:
$$\mathrm{CC} = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i - \bar{x})^2 \sum_{i=1}^{N}(y_i - \bar{y})^2}}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - y_i)^2}$$
$$\mathrm{Bias} = \frac{1}{N}\sum_{i=1}^{N}(x_i - y_i)$$
where the overline indicates the mean, x_i is the NNsm, y_i is the SMOSL3sm or SCAN in situ SM observation, and N is the number of matching data.
Additionally, to better understand the similarity between the patterns of NNsm and other SM products and the in situ SCANsm patterns, Taylor diagrams [75,76] were used. A Taylor diagram graphically summarizes how closely a pattern (or a set of patterns) matches observations. The similarity between two patterns is quantified in terms of several statistics, including their correlation (CC), their centered RMS difference (CRMS), and their standard deviations (SD), which are related by:
$$\mathrm{CRMS}^2 = \mathrm{SD}_x^2 + \mathrm{SD}_y^2 - 2\,\mathrm{SD}_x\,\mathrm{SD}_y\,\mathrm{CC}$$
To plot the statistics of two SM products, the statistics can be normalized (and non-dimensionalized) by dividing both the centered RMS difference and the SD of the two products by the SD of the in situ observations, which gives the normalized SD (NSD) and the normalized centered RMS difference (NCRMS):
$$\mathrm{NSD} = \frac{\mathrm{SD}_x}{\mathrm{SD}_y}, \qquad \mathrm{NCRMS} = \frac{\mathrm{CRMS}}{\mathrm{SD}_y}$$
where SD_y is the SD of the in situ observations.
To understand whether SM retrievals capture the interannual variations and rainfall events, in addition to the seasonality, we computed their anomalies [18]. The anomalies (ANO) were obtained by removing the product's seasonality SM from the original (ORI) time series:
$$\mathrm{ANO}(Y, \mathrm{DOY}) = \mathrm{ORI}(Y, \mathrm{DOY}) - \mathrm{SM}_{\mathrm{seasonality}}(\mathrm{DOY})$$
where Y represents the years from 2003 to 2015 and DOY is the day of the year. The seasonality SM is a single time series of 366 days, with one value for each DOY.
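The evaluation statistics can be computed as in the sketch below, which also makes it easy to check the Taylor-diagram relation numerically. Standard deviations are computed with a 1/N normalization so that the relation between CRMS, SD, and CC holds exactly. The seasonality in `anomalies` is taken as the multi-year mean for each DOY, which is an assumption; the text only states that the seasonality is a 366-day series. Function names are hypothetical.

```python
import math
from collections import defaultdict

def evaluate(x, y):
    """Statistics between a product x (e.g. NNsm) and a reference y (same length)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sd_x = math.sqrt(sum(d * d for d in dx) / n)
    sd_y = math.sqrt(sum(d * d for d in dy) / n)
    cc = sum(p * q for p, q in zip(dx, dy)) / (n * sd_x * sd_y)
    bias = mx - my                      # mean of (x_i - y_i)
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / n)
    crms = math.sqrt(sum((p - q) ** 2 for p, q in zip(dx, dy)) / n)
    return {"CC": cc, "Bias": bias, "RMSE": rmse, "CRMS": crms,
            "SD_x": sd_x, "SD_y": sd_y,
            "NSD": sd_x / sd_y, "NCRMS": crms / sd_y}

def anomalies(series):
    """series: {(year, doy): sm}. Returns {(year, doy): sm - seasonality(doy)}.

    ASSUMPTION: seasonality(doy) is the mean over all available years for that DOY.
    """
    by_doy = defaultdict(list)
    for (_, doy), value in series.items():
        by_doy[doy].append(value)
    season = {doy: sum(v) / len(v) for doy, v in by_doy.items()}
    return {key: value - season[key[1]] for key, value in series.items()}
```

A useful consistency check is that RMSE² equals Bias² + CRMS², which separates the systematic offset from the pattern error.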

BPNN Training Results
To display the training results, we compared the daily and three-day average NNsm with the corresponding reference SMOSL3sm on 2 July 2010, as shown in Figure 2. Figure 2 shows that the trained NNsm has an almost identical spatial distribution and structure to SMOSL3sm. This illustrates that the BPNN method trains well, and that the input AMSR TB has a good nonlinear relationship with the reference SMOSL3sm. Figure 3a shows the correlation coefficients between the reference (SMOSL3sm) and the trained SM (NNsm). The trained SM values generally correlate well with the reference SM values over most of the globe, with a global mean of CC = 0.67. Higher values (CC > 0.8) were obtained over the United States, South Africa and Northwestern Africa, the south of South America, and especially Australia, which correspond to low to moderate vegetation coverage. Highly vegetated areas, including the Amazon and Congo rainforests, have a low correlation or a failed retrieval due to vegetation effects. There are also areas with low vegetation coverage but poor results, such as the Arabian Peninsula. The spatial distribution of the RMSE values is plotted in Figure 3b, with a global mean of RMSE = 0.055 m³/m³. Lower RMSE values (<0.04 m³/m³) were generally distributed over arid, semi-arid, and desert regions with short and sparse vegetation or bare soil (e.g., Australia, the Sahara, and the Southwestern United States), while high RMSE values (~0.1 m³/m³) were distributed over regions with moderate to high vegetation or alpine flora (e.g., Southeastern China, India, Southeastern Australia, the Eastern United States, and locations close to the Earth's equator). In general, the CC values are moderate to high, and the RMSE values are moderate and within an acceptable and reasonable range. Figure 3c shows the absolute value of Bias, which has a magnitude of 10⁻³ and a global mean of Bias = −0.0005 m³/m³. The results show that the BPNN-trained SM is mainly unbiased compared with the original SMOSL3sm, which ensures the stability of the long time series of soil moisture.
In Figure 3, there are data gaps, for example in southern Asia. Radio frequency interference (RFI) is known to be serious in southern Asia and northeastern Africa, and it affects the NN training. In those areas, the NNs cannot be trained effectively, or the trained NNs capture only a poor relationship between the input and the output, so the soil moisture cannot be retrieved correctly. These poorly trained NNs cause the data gap in southern Asia and the low correlations in northeastern Africa.
To better understand the accuracy of the training results, we specifically analyzed each grid cell that contains one of the selected SCAN sites. Detailed analysis results are shown in Figures 4 and 5 and Table 2. Figure 4 shows the time series scatterplots for the trained NNsm (red dots) versus SMOSL3sm (blue dots) over the training period (July 2010-June 2011, and 2013). As can be seen from the figures, the trend of the trained NNsm is consistent with the trend of SMOSL3sm, and NNsm captures the temporal dynamics of SMOSL3sm. Table 2 presents the statistical comparison between NNsm and SMOSL3sm for the training period, in terms of CC, Bias, and RMSE. Figure 5 shows the scatterplots for the trained NNsm versus SMOSL3sm and the RMSE values. In Table 2, the value of CC is high, ranging from 0.49 to 0.95, with an average of 0.75 for the 15 selected sites and 0.63 for all SCAN sites. The RMSE has an average value of 0.062 m³/m³ for the 15 selected sites and 0.053 m³/m³ for all SCAN sites. Among the lower values are 0.037 m³/m³ at site 2093 and 0.034 m³/m³ at site 2026; in particular, the RMSE is 0.018 m³/m³ at site 2168, where the dynamic range of soil moisture is low in an arid area. However, site 2079 has an exceptionally high RMSE value of 0.16 m³/m³. The trained network over this site cannot simulate SM well, and the long time series in Section 3.2 shows that NNsm has a higher dynamic range than SCANsm there. The Bias has a magnitude of 10⁻³, and reaches 10⁻⁴ at site 2026 and site 2075. This indicates that the trained results are unbiased relative to the original soil moisture, which is the most important factor in ensuring the stability and consistency of the long time series soil moisture products.
Remote Sens. 2017, 9, 35

NNsm Products and Evaluation
We further evaluated the performance of the long-term NNsm products over the period of 2003 to 2015 (except 2012) against the in situ observations of the SCAN sites. The satellite standard SM products of AMSR-E/AMSR2 (AMSRsm) from NSIDC and JAXA were also added as a reference. The evaluations were done by plotting time series, and in terms of CC, RMSE and Bias, as shown in Figures 6-9 and Table 3 (left part). Figures 6-9 show the time series scatterplots (upper panel) and anomaly time series (lower panel) for SCANsm (red line), NNsm (blue dots) and AMSRsm (green diamonds) over a long time period (2003-2015). Table 3 (left) shows the CC, RMSE and Bias of the selected SCAN sites and all SCAN sites. For NNsm vs. SCANsm, the CC ranges from 0.21 to 0.71, with a mean value of 0.52 for the 15 selected sites and 0.43 for all SCAN sites. The RMSE has a mean value of 0.084 m³/m³ for the 15 selected sites and 0.078 m³/m³ for all SCAN sites. The Bias has a magnitude of 10⁻³ and an average value of −0.004 m³/m³ for all SCAN sites. The time series plots show that NNsm and SCANsm are consistent at the majority of the SCAN sites, but that NNsm overestimates or underestimates the soil moisture at several sites. The performance of NNsm is superior to that of AMSRsm. AMSRsm from NSIDC and JAXA underestimates soil moisture significantly and has a small variation range.
For the anomaly time series, after removing the contribution of the seasonal cycle, the mean value of CC is lower than that of the original time series, while the mean value of RMSE decreases, as shown in Table 2. This means that the seasonal cycle contributes considerably to the original CC. Beyond the seasonal cycle, NNsm weakly captures the interannual variations and the rainfall events. In contrast, after the seasonal cycle is removed, AMSRsm hardly fluctuates, owing to the small variation range of its original time series. This means that AMSRsm only weakly captures the seasonal cycle and cannot capture the interannual variations or the rainfall events. A detailed analysis is presented by dividing these sites into three categories: (1) sites with strong interannual variations; (2) sites with weak interannual variations; and (3) sites in semi-arid areas.
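The idea of removing the seasonal cycle can be sketched as follows. This section does not state how the anomalies were computed, so the sketch assumes one common choice: subtracting the multi-year mean of each calendar month. The function name seasonal_anomaly and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date

def seasonal_anomaly(dates, values):
    """Subtract the multi-year mean of each calendar month from each sample."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for d, v in zip(dates, values):
        if v is not None:                 # ignore missing observations
            sums[d.month] += v
            counts[d.month] += 1
    # Monthly climatology: one mean value per calendar month with data.
    climatology = {m: sums[m] / counts[m] for m in sums}
    return [None if v is None else v - climatology[d.month]
            for d, v in zip(dates, values)]

# Hypothetical series (m3/m3): wet winters, dry summers, and a wetter second year.
dates = [date(2003, 1, 15), date(2003, 7, 15), date(2004, 1, 15), date(2004, 7, 15)]
sm = [0.30, 0.10, 0.34, 0.14]
anom = seasonal_anomaly(dates, sm)
print(anom)
```

In this toy series the winter-to-summer swing of 0.20 m³/m³ disappears from the anomalies, and only the year-to-year difference remains. This is why a CC computed on anomalies is typically lower than one computed on the original series.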

Sites with Strong Interannual Variations
Figure 6 shows the time series and anomaly time series for seven SCAN sites from Table 3. These time series show that NNsm captures the dynamic variations of the SCAN in situ soil moisture well, and that the interannual variation of NNsm is pronounced. The fluctuation range of SCANsm is larger than that of NNsm. Furthermore, there are problems in the consistency and continuity of the soil moisture between AMSR-E and AMSR2. The most consistent sites are SCAN site 2076 and site 2089, with almost no bias. However, for SCAN site 2024, site 2075, site 2078, and site 2084, NNsm underestimates the soil moisture by 7%, 6%, 8%, and 6%, respectively. In particular, for site 2078 and site 2084, NNsm clearly underestimates the soil moisture from 2003 to 2006. For site 2030, NNsm overestimates soil moisture, particularly around the peaks of interannual soil moisture variability. For the anomaly time series, after removing the contribution of the seasonal cycle, NNsm weakly captures the interannual variations and the rainfall events, while AMSRsm cannot capture them.

Figure 7 shows the time series and anomaly time series for two SCAN sites from Table 3. These time series show that NNsm captures the interannual dynamics of in situ soil moisture well, but that the dynamic ranges differ.
Over site 2059, the dynamic range of NNsm is lower than that of SCAN in situ soil moisture. In the training time series plot of site 2059 in Figure 4, the reference SMOSL3sm is distributed in the range of 0.1-0.3 m³/m³, so the trained NNsm is also distributed in this range. However, the SCANsm at this site ranges from 0 to 0.4 m³/m³. Therefore, the range difference at site 2059 mainly derives from the difference between the reference SMOSL3sm and the in situ SCANsm.
Over site 2079, the dynamic range of NNsm is higher than that of in situ SCANsm. The land cover at this site is deciduous broadleaf forest, while around 76% of the area around the site is a cropland/natural vegetation mosaic. Soil moisture in croplands varies over a wider range than in forest areas. Therefore, NNsm has a higher dynamic range than the in situ soil moisture.
For the anomaly time series, after removing the contribution of the seasonal cycle, NNsm still has a smaller range than SCANsm over site 2059 and a larger range over site 2079. The AMSRsm series is almost a flat line and cannot capture the interannual variations or the rainfall events.

Sites with Weak Interannual Variations
Over site 2001 and site 2093, shown in Figure 8, NNsm is roughly in agreement with SCANsm. However, the interannual variation of soil moisture is weaker, and AMSRsm underestimates the soil moisture. An analysis of the land cover explains the weak interannual variation.
The land cover of these two sites is cropland, and croplands make up 72% of the land cover around site 2001 and 84% around site 2093. Crops are short vegetation, so the soil moisture changes faster. After rain or irrigation, the soil moisture content decreases rapidly, so the soil moisture over these two sites rises and then falls rapidly. Additionally, when SCANsm is higher than the overall SM level from 2003 to 2007, and is higher than that in the training period, NNsm clearly underestimates soil moisture.

Sites in Semi-Arid Areas
Figure 9 shows the time series of soil moisture over four sites located in semi-arid areas: site 2002, site 2026, site 2027, and site 2168. Over these four sites, the soil moisture level is low and mostly remains below 0.2 m³/m³. NNsm overestimates the soil moisture over site 2026, site 2027, and site 2168, particularly in the drier period, while it underestimates the soil moisture at site 2002. The main land cover types around these sites are croplands, cropland/natural vegetation mosaic, grasslands, and open shrublands. Around site 2027, 88% is croplands and cropland/natural vegetation mosaic. A total of 96% of the surface of site 2168 is shrubland. For site 2002, most surfaces are croplands. The surface of site 2026 is 48% grasslands and 52% open shrublands. Thus, the soil moisture of these sites changes faster.
In the training time series plots of site 2026, site 2027, and site 2168 in Figure 4, the reference SMOSL3sm is distributed at around 0.1 m³/m³, 0.1-0.5 m³/m³, and 0.1 m³/m³, respectively, so the trained NNsm is also distributed in these ranges. However, the SCANsm at site 2026 and site 2168 is almost 0 m³/m³ in the drier period, and the SCANsm at site 2027 ranges from 0 to 0.2 m³/m³, which is distinctly lower than the level of the reference SMOSL3sm. Therefore, the differences at these sites mainly derive from the differences between the reference SMOSL3sm and the in situ SCANsm. Compared with SCANsm and NNsm, AMSRsm clearly overestimates soil moisture. Additionally, the soil moisture from AMSR-E and AMSR2 has poor consistency and continuity, especially the AMSR2 soil moisture from 2013 to 2015 over site 2002 and site 2027, while NNsm is more consistent and continuous.
For the anomaly time series over site 2027, the total shift between NNsm and SCANsm was removed, and NNsm still overestimates the soil moisture. NNsm can slightly capture the interannual variations.
The disagreement between the satellite-derived NNsm and the in situ observations may be caused by several factors. First, satellite-derived soil moisture represents the spatial mean value within a satellite footprint, while the in situ soil moisture observations are point measurements. The scale difference between the point observations and the areal observations leads to disagreement, especially in terms of RMSE and Bias [71]. In addition, the soil moisture derived from satellite data may have a different representative depth from that observed at in situ sites. The satellite-derived soil moisture may dry more rapidly after rainfall events or irrigation than that of in situ sites. Moreover, heterogeneity within the satellite footprint leads to differences between the satellite grids and the in situ sites. For example, soil moisture at a cropland site will differ from, and change more rapidly than, that of a footprint with 80% forest. Despite all the above-mentioned factors, NNsm reproduced the soil moisture variability and trends well, without drifts in the long-term data record, which is essential for climate analyses.

Comparison and Discussion
To clarify the advantages of our retrieval results, we further discuss our soil moisture products (NNsm) by comparing them with other long-term soil moisture products, generated using alternative methods: (a) AMSR-E/2 soil moisture products generated using the same algorithm, the LPRM (Land Parameter Retrieval Model) algorithm (AMSR_LPRM hereafter), through cross-calibration; and (b) AMSR-E soil moisture retrievals through a regression method (Reg_sm hereafter) with the target reference of SMOSL3sm.

Comparing with Satellite Products
In Section 3, we compared NNsm with the reference SMOSL3sm, in situ SCANsm, and AMSRsm. NNsm has high consistency and accuracy relative to the reference, whereas the AMSRsm from NSIDC and JAXA underestimates SM significantly. In this section, we discuss the performance of NNsm by comparing it with the AMSR-E and AMSR2 soil moisture products generated by the same algorithm, namely the LPRM algorithm (AMSR_LPRM). These long time series come from two similar sensors, AMSR-E and AMSR2; furthermore, both are retrieved with the LPRM algorithm and belong to one dataset, so we treated them as one long time series dataset. We average the statistical results of the 15 selected sites, as shown in Table 3.
The time series (upper panel) and anomaly time series (lower panel) in Figure 10 show that NNsm is more consistent and continuous with SCANsm than AMSR_LPRM is. AMSR_LPRM overestimates soil moisture over these sites. Additionally, the soil moisture from AMSR-E and AMSR2 is not consistent over these sites. Some panels show a shift in soil moisture between the AMSR-E period and the AMSR2 period, which could be explained by not accounting for a possible bias between the LPRM datasets. Over site 2076, there is a large shift between the AMSR-E and AMSR2 data, so after the anomaly processing, AMSR-E underestimates the soil moisture and AMSR2 overestimates it. With respect to accuracy, NNsm has an average value of CC = 0.52 for the 15 selected sites and CC = 0.43 for all sites, and a mean value of RMSE = 0.084 m³/m³ for the 15 selected sites and RMSE = 0.078 m³/m³ for all sites. These statistics show that NNsm is more accurate than AMSR_LPRM, for both the time series and the anomaly time series.

Comparing with a Regression Method
As mentioned in the Introduction Section, Al-Yaari et al. [39] used a regression approach to retrieve global time series soil moisture datasets, with SMOSL3sm as a training reference target and AMSR-E TB as training input. We compared our NNsm results with Reg_sm, produced by the algorithm used by Al-Yaari et al. The comparative results are listed in Table 4 for the training and simulation periods.
In the training period, compared with the reference SMOSL3sm, NNsm has a slightly higher CC (0.67) than Reg_sm (0.60), and a slightly lower RMSE (0.055 m³/m³) than Reg_sm (0.057 m³/m³). These statistical comparisons show that NNsm has a better training result than Reg_sm. Moreover, the input of NNsm includes TBs from AMSR-E and AMSR2, while the input of Reg_sm is only AMSR-E TB. This makes it possible to derive a continuous and longer soil moisture dataset from AMSR-E and AMSR2 TB in the subsequent simulation step.
In the simulation period, from an accuracy perspective against SCANsm, NNsm has a slightly higher CC (0.52) than Reg_sm (0.49). However, the two algorithms have nearly equal RMSE when compared with in situ SCANsm. For the long-term dataset, NNsm can also simulate soil moisture from 2013 to 2015 based on AMSR-E/AMSR2 TBs, giving a longer dataset than Reg_sm.

Conclusions
This study investigates the feasibility of a BPNN method to build a long-term soil moisture time series using SMOSL3sm products and AMSR-E/AMSR2 TB observations. First, the BPNNs on every grid were trained using SMOSL3sm products as the training target, with the reflectivity (R) and the MVI from AMSR-E/AMSR2 TB observations during July 2010-June 2011 and the entire year of 2013 as inputs. With these BPNNs, we built long time series of global soil moisture from 2003 to 2015, using AMSR-E TB in 2003-2011 and AMSR2 TB in 2013-2015.
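The per-grid training step summarized above can be sketched with a minimal back propagation network. This is an illustrative sketch only: the single hidden layer, tanh activation, number of hidden units, learning rate, and epoch count are assumptions rather than the configuration used in this study, and the three input features and target values are synthetic stand-ins for normalized reflectivities, the MVI, and SMOSL3sm.

```python
import math
import random

def predict(params, x):
    """Forward pass: tanh hidden layer followed by a linear output unit."""
    w1, b1, w2, b2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    return h, sum(w * hi for w, hi in zip(w2, h)) + b2

def mse(params, inputs, targets):
    """Mean squared error of the network over a dataset."""
    return sum((predict(params, x)[1] - t) ** 2
               for x, t in zip(inputs, targets)) / len(targets)

def train_bpnn(inputs, targets, n_hidden=4, lr=0.05, epochs=300, seed=0):
    """Train one network by stochastic gradient descent with back propagation."""
    rng = random.Random(seed)
    n_in = len(inputs[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, t in zip(inputs, targets):
            h, y = predict((w1, b1, w2, b2), x)
            err = y - t                                   # gradient of 0.5 * err**2 w.r.t. y
            for j in range(n_hidden):
                grad_h = err * w2[j] * (1.0 - h[j] ** 2)  # propagate the error through tanh
                w2[j] -= lr * err * h[j]
                b1[j] -= lr * grad_h
                for i in range(n_in):
                    w1[j][i] -= lr * grad_h * x[i]
            b2 -= lr * err
    return w1, b1, w2, b2

# Synthetic stand-in data for one grid cell (hypothetical, not real observations).
data_rng = random.Random(1)
features = [[data_rng.random(), data_rng.random(), data_rng.random()] for _ in range(40)]
target_sm = [0.05 + 0.3 * r1 + 0.1 * r2 - 0.05 * mvi for r1, r2, mvi in features]

initial = mse(train_bpnn(features, target_sm, epochs=0), features, target_sm)
trained = train_bpnn(features, target_sm)
final = mse(trained, features, target_sm)
print(f"MSE before training: {initial:.5f}, after training: {final:.5f}")
```

In a workflow of the kind described here, one such network would be trained per grid cell over the overlap period. It would then be applied with predict to the inputs derived from the full AMSR-E and AMSR2 records.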
We evaluated the quality of the training step over the training period (July 2010-June 2011, and 2013), and it achieved a good agreement between the NNsm and SMOSL3sm products, with mean global values of CC = 0.67, RMSE = 0.055 m³/m³ and Bias = −0.0005 m³/m³. A specific analysis on selected SCAN sites shows that the trend of the trained NNsm is consistent with that of SMOSL3sm, with a high CC value. These results support the following step of building a long time series of soil moisture.
The long time series and anomaly time series of NNsm were evaluated against in situ SCANsm observations. NNsm has a high consistency and accuracy with the reference SMOSL3sm and the in situ SCANsm, and captures the temporal dynamics of soil moisture, with CC = 0.52, RMSE = 0.084 m³/m³ and a Bias with a magnitude of 10⁻³. Over most of the SCAN sites, NNsm captures the in situ SCANsm well and has strong seasonal and interannual variations. However, at some SCAN sites, although our method captures the interannual dynamics well, there are some differences in the dynamic ranges between NNsm and SCANsm. This can be mainly explained by two factors: the differences between the reference SMOSL3sm and the in situ SCANsm, and the heterogeneity within the satellite footprint. The soil moisture of some sites has weak interannual variations. Our analyses found that the land cover at these sites is mainly croplands. Thus, influenced by irrigation in these areas, soil moisture changes faster and has no obvious interannual variations. At some semi-arid sites, NNsm overestimates or underestimates soil moisture. These overestimations and underestimations are caused by differences between the reference SMOSL3sm and the in situ SCANsm.
To further evaluate the accuracy and state the advantages of NNsm, we compared it with AMSR_LPRM products and a regression method. Comparative results show that NNsm is more consistent and continuous with SCANsm than AMSR_LPRM and Reg_sm are. Relative to the regression method, NNsm offers a slightly higher accuracy and a longer time series.
For improvement and further research, we wish to continue with the following work: (1) To improve our BPNN method, we can test different combinations of Rs and MVI. For example, we can use only the low or only the high frequency bands of Rs and MVI; we used all frequencies in this study, which may not be the optimal input combination for BPNN training. (2) Another direction is to train BPNNs for the future WCOM. Because WCOM will have no overlapping data with AMSR-E data, we can train BPNNs using only the AMSR2 data and apply the BPNNs to cross-calibrated AMSR2 and AMSR-E data.
In conclusion, through BPNN training, this study provides a promising method to build long time series of global soil moisture products. The BPNN method can produce surface soil moisture in terms of absolute values and temporal variations. In addition, the BPNN method is independent of various ancillary data, and only relies on the reference SMOS data. This method can be applied to other satellite missions, such as SMAP and the future WCOM satellite mission, as long as they have an overlap period with AMSR-E/AMSR2 observations.

Figure 2. Global soil moisture maps on 2 July 2010: (a,b) NNsm; (c,d) corresponding SMOSL3sm; (e,f) difference between NNsm and SMOSL3sm. The left panels (a,c,e) show daily data and the right panels (b,d,f) show 3-day average data.

Furthermore, a Taylor diagram is presented to display the statistical comparison of NNsm (blue dots) and AMSRsm/AMSR_LPRM (green dots) with the in situ SCANsm (red dot) over the long time period (2003-2015) for all SCAN sites, as shown in Figure 11. In general, the dots (sites) are unevenly distributed in the Taylor diagram for both NNsm and AMSRsm/AMSR_LPRM, meaning that their accuracy varies from one site to another. In the left panel, the normalized standard deviations (NSDs) of AMSRsm are much smaller than one unit, indicating little variability of AMSRsm. In the right panel, the NSDs of AMSR_LPRM are much larger than one unit, indicating a larger variability of AMSR_LPRM. In contrast, the NSDs of NNsm lie around the arc of one unit. As shown in the diagrams, NNsm has only a slightly higher CC with SCANsm than AMSRsm/AMSR_LPRM, meaning that the three products have comparable correlations with SCANsm. Additionally, most blue dots are within the arc of one unit of centered root mean square difference (CRMS), indicating that NNsm matches SCANsm more closely.
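The quantities plotted in a Taylor diagram can be sketched as below, assuming the usual normalization by the standard deviation of the in situ reference. The function name taylor_stats and the sample series are hypothetical; the second series is deliberately built as a compressed copy of the first to mimic a product with very little variability.

```python
import math

def taylor_stats(estimate, reference):
    """Return (NSD, CC, normalized centered RMS difference) for two series."""
    n = len(reference)
    mean_e = sum(estimate) / n
    mean_r = sum(reference) / n
    dev_e = [e - mean_e for e in estimate]
    dev_r = [r - mean_r for r in reference]
    sd_e = math.sqrt(sum(d * d for d in dev_e) / n)
    sd_r = math.sqrt(sum(d * d for d in dev_r) / n)
    cc = sum(a * b for a, b in zip(dev_e, dev_r)) / (n * sd_e * sd_r)
    nsd = sd_e / sd_r                                  # normalized standard deviation
    # Centered RMS difference: means removed first, then normalized by sd_r.
    crms = math.sqrt(sum((a - b) ** 2 for a, b in zip(dev_e, dev_r)) / n) / sd_r
    return nsd, cc, crms

# Hypothetical in situ series and a "flat" product equal to 0.09 + 0.1 * in situ.
scan = [0.10, 0.20, 0.30, 0.25, 0.15]
flat = [0.10, 0.11, 0.12, 0.115, 0.105]
nsd, cc, crms = taylor_stats(flat, scan)
print(f"NSD={nsd:.3f} CC={cc:.3f} CRMS={crms:.3f}")
```

The three statistics are linked by CRMS² = NSD² + 1 − 2·NSD·CC, which is what allows them to share one diagram. The example also shows that a product can correlate perfectly with the reference and still sit far from it on the diagram when its variability is too small.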

Figure 11. Taylor diagram displaying a statistical comparison between NNsm (blue dots) and AMSRsm/AMSR_LPRM (green dots) (left panel is AMSRsm, right panel is AMSR_LPRM) with the time series of in situ SCANsm (red dot) over the long time period (2003-2015).

Table 1. Characteristic descriptions of the selected Soil Climate Analysis Network (SCAN) sites.

Table 3. Comparative results for NNsm and AMSR_LPRM against in situ SCANsm (selected 15 sites) for the 2003-2015 period. The mean value is computed only from significant correlations (p-value < 0.05).

Table 4. Comparative results between retrieved SM and reference SM/in situ SCAN observations (selected 15 sites) in the training and simulation periods. Only significant correlations (p-value < 0.05) are presented.