Advances in the Quality of Global Soil Moisture Products: A Review

Soil moisture is a crucial component of land–atmosphere interaction systems. It has a decisive effect on evapotranspiration and photosynthesis, which in turn notably affect the land surface water cycle, energy transfer, and material exchange. Thus, soil moisture is usually treated as an indispensable parameter in studies that focus on drought monitoring, climate change, hydrology, and ecology. After approximately half a century of sustained effort, great advances have been achieved in soil moisture retrieval from in situ measurements, remote sensing, and reanalysis approaches. The quality of soil moisture estimates, including spatial coverage, temporal span, spatial resolution, temporal resolution, time latency, and data precision, has improved remarkably and steadily. This review outlines the recently developed techniques and algorithms used to estimate soil moisture and to improve the quality of the estimates. Moreover, the characteristics of each estimation approach and the main application fields of soil moisture are summarized. Finally, future prospects for soil moisture estimation are highlighted to identify research directions in the context of increasingly comprehensive application requirements.


Introduction
Soil moisture (SM), the moisture content in the soil, is a crucial component in the hydrological cycle; it links atmospheric precipitation and underground water and is also an important parameter of energy exchange between the land surface and the atmosphere [1][2][3][4]. Consequently, SM is recognized as an essential element in studies aimed at analyzing and understanding Earth system processes, such as climate change and ecological evolution. Specifically, the available water content, which is essential for vegetation growth, is one of the most important components of soil and has crucial guiding significance for agricultural production. Currently, both ground and spaceborne sensors are used to derive the original SM information [2,5,6]. Numerous technologies, such as statistical models, data fusion, machine learning, and assimilation approaches, are widely used to improve SM quality [7][8][9][10]. Additionally, SM datasets with high spatial-temporal resolution are valuable for boosting agricultural production in terms of drought and flood monitoring, crop growth analysis, and yield estimation.
Significant efforts have been devoted to SM acquisition and estimation techniques during the past decades, and numerous global-scale SM estimates have been generated and are available for scientific studies [11][12][13]. To fulfill the increasingly comprehensive requirements for SM estimates, their quality, including spatial coverage, temporal span, spatial resolution, temporal resolution, time latency, and data precision, has been notably improved through advanced methods. However, the spatiotemporal integrity, accuracy, and stability of estimated SM still need considerable further improvement. Therefore, it is necessary to rigorously summarize the data acquisition methods, review the progress in advanced techniques, and point out future challenges for SM retrieval.
The remainder of this paper is organized as follows. Section 1 introduces the significance of improving SM products and the two main original SM data acquisition methods. Section 2 provides a comprehensive and systematic review of the methods for improving the quality of both ground- and satellite-observed SM products. The principles, advantages, and limitations of these methods are presented. Section 3 presents the application fields of SM products. Section 4 presents future prospects for advancing global SM products, and Section 5 concludes the article.
Currently, there are two main data acquisition methods:
(1) Point-scale original data acquisition: in situ measurements
Recognizing the scientific significance and application value of SM, the Soviet Union and Mongolia began recording ground SM with monitoring sensors in the 1950s, retrieving national soil water content through observation networks [14][15][16]. In situ measurements can conveniently monitor SM at precise sites and depths and at hourly or sub-hourly frequencies.
Both the sensors and networks are easily accessible and affordable. However, as various institutes have different research objectives, each SM network has its own station density, observation frequency, monitoring depth, sensor type, spatial coverage, and temporal period. SM can be expressed as a gravimetric unit (g/cm³), volumetric unit (m³/m³), or a function of the field capacity according to usage habits [17]. Every SM network has its own method of sharing data, usually through a website in its own language. Therefore, it is difficult for researchers to derive SM records from different observation networks.
To address these difficulties, the International Soil Moisture Network (ISMN, https://ismn.geo.tuwien.ac.at/en/, accessed on 31 July 2022) serves as a centralized data hosting facility for global in situ SM measurements [5,18,19]. This platform was initiated to collect global SM from operational networks and validation campaigns, standardize the techniques and protocols, and make the data available to users. Currently (June 2022), the ISMN has collected and made publicly available data from 73 networks and more than 2800 stations located in Europe, North America, South America, Asia, Africa, Australia, and Oceania. In addition to SM, the ISMN also integrates and provides SM-related meteorological variables, such as soil temperature and precipitation, which serve as critical supplementary references for the comprehensive analysis of soil water evolution characteristics. Currently, the ISMN is an increasingly popular data source for studies focused on SM validation worldwide [14,20-26]. With continuous network expansion and data updates, the ISMN has become an active and well-acknowledged global-scale SM ground observation database. Additionally, the National Soil Moisture Network has been established in the contiguous United States. It comprises 24 networks, and the SM data are retrieved in a timely manner with a one-day latency (http://nationalsoilmoisture.com/, accessed on 31 July 2022).
However, despite the increasingly standardized and abundant in situ measurements, it is still difficult for point-scale data to represent large-area SM conditions. Limited time and space coverage greatly restrict the application of in situ measurements in large-scale, long-term scientific studies and explorations. As a result, in situ measurements usually serve as a crucial reference for the evaluation of multi-scale SM estimates.
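To make the role of in situ measurements as an evaluation reference concrete, the following Python sketch computes a few agreement statistics between an SM estimate and co-located station observations. The choice of metrics (bias, RMSE, unbiased RMSE, and Pearson correlation), the function name, and the sample values are assumptions made for illustration and are not prescribed by the studies cited above.

```python
import math


def validation_metrics(estimates, observations):
    """Agreement between SM estimates and in situ observations (both in m3/m3).

    Pairs in which either value is None (a gap) are skipped.
    The metric set is an illustrative assumption, not a fixed standard.
    """
    pairs = [(e, o) for e, o in zip(estimates, observations)
             if e is not None and o is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two valid pairs")
    mean_e = sum(e for e, _ in pairs) / n
    mean_o = sum(o for _, o in pairs) / n
    bias = mean_e - mean_o
    rmse = math.sqrt(sum((e - o) ** 2 for e, o in pairs) / n)
    # Unbiased RMSE removes the contribution of the mean offset.
    ubrmse = math.sqrt(max(rmse ** 2 - bias ** 2, 0.0))
    cov = sum((e - mean_e) * (o - mean_o) for e, o in pairs)
    var_e = sum((e - mean_e) ** 2 for e, _ in pairs)
    var_o = sum((o - mean_o) ** 2 for _, o in pairs)
    # Correlation is undefined when either series is constant.
    r = cov / math.sqrt(var_e * var_o) if var_e > 0 and var_o > 0 else float("nan")
    return {"bias": bias, "rmse": rmse, "ubrmse": ubrmse, "r": r}


if __name__ == "__main__":
    satellite = [0.12, 0.22, None, 0.32]   # hypothetical pixel values
    station = [0.10, 0.20, 0.25, 0.30]     # hypothetical station values
    print(validation_metrics(satellite, station))
```

Because a single station represents a point while a satellite pixel represents an area, such statistics are usually interpreted with the scale mismatch in mind.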
(2) Large-scale data acquisition: spaceborne remote-sensing technology
There is an urgent demand for access to near-real-time soil moisture data on a global scale. Since the 1970s, spaceborne remote sensing technology has gradually become a promising approach for obtaining global-scale continuous time-series surface SM data. The abundance of satellite-retrieved soil moisture data provides an unprecedented opportunity to conduct related analyses and applications.
Remotely sensed data from optical, thermal infrared, and microwave bands have been employed to retrieve SM estimates [27]. For optical and thermal infrared remote sensing data, soil surface spectral reflectance characteristics, soil surface emissivity, and surface temperature are mainly used to estimate SM [28]. However, retrieval models are mostly established on the basis of empirical relationships between SM and land surface condition indexes, such as the vegetation condition index [29], normalized difference vegetation index (NDVI) [30], temperature vegetation drought index (TVDI) [31], and soil wetness index [32], and can hardly satisfy large-scale and multi-climate zone applications. In addition, both optical and thermal remote sensing are sensitive to cloudy and rainy weather, dense vegetation coverage, and aerosol optical depth. Optical remote sensing can only measure reflection and emission from approximately the top 1 mm of the land surface. For hydrological and agricultural analyses, SM data at a depth of several centimeters are far more meaningful than data at a mere 1 mm.
In comparison, microwave signals are largely unaffected by rainy and cloudy weather, and their penetration depth can reach 0-5 cm, showing prominent advantages in SM retrieval. Microwave remote sensing technology can be divided into active and passive microwaves based on the working modes of different sensors. Active microwave sensors transmit signals to the detection targets and receive backscattered signals after the interaction between the signals and targets, whereas passive microwave sensors receive signals reflected and emitted from the underlying surface [33][34][35]. Currently, both active and passive microwave signals are employed to derive land surface soil water content. As shown in Table 1, a large number of spaceborne microwave SM products have been retrieved and published in the past half-century. Through their application in various hydrology-related scientific explorations, they have efficiently advanced the understanding of the spatial-temporal evolution characteristics of SM and the mechanism by which SM influences climate change across the globe. In addition to the listed global SM products, there are also studies and programs focused on deriving SM for a certain vegetation cover or climate zone to acquire regional SM with high accuracy [9,36,37].
Specifically, active microwave-derived data have high spatial and low temporal resolution, although they are susceptible to surface roughness and vegetation cover. Comparatively, passive microwave-derived data often have high temporal resolution and low spatial resolution and are less sensitive to surface roughness and vegetation cover. Additionally, both active and passive microwaves suffer from radio-frequency interference (RFI) [68,69]. Direct broadcast and communication satellites cause considerable RFI above the microwave band, which can be a critical reason for outliers and gap regions in satellite-retrieved SM products [70,71].
Basically, all single spaceborne microwave SM retrievals have large gap regions induced by RFI, dense vegetation coverage, ice cover, and the relative motion between the satellite orbit and Earth's rotation [62], which seriously impedes their spatiotemporal integrity.
Despite the enormous number of multi-source SM products mentioned above, scientific explorations and experiments pursuing high quality are ongoing. Attempts have mainly focused on improving the completeness, spatial representativeness, spatial resolution, and accuracy of currently accessible SM retrievals. Therefore, this review aims to provide an auxiliary reference for readers to understand the history and emerging trends of global SM retrieval methods.

Statistical Model
A statistical model can be established based on the significant statistical or empirical relationship between SM and land surface elements (such as surface temperature, vegetation index, evapotranspiration (ET), and albedo). These convenient and simple statistical models have been widely employed since inception and are mainly used for regional SM gap-filling and downscaling in terms of different research emphases [36,72-75]. Because of the variable coupling relationship along with various underlying surface hydrothermal features, the statistical model always has inter-regional applicability limitations. Furthermore, it is difficult to ensure the robustness and accuracy of statistical model-derived large-scale results.

Triangular (Tri)-Based Method
The Tri-based method can provide nonlinear solutions for SM estimation. Among the various statistical models, the Tri-based method is a classic method that estimates SM based on its close coupling relationship with land surface temperature (LST) and vegetation conditions [76][77][78]. Sandholt et al. [79] proposed a triangular feature space constructed using the LST and NDVI. The wet edge is composed of the lowest LST under different vegetation conditions, which indicates the maximum humidity. The dry edge, which indicates the minimum surface ET, is formed by the scatter of the highest LST under different NDVI values. As shown in Figure 1, if vegetation cover in a certain region ranges from bare soil to dense coverage and SM ranges from extreme drought to extreme humidity, the NDVI-LST scatter diagram is triangular in shape. A drought index, referred to as the TVDI, was defined and tightly linked to SM [80]. Then, a method was suggested to simulate SM using the combination of LST and NDVI based on the triangular feature space of TVDI. The Tri method equations are as follows.
$$\mathrm{SM} = \sum_{i}\sum_{j} a_{ij}\,(\mathrm{NDVI}^{*})^{i}\,(\mathrm{LST}^{*})^{j}$$
where $a_{ij}$ is the regression coefficient of each term in the polynomial, which is calculated using multiple regression. $\mathrm{LST}^{*}$ is calculated as follows:
$$\mathrm{LST}^{*} = \frac{\mathrm{LST} - \mathrm{LST}_{min}}{\mathrm{LST}_{max}(\mathrm{NDVI}) - \mathrm{LST}_{min}}$$
where $\mathrm{LST}_{max}(\mathrm{NDVI})$ and $\mathrm{LST}_{min}$ are the maximum and minimum values of the LST dataset calculated from NDVI, respectively. $\mathrm{NDVI}^{*}$ is calculated as follows:
$$\mathrm{NDVI}^{*} = \frac{\mathrm{NDVI} - \mathrm{NDVI}_{min}}{\mathrm{NDVI}_{max} - \mathrm{NDVI}_{min}}$$
where $\mathrm{NDVI}_{max}$ and $\mathrm{NDVI}_{min}$ are the maximum and minimum values, respectively, of the NDVI dataset.
Zhao et al. [36] systematically tested the performances of different vegetation indexes in the Tri model through a case study in the northeastern part of the Tibetan Plateau. The results demonstrated the advantage of NDVI in constructing the Tri model. The SM estimated by the NDVI-based model showed higher accuracy than that estimated by models constructed from the enhanced vegetation index (EVI) and soil-adjusted vegetation index (SAVI).
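The polynomial form of the Tri method can be illustrated with a short Python sketch that scales NDVI and LST to the range 0 to 1 and then evaluates a polynomial in the two scaled variables. The function names, the coefficient matrix, and the minimum and maximum values below are hypothetical; in practice, the coefficients would be obtained by multiple regression against reference SM.

```python
def normalize(value, vmin, vmax):
    """Scale a value to [0, 1] given the dataset minimum and maximum."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    return (value - vmin) / (vmax - vmin)


def tri_soil_moisture(ndvi_star, lst_star, coeffs):
    """Evaluate SM = sum_i sum_j a_ij * NDVI*^i * LST*^j.

    coeffs[i][j] holds a_ij; the values passed in are assumed to come
    from a regression that is not shown here.
    """
    return sum(a_ij * ndvi_star ** i * lst_star ** j
               for i, row in enumerate(coeffs)
               for j, a_ij in enumerate(row))


if __name__ == "__main__":
    # Hypothetical first-order coefficients, for illustration only.
    coeffs = [[0.40, -0.30],
              [0.10, 0.00]]
    ndvi_star = normalize(0.5, 0.1, 0.9)        # NDVI_min and NDVI_max assumed
    lst_star = normalize(300.0, 290.0, 310.0)   # LST_min and LST_max assumed, in K
    print(round(tri_soil_moisture(ndvi_star, lst_star, coeffs), 3))
```

A higher polynomial order only requires a larger coefficient matrix; the evaluation code is unchanged.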
Many studies have attempted to estimate SM using the Tri method. The LST and NDVI datasets were acquired from high-resolution, remotely sensed products, and the established model could be effectively employed to improve the coarse-resolution SM [72,81-83]. Additionally, the Tri model neither requires any ancillary atmospheric data nor is it sensitive to atmospheric parameters. In general, this method is appropriate for flat regions with moderate vegetation coverage because NDVI is easily saturated in densely vegetated areas such as forests. This solution tends to exhibit better performance in regions with a single climate type and minimal artificial interference. Additionally, sufficient pixels are necessary to construct the "universal" triangular feature space. Sufficient pixels are also crucial for the accurate identification of wet and dry edges.
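The identification of the dry and wet edges can be sketched as follows. This is a minimal illustration, assuming that both edges are approximated by straight lines fitted to the hottest and coldest pixels within NDVI bins; the binning scheme, the number of bins, and all function names are assumptions rather than the procedure of any cited study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def fit_edges(ndvi, lst, n_bins=10):
    """Fit the dry edge (highest LST per NDVI bin) and wet edge (lowest LST)."""
    lo, hi = min(ndvi), max(ndvi)
    if hi == lo:
        raise ValueError("NDVI values must span a range")
    width = (hi - lo) / n_bins
    bins = {}
    for v, t in zip(ndvi, lst):
        k = min(int((v - lo) / width), n_bins - 1)
        bins.setdefault(k, []).append((t, v))
    hottest = [max(pixels) for pixels in bins.values()]
    coldest = [min(pixels) for pixels in bins.values()]
    dry = fit_line([v for _, v in hottest], [t for t, _ in hottest])
    wet = fit_line([v for _, v in coldest], [t for t, _ in coldest])
    return dry, wet


def tvdi(ndvi_value, lst_value, dry, wet):
    """TVDI = (LST - LST_min) / (LST_max - LST_min), edges evaluated at the pixel NDVI."""
    lst_max = dry[0] * ndvi_value + dry[1]
    lst_min = wet[0] * ndvi_value + wet[1]
    return (lst_value - lst_min) / (lst_max - lst_min)


if __name__ == "__main__":
    # Synthetic scene: dry edge LST = 320 - 20 * NDVI, wet edge LST = 290 (K).
    ndvi = [k / 10 for k in range(10) for _ in range(3)]
    lst = [t for k in range(10) for t in (320.0 - 2.0 * k, 290.0, 300.0)]
    dry, wet = fit_edges(ndvi, lst)
    print(dry, wet, tvdi(0.5, 300.0, dry, wet))
```

With few pixels, several bins are empty or contain a single pixel, and the fitted edges become unstable, which is one way to see why a sufficient number of pixels is needed.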
Apart from the classic vegetation and temperature combination, there are new approaches to parameterizing the Tri model. Shafian et al. [84] used thermal data and ground cover from Landsat imagery to establish the feature space to retrieve a perpendicular soil moisture index, which reduced the expense and complexity of the SM estimation. Sun [85] proposed a two-stage trapezoid to construct a feature space. This approach was established based on the theory that the vegetated surface temperature should vary after the bare soil surface temperature, as vegetation can absorb water from a deep soil layer to maintain transpiration. In addition, this two-stage method explicitly expresses the evolution of the feature space from a triangular to trapezoidal form.

Disaggregation Based on Physical and Theoretical Scale Change (DISPATCH) Algorithm
DISPATCH is another well-known and widely used algorithm capable of improving the spatial resolution of surface SM [86][87][88][89][90]. This approach was developed based on the tight interaction between surface SM and LST during the ET process. The DISPATCH method equation is a first-order Taylor series expansion and is expressed as follows [91]:
$$\mathrm{SM}_{HR} = \mathrm{SM}_{LR} + \left(\frac{\partial \mathrm{SM}}{\partial \mathrm{SEE}}\right)_{LR}\left(\mathrm{SEE}_{HR} - \mathrm{SEE}_{LR}\right)$$
$$\mathrm{SEE} = \frac{\mathrm{ST}_{max} - \mathrm{ST}}{\mathrm{ST}_{max} - \mathrm{ST}_{min}}$$
where the subscripts HR and LR denote high and low resolution, SEE is the soil evaporative efficiency, and ST is the surface soil temperature. $\mathrm{ST}_{max}$ and $\mathrm{ST}_{min}$ correspond to the SM under extremely dry (SEE = 0) and extremely humid (SEE = 1) conditions, respectively. All ST were derived from the linear decomposition of LST into soil and vegetation using the following equation:
$$\mathrm{ST} = \frac{\mathrm{LST} - P_{veg}\,T_{veg}}{1 - P_{veg}}$$
where $P_{veg}$ is the fractional vegetation coverage, and $T_{veg}$ is the vegetation temperature. Merlin et al. [92] first proposed this algorithm and successfully disaggregated SMOS SM from 40 to 1 km with favorable accuracy. Then, they conducted a case study using DISPATCH to downscale SMOS SM in southeastern Australia [93]. This study found that the quality of the disaggregated product was good in summer and poor in winter. In addition, the coupling between SM and LST in semi-arid areas was evidently stronger than that in temperate zones, and both vegetation coverage and vegetation water stress could influence ST retrieval. Hence, it is suggested that DISPATCH could perform better in low-vegetated semi-arid areas than in densely vegetated temperate regions. To enhance the disaggregation accuracy, Merlin et al. [94] designed a yearly SEE self-calibration model that could effectively make the DISPATCH algorithm more robust. This study proved the competence of DISPATCH in multi-scale SM downscaling through an evaluation study at 3 km and 100 m resolution in Spain. To extend the applicability of the DISPATCH approach, Ojha et al. [91] used TVDI instead of SEE in their model to include more densely vegetated areas.
The results showed that the adoption of TVDI obviously increased the coverage percentage of the case study region, and the downscaled SM from the EVI-derived model displayed a higher correlation against in situ measurements than the one from the NDVI-derived model over vegetated areas. Apart from disaggregation, DISPATCH can also be utilized for coarse-resolution SM product evaluation [95].
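The three steps of the DISPATCH idea (decomposing LST to obtain ST, converting ST to an evaporative efficiency, and applying a first-order correction to the coarse SM) can be sketched in Python as follows. This is a simplified illustration: the function names and sample values are hypothetical, the low-resolution SEE is assumed to equal the mean of the high-resolution SEE values inside the coarse pixel, and the derivative of SM with respect to SEE is passed in as a given number rather than computed from an SEE model.

```python
def soil_temperature(lst, p_veg, t_veg):
    """Invert LST = p_veg * T_veg + (1 - p_veg) * ST for the soil temperature ST."""
    if not 0.0 <= p_veg < 1.0:
        raise ValueError("p_veg must lie in [0, 1)")
    return (lst - p_veg * t_veg) / (1.0 - p_veg)


def soil_evaporative_efficiency(st, st_min, st_max):
    """SEE is 0 at ST_max (extremely dry) and 1 at ST_min (extremely humid)."""
    see = (st_max - st) / (st_max - st_min)
    return min(max(see, 0.0), 1.0)   # clip to the physical range


def dispatch_downscale(sm_coarse, see_fine, dsm_dsee):
    """First-order expansion around the coarse-pixel state.

    see_fine: SEE of every fine pixel inside one coarse pixel.
    dsm_dsee: assumed derivative of SM with respect to SEE at coarse scale.
    """
    see_mean = sum(see_fine) / len(see_fine)
    return [sm_coarse + dsm_dsee * (see - see_mean) for see in see_fine]


if __name__ == "__main__":
    st = soil_temperature(lst=300.0, p_veg=0.5, t_veg=295.0)             # K
    see = soil_evaporative_efficiency(st, st_min=295.0, st_max=315.0)
    print(st, see)
    print(dispatch_downscale(0.20, [0.2, 0.5, 0.8], dsm_dsee=0.1))
```

By construction, the mean of the downscaled values equals the coarse-pixel SM, so the disaggregation redistributes moisture within the pixel without changing its average.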

Data Fusion
The data fusion method integrates multi-source remotely sensed data to produce SM estimations with higher accuracy, completeness, and reliability than those retrieved from a single satellite information source. By fusing remote sensing information from multiple bands, sensor working modes, and transit times, the quality of SM, including data accuracy, spatial coverage rate, temporal scope, and day-scale representativeness, can be efficiently improved. The Essential Climate Variable Soil Moisture (ECV SM), Soil Moisture Operational Product System (SMOPS), and Soil Moisture Active Passive (SMAP) are three well-known SM products that fuse information from multiple microwave sources. Because of their high performance in depicting soil water content conditions, they have received considerable attention since their inception.
(1) ECV SM
The ESA launched the ECV program, also known as the Climate Change Initiative, in 2010 to monitor global climate evolution tendencies, and SM was recognized as an ECV at the same time. The ECV SM, with global coverage, a 0.25° pixel size, and daily temporal resolution, was derived from the fusion of numerous satellite-based microwave products [96]. There are 13 versions available to the public to date, each updated with new sensors and an extended time series (https://esa-soilmoisture-cci.org/, accessed on 31 July 2022). Currently, the latest one is v07.1, which spans over 40 years.
The merging scheme of the ECV SM is described as follows: First, all the sensor retrievals are unified to a 0.25° grid and daily time stamps (00:00 UTC) through a Hamming window method and a nearest neighbor search. Then, the active estimation is retrieved using the TU Wien Water Retrieval Package, which is a change detection method to derive SM, as well as the official method to retrieve ASCAT L2 SM products [97]. Passive estimation is generated through the land parameter retrieval model, which is a forward model based on the radiative transfer model and has the advantages of good frequency compatibility and an analytical solution for vegetation optical depth [98]. The Global Land Data Assimilation System (GLDAS) Noah 2.1 was used for the active-passive combined estimation by offering a consistent climatology. The combined SM was finally derived through GLDAS Noah-based scaling, error characterization, and merging of each microwave sensor product. For more details about the merging algorithms of ECV SM and their evolutionary history, readers are referred to [99].
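The scaling and merging steps can be illustrated with a deliberately simplified Python sketch. It is not the operational ECV SM algorithm, which is documented in [99]: here, scaling is replaced by matching the mean and standard deviation of a reference series, and merging is replaced by an inverse-error-variance weighted average. All function names, the error variances, and the sample values are hypothetical.

```python
import statistics


def rescale_to_reference(series, reference):
    """Linearly rescale a series to the mean and standard deviation of a reference.

    Stand-in for climatology-based scaling; a simplification chosen for brevity.
    """
    mean_s, std_s = statistics.mean(series), statistics.pstdev(series)
    mean_r, std_r = statistics.mean(reference), statistics.pstdev(reference)
    if std_s == 0:
        raise ValueError("series has no variability to rescale")
    return [mean_r + (x - mean_s) * std_r / std_s for x in series]


def merge_weighted(values, error_variances):
    """Inverse-error-variance weighted mean of co-located retrievals.

    A value of None marks a sensor with no retrieval; it is skipped, so the
    merged product has a gap only when every sensor has a gap.
    """
    pairs = [(v, 1.0 / var) for v, var in zip(values, error_variances)
             if v is not None]
    if not pairs:
        return None
    total_weight = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total_weight


if __name__ == "__main__":
    sensor = [0.10, 0.20, 0.30]         # hypothetical sensor retrievals
    model = [0.20, 0.40, 0.60]          # hypothetical reference climatology
    print(rescale_to_reference(sensor, model))
    print(merge_weighted([0.20, 0.30], [0.01, 0.03]))   # weights 3:1
    print(merge_weighted([None, 0.30], [0.01, 0.03]))   # only one sensor available
```

The second function also shows why merged products have fewer gaps than single-sensor products: a pixel remains empty only when no contributing sensor provides a value.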
A number of studies have comprehensively and systematically evaluated the performance of ECV SM and almost consistently concluded that: (1) ECV SM agrees well with both ground observations and reanalysis products [100][101][102]; (2) the accuracy and robustness of ECV SM have steadily improved with each version update [99,103]; (3) combined products are superior to the corresponding active and passive products [103,104]; and (4) the spatiotemporal integrity and accuracy of the combined ECV SM are similar to or better than those of each single microwave sensor retrieval [22,24].
(2) SMOPS
Although the ECV SM reveals a favorable capability in depicting land surface soil humidity conditions, the prevalent gap regions still hinder its spatial coverage integrity. The NOAA initiated the SMOPS program in 2012, which is dedicated to creating a global seamless SM product from accessible microwave satellite observations [105]. The first version of SMOPS blended SMOS, ASCAT MetOp-A, and WindSat/Coriolis and generated 6 h and daily SM products simultaneously. In 2016, the upgraded version 2 product with an extended time series introduced ASCAT MetOp-B and AMSR-2 into the system, and WindSat/Coriolis was excluded. Both SMOPS V1.0 and V2.0 were generated using the single-channel retrieval algorithm, which converts the brightness temperature of a single channel to emissivity [106]. The dielectric constant is then obtained from the emissivity through the Fresnel equation, and the SM estimation is derived from the dielectric constant using a dielectric mixing model. SMOPS V3.0, which contained 6 h and daily (00:00 UTC) SM products with a 0.25° grid, was developed in 2016, and SMAP was added to the blending system [107]. Moreover, a near real-time level-1 brightness temperature, rather than the officially released products, was employed to satisfy the latency requirements.
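The chain from brightness temperature to dielectric constant in a single-channel retrieval can be sketched in Python as follows. The sketch makes strong simplifying assumptions that are not part of the operational algorithm: the surface is smooth and bare (no roughness or vegetation correction), the dielectric constant is treated as a real number, and horizontal polarization is used. The final step from dielectric constant to SM requires a dielectric mixing model and is not included. Function names and sample values are hypothetical.

```python
import math


def fresnel_reflectivity_h(epsilon, incidence_deg):
    """Smooth-surface Fresnel reflectivity at horizontal polarization.

    epsilon: real relative dielectric constant of the soil (assumed >= 1).
    """
    theta = math.radians(incidence_deg)
    root = math.sqrt(epsilon - math.sin(theta) ** 2)
    return ((math.cos(theta) - root) / (math.cos(theta) + root)) ** 2


def emissivity_from_tb(brightness_temp, surface_temp):
    """Emissivity as the ratio of brightness to physical temperature (both in K)."""
    return brightness_temp / surface_temp


def dielectric_from_emissivity(emissivity, incidence_deg, lo=1.0, hi=80.0):
    """Invert emissivity = 1 - reflectivity for epsilon by bisection.

    Reflectivity increases monotonically with epsilon on [lo, hi],
    so bisection converges to the unique solution in that interval.
    """
    target = 1.0 - emissivity
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if fresnel_reflectivity_h(mid, incidence_deg) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    e = emissivity_from_tb(brightness_temp=240.0, surface_temp=295.0)
    eps = dielectric_from_emissivity(e, incidence_deg=40.0)
    print(round(e, 3), round(eps, 2))
```

Wet soil has a much higher dielectric constant than dry soil, so a lower emissivity maps to a higher dielectric constant and, through the mixing model, to a higher SM.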
SMOPS provides an almost seamless SM across the globe with high spatial coverage, which is a notable advantage compared to most satellite-based SM products. Small gap areas are mainly distributed in frozen (i.e., ice, snow) or dense vegetation-covered regions. Numerous studies have objectively assessed the quality of SMOPS and indicated that: (1) compared to the individual satellite-retrieved SM products, SMOPS exhibits much higher data availability; (2) the accuracy of SMOPS is continuously improved along with updated versions; (3) ECV shows higher accuracy, whereas SMOPS has superior spatial coverage [102,105,108].
(3) SMAP
Considering the merits of the L-band and the fusion of active and passive microwave signals, NASA launched the SMAP program in 2010, utilizing L-band radar and radiometer instruments onboard the same spacecraft to detect surface SM conditions [66,67]. One of the main tasks of SMAP is to acquire an active and passive blended product that advances SM mapping by combining the strengths of both instruments. Radar signals can achieve high pixel resolution; however, they are vulnerable to surface roughness and vegetation, which could significantly influence signal accuracy. In contrast, radiometer signals usually have coarse resolution, but they are sensitive to SM and less sensitive to surface roughness and vegetation. Therefore, the combined SMAP SM was expected to be capable of accurately expressing the surface soil water level at an intermediate resolution. The brightness temperature disaggregation and time-series methods were used in the combination process. First, a linear relationship was established between variations in brightness temperature and radar backscatter using time-series approaches. This relationship was then employed to disaggregate brightness temperature. Finally, SM can be derived from the disaggregated brightness temperature and the corresponding retrieval algorithms. The 9 km combined SM has been validated by many scholars, who found that it agreed well with reference data in the forested region [109]. However, on 7 July 2015, the radar failed irreparably after 3 months of operation. Although the time series of the combined SMAP SM product was only 86 days, it acted as a valuable precedent for SM merging using SMAP retrievals.
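The two steps of the combination process (fitting a linear relationship between brightness temperature and backscatter over time, and then using it to distribute the coarse brightness temperature over finer radar pixels) can be sketched in Python. This is a reduced illustration with a single backscatter channel; any additional terms of the operational algorithm are omitted, and the function names and sample values are hypothetical.

```python
def fit_slope(tb_series, sigma_series):
    """Least-squares slope of coarse brightness temperature (K)
    against coarse radar backscatter (dB) over a time series."""
    n = len(tb_series)
    mean_tb = sum(tb_series) / n
    mean_sigma = sum(sigma_series) / n
    sxx = sum((s - mean_sigma) ** 2 for s in sigma_series)
    if sxx == 0:
        raise ValueError("backscatter series has no variability")
    sxy = sum((s - mean_sigma) * (t - mean_tb)
              for s, t in zip(sigma_series, tb_series))
    return sxy / sxx


def disaggregate_tb(tb_coarse, sigma_fine, beta):
    """Distribute a coarse brightness temperature over fine pixels.

    Each fine pixel deviates from the coarse value in proportion to how far
    its backscatter lies from the coarse-pixel mean backscatter.
    """
    sigma_mean = sum(sigma_fine) / len(sigma_fine)
    return [tb_coarse + beta * (s - sigma_mean) for s in sigma_fine]


if __name__ == "__main__":
    # Hypothetical time series: wetter soil gives higher backscatter and lower TB.
    beta = fit_slope([250.0, 245.0, 240.0], [-15.0, -14.0, -13.0])
    print(beta)
    print(disaggregate_tb(245.0, [-15.0, -14.0, -13.0], beta))
```

The negative slope reflects the opposite responses of the two signals to wetting: emission decreases while backscatter increases.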
Many attempts have been made to renew the mission of generating a high-resolution SMAP SM product, and the signal from the C-band SAR onboard Sentinel-1A/1B has been found to be an adequate substitute for the failed SMAP radar signal. By merging with Sentinel-1A/1B, high-spatial-resolution SM products at 3 and 1 km have been generated. However, the swath width of Sentinel-1A/1B is approximately 250 km, whereas that of SMAP can reach 1000 km. Because of this large difference, the overlapping spatial coverage of SMAP and Sentinel is remarkably reduced, which lengthens the revisit interval from the original 3 days to 12 days. During the fusion process, the resampled 1 km Sentinel-1A/1B backscatter and the 9 km SMAP passive enhanced brightness temperature were input together as original data. The 1 km brightness temperature was obtained using the snapshot retrieval approach [110] on the overlapped area. Then, the high-resolution SM can be retrieved using the tau-omega model [111], together with the brightness temperature and ancillary datasets. For more details about the merging approaches of the SMAP/Sentinel SM product, readers can refer to [112]. Both 1 and 3 km resolution SMAP/Sentinel SM products have been validated against hundreds of in situ measurements, including dense and sparse networks across the globe. These encouraging results suggest that SMAP/Sentinel SM estimations agree well with ground observations, demonstrating their capability to express soil water content with good accuracy and high resolution [112,113].

Assimilation and Reanalysis
The assimilation approach can effectively overcome the limited spatial scope and representativeness of ground observations as well as the depth limitation of spaceborne microwave-derived data, and it can achieve complete multi-depth coverage SM with definite physical meaning. It is efficient for the integration and improvement of SM from multiple independent sources [114]. Hence, spatial-temporal continuous SM profile information can be efficiently derived by assimilation systems [115,116]. The assimilation algorithm is an important part of the entire process that connects the observed and predicted data to optimize the estimation values. Commonly used SM assimilation methods include step-by-step correction [117], optimal interpolation [118], variational constraints [119], Kalman filters [120], and particle filters [121,122]. Recent studies note that algorithms derived from filtering (e.g., the ensemble Kalman filter) [123,124] and variational constraints (e.g., four-dimensional variational assimilation) [119,125] perform favorably in estimating model parameters. As the central part of the assimilation process, the land surface model (LSM) simulates the physical processes occurring between the ground and atmosphere in the exchange of matter and energy. Many LSMs, such as Noah [126], the Community Land Model (CLM) [127], the Simple Biosphere Model [128], and the Boreal Ecosystem Productivity Simulator [129], are frequently employed in the assimilation of land surface parameters (including SM). Table 2 shows that many LSM-based SM estimations are released for various hydrometeorological applications. It is worth noting that the spatial extent of many LSM-based retrievals merely covers the specific nation or region to which the development organizations belong, which remarkably restricts their scopes of application.
In comparison, GLDAS, being one of the few global-scale assimilation systems, is well acknowledged as an eminent land surface modeling framework to produce optimal fields of land surface states and fluxes in near-real time across the world [130][131][132][133].
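How an assimilation algorithm connects observed and predicted data can be illustrated with the analysis step of a scalar Kalman filter. The Python sketch below assumes that the observation measures the modeled SM directly and that both error variances are known; the function name and the sample numbers are hypothetical, and operational systems use ensemble or variational schemes with many state variables.

```python
def kalman_update(background, background_var, observation, observation_var):
    """Analysis step of a scalar Kalman filter.

    background: model-predicted SM (m3/m3) with error variance background_var.
    observation: observed SM (m3/m3) with error variance observation_var.
    Assumes the observation maps directly onto the state (observation operator = 1).
    Returns (analysis, analysis_var, gain).
    """
    gain = background_var / (background_var + observation_var)
    analysis = background + gain * (observation - background)
    analysis_var = (1.0 - gain) * background_var
    return analysis, analysis_var, gain


if __name__ == "__main__":
    # Equal uncertainties: the analysis lies halfway between model and observation.
    print(kalman_update(0.20, 0.0004, 0.30, 0.0004))
    # A noisier observation pulls the analysis less strongly.
    print(kalman_update(0.20, 0.0004, 0.30, 0.0036))
```

The gain weights the observation by the relative confidence in the model and the measurement, and the analysis variance is always smaller than the background variance, which is the formal sense in which assimilation optimizes the estimate.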
The SM profile information can also be retrieved from reanalysis approaches. The reanalysis process takes all available observations (i.e., ground- and spaceborne-based datasets) to calibrate the results from model running, whereas the assimilation process refers specifically to adding observation data for correction when the physical model is running. Many reanalysis retrievals have been released to simulate the global SM profile information (Table 2). ERA5 has attracted extensive attention since its advent as a fifth-generation reanalysis product of ECMWF. ERA5 is capable of generating higher spatial resolution (9 km) and temporal resolution (1 h for every atmospheric variable) retrievals than other reanalysis systems. In addition, it uses more of the available satellite-based observations to optimize the output results. Previous studies have revealed that the ERA5 SM exhibits higher skill than the other reanalysis products and a significant improvement over its predecessor [134], which may imply a promising application prospect for the ERA5 SM.

Machine Learning
Recently, machine learning techniques have demonstrated great potential for simulating patterns and gaining insights into Earth's systems from scientific data. Machine-learning-based approaches exhibit notable competence in the simulation of nonlinear complex mapping relationships, such as those governing SM. Machine learning algorithms are currently employed in SM estimation studies [151,152]. In terms of the different scale transition processes, the simulation can be divided into gap filling, downscaling, and upscaling (Figure 2). Gap filling involves no scale transition during the entire simulation process, and the output estimations are dedicated to filling the gaps in the original SM products to improve spatial completeness. Great efforts have been made in downscaling to acquire high pixel resolution SM estimations, which can depict regional SM spatial heterogeneity in detail and then be applied in the agricultural sector at the field scale. Comparatively, upscaling is usually dedicated to transferring point-scale in situ measurements to pixel-scale estimations, retrieving spatially continuous and representative SM products. Table 3 introduces the application of machine-learning methods to improve the performance of SM products. Meanwhile, the increasing number of published papers indicates that machine-learning-based SM research is becoming an increasingly active topic.
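Gap filling can be illustrated with a small regression example in Python. The sketch uses a K-nearest neighbor regressor only because it fits in a few lines of standard-library code; it is not the method of any particular study. The explanatory variables are assumed to be already scaled to comparable ranges, and the function names, features, and values are hypothetical.

```python
import math


def knn_predict(train_features, train_targets, query, k=3):
    """Average the targets of the k training samples closest to the query."""
    ranked = sorted((math.dist(f, query), t)
                    for f, t in zip(train_features, train_targets))
    nearest = ranked[:k]
    return sum(t for _, t in nearest) / len(nearest)


def fill_gaps(features, sm, k=3):
    """Fill None entries of sm using pixels where SM is available.

    features: one tuple of explanatory variables per pixel
              (for example scaled NDVI and scaled LST; an assumption here).
    sm: SM per pixel in m3/m3, with None marking a gap.
    """
    known = [(f, v) for f, v in zip(features, sm) if v is not None]
    if not known:
        raise ValueError("no valid SM values to learn from")
    train_x = [f for f, _ in known]
    train_y = [v for _, v in known]
    return [v if v is not None else knn_predict(train_x, train_y, f, k)
            for f, v in zip(features, sm)]


if __name__ == "__main__":
    features = [(0.10, 0.90), (0.20, 0.80), (0.80, 0.20),
                (0.90, 0.10), (0.15, 0.85)]
    sm = [0.10, 0.12, 0.30, 0.32, None]    # last pixel is a gap
    print(fill_gaps(features, sm, k=2))
```

The same train-then-predict pattern underlies downscaling, where the model would be trained on coarse-resolution samples and applied to fine-resolution explanatory variables, under the assumption that the learned relationship holds across scales.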

Traditional Machine Learning
Because machine learning methods simulate nonlinear and complex relationships better than traditional statistical regression methods, considerable attention has been devoted to using them to enhance SM products [7]. As shown in Table 3, several approaches, such as artificial neural networks (ANN), Bayesian methods, classification and regression trees (CART), extreme gradient boosting (XGB), gradient boosting decision trees (GBDT), K-nearest neighbor (KNN), random forest (RF), and support vector machine (SVM), are employed for both regional and global SM mapping [9,14,17,152-155]. Liu et al. [14] systematically compared the performance of six traditional machine learning approaches in surface SM downscaling from 0.25° to 1 km in four case study areas with different climates and land cover types. The results showed that the multi-regression tree-based RF achieved high performance with high goodness of fit and low regression bias, whereas the downscaled data from the ANN, CART, and SVM models occasionally showed abnormal values. Among the different case study regions, those located in a single climate zone, with mild topographic variation and medium vegetation coverage, tended to produce high-accuracy results. The contribution of each explanatory variable varied remarkably across the case study regions owing to their diverse and complex hydrothermal and physical geographical conditions. On this basis, Liu et al. [154] further explored the capability of multiple regression tree-based machine learning algorithms to clarify their characteristics in multi-scale surface SM disaggregation. An inter-comparison among RF, GBDT, XGB, and CART suggested that GBDT produced the best result in grasslands, with a high correlation coefficient and low error, and that both RF and XGB also achieved favorable performances.
Additionally, XGB was applied in multi-layer high-resolution SM estimation over the United States, and the downscaled SM favorably captured the temporal dynamics of in situ measurements with high accuracy [156]. The RF model was employed in a spatiotemporally continuous surface SM downscaling process at a field scale of 30 m resolution and displayed good performance in generating accurate SM estimations [9]. The GBDT algorithm was used for SM downscaling over the Tibetan Plateau and effectively improved the resolution of the SMAP SM from 36 to 1 km. The high-resolution SM preserved the accuracy of the original SMAP product while expressing detailed spatial SM variability [157]. Apart from the abovementioned studies, many others have used multi-regression tree-derived machine learning methods to improve the resolution and spatial-temporal continuity of SM [65,158-161]. In general, great efforts have been made to clarify the performance of each member of the large machine-learning family in simulating SM across various underlying surfaces. Among the numerous methodologies, multi-regression tree-derived approaches, such as RF, XGB, and GBDT, have revealed favorable capabilities in simulating and reconstructing SM products with good accuracy and goodness of fit. This finding provides important guidance for the selection of machine learning methods in SM regression. Feature extraction, as a critical pre-processing step, can be very important in decreasing dimensionality and redundancy, increasing learning accuracy, and improving the understandability of results. However, in traditional machine-learning algorithms, feature extraction and model training are two separate processes. The extracted features are used directly in subsequent calculations without any feedback adjustment, which results in error propagation.
Under the joint action of climatic and human factors, the pattern of SM presents spatial-temporal distribution regularities. Classical machine learning methods only support the input of sample data in the form of discretization and rarely exploit the spatial-temporal dependencies of samples [162].
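The regression-based downscaling workflow described in this subsection can be sketched as follows: a model is trained at the coarse scale on predictors aggregated to the coarse pixels, and then applied to the same predictors at the fine scale. In this minimal sketch a one-variable least-squares line stands in for RF, GBDT, or XGB, and the final shift that makes each fine-scale block average to its coarse value is an added assumption, not a step taken from the cited studies. All names and numbers are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x (stand-in for RF/GBDT/XGB)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def downscale(coarse_sm, coarse_pred, fine_pred_by_pixel):
    """Train at the coarse scale, predict at the fine scale.

    coarse_sm[i]          : SM of coarse pixel i (m3/m3)
    coarse_pred[i]        : predictor aggregated to coarse pixel i
    fine_pred_by_pixel[i] : fine-scale predictor values inside coarse pixel i
    """
    a, b = fit_line(coarse_pred, coarse_sm)
    fine_sm = []
    for sm_c, fine_pred in zip(coarse_sm, fine_pred_by_pixel):
        raw = [a + b * p for p in fine_pred]
        # Assumed correction: shift so the fine-scale mean equals the coarse value.
        shift = sm_c - mean(raw)
        fine_sm.append([v + shift for v in raw])
    return fine_sm

# Hypothetical inputs: three coarse pixels, two fine pixels inside each.
coarse_sm = [0.10, 0.20, 0.30]
coarse_pred = [0.2, 0.4, 0.6]
fine_pred = [[0.1, 0.3], [0.3, 0.5], [0.5, 0.7]]
fine_sm = downscale(coarse_sm, coarse_pred, fine_pred)
```

Each fine-scale value is predicted independently of its neighbours, which illustrates the limitation noted above: sample-wise regressors do not exploit spatial-temporal dependencies.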

Deep Learning
In comparison, deep learning techniques construct multi-layer neural networks that loosely simulate the mechanism of the human brain, automatically extract the spatial-temporal features of data, and then conduct spatial-temporal modeling and prediction based on deep understanding and mining [163][164][165]. Deep learning methods learn high-dimensional features much better than classical machine-learning methods. A series of studies and applications have been carried out in the field of spatial data mining using deep learning methods, and good results have been achieved in recent years [162]. Deep learning also shows good potential for texture extraction and reconstruction. As presented in Table 3, many scholars have attempted to retrieve qualified SM estimates through deep learning algorithms, such as convolutional neural networks (CNN), gated recurrent units (GRU), long short-term memory (LSTM), deep feedforward neural networks (DFNN), and H2O models. Liu et al. [166] designed a novel LSTM-based multi-scale scheme for estimating surface SM by integrating remotely sensed data and in situ measurements over the United States. The model learned spatial-temporal patterns directly from in situ measurements, and the derived 9 km SM was more accurate than the 9 km products of the SMAP mission. This upscaling study revealed the significance of ground observations despite the availability of numerous satellite-retrieved products. Li et al. [167] tested the performance of CNN, LSTM, and ConvLSTM (a model integrating the merits of CNN and LSTM) in improving SMAP SM over China. The ERA5 SM information was transferred to SMAP to improve the prediction accuracy. The results showed that ConvLSTM outperformed CNN and LSTM, with a higher goodness of fit and lower error. The transfer-based models were more accurate than the models without transfer learning, except in winter.
ConvLSTM, combined with a physical model, was applied to estimate root-zone SM [168]. The GLDAS SM products were used as prediction data, and the spatiotemporal continuous root-zone SM derived from the physical model and in situ measurements were treated as target data. The estimated SM achieved high fitting coefficients compared with the original GLDAS SM, especially for the deep layers. Zhao et al. [169] investigated the capability of the deep belief network (DBN), improved DBN model, and residual network (ResNet) model in SM downscaling on the Tibetan Plateau. It was shown that the deep learning models had the advantage of fitting detailed SM texture patterns compared with RF. Compared to the DBN models, ResNet displayed an extraordinary ability to learn and simulate SM textures with high robustness.
The results and conclusions of these studies indicate that deep learning methods are suitable for SM simulations. Further, a well-designed deep learning model can outperform RF in SM estimation, suggesting the large potential of deep learning methods for improving the quality of SM. Models that fuse multiple deep learning algorithms usually performed better than single ones. In addition, because the deep learning framework contains many algorithms, more deep learning-based explorations are necessary to determine which algorithms are best suited to SM estimation.
Table 3 (excerpt). Gridded SM estimates with an approximate spatial resolution of 100 m at three networks located in North America. The RF-upscaled SM matches field samples closely and outperforms other common regression methods. The SM retrieved by the deep learning model is more accurate than the SMAP radar and GLDAS SM products.
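To illustrate how a recurrent model such as LSTM, one of the architectures listed above, carries information along an SM time series, the following minimal sketch implements the forward pass of a single scalar LSTM cell. The weight dictionary `w` and the input series are hypothetical and untrained; the sketch shows only the standard gate equations, not the architecture of any cited study.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    """One forward step of a scalar LSTM cell.

    w maps each gate name ('i', 'f', 'o', 'g') to a tuple (w_x, w_h, bias).
    """
    def pre(gate):
        w_x, w_h, b = w[gate]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("i"))      # input gate: how much new information to store
    f = sigmoid(pre("f"))      # forget gate: how much old memory to keep
    o = sigmoid(pre("o"))      # output gate: how much memory to expose
    g = math.tanh(pre("g"))    # candidate cell state
    c = f * c_prev + i * g     # updated cell (memory) state
    h = o * math.tanh(c)       # hidden state passed to the next time step
    return h, c

def run_lstm(series, w):
    """Run the cell over a series, starting from zero hidden and cell states."""
    h, c = 0.0, 0.0
    hidden = []
    for x in series:
        h, c = lstm_step(x, h, c, w)
        hidden.append(h)
    return hidden

# Hypothetical, untrained weights and a short SM series (m3/m3).
weights = {"i": (1.0, 0.5, 0.0), "f": (0.5, 0.5, 1.0),
           "o": (1.0, 0.5, 0.0), "g": (2.0, 0.5, 0.0)}
hidden_states = run_lstm([0.20, 0.22, 0.35, 0.30], weights)
```

The forget gate is what lets the cell retain earlier wetting or drying signals, which is the property that makes this family attractive for SM series.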

Applications
SM is a sensitive component of the Earth system that interacts with the atmosphere and Earth's surface at every moment. Although the in situ measured SM can precisely reflect the soil water content, the confined extent and point-scale value remarkably restrict its applicability. Moreover, the original remotely sensed SM can hardly provide high-resolution and spatial-temporal continuous SM records because of the inherent limitations of spaceborne microwave sensors. Comparatively, advanced SM products provide unprecedented opportunities for deriving datasets with improved spatial coverage, multi-depth information, high resolution, and extended time sequence from the 1950s to future scenarios. These multi-model improved SM products are broadly applied to advance the understanding of Earth system processes, which mainly include drought monitoring, climate change, hydrology, and ecology.

Drought Monitoring
Drought is usually induced by a deficiency of precipitation and excess ET, which jointly cause varying degrees of decline in SM. As drought can seriously affect crop growth and yield, agricultural departments have always attached great importance to real-time drought monitoring. Therefore, a wide variety of studies have explored the potential of SM for drought monitoring. First, in regions renowned for their advanced crop production industries, more ground stations tend to be placed in cropland when SM networks are established [18,179,180]. This arrangement reflects the emphasis placed on cultivation-related drought monitoring through the real-time acquisition of multi-depth SM recordings. Second, in regional- or national-scale drought forecasting studies, both in situ measurements and raster SM estimates are employed simultaneously to ensure data accuracy and spatial coverage [181][182][183]. Third, coarse-resolution SM products, retrieved from spaceborne sensors or LSMs, are mainly utilized for depicting large-scale (i.e., continental, global) drought characteristics [16,184]. In these studies, SM and related auxiliary variables, such as vegetation fraction, temperature, and precipitation, were used together in drought applications. These variables are jointly converted to representative indices, such as the SM drought index [183], soil water deficit index [182], SM use efficiency [184], perpendicular drought index [16], modified perpendicular drought index [181], and enhanced combined drought index [185], to comprehensively indicate the duration, trend, intensity, and severity of drought conditions.
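As an example of how SM is converted into a drought index, the sketch below computes one commonly used formulation of a soil water deficit index, which scales the departure of SM from field capacity by the available water capacity. Whether [182] uses exactly this form is an assumption, and the field capacity and wilting point values are hypothetical.

```python
def soil_water_deficit_index(theta, theta_fc, theta_wp):
    """Soil water deficit index in one common formulation.

    theta    : current volumetric SM (m3/m3)
    theta_fc : SM at field capacity (m3/m3)
    theta_wp : SM at wilting point (m3/m3)
    Returns 0 at field capacity and -10 at the wilting point.
    """
    if theta_fc <= theta_wp:
        raise ValueError("field capacity must exceed wilting point")
    return 10.0 * (theta - theta_fc) / (theta_fc - theta_wp)

# Hypothetical soil with field capacity 0.30 and wilting point 0.10 m3/m3.
swdi = soil_water_deficit_index(0.20, theta_fc=0.30, theta_wp=0.10)
```

Negative values indicate that SM has fallen below field capacity; the more negative the index, the less plant-available water remains.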

Climate Change
The Sixth Assessment Report of the Intergovernmental Panel on Climate Change was released in 2021 [186]. It delivered an unequivocal warning of unprecedented warming trends and increasingly frequent extreme weather events. Because all components of the climate system constantly interact with one another, the spatial and temporal patterns of SM result from the combined actions of all of them. Consequently, SM products based on spaceborne sensors and LSMs have been widely used in climate-variability experiments and analyses. Dorigo et al. [187] evaluated the global trend in harmonized multi-satellite surface SM from 1988 to 2010 and found drying and wetting trends in different regions. Qiu et al. [188] compared the performance of satellite- and reanalysis-based SM products. The two types of products exhibited coincident patterns in non-irrigated areas, and the discrepancies between them were mainly induced by human interference such as irrigation and harvest. On the basis of ECV SM v4.2, Pan et al. [189] conducted seasonal and annual scale analyses, and the results revealed that "wet seasons get wetter, and dry seasons get dryer," indicating a gradual tendency toward extremes. In addition to analyzing the evolutionary features of SM, integrated climate variability studies were carried out on the interactions and feedbacks between ET, temperature, precipitation, and SM [190][191][192].
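Seasonal trend analyses of this kind can be sketched with a robust slope estimator applied to each season separately. The example below uses the Theil-Sen estimator (the median of all pairwise slopes); this choice and the synthetic records are assumptions for illustration and are not necessarily the methods or data of the cited studies.

```python
from statistics import median

def theil_sen_slope(years, values):
    """Median of all pairwise slopes; robust to outliers."""
    slopes = [
        (values[j] - values[i]) / (years[j] - years[i])
        for i in range(len(years))
        for j in range(i + 1, len(years))
        if years[j] != years[i]
    ]
    return median(slopes)

def seasonal_trends(records):
    """records: iterable of (year, season, sm). Returns {season: slope per year}."""
    by_season = {}
    for year, season, sm in records:
        by_season.setdefault(season, []).append((year, sm))
    trends = {}
    for season, pairs in by_season.items():
        pairs.sort()
        years = [y for y, _ in pairs]
        values = [v for _, v in pairs]
        trends[season] = theil_sen_slope(years, values)
    return trends

# Synthetic records: the wet season gains and the dry season loses 0.001 m3/m3 per year.
records = [(y, "wet", 0.30 + 0.001 * (y - 2000)) for y in range(2000, 2010)]
records += [(y, "dry", 0.15 - 0.001 * (y - 2000)) for y in range(2000, 2010)]
trends = seasonal_trends(records)
```

Opposite signs for the two seasons correspond to the "wet seasons get wetter, and dry seasons get dryer" pattern; a significance test would be needed before drawing conclusions from real data.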

Hydrology
SM plays an important role in land-atmosphere water circulation and energy balance. It can "remember" anomalous signals from the land-atmosphere system and provide effective feedback to other components of the cycle, such as ET, precipitation, groundwater, and runoff [193]. The Food and Agriculture Organization of the United Nations Irrigation and Drainage Paper No. 56 on crop evapotranspiration listed SM availability as a key factor that could influence crop ET estimation [194]. Allam et al. [195] estimated evaporation over the upper Blue Nile Basin and used least-squares data assimilation methods to estimate soil water storage. SM datasets from the ECV, Climate Prediction Center, and Gravity Recovery and Climate Experiment terrestrial water storage were considered essential inputs during the assimilation procedure. The Global Land Evaporation Amsterdam Model v3 uses SM products retrieved from both spaceborne sensors (ECV and SMOS) and LSM (GLDAS Noah) to estimate terrestrial evaporation [196]. Previous studies have suggested a strong coupling between precipitation and SM [197,198]. By inverting the soil-water balance equation, the SM2RAIN algorithm was developed and used to estimate basin- and global-scale precipitation with satisfactory accuracy using in situ and satellite SM observations [199,200]. Swenson et al. [201] detected groundwater variability using in situ measurements in Oklahoma, U.S., and a time series of groundwater anomalies was successfully acquired after removing SM variability in the unsaturated zone. Additionally, remotely sensed SM has been proven capable of efficiently calibrating groundwater-land surface models [202]. Moreover, it is widely acknowledged that the spatial variability of SM and soil properties may have a dominant and complex impact on runoff as storm size changes [203].
Therefore, multi-source SM products are widely utilized in advancing runoff models to help set the initial conditions and reduce prediction uncertainties [204,205].
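The idea of inverting the soil-water balance to estimate rainfall from SM, as in the SM2RAIN approach mentioned above, can be sketched in a simplified form. The sketch assumes the balance p = Z* ds/dt + a s^b, where s is relative saturation, Z* a water capacity of the soil layer, and a s^b a drainage term, with evapotranspiration and runoff neglected. The parameter values are hypothetical; in practice they are calibrated.

```python
def sm2rain(saturation, dt_hours, z_star, a, b):
    """Estimate rainfall (mm per time step) from a relative saturation series.

    Inverts a simplified soil-water balance, p = Z* ds/dt + a * s**b,
    and sets negative estimates to zero.

    saturation : relative saturation values in [0, 1], equally spaced in time
    dt_hours   : time step in hours
    z_star     : soil water capacity (mm)        (hypothetical, to be calibrated)
    a, b       : drainage parameters (mm/h, -)   (hypothetical, to be calibrated)
    """
    rain = []
    for s_prev, s_now in zip(saturation, saturation[1:]):
        ds_dt = (s_now - s_prev) / dt_hours
        s_mid = 0.5 * (s_prev + s_now)            # saturation at the interval midpoint
        rate = z_star * ds_dt + a * s_mid ** b    # mm/h
        rain.append(max(rate, 0.0) * dt_hours)
    return rain

# Hypothetical hourly series: wetting from 0.2 to 0.4, then drying.
rain_mm = sm2rain([0.2, 0.4, 0.35], dt_hours=1.0, z_star=100.0, a=0.0, b=1.0)
```

With the drainage term switched off (a = 0), rainfall is simply the storage increase during wetting and zero during drying, which makes the role of each term easy to inspect.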

Ecology
SM is a crucial regulator of the basic processes in terrestrial ecosystems. Its variability can remarkably impact the operational patterns of terrestrial ecosystems. SM can directly influence photosynthesis and the net primary productivity (NPP) of ecosystems by affecting the occurrence, intensity, and duration of vegetation water stress [96,206]. In addition, both nitrogen and carbon cycles are tightly linked to soil water movement [207]. Therefore, SM plays a significant role in ecosystem processes. Reich et al. [208] explicitly demonstrated the effect of SM on photosynthesis using in situ measurements. The results suggested that low SM may limit photosynthesis in boreal tree species during the growing season, despite warming temperatures. The impact of drought on NPP variability on a global scale was investigated, and a strong positive relationship between available moisture and NPP in arid and seasonally dry regions was demonstrated [209]. The SM balance was calculated using the Carnegie-Ames-Stanford approach and then converted to a water stress factor to express its impact on the NPP. In addition, dozens of global NPP estimation models have treated multi-depth SM (ranging from 0 to 2.5 m) as an important input parameter [210]. Li et al. [207] analyzed SM and other supplementary datasets from 1980 to 2015 in China's dryland derived from TerraClimate [211]. They found that water and soil conservation projects, such as reforestation, evidently increased the net primary production. However, SM continuously decreased, suggesting that the existing ecosystem was unlikely to be sustained. Satellite-derived SM together with related environmental drivers were employed to analyze the evaporation decline in the U.S. from 1961 to 2014, and a significant evaporation decrease of approximately 6% was detected [212].

Outlook
This study provides a brief introduction to the main types, deriving methodologies, quality-improving techniques, and applications of multi-source SM products. Generally, through development for more than half a century, great contributions and advancements have been made in SM acquisition and employment. However, to persistently enhance the performance and applicability of SM products, there is still a long way to go. Based on this review, we propose the following research priorities for future SM estimations.

Improved Spatial Coverage
Many studies that use SM as a key analysis object rely on seamless products to ensure complete coverage of the study area. Fortunately, assimilation- and reanalysis-based SM estimates have already overcome this problem by drawing on the strength of numerous hydrological models. However, gap regions are prevalent in remotely sensed data. Owing to the limited penetration of microwaves, spaceborne sensors are unable to detect signals in regions that are frozen or covered by dense vegetation (≥5 kg/m²). Nevertheless, it is crucial to obtain spatially and temporally continuous SM over forests, which would enhance the understanding of the mechanisms by which forest structure affects soil water conditions. Forests have a significant impact on water movement in nature as well as on the regulation of SM, precipitation, evaporation, runoff, and hydrological cycles. Unexpected radio frequency interference (RFI) typically results in anomalous values. Moreover, the rotation difference between the satellite and the Earth can result in strip-shaped gap regions. Hence, it is necessary to explore the capability of gap-filling methods (e.g., classical statistical algorithms and artificial intelligence approaches) and determine an adequate method to fill the gap regions of the present products [72,171]. Data fusion is also an effective approach for improving spatial integrity by blending multiple sources of qualified SM information. For example, the multi-source merged ECV and SMOPS SM products show an evidently higher coverage percentage than the single sensor-derived ones [102].
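The simplest classical gap-filling strategy is temporal interpolation at each pixel. The minimal sketch below fills short runs of missing values in an SM time series by linear interpolation between valid neighbours; the `max_gap` limit and the sample series are assumptions for illustration, and long gaps such as those over dense forest would need the model-based methods discussed above.

```python
def fill_gaps_linear(series, max_gap=3):
    """Fill runs of None by linear interpolation between valid neighbours.

    Gaps longer than max_gap steps, and gaps at either end, are left as None.
    """
    filled = list(series)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i                          # first missing index of this gap
        while i < n and filled[i] is None:
            i += 1
        end = i                            # first valid index after the gap, or n
        gap_len = end - start
        if start == 0 or end == n or gap_len > max_gap:
            continue                       # cannot or should not interpolate
        left, right = filled[start - 1], filled[end]
        for k in range(start, end):
            frac = (k - start + 1) / (gap_len + 1)
            filled[k] = left + frac * (right - left)
    return filled

# Hypothetical daily series with a two-day gap and a trailing gap.
filled = fill_gaps_linear([0.20, None, None, 0.26, None])
```

Leaving long and edge gaps untouched keeps the method from inventing values where no nearby observation constrains them.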

Higher Spatial Resolution
Compared to coarse-resolution SM products, fine-resolution SM products are more appropriate for landscape-scale, watershed-scale, and field-scale applications, for instance, hydrological simulation at the scale of drainage basins or SM spatial variability analysis at the field scale. Many studies have been conducted on SM downscaling using statistical models, data fusion, assimilation, and machine-learning algorithms. These works obtained good results by integrating high-resolution ancillary data collections from MODIS, Landsat, and Sentinel [11,14,113,176]. Moreover, machine learning approaches have notable advantages in terms of simplicity, efficiency, and competence. Multi-regression tree-based models were found to reproduce SM accurately at the downscaled resolution; however, these models do not consider spatial texture features. In contrast, the advent of deep learning techniques provides an unparalleled opportunity for the simulation of spatially autocorrelated variables, such as SM. Therefore, it would be beneficial to identify or develop a suitable model for estimating SM from within the large deep-learning family [162]. In addition, high-resolution land surface observations from well-known optical sensors and SAR could serve as qualified explanatory variables for SM downscaling to grids of hundreds or even tens of meters [17,213].

Longer Time Span
In climate change research, it is beneficial to analyze evolutionary trends over decades or even centuries to capture the laws governing the origin and evolution of climate. Thus, it is valuable to continuously extend the time span of SM datasets. Both satellite-based and assimilated SM products begin when the corresponding observation programs begin. To ensure the continuous acquisition of SM data, existing observation systems should be maintained and kept working properly; in addition, new ground networks and satellites that provide continuous monitoring of SM are indispensable for extending the time series. For instance, the National Satellite Meteorological Center of China launched the FY-3E satellite on 5 July 2021, which is dedicated to networking with FY-3C and FY-3D in orbit to observe SM and other meteorological parameters [214]. Additionally, forecasting SM with the help of future scenarios and hydrologic models could also provide SM predictions, which may benefit the investigation of future climate variations [190,215].

Higher Temporal Resolution
In addition to pursuing a high spatial resolution, improving the observation frequency would also be a key research priority for future SM products. Hourly monitoring data can be of great benefit in investigating subtle SM fluctuations induced by irrigation, rainfall, and ET within a day, which is valuable for agricultural and land-atmosphere interaction applications [195,199,200,216]. At present, both in situ measurements and LSMs are capable of providing sub-hourly and sub-daily observations. Additionally, SMAP publishes three-hourly surface and root-zone SM estimates with a latency of approximately 2.5 days, which are derived from the assimilation of both ascending and descending brightness temperature data into the catchment LSM [217]. This suggests that LSMs are an effective and promising approach for generating SM estimates with high temporal resolution. Furthermore, as more satellites are launched with different overpass times, it should become possible to acquire observations more frequently each day across the globe [214].

Shorter Time Latency
It is imperative to access real-time or near-real-time SM recordings to conduct drought monitoring and early flood warning. Cropland management also has high timeliness requirements for SM product availability so that irrigation or drainage can be arranged without delay. In situ measurement data can be quickly collected through sensors and the internet. However, remotely sensed and assimilated products always have a latency that ranges from several hours to several days. For instance, the SMOPS data latency for 6-h products is 3 h and that for daily products is 6 h. The SMAP data latency for available data products is as follows: (1) Level 1 products, within 12 h of acquisition; (2) Level 2 products, within 24 h of acquisition; (3) Level 3 products, within 50 h of acquisition; and (4) Level 4 products, within 7 days for SM [129]. ERA5 is continuously updated with a latency of approximately 5 days [145]. Consequently, there is an urgent need to accelerate and optimize the processes of data transmission, algorithm operation, and data distribution, which should include, but not be limited to, the improvement of related equipment, techniques, and methodologies.

Developing Multi-Depth Products
Land surface and root-zone SM recordings are both important for advancing the understanding of Earth's system processes, and root-zone SM matters more than top-layer SM for vegetation growth. It is therefore critical to develop multi-depth SM products to comprehensively characterize the soil wetness profile. In situ measurements can detect multi-depth SM using probes at different depths [18]. Assimilation- and reanalysis-based products can effectively describe soil water movement and then generate root-zone SM estimates to meet the requirements of rapidly advancing hydrological and agricultural applications [145]. In addition, satellite-based programs have started to produce root-zone values through data assimilation systems. For instance, the SMAP project integrates its own observations with complementary information into an LSM and produces 3-hourly, 9 km surface (0-5 cm) and root-zone (0-100 cm) SM estimates through both spatial and temporal interpolations and extrapolations [66,218]. The ECV program also initiated a program to develop root-zone SM products using the Noah-MP and ISBA LSMs, which are dedicated to linking vegetation phenology and biomass carbon allocation to moisture availability in the soil.
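When SM is available for several soil layers, whether from probes at different depths or from LSM layers, a root-zone value can be approximated as a thickness-weighted mean over the depth range of interest. The sketch below assumes each layer is given as a (top, bottom, SM) tuple and uses a 0-100 cm root zone; the layer boundaries and values are hypothetical, and this is not the SMAP assimilation procedure.

```python
def root_zone_sm(layers, depth_cm=100.0):
    """Thickness-weighted mean SM over 0..depth_cm.

    layers: list of (top_cm, bottom_cm, sm) tuples; each layer is clipped
    to the requested depth range before weighting.
    """
    total = 0.0
    covered = 0.0
    for top, bottom, sm in layers:
        thickness = max(0.0, min(bottom, depth_cm) - max(top, 0.0))
        total += thickness * sm
        covered += thickness
    if covered == 0.0:
        raise ValueError("no layer overlaps the requested depth range")
    return total / covered

# Hypothetical profile: a dry 0-5 cm surface layer over a wetter 5-100 cm layer.
rz = root_zone_sm([(0.0, 5.0, 0.10), (5.0, 100.0, 0.30)])
```

The thin surface layer contributes only 5% of the weight here, which shows why surface SM alone can misrepresent the water available to roots.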

Higher Data Accuracy
Significant efforts have been devoted to reducing errors to continuously close the gap between SM estimates and real SM conditions. In a previous study, ground probes were periodically calibrated and maintained to ensure their operation under good conditions [18]. AMSR-2 retrieves SM using an X-band signal and applies a neighboring C-band to avoid RFI [58]. The SMAP program designed effective L-band SM detection sensors together with advanced anti-RFI devices and improved algorithms to detect and remove harmful interference in the L-band [68,219]. A series of developments in model physics, core dynamics, and data assimilation have been steadily achieved, which have contributed to significant improvements in SM consistency [145]. Despite this progress, there is still considerable room to pursue higher accuracy. Artificial intelligence-driven algorithms display great potential for modeling SM. More SM datasets will become available as additional ground networks and satellite programs are planned. A significant benefit can be expected from combining these advanced technologies and datasets.
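Accuracy in this context is usually quantified by comparing SM estimates with reference observations. The following minimal sketch computes four widely used scores: bias, root mean square error (RMSE), unbiased RMSE (ubRMSE, which removes the mean bias via ubRMSE² = RMSE² − bias²), and the Pearson correlation coefficient. The function name and sample values are assumptions for illustration.

```python
import math
from statistics import mean

def validation_metrics(estimated, observed):
    """Return bias, RMSE, ubRMSE, and Pearson r for paired SM series (m3/m3)."""
    diffs = [e - o for e, o in zip(estimated, observed)]
    bias = mean(diffs)
    rmse = math.sqrt(mean([d * d for d in diffs]))
    # Guard against tiny negative values caused by floating-point rounding.
    ubrmse = math.sqrt(max(rmse ** 2 - bias ** 2, 0.0))
    me, mo = mean(estimated), mean(observed)
    cov = sum((e - me) * (o - mo) for e, o in zip(estimated, observed))
    var_e = sum((e - me) ** 2 for e in estimated)
    var_o = sum((o - mo) ** 2 for o in observed)
    r = cov / math.sqrt(var_e * var_o)
    return {"bias": bias, "rmse": rmse, "ubrmse": ubrmse, "r": r}

# Hypothetical case: the estimate tracks the observations with a constant wet bias.
observed = [0.10, 0.20, 0.30, 0.25]
estimated = [o + 0.05 for o in observed]
scores = validation_metrics(estimated, observed)
```

In this constructed case the RMSE equals the bias while the ubRMSE is essentially zero, which separates a systematic offset from random error.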

Better Model Performance and Interpretability
In recent decades, numerous models have been built and updated to estimate SM, and the overall quality of the corresponding products has been evidently enhanced. Traditional physical models are widely employed in spaceborne and assimilation systems to retrieve SM. These sophisticated models are carefully designed and theoretically interpretable [145,146]. In comparison, artificial intelligence-driven approaches, especially the deep learning family, exhibit outstanding capabilities in SM regression and prediction [17,162]. In addition, they have the advantages of being highly efficient, simple, and convenient. However, their inner operational mechanisms are difficult to explain. Consequently, it could be favorable to develop hybrid models that combine physical and artificial intelligence methods, exploiting the strengths of both while offsetting their weaknesses. Such hybrid models are expected to improve both model performance and interpretability.
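One of several possible ways to build such a hybrid is residual correction: a physical model produces a first estimate, and a data-driven model learns the remaining error from covariates. The sketch below is a toy illustration under strong assumptions: `bucket_step` is a deliberately simple water-balance step, a least-squares line stands in for the machine learning component, and the observations are synthetic with a residual that is linear in the covariate by construction.

```python
from statistics import mean

def bucket_step(sm_prev, rain_mm, loss_rate=0.1, depth_mm=100.0, sm_max=0.45):
    """Toy physical step: add rain, remove a fixed fraction, cap at saturation."""
    sm = sm_prev + rain_mm / depth_mm - loss_rate * sm_prev
    return min(max(sm, 0.0), sm_max)

def fit_line(x, y):
    """Least-squares fit y = a + b*x (stand-in for a machine learning model)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def train_hybrid(samples, observed):
    """Learn the residual of the physical model as a function of a covariate.

    samples: list of (sm_prev, rain_mm, covariate) tuples.
    """
    physics = [bucket_step(s, r) for s, r, _ in samples]
    residual = [o - p for o, p in zip(observed, physics)]
    covariate = [c for _, _, c in samples]
    return fit_line(covariate, residual)

def hybrid_predict(sample, coeffs):
    """Physical estimate plus the learned residual correction."""
    sm_prev, rain_mm, covariate = sample
    a, b = coeffs
    return bucket_step(sm_prev, rain_mm) + a + b * covariate

# Synthetic data: the truth departs from the physics by 0.01 + 0.02 * covariate.
samples = [(0.20, 0.0, 0.1), (0.25, 5.0, 0.5), (0.30, 2.0, 0.9), (0.15, 10.0, 0.3)]
observed = [bucket_step(s, r) + 0.01 + 0.02 * c for s, r, c in samples]
coeffs = train_hybrid(samples, observed)
```

Because the physical step remains explicit, the learned part can be inspected separately as a correction term, which is the interpretability benefit the hybrid design aims for.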

Conclusions
Much attention has been paid to SM monitoring since ancient times. Before the existence of modern technology and equipment, subjective perceptions were prevalently employed to detect local SM conditions for proper irrigation arrangements. With the emergence of advanced probes, spaceborne sensors, and algorithms, spatial-temporal continuous SM records are becoming increasingly easily available. Because SM plays an important role in the land-atmosphere interaction system, vast amounts of multi-source SM datasets have been utilized in numerous studies on drought monitoring, climate change, ecology, and hydrology. However, the current status and characteristics of SM estimates should be clarified before they can be used in practical applications. The review of SM has generally been limited to certain retrieval algorithms, scale-conversion techniques, or applications. Therefore, there is an urgent need for a relatively comprehensive demonstration of advances in the quality of global SM products.
In this study, we introduce the primary retrieval methodologies of SM and the current approaches used to enhance the quality of SM products. Owing to the complex driving mechanism of its spatial-temporal distribution and evolution, great efforts have been made to advance retrieval methods. Numerous statistical, data fusion, assimilation, and machine learning-based approaches have been continuously designed and improved to enhance the reliability (including spatial-temporal completeness, resolution, and accuracy) of retrieved SM products. Although some of the established models are explainable and others remain unexplainable in mechanism, they all give renewed impetus to advancing the quality of SM estimates. In addition, a large quantity of SM-related original datasets and land-atmosphere parameters collected from different sensors, bands, and time nodes have been taken as ancillary references during the retrieval process to make the response of SM to land-atmosphere variation more reasonable.
Despite the steady progress in SM estimation models, there is still a large margin for improvement, such as pursuing higher spatial coverage, finer spatial resolution, longer time span, higher temporal resolution, shorter time latency, multi-depth products, higher data accuracy, and better model performance and interpretability. Moreover, it is critical to propose targeted solutions to mitigate the influences of various vegetation canopies and human activity interference, which could fundamentally improve the accuracy of spaceborne received signals and retrieved SM.
This review is expected to provide a reference for understanding the advances achieved in global SM estimation in terms of different approaches. Although many previous studies are referred to in this review, it could be difficult to include all publications on this topic. More complete research is necessary to contribute to the generalization of studies focused on SM in the future.

Data Availability Statement: Not applicable.
Acknowledgments: We appreciate the anonymous reviewers for their valuable comments and suggestions in improving this manuscript.

Conflicts of Interest: The authors declare no conflict of interest.