1. Introduction
The advancement of remote sensing technologies has been instrumental in enriching our comprehension of Earth’s processes, particularly in analyzing Earth’s hydrological and climatic phenomena. These technologies, including radiometers, synthetic aperture radars (SAR), and optical sensors, supply diverse and valuable data that open new research avenues. These advancements enrich remote sensing applications that enhance crop yield prediction, facilitate the detection of vegetation changes, improve weather forecasting, and contribute to the analysis of the global carbon cycle. To accurately model hydrologic processes, it is vital to consider Essential Climate Variables (ECVs) such as soil moisture (SM), precipitation, and evapotranspiration [
1]. ECVs are physical, chemical, or biological variables that play a critical role in characterizing Earth’s climate and serve as key indicators of environmental changes, making their effective monitoring essential for ecosystem analysis. In particular, SM, a biological and geophysical indicator, plays a key role in the Earth’s system, influencing vital processes like vegetation growth, climate predictions, and hydrological models [
2]. It directly affects plant health, growth, and water availability for vegetation. Acting as a crucial land geophysical metric, SM mediates the exchange of energy and water between the land’s surface and the atmosphere. Therefore, the precision and comprehension of SM metrics are indispensable for climate prediction, water resource management, and the refinement of hydrological models. As a necessary gauge of terrestrial moisture conditions, SM aids in drought monitoring [
3], vegetation development [
4], and water conservation [
5]. Instruments such as radars, radiometers, and reflectometers are typically used for SM measurements, given that microwave frequencies resonate with variations in soil dielectric properties due to moisture content [
6]. Among the available options, L-band frequencies are preferred because they are resilient to atmospheric losses and vegetation cover interference. Integrating data from multiple instruments to complement L-band measurements can significantly improve SM accuracy or, alternatively, enhance spatial and temporal resolution. An example of this approach is the SMAP (Soil Moisture Active–Passive) mission, which was designed to combine passive and active remote sensing technologies, namely an L-band radiometer and an L-band radar.
The primary objective of merging data in this context is to enhance the spatial resolution of SM estimates. The SMAP radiometer, while offering accurate SM estimations, is limited by a spatial resolution of 36 km. In contrast, the radar component of SMAP provided a much finer spatial resolution, capable of reaching up to 3 km. By combining these datasets, it was possible to produce a downscaled SM product with an improved resolution of 9 km. This approach leveraged the strengths of both the radiometer and the radar, leading to a substantial enhancement in the overall quality of the data [
7]. However, shortly after its launch, the SMAP radar transmitter experienced an anomaly and stopped collecting data [
8]. As a result, the official SMAP product relied primarily on the brightness temperatures ($T_B$) measured by the L-band microwave radiometer to compute SM maps [
7] and freeze/thaw (F/T) state maps on a fixed 36 km EASE-Grid 2.0. After the radar transmitter malfunctioned, SMAP Reflectometry (SMAP-R) emerged as a novel, opportunistic polarimetric Global Navigation Satellite System Reflectometry (GNSS-R) instrument. The bandpass filter of the SMAP radar receiver was re-centered at 1227.42 MHz to receive the Global Positioning System (GPS) L2C signals in a bistatic configuration. SMAP-R has specific advantages over traditional GNSS-R missions: its high-gain antenna provides a high signal-to-noise ratio (SNR) at low integration times, and its linear polarimetric antenna enables the hybrid compact polarimetric (HCP) capability [
9]. On the other hand, the main drawback of SMAP-R also stems from its highly directive scanning antenna, which significantly reduces the number of measurements the instrument provides per day. A polarimetric GNSS-R instrument in HCP configuration allows the full set of Stokes parameters to be computed, providing a polarimetric portrait of the surface under observation [
10].
In vegetated areas, the phenomenon of dispersive reflection, which affects the signals received by radiometric sensors, leads to a signal polarization signature. This polarization signature allows SMAP-R to discriminate details about the physical properties of vegetation, the texture, roughness, and SM levels [
11]. In general, over soil surfaces, the characteristics of the GNSS-R reflection mostly depend on SM and are additionally affected by surface roughness and vegetation. The methodology for formulating the Stokes parameters and their calibration is detailed in [
12,
13]. The surface effects on the GNSS-R signals have been assessed using the Stokes parameters for the SMAP-R dataset [
14]. The analysis of SMAP-R data combined with SMAP radiometer data has shown efficacy in the estimation of SM maps by improving the spatial resolution while maintaining and checking the unbiased SM estimation error [
14]. The results of this data merging strategy suggest the potential for combining sparse datasets from GNSS-R with coarser radiometric data to achieve more accurate and detailed SM assessments.
Developing a data processing methodology that not only accounts for the nonlinear relationships among land geophysical parameters and measurements but also addresses the fact that the datasets have differing spatial resolutions would be highly advantageous for SM retrievals. In this context, there has been a growing focus on the development of SM retrieval algorithms that effectively incorporate terrain data. For example, in [
15,
16], the authors introduced the trilinear regression-based reflectivity–vegetation–roughness algorithm. This algorithm derives SM estimations at a 36 km resolution by considering Cyclone Global Navigation Satellite System (CYGNSS) [
15] reflectivity, along with the SMAP vegetation opacity and roughness coefficient. However, this algorithm has limitations in terms of the number of geophysical variables it considers. In [
17], the authors developed a time-series approach for SM retrieval, using maximum and minimum SM values from SMAP to establish the system’s limits. The key point of this approach is that vegetation and surface roughness change significantly more slowly than SM. In [
2], the authors presented a fully connected Artificial Neural Network (ANN) to account for the effects of vegetation and ground dynamics on SM estimation. Although they used in situ SM measurements from International SM Network (ISMN) sites as reference labels, most of these sites were located on relatively non-mountainous surfaces with low-to-moderate vegetation cover, such as croplands, grasslands, and savannas. This limits the analysis’s ability to capture scattering effects, considering the significant temporal and spatial variation and non-uniformity of vegetation parameters.
Addressing the challenge of integrating diverse datasets for SM estimation, this manuscript introduces an innovative graph-based data integration technique for SM enhancement using the SMAP-R dataset and terrain characteristics. Our approach is based on Graph Signal Processing (GSP), which is well-suited for processing signals that are in irregular domains and result from physical processes influenced by multiple variables. Graph models efficiently capture the structural information of images, and their application in image processing has been proven effective in numerous applications [
18]. Recently, graph-based methods have been explored in remote sensing applications. For example, Change Detection (CD) algorithms have used the Nyström extension to represent images as graphs, minimizing similarities among them to detect changes [
19]. Path-wise graphs [
20] and superpixel-wise graphs [
21], based on the self-similarity property, have been constructed to capture image structures and calculate the Difference Image (DI) through graph projection. In [
22], images were treated as a signal on graphs, highlighting changes between heterogeneous images in terms of structure and signal differences. Graph filters have also been studied to explore high-order neighborhood information. While these methods are innovative, they face challenges in incorporating terrain information and may be sensitive to outlier deviations. Our signal processing technique builds upon a graph-based method introduced in [
23]. Given the sparsity of GNSS-R reflections and their sensitivity to terrain characteristics, we propose a GSP approach that incorporates terrain information for sparse signal interpolation and SM estimation tasks. The land surface variables considered in our analysis include vegetation optical depth, roughness coefficient, land surface temperature, and clay and sand composition. Our goal is to develop a physics-aware GSP technique that captures the nonlinear dependencies between SMAP-R observables and SM values while considering vegetation and terrain effects. The aim is to augment radiometer data with reflectivity signals, thereby enhancing the overall quality and reliability of the SM estimations. It is important to note that the SMAP-R observation is a Delay–Doppler Map (DDM). The in-phase and quadrature (IQ) samples collected by the SMAP radar receiver are cross-correlated with the pseudo-random noise (PRN) code of each GPS satellite that operates in the L2C band. The basic DDMs are used then to compute Stokes parameters (also in the form of DDMs).
The remaining sections of the paper are organized as follows:
Section 2 provides the theoretical background of the GSP and graph construction.
Section 3 examines graphs as data merging tools for remote sensing. In
Section 4, we explain the SM retrieval methodology and describe the details of the ANN model. In
Section 5, we present the SM estimation results and performance metrics achieved. Finally, in
Section 6, we conclude this study.
4. SMAP-R from GSP Perspective
Our fundamental objective is to use SMAP-R reflectivity signals to enhance SMAP brightness temperatures ($T_B$), consequently improving SM estimations. While the SMAP radiometer provides data at 36 km resolution, the spatial resolution of the SMAP-R measurements is not a fixed number, as presented in [
27]. However, on average, the spatial resolution of the scattering area can be taken as ~9 km for most landscapes, especially over agricultural areas. The first step is therefore to obtain complete reflectivity maps that can be used in the
graph interpolation task. The SMAP-R retrieved second Stokes parameters (
) and total power reflectivity (
) contain information about how the incident signals have been affected by the scattering surface. SMAP-R offers a unique polarimetric forward scattering dataset that can be used for land-related applications. For instance, high
values are found primarily in dry areas, such as deserts, and low
are usually found in wet or vegetated areas, such as wetlands or rainforests [
15]. Motivated by these relations between
and
with terrain characteristics, we implement a nonlinear ML algorithm to obtain complete maps of
and
. Those variables are used to compute the reflectivity maps at each polarization, i.e.,
and
. To obtain the reflectivity information, the full Stokes parameters should be computed as shown in Equation (3)
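As an illustration of how Stokes parameters can be formed from a linear-polarization pair, the following minimal sketch uses one common textbook convention. It is not taken from the SMAP-R processing chain: the function names, the use of plain complex samples instead of Delay–Doppler Maps, and the sign chosen for the fourth parameter are all assumptions made for this example.

```python
def stokes_parameters(e_h, e_v):
    """Return (S0, S1, S2, S3) from complex H- and V-channel samples.

    Assumed convention (hypothetical, not confirmed by the source):
      S0 = <|E_H|^2> + <|E_V|^2>
      S1 = <|E_H|^2> - <|E_V|^2>
      S2 =  2 Re<E_H E_V*>
      S3 = -2 Im<E_H E_V*>
    where <.> is the average over the supplied samples.
    """
    if len(e_h) != len(e_v) or len(e_h) == 0:
        raise ValueError("e_h and e_v must be non-empty and of equal length")
    n = len(e_h)
    p_h = sum(abs(h) ** 2 for h in e_h) / n          # mean H-channel power
    p_v = sum(abs(v) ** 2 for v in e_v) / n          # mean V-channel power
    cross = sum(complex(h) * complex(v).conjugate()  # mean cross-correlation
                for h, v in zip(e_h, e_v)) / n
    return p_h + p_v, p_h - p_v, 2.0 * cross.real, -2.0 * cross.imag


def normalize_s1(s0, s1):
    """Second Stokes parameter normalized by total power (assumed S1 / S0)."""
    if s0 == 0:
        raise ValueError("total power S0 must be non-zero")
    return s1 / s0


# A purely H-polarized field puts all power in S0 and S1.
pure_h = stokes_parameters([1 + 0j, 1 + 0j], [0j, 0j])
# Equal H and V amplitudes in quadrature give S1 = 0 and all power in S3.
circular = stokes_parameters([1 + 0j], [1j])
```

Under this convention a fully polarized field satisfies S0² = S1² + S2² + S3², which is a convenient sanity check on any implementation.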
As derived by [
10], the Stokes parameters vector (i.e., [
S0,
S1,
S2,
S3]) can be related to the surface reflectivity by means of Equation (4).
where $S_{pq}$ is the Sinclair scattering matrix coefficient for transmitted polarization $p$ and received polarization $q$. For the sake of readability, we have omitted the definition of the Sinclair scattering matrix. The reader is referred to [
13] for additional information. Considering negligible cross-polarization as shown in [
14], most of the terms from (4) are eliminated, and one can write the reflectivity at HH and VV as follows:
Note that
is the calibrated first Stokes parameters following a similar methodology as in the CYGNSS mission, as detailed in [
22]. Furthermore, for notation simplicity, we will refer to
here in the document as the normalized second Stokes parameters,
. Because
and
are sparse, we obtain initial complete maps by implementing a regression tree that takes into consideration vegetation optical depth (VOD) and roughness coefficient information. In the context of this GNSS-R processing using the SMAP radar instrument, the specific characteristics of the instrument minimize the impact of the incidence angle variations on measurements due to its limited range (37.5–42.5°). Consequently, the incidence angle was not included in the machine learning analysis. However, in contrast, for other GNSS-R missions where the incidence angle has a larger variation range, its influence becomes significantly more pronounced, substantially affecting data quality and accuracy. Both ancillary data and validation sources for our algorithms are provided by the SMAP mission. This methodical alignment of both our input and validation datasets is a critical aspect of our study since they belong to the same mission and thus guarantee that the performance metrics we use are not only appropriate but also accurately reflective of the terrain and vegetation characteristics monitored by SMAP. In brief, our method consists of the following steps:
1. A machine learning (ML) approach is employed to learn the complex nonlinear relations between geophysical information (e.g., VOD, roughness, LST, and clay and sand composition) with and ;
2. Complete maps of and are retrieved from the sparse data using the model learned in step 1;
3. Implementing our GSP method, and maps are improved using VOD and roughness as profiles to determine the graph’s edge weights;
4. Reflectivity maps are generated using and from step 3;
5. The calculated reflectivity maps () from step 4 are used as graph profiles to disaggregate $T_B$ at 9 km;
6. Brightness temperatures obtained from step 5 are used to estimate SM values that are then validated using CVS measurements.
Figure 2 depicts the strategy highlighted in the previous steps. The methodology to obtain detailed brightness temperature maps vital for soil moisture evaluations will be explained in the subsequent paragraphs.
We conducted various correlation analyses and experimented with multiple regression models to select geophysical variables that significantly influence the estimation of
and
. Remote-sensed metrics, notably LST and VOD, showcased a strong correlation with soil moisture and
, as shown in
Figure 3. When terrain variables like temperature and vegetation optical depth (VOD) show consistency across adjacent nodes, they form a solid foundation for soil moisture estimation. This reliability stems from the established, statistically significant correlation these variables share with soil moisture levels and the second Stokes parameter. It is noteworthy, however, that this relationship is not strictly linear, indicating a more complex interaction between these terrain variables and SM. Consequently, they became integral to our regression exploration. Recognizing the benefits of regularization and the presence of both linear and nonlinear associations among our variables, we adopted the regression tree ML framework for the estimation of detailed maps of
and
.
Unlike polynomial regression, regression trees inherently accommodate complex variable interactions through their structure, which can be effectively managed to prevent overfitting via tree pruning and setting depth constraints. These methods, specific to regression trees, offered a significant advantage by allowing the model to be finely tuned to the characteristics of the terrain variables. Within this regression tree (RT) framework, we scrutinized how each variable, both individually and collectively, impacted the accuracy of soil moisture predictions. Variables were primarily selected based on their statistical significance, as evidenced by high correlation values from regression analysis. The final selection of input features in Equation (6), which includes VOD, roughness coefficient, LST, clay, and sand, was grounded both on their performance and their alignment with real-world physical factors.
During the training phase, we utilized the known and data to optimize the RT parameters by minimizing the loss function. A significant advantage of our approach is that RTs do not require feature scaling, making data normalization or standardization unnecessary. To implement the RT, we utilized established machine learning packages that support correlation analyses and regression tree optimization, tuning the trees to maximize the $R^2$ performance metric. Moreover, the inherent interpretability of RT models provided clear insight into decision-making based on feature values. Once the RT parameters were refined, we evaluated the model in the validation phase using k-fold cross-validation to ensure a thorough assessment of its efficacy.
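The training and validation procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn as the machine learning package, uses synthetic data in place of the SMAP-R observables and SMAP ancillary fields, and the hyperparameter values (`max_depth`, `min_samples_leaf`, `ccp_alpha`) are arbitrary choices for the example.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n_samples = 500

# Synthetic stand-ins for the five input features named in the text.
# Columns: VOD, roughness coefficient, LST (K), clay fraction, sand fraction.
features = np.column_stack([
    rng.uniform(0.0, 1.2, n_samples),
    rng.uniform(0.0, 1.0, n_samples),
    rng.uniform(270.0, 320.0, n_samples),
    rng.uniform(0.0, 0.6, n_samples),
    rng.uniform(0.0, 0.9, n_samples),
])

# Hypothetical nonlinear target standing in for a sparse SMAP-R observable.
target = (np.exp(-2.0 * features[:, 0]) * (1.0 - 0.5 * features[:, 1])
          + 0.02 * rng.normal(size=n_samples))

# Depth limits and cost-complexity pruning are the overfitting controls
# mentioned in the text. No feature scaling is applied because tree splits
# depend only on the ordering of values within each feature.
tree = DecisionTreeRegressor(max_depth=6, min_samples_leaf=10,
                             ccp_alpha=1e-5, random_state=0)

# k-fold cross-validation scored with R^2 (k = 5 is an assumption).
folds = KFold(n_splits=5, shuffle=True, random_state=0)
r2_scores = cross_val_score(tree, features, target, cv=folds, scoring="r2")

# Fit on all available samples, then predict on a dense grid of ancillary
# data. The first ten rows are reused here as a stand-in for that grid.
tree.fit(features, target)
dense_grid = features[:10]
filled_values = tree.predict(dense_grid)
```

In practice the dense grid would hold the global ancillary maps, so that `predict` returns a complete map where only sparse observations were available for training.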
Following the strategy proposed in
Figure 2, SMAP-R offers sparse measurements of
and
, which can be aligned with geophysical variables like VOD, roughness coefficient, LST, clay, and sand. After training the RT, we employed global scale maps of terrain data, VOD, and LST to generate comprehensive maps of
and
. The non-parametric characteristic of RTs ensures the model’s suitability to project
and σ values, irrespective of the statistical attributes of the terrain variables. Furthermore, the RT method exhibits commendable L2 regularization efficiency, especially with interrelated independent variables, thereby providing a reliable initial estimation of complete
and
maps.
In our experimental setup, the initial graph signals at the nodes represent the interpolated and derived from step 2, with the edges between the nodes reflecting the terrain attributes. The aim is twofold: firstly, to integrate terrain features with the signals, and secondly, to enhance the estimation of and by leveraging neighboring nodes with analogous characteristics. This approach also aims to refine and smooth the signal at the nodes, as elaborated below:
- (A) Graph Construction
Graph Signal: Our proposed GSP methodology uses
and
from step 2 as the baseline graph signals and terrain data to compute the edges of the graph, as shown in
Figure 4. During our analysis, the graph signals are represented by the form
and
;
Edge Weights: Our graph construction method assigns edge weights based on Euclidean distance and statistical correlations among the signal at the nodes and terrain information [
23]. From the correlation analysis, the graph construction for
will incorporate edges influenced by VOD, while for
, the edges will be determined by the roughness coefficient. For
graph construction, the edge weights are computed using the Gaussian kernel:
where
represents the difference among VOD values for neighboring nodes
, and
is a measure of correlation between
and VOD. This type of information has the potential to incorporate observed data behavior (correlation) and physical system characteristics (distance).
Analogously, for
graph construction, the graph edges are obtained from
In Equation (8), represents the difference among roughness coefficient values for neighboring nodes , and is the measure of the correlation between and roughness values at the node locations.
Note that when the distance becomes much larger than , the corresponding edge weight approaches zero. Therefore, when the terrain information for consecutive nodes is similar, their connection in the graph is strong.
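The behavior described above can be illustrated with a short sketch of Gaussian-kernel edge weights. The exact form of the kernel is an assumption here: the sketch uses exp(−d²/(2σ²)), where d is the difference between terrain values at two neighboring nodes and σ is a scale parameter standing in for the correlation-based measure in the text. The function names and the example values are hypothetical.

```python
import math


def gaussian_edge_weight(profile_i, profile_j, sigma):
    """Edge weight from the difference of terrain values at two nodes.

    Assumed kernel: exp(-d^2 / (2 * sigma^2)); the normalization used in
    the original equations may differ.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    d = profile_i - profile_j
    return math.exp(-(d * d) / (2.0 * sigma * sigma))


def build_edge_weights(profile, edges, sigma):
    """Return {(i, j): weight} for each undirected edge (i, j)."""
    return {(i, j): gaussian_edge_weight(profile[i], profile[j], sigma)
            for i, j in edges}


# Three nodes in a chain: the first two have similar VOD, the third differs.
vod = [0.20, 0.25, 0.90]
weights = build_edge_weights(vod, edges=[(0, 1), (1, 2)], sigma=0.1)
```

With these values the edge between the two similar nodes keeps a weight close to one, while the edge to the dissimilar node is effectively zero, so the graph is nearly disconnected across the vegetation boundary.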
- (B) Graph Optimization
To ensure that our GSP interpolation produces smooth signals on the graph, we follow the optimization problem:
where
serves as the penalty parameter for the baseline interpolated estimation of
We aim to optimize
such that it remains closely aligned with baseline observations while also ensuring that spatially co-located measurements are similar to one another, considering terrain characteristics. To enhance the smoothness of the graph, we implemented optimizations on the graph Laplacian
as shown in Equation (9) for both the
and
graphs. This optimization approach takes into consideration the local proximity observed in the
and
measurements, indicating a likelihood of signal similarity between neighboring observations. By leveraging this similarity, we aim to improve the quality and accuracy of the signal reconstruction process. By ensuring a smoother graph structure, we enable these signals to contribute more effectively to the reconstruction of one another.
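A smoothness-regularized problem of this kind can be sketched as follows. The objective assumed here is the standard graph Tikhonov form, Σ w_ij (x_i − x_j)² + μ Σ (x_i − b_i)², where b is the baseline signal and μ weights fidelity to it. Whether the penalty parameter in the original formulation multiplies the fidelity term or the Laplacian term is an assumption, as are the function name and the choice of a Gauss–Seidel solver.

```python
def smooth_on_graph(baseline, weights, mu, n_iter=200):
    """Minimize sum_(i,j) w_ij (x_i - x_j)^2 + mu * sum_i (x_i - b_i)^2.

    baseline: list of node values b_i (e.g., regression tree estimates).
    weights:  {(i, j): w_ij} for undirected edges, with w_ij >= 0.
    mu:       positive fidelity weight (assumed placement of the penalty).

    Setting the gradient to zero gives, for every node i,
        x_i = (mu * b_i + sum_j w_ij x_j) / (mu + sum_j w_ij),
    which is iterated in place (Gauss-Seidel). The system matrix L + mu*I
    is symmetric positive definite, so the iteration converges.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    n = len(baseline)
    neighbors = [[] for _ in range(n)]
    for (i, j), w in weights.items():
        neighbors[i].append((j, w))
        neighbors[j].append((i, w))

    x = list(baseline)
    for _ in range(n_iter):
        for i in range(n):
            weighted_sum = sum(w * x[j] for j, w in neighbors[i])
            degree = sum(w for _, w in neighbors[i])
            x[i] = (mu * baseline[i] + weighted_sum) / (mu + degree)
    return x


# Nodes 0 and 1 are strongly connected; node 2 is cut off (zero weight).
smoothed = smooth_on_graph([1.0, 0.0, 5.0], {(0, 1): 1.0, (1, 2): 0.0}, mu=1.0)
```

In this example the two connected nodes are pulled toward each other, while the disconnected node keeps its baseline value, mirroring the absence of smoothing across terrain boundaries.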
Figure 5 illustrates the comprehensive graph signal procedure described to generate
maps leveraging the use of ancillary data. The methodology is articulated based on the steps 1 through 3 outlined previously. Each stage is tailored to streamline the extraction, processing, and eventual rendering of a comprehensive SM map.
Figure 5 provides a visual representation of each phase, highlighting the journey from initial regression tree estimates of S1 to the ensuing graph interpolation, which is augmented by VOD. This structured visualization sheds light on the intricate data processing and analytical procedures vital for generating detailed maps from dispersed signals.
A qualitative validation for the graph signal interpolation using ancillary data is performed by subtracting the final
estimates obtained via GSP from those derived from the regression tree methodology (
Figure 6). This procedure helps elucidate the regions where VOD information has been utilized to enhance
values. In zones where VOD exhibited uniformity, we observe a smoothing effect on
values. This smoothing is an outcome of the GSP algorithm, which uses the similarity in vegetation optical depth between neighboring areas to enhance the spatial continuity of the
estimates. In contrast, over regions presenting diverse vegetation values, the graph becomes disconnected, leading to an absence of spatial smoothing in
values. This occurs because the GSP algorithm identifies these diverse vegetation characteristics as boundaries and thus does not propagate information across these boundaries. This analysis provides an illustrative and quantifiable demonstration of the efficacy of integrating ancillary information, like VOD, into our GSP methodology. The ability to enhance or moderate the smoothing of
values based on terrain characteristics affirms the strength of the GSP approach. The initial interpolation from the regression tree is key in the described methodology, as it forms the baseline graph signal. The subsequent application of GSP significantly enhances soil moisture estimation accuracy, with SM estimates derived from GSP surpassing those from the ML approach alone. This is because GSP provides effective regularization by considering observations with similar characteristics. Furthermore, our approach incorporates a multimodal analysis, where multiple physical variables influence or help estimate a measurement. A distinct advantage of using GSP over ML models is that it does not rely heavily on extensive training and validation data. This is particularly beneficial for estimating models of interconnected variables where data availability might be limited.
The Stokes parameters can be translated into reflectivity measurement using Equation (4). This translation of the first Stokes parameters into reflectivity provides a significant advantage, as it contains crucial information about seasonal variations.
To achieve an accurate downscaling of SM to a 9 km resolution, we first enhance the spatial resolution of $T_B$ from 36 km to 9 km, capitalizing on our derived reflectivity signals at the 9 km scale. To this end, we adopt the signal processing methodology described earlier. The GSP method uses the precomputed reflectivity maps to increase the resolution of the brightness temperatures. Central to this approach is the premise of using reflectivity maps as auxiliary datasets to infer brightness temperatures at distinct spatial coordinates. Within this framework, the signal at each node represents the brightness temperature, and the edge weights linking the nodes are modulated by the amplitude of the reflectivity. Consequently, the resolution refinement of $T_B$ benefits significantly from the inclusion of reflectivity maps, representing a distinct advancement over traditional re-gridding techniques or geospatial procedures such as kriging.