1. Introduction
Soil moisture is considered a key variable of the climate system: it influences plant transpiration and photosynthesis as well as the water, energy, and biogeochemical cycles [1]. In addition, soil moisture has a memory of up to several months, retaining anomalies in forcings such as precipitation and radiation, and can influence climate change through a series of feedback mechanisms at regional and global scales, which gives it an important role in climate change prediction [
2]. Koster et al. [
3] used the Global Land-Atmosphere Coupling Experiment (GLACE) model to quantify the predictive skill of soil moisture initialization in regard to subseasonal-scale precipitation and temperature and obtained significant improvements in many regions. Therefore, accurate soil moisture data are important for scientific research and practical applications [
4], and the Global Climate Observing System recognizes soil moisture as an essential climate variable [
5]. However, due to the complex temporal and spatial variation characteristics of soil moisture [
6], the spatial coverage and representativeness of conventional observation data can hardly meet the needs of soil moisture research [
7]. Microwave remote sensing can obtain all-day and nearly all-weather high-resolution surface radiation information. These characteristics ensure that microwave remote sensing provides unique advantages in the field of soil moisture retrieval [
8].
The dielectric constant of soil is a function of several ground parameters (soil moisture, soil texture, capacitance, temperature, and salinity) and sensor parameters (frequency). It can be regarded as a complex number that contains a real and an imaginary part. The real part determines the propagation characteristics of energy as it passes upward through soil, while the imaginary part determines the energy loss [
9]. At microwave frequencies, soil moisture is retrieved through the large difference in dielectric constant that exists between water and dry soil. An increase in water content increases the dielectric constant, which in turn lowers the emissivity. The dependence of the soil dielectric constant on moisture content increases as the microwave frequency decreases. In the X-band, the difference between the dielectric constant of dry soil (∼4) and that of pure water (∼80) is very large. Therefore, the moisture content of soil can be derived from observations at this wavelength.
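The link between dielectric constant and emissivity can be illustrated with a minimal sketch based on the Fresnel equations for a smooth surface. The function name `fresnel_emissivity`, the 55° incidence angle, and the purely real dielectric constants (imaginary part neglected) are assumptions made for illustration; only the values ∼4 and ∼80 come from the text.

```python
import numpy as np

def fresnel_emissivity(eps, incidence_deg=55.0, polarization="H"):
    """Emissivity of a smooth surface from the Fresnel reflection coefficients.

    eps           : relative dielectric constant (real or complex)
    incidence_deg : incidence angle in degrees (55 is an assumed value)
    polarization  : "H" (horizontal) or "V" (vertical)
    """
    theta = np.deg2rad(incidence_deg)
    eps = complex(eps)
    cos_t = np.cos(theta)
    root = np.sqrt(eps - np.sin(theta) ** 2)
    if polarization == "H":
        r = (cos_t - root) / (cos_t + root)
    elif polarization == "V":
        r = (eps * cos_t - root) / (eps * cos_t + root)
    else:
        raise ValueError("polarization must be 'H' or 'V'")
    # Emissivity is one minus the power reflectivity.
    return 1.0 - abs(r) ** 2

# Dry soil versus pure water, using the approximate values quoted in the text.
for label, eps in [("dry soil", 4.0), ("pure water", 80.0)]:
    e_h = fresnel_emissivity(eps, polarization="H")
    e_v = fresnel_emissivity(eps, polarization="V")
    print(f"{label:10s} eps={eps:5.1f}  e_H={e_h:.3f}  e_V={e_v:.3f}")
```

Under these assumptions the emissivity drops markedly as the dielectric constant rises from the dry-soil value toward the water value, which is the contrast that passive microwave retrieval exploits.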
Many algorithms based on radiative transfer models for soil moisture retrieval have been proposed. Jackson et al. [
10] proposed the single-channel algorithm (SCA) for soil moisture retrieval based on the empirical relationship between observation data and soil moisture. Njoku [
11] proposed the NASA iterative four-channel retrieval algorithm. This retrieval algorithm is based on the cost function of the observed brightness temperature versus the simulated brightness temperature, which is then iterated to fit the minimum cost function to simultaneously retrieve multiple surface parameters. Owe [
12] proposed the Land Parameter Retrieval Model (LPRM), which is a single-band dual-polarization channel algorithm. The LPRM algorithm optimizes the canopy optical depth and the soil dielectric constant by surface type and calculates the brightness temperature using a nonlinear iterative procedure through a forward modeling approach. Once the difference between the calculated brightness temperature and the observed brightness temperature reaches the convergence standard, the model derives the surface soil moisture from the optimized dielectric constant using a global database of soil physical properties and a soil dielectric model. This makes the model largely physically based, free of regional dependencies, and applicable to any microwave frequency suitable for soil moisture monitoring. These retrieval methods and satellite data are now widely used in scientific research. Wang et al. [
13] found that soil moisture in Northeast China obtained via LPRM retrieval of the FY-3B brightness temperature exhibits a high correlation with station information. Many other scientists have developed various soil moisture retrieval methods with different physical and mathematical bases [
14,
15,
16,
17,
18].
However, various instruments, such as TV, communication satellites, weather or military radar, GPS signals, and cell phones, occupy various low-frequency microwave channels. This means that in addition to the actual thermal radiation from the Earth’s surface, the on-board microwave radiometer also receives signals from active microwave transmitters on the Earth’s surface or signals reflected from the surface, collectively referred to as radio frequency interference (RFI) [
19]. Since the thermal radiation from the Earth’s surface is relatively weak, it can easily be drowned out by the RFI signal, resulting in higher-than-normal brightness temperatures in localized areas, thus contaminating the satellite data and leading to large retrieval errors. The instruments susceptible to RFI are mainly satellite instruments that observe in low-frequency bands such as the C-band, X-band, and L-band, including MIRAS on SMOS, AMSR-2 on GCOM-W1, AMSR-E on Aqua, and MWRI on Fengyun. Because RFI sharply increases the radiation received by the instrument, the LPRM iteration on the soil parameters has difficulty converging for contaminated brightness temperatures, so the soil moisture either cannot be retrieved or is retrieved incorrectly.
Numerous studies [
11,
20] have demonstrated that RFI is widely present in low-frequency observations, especially in C-band (6 GHz) observations over the United States. Therefore, the AMSR-E soil moisture products are retrieved from the X-band (10 GHz), which is less contaminated by RFI. In 2004, Li et al. [
21] proposed that RFI signals of moderate and high intensities could be identified with the spectral difference method and determined that the United States was the most widely and severely RFI-disturbed region. Later, Li et al. [
22] suggested the use of principal component analysis (PCA) to identify RFI signals on land and sea considering the correlation between each channel. However, since the identification performance of the PCA method is insufficient in winter, Zou et al. [
23] proposed the normalized PCA (NPCA) method for RFI signal identification at high latitudes and complex snow and ice surfaces in winter. Zhao et al. [
24] further established the dual PCA (DPCA) method regarding the RFI identification problem in complex surfaces covered with snow and sea ice. Zhang et al. [
25] developed the modified PCA (MPCA) method using various RFI indices and two snow scattering indices to construct vectors for PCA. After the RFI signal has been effectively identified, it is highly important to consider the remediation of RFI-contaminated observations. Wu et al. [
26] proposed a linear fitting method to restore RFI-contaminated information, which relied on a linear fitting method to obtain relationship curves between observation channels in the absence of RFI contamination based on the strong correlation between observations in different channels. Later, Shen et al. [
27] established an iterative PCA method to restore RFI-contaminated observations, which fully considered the continuity between the spatial distribution of observations and their typical spatial distribution.
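The spectral difference idea mentioned above can be sketched as follows: under natural conditions the brightness temperature at a lower frequency rarely exceeds that at the adjacent higher frequency by much, so a large positive difference suggests RFI in the lower channel. The function name `spectral_difference_flags`, the sample values, and the 5 K threshold are illustrative assumptions, not values taken from the cited studies.

```python
import numpy as np

def spectral_difference_flags(tb_low, tb_high, threshold_k=5.0):
    """Flag likely RFI where TB(low frequency) - TB(high frequency) is large.

    threshold_k is an assumed, illustrative threshold in kelvin.
    """
    tb_low = np.asarray(tb_low, dtype=float)
    tb_high = np.asarray(tb_high, dtype=float)
    index = tb_low - tb_high          # spectral difference index
    return index, index > threshold_k

# Hypothetical 6 GHz and 10 GHz horizontal-polarization samples (K).
tb_6h = np.array([262.0, 265.0, 291.0, 258.0])
tb_10h = np.array([264.5, 266.0, 268.0, 260.0])

index, flagged = spectral_difference_flags(tb_6h, tb_10h)
print(index)    # differences in kelvin
print(flagged)  # True where the 6 GHz value looks contaminated
```

A single fixed threshold is the simplest choice; the PCA-based methods described above were introduced because such a test performs poorly over snow, ice, and other complex surfaces.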
RFI severely restricts soil moisture retrieval from C-band microwave observations. Although RFI restoration algorithms have been established by previous authors, further research is needed to verify whether the restored C-band brightness temperature can improve soil moisture retrieval from C-band data. To address this issue, this paper chose the United States as the main research area and used the iterative PCA method to restore RFI-contaminated data. Then, the LPRM algorithm was selected to retrieve the soil moisture from C-band data. Retrieval results before and after RFI restoration were compared to clarify the applicability of the restored data to soil moisture retrieval, and a brief analysis of the soil moisture retrieval process is provided.
2. Materials and Methods
2.1. Data
The AMSR-2 was launched in 2012 on board the GCOM-W1 satellite. It observes at seven frequencies (6.9, 7.3, 10.7, 18.7, 23.8, 36.5, and 89.0 GHz), each in both horizontal and vertical polarization, for a total of 14 channels. In this paper, whole-year AMSR-2 L1R-level brightness temperature data for 2016 covering the United States were used for retrieval. These are resampled data, so all channels share the same spatial resolution of 35 × 62 km, that of the 6.9 GHz channel.
To verify the effect of microwave imager data before and after restoration on soil moisture retrieval, European Space Agency Climate Change Initiative (ESA-CCI) and ERA5 soil moisture products were used for comparison. The ESA-CCI soil moisture product is one of the important products of the ESA Climate Change Initiative for monitoring feedback effects with climate change. It is a combined soil moisture product based on retrievals from three active and seven passive satellite remote sensing instruments. The active microwave dataset is based on the C-band scatterometer on the METOP-A satellite and the SCAT scatterometer observations on ERS-1 and ERS-2, from which soil moisture is obtained using the algorithm proposed by Wagner et al. [
28]. The passive microwave dataset is based on the terrestrial parameter retrieval model algorithm proposed by Owe [
12] and uses data observed by four instruments to retrieve soil moisture: the TMI microwave imager on board the TRMM satellite, the SMMR microwave radiometer on board Nimbus, the Special Sensor Microwave/Imager (SSM/I) of the DMSP program, and the AMSR-E on board the Aqua satellite. The dataset begins in November 1978 and has been continuously updated since then, with a spatial resolution of 0.25° × 0.25° and a daily temporal resolution. ESA-CCI soil moisture version v06.1 was used in this study.
ERA5 is the fifth global reanalysis data set newly released by the European Centre for Medium-Range Weather Forecasts (ECMWF) using the four-dimensional variational data assimilation method of the Integrated Forecast System (IFS) CY41R2 to generate more climate-relevant driving fields. The employed ERA5 data are the latest soil moisture reanalysis data of the ECMWF, with a spatial resolution of 0.1° × 0.1°, divided vertically into four layers (7, 28, 100, and 289 cm). The first layer of these soil moisture reanalysis data was used in this study.
To determine the accuracy of the inversion results, this paper further analyzed the synergistic relationship between the retrieval products and precipitation, where the considered precipitation product encompasses Tropical Rainfall Measuring Mission (TRMM) precipitation data, generated through near-real-time 3 h TRMM multisatellite precipitation analysis with a spatial resolution of 0.25° × 0.25°.
2.2. Microwave Retrieval Method for Soil Moisture: LPRM
In this paper, we use the standard LPRM approach. The LPRM is a soil moisture retrieval algorithm that uses a single-band dual-polarization channel in the C- or X-band. It comprises two components: a radiative transfer model that simulates land surface thermal radiation and a mixed-media model that estimates the surface soil water content. The atmosphere has little opacity at low frequencies, so its influence can be neglected in the calculation, and the radiative transfer equation can be further simplified into the τ-ω model [29]. The brightness temperature can be estimated with the τ-ω model to determine the surface emissivity, and the surface emissivity can be related to the permittivity via the reflectivity model [30]. Finally, the soil moisture can be calculated with the hybrid model [6]. The technical details of the LPRM method can be found in the literature [12].
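The forward step of such a retrieval can be sketched with the commonly used zeroth-order form of the τ-ω model, in which the brightness temperature is the sum of soil emission attenuated by the canopy, direct canopy emission, and canopy emission reflected by the soil. This is a minimal sketch, not the LPRM implementation: the function name `tau_omega_tb`, the parameter values, and the 55° incidence angle are assumptions for illustration.

```python
import math

def tau_omega_tb(emissivity, soil_temp_k, canopy_temp_k, tau, omega,
                 incidence_deg=55.0):
    """Brightness temperature (K) from a zeroth-order tau-omega model.

    emissivity    : rough-surface soil emissivity (0..1)
    soil_temp_k   : effective soil temperature (K)
    canopy_temp_k : canopy temperature (K)
    tau           : vegetation optical depth at nadir
    omega         : single-scattering albedo
    """
    # One-way canopy transmissivity along the slant path.
    gamma = math.exp(-tau / math.cos(math.radians(incidence_deg)))
    soil = soil_temp_k * emissivity * gamma
    canopy_up = (1.0 - omega) * canopy_temp_k * (1.0 - gamma)
    canopy_reflected = ((1.0 - emissivity) * (1.0 - omega)
                        * canopy_temp_k * (1.0 - gamma) * gamma)
    return soil + canopy_up + canopy_reflected

# Hypothetical dry (high emissivity) and wet (low emissivity) soils
# under the same canopy.
tb_dry = tau_omega_tb(0.90, 295.0, 295.0, tau=0.3, omega=0.05)
tb_wet = tau_omega_tb(0.60, 295.0, 295.0, tau=0.3, omega=0.05)
print(f"dry soil: {tb_dry:.1f} K, wet soil: {tb_wet:.1f} K")
```

An iterative retrieval would adjust the dielectric constant (and hence the emissivity) and the optical depth until the simulated brightness temperature matches the observation within a convergence tolerance.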
First, an attempt was made to directly retrieve the soil moisture over the United States from the 6 GHz brightness temperature on 18 June 2016. As shown in Figure 1, apart from areas not covered by the satellite orbit, the retrieval results contain a large number of blank areas. The soil moisture cannot be retrieved in many metropolitan areas of the United States, which is especially obvious in the western part of the country. Considering that the United States is the region where the 6 GHz brightness temperature is most seriously contaminated by RFI [27] and that the blank areas are mainly concentrated in large cities, it can be concluded that RFI contamination leads to nonconvergence of the soil moisture retrieval iterations. The study of RFI identification and restoration is thus especially important for soil moisture retrieval in the C-band.
2.3. RFI Detection Method: NPCA
To better retrieve soil moisture, RFI contamination of the brightness temperature data must be dealt with, so the RFI-contaminated areas should first be identified. RFI signals can cause an anomalous increase in the brightness temperature in low-frequency channels, which can reduce the correlation between the affected channel and the other channels. The NPCA method uses the difference in brightness temperature between low- and high-frequency channels to construct an interference coefficient matrix [
24] and can effectively identify RFI signals via PCA decomposition of the interference coefficient matrix. The specific calculation steps of the NPCA method are as follows:
The PCA method is used to identify RFI signals in the 6 GHz horizontal polarization channel. The data matrix $A$ used for identification is defined in Equation (1) (matrix entries not reproduced here), where $N$ denotes the total number of scanning points in the area used for identification, $TB$ denotes the observed brightness temperature, the subscripts $H$ and $V$ denote horizontal and vertical polarization, respectively, and the numbers denote the frequency.

Then, the corresponding covariance matrix $R = AA^{T}$ can be constructed, whose eigenvectors and eigenvalues satisfy

$$R\,e_i = \lambda_i e_i,$$

where $i$ denotes the $i$th principal component, $e_i$ denotes the $i$th principal component mode, and $\lambda_i$ denotes the contribution of the $i$th principal component mode to the total variance. Written in matrix form, the following can be obtained:

$$R\,E = E\,\Lambda, \qquad E = [e_1, e_2, \ldots], \qquad \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots).$$

By projecting the data matrix $A$ for identification purposes into the standard orthogonal space constructed by the set of basis vectors $e_i$, the principal component coefficients can be obtained as follows:

$$U = E^{T} A,$$

where $u_i$, the $i$th row of $U$, denotes the coefficients of the $i$th principal component mode. A large value of the first principal component coefficient $u_1$ indicates the presence of RFI signals.
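The identification steps above (data matrix, covariance matrix, eigen-decomposition, projection) can be sketched with NumPy. This is an illustration under stated assumptions rather than the NPCA implementation: the rows of the data matrix are assumed here to be differences between the 6 GHz horizontal channel and four higher-frequency channels, the data are synthetic, and the function name `first_pc_coefficients` and the threshold of 5 on the normalized coefficient are hypothetical.

```python
import numpy as np

def first_pc_coefficients(indices):
    """Return the first principal component coefficient at every scan point.

    indices : (m, N) array, one row per channel-difference index,
              one column per scan point.
    """
    a = np.asarray(indices, dtype=float)
    # Normalize each index (zero mean, unit standard deviation).
    a = (a - a.mean(axis=1, keepdims=True)) / a.std(axis=1, keepdims=True)
    r = a @ a.T / a.shape[1]                 # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(r)     # eigenvalues in ascending order
    e1 = eigvecs[:, -1]                      # leading mode
    if e1.sum() < 0:                         # fix the arbitrary sign
        e1 = -e1
    u1 = e1 @ a                              # projection onto the leading mode
    return u1, eigvals[-1] / eigvals.sum()

# Synthetic scene: 300 scan points, RFI of +30 K injected at six of them.
rng = np.random.default_rng(0)
n_points = 300
rfi_points = np.array([10, 50, 120, 121, 200, 299])
tb_6h = 270.0 + rng.normal(0.0, 3.0, n_points)
higher = [tb_6h + offset + rng.normal(0.0, 1.0, n_points)
          for offset in (2.0, 5.0, 8.0, 12.0)]
tb_6h_obs = tb_6h.copy()
tb_6h_obs[rfi_points] += 30.0

indices = np.vstack([tb_6h_obs - tb for tb in higher])
u1, explained = first_pc_coefficients(indices)
flagged = np.flatnonzero(u1 > 5.0)           # assumed illustrative threshold
print("flagged scan points:", flagged)
print(f"variance explained by first mode: {explained:.2f}")
```

Because the injected interference appears in every row of this assumed data matrix, it dominates the leading mode, and the contaminated scan points stand out with large first-mode coefficients.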
Descending orbit data of the AMSR-2 for the United States region on 18 June 2016 were used as an example to study RFI on a certain day.
Figure 2a,b shows the spatial distribution of the brightness temperature along the two polarization directions on this day. The brightness temperature along the vertical polarization direction is higher than that along the horizontal polarization direction, and there are anomalous brightness temperature high-value areas distributed in both channels. These anomalous brightness temperature high-value areas exhibit a directional, narrow-band, and isolated distribution over the continuous and smooth brightness temperature distribution characteristics of other regions. The spatial distribution of RFI signals in the 6 GHz channel can be identified with the above-mentioned NPCA method, as shown in
Figure 2c,d, where high values indicate areas with a high probability of RFI occurrence. A threshold value higher than 0.5 K is usually used to assess RFI signals [
29]. Based on the RFI spatial distribution, the areas with abnormally high brightness temperatures noted above coincide with the areas of identified RFI signals. This is mainly because RFI adds to the surface microwave radiation detected by low-frequency channels, which produces anomalously high brightness temperatures. RFI signals are evident in the majority of large cities in the United States, with the most serious radio interference on the west coast and in cities along the U.S.–Mexico border. The RFI signals in the two polarization channels exhibit different distribution characteristics, with the horizontally polarized channels characterized by isolated large-value areas and the vertically polarized channels characterized by RFI signals near anomalies along all directions.
2.4. RFI Restoration Method: Iterative PCA Method
The purpose of RFI identification is to better restore brightness temperature data polluted by RFI, and the iterative PCA restoration method is therefore introduced below. The PCA method can effectively separate different information scales, while RFI can lead to an abnormal rise in single low-frequency channel brightness temperature points. This small-scale anomalous feature cannot affect the first few PCA modes with large-scale information as their main feature. Therefore, the first few PCA modes can be employed to restore anomalous observation points, and restored data with main features can be obtained.
First, the data matrix for PCA can be established. Assuming that there are P RFI-contaminated points, observation points contaminated by RFI are excluded within circular areas with a radius of 350 km centered on these P points, and the number of remaining observation points is recorded as N. N is the size of the repair matrix; because a moving window is used to build the repair matrix, N changes with the location of the target observation. The data of 9 channels are selected for each observation point. The data matrix for restoration, $A$, is given by Equation (4) (matrix entries not reproduced here). The variables are the same as those of Equation (1), where $TB_{6H}^{p}$ is the 6 GHz horizontal polarization channel of observation point $p$, and $TB_{6H}^{p} = 0$ can be set to strengthen the abnormal characteristics between the data and surrounding data. PCA decomposition of matrix $A$ can be performed to obtain the following:

$$A = \sum_{k} e_k u_k,$$

where $e_k$ is generally referred to as the modal vector, $u_k$ is the coefficient vector, and $k$ denotes the $k$th mode. With the use of the first mode for reconstruction, we can obtain the following:

$$A^{(1)} = e_1 u_1.$$

Although the first mode does not contain anomalous information, a value of 0 can cause first-mode information to be dispersed across other modes. Therefore, it is necessary to gradually improve the average feature represented by the first mode through iteration. After the first-mode estimate of $TB_{6H}^{p}$ has been obtained from $A^{(1)}$, this value can be substituted into the data matrix in Equation (4) to establish a new data matrix $A_1$, and matrix $A_1$ can again be decomposed via PCA in the same way. The matrix can then be restored to obtain an updated estimate, and the above process is repeated. Because of the uniqueness of the true value, the first mode of PCA can gradually attain equilibrium in the iterative process. Once the estimate satisfies the convergence criterion (criterion not reproduced here), it can be considered that the average features of the first mode have been completely retained. Then, similar to the extraction of first-mode information, information on the other modes can be extracted in a stepwise manner until all modal information is included. Finally, the reconstructed brightness temperature data at the P points can be obtained.
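The iteration described above can be sketched as a truncated-SVD imputation of a single contaminated entry: the value is set to 0, reconstructed from the leading mode, substituted back, and the process is repeated until the estimate stops changing, after which further modes are added stepwise. This is a sketch under assumptions, not the authors' code: the function name `iterative_pca_restore`, the synthetic 9 × 41 matrix, the use of two modes, and the tolerance are all hypothetical.

```python
import numpy as np

def iterative_pca_restore(matrix, row, col, n_modes=2, tol=1e-4, max_iter=500):
    """Restore matrix[row, col] by iterative low-rank (PCA) reconstruction.

    matrix : (channels, points) brightness temperatures; the last column
             would typically hold the contaminated observation point.
    """
    a = np.array(matrix, dtype=float)
    a[row, col] = 0.0                      # discard the contaminated value
    for k in range(1, n_modes + 1):        # add modes one at a time
        for _ in range(max_iter):
            u, s, vt = np.linalg.svd(a, full_matrices=False)
            # Value of the target entry reconstructed from the first k modes.
            estimate = (u[row, :k] * s[:k]) @ vt[:k, col]
            change = abs(estimate - a[row, col])
            a[row, col] = estimate         # substitute back and repeat
            if change < tol:
                break
    return a[row, col]

# Synthetic example: 9 channels, 40 clean neighbours plus 1 target point.
n_channels, n_points = 9, 41
p = np.arange(n_points)
base = np.linspace(250.0, 280.0, n_channels)
gain = np.linspace(-3.0, 3.0, n_channels)
rng = np.random.default_rng(1)
clean = (np.outer(base, 1.0 + 0.02 * np.sin(0.3 * p))
         + np.outer(gain, np.cos(0.7 * p))
         + rng.normal(0.0, 0.05, (n_channels, n_points)))

observed = clean.copy()
observed[0, -1] += 30.0                    # RFI in the 6 GHz H channel

restored = iterative_pca_restore(observed, row=0, col=n_points - 1)
print(f"true {clean[0, -1]:.2f} K, observed {observed[0, -1]:.2f} K, "
      f"restored {restored:.2f} K")
```

In this synthetic case the uncontaminated data are nearly rank 2, so two modes suffice; with real observations the number of modes and the stopping tolerance would need to be chosen from the data.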
The radius of 350 km was determined empirically. During the development of the method, sensitivity tests were conducted using observations from AMSR-2 channel 9, which is not subject to RFI, and a restoration radius of 350 km gave the best restoration results.
The iterative PCA method was used to restore the RFI-contaminated areas identified in
Figure 2c,d. The spatial distribution of the brightness temperature after restoration and RFI distribution results determined with the NPCA method are shown in
Figure 3.
Figure 3a,b shows the spatial distribution of the brightness temperature after restoration under 6 GHz horizontal and vertical polarization, respectively. Compared to
Figure 2a,b, the discontinuously distributed abnormal maximum points in the observation data were replaced by reasonable data obtained via iteration. The spatial distribution of the restored brightness temperature in the 6 GHz channels is more continuous and smoother, which better agrees with the characteristics of surface emission. To verify the restoration more rigorously, the NPCA method was applied to the restored brightness temperature to identify any remaining RFI, and the results are shown in
Figure 3c,d. The figure reveals that no RFI signals occur in the brightness temperature data after restoration in the 6 GHz horizontal polarization channel, and there are no clear RFI signals in the brightness temperature data of the vertical polarization channel. The restoration effect in the vertical polarization channel via the iterative PCA method is better than that in the horizontal polarization channel. Certain RFI signals can still be identified in areas of Central America, as shown in
Figure 3d, but the signal intensity is lower than that before restoration.
4. Discussion
Currently, RFI-contaminated C-band brightness temperature data over the United States cannot be used for soil moisture retrieval, which leaves a portion of the brightness temperature data underutilized. To make full use of these data, we used the iterative PCA method to restore RFI-contaminated 6 GHz brightness temperature data and then used the restored data for soil moisture retrieval. The missing rate of the soil moisture retrieval was significantly lower after restoration than before, but the restoration effect was limited by the RFI detection method, which could be improved by adjusting the RFI identification threshold with the season. The retrieved soil moisture attained a satisfactory correlation with ERA5- and ESA-CCI-based soil moisture values and an especially high correlation with the ESA-CCI-based soil moisture. Finally, the variation characteristics of the retrieved soil moisture in spring, summer, and autumn 2016 were examined, and the underlying reasons were analyzed. Thus, the soil moisture retrieved from restored data could help to diversify the above datasets. However, a quantitative assessment of the accuracy of the soil moisture retrieved from reconstructed brightness temperatures in areas disturbed by RFI versus areas not disturbed by RFI is lacking; this will be investigated in follow-up research. As the soil moisture reanalysis data also contain many uncertainties, the analysis in this study can only qualitatively show the accuracy of the spatial structure of the retrieved results. Evaluating the accuracy of the retrieved values requires more site observation data, which is also a problem to be addressed in follow-up research.