Soil moisture is an important basis for the development of agriculture and forestry [1,2,3]. It is also a key variable in studies of the terrestrial water and energy cycles. It controls not only how net radiant energy is partitioned into latent and sensible heat, but also how precipitation is partitioned into infiltration, runoff, and evaporation. Accurate soil water content data therefore support rational land use and help improve both production levels and production quality. Previously, soil water content was measured accurately with traditional methods such as time domain reflectometry (TDR) [4]. These methods measure local soil moisture accurately, but they require considerable labor and material resources and are not suitable for large-scale soil moisture mapping. Because of these limitations, remote sensing offers a way to estimate soil moisture over large areas without expensive on-site monitoring networks. Because microwaves are sensitive to the dielectric properties and geometric structure of bare soil surfaces, radar remote sensing can be used to estimate soil moisture [5,6,7], and the backscatter coefficient of synthetic aperture radar (SAR) data is closely related to soil surface parameters [8,9].
Many studies have used SAR data to develop physical, empirical, and semi-empirical backscattering models for soil moisture inversion. Physical models include the physical optics model (POM) [10], the geometrical optics model (GOM), the small perturbation model (SPM) [11,12], and the integral equation model (IEM) [13,14]. Unlike empirical models, physical models do not require site-specific field calibration; the IEM in particular is valid over a wide range of roughness (ks ≤ 3, where s is the surface root mean square height and k = 2π/λ is the radar wave number, λ being the incident wavelength). However, physical models assume that the exposed soil surface is a homogeneous dielectric half-space whose scattering can be solved from Maxwell's equations, so many parameters must be computed. Furthermore, because the soil surface is described imperfectly, the computed backscattering coefficients are inaccurate, which greatly limits the accuracy of these models [15,16,17]. Many researchers have therefore exploited the relationship between SAR data and soil moisture to propose empirical and semi-empirical models, such as the Oh model [18] and the Dubois model [19]. In these models, soil moisture is calculated from the radar backscattering coefficients and compared with in situ measurements to assess accuracy, and once their parameters are tuned they give more accurate results than the physical models. They incorporate a variety of radar configurations and surface conditions of the SAR data into the theoretical framework of the physical models and are therefore widely applicable. However, for a specific study area, additional work is still needed to determine the model calibration factors that ensure accurate soil moisture retrieval [20,21].
Backscatter coefficients depend on multiple soil surface parameters. With a single radar configuration and without accurate roughness parameters, soil moisture estimated from the backscatter coefficient usually has a large error [22]. A major problem in soil moisture estimation is therefore separating the contributions of soil moisture and anisotropic roughness to the backscattered radar signal. To decouple roughness estimation from soil moisture estimation, Hajnsek et al. proposed the X-Bragg model, which exploits the phase information of PolSAR data [23]. In the X-Bragg model, the scattering entropy (H), the mean scattering angle (α), and the anisotropy (A) are obtained by eigen-decomposition of the complex coherency matrix, and roughness is related only to A. Roughness can thus be estimated separately from soil moisture, which improves the accuracy of soil moisture inversion. The limitation of this method is that it applies only to roughness of ks ≤ 1. Moreover, all of the above models are effective only for bare or sparsely vegetated soil surfaces. Where vegetation is present, its dielectric characteristics and physical structure also affect the backscatter coefficient, so vegetation parameters must be added to the model in vegetation-covered areas.
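The roughness limits above are stated in terms of the normalized roughness ks, and the X-Bragg approach starts from the H, A, and α parameters of the coherency matrix. As an illustration only, the sketch below computes ks from an assumed RMS height and wavelength and derives H, A, and α with the standard Cloude–Pottier eigen-decomposition of a 3×3 Hermitian coherency matrix. The function names, the assumed C-band wavelength of 5.6 cm, and the example matrices are hypothetical; the sketch does not implement the X-Bragg inversion itself (the mapping from H, A, and α to dielectric constant and roughness).

```python
import numpy as np

# Roughness validity limits (upper bounds on ks) as quoted in the text.
IEM_KS_MAX = 3.0
X_BRAGG_KS_MAX = 1.0


def normalized_roughness(rms_height, wavelength):
    """Return ks with k = 2*pi/wavelength; both inputs must use the same length unit."""
    k = 2.0 * np.pi / wavelength
    return k * rms_height


def h_a_alpha(T):
    """Entropy H, anisotropy A, and mean alpha angle (degrees) of a 3x3 Hermitian coherency matrix T."""
    eigvals, eigvecs = np.linalg.eigh(T)          # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]             # reorder so lambda1 >= lambda2 >= lambda3
    lam = np.clip(eigvals[order], 0.0, None)      # guard against tiny negative round-off
    vecs = eigvecs[:, order]
    total = lam.sum()
    if total <= 0:
        raise ValueError("coherency matrix has no positive eigenvalues")
    p = lam / total                               # pseudo-probabilities of the three mechanisms
    nz = p > 0
    H = float(-np.sum(p[nz] * np.log(p[nz])) / np.log(3.0))  # log base 3, so 0 <= H <= 1
    denom = lam[1] + lam[2]
    # Treat a numerically negligible denominator as zero anisotropy.
    A = float((lam[1] - lam[2]) / denom) if denom > 1e-12 * total else 0.0
    # alpha_i = arccos(|first component of eigenvector i|); clipping avoids NaN from round-off.
    alpha_i = np.degrees(np.arccos(np.clip(np.abs(vecs[0, :]), 0.0, 1.0)))
    alpha = float(np.sum(p * alpha_i))
    return H, A, alpha


if __name__ == "__main__":
    # Assumed example: s = 1 cm at an assumed C-band wavelength of 5.6 cm.
    ks = normalized_roughness(1.0, 5.6)
    print(ks <= IEM_KS_MAX, ks <= X_BRAGG_KS_MAX)  # True False
    H, A, alpha = h_a_alpha(np.diag([3.0, 2.0, 1.0]))
    print(round(A, 3), round(alpha, 1))            # 0.333 45.0
```

In this assumed example, ks ≈ 1.12 lies inside the IEM range but outside the X-Bragg range, which illustrates why the applicable roughness range matters when choosing a model.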
With the rise of machine learning, researchers have begun to use shallow neural networks such as the multi-layer perceptron (MLP) to estimate soil moisture [24]. Baghdadi et al. [25] used an MLP with IEM simulation data and Radarsat-2 SAR data to retrieve soil moisture in bare soil areas and obtained promising results. Said et al. [26] estimated soil moisture with a neural network from ERS-2 SAR data over bare and vegetation-covered areas, and the estimated soil moisture was highly correlated with the measured soil moisture. Neural networks can automatically extract features from real surface data and build nonlinear models for classification and regression. A neural network can therefore avoid the modeling errors caused by imperfect surface models or imprecise nonlinear functions. Although neural networks have been widely used in remote sensing processing, only a few studies have examined their potential for soil moisture inversion, although those studies obtained encouraging results. In recent years, deep learning has developed rapidly, and deep networks have been used to fit data with satisfactory results. A typical deep learning algorithm is the convolutional neural network (CNN). Whereas an MLP takes as input the one-dimensional feature vector of the target pixel, a CNN takes a local patch centered on the target pixel, so it can extract neighborhood and structural information from that patch. Like the MLP, the CNN requires a large amount of training data [27]. However, the CNN makes full use of spatial information through its convolution kernels, and with the same training data it can achieve higher precision than the MLP [28]. CNNs have been applied successfully to SAR image classification. For PolSAR data classification, Shang et al. [29] proposed DSNet, a CNN-based network, and obtained satisfactory results in classification experiments on three datasets with rich categories. Because CNN-based automatic modeling is data driven, it requires a large amount of data, yet sufficient real surface data are difficult to obtain on a large scale, so only a few studies have combined CNNs with surface parameter inversion. Song et al. [30] trained a CNN on simulated SAR images to invert soil roughness and achieved good results. The accurate classification and soil roughness inversion achieved by CNNs motivated us to use a CNN to retrieve soil moisture. Table 1 lists some differences between traditional, neural network, and convolutional neural network models.
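The difference in input between the MLP and the CNN can be made concrete with a small NumPy sketch: the MLP receives one feature vector per pixel, whereas the CNN receives a patch centered on each pixel. The function names, the reflect padding at image borders, and the (rows, columns, channels) layout, where channels might hold, for example, backscatter in several polarizations, are assumptions for illustration and not necessarily the configuration used in this work.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def pixel_vectors(image):
    """MLP-style input: one feature vector per pixel, shape (rows*cols, channels)."""
    rows, cols, channels = image.shape
    return image.reshape(rows * cols, channels)


def centered_patches(image, size):
    """CNN-style input: a size x size patch centered on every pixel,
    shape (rows*cols, size, size, channels)."""
    if size % 2 == 0:
        raise ValueError("patch size must be odd so the target pixel is the patch center")
    half = size // 2
    # Reflect-pad the two spatial axes so border pixels also get a full patch.
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode="reflect")
    # Sliding windows over rows and columns: shape (rows, cols, channels, size, size).
    windows = sliding_window_view(padded, (size, size), axis=(0, 1))
    # Move the channel axis last: (rows, cols, size, size, channels).
    windows = np.moveaxis(windows, 2, -1)
    rows, cols, channels = image.shape
    return windows.reshape(rows * cols, size, size, channels)


if __name__ == "__main__":
    # Hypothetical 4 x 5 scene with 3 feature channels per pixel.
    scene = np.random.default_rng(0).normal(size=(4, 5, 3))
    print(pixel_vectors(scene).shape)        # (20, 3)
    print(centered_patches(scene, 3).shape)  # (20, 3, 3, 3)
```

Padding keeps exactly one patch per pixel, so both inputs have the same number of samples and can be paired with the same per-pixel soil moisture labels; the patch size controls how much neighborhood information the CNN can use.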
Generally, there are two strategies for inverting these models. In the first, depending on the complexity of the model, the equation relating the known parameters to soil moisture is solved analytically or numerically. In the second, a network such as an MLP or a CNN is trained to represent the complex nonlinear relationship between the known parameters and soil moisture. Because the X-Bragg model and the IEM are very complex, it is almost impossible to derive an analytical expression for soil moisture in terms of the two models' parameters. This paper therefore combines a CNN with physical models to address soil moisture inversion. At present, most CNNs, whether for classification or parameter inversion, are trained directly on raw or image data, which requires a large amount of data to tune the model parameters. Physical models can provide prior knowledge, for example about the functional form of the mappings or about useful parameters, so combining them with the network relaxes the data requirements. Meanwhile, the CNN uses its convolution kernels to learn features automatically from the spatial information in the data, which addresses the low accuracy of physical models that have not been calibrated against real surface data. A dual-channel CNN is used to estimate soil moisture from a simulated dataset and from real bare-surface data. The experimental results show that the dual-channel CNN extracts features from the parameters of both models and combines their strengths. Compared with other inversion methods, the main contributions of this method are as follows: