Accuracies of Soil Moisture Estimations Using a Semi-Empirical Model over Bare Soil Agricultural Croplands from Sentinel-1 SAR Data

This study describes a semi-empirical model developed to estimate volumetric soil moisture (θv) in bare soils during the dry season (March–May) using C-band (5.405 GHz) synthetic aperture radar (SAR) imagery acquired by the European Sentinel-1 satellite platform at a 20 m spatial resolution. The semi-empirical model was developed using backscatter coefficients (σ° dB) and in situ soil moisture collected in Siruguppa taluk (sub-district) in the Karnataka state of India. The backscatter coefficients σ°VV and σ°VH were extracted from SAR images at 62 geo-referenced locations where ground samples were collected and volumetric soil moisture was measured at a depth of 10 cm (0–10 cm) using a soil core sampler and the standard gravimetric method during the dry months (March–May) of 2017 and 2018. A linear equation combining σ°VV and σ°VH was proposed to estimate soil moisture. Both localized and generalized linear models were derived. Thirty-nine localized linear models were obtained from the 13 Sentinel-1 images used in this study, considering each polarimetric channel, co-polarization (VV) and cross-polarization (VH), separately, as well as their linear combination VV + VH. Furthermore, nine generalized linear models were derived using all the Sentinel-1 images acquired in 2017 and 2018; three generalized models were derived by combining the two years (2017 and 2018) for each polarimetric channel; and three more models were derived for the linear combination of σ°VV and σ°VH. The resulting set of equations was validated: the root mean square error (RMSE) was 0.030 for both 2017 and 2018, and 0.02 for the combined years 2017 and 2018. Both localized and generalized models were compared with in situ data. Both kinds of models revealed that the linear combination σ°VV + σ°VH yielded a significantly higher R² than the individual polarimetric channels.


Introduction
Soil moisture estimation across space and time has become possible with the advent of microwave remote sensing [1]. The amount of moisture in the soil is a function of physical and chemical soil properties and of management practices. Soil moisture is highly dynamic across space and correlated in time. The radar backscattering coefficient is a function of soil characteristics such as dielectric constant, texture, and surface roughness, and depends on the wavelength, polarization, and incidence angle of the radar [1]. Shorter-wavelength C-band radar backscatter has shown sensitivity to surface soil moisture to a depth of about 5 cm [2][3][4]. The Sentinel-1 mission of the European Space Agency has made accessible a large volume of C-band data acquired over the entire Earth's surface since 2014. This opened up new perspectives for studying soil moisture in semi-arid regions, such as Karnataka, India, the focus of this work. Large-scale soil moisture monitoring will provide greater insight into energy fluxes, which can improve meteorological and climatic projections [5] and thereby provide critical inputs for agriculture.
Studies based on physical, empirical, and semi-empirical models have estimated soil moisture over bare soils through radar remote sensing [6][7][8]. Physical approaches require many input parameters, such as surface roughness and slope, which are not available under practical conditions [8]. Empirical models are purely data driven, whereas semi-empirical models, while also data driven, are grounded in theoretical considerations. In soil studies, such models are site-specific and generally valid only for specific soil characteristics [3]. Previous semi-empirical studies have used a single polarization to relate soil moisture at 10 cm depth to backscatter [9] and estimated θv with a root mean square error (RMSE) of 3–6% [10][11][12] using C-band data. Other studies have used the SAR interferometry technique with Sentinel-1 data to estimate soil moisture and compared the estimates with in situ measurements [13]. Although SAR interferometry is less frequently used in the remote sensing community to estimate soil moisture, its advantage lies in its ability to disentangle moisture and terrain roughness contributions. Most SAR-based soil moisture estimation studies have covered small areas limited to a few hundred square kilometers [11][12][13][14][15][16][17]. Estimating soil moisture over wider areas and at higher resolution using SAR imagery will provide information for managing water resources and scheduling irrigation that can benefit a large number of farmers [14].
The aim of this study was to estimate soil moisture in bare rice agricultural soils. While SAR images have been used to estimate rice phenology using X-band TerraSAR-X images [15], few studies have estimated soil moisture in bare rice agricultural soils using Sentinel-1 C-band images. Bare soils in Siruguppa are rice-growing areas that lie bare after the rice crop is harvested in March, with rice stubble and weeds that dry up during the summer (March–June). Before the monsoon rains start, it is critical to estimate the amount of soil moisture in the top 10 cm, which helps farmers decide when to prepare the land and sow the next crop. Surface roughness, soil status, soil moisture, and crop residue distribution all affect radar backscatter [16]. It is well established that σ°VV is more sensitive to variation in soils, whereas σ°VH is better suited to identifying dry crop residue [17]. Using both together can improve the accuracy of soil moisture estimates [18]. Nevertheless, soil moisture studies using σ°VV and σ°VH together, especially with Sentinel-1 SAR data, are limited. Such studies over large agricultural areas are important for agriculture, water, and food security. The major goal of this study was to estimate soil moisture over bare soils using both σ°VV and σ°VH and to compare the estimates with in situ measurements at a depth of 10 cm (0–10 cm). At the time of measurement, soil moisture in the top 10 cm is at steady state and uniform across that surface layer; therefore, the C-band signal can be assumed to represent the top 10 cm layer, even though it is known that C-band SAR signals do not penetrate to a 10 cm depth.
The contribution of standing stubble to the total backscattering coefficient is comparable with that of the soil surface when the stubble has more than 75% water content. The backscatter coefficient decreases as the water content of the stubble decreases, and when the water content is below 40%, the stubble contribution to the total backscattering coefficient is negligible [19]. We investigated both localized and generalized linear models to try to disentangle the stubble and soil moisture contributions. The linear coefficients of localized models were derived using in situ data acquired on a specific Sentinel-1 acquisition date. In contrast, generalized models were built using all in situ measurements acquired in the study period, thus adding the temporal dimension to the analysis of Sentinel-1 data. The question we wanted to answer is: can semi-empirical models estimate soil moisture while removing the stubble contribution to the backscattering coefficient? We tried to answer this question by studying the effects of each variable, time and polarization, separately. A localized model does not take into account the temporal evolution of backscattering, while a generalized model includes the time variable when estimating the linear coefficients. Furthermore, each model can either keep the polarimetric channels separate or merge them. In this work, we used a large dataset of in situ soil moisture measurements acquired over a 2-year period to answer the above question. The stability of the results and the collinearity of the data are crucial issues and are used to assess the results of this experiment.
The rest of this paper is organized as follows: Section 2 is devoted to materials and methods, Section 3 to the results, and Section 4 presents the discussion. Finally, a few conclusions are drawn in Section 5.

Study Area
The study was conducted in Siruguppa taluk (sub-district) in the Bellary district of Karnataka state, India (Figure 1). Siruguppa is located between 15.35°N and 15.83°N latitude and 76.69°E and 76.71°E longitude, covering an area of 1048 sq. km. Its climate is moderate and dry most of the year, with temperatures ranging from 23.2 °C to 42.4 °C from March to May and an annual rainfall of 645 mm. Canal irrigation serves 60% of the cropped area; the rest is either rainfed or irrigated with groundwater. Most crops are grown on predominantly black clay, red loamy, and red sandy soils. The Tungabhadra River runs diagonally across Siruguppa from the northwest, providing water for irrigation. The major crops are paddy, sorghum, pearl millet, sunflower, groundnut, cotton, and sugarcane. Over the last decade, kharif (rainy season) crop production fell in parts of the taluk due to deficient monsoon rainfall, leading to a shift from paddy and millets to cash crops such as cotton and sugarcane. The Deccan Plateau region is prone to frequent drought, which makes soil moisture information critical for allocating water resources and scheduling irrigation. The sowing date is a critical decision that farmers make after the initial rainfall. It is based on traditional knowledge and a physical assessment of soil moisture by hand or with a push probe. A scientific estimate of soil moisture can help farmers decide the sowing date. This study was conducted on bare agricultural fields of Siruguppa to estimate soil moisture using radar remote sensing.

Soil Sampling and Ground Data Collection
The soils of Siruguppa are classified into Vertisols (covering 720.9 km²), Aridisols (146.8 km²), Inceptisols (65.1 km²), and Alfisols (34.1 km²), with other land cover such as rock outcrops (21.5 km²). Sample locations were selected by random sampling that took into account the fractions of the different soil types. This mitigates variation from sampling error and increases the precision of the measured variable [20]. Soil samples were collected using a standard 10 cm metallic cylinder within each soil type to account for vertical and horizontal homogeneity [21] and weighed on site using a Mettler Toledo electronic balance. A handheld GPS (Garmin eTrex) was used to georeference each location immediately, with an average accuracy of 2.5 m, because positions were recorded only after a good almanac had been received. Sixty-two locations spread across the four soil types were sampled: 48 in Vertisols, 8 in Inceptisols, 4 in Aridisols, and 2 in Alfisols. This was repeated over two years (2017 and 2018) on 13 dates of satellite overpasses, bringing the total number of data points to 806 (Figure 1).
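The sample counts can be related to the soil-type areas with a few lines of arithmetic. The Python sketch below uses only the areas and counts reported in this section (rock outcrops excluded): it computes each soil order's share of the area, the unrounded count that a strictly area-proportional allocation of 62 locations would give (for comparison only), and the 62 × 13 = 806 total. It illustrates the bookkeeping and is not the authors' sampling procedure.

```python
# Soil-order areas (km^2) and sampled locations, as reported in this section.
AREAS_KM2 = {"Vertisols": 720.9, "Aridisols": 146.8, "Inceptisols": 65.1, "Alfisols": 34.1}
SAMPLES = {"Vertisols": 48, "Aridisols": 4, "Inceptisols": 8, "Alfisols": 2}
N_DATES = 13  # Sentinel-1 overpasses: 6 in 2017, 7 in 2018


def shares(values):
    """Return each entry's fraction of the total."""
    total = sum(values.values())
    return {key: value / total for key, value in values.items()}


def area_proportional_counts(areas, n_total):
    """Unrounded sample counts under strictly area-proportional allocation."""
    return {key: frac * n_total for key, frac in shares(areas).items()}


n_locations = sum(SAMPLES.values())   # 62
total_points = n_locations * N_DATES  # 806

if __name__ == "__main__":
    area_share = shares(AREAS_KM2)
    expected = area_proportional_counts(AREAS_KM2, n_locations)
    for soil in AREAS_KM2:
        print(f"{soil:<12} area share {area_share[soil]:.3f}  "
              f"sampled {SAMPLES[soil]:>2}  area-proportional {expected[soil]:.1f}")
    print(f"locations: {n_locations}, total data points: {total_points}")
```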
Bulk density (BD) samples were collected simultaneously on site using standard cylindrical cores to estimate volumetric soil moisture (θv). The sampling was carried out from March to May in bare agricultural soils with crop residue from paddy and weeds.

Laboratory Analysis
Volumetric soil moisture was measured in two steps. First, the gravimetric method was used to estimate soil moisture from field samples over bare agricultural land [22]. Global Positioning System (GPS) coordinates were taken at each sample location to allow the approximate matching of each soil sample location with an image pixel. After the wet weight (ϑw) was measured, the soil collected from the ground was placed in airtight polythene bags numbered with the corresponding GPS ID. The bags were brought to the soil laboratory, where the dry weight (ϑd) was measured using a standard drying process: each sample was transferred to a microwave bowl and placed in the oven at 105 °C for 24 h, and the resulting weight was recorded as the dry weight. Gravimetric soil moisture (θg) was estimated as:

$$\theta_g = \frac{\vartheta_w - \vartheta_d}{\vartheta_d}$$

The second step involved collecting soil cores to estimate bulk density (BD). The drying process was repeated for each core, and BD was estimated as:

$$BD = \frac{\vartheta_d}{V}$$

where V is the volume of the core.
Volumetric soil moisture was expressed as:

$$\theta_v = \theta_g \times \frac{BD}{\rho_{H_2O}}$$

where ρH₂O is the water density.
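The three formulas can be chained as in the following minimal Python sketch. The function names and the example weights and core volume are hypothetical, and water density is assumed to be 1.0 g/cm³ so that inputs in grams and cm³ give θv in cm³/cm³.

```python
def gravimetric_moisture(wet_weight, dry_weight):
    """theta_g = (wet - dry) / dry, in grams of water per gram of dry soil."""
    if dry_weight <= 0:
        raise ValueError("dry weight must be positive")
    return (wet_weight - dry_weight) / dry_weight


def bulk_density(core_dry_weight, core_volume):
    """BD = oven-dry core mass / core volume (g/cm^3 if inputs are g and cm^3)."""
    if core_volume <= 0:
        raise ValueError("core volume must be positive")
    return core_dry_weight / core_volume


def volumetric_moisture(theta_g, bd, water_density=1.0):
    """theta_v = theta_g * BD / rho_water (cm^3 of water per cm^3 of soil)."""
    return theta_g * bd / water_density


if __name__ == "__main__":
    # Hypothetical sample: 120 g wet, 100 g oven-dry; core of 130 g dry mass in 100 cm^3.
    theta_g = gravimetric_moisture(120.0, 100.0)
    bd = bulk_density(130.0, 100.0)
    print(theta_g, bd, volumetric_moisture(theta_g, bd))
```

For these hypothetical inputs, θg = 0.20, BD = 1.3 g/cm³, and θv ≈ 0.26.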

Data Collection and Pre-Processing
Thirteen Sentinel-1 images were used: six acquired between March 4, 2017 and May 27, 2017, and seven between March 11, 2018 and May 22, 2018 (see Table 1). The incidence angle over the study area varied from 30° to 35°, and the images were acquired in co-polarization (VV) and cross-polarization (VH). The acquisition frequency over India is low, and the data portal showed a cycle of low and high numbers of acquisitions in alternating months (Table 1). Pre-processing of the SAR imagery was carried out using the SNAP software developed by the European Space Agency (ESA). Radiometric calibration, thermal noise removal, and terrain correction (using the Range Doppler terrain correction operator) were applied to obtain the backscattering coefficient σ° (dB) [23]. A Lee speckle filter was applied to reduce speckle noise. Sentinel-2 Level-1C imagery with less than 10% cloud cover was downloaded for the years 2016 to 2018. These images were converted to Level-2A to obtain bottom-of-atmosphere reflectance using the SNAP software, provided by ESA under the GNU General Public License v3. The visible red (B4) and near-infrared (NIR, B8) bands were used to compute the normalized difference vegetation index (NDVI) to delineate the agricultural area.
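SNAP performs these steps internally. The sketch below only illustrates two of them: a basic Lee filter under the multiplicative speckle model, and the conversion to decibels, σ°(dB) = 10·log10(σ°). It assumes a calibrated, linear-intensity σ° array; the 7 × 7 window and the equivalent number of looks are hypothetical parameters, and SNAP's Lee implementation may be parameterized differently. The filter is applied in linear units, before the dB conversion.

```python
import numpy as np
from scipy.ndimage import uniform_filter


def lee_filter(intensity, size=7, looks=1.0):
    """Basic Lee filter for multiplicative speckle on a linear-intensity image.

    Weight k = 1 - Cu^2 / Ci^2 (clipped to [0, 1]), where Cu^2 = 1/looks is the
    squared speckle coefficient of variation and Ci^2 = local var / local mean^2.
    """
    z = np.asarray(intensity, dtype=float)
    mean = uniform_filter(z, size=size)
    mean_sq = uniform_filter(z * z, size=size)
    var = np.maximum(mean_sq - mean * mean, 0.0)
    cu2 = 1.0 / looks
    with np.errstate(divide="ignore", invalid="ignore"):
        ci2 = np.where(mean > 0, var / (mean * mean), 0.0)
        k = np.where(ci2 > 0, 1.0 - cu2 / ci2, 0.0)
    k = np.clip(k, 0.0, 1.0)
    # Homogeneous areas (k near 0) get the local mean; strong edges (k near 1) keep the pixel.
    return mean + k * (z - mean)


def to_db(sigma0_linear, floor=1e-10):
    """sigma0 (dB) = 10 * log10(sigma0); a small floor avoids log10(0)."""
    return 10.0 * np.log10(np.maximum(np.asarray(sigma0_linear, dtype=float), floor))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical single-look scene: constant sigma0 = 0.05 with exponential speckle.
    speckled = 0.05 * rng.exponential(1.0, size=(64, 64))
    filtered = lee_filter(speckled, size=7, looks=1.0)
    print(speckled.std(), filtered.std(), to_db(filtered).mean())
```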

Methodology
The study began with pre-processing of Sentinel-1 C-band data (described in Section 2.3) to obtain σ° in both polarizations after applying the appropriate corrections and speckle reduction. The in situ sample locations were used to extract σ°VV and σ°VH values in dB from the images of the corresponding dates (Table 1). The in situ data and σ° data were compiled to analyze and build a semi-empirical model. Agricultural land was delineated using NDVI calculated from bands B4 and B8 of a time series of Sentinel-2 images, for every date in each season for which an image was available. Random forest (RF) classification was applied to the set of nine NDVI images covering the study area, together with a training dataset. The resulting mask is used to exclude non-agricultural areas when visualizing soil moisture estimates. Finally, the semi-empirical model was evaluated to assess the accuracy of the soil moisture estimates (Figure 2).
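Extracting σ° at each sample location amounts to converting the GPS coordinates to a row and column of the terrain-corrected raster. The following is a minimal sketch, assuming a north-up raster described by a GDAL-style six-element geotransform whose coordinates are in the same reference system as the GPS points; the raster and coordinates in the usage note are hypothetical. Because the reported GPS accuracy (2.5 m) is well below the 20 m pixel size, a nearest-pixel lookup is a reasonable first approximation.

```python
import numpy as np


def world_to_pixel(geotransform, x, y):
    """Map (x, y) to (row, col) for a north-up raster with a GDAL-style geotransform.

    geotransform = (x_origin, pixel_width, row_rotation, y_origin, col_rotation, pixel_height),
    where pixel_height is negative for north-up images.
    """
    x0, dx, rot1, y0, rot2, dy = geotransform
    if rot1 != 0 or rot2 != 0:
        raise ValueError("rotated rasters are not handled in this sketch")
    col = int(np.floor((x - x0) / dx))
    row = int(np.floor((y - y0) / dy))
    return row, col


def extract_values(band, geotransform, points):
    """Nearest-pixel values of `band` at (x, y) points; NaN for points outside the raster."""
    n_rows, n_cols = band.shape
    values = []
    for x, y in points:
        row, col = world_to_pixel(geotransform, x, y)
        inside = 0 <= row < n_rows and 0 <= col < n_cols
        values.append(float(band[row, col]) if inside else float("nan"))
    return np.array(values)
```

For example, with `band = np.arange(9.0).reshape(3, 3)` and a hypothetical 20 m UTM geotransform `(600000.0, 20.0, 0.0, 1700000.0, 0.0, -20.0)`, the point `(600030.0, 1699950.0)` falls in row 2, column 1, so its extracted value is 7.0.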

Semi-Empirical Model
A semi-empirical model was proposed to estimate soil moisture over bare soils in agricultural areas from the backscatter coefficients, based on a linear relationship. The linear equation captures the backscatter from bare soil, which is governed by soil moisture and surface roughness (including crop residue), and includes both the VV and VH backscattering coefficients:

$$\theta_v = A\,\sigma^{\circ}_{VV} + B\,\sigma^{\circ}_{VH} + T$$

where θv is the volumetric soil moisture; A, B, and T are empirical constants; and σ°VV and σ°VH are the VV and VH backscattering coefficients (dB), respectively.
On bare soil, σ°VV and σ°VH are mainly influenced by soil moisture. Since the major crop in the study area is rice, rice stubble remains on the ground as crop residue. Rice stubble with 75% water content also contributes to σ°VV, but this contribution decreases with decreasing water content and becomes negligible in both polarizations when the water content falls below 40% [19,24]. A linear combination of both polarizations was found to better estimate soil moisture over bare soil.
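The constants A, B, and T can be estimated by ordinary least squares on the design matrix [σ°VV, σ°VH, 1]. The sketch below uses NumPy's `np.linalg.lstsq`; the synthetic backscatter ranges and the coefficients used to generate them are hypothetical and are not the constants obtained in this study.

```python
import numpy as np


def fit_linear_combination(sigma_vv_db, sigma_vh_db, theta_v):
    """Least-squares estimates of A, B, T in theta_v = A*sigma_VV + B*sigma_VH + T."""
    theta_v = np.asarray(theta_v, dtype=float)
    X = np.column_stack([sigma_vv_db, sigma_vh_db, np.ones(theta_v.size)])
    (a, b, t), *_ = np.linalg.lstsq(X, theta_v, rcond=None)
    return float(a), float(b), float(t)


def predict(a, b, t, sigma_vv_db, sigma_vh_db):
    """Evaluate the semi-empirical model for backscatter given in dB."""
    vv = np.asarray(sigma_vv_db, dtype=float)
    vh = np.asarray(sigma_vh_db, dtype=float)
    return a * vv + b * vh + t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vv = rng.uniform(-16.0, -6.0, 62)   # hypothetical sigma0_VV (dB)
    vh = rng.uniform(-24.0, -14.0, 62)  # hypothetical sigma0_VH (dB)
    theta = 0.01 * vv + 0.005 * vh + 0.40 + rng.normal(0.0, 0.01, 62)
    print(fit_linear_combination(vv, vh, theta))
```

With noise-free synthetic data the fit recovers the generating coefficients exactly (to floating-point precision); with noisy data it returns their least-squares estimates.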

Delineation of Agricultural Fields
Soil moisture estimates are more meaningful when linked to the purpose for which they are used, and the ideal domain for such information is agricultural land. NDVI [25] is commonly used to track changes in crop phenology as the growing season progresses. Since the target class was agricultural land only, a time series of Sentinel-2 NDVI over the cropping seasons was best suited for the delineation. A set of nine NDVI images covering the three crop seasons was used to classify land cover with the RF algorithm [26]. The training dataset included the land use at the 62 soil sample locations, plus 200 additional training samples: 100 from agricultural land and 100 from non-agricultural land. The resulting product was used as the base for mapping soil moisture in agricultural lands.
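The NDVI computation and random forest classification can be sketched with NumPy and scikit-learn's `RandomForestClassifier`. This is a minimal illustration, assuming each pixel is described by its nine-date NDVI time series and labeled 1 (agricultural) or 0 (non-agricultural); the number of trees and the synthetic seasonal profiles are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def ndvi(red, nir):
    """NDVI = (NIR - Red) / (NIR + Red), e.g. Sentinel-2 B8 and B4; 0 where the sum is 0."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    denom = nir + red
    return np.divide(nir - red, denom, out=np.zeros_like(denom), where=denom != 0)


def train_cropland_classifier(ndvi_series, labels, n_trees=200, seed=0):
    """ndvi_series: (n_samples, n_dates) NDVI profiles; labels: 1 = agriculture, 0 = other."""
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    clf.fit(ndvi_series, labels)
    return clf


def classify_cube(clf, ndvi_cube):
    """ndvi_cube: (n_dates, rows, cols) -> boolean agricultural mask of shape (rows, cols)."""
    n_dates, rows, cols = ndvi_cube.shape
    features = ndvi_cube.reshape(n_dates, -1).T  # one row of n_dates NDVI values per pixel
    return clf.predict(features).reshape(rows, cols).astype(bool)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    season = np.sin(np.linspace(0, 3 * np.pi, 9)) ** 2  # hypothetical 3-season profile
    crop = 0.2 + 0.5 * season + rng.normal(0, 0.05, (100, 9))
    other = 0.1 + rng.normal(0, 0.05, (100, 9))
    X = np.vstack([crop, other])
    y = np.r_[np.ones(100, int), np.zeros(100, int)]
    clf = train_cropland_classifier(X, y)
    print(classify_cube(clf, X.T.reshape(9, 20, 10)).mean())
```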

Evaluation of Semi-Empirical Model
Summary statistics (maximum, minimum, and mean) of the in situ soil moisture were computed (Table 2). Linear regression was used to understand the relationship between the Sentinel-1 backscattering coefficients and in situ soil moisture. P values ≤ 0.05 were considered significant and P values > 0.05 not significant. The RMSE of the modeled soil moisture was estimated as:

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\theta_i - \hat{\theta}_i\right)^2}$$

where N is the number of samples. To understand the contribution of each polarization, and of both polarizations combined, to the accuracy of the model, the residual standard error (RSE) of the estimated soil moisture was calculated as:

$$RSE = \sqrt{\frac{\sum_{i=1}^{N}\left(\theta_i - \hat{\theta}_i\right)^2}{n}}$$

where $\theta_i$ is the observed soil moisture, $\hat{\theta}_i$ is the predicted soil moisture, and n is the number of degrees of freedom.
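The RMSE, RSE, and single-channel significance test can be written as follows. This is a sketch: the RSE uses n = N − p − 1 degrees of freedom for p predictors, which is our assumption of how the degrees of freedom were counted, and the P value comes from `scipy.stats.linregress`.

```python
import numpy as np
from scipy import stats


def rmse(observed, predicted):
    """Root mean square error over N samples."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((o - p) ** 2)))


def rse(observed, predicted, n_predictors):
    """Residual standard error with n = N - p - 1 degrees of freedom (assumed)."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    dof = o.size - n_predictors - 1
    if dof <= 0:
        raise ValueError("not enough samples for the number of predictors")
    return float(np.sqrt(np.sum((o - p) ** 2) / dof))


def single_channel_fit(sigma_db, theta_v, alpha=0.05):
    """Simple linear regression of theta_v on one channel, with R^2 and P value."""
    res = stats.linregress(sigma_db, theta_v)
    return {
        "slope": res.slope,
        "intercept": res.intercept,
        "r2": res.rvalue ** 2,
        "p_value": res.pvalue,
        "significant": res.pvalue <= alpha,
    }
```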

Results
A well-distributed sampling scheme and data collected over two years yielded a well-calibrated model to estimate soil moisture in bare agricultural soils during the dry season (March–May). Linear and multi-linear regression were used to find the relationship between observed soil moisture and backscatter coefficients, deriving the model constants for each date and for combinations of dates.

Field Measurements and Laboratory Analysis
Soil moisture was estimated using the gravimetric method for all 62 samples spread over Siruguppa taluk (Figure 1; Table 2). Figure 3 illustrates the spread of soil moisture values above and below the mean for the six satellite passes in 2017. Note that Figure 3 displays soil moisture measured on the day of each satellite pass, which is why the ranges of soil moisture variation differ from those reported in Table 2.

Localized and Generalized Relationships
The concepts of localized and generalized relationships were applied to the relationship between in situ soil moisture measurements and SAR backscatter. A relationship was localized if it was obtained using the data points of a single acquisition date in the study area; localized relationships were derived for dates in both 2017 and 2018. A relationship was generalized if it was obtained using the data points of all dates in the study area (Figure 5).
For the localized models, R² between σ°VV and θv ranged from 0.62 to 0.75 in 2017, revealing a strong relationship (Table 3).
Generalized relationships were used to study the impact of seasonal effects observed in the study area due to different agroecologies (i.e., different management practices within a homogeneous landscape) (Table 4).
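The difference between localized and generalized models lies only in how the data points are grouped before fitting. The following Python sketch, on synthetic data, fits one model per date and channel configuration (VV, VH, VV + VH) for the localized case, and one model per period (2017, 2018, and both years pooled) and configuration for the generalized case. This is one reading consistent with the 39 localized and 9 generalized models stated in the abstract; the date indices, backscatter ranges, and coefficients are hypothetical.

```python
import numpy as np

# Channel configurations: single polarizations and their linear combination.
CHANNELS = {"VV": ("vv",), "VH": ("vh",), "VV+VH": ("vv", "vh")}


def _fit(columns, y):
    """Least-squares coefficients for y = sum(c_i * x_i) + intercept."""
    X = np.column_stack(list(columns) + [np.ones_like(y)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def localized_models(data):
    """One model per acquisition date and channel configuration."""
    models = {}
    for date in np.unique(data["date"]):
        sel = data["date"] == date
        for name, cols in CHANNELS.items():
            models[(int(date), name)] = _fit([data[c][sel] for c in cols], data["theta"][sel])
    return models


def generalized_models(data):
    """One model per period (2017, 2018, both years pooled) and channel configuration."""
    periods = {
        "2017": data["year"] == 2017,
        "2018": data["year"] == 2018,
        "2017+2018": np.ones(data["year"].size, dtype=bool),
    }
    models = {}
    for period, sel in periods.items():
        for name, cols in CHANNELS.items():
            models[(period, name)] = _fit([data[c][sel] for c in cols], data["theta"][sel])
    return models


def synthetic_campaign(n_points=62, dates_2017=6, dates_2018=7, seed=0):
    """Hypothetical dataset shaped like the campaign: n_points per date over 13 dates."""
    rng = np.random.default_rng(seed)
    n_dates = dates_2017 + dates_2018
    date = np.repeat(np.arange(n_dates), n_points)
    year = np.where(date < dates_2017, 2017, 2018)
    vv = rng.uniform(-16.0, -6.0, date.size)
    vh = rng.uniform(-24.0, -14.0, date.size)
    theta = 0.01 * vv + 0.005 * vh + 0.40 + rng.normal(0.0, 0.02, date.size)
    return {"date": date, "year": year, "vv": vv, "vh": vh, "theta": theta}


if __name__ == "__main__":
    data = synthetic_campaign()
    print(len(localized_models(data)), len(generalized_models(data)))
```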

Soil Moisture Evaluation
Multi-linear and linear regression were applied to determine the empirical constants (A, B, and T) in both the localized and generalized models (Tables 5 and 6).

Discussion
Accurate estimation of θv was envisaged using a linear equation of σ°VV and σ°VH radar cross sections from bare agricultural soils. A thorough data collection campaign was undertaken during 2017 and 2018, synchronized with the satellite overpasses. Bare soil areas were mostly post-harvest cropped areas with little or no crop residue, depending on the crop sown. In the study area, 50% of the agricultural land is cropped with rice irrigated from a seasonal stream. Dual-polarized Sentinel-1 SAR imagery was used to estimate soil moisture over bare soils using a semi-empirical model. Model parameters were estimated using linear and multi-linear regression. Performance was evaluated using a 70:30 split of the sampled points, and a low RMSE between observed and estimated soil moisture was found when σ°VV and σ°VH were linearly combined and the 2017 and 2018 data were pooled. In both years, backscatter and observed soil moisture had a significant positive correlation [2,10,27,28]. In both years, VV backscatter (dB) was higher than VH backscatter. In cross-polarization (VH), signal attenuation occurs due to volume scattering [29]. In 2017, soil moisture increased steadily from March 4 to April 27. The R² between the radar backscattering coefficient and in situ soil moisture is reported in Table 3. A sudden increase in R² (VV) can be observed on May 15, corresponding to consecutive rainfall events during the three days before the satellite pass (Figure 8). This suggests a better correlation at high soil moisture values, probably because under these conditions the radar backscattering coefficient depends more strongly on soil moisture than on surface roughness.
Similarly, an unexpected increase in σ°VH was observed (Figure 5). May 27, 2017 had a lower R² for σ°VH than the other dates (Table 3), owing to the rainfall event (Figure 8) or to the moisture of weeds or crop residue [24]. In 2018, the R² of the relationship between σ°VV and observed soil moisture was significant during March because of residual moisture (i.e., crop residue moisture influenced the radar backscattering coefficient; Table 3). Residual soil moisture was low on April 4 and May 22 due to evaporative demand, and higher between April 16 and May 10 due to consecutive rainfall events (Figure 8). R² did not decrease from March to May, probably due to irregular changes in crop residue moisture, to which σ°VH is sensitive [24]. The R² values for σ°VH during March were relatively low despite no rainfall in that month, because of residual soil moisture from the previous crop. The cumulative moisture from rainfall during April is reflected in the low R² on April 16 and April 28 (Table 3). A similar relationship existed during 2018 for the linear combination of σ°VV and σ°VH, which improved R² significantly (Table 3).

Localized and Generalized Relationships
To operationalize accurate soil moisture estimation for decision making, a global relationship considering all dates during the dry season was envisaged. The R² of the global VV relationship in 2017 was 0.68, higher than the mean of the localized relationships; the generalized relationship was therefore more useful for accurate soil moisture estimation. In addition, for VH polarization, the generalized relationship yielded a higher R² than the mean of the localized relationships (0.67). In 2018, the VV relationships were more strongly influenced by rainfall events in the dry season: R² ranged from 0.56 to 0.69, with a mean of 0.62, for the localized relationships and was 0.66 for the generalized relationship, higher than the localized mean. R² was very low for VH polarization due to cumulative moisture from rainfall events, and in this case the generalized relationship produced a lower R² than the mean of the localized relationships (see Tables 3 and 4). Overall, the usefulness of a generalized relationship was demonstrated by a consistent increase in the accuracy of soil moisture estimates over the two years. In 2017, the single-polarization VV and VH relationships showed significantly lower R² than the linear combination of VV and VH; similarly, in 2018, the linear combination yielded a significantly higher R² than either individual polarization. Finally, the best relationship was obtained by pooling (appending) the 2017 and 2018 data and using the linear combination of the two polarizations, which outperformed single-polarization models pooled over the same two years. Generalized relationships were therefore inferred to be more promising for model building than localized relationships, which may not represent the entire population.

Modeling the Relationships
The localized and generalized models were tested for multicollinearity, especially the linear combination models σ°VV + σ°VH. Multicollinearity is a statistical phenomenon in which two or more predictor variables in a multiple regression model are highly correlated [30]. To detect it, we used the variance inflation factor (VIF), which quantifies how much the variance of a regression coefficient is inflated by correlation among the predictors [30]. A VIF exceeding 5 or 10 indicates that the associated regression coefficient is poorly estimated because of multicollinearity [31]. The P value indicates the statistical significance of each independent variable's contribution to the model, as explained in Section 2.4.3.
For generalized models, nine different types of linear relationships were explored with σ°VV and σ°VH data (Table 6). The linear combination equations from the localized models also performed well, with low VIF (<2) and statistically significant P values for both backscatter coefficients (Tables 7 and 8).
A collinearity test on the generalized and localized models showed that the VIF for the linear combination of both backscatter coefficients (VV + VH) was <3; hence, these models are not affected by collinearity. All models showed low P values, indicating that both backscatter coefficients made a meaningful contribution to the models. When modeling with a linear combination of the individual backscatter coefficients, the coefficients were found to be non-collinear, each contributing to R² independently. The localized models from individual dates varied over time, and no single equation with a low RSE and VIF may represent the whole season. In contrast, the generalized models produced a lower RSE while representing the whole season, and were hence better than each localized model.
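VIF can be computed by regressing each predictor on the others. The sketch below is a plain NumPy implementation of VIF_j = 1/(1 − R_j²); with only two predictors (σ°VV and σ°VH), both VIFs reduce to 1/(1 − r²), where r is their Pearson correlation. The synthetic backscatter values are hypothetical.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    of X on all other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.delete(X, j, axis=1), np.ones(n)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vv = rng.normal(-11.0, 2.0, 200)             # hypothetical sigma0_VV (dB)
    vh = 0.5 * vv + rng.normal(-13.0, 1.5, 200)  # hypothetical, partly correlated sigma0_VH
    vif = variance_inflation_factors(np.column_stack([vv, vh]))
    print(vif, "flag" if np.any(vif > 5) else "ok")
```

Values above 5 (or 10) would flag a poorly estimated coefficient; the linear-combination models in this study stayed below 3.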

Validation of Models
Models were validated using 30% of the sampled points. Results for the localized models are summarized in Table 5. In 2017, the lowest RMSE (0.01) was found on April 21, when no rainfall or only very weak rainfall was observed (Figure 8). RMSE increased on May 15. Similarly, in 2018, the lowest RMSE was observed on March 23 and the highest (0.03) on April 16, probably due to increased rainfall. The results suggest that the RMSE of the models is related to the amount of rainfall: localized models performed better in drier soils.
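The 70:30 validation can be sketched as a random split of the sampled points, a least-squares fit of θv = Aσ°VV + Bσ°VH + T on 70% of them, and the RMSE on the remaining 30%. The paper does not state how the split was drawn; a simple random permutation with a fixed seed is assumed here, and the function names are hypothetical.

```python
import numpy as np


def split_70_30(n, seed=0):
    """Random 70:30 split of n sample indices into (calibration, validation)."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train = int(round(0.7 * n))
    return idx[:n_train], idx[n_train:]


def calibrate_and_validate(vv_db, vh_db, theta_v, seed=0):
    """Fit theta_v = A*VV + B*VH + T on 70% of points; return (A, B, T) and validation RMSE."""
    theta_v = np.asarray(theta_v, dtype=float)
    X = np.column_stack([vv_db, vh_db, np.ones(theta_v.size)])
    train, test = split_70_30(theta_v.size, seed)
    coef, *_ = np.linalg.lstsq(X[train], theta_v[train], rcond=None)
    residuals = theta_v[test] - X[test] @ coef
    return coef, float(np.sqrt(np.mean(residuals ** 2)))
```

With the 62 locations of a single date, this split gives 43 calibration and 19 validation points.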
For the generalized models, validation showed that models using co-polar σ°VV data provided a lower RMSE than those using cross-polar σ°VH data for 2017, for 2018, and for all data acquired in 2017 and 2018 combined. The linear combination of co-polar and cross-polar backscattering coefficients always provided a lower RMSE than the models using only one polarization. The best result came from the linear combination of polarizations using all the data acquired over the two years, with an RMSE of 0.02 (Table 6). This generalized model was used to produce maps of soil moisture and its spatial variability (Figures 9–11). This is probably the most important result: a simple multi-linear model using co-polar and cross-polar Sentinel-1 data acquired over long time periods can reproduce the spatial variability of soil moisture.
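Producing a map from the fitted generalized model amounts to applying the linear equation pixel by pixel to co-registered σ°VV and σ°VH rasters (dB) and masking non-agricultural pixels with the RF product. In the sketch below, the coefficients in the usage example are hypothetical placeholders, not the values in Table 6, and the optional clipping range is an assumption rather than part of the described method.

```python
import numpy as np


def soil_moisture_map(vv_db, vh_db, ag_mask, a, b, t, valid_range=None):
    """Apply theta_v = a*VV + b*VH + t per pixel; NaN outside the agricultural mask.

    valid_range, if given as (low, high), clips estimates to that interval
    (an optional, hypothetical post-processing step).
    """
    theta = a * np.asarray(vv_db, dtype=float) + b * np.asarray(vh_db, dtype=float) + t
    if valid_range is not None:
        theta = np.clip(theta, *valid_range)
    return np.where(np.asarray(ag_mask, dtype=bool), theta, np.nan)


if __name__ == "__main__":
    vv = np.array([[-10.0, -12.0], [-8.0, -14.0]])   # hypothetical sigma0_VV (dB)
    vh = np.array([[-20.0, -19.0], [-17.0, -22.0]])  # hypothetical sigma0_VH (dB)
    mask = np.array([[True, False], [True, True]])   # e.g. from the RF cropland product
    print(soil_moisture_map(vv, vh, mask, a=0.01, b=0.005, t=0.40))
```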

Conclusions
This study aimed to accurately estimate the soil moisture of bare, post-harvest agricultural areas in Siruguppa taluk (sub-district) in the Karnataka state of India. Fifty percent of this agricultural area is planted with rice irrigated by seasonal canal irrigation. An accurate estimate of volumetric soil moisture (θv) was envisaged using a semi-empirical model based on a linear equation of co-polarized and cross-polarized radar cross sections obtained from Sentinel-1 images. A thorough data collection campaign was undertaken during 2017 and 2018, synchronized with the satellite overpasses.
Localized and generalized models were developed using each Sentinel-1 image independently and all images together, respectively. Results indicate that the accuracy of soil moisture estimates increased when using both co-polar and cross-polar images instead of a single polarization. The localized models revealed that the RMSE of soil moisture estimates decreased during dry periods with little or no rainfall, indicating that better estimates can be obtained for drier soils. For the generalized models, soil moisture estimates with lower RMSE were obtained when merging all data acquired in 2017 and 2018 and combining co-polar and cross-polar images, with an R² of 0.7 and an RMSE of 0.02. The large amount of in situ data collected over a large area demonstrated that a generalized linear model based on the joint use of co-polar and cross-polar C-band SAR images, acquired over a long time period with a short revisit time of twelve days, could capture the spatial variability of soil moisture. This is an important result, as Sentinel-1 data and simple semi-empirical models can provide farmers with timely and accurate estimates of soil moisture and maps of its spatial variability. This information, when provided in the weeks and months preceding the cropping season, could be crucial for determining planting dates and assessing early-season plant growth, thereby playing a key role in productivity.