Soil Moisture Monitoring Using Remote Sensing Data and a Stepwise-Cluster Prediction Model: The Case of Upper Blue Nile Basin, Ethiopia

: In this study, a residual soil moisture prediction model was developed using the stepwise cluster analysis (SCA) and model prediction approach in the Upper Blue Nile basin. The SCA has the advantage of capturing the nonlinear relationships between remote sensing variables and volumetric soil moisture. The principle of SCA is to generate a set of prediction cluster trees based on a series of cutting and merging process according to a given statistical criterion. The proposed model incorporates the combinations of dual-polarized Sentinel-1 SAR data, normalized difference vegetation index (NDVI), and digital elevation model as input parameters. In this regard, two separate stepwise cluster models were developed using volumetric soil moisture obtained from automatic weather stations (AWS) and Noah model simulation as response variables. The performance of the SCA models have been veriﬁed for different signiﬁcance levels (i.e., α = 0.01, α = 0.05, and α = 0.1). Thus, the AWS based SCA model with α = 0.05 was found to be an optimal model for predicting volumetric residual soil moisture, with correlation coefﬁcient (r) values of 0. 95 and 0.87 and root mean square error (RMSE) of 0.032 and 0.097 m 3 /m 3 during the training and testing periods, respectively. While in the case of the Noah SCA model an optimal prediction performance was observed when α value was set to 0.01, with r being 0.93 and 0.87 and RMSE of 0.043 and 0.058 m 3 /m 3 using the training and testing datasets, respectively. In addition, our result indicated that the combined use of Sentinel-SAR data and ancillary remote sensing products such as NDVI could allow for better soil moisture prediction. Compared to the support vector regression (SVR) method, SCA shows better ﬁtting and prediction accuracy of soil moisture. Generally, this study asserts that the SCA can be used as an alternative method for remote sensing based soil moisture predictions.


Introduction
Soil moisture is a critical component of agricultural development because its availability and distribution substantially determine the growth and productivity of crops. Soil moisture is one of the limiting factors in countries such as Ethiopia where the country is predominantly affected by recurrent drought and dependent on rain-fed farming practices [1]. Ethiopia's crop production and productivity are low and dominated by smallholder farmers [2]. Most of these farmers are unable to sustain their livelihoods by a single harvest during the main rainy season [3,4]. More specifically, the Upper Blue using remote sensing data. Satalino et al. [30] used the ANN approach to retrieve soil moisture from ERS data with an overall RMSE of 6%. While Santi et al. [31] reported better soil moisture retrieval accuracy with an RMSE close to 0.023 m 3 /m 3 , using ENVISAT/SAR data and ANN technique. Baghdadi et al. [32] predicted soil moisture values derived from C-band SAR data using the ANN approach, with an RMSE approximate to 0.065 m 3 /m 3 and 0.098 m 3 /m 3 with and without considering a priori information related to the soil parameters over bare agricultural areas, respectively. Subsequently, Lakhankar et al. [33] have compared the performances of ANN with other statistical methods such as fuzzy logic and multivariate regression techniques. The ANN approach has shown a comparable performance (RMSE = 3.39%) as compared with fuzzy logic (RMSE = 3.45%) but was better than the multivariate statistics method (RMSE = 4.48%). Palosica et al. [34] have made a comparative analysis between the performances of ANN and the Single Chanel Algorithm developed by the US Department of Agriculture using AMSR-E data. The findings demonstrated that both algorithms can meet or exceed the AMSR-E mission soil moisture accuracy requirements (i.e., RMSE ≤ 0.06 m 3 /m 3 ).
In the last few years, there have also been other studies in the field of geo-/bio-physical parameter retrieval based on recent machine learning techniques, such as support vector regression (SVR) [35]. In this connection, different studies (e.g., [36][37][38]) have investigated the potential of a SVR model for soil moisture inversion using remote sensing data. Thus, an improved performance of the SVR algorithm (with RMSE = 1.98%), when compared to ANN (RMSE = 2.79%) and the conventional multiple linear regression approaches (RMSE = 2.84%) was achieved by Ahmad et al. [36].
In a different approach, stepwise-cluster analysis (SCA) is an alternative statistical method intended for modeling the nonlinear relationships between independent and dependent variables [39]. The SCA has been extensively used to handle multivariate modeling problems in environmental prediction and hydrological monitoring activities [39]. It can also effectively work either with continuous or discrete variables [40]. The modeling outputs of SCA are provided by a series of cluster trees, which gives a set of prediction systems (tip clusters), to reproduce the relations between multiple independent and dependent variables [41]. The SCA technique was first introduced by Liu and Wang [42] to solve multivariate modeling problems in medical research. Later, Huang [40] improved the SCA approaches and used for modeling the correlation between major air pollutants and multiple source factors in an urban environment. Eventually, the SCA has gained much attention and a large number of application studies based on the SCA method have been reported. For example, [39,43,44] developed a forecasting system using SCA for mapping the link between contaminating concentration and operating conditions in groundwater bioremediation processes. More recently, many works have successfully applied SCA for climate projection [45], stream flow prediction [46], hydrological processes modeling [47,48], and air quality management in an urban environment [49]. All these efforts attested the effectiveness of SCA for environmental and hydrological prediction systems. It is thus likely that the SCA approach could be applied for soil moisture inversion from remote sensing data and it might be used as an alternate technique. However, no attempts have been made to apply SCA statistical methods in this area so far.
Therefore, as a supplement of the previous efforts, the objective of this study is to develop and test a stepwise-cluster soil moisture inference model based on the statistical relationship between volumetric soil moisture and remote sensing data (obtained from SAR and optical sensing systems). Explicitly, we first (i) investigated the effect of surface parameters such as vegetation cover, topography, and soil properties on the relationship between SAR backscattering signals and soil moisture in our area of interest; (ii) then the synergy of dual-polarized SAR data, normalized difference vegetation index (NDVI), and digital elevation model (DEM) have been used to establish the SCA based soil moisture prediction model; (iii) followed by validation of the proposed prediction system, and (iv) compared with other statistical method.

Site Description
The Upper Blue Nile (UBN) basin is located in the northwestern part of Ethiopia ( Figure 1). The UBN basin is a main source of the Nile River water resource, and it contributes about 60% of the annual flow of the Nile [50,51]. The basin has an approximate drainage area of 176,000 km 2 [52]. It is characterized by a complex topography with elevation ranging from 4239 m a.s.l. at the northeastern part of the basin to 490 m a.s.l. at the western part of the basin near the Ethiopian-Sudan border ( Figure 1). The climate of the UBN basin ranges from humid to semi-arid. The main rainfall season (known as "Kiremt") occurs from June to September. The dry season runs from October to January followed by a short rainy season (called "Belg") from February to May. According to Kim et al. [53], about 70% of the annual precipitation in the study area (UBN basin) is observed during the Kiremt season. The UBN basin receives up to 2200 mm of annual rainfall. The annual mean rainfall varies between 1200 and 1800mm [52] with an increasing trend from northeast to southwest [53]. However, the basin is characterized by large temporal fluctuations in rainfall [52,54] both on intra-annual and inter-annual scale. As a result, the hydrological processes in the basin are quite complex and highly variable in space and time. Although quite a diversity of land use systems is common, the livelihoods of the majority of the populations in the basin are highly dependent on rain-fed agriculture. The Upper Blue Nile (UBN) basin is located in the northwestern part of Ethiopia ( Figure 1). The UBN basin is a main source of the Nile River water resource, and it contributes about 60% of the annual flow of the Nile [50,51]. The basin has an approximate drainage area of 176,000 km 2 [52]. It is characterized by a complex topography with elevation ranging from 4239 m a.s.l. at the northeastern part of the basin to 490 m a.s.l. at the western part of the basin near the Ethiopian-Sudan border ( Figure 1). The climate of the UBN basin ranges from humid to semi-arid. The main rainfall season (known as "Kiremt") occurs from June to September. The dry season runs from October to January followed by a short rainy season (called "Belg") from February to May. According to Kim et al. [53], about 70% of the annual precipitation in the study area (UBN basin) is observed during the Kiremt season. The UBN basin receives up to 2200 mm of annual rainfall. The annual mean rainfall varies between 1200 and 1800mm [52] with an increasing trend from northeast to southwest [53]. However, the basin is characterized by large temporal fluctuations in rainfall [52,54] both on intra-annual and inter-annual scale. As a result, the hydrological processes in the basin are quite complex and highly variable in space and time. Although quite a diversity of land use systems is common, the livelihoods of the majority of the populations in the basin are highly dependent on rain-fed agriculture.

Data
Remote sensing input data were acquired from Sentinel-1 SAR, Moderate Resolution Imaging Spectroradiometer (MODIS), and the Shuttle Radar Topographic Mission (SRTM). Volumetric soil moisture data were also collected from ground-based automatic weather stations (AWS) and land surface parameters simulated from the Noah 3.

Data
Remote sensing input data were acquired from Sentinel-1 SAR, Moderate Resolution Imaging Spectroradiometer (MODIS), and the Shuttle Radar Topographic Mission (SRTM). Volumetric soil moisture data were also collected from ground-based automatic weather stations (AWS) and land surface parameters simulated from the Noah 3.3 model in the Famine Early Warning Systems Network (FEWS NET) Land Data Assimilation System (FLDAS). Data were collected/acquired for the periods of 2016 and 2017 (for the months of September, October, November, December, and January). Those months, except September, are the dry periods in the study area when farmers can potentially practice additional cropping using residual soil moisture. The month of September indeed belongs to the wet period of the study area; however, the main season cropping reaches to the stage of physiological maturity and crops have limited moisture intake during this month. So, dry season farming may start as of September for efficient utilization of the residual soil moisture. Descriptions of each data are provided below.

Remote Sensing Data
This study used SAR image data from the Global Monitoring for Environment and Security (GMES) Sentinel-1 mission. It operates in C-Band SAR instrument with the frequency of 5.405 GHz. Sentinel-1 has four different operating modes; however, over land, it uses the main operational Interferometric Wide-Swath (IWS) mode and measured at dual polarization (i.e., vertical transmit and vertical receive-VV and vertical transmit and horizontal receive-VH) with a 250 km swath and an average temporal resolution of 12 days in the study area. Free data can be accessed via the European Space Agency (ESA) website (https://scihub.copernicus.eu/dhus/#/home) once it is acquired. In this study, 66 level-1 (for 32 acquisition dates) product of IWS mode generated as Ground Range, Multi-Look, and Detected (GRD) products were acquired for the periods of 2016 and 2017 (Appendix A Table A1, lists of Sentinel-1 SAR data used). The GRD product of high-resolution class has a spatial resolution of 20 × 5 m and a pixel spacing of 10 m. This study used data from descending orbit, which provides dual -polarized SAR data acquired both at VV and VH polarizations simultaneously. The essential characteristics of Sentinel-1 IW swath mode data are given by [55].
The preprocessing of SAR data consists of several steps including radiometric correction, speckle filtering, and geometric correction. These processes have been done using the Sentinel application platforms (SNAP) provided by ESA. The calibrations of raw SAR data have been made using the radiometric toolbox in SNAP. Radiometric calibration is required to convert SAR pixel values to exact backscattering coefficient of the scene. A 7 × 7 Enhanced Lee filtering window was applied to the SAR data to reduce the speckles that may degrade the quality of the SAR image. The geometry of the SAR data has been corrected using Range Doppler Terrain correction tool in SNAP.
Ground vegetation coverage has an effect on the backscattering characteristics of SAR data. In this aspect, the normalized difference vegetation index (NDVI) were used to assess ground vegetated land cover. For this study, MODIS NDVI data product (MOD13A2) was downloaded from the USGS earth explorer website for the period of 2016 and 2017. We have used NDVI data of MOD13A2 prepared with 16-day composite and a spatial resolution of 1 km. Daily values of MODIS NDVI were obtained by interpolating the 16-day composite using temporally corrected time-series information of composite. The digital elevation model (DEM) with a spatial resolution of 30m provided by the Shuttle Radar Topographic Mission (SRTM) was used for geometric correction of SAR data during the preprocessing phase. However, the topographic variation still determines the spatial distributions of soil moisture in the field [56] and the acquired SRTM Global elevation data was also considered in the SAR based soil moisture prediction model.

Soil Moisture
The prediction models in this study were calibrated and validated using known volumetric soil moisture data obtained from automatic weather stations (AWS) and FLDAS Noah model simulated for East Africa.

Ground Observed Soil Moisture Data
Ground observed soil moisture dataset is valuable for model calibration and validation when we are dealing with soil moisture estimation using remote sensing data. However, many African countries, including Ethiopia, are characterized by the scarcity or unavailability of ground based soil moisture observations [57]. Recently, the National Meteorological Agency (NMA) of Ethiopia has Remote Sens. 2019, 11, 125 6 of 24 installed about 16 automatic weather stations (AWS), which can measure soil moisture data at different depths of soil in addition to other climatic information.
For this study, only six stations are found in and around the UBN basin, corresponding to the acquisition of Sentinel-1, were used for the period of 2016 and 2017. The six stations are "Dangila", "Kachis", "Motta", "Nedjo", "Simada", and "Weliso" (Figure 1). At each site, soil moisture measurements are taken at 20, 50, and 100 cm depth with a 15 minutes time interval. To overcome the absence of calibration standards, the spatiotemporal distributions of AWS (0 to 20 cm) data sets have been compared to FLDAS Noah soil moisture product (0 to 10 cm depth) and Climate Hazards Group Infrared Precipitations with Stations (CHIRPS) satellite rainfall product. The CHIRPS has shown good agreement with ground observed rainfall over our area of interest [58]. To define the spatial patterns in the relationship, a point to pixel-wise correlation between the daily mean of all the six AWS observed volumetric soil moistures with that of FLDAS Noah and CHIRPS precipitation has been made ( Figure 2). Within this domain, AWS observations have shown consistent spatiotemporal distribution with the simulated FLDAS Noah and CHIRPS precipitation events (Figure 2). A strong correlation of AWS measured volumetric soil moisture with simulated FLDAS Noah (r = 0.74) and CHIRPS precipitation (r = 0.53) can be noted. installed about 16 automatic weather stations (AWS), which can measure soil moisture data at different depths of soil in addition to other climatic information.
For this study, only six stations are found in and around the UBN basin, corresponding to the acquisition of Sentinel-1, were used for the period of 2016 and 2017. The six stations are "Dangila", "Kachis", "Motta", "Nedjo", "Simada", and "Weliso" (Figure 1). At each site, soil moisture measurements are taken at 20, 50, and 100 cm depth with a 15 minutes time interval. To overcome the absence of calibration standards, the spatiotemporal distributions of AWS (0 to 20 cm) data sets have been compared to FLDAS Noah soil moisture product (0 to 10 cm depth) and Climate Hazards Group Infrared Precipitations with Stations (CHIRPS) satellite rainfall product. The CHIRPS has shown good agreement with ground observed rainfall over our area of interest [58]. To define the spatial patterns in the relationship, a point to pixel-wise correlation between the daily mean of all the six AWS observed volumetric soil moistures with that of FLDAS Noah and CHIRPS precipitation has been made ( Figure 2). Within this domain, AWS observations have shown consistent spatiotemporal distribution with the simulated FLDAS Noah and CHIRPS precipitation events ( Figure 2). A strong correlation of AWS measured volumetric soil moisture with simulated FLDAS Noah (r=0.74) and CHIRPS precipitation (r=0.53) can be noted. It is also noticed that the inconsistency of the soil depth measured by AWS (0 to 20 cm) and the sensitivity of C-band SAR data are usually more responsive to the top few centimeters of soil [59]. However, [60] have observed that the sensitivity of the C-band SAR data to soil moisture variation could extend up to 20 cm depth. For example, Humphrey [61] has found a significant correlation between backscatter variables extracted from the C-band RADARSAT imagery and soil moisture measured at both 5 cm and 20 cm depths, with r= 0.83 and 0.79, respectively. Similarly, [62] reported the sensitivity of RADARSAT-2 SAR data to the amount of soil moisture measured at 20 cm depth and resulted in a correlation value of 0.85. A correlation value up to 0.84 between backscatter values from ERS SAR data and in-situ soil moisture also observed at a depth of 20 cm [63]. Therefore, these may suggest the sensitivity of C-band SAR data up to 20 cm depth measurements of soil moisture and could be used to calibrate a soil moisture prediction model using Sentinel-1 SAR data. However, still readers should note that the vertical heterogeneity of soil moisture at 20 cm depth of soil could It is also noticed that the inconsistency of the soil depth measured by AWS (0 to 20 cm) and the sensitivity of C-band SAR data are usually more responsive to the top few centimeters of soil [59]. However, [60] have observed that the sensitivity of the C-band SAR data to soil moisture variation could extend up to 20 cm depth. For example, Humphrey [61] has found a significant correlation between backscatter variables extracted from the C-band RADARSAT imagery and soil moisture measured at both 5 cm and 20 cm depths, with r = 0.83 and 0.79, respectively. Similarly, [62] reported the sensitivity of RADARSAT-2 SAR data to the amount of soil moisture measured at 20 cm depth and resulted in a correlation value of 0.85. A correlation value up to 0.84 between backscatter values from ERS SAR data and in-situ soil moisture also observed at a depth of 20 cm [63]. Therefore, these may suggest the sensitivity of C-band SAR data up to 20 cm depth measurements of soil moisture and could be used to calibrate a soil moisture prediction model using Sentinel-1 SAR data. However, still readers should note that the vertical heterogeneity of soil moisture at 20 cm depth of soil could affect the relationship between SAR backscattering and volumetric soil moisture and might affect model prediction performance. It is also clear that a better correlation and model inversion performance would be observed with soil moisture measured at a top few centimeters of soil.

FLDAS Noah Model
To expand the spatial and temporal evaluation of the proposed method (SCA), a model-based soil moisture product has been used as an additional resource. Thus, FLDAS Noah model (simulated for East Africa) was used in this study, in addition to ground AWS, to calibrate and validate the SCA method [64]. The FLDAS Noah model is simulated from the widely used Noah land surface model and provides volumetric soil moisture data at different soil layers and spatiotemporal resolutions [64]. In this study, the top 10 cm soil layer moisture content with a spatial resolution of 0.1 × 0.1 degree was obtained from the NASA Goddard Earth Science Data and Information Services Center (GES DISC) web site. The FLDAS Noah measurements that correspond to Sentinel-1 SAR temporal coverage are available for 2016 and 2017 at a daily time step.

Methods
Volumetric soil moisture from both AWS observations and FLDAS Noah model were used as a dependent variable, while backscatter values of both VV and VH polarizations from Sentinel-1 SAR, vegetation information based on NDVI analysis, and elevation information derived from DEM data were considered as independent variables to calibrate and validate the model. The Sentinel-1 SAR data and DEM were provided with a spatial resolution of 10 and 30 m, respectively. Therefore, for the first model based on AWS observation, average values of backscatter measurements and elevation within a 1 × 1 km ground area were used to keep the spatial resolution consistent with MODIS NDVI. In this case, it was assumed that point measurement at the ground station is representing the average soil moisture in the area corresponding to remote sensing data. The main limitation is still the necessity for having sufficient number of distributed ground observation at each satellite footprints/grid cells in order to assure that the assumption of point measurement is corresponding to satellite observations. Similar methods of data preparation were applied to soil moisture data obtained from the FLDAS Noah model. Dual-polarized Sentinel-1 SAR, NDVI and DEM data measurements were resampled to the ground resolution of~10 × 10 km in order to match with the spatial resolution of FLDAS Noah soil moisture.
Studies have reported that soil moisture radar backscattering is affected by a number of time and space varying parameters such as vegetation, soil property, and topography [14,65,66]. Thus, our study was started by investigating the effect of each remote sensing input variable for soil moisture estimation in our area of interest. So, the linear regression analyses between remote sensing data and volumetric soil moisture was conducted ( Table 1). The simple linear regression was first done between backscattering values from VV polarization and volumetric soil moisture and then the multiple regressions were continued using VV and VH radar backscattering values. Afterward, in a step fashion, the other independent variables (i.e., vegetation and elevations) were sequentially added to the regression model. The models' coefficient of correlation (r) values were used to evaluate the soil moisture prediction performance of each model. All the regressions in Table 1 have a significant correlation at p < 0.01 with volumetric soil moisture (for both AWS observed and FLDAS Noah simulated model). The combination of VV and VH polarization has shown a slight improvement in the correlation values but VH backscatter seems to be important for the overall performance of the model. The inclusion of vegetation (NDVI) into the regression model has considerably improved the correlation values to 0.63 and 0.57 for AWS and model simulated soil moisture, respectively. The effect of elevation cannot be ignored in our case and its inclusion in the modeling process has further improved the prediction performance of the model with r 0.76 (0.65) for AWS (model simulated) soil moisture. Thus, to get complementary information from all these remote sensing variables, it was decided to feed a combination of σ VV , σ V H , NDVI, and E as input parameters to the proposed stepwise-cluster analysis model. According to [67], the combined use of remote sensing data obtained from different sensing systems (e.g., microwave and optical sensing system), can grant complementary information regarding the extent of soil moisture content in a given land use class.

Model Development
Considering the complexity and non-linearity of retrieval problems, a statistical relationship between volumetric soil moisture and remote sensing variables (i.e., σ VV , σ V H , NDVI, and DEM) was established using a SCA model. The principle of SCA is to divide samples (containing a number of independent and dependent variables) into a set of clusters with significant differences based on a series of cutting (i.e., splitting one set into two) and merging (i.e., joining two sets together) process according to a given statistical criterion [40][41][42]. Similar to other nonparametric tree regression statistical methods such as the Random Forest (RF), SCA can effectively capture the inherent nonlinear relationship between predictors and predictands [68], apply a defined set of criteria to split and merge datasets into different nodes, and use the regression tree method for predicting. However, unlike the SCA method, the RF uses a bootstrapping method for training/testing the model [69]. While splitting the tree's node, RF searches for the best features, among a random subset of features instead of searching for the most important features from the dataset. In addition, RF uses multiple decision trees for prediction, afterward these decision trees are merged together to get more stable estimation [70].
Basically, the SCA approach follows four major steps: (i) set criteria for cutting and merging clusters-, based on Wilks statistic [71], (ii) cutting/merging clusters operation-based on the criteria, (iii) produce single SCA cluster tree-, that contains a set of prediction nodes, and (iv) prediction. Generally, the SCA clustering process begins with a cutting action by which the original training sample dataset will be split into two groups. Then, the merging and cutting loops will be continued up until none of the sub-clusters can be further divided or merged with other sub-clusters. Finally, a cluster tree which contains a set of prediction nodes (tip cluster) will be generated from the training sample datasets and used to predict the dependent variable for any new values of the independent datasets. The flow of SCA model development is given in Figure 3.

Training
In order to train the SCA model, first, the original datasets were divided into training and testing datasets randomly. The training sample datasets contains a set of independent (σ VV , σ V H , NDVI, and DEM) and dependent variables (volumetric soil moisture). Assuming that there are n α samples, with m independent variables (X) and one dependent variable (Y). Thus, the training set can be given as one cluster (C), shown as the following equation (Equation (1)): Thus, a cluster tree can be derived through cutting and merging operation of the training set following the cut-merge loop provided in Figure 3.
Remote Sens. 2018, 10, x FOR PEER REVIEW 9 of 24 Thus, a cluster tree can be derived through cutting and merging operation of the training set following the cut-merge loop provided in Figure 3. Let cluster , which contains samples, be cut into two sub-clusters and , which contain and samples, respectively ( = + ). According to Wilks' likelihood-ratio principle, the cutting point is optimal only if the value of Wilks statistic Λ is minimum [40,71]. The smaller the Λ value refers the larger the difference between the sample means of and sub-clusters. When the Λ value is very large, sub-cluster and cannot be cut and must be merged instead. According to Rao's F-approximation [61], the Wilks Λ statistic under the above two sub-clusters ( and ) can be correlated to F-variant as follows (Equation (2)): Where = number of predictors. Since the Λ is related to the F statistics, the sample means of the sub-cluster and can be evaluated for their significant differences using an F-test [39]. Therefore, cutting (or merging) of clusters will be decided based on the F tests [72]. The null hypothesis would be : = versus the alternative hypothesis : ≠ , where and are sample mean of and . Let the significance level be . In this study, the sensitivity of modeling result has been tested for different significance levels (i.e., = 0.01, = 0.05, and = 0.1). An operation of cutting would be applied if: ≥ and is false, which implies that differences of means between two sub-clusters are significant; whereas, < and is true would be the merging action that indicates these two sub-clusters have no significant variations.
All the sub-cluster produced from the original training sample dataset will go through a number of iterative runs of cutting and merging processes and the training procedures are completed when all tests are undertaken and all hypotheses of further cut (or merge) are rejected. Then, a cluster tree can be obtained. Afterward, Y can be predicted for any new input data of X using the derived cluster tree. A cluster tree usually contains a tip cluster and a series of cutting and merging rules. Tip clusters are those clusters that contain the prediction systems, which can no Let cluster C, which contains n C samples, be cut into two sub-clusters β and γ, which contain n β and n γ samples, respectively (n C = n β + n γ ). According to Wilks' likelihood-ratio principle, the cutting point is optimal only if the value of Wilks statistic Λ is minimum [40,71]. The smaller the Λ value refers the larger the difference between the sample means of β and γ sub-clusters. When the Λ value is very large, sub-cluster β and γ cannot be cut and must be merged instead. According to Rao's F-approximation [61], the Wilks Λ statistic under the above two sub-clusters (β and γ) can be correlated to F-variant as follows (Equation (2)): where P = number of predictors. Since the Λ is related to the F statistics, the sample means of the sub-cluster β and γ can be evaluated for their significant differences using an F-test [39]. Therefore, cutting (or merging) of clusters will be decided based on the F tests [72]. The null hypothesis would be H 0 : µ β = µ γ versus the alternative hypothesis H 1 : µ β = µ γ , where µ β and µ γ are sample mean of β and γ. Let the significance level be α. In this study, the sensitivity of modeling result has been tested for different significance levels (i.e., α = 0.01, α = 0.05, and α = 0.1). An operation of cutting would be applied if: F cal ≥ F α and H 0 is false, which implies that differences of means between two sub-clusters are significant; whereas, F cal < F α and H 0 is true would be the merging action that indicates these two sub-clusters have no significant variations. All the sub-cluster produced from the original training sample dataset will go through a number of iterative runs of cutting and merging processes and the training procedures are completed when all tests are undertaken and all hypotheses of further cut (or merge) are rejected. Then, a cluster tree can be obtained. Afterward, Y can be predicted for any new input data of X using the derived cluster tree. A cluster tree usually contains a tip cluster and a series of cutting and merging rules. Tip clusters are those clusters that contain the prediction systems, which can no longer be split or merged with others. Usually, the mean value of the tip cluster is used to estimate the predicted results [41].

Prediction
Following the completion of model training, a cluster tree can be derived for a new sample prediction. The prediction is indeed a searching procedure by itself, starting from the top of the tree and ending at a tip cluster, following the route lead by the cutting and merging rules [45]. When a new sample (x 1 , x 2 , . . . , x m , y p : y p is unkown) enters the tree at a cutting point, step-by-step the sample set will eventually drop into one of the tip sub-cluster which cannot be either cut or merged further. The right tip sub-cluster is determined by the routes (or values) of new independent variables (x 1 , x 2 , . . . , x m ). The predicted value of y p will be the mean of dependent variables of the training samples in that tip cluster. Let cluster e be the tip cluster where the new sample {x m } enters. The predictand y p is (Equation (3)): where y e p = mean of dependent variable (e.g., volumetric soil moisture) in sub-cluster e (Equation (4)) and R e p = radius of y p in sub-cluster e (Equation (5)).
R e p = max y e p,k − min y e p,k /2, The correlation coefficient (r) and the root mean square error (RMSE) were used to evaluate the performance of the SCA model during the training and testing periods. The software packages (called rSCA) included in 'R' statistical packages were used in this study [73]. To examine the performance of the developed model, SCA was compared with a nonlinear support vector regression (SVR) method.

Support Vector Regression (SVR)
The support vector regression (SVR) technique is based on the structured risk minimization principle. This method maps the input data into a high dimensional feature space using non-linear mapping and then a linear regression problem is obtained in the feature space. A set of training data (x i , y i ) is considered where x i is the input vector (e.g., the SAR backscattering coefficients) and y i is the corresponding output vector (e.g., the volumetric soil moisture); i = 1, 2 . . . , L and L is the total number of data pairs, y ∈ R, x ∈ R D . The aim of the SVR model is to find a function f (x) that has at most ε-deviation from the actually obtained targets for all the training data (Equation (6)). The function is given as [74]: where w, x denotes the dot product of a weighted vector w and input vector x, and b is the bias. The first prediction is attained according to ε-insensitive losses function, where ε quantifies the tolerance to errors. A penalty function is applied to the output variables if the predicted value is greater than a distance ε from the actual values and, the penalty can be represented by one of two slack variables ξ i and ξ * i (where ξ i ≥ 0, ξ * i ≥ 0∀i) (Equation (7)). The cost function to minimize can then be written as: Satisfying the following constraints (Equation (8)) where C is a regularization parameter determining the tradeoff between the training errors and the complexity of the function f (x). The slack variables decide the degree to which sample data points are penalized if the error is greater than ε. Therefore, for any (absolute) error small than ε, ξ i = ξ * i = 0. The constrained optimization problem in Equation (7) can be solved using dual formulation. In dual formulation, Lagrange multipliers α and α * , are used and the minimization problem is solved by differentiating relating to the primary variables (Equation (9)). The final estimation function can then be written as follows: where α and α * are Lagrange multipliers; and k(x, x i ) is the kernel function. A kernel function measures non-linear dependence between the two input variables x and x i . The x i 's are "support vectors" and N (usually N L) is the number of selected data points or support vectors corresponding to values of the independent variable that are at least ε away from actual observations.
Several nonlinear kernel functions such as Radial Basis Function (RBF), linear, polynomial, and sigmoid have been proposed [75]. The RBF kernel (Equation (10)) performs better in comparison to other kernel functions [76]. The nonlinear radial basis function is defined as: where σ is known as the kernel parameter (radial width). Thus, a nonlinear SVR model using RBF kernel was developed to estimate residual soil moisture. The SVR model was tuned and the optimum values for insensitive loss function (ε), regularization parameter (C), and kernel parameter (σ) were used. In this study, an internal 10-fold cross-validation during the development of SVR model was used for the optimal combinations of the three parameters.

Stepwise Cluster Analysis
In this study, two different SCA cluster trees were generated to show the relationship between remote sensing variables and volumetric residual soil moisture (obtained from both AWS observations and model simulations). The prediction performance and the structure of the SCA tree could be affected by the internal parameters such as the cutting (or merging) action of clusters governed by the significance level (α) used in the analysis. According to Sun et al. [41] and Wang et al. [45], SCA analysis at different significance level could lead to different cluster trees with a different number of cluster nodes and predictions. Thus, it is vital to iteratively run SCA model adjusting for different significance levels until the prediction model showed the finest performance in reproducing observed values. The SCA cluster model, in this study, has been verified for different significance levels (i.e., α = 0.01, α = 0.05, and α = 0.1). Table 2 provides the statistical performance of the SCA cluster tree at different significance levels during training and testing phases. Also, Figure 4 provides the scattering properties of predicted values obtained from different significance levels during the testing periods.
Results in Table 2 indicated that the number of cluster nodes and cutting operations is increased with an increase of α from 0.01 to 0.1. The SCA cutting (or merging) action, as well as the rejection of the iteration process, relied on this parameter. Model with α = 0.1 produced a more complex cluster tree among the three significance levels and produced a large number of cluster nodes, due to more cutting actions than the others ( Table 2). The reason for this is that the higher α value results in the decreased strictness in the cutting operation [41,45]. Thus, the higher α value will lead to more cutting actions and more cluster nodes. According to the statistical results in Table 2, different values of significance levels lead to distinct prediction results. However, a good agreement was found between remote sensing based estimates and observed/simulated volumetric soil moisture for all significance levels of the SCA model. agreement was found between remote sensing based estimates and observed/simulated volumetric soil moisture for all significance levels of the SCA model.  For AWS based SCA model (Table 2), there is a slight improvement in predicting volumetric soil moisture while value increases from 0.01 to 0.05, with r being 0.95 (0.87) and RMSE 0.032 (0.097) m 3 /m 3 during training (testing) phase. Although further increasing value to 0.1 leads to the higher number of cluster nodes and cutting actions, it resulted to decrease the statistical performance of the prediction model, except a slight improvement in RMSE (0.088 m 3 /m 3 ) during the testing phase. Thus, the AWS based SCA model with = 0.05 is an optimal model for predicting volumetric residual soil moisture in our area of interest.
In the case of FLDAS Noah SCA model, an optimal prediction performance was observed when the value was set to 0.01 with r being 0.93 (0.87) and RMSE 0.043 (0.058) m 3 /m 3 during the training (testing) phase. In fact, the performance of the prediction model has improved while the value increases from 0.01 to 0.1 for the training datasets but not confirmed during the testing period. Thus, the validation result clearly demonstrated that the SCA model is a reliable technique for soil moisture prediction using remote sensing data. For AWS based SCA model (Table 2), there is a slight improvement in predicting volumetric soil moisture while α value increases from 0.01 to 0.05, with r being 0.95 (0.87) and RMSE 0.032 (0.097) m 3 /m 3 during training (testing) phase. Although further increasing α value to 0.1 leads to the higher number of cluster nodes and cutting actions, it resulted to decrease the statistical performance of the prediction model, except a slight improvement in RMSE (0.088 m 3 /m 3 ) during the testing phase. Thus, the AWS based SCA model with α = 0.05 is an optimal model for predicting volumetric residual soil moisture in our area of interest.
In the case of FLDAS Noah SCA model, an optimal prediction performance was observed when the α value was set to 0.01 with r being 0.93 (0.87) and RMSE 0.043 (0.058) m 3 /m 3 during the training (testing) phase. In fact, the performance of the prediction model has improved while the α value increases from 0.01 to 0.1 for the training datasets but not confirmed during the testing period. Thus, the validation result clearly demonstrated that the SCA model is a reliable technique for soil moisture prediction using remote sensing data.
Also, Figure 4 shows that at a lower significance level (i.e., α = 0.01, in Figure 4a,d) the prediction models have produced several redundant values, but still with strong correlation coefficients, for a single value of AWS observed and FLDAS Noah volumetric soil moisture. Thus, plots in Figure 4a,d have presented intense horizontal lines. However, as the significance level increased (Figure 4b,c,e,f), the prediction models have produced a relatively wide range of values. The two optimal SCA prediction model derived from the combinations of SAR and optical remote sensing data (Figure 4b,d) have shown a good performance in predicting maximum and minimum soil moisture values observed/simulated by automatic weather stations and FLDAS Noah model. The SCA has managed to produce volumetric soil moisture with an overall bias value of 1.21 and 0.99, where a value of 1 is a perfect score, in comparison to AWS observed and FLDAS Noah simulated soil moisture, respectively. Thus, the overall agreement between observed/simulated and predicted soil moisture indicates that coupling of satellite data (e.g., Sentinel-1 SAR and NDVI) and the nonlinear SCA approach is capable of detecting surface soil moisture and its spatiotemporal dynamics. Figures 5 and 6 gives the two optimal SCA cluster trees for the case of AWS and model simulated soil moisture. The cluster tree clearly shows the role of every independent remote sensing parameter in describing the relationship. Both Figures 5 and 6 demonstrated that X 3 (vegetation) is the most important variable that determines the accuracy of residual soil moisture prediction of the model. The other independent variables (X 1 = backscattering from VH polarization, X 2 = backscattering from VV polarization, and X 4 = elevation information) also have a profound effect on the predicted volumetric soil moisture. Based on these trees, the residual soil moisture values for new observations of the remote sensing variables can be predicted. Also, Figure 4 shows that at a lower significance level (i.e., = 0.01, in Figure 4a,d) the prediction models have produced several redundant values, but still with strong correlation coefficients, for a single value of AWS observed and FLDAS Noah volumetric soil moisture. Thus, plots in Figure 4a,d have presented intense horizontal lines. However, as the significance level increased (Figure 4b,c,e,f), the prediction models have produced a relatively wide range of values. The two optimal SCA prediction model derived from the combinations of SAR and optical remote sensing data (Figure 4b,d) have shown a good performance in predicting maximum and minimum soil moisture values observed/simulated by automatic weather stations and FLDAS Noah model. The SCA has managed to produce volumetric soil moisture with an overall bias value of 1.21 and 0.99, where a value of 1 is a perfect score, in comparison to AWS observed and FLDAS Noah simulated soil moisture, respectively. Thus, the overall agreement between observed/simulated and predicted soil moisture indicates that coupling of satellite data (e.g., Sentinel-1 SAR and NDVI) and the nonlinear SCA approach is capable of detecting surface soil moisture and its spatiotemporal dynamics. Figure 5 and 6 gives the two optimal SCA cluster trees for the case of AWS and model simulated soil moisture. The cluster tree clearly shows the role of every independent remote sensing parameter in describing the relationship. Both Figure 5 and 6 demonstrated that (vegetation) is the most important variable that determines the accuracy of residual soil moisture prediction of the model. The other independent variables ( = backscattering from VH polarization, = backscattering from VV polarization, and = elevation information) also have a profound effect on the predicted volumetric soil moisture. Based on these trees, the residual soil moisture values for new observations of the remote sensing variables can be predicted. Figure 5. The optimal SCA tree with significance level α = 0.05 for AWS observed soil moisture. The boxes are called as nodes (total nodes = 39). The nodes with green and yellow colors are tip clusters (14) which basically contains the prediction systems.
For example, let X 1 = −20.4, X 2 = −4.5, X 3 = 0.41 and X 4 = 2100 as a new observations for Figure 5 (AWS model) cluster tree. To predict the residual soil moisture: the new values: X 3 ≤ 0.437 for the first cluster so that the sample input enters to cluster 2; X 4 ≤ 2017, so that it enters to cluster 6; X 3 > 0.326, so that it enters to cluster 25; X 2 > −9.66 so that it finally enters to tip cluster 27 with a soil moisture prediction value of 0.409 m 3 /m 3 . On the same cluster tree (Figure 5), let us take another input sample, X 1 = −20.4, X 2 = −22.5, X 3 = 0.35, and X 4 = 2500. Then to predict the volumetric residual soil moisture for these new input variables, for the first branch X 3 ≤ 0.47, so that it enters to cluster 2; X 4 > 2417 so that it enters to cluster 7; X 3 ≤ 0.377, so that it enters to cluster 8; X 3 ≤ 0.376, so that it enters to cluster 10; X 3 > 0.306, so that it enters to cluster 13; X 1 > −22.37, so that it enters to cluster 15; X 3 ≤ 0.376, so that it enters to intermediate cluster 16  ≤0.437 for the first cluster so that the sample input enters to cluster 2; ≤2017, so that it enters to cluster 6; > 0.326, so that it enters to cluster 25; > −9.66 so that it finally enters to tip cluster 27 with a soil moisture prediction value of 0.409 m 3 /m 3 . On the same cluster tree (Figure 5) (Figure 6).

Comparing SCA with SVR Method
Results of the SCA model were also compared with the SVR, state-of-the-art techniques used for soil moisture prediction using remote sensing data [36,38]. The SVR model, using the same datasets as those used for the two clusters, was developed. Then, quantitative evaluation between model predicted and observed/simulated soil moisture were implemented (Figures 7 and 8). Scatter plots in Figure 7 presents the comparison between the model (SCA and SVR) predicted and AWS observed residual soil moisture both during the training and testing periods. In this case, both SCA and SVR model have shown a comparable performance in predicting residual soil moisture. However, the proposed method (SCA) outperformed the SVR model for predicting residual soil moisture in our area of interest (Figure 7) in terms of Pearson correlation coefficient (r) and the root mean square error (RMSE).

Comparing SCA with SVR method
Results of the SCA model were also compared with the SVR, state-of-the-art techniques used for soil moisture prediction using remote sensing data [36,38]. The SVR model, using the same datasets as those used for the two clusters, was developed. Then, quantitative evaluation between model predicted and observed/simulated soil moisture were implemented (Figures7 and 8). Scatter plots in Figure 7 presents the comparison between the model (SCA and SVR) predicted and AWS observed residual soil moisture both during the training and testing periods. In this case, both SCA and SVR model have shown a comparable performance in predicting residual soil moisture. However, the proposed method (SCA) outperformed the SVR model for predicting residual soil moisture in our area of interest (Figure 7) in terms of Pearson correlation coefficient (r) and the root mean square error (RMSE). Also, Figure 8 gives the comparison made between the model predicted and the FLDAS Noah model simulated soil moisture both during the training and testing phase of the analysis. The SCA soil moisture model has shown as good a performance as the SVR method, with slightly better prediction accuracy during the testing phase. The SCA model achieved r = 0.93 (0.87) and RMSE = 0.043 (0.058) m 3 /m 3 , while SVR method resulted in r = 0.93 (0.86) and RMSE = 0.043 (0.061) m 3 /m 3 during the training (testing) phase. In general, the result implied the better fitting and predictive performance of SCA tree relative to the SVR when dealing with the nonlinear relationship between remote sensing variables and volumetric soil moisture. In addition, unlike SVR, SCA produced a cluster tree that shows the links among variables and one can clearly identify the role of every independent variable in mapping the relationships. Also, Figure 8 gives the comparison made between the model predicted and the FLDAS Noah model simulated soil moisture both during the training and testing phase of the analysis. The SCA soil moisture model has shown as good a performance as the SVR method, with slightly better prediction accuracy during the testing phase. The SCA model achieved r = 0.93 (0.87) and RMSE = 0.043 (0.058) m 3 /m 3 , while SVR method resulted in r = 0.93 (0.86) and RMSE = 0.043 (0.061) m 3 /m 3 during the training (testing) phase. In general, the result implied the better fitting and predictive performance of SCA tree relative to the SVR when dealing with the nonlinear relationship between remote sensing variables and volumetric soil moisture. In addition, unlike SVR, SCA produced a cluster tree that shows the links among variables and one can clearly identify the role of every independent variable in mapping the relationships. Remote Sens. 2018, 10, x FOR PEER REVIEW 16 of 24

Spatial Patterns of Estimated Soil Moisture
Six soil moisture maps, for two selected sites of the study area, were presented from the time series to demonstrate the spatial variability of estimated soil moisture (using AWS based SCA prediction model) at various dates ( Figure 9). The spatial patterns of soil moisture in both sites (site one and site two) follow the meteorological and geomorphological conditions of the selected area. The higher soil moisture values for both sites have been observed in areas relatively with high vegetation cover and elevation values. Most parts of the sites with scattered vegetation and lower elevation have shown a comparatively small amount of estimated soil moisture values.
Thus, the higher soil moisture values for the site are observed in the southeastern and northwest parts, which appear towards the highest elevation and vegetation coverage areas. While the lower estimates are observed in north and south ends of the site that can be characterized by a relatively low elevation and scattered vegetation condition. With the same spatial pattern in site two, the higher soil moisture values are observed in higher elevation and vegetation areas situated in the south and southeastern parts, while their lower values are concentrated in the north and central parts. Indeed, for the selected dates in site one and site two the estimated soil moisture values are reasonably high in most places on 28 September 2016 (Figure 9a) and 29 October 2016 (Figure 9d), respectively, due to rainfall events and good surface moisture conditions on these dates. However, the spatial distributions of soil moisture are gradually reduced in the other dates (e.g., 22 October 2016 and 22 November 2016) following the dry days, with the exception of the river catchment areas, which are described by higher values of soil moisture even during the dry periods.

Spatial Patterns of Estimated Soil Moisture
Six soil moisture maps, for two selected sites of the study area, were presented from the time series to demonstrate the spatial variability of estimated soil moisture (using AWS based SCA prediction model) at various dates ( Figure 9). The spatial patterns of soil moisture in both sites (site one and site two) follow the meteorological and geomorphological conditions of the selected area.  (Figure 9d), respectively, due to rainfall events and good surface moisture conditions on these dates. However, the spatial distributions of soil moisture are gradually reduced in the other dates (e.g., 22 October 2016 and 22 November 2016) following the dry days, with the exception of the river catchment areas, which are described by higher values of soil moisture even during the dry periods.

Discussion
Previous studies (e.g., [16][17][18][19][20][21][22][77][78][79]) demonstrated that surface soil moisture (representing 0-5 cm depths) can be derived from SAR data. However, the radar backscattered signal is not only dependent on the soil moisture content but also sensitive to other time and space varying parameters such as vegetation, topography, and soil properties [14,65,66]. Thus, the linear relationship, between volumetric soil moisture and sentinel-1 SAR backscatter coefficients, made in this study result in lower r (0.34 to 0.36) values. Indeed, the lower r values in our study might not only attributed to the effect these surface parameters but it could be also due to the reduced sensitivity of SAR to soil moisture observed beyond the top few centimeters of soil. Note that simulated and observed soil moisture datasets at 10/20 cm depth of soil were used in this study. However, incorporating additional ancillary variables such as vegetation and elevation conditions of the study area have improved the linear models with r (0.65 to 0.76). Moreover, scholars (e.g., [36,56]) argued that incorporating these and other ancillary variables have further increased the accuracy of SAR based soil moisture prediction using the nonlinear regression model, such as SVR and artificial neural network (ANN) techniques. Although using more predictors would lead to more computational complexities, it can help to develop a more comprehensive relationship and further improve the model prediction performance [47].
In this paper, SCA was used as an alternative statistical approach intended for modeling the nonlinear relationships between remote sensing variables (i.e., dual-polarized Sentinel-SAR data, NDVI, and DEM) and volumetric soil moisture. The SCA model has been trained for volumetric soil moisture obtained from both AWS and FLDAS Noah models.
Previous studies (e.g., [45][46][47][48][49])applied SCA method in different disciplines have shown that SCA model is characterized by higher performance in describing the nonlinear relationship between state variables and dependent variables and better accuracy in predicting observed values. Our findings support these observations using the relationship between volumetric soil moisture and

Discussion
Previous studies (e.g., [16][17][18][19][20][21][22][77][78][79]) demonstrated that surface soil moisture (representing 0-5 cm depths) can be derived from SAR data. However, the radar backscattered signal is not only dependent on the soil moisture content but also sensitive to other time and space varying parameters such as vegetation, topography, and soil properties [14,65,66]. Thus, the linear relationship, between volumetric soil moisture and sentinel-1 SAR backscatter coefficients, made in this study result in lower r (0.34 to 0.36) values. Indeed, the lower r values in our study might not only attributed to the effect these surface parameters but it could be also due to the reduced sensitivity of SAR to soil moisture observed beyond the top few centimeters of soil. Note that simulated and observed soil moisture datasets at 10/20 cm depth of soil were used in this study. However, incorporating additional ancillary variables such as vegetation and elevation conditions of the study area have improved the linear models with r (0.65 to 0.76). Moreover, scholars (e.g., [36,56]) argued that incorporating these and other ancillary variables have further increased the accuracy of SAR based soil moisture prediction using the nonlinear regression model, such as SVR and artificial neural network (ANN) techniques. Although using more predictors would lead to more computational complexities, it can help to develop a more comprehensive relationship and further improve the model prediction performance [47].
In this paper, SCA was used as an alternative statistical approach intended for modeling the nonlinear relationships between remote sensing variables (i.e., dual-polarized Sentinel-SAR data, NDVI, and DEM) and volumetric soil moisture. The SCA model has been trained for volumetric soil moisture obtained from both AWS and FLDAS Noah models.
Previous studies (e.g., [45][46][47][48][49]) applied SCA method in different disciplines have shown that SCA model is characterized by higher performance in describing the nonlinear relationship between state variables and dependent variables and better accuracy in predicting observed values. Our findings support these observations using the relationship between volumetric soil moisture and remote sensing data. The SCA generated cluster trees could be used to predict volumetric soil moisture given inputs of the remote sensing variables. However, the process of SCA analysis is affected by a number of internal parameters such as the significance level [41,45]. Accordingly, various significance levels (i.e., 0.01, 0.05 and 0.1) have resulted in different cluster trees so that a considerable effect on the prediction performance of SCA model has been observed. Thus, an optimal soil moisture prediction cluster tree for AWS and FLDAS Noah based analysis was obtained at α = 0.05 and α = 0.01, respectively. Our result is consistent with the findings of [41], who applied the SCA model for microbial biomass prediction in food waste compositing, with an optimal prediction performance of α = 0.05 and α = 0.01 for thermophilic and mesophilic bacteria, respectively. This implies that an optimum prediction level of the SCA model could be affected by the type and the scale of datasets used in the calibration. Thus, for every dataset an iterative run that is adjusted for different significance levels is the best approach to obtain optimal prediction models using the SCA method. However, at lower significance levels our prediction models have produced several redundant values (still with strong correlation coefficients) for a single value of observed and simulated soil moisture. The same result has been obtained by [41,49] using the SCA method for the prediction of microbial biomass and concentrations of pollutants at lower significance levels, respectively. This could possibly be due to the limited cutting actions at lower significance levels, which in turn results in a lower number of tip clusters (prediction nodes).
The SCA method could not only help to provide the nonlinear relationships between remote sensing parameters and soil moisture, but also provide a cluster tree that shows the links among remote sensing variables and the effects of each variable on residual soil moisture values [49]. This could give us the basis for further understanding the inherent mechanism and determining critical characters of the soil moisture prediction model [47]. The beauty of SCA lays on its step-wise-regression method in which it iterative selects the most important covariates in the prediction model. Thus, each prediction node (tip clusters) contains the most important covariate variables according to the given statistical criteria, instead of incorporating all possible variables in the model. This could contribute in reducing/controlling the overfitting problem often shown in statistical prediction models [80]. Thus, our prediction tree (Figures 5 and 6) indicated that NDVI is the most important input variable, incorporated in each tip clusters, which has a significant effect on the output of the SCA model in comparison to SAR backscattering values and elevation information. Thus, the prediction accuracy of our model is highly controlled by the vegetation conditions of the study area. However, the SAR backscattering and DEM input variables have also a profound effect on the model and important for the optimum results. In this regard, our findings demonstrated the importance of integrating Sentinel-1 SAR data with ancillary surface variables obtained from other optical and/or microwave satellites for the finest prediction of surface soil moisture.
The support vector regression (SVR) method was analyzed to further illustrate the performance of SCA in soil moisture prediction, and the results are presented in Figures 7 and 8. The SCA method performs well in predicting residual soil moisture, particularly during the testing periods, with smaller prediction errors and higher correlation coefficients than the SVR model. Previous studies that applied the SCA model for predicting stream flow, hydrological process, and urban air quality have also confirmed the better fitting and predictive ability of SCA relative to other statistical methods such as random forest, ANN, and SVR [46][47][48][49]. The relatively good prediction accuracy by SCA method might be related to SCA's ability to discriminate the most important predictors and apply cutting/merging actions through searching for the minimum Wilks' statistics (Λ) in each step of the process [56]. Simply put, in the SCA, the optimal cutting point, which split the original sample dataset into two sub-clusters is determined through sequencing the values of the predictor (x m ). When the samples are sequenced according to the values of x m , should satisfy that Λ is minimum comparing to that of any other cutting alternatives using other predictor variables in the model. Then, the SCA analysis will calculate the mean of each sub-cluster and test for their mean difference using an F-test. If a significant difference between the two sub-clusters is observed, we can confirm that the original sample cluster can be cut into two sub-clusters using the optimal cutting point. Then, x m is identified as the most important predictor, which considerably affects the values of the predictands. If the mean difference of the sub-cluster is insignificant, the sample cluster cannot be cut and the analyst will go for testing other alternatives until no cluster can be further cut. Therefore, our finding indicates the potential of the SCA method and it could be used as an alternative statistical approach for soil moisture prediction using remote sensing data.
Our models (including SVR method) prediction error seems to be high in comparison to previous SAR/remote sensing based soil moisture prediction studies (e.g., [23][24][25][36][37][38]) and showed an overestimation of soil moisture in comparison to AWS observed values. This could be attributed to (i) the limited number of ground observed stations-with the limited number of observation stations, it is difficult to entirely characterize the spatial patterns of soil moisture over the study area, (ii) the use of soil moisture observed/simulated at 20/10 cm soil depth during model development, while microwave signals at the C-band are more sensitive to volumetric moisture to the top few centimeters of soil [81], (iii) spatial scale difference between ground observed points and satellite footprints/pixels, and (iv) sub-pixel heterogeneity of land surface conditions for lower scale analysis. Also, the overestimation of the model could be explained by the reduced relationship that could be established by low SAR backscattering values and volumetric soil moisture observed at 20 cm depths of soil during the dry periods. Because, being the dry season of our study period, where there is no/small amount of rainfall events and evaporation, leads to high vertical heterogeneity of soil moisture (i.e., sharply reduce the relationship between surface and 20 cm depth moisture) and low amounts of surface moisture, which in turn results in low backscattering values. The model reliability in this aspect could be improved further using a large number of distributed soil moisture datasets measured at ≤ a 5 cm depth of soil.

Conclusions
The aim of this paper was to develop a stepwise-cluster soil moisture inference model by analyzing the nonlinear relationships between multisource/multi-temporal remote sensing data and volumetric soil moisture in the Upper Blue Nile basin. Sentinel-1 SAR data, MODIS, and SRTM have been used as a source for dual-polarized SAR data, NDVI and elevation information, respectively. The analysis was carried out for the period of 2016 and 2017. Two separate SCA models were developed using volumetric soil moisture obtained from AWS and FLDAS Noah model simulations as response variables. The proposed technique incorporates combinations of SAR data (from both σ VV andσ V H ), NDVI, and DEM as input parameters to develop soil moisture prediction models. The Pearson correlation coefficient (r) and the root mean square error (RMSE) were calculated to present the accuracy of the developed prediction trees.
Our findings reveal that the nonlinear SCA approach can efficiently predict the volumetric residual soil moisture with r as much as 0.87 and RMSE of 0.058 m 3 /m 3 . Moreover, the results denoted the fact that NDVI is the most significant input variable, which has a considerable effect on the output of the SCA model. Compared to the support vector regression (SVR) model, SCA was better in fitting and predicting volumetric residual soil moisture. Thus, we conclude that the SCA is an alternative option for soil moisture prediction using remote sensing data, particularly when we are dealing with soil moisture estimation from multiple satellites. Also, ancillary information (such as vegetation condition, elevation information, and soil properties) obtained from other sensors (e.g., optical sensors) was verified substantial for the finest performance of SAR based soil moisture prediction. We argue that this study is the first attempt to shape the SCA technique for mapping the relationship between remote sensing variables and volumetric soil moisture. The model can be easily transferable to other sites with different climate, land use land cover condition, and geo-morphological settings, given the free and global coverage of C-band SAR, MODIS NDVI and SRTM data. However, it should be noted that the model needs further validation work on independent sites using ground measurements taken at the top few centimeters of soil. In addition, the optimum prediction of the model could be affected by the type and scale of the dataset used and better performances are achieved with multiple input variables.
In the future, the SCA method can be further enhanced for more reliable results by incorporating other auxiliary parameters (e.g., soil texture, soil roughness, and soil temperature).
Further, it is likely that the SCA would have a wider application to other complex relationships in hydrology.
Author Contributions: G.A., T.T. and B.G. conceived and designed the research; G.A. and Y.Y. performed the data collection; G.A., T.T. and B.G. analyzed the results; G.A. wrote the original manuscript and A.C., T.T., B.G. and Y.Y. edited the manuscript. Acknowledgments: The authors would like to thank the National Meteorological Agency (NMA) of Ethiopia for providing automatic weather stations (AWS) soil moisture data found in the UBN basin. We are also grateful to the European Space Agency (ESA), USGS and NASA for providing Sentinel-1 SAR data, MODIS NDVI, and the FLDAS Noah soil moisture products, respectively.

Conflicts of Interest:
The authors declare no conflict of interest.