Spatial Downscaling Methods of Soil Moisture Based on Multisource Remote Sensing Data and Its Application

Soil moisture is an important indicator that is widely used in meteorology, hydrology, and agriculture. Two key problems must be addressed in the process of downscaling soil moisture: the selection of the downscaling method and the determination of the environmental variables, namely, the influencing factors of soil moisture. This study attempted to utilize machine learning and data mining algorithms to downscale the Advanced Microwave Scanning Radiometer-Earth Observing System (AMSR-E) soil moisture data from 25 km to 1 km and compared the advantages and disadvantages of the random forest model and the Cubist algorithm to determine the more suitable soil moisture downscaling method for the middle and lower reaches of the Yangtze River Basin (MLRYRB). At present, either the normalized difference vegetation index (NDVI) or a digital elevation model (DEM) is selected as the environmental variable for the downscaling models. In contrast, variables such as albedo and evapotranspiration are infrequently applied; nevertheless, this study selected these two environmental variables, which have a considerable impact on soil moisture. Thus, the selected environmental variables in the downscaling process included the longitude, latitude, elevation, slope, NDVI, daytime and nighttime land surface temperature (LST_D and LST_N, respectively), albedo, evapotranspiration (ET), land cover (LC) type, and aspect. This study achieved downscaling on a 16-day timescale based on Moderate Resolution Imaging Spectroradiometer (MODIS) data. A comparison of the random forest model with the Cubist algorithm revealed that the R² of the random forest-based downscaling method is higher than that of the Cubist algorithm-based method by 0.0161; moreover, the root-mean-square error (RMSE) is reduced by 0.0006 and the mean absolute error (MAE) is reduced by 0.0014. Testing the accuracies of these two downscaling methods showed that the random forest model is more suitable than the Cubist algorithm for downscaling AMSR-E soil moisture data from 25 km to 1 km in the MLRYRB, which provides a theoretical basis for obtaining high spatial resolution soil moisture data.


Introduction
Soil moisture is an important component of both the water cycle and the surface energy cycle; it is also an important indicator for reflecting land degradation and characterizing surface drought information [1-5]. Soil moisture is related to a number of factors, which include vegetation growth, crop growth, and food production, as well as important parameters in the fields of hydrology, climate research, agriculture, and ecology [6,7]. Consequently, soil moisture has been widely used in various environmental applications, such as hydrological modeling, land surface evapotranspiration simulation,
When compared with the global regression model, geographically weighted regression (GWR) mainly introduces geographic location information into the regression model [35,36]. However, due to the limitations of the GWR algorithm, it cannot effectively screen, based on the spatial distribution, the local environmental variables that are most closely related to soil moisture; as a result, GWR is not applicable for downscaling soil moisture with multiple factors. Following the emergence of machine learning algorithms, the random forest algorithm, which is an ensemble learning algorithm that was developed on the basis of decision trees, has been widely used in many fields, because it is better able to capture the nonlinear relationships between variables. Shi et al. [37] downscaled TRMM products based on the random forest model and obtained precipitation data at a 1 km resolution over the Tibet Plateau. In addition, Ma et al. [38] introduced the Cubist data mining algorithm to downscale TRMM annual precipitation data across the Tibet Plateau from a resolution of 0.25° × 0.25° to 1 km × 1 km. The results demonstrated that the performance of the Cubist algorithm is better than that of the GWR model.
Although both algorithms are very effective for remote sensing precipitation products, the most widely used downscaling algorithm for passive microwave radiometer soil moisture measurements is still the empirical relationship with optical remote sensing images. Therefore, the purpose of this study is to use the random forest and Cubist models to downscale the AMSR-E soil moisture products in the study area to obtain higher-resolution soil moisture data. The specific objectives are as follows: (1) to apply the random forest and Cubist algorithms to downscale AMSR-E passive microwave products from 25 km to 1 km and (2) to compare the downscaling results of these two models to determine the most suitable downscaling algorithm for the study area.

Study Area
The Yangtze River is one of the major rivers in China, with a length of approximately 6300 km and a catchment area of approximately 1.8 million km². We choose the middle and lower reaches of the Yangtze River Basin (MLRYRB, see Figure 1), which lies between 25° and 35° N and between 106° and 122° E, as the study area. The Terra and Aqua combined Moderate Resolution Imaging Spectroradiometer (MODIS) Land Cover Type (MCD12Q1) Version 6 data product provides five different land cover classification schemes. This study adopts the second scheme, the annual University of Maryland (UMD) classification, to show the land cover types in the MLRYRB.

Data
The AMSR-E is a multichannel passive microwave sensor that was launched on NASA's Earth Observing System (EOS) Aqua satellite in May 2002, with daily ascending (13:30 equatorial local crossing time) and descending (01:30 equatorial local crossing time) overpasses [25,39]. In this study, we select the Level-3 land surface product of the AMSR-E (AE_Land3) onboard NASA's Aqua satellite with a spatial resolution of 25 km × 25 km (Figure 2). The MODIS products that were utilized in this study, including NDVI, daytime land surface temperature (LST_D), nighttime LST (LST_N), albedo, land cover (LC) type, and evapotranspiration (ET) products, are acquired from NASA. Table 1 summarizes the attribute information of the MODIS data set that was used in this study. In addition, the Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM) product, with a spatial resolution of 90 m, was obtained from the International Scientific & Technical Data Mirror Site, Computer Network Information Center, Chinese Academy of Science (http://www.gscloud.cn). The slope and aspect are derived from the DEM data. Additionally, the in situ soil moisture observations (0-10 cm) from the soil stations shown in Figure 1 were provided by the China Meteorological Data Sharing Service System (available at http://cdc.nmic.cn/home.do).
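The text does not state which tool was used to derive slope and aspect from the DEM. As an illustration only, the following minimal sketch computes both with finite differences in NumPy. The function name `slope_aspect`, the north-up raster orientation, and the convention of marking flat cells with an aspect of -1 are assumptions made for this example and are not taken from the study.

```python
import numpy as np

def slope_aspect(dem, cell_size):
    """Return slope (degrees) and aspect (degrees clockwise from north).

    Assumptions (not from the paper): the raster is north-up, so the row
    index increases southward and the column index increases eastward;
    flat cells receive an aspect of -1.
    """
    dem = np.asarray(dem, dtype=float)
    # Finite-difference derivatives along rows (axis 0) and columns (axis 1).
    dz_drow, dz_dcol = np.gradient(dem, cell_size)
    dz_dnorth = -dz_drow   # rows run north -> south
    dz_deast = dz_dcol     # columns run west -> east
    # Slope is the angle of the steepest gradient.
    slope = np.degrees(np.arctan(np.hypot(dz_deast, dz_dnorth)))
    # Aspect is the compass bearing of the downslope direction,
    # i.e. the bearing of the negative gradient vector.
    aspect = np.degrees(np.arctan2(-dz_deast, -dz_dnorth)) % 360.0
    aspect = np.where(slope == 0, -1.0, aspect)
    return slope, aspect

# Example: a surface that rises toward the east faces west (270 degrees).
dem_east = np.tile(np.arange(5.0), (5, 1))
slope_e, aspect_e = slope_aspect(dem_east, cell_size=1.0)
```

GIS packages typically use a 3 × 3 weighted window rather than plain central differences, so the values from this sketch may differ slightly from those of such tools.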

Methodology
We choose the rule-based machine learning approaches, including the random forest and Cubist algorithms, to downscale the AMSR-E soil moisture data from 25 km to 1 km. The random forest method, which is a widely used machine learning method, uses randomization when selecting the features at each node. The Cubist method is a spatial data mining algorithm that applies a divide-and-conquer strategy. The reader is referred to the literature for an introduction to these two machine learning algorithms [38,40-43]. The spatial downscaling method is based on the relationship between soil moisture and various environmental variables. Two key problems must be addressed to downscale the AMSR-E soil moisture data: one is the selection of the downscaling method, and the other is the determination of the surface variables of soil moisture. This study attempts to introduce machine learning algorithms into the downscaling model and compares the advantages and disadvantages of the random forest model with those of the Cubist algorithm to determine the most suitable soil moisture downscaling method for the MLRYRB. The NDVI, DEM, and surface temperature are the most commonly employed environmental variables. In contrast, variables such as albedo and ET have rarely been applied; nevertheless, this study considers these factors to have a considerable impact on soil moisture. Therefore, the longitude, latitude, DEM, slope, aspect, NDVI, LST_D, LST_N, albedo, ET, and LC are the final environmental variables that were utilized in the downscaling process.
The basic idea of the downscaling method is to first establish the relationship between soil moisture and all of the environmental variables at a spatial resolution of 25 km and then apply the established model to the environmental variables, which have a resolution of 1 km, to obtain spatially continuous soil moisture with a 1 km spatial resolution. The main steps of the downscaling process are as follows:
(1) First, the environmental variables during the 2003-2010 period are resampled to 25 km and 1 km, and the machine learning models, including the random forest and Cubist models, with a 16-day timescale, are established at a spatial resolution of 25 km, where SM_25km is the AMSR-E soil moisture data and f(SM_25km) is the downscaling model, namely, either the random forest model or the Cubist model.
(2) Subsequently, the estimated soil moisture at a spatial resolution of 25 km is subtracted from the AMSR-E soil moisture data to obtain a residual at 25 km, after which the 25 km residual is resampled to 1 km.
(3) The established model is applied to the environmental variables at a spatial resolution of 1 km to obtain soil moisture data at 1 km before applying a residual correction.
(4) Finally, the estimated value at 1 km is added to the residual at 1 km to obtain the 16-day soil moisture data with a spatial resolution of 1 km after a residual correction.
Figure 3 provides the flowchart to illustrate the main steps of this procedure.
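The four steps above can be sketched for a single 16-day period as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: it uses scikit-learn's `RandomForestRegressor` with arbitrary hyperparameters, aggregates the fine predictors to the coarse grid by block averaging (the text only says "resampled"), approximates bilinear resampling with `scipy.ndimage.zoom(order=1)`, and takes the residual as the AMSR-E value minus the model estimate, which is the sign that makes the addition in step (4) a correction. The function names `block_mean` and `downscale` are hypothetical.

```python
import numpy as np
from scipy.ndimage import zoom
from sklearn.ensemble import RandomForestRegressor

def block_mean(fine, factor):
    """Aggregate a (H*factor, W*factor, P) array to (H, W, P) by block averaging.

    Block averaging is an assumption; the paper does not name the method.
    """
    h, w = fine.shape[0] // factor, fine.shape[1] // factor
    trimmed = fine[:h * factor, :w * factor]
    return trimmed.reshape(h, factor, w, factor, -1).mean(axis=(1, 3))

def downscale(sm_coarse, x_coarse, x_fine, factor, seed=0):
    """Downscale coarse soil moisture with a residual correction.

    sm_coarse: (H, W) coarse soil moisture (e.g. 25 km).
    x_coarse:  (H, W, P) predictors on the coarse grid.
    x_fine:    (H*factor, W*factor, P) predictors on the fine grid (e.g. 1 km).
    """
    h, w, p = x_coarse.shape
    # Step 1: fit the model at the coarse resolution.
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(x_coarse.reshape(-1, p), sm_coarse.ravel())
    # Step 2: coarse residual (observed minus estimated), resampled to the
    # fine grid with first-order (linear) spline interpolation.
    est_coarse = model.predict(x_coarse.reshape(-1, p)).reshape(h, w)
    resid_fine = zoom(sm_coarse - est_coarse, factor, order=1)
    # Step 3: apply the coarse-scale model to the fine-scale predictors.
    est_fine = model.predict(x_fine.reshape(-1, p)).reshape(h * factor, w * factor)
    # Step 4: residual correction.
    return est_fine + resid_fine
```

With a 25 km to 1 km downscaling, `factor` would be 25. A Cubist model could be substituted for the random forest in step 1 without changing the remaining steps.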

Relationship between Soil Moisture and Environmental Variables
Most of the environmental variables, namely, the longitude, latitude, elevation, slope, NDVI, LST_D, LST_N, albedo, and ET, are numerical variables, whereas the LC and aspect are factor variables. Figure 4 shows the soil moisture values corresponding to different LC types, which demonstrates that the AMSR-E soil moisture values in different LC types in the MLRYRB exhibit the following order: deciduous broadleaf forests > evergreen needleleaf forests > mixed forests > woody savannas > evergreen broadleaf forests > savannas > cropland/natural vegetation mosaics > croplands > grasslands > urban and built-up lands. Additionally, the AMSR-E soil moisture values of the different forestland LC types are higher than those of the other LC types. Similarly, the distribution of soil moisture values by aspect from 2003 to 2010 is shown in Figure 5. As shown, apart from the low AMSR-E value of the flat aspect (Flat = 0.1112), the AMSR-E values of the other eight aspect directions do not substantially vary (the mean of each aspect direction is given in parentheses): southeast (Southeast = 0.1296) > south (South = 0.1280) > northwest (Northwest = 0.1277) > west (West = 0.1269) > north (North = 0.1264) > southwest (Southwest = 0.1263) > northeast (Northeast = 0.1248) > east (East = 0.1234). Although the aspect has a certain influence on the distribution of soil moisture, it is also closely related to the topography and surface vegetation cover type. Figure 6 shows the interannual variation of the AMSR-E soil moisture throughout the study area in 2003-2010. The shaded portion around the straight blue fitting line is the standard deviation of the linear fit. The soil moisture mean clearly exhibits a downward trend during the entire period of 2003-2010; in addition, the standard deviation in the middle of the study period is relatively small, whereas that at the beginning and end of the study period is relatively large.
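The per-class means reported above for the factor variables can be computed as in the following sketch. The 45° sector boundaries, the use of a negative aspect value to mark flat cells, and the function names `aspect_class` and `mean_by_class` are assumptions made for illustration; the text does not specify how the aspect classes were delimited.

```python
import numpy as np

ASPECT_NAMES = ["North", "Northeast", "East", "Southeast",
                "South", "Southwest", "West", "Northwest"]

def aspect_class(aspect_deg):
    """Map aspect (degrees clockwise from north, negative = flat) to a label.

    Assumes finite input values and eight 45-degree sectors centred on the
    compass directions, so that North covers [337.5, 360) and [0, 22.5).
    """
    aspect_deg = np.asarray(aspect_deg, dtype=float)
    idx = np.floor(((aspect_deg + 22.5) % 360.0) / 45.0).astype(int)
    labels = np.array(ASPECT_NAMES)[idx]
    return np.where(aspect_deg < 0, "Flat", labels)

def mean_by_class(values, labels):
    """Mean of `values` for each distinct label, ignoring NaN values."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    return {str(name): float(np.nanmean(values[labels == name]))
            for name in np.unique(labels)}
```

The same `mean_by_class` helper applies to the LC type, where the labels are the land cover class names.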

Downscaling Results Based on the Random Forest Algorithm
The environmental variables at 1 km spatial resolution are applied to estimate soil moisture at 1 km spatial resolution according to the soil moisture downscaling model that was constructed with the random forest at a spatial resolution of 25 km × 25 km (Figure 7b); then, the residuals at 25 km × 25 km are resampled to residuals at 1 km using bilinear interpolation (Figure 7c). Next, the estimated values at 1 km are added to the residuals at 1 km to obtain 16-day soil moisture data with a spatial resolution of 1 km after applying a residual correction (Figure 7d). Figure 7 shows that the original AMSR-E soil moisture data are consistent with the trend of the estimated soil moisture data after applying a residual correction, which indicates that the random forest-based downscaling method is applicable to the study area. In addition, Figure 7 shows that the soil moisture after the residual correction is more detailed, is more representative of the spatial variation in the soil moisture values, and is closer to the original AMSR-E soil moisture data distribution than the estimated results before the residual correction.
Water 2019, 11, x FOR PEER REVIEW
We compare the downscaled results with the original AMSR-E soil moisture to further validate the performance of the random forest-based downscaling model (Figure 8). The left panels in Figure 8 are scatterplots between the downscaling results before the residual correction and the original AMSR-E data, while the right panels are the scatterplots between the downscaling results after the residual correction and the original AMSR-E soil moisture. The transparency in Figure 8 is set according to the number of data points. A darker color indicates that the density is higher and that the data points are very concentrated; the lighter the color is, the lower the density, that is, the more sparsely the data points are distributed. The range of R² between the downscaling results before the residual correction and the original AMSR-E data is 0.55-0.64, while the range of R² after the residual correction is 0.68-0.74. The scatterplots on the left side of Figure 8 indicate that most of the data points are distributed above the 1:1 line when the soil moisture is less than 0.12 and below the 1:1 line when the soil moisture is greater than 0.12. In the scatterplots on the right side of Figure 8, which correspond to the downscaling results after the residual correction, the data points are evenly distributed on both sides of the 1:1 line. In general, the correlation between the AMSR-E soil moisture and the results of the random forest-based downscaling model is very good, which indicates that this downscaling model has good applicability in the MLRYRB.
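The evaluation statistics used in this comparison (R², RMSE, and MAE) can be computed as in the following sketch. It assumes that R² is the squared Pearson correlation coefficient of the scatterplot, which is a common choice but is not stated explicitly in the text; the function name `evaluate` is hypothetical.

```python
import numpy as np

def evaluate(obs, pred):
    """Return R2 (squared Pearson correlation), RMSE, and MAE.

    Pairs in which either value is not finite are dropped first.
    """
    obs = np.asarray(obs, dtype=float).ravel()
    pred = np.asarray(pred, dtype=float).ravel()
    mask = np.isfinite(obs) & np.isfinite(pred)
    obs, pred = obs[mask], pred[mask]
    r = np.corrcoef(obs, pred)[0, 1]
    err = pred - obs
    return {
        "R2": float(r ** 2),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
    }
```

Here `obs` would hold the original AMSR-E values and `pred` the downscaled values matched to the same locations; how the two grids were matched is not described in this section.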

Downscaling Results Based on the Cubist Model
This study selects the same environmental variables as those used for the random forest-based downscaling model for the Cubist downscaling algorithm: the longitude, latitude, elevation, slope, NDVI, LST_D, LST_N, albedo, ET, LC type, and aspect. However, the results show that, due to the linear relationship among longitude, latitude, and soil moisture, blocky features are too obvious in the downscaling results of the Cubist algorithm (see the red rectangles in Figure 9). In other words, the relationship for each rule between the soil moisture data and the latitude and longitude displays abrupt changes, which is obviously contrary to the continuity of soil moisture in space. Therefore, this study excludes two surface variables in the Cubist downscaling model, namely, longitude and latitude.

The downscaling process is carried out based on the Cubist model after removing the latitude and longitude surface variables. Table 2 lists the spatial regression relationship between the soil moisture and each surface variable (in the case of DOY = 2003001). Eleven rules are generated after removing the longitude and latitude variables; however, not all of the variables participate in the downscaling model in each rule. This is one of the advantages of the Cubist algorithm, which automatically filters the optimal combination of variables that are required within the rules. To more intuitively ascertain whether each variable participates in the relationship of each rule, this study uses the R lattice package to draw the regression coefficient graph of each rule (Figure 10). As the aspect and LC are factor variables, they only participate in the classification of each rule and do not participate in the regression calculations. Figure 10 shows that the albedo and ET surface variables participate in fewer rules; albedo participates in the model construction of the first and sixth rules, while the ET participates in the first, fifth, and tenth rules. These graphs visualize the regression coefficients and intercept participating in each rule.
Figure 11 shows the distributions of the main split nodes of the environmental variables in the 11 rules of the Cubist downscaling algorithm (taking DOY = 2003001 as an example). The x-axis is the range of each variable, normalized from 0 to 1, while the y-axis is the split node of the variable used in each rule. For example, if a rule's variable is less than a certain value, then the rule's line will be drawn blue; if a rule's variable is greater than a certain value, the rule's line will be drawn purple. Figure 11 shows that during the 16-day period corresponding to DOY = 2003001, the variables composing the main split nodes are the NDVI, slope, DEM, and ET. The DEM is a main split node in all 11 rules; the slope is a main split node in every rule except the fourth, sixth, and eighth rules; the ET is a main split node in four of the rules; and the NDVI is a main split node in the first and fifth rules. The regression relationship and split node information of each rule in the Cubist downscaling algorithm are fully reflected through Table 2 and the graphs in Figures 10 and 11.
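To make the structure of such a rule set concrete, the following toy sketch shows how a Cubist-style model produces a prediction: each rule pairs split conditions with a linear regression on a subset of the variables, and when several rules cover a case their predictions are averaged. The three rules, their thresholds, and their coefficients are invented for illustration and do not correspond to Table 2; the names `Rule`, `predict_rules`, and `EXAMPLE_RULES` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Rule:
    condition: Callable[[Dict[str, float]], bool]  # split conditions of the rule
    intercept: float                               # intercept of the linear model
    coefs: Dict[str, float]                        # only the variables used by this rule

    def predict(self, x: Dict[str, float]) -> float:
        return self.intercept + sum(c * x[name] for name, c in self.coefs.items())

def predict_rules(rules: List[Rule], x: Dict[str, float], default: float) -> float:
    """Average the linear predictions of every rule whose conditions x satisfies."""
    matched = [r.predict(x) for r in rules if r.condition(x)]
    return sum(matched) / len(matched) if matched else default

# Hypothetical rules (values are illustrative, not from the study).
EXAMPLE_RULES = [
    Rule(lambda x: x["DEM"] <= 500, 0.10, {"NDVI": 0.05}),
    Rule(lambda x: x["DEM"] > 500, 0.14, {"slope": -0.001}),
    Rule(lambda x: x["slope"] > 15, 0.20, {}),
]

lowland = predict_rules(EXAMPLE_RULES, {"DEM": 300, "NDVI": 0.6, "slope": 5}, default=0.12)
upland = predict_rules(EXAMPLE_RULES, {"DEM": 800, "NDVI": 0.5, "slope": 20}, default=0.12)
```

Because each rule applies its own linear model, the prediction can jump at a split threshold. With longitude and latitude as split variables, such jumps would follow straight geographic boundaries, which is consistent with the blocky features that motivated excluding these two variables.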
The environmental variables at a 1 km spatial resolution are input to the Cubist-based soil moisture downscaling model, which is established at a spatial resolution of 25 km × 25 km, to estimate soil moisture at 1 km (Figure 12b); then, the residuals at 25 km × 25 km are resampled to 1 km using bilinear interpolation (Figure 12c). Next, the estimated values at 1 km are added to the residuals at 1 km to obtain residual-corrected 16-day soil moisture data with a spatial resolution of 1 km. Figure 12 demonstrates that the spatial distribution of the AMSR-E passive microwave soil moisture data is consistent with that of the residual-corrected estimated values, which indicates that the Cubist-based downscaling method also performs well in the MLRYRB.
We compare the original AMSR-E soil moisture data with the downscaling results, both before and after the residual correction, to further validate the effectiveness of the Cubist-based downscaling algorithm. Again, we set the transparency in Figure 13 according to the data density: darker areas correspond to a higher data density, while lighter areas indicate fewer data points. The graphs on the left in Figure 13 are the downscaling results of the Cubist algorithm before the residual correction, and the graphs on the right are the results after the residual correction. Figure 13 shows that the R² values of the Cubist-downscaled data before the residual correction vary from 0.50 to 0.56, while the R² values of the downscaled data based on the random forest model before the residual correction range from 0.55 to 0.64. The correlations of the random forest-based results are therefore higher than those of the Cubist-based results. In addition, the distributions of data points in the scatter plots of the Cubist model are similar to those of the random forest model. Before the residual correction, most of the downscaled values are higher than the original AMSR-E soil moisture values when the soil moisture is less than 0.12; when the soil moisture is greater than 0.12, most of the original AMSR-E values are larger than the downscaled values. Moreover, the results are not evenly distributed on both sides of the 1:1 line, and the data points based on the Cubist algorithm are more dispersed than those based on the random forest algorithm. The R² values between the residual-corrected downscaling results of the Cubist algorithm and the AMSR-E soil moisture values range from 0.68 to 0.71, which is lower than the corresponding range for the random forest downscaling algorithm (0.68 to 0.74). The data points are more concentrated after the residual correction than before it, which indicates that a residual correction can significantly improve the downscaling results of soil moisture data.
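The residual correction described above (estimate at 1 km, resample the 25 km residuals with bilinear interpolation, then add the two) can be illustrated with a minimal pure-Python sketch. The function names `bilinear_resample` and `residual_correct`, the list-of-rows grid layout, and the cell-centre alignment convention are assumptions made for illustration only; they are not the implementation used in this study, which would more likely rely on GIS or array software.

```python
def bilinear_resample(coarse, factor):
    """Upsample a 2-D grid (list of rows) by an integer factor
    using bilinear interpolation between coarse cell centres."""
    rows, cols = len(coarse), len(coarse[0])
    fine = []
    for i in range(rows * factor):
        # Fine-cell centre expressed in coarse index space, clamped to the grid.
        y = min(max((i + 0.5) / factor - 0.5, 0.0), rows - 1.0)
        y0 = int(y)
        y1 = min(y0 + 1, rows - 1)
        wy = y - y0
        fine_row = []
        for j in range(cols * factor):
            x = min(max((j + 0.5) / factor - 0.5, 0.0), cols - 1.0)
            x0 = int(x)
            x1 = min(x0 + 1, cols - 1)
            wx = x - x0
            top = coarse[y0][x0] * (1 - wx) + coarse[y0][x1] * wx
            bottom = coarse[y1][x0] * (1 - wx) + coarse[y1][x1] * wx
            fine_row.append(top * (1 - wy) + bottom * wy)
        fine.append(fine_row)
    return fine


def residual_correct(estimate_fine, observed_coarse, predicted_coarse, factor):
    """Add bilinearly resampled coarse residuals to the fine-scale estimate.

    observed_coarse  : coarse soil moisture (e.g., AMSR-E at 25 km)
    predicted_coarse : model prediction on the same coarse grid
    estimate_fine    : model estimate on the fine grid (e.g., 1 km)
    factor           : coarse-to-fine ratio (25 for 25 km to 1 km)
    """
    # Residual = observation minus model prediction at the coarse scale.
    residual_coarse = [
        [obs - pred for obs, pred in zip(obs_row, pred_row)]
        for obs_row, pred_row in zip(observed_coarse, predicted_coarse)
    ]
    residual_fine = bilinear_resample(residual_coarse, factor)
    return [
        [est + res for est, res in zip(est_row, res_row)]
        for est_row, res_row in zip(estimate_fine, residual_fine)
    ]
```

Because the residual field is interpolated rather than copied block by block, the correction varies smoothly across coarse-cell boundaries, while the fine-scale detail still comes from the model estimate.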
Table 3 shows the accuracy verification of the downscaled results against the observed in situ soil moisture. The mean R², root-mean-square error (RMSE), and mean absolute error (MAE) values between the original AMSR-E data and the in situ soil moisture were 0.6018, 0.0131 m³/m³, and 0.0113 m³/m³, respectively. Furthermore, the mean R², RMSE, and MAE values between the random forest-based downscaled results and the in situ soil moisture were 0.7819, 0.0090 m³/m³, and 0.0076 m³/m³, respectively, which were better than those of the Cubist-based downscaled results (R² = 0.6722, RMSE = 0.0128 m³/m³, and MAE = 0.0111 m³/m³). The two downscaling methods developed in this study can therefore improve not only the spatial resolution of the remote sensing AMSR-E data, but also the accuracy of the data.
This study compares the random forest and Cubist algorithms to determine the most suitable method for downscaling soil moisture in the MLRYRB. Figure 14 illustrates the downscaled results of the two downscaling methods. Similarly, we set the transparency according to the density of the points: darker areas contain more data points, while lighter areas contain fewer. The correlation between the random forest-based downscaling results and the AMSR-E soil moisture (R² = 0.71) is slightly better than that between the Cubist-based results and the AMSR-E soil moisture (R² = 0.70). The data points of the random forest method (Figure 14a) are more densely concentrated around the 1:1 line, while those of the Cubist algorithm (Figure 14b) are more dispersed; however, both panels contain a small number of outliers below the 1:1 line. These outliers are related to the quality of the MODIS images, because clouds cover some images, though other factors can also introduce noise.
To further analyze the performance of the two methods, their accuracies were tested using three evaluation indicators, namely, R², the RMSE, and the MAE; Table 4 shows the results. The R², RMSE, and MAE values between the soil moisture data downscaled by the random forest method and the original AMSR-E soil moisture are 0.7045, 0.0155, and 0.0096, respectively, while those between the Cubist-based downscaling results and the AMSR-E soil moisture are 0.6884, 0.0162, and 0.0110, respectively. The R² of the random forest downscaling method is thus higher than that of the Cubist algorithm by 0.0161, while the RMSE is reduced by 0.0006 and the MAE is reduced by 0.0014. To compare the performance of the two downscaling methods more intuitively, their accuracies are also shown as box plots (Figure 15). Figure 15 illustrates that the R², RMSE, and MAE of the soil moisture data obtained by the random forest-based downscaling method are better than those obtained by the Cubist algorithm, which indicates that the former is a more suitable AMSR-E soil moisture downscaling method than the latter in the MLRYRB. This finding also provides a theoretical basis for obtaining high spatial resolution soil moisture data.
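For readers who wish to reproduce this kind of accuracy test, the three indicators can be computed as in the following minimal sketch. It assumes that R² denotes the squared Pearson correlation coefficient between paired values, which the text does not state explicitly; the function names are illustrative and are not taken from this study.

```python
import math


def r_squared(x, y):
    """Squared Pearson correlation between paired sequences x and y
    (assumed definition of R^2 for this sketch)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def rmse(x, y):
    """Root-mean-square error between paired sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def mae(x, y):
    """Mean absolute error between paired sequences."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)
```

Here `x` would hold the reference values (for example, AMSR-E or in situ soil moisture) and `y` the downscaled values aggregated to the same locations; both must be non-constant for `r_squared` to be defined.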

Conclusions
In this study, we choose the random forest model and the Cubist algorithm to downscale AMSR-E soil moisture from 25 km to 1 km in the MLRYRB; for this task, the longitude, latitude, elevation, slope, NDVI, LST_D, LST_N, albedo, ET, LC, and aspect are selected as the environmental variables. Moreover, the random forest model and the Cubist algorithm are compared and analyzed. The main conclusions can be summarized as follows:
(1) Based on the random forest model, we downscale the AMSR-E soil moisture from 25 km to 1 km in the MLRYRB. The results show that the random forest downscaling method is well suited to the MLRYRB, and the downscaled results after a residual correction contain more detail and are more representative of the spatial distribution of soil moisture.
(2) The R² between the residual-corrected downscaling results of the Cubist algorithm and the original AMSR-E soil moisture values ranges from 0.68 to 0.71, which is lower than the range of the R² between the random forest-based downscaling results and the original AMSR-E soil moisture data (0.68 to 0.74).
(3) A comparison between the random forest model and the Cubist algorithm reveals that the R² of the random forest-based downscaling method is higher than that of the Cubist algorithm-based downscaling method by 0.0161; moreover, the RMSE is reduced by 0.0006 and the MAE is reduced by 0.0014. Furthermore, testing the accuracies of the two downscaling methods reveals that the random forest model is more suitable than the Cubist algorithm for downscaling AMSR-E soil moisture data from 25 km to 1 km in the MLRYRB, and this finding provides a theoretical basis for obtaining high spatial resolution soil moisture data.