Downscaling Satellite Soil Moisture Using a Modular Spatial Inference Framework



Introduction
The top layer of soil is critical for the root system of plants and the available water that sustains most of the vegetation and controls many soil processes. Due to its importance, soil moisture has been recognized as an Essential Climate Variable [1], and in conjunction with variables, such as land cover, is critical in shaping Earth system dynamics. The importance of soil moisture lies not only in its role within the water cycle, but also in its relationship with other ecological processes, such as runoff generation, sediment transport and energy balance [2][3][4], drought occurrence [5,6], plant and soil respiration [7][8][9], regulation of greenhouse gas fluxes from soils to the atmosphere [10][11][12], and plant growth, which influences the terrestrial carbon budget [4,7,13]. Water content in the top centimeters of the soil also serves as a retardant for wildfires, regulates runoff during extreme rain events, and provides information for flash floods and drought early warning systems [14][15][16][17]. Additionally, soil moisture information is a key input for agricultural planning [6,18], regional stewardship [19], and multiple models used in weather forecasting or climate variability and change [20][21][22].
Traditionally, soil moisture information was acquired from point measurements using instruments, such as Time-Domain Reflectometers (TDR), which offer instantaneous values of soil water content based on information of electric and dielectric properties within a small volume of soil [23]. However, the availability of soil moisture data from these ground sensors across large areas is often limited [24,25]. At the global scale, the International Soil Moisture Network [26,27] provides ground-truth information, and within the United States, the Soil Climate Analysis Network (SCAN) [28] and the North American Soil Moisture Database (NASMD) [29] provide soil moisture information derived from ground sensors. However, due to large spatial and temporal variability in soil moisture, this information, although invaluable, is not enough to address multiple applications where detailed spatial and temporal variability in soil moisture is required.
To address the limited spatial coverage of ground-based soil moisture networks, alternative approaches can be applied to estimate soil moisture. Satellite-based sensors offer a feasible way to estimate soil moisture over large areas on a regular basis, at spatial resolutions ranging from 3 to ~36 km [30][31][32][33]. Satellite sensors estimate soil moisture using radar instruments or radiometers, which are based on the dielectric constant and temperature emissivity of the soil, respectively [33,34]. Various satellite sensors are used to estimate soil moisture, some specifically conceived for this purpose, such as SMAP (Soil Moisture Active Passive) [30] or SMOS (Soil Moisture and Ocean Salinity mission) [35], while others, such as the European Space Agency Climate Change Initiative (ESA-CCI) soil moisture [15], Sentinel [36] and GPS-aided values [37], can be used to indirectly derive soil moisture information. These satellite-based efforts aim to provide global soil moisture values at high temporal resolution (1 to 3 days). The ESA-CCI offers the longest available global records at the daily scale, beginning in November 1978, with improved accuracy since 1991 due to a combination of information from active and passive sensors [38]. These efforts have provided unprecedented information, but they have two important limitations: they have coarse spatial resolution, and they have spatial and temporal gaps.
Various approaches have been used to downscale satellite-derived soil moisture values. These approaches can be categorized as (1) satellite-based, (2) geoinformation-based, and (3) model-based [39]. Satellite-based approaches include various techniques, such as Active and Passive Microwave Data Fusion and Optical/Thermal and Microwave Fusion [39]. Geoinformation-based methods have explored the known correlation of soil moisture with topography, soil attributes, and vegetation characteristics [39]. Model-based methods include other approaches, such as statistical models, integration of a Land Surface Model, statistical downscaling, and data assimilation [39].
Here, we present a geoinformation-based approach, considering the relationship between soil moisture and topography to downscale and gap-fill satellite-based soil moisture information at the regional scale [39,40]. Topography has been explored previously as a meaningful environmental variable for downscaling soil moisture at the catchment scale [41][42][43] and across the United States [44]. We used a modular spatial inference framework, which is the foundation of a cyberinfrastructure tool named SOil MOisture SPatial Inference Engine (SOMOSPIE) [45][46][47]. We tested the performance of two modeling methods coupled with geoinformation from terrain parameters to downscale satellite-derived soil moisture. Specifically, the SOMOSPIE framework uses publicly available satellite-derived soil moisture information to generate fine-grained and gap-free predictions (from 0.25 degrees, which is about 27 km, to 1 km) using two modeling methods: a kernel-based approach (Kernel-Weighted k-Nearest Neighbors, KKNN) and a tree-based approach (Random Forests, RF).
We tested our framework across two contrasting regions of interest (ROIs) within the conterminous United States at monthly and weekly time scales in 2010 and 1 km spatial resolution. We found that RF consistently performed better at the monthly and weekly scales when compared with the reference ESA-CCI data. In contrast, KKNN showed a slightly higher agreement with ground-truth information as part of independent validation. We postulate that differences in model performance are influenced by the multivariate space of topographic features, where soil moisture in more heterogeneous landscapes (i.e., high topographic variation) may be more challenging to downscale and predict. Finally, we demonstrate that our framework is a flexible, transparent, and replicable approach to downscale satellite-derived soil moisture at different temporal scales.

Regions of Interest
Our study was conducted over two regions of interest (ROI) within the conterminous United States (CONUS; Figure 1a). Each region encompasses a polygon of 7.5° × 3.75° (450 pixels with 30 columns and 15 rows in the native resolution of the ESA-CCI soil moisture product), and each ROI was aligned to the original edges of the ESA-CCI grid. Both areas were selected as they offer a contrast in climatic and topographic conditions, and anthropogenic activities such as different agricultural and forestry practices.
The West region (Figure 1b) comprises an area of 275,516 km² with heterogeneous topographic features and a wide diversity of climate conditions ranging from the central valley of California in the West, passing through the densely forested areas in the Rocky Mountains, and water-limited ecosystems across California, Nevada, Utah, and Arizona.
The Midwest region (Figure 1c) comprises an area of 283,499 km². This region lacks extensive mountainous areas (except for the Ouachita Mountains) and has extensive agricultural activity that strongly influences the dynamics of soil moisture. This region was also selected because of the extensive availability of ground-truth data [48] from the monitoring network MESONET [49], mainly over Oklahoma.

Satellite-Derived Soil Moisture Data
We used information from the ESA-CCI soil moisture product Version 6.1 (revised in September 2021), which is the latest release by ESA-CCI [50]. The ESA-CCI product merges daily data derived from C-band scatterometers (e.g., ERS-1/2, METOP) and data from multi-frequency radiometers (e.g., SMMR, SSM/I, TMI, AMSR-E, Windsat, AMSR-2, SMOS, SMAP, GPM, and FengYun-3B) at 0.25 degrees spatial resolution [51]. Based on daily soil moisture values, we calculated mean values for each pixel at the monthly and weekly scales for each ROI. This yielded 12 monthly layers and 52 weekly layers of mean soil moisture for the year 2010.
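The temporal aggregation described above can be sketched for a single pixel as follows. This is a minimal illustration, not the SOMOSPIE code: the function names and the synthetic daily series are hypothetical, gaps are represented as None, and weeks are assumed to be consecutive 7-day blocks starting on 1 January with the last day of the year folded into week 52 (the text does not state how weeks were delimited).

```python
from collections import defaultdict
from datetime import date, timedelta

def temporal_means(daily, key):
    """Average the daily values of one pixel per group, skipping gaps (None)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, value in daily.items():
        if value is None:  # gap in the satellite record
            continue
        group = key(day)
        sums[group] += value
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}

def month_of(day):
    return day.month

def week_of(day):
    # Assumption: 7-day blocks from 1 January; day 365 is folded into week 52
    return min((day.timetuple().tm_yday - 1) // 7 + 1, 52)

# Hypothetical daily soil moisture series for one pixel in 2010
start = date(2010, 1, 1)
daily = {start + timedelta(days=i): 0.20 + 0.01 * (i % 7) for i in range(365)}
daily[date(2010, 1, 3)] = None  # simulate a missing retrieval

monthly = temporal_means(daily, month_of)
weekly = temporal_means(daily, week_of)
print(len(monthly), len(weekly))
```

Applying the same function to every pixel of an ROI would give the 12 monthly and 52 weekly layers; pixels with no valid retrievals in a period simply receive no value for that period.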

Terrain Parameters
Topographic information was derived from a digital elevation model (DEM) [52] and we extracted hydrologically meaningful terrain parameters for each ROI following a standardized approach [53]. Briefly, an initial set of 15 terrain parameters was calculated using the terrain analysis module in RSAGA [54], which implements SAGA GIS [55] in the R statistical platform [56]. The original terrain parameters were: Aspect, Analytical Hillshading, Channel Network Base Level, Convergence Index, Cross Sectional Curvature, Catchment Area, Elevation, Flow Accumulation, Longitudinal Curvature, Length-Slope Factor, Relative Slope Position, Slope, Topographic Wetness Index, Valley Depth, and Vertical Distance to Channel Network. To reduce model complexity, identify the best prediction parameters, and avoid redundancy of information, we predicted soil moisture at 1 km over CONUS using different combinations of terrain parameters and geographic coordinates (i.e., latitude and longitude). This test was performed using a KKNN algorithm, combinations of the aforementioned predictors, and the ESA-CCI soil moisture annual mean of 2010 as the training dataset. Based on correlation and error values from cross-validation automatically performed during model training and evaluation, we identified the combination of predictors that best represented soil moisture reference values. Our results identified geographic coordinates (latitude and longitude) and 4 terrain parameters (elevation, aspect, slope, and topographic wetness index) as the best predictors for our study. Results of cross-validation from all the predictor combinations tested are included in Supplementary Material S1.
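For readers unfamiliar with the topographic wetness index (TWI), its commonly used definition is given here for reference. The symbols are introduced only for this illustration, and the exact variant computed by SAGA GIS in this study is not specified in the text.

```latex
\mathrm{TWI} = \ln\!\left(\frac{a}{\tan\beta}\right)
% a     : specific catchment area (upslope contributing area per unit contour width)
% \beta : local slope angle
```

Under this definition, large contributing areas and gentle slopes produce high TWI values, which is why the index is often treated as a proxy for locations where water tends to accumulate.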

Data Used for Independent Validation
We validated downscaled soil moisture predictions using independent data from ground-truth soil moisture records from the North American Soil Moisture Database (NASMD). The NASMD integrates data from 33 observation networks, as well as 2 short-term monitoring campaigns that put together over 1800 observation sites across the United States, Canada, and Mexico [29]. We reiterate that data from the NASMD was not used for downscaling satellite-derived soil moisture, and only used for independent validation purposes.
We selected all the available stations for the year 2010 with daily records of soil moisture in the top 5 cm of the soil layer for the two ROIs. The maximum number of available stations within CONUS was 743 (Figure 2a), while a maximum of 39 stations were available for the West region (Figure 2b) and a maximum of 116 were available for the Midwest region (Figure 2c). The number of stations available at the monthly and weekly scales ranged from ~26 to 39 in the West region, and from ~110 to 116 in the Midwest region (Supplementary Material S2). Monthly and weekly means of top 5 cm soil moisture records were calculated for each field station, to generate the reference data to validate monthly and weekly downscaled soil moisture predictions.

Training Matrices
We generated a set of training matrices to obtain model parameters required by KKNN and RF. We selected the coordinates of the centroid of each original pixel (0.25 degrees) from the ESA-CCI product and assigned the soil moisture values to those coordinates. Then, we extracted the values of the 4 predefined terrain parameters at the finer resolution (1 km) that overlapped the ESA-CCI pixel centroids, and we added them to the training matrix. In each matrix, 70% of the available sampling points were randomly selected to form the training dataset used to build the models, and the remaining 30% of sampling points were set aside for further validation of the models' outputs.
Our final training matrices represent 12 monthly and 52 weekly files for each ROI, containing up to 315 records (70% of the maximum number of pixels available for each ROI) that included soil moisture values and 6 predictors (4 terrain parameters, and latitude and longitude values).
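The random 70/30 partition of sampling points can be sketched as follows. This is a minimal example under stated assumptions: the function name, the seed, and the record layout are hypothetical, and each record stands for one ESA-CCI pixel centroid of a complete 30 × 15 ROI grid (in practice, pixels without soil moisture values would be dropped first, so fewer records may be available).

```python
import random

def split_training_matrix(records, train_fraction=0.7, seed=0):
    """Randomly partition records into a training set and a held-out test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical records: one per pixel centroid in a 30-column by 15-row ROI grid.
# A real record would also hold soil moisture, latitude, longitude and the
# four terrain parameters sampled at the centroid.
records = [{"col": c, "row": r} for r in range(15) for c in range(30)]

train, test = split_training_matrix(records)
print(len(train), len(test))
```

With all 450 pixels available, this reproduces the maximum sizes quoted in the text: 315 training records and 135 records held out for validation.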

Prediction Matrices
We generated one matrix for each ROI to predict soil moisture at 1 km spatial resolution. We extracted all available records of the 4 predefined terrain parameters (predictors) at 1 km and added their corresponding coordinates to the prediction matrices. We integrated a total of 273,840 point locations into each of the two final prediction matrices; this number corresponds to the extent of each ROI on the 1 km grid, encompassing 652 km (X-axis) by 420 km (Y-axis; Figure 1).

Downscaling Soil Moisture
We used the modular framework of SOMOSPIE to predict soil moisture on a user-defined temporal (e.g., daily, monthly, annual) and spatial resolution (i.e., spatial granularity) to provide gap-free information within an ROI. The SOMOSPIE framework is composed of three main modules that include (1) preprocessing data from: satellite-derived soil moisture, predictive terrain parameters in the target resolution for downscaling (e.g., 1 km spatial resolution), and ground-truth reference data for independent validation purposes; (2) model construction: definition of optimal parameters for each modeling method (i.e., KKNN, RF); and (3) soil moisture prediction: application of model parameters defined in the previous module to predict soil moisture at the target resolution, as well as cross-validation and independent ground-truth validation (Figure 3).
We implemented our framework with two modeling methods (i.e., Kernel-Weighted K-Nearest Neighbors (KKNN) and Random Forest (RF)) to downscale soil moisture at 1 km over the two ROIs at monthly and weekly scales. We used the cloud-based cluster "Caviness" at the University of Delaware High Performance Computing (HPC) [57]. Caviness is a distributed-memory Linux cluster with 126 compute nodes representing 4536 cores with 24.6 TiB of RAM and 200 TB of storage.

Kernel-Weighted K-Nearest Neighbors (KKNN)
K-nearest neighbors (KNN) in its traditional form is a regression technique that builds many simple models from local data [58], and is based upon decision rules that classify an unsampled point, based on the values of the nearest set of previously classified points or reference values in the sampling space [59]. This method assumes a different level of influence in the prediction space, where the nearest k-points to the target location are the ones with the most relevant influence, while the influence in the construction of the prediction model decreases with distance [45]. To assign distance-related relevance to predict soil moisture, a weighted mean of the k-nearest soil moisture ratios is calculated. This variant is based on the definition of kernel functions (i.e., Triangular, Epanechnikov, Gaussian, Optimal) that serve to find the number of neighbors (k) to be used in the prediction. The number of neighbors and the optimal kernel function are automatically selected through 10-fold cross validation [44,45].
The KKNN code used in the SOMOSPIE framework has been described previously [45] and has been successfully used to downscale satellite-derived soil moisture at different spatial scales [44]. The code is based on the 'kknn' package [60] developed for the R statistical platform [56]. The definitions of optimal parameters found for each monthly and weekly layer in 2010, over the two ROIs, are shown in Supplementary Material S2.
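The kernel-weighted prediction step can be illustrated with a short, self-contained sketch. It is written in Python for illustration only and is not the 'kknn' R code used in SOMOSPIE. The function names and toy data are hypothetical, distances are Euclidean, and neighbor distances are scaled by the distance to the (k+1)-th neighbor, which is an assumption about how the kernel bandwidth is set. The selection of k and of the kernel by 10-fold cross-validation is omitted.

```python
import math

def triangular(u):
    """Triangular kernel: weight decays linearly to zero at u = 1."""
    return max(0.0, 1.0 - abs(u))

def epanechnikov(u):
    """Epanechnikov kernel: quadratic decay to zero at u = 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kknn_predict(train_x, train_y, query, k, kernel=triangular):
    """Kernel-weighted mean of the k nearest training values."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    neighbors = dists[:k]
    # Assumption: bandwidth is the distance to the (k+1)-th nearest neighbor
    scale = dists[k][0] if len(dists) > k else neighbors[-1][0]
    plain_mean = sum(y for _, y in neighbors) / len(neighbors)
    if scale == 0.0:
        return plain_mean
    weights = [kernel(d / scale) for d, _ in neighbors]
    total = sum(weights)
    if total == 0.0:  # all neighbors sit exactly on the bandwidth edge
        return plain_mean
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / total

# Toy example: predictors reduced to two dimensions, soil moisture as target
train_x = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
train_y = [0.10, 0.20, 0.30, 0.40]
prediction = kknn_predict(train_x, train_y, (0.5, 0.0), k=2)
print(prediction)
```

In practice the six predictors have very different units (degrees, meters, index values), so they would normally be standardized before computing distances; that step is left out here for brevity.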

Random Forest (RF)
Random Forest (RF) in the SOMOSPIE framework has been described previously [45] and is based on the 'quantregForest' package [61] developed for the R statistical platform [56]. It is based on an ensemble of decision trees through a "bootstrap aggregation" process (bagging), which is a method to generate multiple versions of a predictor and then uses these versions to generate an aggregated predictor that depends on the values of a random vector independently sampled and weighed [62,63]. To predict values at an unsampled location, all decision trees in the ensemble are queried and their prediction outputs are combined through a weighted arithmetic mean. Techniques such as RF do not assume any particular geometric or functional form of the model and are suitable for sampling spaces with sparse data [45].
The definition of optimal parameters for soil moisture prediction with RF in SOMOSPIE considers two main values: (1) the number of trees to grow in the ensemble of regression trees and (2) the number of covariates randomly selected at each level of tree growth. The maximum number of trees allowed was 500, while the number of covariates changes in relation to the number of predictors defined as input (6 predictors for this study: latitude, longitude, elevation, aspect, slope, and topographic wetness index). The automatic variable selection is performed by 'quantregForest' through a cross-validation process. The optimal parameters selected for each monthly and weekly layer of 2010 over the two ROIs are reported in Supplementary Material S2.
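The two RF parameters discussed above (number of trees and number of covariates tried at each split) can be seen in a deliberately small bagging sketch. It is a plain-Python illustration, not the 'quantregForest' implementation: all function names, default values, and the synthetic data are hypothetical, trees are shallow, and tree outputs are combined with equal weights as a simplifying assumption.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def grow_tree(rows, targets, mtry, rng, depth=0, max_depth=6, min_leaf=5):
    """Grow one regression tree; a leaf is a float, a split is a tuple."""
    leaf = sum(targets) / len(targets)
    if depth >= max_depth or len(rows) < 2 * min_leaf:
        return leaf
    best = None
    # Only `mtry` randomly chosen covariates are considered at this node
    for f in rng.sample(range(len(rows[0])), mtry):
        for threshold in sorted({r[f] for r in rows})[1:]:
            left = [t for r, t in zip(rows, targets) if r[f] < threshold]
            right = [t for r, t in zip(rows, targets) if r[f] >= threshold]
            if min(len(left), len(right)) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    if best is None:
        return leaf
    _, f, threshold = best
    lo = [(r, t) for r, t in zip(rows, targets) if r[f] < threshold]
    hi = [(r, t) for r, t in zip(rows, targets) if r[f] >= threshold]
    def grow(part):
        return grow_tree([r for r, _ in part], [t for _, t in part],
                         mtry, rng, depth + 1, max_depth, min_leaf)
    return (f, threshold, grow(lo), grow(hi))

def predict_tree(node, x):
    """Descend from the root until a leaf (float) is reached."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if x[f] < threshold else right
    return node

def grow_forest(rows, targets, n_trees=25, mtry=1, seed=0):
    """Bagging: each tree is trained on a bootstrap sample of the data."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = rng.choices(range(n), k=n)  # sample with replacement
        forest.append(grow_tree([rows[i] for i in idx],
                                [targets[i] for i in idx], mtry, rng))
    return forest

def predict_forest(forest, x):
    """Equal-weight mean of all tree predictions (simplifying assumption)."""
    return sum(predict_tree(tree, x) for tree in forest) / len(forest)

# Synthetic data: soil moisture depends on the first of two covariates
data_rng = random.Random(1)
rows = [(data_rng.random(), data_rng.random()) for _ in range(200)]
targets = [0.30 if x0 > 0.5 else 0.10 for x0, _ in rows]
forest = grow_forest(rows, targets)
wet = predict_forest(forest, (0.9, 0.5))
dry = predict_forest(forest, (0.1, 0.5))
print(round(wet, 3), round(dry, 3))
```

Here `n_trees` plays the role of the number of trees in the ensemble (capped at 500 in the study) and `mtry` the number of covariates sampled at each split (between two and six in the study).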

Validation
To test the two modeling methods (i.e., KKNN and RF), we first used cross-validation with reference satellite-derived soil moisture data not used in the construction of the models, and then we used independent ground-truth soil moisture from the NASMD. We reiterate that the NASMD data was not used to parameterize any model and was only used for independent validation. Predicted soil moisture values were extracted from the 12 monthly and 52 weekly layers over the two ROIs, taking overlapping locations with the centroids of the ESA-CCI soil moisture reference data, and the point-locations of the NASMD available stations for each month and week, respectively.

Cross-Validation with Reference Satellite-Derived Soil Moisture Data
We calculated the correlation and root mean square error (RMSE) values based on matrices containing the predicted and reference values (from ESA-CCI data). The input data for this validation approach correspond to the 30% of the sampling points set aside during the generation of the training matrices, which were not used in the definition of the models' parameters. The cross-validation data matrices contained up to 135 records, depending on the number of available reference points from the ESA-CCI mean values for each month and week.
The values of each predicted soil moisture pixel at a finer spatial resolution (i.e., 1 km) were compared with the reference satellite-derived soil moisture values at their original spatial resolution. The results from these analyses for each month and week over the two ROIs are reported in Supplementary Material S3.
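The two agreement metrics used throughout the validation can be written compactly. The sketch below assumes the Pearson coefficient for "correlation" (as stated later for the Taylor diagrams) and uses hypothetical variable names and toy values; it is not the evaluation code of the framework.

```python
import math

def pearson(predicted, reference):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    return cov / math.sqrt(var_p * var_r)

def rmse(predicted, reference):
    """Root mean square error between predictions and reference values."""
    n = len(predicted)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n)

# Toy held-out points: predictions are uniformly 0.02 below the reference
predicted = [0.10, 0.20, 0.30, 0.40]
reference = [0.12, 0.22, 0.32, 0.42]
print(pearson(predicted, reference), rmse(predicted, reference))
```

The toy case shows why both metrics are reported: a constant offset leaves the correlation at 1 while the RMSE is 0.02, so correlation alone would hide a systematic bias.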

Independent Validation with Ground-Truth Data
For these independent analyses, we calculated the overall correlation and RMSE between the predicted downscaled values from each method with the point-based ground-truth data from the NASMD. The results of correlation and RMSE between fine spatial resolution predicted soil moisture values and the point-based ground-truth data for each month and week over the two ROIs are reported in Supplementary Material S3.

Spatial Distribution of Prediction Outputs and Errors
To evaluate the performance of the two methods, we compared the mean values of all monthly and weekly predictions (12 monthly and 52 weekly outputs) in the two ROIs. We generated maps showing the mean values of ESA-CCI values at 0.25 degrees of spatial resolution and the mean values of our 1 km predictions over the set of 30% sampling points set aside for testing in each monthly and weekly scale. Thus, none of the points used in this approach to describe the spatial distribution of error were used to define the models' parameters. We calculated the absolute difference between the mean of predicted soil moisture and the mean of ESA-CCI values at all our monthly and weekly scales over all the centroid coordinates of the ESA-CCI pixels. In a similar approach for all monthly and weekly scales, we calculated the absolute difference between the mean predicted soil moisture at 1 km and the mean values of the point-scale ground-truth records at the coordinates of all available NASMD stations during our time frame. Thus, we aim to observe the similarities in the spatial distribution between ESA-CCI data and the outputs of the two methods tested, as well as the distribution of the prediction errors.
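The per-location absolute difference described above can be sketched as follows. The data layout (a dictionary from location identifier to a list of values across layers), the function names, and the toy numbers are assumptions made for illustration, and missing values are represented as None and skipped independently in each series.

```python
def mean_of(values):
    """Mean of the non-missing values, or None if nothing is available."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None

def absolute_differences(predicted, reference):
    """|mean(predicted) - mean(reference)| per location across all layers."""
    diffs = {}
    for location, ref_values in reference.items():
        ref_mean = mean_of(ref_values)
        pred_mean = mean_of(predicted.get(location, []))
        if ref_mean is not None and pred_mean is not None:
            diffs[location] = abs(pred_mean - ref_mean)
    return diffs

# Toy example with two locations and three temporal layers each
reference = {"A": [0.20, 0.22, None], "B": [0.10, 0.12, 0.14]}
predicted = {"A": [0.24, 0.26, 0.25], "B": [0.11, 0.12, 0.13]}
diffs = absolute_differences(predicted, reference)
print(diffs)
```

The same function applies to both comparisons in this section: locations can be ESA-CCI pixel centroids (reference = satellite values) or NASMD stations (reference = ground-truth records).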

Results
In this section, we present our 1 km soil moisture prediction results and evaluate the performance of the two methods used. We compared the predicted soil moisture values with the reference ESA-CCI values, and with independent values from the NASMD. The final soil moisture predictions at monthly and weekly scales over the two ROIs are available at the Consortium of Universities for the Advancement of Hydrologic Science data repository (HydroShare; doi:10.4211/hs.96eeb0d796a64b578f24e8154c166988) [64].

Optimal Model Parameters for Each Method
In the case of KKNN, we found that the automatic generation of model parameters defined a number of K-neighbors between 6 and 29 in the Midwest ROI for all models at monthly and weekly scales. Correlation ranged from 0.489 to 0.894, and RMSE from 0.03 to 0.046. In the West ROI, the number of K-neighbors ranged from 3 to 49, with correlation from 0.244 to 0.785, and RMSE from 0.025 to 0.055.
In the generation of RF models, we found that the number of covariates used as predictors in every model in the Midwest ROI ranged from two to six (out of six possible predefined predictors for this study). Correlation ranged from 0.537 to 0.919, and RMSE from 0.028 to 0.043. In the West ROI, the number of covariates ranged from two to six. Correlation ranged from 0.413 to 0.833, and RMSE from 0.023 to 0.047.
All individual KKNN and RF models' parameters are included in Supplementary Material S2.

Evaluation of Models' Outputs
To evaluate the performance of each method tested, we present a series of Taylor Diagrams [65] that show the similarity of our predictions with both data from the ESA-CCI soil moisture values and independent ground-truth records from the NASMD. Taylor diagrams quantify the correspondence between reference observed data and predicted values by means of Pearson correlation coefficient, RMSE and the standard deviation.
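The three statistics can share a single diagram because they are linked by a law-of-cosines relation. It is stated here for reference with symbols introduced only for this illustration; note that the RMSE plotted on a Taylor diagram is the centered RMSE, that is, computed after removing the mean of each series.

```latex
E'^{2} = \sigma_p^{2} + \sigma_r^{2} - 2\,\sigma_p\,\sigma_r\,R
% E'        : centered root mean square difference between predicted and reference values
% \sigma_p  : standard deviation of the predicted values
% \sigma_r  : standard deviation of the reference values
% R         : Pearson correlation coefficient between predicted and reference values
\qquad
\mathrm{RMSE}^{2} = \left(\bar{p} - \bar{r}\right)^{2} + E'^{2}
% \bar{p}, \bar{r} : means of the predicted and reference values
```

The second identity shows how the overall RMSE separates into a mean bias term and the centered term, so a method can sit close to the reference point on the diagram and still carry a constant offset.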

Evaluation with Reference Satellite-Derived Soil Moisture Values
We found that RF was consistently the best method in predicting monthly soil moisture when compared against the reference ESA-CCI values (Figure 4). RF correlation and RMSE values ranged from 0.566 to 0.856, and from 0.027 to 0.037, respectively, in the Midwest ROI. In the West ROI, RF correlation and RMSE values ranged from 0.443 to 0.78, and from 0.023 to 0.056, respectively. Regardless of the ROI, values predicted with RF showed the highest correlation and the lowest RMSE in every month, except in January in the West ROI.
Predictions with KKNN showed consistently lower performance than RF, with monthly correlation and RMSE values ranging from 0.508 to 0.844 and from 0.028 to 0.037, respectively, in the Midwest ROI. KKNN correlation and RMSE values in the West ROI ranged from 0.405 to 0.712 and from 0.023 to 0.054, respectively.
Similar to monthly predictions, we report the weekly performance of the two methods tested, grouping 52 weeks into four 3-month periods (Figure 5). Like monthly predictions, RF consistently showed better performance in all 3-month periods and in both ROIs. Correlation and RMSE values with RF ranged from 0.764 to 0.846, and 0.031 to 0.033, respectively, in the Midwest ROI, and from 0.634 to 0.785, and 0.026 to 0.041 in the West ROI. In contrast, correlation and RMSE values with KKNN in the Midwest region ranged from 0.726 to 0.823, and 0.033 to 0.036, while in the West ROI, these values ranged from 0.555 to 0.746, and 0.028 to 0.043, respectively.
All correlation and RMSE values shown in Figures 4 and 5 are included in Supplementary Material S3.

Evaluation with Independent Ground-Truth Information
In Figure 6, we show the results of independent validation of monthly soil moisture predictions with ground-truth information from the NASMD. In the Midwest ROI, a similar correspondence between our predicted values and the reference data in all months was clear, except in August, where the ESA-CCI reference better corresponded with ground-truth records. Although the correlation and RMSE values for our two methods are consistently clustered in Figure 6a, RF showed a better correspondence with ground-truth data, and it was closer to the correlation and RMSE values of the reference satellite-derived values. A similar prediction performance was obtained for the West ROI (Figure 6b), where RF had consistently better agreement with the ground-truth reference data. However, the general agreement between ground-truth data, the reference satellite-derived data and the models' outputs was evidently lower in the West ROI.
The reference satellite-derived data monthly correlation and RMSE values with the ground-truth data ranged from 0.331 to 0.
In the ground-truth validation of the weekly predictions (Figure 7), we found that the two methods showed similar correlation and RMSE values with ground-truth data as the reference ESA-CCI in the Midwest ROI. Although there was not a clear pattern of better performance for either of the two methods tested, RF showed slightly better performance for the four 3-month periods in the Midwest ROI. In the West ROI, there was a consistent decrease in the correspondence between ground-truth data, our predictions, and the ESA-CCI values, although RF still showed a better performance in three of the four 3-month periods.

Spatial Distribution of Prediction Errors
As we display in Figure 8c,d for the Midwest ROI, the spatial patterns of soil moisture values exhibited a similar behavior as the reference ESA-CCI values (Figure 8b). Similar to the ESA-CCI, the lowest soil moisture values were distributed over the west part of the ROI, and highest values over the east section. Low values were also consistent in the south-central portion, and high values in the central-north. The absolute differences between the 30% of sampling points set aside for testing in all layers derived from ESA-CCI values at 0.25 degrees and their spatially correspondent predicted soil moisture values in all layers at 1 km using the two methods tested are shown in Figure 8e,f. Difference values were distributed between 0 and 0.03 for both methods, with highest values in the western portion of the ROI. KKNN was the method with the lowest difference values over most of the ROI. In Figure 8g,h, we present the absolute differences between predicted soil moisture and ground-truth data. Difference values were consistently higher for the two methods in the Midwest ROI. Unlike the comparison between predicted soil moisture and reference ESA-CCI data, the performance of the two methods was similar when compared to ground-truth information. The lowest differences ranged between 0 and 0.04 m³ m⁻³, and the highest values were up to 0.14 m³ m⁻³. Although there was not a clear spatial distribution of the absolute differences, the distribution of low and high values was similar across the two methods.
Figure 9 shows the spatial distribution of soil moisture predicted values and absolute differences with ESA-CCI values, and ground-truth data in the West ROI. Similar to ESA-CCI soil moisture, the lowest predicted values were distributed from the south-center to the north-west of the ROI (Figure 9c,d). However, low soil moisture values described a pattern not as dry as in the ESA-CCI data (between 0.05 and 0.1 m³ m⁻³). The highest predicted values with both methods were consistently located in two south-east to north-west lines, along the highest elevations of the Rocky Mountains and the central valley of California, ranging from 0.18 to 0.28 m³ m⁻³. Absolute differences between the 30% of test sampling points from ESA-CCI values at 0.25 degrees and their spatially correspondent prediction output values in all layers at 1 km in the West ROI can be observed in Figure 9e,f. Overall, the differences were consistently higher in the West ROI than in the Midwest ROI. The lowest difference values in the West ROI ranged between 0 and 0.045 m³ m⁻³, and highest values reached an absolute difference of 0.13 m³ m⁻³. Unlike the absolute differences shown in the Midwest ROI, in the West ROI, there was not a clear pattern in the spatial distribution of errors between ESA-CCI and predicted values with our two methods. Absolute differences between predicted soil moisture and ground-truth data were consistently higher, regardless of the method used (Figure 9g,h). The distribution of the absolute differences across the locations with ground-truth data was similar for the two methods, although RF generally showed lower differences than KKNN. In contrast to the Midwest ROI, the absolute differences between predicted soil moisture and ground-truth information were significantly higher, ranging from 0.015 up to 0.21 m³ m⁻³.
Figure caption (partial): (e,f) spatial distribution of mean absolute differences between ESA-CCI sampling points at 0.25 degrees and their spatially correspondent predicted soil moisture values in all layers at 1 km with KKNN and RF; (g,h) spatial distribution of mean absolute differences between all monthly and weekly soil moisture values from NASMD and predicted values at 1 km using the two methods tested.

Discussion
Our work shows the performance of two methods within the SOMOSPIE framework for downscaling satellite-derived soil moisture values.We used two ROIs with different topographic and climatic characteristics to compare the performance of the framework.Given the limitations in obtaining field-based measurements of soil moisture over large areas, flexible and adaptable frameworks are alternatives to obtain spatially and temporally detailed information.The SOMOSPIE framework offers an alternative approach to downscale satellite-derived soil moisture and to traditional predictions based on simple extrapolation and interpolation using information from monitoring networks [14,66,67].
Our framework demonstrates that it is possible to obtain soil moisture across different spatial and temporal scales, in relation to the resolution of the predictors and the temporal availability of the input satellite data. In our work, we used 1 km terrain parameters as predictors, but this framework could be extended to use topographic information at different spatial resolutions as input for further predictions. It is known that topography has different levels of influence on the spatial distribution of soil moisture [39], as previous studies have explored the impact of terrain characteristics at watershed and regional scales [40,42,44,45,68], and here, we showed that terrain parameters are suitable predictors at the regional scale. Although other environmental covariates, such as soil texture, surface temperature, and vegetation characteristics, are known to be correlated with the spatial and temporal distribution of soil moisture [3,39,40,69–72], these covariates did not offer significant advantages in our approach. First, soil texture is highly dependent on site-specific conditions [69], which does not match the regional scale of our approach, while surface temperature and vegetation features might introduce bias that would hinder the effect of using solely terrain parameters as downscaling predictors [44].
We identified that latitude and longitude values, along with Aspect, Elevation, and Topographic Wetness Index, were the most suitable parameters to predict soil moisture at 1 km when using the two proposed methods. This aligns with previous studies that identified similar terrain parameters as relevant factors to derive soil moisture based on their relation with lateral distribution of water in the surface soil layer [40,43,73–76]. In general, we obtained better results with both algorithms in the Midwest ROI, where topographic characteristics are more homogeneous than in the West ROI, which has more complex terrain. Additionally, we saw similar patterns of soil moisture spatial distribution across coarse and fine scales, supporting previous work in downscaling satellite-derived soil moisture that found that spatial variability agrees with landscape heterogeneity [77]. We highlight that there is increasing evidence on how terrain parameters are useful for modeling soil moisture [39,74], but other environmental factors, such as precipitation, temperature, land cover, and soil properties [69,70,78], should be considered across different scenarios.
The SOMOSPIE framework takes advantage of daily values from the ESA-CCI soil moisture product, being able to predict soil moisture at different temporal scales (e.g., monthly, weekly). The comparison of predicted soil moisture across different periods helps to identify any temporal biases or patterns related to different environmental conditions throughout the year and identify emerging relationships with environmental factors at different points during wet-up and dry-down cycles [79,80]. In autumn and spring, topography becomes a more relevant indicator, whereas its importance decreases during summer and winter due to the influence of evapotranspiration, as well as extensive saturation and porosity control, respectively [74]. This might explain the lower prediction performance observed during January and February in the West ROI, where topography plays a more important role in the spatial variability. Additionally, several studies have shown that more homogeneous patterns of satellite-derived soil moisture occur under dry conditions, leading to an improved accuracy in satellite retrievals [81,82]. In this regard, the higher prediction accuracy we observed in the Midwest ROI might be linked to a lower retrieval error from ESA-CCI. This contrasts with the prediction accuracy in the West ROI, which might be impacted by a higher retrieval error of ESA-CCI, linked to more heterogeneous environmental conditions.
In general, we found that RF performed better at the monthly and weekly scales across both ROIs. This could be because this technique does not assume any particular geometric or functional form of the model. Furthermore, it is suitable in sampling spaces with sparse data [45], such as satellite-derived soil moisture at a coarse resolution, where the distance between pixels' centroids yields substantial separation between data points. In contrast, although KKNN showed a lower prediction performance than RF, this technique still offers advantages for soil moisture downscaling in other regions with high density of sample points based on its ability to build many simple models when more data are available [59].
We observed that the two methods tested showed a similar correspondence to ground-truth information as the original ESA-CCI values in most of the monthly and weekly periods in our experiments. However, KKNN predictions showed a slightly better correspondence with ground-truth information in comparison with RF (values reporting the absolute correlation and RMSE differences between ground-truth information and ESA-CCI, as well as ground-truth and KKNN and RF outputs, are presented in Supplementary Materials S3). Differences in correlation and RMSE values between the two ROIs might be related to the sparse and uneven spatial distribution of available ground-truth stations in the West region (Figure 2). Previous studies found that the optimal number of ground-truth points for validating satellite-derived soil moisture products ranges from 10 to 20 per pixel [75], which is far from the desirable distribution of field stations available in the West ROI.
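To make the idea of a kernel-weighted nearest-neighbor prediction concrete, the following is a minimal sketch of the general technique, not the KKNN implementation used in SOMOSPIE. It assumes a triangular kernel, Euclidean distance, and scaling by the distance to the (k+1)-th neighbor; the function name `kknn_predict` and the two-dimensional predictor tuples are hypothetical and chosen only for illustration.

```python
import math

def kknn_predict(train_X, train_y, query, k=3):
    """Kernel-weighted k-nearest-neighbor regression (illustrative sketch).

    train_X: list of predictor tuples (assumed already standardized).
    train_y: list of soil moisture values at the training points.
    query:   predictor tuple at the location to predict.
    """
    # Sort training points by Euclidean distance to the query location.
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    neighbors = dists[:k]

    # Scale distances by the (k+1)-th neighbor so kernel inputs fall in [0, 1].
    scale = dists[k][0] if len(dists) > k else neighbors[-1][0]
    plain_mean = sum(y for _, y in neighbors) / len(neighbors)
    if scale == 0:
        return plain_mean

    # Triangular kernel: closer neighbors receive larger weights.
    weights = [max(1.0 - d / scale, 0.0) for d, _ in neighbors]
    total = sum(weights)
    if total == 0:
        return plain_mean
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / total

# Hypothetical coarse-pixel training data: (predictor_1, predictor_2) -> soil moisture.
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = [0.1, 0.2, 0.3, 0.9]
prediction = kknn_predict(train_X, train_y, (0.5, 0.0), k=2)
```

Because each prediction depends only on nearby training points, this kind of method benefits from dense sampling, which is consistent with the point made above about regions with a high density of sample points.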
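The two validation statistics discussed above, correlation and RMSE, can be sketched as follows. This is a minimal illustration assuming the Pearson correlation coefficient and paired station/prediction values; the function names and the sample numbers are hypothetical and are not taken from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def rmse(x, y):
    """Root mean square error between paired observations and predictions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Hypothetical paired values: station measurements vs. 1 km predictions.
station = [0.21, 0.25, 0.30, 0.18, 0.27]
predicted = [0.23, 0.24, 0.33, 0.20, 0.25]

r = pearson_r(station, predicted)
error = rmse(station, predicted)
```

With only a handful of stations, both statistics become sensitive to individual points, which illustrates why a sparse station network can weaken a ground-truth validation.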
Although our work aimed at identifying the effect of terrain parameters in downscaling satellite-derived soil moisture information, other parameters, such as surface temperature, vegetation indexes, surface albedo, land cover, and rainfall, have been widely considered in previous research [3,39,40,71,72,75,83,84] and represent an opportunity to evaluate the flexibility of the SOMOSPIE framework.

Conclusions
Based on our analysis, we conclude that there is no "best" method that can be defined for every place in the world, as different methods perform differently in each ROI. As has been acknowledged in previous research, different downscaling methods have their own applicability under certain purposes, closely linked to differences in surface and climate conditions, and every method must be calibrated before its implementation elsewhere [39]. Thus, we believe that SOMOSPIE is a flexible framework that should include the methods tested in our work but is able to expand to incorporate additional methods to be tested in other regions around the world.
Despite the advantages of modeling techniques, such as KKNN and RF, in predicting soil moisture at a fine spatial resolution, it is also important to consider the computational resources needed when selecting these methods. When the ROI does not represent a large number of locations where soil moisture will be predicted, the two methods can be applied with no major challenges, but when the sampling space surpasses hundreds of thousands of locations, the selection of the modeling method and the use of computational resources become more important. The understanding of suitable cyberinfrastructure to work with more extensive regions and soil moisture predictions at finer spatial scales (e.g., 100 m, 30 m), along with the implementation of additional modeling methods in SOMOSPIE, is still being addressed through current efforts.
Our research contributes an alternative approach for downscaling satellite-derived soil moisture using a modular spatial inference framework. Here, we tested two methods, but the framework is flexible so multiple algorithms can be included [58,85]. Additional efforts to improve the SOMOSPIE framework include developing a containerized environment that will facilitate the deployment and management of the entire workflow in High-Performance Computing (HPC) or cloud environments [86].

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/rs14133137/s1, Supplementary Materials S1: Selection of most relevant terrain parameters used as predictors to estimate soil moisture at 1 km spatial resolution over the conterminous United States. Refs. [44,45,52,54–60,87] are cited in the Supplementary Materials S1. Supplementary Materials S2: Number of North American Soil Moisture Database available stations in 2010 over the two regions of interest. Supplementary Materials S3: Cross-validation and ground-truth validation tables of monthly and weekly soil moisture predictions.
Author Contributions: R.M.L., L.V., M.T. and R.V. conceived and designed the research. R.M.L. and L.V. performed the experiments and analysis. R.M.L. wrote the first draft of the manuscript with input from L.V., P.O., M.T., and R.V. R.M.L. wrote the code for data and analysis visualization and made the cartographic edition of map figures. P.O. contributed to the optimization of analyses performance in cloud-based computing environments. All authors contributed to interpretation of the results, reviewed, and approved the manuscript. R.V. and M.T. supervised and coordinated the research team and managed funding acquisition. All authors have read and agreed to the published version of the manuscript.
Funding: This study was funded by a University of Delaware Strategic Initiative research grant and the NSF (OAC grants #2103854 and #2103836 "Software Ecosystem for kNowledge diScOveRY-a data-driven framework for soil moisture applications").

Figure 2. (a) North American Soil Moisture Database (NASMD) stations over the two ROIs available in 2010; (b) West ROI; and (c) Midwest ROI.

Figure 4. Taylor diagrams showing cross-validation between monthly 1 km predicted soil moisture and ESA-CCI reference data; (a) monthly cross-validation of the Midwest ROI; (b) monthly cross-validation of the West ROI.

Figure 5. Taylor diagrams showing cross-validation between weekly 1 km predicted soil moisture and ESA-CCI reference data; the 52 weekly predictions are grouped in four 3-month periods; (a) weekly cross-validation of the Midwest ROI; (b) weekly cross-validation of the West ROI.
637 and 0.054 to 0.07 in the Midwest ROI, and from −0.953 to 0.272, and 0.078 to 0.167 in the West ROI, respectively. Monthly RF correlation and RMSE values in the Midwest ROI ranged from 0.216 to 0.55, and 0.052 to 0.073, while in the West ROI, these values ranged from −0.194 to 0.279, and 0.079 to 0.137, respectively. KKNN consistently showed the lowest correspondence with ground-truth data, except in October in the West ROI. KKNN correlation and RMSE values ranged from 0.3 to 0.603, and 0.051 to 0.069 in the Midwest ROI, and from −0.173 to 0.259, and 0.077 to 0.147 in the West ROI.

Figure 6. Taylor diagrams showing validation between monthly 1 km predicted soil moisture and ESA-CCI values, and ground-truth data from the NASMD; (a) monthly ground-truth validation of the Midwest ROI; (b) monthly ground-truth validation of the West ROI.
For weekly validation, ESA-CCI reference values exhibited the best correspondence with ground-truth data, with correlation and RMSE values ranging from 0.46 to 0.53, and 0.064 to 0.07 in the Midwest ROI, and from −0.195 to 0.166, and 0.097 to 0.132 in the West ROI. RF correlation and RMSE values ranged from 0.445 to 0.46, and 0.062 to 0.071 in the Midwest ROI, and from −0.041 to 0.158, and 0.091 to 0.126 in the West ROI. KKNN correlation and RMSE values ranged from 0.464 to 0.494, and 0.06 to 0.069 in the Midwest ROI, and from −0.077 to 0.154, and 0.09 to 0.126 in the West ROI. All correlation and RMSE values shown in Figures 6 and 7 are included in Supplementary Materials S3.

Figure 7. Taylor diagrams showing validation between weekly 1 km predicted soil moisture and ESA-CCI values, and ground-truth data from the NASMD; the 52 weekly layers are grouped in four 3-month periods; (a) weekly ground-truth validation of the Midwest ROI; (b) weekly ground-truth validation of the West ROI (correlation and RMSE values in the week 1 to 13 period were consistently negative and values are described in Section 3.2.2).

Figure 8. (a) Midwest ROI and distribution of NASMD stations throughout 2010; (b) mean soil moisture values of 12 monthly and 52 weekly layers based on the reference ESA-CCI values at 0.25 degrees of spatial resolution; (c,d) mean values of 1 km soil moisture predictions with KKNN and RF; (e,f) spatial distribution of mean absolute differences between ESA-CCI sampling points at 0.25 degrees and their spatially correspondent predicted soil moisture values in all layers at 1 km with KKNN and RF; (g,h) spatial distribution of mean absolute differences between all monthly and weekly soil moisture values from NASMD and predicted values at 1 km using the two methods tested.

Figure 9. (a) West ROI and distribution of NASMD stations throughout 2010; (b) mean soil moisture values of 12 monthly and 52 weekly layers based on the reference ESA-CCI values at 0.25 degrees of spatial resolution; (c,d) mean values of 1 km soil moisture predictions with KKNN and RF; (e,f) spatial distribution of mean absolute differences between ESA-CCI sampling points at 0.25 degrees and their spatially correspondent predicted soil moisture values in all layers at 1 km with KKNN and RF; (g,h) spatial distribution of mean absolute differences between all monthly and weekly soil moisture values from NASMD and predicted values at 1 km using the two methods tested.