Article

Machine Learning Downscaling of SoilMERGE in the United States Southern Great Plains

1
Center for Earth and Environmental Studies, Texas A&M International University, Laredo, TX 78041, USA
2
School of Engineering, Texas A&M International University, Laredo, TX 78041, USA
*
Author to whom correspondence should be addressed.
Remote Sens. 2023, 15(21), 5120; https://doi.org/10.3390/rs15215120
Submission received: 6 October 2023 / Revised: 18 October 2023 / Accepted: 21 October 2023 / Published: 26 October 2023
(This article belongs to the Section Remote Sensing in Geology, Geomorphology and Hydrology)

Abstract

SoilMERGE (SMERGE) is a root-zone soil moisture (RZSM) product that covers the entire continental United States and spans 1978 to 2019. The coarse resolution of SMERGE (0.125 degree) limits this product’s utility. Three machine learning techniques, Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Gradient Boost (GBoost), were used to downscale SMERGE to spatial resolutions straddling the field-scale domain (100 to 3000 m). The study area was northern Oklahoma and southern Kansas. To validate the downscaled results, in situ data from four sources were used: the United States Department of Energy Atmospheric Radiation Measurement (ARM) observatory, the United States Climate Reference Network (USCRN), the Soil Climate Analysis Network (SCAN), and the Soil moisture Sensing Controller and oPtimal Estimator (SoilSCAPE). In addition, RZSM retrievals from NASA’s Airborne Microwave Observatory of Subcanopy and Surface (AirMOSS) campaign provided a nearly spatially continuous comparison. Three periods were examined: era 1 (2016 to 2019), era 2 (2012 to 2015), and era 3 (2003 to 2007). During eras 1 and 2, RF outperformed XGBoost and GBoost, whereas during era 3 no model dominated. Performance was better during eras 1 and 2 than during the pre-L-band era 3. Across all eras, regions, and models, the improvements realized from downscaling ranged from 0.03 to 0.42 for the increase in correlation and from −0.0005 to −0.0118 m3/m3 for the change in ubRMSE. This study demonstrates the feasibility of SMERGE downscaling, opening the prospect of developing a long-term RZSM dataset at a more desirable field-scale resolution with the potential to support diverse hydrometeorological and agricultural applications.

1. Introduction

Satellite-derived soil moisture products, such as Soil Moisture Active Passive (SMAP) and Soil Moisture and Ocean Salinity (SMOS), have revolutionized hydrological and agricultural studies by providing estimates of surface soil moisture (SM) worldwide [1,2]. However, their use is hindered by their coarse spatial resolution (0.25–0.50 degrees) and their lack of ability to penetrate below the surface skin or the top 5 cm layer. While products like the European Space Agency Climate Change Initiative (ESA-CCI) have blended satellite retrievals to enhance observation frequency [3], there remains a pressing need for higher spatial resolution data to support hydrological and ecological applications [4] and accurate drought monitoring [5]. To bridge this gap, advanced machine learning (ML) techniques have emerged as promising tools for downscaling satellite-based soil moisture data. ML offers advantages in handling large and noisy datasets from dynamic and non-linear systems [6,7,8,9,10]. In recent years, studies have successfully employed ML algorithms such as Random Forest (RF), gradient boost decision tree, and eXtreme Gradient Boosting (XGBoost) to enhance spatial resolution and accuracy of surface soil moisture estimates [11,12,13,14,15,16]. Notable improvements have included increased correlations and decreased unbiased root mean square error [11,14,15,16]. While surface soil moisture retrievals are valuable, the scientific community increasingly recognizes the importance of deeper root-zone soil moisture (RZSM) data, as it directly influences agricultural productivity and groundwater interactions. However, directly sensing RZSM remains a challenge. Efforts like NASA’s SigNals of Opportunity: P-band Investigation (SNoOPI) plan to use the penetrating P-band to retrieve RZSM, but at present, RZSM is often inferred from surface measurements. 
Two common approaches used to estimate RZSM include the Ensemble Kalman Filter [17] and Exponential Filter [18,19], which, while useful, are not as robust as having direct retrievals of soil moisture.
In this context, the SoilMERGE (SMERGE) product stands as a notable endeavor, covering the continental United States and spanning multiple decades (1979 to 2019) [20]. Like other products, SMERGE faces spatial resolution constraints (0.125 degrees) and is available at a daily time step. SMERGE provides an overall estimate of RZSM between 0 and 40 cm and is based on the fusion of NLDAS Noah-2 land surface model output with surface satellite retrievals from the European Space Agency (ESA) Climate Change Initiative (CCI). In this study, we address the following critical questions related to the downscaling of SMERGE: (1) Which ML technique is optimal, and (2) which downscaling resolution provides the most robust results? To answer these questions, we explore three ML approaches and compare downscaled SMERGE with diverse datasets, including in situ measurements and airborne radar estimates of RZSM from the Marena Oklahoma Soil Moisture Active Passive In Situ Testbed (MOISST) site associated with NASA’s Airborne Microwave Observatory of Subcanopy and Surface (AirMOSS) campaign [21]. The analysis spans three distinct eras (2016 to 2019, 2012 to 2015, and 2003 to 2007) and aims to achieve finer spatial resolution for SMERGE, thereby enabling more accurate RZSM estimation, which can be used to support diverse applications.

2. Study Areas

Figure 1 provides an overview of this study’s focus areas within north-central Oklahoma and south-central Kansas. This is a location where SMERGE exhibited robust performance [20], making it an ideal candidate to explore downscaling. Rectangular areas in Figure 1 reflect zones where SMERGE was downscaled. During era 1 (2016 to 2019), two regions (in red) associated with the United States Department of Energy Atmospheric Radiation Measurement (ARM) observatory were examined (ARM_1_1, ARM_1_2). The naming convention specifies Network_Era_Region, where ARM_1_1 represents ARM era 1, region 1. Data from MOISST and Soil moisture Sensing Controller and oPtimal Estimator (SoilSCAPE) comprise era 2 observations (2012 to 2015), which are indicated in blue. Excessive missing data precluded the use of ARM during era 2. During era 3 (2003 to 2007; in black), ARM was divided into four distinct zones (ARM_3_1, ARM_3_2, ARM_3_3, and ARM_3_4). Table 1 indicates the ARM stations utilized in each region. The locations of the United States Climate Reference Network (USCRN; Stillwater 2W, Stillwater 5WNW) and Soil Climate Analysis Network (SCAN; Abrams) sensors are also indicated in Figure 1.
Table 2 provides an overview of the physical characteristics of these focus areas. Clay, silt, and sand values exhibit great variability in all areas. On average, all areas have a similar soil texture with near-equal clay, silt, and sand values representing an overall loamy texture. Elevations generally define a moderate relief within the regions. ARM_1_1, ARM_1_2, ARM_3_3, and MOISST have average elevations of less than 400 m, whereas ARM_3_1, ARM_3_2, ARM_3_4, and SoilSCAPE have higher elevations. In terms of land use and land cover (LULC), herbaceous and cultivated crops dominate all examined areas except for ARM_3_3 (Table 2), which, instead of cultivated crops, has significant hay/pasture and deciduous forest.
Table 3 describes the meteorological conditions within the study areas. Values are generally consistent. Lower spring and fall temperatures were recorded during era 2, as represented by MOISST and SoilSCAPE, when compared against eras 1 and 3. A pronounced drying trend from east to west is present in the study area. This is reflected by the higher warm season (April to October) precipitation values present in ARM_3_3, which is further east than the other ARM era 3 regions. Additionally, during era 2, MOISST had higher precipitation values than SoilSCAPE, which is located to the west.

3. Methodology

This study’s methodology was implemented in five steps: (1) gathering data; (2) selecting dates to use for validation; (3) executing machine learning downscaling; (4) evaluating each model run using objective metrics; and (5) comparing performance between different models and spatial resolutions.

3.1. Data Gathering

RZSM data used in this study include SMERGE version 2.0 (Table 4). The downscaling approach described below incorporated independent variables (Table 4) that are both static (soil texture, elevation, aspect, slope) and dynamic (Normalized Difference Vegetation Index (NDVI), albedo, Leaf Area Index (LAI), surface temperature), following the approach of [13]. Note that a one-month lag was used for NDVI and LAI, consistent with the approach of [20]. The pre-processing of the above datasets was conducted in ArcGIS Pro, where values were extracted into points with a 30 m resolution. The ArcGIS AggregatePoints function was used to aggregate points into grid files at the different resolutions examined in this study.
For evaluation, both in situ and AirMOSS data were used. In situ data from the ARM, SCAN, SoilSCAPE, and USCRN networks were gathered from the International Soil Moisture Network portal [22], and hourly results were converted into a daily overall estimate of soil moisture between 0 and 40 cm using a proportional weighting scheme. For example, a common configuration at ARM was to have sensors at 5 cm, 15 cm, 25 cm, and 35 cm, resulting in a corresponding weight for each sensor of 18.75%, 31.25%, 25.00%, and 25.00%, respectively. AirMOSS L2/3 RZSM retrievals using the University of Southern California algorithm were applied at MOISST as described by [21].
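The proportional weighting can be illustrated with a minimal Python sketch. It uses only the ARM weights quoted above; the function names, the sample soil moisture values, and the handling of missing readings are assumptions made for illustration, not the authors' processing code.

```python
# Weights quoted in the text for ARM sensors at 5, 15, 25 and 35 cm.
ARM_WEIGHTS = {5: 0.1875, 15: 0.3125, 25: 0.25, 35: 0.25}


def daily_mean(hourly_values):
    """Average hourly readings into one daily value, skipping missing (None) hours."""
    valid = [v for v in hourly_values if v is not None]
    return sum(valid) / len(valid) if valid else None


def rzsm_0_40cm(daily_by_depth, weights=ARM_WEIGHTS):
    """Weighted 0 to 40 cm soil moisture (m3/m3) from per-depth daily values."""
    if set(daily_by_depth) != set(weights):
        raise ValueError("sensor depths do not match the weighting scheme")
    if any(v is None for v in daily_by_depth.values()):
        return None  # assumption: a day with a missing depth is not estimated
    return sum(weights[d] * daily_by_depth[d] for d in weights)


# Hypothetical daily values (m3/m3) at the four sensor depths.
example = rzsm_0_40cm({5: 0.20, 15: 0.24, 25: 0.28, 35: 0.30})
```

Other networks with different sensor depths would need their own weight table, which can be passed through the `weights` argument.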

3.2. Date Selection and Data Organization

This study followed the approach of [13], which indicated that spatial variation is more important than temporal variation. With a focus on the warm season (April to October), dates were selected that were separated from each other by at least seven days. In situ data were also screened using the methods of [22], and dates with anomalous spikes and plateaus were rejected. For all in situ datasets, if there were fewer than ten available dates with valid measurements, then that site was not utilized. Also, if a site had multiple sensors, the sensor with the highest correlation with the baseline (default) version of SMERGE was used for analysis. In situ datasets with a correlation < 0.5 with default SMERGE were rejected.
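The selection rules above can be sketched in Python as follows. The greedy earliest-first choice of dates is an assumption, since the text states only the seven-day minimum separation; the function names and sample dates are hypothetical.

```python
from datetime import date


def select_dates(candidates, min_gap_days=7, warm_months=range(4, 11)):
    """Pick warm-season (April to October) dates at least min_gap_days apart.

    Assumption: dates are taken greedily in chronological order.
    """
    chosen = []
    for d in sorted(candidates):
        if d.month not in warm_months:
            continue
        if not chosen or (d - chosen[-1]).days >= min_gap_days:
            chosen.append(d)
    return chosen


def keep_site(n_valid_dates, r_with_default, min_dates=10, min_r=0.5):
    """Site screening: at least ten valid dates and r >= 0.5 with default SMERGE."""
    return n_valid_dates >= min_dates and r_with_default >= min_r


candidates = [date(2017, 3, 30), date(2017, 4, 1), date(2017, 4, 5),
              date(2017, 4, 8), date(2017, 4, 16), date(2017, 11, 2)]
selected = select_dates(candidates)
```

The spike and plateau screening of [22] is not reproduced here.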
At ARM, era 1 (2016 to 2019) and era 3 (2003 to 2007) were examined. Era 1 was divided into two regions, and era 3 consisted of four regions (Figure 1). At the ARM site, a total of 47 and 52 dates were examined during eras 1 and 3, respectively (Table 5). At the SoilSCAPE site, in situ estimates of RZSM were within a field-scale cluster of sensors or nodes that spanned from 2012 to 2015 (era 2). This study focused on a small site near Canton, Oklahoma, between latitudes 36.000 and 36.003°N and longitudes 98.628 and 98.633°W (Figure 1). Daily soil moisture measurements were obtained at depths of 4 cm, 13 cm, and 30 or 40 cm. These values were weighted to determine an overall estimate of RZSM in the top 40 cm. Seventy-nine dates were used for SMERGE downscaling (Table 5) at SoilSCAPE. Note that in situ data are missing for some of the selected dates. These dates were included intentionally to provide more data to support ML analysis; in effect, the dataset used at SoilSCAPE was inflated temporally to compensate for this site’s small spatial footprint.
The AirMOSS project (2012 to 2015; era 2) directly sensed RZSM using a P-band radar, which operates at a low frequency (420–440 MHz). RZSM is estimated up to a depth of 40 cm. This study focuses on the MOISST site. This site is unique in that it affords a continuous comparison with the downscaled SMERGE estimate, unlike the discrete in situ measurements from ARM and SoilSCAPE. Observations focused on the 13 SMERGE grids that had greater than 75% coverage of AirMOSS retrievals [23] (Figure 2a) and that span latitudes 36.000 to 36.250°N and longitudes 97.000 to 98.125°W (Figure 1). The 22 dates with acceptable AirMOSS retrievals from MOISST are listed in Table 5.
During eras 1 and 2, in situ data from the Stillwater 2W and 5 WNW stations from USCRN and Abrams from SCAN were used for additional evaluation (Figure 1). Daily soil moisture measurements were obtained at depths of 5 cm, 10 cm, 20 cm, and 50 cm and were used to estimate RZSM between the surface and 40 cm depth using a proportional weighting scheme.

3.3. Machine Learning Implementation

Random Forest (RF), which has been used to downscale soil moisture estimates in many studies, e.g., [12,13,24,25,26,27], served as our baseline machine learning algorithm. We also executed eXtreme Gradient Boosting (XGBoost) [12] and Gradient Boost (GBoost) [12]. The dataset was randomly split (70% training and 30% testing) using the scikit-learn train_test_split function, and rows that corresponded to ARM and SoilSCAPE in situ data sites were removed from the training set and inserted into the testing set for later data verification purposes. Experimentation between 400 and 1400 m for ARM, 400 and 3000 m for MOISST, and 30 and 100 m for SoilSCAPE was executed to determine the optimum spatial resolution for each era/region. The average values of the independent variables within grids at varying resolutions were obtained with zonal statistics in ArcGIS Pro. To implement the downscaling of SMERGE, we used TensorFlow Decision Forests’ Random Forest implementation, the Distributed (Deep) Machine Learning Community (DMLC)’s XGBoost, and scikit-learn’s Gradient Boost (GBoost); all of these were set to run as regressors. The hyperparameter settings used for these ML models are specified in Table 6. Hyperparameter tuning for the three models was conducted iteratively: a range of tuning values was supplied, and the ideal parameters were found over hundreds of runs. For the TensorFlow Random Forest model, the tuner helper function was included in the iterations.
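The split with held-out in situ rows can be sketched as below. This is a standard-library stand-in for scikit-learn's train_test_split so that the example runs without third-party packages; the row layout, the `in_situ` flag, and the function name are hypothetical.

```python
import random


def split_rows(rows, is_in_situ, test_fraction=0.3, seed=0):
    """Random 70/30 split, then move in situ rows from training to testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]
    # Rows at in situ sites are withheld from training for later verification.
    held_out = [r for r in train if is_in_situ(r)]
    train = [r for r in train if not is_in_situ(r)]
    return train, test + held_out


# Hypothetical grid cells; cells 3 and 17 contain in situ stations.
rows = [{"cell": i, "in_situ": i in (3, 17)} for i in range(20)]
train, test = split_rows(rows, lambda r: r["in_situ"])
```

Moving the station rows after the random split means the testing share can end up slightly above 30%.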
Independent variable sensitivity was examined with metrics customized for each model, and the results of this analysis are summarized in Table 7. Two approaches were used to gauge model sensitivity because of the different model structures. RF sensitivity was gauged using TensorFlow’s Inverse Mean Minimum Depth (IMMD). The IMMD for each independent variable was calculated by tracking the depth of the first occurrence of a feature in each tree in the forest. This depth was averaged over all trees for every feature, and the inverse of the average was taken, so that higher IMMD values reflect greater sensitivity. For XGBoost and GBoost, independent variable sensitivity was evaluated using the interpretability tool SHapley Additive exPlanations (SHAP). We opted not to use SHAP for RF because this tool is not compatible with the TensorFlow Decision Forests library. SHAP explains the individual variable contributions to the predictions made by the model. The higher the SHAP value, the higher the importance of that feature towards the given prediction.
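The IMMD idea can be sketched in a few lines of Python. This follows the description above rather than the library's internal code: the +1 offset (to avoid dividing by zero for a feature that always splits at the root) and the depth assigned to a feature that never appears in a tree are both assumptions, and the tree depths are hypothetical.

```python
def inverse_mean_min_depth(first_split_depths, absent_depth):
    """Importance per feature from the depth of its first split in each tree.

    first_split_depths: one dict per tree, {feature: depth of first occurrence}.
    absent_depth: depth charged when a feature is unused in a tree (assumption).
    """
    features = set().union(*first_split_depths)
    scores = {}
    for name in features:
        depths = [tree.get(name, absent_depth) for tree in first_split_depths]
        mean_depth = sum(depths) / len(depths)
        scores[name] = 1.0 / (1.0 + mean_depth)  # +1 offset is an assumption
    return scores


# Hypothetical three-tree forest (root depth = 0).
forest = [
    {"date": 0, "albedo": 1, "slope": 3},
    {"date": 0, "albedo": 2},
    {"date": 1, "albedo": 1, "slope": 2},
]
immd = inverse_mean_min_depth(forest, absent_depth=4)
```

Features that split near the root in most trees (here `date`) receive the largest scores.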
Because of the different tools used to evaluate sensitivity, a direct numerical comparison between IMMD and SHAP was not feasible. Therefore, sensitivity was based on relative ranking. For a given model and spatial resolution, the most sensitive variable was assigned 1, and subsequent variables were assigned a number based on their relative ranking, with the least sensitive variable being 11. Variables that ranked between 1 and 4 were deemed highly sensitive. Rankings between 5 and 8 were designated as moderately sensitive, and variables ranked greater than 8 were considered to have a low sensitivity. Table 7 shows the average rankings across the different spatial resolutions run for a given model, with the sensitivity indicated as high (H), medium (M), or low (L).
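The ranking scheme can be sketched as follows. The importance scores are hypothetical, and the treatment of fractional average ranks (for example 4.5, which falls between the stated bands) is an assumption: anything up to 4 is treated as high and anything up to 8 as medium.

```python
def ranks(importance):
    """Rank variables 1..n, where 1 is the most sensitive (largest score)."""
    ordered = sorted(importance, key=importance.get, reverse=True)
    return {name: pos for pos, name in enumerate(ordered, start=1)}


def average_ranks(rank_tables):
    """Mean rank of each variable across spatial resolutions."""
    return {n: sum(t[n] for t in rank_tables) / len(rank_tables)
            for n in rank_tables[0]}


def sensitivity_class(avg_rank):
    """H for ranks up to 4, M up to 8, L otherwise (band edges are assumptions)."""
    if avg_rank <= 4:
        return "H"
    if avg_rank <= 8:
        return "M"
    return "L"


# Hypothetical importance scores for one model at two resolutions.
scores_400m = {"date": 0.9, "temperature": 0.7, "clay": 0.6, "albedo": 0.5,
               "elevation": 0.45, "aspect": 0.4, "NDVI": 0.35, "LAI": 0.3,
               "sand": 0.2, "silt": 0.15, "slope": 0.1}
scores_700m = {"date": 0.8, "temperature": 0.75, "albedo": 0.6, "clay": 0.5,
               "aspect": 0.45, "elevation": 0.4, "LAI": 0.35, "NDVI": 0.3,
               "silt": 0.2, "sand": 0.15, "slope": 0.1}

avg = average_ranks([ranks(scores_400m), ranks(scores_700m)])
classes = {name: sensitivity_class(value) for name, value in avg.items()}
```

Because only ranks are compared, the same code applies whether the scores come from IMMD or from mean absolute SHAP values.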

3.4. Model Evaluation

A comparison of default and downscaled SMERGE at different spatial resolutions was made against in situ data from the ARM and SoilSCAPE sites and the SCAN and USCRN sensors. In addition, downscaled SMERGE was evaluated against AirMOSS data from the MOISST site. Standard evaluation metrics (correlation, r; unbiased root mean square error, ubRMSE [m3/m3]) were utilized in these comparisons. Delta r and Delta ubRMSE were used to compare relative performance between default and downscaled versions of SMERGE and are defined as follows:
Delta r = Downscaled SMERGE r − Default SMERGE r
Delta ubRMSE = Downscaled SMERGE ubRMSE − Default SMERGE ubRMSE
where Downscaled SMERGE is the value obtained from ML and Default SMERGE is the value obtained from the original 12.5 km resolution product. Delta values represent a specific value from a model type and spatial resolution combination (e.g., RF, 400 m) executed within a given network/era/region (e.g., ARM_1_1). Delta r and Delta ubRMSE values provided insight into whether the downscaled model results exhibited improvement (or degradation) compared with the Default SMERGE product. Improvement for r is defined as having Downscaled SMERGE > Default SMERGE and for ubRMSE Default SMERGE > Downscaled SMERGE.
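As a worked illustration of these definitions, the sketch below computes r, ubRMSE, and the two Delta values with the standard library only. The ubRMSE formula used (the RMSE after removing each series' mean) is the conventional definition and is assumed to match the one applied in this study; the three short series are hypothetical.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def ubrmse(estimate, observed):
    """RMSE of the anomalies, i.e. with the mean bias removed."""
    me, mo = mean(estimate), mean(observed)
    return sqrt(mean([((e - me) - (o - mo)) ** 2 for e, o in zip(estimate, observed)]))


def deltas(default, downscaled, observed):
    """Delta r and Delta ubRMSE (downscaled minus default)."""
    return (pearson_r(downscaled, observed) - pearson_r(default, observed),
            ubrmse(downscaled, observed) - ubrmse(default, observed))


# Hypothetical RZSM series (m3/m3) on five dates.
observed = [0.20, 0.25, 0.30, 0.22, 0.28]
default = [0.24, 0.25, 0.27, 0.26, 0.25]
downscaled = [0.21, 0.26, 0.29, 0.23, 0.27]
delta_r, delta_ubrmse = deltas(default, downscaled, observed)
```

In this example Delta r is positive and Delta ubRMSE is negative, so the downscaled series counts as improved on both metrics.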

3.5. Comparisons within a Region/Era

To facilitate comparison across the different models and spatial resolutions examined, an objective metric was derived that combined correlation and ubRMSE metrics. The objective metric varies between zero and one and is defined as indicated below:
If Delta r ≤ 0 Then a = 0
If Delta r > 0 Then a = [Delta r/Maximum Delta r] × 0.5
If Delta ubRMSE > 0 Then b = 0
If Delta ubRMSE < 0 Then b = [Delta ubRMSE/Minimum Delta ubRMSE] × 0.5
Objective Metric = a + b
where maximum Delta r and minimum Delta ubRMSE are the maximum and minimum values, respectively, obtained within an era/region, a is the component of the objective metric derived from Delta r, and b is the component of the objective metric derived from Delta ubRMSE. A value of one indicates that a model type and spatial resolution combination had the best possible r and ubRMSE values within an era/region. Zero indicates that downscaling yielded no improvement based on both r and ubRMSE metrics. Models with an objective metric between 0 and 0.8 were described as slightly improved. Objective metrics between 0.8 and 1.0 were designated as high performing. If a downscaled model improved in only one metric (r but not ubRMSE, or vice versa), then this model was deemed non-improved.
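The objective metric can be checked against the ARM_1_1 rows of Table A1 with a short Python sketch. The r and ubRMSE values are copied from that table; the function names are hypothetical, and two edge cases not specified above (a Delta ubRMSE of exactly zero, and a score of exactly 0.8) are handled by assumption as "no contribution" and "high performing", respectively.

```python
DEFAULT_R, DEFAULT_UBRMSE = 0.5714, 0.0710  # Default SMERGE, ARM_1_1 (Table A1)

# (model, resolution in m): (downscaled r, downscaled ubRMSE), from Table A1.
ARM_1_1 = {
    ("RF", 1400): (0.6907, 0.0678), ("RF", 1000): (0.6870, 0.0666),
    ("RF", 700): (0.7409, 0.0660), ("RF", 400): (0.7318, 0.0654),
    ("XGBoost", 1400): (0.6550, 0.0685), ("XGBoost", 1000): (0.5243, 0.0701),
    ("XGBoost", 700): (0.5249, 0.0724), ("XGBoost", 400): (0.6298, 0.0693),
    ("GBoost", 1400): (0.7100, 0.0650), ("GBoost", 1000): (0.4850, 0.0720),
    ("GBoost", 700): (0.4489, 0.0736), ("GBoost", 400): (0.6477, 0.0660),
}


def objective_metrics(deltas):
    """deltas: {run: (Delta r, Delta ubRMSE)} for one era/region."""
    max_dr = max(dr for dr, _ in deltas.values())
    min_du = min(du for _, du in deltas.values())
    scores = {}
    for run, (dr, du) in deltas.items():
        a = 0.5 * dr / max_dr if dr > 0 else 0.0
        b = 0.5 * du / min_du if du < 0 else 0.0  # du == 0 gives b = 0 (assumption)
        scores[run] = a + b
    return scores


def classify(delta_r, delta_ubrmse, score, high=0.8):
    """Label a run; improvement in only one metric counts as non-improved."""
    if delta_r <= 0 or delta_ubrmse >= 0:
        return "non-improved"
    return "high performing" if score >= high else "slightly improved"


deltas = {k: (r - DEFAULT_R, u - DEFAULT_UBRMSE) for k, (r, u) in ARM_1_1.items()}
scores = objective_metrics(deltas)
labels = {k: classify(*deltas[k], scores[k]) for k in deltas}
best = max(scores, key=scores.get)  # run with the highest objective metric
```

With these inputs the highest-scoring run is RF at 400 m (objective metric of about 0.94), and runs such as XGBoost at 1000 m, which lowered ubRMSE but also lowered r, are labeled non-improved.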

4. Results

Model sensitivity results are summarized in Table 7. The date and aspect independent variables generally have consistently high and moderate sensitivity, respectively. Elevation mostly exhibits moderate sensitivity, except for ARM era 3, where this variable has a high sensitivity. Albedo and NDVI have a higher sensitivity for RF compared to XGBoost and GBoost models. For albedo, the exception is in SoilSCAPE, which has a high sensitivity for all models. Also, during ARM era 1 and ARM_3_1, NDVI had a uniformly low sensitivity. LAI also generally had a higher sensitivity for RF compared with XGBoost and GBoost models. In SoilSCAPE, LAI sensitivity was uniformly high, and in MOISST and ARM_3_2, low sensitivity was recorded for all models. Conversely, temperature exhibited a lower sensitivity for RF models compared with XGBoost and GBoost, but during ARM era 1, a uniformly high sensitivity was noted for all models. Within ARM, slope, sand, and silt mostly had a higher sensitivity for XGBoost and GBoost models than that recorded by RF. In MOISST and SoilSCAPE, slope had low to moderate sensitivity without a consistent pattern between downscaling models, whereas sand and silt had uniform moderate sensitivity within MOISST and low sensitivity in SoilSCAPE. Finally, clay does not have a consistent relationship among models and generally has a moderate to high sensitivity. Model sensitivity was also examined as a function of spatial resolution. In general, sensitivity was consistent as a function of spatial resolution. The exception is the RF model at the MOISST site, where silt exhibited a marked increase in sensitivity at coarser spatial resolutions.
Figure 3 indicates the number of models with a specified performance from eras 1 and 2. A total of 12 models were executed from each ARM region (four using each ML approach). At ARM, during era 1, 63% of the executed models resulted in improved performance (Figure 3a,b). Overall, 17% of the era 1 models were high performing, and these included all three ML approaches. MOISST and SoilSCAPE (era 2), which had a total of 18 and 6 models executed, respectively, outperformed ARM (era 1). For MOISST, 100% of the downscaling attempts yielded improvements compared to Default SMERGE (Figure 3c). At MOISST, 44% of the models were high performing, with a slight preference for RF over the other model types. Figure 2b shows that the spatial distribution of Downscaled SMERGE is similar to AirMOSS data (Figure 2c) from MOISST. At SoilSCAPE, all but one downscaling attempt yielded an improvement, resulting in an 83% improvement rate (Table 8). However, only one RF model (17%) from SoilSCAPE was high performing. Figure 4 summarizes the results from ARM era 3, with only 37% of downscaling attempts yielding improvements. In ARM_3_4, not a single model yielded an improvement compared with Default SMERGE. ARM_3_1, ARM_3_2, and ARM_3_3 had success rates of 42%, 50%, and 58%, respectively. Overall, only 10% of the ARM era 3 models were high performing.
Figure 5 and Table A1 illustrate ARM era 1 results as a function of spatial resolution. In ARM_1_1, RF outperformed the other ML approaches. Optimal performance for RF was recorded at 400 m, with declining objective metrics obtained at coarser spatial resolutions. XGBoost and GBoost yielded inconsistent results, and these approaches recorded non-improvement at 700 and 1000 m. When compared against the SCAN Abrams site, none of the models yielded an improvement. Conversely, all RF models were non-improved in ARM_1_2. XGBoost and GBoost exhibited erratic behavior as a function of spatial resolution, with the best results obtained by XGBoost at a 400 m resolution. Interestingly, comparison with USCRN data from the Stillwater sites within ARM_1_2 yielded the opposite result, with improvement realized only with RF and not with the XGBoost and GBoost models.
Figure 6 and Table A2 depict the results from MOISST (era 2). RF recorded the highest objective metric at a 400 m spatial resolution. Examination of an in situ sensor from the USCRN Stillwater 5 WNW site also had RF outperforming the other methods. RF exhibited a different trend in performance as a function of spatial resolution compared with XGBoost and GBoost. Maximum objective metrics for RF were noted at 400 and 3000 m, with lessened performance at resolutions between 700 and 2000 m. Conversely, XGBoost and GBoost had maximum objective metrics around 1400 to 2000 m, with lower values at other resolutions. SoilSCAPE (Table 8), also from era 2, recorded a similar preference for RF. Notably, the optimal spatial resolution for all era 1 and 2 regions was ≤700 m.
During ARM era 3, improvements in downscaled results were noted in ARM_3_1, ARM_3_2, and ARM_3_3 (Figure 7; Table A3), and the model that yielded the highest objective metric varied. In ARM_3_1, GBoost had the highest objective metric at a spatial resolution of 700 m, and in ARM_3_2, XGBoost at a 1400 m resolution yielded the best results. Note that ARM_3_1 and ARM_3_2 had only one high performing model per region (Figure 4). ARM_3_3 differed in that RF yielded three high performing models, with the highest objective metric at 1000 m. For ARM_3_4, no downscaling results yielded an improvement over Default SMERGE. In general, for era 3 the optimal spatial resolution was ≥700 m. Also, in all three regions the general trend was for the objective metric to increase as spatial resolution became coarser.

5. Discussion

Overall, improvement in downscaled SMERGE across all eras, regions, and models ranged from 0.03 to 0.42 for Delta r and −0.0005 to −0.0118 m3/m3 for Delta ubRMSE (Table 8 and Tables A1, A2, and A3). These results are comparable to results from previous ML downscaling efforts. Reference [15] downscaled SMAP on the Iberian Peninsula using RF, yielding an increase in correlation of 0.31 and a decrease in ubRMSE of 0.026 m3/m3 compared to SMAP at its native resolution. Reference [14], from a study in China, yielded an increase of 0.1 to 0.2 for correlation and a decrease of 0.01 m3/m3 for ubRMSE compared with original land surface model data. Reference [11] also noted an improvement in correlation of 0.1 for RF-generated soil moisture compared with the default ESA-CCI product. Reference [16] used RF to develop the 1 km resolution ChinaCropSM estimate of soil moisture, which had a correlation value of 0.93 compared with 0.35 for ESA-CCI. ubRMSE also recorded a dramatic improvement in this product, from 0.093 to 0.033 m3/m3.
There are clear differences in performance between eras, as shown by comparison of ARM data between eras 1 and 3. During era 1, more than half (15 out of 24) of the downscaling models yielded improvements (Figure 3a,b). Conversely, during era 3 only slightly more than a third of the models (18 out of 48) were improved (Figure 4). A similar trend was noted for high-performing models. For era 1, 17% of downscaled models were high performing compared with only 10% for era 3. During era 1, deeper penetrating L-band retrievals, with increased accuracy, were available from the SMOS [2] and SMAP [28] missions and included in the ESA-CCI product that forms the backbone of SMERGE. Another consideration is the completeness of the independent variables and validation datasets. Albedo and LAI data are partly missing during era 3 (Table 9). Note that spatial averaging minimized the impact of these missing data at coarser resolutions (1000 to 1400 m). Also, albedo and LAI have a higher sensitivity within RF models compared with XGBoost and GBoost (Table 7). This could explain the relative underperformance of RF during ARM era 3 compared with other models. RF had a success rate of 19% during era 3. XGBoost and GBoost had higher success rates of 44% and 50%, respectively. Another dimension to examine is the increasing interpolation within the SMERGE product during the earlier eras (Table 9). The ESA-CCI product had some missing daily data that were estimated by interpolation within SMERGE, see [17]. During eras 1 and 2 the degree of interpolation is relatively low (1 to 17%), unlike era 3 (24 to 36%). Interpolation can produce uncertainties within SMERGE estimates that get propagated to the downscaled version, possibly contributing to the poor performance of SMERGE downscaling during era 3. Finally, the spatial representativeness of the in situ data is different between eras 1 and 3 within the ARM regions (Figure 1). 
The coverage of in situ stations with acceptable data (correlation with Default SMERGE > 0.5) is greater in era 1. Interestingly, the region with the best in situ coverage is the small region ARM_3_3, which recorded the best performance of all ARM regions during era 3.
As notable as the above ARM comparisons are, they are still based on sparse in situ data, which [29] indicated can be problematic when providing validation for coarse-resolution satellite-based soil moisture products. The spatial variability within a grid, even for a downscaled product, may not be represented by a point in situ measurement. A spatial mismatch exists between a grid mean and sparse in situ sampling that can increase uncertainty and produce spurious errors within the downscaled product. This sampling issue is negated for the era 2 sites where MOISST RZSM retrievals were collected over a continuous extent during the AirMOSS campaign. Additionally, SoilSCAPE is a small site, less than a square kilometer, with a cluster of 21 sensors. As such, the validation at these sites can be considered a best-case scenario as reflected by an improvement rate of 100% and 83% for MOISST and SoilSCAPE, respectively. Of particular note are the improvements seen at the 100 m resolution at SoilSCAPE, suggesting that under ideal circumstances downscaling to field scale resolutions is feasible.
Spatial resolution trends are different between eras and models. RF at ARM_1_1 and MOISST had a maximum objective metric at 400 m with declining performance at coarser spatial resolutions out to 1400 m (Figure 5 and Figure 6). Conversely, the highest objective metric for XGBoost and GBoost models at MOISST was at 1400 to 2000 m (Figure 6). During ARM era 3, all models recorded increasing performance with coarser spatial resolutions (Figure 7). During this era, incomplete albedo and LAI likely hampered model execution at finer spatial resolutions (400 to 700 m).
This study has provided valuable insights for the future development of a regional downscaled version of SMERGE. This matters because SMERGE is a long-duration RZSM product that provides a retrospective estimate of this variable, unlike SMOS and SMAP, which are limited temporally (post-2010). Therefore, this work lays the groundwork for the development of a long-term field-scale estimate that can support diverse user communities. The new downscaled product will focus on the United States Southern Great Plains, where the Default SMERGE performance was the best [20]. This work is not intended to provide a comprehensive validation. Instead, its goal was to determine the possible temporal coverage, spatial resolution, and ML model to be used to develop the downscaled product. Additional ranked correlation analyses will be applied to fully validate downscaled SMERGE. Inclusion of these techniques here is beyond this work’s scope.

6. Conclusions

This study successfully downscaled SMERGE, focusing on the warm season (April to October) in the Southern Great Plains (Oklahoma and Kansas). More robust SMERGE downscaling results were obtained during eras 1 (2016 to 2019) and 2 (2012 to 2015), where RF produced optimal results at ≤700 m. Improvements in the downscaled SMERGE at the 100 m resolution at SoilSCAPE were particularly noteworthy. These results suggest that SMERGE can be successfully downscaled to the field scale with the advent of L-band microwave retrievals after 2010. However, some caution regarding this conclusion is warranted, given the small area of the SoilSCAPE site. During era 3, downscaling efforts were less successful for several reasons. The optimum model and spatial resolution were less consistent across the ARM era 3 regions, but in general, the optimum resolution was 700 m or coarser. The resolutions examined in this study straddle the capabilities of existing (Sentinel 1 and 2) and planned (NASA ISRO Synthetic Aperture Radar, NISAR) missions. In addition, downscaled RZSM from long-duration products like SMERGE can support more robust hydrologic and ecologic modeling and drought monitoring than surface satellite SM estimates.

Author Contributions

Conceptualization, K.T.; methodology, K.T., A.S., D.E. and M.G.; software, A.S. and D.G.; validation, K.T. and A.S.; formal analysis, K.T.; investigation, K.T.; resources, K.T.; data curation, K.T. and A.S.; writing—original draft preparation, K.T.; writing—review and editing, M.B.; visualization, K.T.; supervision, K.T. and A.S.; project administration, K.T.; funding acquisition, K.T. and D.G. All authors have read and agreed to the published version of the manuscript.

Funding

The authors acknowledge support from the United States Department of Energy Research Development and Partnership Pilot (RDPP, award number DE-SC0023067). Support from NASA Climate Indicator and Data Products for Future National Climate Assessments program through award # NNX16AH30G and NSF Geoscience Equipment (Award Number 1636769) is also gratefully acknowledged.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author.

Acknowledgments

The Texas Advanced Computing Center (TACC; http://www.tacc.utexas.edu) at The University of Texas at Austin provided computational resources that contributed to the research results reported in this paper. The assistance of Franco Zamora (TAMIU ARC Writing Consultant) is also greatly appreciated.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. ARM Era 1 results.
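| Model Type | Region | Spatial Resolution (m) | Default SMERGE r | Downscaled SMERGE r | Default SMERGE ubRMSE | Downscaled SMERGE ubRMSE |
| --- | --- | --- | --- | --- | --- | --- |
| RF | 1 | 1400 | 0.5714 | 0.6907 | 0.0710 | 0.0678 |
| RF | 1 | 1000 | 0.5714 | 0.6870 | 0.0710 | 0.0666 |
| RF | 1 | 700 | 0.5714 | 0.7409 | 0.0710 | 0.0660 |
| RF | 1 | 400 | 0.5714 | 0.7318 | 0.0710 | 0.0654 |
| XGBoost | 1 | 1400 | 0.5714 | 0.6550 | 0.0710 | 0.0685 |
| XGBoost | 1 | 1000 | 0.5714 | 0.5243 | 0.0710 | 0.0701 |
| XGBoost | 1 | 700 | 0.5714 | 0.5249 | 0.0710 | 0.0724 |
| XGBoost | 1 | 400 | 0.5714 | 0.6298 | 0.0710 | 0.0693 |
| GBoost | 1 | 1400 | 0.5714 | 0.7100 | 0.0710 | 0.0650 |
| GBoost | 1 | 1000 | 0.5714 | 0.4850 | 0.0710 | 0.0720 |
| GBoost | 1 | 700 | 0.5714 | 0.4489 | 0.0710 | 0.0736 |
| GBoost | 1 | 400 | 0.5714 | 0.6477 | 0.0710 | 0.0660 |
| RF | 2 | 1400 | 0.6037 | 0.6554 | 0.0970 | 0.1049 |
| RF | 2 | 1000 | 0.6044 | 0.6966 | 0.0970 | 0.1024 |
| RF | 2 | 700 | 0.6034 | 0.7145 | 0.0970 | 0.1010 |
| RF | 2 | 400 | 0.6017 | 0.6979 | 0.0972 | 0.1007 |
| XGBoost | 2 | 1400 | 0.6037 | 0.6649 | 0.0970 | 0.0970 |
| XGBoost | 2 | 1000 | 0.6044 | 0.7567 | 0.0970 | 0.0929 |
| XGBoost | 2 | 700 | 0.6034 | 0.6128 | 0.0970 | 0.0965 |
| XGBoost | 2 | 400 | 0.6017 | 0.7511 | 0.0972 | 0.0897 |
| GBoost | 2 | 1400 | 0.6037 | 0.4444 | 0.0970 | 0.1071 |
| GBoost | 2 | 1000 | 0.6044 | 0.6349 | 0.0970 | 0.0934 |
| GBoost | 2 | 700 | 0.6034 | 0.6757 | 0.0970 | 0.0881 |
| GBoost | 2 | 400 | 0.6017 | 0.6181 | 0.0972 | 0.0954 |

In Situ Comparison from USCRN Stillwater Sites

| Model Type | Region | Spatial Resolution (m) | Default SMERGE r | Downscaled SMERGE r | Default SMERGE ubRMSE | Downscaled SMERGE ubRMSE |
| --- | --- | --- | --- | --- | --- | --- |
| RF | 2 | 1400 | 0.6324 | 0.3496 | 0.0712 | 0.0800 |
| RF | 2 | 1000 | 0.7502 | 0.5777 | 0.0684 | 0.0757 |
| RF | 2 | 700 | 0.6417 | 0.6756 | 0.0681 | 0.0673 |
| XGBoost | 2 | 1400 | 0.6324 | 0.5975 | 0.0712 | 0.0724 |
| XGBoost | 2 | 1000 | 0.7502 | 0.6692 | 0.0684 | 0.0681 |
| XGBoost | 2 | 700 | 0.6417 | 0.4898 | 0.0681 | 0.0717 |
| GBoost | 2 | 1400 | 0.6324 | 0.4663 | 0.0712 | 0.0752 |
| GBoost | 2 | 1000 | 0.7502 | 0.6692 | 0.0684 | 0.0757 |
| GBoost | 2 | 700 | 0.6417 | 0.4713 | 0.0681 | 0.0716 |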
Best model results are indicated in bold.
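The two skill metrics tabulated above, the correlation coefficient (r) and the unbiased root mean square error (ubRMSE), can be illustrated with a minimal sketch. It assumes the conventional definitions (Pearson correlation, and RMSE computed after removing the mean of each series); the function names and the sample values are hypothetical and are not taken from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ubrmse(est, obs):
    """Unbiased RMSE: RMSE after subtracting each series' own mean (removes the bias)."""
    n = len(est)
    me, mo = sum(est) / n, sum(obs) / n
    return math.sqrt(sum(((e - me) - (o - mo)) ** 2 for e, o in zip(est, obs)) / n)

# Hypothetical soil moisture values in m3/m3 (illustration only, not study data)
obs = [0.20, 0.25, 0.30, 0.22, 0.18]
est = [0.24, 0.28, 0.36, 0.25, 0.23]

print(f"r = {pearson_r(est, obs):.4f}, ubRMSE = {ubrmse(est, obs):.4f} m3/m3")
```

Under these definitions a constant offset between an estimate and the in situ series leaves both metrics unchanged, so they measure agreement in temporal variability rather than absolute bias.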
Table A2. MOISST results.
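| Model Type | Resolution (m) | Default SMERGE r | Downscaled SMERGE r | Default SMERGE ubRMSE | Downscaled SMERGE ubRMSE |
| --- | --- | --- | --- | --- | --- |
| RF | 3000 | 0.4806 | 0.8193 | 0.0572 | 0.0508 |
| RF | 2000 | 0.4375 | 0.7608 | 0.0627 | 0.0568 |
| RF | 1400 | 0.4330 | 0.7161 | 0.0656 | 0.0620 |
| RF | 1000 | 0.4113 | 0.7099 | 0.0698 | 0.0648 |
| RF | 700 | 0.3800 | 0.7304 | 0.0755 | 0.0704 |
| RF | 400 | 0.3321 | 0.7497 | 0.0860 | 0.0805 |
| XGBoost | 3000 | 0.4806 | 0.8092 | 0.0572 | 0.0542 |
| XGBoost | 2000 | 0.4375 | 0.8082 | 0.0627 | 0.0579 |
| XGBoost | 1400 | 0.4330 | 0.8337 | 0.0656 | 0.0612 |
| XGBoost | 1000 | 0.4113 | 0.7745 | 0.0698 | 0.0671 |
| XGBoost | 700 | 0.3800 | 0.7461 | 0.0755 | 0.0720 |
| XGBoost | 400 | 0.3321 | 0.7239 | 0.0860 | 0.0847 |
| GBoost | 3000 | 0.4806 | 0.8072 | 0.0572 | 0.0557 |
| GBoost | 2000 | 0.4375 | 0.7997 | 0.0627 | 0.0571 |
| GBoost | 1400 | 0.4330 | 0.8460 | 0.0656 | 0.0614 |
| GBoost | 1000 | 0.4113 | 0.7865 | 0.0698 | 0.0670 |
| GBoost | 700 | 0.3800 | 0.7552 | 0.0755 | 0.0739 |
| GBoost | 400 | 0.3321 | 0.7314 | 0.0860 | 0.0856 |

In Situ Comparison from USCRN Stillwater 5WNW

| Model Type | Resolution (m) | Default SMERGE r | Downscaled SMERGE r | Default SMERGE ubRMSE | Downscaled SMERGE ubRMSE |
| --- | --- | --- | --- | --- | --- |
| RF | 700 | 0.5650 | 0.6189 | 0.0360 | 0.0347 |
| XGBoost | 700 | 0.5650 | 0.2316 | 0.0360 | 0.0460 |
| GBoost | 700 | 0.5650 | 0.5774 | 0.0360 | 0.0368 |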
Best model results are indicated in bold.
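The change produced by downscaling can be read directly off the values in Table A2. The sketch below uses the RF rows from that table and computes the change in r and in ubRMSE (downscaled minus default) at each resolution. The tuple layout and the function name `changes` are illustrative choices, not part of the study's code.

```python
# RF rows of Table A2:
# (resolution_m, default_r, downscaled_r, default_ubrmse, downscaled_ubrmse)
RF_MOISST = [
    (3000, 0.4806, 0.8193, 0.0572, 0.0508),
    (2000, 0.4375, 0.7608, 0.0627, 0.0568),
    (1400, 0.4330, 0.7161, 0.0656, 0.0620),
    (1000, 0.4113, 0.7099, 0.0698, 0.0648),
    (700, 0.3800, 0.7304, 0.0755, 0.0704),
    (400, 0.3321, 0.7497, 0.0860, 0.0805),
]

def changes(rows):
    """Return {resolution: (delta_r, delta_ubrmse)}, rounded to 4 decimals."""
    return {
        res: (round(r_ds - r_def, 4), round(ub_ds - ub_def, 4))
        for res, r_def, r_ds, ub_def, ub_ds in rows
    }

for res, (d_r, d_ub) in changes(RF_MOISST).items():
    # A positive delta r and a negative delta ubRMSE both indicate better skill
    print(f"{res:>5} m  delta r = {d_r:+.4f}  delta ubRMSE = {d_ub:+.4f}")
```

For these rows every resolution shows a higher correlation and a lower ubRMSE after downscaling.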
Table A3. ARM Era 3 results.
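| Model Type | Region | Resolution (m) | Default SMERGE r | Downscaled SMERGE r | Default SMERGE ubRMSE | Downscaled SMERGE ubRMSE |
| --- | --- | --- | --- | --- | --- | --- |
| RF | 1 | 1400 | 0.6109 | 0.4994 | 0.0304 | 0.0207 |
| RF | 1 | 1000 | 0.6087 | 0.5889 | 0.0312 | 0.0194 |
| RF | 1 | 700 | 0.6235 | 0.5781 | 0.0293 | 0.0201 |
| RF | 1 | 400 | 0.6463 | 0.6299 | 0.0288 | 0.0197 |
| XGBoost | 1 | 1400 | 0.6109 | 0.6374 | 0.0304 | 0.0264 |
| XGBoost | 1 | 1000 | 0.6087 | 0.5476 | 0.0312 | 0.0287 |
| XGBoost | 1 | 700 | 0.6235 | 0.6406 | 0.0293 | 0.0251 |
| XGBoost | 1 | 400 | 0.6463 | 0.6479 | 0.0288 | 0.0265 |
| GBoost | 1 | 1400 | 0.6109 | 0.2918 | 0.0304 | 0.0381 |
| GBoost | 1 | 1000 | 0.6087 | 0.5956 | 0.0312 | 0.0344 |
| GBoost | 1 | 700 | 0.6235 | 0.6922 | 0.0293 | 0.0205 |
| GBoost | 1 | 400 | 0.6463 | 0.7016 | 0.0288 | 0.0239 |
| RF | 2 | 1400 | 0.7032 | 0.7255 | 0.0271 | 0.0274 |
| RF | 2 | 1000 | 0.7115 | 0.6617 | 0.0265 | 0.0289 |
| RF | 2 | 700 | 0.7076 | 0.4798 | 0.0277 | 0.0344 |
| RF | 2 | 400 | 0.7097 | 0.5728 | 0.0266 | 0.0310 |
| XGBoost | 2 | 1400 | 0.7032 | 0.8047 | 0.0271 | 0.0246 |
| XGBoost | 2 | 1000 | 0.7115 | 0.7215 | 0.0265 | 0.0262 |
| XGBoost | 2 | 700 | 0.7076 | 0.6519 | 0.0277 | 0.0299 |
| XGBoost | 2 | 400 | 0.7097 | 0.7212 | 0.0266 | 0.0261 |
| GBoost | 2 | 1400 | 0.7032 | 0.7180 | 0.0271 | 0.0270 |
| GBoost | 2 | 1000 | 0.7115 | 0.4374 | 0.0265 | 0.0374 |
| GBoost | 2 | 700 | 0.7076 | 0.6381 | 0.0277 | 0.0380 |
| GBoost | 2 | 400 | 0.7097 | 0.2011 | 0.0266 | 0.0550 |
| RF | 3 | 1400 | 0.4895 | 0.7029 | 0.0348 | 0.0255 |
| RF | 3 | 1000 | 0.4895 | 0.7120 | 0.0348 | 0.0251 |
| RF | 3 | 700 | 0.4880 | 0.6605 | 0.0350 | 0.0263 |
| RF | 3 | 400 | 0.5302 | 0.4529 | 0.0346 | 0.0332 |
| XGBoost | 3 | 1400 | 0.4895 | 0.4879 | 0.0348 | 0.0333 |
| XGBoost | 3 | 1000 | 0.4895 | 0.5606 | 0.0348 | 0.0314 |
| XGBoost | 3 | 700 | 0.4880 | 0.4467 | 0.0350 | 0.0349 |
| XGBoost | 3 | 400 | 0.5302 | 0.2835 | 0.0346 | 0.0382 |
| GBoost | 3 | 1400 | 0.4895 | 0.5945 | 0.0348 | 0.0282 |
| GBoost | 3 | 1000 | 0.4895 | 0.6025 | 0.0348 | 0.0315 |
| GBoost | 3 | 700 | 0.4880 | 0.6069 | 0.0350 | 0.0317 |
| GBoost | 3 | 400 | 0.5302 | 0.4645 | 0.0346 | 0.0357 |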
Best model results are indicated in bold.

References

  1. Entekhabi, D.; Njoku, E.G.; O’Neill, P.E.; Kellogg, K.H.; Crow, W.T.; Edelstein, W.N.; Entin, J.K.; Goodman, S.D.; Jackson, T.J.; Johnson, J.; et al. The Soil Moisture Active Passive (SMAP) mission. Proc. IEEE 2010, 98, 704–716.
  2. Kerr, Y.H.; Waldteufel, P.; Wigneron, J.P.; Martinuzzi, J.M.; Font, J.; Berger, M. Soil moisture retrieval from space: The Soil Moisture and Ocean Salinity (SMOS) mission. IEEE Trans. Geosci. Remote Sens. 2001, 39, 1729–1735.
  3. Liu, Y.Y.; Dorigo, W.A.; Parinussa, R.M.; de Jeu, R.A.M.; Wagner, W.; McCabe, M.F.; Evans, J.P.; van Dijk, A.I.J.M. Trend-preserving blending of passive and active microwave soil moisture retrievals. Remote Sens. Environ. 2012, 123, 280–297.
  4. O’Donnell, M.S.; Manier, D.J. Spatial Estimates of Soil Moisture for Understanding Ecological Potential and Risk: A Case Study for Arid and Semi-Arid Ecosystems. Land 2022, 11, 1856.
  5. Bhardwaj, J.; Kuleshov, Y.; Chua, Z.-W.; Watkins, A.B.; Choy, S.; Sun, Q. Evaluating Satellite Soil Moisture Datasets for Drought Monitoring in Australia and the South-West Pacific. Remote Sens. 2022, 14, 3971.
  6. Peng, J.; Loew, A.; Merlin, O.; Verhoest, N.E.C. A review of spatial downscaling remotely sensed soil moisture: Downscale satellite-based soil moisture. Rev. Geophys. 2017, 55, 341–366.
  7. Sabaghy, S.; Walker, J.P.; Renzullo, L.J.; Jackson, T.J. Spatially enhanced passive microwave derived soil moisture: Capabilities and opportunities. Remote Sens. Environ. 2018, 209, 551–580.
  8. Srivastava, P.; Han, D.; Ramirez, M.R.; Islam, T. Machine learning techniques for downscaling SMOS satellite soil moisture using MODIS land surface temperature for hydrologic applications. Water Resour. Manag. 2013, 27, 3127–3144.
  9. Im, J.; Park, S.; Rhee, J.; Baik, J.; Choi, M. Downscaling of AMSR-E soil moisture with MODIS products using machine learning approaches. Environ. Earth Sci. 2016, 75, 1120.
  10. Liu, Y.; Yang, Y.; Jing, W.; Yue, X. Comparison of different machine learning approaches for monthly satellite-based soil moisture downscaling over Northeast China. Remote Sens. 2018, 10, 31.
  11. Zhang, L.; Zeng, Y.; Zhuang, R.; Szabó, B.; Manfreda, S.; Han, Q.; Su, Z. In Situ Observation-Constrained Global Surface Soil Moisture Using Random Forest Model. Remote Sens. 2021, 13, 4893.
  12. Liu, Y.; Xia, X.; Yao, L.; Jing, W.; Zhou, C.; Huang, W.; Li, Y.; Yang, J. Downscaling satellite retrieved soil moisture using Regression tree-based machine learning algorithms over Southwest France. Earth Space Sci. 2020, 7, e2020EA001267.
  13. Zappa, L.; Forkel, M.; Xaver, A.; Dorigo, W. Deriving field scale soil moisture from satellite observations and ground measurements in a hilly agricultural region. Remote Sens. 2019, 11, 2596.
  14. Abowarda, A.S.; Bai, L.; Zhang, C.; Long, D.; Li, X.; Huang, Q.; Sun, Z. Generating surface soil moisture at 30 m spatial resolution using both data fusion and machine learning toward better water resources management at the field scale. Remote Sens. Environ. 2021, 255, 112301.
  15. Zhao, W.; Sánchez, N.; Lu, H.; Li, A. A spatial downscaling approach for the SMAP passive surface soil moisture product using random forest regression. J. Hydrol. 2018, 563, 1009–1024.
  16. Cheng, F.; Zhang, Z.; Zhuang, H.; Han, J.; Luo, Y.; Cao, J.; Zhang, L.; Zhang, J.; Xu, J.; Tao, F. ChinaCropSM1 km: A fine 1 km daily soil moisture dataset for dryland wheat and maize across China during 1993–2018. Earth Syst. Sci. Data 2023, 15, 395–409.
  17. Reichle, R.H.; Crow, W.T.; Koster, R.D.; Sharif, H.O.; Mahanama, S.P.P. Contribution of soil moisture retrievals to land data assimilation products. Geophys. Res. Lett. 2008, 35, L01404.
  18. Wagner, W.; Lemoine, G.; Rott, H. A method for estimating soil moisture from ERS scatterometer and soil data. Remote Sens. Environ. 1999, 70, 191–207.
  19. Albergel, C.; Ruediger, C.; Pellarin, T.; Calvet, J.-C.; Fritz, N.; Froissard, F.; Suquia, D.; Petitpa, A.; Piguet, B.; Martin, E. From near-surface to root-zone soil moisture using an exponential filter: An assessment of the method based on in-situ observations and model simulations. Hydrol. Earth Syst. Sci. 2008, 12, 1323–1337.
  20. Tobin, K.J.; Crow, W.T.; Dong, J.; Bennett, M.E. Validation of a new soil moisture product Soil MERGE or SMERGE. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 3351–3365.
  21. Tabatabaeenejad, A.; Burgin, M.; Duan, X.; Moghaddam, M. P-Band radar retrieval of subsurface soil moisture profile as a second-order polynomial: First AirMOSS results. IEEE Trans. Geosci. Remote Sens. 2015, 53, 645–658.
  22. Dorigo, W.A.; Xaver, A.; Vreugdenhil, M.; Gruber, A.; Hegyiová, A.; Sanchis-Dufau, A.D.; Zamojski, D.; Cordes, C.; Wagner, W.; Drusch, M. Global automated quality control of in situ soil moisture data from the International Soil Moisture Network. Vadose Zone J. 2013, 12, 1–21.
  23. Tobin, K.J.; Crow, W.T.; Bennett, M.E. Root zone soil moisture comparisons: AirMOSS, SMERGE, and SMAP. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
  24. Jing, W.; Zhang, P.; Zhao, X. Reconstructing monthly ECV global soil moisture with an improved spatial resolution. Water Resour. Manag. 2018, 32, 2523–2537.
  25. Yan, R.; Bai, R. A new approach for soil moisture downscaling in the presence of seasonal difference. Remote Sens. 2020, 12, 2818.
  26. Kovačević, J.; Cvijetinović, Ž.; Stančić, N.; Brodić, N.; Mihajlović, D. New downscaling approach using ESA CCI SM products for obtaining high resolution surface soil moisture. Remote Sens. 2020, 12, 1119.
  27. Xu, Y.; Wang, L.; Ma, Z.; Li, B.; Bartels, R.; Liu, C.; Zhang, X.; Dong, J. Spatially explicit model for statistical downscaling of satellite passive microwave soil moisture. IEEE Trans. Geosci. Remote Sens. 2020, 58, 1182–1191.
  28. Reichle, R.H.; De Lannoy, G.J.M.; Liu, Q.; Ardizzone, J.V.; Colliander, A.; Conaty, A.; Crow, W.; Jackson, T.J.; Jones, L.A.; Kimball, J.S.; et al. Assessment of the SMAP Level-4 surface and root-zone soil moisture product using in situ measurements. J. Hydrometeorol. 2017, 18, 2621–2645.
  29. Crow, W.T.; Berg, A.A.; Cosh, M.H.; Loew, A.; Mohanty, B.P.; Panciera, R.; de Rosnay, P.; Ryu, D.; Walker, J.P. Upscaling sparse ground-based soil moisture observations for the validation of coarse-resolution satellite soil moisture products. Rev. Geophys. 2012, 50, RG2002.
Figure 1. Locality map with era 1 sites in red, era 2 in blue, and era 3 in black. Upper-right inset map shows the study area location in Kansas and Oklahoma. Locations of ARM era 1 and era 3 in situ sites are indicated by red open circles and black circles, respectively. The Marena Oklahoma Soil Moisture Active Passive In Situ Testbed (MOISST) site is a solid blue rectangle and the Soil moisture Sensing Controller and oPtimal Estimator (SoilSCAPE) site is indicated by a blue triangle. Other in situ data from United States Climate Reference Network (USCRN) Stillwater sites are squares with a blue halo (eras 1 and 2) and the Soil Climate Analysis Network (SCAN) Abrams site with a red square (era 1).
Figure 2. Comparison of (a) Default SMERGE, (b) Downscaled SMERGE (1000 m resolution), and (c) AirMOSS (90 m resolution) for the MOISST site on 24 October 2012. Bounding coordinates are provided in (a).
Figure 3. Model performance in eras 1 and 2 regions examined with (a) ARM_1_1, (b) ARM_1_2, and (c) MOISST. Random Forest (RF) is blue, eXtreme Gradient Boosting (XGBoost) is red, and Gradient Boost (GBoost) is green.
Figure 4. Model performance in era 3 regions examined with (a) ARM_3_1, (b) ARM_3_2, and (c) ARM_3_3. RF is blue, XGBoost is red, and GBoost is green.
Figure 5. Objective metric as a function of spatial resolution for ARM Era 1. ARM_1_1 represented by squares and ARM_1_2 by triangles. In situ comparison with the USCRN Stillwater sites is indicated as a star. Only objective metrics for improved models were plotted. Blue indicates RF, red indicates XGBoost, and green indicates GBoost (also see legend).
Figure 6. Objective metric as a function of spatial resolution for the MOISST. Colors are as indicated in Figure 5 (also see legend).
Figure 7. Objective metric as a function of spatial resolution for the ARM Era 3. Colors are as indicated in Figure 5 (also see legend). ARM_3_1 represented by squares, ARM_3_2 by triangles, and ARM_3_3 by circles.
Table 1. United States Department of Energy Atmospheric Radiation Measurement (ARM) sites by era/region.
| Era_Region | In Situ Station |
| --- | --- |
| Era 1, Region 1 | Anthony, Ashton, Byron, Lamont-CF1, Maple City, Medford, Newkirk, Pawhuska |
| Era 1, Region 2 | Marshall, Morrison, Omega, Ringwood, Tyron, Waukomis |
| Era 3, Region 1 | Hillsboro, Towanda |
| Era 3, Region 2 | Ashton, Byron, Lamont-CF1 |
| Era 3, Region 3 | Elk Falls, Pawhuska, Tyro |
| Era 3, Region 4 | El Reno, Meeker |
Table 2. Physical characteristics of examined regions.
| Network_Era_Region | Clay (%) | Silt (%) | Sand (%) | Dominant Land Cover | Secondary Land Cover | Elevation (m) |
| --- | --- | --- | --- | --- | --- | --- |
| ARM_1_1 | 7–50 (30) | 1–67 (46) | 4–90 (24) | Cultivated Crops | Herbaceous | 244–449 (347) |
| ARM_1_2 | 8–51 (27) | 1–64 (37) | 6–90 (36) | Herbaceous | Cultivated Crops | 251–443 (332) |
| MOISST | 5–42 (24) | 4–65 (37) | 13–90 (40) | Herbaceous | Cultivated Crops | 267–377 (322) |
| SoilSCAPE | 13–23 (19) | 22–58 (45) | 19–65 (35) | Herbaceous | None | 520–535 (523) |
| ARM_3_1 | 3–48 (27) | 2–64 (36) | 3–95 (37) | Cultivated Crops | Herbaceous | 371–694 (517) |
| ARM_3_2 | 1–54 (27) | 1–67 (40) | 7–98 (34) | Herbaceous | Cultivated Crops | 280–677 (427) |
| ARM_3_3 | 17–57 (34) | 18–63 (44) | 3–62 (22) | Herbaceous | Hay/Pasture, Deciduous Forest | 202–475 (287) |
| ARM_3_4 | 7–50 (24) | 1–65 (35) | 6–90 (41) | Herbaceous | Cultivated Crops | 247–666 (413) |
Ranges of soil texture (%) and elevation (m) by region are given, with average values indicated in parentheses.
Table 3. Meteorological characteristics of study areas during warm season (April to October).
| Network_Era_Region | April Mean Temp (°C) | July Mean Temp (°C) | October Mean Temp (°C) | Warm Season Precipitation (mm) |
| --- | --- | --- | --- | --- |
| ARM_1_1 | 16.8 | 23.4 | 19.0 | 114.7 |
| ARM_1_2 | 17.3 | 24.5 | 19.2 | 98.0 |
| MOISST | 12.9 | 26.3 | 15.3 | 94.6 |
| SoilSCAPE | 15.3 | 27.0 | 15.8 | 77.0 |
| ARM_3_1 | 17.3 | 24.2 | 17.8 | 85.3 |
| ARM_3_2 | 19.7 | 25.2 | 18.2 | 83.3 |
| ARM_3_3 | 18.2 | 25.1 | 18.4 | 113.9 |
| ARM_3_4 | 19.2 | 25.7 | 19.6 | 86.8 |
Table 4. Data sources used for SoilMERGE (SMERGE) downscaling.
| Data Source | Description and Download URL (Accessed on 30 June 2023) |
| --- | --- |
| Static Variables | |
| Elevation | USGS Elevation Products (3DEP), 1/3 arc-sec DEM: TNM Download v2 (nationalmap.gov) |
| Soil Texture | Gridded National Soil Survey Geographic Database (gNATSGO), the ratio of sand, silt, and clay (Spatial Resolution = 30 m): https://www.nrcs.usda.gov/resources/data-and-reports/gridded-national-soil-survey-geographic-database-gnatsgo |
| Dynamic Variables | |
| SMERGE | Smerge-Noah-CCI root zone soil moisture 0-40 cm L4 daily 0.125 × 0.125 degree V2.0 (SMERGE_RZSM0_40CM): https://www.tamiu.edu/cees/smerge/data.shtml |
| Albedo | MCD15A3H v061 MODIS/Terra+Aqua Leaf Area Index/FPAR 4-Day L4 Global 500 m SIN Grid: https://lpdaac.usgs.gov/products/mcd43a3v006/ |
| LAI | MCD15A3H v061 MODIS/Terra+Aqua Leaf Area Index/FPAR 4-Day L4 Global 500 m SIN Grid: https://lpdaac.usgs.gov/products/mcd15a3hv061/ |
| NDVI | Temporally Smoothed Weekly AQUA Collect 6 (C6) Moderate Resolution Imaging Spectroradiometer (MODIS) Normalized Difference Vegetation Index (NDVI) at 250 m: Remote Sensing Phenology CONUS 250 m Smoothed NDVI (usgs.gov) |
| Temperature | Daily mean temperature, calculated as (tmax + tmin)/2 (Spatial Resolution = 4 km): https://ftp.prism.oregonstate.edu/daily/tmean/ |
Table 5. Dates with acceptable values for comparison.
| Network_Era | Dates |
| --- | --- |
| ARM_1 | 20160401, 20160421, 20160505, 20160524, 20160715, 20160729, 20160812, 20160826, 20160909, 20160923, 20161007, 20161021, 20170401, 20170415, 20170429, 20170513, 20170527, 20170610, 20170624, 20170708, 20170722, 20170805, 20170819, 20170902, 20170916, 20170930, 20171014, 20171028, 20180401, 20180415, 20180429, 20180513, 20180527, 20180610, 20180624, 20180708, 20180722, 20180805, 20180819, 20180902, 20180916, 20180930, 20181014, 20181028, 20190401, 20190415, 20190429 |
| ARM_3 | 20030401, 20030415, 20030429, 20030513, 20030527, 20030610, 20030624, 20030731, 20030814, 20030828, 20030910, 20031023, 20040401, 20040415, 20040430, 20040514, 20040528, 20040617, 20040701, 20040715, 20040729, 20040812, 20040826, 20040909, 20040923, 20041007, 20041021, 20050401, 20050415, 20050429, 20050513, 20050527, 20050610, 20050628, 20050713, 20050727, 20050822, 20050905, 20050920, 20051004, 20051018, 20060401, 20060415, 20060707, 20060806, 20060826, 20060909, 20060923, 20061007, 20061021, 20070620, 20070813 |
| MOISST_2 | 20121024, 20121027, 20121030, 20130617, 20130716, 20130719, 20130723, 20130927, 20140416, 20140418, 20140424, 20140708, 20140711, 20140715, 20141014, 20141017, 20141021, 20150416, 20150420, 20150807, 20150811, 20150814 |
| SoilSCAPE_2 | 20120421, 20120601, 20120608, 20120615, 20120628, 20120705, 20120712, 20120719, 20120726, 20120802, 20120817, 20120824, 20120831, 20120907, 20120922, 20120929, 20121006, 20121013, 20121020, 20121030, 20121106, 20121113, 20121120, 20121126, 20130407, 20130414, 2013042, 20130428, 20130505, 20130512, 20130519, 20130530, 20130606, 20130613, 20130626, 20130703, 20130710, 20130717, 20130724, 2013080, 20130810, 20131006, 20131013, 20131020, 20131110, 20131117, 20131124, 20141027, 20150401, 20150408, 20150415, 20150422, 20150429, 20150506, 20150513, 20150520, 20150527, 20150603, 20150610, 20150617, 20150624, 20150701, 20150708, 20150715, 20150722, 20150729, 20150805, 20150812, 20150819, 20150826, 20150902, 20150909, 20150916, 20150923, 20150930, 20151007, 20151014, 20151021, 20151028 |
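The dates in Table 5 are written as YYYYMMDD strings, so they can be parsed and screened programmatically. The sketch below, offered as an illustration rather than the study's own processing, parses a few entries copied from the SoilSCAPE_2 row, separates dates that fall in the April to October warm season from those that do not, and sets aside tokens that are not eight digits long. The function name `classify` and the variable names are hypothetical.

```python
from datetime import datetime

WARM_MONTHS = range(4, 11)  # April through October, the warm season named in the text

def classify(tokens):
    """Split YYYYMMDD tokens into warm-season dates, other dates, and malformed tokens."""
    warm, other, malformed = [], [], []
    for tok in tokens:
        tok = tok.strip()
        # strptime accepts a one-digit day, so enforce the 8-digit form explicitly
        if len(tok) != 8 or not tok.isdigit():
            malformed.append(tok)
            continue
        try:
            day = datetime.strptime(tok, "%Y%m%d").date()
        except ValueError:  # e.g. month 13 or day 32
            malformed.append(tok)
            continue
        (warm if day.month in WARM_MONTHS else other).append(day)
    return warm, other, malformed

# A few entries copied from the SoilSCAPE_2 row of Table 5
sample = "20130407, 20130414, 2013042, 20130428, 20121106".split(",")
warm, other, malformed = classify(sample)
print(len(warm), "warm-season dates;", len(other), "outside April-October;", malformed)
```

The explicit length check matters because `datetime.strptime` would otherwise read a seven-character token such as 2013042 as a valid date with a one-digit day.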
Table 6. Machine learning algorithm settings/hypertuning parameters.
Random Forest (RF)
  • Tuner = forest.tuner.RandomSearch(num_trials = 135, use_predefined_hps = True)
  • Winner_take_all = True
  • Categorical_algorithm = ‘CART’
  • Honest = True
  • Honest_fixed_separation = True
  • Honest_ratio_leaf_examples = 0.75
  • Bootstrap_size_ratio = 1.05
  • Adapt_bootstrap_size_ratio_for_maximum_training_duration = True
  • Keep_non_leaf_label_distribution = False
  • Max_depth = 9
eXtreme Gradient Boosting (XGBoost)
  • N_estimators = 500
  • Max_depth = 10
  • Tree_method = ‘hist’
Gradient Boost (GBoost)
  • N_estimators = 175
  • Max_depth = 10
  • Min_samples_split = 4
  • Learning_rate = 0.3
  • Loss = squared_error
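The XGBoost and GBoost settings in Table 6 can be written down as keyword dictionaries. The sketch below is an illustration under stated assumptions: the table does not name the software packages, so the lower-case keyword names and the use of scikit-learn's `GradientBoostingRegressor` for GBoost are assumptions, and the helper `build_gboost` is hypothetical. The Random Forest tuner settings are not reproduced here.

```python
# Values transcribed from Table 6. The lower-case keyword names follow the
# scikit-learn and XGBoost Python APIs, which is an assumption about the
# libraries used in the study.
XGBOOST_PARAMS = {"n_estimators": 500, "max_depth": 10, "tree_method": "hist"}

GBOOST_PARAMS = {
    "n_estimators": 175,
    "max_depth": 10,
    "min_samples_split": 4,
    "learning_rate": 0.3,
    "loss": "squared_error",
}

def build_gboost(params=GBOOST_PARAMS):
    """Return a scikit-learn gradient boosting regressor, or None if
    scikit-learn is not installed (assumed library, see lead-in)."""
    try:
        from sklearn.ensemble import GradientBoostingRegressor
    except ImportError:
        return None
    return GradientBoostingRegressor(**params)

model = build_gboost()
print("GBoost model:", "not built (scikit-learn missing)" if model is None else model)
```

Under the same assumption, `XGBOOST_PARAMS` could be passed to `xgboost.XGBRegressor` in the same way.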
Table 7. Model sensitivity results.
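| Network_Era_Region | Model Type | Date | Albedo | Clay | Aspect | Temp | Elev | NDVI | LAI | Sand | Silt | Slope |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ARM_1_1 | RF | H | H | H | M | H | M | L | M | M | L | L |
| ARM_1_1 | XGBoost | H | L | M | M | H | M | L | L | H | M | M |
| ARM_1_1 | GBoost | H | L | M | M | H | M | L | L | H | M | M |
| ARM_1_2 | RF | H | H | M | M | H | M | L | M | L | H | L |
| ARM_1_2 | XGBoost | H | L | M | M | H | M | L | L | H | H | M |
| ARM_1_2 | GBoost | H | L | M | M | H | M | L | L | H | H | M |
| MOISST | RF | H | H | H | M | M | L | M | L | M | M | L |
| MOISST | XGBoost | H | M | H | L | H | M | L | L | M | M | M |
| MOISST | GBoost | H | M | H | M | H | M | L | L | M | M | L |
| SoilSCAPE | RF | H | H | L | M | M | M | H | H | L | L | M |
| SoilSCAPE | XGBoost | H | H | M | M | H | M | M | H | L | L | L |
| SoilSCAPE | GBoost | H | H | M | M | H | M | M | H | L | L | M |
| ARM_3_1 | RF | H | H | H | M | L | H | L | M | L | M | L |
| ARM_3_1 | XGBoost | H | L | M | M | M | H | L | L | M | H | L |
| ARM_3_1 | GBoost | H | L | M | M | M | H | L | L | M | H | M |
| ARM_3_2 | RF | H | H | H | M | L | H | M | L | M | L | L |
| ARM_3_2 | XGBoost | H | L | M | M | H | H | L | L | H | H | M |
| ARM_3_2 | GBoost | H | L | M | M | H | H | L | L | H | M | M |
| ARM_3_3 | RF | H | H | H | M | M | H | M | M | L | L | L |
| ARM_3_3 | XGBoost | H | L | H | M | H | H | L | L | M | M | M |
| ARM_3_3 | GBoost | H | L | M | M | H | H | L | L | M | M | M |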
H represents high sensitivity, M is medium sensitivity, and L is low sensitivity.
Table 8. SoilSCAPE results.
| Model Type | Resolution (m) | Default SMERGE r | Downscaled SMERGE r | Default SMERGE ubRMSE | Downscaled SMERGE ubRMSE | Objective Metric |
| --- | --- | --- | --- | --- | --- | --- |
| RF | 100 | 0.4805 | 0.5217 | 0.1127 | 0.1122 | 0.9061 |
| RF | 30 | 0.4662 | 0.5169 | 0.1210 | 0.1207 | 0.7738 |
| XGBoost | 100 | 0.4805 | 0.4837 | 0.1127 | 0.1126 | 0.1200 |
| XGBoost | 30 | 0.4662 | 0.4665 | 0.1210 | 0.1210 | 0.0188 |
| GBoost | 100 | 0.4805 | 0.4809 | 0.1127 | 0.1127 | 0.0137 |
| GBoost | 30 | 0.4662 | 0.4662 | 0.1210 | 0.1210 | 0 |
Best model results are indicated in bold.
Table 9. Reasons for incompleteness in datasets.
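| Network_Era_Region or Site | Resolution (m) | Percent Complete | Data Used for Training | Incomplete MOISST/In Situ Data | Missing Albedo/LAI | Percentage SMERGE Interpolated |
| --- | --- | --- | --- | --- | --- | --- |
| ARM_1_1 | 1400 | 100% | - | 0% | 0% | 10% |
| ARM_1_1 | 1000 | 100% | - | 0% | 0% | 11% |
| ARM_1_1 | 700 | 100% | - | 0% | 0% | 16% |
| ARM_1_1 | 400 | 100% | - | 0% | 0% | 17% |
| SCAN_1_Abrams | 1000 | 34% | 66% | 0% | 0% | - |
| SCAN_1_Abrams | 700 | 32% | 68% | 0% | 0% | - |
| ARM_1_2 | 1400 | 100% | - | 0% | 0% | 9% |
| ARM_1_2 | 1000 | 100% | - | 0% | 0% | 9% |
| ARM_1_2 | 700 | 100% | - | 0% | 0% | 11% |
| ARM_1_2 | 400 | 100% | - | 0% | 0% | 14% |
| USCRN_1_Stillwater Sites | 1400 | 29% | 71% | 0% | 0% | - |
| USCRN_1_Stillwater Sites | 1000 | 25% | 75% | 0% | 0% | - |
| USCRN_1_Stillwater Sites | 700 | 31% | 69% | 0% | 0% | - |
| AirMOSS_2_MOISST | 3000 | 82% | - | 18% | 0% | 1% |
| AirMOSS_2_MOISST | 2000 | 84% | - | 16% | 0% | 1% |
| AirMOSS_2_MOISST | 1400 | 83% | - | 17% | 0% | 1% |
| AirMOSS_2_MOISST | 1000 | 82% | - | 18% | 0% | 1% |
| AirMOSS_2_MOISST | 700 | 83% | - | 17% | 0% | 1% |
| AirMOSS_2_MOISST | 400 | 81% | - | 19% | 0% | 1% |
| USCRN_2_Stillwater 5WNW | 700 | 46% | 54% | 0% | 0% | - |
| SoilSCAPE_2 | 100 | 65% | - | 0% | 0% | 10% |
| SoilSCAPE_2 | 30 | 72% | - | 0% | 0% | 10% |
| ARM_3_1 | 1400 | 100% | - | 0% | 0% | 24% |
| ARM_3_1 | 1000 | 93% | - | 0% | 7% | 24% |
| ARM_3_1 | 700 | 75% | - | 0% | 25% | 25% |
| ARM_3_1 | 400 | 63% | - | 0% | 37% | 25% |
| ARM_3_2 | 1400 | 100% | - | 0% | 0% | 29% |
| ARM_3_2 | 1000 | 92% | - | 0% | 8% | 29% |
| ARM_3_2 | 700 | 77% | - | 0% | 23% | 29% |
| ARM_3_2 | 400 | 77% | - | 0% | 23% | 29% |
| ARM_3_3 | 1400 | 92% | - | 7% | 1% | 36% |
| ARM_3_3 | 1000 | 92% | - | 7% | 1% | 36% |
| ARM_3_3 | 700 | 91% | - | 7% | 2% | 36% |
| ARM_3_3 | 400 | 77% | - | 6% | 17% | 36% |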
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Tobin, K.; Sanchez, A.; Esparza, D.; Garcia, M.; Ganta, D.; Bennett, M. Machine Learning Downscaling of SoilMERGE in the United States Southern Great Plains. Remote Sens. 2023, 15, 5120. https://doi.org/10.3390/rs15215120
