Machine Learning Downscaling of SoilMERGE in the United States Southern Great Plains

SoilMERGE (SMERGE) is a root-zone soil moisture (RZSM) product that covers the entire continental United States and spans 1978 to 2019. The coarse resolution of SMERGE (0.125 degree) limits this product's utility. Three machine learning techniques, Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Gradient Boost (GBoost), were used to downscale SMERGE to spatial resolutions straddling the field-scale domain (100 to 3000 m). The study area was northern Oklahoma and southern Kansas. To validate the downscaled results, in situ data from four sources were used: the United States Department of Energy Atmospheric Radiation Measurement (ARM) observatory, the United States Climate Reference Network (USCRN), the Soil Climate Analysis Network (SCAN), and the Soil moisture Sensing Controller and oPtimal Estimator (SoilSCAPE). In addition, RZSM retrievals from NASA's Airborne Microwave Observatory of Subcanopy and Surface (AirMOSS) campaign provided a nearly spatially continuous comparison. Three periods were examined: era 1 (2016 to 2019), era 2 (2012 to 2015), and era 3 (2003 to 2007). During eras 1 and 2, RF outperformed XGBoost and GBoost, whereas during era 3 no model dominated. Performance was better during eras 1 and 2 than during the pre-L-band era 3. Across all eras, regions, and models, improvements realized from downscaling included an increase in correlation of 0.03 to 0.42 and a change in unbiased root mean square error (ubRMSE) of −0.0005 to −0.0118 m³/m³. This study demonstrates the feasibility of SMERGE downscaling, opening the prospect of a long-term RZSM dataset at a more desirable field-scale resolution with the potential to support diverse hydrometeorological and agricultural applications.


Introduction
Satellite-derived soil moisture products, such as Soil Moisture Active Passive (SMAP) and Soil Moisture and Ocean Salinity (SMOS), have revolutionized hydrological and agricultural studies by providing estimates of surface soil moisture (SM) worldwide [1,2]. However, their use is hindered by their coarse spatial resolution (0.25-0.50 degrees) and their inability to sense below the surface skin or the top 5 cm layer. While products like the European Space Agency Climate Change Initiative (ESA-CCI) have blended satellite retrievals to enhance observation frequency [3], there remains a pressing need for higher spatial resolution data to support hydrological and ecological applications [4] and accurate drought monitoring [5]. To bridge this gap, advanced machine learning (ML) techniques have emerged as promising tools for downscaling satellite-based soil moisture data. ML offers advantages in handling large and noisy datasets from dynamic and non-linear systems [6-10]. In recent years, studies have successfully employed ML algorithms such as Random Forest (RF), gradient boost decision tree, and eXtreme Gradient Boosting (XGBoost) to enhance the spatial resolution and accuracy of surface soil moisture estimates [11-16]. Notable improvements have included increased correlations and decreased unbiased root mean square error [11,14-16].

While surface soil moisture retrievals are valuable, the scientific community increasingly recognizes the importance of deeper root-zone soil moisture (RZSM) data, as it directly influences agricultural productivity and groundwater interactions. However, directly sensing RZSM remains a challenge. Efforts like NASA's SigNals of Opportunity: P-band Investigation (SNoOPI) plan to use the penetrating P-band to retrieve RZSM, but at present, RZSM is often inferred from surface measurements. Two common approaches used to estimate RZSM are the Ensemble Kalman Filter [17] and the Exponential Filter [18,19], which, while useful, are not as robust as direct retrievals of soil moisture.
In this context, the SoilMERGE (SMERGE) product stands as a notable endeavor, covering the continental United States and spanning multiple decades (1979 to 2019) [20]. Like other products, SMERGE faces spatial resolution constraints (0.125 degrees); it is available at a daily time step. SMERGE provides an overall estimate of RZSM between 0 and 40 cm and is based on the fusion of NLDAS Noah-2 land surface model output with surface satellite retrievals from the European Space Agency (ESA) Climate Change Initiative (CCI). In this study, we address the following critical questions related to the downscaling of SMERGE: (1) Which ML technique is optimal, and (2) which downscaling resolution provides the most robust results? To answer these questions, we explore three ML approaches and compare downscaled SMERGE with diverse datasets, including in situ measurements and airborne radar estimates of RZSM from the Marena Oklahoma Soil Moisture Active Passive In Situ Testbed (MOISST) site associated with NASA's Airborne Microwave Observatory of Subcanopy and Surface (AirMOSS) campaign [21]. The analysis spans three distinct eras (2016 to 2019, 2012 to 2015, and 2003 to 2007) and aims to achieve finer spatial resolution for SMERGE, thereby enabling more accurate RZSM estimation that can support diverse applications.

Study Areas
Figure 1 provides an overview of this study's focus areas within north-central Oklahoma and south-central Kansas. This is a location where SMERGE exhibited robust performance [20], making it an ideal candidate to explore downscaling. Rectangular areas in Figure 1 reflect zones where SMERGE was downscaled. During era 1 (2016 to 2019), two regions (in red) associated with the United States Department of Energy Atmospheric Radiation Measurement (ARM) observatory were examined (ARM_1_1, ARM_1_2). The naming convention specifies Network_Era_Region, where ARM_1_1 represents ARM era 1, region 1. Data from MOISST and the Soil moisture Sensing Controller and oPtimal Estimator (SoilSCAPE) comprise the era 2 observations (2012 to 2015), which are indicated in blue. Excessive missing data precluded the use of ARM during era 2. During era 3 (2003 to 2007; in black), ARM was divided into four distinct zones (ARM_3_1, ARM_3_2, ARM_3_3, and ARM_3_4). Table 1 indicates the ARM stations utilized in each region. The locations of the United States Climate Reference Network (USCRN; Stillwater 2W, Stillwater 5 WNW) and Soil Climate Analysis Network (SCAN; Abrams) sensors are also indicated in Figure 1.

Table 2 provides an overview of the physical characteristics of these focus areas. Clay, silt, and sand values exhibit great variability in all areas. On average, all areas have a similar soil texture, with near-equal clay, silt, and sand values representing an overall loamy texture. Elevations generally define a moderate relief within the regions. ARM_1_1, ARM_1_2, ARM_3_3, and MOISST have average elevations of less than 400 m, whereas ARM_3_1, ARM_3_2, ARM_3_4, and SoilSCAPE have higher elevations. In terms of land use and land cover (LULC), herbaceous cover and cultivated crops dominate all examined areas except for ARM_3_3 (Table 2), which, instead of cultivated crops, has significant hay/pasture and deciduous forest.
Table 3 describes the meteorological conditions within the study areas. Values are generally consistent. Lower spring and fall temperatures were recorded during era 2, as represented by MOISST and SoilSCAPE, when compared against eras 1 and 3. A pronounced drying trend from east to west is present in the study area. This is reflected by the higher warm season (April to October) precipitation values present in ARM_3_3, which is further east than the other ARM era 3 regions. Additionally, during era 2, MOISST had higher precipitation values than SoilSCAPE, which is located to the west.

Methodology
This study's methodology was implemented in five steps: (1) gathering data; (2) selecting dates to use for validation; (3) executing machine learning downscaling; (4) evaluating each model run using objective metrics; and (5) comparing performance between different models and spatial resolutions.

Data Gathering
RZSM data used in this study include SMERGE version 2.0 (Table 4). The downscaling approach described below incorporated independent variables (Table 4) that are both static (soil texture, elevation, aspect, slope) and dynamic (Normalized Difference Vegetation Index, NDVI; albedo; Leaf Area Index, LAI; surface temperature), following the approach of [13]. Note that a one-month lag was used for NDVI and LAI, consistent with the approach of [20]. The pre-processing of the above datasets was conducted in ArcGIS Pro, where values were extracted into points with a 30 m resolution. The ArcGIS AggregatePoints function was used to aggregate points into grid files at the different resolutions examined in this study.
For evaluation, both in situ and AirMOSS data were used. In situ data from the ARM, SCAN, SoilSCAPE, and USCRN networks were gathered from the International Soil Moisture Network portal [22], and hourly results were converted into a daily overall estimate of soil moisture between 0 and 40 cm using a proportional weighting scheme. For example, a common configuration at ARM was to have sensors at 5 cm, 15 cm, 25 cm, and 35 cm, resulting in a corresponding weight for each sensor of 18.75%, 31.25%, 25.00%, and 25.00%, respectively. AirMOSS L2/3 RZSM retrievals using the University of Southern California algorithm were applied at MOISST as described by [21].
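The weighting described above can be illustrated with a short sketch. It uses the four ARM sensor depths and weights quoted in the text; the function name, the simple daily mean of hourly readings, and the sample soil moisture values are assumptions made only for illustration and are not taken from the study.

```python
# Sensor depths (cm) mapped to the weights quoted in the text for a common ARM setup.
ARM_WEIGHTS = {5: 0.1875, 15: 0.3125, 25: 0.25, 35: 0.25}


def daily_rzsm(hourly_by_depth, weights=ARM_WEIGHTS):
    """Average hourly readings per depth, then combine depths with the given weights.

    Assumption: the daily value at each depth is a plain mean of the hourly readings.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    total = 0.0
    for depth, weight in weights.items():
        readings = hourly_by_depth[depth]
        daily_mean = sum(readings) / len(readings)
        total += weight * daily_mean
    return total


# Hypothetical volumetric soil moisture (m3/m3), shortened to three readings per depth.
example_day = {
    5: [0.20, 0.22, 0.24],
    15: [0.26, 0.26, 0.26],
    25: [0.30, 0.30, 0.30],
    35: [0.32, 0.32, 0.32],
}
rzsm = daily_rzsm(example_day)
```

Other sensor configurations (for example, the SoilSCAPE or USCRN depths) would need their own weight table; the weights for those networks are not given in the text, so none are assumed here.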

Date Selection and Data Organization
This study followed the approach of [13], which indicated that spatial variation is more important than temporal variation. With a focus on the warm season (April to October), dates were selected that were separated from each other by at least seven days. In situ data were also screened using the methods of [22], and dates with anomalous spikes and plateaus were rejected. For all in situ datasets, if there were fewer than ten available dates with valid measurements, then that site was not utilized. Also, if a site had multiple sensors, the sensor with the highest correlation with the baseline or default version of SMERGE was used for analysis. In situ datasets with a correlation < 0.5 with default SMERGE were rejected.
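A minimal sketch of these selection rules follows. The text does not state how dates were chosen among candidates, so the greedy earliest-first pass is an assumption, as are the function names and the sample dates; only the warm-season window, the seven-day spacing, the ten-date minimum, and the 0.5 correlation threshold come from the text.

```python
from datetime import date


def select_dates(candidates, min_gap_days=7, first_month=4, last_month=10):
    """Keep warm-season dates that fall at least min_gap_days after the last kept date.

    Assumption: a greedy pass in chronological order; the study may have used another rule.
    """
    selected = []
    for day in sorted(candidates):
        if not (first_month <= day.month <= last_month):
            continue  # outside April to October
        if not selected or (day - selected[-1]).days >= min_gap_days:
            selected.append(day)
    return selected


def keep_site(n_valid_dates, r_with_default, min_dates=10, min_r=0.5):
    """Apply the two site-level screens: enough valid dates and r >= 0.5 with default SMERGE."""
    return n_valid_dates >= min_dates and r_with_default >= min_r


# Hypothetical candidate dates for one site.
candidates = [
    date(2016, 3, 28), date(2016, 4, 1), date(2016, 4, 5),
    date(2016, 4, 8), date(2016, 4, 20), date(2016, 11, 2),
]
chosen = select_dates(candidates)
```

Here `chosen` holds 1, 8, and 20 April 2016: the March and November dates fall outside the warm season, and 5 April is only four days after the previously kept date.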
At ARM, era 1 (2016 to 2019) and era 3 (2003 to 2007) were examined. Era 1 was divided into two regions, and era 3 consisted of four regions (Figure 1). At the ARM site, a total of 47 and 52 dates were examined during eras 1 and 3, respectively (Table 5). At the SoilSCAPE site, in situ estimates of RZSM came from a field-scale cluster of sensors, or nodes, that operated from 2012 to 2015 (era 2). This study focused on a small site near Canton, Oklahoma, between latitude 36.000 to 36.003° N and longitude 98.628 to 98.633° W (Figure 1). Daily soil moisture measurements were obtained at depths of 4 cm, 13 cm, and 30 or 40 cm. These values were weighted to determine an overall estimate of RZSM in the top 40 cm. Seventy-nine dates were used for SMERGE downscaling (Table 5) at SoilSCAPE. Note that in situ data are missing for some of the selected dates. These dates were retained intentionally to provide more data to support the ML analysis. In effect, the dataset used at SoilSCAPE was inflated temporally to compensate for this site's small spatial footprint.
The AirMOSS project (2012 to 2015; era 2) directly sensed RZSM using P-band radar, which operates at a low frequency (420-440 MHz). RZSM is estimated up to a depth of 40 cm. This study focuses on the MOISST site. This site is unique in that it affords a continuous comparison with the downscaled SMERGE estimate, unlike the discrete in situ measurements from ARM and SoilSCAPE. Observations focused on the 13 SMERGE grids that had greater than 75% coverage of AirMOSS retrievals [23] (Figure 2a) and span latitude 36.000 to 36.250° N and longitude 97.000 to 98.125° W (Figure 1). The 22 dates with acceptable AirMOSS retrievals from MOISST are listed in Table 5.

During eras 1 and 2, in situ data from the Stillwater 2W and 5 WNW stations from USCRN and Abrams from SCAN were used for additional evaluation (Figure 1). Daily soil moisture measurements were obtained at depths of 5 cm, 10 cm, 20 cm, and 50 cm and were used to estimate RZSM between the surface and 40 cm depth using a proportional weighting scheme.

Machine Learning Implementation
Random Forest (RF) served as our baseline machine learning algorithm; it has been used to downscale soil moisture estimates in many studies (e.g., [12,13,24-27]). We also executed eXtreme Gradient Boosting (XGBoost) [12] and Gradient Boost (GBoost) [12]. The dataset was randomly split (70% training and 30% testing) using the scikit-learn train_test_split function, and rows that corresponded to ARM and SoilSCAPE in situ data sites were removed from the training set and inserted into the testing set for later verification purposes. Spatial resolutions from 400 to 1400 m for ARM, 400 to 3000 m for MOISST, and 30 to 100 m for SoilSCAPE were tested to determine the optimum spatial resolution for each era/region. The average values of the independent variables within grids at varying resolutions were obtained with zonal statistics in ArcGIS Pro. To implement the downscaling of SMERGE, we used TensorFlow Decision Forests' Random Forest implementation, the Distributed (Deep) Machine Learning Community (DMLC)'s XGBoost, and scikit-learn's Gradient Boost (GBoost); all of these were set to run as regressors. The hyperparameter settings used for these ML models are specified in Table 6. Hyperparameter tuning for the three models was conducted iteratively: a range of tuning values was given, and ideal parameters were found over hundreds of runs. For the TensorFlow Random Forest model, the tuner helper function was included in the iterations.
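The split-then-hold-out step can be sketched as follows. To keep the example dependency-free, Python's `random` module stands in for scikit-learn's `train_test_split`; the row layout, the `cell_id` key, and the function name are hypothetical and are not taken from the study's code.

```python
import random


def split_with_insitu_holdout(rows, insitu_ids, test_fraction=0.30, seed=0):
    """Randomly split rows 70/30, then move every in situ row from train to test.

    Stand-in for scikit-learn's train_test_split; "cell_id" is a hypothetical key.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]
    # Rows at validation sites must never be seen during training.
    moved = [row for row in train if row["cell_id"] in insitu_ids]
    train = [row for row in train if row["cell_id"] not in insitu_ids]
    return train, test + moved


# Hypothetical table: one row per fine-resolution grid cell.
rows = [{"cell_id": i, "smerge": 0.20 + 0.001 * i} for i in range(100)]
insitu_ids = {3, 42}
train, test = split_with_insitu_holdout(rows, insitu_ids)
```

Because in situ rows are moved rather than copied, the test set ends up slightly larger than 30% whenever any of those rows first landed in the training portion.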
Independent variable sensitivity was examined with metrics customized for each model, and the results of this analysis are summarized in Table 7. Two approaches were used to gauge model sensitivity because of the different model structures. RF sensitivity was gauged using TensorFlow's Inverse Mean Minimum Depth (IMMD). The IMMD for each independent variable was calculated by tracking the depth of the first occurrence of a feature in each tree in the forest. This depth was averaged for every feature and then inverted, so that higher IMMD values reflect greater sensitivity. For XGBoost and GBoost, independent variable sensitivity was evaluated using the model interpretability tool SHapley Additive exPlanations (SHAP). We opted not to use SHAP for RF because this tool is not compatible with the TensorFlow Decision Forests library. SHAP explains the individual variable contributions to the predictions made by the model. The higher the SHAP value, the higher the importance of that feature towards the given prediction.

Because of the different tools used to evaluate sensitivity, a direct numerical comparison between IMMD and SHAP was not feasible. Therefore, sensitivity was based on relative ranking. For a given model and spatial resolution, the most sensitive variable was assigned 1, and subsequent variables were assigned a number based on their relative ranking, with the least sensitive variable being 11. Variables that ranked between 1 and 4 were deemed highly sensitive. Rankings between 5 and 8 were designated as moderately sensitive, and those variables ranked greater than 8 were considered to have a low sensitivity. Table 7 shows the average rankings across the different spatial resolutions run for a given model, with the sensitivity indicated as high (H), medium (M), or low (L).
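The ranking scheme can be sketched as below. The rank thresholds (1 to 4, 5 to 8, above 8) follow the text; everything else is an assumption for illustration: the `1 / (1 + mean depth)` form of the inverse (chosen so that a root split at depth 0 does not divide by zero), the function names, and the per-tree depths. The sketch handles integer ranks only, since the text does not say how fractional average ranks were classed.

```python
def inverse_mean_min_depth(first_depths):
    """Score from the depth of a feature's first split in each tree (root = 0).

    Assumption: inverse taken as 1 / (1 + mean depth) to avoid division by zero.
    """
    mean_depth = sum(first_depths) / len(first_depths)
    return 1.0 / (1.0 + mean_depth)


def rank_variables(scores):
    """Return {variable: rank}, where rank 1 is the most sensitive (largest score)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def sensitivity_class(rank):
    """Map an integer rank (1 to 11) to the H/M/L classes described in the text."""
    if rank <= 4:
        return "H"
    if rank <= 8:
        return "M"
    return "L"


# Hypothetical first-split depths in a three-tree forest.
depths = {"date": [0, 1, 0], "ndvi": [2, 3, 1], "slope": [5, 4, 6]}
scores = {name: inverse_mean_min_depth(d) for name, d in depths.items()}
ranks = rank_variables(scores)
```

The same `rank_variables` and `sensitivity_class` helpers would apply to any importance score where larger means more influential, which is what allows IMMD-based and SHAP-based results to share one table.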

Model Evaluation
A comparison of default and downscaled SMERGE at different spatial resolutions was made against in situ data from the ARM and SoilSCAPE sites and the SCAN and USCRN sensors. In addition, downscaled SMERGE was evaluated against AirMOSS data from the MOISST site. Standard evaluation metrics (correlation, r; unbiased root mean square error, ubRMSE [m³/m³]) were utilized in these comparisons. Delta r and Delta ubRMSE were used to compare relative performance between default and downscaled versions of SMERGE and are defined as follows:

$$\Delta r = r_{\text{Downscaled SMERGE}} - r_{\text{Default SMERGE}} \quad (1)$$

$$\Delta \text{ubRMSE} = \text{ubRMSE}_{\text{Downscaled SMERGE}} - \text{ubRMSE}_{\text{Default SMERGE}} \quad (2)$$

where Downscaled SMERGE is the value obtained from ML and Default SMERGE is the value obtained from the original 12.5 km resolution product. Delta values represent a specific value from a model type and spatial resolution combination (e.g., RF, 400 m) executed within a given network/era/region (e.g., ARM_1_1). Delta r and Delta ubRMSE values provided insight into whether the downscaled model results exhibited improvement (or degradation) compared with the Default SMERGE product. Improvement for r is defined as having Downscaled SMERGE > Default SMERGE (positive Delta r), and for ubRMSE as having Default SMERGE > Downscaled SMERGE (negative Delta ubRMSE).
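As a hedged illustration of these metrics, the sketch below computes r, ubRMSE, and the two Delta values as downscaled minus default. The ubRMSE shown is the standard anomaly-based form (both series have their means removed before the RMSE is taken), which is assumed to match the study's usage; the function names and the three sample series are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def ubrmse(estimate, observed):
    """RMSE after removing each series' mean, so a constant bias contributes nothing."""
    n = len(estimate)
    mean_e, mean_o = sum(estimate) / n, sum(observed) / n
    squared = sum(((e - mean_e) - (o - mean_o)) ** 2 for e, o in zip(estimate, observed))
    return math.sqrt(squared / n)


def deltas(default, downscaled, observed):
    """Return (Delta r, Delta ubRMSE), each computed as downscaled minus default."""
    delta_r = pearson_r(downscaled, observed) - pearson_r(default, observed)
    delta_ubrmse = ubrmse(downscaled, observed) - ubrmse(default, observed)
    return delta_r, delta_ubrmse


# Hypothetical RZSM series (m3/m3) on five dates.
observed = [0.20, 0.25, 0.30, 0.22, 0.28]
default = [0.24, 0.24, 0.27, 0.26, 0.25]
downscaled = [0.21, 0.25, 0.29, 0.23, 0.27]
delta_r, delta_ubrmse = deltas(default, downscaled, observed)
```

With these made-up series, Delta r is positive and Delta ubRMSE is negative, which is the sign pattern the text treats as improvement in both metrics.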

Comparisons within a Region/Era
To facilitate comparison across the different models and spatial resolutions examined, an objective metric was derived that combined the correlation and ubRMSE metrics. The objective metric varies between zero and one and is defined as indicated below:

$$\text{If } \Delta\text{ubRMSE} > 0 \text{ then } b = 0 \quad (5)$$

$$\text{Objective Metric} = a + b \quad (7)$$

where maximum Delta r and minimum Delta ubRMSE are the maximum and minimum values, respectively, obtained within an era/region; a is the component of the objective metric derived from Delta r; and b is the component of the objective metric derived from Delta ubRMSE. A value of one indicates that a model type and spatial resolution combination had the best possible r and ubRMSE values within an era/region. Zero indicates that downscaling yielded no improvement based on both the r and ubRMSE metrics. Models with an objective metric between 0 and 0.8 were described as slightly improved. Objective metrics between 0.8 and 1.0 were designated as high performing. If a downscaled model showed improvement in only one metric (that is, in r but not ubRMSE, or vice versa), it was deemed non-improved.
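One reading of the objective metric that is consistent with this description can be sketched as follows. The scaling is an assumption: each component is taken as half of the ratio of the model's Delta value to the best Delta value in the era/region, so that a + b reaches one only when both are the best. The placement of the 0.8 boundary in the high performing class, the function names, and the sample numbers are likewise assumptions.

```python
def objective_metric(delta_r, delta_ubrmse, max_delta_r, min_delta_ubrmse):
    """Combine the two Delta values into a score between 0 and 1.

    Assumption: a = 0.5 * delta_r / max_delta_r and
    b = 0.5 * delta_ubrmse / min_delta_ubrmse, each set to 0 when not improved.
    """
    a = 0.0
    if delta_r > 0 and max_delta_r > 0:
        a = 0.5 * delta_r / max_delta_r
    b = 0.0
    if delta_ubrmse < 0 and min_delta_ubrmse < 0:
        b = 0.5 * delta_ubrmse / min_delta_ubrmse
    return a + b


def performance_class(delta_r, delta_ubrmse, metric):
    """Label a run; improvement in only one metric counts as non-improved."""
    if delta_r <= 0 or delta_ubrmse >= 0:
        return "non-improved"
    return "high performing" if metric >= 0.8 else "slightly improved"


# Hypothetical best values within one era/region.
best_r, best_ub = 0.42, -0.0118
top = objective_metric(0.42, -0.0118, best_r, best_ub)
middle = objective_metric(0.21, -0.0059, best_r, best_ub)
one_sided = objective_metric(0.21, 0.002, best_r, best_ub)
```

Under these assumptions the first run scores 1.0 and is high performing, the second scores 0.5 and is slightly improved, and the third is non-improved despite a positive score because its ubRMSE got worse.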

Results
Model sensitivity results are summarized in Table 7. The date and aspect independent variables generally have consistently high and moderate sensitivity, respectively. Elevation mostly exhibits moderate sensitivity, except for ARM era 3, where this variable has a high sensitivity. Albedo and NDVI have a higher sensitivity for RF compared to the XGBoost and GBoost models. For albedo, the exception is in SoilSCAPE, which has a high sensitivity for all models. Also, during ARM era 1 and in ARM_3_1, NDVI had a uniformly low sensitivity. LAI also generally had a higher sensitivity for RF compared with the XGBoost and GBoost models. In SoilSCAPE, LAI sensitivity was uniformly high, and in MOISST and ARM_3_2, low sensitivity was recorded for all models. Conversely, temperature exhibited a lower sensitivity for RF models compared with XGBoost and GBoost, but during ARM era 1, a uniformly high sensitivity was noted for all models. Within ARM, slope, sand, and silt mostly had a higher sensitivity for the XGBoost and GBoost models than that recorded by RF. In MOISST and SoilSCAPE, slope had low to moderate sensitivity without a consistent pattern between downscaling models, whereas sand and silt had uniformly moderate sensitivity within MOISST and low sensitivity in SoilSCAPE. Finally, clay does not have a consistent relationship among models and generally has a moderate to high sensitivity.

Model sensitivity was also examined as a function of spatial resolution. In general, sensitivity was consistent across spatial resolutions. The exception is the RF model at the MOISST site, where silt exhibited a marked increase in sensitivity at coarser spatial resolutions.
Figure 3 indicates the number of models with a specified performance from eras 1 and 2. A total of 12 models were executed in each ARM region (four using each ML approach). At ARM, during era 1, 63% of the executed models resulted in improved performance (Figure 3a,b). High performing models accounted for 17% of the executed models and included all three ML approaches. MOISST and SoilSCAPE (era 2), which had a total of 18 and 6 models executed, respectively, outperformed ARM (era 1). For MOISST, 100% of the downscaling attempts yielded improvements compared to Default SMERGE (Figure 3c). At MOISST, 44% of the models were high performing, with a slight preference for RF over the other model types. Figure 2b shows that the spatial distribution of Downscaled SMERGE is similar to that of the AirMOSS data (Figure 2c) from MOISST. At SoilSCAPE the same was true except for a single attempt, resulting in an 83% improvement rate (Table 8). However, only one RF model (17%) from SoilSCAPE was high performing.

Figure 4 summarizes the results from ARM era 3, with only 37% of downscaling attempts yielding improvements. In ARM_3_4, not a single model yielded an improvement compared with Default SMERGE. ARM_3_1, ARM_3_2, and ARM_3_3 had success rates of 42%, 50%, and 58%, respectively. Overall, only 10% of the models from ARM era 3 were high performing.
Figure 5 and Table A1 illustrate ARM era 1 results as a function of spatial resolution. In ARM_1_1, RF outperformed the other ML approaches. Optimal performance for RF was recorded at 400 m, with declining objective metrics obtained at coarser spatial resolutions. XGBoost and GBoost yielded inconsistent results, and these approaches recorded non-improvement at 700 and 1000 m. When compared against the SCAN Abrams site, none of the models yielded an improvement. Conversely, all RF models were non-improved in ARM_1_2. XGBoost and GBoost exhibited erratic behavior as a function of spatial resolution, with the best results obtained by XGBoost at a 400 m resolution. Interestingly, comparison with USCRN data from the Stillwater sites within ARM_1_2 yielded the opposite result, with improvement realized only with RF and not with the XGBoost and GBoost models.
Figure 6 and Table A2 depict the results from MOISST (era 2). RF recorded the highest objective metric at a 400 m spatial resolution. Comparison against an in situ sensor at the USCRN Stillwater 5 WNW site also showed RF outperforming the other methods. RF exhibited a different trend in performance as a function of spatial resolution compared with XGBoost and GBoost. Maximum objective metrics for RF were noted at 400 and 3000 m, with lessened performance at resolutions between 700 and 2000 m. Conversely, XGBoost and GBoost had maximum objective metrics around 1400 to 2000 m, with lower values at other resolutions. SoilSCAPE (Table 8), also from era 2, recorded a similar preference for RF. It is noteworthy that the optimal spatial resolution for all era 1 and 2 regions was ≤700 m.

During ARM era 3, improvements in downscaled results were noted in ARM_3_1, ARM_3_2, and ARM_3_3 (Figure 7; Table A3), and the model that yielded the highest objective metric varied. In ARM_3_1, GBoost had the highest objective metric at a spatial resolution of 700 m, and in ARM_3_2, XGBoost at a 1400 m resolution yielded the best results. Note that ARM_3_1 and ARM_3_2 had only one high performing model per region (Figure 4). ARM_3_3 differed in that RF yielded three high performing models, with the highest objective metric at 1000 m. For ARM_3_4, no downscaling results yielded an improvement over Default SMERGE. In general, for era 3 the optimal spatial resolution was ≥700 m. Also, in all three regions the general trend was for the objective metric to increase as the spatial resolution became coarser.

Discussion
Overall, improvement in downscaled SMERGE across all eras, regions, and models ranged from 0.03 to 0.42 for Delta r and −0.0005 to −0.0118 m³/m³ for Delta ubRMSE (Tables 8 and A1-A3). These results are comparable to results from previous ML downscaling efforts. Ref. [15] downscaled SMAP on the Iberian Peninsula using RF, yielding an increase in correlation of 0.31 and a decrease in ubRMSE of 0.026 m³/m³ compared to SMAP at its native resolution. Ref. [14], from a study in China, reported an increase of 0.1 to 0.2 in correlation and a decrease of 0.01 m³/m³ in ubRMSE compared with the original land surface model data. Ref. [11] also noted an improvement in correlation of 0.1 for RF-generated soil moisture compared with the default ESA-CCI product. Ref. [16] used RF to develop the 1 km resolution ChinaCropSM estimate of soil moisture, which had a correlation value of 0.93 compared to 0.35 for ESA-CCI. ubRMSE also improved dramatically in this product, from 0.093 to 0.033 m³/m³.

Comparison of ARM results between eras 1 and 3 reveals clear differences in performance between eras. During era 1, more than half (15 out of 24) of the downscaling models yielded improvements (Figure 3a,b). Conversely, during era 3 only slightly more than a third of the models (16 out of 48) were improved (Figure 4). A similar trend was noted for high performing models. For era 1, 17% of downscaled models were high performing, compared with only 10% for era 3. During era 1, deeper penetrating L-band retrievals, with increased accuracy, were available from the SMOS [2] and SMAP [28] missions and were included in the ESA-CCI product that forms the backbone of SMERGE. Another consideration is the completeness of the independent variables and validation datasets. Albedo and LAI data are partly missing during era 3 (Table 9). Note that spatial averaging minimized the impact of these missing data at coarser resolutions (1000 to 1400 m). Also, albedo and LAI have a higher sensitivity within RF models compared with XGBoost and GBoost (Table 7). This could explain the relative underperformance of RF during ARM era 3 compared with the other models. RF had a success rate of 19% during era 3, whereas XGBoost and GBoost had higher success rates of 44% and 50%, respectively. Another dimension to examine is the increasing interpolation within the SMERGE product during the earlier eras (Table 9). The ESA-CCI product had some missing daily data that were estimated by interpolation within SMERGE; see [17]. During eras 1 and 2 the degree of interpolation is relatively low (1 to 17%), unlike era 3 (24 to 36%). Interpolation can produce uncertainties within SMERGE estimates that are propagated to the downscaled version, possibly contributing to the poor performance of SMERGE downscaling during era 3.
Finally, the spatial representativeness of the in situ data is different between eras 1 and 3 within the ARM regions (Figure 1). The coverage of in situ stations with acceptable data (correlation with Default SMERGE > 0.5) is greater in era 1. Interestingly, the region with the best in situ coverage is the small region ARM_3_3, which recorded the best performance of all ARM regions during era 3.
As notable as the above ARM comparisons are, they are still based on sparse in situ data, which [29] indicated can be problematic when providing validation for coarse-resolution satellite-based soil moisture products. The spatial variability within a grid, even for a downscaled product, may not be represented by a point in situ measurement. A spatial mismatch exists between a grid mean and sparse in situ sampling that can increase uncertainty and produce spurious errors in the evaluation of the downscaled product. This sampling issue is avoided at the era 2 sites: MOISST RZSM retrievals were collected over a continuous extent during the AirMOSS campaign, and SoilSCAPE is a small site, less than a square kilometer, with a cluster of 21 sensors. As such, the validation at these sites can be considered a best-case scenario, as reflected by improvement rates of 100% and 83% for MOISST and SoilSCAPE, respectively. Of particular note are the improvements seen at the 100 m resolution at SoilSCAPE, suggesting that under ideal circumstances downscaling to field-scale resolutions is feasible.
Spatial resolution trends are different between eras and models. RF at ARM_1_1 and MOISST had a maximum objective metric at 400 m, with declining performance at coarser spatial resolutions out to 1400 m (Figures 5 and 6). Conversely, the highest objective metric for the XGBoost and GBoost models at MOISST was at 1400 to 2000 m (Figure 6). During ARM era 3, all models recorded increasing performance with coarser spatial resolutions (Figure 7). During this era, incomplete albedo and LAI data likely hampered model execution at finer spatial resolutions (400 to 700 m).
This study has provided valuable insights for the future development of a regional downscaled version of SMERGE. This matters because SMERGE is a long-duration RZSM product that provides a retrospective estimate of this variable, unlike SMOS and SMAP, which are limited temporally (post-2010). Therefore, this work lays the groundwork for the development of a long-term field-scale estimate that can support diverse user communities. The new downscaled product will focus on the United States Southern Great Plains, where the Default SMERGE performance was the best [20]. This work is not intended to provide a comprehensive validation. Instead, its goal was to determine the possible temporal coverage, spatial resolution, and ML model to be used to develop the downscaled product. Additional ranked correlation analyses will be applied to fully validate downscaled SMERGE. Inclusion of these techniques here is beyond the scope of this work.

Conclusions
This study successfully downscaled SMERGE, focusing on the warm season (April to October) in the Southern Great Plains (Oklahoma and Kansas). More robust SMERGE downscaling results were obtained during eras 1 (2016 to 2019) and 2 (2012 to 2015), where RF produced optimal results at ≤700 m. Improvements in the downscaled SMERGE at the 100 m resolution at SoilSCAPE were particularly noteworthy. These results suggest that SMERGE can be successfully downscaled to the field scale with the advent of L-band microwave retrievals after 2010. However, some caution regarding this conclusion is warranted, given the small area of the SoilSCAPE site. During era 3, downscaling efforts were less successful for several reasons. Optimum model and spatial resolution were less consistent across the ARM era 3 regions, but in general, the optimum resolution exceeded 700 m. Results from this study straddle the capabilities of existing (Sentinel 1 & 2) and planned NASA-ISRO Synthetic Aperture Radar (NISAR) missions. In addition, downscaled RZSM from long-duration products like SMERGE can support more robust hydrologic and ecologic modeling and drought monitoring than surface satellite SM estimates.

Figure 1. Locality map with era 1 sites in red, era 2 in blue, and era 3 in black. Upper-right inset map shows the study area location in Kansas and Oklahoma. Locations of ARM era 1 and era 3 in situ sites are indicated by red open circles and black circles, respectively. The Marena Oklahoma Soil Moisture Active Passive In Situ Testbed (MOISST) site is a solid blue rectangle and the Soil moisture Sensing Controller and oPtimal Estimator (SoilSCAPE) site is indicated by a blue triangle. Other in situ data from United States Climate Reference Network (USCRN) Stillwater sites are squares with a blue halo (eras 1 and 2) and the Soil Climate Analysis Network (SCAN) Abrams site with a red square (era 1).

Figure 5. Objective metric as a function of spatial resolution for ARM era 1. ARM_1_1 is represented by squares and ARM_1_2 by triangles. In situ comparison with the USCRN Stillwater sites is indicated as a star. Only objective metrics for improved models were plotted. Blue indicates RF, red indicates XGBoost, and green indicates GBoost (also see legend).

Figure 6 and Table A2 depict the results from MOISST (era 2). RF recorded the highest objective metric at a 400 m spatial resolution. An in situ sensor at the USCRN Stillwater 5 WNW site also showed RF outperforming the other methods. RF exhibited a different trend in performance as a function of spatial resolution than XGBoost and GBoost. Maximum objective metrics for RF were noted at 400 and 3000 m, with lower performance at resolutions between 700 and 2000 m. Conversely, XGBoost and GBoost had maximum objective metrics around 1400 to 2000 m, with lower values at other resolutions. SoilSCAPE (Table 8), also from era 2, recorded a similar preference for RF. Notably, the optimal spatial resolution for all era 1 and 2 regions was <700 m.

Remote Sens. 2023, 15

Figure 6. Objective metric as a function of spatial resolution for MOISST. Colors are as indicated in Figure 5 (also see legend).

Figure 7. Objective metric as a function of spatial resolution for ARM era 3. Colors are as indicated in Figure 5 (also see legend). ARM_3_1 is represented by squares, ARM_3_2 by triangles, and ARM_3_3 by circles.

Table 1. United States Department of Energy Atmospheric Radiation Measurement (ARM) sites by era/region.

Table 2. Physical characteristics of examined regions.

Table 3. Meteorological characteristics of study areas during warm season (April to October).

Table 5. Dates with acceptable values for comparison.
Best model results are indicated in bold.

Table 9. Reasons for incompleteness in datasets.